Unbalanced data oversampling method based on minority class sample space distribution

An oversampling technique based on minority class sample space distribution, applied in the fields of electronics, information engineering, and communications, that addresses poor classification results and improves classification accuracy and effectiveness.

Inactive Publication Date: 2021-08-17
NANJING UNIV OF INFORMATION SCI & TECH

AI Technical Summary

Problems solved by technology

[0007] In view of the deficiencies of the prior art, the present invention provides an unbalanced data oversampling method (SD-KMSMOTE) based on the spatial distribution of minority class samples, in order to solve the problem of poor classification results caused by sample aliasing.



Embodiment Construction

[0047] The technical solution will be described in detail below with reference to the drawings of the present invention:

[0048] An unbalanced data oversampling method based on minority class sample space distribution includes the following steps:

[0049] In this unbalanced data oversampling method based on minority class sample space distribution,

[0050] Let the original training sample set be S with s samples, the minority class sample set be L with l samples, and the majority class sample set be M with m samples. The balance rate E of the sample set S is:

[0051]

[0052] where the closer E is to 1, the closer the numbers of majority class and minority class samples in the sample set S are;
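The formula for E in [0051] is not reproduced in this text. As an illustration only, the sketch below assumes E is the ratio of the minority count l to the majority count m, which is one definition consistent with the statement that E approaches 1 as the two classes become equal in size. The function name `balance_rate` and this choice of formula are assumptions, not taken from the patent.

```python
from collections import Counter


def balance_rate(labels, minority_label):
    """Assumed balance rate E = l / m (minority count over majority count).

    The patent's own formula is missing from this text; this ratio is a
    hypothetical stand-in that equals 1 for a perfectly balanced two-class set.
    """
    counts = Counter(labels)
    l = counts[minority_label]   # number of minority class samples
    m = len(labels) - l          # number of majority class samples
    if m == 0:
        raise ValueError("no majority class samples present")
    return l / m


# 8 majority samples (label 0) and 2 minority samples (label 1)
print(balance_rate([0] * 8 + [1] * 2, minority_label=1))
```

Under this assumed definition, a strongly unbalanced set gives a value near 0 and a balanced set gives exactly 1.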

[0053] (1) Filter isolated samples: compute the K nearest neighbors of every sample point in the minority class. If the K nearest neighbors of a minority sample point all belong to the majority class M, the point is defined as an isolated sample point, i.e. noise, and is filtered out and deleted; the number of isolated sample points removed is recorded;

[0054] (2) Calculate the number of samples that need to be inserted: set the balance r...
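Step (1) above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes Euclidean distance, a brute-force neighbor search, and that "isolated" means all K nearest neighbors belong to the majority class. The function name `filter_isolated_minority` and the default `k=5` are hypothetical.

```python
import numpy as np


def filter_isolated_minority(X, y, minority_label, k=5):
    """Drop minority points whose k nearest neighbors are all majority class.

    Returns the filtered samples, the filtered labels, and the number of
    isolated (noise) points removed. Illustrative sketch only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if k >= len(X):
        raise ValueError("k must be smaller than the number of samples")

    # Brute-force pairwise Euclidean distances (fine for small data sets).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every sample.
    nn_idx = np.argsort(dist, axis=1)[:, :k]

    is_minority = y == minority_label
    # Count how many of each point's k neighbors are majority class.
    majority_neighbors = (~is_minority)[nn_idx].sum(axis=1)

    # Isolated: a minority point surrounded only by majority neighbors.
    isolated = is_minority & (majority_neighbors == k)
    keep = ~isolated
    return X[keep], y[keep], int(isolated.sum())


# Toy data: a majority cluster, a minority cluster, and one stray minority
# point sitting inside the majority cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [10, 10], [10, 11], [11, 10],
     [0.4, 0.6]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]
X_f, y_f, n_removed = filter_isolated_minority(X, y, minority_label=1, k=3)
print(n_removed, X_f.shape)
```

Only minority points are candidates for removal, so the majority class is left untouched and the removed count can feed into the later calculation of how many synthetic samples to insert.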


Abstract

The invention discloses an unbalanced data oversampling method based on minority class sample space distribution, and belongs to the technical field of electronics, communication and information engineering. The method reduces the imbalance of a data set by adding a noise filtering preprocessing step, designing a new sample synthesis method, and constructing a calculation rule for weight values. It thereby solves the problem of poor classification results caused by sample aliasing and improves performance on imbalanced learning problems.

Description

Technical field

[0001] The present invention relates to the technical field of electronics, communication, and information engineering, and more particularly to an unbalanced data oversampling method based on minority class sample space distribution.

Background technique

[0002] Unbalanced data is widespread in many real-world fields, such as financial fraud detection, medical disease diagnosis, network intrusion detection, and network fault diagnosis. A data set is called an unbalanced data set when the numbers of samples in its categories differ greatly. Usually, the category with more samples is called the majority class, and the category with fewer samples is called the minority class.

[0003] Although the minority class has fewer samples, and its data quality is often poorer, it usually carries more important information. In practice, we pay more attention to a model's ability to correctly classify minority class samples, such as a complex network sy...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/40, G06K9/62, G06N20/00
CPC: G06N20/00, G06V10/30, G06F18/23213, G06F18/24323, G06F18/214
Inventors: 潘成胜, 杨雯升, 张艳艳, 金爱鑫
Owner: NANJING UNIV OF INFORMATION SCI & TECH