Oversampling method based on angle and direction clustering

An oversampling technique based on angle and direction clustering, applied in fields such as character and pattern recognition, instruments, and computer components. It addresses three problems of existing oversampling methods: the importance of boundary samples is ignored, synthetic samples tend to fall among majority class samples, and effectiveness degrades on high-dimensional imbalanced datasets.

Pending Publication Date: 2021-07-20
HUNAN UNIV
Cites: 0; Cited by: 4

AI Technical Summary

Problems solved by technology

[0005] In view of the above defects of, and needs for improvement in, the prior art, the present invention provides an oversampling method based on angle and direction clustering. It aims to solve three technical problems of existing oversampling methods. First, they ignore the importance of boundary samples in classification. Second, because they ignore how samples are distributed relative to one another, synthetic samples tend to fall among majority class samples, which lowers the quality of the synthetic samples. Third, because distance and density measures degrade in high-dimensional space, these methods are less effective on high-dimensional imbalanced datasets.
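The third problem, the degradation of distance measures in high-dimensional space, can be illustrated with a small experiment. The sketch below is not taken from the patent; it assumes uniformly random points and a hypothetical helper `relative_contrast` that compares the farthest and nearest distances from one query point. As the dimension grows, the contrast shrinks, so "near" and "far" neighbors become hard to tell apart.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for one query point.

    Points and query are drawn uniformly from the unit hypercube
    (an illustrative assumption, not part of the patent text).
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    # Euclidean distance from the query to every point.
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


if __name__ == "__main__":
    # The printed contrast should fall as the dimension rises.
    for dim in (2, 20, 200, 2000):
        print(dim, relative_contrast(1000, dim))
```

A low contrast means that nearest-neighbor distance and density estimates carry little information, which is the motivation the text gives for moving to an angle-based measure.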



Examples


Embodiment Construction

[0060] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.

[0061] The basic idea of the present invention is to propose an oversampling method based on angle and direction clustering that improves imbalanced learning at the data level. The method uses the angle variance to learn the distribution of samples in the data space and, based on Fisher's optimal segmentation algorithm, obtains the label receiving list of data samples, a...
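The excerpt does not give the formula for the angle variance, so the following is only a minimal sketch under stated assumptions: for a sample, take its k nearest neighbors, compute the cosine of the angle the sample subtends for every pair of neighbors, and return the variance of those cosines. The function name `angle_variance`, the parameter `k`, and the use of cosines rather than raw angles are all illustrative choices, not the patent's definition. The sketch also assumes there are no duplicate points and that at least two neighbors are available.

```python
import numpy as np


def angle_variance(X: np.ndarray, index: int, k: int = 10) -> float:
    """Variance of pairwise neighbor cosines seen from X[index].

    Hypothetical definition for illustration; assumes no duplicate
    rows in X and at least three samples in total.
    """
    diffs = X - X[index]
    dists = np.linalg.norm(diffs, axis=1)
    dists[index] = np.inf  # exclude the sample itself
    k = min(k, len(X) - 1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbors
    # Unit vectors pointing from the sample to each neighbor.
    units = diffs[nn] / dists[nn][:, None]
    cosines = units @ units.T
    # Keep each unordered neighbor pair once (upper triangle).
    upper = np.triu_indices(k, k=1)
    return float(np.var(cosines[upper]))


if __name__ == "__main__":
    theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
    ring = np.column_stack([np.cos(theta), np.sin(theta)])
    X = np.vstack([ring, [[0.0, 0.0]], [[100.0, 0.0]]])
    print("interior sample:", angle_variance(X, 8, k=8))
    print("isolated sample:", angle_variance(X, 9, k=8))
```

In this toy layout the sample surrounded by neighbors sees them in many directions and gets a large variance, while the isolated sample sees all neighbors in nearly the same direction and gets a variance close to zero. That contrast is what lets an angle-based measure describe where a sample sits in the distribution without relying on distance magnitudes.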


Abstract

The invention discloses an oversampling method based on angle and direction clustering. The method comprises the steps of: obtaining an unbalanced data set; clustering the unbalanced data set with a clustering algorithm to generate a clustering label, an angle variance, and a sorted neighbor set for each sample; filtering each sample whose clustering label is noise to obtain filtered samples; calculating a first oversampling weight, a second oversampling weight, and an optimal interpolation neighbor set for each minority class sample according to the clustering label, angle variance, and sorted neighbor set of each sample; and calculating the oversampling weight of each cluster and the number of new samples to be synthesized for each cluster according to the first oversampling weights of all minority class samples. The method addresses the technical problem that existing oversampling methods ignore the importance of boundary samples in classification.
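The last step, turning per-sample weights into a number of new samples per cluster, can be sketched as follows. The abstract does not state the exact rule, so this sketch assumes that a cluster's oversampling weight is the normalized sum of the first oversampling weights of its minority samples, and that the total is split proportionally with largest-remainder rounding. The function name `allocate_per_cluster` and its parameters are hypothetical, and noise-labelled samples are assumed to have been handled by the filtering step already.

```python
import numpy as np


def allocate_per_cluster(sample_weights, cluster_labels, n_new: int) -> dict:
    """Split n_new synthetic samples across clusters by summed weight.

    Assumed rule (not confirmed by the source): cluster weight is the
    normalized sum of its samples' weights; the total must be positive.
    """
    weights = np.asarray(sample_weights, dtype=float)
    labels = np.asarray(cluster_labels)
    clusters = np.unique(labels)
    cluster_w = np.array([weights[labels == c].sum() for c in clusters])
    cluster_w = cluster_w / cluster_w.sum()
    exact = cluster_w * n_new
    counts = np.floor(exact).astype(int)
    # Hand out the leftover samples to the largest fractional parts
    # so that the counts add up to exactly n_new.
    leftover = n_new - int(counts.sum())
    order = np.argsort(-(exact - counts), kind="stable")
    counts[order[:leftover]] += 1
    return {c.item(): int(n) for c, n in zip(clusters, counts)}


if __name__ == "__main__":
    # Four minority samples in two clusters; cluster 1 carries more weight.
    print(allocate_per_cluster([1, 1, 3, 5], [0, 0, 1, 1], n_new=7))
```

Largest-remainder rounding is used here only so that the per-cluster counts always sum to the requested total; any other proportional rounding scheme would serve the same illustrative purpose.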

Description

Technical field

[0001] The invention belongs to the technical field of data mining and, more particularly, relates to an oversampling method based on angle and direction clustering.

Background

[0002] The development of machine learning and deep learning provides powerful support for classification and prediction. Classification is a form of supervised learning, so a data set must be provided to train the classification model; in most cases, however, the data set is unbalanced. Samples whose label accounts for a higher proportion of the dataset are called majority class samples, and samples whose label accounts for a lower proportion are called minority class samples. Imbalance means that the majority class samples in the dataset often far outnumber the minority class samples. Because minority class samples are few, the classifier cannot learn them effectively, and then it is difficult to m...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23, G06F18/23213, G06F18/24, G06F18/214
Inventor: 李肯立, 覃舒婕, 杨志邦, 刘楚波, 阳王东
Owner: HUNAN UNIV