Unbalance data classifying method based on cluster sampling kernel transformation

This technology concerns unbalanced data and classification methods and is applied in database models, relational databases, electrical digital data processing, and related fields. It addresses problems such as poor classification performance, and its effects include reduced data flooding, improved classification performance, and a lower imbalance ratio.

Inactive Publication Date: 2014-09-24
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the poor classification performance of traditional unbalanced data classification methods. To that end, the present invention provides an unbalanced data classification method based on cluster sampling kernel transformation.



Examples


Specific Embodiment 1

[0028] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The unbalanced data classification method based on cluster sampling kernel transformation described in this embodiment comprises the following steps:

[0029] Step 1: Vectorize the unbalanced data to be classified to obtain an unbalanced data set;

[0030] Step 2: Resample the vectors in the unbalanced data set obtained in step 1 using cluster sampling based on a dynamic self-organizing map, obtaining the resampled unbalanced data set;

[0031] Step 3: Transform the kernel function of the SVM classifier, then classify the resampled unbalanced data set obtained in step 2 using the kernel-transformed SVM classifier to obtain the classified unbalanced data set.

[0032] This embodiment mainly addresses the classification problem for unbalanced data sets. The method organically combines two strategies, sample resampling and classifier improvement...
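The three steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the patent's implementation: the data are binary-labeled numeric records, `undersample_majority` is plain random undersampling swapped in for the SOM-based cluster sampling of Step 2, and `train_nearest_centroid` is a nearest-centroid stand-in for the kernel-transformed SVM of Step 3. All function names here are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def vectorize(records):
    """Step 1 stand-in: turn each raw record into a list of floats."""
    return [[float(v) for v in rec] for rec in records]


def undersample_majority(X, y, rng):
    """Step 2 stand-in (NOT the patent's SOM cluster sampling):
    randomly drop majority samples until both classes have equal size.
    Assumes exactly two classes."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    keep = [i for i, lab in enumerate(y) if lab == minority]
    majority_idx = [i for i, lab in enumerate(y) if lab != minority]
    keep += rng.sample(majority_idx, counts[minority])
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def train_nearest_centroid(X, y):
    """Step 3 stand-in (NOT an SVM): classify by the closest class centroid."""
    counts = Counter(y)
    sums = {}
    for vec, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(vec):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )

    return predict


def run_pipeline(records, labels, resample, train_classifier, seed=0):
    """Vectorize, resample, then train a classifier on the resampled set."""
    rng = random.Random(seed)
    X = vectorize(records)
    X_rs, y_rs = resample(X, labels, rng)
    return X_rs, y_rs, train_classifier(X_rs, y_rs)


# Toy data: 10 majority samples near the origin, 2 minority samples far away.
records = [(0.1 * i, 0.0) for i in range(10)] + [(5.0, 5.0), (5.5, 4.5)]
labels = [0] * 10 + [1] * 2
X_rs, y_rs, predict = run_pipeline(
    records, labels, undersample_majority, train_nearest_centroid
)
```

Because the resampler and the classifier are passed in as functions, either stand-in can be replaced without touching the other stage, which mirrors the way the method combines resampling with classifier improvement.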

Specific Embodiment 2

[0036] Specific Embodiment 2: This embodiment further limits the unbalanced data classification method based on cluster sampling kernel transformation described in Specific Embodiment 1. In step 2, resampling the vectors in the unbalanced data set obtained in step 1 using cluster sampling based on a dynamic self-organizing map, and obtaining the resampled unbalanced data set, comprises:

[0037] Step 21: Initialize the self-organizing map network and set the training count variable cycles to zero;

[0038] Step 22: Initialize the weights of the output-layer neuron nodes of the self-organizing map network by assigning a random decimal to the weight $w_{ij}$ of every output-layer neuron node, that is, at $t = 0$: $0 < w_{ij} < 1$;

Step 23: [...] $(x_1, x_2, \ldots, x_L)$ is input to the self-organizing map network; each time a sample is input, the training count variable cycles is increased by 1, and the total number of input samples is $|D|$;

[0039] Step 24: Calculate the sample ...
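The initialization and sample-feeding steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the map has a fixed number of output nodes (the dynamic growth of the map is not shown), and because Step 24 is truncated in this text, the winner selection by smallest Euclidean distance and the winner-only weight update are the standard SOM rule swapped in for illustration. The names `init_som`, `best_matching_node`, `train_som`, and `learning_rate` are hypothetical.

```python
import random


def init_som(n_nodes, dim, rng):
    """Steps 21-22: set cycles to zero and draw every weight w_ij
    as a random decimal in [0, 1)."""
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    cycles = 0
    return weights, cycles


def best_matching_node(weights, x):
    """Assumed standard SOM rule: the winner is the node whose weight
    vector has the smallest squared Euclidean distance to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_nodes, learning_rate=0.5, seed=0):
    """Feed each sample once, incrementing cycles per sample (Step 23),
    and pull the winning node toward the sample (assumed update rule)."""
    rng = random.Random(seed)
    weights, cycles = init_som(n_nodes, len(data[0]), rng)
    for x in data:
        cycles += 1
        j = best_matching_node(weights, x)
        weights[j] = [w + learning_rate * (v - w) for w, v in zip(weights[j], x)]
    return weights, cycles


data = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.85, 0.9]]
weights, cycles = train_som(data, n_nodes=3)
```

After one pass over the data, `cycles` equals the number of input samples, which corresponds to |D| in the text.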

Specific Embodiment 3

[0059] Specific Embodiment 3: This embodiment further limits the unbalanced data classification method based on cluster sampling kernel transformation described in Specific Embodiment 2. In step 3, the kernel function of the SVM classifier is transformed as follows:

[0060] The transformation formula for the kernel function of the SVM classifier is:

[0061] $\tilde{K}(x, x') = C(x)\, C(x')\, K(x, x')$

[0062] where K(x, x′ ...
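The transformation in [0061] can be illustrated numerically. This is a minimal sketch under stated assumptions: the definition of C(x) is truncated in this text, so `example_c` below is a hypothetical positive scaling function chosen only for illustration, and the base kernel K is assumed to be an RBF kernel with a hypothetical `gamma` parameter.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Assumed base kernel K(x, x'): Gaussian RBF."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def example_c(x, anchor=(0.0, 0.0), tau=1.0):
    """Hypothetical C(x): a positive bump centered on an anchor point.
    The patent's actual C(x) is not visible in this text."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, anchor))
    return math.exp(-dist2 / tau ** 2)


def make_transformed_kernel(base_kernel, c):
    """Build K~(x, x') = C(x) * C(x') * K(x, x')."""

    def transformed(x, z):
        return c(x) * c(z) * base_kernel(x, z)

    return transformed


def gram_matrix(kernel, points):
    """Kernel values for every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
k_tilde = make_transformed_kernel(rbf_kernel, example_c)
G = gram_matrix(k_tilde, points)
```

Multiplying a positive semi-definite kernel by C(x)C(x′) for any real-valued C keeps it symmetric and positive semi-definite, so the resulting Gram matrix `G` can in principle be handed to an SVM implementation that accepts precomputed kernels.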


Abstract

The invention relates to an unbalanced data classification method based on cluster sampling kernel transformation and belongs to the field of unbalanced data classification. The method aims to solve the poor classification performance of traditional unbalanced data classification methods. It comprises the following steps: (1) the unbalanced data to be classified are vectorized to obtain an unbalanced data set; (2) the vectors in the unbalanced data set are resampled using cluster sampling based on a dynamic self-organizing map to obtain a resampled unbalanced data set; (3) the kernel function of an SVM classifier is transformed, and the resampled unbalanced data set obtained in step (2) is classified using the kernel-transformed SVM classifier to obtain a classified unbalanced data set. The method is applicable to medical diagnosis, fraud detection in insurance and other domains, protein detection, fault detection, customer churn prediction, and other fields.

Description

Technical field

[0001] The present invention belongs to the field of unbalanced data classification.

Background

[0002] Classification of imbalanced data sets is a difficult problem in the natural sciences and has important practical value in many fields such as biology, medicine, engineering, and computing. Experience has shown that, when the class distribution is unbalanced, directly applying traditional classification methods does not achieve satisfactory recognition results. How to find a classification method suited to the characteristics of imbalanced data sets is therefore a direction worth further exploration.

[0003] Classification is a very important data mining task whose goal is to summarize a general description of each category from data with known categories. After more than 20 years of continuous development, classification technology based on machine learning, especially the...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/285
Inventors: 李鹏, 张楷卉
Owner: HARBIN UNIV OF SCI & TECH