Unbalance data classifying method based on cluster sampling kernel transformation

A classification method for unbalanced data, applied in database models, relational databases, electrical digital data processing, and similar areas. It addresses the poor classification performance of existing methods and aims to reduce data flooding, lower the imbalance ratio, and improve classification results.

Status: Inactive; Publication Date: 2014-09-24

HARBIN UNIV OF SCI & TECH

Cites: 3; Cited by: 4

Problems solved by technology

[0004] The purpose of the present invention is to solve the problem that traditional unbalanced data classification methods classify poorly. To that end, the present invention provides an unbalanced data classification method based on cluster sampling kernel transformation.

Examples

Specific Embodiment 1

[0028] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The unbalanced data classification method based on cluster sampling kernel transformation described in this embodiment comprises the following steps:

[0029] Step 1: Vectorize the unbalanced data to be classified to obtain an unbalanced data set;

[0030] Step 2: Resample the vectors in the unbalanced data set obtained in Step 1, using cluster sampling based on a dynamic self-organizing map, to obtain the resampled unbalanced data set;

[0031] Step 3: Transform the kernel function of the SVM classifier, then classify the resampled unbalanced data set obtained in Step 2 with the kernel-transformed SVM classifier to obtain the classified unbalanced data set.

[0032] This embodiment mainly addresses the classification of unbalanced data sets; the method combines two strategies, sample resampling and classifier improvement...
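
The three steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented procedure: the record format (dicts of numeric fields), the function names, and the random undersampler standing in for the dynamic self-organizing-map cluster sampling of Step 2 are all hypothetical, and the Step 3 classifier is passed in as a placeholder callable.

```python
import random
from collections import Counter


def vectorize(records, keys):
    """Step 1 (assumed format): turn each dict record into a numeric vector."""
    return [[float(rec[k]) for k in keys] for rec in records]


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def undersample_majority(X, y, seed=0):
    """Stand-in for Step 2: random undersampling, NOT the SOM cluster sampling.

    Keeps as many samples of every class as the smallest class has.
    """
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    keep = []
    for label in set(y):
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def run_pipeline(records, labels, keys, resample, train):
    """Chain Step 1 (vectorize), Step 2 (resample), Step 3 (train classifier)."""
    X = vectorize(records, keys)
    X_res, y_res = resample(X, labels)
    return train(X_res, y_res), y_res
```

Keeping `resample` and `train` as parameters mirrors the way the embodiment separates the resampling strategy from the classifier improvement, so either half can be swapped out independently.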

Specific Embodiment 2

[0036] Specific Embodiment 2: This embodiment further limits the unbalanced data classification method based on cluster sampling kernel transformation described in Specific Embodiment 1. In Step 2, resampling the vectors in the unbalanced data set obtained in Step 1 by cluster sampling based on a dynamic self-organizing map, to obtain the resampled unbalanced data set, comprises:

[0037] Step 21: Initialize the self-organizing map network, and set the training-count variable cycles to zero;

[0038] Step 22: Initialize the weights of the output-layer neuron nodes of the self-organizing map network by assigning every weight $w_{ij}$ a random decimal, that is, at $t = 0$: $0 < w_{ij} < 1$; [...] a sample $(x_1, x_2, \ldots, x_L)$ is input to the self-organizing map network; each time a sample is input, the training-count variable cycles is increased by 1; the total number of input samples is $|D|$;

[0039] Step 24: Calculate the sample ...
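
The initialization and sample-feeding steps above can be sketched as follows. Because the text from Step 24 onward is truncated, the update rule here is an assumption: a plain winner-only Kohonen update (nearest node moves toward the sample), with no neighborhood function and none of the dynamic behavior the patent refers to. The names `init_som`, `best_matching_unit`, `train_som`, and the learning rate `lr` are hypothetical.

```python
import random


def init_som(n_nodes, dim, seed=0):
    """Steps 21-22: cycles = 0 and each weight w_ij is a random value in [0, 1)."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    cycles = 0
    return weights, cycles


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(node, x)) for node in weights]
    return dists.index(min(dists))


def train_som(data, n_nodes, lr=0.5, seed=0):
    """Feed every sample once; cycles grows by 1 per sample and ends at |D|."""
    weights, cycles = init_som(n_nodes, len(data[0]), seed)
    for x in data:
        cycles += 1
        j = best_matching_unit(weights, x)
        # Assumed update: move only the winning node toward the sample.
        weights[j] = [w + lr * (xi - w) for w, xi in zip(weights[j], x)]
    return weights, cycles
```

After one pass, `cycles` equals the number of samples fed in, which matches the role of $|D|$ in Step 22's description.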

Specific Embodiment 3

[0059] Specific Embodiment 3: This embodiment further limits the unbalanced data classification method based on cluster sampling kernel transformation described in Specific Embodiment 2. In Step 3, the method for transforming the kernel function of the SVM classifier is as follows:

[0060] The transformation formula of the kernel function of the SVM classifier is:

[0061] $\tilde{K}(x, x') = C(x)\, C(x')\, K(x, x')$

[0062] where K(x, x′ ...
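
The transformation in [0061] multiplies a base kernel by a factor evaluated at each argument. The sketch below shows only that mechanical step. Since the definition of C(x) is cut off in the text, the scaling function `c_example` (and its `center` parameter) is purely hypothetical, and the RBF base kernel is an assumed choice of K.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Assumed base kernel K(x, x'): Gaussian RBF."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def transform_kernel(kernel, c):
    """Return K~(x, x') = C(x) * C(x') * K(x, x') for a scaling function c."""
    def k_tilde(x, z):
        return c(x) * c(z) * kernel(x, z)
    return k_tilde


def c_example(x, center=(0.0, 0.0)):
    """Hypothetical positive scaling factor that is larger near `center`."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, center)))


def gram_matrix(kernel, X):
    """Matrix of pairwise kernel values over the samples in X."""
    return [[kernel(a, b) for b in X] for a in X]
```

A Gram matrix built with the transformed kernel could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Scaling by C(x)C(x′) keeps the kernel symmetric, and for a positive semi-definite K the product is again positive semi-definite.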

Abstract

The invention relates to an unbalanced data classification method based on cluster sampling kernel transformation and belongs to the field of unbalanced data classification. The method aims to solve the problem that traditional unbalanced data classification methods classify poorly. It comprises the following steps: (1) the unbalanced data to be classified are vectorized to obtain an unbalanced data set; (2) the vectors in the unbalanced data set are resampled by a cluster sampling method based on a dynamic self-organizing map to obtain a resampled unbalanced data set; (3) the kernel function of an SVM classifier is transformed, and the resampled unbalanced data set obtained in step (2) is classified with the kernel-transformed SVM classifier to obtain a classified unbalanced data set. The method is applicable to medical diagnosis, detection of insurance and other fraud, protein detection, fault detection, customer churn prediction, and other fields.

Description

Technical field

[0001] The present invention belongs to the field of unbalanced data classification.

Background

[0002] Classifying imbalanced data sets is a difficult problem in the natural sciences and has important practical value in many fields, such as biology, medicine, engineering, and computing. Experience has shown that when the data categories are unbalanced, applying traditional classification methods directly does not give satisfactory recognition results. Finding a classification method suited to the characteristics of imbalanced data sets is therefore a direction worth further exploration.

[0003] Classification is a very important kind of data mining task; its goal is to summarize a general description of each category from data whose categories are already known. After more than 20 years of continuous development, the classification technology based on machine learning, especially the...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/285

Inventors: 李鹏, 张楷卉

Owner: HARBIN UNIV OF SCI & TECH
