Selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows

A technology for classification prediction on unbalanced data streams, applied in the fields of instruments, character and pattern recognition, computer parts, etc. It addresses the unsatisfactory performance of traditional classifiers, the difficulty of applying traditional mining methods, and the limited effect of sampling, with the aim of improving the minority-class recognition rate and adapting well to learning from dynamic data.

Inactive Publication Date: 2017-11-10
NORTHEASTERN UNIV
View PDF0 Cites 25 Cited by

AI Technical Summary

Problems solved by technology

Because a data stream is large and arrives continuously, it cannot be loaded into memory all at once, which makes traditional mining methods difficult to apply effectively.
Classification of data streams has long been an important research topic in data stream mining. Learning from and classifying data streams poses two difficulties. First, because the data changes continuously over time, the concepts contained in the data also change over time. This is known as concept drift, and adapting to it effectively has become a hot and difficult topic in data mining in recent years. Second, dynamic data often has an unbalanced class distribution. For example, fraudulent transactions may account for only 1% or less of commercial transaction data. Because majority-class information dominates, traditional classifiers lean toward the majority class during training and prediction and cannot learn minority-class samples effectively. In many fields the recognition rate of the minority class is the more important one, so the performance of traditional classifiers is often unsatisfactory.
However, because Naive Bayes assumes that all features are mutually independent, its posterior probability cannot accurately measure how strongly historical samples correlate with the current concept when the data has a complex distribution.
In addition, these two algorithms only reuse historical data for minority-class upsampling and do not generate new sample information, so the effect of sampling is limited.
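The class-imbalance difficulty described above can be made concrete with a small calculation. The sketch below is illustrative only: the 1% minority rate echoes the fraud example in the text, while the function name and the always-majority "classifier" are assumptions introduced here, not part of the patent.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority_label=1):
    """Return (overall accuracy, recall on the minority class)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    minority_total = sum(1 for t in y_true if t == minority_label)
    minority_hits = sum(
        1 for t, p in zip(y_true, y_pred)
        if t == minority_label and p == minority_label
    )
    return correct / len(y_true), minority_hits / minority_total


# 1000 transactions, 10 of them fraudulent (label 1), i.e. 1% minority.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 1000

accuracy, recall = accuracy_and_minority_recall(y_true, y_pred)
print(accuracy, recall)
```

Overall accuracy looks high even though no minority sample is recognised, which is why the minority-class recognition rate is tracked separately.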

Embodiment Construction

[0036] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0037] An embodiment of the present invention provides a weighted ensemble classification prediction method for unbalanced data streams combined with selective upsampling, including: screening the minority-class samples of historical data blocks by similarity, and selecting the samples most similar in concept to the current training data block; for the selected samples, the new samples are synthesi...
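The screening and synthesis steps named in this paragraph could be sketched roughly as follows. The visible text does not state the similarity measure or the synthesis rule, so everything concrete here is an assumption: similarity is taken as Euclidean distance to the centroid of the current minority samples, and boundary-area synthesis is approximated with a Borderline-SMOTE-style interpolation. The function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_similar(history_minority, current_minority, keep):
    """Keep the `keep` historical minority samples closest to the
    centroid of the current block's minority samples (assumed measure)."""
    dim = len(current_minority[0])
    centroid = [
        sum(s[i] for s in current_minority) / len(current_minority)
        for i in range(dim)
    ]
    return sorted(history_minority, key=lambda s: euclidean(s, centroid))[:keep]


def is_borderline(sample, minority, majority, k=3):
    """True if at least half, but not all, of the k nearest neighbours
    of `sample` belong to the majority class."""
    neighbours = [(euclidean(sample, m), 0) for m in minority if m is not sample]
    neighbours += [(euclidean(sample, m), 1) for m in majority]
    n_major = sum(flag for _, flag in sorted(neighbours)[:k])
    return k / 2 <= n_major < k


def synthesize(selected, current_minority, current_majority, n_new, rng):
    """Create n_new samples by interpolating from a boundary-area seed
    towards another minority sample."""
    pool = selected + current_minority
    # Fall back to the whole pool if no sample qualifies as borderline.
    seeds = [s for s in pool if is_borderline(s, pool, current_majority)] or pool
    new_samples = []
    for _ in range(n_new):
        a, b = rng.choice(seeds), rng.choice(pool)
        gap = rng.random()
        new_samples.append([x + gap * (y - x) for x, y in zip(a, b)])
    return new_samples
```

A call such as `synthesize(select_similar(history, current_min, 2), current_min, current_maj, 5, random.Random(0))` would return five synthetic minority samples lying between existing minority points.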

Abstract

The invention relates to the technical field of data mining, and discloses a weighted ensemble classification prediction method for unbalanced data streams combined with selective up-sampling. The method comprises the following steps: screening the minority-class samples of historical data blocks by similarity, and selecting the samples closest in concept to the current training data block; synthesizing new samples from the selected samples in the decision boundary area, so that up-sampling is applied selectively; and carrying out weighted ensemble classification on new samples using a weight distribution strategy based on probability distribution relevancy. By selecting highly similar historical data and synthesizing new data in the boundary area, the method effectively increases the minority-class sample information and enlarges the decision domain of the minority class. To adapt to dynamic data with concept drift, the method follows the ensemble classification approach and uses the relevancy-based weight distribution strategy, which improves the overall classification precision. Experimental results show that the method effectively improves the minority-class recognition rate and the overall classification performance, and handles unbalanced data streams better.
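The weighted ensemble step might look something like the sketch below. The abstract does not give the weight formula, so the weighting rule used here (the mean probability a base classifier assigns to the true labels of the latest data block) is an assumed stand-in for the probability distribution relevancy strategy. The names `relevance_weight` and `ensemble_proba` and the two toy models are hypothetical.

```python
def relevance_weight(predict_proba, block):
    """Mean probability the classifier assigns to each sample's true label.

    `predict_proba(features)` returns P(minority); labels are 1 (minority)
    or 0 (majority). This weighting rule is an assumption.
    """
    total = 0.0
    for features, label in block:
        p_minority = predict_proba(features)
        total += p_minority if label == 1 else 1.0 - p_minority
    return total / len(block)


def ensemble_proba(classifiers, weights, features):
    """Weighted average of the base classifiers' minority probabilities."""
    weighted = sum(w * clf(features) for clf, w in zip(classifiers, weights))
    return weighted / sum(weights)


# Two toy base classifiers: one trained on an outdated concept, one current.
def old_model(x):
    return 0.9 if x[0] > 5 else 0.1


def new_model(x):
    return 0.9 if x[0] > 2 else 0.1


latest_block = [([3.0], 1), ([1.0], 0), ([4.0], 1), ([6.0], 1)]
models = [old_model, new_model]
weights = [relevance_weight(m, latest_block) for m in models]
print(weights, ensemble_proba(models, weights, [3.0]))
```

Under this rule the classifier that matches the current concept receives the larger weight, so the ensemble follows it after a drift.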

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to an unbalanced data flow weighted integrated classification prediction method combined with selective upsampling.

Background technique

[0002] With the rapid development of communication technology, a large amount of dynamic data has appeared in many application fields, such as business transaction analysis, intrusion detection and industrial control. Since the data in the data stream presents a large amount and comes continuously, it cannot be loaded into the memory at one time, which makes it difficult to apply the traditional mining method effectively. The classification of data streams has always been one of the important research topics in the field of data stream mining. There are two difficulties in learning and classifying data streams: First, due to the continuous change of data over time, it will inevitably lead to the randomness of the concepts contained in the dat...
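Because the stream cannot be held in memory at once, the method works on data blocks. A minimal sketch of cutting an arbitrary iterable stream into fixed-size blocks is given here. The helper name `data_blocks` and the fixed block size are assumptions made for illustration, since the visible text does not say how blocks are formed.

```python
from itertools import islice


def data_blocks(stream, block_size):
    """Yield consecutive lists of at most `block_size` items from `stream`
    without materialising the whole stream in memory."""
    iterator = iter(stream)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


for block in data_blocks(range(7), 3):
    print(block)
```

Each yielded block can serve as the current training data block, while earlier blocks play the role of historical data.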

Application Information

IPC IPC(8): G06K9/62
CPC G06F18/2415
Inventor 曹鹏, 刘筱力, 单宣峰, 刘爽, 栗伟, 覃文军, 冯朝路, 杨金柱
Owner NORTHEASTERN UNIV