Data classification method

A data classification method, applied in the field of data processing, that addresses the problems of ever-increasing sample weights, difficult training, and incomplete training, with the effect of ensuring complete training and improving accuracy and performance.

Pending Publication Date: 2018-09-28
湖南湖大金科科技发展有限公司

AI Technical Summary

Problems solved by technology

[0005] (1) If the full sample set is used for iterative training, the number of samples grows exponentially after each iteration, which makes training increasingly difficult;
[0006] (2) If random sampling is used to reflect the corresponding weight ratios, some samples are missed, which results in incomplete training;
[0007] (3) For samples that are misclassified repeatedly, the original algorithm keeps increasing their weights. If such a sample is an outlier, subsequent classifiers over-train on it and drift away from the actual data samples.
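
The weight growth described in point (3) can be illustrated with the standard discrete AdaBoost update, in which a misclassified sample's weight is multiplied by exp(alpha) before renormalisation. The sketch below is not taken from the patent: the sample count, the fixed error pattern, and the helper name `reweight` are assumptions chosen only for illustration.

```python
import math

def reweight(weights, misclassified):
    """One round of the standard discrete AdaBoost weight update.

    weights: list of non-negative sample weights summing to 1.
    misclassified: list of bools, True where the weak classifier was wrong.
    Returns (new_weights, alpha).
    """
    eps = sum(w for w, m in zip(weights, misclassified) if m)  # weighted error
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    scaled = [w * math.exp(alpha if m else -alpha)
              for w, m in zip(weights, misclassified)]
    total = sum(scaled)
    return [s / total for s in scaled], alpha

# Hypothetical scenario: 10 samples; sample 0 is an outlier that every weak
# classifier gets wrong, and one other (rotating) sample is also missed.
n = 10
weights = [1.0 / n] * n
history = [weights[0]]  # weight of the outlier after each round
for t in range(4):
    wrong = [i == 0 or i == 1 + t for i in range(n)]
    weights, _ = reweight(weights, wrong)
    history.append(weights[0])

print([round(w, 3) for w in history])
```

In this toy run the outlier's share of the total weight rises in every round, starting from 0.1 and reaching 0.25 after the first update, which is the behaviour the paragraph warns about.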
[0008] At present, classification algorithms are improved mainly in two ways: by improving the algorithm itself, or by combining and stacking multiple algorithms. Improving the algorithm itself usually exploits some of its own characteristics, for example by adding discriminant formulas, integrating other algorithms, or changing the algorithm's structure. However, because machine learning algorithms are generally complex, such improvements are typically tied to a specific application scenario and are not universal; they are difficult to carry out and tend to produce redundant, complex algorithms. The combination approach does not disturb the structural characteristics of the individual algorithms, so different algorithms can complement one another according to their characteristics; it has great advantages and is broadly applicable. However, current combination-based improvements of the Adaboost algorithm are usually simple combinations that do not address the above problems in Adaboost's own training process, so the exponential growth in the number of samples, difficult or incomplete training, and over-training still remain.



Embodiment Construction

[0041] The present invention will be further described below in conjunction with the accompanying drawings and specific preferred embodiments, but the protection scope of the present invention is not limited thereby.

[0042] As shown in Figure 1, the data classification method of this embodiment comprises the following steps:

[0043] S1. Obtain the training set samples for training a classifier, and divide them equally, according to the number of iterations required for training, into multiple training subsets;
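
Step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_training_set`, the shuffle before splitting, and the way a remainder is spread over the subsets are all assumptions, since the text only states that the samples are divided equally by the number of iterations.

```python
import random

def split_training_set(samples, n_iterations, seed=0):
    """Shuffle the samples and divide them into n_iterations near-equal subsets."""
    if n_iterations < 1:
        raise ValueError("n_iterations must be at least 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: shuffle before splitting
    # Taking every n_iterations-th element keeps subset sizes within one of
    # each other and assigns every sample to exactly one subset.
    return [shuffled[k::n_iterations] for k in range(n_iterations)]

subsets = split_training_set(range(10), n_iterations=3)
print([len(s) for s in subsets])
```

Because every sample lands in exactly one subset, no sample is skipped, while each training round sees only a fraction of the full set.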

[0044] S2. Based on the Adaboost algorithm, use multiple weak classifiers to train on the training subsets respectively. When each weak classifier is trained, select part of the samples of its training subset together with some of the error samples obtained by the previous weak classifier to form its final training samples. After training is complete, obtain the final ADB strong classifier from the weak classifiers;
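
One possible reading of step S2 is sketched below. It is heavily simplified and rests on assumptions that are not in the source: the weak classifiers are 1-D decision stumps, each stump is fitted by plain (unweighted) error on its pool, and the knobs `subset_fraction` and `max_errors` are hypothetical stand-ins for "a part" of the subset and "some" error samples. The weighted vote in `classify` follows standard AdaBoost.

```python
import math

def fit_stump(samples):
    """Fit a 1-D decision stump (threshold, polarity) by minimising plain error."""
    best = None
    for thr in sorted({x for x, _ in samples}):
        for pol in (1, -1):
            errs = sum(1 for x, y in samples if (pol if x >= thr else -pol) != y)
            if best is None or errs < best[0]:
                best = (errs, thr, pol)
    return best[1], best[2]

def predict_stump(stump, x):
    thr, pol = stump
    return pol if x >= thr else -pol

def train_on_subsets(subsets, subset_fraction=0.8, max_errors=5):
    """Train one stump per subset, carrying some earlier errors forward.

    subset_fraction and max_errors are hypothetical parameters; the source
    only says that part of each subset and some error samples are used.
    """
    ensemble, carried = [], []
    for subset in subsets:
        keep = max(1, int(len(subset) * subset_fraction))
        pool = list(subset[:keep]) + carried
        stump = fit_stump(pool)
        wrong = [(x, y) for x, y in pool if predict_stump(stump, x) != y]
        # Clamp the error rate so the logarithm below stays finite.
        eps = min(max(len(wrong) / len(pool), 1e-6), 1 - 1e-6)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        carried = wrong[:max_errors]  # the cap limits the influence of outliers
    return ensemble

def classify(ensemble, x):
    """Weighted vote of all stumps, as in standard AdaBoost."""
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: label is +1 for x >= 6, except a deliberately mislabelled x = 0.
data = [(x, 1 if x >= 6 or x == 0 else -1) for x in range(12)]
subsets = [data[k::3] for k in range(3)]  # three equal subsets, one per round
ensemble = train_on_subsets(subsets)
print([classify(ensemble, x) for x in (0, 3, 11)])
```

Only the errors of the immediately preceding stump are carried into the next pool, so a persistently misclassified outlier is not amplified round after round the way it is under full reweighting.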

[0045] S3. Use the t...


Abstract

The present invention discloses a data classification method. The method comprises the steps of: S1, obtaining training set samples for training a classifier, and dividing the obtained training set samples equally according to the number of iterations required for training to obtain a plurality of training subsets; S2, based on the Adaboost algorithm, training on each of the training subsets respectively using a plurality of weak classifiers, where for the training of each weak classifier some samples of the training subset and some error samples obtained by the previous weak classifier are selected to form the final training samples, and obtaining a final ADB strong classifier from the weak classifiers after training is complete; and S3, using the trained ADB strong classifier to classify the data to be classified, and outputting a classification result. With this method, the data used in classification training is complete, the training data is prevented from multiplying and from causing over-fitting, and the method has the advantages of a simple implementation principle and high classification efficiency and precision.

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a data classification method.

Background

[0002] Data classification maps data to specified categories. Adaboost (Adaptive Boosting) is an adaptive data classification algorithm that trains different classifiers (weak classifiers) on the same training set and then combines these weak classifiers into a stronger final classifier (the strong classifier). It is adaptive in the following sense: the samples misclassified by the previous weak classifier are given greater weight, and all of the reweighted samples are used to train the next base classifier. A new weak classifier is added in each round until a predetermined, sufficiently small error rate or a pre-specified maximum number of iterations is reached. The Adaboost algorithm has a strong recurrent learning ability and can better combine and strengthen we...
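
For reference, the standard discrete Adaboost procedure outlined in the background can be written as follows. The notation is an assumption introduced here, not the patent's: labels y_i are in {-1, +1}, h_t is the weak classifier of round t, w_i^{(t)} is the weight of sample i in round t, and Z_t is the constant that renormalises the weights to sum to 1.

```latex
\begin{align}
\varepsilon_t &= \sum_{i=1}^{N} w_i^{(t)} \,\mathbb{1}\!\left[h_t(x_i) \neq y_i\right], \\
\alpha_t &= \tfrac{1}{2} \ln\frac{1-\varepsilon_t}{\varepsilon_t}, \\
w_i^{(t+1)} &= \frac{w_i^{(t)} \exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}, \\
H(x) &= \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right).
\end{align}
```

A misclassified sample has y_i h_t(x_i) = -1, so its weight is multiplied by exp(alpha_t) > 1 whenever the weighted error is below 1/2; this is the strengthening of wrong samples that the background refers to.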


Application Information

IPC(8): G06K9/62
CPC: G06F18/2413, G06F18/214
Inventor 赵寒枫, 陈佐, 杨胜刚, 陈邦道, 梅雪松, 余湘军, 李浩之, 王芍
Owner 湖南湖大金科科技发展有限公司