
Subway fault data classification method based on unbalanced data set

A fault-data classification technology applied in data processing, instruments, and computing. It addresses three problems: the deletion of informative samples during undersampling, insufficient consideration of the spatial distribution of samples, and the intrusion of synthetic samples into the distribution space of negative samples. It aims to achieve good model generalization, a higher recognition rate, and good classification performance.

Pending Publication Date: 2020-09-04
NANJING UNIV OF SCI & TECH
Cites 4 documents; cited by 11.

AI Technical Summary

Problems solved by technology

The disadvantage of undersampling is that, while removing majority-class samples, it can easily discard samples that carry important information.
[0006] The traditional SMOTE algorithm gives insufficient consideration to the spatial distribution of samples and has no rules for judging whether a synthetic sample is acceptable. As a result, synthetic positive samples can intrude into the distribution space of negative samples, which degrades classification.
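The weakness described in [0006] follows from how classic SMOTE builds a sample. It picks a minority sample x and one of its k nearest minority-class neighbours x_nn, then returns x + lambda * (x_nn - x), with lambda drawn uniformly from [0, 1). Below is a minimal NumPy sketch of that classic rule. It is not the patent's improved variant, whose judgment rules are not given in this excerpt. The function name `smote` and the defaults (k = 5, seed = 0) are illustrative assumptions.

```python
import numpy as np


def smote(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Classic SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours. Nothing checks where the new point
    lands, which is the weakness described in [0006]."""
    rng = np.random.default_rng(seed)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                    # random base sample x
        x_nn = minority[rng.choice(neighbours[i])]
        lam = rng.random()                     # interpolation factor in [0, 1)
        synthetic[j] = minority[i] + lam * (x_nn - minority[i])
    return synthetic
```

Because the segment between x and x_nn is never inspected, minority samples at (0, 0) and (10, 0) produce synthetic points anywhere along the line joining them. This includes the area around (5, 0), even if that region belongs to the negative class.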


Examples


Embodiment

[0114] Step 1: Obtain the unbalanced data set D required for the experiment from Guangzhou Metro operation data;

[0115] Step 2: Divide data set D into a training data set D_Train and a test data set D_Test, as follows:

[0116] 2.1) Randomly divide the unbalanced data set into 5 parts with the same number of samples;

[0117] 2.2) Randomly select one of the 5 parts as the test data set, and use the remaining 4 parts as the training data set.
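Steps 2.1 and 2.2 amount to a single random hold-out fold. The NumPy sketch below assumes X and y are arrays. The helper name `split_train_test` and the seed are hypothetical. If len(X) is not divisible by 5, `np.array_split` produces parts that differ by at most one sample.

```python
import numpy as np


def split_train_test(X, y, n_parts=5, seed=0):
    """Shuffle, cut into n_parts near-equal parts (step 2.1), hold one
    randomly chosen part out as D_Test and join the rest as D_Train (step 2.2)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    parts = np.array_split(idx, n_parts)
    test_part = rng.integers(n_parts)          # pick the held-out part at random
    test_idx = parts[test_part]
    train_idx = np.concatenate([p for i, p in enumerate(parts) if i != test_part])
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

A purely random split like this can leave very few minority samples in D_Test. A stratified split is a common alternative, but the source does not describe one.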

[0118] Step 3: Divide the samples in D_Train into a positive-class sample set N_min (minority-class samples) and a negative-class sample set N_maj (majority-class samples). Then compute the number of samples to be generated by oversampling, T = N_maj - N_min, where N_maj and N_min here denote the sizes of the two sets;
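Step 3 reduces to a class split and a subtraction. For example, a training set with 900 normal and 100 fault samples gives T = 900 - 100 = 800 synthetic samples. The sketch assumes the positive (fault) class is labelled 1. The function name is hypothetical.

```python
import numpy as np


def split_by_class(X, y, positive_label=1):
    """Separate D_Train into N_min (positive/minority) and N_maj
    (negative/majority), and return T, the number of synthetic samples
    needed to balance the two classes."""
    n_min = X[y == positive_label]
    n_maj = X[y != positive_label]
    T = len(n_maj) - len(n_min)
    return n_min, n_maj, T
```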

[0119] Step 4: Use the K-Means clustering algorithm to cluster the positive-class sample set N_min into k clusters C_i, i = 1, 2, ..., k. The specific steps of the K-Means clustering algorithm are as follows:

[0120] 4.1) The input data is pos...
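The sub-steps of 4.1 onward are truncated in this excerpt. As a stand-in, here is plain Lloyd's-iteration K-Means in NumPy, applied to N_min. It uses random initialisation and Euclidean distance. The names, the initialisation, and the stopping rule are assumptions, not the patent's specification.

```python
import numpy as np


def _assign(points, centroids):
    """Index of the nearest centroid (Euclidean) for every point."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    return d.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-Means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct randomly chosen samples.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(points, centroids)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = np.array([points[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(points, centroids), centroids
```

Calling `labels, _ = kmeans(n_min, k)` and then `clusters = [n_min[labels == i] for i in range(k)]` yields C_1, ..., C_k. Each cluster is later oversampled separately.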


Abstract

The invention discloses a subway fault data classification method based on an unbalanced data set. The method comprises the following steps:

- Input an original unbalanced data set and divide it into a training data set and a test data set.
- Divide the training data set into a positive-class sample set (the minority class) and a negative-class sample set (the majority class).
- Partition the positive-class sample set into K clusters using the K-Means clustering algorithm.
- Oversample each cluster with an improved SMOTE algorithm to obtain a balanced data set.
- Construct an ensemble classifier with the AdaBoost algorithm, using SVM as the weak classifier.
- Evaluate the ensemble classifier's performance on the test data set.

The method effectively raises the recognition rate of minority-class samples in the unbalanced data set while maintaining overall accuracy. It also performs better in classification prediction on unbalanced data sets.
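The classifier-construction and evaluation steps of the abstract can be sketched as discrete AdaBoost over SVM weak learners. To keep the sketch free of extra dependencies, it makes several assumptions:

- the SVM is linear and trained by subgradient descent on the sample-weighted hinge loss;
- labels are in {-1, +1}, with the minority (fault) class as +1;
- the hyperparameters lam, lr, epochs and n_rounds are chosen arbitrarily.

This excerpt does not state the patent's kernel or settings, and all names below are hypothetical.

```python
import numpy as np


class WeightedLinearSVM:
    """Soft-margin linear SVM: full-batch subgradient descent on
    lam/2 * ||w||^2 + sum_i s_i * max(0, 1 - y_i * (w.x_i + b))."""

    def __init__(self, lam=0.01, lr=0.1, epochs=200):
        self.lam, self.lr, self.epochs = lam, lr, epochs

    def fit(self, X, y, sample_weight):
        self.w, self.b = np.zeros(X.shape[1]), 0.0
        for _ in range(self.epochs):
            active = y * (X @ self.w + self.b) < 1   # samples inside the margin
            coef = sample_weight * y * active         # weighted hinge subgradient
            self.w -= self.lr * (self.lam * self.w - coef @ X)
            self.b += self.lr * coef.sum()
        return self

    def predict(self, X):
        return np.where(X @ self.w + self.b >= 0, 1, -1)


def adaboost_svm(X, y, n_rounds=10):
    """Discrete AdaBoost with SVM weak learners; returns (learners, alphas)."""
    weights = np.full(len(X), 1.0 / len(X))
    learners, alphas = [], []
    for _ in range(n_rounds):
        clf = WeightedLinearSVM().fit(X, y, weights)
        pred = clf.predict(X)
        err = weights[pred != y].sum()
        if err >= 0.5:                                # no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        learners.append(clf)
        alphas.append(alpha)
        if err == 0:                                  # perfect fit: nothing left to boost
            break
        weights *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        weights /= weights.sum()
    return learners, alphas


def predict_ensemble(learners, alphas, X):
    score = np.zeros(len(X))
    for a, clf in zip(alphas, learners):
        score += a * clf.predict(X)
    return np.where(score >= 0, 1, -1)


def minority_recall(y_true, y_pred, positive=1):
    """Recognition rate of the minority (positive) class."""
    mask = y_true == positive
    return float((y_pred[mask] == positive).mean())
```

A typical workflow trains on the balanced training set with `adaboost_svm(X_bal, y_bal)`. It then reports `minority_recall` next to overall accuracy on D_Test, matching the two goals named in the abstract. An SVM that fits the training set perfectly ends boosting after one round, which the early stop handles.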

Description

Technical field

[0001] The invention belongs to the technical field of data mining and relates in particular to a subway fault data classification method based on an unbalanced data set.

Background

[0002] During long-term subway operation, the probability of equipment failure is high, and a failure that is not handled in time can cause great losses. Timely and effective fault diagnosis of the subway is therefore increasingly important, and fault data classification is its key technology. Classification methods are widely used for prediction, and most of them require a relatively uniform data distribution. If the distribution is severely unbalanced, minority-class data are likely to be treated as noise. Real-world data are often unbalanced, meaning the number of samples varies greatly between categories. A large number...


Application Information

IPC (8): G06K9/62; G06Q50/26
CPC: G06Q50/26; G06F18/23213; G06F18/2411; G06F18/214
Inventors: 张永, 左婷婷, 谢志鸿, 方立超, 单梁, 徐志良
Owner: NANJING UNIV OF SCI & TECH