An uncertain data classification method based on direct discriminant sequence mining

A technology for uncertain data and sequence mining, applied in data mining, electrical digital data processing, special data processing applications, etc. It addresses the problems that existing methods handle only a single uncertain data format and do not apply to uncertain sequence data.

Inactive Publication Date: 2019-03-08
NORTHEASTERN UNIV
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

[0004] Existing uncertain data classification methods handle only a single uncertain data format and are not applicable to the many kinds of uncertain sequence data generated in real life. To overcome these disadvantages, the present invention provides an uncertain data classification method based on direct discriminant sequence mining that improves efficiency and scalability while achieving higher classification accuracy.

Method used



Examples


Embodiment Construction

[0068] The present invention is further described below with reference to the accompanying drawings.

[0069] The uncertain data classification method based on direct discriminant sequence mining of the present invention comprises the following steps:

[0070] 1) Initialize the class labels with InitializeLabel(), giving the class label set C = {c_1, c_2, ..., c_n};

[0071] 2) Under the class label set, derive the minimum support threshold from a given information gain threshold: min_sup = f(maxIG). That is, the minimum support setting algorithm MinSupGen is used to find the minimum support threshold;

[0072] 3) Under the minimum support threshold, use a pattern growth strategy based on prefix projection to enumerate subsequences and generate pattern candidates x:

[0073] x = PrefixSpanGrowth(X), where X is the prefix projection sequence;

[0074] 4) Mine the discriminant sequences among the generated pattern candidates x, using the IGMine algorithm a...
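Steps 2) and 3) can be illustrated with a minimal Python sketch. It is not the patent's MinSupGen or PrefixSpanGrowth, whose details are not given in this excerpt. It assumes a two-class problem, deterministic (not uncertain) sequences of single items, and the well-known upper bound on the information gain that a pattern of a given support can reach. The names `ig_upper_bound`, `min_sup_from_ig`, and `prefix_span_growth` are hypothetical.

```python
import math
from collections import defaultdict


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def ig_upper_bound(theta, prior):
    """Largest information gain a pattern with relative support theta can
    reach when the positive class has probability prior (assumption: two
    classes, 0 < theta <= min(prior, 1 - prior)). The bound is attained
    when the pattern occurs in one class only."""
    h_c = binary_entropy(prior)
    pure_pos = h_c - (1 - theta) * binary_entropy((prior - theta) / (1 - theta))
    pure_neg = h_c - (1 - theta) * binary_entropy(prior / (1 - theta))
    return max(pure_pos, pure_neg)


def min_sup_from_ig(ig_threshold, prior, n_sequences):
    """Smallest absolute support whose bound reaches ig_threshold.
    Patterns with lower support cannot reach the threshold, so they can
    be skipped. Returns None if the threshold is unreachable."""
    limit = int(n_sequences * min(prior, 1 - prior))
    for count in range(1, limit + 1):
        if ig_upper_bound(count / n_sequences, prior) >= ig_threshold:
            return count
    return None


def prefix_span_growth(sequences, min_sup):
    """Enumerate subsequences with support >= min_sup by prefix projection.
    Returns a dict mapping each pattern (a tuple of items) to its support."""
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start position) pairs: the
        # suffix of each sequence that follows the current prefix.
        counts = defaultdict(set)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item].add(sid)
        for item, sids in sorted(counts.items()):
            if len(sids) < min_sup:
                continue  # infrequent extensions are pruned
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                next_projected.append((sid, pos + 1))
            grow(pattern, next_projected)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results
```

For uncertain sequences, the support count in `grow` would have to be replaced by an expected or probabilistic support; that extension is left out here.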


Abstract

The invention discloses an uncertain data classification method based on direct discriminant sequence mining. For an uncertain data set UTD, class labels are first initialized and a class label set is given. Under the class label set, the minimum support threshold is derived from a given information gain threshold. A pattern growth strategy is used to enumerate subsequences and generate pattern candidates x, the discriminant sequences are mined, and a reduction strategy is adopted to generate the final discriminant sequence result set Rs. The result set Rs is then checked: a closed sequence detection algorithm determines whether each candidate sequence in Rs is a probabilistic frequent closed sequence, and each discriminant sequence that satisfies this condition is added to the result set RsTmp. Finally, data classification is completed by combining the result with an existing mature data classification method, such as a rule-based classifier or a support vector machine. As a complement to discriminative pattern mining on uncertain data sets, the method of the invention markedly improves efficiency and yields a more concise result set.
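The closedness check that turns Rs into RsTmp can be sketched as follows. This is a simplified illustration, not the patent's closed sequence detection algorithm. It assumes each candidate carries a plain support value (not a frequentness probability) and tests closedness only against the other candidates in Rs. The function names are hypothetical.

```python
def is_subsequence(short, long):
    """True if the items of short appear in long in the same order."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def closed_filter(rs):
    """Keep the patterns of rs (pattern tuple -> support) that have no
    proper super-sequence in rs with the same support."""
    rs_tmp = {}
    for pattern, support in rs.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in rs.items()
        )
        if not absorbed:
            rs_tmp[pattern] = support
    return rs_tmp
```

A pattern that is absorbed by a longer pattern of equal support adds no information for classification, which is one reason a closed result set can be more concise.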

Description

Technical field

[0001] The invention relates to data mining technology, and in particular to an uncertain data classification method based on direct discriminant sequence mining.

Background art

[0002] In recent years, as the range of applications of uncertain data has expanded, research on uncertain data processing has become increasingly active. For sequential pattern mining on deterministic data sets, mature methods largely exist, such as the CloSpan algorithm for frequent pattern mining, the BIDE algorithm for frequent closed pattern mining, and the DDPMine algorithm for discriminative pattern mining. Current research on sequence mining over deterministic data sets therefore focuses mainly on proposing more efficient methods, designing stronger pruning rules, and so on. By contrast, few methods exist for sequential pattern mining on uncertain data sets. One example is the U-PrefixSpan algorithm, which is based on prefix projection; this method innovatively...

Claims


Application Information

IPC(8): G06K9/62, G06F16/2458
CPC: G06F2216/03, G06F18/2411
Inventor: 赵宇海, 印莹, 刘陆洋, 王国仁
Owner: NORTHEASTERN UNIV