Streaming data integration classification method and device based on concept drift

The technology relates to concept drift and classification methods, and is applied in the field of streaming data integration classification. It addresses problems such as frequent concept drift in data streams, and achieves the effects of coping with the concept drift phenomenon while preserving classification accuracy.

Status: Inactive. Publication Date: 2018-11-06
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

The present invention largely resolves the frequent, dynamic concept drift of data streams within acceptable time complexity, and achieves real-time classification of data streams while maintaining classification accuracy.

Examples

Embodiment 1

[0039] This embodiment discloses a streaming data integration classification method based on concept drift, as shown in Figure 1, including the following steps:

[0040] The embodiment uses four datasets: the SEA dataset, the Converttype dataset, the HyperPlane dataset, and the Electricity dataset. All experiments run on MOA, a platform for online analysis of large-scale data. A data stream generator simulates the data stream, which is processed in blocks. A threshold is set for the data block size. While the block is below the threshold, arriving samples are added to the current data block. The block is then split into samples with class labels and samples without class labels, and the labeled samples are grouped by class for training the base classifiers.
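The block-filling and splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `chunk_stream`, `split_block`, and `block_size` are hypothetical, the threshold is assumed to be a simple sample count, and a missing class label is represented as `None`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# A sample is (features, label); label is None when the class label is missing.
Sample = Tuple[List[float], Optional[str]]


def chunk_stream(stream: Iterable[Sample], block_size: int) -> Iterator[List[Sample]]:
    """Fill the current block until it reaches block_size, then emit it."""
    block: List[Sample] = []
    for sample in stream:
        block.append(sample)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # emit the final, partially filled block
        yield block


def split_block(block: List[Sample]):
    """Separate labeled from unlabeled samples and group labeled ones by class."""
    by_class: Dict[str, List[List[float]]] = defaultdict(list)
    unlabeled: List[List[float]] = []
    for features, label in block:
        if label is None:
            unlabeled.append(features)
        else:
            by_class[label].append(features)
    return dict(by_class), unlabeled


# Tiny hand-made stream, for illustration only.
stream = [([0.1, 0.2], "a"), ([0.9, 0.8], "b"), ([0.5, 0.5], None),
          ([0.2, 0.1], "a"), ([0.7, 0.9], None)]
blocks = list(chunk_stream(stream, block_size=3))
by_class, unlabeled = split_block(blocks[0])
```

Each per-class group in `by_class` would then serve as training data for one base classifier, while `unlabeled` holds the samples whose labels still need to be computed.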

[0041] The data stream can be formalized as $x_1, x_2, \ldots, x_{t-1}, x_t, x_{t+1}$, with $x_t = (S_1, S_2, \ldots, S_d, Y)$, where $t$ is the timestamp, $d$ is the number of sample attributes, and $S$ is the feature vector of the sample. $Y$ is a class label, a...
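As a rough illustration of this formalization, the sketch below represents one stream element with its timestamp, feature vector, and optional class label, and generates a toy stream whose labeling rule changes abruptly. Everything here is an assumption made for illustration: `StreamSample`, `toy_stream`, the two-attribute threshold rule, and the `labeled_ratio` parameter are hypothetical and are not taken from the patent or from MOA.

```python
import random
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class StreamSample:
    t: int                 # timestamp
    s: Tuple[float, ...]   # feature vector (S_1, ..., S_d)
    y: Optional[int]       # class label Y, or None if the sample is unlabeled


def toy_stream(n: int, drift_at: int, labeled_ratio: float = 0.5,
               seed: int = 0) -> Iterator[StreamSample]:
    """Yield n samples with d = 2; the labeling rule changes at t = drift_at."""
    rng = random.Random(seed)
    for t in range(1, n + 1):
        s = (rng.uniform(0, 10), rng.uniform(0, 10))
        # Hypothetical abrupt concept drift: the decision threshold jumps.
        threshold = 8.0 if t < drift_at else 12.0
        y = int(s[0] + s[1] <= threshold)
        # Hide some labels to mimic unlabeled samples in the stream.
        yield StreamSample(t, s, y if rng.random() < labeled_ratio else None)


samples = list(toy_stream(n=10, drift_at=6))
```

The same feature vector can receive a different label before and after `drift_at`, which is the situation a drift-aware classifier has to track.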

Embodiment 2

[0070] The purpose of this embodiment is to provide a computing device.

[0071] A streaming data integration classification device based on concept drift comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it performs the following steps:

[0072] Obtaining multiple data blocks that include both labeled and unlabeled sample data;

[0073] Training a single-class base classifier for each class in the plurality of data blocks according to the class labels;

[0074] Constructing an integrated classification matrix from the single-class base classifiers corresponding to the plurality of data blocks;

[0075] When a new data block arrives, updating the integrated classification matrix and calculating the class labels of the unlabeled samples.
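One way to picture these four steps is the sketch below. It is only an illustration under stated assumptions, since this excerpt does not specify the base classifier, the matrix layout, or the update rule. Here the single-class base classifier is a hypothetical centroid-distance model (`CentroidOneClass`), the matrix is assumed to hold one row per data block and one column per class, the oldest row is assumed to be dropped once `max_blocks` rows exist, and labels are assumed to be chosen by the best average score per class.

```python
import math
from collections import deque
from typing import Deque, Dict, List

Vector = List[float]


class CentroidOneClass:
    """Hypothetical single-class base classifier: scores by distance to a centroid."""

    def __init__(self, samples: List[Vector]):
        d = len(samples[0])
        self.centroid = [sum(x[i] for x in samples) / len(samples) for i in range(d)]

    def score(self, x: Vector) -> float:
        # Higher score means x looks more like this class.
        return -math.dist(x, self.centroid)


class EnsembleMatrix:
    """Rows are data blocks, columns are classes, entries are one-class models."""

    def __init__(self, max_blocks: int = 5):
        # Assumption: the oldest row is dropped once max_blocks rows exist.
        self.rows: Deque[Dict[str, CentroidOneClass]] = deque(maxlen=max_blocks)

    def update(self, labeled_by_class: Dict[str, List[Vector]]) -> None:
        """Train one base classifier per class in the new block and add a row."""
        row = {c: CentroidOneClass(xs) for c, xs in labeled_by_class.items() if xs}
        if row:
            self.rows.append(row)

    def predict(self, x: Vector) -> str:
        """Average each class's scores over the rows that contain it; take the best."""
        totals: Dict[str, List[float]] = {}
        for row in self.rows:
            for c, model in row.items():
                totals.setdefault(c, []).append(model.score(x))
        if not totals:
            raise ValueError("no trained base classifiers yet")
        return max(totals, key=lambda c: sum(totals[c]) / len(totals[c]))


matrix = EnsembleMatrix(max_blocks=2)
matrix.update({"a": [[0.0, 0.0], [0.2, 0.0]], "b": [[1.0, 1.0]]})
matrix.update({"a": [[0.1, 0.1]], "b": [[0.9, 1.1], [1.1, 0.9]]})
labels = [matrix.predict(x) for x in [[0.05, 0.0], [1.0, 0.95]]]
```

Bounding the number of rows is one simple way to let old concepts fade as new blocks arrive; a weighted or accuracy-based update would be an equally plausible reading of "updating the integrated classification matrix".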

Embodiment 3

[0077] The purpose of this embodiment is to provide a computer-readable storage medium.

[0078] A computer-readable storage medium having stored thereon a computer program that, when executed by a processor, performs the following steps:

[0079] Obtaining multiple data blocks that include both labeled and unlabeled sample data;

[0080] Training a single-class base classifier for each class in the plurality of data blocks according to the class labels;

[0081] Constructing an integrated classification matrix from the single-class base classifiers corresponding to the plurality of data blocks;

[0082] When a new data block arrives, updating the integrated classification matrix and calculating the class labels of the unlabeled samples.

Abstract

The invention discloses a streaming data integration classification method and device based on concept drift, comprising the following steps: acquiring a plurality of data blocks including sample data with class labels and sample data without class labels; training a single-class base classifier for each class in the plurality of data blocks; constructing an integrated classification matrix according to the single-class base classifiers corresponding to the plurality of data blocks; and, when a new data block arrives, updating the integrated classification matrix and calculating the class labels of the sample data without class labels. The method and device can largely solve the problem of frequent and dynamic concept drift of a data stream within acceptable time complexity, and achieve real-time data stream classification while ensuring classification accuracy.

Description

Technical field

[0001] The invention belongs to the field of massive sequence data classification, and in particular relates to a method and device for streaming data integration classification based on concept drift.

Background technique

[0002] Streaming data is massive, fast-arriving sequence data. Data mining on streaming data has been widely applied in recent years, and classification methods for streaming data have been studied extensively. Traditional single classifiers are reasonably efficient on static data without hidden concept drift, but their classification accuracy is low on dynamically changing data streams. Combining multiple individual classifiers into an ensemble makes it possible to monitor the concept drift of a data stream effectively over time. Commonly used integrated classification methods include horizontal integration and vertical integration, which have high classification accuracy and concept drift resp...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/241, G06F18/214
Inventor: 耿玉水, 张建国, 鲁芹, 孙涛, 刘嵩, 王新刚, 赵晶
Owner: QILU UNIV OF TECH