Real-time mining method and device based on distributed data

A distributed data technology, applied in the fields of electronic digital data processing, structured data retrieval, and special data processing applications. It addresses the problems that real-time data processing needs are hard to meet, that modeling can only wait on data preparation, and that efficiency is low.

Active Publication Date: 2017-09-29
SHANXI CHINA MOBILE COMM CORP

AI Technical Summary

Problems solved by technology

The data preparation stage in particular often takes a long time, and the subsequent modeling process can only wait, which is inefficient. With the massive data volumes of current big data processing, this approach is especially inefficient and struggles to meet the needs of real-time data processing.


Examples

Embodiment 1

[0060] A real-time mining method based on distributed data. As shown in Figure 1, the method includes:

[0061] Step 101: decompose the centralized serial data into data on which parallel operations can be performed;

[0062] Specifically, this step performs real-time data processing on the centralized serial data according to a stream processing mechanism, decomposing it into data on which parallel operations can be performed. The stream processing mechanism may also be called a stream input/output mode: the input is centralized serial data, and after real-time processing by the stream processing mechanism, the output is data on which parallel operations can be performed.
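The decomposition in Step 101 can be illustrated with a minimal Python sketch. The text here does not say how records are routed, so the round-robin rule, the `(key, value)` record shape, and the names `serial_source` and `decompose` are all assumptions made for illustration, not the patented mechanism.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Record = Tuple[Hashable, float]  # hypothetical (key, value) record shape


def serial_source() -> Iterator[Record]:
    """Stand-in for the centralized serial input: records arrive one at a time."""
    for i in range(10):
        yield (f"user{i % 3}", float(i))


def decompose(stream: Iterable[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Route each record of a serial stream to one of num_partitions buckets.

    Round-robin routing is an assumption; any rule that yields independent
    buckets would serve the same purpose.
    """
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for index, record in enumerate(stream):
        partitions[index % num_partitions].append(record)
    return dict(partitions)


partitions = decompose(serial_source(), num_partitions=3)
```

Each bucket in `partitions` can then be handed to a separate worker, which is what makes the output suitable for parallel operations.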

[0063] Step 102: when the accumulated data on which parallel operations can be performed reaches a predetermined amount, perform data modeling according to a memory-based distributed algorithm to obtain a data model;

[0064] Here, an example of a memory-based dist...
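The accumulate-then-model trigger of Step 102 can be sketched as follows. The memory-based distributed algorithm is not identified in this excerpt, so the sketch substitutes a deliberately simple stand-in: per-chunk counts and sums computed by a thread pool and combined into a mean. The class name `ThresholdModeler`, the threshold value, and the shape of the "model" are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def partial_stats(chunk: List[float]) -> Tuple[int, float]:
    """Per-chunk work: count and sum, computed entirely in memory."""
    return len(chunk), sum(chunk)


class ThresholdModeler:
    """Buffers parallel-ready chunks and fits a model once enough data has accumulated."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # hypothetical "predetermined amount"
        self.chunks: List[List[float]] = []
        self.buffered = 0

    def add(self, chunk: List[float]) -> Optional[Dict[str, float]]:
        self.chunks.append(chunk)
        self.buffered += len(chunk)
        if self.buffered < self.threshold:
            return None  # keep accumulating
        # Stand-in for the distributed step: chunks are processed concurrently.
        with ThreadPoolExecutor() as pool:
            stats = list(pool.map(partial_stats, self.chunks))
        self.chunks, self.buffered = [], 0
        count = sum(n for n, _ in stats)
        total = sum(s for _, s in stats)
        return {"count": float(count), "mean": total / count}


modeler = ThresholdModeler(threshold=5)
first = modeler.add([1.0, 2.0])        # 2 values buffered, below the threshold
second = modeler.add([3.0, 4.0, 5.0])  # 5 values buffered, a model is fitted
```

Here `first` is `None` because the buffer is still below the threshold, while `second` holds the fitted stand-in model. Modeling starts as soon as enough data has arrived rather than after all data preparation has finished.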

Embodiment 2

[0075] A real-time mining device based on distributed data according to an embodiment of the present invention includes: a first processing unit, configured to decompose the centralized serial data, according to the stream processing mechanism, into data on which parallel operations can be performed; a modeling unit, configured to perform data modeling according to a memory-based distributed algorithm when the accumulated data on which parallel operations can be performed reaches a predetermined amount, to obtain a data model; and a processing unit, configured to perform processing according to the data model and auxiliary data to obtain a data processing result.
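How the three units fit together can be sketched as a small pipeline. Only the division into units comes from the text above; the class name `MiningDevice`, the pair-wise splitting, the mean model, and the deviation-based processing of auxiliary data are hypothetical placeholders chosen to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

Chunk = List[float]
Model = Dict[str, float]


@dataclass
class MiningDevice:
    """Wires the three units named in the embodiment into one pipeline."""
    decompose: Callable[[Iterable[float]], List[Chunk]]   # first processing unit
    build_model: Callable[[List[Chunk]], Model]           # modeling unit
    process: Callable[[Model, List[float]], List[float]]  # processing unit
    threshold: int                                        # predetermined amount

    def run(self, serial_data: Iterable[float], auxiliary: List[float]) -> Optional[List[float]]:
        chunks = self.decompose(serial_data)
        if sum(len(c) for c in chunks) < self.threshold:
            return None  # not enough accumulated data to model yet
        model = self.build_model(chunks)
        return self.process(model, auxiliary)


def split_in_pairs(data: Iterable[float]) -> List[Chunk]:
    """Placeholder decomposition: fixed-size chunks of two values."""
    items = list(data)
    return [items[i:i + 2] for i in range(0, len(items), 2)]


def mean_model(chunks: List[Chunk]) -> Model:
    """Placeholder model: the mean of all values across chunks."""
    values = [v for chunk in chunks for v in chunk]
    return {"mean": sum(values) / len(values)}


def deviation_from_mean(model: Model, auxiliary: List[float]) -> List[float]:
    """Placeholder processing: how far each auxiliary value lies from the model mean."""
    return [value - model["mean"] for value in auxiliary]


device = MiningDevice(split_in_pairs, mean_model, deviation_from_mean, threshold=4)
result = device.run([1.0, 2.0, 3.0, 6.0], auxiliary=[3.0, 5.0])
```

Passing the units in as callables keeps each one replaceable, which mirrors the claim structure: the device is defined by what each unit is configured to do, not by a specific algorithm.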

[0076] In one implementation of the embodiment of the present invention, there are multiple data models; after data modeling is performed to obtain the first data model, the data used for the Xth data model serves as the auxiliary data;

[0077] The processing unit is further configured to process the a...


Abstract

The invention discloses a real-time mining method and device based on distributed data. The method comprises the following steps: centralized serial data is decomposed into data on which parallel operations can be performed; when the accumulated data on which parallel operations can be performed reaches a predetermined amount, data modeling is performed according to a memory-based distributed algorithm to obtain a data model; and processing is performed according to the data model and auxiliary data to obtain a data processing result.

Description

Technical field

[0001] The invention relates to mining technology, in particular to a real-time mining method and device based on distributed data.

Background

[0002] Existing data mining platforms basically adopt a client/server (C/S) architecture, using a high-performance minicomputer as the server and an ordinary PC as the client. The server mainly performs computation, including Extract-Transform-Load (ETL) operations, algorithm execution, and scoring, while the client mainly handles preliminary data processing and display. A common approach is to carve out a partition of a minicomputer, including CPU, memory, and disk array, to act as the server, and to use an ordinary PC or notebook as the client.

[0003] This model has encountered great challenges in the era of big data. First, because minicomputers are expensive, many companies have begun to move away from IOE (IBM, Oracle, EMC) and rarely buy minicomputers. The abo...


Application Information

IPC(8): G06F17/30
CPC: G06F16/27
Inventor: 秦晓飞, 王峰, 胡建强, 茹志强, 邢刚
Owner: SHANXI CHINA MOBILE COMM CORP