Spark platform-based parallel sequential pattern mining method

A sequential pattern mining technology, applied in the fields of instruments, computing, and electrical digital data processing, that addresses load imbalance, high IO overhead, and inefficient use of computing power, thereby improving mining efficiency and balancing the load.

Status: Inactive; Publication Date: 2017-09-08
WUHAN UNIV
Cites: 4; Cited by: 2

AI Technical Summary

Problems solved by technology

[0034] Aiming at the problems that existing serial sequential pattern mining algorithms use computing power inefficiently when processing massive data, and that existing Hadoop-based parallel s...



Examples


Embodiment Construction

[0049] To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0050] The process of the Spark-based sequential pattern mining algorithm designed by the present invention is shown in Figure 1 of the accompanying drawings. All steps can be run automatically by those skilled in the art using computer software. The embodiment is implemented as follows:

[0051] Step 1: database partitioning.

[0052] Divide the sequence database into partitions of equal size (the number of partitions is determined by the number of worker nodes in the cluster), so that the total l...
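The paragraph above is truncated before it states the exact balancing criterion, so the following is only a minimal sketch of one way to produce equal-size partitions. It assumes, as a hypothetical choice, that the load of a sequence is its number of items, and it uses the greedy "longest sequence first, into the currently lightest partition" heuristic. The function name partition_database and the load measure are illustrative assumptions, not taken from the patent.

```python
import heapq
from typing import List, Sequence


def partition_database(db: Sequence[Sequence[str]], num_workers: int) -> List[List[Sequence[str]]]:
    """Split a sequence database into num_workers partitions of roughly equal total length.

    Hypothetical sketch: the load of a sequence is assumed to be its item count.
    Greedy heuristic: assign each sequence, longest first, to the partition
    whose current total load is smallest.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    partitions: List[List[Sequence[str]]] = [[] for _ in range(num_workers)]
    # Min-heap of (current_load, partition_index); the lightest partition is on top.
    heap = [(0, i) for i in range(num_workers)]
    heapq.heapify(heap)
    for seq in sorted(db, key=len, reverse=True):
        load, idx = heapq.heappop(heap)
        partitions[idx].append(seq)
        heapq.heappush(heap, (load + len(seq), idx))
    return partitions


if __name__ == "__main__":
    db = [list("abcde"), list("ab"), list("abc"), list("a"), list("abcd"), list("ab")]
    parts = partition_database(db, 2)
    print([sum(len(s) for s in p) for p in parts])  # prints [9, 8]
```

Balancing by total item count rather than by number of sequences is a deliberate assumption here: support counting cost grows with sequence length, so two partitions with the same number of sequences can still carry very different workloads.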



Abstract

The invention discloses a Spark platform-based parallel sequential pattern mining method. It addresses two problems: conventional serial sequential pattern mining algorithms use computing power inefficiently when processing massive data, and conventional Hadoop-based parallel sequential pattern mining algorithms suffer from high IO (input/output) overhead and unbalanced load. The method designs a reasonable sequence database decomposition strategy that resolves load imbalance to the greatest possible extent. Building on the characteristics of the MapReduce programming framework, it parallelizes the original GSP (Generalized Sequential Pattern) algorithm, and it uses the large-scale parallel computing power of the Spark cloud computing platform to improve the efficiency of sequential pattern mining on massive data.
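To illustrate what "parallelizing GSP on a MapReduce-style framework" can mean, here is a minimal sketch of one support-counting pass, simulated with the Python standard library rather than Spark. It assumes, for simplicity, that every element of a sequence is a single item and that there are no time constraints (full GSP allows itemset elements and gap/window constraints). The names contains, map_partition and count_support are hypothetical.

```python
import operator
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if pattern occurs in sequence as an ordered, not necessarily contiguous, subsequence."""
    it = iter(sequence)
    # Each `in` test consumes the iterator up to the match, which enforces item order.
    return all(item in it for item in pattern)


def map_partition(partition: Iterable[Sequence[str]], candidates: List[Pattern]) -> Counter:
    """Map phase: within one partition, count the sequences that contain each candidate."""
    local: Counter = Counter()
    for seq in partition:
        for cand in candidates:
            if contains(seq, cand):
                local[cand] += 1
    return local


def count_support(partitions: Iterable[Iterable[Sequence[str]]],
                  candidates: List[Pattern],
                  min_support: int) -> Dict[Pattern, int]:
    """Run the map phase on every partition, reduce by summing counts, keep frequent candidates."""
    local_counts = (map_partition(p, candidates) for p in partitions)
    total = reduce(operator.add, local_counts, Counter())
    return {cand: n for cand, n in total.items() if n >= min_support}


if __name__ == "__main__":
    parts = [[list("abc"), list("acb")], [list("abd"), list("bc")]]
    cands = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "b")]
    print(count_support(parts, cands, min_support=2))
```

Only the small per-partition count tables cross partition boundaries, which is the property that makes the counting step parallel. In a Spark job, the map phase would roughly correspond to an RDD operation such as `mapPartitions` and the summation to `reduceByKey`; that mapping is an illustration, not the patent's stated design.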

Description

Technical field

[0001] The invention belongs to the technical field of sequential pattern mining, and in particular relates to a parallel sequential pattern mining method based on the Spark platform.

Background art

[0002] (1) Sequential pattern mining technology

[0003] [Document 1] first proposed the concept of sequential pattern mining. Sequential pattern mining discovers frequently occurring ordered events or subsequences in sequence databases. As one of the important research topics in data mining, it has a wide range of applications, such as analysis of user purchase behavior, biological sequence analysis, discovery of frequent taxi trajectory patterns, and analysis of human mobility behavior patterns. [Document 2] proposed the GSP algorithm, which uses a redundant-candidate pruning strategy and a hash tree to achieve fast in-memory access to candidate patterns. [Document 3] proposed the SPADE algorithm based on vertical data r...
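The candidate generation and redundant-candidate pruning attributed to GSP in [Document 2] can be sketched as follows. This is a minimal, textbook-style sketch under simplifying assumptions: every sequence element is a single item, there are no time constraints, all input patterns have the same length, and a Python set lookup stands in for the hash tree the background section mentions. The function name generate_candidates is hypothetical, and this is not presented as the patent's own implementation.

```python
from itertools import product
from typing import List, Set, Tuple

Pattern = Tuple[str, ...]


def generate_candidates(frequent: Set[Pattern]) -> List[Pattern]:
    """GSP-style join and prune for sequences whose elements are single items.

    `frequent` holds the frequent (k-1)-sequences; the result is the sorted
    list of candidate k-sequences that survive pruning.
    """
    candidates = set()
    for s1, s2 in product(frequent, repeat=2):
        # Join: s1 without its first item must equal s2 without its last item.
        if s1[1:] == s2[:-1]:
            cand = s1 + s2[-1:]
            # Prune: deleting any one item must yield a frequent (k-1)-sequence;
            # otherwise the candidate cannot be frequent and is redundant.
            if all(cand[:i] + cand[i + 1:] in frequent for i in range(len(cand))):
                candidates.add(cand)
    return sorted(candidates)


if __name__ == "__main__":
    print(generate_candidates({("a", "b"), ("b", "c"), ("a", "c")}))  # [('a', 'b', 'c')]
    print(generate_candidates({("a", "b"), ("b", "c")}))  # [] because ('a', 'c') is not frequent
```

The pruning step relies on the anti-monotonicity of support: if any subsequence of a candidate is infrequent, the candidate itself cannot be frequent, so it can be discarded before the more expensive database scan.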


Application Information

IPC (8th edition): G06F17/30
Inventors: 余啸 (Yu Xiao), 刘进 (Liu Jin), 吴思尧 (Wu Siyao), 崔晓晖 (Cui Xiaohui), 张建升 (Zhang Jiansheng), 井溢洋 (Jing Yiyang)
Owner WUHAN UNIV