Parallel sequence pattern mining method based on Spark cloud computing platform

A parallel sequential pattern mining technology for cloud computing platforms, applied in computing, instruments, and electrical digital data processing. It addresses unbalanced load, high IO overhead, and low computational efficiency, improving mining efficiency and load balance.

Inactive Publication Date: 2017-11-14
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0041] To address the low computational efficiency of serial sequential pattern mining algorithms when processing massive data, and the existing Hadoop-based parallel sequential ...



Examples


Embodiment Construction

[0060] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0061] The process of the sequential pattern mining algorithm based on the Spark cloud computing platform designed by the present invention is shown in Figure 1 of the accompanying drawings; those skilled in the art can run all steps automatically using computer software. The method comprises three main steps: database segmentation, support counting, and projected database generation. These three steps are iterated until no new sequential patterns are generated.
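The three-step loop can be sketched on a single machine as follows. This is only an illustrative sketch, not the patent's implementation: it assumes each sequence is a plain list of single items (no itemsets), uses a simple round-robin split in place of the patent's splitting strategy, and all function names (`split_database`, `count_support`, `project`, `mine`) are hypothetical.

```python
from collections import Counter


def split_database(db, n_parts):
    """Step 1 (segmentation): deal sequences round-robin into n_parts chunks.
    Assumption: a stand-in for the patent's own splitting strategy."""
    return [db[i::n_parts] for i in range(n_parts)]


def count_support(parts):
    """Step 2 (support counting): per item, count the sequences containing it."""
    total = Counter()
    for part in parts:
        for seq in part:
            total.update(set(seq))  # a sequence counts each item at most once
    return total


def project(db, item):
    """Step 3 (projection): keep the non-empty suffix after the first
    occurrence of item in every sequence that contains it."""
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def mine(db, min_support, n_parts=2):
    """Repeat the three steps until an iteration yields no new pattern."""
    patterns = {}
    frontier = [((), db)]  # pairs of (prefix, projected database)
    while frontier:
        next_frontier = []
        for prefix, pdb in frontier:
            counts = count_support(split_database(pdb, n_parts))
            for item, support in counts.items():
                if support >= min_support:
                    new_prefix = prefix + (item,)
                    patterns[new_prefix] = support
                    next_frontier.append((new_prefix, project(pdb, item)))
        frontier = next_frontier
    return patterns
```

Calling `mine(db, min_support=3)` on a small list of item lists returns a dictionary that maps each frequent pattern (as a tuple of items) to its support. The loop ends once a round adds no patterns, which mirrors the stopping condition stated above.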

[0062] The specific implementation process of the embodi...


Abstract

The invention discloses a parallel sequential pattern mining method based on a Spark cloud computing platform. Existing serial sequential pattern mining algorithms have low computational efficiency when processing massive data, and existing Hadoop-based parallel sequential pattern mining algorithms suffer from high IO overhead and unbalanced load. To address these problems, the invention designs a projected sequence database splitting strategy that reduces load imbalance as far as possible. On this basis, the original PrefixSpan algorithm is parallelized according to the characteristics of the MapReduce programming framework, so that the large-scale parallel computing capacity of the Spark cloud computing platform improves the efficiency of sequential pattern mining on massive data. The technical scheme is simple and fast, and improves the efficiency of sequential pattern mining.
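As a rough illustration of the MapReduce shape that the abstract refers to, the sketch below counts item supports over pre-split partitions in plain Python. It is an assumption-laden stand-in, not the patented method: the function names (`map_partition`, `reduce_counts`, `frequent_items`) are hypothetical, sequences are simplified to lists of single items, and no Spark cluster is involved. In PySpark the analogous RDD operations would likely be `flatMap` followed by `reduceByKey`.

```python
from collections import Counter
from functools import reduce


def map_partition(partition):
    """Map side: build a partial support table for one partition."""
    local = Counter()
    for seq in partition:
        local.update(set(seq))  # count each item once per sequence
    return local


def reduce_counts(left, right):
    """Reduce side: merge two partial support tables."""
    return left + right


def frequent_items(partitions, min_support):
    """Map every partition independently, merge the partial tables, then filter."""
    partials = map(map_partition, partitions)
    totals = reduce(reduce_counts, partials, Counter())
    return {item: n for item, n in totals.items() if n >= min_support}
```

Because each partition is mapped independently, the map side can run in parallel; only the small partial tables need to be merged, which is the property a MapReduce-style parallelization relies on.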

Description

Technical field

[0001] The invention belongs to the technical field of sequential pattern mining, and in particular relates to a parallel sequential pattern mining method based on a Spark cloud computing platform.

Background

[0002] (1) Sequential pattern mining technology

[0003] [Document 1] first proposed the concept of sequential pattern mining. Sequential pattern mining finds frequently occurring ordered events or subsequences in a sequence database. As an important research topic in data mining, it has a wide range of applications, such as user purchase behavior analysis, biological sequence analysis, discovery of frequent taxi trajectory patterns, and analysis of human mobility patterns. The following defines some terms used in sequential pattern mining.

[0004] Definition 1: Let I = {i_k, k = 1, 2, ..., m} be a set of m distinct items; a subset of I is called an itemset. [0...


Application Information

IPC(8): G06F17/30
Inventor 余啸, 刘进, 吴思尧, 崔晓晖, 张建升, 井溢洋
Owner WUHAN UNIV