Spark platform-based uncertain set frequent item mining method

A technology concerning uncertain data and data items, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses the large memory overhead of existing algorithms, their failure to consider the differing importance of data items, and their inability to meet the accuracy and efficiency required of data set mining methods, with the effect of improving execution efficiency.

Active Publication Date: 2018-09-07
KUNMING UNIV OF SCI & TECH
4 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

In principle, reducing the candidate set can improve the time complexity of an algorithm to a certain extent, but algorithms of this kind do not take into account the differing importance of data items. The non-recursive pattern mining algorithm compresses all data items into a single tree, so its memory overhead is large.
In summary, none of these methods meets the accuracy and efficiency required for mining large-scale uncertain data sets.
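
To make the memory point concrete, the following is a minimal sketch of compressing transactions into a single prefix tree. It is an illustration only, not the UWEEP-tree of the invention: the names `Node`, `build_prefix_tree`, and `count_nodes` and the sample transactions are assumptions introduced here.

```python
class Node:
    """One tree node: an occurrence count plus child nodes keyed by item."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_prefix_tree(transactions):
    """Insert every transaction into one tree, sharing common prefixes."""
    root = Node()
    for transaction in transactions:
        node = root
        # Sorting gives every transaction the same item order so prefixes can be shared.
        for item in sorted(transaction):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def count_nodes(node):
    """Number of nodes below `node`; a rough proxy for the tree's memory use."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical sample data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
root = build_prefix_tree(transactions)
tree_size = count_nodes(root)
```

Shared prefixes reduce the node count, but the whole data set still has to fit in one in-memory structure, which is the overhead described above.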



Examples


Embodiment Construction

[0054] The present invention will be further described below in combination with the accompanying drawings and specific embodiments.

[0055] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0056] As shown in Figure 1, Spark divides massive data into groups, assigns the processing of each group to the worker nodes under the master node so that they complete the work together, and finally integrates the calculation results of the sub-nodes to obtain the final result.
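
The divide, process, and integrate pattern described in [0056] can be sketched with the Python standard library alone. This is a single-process imitation of the idea, not Spark's API: the function names `split_into_groups`, `count_items`, and `merge`, and the sample transactions, are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def split_into_groups(transactions, num_groups):
    """Divide the data set into groups (round-robin), one per worker."""
    return [transactions[i::num_groups] for i in range(num_groups)]


def count_items(group):
    """Work done independently on one group: count the transactions containing each item."""
    counts = Counter()
    for transaction in group:
        counts.update(set(transaction))
    return counts


def merge(left, right):
    """Integrate two partial results into one."""
    return left + right


# Hypothetical sample data.
transactions = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
groups = split_into_groups(transactions, 2)
partial_results = [count_items(group) for group in groups]  # one result per "worker"
total = reduce(merge, partial_results, Counter())           # final integration step
```

Because each group is counted without reference to the others, the per-group step could run on separate nodes, and only the small partial results need to be brought together.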

[0057] Suc...


Abstract

The invention relates to a Spark platform-based uncertain set frequent item mining method, and belongs to the field of data mining. The method proposes a novel UWEEP-tree structure on the basis of the Spark big data framework to process data sets in parallel, without repeatedly scanning the data sets or generating large numbers of candidate sets, so that algorithm execution efficiency is greatly improved. At the same time, both the survival probabilities and the weight values of uncertain data items are considered, so that the frequent items mined better match user demands, and a new approach is provided for uncertain set frequent item mining.
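
As an illustration of combining item probabilities with item weights, the sketch below uses one common formulation: the expected support of an itemset is the sum, over transactions containing it, of the product of its items' probabilities, and the weighted version scales that by the mean item weight. This formula is an assumption chosen for illustration; the abstract does not state the scoring used by the UWEEP-tree, and the function names and sample values are hypothetical.

```python
from math import prod


def expected_support(itemset, transactions):
    """Sum over transactions of the product of the itemset's item probabilities."""
    total = 0.0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total


def weighted_expected_support(itemset, transactions, weights):
    """Expected support scaled by the mean weight of the items (assumed formula)."""
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * expected_support(itemset, transactions)


# Hypothetical uncertain transactions: item -> probability that the item is present.
transactions = [
    {"a": 0.5, "b": 0.5},
    {"a": 1.0, "c": 0.5},
    {"a": 0.5, "b": 1.0, "c": 0.25},
]
# Hypothetical importance weights per item.
weights = {"a": 1.0, "b": 0.5, "c": 0.25}

support_ab = expected_support(("a", "b"), transactions)
weighted_ab = weighted_expected_support(("a", "b"), transactions, weights)
```

Under this formulation an itemset is reported as frequent when its weighted expected support reaches a user-chosen threshold, so a highly probable but unimportant item can rank below a less certain but heavily weighted one.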

Description

Technical field

[0001] The invention relates to a method for mining frequent items of an uncertain data set based on the Spark platform, and belongs to the technical field of data mining.

Background

[0002] With the rapid development of Internet technology, large volumes of data are generated by networked applications. Much of this massive data is incomplete or exists in an uncertain form. Discovering interesting knowledge and content from uncertain data has become a new research direction and hotspot. However, most mining algorithms for uncertain data sets cannot improve execution efficiency while also taking into account both the survival probability of data items and their importance. Frequent pattern mining algorithms for uncertain data sets are usually divided into three types: one is based on probability distribution or expectation based algorithms considering the probability of occurrence of data items; the other is weight-...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 丁家满, 杨阳
Owner: KUNMING UNIV OF SCI & TECH