A Large-Scale Data Mining Method Guaranteeing Quality Monotonicity

A large-scale data mining method that guarantees quality monotonicity, applied in the fields of electronic digital data processing, digital data information retrieval, and special data processing applications. It addresses the difficulty of balancing the quality of mining results against resource constraints and of guaranteeing the quality monotonicity of approximate results.

Active Publication Date: 2019-01-25
NANJING UNIV OF POSTS & TELECOMM
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0002] The volume and variety of big data make it attractive to mine big data with algorithms that produce approximate results. When mining big data under limited time and resources, traditional algorithms struggle to balance the quality of the mining results against the resource constraints, and they cannot guarantee the monotonicity of the quality of approximate results.

Examples

Embodiment Construction

[0046] The following examples describe the present invention in more detail.

[0047] The present invention proposes a large-scale data mining method that guarantees quality monotonicity. The flow of the method is shown in Figure 1. A specific embodiment of the present invention is as follows:

[0048] The first stage: preprocess the data set and represent the data in a form that the mining part can process.

[0049] Step 1) Obtain the original iris data set (as shown in Table 1).

[0050] Step 2) Reduce the dimensionality of the data with principal component analysis (PCA) to avoid the curse of dimensionality.
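As a rough illustration of Step 2, the sketch below performs PCA in plain Python by power iteration with deflation. It is not taken from the patent: the function names (`mean_center`, `covariance`, `top_eigenpair`, `pca`), the choice of two components, and the six sample rows (four numeric measurements each, in the style of the iris data) are all assumptions made for this example.

```python
import math
import random

def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def top_eigenpair(matrix, iterations=1000, seed=0):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    v = [rng.uniform(0.5, 1.5) for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(rows, n_components):
    """Project rows onto their first n_components principal directions."""
    centered = mean_center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the direction just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, components

# Illustrative rows only: sepal length, sepal width, petal length, petal width (cm).
SAMPLE = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
projected, components = pca(SAMPLE, n_components=2)
```

Each row of `projected` is the reduced representation of the corresponding input row. In practice a library routine would normally replace the hand-written eigen solver; the plain-Python version is used here only to keep the sketch self-contained.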

[0051] In this example, the iris data set contains information on 150 iris flowers, each taken from one of three iris species: Setosa, Versicolour, and Virginica. Each flower is described by the following 5 attributes:

[0052] (1) Sepal length (cm)

[0053] (2) Sepa...

Abstract

The invention provides a data mining method that guarantees quality monotonicity. The original big data set is compressed with principal component analysis (PCA) and mapped onto an R-tree data structure, and the data set is then mined with an improved K-nearest neighbor classification algorithm. The method consists of two parts, a coding part and a mining part. The coding part uses the R-tree to represent the data: highly similar data are merged into a single R-tree node, which compresses the mass data and improves the efficiency of the mining part. The mining part applies the improved K-nearest neighbor classification algorithm to the data nodes and predicts the class of an input test point. The method addresses the problem that, when traditional algorithms mine big data under limited time and resources, they cannot balance the quality of the mining result against the resource constraints or guarantee the quality monotonicity of approximate results.
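The two-part structure described in the abstract can be sketched as follows. This is a simplified stand-in, not the patented method: instead of a real R-tree it greedily merges nearby points into flat bounding-box nodes, and instead of the improved K-nearest neighbor algorithm (whose details are not given here) it takes a plain majority vote over the k nearest nodes. The names `Node`, `encode`, `classify` and the `radius` parameter are hypothetical.

```python
import math
from collections import Counter

class Node:
    """A merged group of similar points: bounding box, centroid, class counts."""

    def __init__(self, point, label):
        self.lo = list(point)      # lower corner of the bounding box
        self.hi = list(point)      # upper corner of the bounding box
        self.total = list(point)   # coordinate sums, used for the centroid
        self.count = 1
        self.labels = Counter([label])

    @property
    def centroid(self):
        return [s / self.count for s in self.total]

    def add(self, point, label):
        self.lo = [min(a, b) for a, b in zip(self.lo, point)]
        self.hi = [max(a, b) for a, b in zip(self.hi, point)]
        self.total = [s + x for s, x in zip(self.total, point)]
        self.count += 1
        self.labels[label] += 1

def encode(points, labels, radius):
    """Coding part (assumed form): merge each point into the closest node
    whose centroid lies within `radius`, otherwise start a new node."""
    nodes = []
    for point, label in zip(points, labels):
        best, best_dist = None, radius
        for node in nodes:
            dist = math.dist(point, node.centroid)
            if dist <= best_dist:
                best, best_dist = node, dist
        if best is None:
            nodes.append(Node(point, label))
        else:
            best.add(point, label)
    return nodes

def classify(nodes, query, k):
    """Mining part (assumed form): majority vote over the class counts
    of the k nodes whose centroids are nearest to the query."""
    nearest = sorted(nodes, key=lambda n: math.dist(query, n.centroid))[:k]
    votes = Counter()
    for node in nearest:
        votes.update(node.labels)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
nodes = encode(points, labels, radius=1.0)
prediction = classify(nodes, (1.2, 1.0), k=1)
```

Because the classifier only visits nodes rather than raw points, the work in the mining part scales with the number of nodes, which is the efficiency argument the abstract makes for the coding part.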

Description

Technical field

[0001] The invention relates to a method for efficiently processing data that guarantees the monotonicity of the quality of large-scale data mining results. It belongs to the cross-disciplinary application field of data mining, big data and computer software.

Background art

[0002] The volume and variety of big data make it attractive to mine big data with algorithms that produce approximate results. When mining big data under limited time and resources, traditional algorithms struggle to balance the quality of the mining results against the resource constraints, and they cannot guarantee the quality monotonicity of approximate results. To solve this problem, we design a big data mining method, based on Shannon entropy, that guarantees quality monotonicity. The mining method is divided into two parts: the coding part and the mining part. By ensuring the monotonicity of the entropy of the coding part of the algorithm ...
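The description is cut off before it explains how Shannon entropy enters the coding part, so the sketch below shows only the standard definition, H = -Σ p·log2(p), applied to a list of class labels. The function name `shannon_entropy` and the label lists are assumptions for illustration.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A node holding a single class carries no uncertainty.
pure = shannon_entropy(["Setosa"] * 4)

# A node split evenly between two classes carries one bit.
mixed = shannon_entropy(["Setosa", "Setosa", "Versicolour", "Versicolour"])

# Three equally likely classes give log2(3) bits.
uniform = shannon_entropy(["Setosa", "Versicolour", "Virginica"])
```

Entropy is lowest when a group is pure and highest when its classes are evenly mixed, which is what makes it usable as a quality measure for merged data.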

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2458
CPC: G06F16/2465
Inventor: 陈志, 党凯乐, 岳文静, 黄继鹏, 芮路
Owner: NANJING UNIV OF POSTS & TELECOMM