An XML big data clustering integration method for parallel AP propagation

A clustering integration method for big data, applied in electrical digital data processing, special data processing applications, semi-structured data querying and similar areas. It addresses problems such as noisy data, many outliers, fast generation speed and huge volume, with the effects of eliminating ambiguity, widening the differences between clustering members, and improving clustering performance.

Active Publication Date: 2017-05-17

西安蓝雪信息技术有限公司

3 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

[0002] At present, XML big data, like other types of big data, is characterized by large volume, complex structure, fast generation speed, and high value but low value density, with data volumes ranging from MB through GB, TB and PB up to ZB. In addition, the data is non-convex and very unevenly distributed, contains much noise and many outliers, and often appears on the Web in the form of data streams. For such fast-changing, highly time-sensitive XML big data, clustering integration methods built on traditional algorithms have obvious deficiencies on large XML data sets, mainly: (1) they occupy a large amount of storage space, predict slowly, and predict poorly; (2) online machine learning is difficult: the methods are effective on small-scale data but perform poorly on large-scale data; (3) they have poor dynamic and real-time performance and cannot process streaming data; (4) for lack of prior knowledge, the algorithm cannot accurately grasp the global characteristics of the XML data distribution, so clustering accuracy and clustering results ultimately fail to meet requirements.


Examples


Embodiment 1

[0035] Step 1: Preprocess each XML big data set by cleaning, dividing and extracting. After cleaning, extract all nodes and their node subsets from the data using a division method that combines scale and content, and compute the frequency with which each node subset occurs in its data. Based on node frequency, place nodes and their descendants that belong to the same subject content in the same subset as far as possible, and place nodes with different subject content in different subsets. Then extract n subtrees from the divided subsets according to keyword frequency, find all paths from the root node to the leaf nodes of each extracted subtree, and use these paths as the input to disambiguation. Ambiguous words are disambiguated, and the semantic relevance and contextual semantic similarity of each keyword are obtained;
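The path-enumeration part of Step 1 can be illustrated with a minimal Python sketch. It covers only the listing of root-to-leaf paths of one already-extracted subtree; the frequency-based division and the disambiguation are not shown. The function name `root_to_leaf_paths` and the sample XML are hypothetical and do not come from the patent.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(root):
    """Return every root-to-leaf path of an element tree as a list of tag names."""
    paths = []
    stack = [(root, [root.tag])]
    while stack:
        node, path = stack.pop()
        children = list(node)
        if not children:
            # A node without children is a leaf: the path is complete.
            paths.append(path)
            continue
        # Push children in reverse so that paths come out in document order.
        for child in reversed(children):
            stack.append((child, path + [child.tag]))
    return paths


# Hypothetical subtree, used only for illustration.
SAMPLE = "<book><title>XML</title><author><name>A</name><org>B</org></author></book>"
subtree = ET.fromstring(SAMPLE)
paths = root_to_leaf_paths(subtree)
for p in paths:
    print("/".join(p))
```

Each resulting path (for example `book/author/name`) could then serve as the context passed to a disambiguation step, as the paragraph describes.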

[0036] Its similarity is obtained as follows: Assume that n subtree sets D'=(...


Abstract

The invention discloses an XML big data clustering integration method for parallel AP propagation. The method comprises: preprocessing each XML big data set by cleaning, dividing and extracting; treating all keywords in an extracted subtree as the feature description of a data point; applying the basic idea of clustering integration; applying the idea of decomposing a large similarity matrix; and performing the final clustering integration. The method builds a random subspace classifier and selects subtrees randomly and in parallel, which widens the differences between clustering members and improves clustering performance. Disambiguation is introduced to resolve the ambiguity caused by inconsistency between the semantic context and the content of each subtree. Semantic similarity and path similarity are combined, which eliminates the influence of inaccurate XML document similarity calculation on the initial clustering result. System capacity theory is used to improve the iterative update of the attribution matrix and the absorption matrix in the AP algorithm, so that clustering integration of XML big data is achieved and the efficiency of the clustering integration method is improved.
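For orientation, the following is a minimal sketch of the standard (textbook) affinity propagation iteration, not the improved iteration that the abstract claims. It assumes that the "absorption matrix" and "attribution matrix" correspond to what the AP literature usually calls the responsibility and availability matrices; that mapping, the function name `affinity_propagation`, the damping value, the preference of -10 and the toy data are all assumptions made for illustration.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, iterations=200):
    """Textbook affinity propagation on an n x n similarity matrix S.

    S[k, k] holds the preference of point k to become an exemplar.
    Returns, for each point, the index of its exemplar (-1 if none emerged).
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibility r(i, k)
    A = np.zeros((n, n))  # availability a(i, k)
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new

    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    if exemplars.size == 0:
        return np.full(n, -1)
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return labels


# Toy data (hypothetical): two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2  # negative squared distance as similarity
np.fill_diagonal(S, -10.0)           # assumed preference value
labels = affinity_propagation(S)
print(labels)
```

In the patent's setting, `S` would instead hold the combined semantic and path similarity between XML subtrees, and the two update steps are where the claimed improvement would apply.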

Description

Technical field

[0001] The invention belongs to the field of big data integration methods, and in particular relates to an XML big data clustering integration method for parallel AP propagation.

Background

[0002] At present, XML big data, like other types of big data, is characterized by large volume, complex structure, fast generation speed, and high value but low value density, with data volumes ranging from MB through GB, TB and PB up to ZB. In addition, the data is non-convex and very unevenly distributed, contains much noise and many outliers, and often appears on the Web in the form of data streams. For such fast-changing, highly time-sensitive XML big data, clustering integration methods built on traditional algorithms have obvious deficiencies on large XML data sets, mainly: (1) they occupy a large amount of storage space, predict slowly, and predict poorly; (2) ...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F17/30, G06F9/44

CPC: G06F16/83

Inventor: 蒋勇

Owner: 西安蓝雪信息技术有限公司
