Method and device for topic mining of text big data based on feature space decomposition

A feature-space and big-data technology, applied in special data processing applications, electrical digital data processing, instruments, etc., that improves time and space efficiency and enables high-speed processing

Active Publication Date: 2013-05-22
INST OF SOFTWARE - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] Due to the above characteristics, the probabilistic topic model analysis technology is limited to small-scale feature sets and small-scale topic sets, and it is difficult to efficiently perform topic analysis on typical big data such as Internet web page information and large digital libraries.



Examples

Embodiment Construction

[0045] The text data mining method of this embodiment comprises the following steps:

[0046] 1. Input preparation:

[0047] 0) Obtain a database of original text documents (such as web pages);

[0048] 1) Represent each text document as a feature vector (usually a word vector) to form a document library for topic analysis;

[0049] 2. Model solution:

[0050] 0) Space decomposition: decompose the space of the topic analysis model along its features, forming several subspaces;

[0051] 1) Loop (until convergence), solving in parallel:

[0052] 1-1) Load the model parameters P_i associated with each subspace into the corresponding parallel execution unit;

[0053] 1-2) Each parallel execution unit loads the data subset D_i suited to the sub-model it holds;

[0054] 1-3) Each parallel execution unit computes its corresponding sub-statistic S_i;

[0055] 1-4) Result aggregation: aggregate the sub-statistics of all parallel execution units to obtain the global statistic S, ...
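The loop in step 2 can be illustrated with a small sketch. The excerpt does not say which topic model or which statistics the patent uses, so this example assumes a PLSA-style EM update as a stand-in: the vocabulary is split into blocks of features, each worker receives its block of topic-word parameters (P_i) and the matching columns of the document-term matrix (D_i), and returns expected counts (S_i) that are then merged. The function names, the thread pool, and the convergence test are all hypothetical choices, not the patented procedure.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def sub_statistic(counts_i, phi_i, theta):
    """Steps 1-1 to 1-3: one worker holds P_i (phi_i) and D_i (counts_i)
    and returns its sub-statistic S_i (assumed here: PLSA expected counts)."""
    # joint[d, k, w] = theta[d, k] * phi_i[k, w] for the words in this block
    joint = theta[:, :, None] * phi_i[None, :, :]
    norm = joint.sum(axis=1, keepdims=True)            # shape (D, 1, W_i)
    resp = joint / np.maximum(norm, 1e-300)            # p(topic | doc, word)
    weighted = counts_i[:, None, :] * resp             # expected counts
    loglik = float((counts_i * np.log(np.maximum(norm[:, 0, :], 1e-300))).sum())
    # doc-topic counts (D, K), topic-word counts (K, W_i), block log-likelihood
    return weighted.sum(axis=2), weighted.sum(axis=0), loglik


def fit(counts, n_topics=2, n_subspaces=2, max_iter=200, tol=1e-6, seed=0):
    """Assumes every document contains at least one word."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    theta = rng.dirichlet(np.ones(n_topics), size=n_docs)   # doc-topic, (D, K)
    phi = rng.dirichlet(np.ones(n_words), size=n_topics)    # topic-word, (K, V)

    # Step 0: decompose the model space along the feature (word) axis.
    blocks = np.array_split(np.arange(n_words), n_subspaces)

    prev, loglik = -np.inf, -np.inf
    with ThreadPoolExecutor(max_workers=n_subspaces) as pool:
        for _ in range(max_iter):
            # Steps 1-1 to 1-3, one task per subspace.
            results = list(pool.map(
                lambda b: sub_statistic(counts[:, b], phi[:, b], theta), blocks))

            # Step 1-4: aggregate the sub-statistics into global statistics.
            n_dk = sum(r[0] for r in results)
            n_kw = np.concatenate([r[1] for r in results], axis=1)
            loglik = sum(r[2] for r in results)

            # Update the model from the global statistics.
            theta = n_dk / n_dk.sum(axis=1, keepdims=True)
            phi = n_kw / n_kw.sum(axis=1, keepdims=True)

            if loglik - prev < tol:     # assumed convergence criterion
                break
            prev = loglik
    return theta, phi, loglik
```

Because each block's computation depends only on its own columns plus the shared document-topic matrix, the result should not depend on how many subspaces are used; only the per-worker memory footprint changes.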


Abstract

The invention relates to a method and device for topic mining of text big data based on feature space decomposition. The method comprises two associated parts: a space decomposition method based on topic features, and an acceleration method that solves the model over multiple subspaces. The key to the space decomposition method is to use the characteristics of the model to decouple the data samples from the topic sets, so that the data space and the topic space are partitioned and decomposed simultaneously. This yields a number of sub-model spaces, each smaller than the full model space, which effectively reduces the storage-space complexity of the solving algorithm. At the same time, the relative independence among the subspaces allows them to be mapped onto various kinds of parallel execution units, which effectively reduces the time complexity of the algorithm. The method makes full use of the parallel processing capability of the computing device and achieves scalable parallel processing of large-scale topic modeling spaces and large-scale data sets.
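The storage claim can be made concrete with a little arithmetic. The sketch below assumes the model is a topic-by-feature parameter table that is split only along the feature axis into near-equal blocks; the abstract also mentions partitioning the topic set, which this excerpt does not detail, so that part is left out. The function name and the sizes used are hypothetical.

```python
def submodel_sizes(n_topics, n_features, n_subspaces):
    """Parameter count of each sub-model when the feature axis is split
    into n_subspaces near-equal blocks (an assumed partitioning scheme)."""
    base, extra = divmod(n_features, n_subspaces)
    widths = [base + (1 if i < extra else 0) for i in range(n_subspaces)]
    return [n_topics * w for w in widths]


# Hypothetical sizes: 1,000 topics over a 1,000,000-word vocabulary, 8 subspaces.
full_size = 1_000 * 1_000_000
sizes = submodel_sizes(1_000, 1_000_000, 8)
largest_fraction = max(sizes) / full_size   # share of the full model held by one worker
```

Under these assumptions each worker holds roughly one eighth of the full parameter table, while the blocks together still cover the whole model.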

Description

Technical field

[0001] The invention belongs to the technical field of computer data mining, and in particular relates to an efficient data processing method and device for topic mining of text big data based on a topic analysis model, used to perform topic analysis and mining efficiently on typical big data such as Internet web pages and large digital library documents.

Background

[0002] Computer data mining is mainly an intelligent information processing process in which computers are used to mine human-understandable information or knowledge from large amounts of data and put it to use. The rise of the knowledge economy and the vigorous development of the Internet have created an urgent need for computer data mining technology. The main carrier of knowledge is natural language text, and text without further processing is just raw data; it must be analyzed semantically before a computer can obtain useful knowledge from it. The ke...


Application Information

IPC IPC(8): G06F17/30
Inventor 李文波, 孙乐
Owner INST OF SOFTWARE - CHINESE ACAD OF SCI