
A method and system for analyzing potential topic phrases in text data

A technology for analyzing text data and topics, applied in digital data processing, natural language data processing, instruments, and related fields. It addresses the poor readability, consistency, and visualization of topic results, the inability to obtain effective topic phrase results, and the lack of statistical information for phrases. It achieves strong readability and consistency, wide application value, and high model accuracy.

Active Publication Date: 2021-08-31
HUAIYIN INSTITUTE OF TECHNOLOGY
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] Purpose of the invention: To overcome the deficiencies of the prior art, the present invention provides a method for analyzing potential topic phrases in text data. The method addresses the poor readability, consistency, and visualization of the topic results obtained by training traditional "bag of words" topic models, and solves the problem that similar methods cannot obtain effective topic phrase results because they lack statistical information for phrases. The present invention also provides a system for analyzing potential topic phrases in text data.

Embodiment Construction

[0092] The present invention will be further described in detail with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0093] The present invention provides a method for analyzing potential topic phrases in text data which, as shown in Figure 1, includes the following steps:

[0094] S1: Collect a text data set and perform word segmentation on it to obtain a word-level representation of the data set.

[0095] S2: Extract the effective phrases formed by word collocation from the words in the text data set, and obtain a mixed representation consisting of the phrase set and the words that were not matched into any effective phrase.

[0096] This step specifical...
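
Steps S1 and S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the text above does not name a segmenter or a collocation measure, so forward maximum matching against a toy dictionary and a pointwise mutual information (PMI) threshold are used as stand-ins. All names (`segment`, `find_phrases`, `to_mixed`, `min_count`, `min_pmi`) and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def segment(text, vocab, max_len=4):
    """S1 (assumed method): forward maximum matching against a dictionary."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

def find_phrases(docs, min_count=2, min_pmi=1.0):
    """S2 (assumed measure): keep adjacent word pairs with high PMI."""
    unigrams = Counter(w for doc in docs for w in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    phrases = set()
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        if math.log(p_pair / (p_a * p_b)) >= min_pmi:
            phrases.add((a, b))
    return phrases

def to_mixed(doc, phrases, joiner="_"):
    """Merge matched pairs into one phrase token; keep unmatched words."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in phrases:
            out.append(doc[i] + joiner + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out

# Toy dictionary and corpus, invented for illustration only.
vocab = {"主题", "模型", "文本", "数据", "分析", "训练"}
texts = ["主题模型分析文本数据", "训练主题模型", "文本数据分析", "主题模型训练文本数据"]

docs = [segment(t, vocab) for t in texts]          # word-level representation
phrases = find_phrases(docs)                       # effective phrases
mixed = [to_mixed(doc, phrases) for doc in docs]   # mixed words + phrases
print(mixed[0])
```

In this toy corpus only the pairs that recur (主题 + 模型 and 文本 + 数据) pass both thresholds, so they become phrase tokens while 分析 and 训练 remain single words. Real thresholds would need tuning on the actual data set.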

Abstract

The invention discloses a method and system for analyzing potential topic phrases in text data. The method includes: collecting a text data set and performing word segmentation on it to obtain a word-level representation of the data set; extracting the effective phrases formed by word collocation from the words in the data set, and obtaining a mixed representation of the phrase set and the words not matched into any effective phrase; training word vectors on the mixed-representation data set to obtain a word vector model; constructing DR‑Phrase LDA and solving for its parameters; and training DR‑Phrase LDA and outputting the potential topic phrases of the text data according to the training results. The invention adopts a phrase topic model based on word vectors. When training the probabilistic topic model, this model draws on Chinese linguistic rules to improve the statistical information available for phrases. Specifically, it uses word vectors to measure the relationship between the component words of a phrase, quantitatively reflecting the semantic relationship between words in the text as a whole and within phrases, which makes the model more accurate.
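
One way to picture "using word vectors to measure the relationship between the component words of a phrase" is a cohesion score built from cosine similarity. The sketch below is an assumption for illustration: the abstract does not say which measure DR‑Phrase LDA uses or how the score enters the model. The function names, the `_` joiner, and the three-dimensional toy vectors are hypothetical; real vectors would come from the word vector model trained on the mixed-representation data set.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def phrase_cohesion(phrase, vectors, joiner="_"):
    """Mean pairwise cosine similarity of a phrase's component words."""
    parts = phrase.split(joiner)
    pairs = list(combinations(parts, 2))
    if not pairs:
        return 1.0  # convention chosen here: a single word is fully cohesive
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

# Hand-written toy vectors, not trained values.
vectors = {
    "主题": [0.9, 0.1, 0.2],
    "模型": [0.8, 0.2, 0.3],
    "分析": [0.1, 0.9, 0.1],
}

tight = phrase_cohesion("主题_模型", vectors)  # components point the same way
loose = phrase_cohesion("主题_分析", vectors)  # components point apart
print(tight > loose)
```

A score of this kind could, for example, weight a phrase's counts during topic model training, but that use is a guess rather than something the abstract states.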

Description

Technical field

[0001] The invention relates to the field of text data mining and analysis, and in particular to a method and system for analyzing potential topic phrases in text data.

Background

[0002] With the development of information technology, a large volume of electronic text has accumulated in various fields, resulting in information overload. To help people quickly retrieve, find, and make effective use of this information, semantic and structural analysis of text has become a current research hotspot. Within it, extracting potential topic information from text data is a key technology for advanced application systems such as information retrieval, recommendation systems, and automatic summarization. Common existing methods use traditional "bag of words" probabilistic topic models such as LDA and PLDA for text topic analysis. The topic results analyzed by these methods are presented in the form of keywords, while h...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/289, G06F40/216
CPC: G06F40/216, G06F40/289
Inventor: 马甲林, 张琳, 程清雯
Owner: HUAIYIN INSTITUTE OF TECHNOLOGY