Multi-document automatic abstract generation method based on phrase subject modeling

A topic modeling and automatic summarization technology, applied in natural language data processing, special data processing applications, instruments, etc., that addresses the influence of shared high-frequency words on automatic summarization, which cannot be ignored.

Active Publication Date: 2016-08-17
ZHEJIANG UNIV
5 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

The impact of the same high-frequency words on automatic summarization cannot be ignored.

Examples

Embodiment Construction

[0057] To make the technical scheme of the present invention easier to understand, the invention is further described below with reference to Figure 1.

[0058] The specific steps of this embodiment are as follows:

[0059] 1) Preprocessing the sample documents: use the Mallet natural language processing tool to segment the documents into phrases and obtain their frequencies of occurrence (phrase length is limited to no more than 3). Stop words (such as the, this) and invalid words (such as wepurpose) are removed during this process, and a word vector space is then constructed.

[0060] 2) Phrase topic modeling: based on the LDA topic model, phrases are used instead of words as the unit of calculation; the joint probability distribution of the documents is calculated and the model is transformed into the phrase topic model. A schematic diagram of the phrase topic model is shown in Figure 2. Then use the Gib...
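The two steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the source uses Mallet for segmentation and a phrase topic model whose details are cut off here, whereas this sketch uses plain n-gram extraction and a standard collapsed Gibbs sampler for ordinary LDA with phrases as the tokens. The stop-word list, the function names `extract_phrases` and `gibbs_lda`, and the values of `alpha`, `beta`, and `iterations` are all assumptions made for illustration.

```python
import random
import re
from collections import Counter

STOP_WORDS = {"the", "this", "a", "of", "and", "is", "to"}  # hypothetical list
MAX_PHRASE_LEN = 3  # the source limits phrase length to no more than 3


def extract_phrases(text):
    """Step 1 stand-in: n-grams (n <= MAX_PHRASE_LEN) over runs of non-stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # punctuation is dropped
    phrases, run = [], []
    for tok in tokens + [None]:  # the trailing None flushes the last run
        if tok is not None and tok not in STOP_WORDS:
            run.append(tok)
            continue
        for n in range(1, MAX_PHRASE_LEN + 1):
            for i in range(len(run) - n + 1):
                phrases.append(" ".join(run[i:i + n]))
        run = []
    return phrases


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=100, seed=0):
    """Step 2 stand-in: collapsed Gibbs sampling for plain LDA.

    docs is a list of documents, each a list of integer phrase ids.
    Returns phi, where phi[k][w] estimates p(phrase w | topic k).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_phrase = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Give every phrase occurrence a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_phrase[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_phrase[k][w] -= 1
                topic_total[k] -= 1
                # Conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_phrase[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_phrase[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimate of each topic's distribution over phrases.
    return [
        [(topic_phrase[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]


texts = ["The topic model groups the related phrases.",
         "This topic model uses phrases."]
phrase_lists = [extract_phrases(t) for t in texts]
frequencies = Counter(p for ps in phrase_lists for p in ps)
vocab = sorted(frequencies)
index = {p: i for i, p in enumerate(vocab)}
docs = [[index[p] for p in ps] for ps in phrase_lists]
phi = gibbs_lda(docs, num_topics=2, vocab_size=len(vocab))
```

Because overlapping n-grams are all kept as tokens, a single word can contribute to several phrases; a real phrase segmenter would choose one segmentation per sentence instead.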

Abstract

The invention discloses a multi-document automatic abstract generation method based on phrase topic modeling. Multiple sample documents are segmented to obtain phrases and their frequencies of occurrence, and each document is represented as a bag of phrases. The joint probability distribution of the documents is calculated on the basis of the LDA topic model, which is converted into a phrase topic model; a Gibbs sampling algorithm is then used to estimate the hidden parameters of the phrase topic model according to Bayesian probability, finally yielding the probability distribution of the topics over words. The tested documents are segmented, the topic weight and word-frequency weight of each resulting sentence are calculated, the final weight of each sentence is obtained by weighted calculation, and the abstract content is generated according to the final weights. The method is more standard and precise, takes the relationship between different words into account, and, by introducing the topic weight of sentences, produces results that better match how people actually write abstracts.
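The sentence-weighting step can be sketched as follows. The abstract does not give the weighting formula in this excerpt, so the linear combination with a mixing factor `lam`, the per-sentence averaging, and the names `sentence_scores` and `summarize` are assumptions chosen for illustration, not the patent's actual calculation. The topic weights of phrases are assumed to have been computed beforehand from the trained topic model.

```python
def sentence_scores(sentences, phrase_topic_weight, phrase_freq, lam=0.5):
    """Score each sentence as lam * topic weight + (1 - lam) * frequency weight.

    sentences: list of sentences, each a list of phrases (already segmented).
    phrase_topic_weight: dict phrase -> topic-based weight (assumed precomputed).
    phrase_freq: dict phrase -> occurrence count in the tested documents.
    """
    total_freq = sum(phrase_freq.values()) or 1
    scores = []
    for phrases in sentences:
        if not phrases:
            scores.append(0.0)
            continue
        # Average over phrases so that long sentences are not favored by length.
        topic_w = sum(phrase_topic_weight.get(p, 0.0) for p in phrases) / len(phrases)
        freq_w = sum(phrase_freq.get(p, 0) / total_freq for p in phrases) / len(phrases)
        scores.append(lam * topic_w + (1 - lam) * freq_w)
    return scores


def summarize(sentences, scores, k=1):
    """Return the k highest-scoring sentences, kept in their original order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy data, invented for this example.
sentences = [["topic model", "summary"], ["weather"], ["topic model", "gibbs sampling"]]
topic_weight = {"topic model": 0.4, "gibbs sampling": 0.3, "summary": 0.2}
freq = {"topic model": 2, "summary": 1, "weather": 1, "gibbs sampling": 1}
scores = sentence_scores(sentences, topic_weight, freq)
summary = summarize(sentences, scores, k=2)
```

With this toy data the off-topic sentence `["weather"]` receives the lowest score and is left out of the two-sentence summary.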

Description

Technical field

[0001] The invention relates to a multi-document automatic summarization algorithm, in particular to a multi-document automatic summary generation method based on phrase topic modeling.

Background technique

[0002] With the rapid spread of the Internet, it has become increasingly convenient for people to obtain information and knowledge. At the same time, because of the explosive growth of online information, processing large amounts of text takes people a great deal of effort. How to help people process large amounts of text information has therefore become a focus of current research.

[0003] Multi-document automatic summarization technology was proposed to solve this problem. At present, the application of automatic summarization to news articles is relatively mature. News articles from different media tend to center on the same event and use the same words as far as possible to describe the even...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F16/345, G06F40/289
Inventor: 鲁伟明, 庄越挺, 张占江
Owner: ZHEJIANG UNIV