Graph model text abstract generation method based on word frequency and semantics

A graph-model-based text summarization technology that addresses problems such as the impracticality of supervised training, long-distance dependence, and learning frameworks that cannot accurately obtain sentence semantic information, so as to save time and improve pertinence.

Pending Publication Date: 2020-05-08
LIAONING UNIVERSITY

AI Technical Summary

Problems solved by technology

[0004] 2. Earlier supervised algorithms require a training corpus and manual labeling of that corpus, which is impractical; unsupervised learning, by contrast, uses unlabeled data.
[0005] 3. Summary generation methods based on machine learning can capture the semantic information of words and sentences from the training corpus, but they rely too heavily on corpora containing multiple target words and are only suitable for short texts. When the input sequence is too long, the learning framework cannot accurately obtain the semantic information of the sentence and the encoder cannot accurately extract the semantic information of the text. The resulting long-distance dependency problem prevents the model from converging, which in turn reduces the accuracy of summary generation.


Examples


Embodiment Construction

[0065] 1. Word segmentation: First, each sentence is segmented and part-of-speech tagged using the standard word segmenter in the natural language processing package HanLP. Segmentation and tagging rely on shortest-path word segmentation (the Viterbi algorithm is used to solve for the shortest path) and an HMM (hidden Markov model). The segmenter also recognizes entity names such as numbers, person names, place names, and organization names. To improve the pertinence of word segmentation in specific fields, a user dictionary function is added to the word segmentation module.
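
The shortest-path idea in the step above can be illustrated with a small, self-contained sketch. This is not HanLP's implementation or API: the function name `shortest_path_segment` and the toy dictionary are assumptions made for illustration, and the HMM part-of-speech tagging and entity recognition are not shown. The sketch runs a dynamic-programming pass over the word graph and picks the segmentation with the fewest words; adding a domain term to the dictionary plays the role of a user dictionary.

```python
def shortest_path_segment(sentence, dictionary):
    """Split `sentence` into the fewest dictionary words (shortest path in the word graph)."""
    n = len(sentence)
    max_len = max((len(w) for w in dictionary), default=1)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: fewest words needed to cover sentence[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last word on that path
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = sentence[start:end]
            # Single characters are always allowed, so a path always exists.
            if piece in dictionary or end - start == 1:
                if best[start] + 1 < best[end]:
                    best[end] = best[start] + 1
                    back[end] = start
    # Walk the back-pointers from the end of the sentence to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(sentence[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


# Hypothetical toy dictionary; a real segmenter ships a far larger one.
base_dictionary = {"文本", "摘要", "生成", "方法", "文本摘要"}
# Adding a domain term mimics the user dictionary function.
user_dictionary = base_dictionary | {"图模型"}

print(shortest_path_segment("文本摘要生成方法", base_dictionary))
print(shortest_path_segment("图模型文本摘要", user_dictionary))
```

Minimising the word count is only one possible edge cost; a production segmenter would typically weight edges by word probability instead.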

[0066] 2. Filter terms: Remove stop words and perform noise reduction on the text. The system filters terms sentence by sentence, applying stop word filtering, low-frequency word filtering, and part-of-speech filtering.
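
A minimal sketch of the three filters, applied sentence by sentence, might look like the following. The function name `filter_terms`, the frequency threshold `min_freq`, and the part-of-speech tags `"n"` and `"v"` are assumptions for illustration; the excerpt does not state which tags or thresholds the method uses.

```python
from collections import Counter


def filter_terms(tagged_sentences, stop_words, keep_pos, min_freq=2):
    """Filter (word, pos) pairs sentence by sentence.

    A term is kept only if it is not a stop word, occurs at least
    `min_freq` times in the whole text, and carries a tag in `keep_pos`.
    """
    # Count term frequency over the whole text, not per sentence.
    freq = Counter(word for sent in tagged_sentences for word, _ in sent)
    filtered = []
    for sent in tagged_sentences:
        kept = [
            word
            for word, pos in sent
            if word not in stop_words and freq[word] >= min_freq and pos in keep_pos
        ]
        filtered.append(kept)
    return filtered


# Hypothetical tagged input: one list of (word, pos) pairs per sentence.
tagged = [
    [("the", "x"), ("graph", "n"), ("ranks", "v"), ("sentences", "n")],
    [("the", "x"), ("graph", "n"), ("quickly", "d"), ("ranks", "v"), ("nodes", "n")],
]
result = filter_terms(tagged, stop_words={"the"}, keep_pos={"n", "v"}, min_freq=2)
print(result)
```

The output keeps one list per input sentence, so sentence boundaries survive for the later sentence-word matrix.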

[0067] 3. Word vector training: The BM25 algorithm uses word frequency information to represent the wo...
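
The paragraph above is truncated in this excerpt, so the exact way the method uses BM25 is not visible. As a hedged illustration only, the sketch below computes the standard Okapi BM25 weight of each term in each sentence, treating sentences as the "documents". The function name `bm25_weights`, the parameter values `k1=1.5` and `b=0.75`, and the smoothed (always positive) IDF variant are assumptions, not details taken from the patent.

```python
import math
from collections import Counter


def bm25_weights(sentences, k1=1.5, b=0.75):
    """Return one {term: BM25 weight} dict per tokenised sentence."""
    n_sent = len(sentences)
    if n_sent == 0:
        return []
    # Fall back to 1.0 so that a text made only of empty sentences does not divide by zero.
    avg_len = sum(len(s) for s in sentences) / n_sent or 1.0
    # Sentence frequency: in how many sentences each term appears.
    df = Counter(term for s in sentences for term in set(s))
    weights = []
    for s in sentences:
        tf = Counter(s)
        # Length normalisation: longer-than-average sentences are damped.
        norm = k1 * (1 - b + b * len(s) / avg_len)
        sent_weights = {}
        for term, f in tf.items():
            idf = math.log((n_sent - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            sent_weights[term] = idf * f * (k1 + 1) / (f + norm)
        weights.append(sent_weights)
    return weights


# Hypothetical tokenised sentences (already segmented and filtered).
sents = [["graph", "model"], ["graph", "summary", "summary"], ["word"]]
w = bm25_weights(sents)
print(w[0])
```

In this toy input, "model" appears in fewer sentences than "graph", so it receives the larger weight within the first sentence.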


Abstract

The invention discloses a graph model text summary generation method based on word frequency and semantics. The method comprises the following steps: 1) performing word segmentation and part-of-speech tagging on the sentences in a text; 2) filtering the terms and keeping only those with specific parts of speech; 3) training word vectors with a Word2Vec model and the BM25 algorithm to form a feature word vector set, using it to represent sentences, and constructing a sentence-word text matrix; 4) constructing an undirected text graph model from the text matrix; and 5) iteratively computing sentence node weights with the TextRank algorithm until convergence, and selecting the top-K sentences to generate the text summary. Experimental results show that the method is suitable for industrial production. Compared with traditional automatic summarization methods that consider only the word frequency features of a text or only its semantic features, the method obtains a higher ROUGE value under the optimal combination of adjustment factors. This shows that the method effectively integrates word frequency and semantic features, and that the TextRank algorithm, which draws on contextual information, then improves the accuracy of summary generation.
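
Steps 4) and 5) can be sketched as follows, assuming each sentence has already been turned into a numeric vector. How the method fuses BM25 and Word2Vec features through its adjustment factors is not given in this excerpt, so the sketch takes the vectors as input and uses cosine similarity as the edge weight. The names `cosine` and `textrank_top_k`, the damping factor 0.85, and the convergence tolerance are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank_top_k(vectors, k, damping=0.85, tol=1e-6, max_iter=200):
    """Rank sentences on an undirected similarity graph; return (top-k indices, scores)."""
    n = len(vectors)
    if n == 0:
        return [], []
    # Undirected graph: symmetric edge weights and no self-loops.
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight[i][j] = weight[j][i] = max(cosine(vectors[i], vectors[j]), 0.0)
    out_sum = [sum(row) for row in weight]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(
                weight[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # stop once the scores have converged
            break
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top), scores  # indices restored to original sentence order


# Hypothetical sentence vectors; sentence 2 is dissimilar to the rest.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
top, scores = textrank_top_k(vecs, k=2)
print(top, [round(s, 3) for s in scores])
```

Returning the selected indices in original order keeps the summary sentences in the order they appear in the text.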

Description

Technical field

[0001] The invention relates to a method for generating text summaries, in particular to a graph model method for generating text summaries based on word frequency and semantics.

Background

[0002] The BM25 algorithm or a traditional neural network algorithm is commonly used in text summarization to evaluate the correlation between search terms and documents. However, the following shortcomings arise when selecting the text similarity measure used for summary generation, a choice that has a significant impact on performance.

[0003] 1. Traditional neural network algorithms require a large corpus and long training time, generate summaries slowly, and have poor applicability. Using a shallow neural network to compress the dimension of words can shorten the training time and generate summaries faster.

[0004] 2. Previously supervised algorithms required training corpus and manual labeling of corpus, which ...


Application Information

IPC(8): G06F16/34, G06F40/284, G06F40/289, G06K9/62, G06N3/04, G06N3/08, G06Q10/06
CPC: G06F16/345, G06N3/08, G06Q10/06393, G06N3/045, G06F18/29
Inventor: 王青松, 马腾, 张衡, 张鑫琪, 王军, 接磊, 刘庆楠, 王雪彤, 祝慷骏
Owner LIAONING UNIVERSITY