
Extractive unsupervised text summarization method

An unsupervised, extractive technology applied in the field of text summarization. It addresses the inaccuracy, reduced efficiency, and path dependence of existing automatic text summarization methods, and achieves the effects of shortening reading time, improving efficiency, and compressing redundancy.

Publication status: Inactive
Publication date: 2019-07-12
Applicant: 重庆华龙网海数科技有限公司

AI Technical Summary

Problems solved by technology

[0006] Current automatic text summarization methods, especially extractive ones, have certain defects. For example, when judging which sentences of the original text are important, they exhibit path dependence; over long-term operation, misjudgments occur and cannot be corrected in time, making the automatic summaries inaccurate, while manual intervention reduces efficiency and increases costs.



Examples


Embodiment 1

[0039] The present invention provides an extractive unsupervised text summarization method; the steps are as follows (a hedged code sketch of these steps is given after this embodiment):

[0040] S1. Divide the text into several constituent units (words and sentences) and establish a graph model;

[0041] S2. Use a voting mechanism to rank the important components of the text, using only the information of the single document itself to extract keywords and produce the summary;

[0042] The process of building the model and determining the weights is as follows:

[0043] S201. Preprocessing: divide the content of the input text or text set into sentences to obtain

[0044] T = [S_1, S_2, ..., S_m];

[0045] S202. Construct the graph G = (V, E), where V is the set of sentences; perform word segmentation on each sentence and remove stop words to obtain

[0046] S_i = [t_{i,1}, t_{i,2}, ..., t_{i,n}];

[0047] where t_{i,j} ∈ S_i are the retained candidate keywords;

[0048] S203. Sentence similarity calculation: construct the edge set E in the...
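
As an illustration of steps S1 and S201–S203 above, the following is a minimal sketch in Python. The sentence splitter, tokenizer, stop-word list, and the content-coverage similarity measure (a TextRank-style overlap score) are illustrative assumptions rather than the patented implementation, since the text defining the edge weights is truncated above.

```python
import math
import re

# Assumed stop-word list; the patent does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for"}


def split_sentences(text):
    """S201 (preprocessing): split the input text T into sentences [S_1, ..., S_m]."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """S202: segment a sentence and drop stop words, keeping candidate keywords t_{i,j}."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def similarity(tokens_i, tokens_j):
    """S203: content-coverage similarity between two sentences
    (assumed TextRank-style overlap normalized by sentence lengths)."""
    overlap = len(set(tokens_i) & set(tokens_j))
    if overlap == 0 or len(tokens_i) < 2 or len(tokens_j) < 2:
        return 0.0
    return overlap / (math.log(len(tokens_i)) + math.log(len(tokens_j)))


def build_graph(text):
    """S1/S202: build the graph G = (V, E) with sentences as vertices and
    pairwise similarities as edge weights."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = similarity(tokens[i], tokens[j])
            weights[i][j] = weights[j][i] = w
    return sentences, weights
```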

Embodiment 2

[0062] The present invention provides an extractive unsupervised text summarization method; the steps are as follows:

[0063] S1. Divide the text into several constituent units (words and sentences) and establish a graph model;

[0064] S2. Use a voting mechanism to rank the important components of the text, using only the information of the single document itself to extract keywords and produce the summary;

[0065] The process of building the model and determining the weights is as follows:

[0066] S201. Preprocessing: divide the content of the input text or text set into sentences to obtain

[0067] T = [S_1, S_2, ..., S_m];

[0068] S202. Construct the graph G = (V, E), where V is the set of sentences; perform word segmentation on each sentence and remove stop words to obtain

[0069] S_i = [t_{i,1}, t_{i,2}, ..., t_{i,n}];

[0070] where t_{i,j} ∈ S_i are the retained candidate keywords;

[0071] S203. Sentence similarity calculation: construct the edge set E in the g...

Embodiment 3

[0086] The present invention provides an extractive unsupervised text summarization method; the steps are as follows:

[0087] S1. Divide the text into several constituent units (words and sentences) and establish a graph model;

[0088] S2. Use a voting mechanism to rank the important components of the text, using only the information of the single document itself to extract keywords and produce the summary (a sketch of this voting step is given after this embodiment);

[0089] The process of building the model and determining the weights is as follows:

[0090] S201. Preprocessing: divide the content of the input text or text set into sentences to obtain

[0091] T = [S_1, S_2, ..., S_m];

[0092] S202. Construct the graph G = (V, E), where V is the set of sentences; perform word segmentation on each sentence and remove stop words to obtain

[0093] S_i = [t_{i,1}, t_{i,2}, ..., t_{i,n}];

[0094] where t_{i,j} ∈ S_i are the retained candidate keywords;

[0095] S203. Sentence similarity calculation: construct the edge set E in the...
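
All three embodiments share steps S1–S203 up to the point of truncation. A hedged sketch of the voting and extraction step S2 that they reference is given below; the PageRank-style iteration, damping factor d, convergence threshold, and summary length top_k are assumptions, and build_graph refers to the sketch given after Embodiment 1.

```python
def rank_sentences(weights, d=0.85, eps=1e-4, max_iter=100):
    """S2: PageRank-style voting over the weighted sentence graph.
    The damping factor d, tolerance and iteration cap are assumed parameters."""
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = [
            (1 - d) + d * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_scores, scores)) < eps:
            return new_scores
        scores = new_scores
    return scores


def summarize(text, top_k=3):
    """Select the top_k highest-voted sentences and return them in original order."""
    sentences, weights = build_graph(text)  # graph-construction sketch after Embodiment 1
    scores = rank_sentences(weights)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]
```

Under these assumptions, summarize(document_text, top_k=3) returns the three highest-voted sentences of document_text in their original order.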



Abstract

The invention discloses an extractive unsupervised text summarization method. The method comprises the following steps: S1, dividing a text into a plurality of constituent units (words and sentences) and establishing a graph model; S2, ranking the important components of the text by means of a voting mechanism, and extracting keywords and generating the summary using only the information of a single document, wherein the process of establishing the model and determining the weights comprises the following steps: S201, performing preprocessing; S202, constructing a graph G = (V, E), where V is a sentence set, carrying out word segmentation on the sentences and removing stop words; and S203, calculating sentence similarity by constructing the edge set E in the graph G and, given two sentences, measuring the content coverage rate between them. According to the method, redundancy in the text information can be compressed and storage resources reduced; the effectiveness of the user's reading is improved and the text reading time is shortened; the weights and the weight library can be adjusted according to external data, giving high timeliness; and efficiency is improved while operating costs are reduced.
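
The voting mechanism and weight determination summarized above are consistent with the standard graph-based ranking formula popularized by TextRank; the exact formula is not visible in the truncated text, so the following reconstruction is an assumption:

```latex
% Assumed TextRank-style weight update for a sentence vertex V_i, where d is a
% damping factor and w_{ji} is the similarity weight on the edge (V_j, V_i):
WS(V_i) = (1 - d) + d \cdot \sum_{V_j \in \mathrm{In}(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \cdot WS(V_j)
```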

Description

Technical field

[0001] The invention relates to a text summarization method, in particular to an extractive unsupervised text summarization method.

Background technique

[0002] With the explosive growth of information (especially text information) in recent years, we come into contact with massive amounts of information every day, such as news, papers, Weibo posts, and academic reports. Extracting important, concise content from large volumes of text has become an urgent need, and automatic text summarization provides an efficient solution.

[0003] According to the definition proposed by Radev, an abstract is "a piece of text extracted from one or more texts, which contains the important information of the original text, and whose length is no more than, or much less than, half of the original text". Automatic text summarization aims to automatically produce, by machine, concise and fluent summaries that retain the key information....

Claims


Application Information

Patent type & authority: Application (China)
IPC (8): G06F16/33; G06F16/34
CPC: G06F16/3335; G06F16/3344; G06F16/345
Inventor: 周航成
Owner: 重庆华龙网海数科技有限公司