Extractive unsupervised text summarization method

An unsupervised, extractive technique applied to text summarization. It addresses the inaccuracy, reduced efficiency, and path dependence of automatic text summarization, and it shortens reading time, improves efficiency, and compresses redundancy.

Status: Inactive | Publication Date: 2019-07-12

重庆华龙网海数科技有限公司

4 Cites, 4 Cited by


AI Technical Summary

Problems solved by technology

[0006] Current automatic text summarization methods, especially extractive ones, have certain defects. For example, judging which sentences of the original text are important is subject to path dependence. Over long-term operation, some misjudgments occur and cannot be corrected in time, which makes the automatic summaries inaccurate, while manual intervention reduces efficiency and increases cost.



Examples


Embodiment 1

[0039] The present invention provides an extractive unsupervised text summarization method with the following steps:

[0040] S1. Divide the text into constituent units (words, sentences) and build a graph model;

[0041] S2. Rank the important components of the text with a voting mechanism, using only the information in the single document itself to extract keywords and produce the summary;

[0042] The model is built and the weights are determined as follows:

[0043] S201. Preprocessing: split the input text or text set into sentences to obtain

[0044] $T = [S_1, S_2, \ldots, S_m]$;

[0045] S202. Construct the graph $G = (V, E)$, where $V$ is the set of sentences; segment each sentence into words and remove stop words to obtain

[0046] $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$;

[0047] where $t_{i,j} \in S_i$ are the retained candidate keywords;

[0048] S203. Sentence similarity calculation: construct the edge set E in the...
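Steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the similarity formula is cut off in the text above, so the sketch assumes the TextRank-style word overlap normalized by log sentence lengths. The stop-word list, the English-only sentence splitter, and the function names (`split_sentences`, `tokenize`, `similarity`, `build_graph`) are all hypothetical.

```python
import math
import re

# Hypothetical stop-word list for illustration; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}


def split_sentences(text):
    """S201: split the text into sentences T = [S_1, ..., S_m]."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """S202: segment a sentence into words and drop stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def similarity(words_a, words_b):
    """S203 (assumed form): shared-word count over summed log lengths."""
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom == 0:  # both sentences have exactly one word
        return 0.0
    shared = len(set(words_a) & set(words_b))
    return shared / denom


def build_graph(text):
    """Return the sentences, their tokens, and a symmetric edge-weight matrix."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    m = len(sentences)
    weights = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    return sentences, tokens, weights
```

Here each sentence is a vertex in V and each nonzero entry of `weights` is an edge in E. Chinese text would need a word segmenter and a different sentence splitter, which this sketch leaves out.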

Embodiment 2

[0062] The present invention provides an extractive unsupervised text summarization method with the following steps:

[0063] S1. Divide the text into constituent units (words, sentences) and build a graph model;

[0064] S2. Rank the important components of the text with a voting mechanism, using only the information in the single document itself to extract keywords and produce the summary;

[0065] The model is built and the weights are determined as follows:

[0066] S201. Preprocessing: split the input text or text set into sentences to obtain

[0067] $T = [S_1, S_2, \ldots, S_m]$;

[0068] S202. Construct the graph $G = (V, E)$, where $V$ is the set of sentences; segment each sentence into words and remove stop words to obtain

[0069] $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$;

[0070] where $t_{i,j} \in S_i$ are the retained candidate keywords;

[0071] S203. Sentence similarity calculation: construct the edge set E in the g...

Embodiment 3

[0086] The present invention provides an extractive unsupervised text summarization method with the following steps:

[0087] S1. Divide the text into constituent units (words, sentences) and build a graph model;

[0088] S2. Rank the important components of the text with a voting mechanism, using only the information in the single document itself to extract keywords and produce the summary;

[0089] The model is built and the weights are determined as follows:

[0090] S201. Preprocessing: split the input text or text set into sentences to obtain

[0091] $T = [S_1, S_2, \ldots, S_m]$;

[0092] S202. Construct the graph $G = (V, E)$, where $V$ is the set of sentences; segment each sentence into words and remove stop words to obtain

[0093] $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$;

[0094] where $t_{i,j} \in S_i$ are the retained candidate keywords;

[0095] S203. Sentence similarity calculation: construct the edge set E in the...


Abstract

The invention discloses an extractive unsupervised text summarization method. The method comprises the following steps: S1, dividing a text into constituent units (words and sentences) and establishing a graph model; S2, ranking the important components of the text using a voting mechanism, and extracting keywords and producing the summary using only the information in a single document. The process of establishing the model and determining the weights comprises: S201, preprocessing; S202, constructing a graph G = (V, E), where V is the sentence set, segmenting the sentences into words, and removing stop words; and S203, calculating sentence similarity: constructing the edge set E of graph G, where the similarity of two given sentences is based on the content coverage between them. The method compresses redundant text information and reduces storage requirements; it lets users read the information more effectively and shortens reading time; the weights and the weight library can be adjusted according to external data, which improves timeliness; and it improves efficiency and reduces operating cost.
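The voting-based ranking in step S2 can be illustrated with a short Python sketch. It assumes a TextRank-style weighted PageRank iteration over the sentence-similarity matrix, which the visible text does not confirm. The damping factor of 0.85, the fixed iteration count, and the function names `rank_by_voting` and `top_sentences` are hypothetical choices for illustration.

```python
def rank_by_voting(weights, damping=0.85, iterations=50):
    """Score each sentence by the weighted votes it receives from its neighbors.

    weights[j][i] is the similarity between sentences j and i (0 means no edge).
    """
    m = len(weights)
    scores = [1.0] * m
    out_sums = [sum(row) for row in weights]  # total edge weight leaving each node
    for _ in range(iterations):
        new_scores = []
        for i in range(m):
            vote = 0.0
            for j in range(m):
                if j != i and out_sums[j] > 0:
                    # Sentence j passes on a share of its score proportional
                    # to the weight of its edge to sentence i.
                    vote += weights[j][i] / out_sums[j] * scores[j]
            new_scores.append((1 - damping) + damping * vote)
        scores = new_scores
    return scores


def top_sentences(scores, k):
    """Return the indices of the k highest-scoring sentences in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)
```

A summary would then be formed by joining the sentences at the returned indices. Returning them in document order keeps the extract readable.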

Description

Technical field

[0001] The invention relates to a text summarization method, in particular to an extractive unsupervised text summarization method.

Background

[0002] With the explosive growth of information (especially text) in recent years, we encounter massive amounts of it every day: news, papers, Weibo posts, academic reports, and so on. Extracting the important content from large volumes of text in condensed form has become an urgent need, and automatic text summarization provides an efficient solution.

[0003] According to the definition proposed by Radev, a summary is "a piece of text extracted from one or more texts, which contains important information in the original text, and whose length is no more than half of the original text and usually much less". Automatic text summarization aims to have machines automatically produce concise, fluent summaries that retain the key information. ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/33, G06F16/34

CPC: G06F16/3335, G06F16/3344, G06F16/345

Inventor: 周航成

Owner: 重庆华龙网海数科技有限公司
