Unsupervised learning-based text automatic abstract method, system and device, and medium

An unsupervised learning and automatic summarization technology in the field of text summarization. It addresses the high cost of acquiring training data, reducing that cost while ensuring the accuracy and readability of the generated summaries.

Active Publication Date: 2019-06-28
SOUTH CHINA UNIV OF TECH
Cites: 8 | Cited by: 11

AI Technical Summary

Problems solved by technology

[0010] Most existing automatic text summarization methods are trained with supervised learning, which requires a large amount of manually labeled data. Acquiring such data is usually costly.

Method used

Examples

Embodiment 1

[0072] This embodiment provides a method for automatic text summarization based on unsupervised learning, implemented with three networks: a generation network, a classification discrimination network, and an authenticity discrimination network. They are described as follows:

[0073] 1) The input of the generation network is the original text (long text) to be processed, and its output is a shorter text. Once the generation network is sufficiently trained, its output can be regarded as a summary (short text) of the original text. During testing, the generation network is the only network used; its structure is shown in Figure 1.

[0074] 2) The classification discrimination network is the first discrimination network; its inputs are the original text (long text) and the abstract (short text) output by the generation network...
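To make the two-discriminator setup concrete, here is a minimal sketch of how their outputs might be combined into a single generator objective. This is an assumption, not the patent's stated loss. It adds a non-saturating GAN term from the authenticity discriminator to a cross-entropy-style term from the classification discriminator. The classification discriminator is assumed here to score whether the generated abstract is consistent with the original text, for example by falling in the same category. The function name `generator_loss` and the weight `lam` are hypothetical.

```python
import math


def generator_loss(p_real: float, p_consistent: float, lam: float = 1.0) -> float:
    """Hypothetical combined objective for the generation network.

    p_real: authenticity discriminator's probability that the generated
        abstract is a real (human-written) abstract.
    p_consistent: classification discriminator's probability that the
        generated abstract is consistent with its original text (assumed
        meaning; the excerpt does not define this output).
    lam: hypothetical weight balancing the two terms.
    """
    eps = 1e-12  # guard against log(0)
    adversarial = -math.log(max(p_real, eps))     # non-saturating GAN term
    content = -math.log(max(p_consistent, eps))   # consistency (cross-entropy) term
    return adversarial + lam * content


# A generated abstract judged "half real" but fully consistent costs about 0.693.
print(generator_loss(0.5, 1.0))
```

Abstracts are discrete token sequences, so a real system would also need a way to pass this signal back to the generator, such as a policy-gradient method or a Gumbel-softmax relaxation. The excerpt does not say which approach the patent uses.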

Embodiment 2

[0117] As shown in Figure 9, this embodiment provides an automatic text summarization system based on unsupervised learning. The system includes an acquisition module 901, a building module 902, a first pre-training module 903, a second pre-training module 904, a third pre-training module 905, an adversarial training module 906, and a text summarization module 907. The functions of each module are as follows:

[0118] The acquisition module 901 obtains a training set, randomly shuffles the original texts and abstracts in the training set to obtain an original text set and an abstract set, and obtains a text classification data set;
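The shuffling step can be sketched as follows, assuming the training set is a list of `(original, abstract)` string pairs. Each side is shuffled independently, so the i-th original text is generally no longer aligned with its own abstract. This is what lets the later stages avoid relying on paired data. The function name `build_unpaired_sets` and the seeded generator are illustrative assumptions. The text classification data set is not built here because the excerpt does not describe its contents.

```python
import random
from typing import List, Tuple


def build_unpaired_sets(
    pairs: List[Tuple[str, str]], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split (original, abstract) pairs into two independently shuffled lists."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    originals = [orig for orig, _ in pairs]
    abstracts = [abst for _, abst in pairs]
    # Shuffling each list separately breaks the one-to-one pairing.
    rng.shuffle(originals)
    rng.shuffle(abstracts)
    return originals, abstracts


pairs = [(f"long text {i}", f"short text {i}") for i in range(5)]
original_set, abstract_set = build_unpaired_sets(pairs, seed=1)
```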

[0119] The building module 902 builds the generation network, the classification discrimination network, and the authenticity discrimination network.

[0120] The first pre-training module 903 pre-trains the generation network using the original text set.

[0121] The second pre-training module 90...
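The module list implies a fixed training order. The sketch below records that order with placeholder stage names instead of performing real training. Module 903's role is stated in the text. The roles given for 904 (classification discriminator) and 905 (authenticity discriminator) are inferred from the order of steps in the Abstract and are assumptions. The function name `training_schedule` and the `adversarial_rounds` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def training_schedule(
    originals: Sequence[str],
    abstracts: Sequence[str],
    classification_data: Sequence[Tuple[str, int]],
    adversarial_rounds: int = 2,
) -> List[str]:
    """Return the ordered training stages (placeholders; no real training)."""
    schedule = []
    # Module 903: pre-train the generation network on the original text set only.
    schedule.append(f"pretrain_generator ({len(originals)} original texts)")
    # Module 904 (assumed role): pre-train the classification discriminator.
    schedule.append(f"pretrain_classifier ({len(classification_data)} labeled examples)")
    # Module 905 (assumed role): pre-train the authenticity discriminator on
    # real abstracts versus text produced by the pre-trained generator.
    schedule.append(f"pretrain_authenticity ({len(abstracts)} real abstracts)")
    # Module 906: adversarial training of all three networks.
    for r in range(1, adversarial_rounds + 1):
        schedule.append(f"adversarial_round {r}")
    # Module 907: at inference time only the generation network is used.
    schedule.append("summarize_with_generator")
    return schedule


for stage in training_schedule(["a", "b"], ["x"], [("t", 0)]):
    print(stage)
```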

Embodiment 3

[0128] This embodiment provides a computer device, which may be a server, a computer, or similar. As shown in Figure 10, it includes a processor 1002, a memory, an input device 1003, a display 1004, and a network interface 1005, all connected through a system bus 1001. The processor provides computation and control capabilities. The memory includes a non-volatile storage medium 1006 and an internal memory 1007. The non-volatile storage medium 1006 stores an operating system, computer programs, and databases, and the internal memory 1007 provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. When the processor 1002 executes the computer program stored in the memory, it implements the automatic text summarization method of Embodiment 1, as follows:

[0129] Obtain the training set, randomly shuffle the original texts and abstracts in the training set, and obtain the original text set and the abstrac...

Abstract

The invention discloses an automatic text summarization method, system, device, and medium based on unsupervised learning. The method comprises: obtaining a training set, randomly shuffling the original texts and abstracts in the training set to obtain an original text set and an abstract set, and obtaining a text classification data set; building a generation network, a classification discrimination network, and an authenticity discrimination network; pre-training the generation network with the original text set; pre-training the classification discrimination network with the text classification data set; pre-training the authenticity discrimination network with the abstract set and the text output by the pre-trained generation network; performing adversarial training on the generation network, the classification discrimination network, and the authenticity discrimination network; and inputting the original text to be processed into the adversarially trained generation network, which outputs the abstract of the original text. Because training does not require manually labeled original-text/abstract pairs, the cost of data acquisition is greatly reduced.
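The authenticity discriminator's pre-training data, as described above, mixes real abstracts with text produced by the pre-trained generator. A minimal sketch of assembling that labeled set follows, assuming label 1 means "real abstract" and label 0 means "generated". The names `authenticity_pretraining_data` and `truncate_generator` are hypothetical. The truncating generator is only a stand-in for the pre-trained generation network.

```python
from typing import Callable, List, Sequence, Tuple


def authenticity_pretraining_data(
    abstracts: Sequence[str],
    originals: Sequence[str],
    generate: Callable[[str], str],
) -> List[Tuple[str, int]]:
    """Label human-written abstracts 1 (real) and generator outputs 0 (generated)."""
    real = [(a, 1) for a in abstracts]
    generated = [(generate(o), 0) for o in originals]
    return real + generated


def truncate_generator(text: str, n_words: int = 3) -> str:
    """Hypothetical stand-in for the pre-trained generator: keep the first words."""
    return " ".join(text.split()[:n_words])


data = authenticity_pretraining_data(
    ["a short abstract"], ["one two three four five"], truncate_generator
)
```

The abstract set and the original text set do not need to be aligned here, which is consistent with the shuffling step.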

Description

Technical field

[0001] The invention relates to an automatic text summarization method, system, device, and medium based on unsupervised learning, and belongs to the field of text summarization.

Background

[0002] Automatic text summarization uses a computer to automatically generate a summary of the input text; the summary must contain the main information of the original document. The core idea is to find a subset of the data that contains the "main information" of the entire source text, which makes it one of the applications of machine learning and data mining. With the rapid growth of big data, demand for automatic text summarization keeps increasing.

[0003] With advances in deep learning and natural language processing, text summarization technology is becoming increasingly mature. It is widely used in industry today, for example in news headline generation, sc...
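The background's description of summarization as "finding a subset that contains the main information" can be illustrated with a classic frequency-based extractive baseline. This is a swapped-in illustration, not the invention's method, which generates abstracts with an adversarially trained network. The function name `extractive_summary`, the sentence-splitting regex, and the averaging of word frequencies are all assumptions of this sketch.

```python
import re
from collections import Counter
from typing import List


def _words(sentence: str) -> List[str]:
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(text: str, k: int = 1) -> List[str]:
    """Select the k sentences whose words are most frequent across the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in _words(s))

    def score(sentence: str) -> float:
        toks = _words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original sentence order


print(extractive_summary("Cats sleep a lot. Cats eat fish. Dogs bark."))
```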

Claims

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34, G06F16/35
Inventors: 庄浩杰 (Zhuang Haojie), 王聪 (Wang Cong), 孙庆华 (Sun Qinghua)
Owner SOUTH CHINA UNIV OF TECH