Machine learning-based Chinese automatic summarization method

An automatic summarization and machine learning technology, applied in instruments, special data processing applications, electrical digital data processing and similar fields. It addresses the problems of scarce valuable information, an under-used update-time advantage, and search that is unsuitable for precise queries by industry.

Inactive Publication Date: 2016-11-16
BEIJING DINGTAI ZHIYUAN TECH CO LTD
Cites: 4; Cited by: 26

AI Technical Summary

Problems solved by technology

[0008] Baidu search results are not specific: the information is cluttered, human-computer interaction is poor, the results are not professional, and there is a great deal of noisy data. Baidu does not support vertical search in professional fields, so it suits rough search but not precise search by industry.
Secondly, the advantage of its fast update time is not fully exploited, and the commercial bias is too strong. The first page of results for a keyword is usually occupied by companies with high rankings and high bids, so the natural search results actually needed are hard to find.
In addition, Baidu's search ranking technology is not authoritative enough: the results contain many advertisements, spam websites and dead links, and truly valuable information is scarce.
The search results are poorly summarized by hand, and refined, manual-quality summaries cannot be computed intelligently.
[0009] Alibaba attaches great importance to business models and has advantages in artificial intelligence product recommendation algorithms and product classification. However, what Alibaba currently provides is information exchange, and the information is static ("dead"). Although it is updated quickly, plentiful and truthful, it does not reach a company automatically; the company has to look for it. Because the Alibaba website holds so much information, useful items are hard to find at once, and by the time a company has found them through hard work, they are already outdated.
[0010] Although Tencent has social data based on WeChat, its main source of profit is games, and much of its effort goes into game development and business models. It does not make good use of its social data, and its investment is small, so it has no breakthrough products in artificial intelligence and natural language processing.
[0011] Due to the characteristics of the Chinese language and the complexity of Chinese processing, the quality of automatic text summarization is still unsatisfactory. Domestic research on summarization is still in its infancy, most research results exist only in the laboratory, and few mature commercial products have emerged.

Embodiment Construction

[0034] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0035] Natural language processing is a very important research direction in artificial intelligence, and within natural language processing the most attractive field is intelligent summarization technology.

[0036] Specifically, automatic text summarization (automatic summarization / abstraction) is a technology that uses computers to automatically implement text analysis, content induction and abstract generation. This technology has a very important use in today's rapid development of Internet te...

Abstract

The invention provides a machine learning-based Chinese automatic summarization method comprising the following steps: inputting a text and preprocessing it; dividing the preprocessed text by structure into a plurality of semantic paragraphs representing different themes, and calculating the importance of each semantic paragraph and of each paragraph; performing concept acquisition on the preprocessed text, converting all word expressions in the text into concept expressions, and calculating the importance, frequency and position of each concept; calculating the importance of each sentence from the structure information obtained by text division, the concept frequency, the concept position, the paragraph importance and the semantic paragraph importance; extracting from all semantic paragraphs the sentences whose importance is greater than a preset value; and arranging the extracted sentences in their original order and outputting them as the summarization result. The method can automatically generate a summary of a Chinese text.
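The sentence-scoring and extraction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented formula: the `Sentence` record, the `summarize` function, the weights `w_freq`, `w_pos`, `w_block`, the position bonus `1 / (1 + position)` and the default threshold are all hypothetical, and the source does not give the actual scoring function. Word segmentation and semantic paragraph division are assumed to have been done already, and paragraph importance is folded into the semantic paragraph term for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    tokens: list[str]   # pre-segmented words (segmentation assumed done in preprocessing)
    semantic_idx: int   # index of the semantic paragraph (theme block) it belongs to
    position: int       # 0-based position inside that semantic paragraph


def summarize(sentences, concept_map, threshold=0.6,
              w_freq=0.6, w_pos=0.2, w_block=0.2):
    """Return the texts of sentences scoring above `threshold`, in original order.

    All weights and the threshold are illustrative assumptions.
    """
    # Concept acquisition: map each word to a concept (unknown words map to themselves).
    concepts = [[concept_map.get(t, t) for t in s.tokens] for s in sentences]
    freq = Counter(c for cs in concepts for c in cs)
    if not freq:
        return []
    max_freq = max(freq.values())

    # Assumed proxy for semantic paragraph importance: total concept frequency it carries.
    block_mass = Counter()
    for s, cs in zip(sentences, concepts):
        block_mass[s.semantic_idx] += sum(freq[c] for c in cs)
    max_mass = max(block_mass.values())

    selected = []
    for s, cs in zip(sentences, concepts):
        if not cs:
            continue
        freq_score = sum(freq[c] for c in cs) / (len(cs) * max_freq)
        pos_score = 1.0 / (1 + s.position)          # earlier sentences score higher
        block_score = block_mass[s.semantic_idx] / max_mass
        score = w_freq * freq_score + w_pos * pos_score + w_block * block_score
        if score > threshold:
            selected.append(s.text)                 # input order is preserved
    return selected


# Hypothetical example: "电脑" and "计算机" are mapped to the same concept.
sentences = [
    Sentence("计算机自动生成摘要。", ["计算机", "摘要"], 0, 0),
    Sentence("电脑摘要技术发展很快。", ["电脑", "摘要", "技术"], 0, 1),
    Sentence("今天天气很好。", ["天气"], 1, 0),
]
print(summarize(sentences, {"电脑": "计算机"}))
# → ['计算机自动生成摘要。', '电脑摘要技术发展很快。']
```

Mapping words to concepts before counting lets synonyms reinforce each other, which is why the second sentence scores well even though it uses a different word for "computer". The off-topic third sentence falls below the threshold because its semantic paragraph carries little concept mass.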

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a machine learning-based Chinese automatic summarization method.

Background art

[0002] In China, Wang Bing published an article titled "Overview of American Machine-Edited Abstracts" in the Journal of Information Science in 1985, introducing foreign research on automatic abstracting. Afterwards, domestic scholars began to explore automatic summarization and to study Chinese automatic summarization systems. After nearly 30 years of development, China has made certain achievements in research on Chinese automatic summarization.

[0003] Under the leadership of Professor Wang Yongcheng, Shanghai Jiaotong University has developed an automatic summarization system for Chinese documents based on the location method and keyword method, integrating multiple abstraction methods (such as location method, demonstrative phrase syntax...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/345
Inventors: 高强, 曹志松, 顾海英
Owner: BEIJING DINGTAI ZHIYUAN TECH CO LTD