Automatic extraction method for abstracts based on public company announcements

An automatic abstract-extraction technique for listed-company announcements, applied in natural language data processing and special data processing applications, among other fields. It addresses the problem that graph-ranking methods ignore announcement-specific features, which reduces the accuracy of the node sentence weights and of the resulting abstract.

Active Publication Date: 2016-12-14
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

However, graph-ranking methods compute only the similarity between sentences and ignore the distinctive characteristics of listed-company announcements.
The title of an announcement often carries much of the key information, so a sentence that is highly similar to the title is more likely to belong in the abstract and has a greater influence on the surrounding sentences. Announcements also contain many key terms (restructuring, allotment, repurchase, additional issuance, net profit, year-on-year increase or decrease, risk, etc.), and sentences containing these terms are more likely to belong in the abstract. Finally, because company announcements tend to follow a standardized layout, the position of a sentence within its paragraph is also informative. A method that relies on inter-sentence similarity alone takes none of these factors into account, which reduces the accuracy of the node sentence weights and of the resulting abstract.
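The title and key-term cues described above could be turned into a per-sentence score along the following lines. This is only an illustrative sketch: the Jaccard word overlap, the function names and the weights `w_title` and `w_terms` are assumptions made for the example, not the measure or parameters used in the patent. The key-term list is taken from the examples given in the text.

```python
# Key terms quoted from the text above; a real list would be longer.
KEY_TERMS = ("restructuring", "allotment", "repurchase",
             "additional issuance", "net profit", "risk")

def title_overlap(sentence, title):
    """Jaccard overlap of lowercase word sets (an assumed similarity measure)."""
    s, t = set(sentence.lower().split()), set(title.lower().split())
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def key_term_hits(sentence, terms=KEY_TERMS):
    """Count key terms that occur in the sentence (plain substring match)."""
    text = sentence.lower()
    return sum(1 for term in terms if term in text)

def cue_score(sentence, title, w_title=1.0, w_terms=0.5):
    """Combine both cues; the weights are hypothetical, not from the patent."""
    return w_title * title_overlap(sentence, title) + w_terms * key_term_hits(sentence)

title = "Announcement on share repurchase plan"
on_topic = "The board approved the share repurchase plan"
off_topic = "The meeting was held in Guangzhou"
print(cue_score(on_topic, title), cue_score(off_topic, title))
```

A sentence that shares words with the title and mentions a key term scores higher than one that does neither. The substring match is deliberately crude and would also fire on words that merely contain a key term.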

Examples


Embodiment Construction

[0051] To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying Figure 1.

[0052] A method for automatically extracting abstracts from listed-company announcements, specifically comprising the following steps:

[0053] S1: Crawl listed-company announcement documents from the stock exchange to form an announcement document database, wherein each document serves as a target document to be summarized;

[0054] S2: Use the word2vec model to obtain word vectors from the text corpus;

[0055] The specific steps include:

[0056] (1) Word segmentation;

[0057] Perform word segmentation processing on the announcement document, filter out low-frequency words and remove stop words, special symbols, punctuation marks and some tag information;
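The filtering described in [0057] can be sketched as follows. This is a minimal illustration under stated assumptions: it splits English text on whitespace, whereas Chinese announcements would need a dedicated word segmenter, and the stop-word list `STOP_WORDS` and threshold `MIN_COUNT` are hypothetical values chosen for the example.

```python
from collections import Counter

# Hypothetical stop-word list and frequency threshold, for illustration only.
STOP_WORDS = {"the", "of", "a", "and", "to", "in"}
MIN_COUNT = 2

def tokenize(text):
    """Lowercase the text, replace punctuation and symbols with spaces, split."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in text.lower())
    return cleaned.split()

def preprocess(documents, min_count=MIN_COUNT, stop_words=STOP_WORDS):
    """Return one token list per document, without stop words or rare words."""
    tokenized = [tokenize(doc) for doc in documents]
    # Word frequencies are counted over the whole corpus, not per document.
    counts = Counter(tok for doc in tokenized for tok in doc)
    return [
        [tok for tok in doc if tok not in stop_words and counts[tok] >= min_count]
        for doc in tokenized
    ]

docs = [
    "The company announces a share repurchase.",
    "Net profit of the company increased; repurchase approved!",
]
print(preprocess(docs))
```

Only "company" and "repurchase" appear at least twice across both documents and are not stop words, so they are the only tokens that survive.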

[0058] (2) Build a Huffman tree;

[0059] In the constructed Huffman tree, all non-leaf nodes store a parameter vector, a...
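As an illustration of step (2), the sketch below builds a Huffman tree from word frequencies and reads off each word's binary code, so that frequent words get short codes. It covers only tree construction; the parameter vectors stored at non-leaf nodes and their training are not shown. The function name `build_huffman_codes` and the example frequencies are assumptions made for the example.

```python
import heapq
from itertools import count

def build_huffman_codes(freqs):
    """Map each word in a {word: frequency} dict to its Huffman code string."""
    tiebreak = count()  # unique counter so the heap never compares node payloads
    # Heap entries are (frequency, tiebreak, node). A leaf node is a word (str);
    # an internal node is a (left, right) tuple.
    heap = [(f, next(tiebreak), w) for w, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent nodes into a new internal node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    # Walk the tree: append "0" for a left branch and "1" for a right branch.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes

# Hypothetical word frequencies, for illustration only.
freqs = {"profit": 45, "risk": 13, "repurchase": 12,
         "allotment": 16, "issuance": 9, "restructuring": 5}
codes = build_huffman_codes(freqs)
for word in sorted(codes, key=lambda w: len(codes[w])):
    print(word, codes[word])
```

A tree over n words has n - 1 non-leaf nodes, which is where the parameter vectors mentioned above would live.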


Abstract

The invention relates to a method for automatically extracting abstracts from listed-company announcements. The method comprises the following steps: S1, obtaining listed-company announcement documents from stock exchanges to form an announcement document database; S2, using a word2vec model to obtain word vectors from a text corpus; S3, calculating the similarity between sentences to construct a sentence graph model; S4, calculating the weights of the sentences; S5, adjusting the sentence weight matrix according to sentence positions; S6, selecting the sentences with the highest weights and without redundancy to form an abstract. The method can provide financial-market investors with accurate and more readable abstracts, help them understand announcements in less time and make better investment judgments, and provide important indicators for quantitative fund companies.
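Steps S3 to S6 can be sketched roughly as below. This is not the patented procedure: it assumes each sentence already has a vector (for example, an average of its word vectors), uses a TextRank-style PageRank iteration as a stand-in for the weight calculation, and applies a hypothetical position rule that boosts only the first sentence. The names, the damping factor, `first_boost` and the redundancy threshold `max_sim` are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity, clamped at 0 so edge weights are never negative."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / (nu * nv)) if nu and nv else 0.0

def similarity_matrix(vectors):
    """S3: weighted sentence graph as a matrix, with no self-loops."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]

def rank_sentences(sim, damping=0.85, iterations=50):
    """S4: TextRank-style scores (an assumed stand-in for the weight step)."""
    n = len(sim)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores

def adjust_for_position(scores, first_boost=1.2):
    """S5: hypothetical rule that boosts the first sentence only."""
    return [s * first_boost if i == 0 else s for i, s in enumerate(scores)]

def select_summary(scores, sim, k=2, max_sim=0.9):
    """S6: greedily take top-scoring sentences, skipping near-duplicates."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if all(sim[i][j] < max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)  # restore document order

# Toy sentence vectors; sentences 0 and 1 are near-duplicates.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
sim = similarity_matrix(vectors)
scores = adjust_for_position(rank_sentences(sim))
print(select_summary(scores, sim))
```

Selecting greedily against a similarity threshold is one simple way to honor the "free of redundancy" requirement: once a sentence is chosen, any later candidate that is too similar to it is skipped.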

Description

Technical field

[0001] The invention relates to the field of data extraction, in particular to a method for automatically extracting abstracts from listed-company announcements.

Background art

[0002] As of mid-June 2016, there were a total of 2,832 stocks in the Shanghai and Shenzhen stock markets, and hundreds to thousands of announcements were issued every day. With the rapid development of the Internet, the cost of editing keeps falling, information spreads ever faster, and the number of daily announcements is growing rapidly. At present, the announcements of listed companies are generally lengthy and use specialized terminology. However, most investors in China are retail investors and do not have enough time to read the announcements carefully. Moreover, it is difficult for ordinary investors to quickly identify the important content and make reasonable judgments. Therefore, it is very important and valuable to ...


Application Information

IPC(8): G06F17/27
CPC: G06F40/258
Inventor: 郑子彬, 李阳
Owner: SUN YAT SEN UNIV