Document information extraction method and system based on text classification and reading comprehension

A technology based on reading comprehension and text classification, applied in the field of information content processing. It addresses the problems of difficult model training, long training and prediction times, and low extraction accuracy, with the effects of improving prediction accuracy, resolving entity nesting, and offering strong versatility.

Active Publication Date: 2022-04-08
杭州实在智能科技有限公司

AI Technical Summary

Problems solved by technology

[0016] The purpose of the present invention is to overcome the problems of existing document information extraction methods in the prior art, namely difficult model training, increased time consumption, and low extraction accuracy, by providing a document information extraction method and system based on text classification and reading comprehension that can greatly shorten training and prediction time and improve the accuracy and speed of field extraction by the document extraction model.



Examples


Embodiment 1

[0075] As shown in Figure 2, the present invention provides a document information extraction method based on text classification and reading comprehension, including the following steps:

[0076] S1, inputting a document, parsing and identifying the document, and converting the document into a plain text format;

[0077] S2, preprocessing the text content in the document to obtain input data;

[0078] S3, generating corresponding character vectors, word vectors and context vectors according to the input data in step S2, and splicing (concatenating) the character vectors, word vectors and context vectors to obtain spliced vectors;

[0079] S4, if the spliced vector is classified as an answerable type, using the entity text question corresponding to the spliced vector as the input of the next step;

[0080] S5, using the reading comprehension model to compute the position of the best-matching long label data corresponding to the entity text question;

[0081] S6. Obtain the long t...
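
Steps S3 to S5 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the 0.5 threshold, the additive start/end scoring, and the `max_len` cap are all assumptions chosen for illustration, and the first vector type in S3 is assumed to be character-level. Because the text of S6 is truncated in the source, the final step here simply follows the abstract's wording and returns the located long entity field.

```python
from typing import List, Optional, Sequence, Tuple


def splice(char_vec: Sequence[float], word_vec: Sequence[float],
           ctx_vec: Sequence[float]) -> List[float]:
    """S3 (sketch): concatenate the three per-token vectors into one spliced vector."""
    return [*char_vec, *word_vec, *ctx_vec]


def best_span(start_scores: Sequence[float], end_scores: Sequence[float],
              max_len: int) -> Tuple[int, int]:
    """S5 (sketch): pick (start, end) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_len. The scoring rule is an assumption."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def extract(tokens: Sequence[str], answerable_prob: float,
            start_scores: Sequence[float], end_scores: Sequence[float],
            threshold: float = 0.5, max_len: int = 50,
            sep: str = " ") -> Optional[str]:
    """S4 to S6 (sketch): skip unanswerable inputs, otherwise return the span text."""
    if answerable_prob < threshold:      # S4: classification gate (hypothetical threshold)
        return None
    s, e = best_span(start_scores, end_scores, max_len)   # S5: locate the span
    return sep.join(tokens[s:e + 1])     # final output: the long entity field


tokens = ["Party", "A", "shall", "pay", "within", "30", "days"]
start = [0.1, 0.0, 0.2, 2.0, 0.3, 0.1, 0.0]
end = [0.0, 0.1, 0.0, 0.2, 0.1, 0.3, 2.5]
print(extract(tokens, 0.9, start, end))  # span covering tokens 3..6
print(extract(tokens, 0.2, start, end))  # None: gated out before span search
```

The classification gate runs before the span search, so unanswerable inputs never reach the more expensive reading comprehension step; this matches the ordering of S4 and S5 above. For Chinese text, `sep=""` would be the natural choice when joining tokens.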


Abstract

The invention belongs to the technical field of information content processing, and particularly relates to a document information extraction method and system based on text classification and reading comprehension. The method comprises the following steps: S1, converting a document into a plain text format; S2, preprocessing the document to obtain input data; S3, generating corresponding character vectors, word vectors and context vectors, and splicing the character vectors, the word vectors and the context vectors to obtain spliced vectors; S4, if the spliced vector is an answerable type, taking the spliced vector as the input of the next step; S5, obtaining the position of the best-matching long label data; and S6, finally outputting the long entity field to be extracted. The system comprises a text information intelligent extraction module, a data preprocessing module, a feature extraction module, a text classification module, a reading comprehension module, a long entity label data generation module and a data post-processing module. The method can greatly shorten training and prediction time and improves the field extraction accuracy and speed of the document extraction model.
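
As a rough illustration of how the listed modules might be chained, the sketch below wires one callable per module into a single pipeline. The `Pipeline` class, its field names, and every function signature are hypothetical; the source names the modules but does not describe their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Features = List[List[float]]


@dataclass
class Pipeline:
    """Hypothetical wiring of the modules named in the abstract."""
    parse: Callable[[bytes], str]                    # text information intelligent extraction
    preprocess: Callable[[str], List[str]]           # data preprocessing
    featurize: Callable[[List[str]], Features]       # feature extraction
    is_answerable: Callable[[Features], bool]        # text classification
    locate: Callable[[Features], Tuple[int, int]]    # reading comprehension
    postprocess: Callable[[str], str]                # data post-processing

    def run(self, document: bytes) -> Optional[str]:
        text = self.parse(document)
        tokens = self.preprocess(text)
        features = self.featurize(tokens)
        if not self.is_answerable(features):
            return None                              # nothing to extract
        start, end = self.locate(features)
        field = " ".join(tokens[start:end + 1])      # long entity label data generation
        return self.postprocess(field)


# Toy stand-ins for the real models, only to show the control flow.
pipe = Pipeline(
    parse=lambda raw: raw.decode("utf-8"),
    preprocess=str.split,
    featurize=lambda toks: [[float(len(t))] for t in toks],
    is_answerable=lambda feats: len(feats) > 2,
    locate=lambda feats: (1, 2),
    postprocess=str.strip,
)
print(pipe.run(b"Term: twelve months only"))
```

Keeping each module behind a plain callable makes it possible to swap in a trained classifier or reading comprehension model without changing the control flow.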

Description

Technical field

[0001] The invention belongs to the technical field of information content processing, and in particular relates to a document information extraction method and system based on text classification and reading comprehension.

Background art

[0002] In today's highly informatized offices, corporate employees spend nearly one-third of their working day dealing with text. For example, legal personnel have to review a large number of contracts and draft agreements, and accounting personnel have to review a large number of reports. This kind of work is highly repetitive and heavy in workload; manual processing is inefficient, and mistakes can easily cause large, irreparable losses. In recent years, with the application and development of machine learning and deep learning in the field of natural language processing, intelligent document review systems have entered a stage of rapid development.

[0003] The intell...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/151, G06F40/279, G06F40/166, G06F16/35, G06N3/04, G06N3/08
Inventors: 闫凯峰, 孙林君
Owner: 杭州实在智能科技有限公司