Document information extraction method and system based on text classification and reading comprehension

A text classification and reading comprehension technology, applied in the field of information content processing, addresses the problems of difficult model training, long training and prediction time, and low extraction accuracy. Its stated effects are shorter training and prediction time, improved prediction accuracy, handling of nested entities, and strong versatility.

Active Publication Date: 2022-07-19
杭州实在智能科技有限公司

AI Technical Summary

Problems solved by technology

[0016] The purpose of the present invention is to overcome the problems of difficult model training, increased time consumption and low extraction accuracy in existing document information extraction methods, and to provide a document information extraction method and system based on text classification and reading comprehension that can greatly shorten training and prediction time and improve the accuracy and speed of the document extraction model when extracting fields.



Examples


Embodiment 1

[0075] As shown in Figure 2, the present invention provides a document information extraction method based on text classification and reading comprehension, comprising the following steps:

[0076] S1, input a document, parse and identify the document, and convert the document into a plain text format;

[0077] S2, preprocess the text content in the document to obtain input data;

[0078] S3, according to the input data from step S2, generate the corresponding character vector, word vector and context vector, and splice (concatenate) the three vectors to obtain a spliced vector;

[0079] S4, if the spliced vector is classified as an answerable type, the entity text question corresponding to the spliced vector is used as the input of the next step;

[0080] S5, using the reading comprehension model, compute the position of the long label data that best matches the entity text question;

[0081] S6, obtain long label data according to the posi...
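Steps S3 to S6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the linear classifier in `is_answerable`, the start-plus-end span scoring in `best_span` (a common convention in extractive reading comprehension), and all function names, weights and scores are hypothetical stand-ins for the trained models, which the text does not specify.

```python
from typing import List, Optional, Sequence, Tuple


def splice(vectors: Sequence[Sequence[float]]) -> List[float]:
    """S3 (sketch): concatenate the feature vectors into one spliced vector."""
    spliced: List[float] = []
    for vec in vectors:
        spliced.extend(vec)
    return spliced


def is_answerable(spliced: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0) -> bool:
    """S4 (sketch): hypothetical linear classifier standing in for the
    text classification model; a positive score means 'answerable'."""
    score = sum(w * x for w, x in zip(weights, spliced)) + bias
    return score > 0.0


def best_span(start_scores: Sequence[float], end_scores: Sequence[float],
              max_len: int) -> Tuple[int, int]:
    """S5 (sketch): choose the (start, end) token positions with the highest
    start + end score, with start <= end and at most max_len tokens."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def extract_field(tokens: Sequence[str], spliced: Sequence[float],
                  weights: Sequence[float], start_scores: Sequence[float],
                  end_scores: Sequence[float], max_len: int = 50) -> Optional[str]:
    """S4-S6 (sketch): skip unanswerable inputs, otherwise return the span text."""
    if not is_answerable(spliced, weights):
        return None
    s, e = best_span(start_scores, end_scores, max_len)
    return " ".join(tokens[s:e + 1])


# Toy inputs; every number here is made up for illustration.
tokens = ["The", "lease", "term", "is", "three", "years", "from", "signing"]
start_scores = [0.1, 0.0, 0.2, 0.1, 2.5, 0.3, 0.2, 0.1]
end_scores = [0.0, 0.1, 0.1, 0.2, 0.3, 1.0, 0.4, 2.0]
spliced = splice([[0.2], [0.5, -0.1], [0.3]])
print(extract_field(tokens, spliced, [1.0, 1.0, 1.0, 1.0], start_scores, end_scores))
```

Running the classification gate before span selection means the more expensive reading comprehension step is only invoked for inputs judged answerable, which is one plausible reading of how the method shortens prediction time.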



Abstract

The invention belongs to the technical field of information content processing, and in particular relates to a document information extraction method and system based on text classification and reading comprehension. The method includes: S1, converting the document into a plain text format; S2, preprocessing the document to obtain input data; S3, generating the corresponding character vector, word vector and context vector, and splicing them to obtain a spliced vector; S4, if the spliced vector is an answerable type, using it as the input of the next step; S5, obtaining the position of the best-matching long label data; S6, outputting the long entity field to be extracted. The system includes a text information intelligent extraction module, a data preprocessing module, a feature extraction module, a text classification module, a reading comprehension module, a long entity label data generation module, and a data post-processing module. The invention greatly shortens training and prediction time and improves the accuracy and speed of the document extraction model when extracting fields.
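One way to picture how the seven modules fit together is as an ordered chain of stages, where the text classification stage can stop the run early. The sketch below is only an illustration under that assumption: the `run_pipeline` helper, the dictionary-based state, and every stage body are hypothetical placeholders, not the modules described in the patent.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Stage = Callable[[State], Optional[State]]


def run_pipeline(stages: List[Stage], state: State) -> Optional[State]:
    """Run the stages in order; a stage that returns None ends the run early."""
    current: Optional[State] = state
    for stage in stages:
        current = stage(current)
        if current is None:
            return None
    return current


# Placeholder stages, one per module named in the abstract.
def parse_document(s: State) -> State:        # text information intelligent extraction
    return {**s, "text": str(s["raw"])}

def preprocess(s: State) -> State:            # data preprocessing
    return {**s, "text": " ".join(str(s["text"]).split())}

def extract_features(s: State) -> State:      # feature extraction
    return {**s, "features": [len(str(s["text"]))]}

def classify(s: State) -> Optional[State]:    # text classification (answerable or not)
    return s if s["features"][0] > 0 else None

def read(s: State) -> State:                  # reading comprehension (span position)
    return {**s, "span": (0, len(str(s["text"])))}

def generate_label(s: State) -> State:        # long entity label data generation
    start, end = s["span"]
    return {**s, "field": str(s["text"])[start:end]}

def postprocess(s: State) -> State:           # data post-processing
    return {**s, "field": str(s["field"]).strip()}


STAGES: List[Stage] = [parse_document, preprocess, extract_features,
                       classify, read, generate_label, postprocess]

result = run_pipeline(STAGES, {"raw": "  Party A:   Acme  "})
print(result["field"] if result else None)
```

Keeping each module behind the same call signature lets a real model replace any placeholder without changing the surrounding wiring.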

Description

Technical field

[0001] The invention belongs to the technical field of information content processing, and in particular relates to a document information extraction method and system based on text classification and reading comprehension.

Background

[0002] In today's highly informatized office, corporate office employees spend nearly 1/3 of their daily time dealing with text. For example, legal personnel have to review a large number of contracts and draft agreements; accounting personnel have to review a large number of reports. This kind of work is highly repetitive and involves a large workload; manual processing is inefficient, and mistakes can easily cause huge, irreparable losses. In recent years, with the application and development of machine learning and deep learning in the field of natural language processing, the intelligent document review system has entered a stage of rapid development.

[0003] The document intellig...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/151, G06F40/279, G06F40/166, G06F16/35, G06N3/04, G06N3/08
Inventor: 闫凯峰, 孙林君
Owner: 杭州实在智能科技有限公司