Document information extraction method and system based on text classification and reading comprehension

A reading comprehension and text classification technology applied in the field of information content processing. It addresses difficult model training, long training and prediction times, and low extraction accuracy, with the effects of improved prediction accuracy, handling of entity nesting, and strong versatility.

Active Publication Date: 2022-04-08

杭州实在智能科技有限公司

11 cites, 1 cited by

Problems solved by technology

[0016] The purpose of the present invention is to overcome the difficult model training, increased time consumption, and low extraction accuracy of existing document information extraction methods in the prior art, and to provide a document information extraction method and system based on text classification and reading comprehension that can greatly shorten training and prediction time and improve the field extraction accuracy and speed of the document extraction model.

Examples

Embodiment 1

[0075] As shown in Figure 2, the present invention provides a document information extraction method based on text classification and reading comprehension, including the following steps:

[0076] S1, inputting a document, parsing and identifying the document, and converting the document into a plain text format;

[0077] S2, preprocessing the text content in the document to obtain input data;

[0078] S3, generating corresponding character vectors, word vectors and context vectors according to the input data in step S2, and splicing the character vectors, word vectors and context vectors to obtain spliced vectors;

[0079] S4, if the spliced vector is an answerable type, then use the entity text question corresponding to the spliced vector as the input of the next step;

[0080] S5, using the reading comprehension model to obtain the position of the most matching long label data corresponding to the entity text question through calculation;

[0081] S6. Obtain the long t...
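
Steps S3 and S5 can be illustrated with a minimal Python sketch. It is not the patent's implementation. The source only states that the feature vectors are spliced and that the reading comprehension model computes the position of the best-matching long label data. The function names `splice` and `best_span`, the `max_len` parameter, and the start/end scoring scheme are assumptions, borrowed from common extractive reading comprehension practice.

```python
from typing import List, Sequence, Tuple


def splice(*vectors: Sequence[float]) -> List[float]:
    """S3 (assumed form): concatenate the feature vectors end to end."""
    spliced: List[float] = []
    for vec in vectors:
        spliced.extend(vec)
    return spliced


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_len: int,
) -> Tuple[int, int]:
    """S5 (assumed form): pick the token span with the highest combined score.

    Returns inclusive (start, end) indices maximizing
    start_scores[start] + end_scores[end], subject to
    start <= end and a span length of at most max_len tokens.
    """
    best = (0, 0)
    best_score = float("-inf")
    for start, s_score in enumerate(start_scores):
        last = min(start + max_len, len(end_scores))
        for end in range(start, last):
            score = s_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best
```

Given per-token scores from some model, `best_span` returns inclusive token indices, so the extracted text would be `tokens[start:end + 1]`. The length cap is one simple way to keep the search bounded for long entity fields; the source does not say whether the patented method uses one.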

Abstract

The invention belongs to the technical field of information content processing, and particularly relates to a document information extraction method and system based on text classification and reading comprehension. The method comprises the following steps: S1, converting a document into a plain text format; S2, preprocessing the document to obtain input data; S3, generating corresponding character vectors, word vectors and context vectors, and splicing them to obtain spliced vectors; S4, if the spliced vector is an answerable type, taking the spliced vector as the input of the next step; S5, obtaining the position of the best-matching long label data; and S6, finally outputting the long entity field to be extracted. The system comprises an intelligent text information extraction module, a data preprocessing module, a feature extraction module, a text classification module, a reading comprehension module, a long entity label data generation module and a data post-processing module. The method can greatly shorten training and prediction time and improves the field extraction precision and speed of the document extraction model.
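
The way the text classification step gates the reading comprehension step can be sketched as a small pipeline. This is a hypothetical illustration under stated assumptions, not the system described in the patent: the `Pipeline` class, its field names, and the three `toy_*` stand-in functions are invented here, and the real modules would be trained models rather than keyword lookups.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Pipeline:
    """Hypothetical chain: preprocess -> classify -> locate -> post-process."""
    preprocess: Callable[[str], List[str]]
    is_answerable: Callable[[List[str], str], bool]
    locate: Callable[[List[str], str], Tuple[int, int]]

    def extract(self, text: str, question: str) -> Optional[str]:
        tokens = self.preprocess(text)
        # Classification gate: skip the costlier span search entirely
        # when the input is judged unanswerable for this question.
        if not self.is_answerable(tokens, question):
            return None
        start, end = self.locate(tokens, question)
        # Post-processing: join the inclusive token span back into text.
        return " ".join(tokens[start:end + 1])


# Toy stand-ins (assumptions): the "question" is just a keyword.
def toy_preprocess(text: str) -> List[str]:
    return text.split()


def toy_is_answerable(tokens: List[str], question: str) -> bool:
    return question in tokens


def toy_locate(tokens: List[str], question: str) -> Tuple[int, int]:
    start = tokens.index(question)
    return start, min(start + 2, len(tokens) - 1)


pipeline = Pipeline(toy_preprocess, toy_is_answerable, toy_locate)
```

With these stand-ins, `pipeline.extract("The deposit is 500 yuan", "deposit")` returns a three-token span starting at the keyword, while a text that lacks the keyword returns `None` without the locating step being run. Filtering unanswerable inputs before span location is consistent with the stated goal of shortening prediction time.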

Description

Technical field

[0001] The invention belongs to the technical field of information content processing, and in particular relates to a document information extraction method and system based on text classification and reading comprehension.

Background

[0002] In today's highly informatized office, employees in corporate offices spend nearly one-third of their daily time dealing with text. For example, legal personnel have to review a large number of contracts and draft agreements; accounting personnel have to review a large number of reports. This kind of work is highly repetitive and heavy in workload, manual processing is inefficient, and mistakes can easily cause huge, irreparable losses. In recent years, with the application and development of machine learning and deep learning in the field of natural language processing, intelligent document review systems have entered a stage of rapid development.

[0003] The intell...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F40/151, G06F40/279, G06F40/166, G06F16/35, G06N3/04, G06N3/08

Inventor: 闫凯峰, 孙林君

Owner: 杭州实在智能科技有限公司
