Document information extraction method and system based on text classification and reading comprehension

A technique combining reading comprehension and text classification, applied in the field of information content processing. It addresses difficult model training, long training and prediction times, and low extraction accuracy in document information extraction. Its stated effects are shorter training and prediction time, improved prediction accuracy, handling of entity nesting, and strong versatility.

Active Publication Date: 2022-07-19

杭州实在智能科技有限公司

Problems solved by technology

[0016] The purpose of the present invention is to overcome the problems of difficult model training, increased time consumption and low extraction accuracy in existing document information extraction methods, and to provide a document information extraction method and system based on text classification and reading comprehension that can greatly shorten training and prediction time and improve the accuracy and speed of the document extraction model when extracting fields.

Examples

Embodiment 1

[0075] As shown in Figure 2, the present invention provides a document information extraction method based on text classification and reading comprehension, comprising the following steps:

[0076] S1, input a document, parse and identify the document, and convert the document into a plain text format;

[0077] S2, preprocess the text content in the document to obtain input data;

[0078] S3, according to the input data from step S2, generate the corresponding word vector, word vector and context vector, and splice them to obtain a spliced vector;

[0079] S4, if the spliced vector is classified as an answerable type, the entity text question corresponding to the spliced vector is used as the input of the next step;

[0080] S5, using the reading comprehension model, calculate the position of the long label data that best matches the entity text question;

[0081] S6, obtain long label data according to the posi...
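Steps S4 to S6 can be read as a gate followed by a span lookup. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: the answerability probability and the per-token start and end scores are assumed to come from the text classification and reading comprehension models, and the names `best_span`, `extract_field`, `threshold` and `max_len` are hypothetical.

```python
from typing import List, Optional, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int) -> Tuple[int, int]:
    """Return (start, end) maximising start_scores[s] + end_scores[e].

    Only spans with s <= e and at most max_len tokens are considered.
    """
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def extract_field(tokens: List[str], answerable_prob: float,
                  start_scores: List[float], end_scores: List[float],
                  threshold: float = 0.5, max_len: int = 50) -> Optional[str]:
    """Hypothetical S4-S6: skip unanswerable questions, else return the span text."""
    # S4 (assumed form): only answerable questions go on to the reading model.
    if answerable_prob < threshold:
        return None
    # S5: position of the best-matching span.
    s, e = best_span(start_scores, end_scores, max_len)
    # S6: turn the position back into the text of the field.
    return " ".join(tokens[s:e + 1])


tokens = ["Party", "A", "shall", "pay", "within", "30", "days"]
start = [0.1, 0.0, 0.2, 0.1, 2.0, 0.3, 0.1]
end = [0.0, 0.1, 0.0, 0.2, 0.1, 0.4, 3.0]
field = extract_field(tokens, 0.9, start, end)
```

Gating on answerability first means the span search runs only for questions the classifier accepts, which is one plausible reading of how the method shortens prediction time.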

Abstract

The invention belongs to the technical field of information content processing, and in particular relates to a document information extraction method and system based on text classification and reading comprehension. The method includes: S1, converting the document into a plain text format; S2, preprocessing the document to obtain input data; S3, generating the corresponding word vector, word vector and context vector, and splicing them to obtain a spliced vector; S4, if the spliced vector is of an answerable type, using it as the input of the next step; S5, obtaining the position of the best-matching long label data; S6, outputting the long entity field to be extracted. The system includes a text information intelligent extraction module, a data preprocessing module, a feature extraction module, a text classification module, a reading comprehension module, a long entity label data generation module, and a data post-processing module. The invention greatly shortens training and prediction time and improves the accuracy and speed of the document extraction model when extracting fields.
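The splicing in S3 can be pictured as per-token concatenation of three feature vectors. The sketch below is illustrative only. The source names "word vector" twice, and the code assumes, without confirmation from the text, that one is a character-level vector and the other a word-level vector. The function names and the toy dimensions are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def splice(char_vec: Sequence[float], word_vec: Sequence[float],
           ctx_vec: Sequence[float]) -> Vector:
    """Concatenate the three vectors for one token into one spliced vector."""
    return [*char_vec, *word_vec, *ctx_vec]


def splice_sequence(char_vecs: Sequence[Sequence[float]],
                    word_vecs: Sequence[Sequence[float]],
                    ctx_vecs: Sequence[Sequence[float]]) -> List[Vector]:
    """Splice vectors token by token; all three inputs must cover the same tokens."""
    if not (len(char_vecs) == len(word_vecs) == len(ctx_vecs)):
        raise ValueError("all three inputs must have one vector per token")
    return [splice(c, w, x) for c, w, x in zip(char_vecs, word_vecs, ctx_vecs)]


# Two tokens with toy dimensions 1, 2 and 1 (assumed for illustration).
spliced = splice_sequence(
    [[1.0], [2.0]],
    [[3.0, 4.0], [5.0, 6.0]],
    [[7.0], [8.0]],
)
```

Each spliced vector has length equal to the sum of the three input dimensions, so a downstream classifier sees all three views of a token at once.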

Description

Technical field

[0001] The invention belongs to the technical field of information content processing, and in particular relates to a document information extraction method and system based on text classification and reading comprehension.

Background art

[0002] In today's highly informatized offices, corporate office employees spend nearly 1/3 of their time each day dealing with text. For example, legal personnel have to review a large number of contracts and draft agreements; accounting personnel have to review a large number of reports. This kind of work is highly repetitive and high in volume, manual processing is inefficient, and mistakes can easily cause huge, irreparable losses. In recent years, with the application and development of machine learning and deep learning in the field of natural language processing, the intelligent document review system has entered a stage of rapid development.

[0003] The document intellig...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F40/151, G06F40/279, G06F40/166, G06F16/35, G06N3/04, G06N3/08

Inventors: 闫凯峰, 孙林君

Owner: 杭州实在智能科技有限公司
