
A method for automatically extracting text information

A technology for automatic text information extraction, applied in text database indexing, unstructured text data retrieval, natural language data processing, and related fields. It addresses problems such as poor applicability and improves extraction accuracy.

Active Publication Date: 2019-01-04
UNIV OF SCI & TECH BEIJING

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide an automatic text information extraction method that addresses a shortcoming of the prior art: offline automatic text information extraction models require all of their training data to be manually labeled in a single pass, which limits their applicability.



Examples


Detailed Description of the Embodiments

[0051] To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description follows with reference to the drawings and specific embodiments.

[0052] The present invention provides an automatic text information extraction method that addresses the following problem: the training data required by existing offline automatic text information extraction models must be manually labeled in a single pass, which limits their applicability.

[0053] As shown in Figure 1, the automatic text information extraction method provided by the embodiment of the present invention comprises:

[0054] Step 1: obtain the text file uploaded by the user and convert it into a document format that a computer can analyze character by character;

[0055] Step 2: preprocess the text content of the format-converted document to form multi-level text units that are easy to analyze using natural language proce...
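
Steps 1 and 2 can be illustrated with a minimal Python sketch. The patent text available here does not specify the file formats, the levels of the text units, or the splitting rules, so everything below is an assumption for illustration: the input is taken to be UTF-8 bytes, the levels are taken to be paragraph, sentence, and word, and the names `to_plain_text`, `build_text_units`, `Paragraph`, and `Sentence` are hypothetical. Each unit keeps its character offsets so that later steps can map a position in the document back to a unit.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    start: int  # character offset of the sentence in the document
    end: int    # exclusive end offset
    words: list[str] = field(default_factory=list)


@dataclass
class Paragraph:
    start: int
    end: int
    sentences: list[Sentence] = field(default_factory=list)


def to_plain_text(raw: bytes) -> str:
    """Step 1 (simplified assumption): decode uploaded bytes into a
    string that can be addressed character by character."""
    return raw.decode("utf-8", errors="replace").replace("\r\n", "\n")


def build_text_units(text: str) -> list[Paragraph]:
    """Step 2 (illustrative): split text into paragraph -> sentence -> word."""
    paragraphs = []
    # A paragraph is a run of consecutive non-empty lines.
    for p in re.finditer(r"[^\n]+(?:\n[^\n]+)*", text):
        para = Paragraph(p.start(), p.end())
        # Naive sentence rule: text up to and including ., ! or ?
        for s in re.finditer(r"[^.!?]+[.!?]*", p.group()):
            raw = s.group()
            stripped = raw.strip()
            if not stripped:
                continue
            lead = len(raw) - len(raw.lstrip())
            start = p.start() + s.start() + lead
            para.sentences.append(
                Sentence(
                    text=stripped,
                    start=start,
                    end=start + len(stripped),
                    words=re.findall(r"\w+", stripped),
                )
            )
        paragraphs.append(para)
    return paragraphs


doc = to_plain_text(b"First sentence. Second one!\r\n\r\nNew paragraph here.")
units = build_text_units(doc)
for para in units:
    for sent in para.sentences:
        print(sent.start, sent.end, sent.words)
```

The punctuation-based sentence rule is deliberately crude; a real system would likely use a proper sentence segmenter, and handling formats such as PDF would need a dedicated converter in place of `to_plain_text`.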


Abstract

The invention provides a method for automatically extracting text information, which can continuously improve the accuracy of automatically extracted markup content and labels. The method comprises the following steps: acquiring a text file uploaded by a user and converting it into a document format that a computer can analyze character by character; preprocessing the text content of the converted document to form multi-level text units that are easy to parse with natural language processing techniques; capturing text segments selected by the user, determining the markup content corresponding to each segment based on the multi-level text units, and recommending labels for each piece of markup content; and, based on the determined markup content and the recommended labels, training the automatic text extraction model with an online learning approach so as to extract markup content and labels automatically. The invention is suitable for automatic text information extraction.
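
The last two steps of the abstract, mapping a user selection onto text units and training incrementally from each confirmed label, can be sketched as follows. The abstract does not name a model or a feature set, so this sketch substitutes a simple multiclass perceptron over bag-of-words tokens as a stand-in for whatever online learner the invention uses. The names `snap_to_units` and `OnlineLabelRecommender`, and the example labels, are hypothetical.

```python
import re


def snap_to_units(sel_start, sel_end, unit_spans):
    """Expand a raw selection [sel_start, sel_end) so that it covers every
    text unit it overlaps. unit_spans is a list of (start, end) offsets."""
    hits = [(s, e) for s, e in unit_spans if s < sel_end and e > sel_start]
    if not hits:
        return None
    return min(s for s, _ in hits), max(e for _, e in hits)


class OnlineLabelRecommender:
    """Stand-in online learner: a multiclass perceptron that is updated
    one labeled example at a time, with no offline training pass."""

    def __init__(self):
        self.weights = {}  # label -> {token: weight}

    @staticmethod
    def features(text):
        return re.findall(r"\w+", text.lower())

    def _score(self, label, tokens):
        w = self.weights[label]
        return sum(w.get(t, 0.0) for t in tokens)

    def recommend(self, text, k=3):
        """Return up to k labels, best score first."""
        tokens = self.features(text)
        ranked = sorted(
            self.weights,
            key=lambda label: self._score(label, tokens),
            reverse=True,
        )
        return ranked[:k]

    def update(self, text, true_label):
        """Learn from one user-confirmed (markup content, label) pair."""
        tokens = self.features(text)
        self.weights.setdefault(true_label, {})
        predicted = self.recommend(text, k=1)[0]
        if predicted != true_label:
            # Perceptron rule: reward the true label, penalize the wrong guess.
            for t in tokens:
                good = self.weights[true_label]
                bad = self.weights[predicted]
                good[t] = good.get(t, 0.0) + 1.0
                bad[t] = bad.get(t, 0.0) - 1.0


spans = [(0, 15), (16, 27), (29, 48)]   # hypothetical sentence offsets
print(snap_to_units(18, 22, spans))      # selection inside the second unit

model = OnlineLabelRecommender()
model.update("the alloy was annealed at 900 C", "process")
model.update("tensile strength reached 500 MPa", "property")
print(model.recommend("strength of 480 MPa", k=1))
```

Because each call to `update` adjusts the weights immediately, recommendations can improve as the user keeps annotating, which is the behavior the abstract attributes to the online learning approach.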

Description

Technical field

[0001] The invention relates to the fields of natural language processing and computer-aided systems, and in particular to a method for automatically extracting text information.

Background

[0002] In modern scientific research, reading scientific and technological literature is one of the main ways for researchers to acquire professional knowledge, collect relevant data, and follow the hot spots and development directions of their field. With the continuous progress of basic scientific research in our country, research results and the corresponding scientific and technological literature have grown explosively. At the same time, with the development of data mining technology, the demand for analyzing massive amounts of data is unprecedented. Therefore, the use of natural language processing technology to automatically extract the sentence information of scientific papers has become an important way to obtain scientific research informa...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/21, G06F17/27, G06F16/31
CPC: G06F40/106, G06F40/211
Inventors: 黄海友, 袁兆麟, 马博渊, 胡金龙, 魏晓燕, 刘婷
Owner: UNIV OF SCI & TECH BEIJING