A method for extracting file information

A method for extracting file information, applied in the field of information recognition, that addresses the lack of generality and the recognition errors of existing approaches.

Inactive Publication Date: 2017-02-15
BEIJING FORESTRY UNIVERSITY
AI Technical Summary

Problems solved by technology

The prior method can recognize files that are not plain text, but its recognition results contain errors, and it currently handles only multiple-choice questions, so it is not general.


Embodiment Construction

[0025] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0026] This embodiment uses a Word test paper as an example to illustrate the information extraction method provided by the present invention. The method applies to any file information in which the head of the content includes identification characters. The identification characters may be any plain-text information and are not limited to numeric information. In addition, because the method does not need to recognize formulas, tables and/or pictures in the file content, the Word test paper may itself contain formulas, tables and/or pictures.
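
As a rough illustration of what identification characters at the head of a paragraph could look like for a test paper, the sketch below checks whether a paragraph starts with a question number. The pattern and the names `HEAD_MARKER` and `find_identifier` are assumptions made for this example and are not taken from the patent, which allows any plain-text marker.

```python
import re
from typing import Optional

# Hypothetical marker pattern (an assumption, not the patent's rule set):
# "(3)", "1.", "12、", "4)" or a Chinese section numeral such as "一、".
# Only the head of the paragraph is examined.
HEAD_MARKER = re.compile(r"^\s*(?:\(\d+\)|\d+[.、)]|[一二三四五六七八九十]+、)")

def find_identifier(paragraph: str) -> Optional[str]:
    """Return the identification characters at the head of the paragraph, or None."""
    match = HEAD_MARKER.match(paragraph)
    return match.group(0).strip() if match else None
```

A pattern this simple will also accept some false starts, such as a sentence beginning with "3.14", so it only stands in for the primary identification stage.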

[0027] Referring to Figure 1 and Figure 2, the method includes the following steps:

[0028] 101: Obtain the file information sequentially, paragraph by paragraph;

[0029] Usually Word test paper in...

Abstract

The invention provides a method for extracting file information. The method obtains the file information sequentially, paragraph by paragraph, and searches each paragraph for at least one identification character; if an identification character is found, that paragraph is taken as the initial paragraph of an information block. Because the method identifies at least one identification character in the file information, the required information blocks can be cut from the file information quickly and accurately. Formulas, tables and/or pictures in the file content do not need to be recognized, so the method is also suitable for files containing such content, which widens its range of application. The method is combined with a support vector machine and shallow syntactic parsing, so that erroneous results can be corrected after the primary identification and the identification accuracy is improved.
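
The paragraph-by-paragraph cutting described in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions: the marker pattern, the names `starts_block` and `split_into_blocks`, and the rule that a block ends where the next marked paragraph begins are all hypothetical, and the correction stage with a support vector machine and shallow syntactic parsing is not shown.

```python
import re
from typing import Callable, List

# Hypothetical marker: a paragraph that begins with "1.", "2、", "4)" or "(3)".
_MARKER = re.compile(r"^\s*(?:\(\d+\)|\d+[.、)])")

def starts_block(paragraph: str) -> bool:
    """True if the paragraph head carries an identification character."""
    return _MARKER.match(paragraph) is not None

def split_into_blocks(
    paragraphs: List[str],
    is_start: Callable[[str], bool] = starts_block,
) -> List[List[str]]:
    """Group paragraphs into blocks, each opened by a marked paragraph."""
    blocks: List[List[str]] = []
    current = None  # no block is open until the first marker is seen
    for paragraph in paragraphs:
        if is_start(paragraph):
            if current is not None:
                blocks.append(current)  # assumed rule: a new marker closes the block
            current = [paragraph]
        elif current is not None:
            # Content is carried along without being interpreted, so a
            # paragraph holding a formula or table needs no recognition.
            current.append(paragraph)
        # Paragraphs before the first marker (e.g. a title) are skipped.
    if current is not None:
        blocks.append(current)
    return blocks
```

For example, the paragraphs "Final Exam", "1. What is a stack?", "A. LIFO", "2. Define a queue." would yield two blocks, one per question, with the title left out.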

Description

Technical field

[0001] The present invention relates to the technical field of information identification, and in particular, to a method for extracting file information.

Background technique

[0002] Due to the popularization and development of the Internet, a large amount of information can now be searched from the Internet. By using the searched information to automatically build an information base that meets the requirements, the work of establishing the information base can be more automated. This method is especially suitable for the processing of test paper information. How to identify a large amount of test paper information and use the identified test question information to automatically complete the initialization of the test question bank is a key step in the construction of the test question bank system and an important research topic in computer-aided teaching.

[0003] The traditional test question bank construction work is to manually enter the test questio...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F40/253
Inventor: 李冬梅, 覃延, 陈志泊
Owner: BEIJING FORESTRY UNIVERSITY