Method and device for rapidly extracting text from Word document

A technology for extracting text and text formats, applied in digital data protection, electrical digital data processing, instruments, etc. It addresses the long running time and low efficiency that hinder the use of application systems, and offers high practical value, convenient operation, and avoidance of inefficiency.

Active Publication Date: 2014-07-02
AEROSPACE INFORMATION
2 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0006] Method one is currently the mainstream way for electronic signature products to obtain document formats. It is easy to use and highly compatible with the Word application. However, each Word element that is parsed requires one call to the COM interface, which is inefficient and takes a long time to run. When the document is large, the running speed is very slow, which seriously affects the usability of the application system, so electronic signature products using this method do not support text-format signing of large documents.
[0007] Method two, on the one hand, is not compatible with the Word application and is difficult to integrate; on the other hand, it supports doc-format documents poorly and is unstable. When the document is complex, the format is easily extracted incorrectly.
[0008] For method three, only the docx document format has been published so far, and the doc document format is not supported, so Word 2003 and Word 2000 documents cannot be handled.
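
The per-element overhead described in [0006] can be illustrated with a small simulation. This is only a sketch: `FakeDocument`, `get_run`, and `export_all` are hypothetical stand-ins invented for illustration, not part of the Word COM interface, and the tab-separated export string is an assumed format.

```python
class FakeDocument:
    """Hypothetical stand-in for a document behind a COM-style interface."""

    def __init__(self, runs):
        self._runs = runs   # list of (text, font) tuples
        self.calls = 0      # number of simulated interface round trips

    def run_count(self):
        self.calls += 1
        return len(self._runs)

    def get_run(self, index):
        # One round trip per element, as in method one.
        self.calls += 1
        return self._runs[index]

    def export_all(self):
        # One round trip that returns every run as a single string
        # (assumed format: "font<TAB>text", one run per line).
        self.calls += 1
        return "\n".join(f"{font}\t{text}" for text, font in self._runs)


def extract_per_element(doc):
    """Fetch runs one at a time; the call count grows with document size."""
    return [doc.get_run(i) for i in range(doc.run_count())]


def extract_in_bulk(doc):
    """Fetch everything in one call, then parse the string locally."""
    rows = doc.export_all().split("\n")
    result = []
    for row in rows:
        if row:  # skip the empty row produced by an empty document
            font, text = row.split("\t", 1)
            result.append((text, font))
    return result
```

With two runs, the per-element path makes three simulated calls (one count plus one per run), while the bulk path makes one regardless of document size; the remaining work is ordinary string processing.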


Examples


Embodiment Construction

[0025] To facilitate understanding of the embodiments of the present invention, further explanation is given below with reference to the accompanying drawings, taking specific embodiments as examples. No embodiment constitutes a limitation of the present invention.

[0026] A method for quickly extracting text formats from a Word document can be combined with document processing applications to quickly extract all text formats in the document. In one specific implementation, the method is combined with an electronic signature device: when operations such as signature verification are performed in a Word document, all text formats of the current document must be extracted. As shown in Figure 1, the method includes the following steps:

[0027] Step 101: divide the document into multiple parts. Specifically, obtain the COM pointer of the current document to be processed, as passed in by the upper-level electronic signature program; call the ms-word...


Abstract

An embodiment of the invention provides a method and device for rapidly extracting text from a Word document. The method comprises the steps of: segmenting the document into a plurality of parts; converting the content of each part into character string format to obtain the character string data corresponding to that part; extracting the set of fonts and colors used in the character string data; parsing each piece of character string data and storing it in a tree-shaped data structure; and extracting the text from the tree-shaped data structure and gathering all the text. The method and device combine the ms-com interface with character string processing and define a series of parsing rules. This keeps the convenience of the ms-com interface while overcoming the low efficiency caused by calling it repeatedly, so that all the text in the document can be extracted rapidly. The method and device can be combined with electronic signature or other document processing applications, and have high practical value.
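
The steps listed in the abstract can be sketched roughly as follows. The source does not say which string format Word is asked to produce or what the parsing rules are, so this sketch assumes a toy XML-like markup with `<part>` and `<run font="..." color="...">` elements; those names, and the use of `xml.etree.ElementTree` as the tree structure, are illustrative assumptions rather than the patented method. Segmentation and conversion through the ms-com interface are taken as already done, so the input is one string per part.

```python
import re
import xml.etree.ElementTree as ET

# Assumed attribute syntax of the toy markup (not from the source).
FONT_RE = re.compile(r'font="([^"]*)"')
COLOR_RE = re.compile(r'color="([^"]*)"')


def collect_fonts_and_colors(part_strings):
    """Scan the raw string data for the set of fonts and colors in use."""
    fonts, colors = set(), set()
    for data in part_strings:
        fonts.update(FONT_RE.findall(data))
        colors.update(COLOR_RE.findall(data))
    return fonts, colors


def extract_runs(tree):
    """Walk one parsed tree and return (text, font, color) per run."""
    return [
        (run.text or "", run.get("font"), run.get("color"))
        for run in tree.iter("run")
    ]


def extract_document(part_strings):
    """Parse every part into a tree, then gather text from all trees."""
    fonts, colors = collect_fonts_and_colors(part_strings)
    trees = [ET.fromstring(data) for data in part_strings]
    runs = [run for tree in trees for run in extract_runs(tree)]
    text = "".join(run_text for run_text, _, _ in runs)
    return {"fonts": fonts, "colors": colors, "runs": runs, "text": text}


# Hypothetical input: two parts already converted to string form.
parts = [
    '<part><run font="Arial" color="000000">Hello </run></part>',
    '<part><run font="SimSun" color="FF0000">World</run></part>',
]
result = extract_document(parts)
```

Here each part costs one conversion to a string, and everything after that is in-process parsing, which is the trade-off the abstract describes.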

Description

Technical field

[0001] The invention relates to the fields of document processing, information security, etc., and in particular to a method and device for quickly extracting text formats from Word documents in electronic signature applications.

Background technique

[0002] With the development of technology, more and more enterprises, institutions and state agencies have gradually adopted electronic office work, which has greatly improved work efficiency. Electronic office work, however, brings security issues with it. Because electronic documents are easily copied or tampered with, questions arise as to whether an issued electronic document has been modified and whether it was really issued by the stated issuer. Electronic signature products solve these problems and provide a technical basis for the security requirements of electronic documents.

[0003] Microsoft Word is a document processing application program produced by Microsoft Corporation, and Word docume...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/60
CPC: G06F40/151
Inventor: 王申金端峰郭向国
Owner: AEROSPACE INFORMATION