A document analysis method and device
A document parsing method and device in the technical fields of instruments, computing, and electrical digital data processing. It addresses the low accuracy of existing document parsing, achieving high parsing accuracy while reducing the workload of manual maintenance.
Examples
Embodiment 1
[0054] The present invention proposes a document parsing method for parsing an original document that contains several matching relationships. A matching relationship here means that a certain part of the content in the original document is the matching content corresponding to a certain matching item. Figure 1 is a flow chart of document parsing in Embodiment 1 of the present invention. The method includes the following steps:
[0055] S1. Extract text content from the original document;
[0056] Embodiments of the present invention do not limit the format of the original document, which may be any one of doc, docx, wps, txt, mht, html, htm, pdf, or other common format types. Nor do they limit the format of the extracted text content, which may be html format content, plain text content, base64 encoded content, or any other common format type.
[0057] S2. Segment the text content according to the preset segment identifier, put the segmented text co...
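The segmentation and stacking in S2 can be sketched in Python as follows. This is only an illustration under assumptions the source does not state: the preset segment identifier is taken to be a newline, empty segments are dropped, and the original content stack is modelled as a `collections.deque` that keeps the pieces in document order. The function name `build_content_stack` is hypothetical.

```python
from collections import deque


def build_content_stack(text, segment_identifier="\n"):
    """Split text on the segment identifier; each piece becomes one stack point.

    Assumptions (not from the source): newline as the default identifier,
    surrounding whitespace trimmed, empty pieces discarded.
    """
    pieces = (piece.strip() for piece in text.split(segment_identifier))
    return deque(piece for piece in pieces if piece)


stack = build_content_stack(
    "Name: Zhang San\nGender: male\n\nPlace of residence: Shenzhen"
)
for stack_point in stack:
    print(stack_point)
```

A deque is used here so that pieces can later be taken out one at a time in their original order; a plain list would serve equally well for small documents.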
Embodiment 2
[0067] The present invention also proposes a resume document parsing method for parsing an original document that contains several matching relationships. A matching relationship here means that a certain part of the content in the original document is the matching content corresponding to a certain matching item. For example, in a resume document, "name", "gender", and "place of residence" are matching items, and "Zhang San", "male", and "Shenzhen" are the matching content corresponding to those matching items. The method includes the following steps:
[0068] S1. Extract text content from the original document;
[0069] Embodiments of the present invention do not limit the format of the original document, which may be any one of doc, docx, wps, txt, mht, html, htm, pdf, or other common format types. Nor do they limit the format of the extracted text content, which may be html format content, plain text content, base64 encoded content, or ...
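Because the extracted text content may arrive as html, plain text, or base64 encoded content, a later step may need one uniform representation. The Python sketch below shows one possible way to reduce the three formats to plain text using only the standard library. It is not the patent's method: the function name `to_plain_text`, the format labels `"plain"`, `"html"`, and `"base64"`, and the UTF-8 assumption for base64 payloads are all hypothetical.

```python
import base64
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def to_plain_text(content, fmt):
    """Return plain text for content labelled 'plain', 'html', or 'base64'.

    The labels are hypothetical; base64 payloads are assumed to be UTF-8.
    """
    if fmt == "plain":
        return content
    if fmt == "base64":
        return base64.b64decode(content).decode("utf-8")
    if fmt == "html":
        collector = _TextCollector()
        collector.feed(content)
        collector.close()
        # One line per text node, so block elements become separate segments.
        return "\n".join(collector.parts)
    raise ValueError(f"unsupported format label: {fmt!r}")


html_text = to_plain_text("<p>Name: Zhang San</p><p>Gender: male</p>", "html")
print(html_text)
```

Joining the text nodes with newlines is a design choice made here so that a newline can later serve as a segment identifier; other separators would work as well.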
Embodiment 3
[0120] The present invention proposes a document parsing device for parsing an original document with a specific format, where the specific format refers to a matching item together with the matching content corresponding to that matching item. The device includes a content extraction module, a content stacking module, and a content parsing module, wherein:
[0121] The content extraction module is used to extract text content from the original document;
[0122] The content stacking module is used to segment the text content according to the segment identifier and put the segmented text content into the original content stack, with each stack point storing one piece of content;
[0123] The content parsing module is used to sequentially take out the stack point content of the original content stack as the current stack point content; if the current stack point content satisfies the matching condition of a keyword corresponding to a matching item, the current stack point is called the current matching stack poi...
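The keyword matching performed by the content parsing module can be illustrated with a short Python sketch. The source text is truncated before it says what happens after a matching stack point is found, so the behaviour below is an assumption made only for illustration: a stack point matches when it begins with a keyword of a matching item, and the remainder of that same stack point is taken as the matching content. The keyword table `KEYWORDS` and the functions `match_item` and `parse_stack` are hypothetical.

```python
from collections import deque

# Hypothetical keyword table: matching item -> keywords that identify it.
KEYWORDS = {
    "name": ["Name"],
    "gender": ["Gender", "Sex"],
    "place_of_residence": ["Place of residence"],
}


def match_item(content):
    """Return (matching item, matching content) if content starts with a keyword.

    Assumed matching condition: case-insensitive prefix match on a keyword.
    """
    lowered = content.lower()
    for item, keywords in KEYWORDS.items():
        for keyword in keywords:
            if lowered.startswith(keyword.lower()):
                # Drop the keyword and any separator characters after it.
                return item, content[len(keyword):].lstrip(" :\uff1a\t")
    return None


def parse_stack(stack):
    """Take stack points out one by one and record every keyword match."""
    result = {}
    while stack:
        current = stack.popleft()  # the current stack point content
        hit = match_item(current)
        if hit is not None:
            item, matching_content = hit
            result[item] = matching_content
    return result


parsed = parse_stack(deque([
    "Name: Zhang San",
    "Gender: male",
    "Curriculum vitae",
    "Place of residence: Shenzhen",
]))
print(parsed)
```

Stack points that match no keyword, such as "Curriculum vitae" above, are simply skipped in this sketch; the full patent text may treat them differently.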