Table data analysis method and device based on PDF file

A table data analysis technique, applied in the field of data processing, that addresses problems such as the poor accuracy of analysis results.

Pending Publication Date: 2019-12-13
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of the above problems, the present invention provides a table data analysis method and device based on a PDF file, the main purpose of which is to s




Embodiment Construction

[0084] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0085] In order to improve the accuracy of table data analysis based on PDF files, an embodiment of the present invention provides a table data analysis method based on PDF files. As shown in Figure 1, the method includes:

[0086] 101. Obtain the lines in the page and the attribute information of the lines.

[0087] Usually, when it is necessary to parse out the data and tables in the PDF file, the PDF file can be initially parsed accordin...
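Step 101 can be illustrated with a minimal sketch. It assumes the initial parse has already produced raw segments as `(x0, y0, x1, y1)` tuples; that input format, the `Line` class, the `collect_lines` helper, and the `tol` tolerance are hypothetical names chosen for illustration and do not come from the patent text. The sketch keeps only transverse and vertical lines and exposes the position and length attributes mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """An axis-aligned line with its position; length is derived (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def is_horizontal(self) -> bool:
        # A transverse line extends further along x than along y.
        return abs(self.y1 - self.y0) <= abs(self.x1 - self.x0)

    @property
    def length(self) -> float:
        return max(abs(self.x1 - self.x0), abs(self.y1 - self.y0))


def collect_lines(segments, tol=0.5):
    """Keep only transverse/vertical segments and normalise their endpoints.

    `segments` is assumed to be an iterable of (x0, y0, x1, y1) tuples.
    Segments that are neither horizontal nor vertical within `tol` are dropped.
    """
    lines = []
    for x0, y0, x1, y1 in segments:
        if abs(y1 - y0) <= tol:      # transverse (horizontal) line
            lines.append(Line(min(x0, x1), y0, max(x0, x1), y0))
        elif abs(x1 - x0) <= tol:    # vertical line
            lines.append(Line(x0, min(y0, y1), x0, max(y0, y1)))
    return lines
```

Normalising endpoints so that `x0 <= x1` and `y0 <= y1` is a design choice that simplifies later comparisons between lines.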


Abstract

The invention discloses a table data analysis method and device based on a PDF file, relates to the technical field of data processing, and mainly aims to improve the accuracy of table data analysis results for PDF files. The method comprises the steps of: obtaining lines in a page and attribute information of the lines, wherein the lines comprise transverse lines and vertical lines, and the attribute information of the lines comprises position information and length information; determining lines in the same table according to the attribute information of the lines, and marking the lines as grouped lines; determining a table boundary corresponding to the grouped lines according to the attribute information of the lines in the grouped lines; and, according to the attribute information of the multiple lines in the grouped lines corresponding to the table, combining cells meeting preset conditions in the table and adding data information in the page into the corresponding cells in the table, wherein the table is composed of the cells, and the cells are composed of the lines. The method and the device are used for analyzing the table data in the PDF file.
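The grouping and boundary steps in the abstract can be sketched as follows. The abstract does not say how lines are judged to belong to the same table, so this sketch assumes one plausible rule: lines that touch or cross (within a tolerance) belong to one group, and the table boundary is the bounding box of the group. The function names, the tuple layout `(x0, y0, x1, y1)` with `x0 <= x1` and `y0 <= y1`, and the `tol` parameter are all hypothetical. Cell merging is omitted because the preset conditions are not given in this excerpt.

```python
def touches(a, b, tol=1.0):
    """True if two axis-aligned lines (x0, y0, x1, y1) meet or overlap within tol."""
    return (a[0] - tol <= b[2] and b[0] - tol <= a[2]
            and a[1] - tol <= b[3] and b[1] - tol <= a[3])


def group_lines(lines, tol=1.0):
    """Group lines into tables by connectivity (assumed rule), using union-find."""
    parent = list(range(len(lines)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if touches(lines[i], lines[j], tol):
                parent[find(i)] = find(j)

    groups = {}
    for i, line in enumerate(lines):
        groups.setdefault(find(i), []).append(line)
    return list(groups.values())


def table_boundary(group):
    """Bounding box (x0, y0, x1, y1) of all lines in one group."""
    return (min(l[0] for l in group), min(l[1] for l in group),
            max(l[2] for l in group), max(l[3] for l in group))


# Two separate rectangles on one page, each drawn with four lines.
page_lines = [
    (0, 0, 10, 0), (0, 5, 10, 5), (0, 0, 0, 5), (10, 0, 10, 5),
    (20, 20, 30, 20), (20, 25, 30, 25), (20, 20, 20, 25), (30, 20, 30, 25),
]
boundaries = sorted(table_boundary(g) for g in group_lines(page_lines))
```

The pairwise comparison is quadratic in the number of lines, which is usually acceptable for a single page; a spatial index would be an alternative for very dense pages.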

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a table data analysis method and device based on PDF files.

Background

[0002] With the continuous development of technology, the volume of information on the network has grown explosively. In the field of data processing, since PDF (Portable Document Format) files can be used across multiple systems, such files are used more and more frequently. Therefore, how to extract data from PDF files, especially tabular data, has received increasing attention from those skilled in the art.

[0003] At present, in the field of data processing, when it is necessary to parse and extract table data from a PDF file, the file is usually parsed to obtain each constituent element and its related attribute information on each page, and then, according to the above, the corresponding html (H...


Application Information

IPC(8): G06F17/24
Inventor: 袁芳婷
Owner: BEIJING GRIDSUM TECH CO LTD