
Digital reconstruction system and method for printed text layout

A printed text layout technology, applied in the fields of neural network learning methods, electrical digital data processing, and character recognition, that addresses the inability of existing systems to discover the structure of ordinary printed text layout images and reconstruct them digitally.

Pending Publication Date: 2022-02-01
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0005] Judging by the results of current OCR technology and layout analysis, OCR systems and their applications can recognize and reconstruct text layouts with fixed structures (such as invoices and certificates), or can only recognize or extract text. They cannot perform fully automated structure discovery and holistic digital reconstruction of ordinary printed text layout images.




Embodiment Construction

[0059] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0060] Ordinary printed text layouts include text, tables, formulas, illustrations and other elements, whose positions are not fixed and which take various forms. Currently, there is no system capable of digitally reconstructing text layout images while preserving both structure and content.

[0061] The invention uses machine learning and pattern recognition methods to establish a fully automatic digital reconstruction system for ordinary printed text layouts. It applies semantic segmentation to the structural analysis and mining of printed layout images, dividing each image into semantic blocks of text, tables, formulas, and illustrations. It then recognizes and reconstructs the content of each semantic block according to its type, and finally assembles the recognition results according to their position information to obtain HTML file...
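The per-type recognition step described in [0061] can be pictured as a dispatch table that routes each semantic block to a recognizer for its type. The following is a minimal sketch only, not the patent's implementation: the `SemanticBlock` class, the recognizer function names, and the placeholder HTML they return are all hypothetical, and a real system would call actual OCR, formula, and table recognition models at those points.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SemanticBlock:
    """One region produced by layout segmentation (hypothetical structure)."""
    kind: str                        # "text", "table", "formula" or "illustration"
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


# Placeholder recognizers (assumptions): each returns a fixed HTML fragment
# where a real system would run the corresponding recognition model.
def recognize_text(block: SemanticBlock) -> str:
    return "<p>[recognized text]</p>"


def recognize_table(block: SemanticBlock) -> str:
    return "<table><tr><td>[cell]</td></tr></table>"


def recognize_formula(block: SemanticBlock) -> str:
    return '<span class="formula">[formula]</span>'


def keep_illustration(block: SemanticBlock) -> str:
    return '<img alt="[illustration]">'


RECOGNIZERS: Dict[str, Callable[[SemanticBlock], str]] = {
    "text": recognize_text,
    "table": recognize_table,
    "formula": recognize_formula,
    "illustration": keep_illustration,
}


def recognize_blocks(blocks: List[SemanticBlock]) -> List[Tuple[SemanticBlock, str]]:
    """Route every block to the recognizer registered for its semantic type."""
    results = []
    for block in blocks:
        try:
            recognizer = RECOGNIZERS[block.kind]
        except KeyError:
            raise ValueError(f"unknown semantic type: {block.kind!r}") from None
        results.append((block, recognizer(block)))
    return results
```

Keeping the recognizers in a dictionary means a new semantic type could be supported by registering one more function, without touching the loop.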



Abstract

The invention discloses a digital reconstruction system and method for a printed text layout. The system comprises: a layout semantic segmentation module for performing semantic structure analysis on an input text layout image, segmenting the input text layout image into a plurality of semantic blocks according to different semantic types, and realizing segmentation and positioning of different semantic blocks, wherein the semantic blocks include text blocks, table blocks, formula blocks and illustration blocks; an OCR module used for identifying and reconstructing texts in the text blocks or the table blocks; a formula identification module used for identifying formulas in the formula blocks or the table blocks and carrying out formula identification and reconstruction; a table recognition module used for performing table structure and content recognition and reconstruction on the table blocks; and an assembling module used for assembling and synthesizing recognition and reconstruction results of the semantic blocks according to the position structure information of the semantic blocks, and outputting a complete text layout in an HTML (Hypertext Markup Language) format to realize digital reconstruction of a text layout image.
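As an illustration of the assembling module, the sketch below places already-recognized HTML fragments on a page according to their bounding boxes. It is a hedged example under stated assumptions: the abstract does not say how positions are encoded in the output, so the use of absolutely positioned `div` elements, the `Fragment` type, and the function name `assemble_html` are all illustrative choices rather than the patented method.

```python
from html import escape
from typing import List, Tuple

# A fragment is ((left, top, right, bottom), inner_html); hypothetical type.
Fragment = Tuple[Tuple[int, int, int, int], str]


def assemble_html(fragments: List[Fragment], title: str = "Reconstructed layout") -> str:
    """Combine recognized fragments into one HTML page using their positions."""
    # Emit blocks top-to-bottom, then left-to-right, so the markup order
    # roughly follows reading order even though CSS fixes the placement.
    ordered = sorted(fragments, key=lambda f: (f[0][1], f[0][0]))
    parts = []
    for (left, top, right, bottom), inner in ordered:
        style = (
            f"position:absolute;left:{left}px;top:{top}px;"
            f"width:{right - left}px;height:{bottom - top}px"
        )
        parts.append(f'<div style="{style}">{inner}</div>')
    body = "\n".join(parts)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f'<body style="position:relative">\n{body}\n</body></html>'
    )
```

For example, `assemble_html([((0, 50, 200, 80), "<p>second</p>"), ((0, 0, 200, 40), "<p>first</p>")])` returns a page in which the block with the smaller top coordinate appears first in the markup.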

Description

Technical field

[0001] The invention relates to a digital reconstruction system and method for printed text layout.

Background art

[0002] With the rapid development of big data and artificial intelligence technology, large quantities of printed text materials need to be digitized in order to build data sets for retrieval systems and machine learning. However, the prior art offers no fully automatic method or system for digitizing text layout images; the work can only be done manually or semi-automatically.

[0003] Understanding and recognizing the content of text layout images provides the data source for many artificial intelligence technologies. It is also a necessary step toward the digital preservation of documents and books, and has a wide range of applications. There are already a large number of open source or paid OCR (Optical Character Recognition) text recognition systems in the prior art. These systems c...


Application Information

IPC(8): G06V30/412, G06V30/413, G06V30/414, G06V20/62, G06V10/28, G06V10/764, G06V30/19, G06V30/10, G06V10/82, G06K9/62, G06N3/04, G06N3/08, G06F40/151
CPC: G06N3/08, G06F40/151, G06N3/045, G06F18/241
Inventor: 马尽文
Owner: PEKING UNIV