Chinese printing style formula identification method

A formula recognition and printing technology, applied in the field of recognition, can solve problems such as inability to recognize mathematical formulas

Inactive Publication Date: 2008-03-26
HARBIN ENG UNIV
Cites: 0; Cited by: 58

AI Technical Summary

Problems solved by technology

Pure one-dimensional character recognition is now quite mature, and well-established recognition systems (such as Ziguang and Hanwang) achieve high recognition rates, but these systems cannot recognize the mathematical formulas in a document.
Mathematical formulas, with their two-dimensional structure, have therefore become a bottleneck restricting the development of OCR technology.


Examples

Embodiment Construction

[0053] The present invention is described in more detail below with reference to the accompanying drawings:

[0054] The purpose of the present invention is to overcome the deficiencies of existing OCR systems by providing a technique for recognizing printed mathematical formulas, supported by layout analysis and Chinese character recognition. It comprises three modules: layout analysis, Chinese character recognition, and mathematical formula recognition. Layout analysis and Chinese character recognition serve as pre-processing for formula recognition. Only with these two stages can the formula recognition module correctly locate and recognize formulas, so the three modules are inseparable.
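
As an illustration of how character recognition can help locate formulas, the sketch below groups consecutive rejected characters in the same text row into candidate formula regions, following the step described in the abstract. It is a minimal sketch, not the patent's implementation: `CharResult`, `Region`, and `locate_formula_regions` are hypothetical names, and the box coordinates are assumed to come from an earlier segmentation step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharResult:
    """One segmented character box (hypothetical structure)."""
    row: int        # index of the text row the box belongs to
    left: int       # left pixel column of the box
    right: int      # right pixel column of the box (exclusive)
    rejected: bool  # True if the Chinese character classifier rejected it


@dataclass
class Region:
    """A candidate formula region inside one text row."""
    row: int
    left: int
    right: int


def locate_formula_regions(chars: List[CharResult]) -> List[Region]:
    """Merge runs of consecutive rejected boxes in the same row."""
    regions: List[Region] = []
    current: Optional[Region] = None
    # Walk the boxes in reading order: by row, then left to right.
    for c in sorted(chars, key=lambda c: (c.row, c.left)):
        if not c.rejected:
            current = None  # an accepted character ends the current run
        elif current is not None and current.row == c.row:
            current.right = max(current.right, c.right)  # extend the run
        else:
            current = Region(c.row, c.left, c.right)  # start a new run
            regions.append(current)
    return regions
```

Grouping by reading order rather than by pixel distance avoids introducing a gap threshold that the source does not specify; a real system might need one.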

[0055] 1. Document layout analysis

[0056] Layout analysis is one of the pre-processing techniques for text recognition. It uses image processing, artificial intelligence and other technologies to complete the segmentation and attribute l...
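
Projection profiles are a common building block for this kind of segmentation, and the abstract names a projection method as part of the layout analysis. The following is a minimal sketch under stated assumptions: the page is already binarized into rows of 0/1 pixels, and the helper names and the `min_gap` parameter are illustrative rather than taken from the patent. It does not implement the bottom-up algorithm that the patent combines with projection.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized page: 1 = ink, 0 = background


def horizontal_projection(binary: Image) -> List[int]:
    """Count ink pixels in each pixel row."""
    return [sum(row) for row in binary]


def vertical_projection(binary: Image, top: int, bottom: int) -> List[int]:
    """Count ink pixels in each pixel column, restricted to rows top..bottom-1."""
    width = len(binary[0]) if binary else 0
    return [sum(binary[y][x] for y in range(top, bottom)) for x in range(width)]


def split_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans (end exclusive) of positions that contain ink.

    A span is closed only after `min_gap` consecutive empty positions,
    so narrow gaps inside a block do not split it.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The profile ended inside a span; drop any trailing empty positions.
        runs.append((start, len(profile) - gap))
    return runs


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
text_rows = split_runs(horizontal_projection(page))
first_row_columns = split_runs(vertical_projection(page, *text_rows[0]))
```

Here `text_rows` holds the vertical extent of each row of ink, and `first_row_columns` holds the horizontal spans found inside the first of those rows.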


Abstract

The invention provides a method for recognizing formulas in printed Chinese documents. It comprises three modules: layout analysis, Chinese character recognition, and mathematical formula recognition. The layout analysis module binarizes the BMP image, segments it into text blocks, image blocks, and table blocks using a projection method combined with a bottom-up layout analysis algorithm, and preserves the image and table blocks. The Chinese character recognition module performs a preliminary merging of rows within each text block, selects segmentation parameters, extracts features, recognizes the Chinese characters, records the rejected results, and combines adjacent rejected results in the same row to locate formula regions. The mathematical formula recognition module extracts and segments the formula characters in the rejected regions, merges some of the characters, and recognizes them. Finally, it obtains the relationships between characters through structural analysis of the formula characters and outputs the final one-dimensional character string. Tests show that the method achieves a notable recognition effect.
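
The final step, turning a two-dimensional arrangement of recognized symbols into a one-dimensional string, can be illustrated with a small example. The source does not say how the structural analysis is performed or what the output notation looks like, so everything here is an assumption: the `Symbol` structure, the `shift_ratio` threshold, and the LaTeX-like `^{}` / `_{}` output are hypothetical, and only superscripts and subscripts are handled.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Symbol:
    """A recognized formula character with its bounding box (hypothetical)."""
    text: str
    left: int
    top: int     # y grows downward, as in image coordinates
    bottom: int


def linearize(symbols: List[Symbol], shift_ratio: float = 0.3) -> str:
    """Emit a one-line string, wrapping raised symbols in ^{} and lowered ones in _{}.

    A symbol counts as raised or lowered when its vertical center is offset
    from the last baseline symbol by more than shift_ratio times that
    symbol's height. This is an assumed heuristic, not the patent's rule.
    """
    tagged: List[Tuple[str, str]] = []  # (mode, text) with mode "", "^" or "_"
    base = None
    for s in sorted(symbols, key=lambda s: s.left):
        if base is None:
            base = s
            tagged.append(("", s.text))
            continue
        base_center = (base.top + base.bottom) / 2
        threshold = shift_ratio * (base.bottom - base.top)
        offset = (s.top + s.bottom) / 2 - base_center
        if offset < -threshold:
            tagged.append(("^", s.text))
        elif offset > threshold:
            tagged.append(("_", s.text))
        else:
            base = s  # the symbol sits on the baseline and becomes the new reference
            tagged.append(("", s.text))
    parts = []
    # Group consecutive symbols with the same mode so "23" raised gives ^{23}.
    for mode, group in groupby(tagged, key=lambda pair: pair[0]):
        text = "".join(t for _, t in group)
        parts.append(text if mode == "" else mode + "{" + text + "}")
    return "".join(parts)


example = [
    Symbol("x", left=0, top=10, bottom=30),
    Symbol("2", left=12, top=2, bottom=14),
    Symbol("+", left=26, top=15, bottom=25),
    Symbol("y", left=40, top=10, bottom=30),
    Symbol("i", left=52, top=24, bottom=36),
]
result = linearize(example)  # "x^{2}+y_{i}"
```

A heuristic this simple breaks down for fractions, radicals, and short baseline symbols such as a minus sign, which is presumably why the patent treats structural analysis as a separate stage.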

Description

(1) Technical field

[0001] The invention relates to a recognition method, in particular to a method for recognizing the content of printed Chinese documents, especially printed mathematical formulas.

(2) Background technology

[0002] In 1929, Tauschek obtained a patent for Optical Character Recognition (OCR). Because OCR is easy for people to accept and master, it has increasingly become a focus of research, together with speech recognition and behavior recognition. After nearly a century of development, OCR has become one of the most active research topics in the field of pattern recognition. Pure one-dimensional character recognition is now quite mature, and well-established recognition systems (such as Ziguang and Hanwang) achieve high recognition rates, but these systems cannot recognize mathematical formulas in documents. Therefore, this two-dimensional structural mathematical formula has...


Application Information

IPC(8): G06K9/00, G06K9/20, G06K9/68
Inventors: 王科俊, 李永华, 冯伟兴, 刘维平, 陈卉, 付斌, 唐墨
Owner: HARBIN ENG UNIV