Correction method and apparatus for OCR result

A technology for correcting recognition results, applied in character and pattern recognition, instruments, calculations, etc., that addresses the high cost of manual proofreading and the low degree of automation.

Inactive Publication Date: 2017-09-29
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method and device for correcting OCR recognition results. It addresses two problems in the prior art, where OCR results with low recognition accuracy are corrected by hand: a low degree of automation and the high cost of manual correction.

Examples

Embodiment Construction

[0022] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0023] Refer to Figure 1, which shows a flow chart of the steps of an embodiment of the method of the present invention for correcting an OCR recognition result. The method may specifically include the following steps:

[0024] Step 101: use OCR technology to recognize the characters to be recognized in the target area of the paper document, and obtain the initial recognition result string str;

[0025] Here, OCR technology can be used to recognize the characters in the region of a paper document (such as an invoice) where characters need to be recognized (i.e., the target region), so as to obtain the initial recognition result string str.

[0026] Step 102, according to the attribute of the character to be recognized in the...

Abstract

The invention provides a method and apparatus for correcting an optical character recognition (OCR) result. The method comprises: using OCR to recognize the characters to be recognized in a target region of a paper document, obtaining an initial recognition result string str; constructing, according to an attribute of the characters to be recognized and a preset character range for the value of that attribute, a plurality of candidate strings str_i within the preset range to form a candidate string set; calculating the minimum edit distance d_min between str and each candidate string str_i in the set; calculating, on the basis of these minimum edit distances, a similarity value S_i between str and each str_i; and outputting the candidate string str_j with the maximum similarity S_max as the correction result for str. The OCR result is thereby corrected automatically, and the cost of manual correction is reduced.
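The steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the excerpt does not give the similarity formula, so the sketch assumes S_i = 1 - d_min / max(len(str), len(str_i)); the function names and the currency-code candidate set are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum edit (Levenshtein) distance, computed with a two-row DP table."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(ocr_str: str, candidate: str) -> float:
    """ASSUMED formula: 1 - d_min / length of the longer string."""
    longest = max(len(ocr_str), len(candidate))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(ocr_str, candidate) / longest


def correct(ocr_str: str, candidates: list[str]) -> str:
    """Return the candidate with the maximum similarity to the OCR output."""
    if not candidates:
        raise ValueError("candidate set must not be empty")
    return max(candidates, key=lambda c: similarity(ocr_str, c))


# Hypothetical attribute: a currency field whose preset value range
# is a small, known set of codes.
currency_codes = ["CNY", "USD", "EUR"]
print(correct("U5D", currency_codes))
```

On a tie, `max` returns the first candidate with the highest similarity, so the order of the candidate set matters in that case; a production system would probably want an explicit tie-breaking rule or a minimum-similarity threshold.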

Description

Technical field

[0001] The invention relates to the technical field of character proofreading, in particular to a method and device for correcting OCR recognition results.

Background

[0002] Optical character recognition (OCR) refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate those shapes into computer text. That is, for printed characters, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that word-processing software can further edit and process.

[0003] Because optical character recognition often needs to recognize many characters, even if the recognition accuracy of a single character is...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/20, G06K9/62
CPC: G06V10/22, G06F18/22
Inventor: 李健, 徐亮, 伍更新, 张连毅, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD