
An OCR font-based similar character recognition method

A text and character recognition technique from the computer field that addresses reduced recognition accuracy, inconsistent recognition results, and low precision. It improves recognition accuracy, avoids mutual interference between adjacent characters, and improves recognition efficiency.

Active Publication Date: 2019-03-08
中电万维信息技术有限责任公司

AI Technical Summary

Problems solved by technology

OCR achieves a relatively good recognition rate for general characters, but Chinese characters, with their rich structures and many fonts, remain technically difficult. Similar-looking characters, such as (午, gan, gan) and (run, bubble, cannon), are recognized with low efficiency and low accuracy.
In addition, existing technology cannot distinguish characters that share a glyph but are set in different fonts. Such characters are easily misrecognized, repeated recognition runs give different results, and manual intervention is sometimes needed to correct errors, all of which greatly reduces recognition accuracy.



Embodiment Construction

[0025] A method for recognizing similar characters based on OCR fonts, comprising the following steps:

[0026] A. Original OCR image preprocessing

[0027] Correct the skew of slanted characters, remove noise from the image, adjust contrast and apply gamma correction, and convert the result into a grayscale image;
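Two of the preprocessing operations named in step A can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the BT.601 luma weights, the power-law form of the gamma correction, and the order of the two operations are all assumptions, and skew correction and noise removal are not shown.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) into rows of grayscale ints.

    Assumption: ITU-R BT.601 luma weights; the source does not name a formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def gamma_correct(gray_image, gamma):
    """Apply out = 255 * (in / 255) ** gamma to every pixel.

    Assumption: a plain power-law curve. gamma < 1 brightens, gamma > 1 darkens.
    """
    # Precompute the curve once, since only 256 input values are possible.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in gray_image]


if __name__ == "__main__":
    rgb = [[(255, 255, 255), (0, 0, 0)], [(64, 64, 64), (200, 30, 30)]]
    gray = to_grayscale(rgb)
    print(gamma_correct(gray, 0.5))
```

A lookup table keeps the gamma step cheap, which matters when every scanned page is passed through it before detection.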

[0028] B. Image text detection

[0029] Character pixel features are extracted from the preprocessed grayscale image with a CNN and converted into a one-hot-encoded feature vector, which the character recognition module uses as the basis for recognition;

[0030] C. Recognition calculation

[0031] Use different fonts of the standard font as training samples n, with each font recorded as n_1, n_2, ...; calculate the Euclidean distance for each font of the training sample, D_n1, D_n2, ... The character rec...
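The distance computation in step C can be illustrated with a short sketch. The source text is cut off before it states how the distances are used, so the decision rule here (pick the nearest font sample, reject it if the distance exceeds a threshold) is an assumption loosely based on the abstract's mention of comparing multiple samples and adding a threshold filter. The function names, the font labels, and the threshold value are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_font(feature, font_samples, threshold):
    """Return (font_name, distance) for the nearest reference vector.

    font_samples maps a font label (n_1, n_2, ... in the text) to its
    reference feature vector. If even the nearest sample is farther than
    `threshold`, return (None, distance) so the caller can treat the
    character as unrecognized. The rejection rule is an assumption.
    """
    distances = {
        name: euclidean_distance(feature, reference)
        for name, reference in font_samples.items()
    }
    best = min(distances, key=distances.get)
    if distances[best] > threshold:
        return None, distances[best]
    return best, distances[best]


if __name__ == "__main__":
    samples = {"font_a": [0.0, 0.0], "font_b": [3.0, 4.0]}  # hypothetical
    print(closest_font([2.5, 4.0], samples, threshold=1.0))
```

Returning the distance alongside the label lets a caller log borderline matches instead of silently accepting them.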


Abstract

The present invention relates to the field of computer technology, in particular to pattern recognition and deep learning, and more particularly to an OCR font-based similar character recognition method. By changing the traditional font recognition method, both the character and its font can be recognized. Comparing multiple samples and adding a threshold filter greatly improves the accuracy of text recognition and also allows the font of each character to be recognized effectively. The method is especially suitable for recognizing characters with similar glyphs and similar fonts, achieving accurate recognition of both the character and its font. Each character is cut to 96*96 pixels by horizontal and vertical segmentation, which facilitates the extraction of pixel feature information, avoids mutual interference between adjacent characters, and effectively improves recognition efficiency. The designers of the invention cut each character to 96*96 pixels in various pictures, such as books, newspapers, clothes, and screenshots, to extract the character pixel feature information, and the extraction rate is close to 100%.
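The horizontal and vertical segmentation mentioned above can be sketched with projection profiles. This is a rough illustration under stated assumptions: the input is already binarized (1 for ink, 0 for background), text lines and characters are separated by fully blank rows or columns, and each cell is scaled to 96*96 with nearest-neighbor sampling. All function names are hypothetical, and the source does not say how the patent performs the cut or the scaling.

```python
def runs(profile):
    """Return (start, end) index pairs, end exclusive, of non-zero stretches."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Return (top, bottom, left, right) boxes, one per character cell.

    Rows are split first (horizontal projection finds text lines), then
    each line is split by columns (vertical projection finds characters).
    """
    boxes = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


def crop(binary, box):
    """Cut one box out of the image."""
    top, bottom, left, right = box
    return [row[left:right] for row in binary[top:bottom]]


def resize_nearest(cell, size=96):
    """Scale a cell to size x size pixels with nearest-neighbor sampling."""
    height, width = len(cell), len(cell[0])
    return [
        [cell[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
    ]
    for box in segment_characters(image):
        cell = resize_nearest(crop(image, box))
        print(box, len(cell), len(cell[0]))
```

In this simple version a blank column inside a character splits it into two cells, so a practical segmenter would need an extra merging step for characters with separated components.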

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to the field of pattern recognition and deep learning, and more specifically to a method for recognizing similar characters based on OCR fonts.

Background

[0002] Optical Character Recognition (OCR) converts images of text printed on paper into text files by combining optical technology and computer technology. OCR can be used for the automatic scanning and long-term storage of bank notes, large volumes of documents, archives, and tax and other bills.

[0003] OCR is usually measured by recognition rate, recognition speed, layout understanding, and degree of layout reconstruction. It achieves a relatively good recognition rate for general characters, but Chinese characters, with their rich structures and many fonts, remain technically difficult, especially similar-looking characters such as (午, gan, gan), (run, bubbl...


Application Information

IPC(8): G06K9/20, G06K9/34, G06K9/62
CPC: G06V10/22, G06V10/267, G06F18/22
Inventor: 席敬焦勇伏虎
Owner: 中电万维信息技术有限责任公司