
Word bank-based OCR semantic correction method and system, medium, equipment and terminal

A thesaurus-based correction method applied in the field of semantic networks. It addresses the difficulty of correcting misrecognized words, the difficulty of implementing existing approaches, and the low efficiency of blind, untargeted error correction. It achieves efficient error correction and ensures semantic correctness.

Pending Publication Date: 2021-11-26
深圳市网联安瑞网络科技有限公司
Cites: 7 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0008] (1) Common word-matching techniques must match every word in the recognized sentence. This blind, untargeted error correction is inefficient;
[0009] (2) Semantic correction methods based on natural language processing are good at detecting wrong words, but they have difficulty predicting the correct word and are prone to miscorrection;
[0010] (3) Semantic correction methods based on natural language processing require extensive annotation and prediction, which makes them difficult to implement.

Detailed Description of Embodiments

[0051] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as recited in the appended claims.

[0052] The thesaurus-based OCR semantic correction method provided by the disclosed embodiments of the present invention includes:

[0053] Use the confidence of each character recognition result (that is, the probability value output by softmax for the predicted character; the same applies below) to help locate and identify the positions of erroneous characters, and then combin...
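A minimal sketch of the confidence step, assuming the recognizer emits one logit vector per character position over a fixed character set. A position's confidence is taken to be the softmax probability of its best-scoring character, as described in [0053]. The 0.4 threshold and the top-5 candidate list come from the abstract; the function names `softmax`, `top_k`, and `locate_suspects` and the toy alphabet are hypothetical illustrations, not the patent's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

CONF_THRESHOLD = 0.4  # threshold value given in the abstract
TOP_K = 5             # the abstract keeps the first five results per character


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the logits for one character position."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: Sequence[float], alphabet: Sequence[str],
          k: int = TOP_K) -> List[Tuple[str, float]]:
    """Return the k most probable (character, probability) pairs, best first."""
    ranked = sorted(zip(alphabet, probs), key=lambda cp: cp[1], reverse=True)
    return ranked[:k]


def locate_suspects(logits_per_pos: Sequence[Sequence[float]],
                    alphabet: Sequence[str],
                    threshold: float = CONF_THRESHOLD) -> Dict[int, List[Tuple[str, float]]]:
    """Map each low-confidence position to its top-k candidates.

    Confidence is the softmax probability of the best-scoring character;
    positions whose confidence is at or above the threshold are skipped.
    """
    suspects: Dict[int, List[Tuple[str, float]]] = {}
    for pos, logits in enumerate(logits_per_pos):
        candidates = top_k(softmax(logits), alphabet)
        if candidates[0][1] < threshold:
            suspects[pos] = candidates
    return suspects


if __name__ == "__main__":
    alphabet = list("abcdefg")  # toy character set (hypothetical)
    logits = [
        [10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # confident position
        [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4],   # flat, low-confidence position
    ]
    for pos, cands in locate_suspects(logits, alphabet).items():
        print(pos, [(c, round(p, 3)) for c, p in cands])
```

In this toy run only position 1 is reported, because its flat distribution gives a best probability of roughly 0.19. Its five candidates are `a` through `e`. The confident position 0 is never passed to the matching stage, which is what makes the correction targeted rather than blind.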

Abstract

The invention discloses a word bank-based OCR semantic correction method and system, a medium, a device, and a terminal, and relates to the technical field of semantic networks. The method comprises the following steps. First, all characters whose recognition confidence is below a threshold of 0.4 are filtered out, and the top five recognition results for each such character are recorded. Next, suspect words are extracted from the five candidate results of these low-confidence characters, which yields five groups of suspect words; each group may contain one or more suspect words. The five groups of suspect words are then matched against the word bank in descending order of confidence, and a matching distance is computed for each match, using edit distance as the metric. Finally, the result with the smallest matching distance is output. By combining the top-5 character recognition results with the word bank, the method accurately locates and identifies erroneous characters, corrects them efficiently in a targeted manner, and ensures the semantic correctness of the recognition result.
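The matching step in the abstract can be sketched as follows. This minimal illustration handles a single suspect word and rests on several assumptions that the excerpt does not confirm:

- Suspect positions arrive as a mapping from character index to the top-5 ranked (character, probability) candidates of a low-confidence character.
- Group i is formed by placing each suspect position's i-th candidate into the recognized word.
- The word bank is a plain list of strings.
- The matching distance is the standard Levenshtein edit distance.

The names `levenshtein`, `build_groups`, and `correct_word` are hypothetical. The sketch omits word segmentation, which Chinese text would require.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input format: character position -> ranked (char, prob) candidates.
Suspects = Dict[int, List[Tuple[str, float]]]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def build_groups(word: str, suspects: Suspects, k: int = 5) -> List[str]:
    """Group i substitutes the i-th ranked candidate at every suspect position."""
    groups = []
    for i in range(k):
        chars = list(word)
        for pos, cands in suspects.items():
            if i < len(cands):
                chars[pos] = cands[i][0]
        groups.append("".join(chars))
    return groups


def correct_word(word: str, suspects: Suspects,
                 lexicon: Sequence[str]) -> Tuple[str, Optional[int]]:
    """Return the word-bank entry with the smallest edit distance to any group."""
    best_word, best_dist = word, None
    for candidate in build_groups(word, suspects):  # highest-confidence group first
        for entry in lexicon:
            d = levenshtein(candidate, entry)
            # Strict '<' means ties keep the match found first, i.e. the
            # one from the higher-confidence group.
            if best_dist is None or d < best_dist:
                best_word, best_dist = entry, d
    return best_word, best_dist


if __name__ == "__main__":
    suspects = {1: [("c", 0.35), ("e", 0.30), ("o", 0.15), ("a", 0.10), ("u", 0.05)]}
    print(correct_word("hcllo", suspects, ["hello", "help", "hollow"]))
```

In the usage example, the misread word `hcllo` has one suspect position. The second-ranked candidate, `e`, produces `hello`, which matches the word bank at distance 0, so `("hello", 0)` is returned. Iterating groups from high to low confidence and breaking ties in favor of the earlier group is one reasonable reading of the abstract's ordering, not a detail the excerpt specifies.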

Description

Technical Field

[0001] The invention belongs to the technical field of semantic networks, and in particular relates to a thesaurus-based OCR semantic correction method and system, a program storage medium for receiving user input, a computer device, and an information data processing terminal. It can be applied to artificial intelligence, deep learning, and computer vision.

Background Art

[0002] OCR (Optical Character Recognition) technology is essentially mature for recognizing well-printed text, and some software on the market achieves a very high single-character recognition rate on such text. However, for poorly printed text or ambiguous handwriting, the single-character recognition rate of OCR drops significantly, so post-processing must be relied on to improve the overall recognition rate of the full text. At present, post-processing commonly uses two effective methods. One ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/335, G06F40/30
CPC: G06F16/3344, G06F16/335, G06F40/30
Inventors: 廖伟, 石珺, 李志鹏, 郭认飞
Owner: 深圳市网联安瑞网络科技有限公司