Method and system for detecting Chinese keywords in document images based on word matching

The invention concerns keyword detection in document images and is applied in the field of text image recognition. It addresses the insufficient accuracy and robustness of Chinese keyword recognition caused by unstable document image quality and the diversity of Chinese character arrangements. Its stated effects are a reduced risk of omission, improved accuracy, and improved completeness of the detection results.

Active Publication Date: 2021-08-10
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0004] To solve the above problem in the prior art, namely the insufficient accuracy and robustness of Chinese keyword recognition caused by unstable document image quality and the diversity of Chinese character arrangements, a first aspect of the present invention proposes a method for detecting Chinese keywords in document images based on word matching. The method includes the following steps:

Examples

Detailed Description of the Embodiments

[0054] To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0055] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it. It should also be noted that, for convenience of description, only the parts related to the invention...

Abstract

The invention belongs to the technical field of text image recognition and specifically relates to a method and system for detecting Chinese keywords in document images based on word matching. It aims to solve the insufficient accuracy and robustness of Chinese keyword recognition caused by unstable document image quality and the diversity of Chinese character arrangements. The method includes: binarizing the document image to obtain a first image; performing character detection to obtain a first candidate character set; filtering the first candidate character set to obtain a second candidate character set and a first noise candidate character set; selecting characters from the first noise candidate character set and adding them to the second candidate character set to obtain a third candidate character set; combining candidate characters to obtain a first candidate word set; performing secondary detection of missing characters to obtain a second candidate word set; and selecting the final keyword detection result based on a cost function. The invention improves the accuracy of document keyword recognition and is highly robust.
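The later stages of the method summarized in the abstract (filtering, recovery from the noise set, character combination, and cost-based selection) can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not define the filtering rule, the combination rule, or the cost function, so the confidence threshold `min_score`, the neighbourhood test `max_gap`, the left-to-right layout assumption, and the cost terms below are all hypothetical choices made for illustration. Binarization and character detection are assumed to have already produced the list of detections, and the secondary detection of missing characters is represented only by a `None` placeholder with a fixed penalty.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CharCandidate:
    char: str     # recognized character label
    x: float      # box centre, horizontal
    y: float      # box centre, vertical
    size: float   # box height, used as a length scale
    score: float  # recognition confidence in [0, 1]


def split_candidates(cands, keyword, min_score=0.6):
    """Split detections into a kept set and a noise set (assumed rule)."""
    kept, noise = [], []
    for c in cands:
        ok = c.char in keyword and c.score >= min_score
        (kept if ok else noise).append(c)
    return kept, noise


def rescue_from_noise(kept, noise, keyword, max_gap=2.0):
    """Re-admit noise characters that belong to the keyword and lie
    close to an already kept character (assumed rule)."""
    rescued = [
        n for n in noise
        if n.char in keyword and any(
            abs(n.x - k.x) <= max_gap * k.size
            and abs(n.y - k.y) <= max_gap * k.size
            for k in kept
        )
    ]
    return kept + rescued


def combine(cands, keyword):
    """Enumerate candidate words, one candidate per keyword position.
    None marks a position with no detected character."""
    per_pos = [[c for c in cands if c.char == ch] or [None] for ch in keyword]
    return [w for w in product(*per_pos) if any(c is not None for c in w)]


def cost(word, missing_penalty=1.0):
    """Hypothetical cost: low confidence, missing characters, baseline
    drift and irregular spacing all increase it (horizontal text assumed)."""
    total, prev, steps = 0.0, None, 0
    for c in word:
        steps += 1
        if c is None:
            total += missing_penalty
            continue
        total += 1.0 - c.score
        if prev is not None:
            total += abs(c.y - prev.y) / prev.size
            total += abs((c.x - prev.x) - steps * prev.size) / prev.size
        prev, steps = c, 0
    return total


def detect_keyword(cands, keyword):
    """Run the sketched stages and return the lowest-cost candidate word."""
    kept, noise = split_candidates(cands, keyword)
    third = rescue_from_noise(kept, noise, keyword)
    words = combine(third, keyword)
    return min(words, key=cost) if words else None


# Made-up detections for the keyword "发票" (invoice).
detections = [
    CharCandidate("发", 10, 10, 10, 0.90),
    CharCandidate("票", 22, 10, 10, 0.50),    # weak, but adjacent to "发"
    CharCandidate("票", 200, 200, 10, 0.95),  # confident, but far away
    CharCandidate("日", 40, 10, 10, 0.90),    # not part of the keyword
]
best = detect_keyword(detections, "发票")
```

In this example the weak "票" next to "发" is first rejected by the filter, then recovered from the noise set, and the cost function prefers it over the distant high-confidence "票" because the layout terms dominate. This is the kind of omission the recovery step is meant to reduce, under the assumptions stated above.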

Description

Technical field

[0001] The invention belongs to the technical field of text image recognition, and in particular relates to a method and a system for detecting Chinese keywords in document images based on word matching.

Background

[0002] Advances in science and technology have transformed the way information is processed. To support editing, searching and data analysis, it is important to enter the text of paper materials into a computer quickly, and OCR (Optical Character Recognition) technology arose to meet this need. Document images are widespread in fields such as transportation, finance, logistics, taxation, and administrative management. With the rapid spread of smart terminals, automatic document recognition technology offers substantial economic benefit and broad social value.

[0003] However, it is difficult for general OCR technology to provide structured...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06K9/34, G06K9/46
CPC: G06V30/40, G06V30/153, G06V10/44
Inventors: 王春恒, 贾馥溪, 赵晋媛, 肖柏华
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI