Content-based retrieval method for Chinese documents in image format
An image-format document retrieval technology, applied in the field of information processing, that addresses the ineffective handling of image-format documents with degraded characters and provides a retrieval method that is simple, low-cost and fast.
Examples
Embodiment 1
[0024] Embodiment 1: This embodiment is described with reference to Figures 1 and 2. The content-based retrieval method for Chinese documents in image format of this embodiment comprises the following steps:
[0025] Step 1: Obtain the image-format Chinese documents to be retrieved and perform character segmentation on each of them, obtaining the single-character images in each image-format Chinese document;
[0026] Step 2: For each single-character image obtained, extract its character image feature vector;
[0027] Step 3: Based on the principle of locality-sensitive hashing, construct a hash function h, transform the character image feature vector of each character image into a corresponding pseudocode, and build a character index database from the pseudocodes. The pseudocode consists of L 16-bit intege...
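The three steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: segmentation is done by splitting at blank pixel columns (which would over-split many real Chinese characters), the feature vector is a mean-centered zoning density, and the hash is a random-hyperplane sign hash standing in for the polytope-based function h. The names `segment_characters`, `zone_features`, `make_hash_family`, `pseudocode`, `build_index` and `lookup` are hypothetical, and looking up each of the L integers in its own table is also an assumption.

```python
import random
from collections import defaultdict

def segment_characters(image):
    """Step 1 (simplified): split a binary text-line image at blank columns."""
    width = len(image[0])
    ink = [any(row[x] for row in image) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink + [False]):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    return [[row[a:b] for row in image] for a, b in spans]

def zone_features(char, grid=4):
    """Step 2 (assumed feature): mean-centered ink density per grid zone."""
    h, w = len(char), len(char[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [char[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    mean = sum(feats) / len(feats)
    return [f - mean for f in feats]

def make_hash_family(dim, L, bits=16, seed=0):
    """L groups of `bits` random hyperplanes (stand-in for the hash function h)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(L)]

def pseudocode(feature, family):
    """Step 3: map a feature vector to a tuple of L 16-bit integers."""
    code = []
    for planes in family:
        value = 0
        for plane in planes:
            dot = sum(p * x for p, x in zip(plane, feature))
            value = (value << 1) | (1 if dot >= 0 else 0)
        code.append(value)
    return tuple(code)

def build_index(documents, family, grid=4):
    """Character index: one table per pseudocode integer -> (doc_id, position)."""
    tables = [defaultdict(list) for _ in family]
    for doc_id, image in documents.items():
        for pos, char in enumerate(segment_characters(image)):
            code = pseudocode(zone_features(char, grid), family)
            for table, key in zip(tables, code):
                table[key].append((doc_id, pos))
    return tables

def lookup(char, tables, family, grid=4):
    """Return every indexed occurrence sharing at least one integer with `char`."""
    code = pseudocode(zone_features(char, grid), family)
    hits = set()
    for table, key in zip(tables, code):
        hits.update(table.get(key, ()))
    return hits
```

A query character image is hashed the same way as the indexed ones, so an identical character always lands in the same buckets, while a slightly degraded copy is likely, but not guaranteed, to share at least one of its L integers.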
Embodiment 2
[0040] Embodiment 2: This embodiment further describes Embodiment 1. In step 3 of Embodiment 1, the hash function h is constructed as follows: first define the vertex set of a regular polytope in m-dimensional space, then define a rotation matrix A, and finally establish the hash function, which takes a unit vector as its argument and whose values form the mapped result set.
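The construction above can be illustrated with one common polytope-based locality-sensitive hash. The sketch below is not the patent's definition, and everything specific in it is an assumption: the regular polytope is taken to be the cross-polytope with vertices ±e_i, the matrix A is a random orthogonal matrix built by Gram-Schmidt (it may include a reflection, which does not affect the hash), and the hash value is the index of the rotated vertex nearest to the normalized input. The names `random_rotation` and `polytope_hash` are hypothetical.

```python
import math
import random

def random_rotation(m, seed=0):
    """Random orthonormal m x m matrix; row i is the image A*e_i of basis vector e_i."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Remove the components along the rows already accepted (Gram-Schmidt).
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def polytope_hash(v, axes):
    """Index in [0, 2m) of the rotated cross-polytope vertex nearest to v.

    Vertex 2*i is +axes[i] and vertex 2*i + 1 is -axes[i]. The input must be
    nonzero; it is normalized to a unit vector before comparison.
    """
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]
    best_index, best_dot = 0, -2.0
    for i, axis in enumerate(axes):
        d = sum(a * b for a, b in zip(axis, u))
        for sign, idx in ((1.0, 2 * i), (-1.0, 2 * i + 1)):
            # For unit vectors, the largest dot product is the nearest vertex.
            if sign * d > best_dot:
                best_index, best_dot = idx, sign * d
    return best_index
```

Because nearby unit vectors usually share the same nearest vertex, similar character feature vectors tend to receive the same hash value. Using a different seed for each of the L integers would give L independent hash functions.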
Embodiment 3
[0041] Embodiment 3: This embodiment further describes Embodiment 1 or 2. In step 3 of Embodiment 1 or 2, the number L of 16-bit integers in the pseudocode ranges from 1 to 50.
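The effect of choosing L within this range can be estimated with a short calculation. The text does not state how the L integers are compared at query time, so the sketch below assumes the usual multi-table LSH rule: a stored character is returned when at least one of its L integers equals the query's, and the L hash functions are independent. The function name `hit_probability` and the per-integer collision probability `p` are hypothetical.

```python
def hit_probability(p, L):
    """Chance that at least one of L independent hash integers matches,
    when each one matches with probability p (assumed OR-combination)."""
    return 1.0 - (1.0 - p) ** L

# Compare the lower end, a middle value and the upper end of the stated range.
for L in (1, 10, 50):
    print(L, round(hit_probability(0.1, L), 3))
```

Under these assumptions, a larger L raises the chance of retrieving a degraded copy of a character, at the cost of a larger index and more candidates to check.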