A general OCR training data generation system and method based on machine learning
The invention relates to training data and machine learning technology applied in the field of text recognition. It addresses problems such as blurred text, poor contrast between text and background, and poor recognition performance, and achieves the effect of increasing the fitting ability.
Embodiment
[0043] As shown in Figure 1, a machine-learning-based method for generating general OCR training data comprises the following steps:
[0044] Text information generation: randomly extract 5-10 words from the corpus as the text information;
[0045] Font information generation: randomly select a font from the font library to generate the font information;
[0046] Background image selection and size processing: randomly extract a background image from the image library, and crop it according to the text information as rendered with the font information;
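The first three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function names, the `padding` parameter, drawing words with replacement, and choosing the crop position at random are all assumptions, and the rendered text size is taken as an input that a rendering library would supply. The returned box uses the (left, top, right, bottom) order that Pillow's `Image.crop` accepts.

```python
import random


def sample_text(corpus_words, rng, min_words=5, max_words=10):
    """Draw 5-10 words from the corpus (with replacement, an assumption)."""
    n = rng.randint(min_words, max_words)
    return " ".join(rng.choices(corpus_words, k=n))


def pick_font(font_library, rng):
    """Pick one font (for example a font file path) uniformly at random."""
    return rng.choice(font_library)


def crop_box_for_text(image_size, text_size, rng, padding=4):
    """Return a (left, top, right, bottom) box just large enough for the text.

    image_size and text_size are (width, height) in pixels. text_size is
    assumed to have been measured by rendering the text with the chosen font.
    The box position is random, which the source does not specify.
    """
    img_w, img_h = image_size
    box_w = text_size[0] + 2 * padding
    box_h = text_size[1] + 2 * padding
    if box_w > img_w or box_h > img_h:
        raise ValueError("background image is smaller than the rendered text")
    left = rng.randint(0, img_w - box_w)
    top = rng.randint(0, img_h - box_h)
    return (left, top, left + box_w, top + box_h)
```

Passing a seeded `random.Random` instance as `rng` makes a generation run reproducible, which helps when a particular synthetic sample needs to be regenerated for inspection.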
[0047] Text color selection: run a clustering algorithm on the RGB pixel values of the background image to find the cluster center; then randomly select 500 colors from the text color library, calculate the distance from each color to the RGB value of the background cluster center, and randomly select one of the 200 colors farthest from it as the text color;
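The text color selection step can be sketched in Python as follows. The source names neither the clustering algorithm nor the distance measure, so this sketch assumes plain k-means with k=2, takes the center of the largest cluster as the background color, and uses Euclidean distance in RGB space. All function and parameter names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dominant_cluster_center(pixels, k=2, iters=10, rng=None):
    """Plain k-means on RGB pixels; returns the center of the largest cluster.

    Assumes iters >= 1 and len(pixels) >= k.
    """
    rng = rng or random.Random()
    centers = rng.sample(pixels, k)  # initial centers drawn from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(
                    sum(m[c] for m in members) / len(members) for c in range(3)
                )
    sizes = [len(members) for members in clusters]
    return centers[sizes.index(max(sizes))]


def pick_text_color(background_pixels, color_library, rng=None,
                    n_candidates=500, n_farthest=200):
    """Pick a text color that contrasts with the background.

    Samples up to n_candidates colors from the library, ranks them by
    distance to the background cluster center, and returns a random one
    from the n_farthest most distant candidates.
    """
    rng = rng or random.Random()
    center = dominant_cluster_center(background_pixels, rng=rng)
    candidates = rng.sample(color_library, min(n_candidates, len(color_library)))
    # Ranking by squared distance gives the same order as ranking by distance.
    candidates.sort(key=lambda c: sq_dist(c, center), reverse=True)
    return rng.choice(candidates[:n_farthest])
```

Choosing at random among the 200 farthest candidates, rather than always taking the single farthest color, keeps the text legible against the background while still varying the text color across generated samples.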
...