
Scene text detection method based on end-to-end full convolutional neural network

A convolutional neural network and text detection technology, applied in the field of scene text detection based on an end-to-end fully convolutional neural network. It addresses the problem that existing methods cannot accurately express the geometric characteristics of text, and achieves good practical value.

Active Publication Date: 2018-07-17
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Traditional methods generally use a fixed receptive field to extract text feature representations and ignore the diversity of the spatial structure of text targets. Although these methods contain certain innovations, they cannot accurately express the geometric characteristics of text, which is critical in this task.


Examples


Embodiment

[0062] The implementation of this embodiment follows the method described above, so the specific steps are not repeated; only the results on the case data are shown below. The present invention is evaluated on two data sets with ground-truth labels, namely:

[0063] MSRA-TD500 dataset: This dataset contains 300 training images and 200 testing images.

[0064] ICDAR 2015 dataset: This dataset contains 1000 training images and 500 testing images.

[0065] In this embodiment, experiments are carried out on each data set; the images in the data sets are shown, for example, in Figure 2.

[0066] The main process of text detection is as follows:

[0067] 1) Extract multi-scale feature maps of the image through the base fully convolutional network;

[0068] 2) Fuse the feature maps at three scales to obtain the initial features;

[0069] 3) Use a single convolutional layer to predict the affine transformation matrix of each sample point on the feature map, and ...
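The three steps above can be sketched numerically. This is a minimal NumPy sketch, not the patent's implementation: the channel counts, strides, and feature-map sizes are illustrative assumptions, nearest-neighbour upsampling stands in for the network's upsampling, and a per-pixel linear map stands in for the 1x1 convolution that predicts the six affine parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multi-scale feature maps from a fully convolutional backbone
# (channel counts and strides are illustrative assumptions, not the patent's).
f1 = rng.standard_normal((128, 32, 32))   # stride-4 features
f2 = rng.standard_normal((256, 16, 16))   # stride-8 features
f3 = rng.standard_normal((512, 8, 8))     # stride-16 features

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling along the two spatial axes."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

# Step 2: fuse the three scales by upsampling the coarser maps to the
# finest resolution and concatenating along the channel axis.
fused = np.concatenate([f1, upsample_nn(f2, 2), upsample_nn(f3, 4)], axis=0)

# Step 3: a 1x1 convolution (here: a per-pixel linear map) predicts a
# 2x3 affine transformation matrix (6 numbers) at every sample point.
w = rng.standard_normal((6, fused.shape[0])) * 0.01
affine_params = np.einsum('oc,chw->ohw', w, fused)

print(fused.shape)          # (896, 32, 32)
print(affine_params.shape)  # (6, 32, 32)
```

In a real network the fusion would typically also include learned convolutions after concatenation; the sketch only shows how the shapes line up.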



Abstract

The present invention discloses a scene text detection method based on an end-to-end fully convolutional neural network, addressing the problem of locating multi-directional text in natural scene images. The method comprises the following steps: obtaining multiple image data sets for training scene text detection and defining the algorithm target; performing feature learning on the image with a fully convolutional feature extraction network; predicting an instance-level affine transformation matrix for each sample point on the feature map, and expressing the text features by sampling on a grid deformed according to the predicted affine transformation; classifying the feature vectors of candidate text, and jointly optimizing the model through coordinate regression and affine transformation regression; using the learned framework to detect the precise position of the text; and applying non-maximum suppression to the set of bounding boxes output by the network to obtain the final text detection result. The disclosed method is applied to scene text detection on real image data, and shows good effectiveness and robustness under complicated conditions such as multi-directional, multi-scale, multi-lingual, and distorted text.
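The last step of the abstract, non-maximum suppression over the network's output bounding boxes, can be illustrated with a minimal greedy sketch. Assumptions: axis-aligned boxes in (x1, y1, x2, y2) form and a hypothetical IoU threshold of 0.5. The patent's detector outputs oriented boxes, for which a rotated-IoU variant of the same greedy loop would be needed.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on axis-aligned boxes.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    Returns the indices of the boxes kept, highest score first.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between the top box and the rest.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too strongly.
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]
```

Here the second box overlaps the first with IoU about 0.68, so it is suppressed, while the distant third box is kept.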

Description

Technical field

[0001] The invention belongs to the field of computer vision, and in particular relates to a scene text detection method based on an end-to-end fully convolutional neural network.

Background technique

[0002] Scene text detection is defined as the problem of finding multi-directional, multi-scale, multi-lingual text region locations in natural scene images. In recent years, it has been widely used in computer vision tasks such as scene understanding and image retrieval. There are two key points in this task: the first is how to well model multi-directional and severely distorted text objects to generate effective feature expressions; the second is how to use an end-to-end network to directly output detection results. For the first point, the present invention believes that the key to feature expression of scene text is to accurately model its spatial geometric characteristics, and use affine transformation to encode its spatial structure to produce a more ac...
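The idea of encoding the spatial structure of text with an affine transformation can be illustrated by mapping a canonical sampling grid through a 2x3 affine matrix, as in spatial-transformer-style sampling. This is a minimal sketch under assumed conventions (a grid normalized to [-1, 1] in both axes); the `affine_grid` helper and its parameterization are illustrative, not the patent's.

```python
import numpy as np

def affine_grid(theta, h, w):
    """Map a canonical (h x w) grid in [-1, 1]^2 through a 2x3 affine matrix.

    theta: (2, 3) matrix [[a, b, tx], [c, d, ty]].
    Returns an (h, w, 2) array of transformed (x, y) sampling locations,
    which a bilinear sampler would then use to pool deformed text features.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, h),
                         np.linspace(-1, 1, w), indexing='ij')
    ones = np.ones_like(xs)
    grid = np.stack([xs, ys, ones], axis=-1)   # (h, w, 3) homogeneous coords
    return grid @ theta.T                      # (h, w, 2)

# The identity transform leaves the canonical grid unchanged.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
g = affine_grid(identity, 3, 3)
print(g[0, 0], g[2, 2])   # [-1. -1.] [1. 1.]

# A rotation-plus-translation matrix instead deforms the grid so that
# sampling follows an oriented text instance.
angle = np.pi / 6
rot = np.array([[np.cos(angle), -np.sin(angle), 0.2],
                [np.sin(angle),  np.cos(angle), 0.1]])
g_rot = affine_grid(rot, 3, 3)
```

Predicting one such matrix per sample point, as the method does, lets each candidate text instance be sampled along its own orientation and scale.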

Claims


Application Information

IPC(8): G06N3/04; G06F17/30
CPC: G06F16/355; G06N3/045
Inventor: 李玺, 王芳芳, 赵黎明
Owner ZHEJIANG UNIV