Image target detection method based on natural language semantics

A target detection method using natural language technology, applied to image target detection based on natural language semantics. It addresses three problems of existing approaches: they do not support end-to-end training, their efficiency and accuracy need improvement, and they cannot combine target recognition with natural language. The effect is improved detection efficiency and accuracy.

Status: Inactive; Publication Date: 2017-06-13

TSINGHUA UNIV

Cites: 2. Cited by: 46.


Problems solved by technology

[0003] The image target recognition task comprises four basic subtasks: generating the target candidate set, extracting features of the candidate targets, classifying the candidate targets, and correcting their positions. The Faster-RCNN model is a typical representative of traditional target recognition methods. It uses a deep convolutional neural network combined with an RPN (region proposal network) to solve these four subtasks. The RPN used to generate the target candidate set is itself a deep convolutional neural network, so the entire model can be trained end to end. Compared with earlier target recognition methods that solved these subtasks with separate techniques, Faster-RCNN greatly improves training efficiency and recognition accuracy. However, the objects it recognizes are still limited to predefined classes, and it cannot be combined with natural language for target recognition.

Existing methods for target detection combined with natural language, such as the SCRC model (Spatial Context Recurrent ConvNet), mostly generate the target candidate set with non-deep-learning methods such as selective search, and then use a convolutional neural network and a long short-term memory (LSTM) model to extract image and natural language features for target detection. The framework as a whole does not support end-to-end training, and its detection efficiency and accuracy need to be improved.


Embodiment Construction

[0018] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0019] As shown in Figure 1, the present invention includes the following steps:

[0020] 1. Train the shared convolutional neural network and RPN network parts of the Faster-RCNN module on the ImageNet dataset.

[0021] 2. Use the images in the ReferIt dataset, which are annotated with natural language descriptions of their targets, to train the LSTM model.

[0022] 3. For the trained model, given an image and a natural language phrase description of the target to be queried, the corresponding target is detected from the image.

[0023] Specifically, for the input image, the shared convolutional neural network is first used to extract the feature map of the image. The convolutional neural network consists of a series of convolution, activation-function, and pooling opera...
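The convolution, activation, and pooling sequence named in paragraph [0023] can be illustrated with a minimal pure-Python sketch. This is not the patent's network: the single 2x2 filter, the ReLU activation, the 2x2 max pooling, and the function names `conv2d`, `relu`, `max_pool`, and `conv_block` are all assumptions chosen for illustration, and a real shared network would stack many such layers with learned filters.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU activation (assumed; the text names no specific activation)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_block(image, kernel):
    """One convolution -> activation -> pooling stage."""
    return max_pool(relu(conv2d(image, kernel)))


# Hypothetical 5x5 single-channel image and hand-picked 2x2 filter.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]
feature_map = conv_block(image, kernel)
print(feature_map)
```

The 5x5 input shrinks to a 4x4 map after the valid convolution and to 2x2 after pooling, which is the kind of spatial reduction a feature map undergoes before it is passed to later stages.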


Abstract

The invention discloses an image target detection method based on natural language semantics. The input of the method is an image to be detected and a natural language phrase describing the target to be detected. The method comprises the following steps: a global feature map of the image is computed by a convolutional neural network; the global feature map is fed into an RPN network to compute a candidate target set; a regional feature map of each candidate target is extracted through an RoI pooling layer; the global feature map of the image, the regional feature map of the candidate target region, and its position information are used as context and combined with the word-vector representation of the query phrase as input to an LSTM module, which computes the conditional probability of generating the query phrase from the target region; and a detection result is returned according to that conditional probability. By fusing the LSTM natural language processing module into the Faster-RCNN framework, the method uses the shared computation of the Faster-RCNN framework and the image feature extraction strength of convolutional networks to improve the efficiency and accuracy of target detection based on natural language semantics.
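The final ranking step in the abstract can be sketched as follows, assuming the phrase probability factorizes by the chain rule, p(phrase | context) = product over t of p(w_t | w_1..w_{t-1}, context), and that the candidate with the highest probability is returned. Everything named here is hypothetical: `toy_next_word_probs` is a hand-written stand-in for the trained LSTM, the `tags` set stands in for the real context (global feature map, regional feature map, and position information), and the vocabulary and boxes are made up.

```python
import math

VOCAB = ["red", "car", "dog", "left"]  # hypothetical toy vocabulary


def toy_next_word_probs(prefix, context):
    """Stand-in for the LSTM: a distribution over VOCAB given the context.

    Words listed in context["tags"] get more weight. The prefix is ignored
    here, which a real recurrent model would not do.
    """
    weights = {w: (4.0 if w in context["tags"] else 1.0) for w in VOCAB}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def phrase_log_prob(words, context, next_word_probs):
    """Sum of log p(w_t | w_<t, context) over the query phrase."""
    total = 0.0
    for t, word in enumerate(words):
        probs = next_word_probs(words[:t], context)
        # Small floor avoids log(0) for out-of-vocabulary words.
        total += math.log(probs.get(word, 1e-12))
    return total


def detect(candidates, words, next_word_probs):
    """Return the candidate region that makes the query phrase most probable."""
    return max(
        candidates,
        key=lambda c: phrase_log_prob(words, c["context"], next_word_probs),
    )


# Two made-up candidate regions, as an RPN might propose.
candidates = [
    {"box": (10, 20, 110, 90), "context": {"tags": {"dog"}}},
    {"box": (150, 40, 300, 160), "context": {"tags": {"red", "car"}}},
]
query = ["red", "car"]
best = detect(candidates, query, toy_next_word_probs)
print(best["box"])
```

Summing log-probabilities rather than multiplying raw probabilities is a common design choice for numerical stability with longer phrases; the source text does not say which form the patent's implementation uses.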

Description

Technical field

[0001] The invention belongs to the technical field of image analysis and recognition, and in particular relates to an image target detection method based on natural language semantics.

Background

[0002] Image target recognition is one of the core tasks in computer vision research. In recent years, with the successful application of deep learning to images, research on target recognition has made breakthrough progress, and detection accuracy has improved greatly compared with traditional methods. It has also been applied commercially in some fields, such as Alibaba's facial recognition payment and road target recognition for intelligent traffic. However, the results of traditional target recognition methods are usually limited to predefined categories of objects, such as faces or cars, while the content of an image is far more than a set of independent objects. Th...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/46, G06N3/08

CPC: G06N3/084, G06V10/424, G06V2201/07

Inventor: 覃征, 叶树雄, 王国龙, 徐凯平, 黄凯, 李志鹏

Owner: TSINGHUA UNIV
