
Image-text retrieval system and method based on multi-angle self-attention mechanism

A retrieval system based on a self-attention mechanism, applied in the field of cross-modal retrieval, which addresses problems such as insufficiently detailed features and achieves improved retrieval performance.

Pending Publication Date: 2019-07-09
FUDAN UNIV

AI Technical Summary

Problems solved by technology

[0005] The present invention provides an image-text retrieval system based on multi-stage training and a multi-angle self-attention mechanism, in order to overcome the shortcomings of existing CNN+RNN models in image-text retrieval, whose extracted features are not detailed enough, and to improve the optimization method.




Embodiment Construction

[0038] As noted in the background, the instance features extracted by existing image-text retrieval methods are relatively coarse and do not capture key semantic information well, and the optimization method also leaves room for improvement. The applicant studied these problems and observed that key information can be extracted from different angles: given an image, different people may attend to different content, such as the dog or the grass, and the same holds for text. To this end, a self-attention mechanism is used to extract key information from different angles. Further study of the optimization of difficult examples showed that performing overall optimization first and then optimizing on difficult examples allows the proposed framework to be optimized more effectively and to learn better network parameters.
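As a rough illustration of the "different angles" idea, the following PyTorch-style sketch pools a set of local features (image region features or word features) with several independent attention distributions. The class name, layer sizes, and number of angles are illustrative assumptions, not the patent's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiAngleSelfAttention(nn.Module):
    """Pools local features with several attention 'angles' (hypothetical sketch)."""

    def __init__(self, dim: int, hidden: int = 512, num_angles: int = 4):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)           # shared projection of local features
        self.angles = nn.Linear(hidden, num_angles)  # one score column per attention angle

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_locals, dim) -- image region features or word features
        scores = self.angles(torch.tanh(self.proj(feats)))  # (batch, n_locals, num_angles)
        attn = F.softmax(scores, dim=1)                      # normalise over the local features
        # one weighted sum per angle -> (batch, num_angles, dim)
        return torch.einsum('bna,bnd->bad', attn, feats)
```

Each of the num_angles outputs is a differently weighted view of the same regions or words, which can then be mapped into the multi-modal space.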

[0039] In this embodiment, image region features...


Abstract

The invention belongs to the technical field of cross-modal retrieval, and particularly relates to an image-text retrieval system and method based on a multi-angle self-attention mechanism. The system comprises a deep convolutional network, a bidirectional recurrent neural network, image and text self-attention networks, a multi-modal space mapping network, and a multi-stage training module. The deep convolutional network acquires embedding vectors of image region features in the image embedding space. The bidirectional recurrent neural network acquires embedding vectors of word features in the text space; the two sets of vectors are input to the image and text self-attention networks, respectively. The image and text self-attention networks acquire embedded representations of key image regions and of key words in sentences. The multi-modal space mapping network acquires the embedded representations of the image and the text in the multi-modal space. The multi-stage training module learns the parameters of the networks. Good results are obtained on the common Flickr30k and MSCOCO datasets, with greatly improved performance.
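The multi-stage training module described above can be pictured as a bidirectional triplet ranking loss that first averages over all in-batch negatives (overall optimization) and later keeps only the hardest negative (difficult-example optimization). This is a hedged sketch: the margin value, batch construction, and the point at which training switches stages are assumptions, not taken from the patent text.

```python
import torch

def bidirectional_ranking_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                               margin: float = 0.2, hard_negatives: bool = False) -> torch.Tensor:
    # img_emb, txt_emb: (batch, dim) L2-normalised embeddings; row i of each is a matching pair
    scores = img_emb @ txt_emb.t()                       # (batch, batch) similarities
    pos = scores.diag().view(-1, 1)                      # similarity of each matching pair
    cost_txt = (margin + scores - pos).clamp(min=0)      # image i against wrong sentence j
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # sentence j against wrong image i
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    if hard_negatives:
        # later stage: only the most violating negative in the batch contributes
        return cost_txt.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()
    # earlier stage: every negative contributes, giving a smoother overall optimisation
    return cost_txt.mean() + cost_img.mean()
```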

Description

Technical field

[0001] The invention belongs to the technical field of cross-modal retrieval, and in particular relates to an image-text retrieval system and method based on a multi-angle self-attention mechanism.

Background technique

[0002] Among the many research directions in multimodal information processing, cross-modal analysis and processing between images and text is a particularly important one. It covers tasks such as automatic generation of image descriptions and mutual retrieval of images and text. Here we focus on cross-modal retrieval, that is, the image-text mutual search task: given an input image, find the K sentences whose semantics are most similar to it, or given an input sentence, find the K images most semantically related to it. Image-text mutual search is a very challenging task because it involves two very important branch fields of pattern recognition, namely computer vision and natur...
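Once both modalities have been mapped into the shared multi-modal space, the mutual search task itself reduces to a nearest-neighbour look-up. The helper below is a minimal, hypothetical sketch of that step; the function name and the cosine-similarity choice are assumptions.

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_emb: torch.Tensor, gallery_emb: torch.Tensor, k: int = 5):
    # query_emb: (dim,) embedding of one image or one sentence in the multi-modal space
    # gallery_emb: (n_items, dim) embeddings of all candidates from the other modality
    q = F.normalize(query_emb, dim=-1)
    g = F.normalize(gallery_emb, dim=-1)
    sims = g @ q                   # (n_items,) cosine similarities
    return torch.topk(sims, k)     # similarity values and indices of the K best matches
```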


Application Information

IPC(8): G06F16/53, G06F16/535, G06F16/33, G06N3/04
CPC: G06N3/045
Inventors: 张玥杰, 李文杰, 张涛
Owner: FUDAN UNIV