
Visual question and answer method of original feature injection network based on composite attention

An original feature injection and composite attention technique, applied to visual question answering, addresses two problems: edge information from the original image being forgotten, and the autocorrelation information of image regions being ignored.

Active Publication Date: 2021-06-04
CHINA UNIV OF PETROLEUM (EAST CHINA)
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, existing attention models mainly consider the possible interactions between image regions and question words, while ignoring the autocorrelation information among the image regions themselves.
In addition, some network structures iterate over multiple layers, so that valuable but unattended edge information from the original image is usually forgotten entirely after multiple bilateral co-attention operations.




Embodiment Construction

[0057] The accompanying drawings are for illustrative purposes only and should not be construed as limiting the patent.

[0058] The present invention will be further elaborated below in conjunction with the accompanying drawings and embodiments.

[0059] Figure 1 is a schematic diagram of the architecture of the original feature injection network based on composite attention. As shown in Figure 1, the whole visual question answering framework is mainly composed of two parts: the composite attention mechanism and the original feature injection module.

[0060] Figure 2 is a schematic diagram of the sensory feature enhancement module. As shown in Figure 2, an input feature $F \in \mathbb{R}^{d \times K}$ is passed through three separate 1×1 convolution kernels to generate $F_q$, $F_k$, and $F_v$.

[0061] $$F_q = W_q F, \quad F_k = W_k F, \quad F_v = W_v F \tag{1}$$

[0063] where $W_q$, $W_k$, and $W_v$ are the weight matrices of the 1×1 convolution kernels, and $H = 2048$.
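Equation (1) treats each 1×1 convolution as a matrix product applied to every column (region) of F. The following is a minimal NumPy sketch of that step, not the patent's implementation. It assumes each weight matrix has shape H×d, which the text does not state explicitly; the function name `project_qkv` and the toy sizes are hypothetical.

```python
import numpy as np

def project_qkv(F, W_q, W_k, W_v):
    """Apply three 1x1 convolutions to F (shape d x K) as matrix products.

    A 1x1 convolution mixes channels independently at each position,
    so it is equivalent to multiplying each column of F by a weight matrix.
    """
    return W_q @ F, W_k @ F, W_v @ F

rng = np.random.default_rng(0)
d, K, H = 8, 5, 4  # toy sizes for illustration; the source states H = 2048
F = rng.standard_normal((d, K))  # K region features of dimension d
# Assumed shape H x d for each weight matrix (not given in the source).
W_q, W_k, W_v = (rng.standard_normal((H, d)) for _ in range(3))

F_q, F_k, F_v = project_qkv(F, W_q, W_k, W_v)
print(F_q.shape, F_k.shape, F_v.shape)
```

Each region is projected independently, so column i of `F_q` depends only on column i of `F`.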

[0064] The attention $F_A$ of $F$ is calculated from $F_q$ and $F_k$.

[0065]...
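The text is truncated before the attention formula is given, so the exact computation is not available here. As a hedged illustration only, the sketch below uses a common form of self-attention: a softmax over the region-to-region scores between F_q and F_k, followed by a weighted sum of F_v. The softmax form, the use of F_v in this way, and the names `softmax` and `region_self_attention` are all assumptions, not taken from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def region_self_attention(F_q, F_k):
    """Return a K x K attention map; row i holds region i's weights over all regions.

    Assumed form: softmax of the dot products between query and key columns.
    """
    scores = F_q.T @ F_k  # (K, K) pairwise region similarities
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
H, K = 4, 5  # toy sizes for illustration
F_q = rng.standard_normal((H, K))
F_k = rng.standard_normal((H, K))
F_v = rng.standard_normal((H, K))

F_A = region_self_attention(F_q, F_k)
# Assumed use of F_v: column i of F_out is the attention-weighted sum of F_v's columns.
F_out = F_v @ F_A.T
print(F_A.shape, F_out.shape)
```

Because the scores compare every region with every other region of the same image, this kind of map captures the autocorrelation between regions that the patent says existing models ignore.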



Abstract

The invention discloses a visual question answering method using an original feature injection network based on composite attention. Conventional methods mainly adopt attention mechanisms and dense iterative operations to perform fine-grained matching. However, these methods ignore the autocorrelation information of image regions, which leads to deviations in overall semantic understanding. In addition, after multiple bilateral co-attention operations, some valuable but unattended edge information of the image is often forgotten completely. The invention proposes, for the first time, an original feature injection network based on composite attention to study the correspondence between the image and the question. A region strengthening network with composite attention is designed; by considering the relationships between regions and using both bilateral information and autocorrelation, it mines more complete visual semantics and avoids understanding deviations. An original feature injection module is also provided to recover valuable but unattended edge information of the image. Extensive experiments on VQA 2.0 demonstrate the effectiveness of the proposed model.

Description

Technical field

[0001] The invention belongs to visual question answering methods and relates to the technical fields of computer vision and natural language processing.

Background technique

[0002] Most studies formulate visual question answering as a classification problem, with images and questions as input and answers as output categories (because the number of possible answers is limited). Since the visual question answering task was proposed after deep learning methods became widely adopted, almost all current visual question answering solutions use a CNN to model the image input and an RNN to model the question. Attention mechanisms have been extensively studied in visual question answering. These include visual attention, which deals with where to look, and question attention, which deals with where to read. Since images and questions are two different modalities, it is natural to jointly embed the two modalities to describe image/question pairs uniformly....
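The classification framing described in the background can be sketched as follows. This is a toy illustration under stated assumptions: random vectors stand in for the CNN image feature and the RNN question feature, the joint embedding is an element-wise product (one common choice, not specified in the source), and the function name `answer_probabilities` and all sizes are hypothetical.

```python
import numpy as np

def answer_probabilities(v, q, W, b):
    """Score a fixed set of candidate answers from a joint image/question embedding."""
    joint = v * q               # assumed fusion: element-wise product
    logits = W @ joint + b      # one logit per candidate answer
    logits = logits - logits.max()  # stabilise the softmax
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
dim, num_answers = 16, 3
v = rng.standard_normal(dim)  # stand-in for a CNN image feature
q = rng.standard_normal(dim)  # stand-in for an RNN question feature
W = rng.standard_normal((num_answers, dim))
b = np.zeros(num_answers)

probs = answer_probabilities(v, q, W, b)
predicted = int(np.argmax(probs))  # index of the predicted answer category
print(probs.shape, predicted)
```

Treating answers as a fixed set of categories is what makes a softmax classifier applicable here.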


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/532, G06F16/538, G06F16/583, G06K9/46, G06N20/00, G06F40/30
CPC: G06F16/532, G06F16/538, G06F16/583, G06N20/00, G06F40/30, G06V10/44
Inventors: 吴春雷, 路静, 王雷全, 吴杰, 段海龙
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)