Visual question and answer method of original feature injection network based on composite attention

A technology involving original features and attention, applied in the field of visual question answering, which addresses problems such as forgetting the edge information of the original image and ignoring the autocorrelation information of image regions.

Active Publication Date: 2021-06-04
CHINA UNIV OF PETROLEUM (EAST CHINA)
Cites: 8 · Cited by: 0

AI Technical Summary

Problems solved by technology

However, existing attention models mainly consider the possible interactions between image regions and interrogative words, while ignoring the autocorrelation information of image regions themselves.
In addition, some network structures iterate over multiple layers, so valuable but unattended edge information of the original image is often completely forgotten after repeated bilateral co-attention operations.


Examples


Embodiment Construction

[0057] The accompanying drawings are for illustrative purposes only and should not be construed as limiting the patent.

[0058] The present invention will be further elaborated below in conjunction with the accompanying drawings and embodiments.

[0059] Figure 1 is a schematic diagram of the architecture of the original feature injection network based on composite attention. As shown in Figure 1, the whole visual question answering framework is mainly composed of two parts: the composite attention mechanism and the original feature injection module.

[0060] Figure 2 is a schematic diagram of the sensory feature enhancement module. As shown in Figure 2, given an input feature F ∈ ℝ^(d×K), three 1×1 convolution kernels generate F_q, F_k and F_v respectively:

[0061] F_q = W_q F,  F_k = W_k F,  F_v = W_v F    (1)

[0063] where W_q, W_k and W_v are the weight matrices of the 1×1 convolution kernels, and H = 2048.

[0064] The attention F_A of F is then computed from F_q and F_k.
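The remainder of the attention computation is truncated in paragraph [0065], but the projection step of equation (1) can be illustrated concretely. Below is a minimal PyTorch sketch, not the authors' implementation: the module name, the scaled softmax over region-to-region affinities, and the way F_v is aggregated are assumptions layered on top of the F_q, F_k, F_v definitions above.

```python
# Minimal sketch of the feature-enhancement step in paragraphs [0060]-[0064].
# Only the F_q, F_k, F_v projections come from the patent text; the scaled
# softmax and the aggregation of F_v are assumptions standing in for the
# truncated paragraph [0065].
import torch
import torch.nn as nn


class FeatureEnhancement(nn.Module):
    def __init__(self, d: int, hidden: int = 2048):
        super().__init__()
        # W_q, W_k, W_v realised as 1x1 convolutions over the K region positions
        self.w_q = nn.Conv1d(d, hidden, kernel_size=1)
        self.w_k = nn.Conv1d(d, hidden, kernel_size=1)
        self.w_v = nn.Conv1d(d, hidden, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, d, K) -- K region features of dimension d
        f_q = self.w_q(feat)                      # (batch, H, K)
        f_k = self.w_k(feat)                      # (batch, H, K)
        f_v = self.w_v(feat)                      # (batch, H, K)
        # region-to-region affinities; scaling and softmax are assumptions
        attn = torch.softmax(
            torch.bmm(f_q.transpose(1, 2), f_k) / f_k.size(1) ** 0.5, dim=-1
        )                                         # (batch, K, K)
        # attended features F_A, one enhanced vector per region
        return torch.bmm(f_v, attn.transpose(1, 2))   # (batch, H, K)


if __name__ == "__main__":
    regions = torch.randn(2, 512, 36)              # e.g. 36 regions, d = 512
    print(FeatureEnhancement(512)(regions).shape)  # torch.Size([2, 2048, 36])
```

Implementing W_q, W_k and W_v as 1×1 convolutions keeps each region's projection independent of the others, which matches the per-region definition in equation (1).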

[0065]...



Abstract

The invention discloses a visual question answering method based on an original feature injection network with composite attention. Conventional methods mainly adopt attention mechanisms and dense iterative operations to carry out fine-grained matching. However, these methods ignore the autocorrelation information of image regions, which leads to deviations in overall semantic understanding. In addition, after multiple bilateral co-attention operations, some valuable but unattended edge information of the image is often completely forgotten. The invention proposes, for the first time, an original feature injection network based on composite attention to study the correspondence between the image and the question. A region-strengthening network with composite attention is designed, which mines more complete visual semantics and avoids understanding deviations by considering the relationships between regions and utilizing bilateral information and autocorrelation. An original feature injection module is further provided to recover valuable but unattended edge information of the image. Extensive experiments on VQA2.0 demonstrate the effectiveness of the proposed model.

Description

Technical field

[0001] The invention belongs to visual question answering methods and relates to the technical fields of computer vision and natural language processing.

Background technique

[0002] In most studies, visual question answering is formulated as a classification problem, with images and questions as input and answers as output categories (since the number of possible answers is limited). Because the visual question answering task was proposed after deep learning methods became widely popular, almost all current visual question answering solutions use a CNN to model the image input and an RNN to model the question. Attention mechanisms have been extensively studied in visual question answering, including visual attention, which decides where to look, and question attention, which decides where to read. Since images and questions are two different modalities, it is non-trivial to jointly embed the two modalities so as to describe image / question pairs uniformly....
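For context, the baseline pipeline described in paragraph [0002] (a CNN encodes the image, an RNN encodes the question, and the joint embedding is classified over a fixed answer set) can be sketched as follows. This is a generic, hedged illustration and not the patented network; the ResNet-18 backbone, GRU encoder, layer sizes, and element-wise-product fusion are all assumptions.

```python
# Generic CNN + RNN baseline for visual question answering as described in the
# background section; not the patented composite-attention network. Backbone,
# sizes, and the element-wise-product fusion are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class BaselineVQA(nn.Module):
    def __init__(self, vocab_size: int, num_answers: int, hidden: int = 1024):
        super().__init__()
        # image branch: pretrained-style CNN backbone, pooled to one vector
        cnn = models.resnet18(weights=None)          # torchvision >= 0.13 API
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # (B, 512, 1, 1)
        self.img_proj = nn.Linear(512, hidden)
        # question branch: word embedding + GRU, last hidden state as summary
        self.embed = nn.Embedding(vocab_size, 300)
        self.rnn = nn.GRU(300, hidden, batch_first=True)
        # answer classifier over the fused image/question representation
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, image: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        v = self.img_proj(self.cnn(image).flatten(1))   # (B, hidden)
        _, h = self.rnn(self.embed(question))           # h: (1, B, hidden)
        q = h.squeeze(0)                                # (B, hidden)
        return self.classifier(v * q)                   # answer logits
```

The patent's contribution, as summarized above, is to replace the simple fusion in such a baseline with a composite attention mechanism and to re-inject original image features so that edge information is not lost.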

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/532; G06F16/538; G06F16/583; G06K9/46; G06N20/00; G06F40/30
CPC: G06F16/532; G06F16/538; G06F16/583; G06N20/00; G06F40/30; G06V10/44
Inventors: 吴春雷, 路静, 王雷全, 吴杰, 段海龙
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)