Visual question and answer enhancement method based on graph convolution

A visual question answering technology based on graph convolution, applied in the fields of computer vision and natural language processing. It addresses the problem that existing methods cannot explore high-level semantics well, and achieves the effect of improving accuracy.

Active Publication Date: 2019-11-01
HANGZHOU DIANZI UNIV


Problems solved by technology

[0006] The purpose of the present invention is to use a GCN network to model the relationships between the objects in the picture, so as to solve the problem that existing visual question answering methods cannot explore high-level semantics well.

Method used




Embodiment Construction

[0014] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0015] The visual question answering enhancement method based on graph convolution proposed by the present invention is shown in Figure 1. The first step of the model is feature extraction: a GRU is used to obtain the feature representation of the question, and the output of the bottom-up attention model built on Faster R-CNN is used as the feature representation of the image.
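The two feature extractors in this step can be sketched as follows. This is a minimal numpy sketch, not the patent's implementation: the GRU weights are random stand-ins, and the image side simply substitutes random vectors with the shape that bottom-up attention from Faster R-CNN typically produces (36 regions, 2048-d); the dimensions 300 and 512 are assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, W, U):
    # One GRU step: update gate z, reset gate r, candidate state h_cand.
    z = sigmoid(x @ W["z"] + h @ U["z"])
    r = sigmoid(x @ W["r"] + h @ U["r"])
    h_cand = np.tanh(x @ W["h"] + (r * h) @ U["h"])
    return (1.0 - z) * h + z * h_cand

def encode_question(word_embs, d_hidden, seed=0):
    # Run the GRU over the word embeddings; the final hidden state
    # serves as the question feature vector q.
    rng = np.random.default_rng(seed)
    d_in = word_embs.shape[1]
    W = {k: 0.1 * rng.standard_normal((d_in, d_hidden)) for k in "zrh"}
    U = {k: 0.1 * rng.standard_normal((d_hidden, d_hidden)) for k in "zrh"}
    h = np.zeros(d_hidden)
    for x in word_embs:
        h = gru_step(x, h, W, U)
    return h

# An 8-word question with 300-d word embeddings -> 512-d question feature.
q = encode_question(np.random.default_rng(1).standard_normal((8, 300)), 512)
# Image side: bottom-up attention typically yields 36 region features of
# 2048-d from Faster R-CNN; random stand-ins are used here.
V = np.random.default_rng(2).standard_normal((36, 2048))
```

In practice the GRU weights are learned jointly with the rest of the model, and the region features come from a pretrained detector rather than random draws.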

[0016] Then the graph learner learns an adjacency matrix over the image objects conditioned on the question, and adds the relations between objects detected by the relational feature detector. Finally, the graph features are processed and combined with the question in a multi-class classifier that predicts the correct answer.
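One plausible form of the question-conditioned graph learner is sketched below. The elementwise-product fusion, projection size 256, and softmax row-normalization are assumptions for illustration, not details confirmed by the patent text.

```python
import numpy as np

def softmax_rows(S):
    # Row-wise softmax: turns pairwise scores into a row-stochastic matrix.
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def learn_adjacency(V, q, Wv, Wq):
    # Fuse the question into every node feature (elementwise product is one
    # common fusion choice), score every object pair by inner product, and
    # row-normalize the scores into an adjacency matrix.
    J = (V @ Wv) * (q @ Wq)      # (n_objects, d) question-conditioned nodes
    return softmax_rows(J @ J.T)  # (n_objects, n_objects)

rng = np.random.default_rng(0)
V = rng.standard_normal((36, 2048))          # region features
q = rng.standard_normal(512)                 # question feature
Wv = 0.01 * rng.standard_normal((2048, 256))
Wq = 0.01 * rng.standard_normal((512, 256))
A = learn_adjacency(V, q, Wv, Wq)
```

Because each row of A sums to 1, the later graph-convolution step becomes a weighted average over the neighbors each object attends to for this particular question.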

[0017] The specific imp...



Abstract

The invention discloses a visual question answering enhancement method based on graph convolution. The method comprises the following steps: step 1, extracting feature representations of the picture and the question respectively; step 2, extracting the relationships between targets in the picture, generated based on the question; step 3, building a graph over the picture that incorporates the question information, selecting the most relevant targets for each vertex, generating a new feature representation for each vertex, and performing max pooling and classification on the graph. The method explores high-level semantics by using a GCN network to model the relationships between objects in the picture, which is of great significance for visual question answering technology.
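Step 3 above (per-vertex feature update, max pooling over the graph, and classification over candidate answers) can be sketched as follows. This is a minimal numpy sketch under assumed dimensions (36 objects, 256-d graph features, 3000 candidate answers) and an assumed elementwise fusion with the question; it is not the patent's exact architecture.

```python
import numpy as np

def gcn_layer(A, H, W):
    # Graph convolution: each vertex aggregates its neighbors' features
    # through the (row-normalized) adjacency A, then a linear map + ReLU.
    return np.maximum(A @ H @ W, 0.0)

def answer_logits(A, H, q, Wg, Wq, Wc):
    Hg = gcn_layer(A, H, Wg)          # new feature for each vertex
    g = Hg.max(axis=0)                # max pooling over the graph
    fused = g * (q @ Wq)              # combine graph summary with question
    return fused @ Wc                 # scores over candidate answers

rng = np.random.default_rng(3)
n_obj, d_v, d_q, d_g, n_ans = 36, 2048, 512, 256, 3000
A = np.full((n_obj, n_obj), 1.0 / n_obj)   # stand-in adjacency matrix
H = rng.standard_normal((n_obj, d_v))      # stand-in region features
q = rng.standard_normal(d_q)               # stand-in question feature
logits = answer_logits(A, H, q,
                       0.01 * rng.standard_normal((d_v, d_g)),
                       0.01 * rng.standard_normal((d_q, d_g)),
                       0.01 * rng.standard_normal((d_g, n_ans)))
```

Max pooling over vertices makes the graph summary invariant to the ordering of the detected objects, which is why it is a natural readout before the answer classifier.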

Description

Technical Field

[0001] The invention belongs to the technical fields of computer vision and natural language processing. In particular, the invention relates to a method for enhancing visual question answering based on graph convolution.

[0002] Technical Background

[0003] Visual Question Answering (VQA) is an emerging topic that has attracted much attention in recent years. It combines the fields of computer vision and natural language processing (NLP) and requires a good understanding of both. VQA systems take images and free-form natural language questions as input and generate natural language answers as output. Most VQA methods treat the task as a classification task and extract image and question features separately. They then explore multimodal fusion of image representations learned from deep convolutional neural networks (CNNs) with question representations learned from time-series models such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs)...
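The classic VQA pipeline described in this background (separate CNN image features and LSTM/GRU question features, followed by multimodal fusion and answer classification) can be sketched as below. The elementwise-product fusion, the tanh projections, and all dimensions are illustrative assumptions, not details from any specific prior method.

```python
import numpy as np

def fuse_and_classify(v, q, Wv, Wq, Wc):
    # Project both modalities into a joint space, fuse them by elementwise
    # product (one widely used fusion), and score candidate answers.
    joint = np.tanh(v @ Wv) * np.tanh(q @ Wq)
    return joint @ Wc

rng = np.random.default_rng(4)
v = rng.standard_normal(2048)      # pooled CNN image feature (stand-in)
q = rng.standard_normal(512)       # LSTM/GRU question feature (stand-in)
logits = fuse_and_classify(v, q,
                           rng.standard_normal((2048, 1024)),
                           rng.standard_normal((512, 1024)),
                           rng.standard_normal((1024, 3000)))
answer_idx = int(np.argmax(logits))  # predicted answer class
```

The limitation the invention targets is visible here: a single pooled image vector carries no explicit object-to-object relations, which is what the graph-convolution step adds.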


Application Information

IPC(8): G06F16/583, G06F16/9032, G06K9/46, G06N3/04, G06N3/08
CPC: G06F16/5854, G06F16/90332, G06N3/08, G06V10/422, G06N3/048, G06N3/045
Inventors: 颜成钢, 俞灵慧, 孙垚棋, 张继勇, 张勇东
Owner HANGZHOU DIANZI UNIV