
Fine-grained visual question-answering method combined with multi-view attention mechanism

A fine-grained, attention-based technology, applied in neural learning methods, computer components, biological neural network models, etc., to achieve high efficiency and high accuracy and to improve comprehensiveness

Active Publication Date: 2020-01-21
HUAQIAO UNIVERSITY
3 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to overcome the deficiencies of the prior art and provide a fine-grained visual question answering method combined with a multi-view attention mechanism. The method can effectively improve the accuracy and comprehensiveness of visual semantic information extraction and reduce the influence of redundant and noisy data, thereby improving the fine-grained recognition ability of the visual question answering system and its judgment on complex questions, and improving, to a certain extent, the accuracy of the system and the interpretability of the model.



Examples


Embodiment Construction

[0045] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0046] To address the shortcomings of the prior art, the present invention provides a fine-grained visual question answering method combined with a multi-view attention mechanism. Visual question answering can be viewed as a multi-task classification problem, where each answer is treated as a classification category. In a typical visual question answering system, the answers are encoded with the One-Hot method, and the resulting One-Hot vectors, one per answer, form an answer vector table. One-Hot encoding represents categorical variables as binary vectors. It first maps the categorical values to integer values; each integer value is then represented as a binary vector that is zero everywhere except at the index of that integer, which is set to 1.
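The two-step One-Hot procedure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_answer_table`, the sorted ordering of the answer vocabulary, and the toy answers are all assumptions made for the example.

```python
def build_answer_table(answers):
    """Build a One-Hot answer vector table from a list of answer strings.

    Hypothetical helper for illustration; the patent does not name one.
    """
    # Step 1: map each distinct categorical value (answer) to an integer index.
    # Sorting is an arbitrary choice that makes the mapping deterministic.
    vocab = sorted(set(answers))
    index_of = {answer: i for i, answer in enumerate(vocab)}

    # Step 2: represent each integer as a binary vector that is all zeros
    # except for a 1 at the position given by that integer.
    table = {}
    for answer, i in index_of.items():
        vector = [0] * len(vocab)
        vector[i] = 1
        table[answer] = vector
    return index_of, table


# Toy answer set (made up for the example); duplicates collapse to one class.
index_of, answer_table = build_answer_table(["yes", "no", "red", "two", "yes"])
```

With this table, a classifier that predicts class index `i` is predicting the answer whose vector has its single 1 at position `i`.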

[0047] As shown in Figure 1, the fine...



Abstract

The invention relates to a fine-grained visual question-answering method combined with a multi-view attention mechanism. The guiding effect of the specific semantics of the question is fully considered, and a multi-view attention model is provided that can effectively select a plurality of salient target regions related to the current task target (the question). From multiple perspectives, region information related to the answer is acquired from the image and the question text, and regional salient features are extracted from the image under the guidance of the question semantics, realizing a finer-grained feature representation. The multi-view attention model can express the situation in which an image contains a plurality of important semantic regions and has a strong descriptive capacity, which improves the effectiveness and comprehensiveness of the model. As a result, the semantic relevance between the salient features of image regions and the question features is effectively enhanced, and the accuracy and comprehensiveness of semantic understanding in visual question answering are improved. Performing the visual question-answering task with this method involves simple steps and offers high efficiency and high accuracy; the method is fully suitable for commercial use and has good market prospects.
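The abstract does not give the attention equations, so the following is only a generic sketch of question-guided attention with several views over image regions, not the patented model. The dot-product scoring, the per-view projection matrices, the concatenation of pooled features, and every name and number below are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def multi_view_attention(regions, question, view_projections):
    """Attend over image regions once per view, guided by the question.

    regions: K region feature vectors of dimension d.
    question: question feature vector of dimension d.
    view_projections: V matrices (d x d), one per view (hypothetical parameters).
    Returns the attention weights of each view and the concatenated
    attention-pooled region features.
    """
    dim = len(regions[0])
    all_weights, attended = [], []
    for projection in view_projections:
        # Each view looks at the question through its own projection.
        query = matvec(projection, question)
        # Score every region against the view-specific query.
        weights = softmax([dot(region, query) for region in regions])
        # Weighted sum of region features gives this view's salient feature.
        pooled = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(dim)]
        all_weights.append(weights)
        attended.extend(pooled)
    return all_weights, attended


# Toy inputs (made up): three regions, a 2-d question vector, two views.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.0]
views = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
weights, attended = multi_view_attention(regions, question, views)
```

In this toy setup the two views weight different regions for the same question, which is the property the abstract attributes to using several views: more than one important region can contribute to the final feature.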

Description

Technical field

[0001] The invention relates to the technical fields of computer vision and natural language processing, and more specifically to a fine-grained visual question answering method combined with a multi-view attention mechanism.

Background

[0002] With the rapid development of computer vision and natural language processing, visual question answering has become an increasingly popular research field in artificial intelligence. Visual question answering is an emerging topic. Its task is to combine the two disciplines of computer vision and natural language processing: it takes a given image and a natural language question about the image as input and generates a natural language answer as output. Visual question answering is a key application direction in the field of artificial intelligence. By simulating real-world scenarios, visual question answering can help users with visual impairments perform real-time human-...


Application Information

IPC(8): G06K9/00, G06K9/32, G06F16/332, G06F16/58, G06F16/583, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F16/583, G06F16/5866, G06N3/08, G06V20/10, G06V10/25, G06N3/048, G06N3/045
Inventor: 彭淑娟, 李磊, 柳欣, 范文涛, 钟必能, 杜吉祥
Owner: HUAQIAO UNIVERSITY