The invention relates to a fine-grained visual question-answering method that incorporates a multi-view attention mechanism. The method takes full account of the guiding effect of the specific semantics of the question and provides a multi-view attention model that can effectively select multiple salient target regions related to the current task target (the question). Region information related to the answer is acquired from both the image and the question text from multiple perspectives, and salient region features are extracted from the image under the guidance of the question semantics. A feature representation of finer
granularity is thereby achieved. Because the multi-view attention model can represent images that contain several semantically important regions, it has strong descriptive capacity and is both effective and comprehensive. The semantic relevance between the salient image-region features and the question features is therefore effectively enhanced, and the accuracy and comprehensiveness of semantic understanding in visual question answering are improved. A visual question-answering task carried out with the method involves simple steps and achieves high efficiency and accuracy; the method is fully suitable for commercial use and has good market prospects.
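
The idea of question-guided attention over image regions from several views can be illustrated with a minimal sketch. The abstract does not specify the scoring function, the projections, or how the views are fused, so everything below is an assumption made for illustration: dot-product scoring, one hypothetical linear projection of the question per view, softmax weighting, and concatenation of the per-view results. The names `multi_view_attention`, `view_projections`, and the toy values are likewise hypothetical and are not taken from the invention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def multi_view_attention(regions, question, view_projections):
    """Attend over image regions once per view, guided by the question.

    regions: list of region feature vectors, each of dimension d
    question: question feature vector of dimension q
    view_projections: one d x q matrix per view (assumed learned weights)
    Returns the concatenated attended features and the per-view weights.
    """
    dim = len(regions[0])
    fused, all_weights = [], []
    for projection in view_projections:
        # Project the question into region-feature space for this view.
        query = matvec(projection, question)
        # Score each region by its dot product with the view's query.
        scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
        weights = softmax(scores)
        # Weighted sum of region features: the salient feature for this view.
        pooled = [sum(w * region[k] for w, region in zip(weights, regions))
                  for k in range(dim)]
        fused.extend(pooled)  # assumed fusion: concatenate the views
        all_weights.append(weights)
    return fused, all_weights


# Toy data: three regions with 2-dimensional features, one question vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 0.0]
# Two views whose projections emphasise different feature dimensions.
view_projections = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [4.0, 0.0]],
]
fused, weights = multi_view_attention(regions, question, view_projections)
```

In this toy setting each view concentrates its weight on a different region, which is the behaviour the abstract attributes to the model when an image contains several semantically important regions. A real system would learn the projections and would typically use a tensor library rather than plain lists.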