Image description generation method fusing visual common sense and enhancing multilayer global features

An image description technique based on global features, applied in the field of computer vision, that addresses problems such as insufficient mining of visual semantic relations and redundant information in multi-layer global features.

Active Publication Date: 2021-09-10
CHONGQING NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide an image description generation method that integrates visual common sense and enhances multi-layer global features. It addresses the technical problem that the extracted multi-layer global features contain redundant information.

Examples

Embodiment Construction

[0033] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0034] In describing the present invention, it should be understood that the orientation or positional relationship indicated by terms such as "length", "width", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" is based on the orientation or positional relationship shown in the drawings. These terms are used only for convenience in describing the present invention and simplifying the description, rather than indicating or implying that a referenced device or element...

Abstract

The invention relates to the technical field of computer vision, and in particular discloses an image description generation method that fuses visual common sense and enhances multi-layer global features. The method comprises the following steps: fusing the visual common sense features extracted by VC R-CNN with the local features extracted by Faster R-CNN to obtain fused features; mining the visual semantic relationships between objects with an X-Linear attention mechanism to obtain high-level local features and multi-layer global features; enhancing the multi-layer global features with an AoA (attention on attention) mechanism and applying a linear mapping to obtain a fused global feature; screening the fused global feature with a visual-selection long short-term memory (LSTM) network, and using an X-Linear attention mechanism to weigh and adaptively select the relevant information from the high-level local features; and finally using a semantic decoding gated linear unit to generate the output word sequence. The method addresses two problems: image description generation models based on local features mine visual semantic relations insufficiently, and the multi-layer global features extracted by the attention mechanism contain redundant information.
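
The fusion and enhancement steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The excerpt gives no equations, so the sketch makes the following assumptions: fusion is plain channel-wise concatenation of per-region features; the AoA step follows the commonly published attention-on-attention form (an information vector multiplied element-wise by a sigmoid gate, both computed from the query and the attended result); and the linear mapping is applied to the concatenation of the enhanced per-layer vectors. All function names, parameter names, and dimensions are hypothetical.

```python
import numpy as np


def fuse_region_features(local_feats, commonsense_feats):
    """Fuse per-region features by concatenation (assumed fusion operator).

    local_feats:       (num_regions, d_local), e.g. detector region features
    commonsense_feats: (num_regions, d_vc), visual common sense features
    """
    return np.concatenate([local_feats, commonsense_feats], axis=-1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_on_attention(query, attended, w_info, w_gate, b_info, b_gate):
    """Generic AoA: gate the information vector built from [query; attended]."""
    joint = np.concatenate([query, attended], axis=-1)
    info = joint @ w_info + b_info            # information vector
    gate = sigmoid(joint @ w_gate + b_gate)   # values in (0, 1)
    return gate * info                        # suppresses irrelevant channels


def init_params(rng, num_layers, dim):
    """Random illustrative weights; a real model would learn these."""
    return {
        "w_info": rng.normal(scale=0.1, size=(2 * dim, dim)),
        "w_gate": rng.normal(scale=0.1, size=(2 * dim, dim)),
        "b_info": np.zeros(dim),
        "b_gate": np.zeros(dim),
        "w_out": rng.normal(scale=0.1, size=(num_layers * dim, dim)),
    }


def enhance_global_features(global_feats, params):
    """Enhance multi-layer global features with AoA, then map to one vector.

    global_feats: (num_layers, dim), one global vector per attention layer.
    Returns (enhanced, fused) with shapes (num_layers, dim) and (dim,).
    """
    dim = global_feats.shape[-1]
    # Scaled dot-product self-attention across layers (an assumption).
    scores = global_feats @ global_feats.T / np.sqrt(dim)
    attended = softmax(scores, axis=-1) @ global_feats
    enhanced = attention_on_attention(
        global_feats, attended,
        params["w_info"], params["w_gate"],
        params["b_info"], params["b_gate"],
    )
    # Linear mapping of the concatenated enhanced vectors.
    fused = enhanced.reshape(-1) @ params["w_out"]
    return enhanced, fused
```

Calling `enhance_global_features` on an array of shape `(num_layers, dim)` returns the gated per-layer features together with a single fused vector of length `dim`. In this sketch the sigmoid gate is what lets the model down-weight redundant channels before the features are merged.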

Description

Technical field

[0001] The invention relates to the technical field of computer vision, in particular to an image description generation method that integrates visual common sense and enhances multi-layer global features.

Background

[0002] Image description generation is one of the high-level tasks in the field of computer vision. Its purpose is to enable a computer to automatically generate a natural language description of a given image. Compared with low-level and mid-level tasks such as image classification and object detection, it must not only recognize the salient objects in the image and their attributes and understand the relationships between them, but also express them in accurate and fluent natural language, which makes it a very challenging task. When humans acquire information, the visual system actively focuses on the target area of interest and extracts the relevant important information. Inspired by the human visual system, attention mechanisms have be...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06K9/46, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/044, G06N3/045, G06F18/2411, G06F18/253
Inventor: 杨有, 方小龙, 尚晋, 胡峻滔, 姚露, 边雅琳
Owner: CHONGQING NORMAL UNIVERSITY