
Multi-angle and multi-mode fused image description generation method and system

An image description and multi-modal technology applied in the field of image processing. It addresses the problems of traditional image description methods, which describe content from only a single angle, lack detail, and cannot fully describe the content of an image, and achieves the effects of eliminating redundancy and improving learning ability.

Active Publication Date: 2019-11-15
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

Traditional image description methods describe an image from only a single angle, lack detail, and cannot fully describe the content displayed in the image.



Examples


Embodiment 1

[0030] As shown in Figure 1, one viewer will see an adult wearing a blue shirt and a blue baseball cap, another will see a child holding a doll, another will see a red car next to the adult, and another will see a white car next to the red car. All of these scenes appear in the same image; only the viewing angles differ. Figures 1(a)-(d) show the different objects identified in the image. Corresponding descriptive sentences for Figure 1 may include:

[0031] 1. a man in a blue shirt playing frisbee with a little boy in the park.

[0032] 2. a red car beside the man wearing a blue shirt in the park.

[0033] 3. a little boy holding a toy in the park.

[0034] 4. a white car beside the tree in the park.

[0035] The purpose of this embodiment is to learn a complete image description from multiple perspectives by combining image and text modalities, so as to fully express the content contained in the image. Based on this, this embodiment disclos...

Embodiment 2

[0080] The purpose of this embodiment is to provide an image description generation system that integrates multiple angles and multiple modalities.

[0081] In order to achieve the above purpose, this embodiment provides a multi-angle and multi-modal fused image description generation system comprising the following modules (a code sketch follows the list):

[0082] a visual feature extraction module, which receives the image to be described, extracts the global visual features and local visual features of the image, and fuses them to obtain fused visual features;

[0083] a sentence generation module, which uses a single-layer long short-term memory (LSTM) network and takes the fused visual features as input to obtain the first sentence of the image description;

[0084] a sentence regeneration module, which generates a first-sentence semantic vector from the first sentence of the image description, then adopts an attention-based LSTM language generation model and uses the local visual features and the first-sentence semantic vecto...
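The patent text here does not spell out a concrete implementation of these modules, so the following is a minimal sketch, assuming PyTorch, of how the first two modules could be wired together. The backbone, dimensions, and class names (FeatureFusion, FirstSentenceDecoder) are hypothetical and are not taken from the patent.

```python
# Minimal, hypothetical sketch of the feature-fusion and first-sentence modules.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuses a global image feature with pooled local (region) features."""
    def __init__(self, global_dim=2048, local_dim=2048, fused_dim=512):
        super().__init__()
        self.proj = nn.Linear(global_dim + local_dim, fused_dim)

    def forward(self, global_feat, local_feats):
        # global_feat: (B, global_dim); local_feats: (B, R, local_dim)
        pooled_local = local_feats.mean(dim=1)      # average over the R regions
        fused = torch.cat([global_feat, pooled_local], dim=-1)
        return torch.relu(self.proj(fused))         # fused visual feature (B, fused_dim)

class FirstSentenceDecoder(nn.Module):
    """Single-layer LSTM that decodes the first sentence from the fused feature."""
    def __init__(self, fused_dim=512, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(fused_dim, hidden_dim)
        self.init_c = nn.Linear(fused_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, fused_feat, tokens):
        # tokens: (B, T) word ids of the (teacher-forced) first sentence
        h0 = self.init_h(fused_feat).unsqueeze(0)   # (1, B, hidden_dim)
        c0 = self.init_c(fused_feat).unsqueeze(0)
        hidden, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(hidden)                     # (B, T, vocab_size) word logits
```

Concatenating the global feature with average-pooled local features and projecting the result is only one plausible fusion; the embodiment requires only that both granularities be combined before the first sentence is decoded.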

Embodiment 3

[0086] The purpose of this embodiment is to provide an electronic device.

[0087] An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the following steps are implemented (the attention-based step is sketched in code after the list):

[0088] receiving the image to be described, extracting the global visual features and local visual features of the image, and fusing them to obtain fused visual features;

[0089] using a single-layer long short-term memory network, with the fused visual features as input, to obtain the first sentence of the image description;

[0090] generating the first-sentence semantic vector from the first sentence of the image description;

[0091] adopting an attention-based long short-term memory network language generation model, with the local visual features and the first-sentence semantic vector as input, to generate the next image description sentence, thereby obtaining a comple...
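The last step is the attention-based regeneration of subsequent sentences. The sketch below, again assuming PyTorch, shows one way an attention LSTM could condition each word on the local visual features and on the first-sentence semantic vector; the class name NextSentenceDecoder, the additive attention form, and all dimensions are assumptions made for illustration, not the patented model.

```python
# Hypothetical attention-based decoder for the sentences after the first one.
import torch
import torch.nn as nn

class NextSentenceDecoder(nn.Module):
    """LSTM cell that attends over local region features at every step and is
    additionally conditioned on the first-sentence semantic vector."""
    def __init__(self, local_dim=2048, sem_dim=512, embed_dim=512,
                 hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.att_local = nn.Linear(local_dim, hidden_dim)
        self.att_hidden = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.cell = nn.LSTMCell(embed_dim + local_dim + sem_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def attend(self, local_feats, h):
        # Additive attention over the R region features.
        scores = self.att_score(torch.tanh(
            self.att_local(local_feats) + self.att_hidden(h).unsqueeze(1)))  # (B, R, 1)
        weights = torch.softmax(scores, dim=1)
        return (weights * local_feats).sum(dim=1)    # attended visual context (B, local_dim)

    def forward(self, local_feats, sem_vec, tokens):
        # local_feats: (B, R, local_dim); sem_vec: (B, sem_dim); tokens: (B, T)
        B, T = tokens.shape
        h = local_feats.new_zeros(B, self.cell.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(T):                           # teacher-forced decoding
            context = self.attend(local_feats, h)
            step_in = torch.cat([self.embed(tokens[:, t]), context, sem_vec], dim=-1)
            h, c = self.cell(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)            # (B, T, vocab_size)
```

Recomputing the attended context at every time step lets later sentences focus on different image regions than the first sentence did, which is consistent with the goal of describing the image from additional angles.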



Abstract

The invention discloses a multi-angle and multi-mode fused image description generation method and system. The method comprises the following steps: receiving an image to be described, extracting the global visual features and local visual features of the image, and fusing them to obtain fused visual features; using a single-layer long short-term memory network with the fused visual features as input to obtain the first sentence of the image description; generating a first-sentence semantic vector from the first sentence of the image description; and generating the next image description sentence with an attention-based long short-term memory network language generation model, taking the local visual features and the first-sentence semantic vector as input, thereby obtaining a complete image description. The method fuses two modalities, visual features and text semantic features, and combines them with an attention mechanism to realize a multi-angle, comprehensive description of the image.
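The abstract implies a sentence-by-sentence generation loop. The snippet below sketches how such a loop could drive the hypothetical modules from the earlier examples; greedy decoding, the fixed sentence count, and the sentence_encoder helper (which produces the first-sentence semantic vector) are all assumptions made for illustration, not part of the patent.

```python
# Hypothetical driver loop over the sketched modules (batch size 1 assumed).
import torch

def greedy_decode(step_fn, bos_id=1, eos_id=2, max_len=20):
    """Feed the growing token sequence back into the model and pick argmax words."""
    tokens = torch.tensor([[bos_id]])
    for _ in range(max_len):
        logits = step_fn(tokens)                     # (1, T, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return tokens

def describe_image(global_feat, local_feats, fusion, first_decoder,
                   sentence_encoder, next_decoder, num_sentences=4):
    fused = fusion(global_feat, local_feats)
    # The first sentence is decoded from the fused visual features.
    first = greedy_decode(lambda tok: first_decoder(fused, tok))
    # Semantic vector of the first sentence (encoder left abstract here).
    sem_vec = sentence_encoder(first)
    sentences = [first]
    for _ in range(num_sentences - 1):
        # Remaining sentences attend over local features, conditioned on sem_vec.
        nxt = greedy_decode(lambda tok: next_decoder(local_feats, sem_vec, tok))
        sentences.append(nxt)
    return sentences
```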

Description

Technical field

[0001] The invention belongs to the technical field of image processing, and in particular relates to an image description generation method and system that integrates multiple angles and multiple modes.

Background technique

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] In recent years, the fields of natural language processing (NLP) and computer vision (CV) have made tremendous progress in analyzing and generating text and understanding images and videos. In daily work, there are many scenarios that require combining language and visual information, such as interpreting photos in the context of newspaper articles. In addition to this, the web provides a wealth of data combining linguistic and visual information: labeled photos, newspaper illustrations, videos with subtitles, and multimodal information on social media. In these scenarios, ...


Application Information

Patent Type & Authority: Application (China)
IPC (IPC8): G06N3/04; G06N3/08; G06K9/62; G06F17/27
CPC: G06N3/08; G06N3/044; G06N3/045; G06F18/253
Inventors: 杨振宇, 张姣
Owner: QILU UNIV OF TECH