Bidirectional reconstruction network video description method based on hierarchical attention mechanism

A video description and attention technology, applied in the computer field, that addresses problems such as irrelevant background information interfering with text description and low semantic similarity between the generated text and the video content. It achieves high semantic similarity by minimizing reconstruction loss and reducing interference.

Active Publication Date: 2020-03-27
南京赤马信息技术有限公司
7 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly as follows. First, video frame features are extracted at a single scale, which makes it difficult to fully represent the rich information in a video. Second, only the forward propagation of information from video content to text description is considered, and the reverse propagation from text description to video content is not, so the semantic similarity between the generated text description and the video content is low. Third, the correlation between video frame region features and the generated text description is not considered; when the described object is small, irrelevant background information is easily introduced and interferes with the generation of the text description.
Therefore, these methods struggle to describe video content accurately and cannot fully capture video semantic information.

Examples

Embodiment Construction

[0038] The present invention is further described below with reference to the accompanying drawings.

[0039] The bidirectional reconstruction network video description method based on a hierarchical attention mechanism focuses on extracting multi-scale video features to fully represent the temporal and spatial structure of the video, and uses the hierarchical attention mechanism so that the bidirectional reconstruction network model attends to the video features most relevant to the description sentence being generated. The main idea is to use a convolutional neural network as an encoder to extract multi-scale region features of video frames, and to process these features with the hierarchical attention mechanism to obtain a dynamic representation of the video features; to use a long short-term memory neural network as a decoder, minimizing a cross-entropy loss function to obtain the probability distribution over vocabulary words and generating sentences accordingly; the reconst...
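
The excerpt above does not give the attention formulas, so the following is only a minimal sketch of what a two-level (hierarchical) attention step could look like. It assumes dot-product scoring with softmax weights, applied first over the regions within each frame and then over the resulting frame vectors. The query stands in for the decoder hidden state at one decoding step, which is what would make the representation "dynamic". All names here (`attend`, `hierarchical_attention`, the toy features) are hypothetical and not taken from the patent.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, vectors):
    """Dot-product attention (an assumed scoring function).

    Returns the softmax weights and the weighted sum of `vectors`.
    """
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, context

def hierarchical_attention(query, video):
    """video: list of frames; each frame: list of region feature vectors.

    Level 1 pools the regions of each frame into one frame vector.
    Level 2 pools the frame vectors into one video vector.
    """
    frame_vectors, region_weights = [], []
    for regions in video:
        w, frame_vec = attend(query, regions)
        region_weights.append(w)
        frame_vectors.append(frame_vec)
    frame_weights, video_vec = attend(query, frame_vectors)
    return video_vec, frame_weights, region_weights

# Toy input: 2 frames, 2 regions per frame, 2-dimensional features.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.0, 0.0]],
]
query = [1.0, 0.0]  # stand-in for a decoder hidden state
video_vec, frame_weights, region_weights = hierarchical_attention(query, video)
```

In this sketch the region that aligns best with the query receives the largest weight within its frame, which is one way small but relevant objects could be emphasized over background regions.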

Abstract

The invention discloses a bidirectional reconstruction network video description method based on a hierarchical attention mechanism. The method comprises the following steps: first, a convolutional neural network is used as an encoder to extract multi-scale region features of video frames, and the video features are processed by a hierarchical attention mechanism to obtain a dynamic representation of the video features; second, a long short-term memory neural network is used as a decoder, taking the dynamic video feature representation and its text description as input, and the probability distribution over vocabulary words is obtained by minimizing a cross-entropy loss function, from which the generated sentence is obtained; third, a bidirectional reconstruction network is constructed that takes the hidden vectors of the decoder as input, minimizes the reconstruction loss, and outputs reconstructed video features, so that the generated text description and the video content have high semantic similarity. The method can effectively extract multi-scale video features that reflect the spatio-temporal structure of the video, reduce interference from irrelevant information, mine latent video semantic information, and generate more accurate, natural, and fluent descriptions of video content.
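
The abstract names two training signals, a cross-entropy loss for the decoder and a reconstruction loss for the reconstruction network, but does not say here how they are measured or combined. The sketch below is one plausible reading under stated assumptions: the reconstruction loss is taken to be the mean squared Euclidean distance between original and reconstructed feature vectors, and the two losses are summed with a weight `lam`. The distance, the weighting scheme, the value of `lam`, and every function name are assumptions made for illustration.

```python
import math

def cross_entropy(step_probs, target_ids):
    """Mean negative log-likelihood of the target word at each decoding step.

    step_probs[t] is the predicted distribution over the vocabulary at step t.
    """
    eps = 1e-12  # guards against log(0)
    nll = -sum(math.log(p[t] + eps) for p, t in zip(step_probs, target_ids))
    return nll / len(target_ids)

def reconstruction_loss(original, reconstructed):
    """Mean squared Euclidean distance between feature vectors (assumed metric)."""
    total = 0.0
    for o, r in zip(original, reconstructed):
        total += sum((a - b) ** 2 for a, b in zip(o, r))
    return total / len(original)

def joint_loss(step_probs, target_ids, original, reconstructed, lam=0.2):
    """Weighted sum of the two losses; `lam` is a hypothetical trade-off weight."""
    return (cross_entropy(step_probs, target_ids)
            + lam * reconstruction_loss(original, reconstructed))

# Toy example: 2 decoding steps over a 2-word vocabulary, 1 feature vector.
step_probs = [[0.5, 0.5], [0.1, 0.9]]
target_ids = [0, 1]
original = [[1.0, 2.0]]
reconstructed = [[1.0, 2.5]]
loss = joint_loss(step_probs, target_ids, original, reconstructed)
```

When the reconstructed features match the originals exactly, the second term vanishes and the joint loss reduces to the cross-entropy alone.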

Description

Technical field

[0001] The invention belongs to the technical field of computers, in particular to the technical field of video description in visual computing, and relates to a bidirectional reconstruction network video description method based on a hierarchical attention mechanism.

Background

[0002] In today's Internet era, live-streaming platforms, video surveillance systems, and smart devices such as mobile phones generate a large amount of video data every day, and this data is growing explosively. Describing the content of these videos manually is time-consuming and labor-intensive, which gave rise to the field of video description. Video description methods are mainly used in practical application scenarios such as video title generation, video retrieval, and video viewing for visually impaired people.

[0003] The video description task is to describe the content of a video with a piece of text. Its goal is not only to capture the complex high-dimen...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06N3/04
CPC: G06N3/049, G06V20/41, G06V20/46, G06N3/045
Inventor: 李平, 张盼, 胡海洋, 徐向华
Owner: 南京赤马信息技术有限公司