
A Method for Multi-Event Natural Language Description in Video Based on Event Relation Coding

A natural-language and event-relation technology, applied to computer components, instruments, and biological neural network models. It addresses problems such as the inability of existing methods to capture relationships between events and their unsatisfactory accuracy and naturalness of descriptive language, achieving accurate output and reduced information loss.

Active Publication Date: 2021-07-02
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

However, for the task of describing multi-event videos, existing methods fall short in two respects: 1) they cannot capture the relationships between the events in a video; and 2) for video clips that differ greatly in length, a single unified encoder-decoder architecture performs poorly.
Together, these shortcomings reduce the accuracy and naturalness of the generated descriptions.




Detailed Description of the Embodiments

[0020] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0021] The specific embodiment of the present invention proposes a multi-event natural language description algorithm for video based on event relation coding. Referring to Figure 1, the algorithm includes the following steps S1 to S4:

[0022] S1. A three-dimensional convolutional neural network is used to extract depth features from a given video sequence, yielding several depth feature vectors that form a depth feature sequence. For a given video sequence, the data associated with an event proposal can be written as a triple consisting of the word sequence of the descriptive sentence, the start and end interval of the given event p = {p_start, p_end}, and the depth feature sequence of the video.
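Step S1 can be sketched as follows: the video is split into overlapping clips and each clip is mapped to one depth feature vector, producing the depth feature sequence. This is a minimal illustration only; the clip length, stride, and placeholder feature extractor are assumptions (the patent text here does not name a specific 3D CNN), and a real implementation would substitute a pretrained 3D convolutional network.

```python
import numpy as np

def extract_depth_features(video, clip_len=16, stride=8, feat_extractor=None):
    """Split a video tensor (T, H, W, C) into overlapping clips and map
    each clip to one depth feature vector, forming the feature sequence.

    `feat_extractor` is a stand-in for a 3D CNN; by default we use global
    average pooling purely so the sketch runs without a trained model.
    """
    if feat_extractor is None:
        # Placeholder: average over time/height/width, keeping channels.
        feat_extractor = lambda clip: clip.mean(axis=(0, 1, 2))
    num_frames = video.shape[0]
    feats = []
    for start in range(0, num_frames - clip_len + 1, stride):
        clip = video[start:start + clip_len]        # (clip_len, H, W, C)
        feats.append(feat_extractor(clip))
    return np.stack(feats)                          # (num_clips, feat_dim)

# 64 frames of 32x32 RGB → 7 overlapping 16-frame clips with stride 8.
video = np.random.rand(64, 32, 32, 3)
F = extract_depth_features(video)
```

With these assumed parameters the feature sequence has one vector per clip; the stride controls how densely the video is sampled in time.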

[0023] In order to obtain the depth feature sequence of the video sequence, first, for the given video sequenc...



Abstract

The present invention discloses a multi-event natural language description algorithm for video based on event relationship encoding, comprising the following steps: S1, using a three-dimensional convolutional neural network to extract depth features from a given video sequence, obtaining several depth feature vectors that form a depth feature sequence; S2, based on the depth feature sequence, using a recurrent neural network as a time-series analysis method to compute proposed start and end intervals for the events in the video sequence; S3, selecting the events to be described in the video sequence and, according to each event's proposed interval, re-encoding the corresponding sub-sequence of the depth feature sequence to obtain a descriptor for that event; S4, decoding the descriptor with an attention-based adaptive LSTM decoder to obtain the natural language describing the event.
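The four-step pipeline in the abstract can be sketched as a skeleton in which each stage is a stand-in function. Everything below is illustrative: the fixed interval proposals, mean-pooling re-encoding, random projection "decoder", and the tiny vocabulary are all assumptions made so the sketch is self-contained; the patent's actual S2 uses a recurrent proposal network and its S4 an attention-based LSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

def s1_extract_features(video, num_clips=8, feat_dim=4):
    # S1: stand-in for the 3D-CNN depth feature sequence.
    return rng.random((num_clips, feat_dim))

def s2_propose_events(F):
    # S2: stand-in for the RNN temporal proposal module; here we simply
    # emit two fixed {start, end} intervals over the feature sequence.
    n = len(F)
    return [{"start": 0, "end": n // 2}, {"start": n // 2, "end": n}]

def s3_encode_event(F, p):
    # S3: re-encode the sub-sequence of F for one proposal into a single
    # descriptor (mean pooling as a placeholder for the event-relation
    # re-encoding described in the patent).
    return F[p["start"]:p["end"]].mean(axis=0)

def s4_decode(descriptor, vocab=("a", "person", "runs")):
    # S4: stand-in for the attention-based LSTM decoder; scores words by
    # projecting the descriptor with a random matrix (illustrative only).
    W = rng.random((len(vocab), descriptor.size))
    return vocab[int(np.argmax(W @ descriptor))]

video = None  # a real input would be a frame tensor
F = s1_extract_features(video)
sentences = [s4_decode(s3_encode_event(F, p)) for p in s2_propose_events(F)]
```

The point of the skeleton is the data flow: features → proposals → per-event descriptors → one decoded sentence per proposed event.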

Description

Technical field [0001] The invention relates to the technical field of natural language description, and in particular to an algorithm for detecting events in videos and describing them in natural language. Background [0002] Visual natural language description (captioning) is the task of converting visual information into natural language. This task typically relies on an encoder-decoder architecture as its key technology. As the main steps in this pipeline, the quality of the features output by the encoder and the generative model used by the decoder have a significant impact on the final natural-language result. Visual natural language description has been explored for both video and images. In general, state-of-the-art methods use neural-network-based computational models to implement these architectures. For images, convolutional neural networks perform well in many visual understanding tasks, and work on the image description task often uses this ...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06K9/00; G06N3/04
CPC: G06N3/049; G06V20/41; G06V20/44; G06V20/46
Inventors: 袁春, 杨大力
Owner SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV