
Cross-modal time sequence behavior positioning method and device for multi-granularity cascade interaction network

A behavior positioning and multi-granularity interaction technology, applied in the field of visual-language cross-modal learning. It addresses the problems that existing methods do not make full use of multi-granularity text query information and do not fully model the local temporal context dependencies of the video, with the effect of improving positioning accuracy.

Active Publication Date: 2022-02-18
ZHEJIANG LAB

AI Technical Summary

Problems solved by technology

However, existing methods do not make full use of multi-granularity text query information in the visual-language cross-modal interaction stage, and do not fully model the local temporal context dependencies of the video in the video representation encoding stage.


Examples


Embodiment Construction

[0075] Specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0076] The invention discloses a cross-modal temporal behavior positioning method and device based on a multi-granularity cascaded interaction network, which performs visual-language cross-modal temporal behavior positioning and is used to solve the problem of localizing a temporal behavior in an untrimmed video according to a given text query. The method proposes a simple and effective multi-granularity cascaded cross-modal interaction network to improve the cross-modal alignment capability of the model. In addition, the present invention introduces a local-global context-aware video encoder, which is used to improve the contextual temporal dependence modeling capability of the video encoder.
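
The listing stops short of implementation detail, but the coarse-to-fine cascade named above can be illustrated with a minimal PyTorch sketch. Everything below is an assumption made for illustration, not the patented architecture: the module name CascadedCrossModalInteraction, the sigmoid gate realizing the coarse sentence-level stage, and the multi-head attention realizing the fine word-level stage are all hypothetical; the patent only establishes that sentence-granularity interaction is followed by word-granularity interaction.

```python
# Illustrative sketch only: module name, gating scheme, and attention layout
# are assumptions; the patent specifies just a coarse-to-fine cascade over
# sentence-level and word-level query granularities.
import torch
import torch.nn as nn


class CascadedCrossModalInteraction(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Coarse stage: gate each video clip with the sentence-level query.
        self.sent_gate = nn.Linear(2 * d_model, d_model)
        # Fine stage: cross-attention from clips to word-level query tokens.
        self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, video, words, word_mask):
        # video: (B, T, d) clip features; words: (B, N, d) word features;
        # word_mask: (B, N) boolean, True at padded word positions.
        valid = (~word_mask).unsqueeze(-1).float()
        sentence = (words * valid).sum(1) / valid.sum(1).clamp(min=1.0)  # (B, d)
        sent = sentence.unsqueeze(1).expand_as(video)                    # (B, T, d)
        gate = torch.sigmoid(self.sent_gate(torch.cat([video, sent], dim=-1)))
        coarse = gate * video  # sentence-conditioned clip features
        fine, _ = self.word_attn(coarse, words, words, key_padding_mask=word_mask)
        return self.norm(coarse + fine)  # (B, T, d) query-aligned clip features
```

In a full system, a positioning head would consume these query-aligned clip features to score or regress candidate temporal segments.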



Abstract

The invention discloses a cross-modal temporal behavior positioning method and device for a multi-granularity cascade interaction network, aimed at the problem of temporal behavior positioning in an untrimmed video based on a given text query. The invention implements a new multi-granularity cascaded cross-modal interaction network that performs cascaded cross-modal interaction in a coarse-to-fine manner, improving the cross-modal alignment capability of the model. In addition, the invention introduces a local-global context-aware video encoder to improve the contextual temporal dependence modeling capability of the video encoder. The visual-language cross-modal alignment method is simple to implement and flexible, improves visual-language cross-modal alignment precision, and a model trained in this way can significantly improve temporal positioning accuracy on paired video-query test data.
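
For the same reason, the following is only a sketch of what the local-global context-aware video encoder could look like, again assuming PyTorch: the depthwise temporal convolution standing in for local context modeling and the self-attention layer standing in for global context modeling are illustrative choices; the listing does not disclose the encoder's actual layers.

```python
# Illustrative sketch only: the split into a depthwise temporal convolution
# (local context) and self-attention (global context) is an assumed
# realization; the listing does not disclose the encoder's layers.
import torch
import torch.nn as nn


class LocalGlobalVideoEncoder(nn.Module):
    def __init__(self, d_model: int = 512, kernel_size: int = 9, n_heads: int = 8):
        super().__init__()
        # Local branch: depthwise 1D conv models short-range timing dependence.
        self.local_conv = nn.Conv1d(d_model, d_model, kernel_size,
                                    padding=kernel_size // 2, groups=d_model)
        # Global branch: self-attention models long-range dependence.
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, clips):
        # clips: (B, T, d) clip features from a pretrained visual backbone.
        local = self.local_conv(clips.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(clips + local)   # residual local context
        glob, _ = self.global_attn(x, x, x)
        return self.norm2(x + glob)     # residual global context, (B, T, d)
```

As a sanity check, LocalGlobalVideoEncoder()(torch.randn(2, 64, 512)) returns a (2, 64, 512) tensor, which could then feed the interaction module sketched above.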

Description

Technical field

[0001] The invention relates to the field of visual-language cross-modal learning, and in particular to a cross-modal temporal behavior positioning method and device.

Background technique

[0002] With the rapid development of multimedia and network technology, and the growing deployment of large-scale video surveillance in transportation, campuses, shopping malls and other places, video data is growing at a geometric rate, and video understanding has become an important and urgent problem. Temporal behavior positioning is a foundation and important component of video understanding. Research on temporal behavior positioning based on the visual modality alone restricts the behaviors to be positioned to a predefined behavior set. However, behaviors in the real world are complex and diverse, and a predefined behavior set can hardly meet real-world needs. As shown in Figure 1, the visual-language cross-mod...


Application Information

IPC(8): G06F16/735; G06F16/78; G06F16/783; G06N3/04; G06N3/08; H04N19/149; H04N19/21
CPC: G06F16/735; G06F16/7844; G06F16/7867; H04N19/21; H04N19/149; G06N3/08; G06N3/044
Inventor: 王聪 (Wang Cong), 鲍虎军 (Bao Hujun), 宋明黎 (Song Mingli)
Owner: ZHEJIANG LAB