
Video retrieval method based on multi-mode and self-supervised representation learning

A multi-modal video retrieval technology, applied in the fields of computer technology and image processing, achieving high accuracy and recall, high information-carrying capacity and robustness, and reduced complexity.

Pending Publication Date: 2022-01-18
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] At the same time, we have also observed that a large number of videos actually steal content from others, making unauthorized secondary edits and obtaining huge illegal profits at low cost.




Embodiment Construction

[0022] The method of the present invention will be further described below in conjunction with the accompanying drawings.

[0023] The present invention proposes a video retrieval method based on multimodal and self-supervised representation learning, which does not rely on task-specific labeled data and only needs picture data collected from the platform or the Internet to train the representation network. Given a query video, videos with similar pictures or similar events can be found in a video library of tens of millions of entries. This technology can address problems such as news event aggregation, infringement retrieval for copyright protection, and multi-modal retrieval on short-video platforms.

[0024] A video retrieval method based on multimodal and self-supervised representation learning, the specific implementation steps are as follows:

[0025] Step 1: Collect a sufficient number of images and corresponding text information. The text information includes title...
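The abstract indicates that the picture-text pairs collected in this step are used to train a picture feature extraction network in a self-supervised way. A minimal sketch of a symmetric contrastive (InfoNCE-style) objective over paired image and text embeddings is shown below; the function name, temperature value, and use of numpy rather than a deep-learning framework are illustrative assumptions, not the patent's actual formulation:

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (N, d) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    labels = np.arange(len(logits))           # matched pair sits on the diagonal

    def xent(l):
        # row-wise softmax cross-entropy against the diagonal labels
        l = l - l.max(axis=1, keepdims=True)
        log_softmax = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_softmax[labels, labels].mean()

    # symmetric: image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

In this kind of objective, each picture's own caption is treated as the positive and all other captions in the batch as negatives, which is what allows training without task-oriented labels.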



Abstract

The invention discloses a video retrieval method based on multi-modal and self-supervised representation learning, applied to the field of video retrieval. Given a query video, videos with similar pictures or events can be found in a video library of tens of millions of entries. The method can be used to solve problems such as news event aggregation, infringement retrieval for copyright protection, and multi-modal retrieval on short-video platforms. The method mainly comprises the following steps: 1, constructing a supervision data set from unlabeled picture data and picture-text pair data, and using it to train a picture feature extraction network; 2, constructing a feature frequency library by extracting features from video frames and calculating neighborhood density; and 3, extracting video representations to build a video library, and performing video retrieval with a nearest-neighbor retrieval method. The method achieves relatively high accuracy and recall on a test data set and shows good robustness.
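Steps 2 and 3 of the abstract (neighborhood density over frame features, then nearest-neighbor retrieval over video representations) might be sketched roughly as follows; the function names, the cosine-distance radius, and the density-weighted pooling scheme are illustrative assumptions rather than the patent's actual method:

```python
import numpy as np

def video_embedding(frame_feats, radius=0.3):
    """Aggregate per-frame features into one video vector, weighting each
    frame by its neighborhood density (how many frames lie within `radius`
    in cosine distance). Dense clusters of similar frames dominate the
    representation; outlier frames are down-weighted.
    """
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    cos_dist = 1.0 - f @ f.T                   # pairwise cosine distances
    density = (cos_dist < radius).sum(axis=1)  # neighbor count per frame
    w = density / density.sum()
    v = (w[:, None] * f).sum(axis=0)
    return v / np.linalg.norm(v)

def retrieve(query_vec, library, k=5):
    """Return indices of the k most similar video vectors (cosine similarity)."""
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ query_vec
    return np.argsort(-sims)[:k]
```

At the scale of tens of millions of videos mentioned in the abstract, the brute-force scan in `retrieve` would in practice be replaced by an approximate nearest-neighbor index.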

Description

Technical field

[0001] The invention belongs to the fields of computer technology and image processing, and in particular relates to a video retrieval method based on multimodal and self-supervised representation learning.

Background technique

[0002] Before 2015, image retrieval and image-text search were among the most important technologies on the Internet: searching for pictures by text on search engines, searching for pictures by pictures, and searching for product pictures on e-commerce platforms. Search technology now urgently needs to move from image-centric demand to video-centric demand.

[0003] Video retrieval is a very important yet challenging problem. In recent years we have witnessed a dramatic increase in the amount of video generated on the Internet, driven by the rapid development of social media applications and video-sharing platforms. Because users release a very large number of videos in a very short period of time on video platforms, thes...


Application Information

Patent Timeline: no application
Patent Type & Authority: Applications (China)
IPC (IPC8): G06F16/73, G06F16/783, G06V20/40, G06V10/70, G06N20/00
CPC: G06F16/73, G06F16/7844, G06F16/7847, G06N20/00, Y02D10/00
Inventor: 丁勇, 朱子奇, 徐晓舒, 汤峻
Owner: ZHEJIANG UNIV