Method and system for generating video from cross-modal text based on dual learning

A technology for generating video from cross-modal text, applied in the fields of electronic digital data processing, digital data information retrieval, instruments, etc., which can solve the problems of an unstable learning process and a lack of diversity in the generated videos.

Active Publication Date: 2022-07-15
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

Such a learning process is unstable, and the generated videos are usually similar and lack diversity.




Embodiment Construction

[0029] The following describes in detail the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to explain the present invention and should not be construed as limiting the present invention.

[0030] The following describes the method and system for generating video from cross-modal text based on dual learning according to the embodiments of the present invention with reference to the accompanying drawings. The method is described first.

[0031] Figure 1 is a flow chart of a ...



Abstract

The invention discloses a method and system for generating video from cross-modal text based on dual learning. The method includes the following steps: constructing a text-to-video generation model; constructing a video-to-text mapping model; jointly training the generation model and the mapping model using a dual learning mechanism to obtain the trained models; inputting preset text into the trained generation model to generate a corresponding initial video; using the mapping model to map the initial video back to new text and feeding the new text back to the generation model to judge whether the new text matches the preset text; and then repairing the initial video accordingly to obtain the final video. This method considers the bidirectional mapping between text information and video information to better realize text-to-video generation, and it also makes the generated video higher in quality and more closely matched to user needs.
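To make the abstract's training and feedback steps concrete, the following is a minimal sketch of a dual-learning loop in PyTorch. It is an illustration only, not the patent's actual implementation: the plain MLPs, fixed-size embeddings, embedding dimensions, and all variable names are assumptions, and real systems would use text and video encoders in their place. What the sketch shows is the core dual-learning idea named in the abstract: the generation model (text to video) and the mapping model (video to text) are trained jointly, with closed-loop reconstruction terms tying the two directions together.

```python
# Minimal sketch of the dual-learning loop described in the abstract.
# The MLP architectures, dimensions, and names below are hypothetical.
import torch
import torch.nn as nn

TEXT_DIM, VIDEO_DIM = 256, 1024  # assumed embedding sizes

generator = nn.Sequential(        # text -> video ("generation model")
    nn.Linear(TEXT_DIM, 512), nn.ReLU(), nn.Linear(512, VIDEO_DIM))
mapper = nn.Sequential(           # video -> text ("mapping model")
    nn.Linear(VIDEO_DIM, 512), nn.ReLU(), nn.Linear(512, TEXT_DIM))

opt = torch.optim.Adam(list(generator.parameters()) +
                       list(mapper.parameters()), lr=1e-4)
mse = nn.MSELoss()

def dual_step(text_emb, video_emb):
    """One joint training step: each direction supervises the other."""
    fake_video = generator(text_emb)   # primal task: text -> video
    text_back = mapper(fake_video)     # dual task closes the loop
    fake_text = mapper(video_emb)
    video_back = generator(fake_text)
    loss = (mse(fake_video, video_emb)   # direct supervision
            + mse(fake_text, text_emb)
            + mse(text_back, text_emb)   # closed-loop (duality) terms
            + mse(video_back, video_emb))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch of paired (text, video) embeddings.
t = torch.randn(8, TEXT_DIM)
v = torch.randn(8, VIDEO_DIM)
print(dual_step(t, v))

# Inference-time feedback, per the abstract: generate an initial video,
# map it back to text, and check agreement with the preset text; a large
# mismatch would trigger the repair step (not specified here).
with torch.no_grad():
    init_video = generator(t)
    new_text = mapper(init_video)
    mismatch = mse(new_text, t)
```

The design point the abstract emphasizes is that the mapping model gives the generator a training signal and an inference-time consistency check that a one-way text-to-video pipeline does not have.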

Description

technical field

[0001] The present invention relates to the technical field of multi-modal generative models, and in particular to a method and system for generating video from cross-modal text based on dual learning.

Background technique

[0002] Currently, user experience is very important in language and visual interaction scenarios between users and machines. The user inputs text or speech, and the machine generates a corresponding video according to the user's input, but problems remain as to whether the generated video is realistic and whether it is consistent with the user's input. For example, existing methods for generating video from text only consider the one-way mapping from text to video: they map text data and video data into the same latent space, and then reconstruct the video from data points in that latent space to generate video from text. At the technical level, the specific steps are to first ...
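For contrast with the dual-learning approach, here is a minimal sketch of the one-way baseline the background describes: text and video are embedded into a shared latent space, and video is reconstructed from the latent point of the text. All layer shapes and names are assumptions for illustration, standing in for real encoders and decoders.

```python
# Minimal sketch of the one-way baseline from the background: a shared
# latent space plus video reconstruction. Shapes/names are hypothetical.
import torch
import torch.nn as nn

TEXT_DIM, VIDEO_DIM, LATENT_DIM = 256, 1024, 64

text_enc = nn.Linear(TEXT_DIM, LATENT_DIM)    # text -> shared latent space
video_enc = nn.Linear(VIDEO_DIM, LATENT_DIM)  # video -> same latent space
video_dec = nn.Linear(LATENT_DIM, VIDEO_DIM)  # latent point -> video

opt = torch.optim.Adam([*text_enc.parameters(), *video_enc.parameters(),
                        *video_dec.parameters()], lr=1e-4)
mse = nn.MSELoss()

def one_way_step(text_emb, video_emb):
    """Align paired latents, then reconstruct video from the text latent."""
    z_t, z_v = text_enc(text_emb), video_enc(video_emb)
    loss = (mse(z_t, z_v)                      # pull paired latents together
            + mse(video_dec(z_t), video_emb))  # reconstruct video from text
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# At inference, only the text path is used: text -> latent -> video.
print(one_way_step(torch.randn(8, TEXT_DIM), torch.randn(8, VIDEO_DIM)))
```

Because nothing maps the generated video back to text, this baseline has no consistency signal on its outputs, which is the gap the dual-learning feedback loop in the abstract is meant to close.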


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/435; G06F16/438; G06F40/289
CPC: G06F16/435; G06F16/438
Inventor: Zhu Wenwu, Liu Yue, Wang Xin
Owner: TSINGHUA UNIV