
Speech-driven lip-synchronous face video synthesis algorithm based on concatenated convolution LSTM

A lip-synchronization and video-synthesis technology in the field of computer vision. It addresses the severely under-constrained nature of generating face video from speech, and it expands the receptive field and increases the depth of the network.
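The claim that added depth expands the receptive field can be made concrete with the standard receptive-field recurrence for stacked convolutions. The Python sketch below computes it along one axis. For a 2-D convolution over a time × frequency filter-bank map, apply it to each axis separately. The kernel sizes and strides in the usage lines are hypothetical, not the network's actual configuration.

```python
def receptive_field(kernel_sizes, strides=None, dilations=None):
    """Receptive field, along one axis, of a stack of convolution layers.

    Standard recurrence: r_l = r_{l-1} + (k_l - 1) * d_l * j_{l-1},
    where j is the cumulative stride ("jump") of the layers below layer l.
    """
    n = len(kernel_sizes)
    strides = strides or [1] * n
    dilations = dilations or [1] * n
    rf, jump = 1, 1
    for k, s, d in zip(kernel_sizes, strides, dilations):
        rf += (k - 1) * d * jump  # each layer widens the field by its (dilated) kernel span
        jump *= s                 # strides make deeper layers step further in the input
    return rf

# Hypothetical stacks, not the patent's configuration:
print(receptive_field([3]))                   # 3
print(receptive_field([3, 3, 3, 3]))          # 9: depth alone grows the field linearly
print(receptive_field([3, 3, 3], [2, 2, 2]))  # 15: strides grow it faster
```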

Pending Publication Date: 2019-02-05
ZHEJIANG UNIV
Cites: 13; Cited by: 54

AI Technical Summary

Problems solved by technology

Recovering high-fidelity video, which is high-dimensional and low-frequency, directly from speech audio (recorded speech or text-to-speech output), which is low-dimensional and high-frequency, is therefore challenging: it is a severely under-constrained, ill-posed problem.
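One concrete aspect of this mismatch is frame rate. Speech features are commonly computed every 10 ms while video typically runs at 25 frames per second, so several audio feature frames correspond to each video frame. The Python sketch below computes log mel filter-bank features with NumPy, assuming the common "fbank" definition for the filter-bank speech features named in the abstract. It then groups them into one window per video frame. The sample rate, 25 ms/10 ms framing, 40 mel bands, 25 fps and 16-frame context window are illustrative assumptions; the source does not state its parameters.

```python
import numpy as np

# Assumed parameters (not given in the source): 16 kHz audio, 25 ms windows,
# 10 ms hop, 512-point FFT, 40 mel bands, 25 fps video.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_fbank(signal, sr=16000, win_ms=25, hop_ms=10, n_fft=512, n_mels=40):
    """Log mel filter-bank energies, shape (n_frames, n_mels), one row per hop."""
    win, hop = sr * win_ms // 1000, sr * hop_ms // 1000
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)                    # overlapping windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum per frame
    return np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

def windows_per_video_frame(feats, fps=25, hop_ms=10, context=8):
    """One (2 * context, n_mels) window of audio features per video frame."""
    step = 1000.0 / hop_ms / fps  # audio feature frames per video frame (4 here)
    n_video = int(len(feats) / step)
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    centres = [int(round(v * step)) + context for v in range(n_video)]
    return np.stack([padded[c - context:c + context] for c in centres])

signal = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise as stand-in audio
feats = log_fbank(signal)
windows = windows_per_video_frame(feats)
print(feats.shape, windows.shape)  # (98, 40) (24, 16, 40)
```

One second of audio yields 98 feature rows but only 24 complete video frames, which is the high-frequency/low-frequency gap the paragraph describes.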



Examples


Embodiment Construction

[0036] The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

[0037] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0038] The complete method of the present invention is implemented according to the following embodiment:

[0039] As shown in Figure 2, the following system modules are used:

[0040] The input module receives the audio signal of the user's spoken input, or the audio signal of speech synthesized from text, and sends it to the cascaded con...
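The description of the network itself is truncated here. As a hedged illustration of what one step of a convolutional LSTM computes, the NumPy sketch below implements the standard ConvLSTM cell (Shi et al., 2015, peephole terms omitted). It stacks two cells so that the first layer's hidden state is the second layer's input. Several choices are assumptions, not details given in the source: reading "cascaded" as this kind of layer stacking, the one-channel 16 × 40 input maps (a window of filter-bank frames per video frame), the hidden size of 4 and the 3 × 3 kernels. All function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' 2-D cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # accumulate the contribution of kernel tap (i, j) across input channels
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, Wx, Wh, b):
    """One ConvLSTM step: gates are convolutions of the input and previous hidden state."""
    gates = conv2d_same(x, Wx) + conv2d_same(h, Wh) + b[:, None, None]
    i, f, o, g = np.split(gates, 4, axis=0)  # input, forget, output gates and candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_next = f * c + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

def run_cascade(xs, params, hid):
    """xs: (T, C_in, H, W). params: one (Wx, Wh, b) per stacked layer (assumed cascade).
    Returns the top layer's hidden state at every time step, shape (T, hid, H, W)."""
    T, _, H, W = xs.shape
    states = [(np.zeros((hid, H, W)), np.zeros((hid, H, W))) for _ in params]
    outputs = []
    for t in range(T):
        inp = xs[t]
        for l, (Wx, Wh, b) in enumerate(params):
            h, c = convlstm_step(inp, *states[l], Wx, Wh, b)
            states[l] = (h, c)
            inp = h  # the hidden state of layer l feeds layer l + 1
        outputs.append(inp)
    return np.stack(outputs)

rng = np.random.default_rng(0)
hid, k = 4, 3  # hypothetical sizes
params = [
    (0.1 * rng.standard_normal((4 * hid, 1, k, k)),
     0.1 * rng.standard_normal((4 * hid, hid, k, k)), np.zeros(4 * hid)),
    (0.1 * rng.standard_normal((4 * hid, hid, k, k)),
     0.1 * rng.standard_normal((4 * hid, hid, k, k)), np.zeros(4 * hid)),
]
xs = rng.standard_normal((5, 1, 16, 40))  # 5 video frames of 16 x 40 feature maps
hs = run_cascade(xs, params, hid)
print(hs.shape)  # (5, 4, 16, 40)
```

A trained model would add a readout from the top hidden state to the facial animation vector; that layer is not described in this excerpt and is left out.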



Abstract

The invention discloses a speech-driven lip-synchronous face video synthesis algorithm based on a cascaded convolutional LSTM. A speech video of the target person is captured as the background video, and 3D face reconstruction is performed on its image sequence to obtain the target's 3D face model and the facial animation vector sequence of the background video. Filter-bank speech features are extracted from the audio signal. These features serve as the input of the cascaded convolutional long short-term memory network, and the facial animation vector sequence serves as its output, for training and testing. The facial animation vector sequence obtained from the audio signal replaces that of the target 3D face model to generate a new 3D face model, from which face images are rendered to synthesize a lip-synchronized face video. The invention retains more voiceprint information, processes the filter-bank speech features with a two-dimensional convolutional neural network, expands the receptive field of the convolutional neural network, increases the network depth, and produces accurate lip-synchronized face video.
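The replacement step can be sketched with a linear 3D face model, in which a mesh is a mean shape plus identity and expression bases weighted by coefficients. This sketch makes several assumptions. The "facial animation vector" plays the role of the per-frame expression coefficients. The target's identity coefficients are kept, and each background frame's head pose is reapplied. The model form, the pose handling and all names are hypothetical, and rendering is omitted.

```python
import numpy as np

def face_vertices(mean_shape, id_basis, exp_basis, id_coeff, exp_coeff):
    """Assumed linear face model: flattened (3N,) shape -> (N, 3) vertices."""
    flat = mean_shape + id_basis @ id_coeff + exp_basis @ exp_coeff
    return flat.reshape(-1, 3)

def replace_animation(mean_shape, id_basis, exp_basis, id_coeff,
                      rotations, translations, predicted_exp):
    """One mesh per video frame: the target identity and the background frame's
    head pose are kept; the facial animation vector comes from the audio model."""
    T = min(len(rotations), len(predicted_exp))
    meshes = np.empty((T, mean_shape.size // 3, 3))
    for t in range(T):
        v = face_vertices(mean_shape, id_basis, exp_basis, id_coeff, predicted_exp[t])
        meshes[t] = v @ rotations[t].T + translations[t]  # reapply the rigid head pose
    return meshes

rng = np.random.default_rng(0)
N, K_id, K_exp, T = 100, 10, 8, 6  # hypothetical model sizes
mean_shape = rng.standard_normal(3 * N)
id_basis = rng.standard_normal((3 * N, K_id))
exp_basis = rng.standard_normal((3 * N, K_exp))
id_coeff = rng.standard_normal(K_id)
rotations = np.repeat(np.eye(3)[None], T, axis=0)  # stand-in for tracked head poses
translations = np.zeros((T, 3))
predicted_exp = rng.standard_normal((T, K_exp))    # stand-in for network output
meshes = replace_animation(mean_shape, id_basis, exp_basis, id_coeff,
                           rotations, translations, predicted_exp)
print(meshes.shape)  # (6, 100, 3)
```

Because only the expression term changes, the mouth motion follows the audio while the target's identity and head movement from the background video are preserved.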

Description

Technical field

[0001] The present invention relates to computer vision and audio signal processing, and in particular to a voice-driven lip-synchronized face video synthesis algorithm based on a cascaded convolutional long short-term memory network structure (cascaded convolutional LSTM).

Background art

[0002] After years of exploration and development, computer vision has been applied in many fields, such as digital entertainment, healthcare, and security surveillance. Synthesizing realistic visual content not only has great commercial value but has also long been sought by the industry. Many film special effects would be impossible without computer-synthesized visual effects. A large number of artificially synthesized videos already exist on the Internet. In addition, speech recognition and text-to-speech technologies have been widely used in chatbots. The present invention hopes to make the online chat r...


Application Information

IPC(8): G06T13/40; G10L21/10; G10L21/0356
CPC: G06T13/40; G10L21/0356; G10L21/10; G10L2021/105; Y02D10/00
Inventors: 朱建科 (Zhu Jianke), 江泽胤子 (Jiang Zeyinzi)
Owner: ZHEJIANG UNIV