
Cross-modal generation method based on voice and face images

A cross-modal face-image and speech-synthesis technology in the field of deep learning, designed to achieve accelerated convergence and strong robustness

Active Publication Date: 2021-02-19
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

At present, the most common approach is to use a generative adversarial network (GAN) to generate face images; such networks can produce faces that are remarkably close to real photographs.
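To make the GAN approach mentioned above concrete, here is a minimal numpy sketch of the standard adversarial losses (non-saturating generator form). It is illustrative only and not the network used in this patent; the logit values are made-up stand-ins for discriminator outputs on real and generated faces.

```python
import numpy as np

def sigmoid(z):
    """Squash a logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up discriminator logits for a batch of real faces and generated faces
d_real = sigmoid(np.array([2.0, 1.5, 3.0]))    # probabilities on real images
d_fake = sigmoid(np.array([-1.0, -0.5, 0.2]))  # probabilities on generated images

# Standard GAN losses: the discriminator is rewarded for scoring real images
# high and fakes low; the generator is rewarded for fooling the discriminator.
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
g_loss = -np.mean(np.log(d_fake))
print(d_loss, g_loss)  # both positive here, since neither player is perfect
```

Training alternates gradient steps on these two losses until the generated faces become indistinguishable from real ones.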




Embodiment Construction

[0026] The present invention will be further described in detail below through specific embodiments. The following examples are descriptive only, not restrictive, and do not limit the protection scope of the present invention.

[0027] A cross-modal generation method based on voice and face images, characterized in that the method includes residual-prior-based face reconstruction from voice and residual-prior-based personalized speech synthesis from face images.

[0028] For the speech-to-face reconstruction model with residual priors: to alleviate the mismatch between speech and face in speech-based face generation, an end-to-end encoder-decoder speech-to-face reconstruction model is proposed. This structure complements the speech features produced by the speech extraction network with additional prior facial features. Two kinds of prior facial features (a neutral prior and gender-specific priors) were explored. Furthermore, the encoder and ...
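The residual-prior idea described above can be sketched in a few lines: the speech embedding is complemented additively by a prior facial feature, so the decoder only needs to model the residual between the prior face and the target face. This is a minimal numpy toy, not the patent's actual network; every function name, dimension, and the random "prior" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def speech_encoder(mel_spectrogram, W):
    """Toy speech encoder: mean-pool over time, then one linear layer.
    Stands in for the speech feature extraction network."""
    pooled = mel_spectrogram.mean(axis=0)  # (n_mels,)
    return np.tanh(W @ pooled)             # (d,) speech embedding

def fuse_with_prior(speech_emb, prior_face_emb):
    """Residual-prior fusion: the prior facial feature (neutral or
    gender-specific) is added to the speech embedding, so downstream
    layers learn only the residual to the target face."""
    return speech_emb + prior_face_emb

def face_decoder(latent, W_dec):
    """Toy decoder: linear map from latent to a flattened 8x8 'face'."""
    return (W_dec @ latent).reshape(8, 8)

d, n_mels = 16, 40
W_enc = rng.normal(size=(d, n_mels))
W_dec = rng.normal(size=(64, d))

mel = rng.normal(size=(100, n_mels))  # 100 frames of a mel spectrogram
neutral_prior = rng.normal(size=d)    # stand-in for a learned prior feature

latent = fuse_with_prior(speech_encoder(mel, W_enc), neutral_prior)
face = face_decoder(latent, W_dec)
print(face.shape)  # (8, 8)
```

In the real model the encoder, decoder, and priors would all be learned end-to-end; the point of the sketch is only the additive fusion step.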



Abstract

The invention relates to a cross-modal generation method based on voice and face images. The method comprises voice-based face reconstruction and personalized speech synthesis from a face image. For voice-based face reconstruction, a residual-prior speech-to-face model is provided that generates a person's face from an input segment of unknown speech. For personalized speech synthesis, a residual-prior face-image speech synthesis model is provided that synthesizes the person's voice from a given face image and a segment of text. The design is scientific and reasonable: the speech-to-face model generates face images very similar to the original face, is highly robust, and is not limited to a fixed set of faces; given the voice of any speaker, a face resembling that speaker can be reconstructed. Likewise, the residual-prior face-image speech synthesis model synthesizes the person's voice from any face image. In addition, the proposed residual-prior-knowledge method accelerates model convergence and achieves a better effect.

Description

technical field

[0001] The invention belongs to the technical field of deep learning and relates to a cross-modal generation method based on voice and face images.

Background technique

[0002] Cross-modal deep learning has long been a hot topic in academia and industry. One of its focuses is the mapping relationship between knowledge and information in different modalities: a cross-modal mapping is the process of transferring an entity from one modality to another. Reconstructing face images from speech and synthesizing personalized speech from face images are both cross-modal learning methods, in which information in the face-image modality is reconstructed from the speech modality and speech-modality information is synthesized from the face-image modality.

[0003] In research on image generation, the most commonly used method is the transposed convolutional network, which passes the...
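Since paragraph [0003] names the transposed convolution as the workhorse of image generation, here is a minimal 1D numpy sketch of the operation (not the patent's network): each input element "stamps" a scaled copy of the kernel onto the output at stride-spaced positions, which is how low-resolution features are upsampled into images. The stride and kernel values are illustrative assumptions.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Naive 1D transposed convolution (a.k.a. deconvolution).
    Output length = (len(x) - 1) * stride + len(kernel)."""
    out_len = (len(x) - 1) * stride + len(kernel)
    out = np.zeros(out_len)
    for i, v in enumerate(x):
        # Each input value adds a scaled kernel at its strided position;
        # overlapping stamps accumulate, which is what lets the network
        # learn smooth upsampling filters.
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out

x = np.array([1.0, 2.0, 3.0])  # low-resolution feature map
k = np.array([1.0, 1.0])       # toy kernel
y = transposed_conv1d(x, k, stride=2)
print(y)  # length (3 - 1) * 2 + 2 = 6: [1. 1. 2. 2. 3. 3.]
```

Stacking such layers (in 2D, with learned kernels) is what lets a generator grow a small latent code into a full-resolution face image.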

Claims


Application Information

IPC(8): G06K9/00, G06K9/62, G10L13/027, G10L13/08, G06N3/04
CPC: G10L13/027, G10L13/08, G06V40/168, G06N3/045, G06F18/214
Inventor 喻梅胡晓晟王建荣徐天一赵满坤
Owner TIANJIN UNIV