Expression synthesis method and device based on phoneme driving and computer storage medium

An expression synthesis and phoneme technology, applied in the field of image processing, that addresses problems such as fixed scenes, blurred faces, and the inability to obtain realistic expression synthesis videos.

Active Publication Date: 2020-08-07

BEIJING CENTURY TAL EDUCATION TECH CO LTD

Cites: 4; Cited by: 19

Problems solved by technology

[0003] Generally speaking, a person's facial information includes expression information and lip shape (mouth shape) information. Under normal circumstances, both change as the pronunciation changes. However, current related technology cannot yet produce realistic expression synthesis videos and is especially prone to problems such as blurred faces, missing backgrounds, or fixed scenes.

Examples

Example 1

[0033] Figure 1 shows a schematic flow chart of the phoneme-driven expression synthesis method according to the first embodiment of the present invention. As shown in Figure 1, the method of this embodiment mainly includes the following steps:

[0034] Step S1, recognizing the target speech text according to the pre-built database to obtain a phoneme sequence, and converting the phoneme sequence into a corresponding replacement expression parameter sequence.

[0035] Optionally, the target speech text in this embodiment refers to a speech file recorded in text form. It may be any existing speech text file, or a speech text file generated by converting an audio file with audio-to-text software.

[0036] Optionally, the audio file may be an existing voice resource or a voice resource generated by temporary recording. In addition, the audio-to-text software may be audio convers...
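Step S1 can be illustrated with a minimal sketch, assuming the pre-built database is a plain lookup table. The lexicon, the ARPAbet-style phoneme labels, the two-component parameter vectors, and the function names below are all hypothetical and are not taken from the patent; they only show the shape of the text-to-phoneme-to-parameter conversion.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> phoneme labels (illustrative only).
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical pre-built database: phoneme -> replacement expression parameters.
# Each vector is (jaw_open, lip_round), chosen purely for illustration.
PHONEME_DB: Dict[str, Tuple[float, float]] = {
    "HH": (0.2, 0.0), "AH": (0.7, 0.1), "L": (0.3, 0.0), "OW": (0.5, 0.8),
    "W": (0.2, 0.9), "ER": (0.4, 0.4), "D": (0.2, 0.0), "sil": (0.0, 0.0),
}


def text_to_phonemes(text: str) -> List[str]:
    """Turn speech text into a phoneme sequence, with silence between words."""
    phonemes: List[str] = []
    for index, word in enumerate(text.lower().split()):
        if index > 0:
            phonemes.append("sil")
        phonemes.extend(LEXICON[word])  # raises KeyError for unknown words
    return phonemes


def phonemes_to_parameters(phonemes: List[str]) -> List[Tuple[float, float]]:
    """Look up one replacement expression parameter vector per phoneme."""
    return [PHONEME_DB[p] for p in phonemes]


phoneme_sequence = text_to_phonemes("hello world")
parameter_sequence = phonemes_to_parameters(phoneme_sequence)
```

In this sketch the parameter sequence has exactly one entry per phoneme; how phonemes are aligned to video frames is not covered by the visible text and is left out here.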

Example 2

[0053] Figure 3 shows a schematic flow chart of the phoneme-driven expression synthesis method according to the second embodiment of the present invention.

[0054] In this embodiment, recognizing the target speech text according to the pre-built database to obtain a phoneme sequence and converting the phoneme sequence into a replacement expression parameter sequence (that is, step S1) may further include:

[0055] Step S11, editing the corresponding relationship between each phoneme data and each replacement expression parameter to generate a pre-built database.

[0056] Optionally, the above step S11 also includes the following processing steps:

[0057] First, step S111 is executed to construct phoneme data in the pre-built database.

[0058] In the prior art, the extracted phonemes generally include 18 vowel phonemes and 25 consonant phonemes, a total of 43 pronunciation phonemes, as shown in the following list 1, plus silent phonemes, a tota...
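Steps S11 and S111 can be sketched as building a phoneme inventory and then pairing every phoneme with a parameter vector. The sketch below is only an assumption about how such a database could be laid out: the placeholder labels (`V01`, `C01`, ...), the single silent symbol `sil`, and both function names are hypothetical, and the source text is truncated before it states the final inventory size.

```python
from typing import Dict, List, Sequence, Tuple


def build_inventory() -> List[str]:
    """S111 (sketch): construct the phoneme data with placeholder labels."""
    vowels = [f"V{i:02d}" for i in range(1, 19)]      # 18 vowel phonemes
    consonants = [f"C{i:02d}" for i in range(1, 26)]  # 25 consonant phonemes
    # Assumption: one silent symbol; the patent text may define more.
    return vowels + consonants + ["sil"]


def build_database(
    inventory: Sequence[str],
    parameters: Dict[str, Sequence[float]],
) -> Dict[str, Tuple[float, ...]]:
    """S11 (sketch): record the phoneme-to-parameter correspondence.

    Raises ValueError if any phoneme has no replacement expression parameter,
    so that lookups during synthesis cannot fail on a known phoneme.
    """
    missing = [p for p in inventory if p not in parameters]
    if missing:
        raise ValueError(f"no expression parameters for: {missing}")
    return {p: tuple(parameters[p]) for p in inventory}


inventory = build_inventory()
# Hypothetical neutral parameters; real values would be edited per phoneme.
database = build_database(inventory, {p: (0.0, 0.0) for p in inventory})
```

Checking coverage at build time is a design choice of this sketch, not something the source describes.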

Example 3

[0080] Figure 4 shows a schematic flow chart of the phoneme-driven expression synthesis method according to the third embodiment of the present invention.

[0081] In an optional embodiment, rendering the target two-dimensional image sequence frame by frame (i.e., step S4) may also include the following processing steps:

[0082] Step S41, acquiring a target 2D image corresponding to the current frame in the target 2D image sequence and performing rendering processing.

[0083] Step S42, repeating step S41, that is, acquiring the target 2D image corresponding to the current frame in the target 2D image sequence and performing rendering processing, until the target 2D images corresponding to all frames in the target 2D image sequence have been rendered.

[0084] Referring to Figure 5, in an optional embodiment, the acquisition of a target two-dimensional image corresponding to the current frame in the target two-dimensional image sequence and performing ...
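The frame-by-frame structure of steps S41 and S42 amounts to a loop over the target image sequence. The following is a minimal sketch under stated assumptions: images are nested lists of grayscale values, and `clamp_render` is a hypothetical stand-in for the actual rendering processing, whose details are truncated in the text above.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # hypothetical 2D grayscale image


def clamp_render(image: Image) -> Image:
    """Placeholder rendering step: clamp every pixel into [0, 1]."""
    return [[min(1.0, max(0.0, pixel)) for pixel in row] for row in image]


def render_sequence(
    frames: Sequence[Image],
    render_frame: Callable[[Image], Image],
) -> List[Image]:
    """Render the target 2D image sequence one frame at a time."""
    rendered: List[Image] = []
    for current in frames:                      # S42: repeat until all frames are done
        rendered.append(render_frame(current))  # S41: render the current frame
    return rendered


frames = [[[1.2, 0.5]], [[-0.1, 0.3]]]
rendered_frames = render_sequence(frames, clamp_render)
```

Passing the renderer in as a function keeps the loop independent of whatever per-frame processing is actually used.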

Abstract

The invention discloses a phoneme-driven expression synthesis method and device and a computer storage medium. The method mainly comprises the following steps: recognizing a target speech text according to a pre-built database to obtain a phoneme sequence, and converting the phoneme sequence into a replacement expression parameter sequence; extracting the original sub-video data to be replaced from the original video data based on the speech duration of the target speech text; constructing a three-dimensional face model based on the faces in the original sub-video data, extracting the to-be-replaced expression parameters of the three-dimensional face model frame by frame to generate a to-be-replaced expression parameter sequence, and replacing the to-be-replaced expression parameter sequence with the replacement expression parameter sequence; using the replacement expression parameter sequence to drive the three-dimensional face model to generate a target two-dimensional image sequence, and rendering the target two-dimensional image sequence frame by frame; and splicing the rendered target two-dimensional image sequence to generate target sub-video data for replacing the original sub-video data. According to the invention, an expression synthesis video with a more realistic effect can be obtained efficiently and accurately.
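Two of the steps in the abstract, selecting the sub-video by speech duration and swapping the expression parameter sequence, can be sketched as follows. This is an illustration under assumptions, not the patented procedure: the frame rate, the start time, the per-frame dictionary layout with `pose` and `expression` keys, and both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

FrameParams = Dict[str, Tuple[float, ...]]


def sub_video_range(start_s: float, duration_s: float, fps: float) -> Tuple[int, int]:
    """Return the [first, last) frame indices covering the speech duration."""
    first = int(round(start_s * fps))
    count = math.ceil(duration_s * fps)  # round up so the speech is fully covered
    return first, first + count


def replace_expression(
    original: Sequence[FrameParams],
    replacement: Sequence[Tuple[float, ...]],
) -> List[FrameParams]:
    """Swap each frame's expression parameters, leaving other fields untouched.

    Assumption: there is exactly one replacement vector per extracted frame.
    """
    if len(original) != len(replacement):
        raise ValueError("sequences must have one entry per frame")
    return [
        {**frame, "expression": tuple(new_expr)}
        for frame, new_expr in zip(original, replacement)
    ]


first, last = sub_video_range(start_s=2.0, duration_s=1.5, fps=25)
original = [
    {"pose": (0.0, 0.0, 0.0), "expression": (0.1,)},
    {"pose": (0.0, 0.1, 0.0), "expression": (0.2,)},
]
replaced = replace_expression(original, [(0.9,), (0.8,)])
```

Keeping the pose while replacing only the expression is a choice made for this sketch; the abstract itself only states that the to-be-replaced expression parameter sequence is replaced.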

Description

Technical field

[0001] Embodiments of the present invention relate to image processing technology, and in particular to a phoneme-driven expression synthesis method, device, and computer storage medium.

Background

[0002] With the advancement of computer technology, face-based image processing has developed from two-dimensional to three-dimensional, and it has attracted widespread attention because of its stronger sense of realism.

[0003] Generally speaking, a person's facial information includes expression information and lip shape (mouth shape) information. Under normal circumstances, both change as the pronunciation changes. However, current related technology cannot yet produce realistic expression synthesis videos and is especially prone to problems such as blurred faces, missing backgrounds, or fixed scenes.

Contents of the invention

[0004] In view of this, one of the te...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06T17/00, G06T15/20, G06T11/60

CPC: G06T17/00, G06T15/205, G06T11/60

Inventors: 王骁, 冀志龙, 刘霄

Owner: BEIJING CENTURY TAL EDUCATION TECH CO LTD
