Video voice conversion method, video voice conversion device and server

A video voice conversion method, applied in voice analysis, voice synthesis, voice recognition, etc. It addresses the high cost, low efficiency, and hard-to-guarantee accuracy of existing approaches, with the aim of reducing cost, improving the accuracy of results, and avoiding the accuracy loss of manual translation.

Active Publication Date: 2014-12-31
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
8 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

[0004] The disadvantage of the existing technology is that all three of the above-mentioned methods rely on human translation, which is costly and inefficient, and whose accuracy is difficult to guarantee.

Examples

Embodiment 1

[0025] Figure 1A is a flow chart of the video-to-speech conversion method provided in Embodiment 1 of the present invention, and Figure 1B is a schematic diagram of the segmentation of a source-language speech signal provided by Embodiment 1. This embodiment applies when the source-language speech signal in a video needs to be converted into a target-language speech signal. The method can be executed by a video-to-speech conversion device, which can be set in a server. The method specifically includes the following operations:

[0026] 101: Extracting the speech signal of the source language in the video, segmenting the speech signal of the source language, and obtaining at least one sub-speech signal of the source language;

[0027] Here, when the speech signal of the source language in the video is relatively long, segmenting the speech signal of the source language according to a certain metho...
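The source text is cut off before it names the segmentation method, so the following is only a minimal sketch of step 101's segmentation under a stated assumption: it splits on runs of low-amplitude samples (silence-based splitting), which is a stand-in and not necessarily the method the patent describes. The function name `split_on_silence` and the parameters `threshold` and `min_silence` are hypothetical.

```python
from typing import List, Tuple


def split_on_silence(samples: List[float],
                     threshold: float = 0.05,
                     min_silence: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of voiced runs; end is exclusive.

    Assumption: a pause of at least `min_silence` consecutive samples whose
    absolute amplitude is below `threshold` separates two sub-speech signals.
    """
    segments: List[Tuple[int, int]] = []
    start = None   # index where the current voiced run began
    quiet = 0      # consecutive quiet samples seen inside the current run
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the run at the first quiet sample of the pause.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Signal ended mid-run; drop any trailing quiet samples.
        segments.append((start, len(samples) - quiet))
    return segments


signal = [0, 0, 0.5, 0.6, 0, 0, 0, 0, 0.4, 0.3, 0.2, 0]
pieces = split_on_silence(signal)
```

Short pauses (fewer than `min_silence` quiet samples) stay inside a segment, so a brief hesitation does not cut a sentence in two.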

Embodiment 2

[0046] Figure 2A shows the video-to-speech conversion method provided in Embodiment 2 of the present invention, and Figure 2B is a schematic diagram of an interface in which a user selects a target language type in Embodiment 2. This embodiment applies when the source-language voice signal in the video is converted into the target-language voice signal before the video is played. The video-to-speech conversion device and the playback device can be set in the same server or in different servers. The method specifically includes the following operations:

[0047] 201: The video-to-voice conversion device determines at least one target language to be converted according to the setting information;

[0048] 202: The video-to-speech conversion device performs the following operations for each target language that needs to be converted: extract the speech signal of the source language in the video, segment the speech signal of the source language, and obtain at least ...
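Steps 201 and 202 can be sketched as a loop over the configured target languages. This is an illustrative sketch only: the settings layout (a `"target_languages"` key), the function names, and the `convert` callable standing in for the pre-built voice model are all assumptions, and strings stand in for audio segments.

```python
from typing import Callable, Dict, List


def target_languages(settings: Dict[str, List[str]]) -> List[str]:
    """Step 201 (sketch): read the target languages from setting information.

    The "target_languages" key is a hypothetical layout. Duplicates are
    dropped while the configured order is kept.
    """
    ordered: List[str] = []
    for lang in settings.get("target_languages", []):
        if lang not in ordered:
            ordered.append(lang)
    return ordered


def convert_for_all_languages(
    segments: List[str],
    settings: Dict[str, List[str]],
    convert: Callable[[str, str], str],
) -> Dict[str, List[str]]:
    """Step 202 (sketch): convert every source segment once per target language.

    `convert(segment, lang)` is a placeholder for the voice model; the result
    maps each language to its converted segments in source order, ready to be
    merged with the video.
    """
    return {
        lang: [convert(segment, lang) for segment in segments]
        for lang in target_languages(settings)
    }


def fake_convert(segment: str, lang: str) -> str:
    # Placeholder conversion used only to make the sketch runnable.
    return f"{lang}:{segment}"


tracks = convert_for_all_languages(
    ["seg1", "seg2"],
    {"target_languages": ["Mandarin", "Sichuanese", "Mandarin"]},
    fake_convert,
)
```

Because conversion happens before playback in this embodiment, the per-language tracks can be computed once and stored alongside the video.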

Embodiment 3

[0055] Figure 3 shows the video-to-speech conversion method provided by Embodiment 3 of the present invention. This embodiment applies when the source-language voice signal in the video is converted into the target-language voice signal in real time after a play request is received. The method can be performed by a video voice conversion device and a video playback device, which can be set in the same server or in different servers. The method specifically includes the following operations:

[0056] 301: The video and audio playback device receives a video playback request; the playback request includes the target language type, either selected by the user or selected automatically;

[0057] For an example of the user selecting the target language type, see Figure 2B: the user can select Mandarin or Sichuanese as the target language type in the menu of "Simul...
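The remaining steps of Embodiment 3 are truncated in this excerpt, so the sketch below covers only what is stated: a play request carries a target language type, and conversion happens after the request arrives. The request layout (a `"target_language"` key), the function name `handle_play_request`, and the `convert` callable are hypothetical; lazy, segment-by-segment conversion is one assumed way to approximate "real time".

```python
from typing import Callable, Dict, Iterable, Iterator, List


def handle_play_request(
    request: Dict[str, str],
    segments: Iterable[str],
    convert: Callable[[str, str], str],
) -> Iterator[str]:
    """Step 301 (sketch): read the target language and convert on demand.

    The request is validated immediately; each segment is converted only
    when the player pulls it from the returned iterator.
    """
    lang = request.get("target_language")
    if not lang:
        raise ValueError("play request must carry a target language type")
    return (convert(segment, lang) for segment in segments)


calls: List[str] = []


def recording_convert(segment: str, lang: str) -> str:
    # Placeholder for the voice model; records calls to show laziness.
    calls.append(segment)
    return f"{lang}:{segment}"


stream = handle_play_request(
    {"target_language": "Mandarin"}, ["seg1", "seg2"], recording_convert
)
first = next(stream)
calls_after_first = list(calls)
rest = list(stream)
```

Only the first segment has been converted when playback starts, so the viewer does not wait for the whole audio track.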

Abstract

The embodiment of the invention discloses a video voice conversion method, a video voice conversion device and a server, and relates to the technical field of multimedia processing. They are used to reduce the cost of translating the voice in a video and to improve translation efficiency and translation accuracy. According to the method, the voice signal of the source language in the video is extracted and segmented to obtain at least one sub voice signal of the source language; each sub voice signal of the source language is converted into a sub voice signal of the target language according to a pre-built voice model; and the resulting sub voice signals of the target language are merged with the video to obtain a video that includes the voice signal of the target language. With the video voice conversion method, the video voice conversion device and the server provided by the invention, the cost of translating the voice in a video can be reduced, and translation efficiency and translation accuracy can be improved.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of multimedia processing, and in particular to a video-to-voice conversion method, device and server.

Background art

[0002] People often come into contact with foreign language videos in daily life, such as Hollywood movies and foreign language learning tutorial videos. People who are not proficient in the foreign language need translated subtitles as an aid when watching these videos, but many foreign language videos have no subtitles. If the viewer cannot understand the foreign language, the video is meaningless to that viewer.

[0003] In the prior art, in order to enable people to understand foreign language videos, the following three methods are mainly adopted: one is to add subtitles obtained by human translation to the foreign language video in advance; another is to make foreign language videos into dubbed films, th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/26, G10L13/02, G06F17/28, H04N5/278
CPC: G06F40/40, G10L13/02, G10L15/26, H04N5/278
Inventor: 秦铎浩, 沈国龙
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD