Synthetic speech detection method based on speech segmentation

A synthetic speech detection technology, applied in speech analysis, speech recognition, instruments, etc. It addresses the high threat that synthetic speech poses to ASV systems and improves detection accuracy.

Active Publication Date: 2021-06-22
UNIV OF ELECTRONIC SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

The principle of a converted-speech attack is similar to that of a synthetic-speech attack, and it poses a greater threat to ASV systems.
Both attacks also often appear in other application scenarios of speech recognition technology, such as phone fraud.

Embodiment Construction

[0038] To help those skilled in the art understand the technical content of the present invention, the invention is further explained below with reference to the accompanying drawings.

[0039] The present invention is divided into a training stage and a deployment stage. The training stage is carried out on the server. The deployment stage is carried out after the training stage is completed, and the data produced in the training stage is deployed on the voice device.

[0040] The training phase mainly includes two parts: data processing and model training.

[0041] Step A, data preprocessing, mainly processes the input original voice signal: it detects the sampling rate, performs endpoint detection on the voice signal (to find the beginning and end of the voice signal), and performs voice framing (the voice signal is considered approximately short-term stationary within 10-30 ms, so it is divided into short sections for analysis)...
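
The framing and endpoint detection described in Step A can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the 25 ms frame length, the 10 ms hop, and the use of a short-time energy threshold for endpoint detection are all assumptions that do not appear in the source text.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a sequence of samples into overlapping fixed-length frames.

    frame_ms is chosen inside the 10-30 ms range over which speech is
    treated as approximately stationary (25 ms is an assumed value).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_endpoints(frames, energy_threshold):
    """Return (first, last) indices of frames above the energy threshold.

    Assumed, simplified endpoint detector: returns None when no frame
    exceeds the threshold.
    """
    active = [i for i, f in enumerate(frames)
              if short_time_energy(f) > energy_threshold]
    if not active:
        return None
    return active[0], active[-1]


def make_test_signal(sample_rate=16000):
    """Hypothetical input: 0.1 s silence, 0.2 s of a 440 Hz tone, 0.1 s silence."""
    silence = [0.0] * int(0.1 * sample_rate)
    tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(int(0.2 * sample_rate))]
    return silence + tone + silence


if __name__ == "__main__":
    signal = make_test_signal()
    frames = frame_signal(signal, 16000)
    print(len(frames), detect_endpoints(frames, energy_threshold=0.01))
```

Frames between the two returned indices would be treated as the voiced segment and the remaining frames as the silent segment. A real system would likely use a more robust detector than a fixed energy threshold.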

Abstract

The invention discloses a synthetic speech detection method based on speech segmentation, belongs to the field of speech detection, and aims to solve the problem of low detection precision in the prior art. The method comprises the following steps of: extracting two kinds of features in an audio, namely a CQCC feature of a voiced segment of the audio and an average zero-crossing rate feature of a silent (mute) segment of the audio; and adopting two GMM models to fit the two kinds of features respectively, giving different weights to the two GMM models, carrying out testing, and finding the most appropriate weight. The detection precision of synthetic speech is obviously improved.
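
The average zero-crossing rate and the weighting of the two models can be sketched as below. This is a minimal illustration, not the patented procedure. It assumes that each GMM yields one score per utterance with larger values meaning "more likely genuine", that the two scores are fused linearly with weights `w` and `1 - w`, and that the weight is chosen by a grid search on labelled test data. All function names and the threshold of 0.0 are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def average_zcr(silent_frames):
    """Average zero-crossing rate over the frames of the silent segment."""
    if not silent_frames:
        return 0.0
    return sum(zero_crossing_rate(f) for f in silent_frames) / len(silent_frames)


def fuse(score_cqcc, score_zcr, weight):
    """Assumed linear fusion of the two per-utterance model scores."""
    return weight * score_cqcc + (1.0 - weight) * score_zcr


def accuracy(scores_cqcc, scores_zcr, labels, weight, threshold=0.0):
    """Share of utterances classified correctly (label True = genuine)."""
    correct = 0
    for a, b, label in zip(scores_cqcc, scores_zcr, labels):
        predicted_genuine = fuse(a, b, weight) > threshold
        correct += int(predicted_genuine == label)
    return correct / len(labels)


def search_weight(scores_cqcc, scores_zcr, labels, steps=10):
    """Grid search over weights in [0, 1]; returns the first best weight."""
    best_weight, best_accuracy = 0.0, -1.0
    for k in range(steps + 1):
        weight = k / steps
        acc = accuracy(scores_cqcc, scores_zcr, labels, weight)
        if acc > best_accuracy:
            best_weight, best_accuracy = weight, acc
    return best_weight, best_accuracy
```

In this sketch the score lists stand in for the outputs of the two fitted GMMs. Fitting the models and extracting CQCC features are outside its scope.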

Description

Technical field

[0001] The invention belongs to the field of voice detection, and in particular relates to a synthetic voice detection technology.

Background

[0002] With the development of artificial intelligence, embedded devices have undergone tremendous changes. The application of image recognition and face unlocking in embedded devices greatly facilitates production and daily life. Speech recognition, as a representative of acoustic artificial intelligence, is increasingly used in embedded applications such as voice assistants and voiceprint unlocking. Speech recognition technology enables computers to convert voice signals into corresponding text or commands through recognition and analysis. Automatic Speaker Verification (ASV) is a speech recognition technology that identifies individuals by distinguishing the voiceprint features of human speech. In many cases, ASV technology can replace traditional password aut...

Application Information

IPC(8): G10L15/02, G10L15/04, G10L15/06, G10L25/24
CPC: G10L15/02, G10L15/04, G10L15/063, G10L25/24
Inventor 詹瑾瑜江维蒲治北杨永佳边晨雷洪江昱呈于安泰
Owner UNIV OF ELECTRONIC SCI & TECH OF CHINA