Neural network training and voice endpoint detection method and device

A neural network training technique applied to voice endpoint detection. It addresses high delay, false triggering, and inaccurate cutting of voice segments, and produces more accurate detection results.

Active Publication Date: 2020-06-19
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In the process of implementing this application, the inventor found that existing solutions have at least the following defects: 1. High delay, which affects user experience; 2. Missed detection: no voice is detected, and the voice segment is rejected; 3. False triggering: a non-voice segment is detected as voice; 4. Incorrect cutting: the beginning and the end of the voice segment are cut off.

Examples


Embodiment approach

[0084] As an implementation manner, the above-mentioned electronic equipment is applied to a neural network training device, including:

[0085] at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the following:

[0086] randomly mixing speech audio data and non-speech audio data to form mixed audio data;

[0087] extracting acoustic features of the mixed audio data;

[0088] inputting the acoustic features into an FSMN model, and training the FSMN model so that the classification it outputs is substantially equal to the classification of the speech audio data and the non-speech audio data in the mixed audio data.
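Steps [0086] and [0087] can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the patent: the excerpt does not say how the audio is "randomly mixed" or which acoustic features are used. Here mixing is read as concatenating clips in random order (additive mixing is another possible reading), and per-frame log energy stands in for the acoustic features. The names `mix_segments` and `frame_log_energy` are hypothetical.

```python
import math
import random


def mix_segments(speech_clips, noise_clips, frame_len, seed=0):
    """Concatenate clips in random order and return (samples, frame_labels).

    Assumption: label 1 marks a speech frame and 0 a non-speech frame.
    Each clip length must be a multiple of frame_len so that no frame
    straddles a clip boundary.
    """
    rng = random.Random(seed)
    tagged = [(clip, 1) for clip in speech_clips]
    tagged += [(clip, 0) for clip in noise_clips]
    rng.shuffle(tagged)

    samples, labels = [], []
    for clip, label in tagged:
        if len(clip) % frame_len != 0:
            raise ValueError("clip length must be a multiple of frame_len")
        samples.extend(clip)
        labels.extend([label] * (len(clip) // frame_len))
    return samples, labels


def frame_log_energy(samples, frame_len):
    """Stand-in acoustic feature: log energy of each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats
```

The frame labels returned by `mix_segments` are the training targets that the model output in [0088] is compared against, one label per feature frame.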

[0089] As another implementation manner, the above-mentioned electronic equipment is applied to a voice endpoint detection device, including:

...


Abstract

The invention discloses a neural network training and voice endpoint detection method and device. The method comprises: randomly mixing voice audio data and non-voice audio data to form mixed audio data; extracting acoustic features of the mixed audio data; and inputting the acoustic features into an FSMN model, and training the FSMN model so that the classification of voice audio data and non-voice audio data output by the FSMN model is substantially equal to the classification of the voice audio data and the non-voice audio data in the mixed audio data. In this method, the non-voice audio data and the voice audio data are mixed, and the mixed data are used as input to a feedforward sequential memory network (FSMN) to train it. The trained network can then output whether each audio data unit belongs to voice audio data or non-voice audio data. This information can be used for voice endpoint detection, making the detection result more accurate.
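For readers unfamiliar with the FSMN named in the abstract, the sketch below shows the memory block that distinguishes it from a plain feedforward network: each hidden frame is augmented with a weighted sum of neighbouring frames. It uses the scalar-coefficient form and is only an illustration. The excerpt does not give the tap counts, coefficient values, or layer sizes, so every parameter here is hypothetical.

```python
def fsmn_memory(hidden, look_back, look_ahead):
    """Scalar-coefficient FSMN memory block.

    hidden:     list of T frames, each a list of D floats.
    look_back:  coefficients applied to h[t], h[t-1], h[t-2], ...
    look_ahead: coefficients applied to h[t+1], h[t+2], ...
    Frames outside the sequence are treated as zeros.
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    out = []
    for t in range(num_frames):
        acc = [0.0] * dim
        # Past context, including the current frame (offset 0).
        for i, a in enumerate(look_back):
            if t - i >= 0:
                for d in range(dim):
                    acc[d] += a * hidden[t - i][d]
        # Future context; each look-ahead tap adds one frame of latency.
        for j, c in enumerate(look_ahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    acc[d] += c * hidden[t + j][d]
        out.append(acc)
    return out
```

Because the memory is a finite set of taps rather than a recurrent state, the look-ahead length bounds the added latency, which is relevant to the high-delay defect the patent lists.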

Description

Technical field

[0001] The invention belongs to the technical field of voice endpoint detection, and in particular relates to a neural network training method and device and a voice endpoint detection method and device.

Background art

[0002] In related technologies, Voice Activity Detection (VAD) is also called voice endpoint detection or voice boundary detection. It is used to detect whether there is a speech segment in continuous audio stream data.

[0003] As shown in Figure 1, the start time (T1) and end time (T2) of the voice segment are calculated in real time. To safeguard subsequent voice recognition or voice wake-up, the start time is advanced and the end time is delayed, and finally two time points, T0 and T3, are output.

[0004] In the process of implementing this application, the inventor found that the existing solutions have at least the following defects: 1. High delay, which affects user experience; 2. No voice is detected, and the voi...
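The relationship between T1, T2, T0, and T3 in paragraph [0003] can be illustrated with a small offline sketch. It assumes per-frame 0/1 speech decisions are already available, and that T0 and T3 are obtained by subtracting and adding fixed padding. The padding values, the frame shift, and the function name `find_endpoints` are assumptions for illustration; the excerpt does not state how the advance and delay are chosen, and a real-time detector would process frames incrementally.

```python
def find_endpoints(frame_labels, frame_shift_s, pre_pad_s, post_pad_s):
    """Return a list of (t0, t3) pairs, one per run of speech frames.

    t1 is the start of the first speech frame in a run and t2 the end of
    the last one. t0 = t1 - pre_pad_s and t3 = t2 + post_pad_s, clamped to
    the bounds of the audio.
    """
    total_s = len(frame_labels) * frame_shift_s
    segments = []
    run_start = None
    # A trailing 0 closes a speech run that reaches the end of the audio.
    for i, label in enumerate(list(frame_labels) + [0]):
        if label == 1 and run_start is None:
            run_start = i
        elif label != 1 and run_start is not None:
            t1 = run_start * frame_shift_s
            t2 = i * frame_shift_s
            t0 = max(0.0, t1 - pre_pad_s)
            t3 = min(total_s, t2 + post_pad_s)
            segments.append((t0, t3))
            run_start = None
    return segments
```

Larger padding lowers the risk of cutting the beginning or end of the voice segment, at the cost of passing more non-speech audio to the recognizer.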


Application Information

IPC(8): G10L15/04, G10L15/06, G10L15/08, G10L25/30
CPC: G10L15/04, G10L15/063, G10L15/08, G10L25/30
Inventor: 胡雪成
Owner: AISPEECH CO LTD