Method for the detection of speech segments

A technique for detecting speech and noise segments, applied in speech analysis, speech recognition, and related instruments. It addresses the lack of robustness of earlier methods, their dependence on the level of the noise signal, and the resulting increase in speech segment detection errors.

Status: Inactive
Publication Date: 2012-09-19
TELEFONICA SA
Cites: 8; Cited by: 23

AI Technical Summary

Problems solved by technology

This method has the disadvantage that its operation depends on the level of the noise signal, so its results are unsuitable in the presence of strong noise.

[0011] However, despite the large number of proposed methods, speech segment detection still presents considerable difficulties. The methods proposed so far, i.e. those based on comparing parameters with thresholds and those based on statistical classification, are not robust enough under unfavorable noise conditions, especially in the presence of non-stationary noise, which increases speech segment detection errors in such cases. For this reason, using these methods in particularly noisy environments, such as the interior of a car, presents significant problems.

[0012] In other words, the methods proposed so far to detect speech segments, whether based on comparing parameters of the signal with thresholds or on statistical comparisons, present significant robustness issues in unfavorable noise environments. Their performance degrades especially in the presence of non-stationary noise.

[0013] This lack of robustness makes automatic speech recognition systems impossible or especially difficult to use in certain environments (such as the interior of a car). In these cases, detecting speech segments with methods based on comparing parameters of the signal with thresholds or on statistical comparisons does not provide suitable results. Consequently, automatic speech recognizers produce many wrong results and often discard user utterances, which makes this type of system extremely difficult to use.

Embodiment Construction

[0040] According to a preferred embodiment of the invention, the method for detecting noise segments and speech segments is carried out in three stages.

[0041] As a step preceding the method, the input signal is divided into frames of very short duration (between 5 ms and 50 ms), which are processed successively.
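The framing step can be illustrated with a minimal sketch. The function name `split_into_frames`, the 10 ms default, and the choice of non-overlapping frames with the trailing partial frame dropped are assumptions made for illustration; the text only states that frames last between 5 ms and 50 ms and are processed successively.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split a sample sequence into consecutive frames of frame_ms milliseconds.

    Assumptions (not stated in the source): frames do not overlap and a
    trailing partial frame is discarded.
    """
    if not 5 <= frame_ms <= 50:
        raise ValueError("frame_ms is expected to lie between 5 and 50")
    frame_len = int(sample_rate * frame_ms / 1000)
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


if __name__ == "__main__":
    # 850 samples at 8 kHz with 10 ms frames: 80 samples per frame
    frames = split_into_frames(list(range(850)), sample_rate=8000, frame_ms=10)
    print(len(frames), len(frames[0]))
```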

[0042] As shown in Figure 1, in a first stage 10 the energy is calculated for each frame 1. The average energy of this frame and the previous N frames is then calculated (Box 11: calculate the average energy of the previous N frames), where N is an integer whose value varies according to the environment: typically N = 10 in an environment with minimal noise, and N > 10 in noisy environments. This average value is then compared with the first energy threshold Threshold_energ1 (Box 12: validate the average energy threshold), whose value is modified in the second stage according to the noise level, while the initial value of the first energy threshold is ...
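The first-stage check can be sketched roughly as follows. Several details are assumptions, since the text does not give them: frame energy is taken as the mean of the squared samples, early frames with fewer than N predecessors are averaged over the frames available so far, and `threshold_energ1` is passed in as a fixed value (its initial value is elided in the text, and its adaptation in the second stage is not modelled here). A frame is marked as noise when the mean energy is not greater than the threshold, as stated in the abstract.

```python
from collections import deque


def frame_energy(frame):
    """Energy of one frame, assumed here to be the mean of squared samples."""
    return sum(x * x for x in frame) / len(frame)


def first_stage(frames, threshold_energ1, n=10):
    """Return one boolean per frame: True if the frame is classified as noise.

    The mean is taken over the current frame and up to n previous frames.
    threshold_energ1 is a caller-supplied value (hypothetical; the source
    adapts it in the second stage according to the noise level).
    """
    history = deque(maxlen=n + 1)  # current frame plus the previous n
    is_noise = []
    for frame in frames:
        history.append(frame_energy(frame))
        mean_energy = sum(history) / len(history)
        is_noise.append(mean_energy <= threshold_energ1)
    return is_noise


if __name__ == "__main__":
    quiet = [[0.0] * 4] * 3
    loud = [[1.0] * 4] * 3
    print(first_stage(quiet + loud, threshold_energ1=0.1, n=2))
```

Frames not marked as noise here would be passed on to the second stage for the finer decision.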

Abstract

The present invention relates to a method for the detection of noise and speech segments in a digital audio input signal, said input signal being divided into a plurality of frames, comprising:
- a first stage (10) in which a first classification of a frame as noise is performed if the mean energy value for this frame and the previous N frames is not greater than a first energy threshold, N>1;
- a second stage (20) in which for each frame that has not been classified as noise in the first stage it is decided if said frame is classified as noise or as speech based on combining at least a first criterion of spectral similarity of the frame with acoustic noise and speech models, a second criterion of analysis of the energy of the frame and a third criterion of duration; and of using a state machine for detecting the beginning of a segment as an accumulation of a determined number of consecutive frames with acoustic similarity greater than a first threshold and for detecting the end of said segment;
- a third stage (30) in which the classification as speech or as noise of the signal frames carried out in the second stage is reviewed using criteria of duration.
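The segment-start rule of the second stage (a determined number of consecutive frames with acoustic similarity above a threshold) can be sketched as a two-state machine. This is only an illustration under assumptions: the per-frame similarity scores are taken as given, the names `detect_segments`, `start_count` and `end_count` are hypothetical, and the end-of-segment rule used here (a run of consecutive frames at or below the same threshold) is an assumption, since the abstract does not say how the end is detected. The energy and duration criteria are left out.

```python
def detect_segments(similarities, threshold, start_count=3, end_count=3):
    """Return (start, end) frame index pairs, end exclusive.

    A segment starts once start_count consecutive frames score above
    threshold. The end rule (end_count consecutive frames at or below
    threshold) is an assumption made for this sketch.
    """
    segments = []
    in_segment = False
    run = 0    # consecutive frames supporting a change of state
    start = 0
    for i, score in enumerate(similarities):
        above = score > threshold
        if not in_segment:
            run = run + 1 if above else 0
            if run == start_count:
                in_segment = True
                start = i - start_count + 1  # first frame of the run
                run = 0
        else:
            run = run + 1 if not above else 0
            if run == end_count:
                in_segment = False
                segments.append((start, i - end_count + 1))
                run = 0
    if in_segment:
        # Signal ended while still inside a segment
        segments.append((start, len(similarities)))
    return segments


if __name__ == "__main__":
    scores = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    print(detect_segments(scores, threshold=0.5))
```

Requiring a run of frames before changing state keeps a single high-scoring frame from opening a segment, which is the kind of accumulation the abstract describes for the segment start.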

Description

Technical field

[0001] The invention belongs to the field of speech technology, in particular speech recognition and speaker verification, and specifically to the detection of speech and noise.

Background art

[0002] Automatic speech recognition is a particularly complex task. One reason is the difficulty of detecting the start and end of the speech segments uttered by a user, properly distinguishing them from the periods of silence that occur before the user begins to speak, after the user finishes speaking, and during pauses for breath while the user is speaking.

[0003] The detection and delimitation of voiced speech segments is fundamental for two reasons. First, for reasons of computational efficiency: the algorithms used in speech recognition are very demanding in terms of computational load, so applying these algorithms to the entire acoustic signal without eliminating periods in which the user's ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L11/02, G10L15/14, G10L25/78
CPC: G10L15/144, G10L25/78
Inventor: Carlos García Martínez, Helenca Duxans Barrobés, Mauricio Sendra Vicens, David Cadenas Sánchez
Owner TELEFONICA SA