268 results about "Voice activity detection" patented technology

Voice activity detection (VAD), also known as speech activity detection or speech detection, is a speech processing technique that detects the presence or absence of human speech. Its main uses are in speech coding and speech recognition. It can facilitate speech processing, and it can also be used to deactivate some processes during the non-speech sections of an audio session: in Voice over Internet Protocol applications it avoids unnecessary coding and transmission of silence packets, saving computation and network bandwidth.
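As a rough illustration of the silence-suppression use described above, the following minimal sketch flags frames by mean energy and keeps only the frames marked as speech. The function names, the frame length of 160 samples and the threshold are illustrative assumptions, not taken from any of the listed patents; practical detectors use more robust features than raw energy.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def energy_vad(samples, frame_len=160, threshold=1e-3):
    """Return one boolean per full frame: True = speech, False = silence.

    frame_len and threshold are assumed values for illustration only.
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy(frame) > threshold)
    return decisions


def frames_to_send(samples, frame_len=160, threshold=1e-3):
    """Keep only the frames flagged as speech, as a VoIP sender might."""
    flags = energy_vad(samples, frame_len, threshold)
    return [samples[i * frame_len:(i + 1) * frame_len]
            for i, active in enumerate(flags) if active]
```

Trailing samples that do not fill a whole frame are ignored in this sketch.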

Method and an apparatus for voice activity detection

Inactive | US20120232896A1 | Easy to adapt | Fast processing | Speech recognition | Decision combination | Speech sound
A voice activity detection apparatus (1) comprising: a signal condition analyzing unit (3) which analyses at least one signal parameter of an input signal to detect a signal condition SC of said input signal; at least two voice activity detection units (4-i) comprising different voice detection characteristics, wherein each voice activity detection unit (4-i) performs separately a voice activity detection of said input signal to provide a voice activity detection decision VADD; and a decision combination unit (5) which combines the voice activity detection decisions VADDs provided by said voice activity detection units (4-i) depending on the detected signal condition SC to provide a combined voice activity detection decision cVADD.
Owner:HUAWEI TECH CO LTD
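The decision combination described in the Huawei entry above could be sketched as a condition-dependent weighted vote. This is only one possible reading of the abstract: the condition labels, the SNR cut-off, the weight table and the 0.5 threshold are all hypothetical.

```python
def classify_signal_condition(snr_db, noisy_below_db=10.0):
    """Map an SNR estimate to a coarse signal condition label.

    The two labels and the 10 dB cut-off are assumptions for illustration.
    """
    return "noisy" if snr_db < noisy_below_db else "clean"


def combine_vad_decisions(decisions, condition, weights_by_condition,
                          threshold=0.5):
    """Combine per-detector decisions into one combined decision.

    decisions: list of booleans, one per VAD unit.
    weights_by_condition: dict mapping a condition label to one weight
    per VAD unit (hypothetical table).
    """
    weights = weights_by_condition[condition]
    if len(weights) != len(decisions):
        raise ValueError("one weight per VAD unit is required")
    # Sum the weights of the detectors that voted "speech".
    score = sum(w for w, d in zip(weights, decisions) if d)
    return score / sum(weights) >= threshold
```

With weights such as `{"clean": [0.8, 0.2], "noisy": [0.2, 0.8]}`, the first detector dominates in clean conditions and the second in noisy ones.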

Multi-band structure self-adaptive filter switching method for AEC (acoustic echo cancellation)

Active | CN106782593A | Achieving convergence speed advantage | Overcome speed | Speech analysis | Multi band | Adaptive filter
The invention discloses a multi-band adaptive filter switching method for acoustic echo cancellation (AEC). First, a far-end voice signal is acquired; voice endpoints are detected, and a voice activity detection (VAD) flag bit and an improved envelope decision threshold are output. The voice signal is fed to a loudspeaker to serve as the desired signal and is also input to an adaptive filter. The adaptive filter adopts a switchable multi-band structure with a corresponding adaptive algorithm; its parameters are adjusted according to feedback information using the least mean square criterion to obtain the optimal solution. The switching method takes voice characteristics fully into account while guaranteeing the steady-state misadjustment, and it balances convergence rate against algorithm complexity while exploiting the algorithm's convergence-rate advantage. In practical echo cancellation, a single algorithm rarely meets all of the varying demands; the switchable algorithm gives the user more options and is of great significance for adaptive echo cancellation applications.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
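To make the role of the VAD flag in the entry above concrete, here is a minimal single-band sketch that uses a normalized LMS (NLMS) update and adapts only while the far-end VAD flag is set. It deliberately omits the multi-band structure and the switching logic, which the abstract does not specify; the function name, tap count and step size are assumptions.

```python
def nlms_echo_canceller(far_end, mic, vad_flags, taps=4, mu=0.5, eps=1e-8):
    """Single-band NLMS echo canceller gated by a far-end VAD flag.

    far_end: far-end (loudspeaker) samples.
    mic: microphone samples containing the echo.
    vad_flags: one boolean per sample; the filter adapts only when True.
    Returns (error signal, final filter weights).
    """
    w = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    errors = []
    for x, d, active in zip(far_end, mic, vad_flags):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # echo estimate
        e = d - y                                       # residual
        if active:
            # Normalized LMS update; eps avoids division by zero.
            norm = sum(h * h for h in history) + eps
            w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        errors.append(e)
    return errors, w
```

Freezing adaptation when no far-end speech is present is a common design choice, since there is then no echo to model and updates would only track noise.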

Signal presence detection using bi-directional communication data

A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.
Owner:DITECH NETWORKS

Method, device and electronic equipment for voice activity detection

Active | CN102044242A | Capable of self-adaptive adjustment | Improve the performance of voice activation detection | Speech analysis | Time domain | Operation mode
The embodiment of the invention discloses a method, device and electronic equipment for voice activity detection. The method comprises the following steps: acquiring time domain sorting parameters and frequency domain sorting parameters from audio frames; acquiring first distances between the time domain sorting parameters and the long-time sliding average value of the time domain sorting parameters in historical background noise frames; acquiring second distances between the frequency domain sorting parameters and the long-time sliding average value of the frequency domain sorting parameters in historical background noise frames; and determining whether the audio frames are foreground voice frames or background noise frames according to the first distances, the second distances and a determining polynomial group based on the first and second distances, wherein at least one coefficient in the determining polynomial group is a variable that can change with the operation mode of voice activity detection or with the characteristics of the input signals. The technical scheme gives the decision criterion a self-adaptive adjustment capability, thereby improving the performance of voice activity detection.
Owner:HUAWEI TECH CO LTD
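The distance-based decision in the entry above might look like the following sketch. It reduces the "determining polynomial group" to a single linear inequality with adjustable coefficients, which is an assumption; the class name, the smoothing factor and the default coefficients are likewise hypothetical.

```python
class NoiseTrackingVad:
    """Classify frames by distance from long-term background noise averages."""

    def __init__(self, alpha=0.95, coeff_time=1.0, coeff_freq=1.0, offset=1.0):
        self.alpha = alpha            # smoothing factor of the sliding average
        self.coeff_time = coeff_time  # adjustable decision coefficient
        self.coeff_freq = coeff_freq  # adjustable decision coefficient
        self.offset = offset
        self.noise_time = None        # long-term average, time-domain parameter
        self.noise_freq = None        # long-term average, frequency-domain parameter

    def set_coefficients(self, coeff_time, coeff_freq):
        """Change the decision coefficients, e.g. when the operation mode changes."""
        self.coeff_time = coeff_time
        self.coeff_freq = coeff_freq

    def is_speech(self, time_param, freq_param):
        if self.noise_time is None:
            # Assume the first frame is background noise and initialise from it.
            self.noise_time, self.noise_freq = time_param, freq_param
            return False
        d1 = abs(time_param - self.noise_time)  # first distance
        d2 = abs(freq_param - self.noise_freq)  # second distance
        speech = self.coeff_time * d1 + self.coeff_freq * d2 - self.offset > 0.0
        if not speech:
            # Update the averages only on frames judged to be noise.
            a = self.alpha
            self.noise_time = a * self.noise_time + (1 - a) * time_param
            self.noise_freq = a * self.noise_freq + (1 - a) * freq_param
        return speech
```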

Voice activity detection method in complex background noise

Active | CN102194452A | Differentiate voice | Distinguish background noise | Speech analysis | Background noise | Speech sound
The invention discloses a voice activity detection method in complex background noise. The method sequentially comprises the following steps: (1) performing a TEO (Teager Energy Operator) operation on the data; (2) pre-emphasizing the input data x(n); (3) performing band-pass filtering; (4) framing and windowing; (5) calculating an evolution value of the autocorrelation of each frame and its standard deviation; (6) calculating Stati for the 20 frames of the initial stage, together with its mean (Stati) and standard deviation std (Stati), and comparing std (Stati) with a preset threshold to judge whether voice is present; (7) calculating subsequent data; (8) calculating Stati for FrameN continuous frames, and performing a secondary determination according to its mean (Stati) and standard deviation std (Stati); (9) with the speech interval Speechmin equal to 100-200 ms and the duration Silencemin equal to 500-1,000 ms, judging that voice has started if Statusfinal is equal to 0 and status is equal to 1 for Ns continuous frames (the value of Ns is related to FrameN), judging that voice has ended if Statusfinal is equal to 1 and status is equal to 0 for NE continuous frames (the value of NE is also related to FrameN), and finally determining the actual end points of the voice.
Owner:西安烽火电子科技有限责任公司
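Two of the steps in the entry above lend themselves to a short sketch: the Teager Energy Operator of step (1), in its standard discrete form x(n)^2 - x(n-1)x(n+1), and the run-length logic of step (9) that turns per-frame status values into a final start/end decision. The function names and the example run lengths are assumptions; the intermediate steps (pre-emphasis, filtering, autocorrelation statistics) are not reproduced here.

```python
def teager_energy(x):
    """Discrete Teager Energy Operator: x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def smooth_status(status, n_start, n_end):
    """Turn raw per-frame status (0/1) into a final status sequence.

    Voice starts after n_start consecutive frames with status 1 and
    ends after n_end consecutive frames with status 0.
    """
    final = 0
    run = 0
    out = []
    for s in status:
        if final == 0:
            run = run + 1 if s == 1 else 0
            if run >= n_start:
                final, run = 1, 0
        else:
            run = run + 1 if s == 0 else 0
            if run >= n_end:
                final, run = 0, 0
        out.append(final)
    return out
```

The run-length requirement suppresses isolated misclassified frames, at the cost of reporting start and end points a few frames late.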

System and method for reducing VOIP (voice over internet protocol) communication resource overhead

The invention discloses a system for reducing VOIP (voice over internet protocol) communication resource overhead, comprising an input layer, a convolution layer, a subsampling layer and an output layer, each layer being composed of feature maps and each feature map containing neurons. A method of using the system to reduce VOIP communication resource overhead specifically includes: 1, training a convolutional neural network; 2, initializing the convolutional neural network; 3, inputting the voice to be detected into a VAD (voice activity detection) system; 4, extracting the MFCC voice feature parameters and their first-order difference parameters from each frame in order; 5, composing the parameters of each frame into a one-dimensional feature map that is fed into the convolutional neural network system; 6, the convolutional neural network system outputting in order a result [x, y] for each frame of the voice to be detected, and the VAD system making a judgment and recording the results. The advantages of the system and method are that using the convolutional neural network for detection in the VAD system reduces the misjudgment rate of the VAD system, saves calculation time and bandwidth, and reduces VOIP voice resource overhead while ensuring communication quality.
Owner:BEIJING UNIV OF POSTS & TELECOMM
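Steps 4 and 5 of the entry above can be sketched as follows, assuming the MFCC vectors have already been computed by some front end. The first-order difference is taken here as a simple two-point central difference with edge replication, which is an assumption; the abstract does not say which difference formula is used, and the function names are hypothetical.

```python
def delta(frames):
    """First-order difference of per-frame feature vectors.

    Uses (next - previous) / 2 with the first and last frames
    replicated at the edges (an assumed convention).
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(n - p) / 2.0 for p, n in zip(prev, nxt)])
    return out


def feature_maps(mfcc_frames):
    """Concatenate each frame's MFCCs and deltas into one 1-D feature vector."""
    return [list(c) + d for c, d in zip(mfcc_frames, delta(mfcc_frames))]
```

Each resulting vector would then be the per-frame input to the network; the network itself and the meaning of its [x, y] output are not specified in the abstract and are left out.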

Voice acquiring method and device adopting plurality of microphones

The invention provides a voice acquiring method and a voice acquiring device adopting a plurality of microphones. The method comprises the following steps: acquiring voice with the plurality of microphones, wherein the microphones correspond to different voice acquiring channels, so that the voice signals of each voice acquiring channel are obtained; carrying out analog-digital conversion on the voice signals to obtain voice digital signals; framing the PCM binary data of the voice digital signals to obtain the short-time stable audio signal corresponding to each frame of PCM binary data; carrying out voice activity detection on the short-time stable audio signals frame by frame in sequence, and classifying the frames corresponding to the short-time stable audio signals as voice frames or non-voice frames; carrying out voice quality detection on the fragment audio files corresponding to the voice frames, using a preset frame number as the step size, and saving the fragment audio files of qualified quality; and splicing the saved fragment audio files of qualified quality to synthesize the complete audio file.
Owner:SPEAKIN TECH CO LTD
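For a single acquisition channel, the framing, detection, quality check and splicing steps of the entry above could be sketched as below. The sketch assumes 16-bit little-endian PCM, an energy threshold as a stand-in for the VAD, and a caller-supplied quality predicate; none of these details come from the abstract, and all function names are hypothetical.

```python
import struct


def pcm16_to_frames(pcm_bytes, frame_len):
    """Decode 16-bit little-endian PCM and split it into full frames."""
    n = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % n, pcm_bytes[:2 * n])
    return [list(samples[i:i + frame_len])
            for i in range(0, n - frame_len + 1, frame_len)]


def is_voice(frame, energy_threshold):
    """Stand-in VAD: mean squared amplitude above a threshold."""
    return sum(s * s for s in frame) / len(frame) > energy_threshold


def collect_voice(pcm_bytes, frame_len, energy_threshold, step, quality_ok):
    """Keep voice frames, check quality per fragment of `step` frames, splice.

    quality_ok is a caller-supplied predicate taking a list of samples.
    """
    frames = pcm16_to_frames(pcm_bytes, frame_len)
    voiced = [f for f in frames if is_voice(f, energy_threshold)]
    kept = []
    for i in range(0, len(voiced), step):
        fragment = [s for f in voiced[i:i + step] for s in f]
        if quality_ok(fragment):
            kept.extend(fragment)  # splice qualified fragments in order
    return kept
```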

Voice activity detection apparatus and method

A voice activity detection method comprising the steps of (a) Estimating in a noise power estimator the noise power within a signal having a speech component and a noise component, and (b) Calculating a likelihood ratio for the presence of speech in the signal from the estimated power of noise signals from step (a) and a complex Gaussian statistical model.
Owner:KK TOSHIBA
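For the Toshiba entry above, a likelihood ratio under a zero-mean complex Gaussian model can be written in closed form. The sketch below uses the textbook form for a single frequency bin, with noise-only variance N under the "no speech" hypothesis and variance N + S under the "speech" hypothesis; whether the patent uses exactly this form is an assumption, as are the function names and the recursive noise power estimator.

```python
import math


def update_noise_power(noise_power, frame_power, smoothing=0.9):
    """Recursive average of noise power, to be called on noise-only frames.

    The smoothing constant is an assumed value.
    """
    return smoothing * noise_power + (1.0 - smoothing) * frame_power


def likelihood_ratio(signal_power, noise_power, speech_power):
    """Likelihood ratio p(X | speech) / p(X | noise) for one bin.

    signal_power: |X|^2 of the observed bin.
    noise_power: estimated noise variance N.
    speech_power: assumed speech variance S.
    """
    gamma = signal_power / noise_power  # a posteriori SNR
    xi = speech_power / noise_power     # a priori SNR
    return math.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)
```

A ratio above 1 favours the speech hypothesis; in practice the log of the ratio is often averaged over frequency bins before thresholding.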

Echo reduction system

The present invention relates to a method for reducing an echo in a microphone signal generated by a microphone, comprising echo compensating the microphone signal by subtracting an estimated echo signal from the microphone signal to generate an echo compensated signal, detecting a speech activity of a local speaker on the basis of the microphone signal and the estimated echo signal and suppressing a residual echo in the echo compensated signal on the basis of the detected speech activity to obtain an output signal. The invention further relates to a system for processing a microphone signal generated by a microphone, comprising echo compensation filtering means configured to receive and echo compensate the microphone signal to output an echo compensated signal based on the received microphone signal, a speech activity detection means configured to detect speech activity of a local speaker by receiving and analyzing the echo compensated signal and to output a detection signal and a residual echo suppressing means configured to receive the detection signal and to receive and filter the echo compensated signal on the basis of the detection signal to output an output signal.
Owner:CERENCE OPERATING CO
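The three stages named in the entry above (echo compensation, local speech detection, residual echo suppression) can be sketched per frame as follows. The power-ratio speech detector and the fixed attenuation gain are simplifying assumptions chosen for illustration; the abstract does not describe how detection or suppression are actually implemented.

```python
def mean_power(frame):
    """Mean squared amplitude of a frame."""
    return sum(s * s for s in frame) / len(frame)


def reduce_echo(mic, echo_estimate, speech_ratio=2.0, residual_gain=0.1):
    """Echo-compensate one frame and suppress residual echo.

    mic: microphone samples for the frame.
    echo_estimate: estimated echo samples for the same frame.
    speech_ratio, residual_gain: assumed tuning constants.
    Returns (output frame, local speech flag).
    """
    # 1. Echo compensation: subtract the estimated echo.
    compensated = [m - e for m, e in zip(mic, echo_estimate)]
    # 2. Local speech detection from the microphone signal and the estimate.
    local_speech = mean_power(mic) > speech_ratio * mean_power(echo_estimate)
    # 3. Residual echo suppression: attenuate when no local speech is detected.
    gain = 1.0 if local_speech else residual_gain
    return [gain * c for c in compensated], local_speech
```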

Intelligent voice mixing method and device for multi-party voice communication

The invention discloses an intelligent voice mixing method and device for multi-party voice communication, and belongs to the technical field of multimedia. The method comprises the following steps: during voice communication, the current frame data of all active voice channels except the home terminal are obtained; the voice activity detection results of the current frame data of all the active voice channels and the short-time average energy of all the active voice channels are obtained; the voice channels to be mixed are selected according to the voice activity detection results of the current frame data of all the active voice channels, the short-time average energy of all the active voice channels, the number of voice channels with effective voice, and the gating identifiers corresponding to all the active voice channels; the current frame data of the selected voice channels are mixed by superposition, and the resulting mixed voice data are output. The intelligent voice mixing method and device reduce the noise generated in multi-party voice communication, improve the clarity of the voice, and improve the execution efficiency of multi-party voice communication.
Owner:GUANGZHOU HUADUO NETWORK TECH
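The channel selection and superposition described in the entry above might be sketched as follows. The dictionary keys, the interpretation of the gating identifier as a simple open/closed flag, the cap on the number of mixed channels and the 16-bit clipping are all assumptions made for illustration.

```python
def short_time_energy(frame):
    """Short-time average energy of one frame."""
    return sum(s * s for s in frame) / len(frame)


def mix_current_frames(channels, max_mixed=3, limit=32767):
    """Select channels and mix their current frames by superposition.

    channels: list of dicts with hypothetical keys
      "frame" (list of int samples), "vad" (bool), "gate_open" (bool).
    At most max_mixed channels with detected voice and an open gate are
    mixed, preferring those with the highest short-time energy.
    """
    candidates = [c for c in channels if c["vad"] and c["gate_open"]]
    candidates.sort(key=lambda c: short_time_energy(c["frame"]), reverse=True)
    selected = candidates[:max_mixed]
    frame_len = len(channels[0]["frame"]) if channels else 0
    mixed = [0] * frame_len
    for c in selected:
        mixed = [m + s for m, s in zip(mixed, c["frame"])]
    # Clip to the 16-bit sample range to avoid overflow after summation.
    return [max(-limit - 1, min(limit, m)) for m in mixed]
```

Excluding channels without detected voice keeps their background noise out of the mix, which is the noise reduction the abstract refers to.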