508 results about "Voice activity" patented technology

Voice activity detection is an essential component of many audio systems, such as automatic speech recognition and speaker recognition.

Robust separation of speech signals in a noisy environment

A method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided. In one approach, a signal separation process is associated with a voice activity detector. The voice activity detector is a two-channel detector, which enables a particularly robust and accurate detection of voice activity. When speech is detected, the voice activity detector generates a control signal. The control signal is used to activate, adjust, or control signal separation processes or post-processing operations to improve the quality of the resulting speech signal. In another approach, a signal separation process is provided as a learning stage and an output stage. The learning stage aggressively adjusts to current acoustic conditions, and passes coefficients to the output stage. The output stage adapts more slowly, and generates a speech-content signal and a noise dominant signal. When the learning stage becomes unstable, only the learning stage is reset, allowing the output stage to continue outputting a high quality speech signal.
Owner:QUALCOMM INC

Voice query extension method and system

A voice query extension method and system. The voice query extension method includes: detecting voice activity of a user from an input signal and extracting a feature vector from the voice activity; converting the feature vector into at least one phoneme sequence and generating the at least one phoneme sequence; matching the at least one phoneme sequence with words registered in a dictionary, extracting a string of the matched words with a linguistic meaning, and selecting the string of the matched words as a query; determining whether the query is in a predetermined first language, and when the query is not in the first language as a result of the determining, converting the query using a phoneme to grapheme rule, and generating a query in the first language; and searching using the query in the first language.
Owner:SAMSUNG ELECTRONICS CO LTD

Method and apparatus for enhancing noise-corrupted speech

A noise suppression device receives data representative of a noise-corrupted signal which contains a speech signal and a noise signal, divides the received data into data frames, and then passes the data frames through a pre-filter to remove a dc-component and the minimum phase aspect of the noise-corrupted signal. The noise suppression device appends adjacent data frames to eliminate boundary discontinuities, and applies fast Fourier transform to the appended data frames. A voice activity detector of the noise suppression device determines if the noise-corrupted signal contains the speech signal based on components in the time domain and the frequency domain. A smoothed Wiener filter of the noise suppression device filters the data frames in the frequency domain using different sizes of a window based on the existence of the speech signal. Filter coefficients used for Wiener filter are smoothed before filtering. The noise suppression device modifies magnitude of the time domain data based on the voicing information outputted from the voice activity detector.
Owner:META C CORP

Method and apparatus for audio intelligibility enhancement and computing apparatus

Method and apparatus for audio intelligibility enhancement and computing apparatus are provided. The method includes the following steps. Environment noise is detected by performing voice activity detection according to a detected audio signal from at least a microphone of a computing device. Noise information is obtained according to the detected environment noise and a first audio signal. A second audio signal is outputted by boosting the first audio signal under an adjustable headroom by the computing device according to the noise information and the first audio signal.
Owner:HTC CORP

Small array microphone for beam-forming and noise suppression

Techniques are provided to suppress noise and interference using an array microphone and a combination of time-domain and frequency-domain signal processing. In one design, a noise suppression system includes an array microphone, at least one voice activity detector (VAD), a reference generator, a beam-former, and a multi-channel noise suppressor. The array microphone includes multiple microphones—at least one omni-directional microphone and at least one uni-directional microphone. Each microphone provides a respective received signal. The VAD provides at least one voice detection signal used to control the operation of the reference generator, beam-former, and noise suppressor. The reference generator provides a reference signal based on a first set of received signals and having desired voice signal suppressed. The beam-former provides a beam-formed signal based on a second set of received signals and having noise and interference suppressed. The noise suppressor further suppresses noise and interference in the beam-formed signal.
Owner:FORTEMEDIA

Personalized Voice Activity Detection

Active | US20080255842A1 | Reduce transmission bandwidth | Leveling precision | Speech recognition | Personalization | Voice activity
A method of transferring a real-time audio signal transmission, including: registering voice patterns (or other characteristics) of one or more users to be used to identify the voices of the users; accepting an audio signal as it is created as a sequence of segments; analyzing each segment of the accepted audio signal to determine if it contains voice activity (314); determining a probability level that the voice activity of the segment is of a registered user (320 & 322); and selectively transferring the contents of a segment responsive to the determined probability level (324).
Owner:SHIMHI SHAUL

Detection of voice activity in an audio signal

Inactive | US20060053007A1 | Improve intelligibility and pleasantness of speech | Quick change | Speech recognition | Digital data | Frequency spectrum
A device comprising a voice activity detector for detecting voice activity in a speech signal using digital data formed on the basis of samples of an audio signal. The voice activity detector comprises a first element adapted to examine whether the signal has a highpass nature. The voice activity detector also comprises a second element adapted to examine the frequency spectrum of the signal. The voice activity detector is adapted to provide an indication of speech when the first element has determined that the signal has a highpass nature or the second element has determined that the signal does not have a flat frequency response.
Owner:NOKIA SOLUTIONS & NETWORKS OY

Multiple microphone voice activity detector

Voice activity detection using multiple microphones can be based on a relationship between the energy at a speech reference microphone and the energy at a noise reference microphone. The energy output from each of the speech reference microphone and the noise reference microphone can be determined. A speech-to-noise energy ratio can be determined and compared to a predetermined voice activity threshold. In another embodiment, the absolute values of the autocorrelations of the speech and noise reference signals are determined, and a ratio based on the autocorrelation values is determined. Ratios that exceed the predetermined threshold can indicate the presence of a voice signal. The speech and noise energies or autocorrelations can be determined using a weighted average or over a discrete frame size.
Owner:QUALCOMM INC
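
As a rough illustration of the energy-ratio test described in the abstract above, the sketch below compares the frame energy at a speech reference microphone with that at a noise reference microphone. The function names, the mean-square energy measure, and the threshold value of 2.0 are illustrative assumptions and are not taken from the patent.

```python
def frame_energy(samples):
    """Mean squared amplitude over one discrete frame (assumed energy measure)."""
    return sum(s * s for s in samples) / len(samples)


def two_mic_vad(speech_frame, noise_frame, threshold=2.0, eps=1e-12):
    """Return True when the speech-to-noise energy ratio exceeds the threshold.

    threshold and eps are hypothetical values chosen for illustration only.
    """
    ratio = frame_energy(speech_frame) / (frame_energy(noise_frame) + eps)
    return ratio > threshold


# Example frames: the speech reference microphone sees a much stronger signal.
near = [0.5, -0.5, 0.5, -0.5]
far = [0.1, -0.1, 0.1, -0.1]
print(two_mic_vad(near, far))  # speech-dominant frame
print(two_mic_vad(far, far))   # equal energy at both microphones
```

A practical detector would likely smooth the energies over time (the abstract mentions a weighted average) before forming the ratio; this sketch uses a single frame for brevity.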

Speech recognition using microphone antenna array

A system and method of audio processing provides enhanced speech recognition. Audio input is received at a plurality of microphones. The multi-channel audio signal from the microphones may be processed by a beamforming network to generate a single-channel enhanced audio signal, on which voice activity is detected. Audio signals from the microphones are additionally processed by an adaptable noise cancellation filter having variable filter coefficients to generate a noise-suppressed audio signal. The variable filter coefficients are updated during periods of voice inactivity. A speech recognition engine may apply a speech recognition algorithm to the noise-suppressed audio signal and generate an appropriate output. The operation of the speech recognition engine and the adaptable noise cancellation filter may advantageously be controlled based on voice activity detected in the single-channel enhanced audio signal from the beamforming network.
Owner:HIGHBRIDGE PRINCIPAL STRATEGIES LLC AS COLLATERAL AGENT

Signal processing apparatus having voice activity detection unit and related signal processing methods

A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.
Owner:REALTEK SEMICON CORP

Method and system for a shared antenna control using the output of a voice activity detector

Aspects of a method and system for a shared antenna control using the output of a voice activity detector are provided. A single radio chip for use within a wireless device may handle communication of a Bluetooth (BT) and a Wireless Local Area Network (WLAN) protocol via a single antenna. Simultaneous reception via BT and WLAN channels may be enabled. The single radio chip may enable detection of voice activity in the BT channel and may reduce the BT transmission priority level some time after the voice activity indicates that the BT channel is not transmitting voice information to enable error concealment. Voice activity detection may be based on PCM samples in the BT channel. The single radio chip may transmit an ACK signal to an access point after the BT transmission priority level is reduced.
Owner:AVAGO TECH WIRELESS IP SINGAPORE PTE

Noise reduction method and apparatus

Apparatuses for noise reduction and noise processing methods for reducing noise in audio signals are presented. The noise level of an input signal at an input terminal is measured and the noise-to-signal ratio is established. A reduced voice activity detector is used to determine whether the input signal comprises speech or not. If the measured noise level exceeds a threshold level a switch connects the input signal to means for noise reduction. However, if the measured noise level does not exceed the threshold level, i.e. when noise reduction is not needed, the switch disconnects the means for noise reduction and the input signal is passed unchanged. Power is saved by powering off the means for noise reduction when it is not needed.
Owner:BLACKBERRY LTD

Headset with hear-through mode

A headset for voice communication is disclosed, the headset comprising at least one earphone having a speaker and one or more microphones. The headset is configured to be operated in a first mode in which an electronic noise cancelling circuit is configured to receive ambient audio via at least a first of the one or more microphones to implement an active noise cancelling function and to provide a noise cancelling audio signal to the speaker, and in a second mode in which ambient audio is provided as a hear-through audio signal to the speaker. The headset for voice communication is configured to detect whether a call is ongoing, and to provide a call signal in response to the detection. The headset comprises the electronic noise cancelling circuit, a voice activity detection unit configured to indicate when a user speaks, a switching element configured to switch the headset between operating in the first mode and operating in the second mode, wherein, when the headset is operated in the first mode and the call signal indicates that the user is not in a call, the switching element is configured to switch the headset from operating in the first mode to operating in the second mode when the voice activity detection unit indicates that the user speaks.
Owner:GN AUDIO AS

Self-fault detection system and method for microphone array and audio-based device

Disclosed herein is a self-fault detection system and method in a microphone array system, in which features for self-fault detection of a microphone array are formed using internal values of a voice activity detector (VAD) with respect to audio signals respectively outputted from a plurality of microphones, the features generated with respect to each of the microphones are mutually and automatically compared without a special reference signal, thereby self-detecting fault microphones.
Owner:KOREA INST OF SCI & TECH

Telephone receiver circuit with dynamic sidetone signal generator controlled by voice activity detection

A telephone receiver circuit with sidetone signal generation controlled by voice activity detection in accordance with the present invention uses the voice activity detector (VAD) to detect the presence of voice activity within the microphone signal and dynamically adjust the sidetone signal generation to compensate for noisy environments by eliminating or reducing the sidetone signal in the absence of voice activity. Hence, a sidetone signal is generated in the presence of voice activity, when feedback is required, while the sidetone signal is not generated in the absence of voice activity, since audio feedback for the user is not required then.
Owner:NAT SEMICON CORP

Voice activity detector based on spectral flatness of input signal

A voice activity detector that detects talkspurts in a given signal with high accuracy, so as to improve the quality of voice communication. A frequency spectrum calculator calculates the frequency spectrum of a given input signal. A flatness evaluator evaluates the flatness of this power spectrum by, for example, calculating the average of the power spectral components and then adding up the differences between those components and the average. The resultant sum of differences, in this case, is used as a flatness factor of the spectrum. A voice / noise discriminator determines whether the input signal contains a talkspurt or not, by comparing the flatness factor of the frequency spectrum with a predetermined threshold.
Owner:FUJITSU LTD
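
The flatness factor described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes absolute differences are summed (signed differences from the mean would cancel to zero), assumes that a larger factor (a less flat spectrum) indicates a talkspurt, and uses a naive DFT so that it needs only the standard library. All function names and the threshold are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of each non-negative-frequency DFT bin (naive O(n^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def flatness_factor(frame):
    """Sum of absolute deviations of the power bins from their average.

    A perfectly flat spectrum gives 0; a peaky spectrum gives a large value.
    """
    power = power_spectrum(frame)
    mean = sum(power) / len(power)
    return sum(abs(p - mean) for p in power)


def is_talkspurt(frame, threshold):
    """Assumed decision rule: a non-flat spectrum is treated as voice."""
    return flatness_factor(frame) > threshold


# An impulse has a flat spectrum; a pure tone concentrates power in one bin.
impulse = [1.0, 0, 0, 0, 0, 0, 0, 0]
tone = [math.cos(2 * math.pi * t / 8) for t in range(8)]
print(flatness_factor(impulse), flatness_factor(tone))
```

Because the factor is not normalized by signal level, a fixed threshold would behave differently for loud and quiet inputs; a real detector would presumably normalize or adapt the threshold.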

Method to transmit silence compressed voice over IP efficiently in DOCSIS cable networks

Inactive | US6847635B1 | Accurately and quickly transmit | Sent very quickly and very accurately | Broadband local area networks | Two-way working systems | Modem device | Network packet
A method to use the data packet carrying ability of cable TV networks as described in DOCSIS to accurately and quickly transmit a voice call from a cable user to another user over a cable TV cable. The cable telephone, or cable modem, is given the ability to detect when voice activity from the subscriber is above and below a predetermined value. When the cable modem has voice activity, the cable modem knows that it will have a continuous stream of voice data packets which need to be sent very quickly and very accurately to the CMTS. Therefore, the cable modem requests a periodic stream of time slots from the CMTS. When the cable modem detects no voice activity or a silence period from the subscriber, the cable modem indicates that the periodic stream of time slots is no longer needed and the CMTS stops providing the periodic stream of time slots. When voice activity of the subscriber resumes, the cable modem again requests the periodic stream of time slots, and transmits the voice data packets or cells in these time slots.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP +1
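
The request/release behaviour described in the abstract above can be pictured as a two-state machine driven by the voice activity level. The sketch below is only an illustration under stated assumptions: the action names `request_periodic_slots` and `release_periodic_slots` are hypothetical labels, not DOCSIS message names, and activity is modelled as one numeric level per frame.

```python
def slot_requests(activity_levels, threshold):
    """Return (frame_index, action) pairs for each change in voice activity.

    The modem asks for a periodic stream of time slots when activity rises
    above the threshold and gives the stream up when activity falls back.
    """
    actions = []
    slots_held = False
    for index, level in enumerate(activity_levels):
        active = level > threshold
        if active and not slots_held:
            actions.append((index, "request_periodic_slots"))
            slots_held = True
        elif not active and slots_held:
            actions.append((index, "release_periodic_slots"))
            slots_held = False
    return actions


# Talk, pause, talk again: one request, one release, one new request.
for event in slot_requests([0, 5, 6, 0, 0, 7], threshold=1):
    print(event)
```

A deployed system would probably add a hangover period before releasing the slots so that short pauses between words do not trigger repeated signalling; that refinement is omitted here.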

Linear prediction based noise suppression

Various time-domain noise suppression methods and devices for suppressing a noise signal in a speech signal are provided. For example, a time-domain noise suppression method comprises estimating a plurality of linear prediction coefficients for the speech signal, generating a prediction error estimate based on the plurality of prediction coefficients, generating an estimate of the speech signal based on the plurality of linear prediction coefficients, using a voice activity detector to determine voice activity in the speech signal, updating a plurality of noise parameters based on the prediction error and if the voice activity detector determines no voice activity in the speech signal, generating an estimate of the noise signal based on the plurality of noise parameters, and passing the speech signal through a filter derived from the estimate of the noise signal and the estimate of the speech signal to generate a clean speech signal estimate.
Owner:MACOM TECH SOLUTIONS HLDG INC +1

A voice activity detector for packet voice network

A voice activity detector to analyze a short-term averaged energy (STAE), a long-term averaged energy (LTAE), and a peak-to-mean likelihood ratio (PMLR) in order to determine whether a current audio frame being transmitted represents voice or silence. This is accomplished by determining whether a sum of the STAE and a factor is greater than the LTAE. If not, the current audio frame represents silence. If so, a second set of determinations is performed. Herein, a determination is made as to whether the difference between the LTAE and the STAE is less than a predetermined threshold. If so, the current audio frame represents voice. Otherwise, the PMLR is determined and compared to a selected threshold. If the PMLR is greater than the selected threshold, the current audio frame represents a voice signal. Otherwise, it represents silence.
Owner:NORTEL NETWORKS LTD
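
The decision sequence in the abstract above maps onto a short cascade of comparisons. The sketch below follows the order of tests as stated there; the parameter names and any numeric values are illustrative assumptions, and for simplicity the PMLR is passed in as a precomputed value, although the abstract computes it only when the earlier tests are inconclusive.

```python
def classify_frame(stae, ltae, pmlr, factor, diff_threshold, pmlr_threshold):
    """Classify one audio frame as "voice" or "silence".

    stae: short-term averaged energy
    ltae: long-term averaged energy
    pmlr: peak-to-mean likelihood ratio (consulted only in the last step)
    """
    # Step 1: the short-term energy plus a margin must exceed the long-term energy.
    if not (stae + factor > ltae):
        return "silence"
    # Step 2: a small gap between long-term and short-term energy means voice.
    if ltae - stae < diff_threshold:
        return "voice"
    # Step 3: otherwise fall back to the peak-to-mean likelihood ratio.
    return "voice" if pmlr > pmlr_threshold else "silence"


params = dict(factor=2.0, diff_threshold=1.0, pmlr_threshold=3.0)
print(classify_frame(1.0, 5.0, 0.0, **params))  # fails step 1
print(classify_frame(4.5, 5.0, 0.0, **params))  # decided at step 2
print(classify_frame(3.5, 5.0, 4.0, **params))  # decided at step 3
```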

Voicing measure for a speech CODEC system

A system and method are provided that employ a frequency domain interpolative CODEC system for low bit rate coding of speech which comprises a linear prediction (LP) front end adapted to process an input signal providing LP parameters which are quantized and encoded over predetermined intervals and used to compute a LP residual signal. An open loop pitch estimator adapted to process the LP residual signal, a pitch quantizer, and a pitch interpolator also provides a pitch contour within the predetermined intervals. A voice activity detector adapted to process the LP parameters and the open loop pitch contour over the predetermined intervals is also provided as well as a signal processor responsive to the LP residual signal and the pitch contour and adapted to perform the following functions: extract a prototype waveform (PW) from the LP residual and the open loop pitch contour for a number of equal sub-intervals within the predetermined intervals; normalize the PW by a gain value of the PW; encode a magnitude of the PW; and provide a voicing measure where the voicing measure characterizes a degree of voicing of the input speech signal and is derived from several input parameters that are correlated to degrees of periodicity of the signal over the predetermined intervals. The voicing measure is provided for the purpose of regenerating a PW phase at a decoder; and providing improved quantization of the PW magnitude at an encoder. The voicing measure is encoded jointly with a PW nonstationarity measure vector using a spectrally weighted vector quantizer having a codebook partitioned based on a voiced and unvoiced mode.
Owner:HUGHES NETWORK SYST

VOIP barge-in support for half-duplex DSR client on a full-duplex network

Providing VOIP barge-in support for a half-duplex DSR client on a full-duplex network by buffering, in a half-duplex DSR client, input audio from the full-duplex network; playing, through the half-duplex DSR client, the buffered input audio; pausing, during voice activity on the half-duplex DSR client, the playing of the buffered input audio; sending, during voice activity on the half-duplex DSR client, speech for recognition through the full-duplex network to a voice server; receiving in the half-duplex DSR client through the full-duplex network from the voice server notification of speech recognition, the notification bearing a time stamp; and, responsive to receiving the notification, resuming the playing of the buffered input audio, including playing only buffered VOIP audio data bearing time stamps later than the time stamp of the recognition notification.
Owner:NUANCE COMM INC

Low Power Mechanism for Keyword Based Hands-Free Wake Up in Always ON-Domain

A low-power, keyword-based speech recognition hardware architecture for hands-free wake-up of devices is provided. Because of its low-power operation, the system can be used in the always-ON domain for detection of voice activity. If no activity is detected for a pre-specified time, the system enters a deep low-power state by deactivating all non-required processes. Upon detecting valid voice activity, the system searches for the spoken keyword. If a valid keyword is detected, all application processes are activated and the system enters full functional mode; if the voice activity does not contain a valid keyword present in the database, the system returns to the deep low-power state.
Owner:3ILOGIC DESIGNS

Speech intelligibility in telephones with multiple microphones

The present invention is directed to improved speech intelligibility in telephones with multiple microphones. Such a telephone includes a first microphone, a second microphone, a voice activity detector (VAD), a receiver module, and a signal processor. The first microphone outputs a first audio signal, which comprises a voice component when a near-end user talks and a background noise component. The second microphone outputs a second audio signal. The VAD generates a voice activity signal responsive to a ratio between the first audio signal and the second audio signal. The voice activity signal identifies time intervals in which the voice component of the near-end user is present in the first audio signal. The receiver module receives a third audio signal, which comprises a voice component of a far-end user. The signal processor modifies the third audio signal responsive to the voice activity signal.
Owner:AVAGO TECH INT SALES PTE LTD

Elimination of clipping associated with VAD-directed silence suppression

A method and apparatus for elimination of clipping associated with VAD-directed silence suppression includes receiving a voice signal in a buffer during the delay between the start of voice activity and the detection of the voice activity. Then, the voice signal is played from the buffer in condensed form, e.g., by dropping packets or slightly accelerating playback of the signal from the buffer. After voice activity is detected, the voice signal may continue to be buffered and condensed until the buffer is completely depleted. The voice signal may then be transmitted directly, without being buffered or condensed.
Owner:CISCO TECH INC

Packet prioritization and associated bandwidth and buffer management techniques for audio over IP

The present invention is directed to voice communication devices in which an audio stream is divided into a sequence of individual packets, each of which is routed via pathways that can vary depending on the availability of network resources. All embodiments of the invention rely on an acoustic prioritization agent that assigns a priority value to the packets. The priority value is based on factors such as whether the packet contains voice activity and the degree of acoustic similarity between this packet and adjacent packets in the sequence. A confidence level, associated with the priority value, may also be assigned. In one embodiment, network congestion is reduced by deliberately failing to transmit packets that are judged to be acoustically similar to adjacent packets; the expectation is that, under these circumstances, traditional packet loss concealment algorithms in the receiving device will construct an acceptably accurate replica of the missing packet. In another embodiment, the receiving device can reduce the number of packets stored in its jitter buffer, and therefore the latency of the speech signal, by selectively deleting one or more packets within sustained silences or non-varying speech events. In both embodiments, the ability of the system to drop appropriate packets may be enhanced by taking into account the confidence levels associated with the priority assessments.
Owner:AVAYA INC

Voice-activity detection using energy ratios and periodicity

A voice activity detector (100) filters (204) out noise energy and then computes a high-frequency (2400 Hz to 4000 Hz) versus low-frequency (100 Hz to 2400 Hz) signal energy ratio (224), total voiceband (100 Hz to 4000 Hz) signal energy (214), and signal periodicity (208) on successive frames of signal samples. Signal periodicity is determined by estimating the pitch period (206) of the signal, determining a gain value of the signal over the pitch period as a function of the estimated pitch period, and estimating a periodicity of the signal over the pitch period as a function of the estimated pitch period and the gain value. Voice is detected (230–232) in a segment if either (a) the difference between the average high-frequency versus low-frequency signal energy ratio and the present segment's high-frequency versus low-frequency energy ratio either exceeds (310) a high threshold value or is exceeded (312) by a low threshold value, or (b) the average periodicity of the signal is lower (306) than a low threshold value, or (c) the difference between the average total signal energy and the present segment's total energy exceeds (304) a threshold value and the average periodicity of the signal is lower (304) than a high threshold value, or (d) the average total signal energy exceeds (412) a minimum average total signal energy by a threshold value and voice has been detected (410) in the preceding segment.
Owner:AVAYA INC
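
One of the features named in the abstract above, the high-frequency (2400 Hz to 4000 Hz) versus low-frequency (100 Hz to 2400 Hz) energy ratio, can be sketched with a naive DFT. This covers only that single feature, not the periodicity estimate or the full decision rule. The 8000 Hz sample rate, the half-open band edges, and the function names are assumptions made for illustration.

```python
import cmath
import math


def band_energy(frame, sample_rate, f_lo, f_hi):
    """Sum of DFT bin powers with f_lo <= frequency < f_hi (naive DFT)."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                      for t, x in enumerate(frame))
            total += abs(acc) ** 2
    return total


def high_low_ratio(frame, sample_rate=8000, eps=1e-12):
    """High-band over low-band energy for one frame of samples.

    eps is a hypothetical guard against division by zero.
    """
    low = band_energy(frame, sample_rate, 100, 2400)
    high = band_energy(frame, sample_rate, 2400, 4000)
    return high / (low + eps)


# 10 ms frames at 8 kHz: a 500 Hz tone sits in the low band, 3000 Hz in the high band.
low_tone = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(80)]
high_tone = [math.sin(2 * math.pi * 3000 * t / 8000) for t in range(80)]
print(high_low_ratio(low_tone) < 1.0, high_low_ratio(high_tone) > 1.0)
```

In the abstract, this ratio is not thresholded directly; it is compared against its running average, so a detector built on this sketch would also need to track that average across segments.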