552 patent results for "Audio recognition"

Modular Audio Recognition Framework (MARF) is an open-source research platform and a collection of voice, sound, speech, text and natural language processing (NLP) algorithms written in Java and arranged into a modular, extensible framework designed to make it easy to add new algorithms.

Audio identification system and method

Status: Inactive · Publication: US7174293B2 · Tags: Facilitate interactive acceptance and processing, Improve accuracy, Speech recognition, Static storage, The Internet, Engineering
A method and system for direct audio capture and identification of the captured audio. A user may then be offered the opportunity to purchase recordings directly over the Internet or similar outlet. The system preferably includes one or more user-carried portable audio capture devices that employ a microphone, analog to digital converter, signal processor, and memory to store samples of ambient audio or audio features calculated from the audio. Users activate their capture devices when they hear a recording that they would like to identify or purchase. Later, the user may connect the capture device to a personal computer to transfer the audio samples or audio feature samples to an Internet site for identification. The Internet site preferably uses automatic pattern recognition techniques to identify the captured samples from a library of recordings offered for sale. The user can then verify that the sample is from the desired recording and place an order online. The pattern recognition process uses features of the audio itself and does not require the presence of artificial codes or watermarks. Audio to be identified can be from any source, including radio and television broadcasts or recordings that are played locally.
Owner:ICEBERG IND

Advertising using extracted context sensitive information and data of interest from voice/audio transmissions and recordings

A method and apparatus that use voice/audio recognition and analysis technologies to deliver assigned context-sensitive information and data of interest (keywords, phrases, mood, etc.). For advertising purposes, context-sensitive information and data of interest can be extracted from any voice/audio transmission or recording, or from any transmission or recording that includes voice/audio. The invention covers the method for extracting the assigned context-sensitive information and data of interest using voice/audio recognition and analysis technologies. Most importantly, it enables advertising based on context-sensitive information and data of interest extracted from voice/audio transmissions and recordings.
Owner:LIU EDWARD

Multi-mode audio recognition and auxiliary data encoding and decoding

Status: Active · Publication: US20140108020A1 · Tags: Improving communication over network, Optimize network, Speech analysis, Data capacity, Feature extraction
Audio signal processing enhances audio watermark embedding and detecting processes. Audio signal processes include audio classification and adapting watermark embedding and detecting based on classification. Advances in audio watermark design include adaptive watermark signal structure data protocols, perceptual models, and insertion methods. Perceptual and robustness evaluation is integrated into audio watermark embedding to optimize audio quality relative to the original signal, and to optimize robustness or data capacity. These methods are applied to audio segments in audio embedder and detector configurations to support real-time operation. Feature extraction and matching are also used to adapt audio watermark embedding and detecting.
Owner:DIGIMARC CORP

Multi-mode audio recognition and auxiliary data encoding and decoding

Status: Active · Publication: US20140142958A1 · Tags: Improving communication over network, Optimize network, Speech analysis, Data capacity, Feature extraction
Audio signal processing enhances audio watermark embedding and detecting processes. Audio signal processes include audio classification and adapting watermark embedding and detecting based on classification. Advances in audio watermark design include adaptive watermark signal structure data protocols, perceptual models, and insertion methods. Perceptual and robustness evaluation is integrated into audio watermark embedding to optimize audio quality relative to the original signal, and to optimize robustness or data capacity. These methods are applied to audio segments in audio embedder and detector configurations to support real-time operation. Feature extraction and matching are also used to adapt audio watermark embedding and detecting.
Owner:DIGIMARC CORP
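
The two Digimarc abstracts do not state how the watermark itself is embedded. As a hedged illustration only, the sketch below uses classic additive spread-spectrum watermarking, which is not claimed to be Digimarc's method: a pseudo-random ±1 chip sequence derived from a key is added to the host audio at a fixed strength `alpha`, and detection correlates the received audio with the same chips. The function names, the integer key, and the fixed `alpha` are assumptions; the fixed strength stands in for the perceptual and robustness adaptation the abstracts describe.

```python
import math
import random


def pn_sequence(key, length):
    """Pseudo-random +/-1 chip sequence derived from a secret integer key (assumed keying)."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(length)]


def embed_bit(host, bit, key, alpha=0.05):
    """Add the keyed chip sequence to the host; its sign carries one payload bit.

    `alpha` is fixed here. A perceptually adaptive embedder would instead vary
    it per segment according to a masking model.
    """
    chips = pn_sequence(key, len(host))
    sign = 1.0 if bit else -1.0
    return [x + alpha * sign * c for x, c in zip(host, chips)]


def detect_bit(signal, key):
    """Correlate with the same keyed chips; the sign of the correlation is the bit."""
    chips = pn_sequence(key, len(signal))
    corr = sum(x * c for x, c in zip(signal, chips)) / len(signal)
    return (1 if corr > 0 else 0), corr


if __name__ == "__main__":
    sr = 8000
    host = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    marked = embed_bit(host, 1, key=1234)
    print(detect_bit(marked, key=1234))  # bit 1, correlation close to +alpha
    print(detect_bit(marked, key=99))    # wrong key: correlation close to 0
```

Because the host is uncorrelated with the chips, its contribution averages toward zero over many samples while the watermark term contributes about `alpha`; a longer segment therefore gives a more reliable decision at the same strength.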

Rolling audio recognition

An audio fingerprint is generated by transforming an audio sample of a recording to a time-frequency domain and storing each time-frequency pair in a matrix array, detecting a plurality of local maxima for a predetermined number of time slices, selecting a predetermined number of largest-magnitude maxima from the plurality of local maxima detected by said detecting, and generating one or more hash values corresponding to the predetermined number of largest-magnitude maxima.
Owner:ROVI TECH CORP
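
The Rovi abstract lists its steps concretely enough to sketch. The stdlib-only Python below is a minimal illustration under stated assumptions: a naive Hann-windowed DFT builds the time-frequency matrix, a 3x3 neighbourhood test finds local maxima, blocks of 16 time slices play the role of the "predetermined number of time slices", the five strongest maxima per block are kept, and a truncated SHA-1 of their block-relative positions serves as the hash. None of these parameters, nor the hash construction, come from the patent.

```python
import cmath
import hashlib
import math


def spectrogram(samples, frame=128, hop=64):
    """Magnitude STFT as matrix[time_slice][frequency_bin] (naive DFT, Hann window)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    basis = [[cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame)]
             for k in range(frame // 2 + 1)]
    matrix = []
    for start in range(0, len(samples) - frame + 1, hop):
        seg = [samples[start + n] * window[n] for n in range(frame)]
        matrix.append([abs(sum(s * b for s, b in zip(seg, row))) for row in basis])
    return matrix


def local_maxima(matrix):
    """(t, f, magnitude) for non-zero cells that are >= every cell in their 3x3 neighbourhood."""
    if not matrix:
        return []
    rows, cols = len(matrix), len(matrix[0])
    peaks = []
    for t in range(rows):
        for f in range(cols):
            v = matrix[t][f]
            neighbourhood = (matrix[i][j]
                             for i in range(max(0, t - 1), min(rows, t + 2))
                             for j in range(max(0, f - 1), min(cols, f + 2)))
            if v > 0 and all(v >= n for n in neighbourhood):
                peaks.append((t, f, v))
    return peaks


def fingerprint(samples, slices=16, top_n=5):
    """One hash per block of `slices` time slices, built from its top_n strongest maxima."""
    matrix = spectrogram(samples)
    peaks = local_maxima(matrix)
    hashes = []
    for block in range(0, len(matrix), slices):
        in_block = [p for p in peaks if block <= p[0] < block + slices]
        strongest = sorted(in_block, key=lambda p: p[2], reverse=True)[:top_n]
        if strongest:
            # Times are block-relative so the hash does not depend on absolute position.
            descriptor = ",".join(f"{t - block}:{f}" for t, f, _ in sorted(strongest))
            hashes.append(hashlib.sha1(descriptor.encode()).hexdigest()[:16])
    return hashes


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(4000)]
    print(fingerprint(tone))
```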

System and methods for continuous audio matching

The present invention relates to the continuous monitoring of an audio signal and identification of audio items within an audio signal. The technology disclosed utilizes predictive caching of fingerprints to improve efficiency. Fingerprints are cached for tracking an audio signal with known alignment and for watching an audio signal without known alignment, based on already identified fingerprints extracted from the audio signal. Software running on a smart phone or other battery-powered device cooperates with software running on an audio identification server.
Owner:SOUNDHOUND AI IP LLC
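
One way to read "predictive caching" is sketched below, as an assumption rather than SoundHound's design: once a fingerprint is identified at a known position in a track, the client prefetches the fingerprints expected next, so later lookups while tracking that item can be answered locally without a server round trip. The `server_index` dictionary is a hypothetical stand-in for the identification server, and the "watching" mode without known alignment is omitted.

```python
class ContinuousMatcher:
    """Client-side matcher that checks a local prefetch cache before asking the server.

    `server_index` (track_id -> ordered fingerprint list) is a hypothetical stand-in
    for the remote identification server.
    """

    def __init__(self, server_index, lookahead=4):
        self.server_index = server_index
        self.lookahead = lookahead
        self.cache = {}            # fingerprint -> (track_id, position)
        self.server_queries = 0

    def _query_server(self, fp):
        self.server_queries += 1
        for track_id, fps in self.server_index.items():
            if fp in fps:
                return track_id, fps.index(fp)
        return None

    def _prefetch(self, track_id, position):
        # Tracking with known alignment: cache the fingerprints expected next.
        fps = self.server_index[track_id]
        for pos in range(position + 1, min(len(fps), position + 1 + self.lookahead)):
            self.cache[fps[pos]] = (track_id, pos)

    def identify(self, fp):
        hit = self.cache.get(fp)
        if hit is None:
            hit = self._query_server(fp)
        if hit is not None:
            self._prefetch(*hit)
        return hit
```

The cache here grows without bound; a practical client would evict entries once the stream drifts away from the tracked item.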

Extended videolens media engine for audio recognition

A system, method, and computer program product for automatically analyzing the audio content of multimedia data are disclosed. Embodiments receive multimedia data, detect portions having specified audio features, and output a corresponding subset of the multimedia data along with generated metadata. Audio content features, including voices, non-voice sounds, and closed captioning, are identified in downloaded or streaming movies and video clips much as a human would identify them, but in near real time. Particular speakers and the most meaningful content sounds and words, with their corresponding timestamps, are recognized via database comparison and may be presented in order of match probability. Embodiments responsively pre-fetch related data, recognize locations, and provide related advertisements. The content features may also be sent to search engines so that further related content can be identified. User feedback and verification may improve the embodiments over time.
Owner:SONY CORP

Video surveillance system and method with combined video and audio recognition

A novel video surveillance system is made up of a video and audio compression engine, a storage device, and a video and audio recognition engine. The video recognition engine detects events through face recognition, motion detection, etc., whereas the audio recognition engine detects voices and other sound signatures indicating a potential alarm situation, e.g., panic voices such as screaming and yelling, or sounds such as gunshots and explosions. Combined recognition of audio and video signals gives the surveillance system a higher true-alarm rate and a lower false-alarm rate. Additionally, the audio recognition engine provides information for pointing video cameras in the direction of interest, allowing better capture of an interesting scene.
Owner:IBM CORP
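
Combining audio and video evidence can be illustrated with simple weighted late fusion, plus a time-difference-of-arrival (TDOA) bearing estimate for steering a camera. Both techniques, the label sets, the 0.5 weight, and the 0.6 threshold are assumptions chosen for illustration; the IBM abstract does not say how its engines fuse scores or derive a direction.

```python
import math
from dataclasses import dataclass

# Hypothetical label sets for the two engines' outputs.
AUDIO_ALARM_LABELS = {"scream", "yell", "gunshot", "explosion"}
VIDEO_ALARM_LABELS = {"motion", "unknown_face"}


@dataclass
class Detection:
    label: str
    confidence: float  # 0.0 to 1.0, as reported by the recognition engine


def fused_alarm(audio, video, audio_weight=0.5, threshold=0.6):
    """Late fusion: alarm only when weighted audio + video evidence crosses the threshold."""
    a = audio.confidence if audio.label in AUDIO_ALARM_LABELS else 0.0
    v = video.confidence if video.label in VIDEO_ALARM_LABELS else 0.0
    return audio_weight * a + (1.0 - audio_weight) * v >= threshold


def bearing_degrees(tdoa_s, mic_spacing_m, speed_of_sound=343.0):
    """Sound direction from the time difference of arrival at two microphones.

    0 degrees is broadside to the microphone pair; the angle could steer a PTZ camera.
    """
    ratio = speed_of_sound * tdoa_s / mic_spacing_m
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))
```

With these settings, a confident gunshot alone (0.45) or strong motion alone (0.4) stays below the threshold, while moderate evidence from both engines together raises an alarm: the true-alarm versus false-alarm trade-off described above.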

Audio matching with semantic audio recognition and report generation

System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The semantic information may be associated with audio signature data. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data, which includes genre, instrumentation, style, acoustical dynamics, and emotive descriptors for the audio signal.
Owner:THE NIELSEN CO (US) LLC
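
The template comparison can be sketched as range matching: each stored template holds a value range per feature plus semantic tags, and the extracted features take the tags of the template they fit best. The feature names (tempo, spectral centroid, crest factor), the ranges, the tags, and the normalised out-of-range distance are all hypothetical; Nielsen's abstract does not specify a scoring rule.

```python
# Hypothetical templates: per-feature (low, high) ranges plus the semantic tags they imply.
TEMPLATES = [
    {"tags": {"genre": "electronic", "dynamics": "compressed"},
     "ranges": {"tempo_bpm": (120, 135),
                "spectral_centroid_hz": (2000, 4000),
                "crest_factor_db": (6, 10)}},
    {"tags": {"genre": "classical", "dynamics": "wide"},
     "ranges": {"tempo_bpm": (60, 100),
                "spectral_centroid_hz": (800, 2000),
                "crest_factor_db": (15, 25)}},
]


def range_distance(value, low, high):
    """0 inside [low, high]; otherwise the distance to the nearest bound in range widths."""
    if low <= value <= high:
        return 0.0
    gap = low - value if value < low else value - high
    return gap / (high - low)


def template_score(features, template):
    return sum(range_distance(features[name], low, high)
               for name, (low, high) in template["ranges"].items())


def semantic_tags(features, templates=TEMPLATES):
    """Tags of the template whose ranges the extracted features fit most closely."""
    best = min(templates, key=lambda t: template_score(features, t))
    return best["tags"]


if __name__ == "__main__":
    print(semantic_tags({"tempo_bpm": 128, "spectral_centroid_hz": 3100, "crest_factor_db": 8}))
```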

Audio signal de-identification

Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating the temporal positions of the speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio.
Owner:MULTIMODAL TECH INC
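
The final two steps, removing and replacing audio for concepts flagged as personally identifying, reduce to overwriting time spans. The sketch assumes the report is already available as `(text, start_s, end_s, is_pii)` tuples; speech recognition and PII detection, which would produce those tuples, are out of scope, and the 1 kHz beep is an assumed choice of non-identifying replacement audio.

```python
import math


def beep(n, sample_rate, freq=1000.0, amp=0.2):
    """Non-identifying replacement audio: a plain sine tone of n samples."""
    return [amp * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


def deidentify(samples, sample_rate, concepts):
    """Return a copy of `samples` with every PII-flagged concept span replaced by a beep.

    `concepts` holds (text, start_s, end_s, is_pii) tuples, standing in for the
    report's concept content plus timestamps (hypothetical format).
    """
    out = list(samples)
    for _text, start_s, end_s, is_pii in concepts:
        if not is_pii:
            continue
        start = max(0, int(start_s * sample_rate))
        end = min(len(out), int(end_s * sample_rate))
        if end > start:
            out[start:end] = beep(end - start, sample_rate)  # same length, so timing is kept
    return out
```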

Method for identifying partial discharge signals of a switchboard based on a support vector machine model

The invention discloses a method for identifying partial discharge signals of a switchboard based on a support vector machine (SVM) model. The method comprises a model training process and an audio identification process, with the following steps: preprocessing the audio signals; extracting effective audio according to short-time energy and zero-crossing rate; segmenting the effective audio and extracting feature parameters of each segment, such as Mel-frequency cepstral coefficients (MFCCs), first-order difference MFCCs, and high zero-crossing rate; training on a sample set with an SVM tool to establish the corresponding SVM model; after preprocessing the audio signals to be identified and extracting and segmenting their effective audio, classifying the segment-level feature samples under test with the SVM model; and post-processing the classification results to judge whether partial discharge signals are present. The method accurately identifies the presence of partial discharge signals in a switchboard, helping to prevent major electrical accidents, reduce economic losses caused by insulation failures, and improve power distribution reliability.
Owner:SOUTH CHINA UNIV OF TECH
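
The first stage, extracting effective audio by short-time energy and zero-crossing rate, can be sketched with the standard library. The dual-threshold rule, the frame sizes, and all threshold values below are assumptions for illustration; the abstract names the two measures but not how they are combined. The later MFCC and SVM stages would typically rely on a feature-extraction library and an SVM package and are not shown.

```python
import math


def frame_signal(x, size, hop):
    """Split x into overlapping frames of `size` samples, advancing by `hop`."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def short_time_energy(frame):
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def effective_segments(x, size=200, hop=100, e_low=1e-4, e_high=1e-2, zcr_high=0.3):
    """Return (start, end) sample ranges of 'effective' audio.

    A frame is active if its energy clears e_high, or if it clears the lower e_low
    bar and also has a high zero-crossing rate (crackle-like, noise-like content).
    Consecutive active frames are merged into one segment.
    """
    segments, current = [], None
    for idx, frame in enumerate(frame_signal(x, size, hop)):
        e = short_time_energy(frame)
        active = e >= e_high or (e >= e_low and zero_crossing_rate(frame) >= zcr_high)
        start = idx * hop
        if active:
            current = (current[0], start + size) if current else (start, start + size)
        elif current:
            segments.append(current)
            current = None
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    sr = 8000
    burst = [0.5 * math.sin(2 * math.pi * 400 * n / sr) for n in range(1000)]
    print(effective_segments([0.0] * 1000 + burst + [0.0] * 1000))
```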

Closed-loop command and response system for automatic communications between interacting computer systems over an audio communications channel

A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably, a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
Owner:INTELLISIST

Audio Processing Techniques for Semantic Audio Recognition and Report Generation

System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data, which includes genre, instrumentation, style, acoustical dynamics, and emotive descriptors for the audio signal.
Owner:THE NIELSEN CO (US) LLC

Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications

Content identification methods for consumer devices determine robust audio fingerprints that are resilient to audio distortions. One method generates signatures representing audio content based on a constant Q-factor transform (CQT). A 2D spectral representation of a 1D audio signal facilitates generation of region-based signatures within frequency octaves and across the entire 2D signal representation. Also, points of interest are detected within the 2D audio signal representation, and interest regions are determined around selected points of interest. Another method generates audio descriptors using an accumulating filter function on bands of the audio spectrum and generates audio transform coefficients. A response of each spectral band is computed, and transform coefficients are determined by filtering, by accumulating derivatives with different lags, and by computing second-order derivatives. Additionally, time- and frequency-based onset detection determines audio descriptors at events and enhances descriptors with information related to an event.
Owner:ROKU INCORPORATED
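
The constant Q-factor transform underlying the first method can be sketched directly: bin k is centred at f_k = f_min * 2^(k/B) for B bins per octave, every bin shares the quality factor Q = 1/(2^(1/B) - 1), and bin k is analysed over a window of N_k = round(Q * sr / f_k) samples. The naive direct sum below follows Brown's original constant-Q formulation rather than Roku's implementation; `f_min`, `B`, and the number of bins are assumed values. Computing it for successive frames would yield the 2D time-frequency image on which region-based signatures are built.

```python
import cmath
import math


def cqt_frame(x, sr, f_min=110.0, bins_per_octave=12, n_bins=36):
    """Naive constant-Q magnitudes for one frame starting at x[0].

    Bin k is centred on f_min * 2**(k / bins_per_octave). All bins share the same
    quality factor Q, so lower bins use proportionally longer windows.
    """
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    mags = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)
        n_k = int(round(q * sr / f_k))
        if n_k > len(x):
            raise ValueError("frame is shorter than the window of bin %d" % k)
        acc = 0j
        for n in range(n_k):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_k)  # Hann window of length N_k
            acc += w * x[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        mags.append(abs(acc) / n_k)
    return mags


if __name__ == "__main__":
    sr = 8000
    a3 = [math.sin(2 * math.pi * 220 * n / sr) for n in range(2048)]
    mags = cqt_frame(a3, sr)
    print(max(range(len(mags)), key=mags.__getitem__))  # index of the strongest bin
```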

Audio recognition during voice sessions to provide enhanced user interface functionality

The user interface for a mobile communication device may be provided based on the current context of a voice session, as recognized by an automated audio recognition engine. In one implementation, the mobile device may transcribe, by an audio recognition engine in the mobile device, audio from a voice session conducted through the mobile device; detect, by the mobile device and based at least on the transcribed audio, changes in context during the voice session that relate to a change in functionality of the user interface of the mobile device; and update, by the mobile device, the user interface in response to the detected change in context.
Owner:SONY ERICSSON MOBILE COMM AB

Audio Recognition System

A system and method of identifying an audio track uses music identification software that produces a fingerprint or audio profile for an audio segment recorded with a portable communication device. The audio profile is transmitted from the portable communication device to a remote service provider over a communication network. The remote server receives the transmitted audio track profile and compares the profile to a stored database of audio tracks. If a matching audio track is identified by the remote server, metadata relating to the identified audio track is transmitted from the remote server to the portable communication device. The received audio track metadata is then displayed on the portable communication device.
Owner:VINCI BRANDS LLC
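
Server-side matching of a transmitted profile against the track database is commonly done with hash lookup plus offset voting, shown below as a swapped-in technique since the abstract does not name one. The index maps each hash to `(track_id, time)` pairs; a genuine match piles many votes onto a single time offset between query and reference. The `FingerprintDB` class, its method names, the hash format, and `min_votes` are hypothetical.

```python
from collections import Counter, defaultdict


class FingerprintDB:
    """Server-side index from hash value to (track_id, time) occurrences."""

    def __init__(self):
        self.index = defaultdict(list)
        self.metadata = {}

    def add_track(self, track_id, hashes, metadata):
        """`hashes` is a list of (hash_value, time) pairs from the reference recording."""
        self.metadata[track_id] = metadata
        for h, t in hashes:
            self.index[h].append((track_id, t))

    def match(self, query_hashes, min_votes=3):
        """Vote on (track, offset); return the winner's metadata, or None if too few votes."""
        votes = Counter()
        for h, t_query in query_hashes:
            for track_id, t_ref in self.index.get(h, ()):
                votes[(track_id, t_ref - t_query)] += 1
        if not votes:
            return None
        (track_id, offset), count = votes.most_common(1)[0]
        if count < min_votes:
            return None
        return {"track": track_id, "offset": offset, "votes": count, **self.metadata[track_id]}
```

Voting on the offset rather than on the track alone rejects tracks that merely share a few hashes with the query at inconsistent times.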

Voice denoising method based on audio recognition

The invention provides a speech noise reduction method based on audio recognition, which reduces noise at the receiving end of speech communication in complex noise environments and belongs to the field of computer science and technology. Most existing noise reduction methods are suitable only for stationary noise and cannot remove noise in complex noise environments, especially where abrupt, frequently changing noise occurs. The method introduces pattern recognition into communication speech noise reduction: it divides the audio signal into speech and non-speech signals, automatically identifies the input signal by extracting speech features and designing a classifier model, and judges the audio type. If the audio type is noise, the signal is removed; if it is speech, the signal is retained and processed further. The method meets real-time requirements while achieving a better noise reduction effect, suits complex communication environments such as manned spaceflight voice communication, construction sites, and battlefields, and provides an approach to signal noise reduction.
Owner:UNIV OF SCI & TECH BEIJING

Methods and devices for obtaining and pushing information and information interaction system

The invention provides a method for obtaining information, comprising the following steps: detecting and collecting sound, which triggers an event; collecting the sound of the current channel being played in real time in the environment to obtain audio data; sending the audio data, audio feature information, or audio fingerprints to a server, so that the server obtains the audio fingerprints and, using a real-time buffered channel audio-fingerprint database, determines the channel identifier whose channel audio fingerprints match them; and receiving preset information, which the server retrieves from a preset information database for the matched channel identifier and sends back. With this method, audio identification is performed by the server simply by triggering sound collection on the user terminal, so information about the program currently playing on the channel can be obtained, greatly improving the efficiency of information retrieval. The invention also provides a device for obtaining information, a method for pushing information, a device for pushing information, and an information interaction system.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Device and method for extracting audio/video content information

The invention provides an audio/video processing device and an audio/video processing method. The device comprises a receiving unit, a decoding unit, a user interface unit, an information extracting unit, and an information storage unit. The receiving unit receives signals and outputs transport streams; the decoding unit decodes those transport streams; the user interface unit receives content selected by the user; the information extracting unit extracts prescribed content; and the information storage unit stores the prescribed content. The selected content is either a selected video content or a selected audio content, and the other one is determined from an audio/video correspondence table. The information extracting unit comprises an audio identification unit, a video identification unit, and an information matching unit: the audio identification unit identifies the selected audio content in the audio streams from the decoding unit; the video identification unit identifies the selected video content in the video streams from the decoding unit; and the information matching unit determines whether the identification result of the audio identification unit matches that of the video identification unit and, if it does, records the prescribed content corresponding to the selected video or audio content in the information storage unit.
Owner:HITACHI LTD

Intelligent interaction system and method

The invention relates to an intelligent interaction system and method. The system includes an audio receiving module, a real-time processing module, and an execution module: the audio receiving module receives audio input from the user, the real-time processing module performs parallel online real-time processing on that audio, and the execution module performs the corresponding operation according to the identification results sent by the real-time processing module. The parallel online real-time processing comprises the following steps: classification processing and type-specific identification processing are performed on the audio information; if a credible classification type is obtained before the audio input ends, identification processing for all other classification types is terminated; and the identification result for the credible classification type is obtained and sent to the execution module. The system and method let users access audio identification and voice interaction functions quickly and easily, improving the user experience.
Owner:IFLYTEK (BEIJING) CO LTD
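
The early-termination rule can be sketched by interleaving the per-type branches chunk by chunk instead of running real threads. The `classifier` callback, the `feed`/`result` recognizer interface, the `StubRecognizer` class, and the 0.9 credibility threshold are hypothetical.

```python
def process_stream(chunks, classifier, recognizers, threshold=0.9):
    """Run every per-type recognizer until one audio type becomes credible, then only that one.

    classifier(chunk) -> {type: probability}; each recognizer offers feed(chunk) and
    result(). Both interfaces are hypothetical. Branches are interleaved chunk by chunk
    here instead of running in parallel threads.
    """
    active = dict(recognizers)
    scores = {t: 0.0 for t in recognizers}
    for chunk in chunks:
        for t, p in classifier(chunk).items():
            if t in scores:
                scores[t] = max(scores[t], p)
        credible = [t for t in active if scores[t] >= threshold]
        if credible and len(active) > 1:
            keep = max(credible, key=scores.get)
            active = {keep: active[keep]}  # terminate identification for all other types
        for recognizer in active.values():
            recognizer.feed(chunk)
    best = max(active, key=scores.get)
    return best, active[best].result()


class StubRecognizer:
    """Hypothetical stand-in that just records which chunks it was given."""

    def __init__(self, name):
        self.name = name
        self.chunks = []

    def feed(self, chunk):
        self.chunks.append(chunk)

    def result(self):
        return f"{self.name}:{len(self.chunks)}"
```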

Video feature extraction method, device and computer device

The present invention provides a video feature extraction method, a video feature extraction device, and a computer device. The method includes the following steps: a target video is divided into video segments of a predetermined unit time length; at least two frames of images in each video segment are obtained; these frames are recognized to obtain the feature information they contain, from which the image feature information of the video segment is derived; the text feature information of the video segment is obtained from the caption recognition result of each frame and the real-time speech recognition result of the segment; semantic analysis is performed to obtain the feature information of the video segment; and a mapping relationship between the feature information of the video segments and the target video is established. With the method, device, and computer device of the invention, video feature information can be extracted automatically through image, video, and audio recognition technology, and the extraction is refined to the level of unit-time-length segments within the video, so the resulting feature information is more comprehensive.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Audio recognition method and device

The invention discloses an audio recognition method and an audio recognition device in the field of audio technology. The method comprises intercepting an audio stream of a first time length from the source data of a video file, then retrieving the corresponding audio information based on that audio stream and showing it to the user. The retrieval step comprises dividing the audio stream into at least two sub-audio streams according to a preset rule and retrieving the sub-audio streams in sequence to obtain the audio information. Because the audio stream is extracted directly from the source data of the video currently playing, no separate recording step is needed and ambient noise has no effect; the operation is simple, accuracy is high, retrieval does not interfere with normal viewing of the video, and both retrieval efficiency and success rate can be improved.
Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
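
The retrieval loop described above can be sketched as follows. The preset rule is assumed to be consecutive, non-overlapping sub-streams of `sub_seconds` each, and `retrieve` is a hypothetical callback standing in for the search service; it returns audio information or `None`.

```python
def split_stream(samples, sub_len):
    """Assumed preset rule: consecutive, non-overlapping sub-streams of sub_len samples.

    A trailing remainder shorter than sub_len is dropped.
    """
    if sub_len <= 0:
        raise ValueError("sub_len must be positive")
    return [samples[i:i + sub_len] for i in range(0, len(samples) - sub_len + 1, sub_len)]


def recognize(samples, sample_rate, retrieve, sub_seconds=3.0):
    """Query sub-streams in order; return (index, info) for the first hit, else None."""
    sub_len = int(sub_seconds * sample_rate)
    for index, sub in enumerate(split_stream(samples, sub_len)):
        info = retrieve(sub)
        if info is not None:
            return index, info
    return None
```

Stopping at the first successful sub-stream keeps the number of service queries low when the opening segment (for example, silence or dialogue) does not match.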

Extensible audio recognition method based on man-machine interaction

The invention belongs to the technical field of audio processing and relates to an extensible audio recognition system based on human-machine interaction and a corresponding method. The system comprises an audio acquisition device, a voice recognition module, a sample loading unit, a finite-state machine, a categorized feature-sample database, and an instruction execution module. The method builds on the high recognition rate of speaker-dependent isolated-word speech recognition: once the system has been fully trained on the user's voice, voice segments that cannot be recognized are, with the user's help during human-machine interaction, stored in the sample database through online learning; in addition, recognition cost is reduced by storing and loading samples in separate modules. The core algorithm operates on the voice signal itself, so it is not tied to the speaker's language and can recognize mixed-language input (for example, mixed Chinese and English). The method has lower false-recognition and non-recognition rates, and dialogue interaction with online incremental training improves the system's reliability and adaptability.
Owner:FUDAN UNIV
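
The abstract does not name the matching algorithm. A common choice for speaker-dependent isolated-word recognition is dynamic time warping (DTW) against stored templates, used below as an assumption, together with the online step the abstract describes: a rejected utterance is labelled by the user through dialogue and enrolled as a new sample. The class, the `ask_user` callback, and the rejection threshold are hypothetical, and real input would typically be frame-level feature vectors such as MFCCs.

```python
import math


def dtw_distance(a, b):
    """Length-normalised dynamic time warping distance between two vector sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


class IsolatedWordRecognizer:
    """Speaker-dependent template matcher that enrolls rejected utterances online."""

    def __init__(self, reject_threshold):
        self.reject_threshold = reject_threshold
        self.samples = {}  # label -> list of feature sequences (the sample database)

    def enroll(self, label, features):
        self.samples.setdefault(label, []).append(features)

    def recognize(self, features):
        """Nearest template by DTW, or None if even the best match is too far away."""
        best_label, best_dist = None, math.inf
        for label, templates in self.samples.items():
            for template in templates:
                d = dtw_distance(features, template)
                if d < best_dist:
                    best_label, best_dist = label, d
        return best_label if best_dist <= self.reject_threshold else None

    def recognize_or_learn(self, features, ask_user):
        """On rejection, ask the user for the intended word and store the utterance."""
        label = self.recognize(features)
        if label is None:
            label = ask_user()
            if label:
                self.enroll(label, features)  # online incremental training
        return label
```

Because templates are acoustic feature sequences rather than entries in a word lexicon, the same mechanism applies regardless of the speaker's language.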