1071 results about "Speech output" patented technology

Like speech input, speech output is a familiar and natural form of communication, so it is also an appropriate complement in a character-based interface. However, speech output also has its liabilities. In some environments, speech output may not be preferred or audible.

Techniques for disambiguating speech input using multimodal interfaces

A technique is disclosed for disambiguating speech input for multimodal systems by using a combination of speech and visual I / O interfaces. When the user's speech input is not recognized with sufficiently high confidence, the user is presented with a set of possible matches using a visual display and / or speech output. The user then selects the intended input from the list of matches via one or more available input mechanisms (e.g., stylus, buttons, keyboard, mouse, or speech input). These techniques involve the combined use of speech and visual interfaces to correctly identify the user's speech input. The techniques disclosed herein may be utilized in computer devices such as PDAs, cellphones, desktop and laptop computers, tablet PCs, etc.
Owner:WALOOMBA TECH
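
The confidence check described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Hypothesis` type, the `resolve` function, the 0.8 threshold and the three-item choice list are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed recognizer score in [0, 1]


def resolve(hypotheses, threshold=0.8, max_choices=3):
    """Accept the top hypothesis if confident enough, else return a choice list.

    Returns ("accept", text) or ("ask", [candidate texts]). The "ask" list is
    what a UI would show on screen and/or read out for the user to pick from.
    """
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    if not ranked:
        return ("ask", [])
    if ranked[0].confidence >= threshold:
        return ("accept", ranked[0].text)
    return ("ask", [h.text for h in ranked[:max_choices]])
```

A caller would render the "ask" list visually or through speech output and then feed the user's selection back as the recognized input.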

Pronunciation correction of text-to-speech systems between different spoken languages

Pronunciation correction for text-to-speech (TTS) systems and speech recognition (SR) systems between different languages is provided. If a word requiring pronunciation by a target language TTS or SR is from the same language as the target language, but is not found in a lexicon of words from the target language, a letter-to-speech (LTS) rules set of the target language is used to generate a letter-to-speech output for the word for use by the TTS or SR configured according to the target language. If the word is from a different language than the target language, phonemes comprising the word according to its native language are mapped to phonemes of the target language. The phoneme mapping is used by the TTS or SR configured according to the target language for generating or recognizing an audible form of the word according to the target language.
Owner:MICROSOFT TECH LICENSING LLC

Dynamic modification of voice selection based on user specific factors

The present invention discloses a solution for customizing synthetic voice characteristics in a user specific fashion. The solution can establish a communication between a user and a voice response system. A data store can be searched for a speech profile associated with the user. When a speech profile is found, a set of speech output characteristics established for the user from the profile can be determined. Parameters and settings of a text-to-speech engine can be adjusted in accordance with the determined set of speech output characteristics. During the established communication, synthetic speech can be generated using the adjusted text-to-speech engine. Thus, each detected user can hear a synthetic speech generated by a different voice specifically selected for that user. When no user profile is detected, a default voice or a voice based upon a user's speech or communication details can be used.
Owner:IBM CORP

Developing voice response applications from pre-recorded voice and stored text-to-speech prompts

Inactive | US6345250B1 | Faster and more programmer friendly environment | Interconnection arrangements | Automatic exchanges | Speech sound | Interactive Voice Response Technology
An interactive voice response application on a computer telephony system includes a method of playing voice prompts from a mixed set of pre-recorded voice prompts and voice prompts synthesised from a text-to-speech process. The method comprises: reserving memory for a synthesised prompt and a pre-recorded prompt associated with a particular prompt identifier; on a play prompt request selecting the pre-recorded prompt if available and outputting through a voice output; otherwise selecting the synthesised prompt and playing the selected voice prompt through the voice output. If neither pre-recorded nor synthesised data are available then text associated with the voice prompt is output through a text-to-speech output.
Owner:NUANCE COMM INC
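
The three-level fallback in this abstract (pre-recorded prompt, then stored synthesised prompt, then live text-to-speech) can be sketched as follows. The `Prompt` record and `select_prompt` function are hypothetical names used only for illustration; the abstract does not specify data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prompt:
    text: str                            # text associated with the prompt
    recorded: Optional[bytes] = None     # pre-recorded audio, if any
    synthesised: Optional[bytes] = None  # stored TTS audio, if any


def select_prompt(prompt):
    """Pick the source to play, in the order the abstract describes."""
    if prompt.recorded is not None:
        return ("recorded", prompt.recorded)
    if prompt.synthesised is not None:
        return ("synthesised", prompt.synthesised)
    # Neither audio form is stored: hand the text to a TTS output instead.
    return ("tts", prompt.text)
```

Keeping both audio slots on one record mirrors the abstract's idea of reserving storage for both forms under a single prompt identifier.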

Speech generation device with a projected display and optical inputs

In several embodiments, a speech generation device is disclosed. The speech generation device may generally include a projector configured to project images in the form of a projected display onto a projection surface, an optical input device configured to detect an input directed towards the projected display and a speaker configured to generate an audio output. In addition, the speech generation device may include a processing unit communicatively coupled to the projector, the optical input device and the speaker. The processing unit may include a processor and related computer readable medium configured to store instructions executable by the processor, wherein the instructions stored on the computer readable medium configure the speech generation device to generate text-to-speech output.
Owner:TOBII DYNAVOX AB

System and method for personalizing an interactive voice broadcast of a voice service based on particulars of a request

A system and method for personalizing an interactive voice output of a voice service based on determinations about the caller or call, the output containing information derived from on line analytical processing (OLAP) systems, where content stored in the database can be readily obtained by a requester on the fly and in a personalized, interactive manner. The system and method include a call server for receiving an incoming call from a caller, an inference module for determining information about a voice service to be provided to the caller, a control module related to the information determined and for facilitating the interactive voice broadcast with the caller, and a personalization module for passing determined information to the control module.
Owner:MICROSTRATEGY

Customizing the speaking style of a speech synthesizer based on semantic analysis

A method is provided for customizing the speaking style of a speech synthesizer. The method includes: receiving input text; determining semantic information for the input text; determining a speaking style for rendering the input text based on the semantic information; and customizing the audible speech output of the speech synthesizer based on the identified speaking style.
Owner:SOVEREIGN PEAK VENTURES LLC

Combined speech recognition and text-to-speech generation

Active | US20050038657A1 | Speech recognition | Speech synthesis | Vocabulary speech recognition | Speech sound
Text-to-speech (TTS) generation is used in conjunction with large vocabulary speech recognition to say words selected by the speech recognition. The software for performing the large vocabulary speech recognition can share speech modeling data with the TTS software. TTS or recorded audio can be used to automatically say both recognized text and the names of recognized commands after their recognition. The TTS can automatically repeat text recognized by the speech recognition after each of a succession of end of utterance detections. A user can move a cursor back or forward in recognized text, and the TTS can speak one or more words at the cursor location after each such move. The speech recognition can be used to produce a choice list of possible recognition candidates and the TTS can be used to provide spoken output of one or more of the candidates on the choice list.
Owner:CERENCE OPERATING CO

Compressing and using a concatenative speech database in text-to-speech systems

A method and apparatus are provided for compressing and using a concatenative speech database in TTS systems to improve the quality of speech output generated by handheld TTS systems by allowing synthesis to occur on the client. According to one embodiment of the present invention, a G.723 encoder receives diphone waveforms, and compresses them into diphone residuals. While compressing the diphone waveforms, the encoder generates Linear Predictive Coding (LPC) coefficients. The diphone residuals, and the encoder-generated LPC coefficients are then stored in encoder-generated compressed packet.
Owner:INTEL CORP

Speech and text driven HMM-based body animation synthesis

Active | US20100082345A1 | Simple capability | Animation | Speech synthesis | Probit model | Hidden Markov model
An “Animation Synthesizer” uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio / video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and / or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc., that are synchronized with a speech output corresponding to the text and / or speech input.
Owner:MICROSOFT TECH LICENSING LLC

Magnetic resonance imaging having patient video, microphone and motion tracking

Critical needs for MRI patient instruction, testing, comfort, motion control, and speech communication are provided for better imaging which leads to more effective medical care. An MRI Digital Video Projection System is disclosed which provides better quality display to the patient to better inform, instruct, test, and comfort the patient plus the potential to stimulate the brain with microsecond onset times to better diagnose brain function. An MRI Motion Tracker and Patient Augmented Visual Feedback System enables monitoring patient body part motion, providing real time feedback to the patient and / or technician to substantially improve diagnostic yield of scanning sessions, particularly for children and mentally challenged individuals. An MR Forward Predictive Noise Canceling Microphone System removes the intense MRI acoustic noise improving patient communication, patient safety and enabling coding of speech output. These systems can be used individually but maximum benefit is from providing all three.
Owner:PITTSBURGH UNIV OF +1

Sentiment prediction from textual data

A semantically organized domain space is created from a training corpus. Affective data are mapped onto the domain space to generate affective anchors for the domain space. A sentiment associated with an input text is determined based on the affective anchors. A speech output may be generated from the input text based on the determined sentiment.
Owner:APPLE INC

Digital assistant providing automated status report

Systems and processes for operating a digital assistant are provided. In one example process, a speech input is received from a user. A user intent is determined based on the speech input. Determining the user intent includes generating text based on the speech input, performing natural language processing of the text, and determining the user intent based on a result of the natural language processing. In accordance with the user intent, status information associated with at least one of the one or more electronic devices is requested. The status information associated with the at least one of the one or more electronic devices is received. A spoken output is generated and represents the status information associated with the at least one of the one or more electronic devices. The spoken output is caused to be provided to the user.
Owner:APPLE INC

Voice control method and mobile terminal apparatus

A voice control method and a mobile terminal apparatus are provided. The mobile terminal apparatus includes a voice receiving module, a voice outputting module, a voice wake-up module and a language recognition module. When the voice wake-up module determines that a first voice signal matches identification information, the voice receiving module is turned on. When the voice receiving module receives a second voice signal after the first voice signal, the language recognition module parses the second voice signal and obtains a voice recognition result. When the voice recognition result includes an executing request, the language recognition module executes a responding operation, and the voice receiving module is turned off from receiving a third voice signal. When the voice recognition result does not include the executing request, the language recognition module executes a speech conversation mode.
Owner:VIA TECH INC

Access to enhanced conferencing services using the tele-chat system

A system (10) and method (50) for enabling phone users to participate in an instant messaging based conference can include the steps of receiving (52) a speech input from a telephone (26 or 28) through a teleconferencing system (24), transcribing (54) the speech input to a first text message and transmitting (58) the first text message to a plurality of devices (18, 20, 26 or 28) coupled to an instant messaging network belonging to the instant messaging based conference. The method can further include the steps of receiving (60) a second text message from any one among the plurality of devices on the instant messaging based conference, converting (62) the second text message to a speech output, and transmitting (68) the speech output to the telephone via the teleconferencing system.
Owner:NUANCE COMM INC

Method and apparatus for improving CPAP patient compliance

A method of operating a device for treating sleep disordered breathing (SDB), wherein the device provides continuous positive airway pressure during sleep, includes applying a treatment pressure to a patient, monitoring the patient for speech output, generating a signal in response to detected speech of the patient, and, in response to the signal, reducing the treatment pressure applied to the patient.
Owner:RESMED LTD

Method and apparatus for providing speech output for speech-enabled applications

Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
Owner:CERENCE OPERATING CO

Multi-round dialogue intelligent voice interaction system and device

The invention discloses a multi-round dialogue intelligent voice interaction system and device. The system comprises a hybrid semantic understanding module, a semantic understanding adaptive module and an automatic dialogue management module. Voice input is converted into text by voice recognition and passed to the hybrid semantic understanding module. The hybrid semantic understanding module understands the user intention and extracts the corresponding state information; the automatic dialogue management module guides the dialogue process, outputs dialogue texts and converts them into voice output based on the user intention to realize the dialogue; and the semantic understanding adaptive module performs optimized learning of the hybrid semantic understanding module. The invention integrates speech recognition, natural language understanding, natural language generation, speech synthesis and dialogue management into a complete multi-round dialogue intelligent voice interaction system that is easy to expand and configure and can be applied to any scene.
Owner:百融云创科技股份有限公司

Intelligent voice interaction system and method thereof

The present invention provides an intelligent voice interactive system and interactive method. The system includes a processor, a storage device, a voice processing unit, a voice input device, a voice output device and a communication processing unit. The processor, the storage device, the voice processing unit and the communication processing unit are set on a circuit board; the storage device, the voice processing unit and the communication processing unit are connected with the processor by the concentration line, forming an embedded control board. The voice input device and the voice output device are each connected with the voice processing unit of the embedded control board. The communication processing unit has a communication interface for connecting to a computer running client-side software used for customization. The system serves as a general-purpose intelligent interactive platform on which users can set different interactive scenes and content. It has wide applications, such as man-machine interactive processing systems, intelligent toys or service robots, is highly practical, and can operate without a computer.
Owner:BEIHANG UNIV

Method and system for intelligent prompt control in a multimodal software application

Dialog manager and methods for integrating multi-modal data capture device inputs or speech recognition inputs with speech output capabilities. A work flow description is extracted from objects in a graphical user interface and a multi-modal user interface is defined. A dialog engine synchronizes the flow of information, in accordance with the work flow description, between input / output devices and an application. The prompts for inputting data, which are output via a plurality of peripheral devices, are controlled in an intelligent manner by the dialog engine based on the input state of the peripheral devices. Functionality such as barge-in, prompt-holdoff, priority prompts, and talk-ahead is provided.
Owner:VOCOLLECT

Voice enabled knowledge system

This invention discloses a voice enabled knowledge system, comprising a speech recognition engine and text to speech engine. The speech recognition engine further comprises a representation unit to represent the spoken words, a model classification unit to classify the spoken words, a training database to match the spoken words with preset words and a search unit to search for the spoken word in said training database, based on the results of said model classification. The text to speech engine for conversion of an input text to speech, comprises a text pre-processing unit for analyzing the input text in a sentence form, a prosody unit for word recognition using said acoustic model, a concatenation unit for converting the diphone equivalents into words and thereafter into a sentence and an audio output device for speech output.
Owner:MUKHERJEE SANTOSH KUMAR

Vehicle communication system

The present invention relates to a vehicle communication system comprising a plurality of microphones adapted to detect speech signals of different vehicle passengers, a mixer combining the audio signal components of the different microphones into a resulting speech output signal, and a weighting unit determining the weighting of the audio signal components for the resulting speech output signal, where the weighting unit determines the weighting of the signal components based upon non-acoustical information about the presence of a vehicle passenger.
Owner:HARMAN BECKER AUTOMOTIVE SYST
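
As a rough illustration of presence-based weighting, the sketch below mixes per-microphone frames using weights derived from a non-acoustic occupancy flag (for example a seat sensor). The equal-share weighting rule and the names `mixing_weights` and `mix` are assumptions made for this example; the abstract does not state how the weights are computed.

```python
def mixing_weights(occupied):
    """One weight per microphone: equal shares for occupied seats, 0 otherwise.

    `occupied` is a list of booleans taken from non-acoustic information
    (assumed here to be one seat-occupancy flag per microphone).
    """
    active = sum(occupied)
    if active == 0:
        return [0.0] * len(occupied)
    return [1.0 / active if flag else 0.0 for flag in occupied]


def mix(frames, weights):
    """Weighted sum of equal-length sample frames, one frame per microphone."""
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per microphone frame")
    if not frames:
        return []
    length = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(length)]
```

With two of four seats occupied, the two matching microphones each contribute half of the output and the microphones at empty seats are muted.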

Machine translation apparatus, method, and computer program product

A machine translation apparatus includes a receiving unit that receives an input of a plurality of speeches; a detecting unit that detects a speaker of a speech from among the speeches; a recognition unit that performs speech recognition on the speeches; a translating unit that translates a recognition result to a translated sentence; an output unit that outputs the translated sentence in speech; and an output control unit that controls output of speech by referring to processing stages from receiving to outputting a first speech that is input first from among a plurality of the speeches, a speaker detected with respect to the first speech, and a speaker detected with respect to a second speech that is input after the first speech from among a plurality of the speeches.
Owner:KK TOSHIBA

Methods and apparatus for providing privacy for a user of an audio electronic device

Inactive | US7088828B1 | Eliminate or suppress unwanted voices overheard | Two-way loud-speaking telephone systems | Ear treatment | Computer module | Loudspeaker
The invention is directed to techniques for suppressing the voice of a user of an audio electronic device, such as a mobile phone, from being heard by an unintended listener. In one arrangement, the invention includes an input microphone, an electronics module, and a suppression speaker. The user speaks into the input microphone, and the electronics module generates an antivoice signal from a voice signal received from the input microphone. The suppression speaker outputs an antivoice output that combines with the voice of the user to form a voice suppression zone next to the speaker. The user thus can carry on his or her conversation in private as long as the unintended listener is within the voice suppression zone. Alternately, the user can avoid distracting other users of similar devices nearby, such as in a crowded office environment with many individuals using telephones at the same time. In another arrangement, the invention includes multiple suppression speakers that can be oriented in different directions to provide one or more voice suppression zones.
Owner:CISCO TECH INC

Language learning system and method with a visualized pronunciation suggestion

A language learning system and method with a visualized pronunciation suggestion is disclosed. A sound wave corresponding to a sample voice output is used to prompt the user to make corrections with respect to tones and stresses. Through sectional evaluations and corrections, the invention can effectively improve the speaking ability of the user. The disclosed system includes a language database, a follow-reading module, a display control module, a sectional evaluation module, a correction module, and a suggestion-editing module. The disclosed method includes the steps of: extracting a full-sentence sample, outputting a full-sentence voice message and an associated message, prompting the user to imitate and recording the voice data, comparing the voice data with the full-sentence voice message and outputting the similarity, and evaluating the voice data.
Owner:INVENTEC CORP

Speech output with confidence indication

A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score.
Owner:NUANCE COMM INC
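
One way to picture "altering parameters proportionally to the confidence score" is the sketch below. It is an assumption-laden illustration, not the claimed method: the choice of rate and volume as the altered parameters, the linear mapping, the 0.3 maximum reduction and the function name `apply_confidence` are all hypothetical.

```python
def apply_confidence(segments, base_rate=1.0, base_volume=1.0, max_change=0.3):
    """Scale speaking rate and volume down as confidence drops.

    `segments` is a list of (text, confidence) pairs with confidence in [0, 1].
    Full confidence leaves the parameters unchanged; zero confidence reduces
    them by `max_change` (30% by default). Values outside [0, 1] are clamped.
    """
    rendered = []
    for text, confidence in segments:
        c = min(max(confidence, 0.0), 1.0)
        scale = 1.0 - max_change * (1.0 - c)
        rendered.append(
            {"text": text, "rate": base_rate * scale, "volume": base_volume * scale}
        )
    return rendered
```

The resulting per-segment settings could then be passed to whatever synthesizer is in use, so that a listener hears less certain segments spoken more slowly and quietly.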

Voice recognition testing system and method

Inactive | CN106548772A | Intuitive access to speech recognition | Enable large-scale testing | Speech recognition | Speech synthesis | Information processing | Speech identification
The invention discloses a voice recognition testing system comprising an audio frequency generation module, a voice output module and an information processing module, wherein the audio frequency generation module is used for generating an audio file from a test text based on testing parameters, the voice output module is used for playing the audio file so as to provide voice input for a voice recognizer to be tested, the information processing module is used for processing voice recognizing results from the voice recognizer so as to obtain a testing report form on the voice recognizer, and the testing report form comprises recognition performance parameters of the voice recognizer under different conditions relevant to the testing parameters.
Owner:SHANGHAI XIAOI ROBOT TECH CO LTD

Remote control device for use with insulin infusion systems

Methods and apparatuses providing accessibility options for the blind and poorly sighted for use with insulin therapy systems. A remote control device, with speech output capability and comprehensive speech menu system, may monitor or intercept data generated by a blood glucose meter, and delay, modify and retransmit said data to an insulin pump. The remote control device may also be used to program and set operating parameters of an insulin pump and also record data received from an insulin pump. The remote control device may also be used in conjunction with a personal computing device.
Owner:AXSOL INC