56 results about "Prosody" patented technology

In linguistics, prosody is concerned with those elements of speech that are not individual phonetic segments (vowels and consonants) but are properties of syllables and larger units of speech, including linguistic functions such as intonation, tone, stress, and rhythm. Such elements are known as suprasegmentals.

Multi-lingual speech synthesis

Inactive | US20050144003A1 | Increase the number of; Cost efficient; Speech synthesis; Speech sound

A method for speech synthesis of a word in a first language, comprising dividing the word into a first sequence of pronunciation phonemes in the first language, mapping the first phoneme sequence to a second sequence of pronunciation phonemes in at least one second language, and generating an audio output of the phonemes in the second phoneme sequence using prosody models adapted for the at least one second language. According to this method, an audio output of a word in a first language can be generated by a speech synthesizing engine not having actual support for this language. Instead, the pronunciation phonemes of the word are mapped onto phonemes of at least one second language, for which the speech synthesizing engine does have support.
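
The phoneme-mapping step can be illustrated with a minimal sketch. The mapping table, the phoneme symbols, and the function name `map_phonemes` are hypothetical; a real engine would derive the table from linguistic knowledge or acoustic similarity, and the second language's prosody models would then be applied to the mapped sequence (not shown).

```python
from typing import Dict, List, Set

# Hypothetical table: first-language phonemes the engine lacks -> closest
# supported second-language phonemes. Entries are illustrative only.
PHONEME_MAP: Dict[str, List[str]] = {
    "y": ["i"],        # front rounded vowel approximated by an unrounded /i/
    "x": ["k"],        # velar fricative approximated by the stop /k/
    "ts": ["t", "s"],  # affricate split into two supported phonemes
}


def map_phonemes(source: List[str], supported: Set[str]) -> List[str]:
    """Map a first-language phoneme sequence onto a supported inventory."""
    target: List[str] = []
    for ph in source:
        if ph in supported:
            target.append(ph)               # shared phoneme: keep as-is
        elif ph in PHONEME_MAP:
            target.extend(PHONEME_MAP[ph])  # substitute the closest phonemes
        else:
            raise KeyError(f"no mapping for phoneme {ph!r}")
    return target


if __name__ == "__main__":
    inventory = {"t", "s", "i", "k", "m", "a"}
    print(map_phonemes(["ts", "y", "m", "a"], inventory))
```

A one-to-many entry such as `"ts"` shows why the second sequence can be longer than the first.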

Owner:NOKIA CORP

Speech synthesis apparatus and speech synthesis method

Inactive | US20050119890A1 | Speech synthesis; Acoustics

The present invention includes: a characteristic parameter DB 106 that holds, with respect to each speech-unit, speech-unit data indicating a loan word attribute and acoustic characteristics; a language analysis unit 104 and a prosody prediction unit 109 that obtain text data and respectively predict a loan word attribute and acoustic characteristics of each of a plurality of speech-units that form text indicated by the text data; a speech-unit selection unit 108 that selects, from the characteristic parameter DB 106, speech-unit data that represents the loan word attribute and the acoustic characteristics similar to the predicted loan word attribute and acoustic characteristics of each speech-unit; and a speech synthesis unit 110 that generates synthesized speech using a plurality of the selected speech-units and outputs the synthesized speech.
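
The speech-unit selection step can be sketched as a minimum-cost search. This is an assumption-laden illustration: the cost (Euclidean acoustic distance plus a flat penalty when the loan word attribute differs), the feature tuple, and the names `SpeechUnit`, `target_cost`, and `select_unit` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpeechUnit:
    unit_id: int
    loan_word: bool            # loan word attribute stored in the DB
    features: Sequence[float]  # acoustic characteristics, e.g. (F0 in Hz, duration in s)


def target_cost(unit: SpeechUnit, loan_word: bool,
                features: Sequence[float], mismatch_penalty: float = 1.0) -> float:
    """Euclidean acoustic distance plus a flat penalty for an attribute mismatch."""
    acoustic = sum((a - b) ** 2 for a, b in zip(unit.features, features)) ** 0.5
    return acoustic + (0.0 if unit.loan_word == loan_word else mismatch_penalty)


def select_unit(candidates: List[SpeechUnit], loan_word: bool,
                features: Sequence[float]) -> SpeechUnit:
    """Pick the candidate whose stored data is most similar to the prediction."""
    return min(candidates, key=lambda u: target_cost(u, loan_word, features))
```

In practice the features would be normalized before computing distances, and unit-selection synthesizers usually add a concatenation cost between neighboring units and search the whole sequence jointly; the sketch only covers the per-unit similarity.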

Owner:PANASONIC CORP

System for tuning synthesized speech

Active | US20080167875A1 | Speech synthesis; Natural language processing; Graphics

An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Tools are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users interact with the software tool through a graphical user interface (GUI). The software tool can produce synthesized audio output in many file formats.
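
For reference, standard SSML expresses prosody edits with the W3C `<prosody>` element. The sketch below builds such a document with Python's standard `xml.etree.ElementTree`; the function name `build_ssml` and the default attribute values are assumptions, and the patent's "extended SSML" additions are not specified in the abstract, so they are not shown.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, pitch: str = "+10%", rate: str = "slow") -> str:
    """Wrap text in an SSML 1.1 <speak> document with a <prosody> element."""
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch, "rate": rate})
    prosody.text = text  # ElementTree escapes characters such as '&'
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello world"))
```

Editing pitch or duration targets in a tool like the one described would amount to changing attributes such as `pitch` and `rate` (or `duration`) before the document is sent to a synthesizer.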

Owner:CERENCE OPERATING CO

Confirmation system for command or speech recognition using activation means

Inactive | US20080114603A1 | Speech recognition; Push-to-talk; Automatic speech

A system and method for confirming command or speech recognition results returned by an automatic speech recognition (ASR) engine from a command issued by an operator of a vehicle or platform, such as an aircraft or unmanned aerial vehicle (UAV). The operator transmits a command signal to the ASR engine, initiated by an activation means such as a push-button (also known as push-to-talk or push-to-recognize). A recognition result is communicated to the user, and the system waits for confirmation for a limited period of time. During this period, in one embodiment, a low tone with high prosody is played to notify the user that the system is ready to receive the confirmation. If the user quickly presses and releases the push-button a predetermined number of times (for instance, twice, as a double-click), the result is confirmed and the ASR engine forwards a command signal to the system it controls. Otherwise, the ASR engine waits for another speech command.
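
The confirmation logic can be sketched as a check over button-press timestamps. The window length, the maximum gap between presses, and the name `is_confirmed` are hypothetical values chosen for illustration; the abstract only specifies "a limited period of time" and "a predetermined number of times".

```python
from typing import Sequence


def is_confirmed(press_times: Sequence[float], window_start: float,
                 window_length: float = 2.0, required_presses: int = 2,
                 max_gap: float = 0.4) -> bool:
    """True if `required_presses` quick presses fall inside the confirmation window."""
    in_window = sorted(t for t in press_times
                       if window_start <= t <= window_start + window_length)
    run = 1
    for prev, cur in zip(in_window, in_window[1:]):
        # Count consecutive presses that are close enough to form a multi-click.
        run = run + 1 if cur - prev <= max_gap else 1
        if run >= required_presses:
            return True
    # A single required press is satisfied by any press inside the window.
    return required_presses <= 1 and len(in_window) >= 1
```

If the function returns False when the window closes, the system would fall back to waiting for another speech command, as the abstract describes.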

Owner:ADACEL

Method and system for adjusting the voice prompt of an interactive system based upon the user's state

Active | US7881934B2 | Enhance better driving; Promote alertness; Speech recognition; Speech synthesis; Speech sound; Signal processing

The voice prompt of an interactive system is adjusted based upon a state of a user. An utterance of the user is received, and the state of the user is determined based upon signal processing of the utterance of the user. Once the state of the user is determined, the voice prompt is adjusted by adjusting at least one of a tone of voice of the voice prompt, a content of the voice prompt, a prosody of the voice prompt, and a gender of the voice prompt based upon the determined state of the user.

Owner:TOYOTA INFOTECHNOLOGY CENT CO LTD

Method and apparatus for preventing speech comprehension by interactive voice response systems

Active | US20060074677A1 | Reduce the possibility; Signal can be recognized; Secret communication; Television systems; Speech comprehension; Interactive voice response system

A method and apparatus utilizing prosody modification of a speech signal output by a text-to-speech (TTS) system to substantially prevent an interactive voice response (IVR) system from understanding the speech signal without significantly degrading the speech signal with respect to human understanding. The present invention involves modifying the prosody of the speech output signal by using the prosody of the user's response to a prompt. In addition, a randomly generated overlay frequency is used to modify the speech signal to further prevent an IVR system from recognizing the TTS output. The randomly generated frequency may be periodically changed using an overlay timer that changes the random frequency signal at predetermined intervals.
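
The random-overlay part of the method can be sketched as adding a low-amplitude tone whose frequency is redrawn each time a timer interval elapses. The frequency range, amplitude, interval, and the name `overlay_random_tone` are assumptions; the prosody modification based on the user's response is not shown.

```python
import math
import random
from typing import List, Optional


def overlay_random_tone(signal: List[float], sample_rate: int,
                        change_interval_s: float = 0.5,
                        f_min: float = 200.0, f_max: float = 3000.0,
                        amplitude: float = 0.05,
                        seed: Optional[int] = None) -> List[float]:
    """Add a sine overlay whose frequency is re-randomized every interval."""
    rng = random.Random(seed)
    samples_per_interval = max(1, int(change_interval_s * sample_rate))
    freq = rng.uniform(f_min, f_max)
    phase = 0.0
    out: List[float] = []
    for n, x in enumerate(signal):
        if n > 0 and n % samples_per_interval == 0:
            freq = rng.uniform(f_min, f_max)  # the "overlay timer" fires
        # Accumulating phase keeps the overlay continuous when the frequency changes.
        phase += 2.0 * math.pi * freq / sample_rate
        out.append(x + amplitude * math.sin(phase))
    return out
```

Keeping the overlay amplitude small relative to the speech is what would let human listeners still understand the output.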

Owner:NUANCE COMM INC

Prosody conversion

Inactive | US7996222B2 | Speech recognition; Speech synthesis; Syllable; Acoustics

A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
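
The codebook lookup can be sketched as follows, assuming (the abstract does not name the transform) that each contour is resampled to a fixed length and described by its first few orthonormal DCT-II coefficients. `N_POINTS`, `N_COEFFS`, and all function names are hypothetical, and the context and linguistic-feature matching is omitted.

```python
import numpy as np

N_POINTS = 16  # hypothetical fixed length each contour is resampled to
N_COEFFS = 4   # hypothetical number of low-order coefficients kept


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)
    return m


D = dct_matrix(N_POINTS)


def resample(contour, n: int) -> np.ndarray:
    contour = np.asarray(contour, dtype=float)
    return np.interp(np.linspace(0.0, 1.0, n),
                     np.linspace(0.0, 1.0, len(contour)), contour)


def transform(contour) -> np.ndarray:
    return (D @ resample(contour, N_POINTS))[:N_COEFFS]


def inverse(coeffs: np.ndarray, length: int) -> np.ndarray:
    full = np.zeros(N_POINTS)
    full[:N_COEFFS] = coeffs
    return resample(D.T @ full, length)


def convert(contour, codebook):
    """codebook: (source_coeffs, target_coeffs) pairs for the same syllables."""
    c = transform(contour)
    # Nearest source transform; its paired target transform is inverted.
    _, target = min(codebook, key=lambda pair: np.linalg.norm(pair[0] - c))
    return inverse(target, len(contour))
```

Truncating to a few coefficients smooths the output contour, which is one reason a post-processing step like the one mentioned in the abstract might be useful.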

Owner:WSOU INVESTMENTS LLC

Computerized speech synthesizer for synthesizing speech from text

Active | US8219398B2 | High-quality speech; Promote generation; Speech synthesis; Natural language processing

Disclosed are novel embodiments of a speech synthesizer and speech synthesis method for generating human-like speech, wherein a speech signal can be generated by concatenation from phonemes stored in a phoneme database. Wavelet transforms and interpolation between frames can be employed to effect smooth morphological fusion of adjacent phonemes in the output signal. The phonemes may have one prosody or set of prosody characteristics, and one or more alternative prosodies may be created by applying prosody modification parameters from a differential prosody database to the phonemes. Preferred embodiments can provide fast, resource-efficient speech synthesis with an appealing musical or rhythmic output in a desired prosody style such as reportorial or human interest. The invention includes computer-determining a suitable prosody to apply to a portion of the text by reference to the determined semantic meaning of another portion of the text, and applying the determined prosody to the text by modification of the digitized phonemes. In this manner, prosodization can effectively be automated.
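
The idea of deriving an alternative prosody from stored phonemes via a differential prosody database can be sketched with multiplicative modification parameters. The style names come from the abstract, but the parameter values, the `Phoneme` fields, and `apply_prosody_style` are invented for illustration; wavelet-based fusion and semantic prosody selection are not shown.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Phoneme:
    symbol: str
    duration_ms: float
    f0_hz: float
    amplitude: float


# Hypothetical differential prosody database: per-style multiplicative
# modification parameters. The numbers are illustrative only.
DIFFERENTIAL_PROSODY: Dict[str, Dict[str, float]] = {
    "reportorial": {"duration": 0.95, "f0": 1.00, "amplitude": 1.00},
    "human_interest": {"duration": 1.10, "f0": 1.08, "amplitude": 0.90},
}


def apply_prosody_style(phonemes: List[Phoneme], style: str) -> List[Phoneme]:
    """Derive an alternative prosody from stored phonemes without re-recording."""
    p = DIFFERENTIAL_PROSODY[style]
    return [
        replace(ph,
                duration_ms=ph.duration_ms * p["duration"],
                f0_hz=ph.f0_hz * p["f0"],
                amplitude=ph.amplitude * p["amplitude"])
        for ph in phonemes
    ]
```

Storing only the differences per style is what keeps the database small compared with recording every phoneme in every style.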

Owner:LESSAC TECH INC

Method For Adding Realism To Synthetic Speech

Active | US20160140952A1 | Improve realism; Speech synthesis; Mobile device; User profile

The present disclosure provides a method for adding realism to synthetic speech. The method includes receiving, from a mobile device (108), text (218) that is to be converted into synthetic speech. The text (218) may include embedded emoticons indicating first prosody information and a predefined sound stored in a stored data repository (208). The method also includes identifying a user associated with the text (218) by comparing metadata associated with the text (218) with user profiles stored in the stored data repository (208), and retrieving a speech font from a speech data corpus associated with the user, stored in the stored data repository (208). The speech font includes second prosody information and a predefined accent of the user. The method further includes converting the text (218) into synthetic speech based on the retrieved speech font, which is modulated based on the emoticon.

Owner:CLEARONCE COMM INC

Method, device, and computer readable storage medium for text-to-speech synthesis using machine learning on basis of sequential prosody feature

Pending | US20200394998A1 | Accurate communication; Effective application; Machine learning; Neural learning methods; Natural language processing; Speech synthesis

The present disclosure relates to a text-to-speech synthesis method using machine learning based on a sequential prosody feature. The text-to-speech synthesis method includes receiving input text, receiving a sequential prosody feature, and generating output speech data for the input text reflecting the received sequential prosody feature by inputting the input text and the received sequential prosody feature to an artificial neural network text-to-speech synthesis model.

Owner:NEOSAPIENCE INC

Prosody Generation Using Syllable-Centered Polynomial Representation of Pitch Contours

Active | US20140195242A1 | Smooth connection; Speech recognition; Speech synthesis; Syllable; Stress level

The present invention discloses a parametric representation of prosody based on polynomial expansion coefficients of the pitch contour near the center of each syllable. These syllable pitch expansion coefficients are generated from a recorded speech database of sentences read by a reference speaker. A correlation database is formed by correlating the stress level and context information of each syllable in the text with the polynomial expansion coefficients of the corresponding spoken syllable. To generate prosody for an input text, the stress level and context information of each syllable in the text are identified, and the correlation database is used to find the best set of pitch parameters for each syllable. By adding these to global pitch contours and applying interpolation formulas, the complete pitch contour for the input text is generated. Duration and intensity profiles are generated by a similar procedure.
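
Extracting syllable-centered expansion coefficients can be sketched with NumPy's Legendre fitting. The choice of a Legendre basis, the polynomial order, the window half-width, and the function names are assumptions; the abstract says only "polynomial expansion coefficients". Building the correlation database and adding global contours are not shown.

```python
import numpy as np


def syllable_pitch_coeffs(times, f0, center: float, half_width: float,
                          order: int = 3) -> np.ndarray:
    """Legendre coefficients of F0 within +/- half_width of the syllable center."""
    times = np.asarray(times, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    mask = np.abs(times - center) <= half_width
    t = (times[mask] - center) / half_width  # normalized time in [-1, 1]
    return np.polynomial.legendre.legfit(t, f0[mask], order)


def syllable_pitch_curve(coeffs: np.ndarray, n_points: int = 20) -> np.ndarray:
    """Rebuild the syllable-centered contour from its expansion coefficients."""
    t = np.linspace(-1.0, 1.0, n_points)
    return np.polynomial.legendre.legval(t, coeffs)
```

With this basis the zeroth coefficient is the mean pitch over the window and the first captures its slope, which makes the coefficients easy to correlate with stress level.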

Owner:THE TRUSTEES OF COLUMBIA UNIV IN THE CITY OF NEW YORK

Method and System for Preventing Speech Comprehension by Interactive Voice Response Systems

Inactive | US20090228271A1 | Reduce the possibility; Speech recognition; Speech synthesis; Speech comprehension; Interactive voice response system

A method of and system for generating a speech signal with an overlaid random frequency signal, using prosody modification of a speech signal output by a text-to-speech (TTS) system to substantially prevent an interactive voice response (IVR) system from understanding the speech signal without significantly degrading the speech signal with respect to human understanding. The present invention involves modifying a prosody of the speech output signal by using a prosody of the user's response to a prompt. In addition, a randomly generated overlay frequency is used to modify the speech signal to further prevent the IVR system from recognizing the TTS output. The randomly generated frequency may be periodically changed using an overlay timer that changes the random frequency signal at predetermined intervals.

Owner:NUANCE COMM INC

Training method and device for prosody model used for speech synthesis

Active | CN104867491A | Improve accuracy; Pause smoothly and naturally; Speech synthesis; Speech sound

The invention discloses a training method and device for a prosody model used for speech synthesis. The training method comprises the following steps: S1, extracting textual features and marker features corresponding to segmented words from a training corpus text; S2, generalizing the segmented words in the training corpus text on the basis of a Chinese thesaurus; S3, training the prosody model according to the textual features, the marker features, and the generalized words. By extracting the textual and marker features, generalizing the segmented words with the thesaurus, and then training on the textual features, the marker features, and the generalized words, the method yields a more complete prosody model and thereby improves prosody prediction accuracy.
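
The benefit of generalizing words through a thesaurus can be sketched with a toy count-based boundary model: words sharing a thesaurus class share statistics, so an unseen word can still receive a prediction. The thesaurus entries and class codes are hypothetical (English words are used for readability instead of Chinese segmented words), and the simple frequency estimate stands in for whatever model the patent actually trains.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical thesaurus: word -> semantic class code (illustrative only).
THESAURUS: Dict[str, str] = {
    "apple": "FRUIT", "banana": "FRUIT", "pear": "FRUIT",
    "run": "MOTION", "walk": "MOTION",
}


def generalize(tokens: List[str]) -> List[str]:
    """Replace each word with its thesaurus class; unknown words stay as-is."""
    return [THESAURUS.get(tok, tok) for tok in tokens]


def train_boundary_model(corpus: List[Tuple[List[str], List[int]]]) -> Dict[str, float]:
    """Estimate P(prosodic boundary after token | generalized token).

    corpus: (tokens, labels) pairs; labels[i] == 1 marks a boundary after tokens[i].
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [no boundary, boundary]
    for tokens, labels in corpus:
        for cls, label in zip(generalize(tokens), labels):
            counts[cls][label] += 1
    return {cls: c[1] / (c[0] + c[1]) for cls, c in counts.items()}
```

For example, after training on sentences containing "apple" and "banana", the word "pear" maps to the same class and inherits their boundary statistics.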

Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Prosody generating device, prosody generating method, and program

Inactive | US20070118355A1 | Suppression of distortion; Special data processing applications; Speech synthesis; Pattern selection; Acoustics

A prosody generation apparatus is provided that suppresses the distortion occurring when prosodic patterns are generated and therefore produces natural prosody. A prosody changing point extraction unit in this apparatus extracts prosody changing points located at the beginning and end of a sentence, the beginning and end of a breath group, an accent position, and the like. A selection rule and a transformation rule for prosodic patterns that include a prosody changing point are generated by a statistical or learning technique, and the rules thus generated are stored beforehand in a representative prosodic pattern selection rule table and a transformation rule table. A pattern selection unit selects a representative prosodic pattern from the representative prosodic pattern selection rule table according to the selection rule. A prosody generation unit transforms the selected pattern according to the transformation rule and interpolates the portions between the prosody changing points so as to generate the prosody as a whole.
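
The final interpolation step can be sketched as follows, assuming linear interpolation of F0 between prosody changing points (the abstract does not say which interpolation is used). The frame indices and F0 values in the usage line are made up, and pattern selection and transformation are not shown.

```python
from typing import List, Tuple

import numpy as np


def interpolate_prosody(change_points: List[Tuple[int, float]], n_frames: int) -> np.ndarray:
    """Fill F0 for every frame from (frame_index, f0_hz) prosody changing points.

    Frames before the first or after the last point keep the nearest point's value.
    """
    frames, values = zip(*sorted(change_points))
    return np.interp(np.arange(n_frames), frames, values)


if __name__ == "__main__":
    # Hypothetical points: sentence start, accent peak, sentence end.
    print(interpolate_prosody([(0, 100.0), (10, 150.0), (20, 90.0)], 21))
```

Only the changing points carry learned patterns; everything between them is filled in, which is how the approach limits distortion away from those points.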

Owner:SOVEREIGN PEAK VENTURES LLC

Speech synthesis method and speech synthesizer

Active | US7562018B2 | Improve naturalness; Reduce generation; Speech synthesis; Control signal

A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.
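
The phase operation and overlap-add steps can be sketched with NumPy FFTs. This assumes that "standardizing phase spectra" means setting every phase to zero and that "high phase components" are the spectral bins above a fixed cutoff; the cutoff, the uniform phase distribution, the fixed placement period, and the function names are all hypothetical, and the control signal's influence is reduced to the choice of cutoff.

```python
import numpy as np


def standardize_phase(pitch_waveform: np.ndarray) -> np.ndarray:
    """Remove phase fluctuation: keep only the magnitude spectrum (zero phase)."""
    return np.abs(np.fft.rfft(pitch_waveform))


def diffuse_high_phase(magnitude: np.ndarray, n: int, cutoff_bin: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Give random phases to bins >= cutoff_bin and return a centered waveform."""
    phase = np.zeros_like(magnitude)
    # Leave the Nyquist bin (present when n is even) real so irfft keeps its magnitude.
    hi = magnitude.size - 1 if n % 2 == 0 else magnitude.size
    phase[cutoff_bin:hi] = rng.uniform(-np.pi, np.pi, size=hi - cutoff_bin)
    waveform = np.fft.irfft(magnitude * np.exp(1j * phase), n=n)
    return np.fft.fftshift(waveform)  # move the zero-phase peak to the center


def overlap_add(waveforms, period: int) -> np.ndarray:
    """Place pitch waveforms `period` samples apart and superimpose them."""
    n = len(waveforms[0])
    out = np.zeros(period * (len(waveforms) - 1) + n)
    for k, w in enumerate(waveforms):
        out[k * period:k * period + n] += w
    return out
```

Because only the phases change, each processed pitch waveform keeps its original magnitude spectrum; the randomness is confined to the high-frequency bins.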

Owner:PANASONIC INTELLECTUAL PROPERTY CORP OF AMERICA

System and method for cross-speaker style transfer in text-to-speech and training data generation

Active | US20220068259A1 | Speech recognition; Speech synthesis; Acoustics; Speech sound

Systems are configured to generate spectrogram data characterized by the voice timbre of a target speaker and the prosody style of a source speaker by converting a waveform of source speaker data to phonetic posteriorgram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to use and/or train a machine learning model for generating spectrogram data and to train a neural text-to-speech model with the generated spectrogram data.

Owner:MICROSOFT TECH LICENSING LLC

Autonomous system and method for creating readable scripts for concatenative text-to-speech synthesis (TTS) corpora

Active | US8155963B2 | Improve abilities; Quality improvement; Speech synthesis; Text database; Speech corpus

A method (and system) which autonomously generates a cohesive script from a text database for creating a speech corpus for concatenative text-to-speech, and more particularly, which generates cohesive scripts having fluency and natural prosody that can be used to generate compact text-to-speech recordings that cover a plurality of phonetic events.
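
The coverage goal can be sketched as a greedy set cover over candidate sentences: repeatedly pick the sentence that covers the most still-missing phonetic events. Greedy set cover is a swapped-in illustration, not necessarily the patented procedure, and it ignores the cohesion and fluency criteria the abstract emphasizes. The event labels and `greedy_script` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_script(candidates: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Pick sentences greedily until every required phonetic event is covered."""
    uncovered = set(required)
    pool = dict(candidates)
    chosen: List[str] = []
    while uncovered and pool:
        # Ties go to the first sentence in dictionary order.
        best = max(pool, key=lambda s: len(pool[s] & uncovered))
        gain = pool[best] & uncovered
        if not gain:
            break  # no remaining sentence adds a new event
        chosen.append(best)
        uncovered -= gain
        del pool[best]
    return chosen
```

Covering many phonetic events with few sentences is what keeps the resulting recordings compact.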

Owner:CERENCE OPERATING CO

Electronic apparatus and method for controlling thereof

Pending | CN112771607A | Speech synthesis; Sentence segmentation; Acoustics

An electronic apparatus, when a text sentence is input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains in parallel speech in which the prosody information is reflected for each of the sentence elements by inputting the sentence elements and the prosody information of the text sentence to a text-to-speech (TTS) module, and merges the speech obtained in parallel for the sentence elements to output speech for the text sentence.
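
The segment, synthesize-in-parallel, and merge flow can be sketched with Python's `concurrent.futures`. The comma/semicolon segmentation rule, the prosody dictionary, and the placeholder TTS function are hypothetical stand-ins for the real TTS module.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def split_sentence(text: str) -> List[str]:
    """Hypothetical segmentation: split at commas and semicolons."""
    return [part.strip() for part in re.split(r"[,;]", text) if part.strip()]


def placeholder_tts(element: str, prosody: Dict[str, float]) -> List[float]:
    """Stand-in for a real TTS module: one 'sample' per character, scaled by pitch."""
    return [prosody["pitch_scale"]] * len(element)


def synthesize_parallel(text: str, prosody: Dict[str, float],
                        tts: Callable[[str, Dict[str, float]], list] = placeholder_tts) -> list:
    elements = split_sentence(text)
    # Every element receives the prosody information of the whole sentence.
    with ThreadPoolExecutor() as pool:
        pieces = list(pool.map(lambda e: tts(e, prosody), elements))  # map keeps input order
    merged: list = []
    for piece in pieces:
        merged.extend(piece)
    return merged
```

Passing the whole-sentence prosody to each element is the point of the design: the pieces are synthesized independently but still follow one sentence-level intonation.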

Owner:SAMSUNG ELECTRONICS CO LTD

Speech synthesis apparatus and method

Active | US20200335080A1 | Reduce discontinuity; Stable prosody; Speech synthesis; Acoustics

The present disclosure relates to a speech synthesis apparatus and method that can remove discontinuity between phoneme units when generating a synthesized sound from the phoneme units, thereby implementing natural utterances and producing a high-quality synthesized sound having stable prosody.

Owner:SK TELECOM CO LTD

Prosody model training method and device thereof

Active | CN111261140A | Improve adaptability; High precision; Neural architectures; Neural learning methods; Natural language processing; Personalization

The invention relates to a prosody model training method and a device thereof. The method comprises the following steps: receiving a training corpus containing prosodic annotation information; inputting the training corpus into a prosody model to be trained to obtain a prosody output result; and training network parameters of the prosody model to be trained according to the prosody output result and/or the prosodic annotation information to obtain a target prosody model. With this technical scheme, the target prosody model is a personalized prosody model with relatively high adaptability and precision, and annotation generality can be better learned from training data from different sources, so that the prediction accuracy for prosodic word boundaries and prosodic phrase boundaries, and the robustness of the prosody model, can be improved.

Owner:BEIJING UNISOUND INFORMATION TECH +1

Speech synthesis method and device, dictionary construction method, and computer-readable medium

Inactive | CN1117344C | Reduce capacity; Promote generation; Indoor games; Speech synthesis; Speech recognition; Letter to sound

A plurality of tasks are set for a speech synthesis process, the tasks differing in at least one of the speaker, the emotion or situation at the time the speech is made, and the content of the speech (s1). Word dictionaries, prosody dictionaries, and waveform dictionaries corresponding to the respective tasks are organized (s2). When a character string to be synthesized is input with a task specified, for example through a game system, speech synthesis is performed using the word dictionary, prosody dictionary, and waveform dictionary corresponding to the specified task (s3). A speech message can therefore be generated according to the personality of the speaker, the emotion or situation at the time of speaking, and the content of the speech.
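
The per-task dictionary lookup can be sketched as a table keyed by task, where each task owns its own word, prosody, and waveform dictionaries. Keying tasks by (speaker, emotion), the dictionary contents, and the name `synthesize` are hypothetical; the output is a list of waveform unit identifiers rather than audio.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TaskDictionaries:
    words: Dict[str, str]      # word -> reading (pronunciation)
    prosody: Dict[str, str]    # reading -> prosody pattern label
    waveforms: Dict[str, str]  # prosody pattern label -> waveform unit id


# Hypothetical task table keyed by (speaker, emotion); entries are illustrative.
TASKS: Dict[Tuple[str, str], TaskDictionaries] = {
    ("hero", "happy"): TaskDictionaries(
        words={"hello": "he-lo"},
        prosody={"he-lo": "rise"},
        waveforms={"rise": "hero_happy_rise"},
    ),
    ("hero", "sad"): TaskDictionaries(
        words={"hello": "he-lo"},
        prosody={"he-lo": "fall"},
        waveforms={"fall": "hero_sad_fall"},
    ),
}


def synthesize(text: str, speaker: str, emotion: str) -> List[str]:
    """Look up each word through the three dictionaries of the selected task."""
    d = TASKS[(speaker, emotion)]
    units: List[str] = []
    for word in text.lower().split():
        reading = d.words[word]
        pattern = d.prosody[reading]
        units.append(d.waveforms[pattern])
    return units
```

Splitting the dictionaries per task lets the same text come out with different prosody and voice depending on who is speaking and in what mood.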

Owner:KONAMI DIGITAL ENTERTAINMENT CO LTD +1

Speech synthesis system

Inactive | US20110196680A1 | Preventing excessive deterioration in degree of naturalness of the synthesized speech; Speech synthesis; Acoustics

When a system (100) is used for synthesizing speech having prosody serving as a reference, the system stores speech element information representing a speech element capable of synthesizing speech having a degree of naturalness indicating a degree of similarity to speech uttered by a human higher than a predetermined reference value (speech element information storage (115)). The system accepts requested prosody information representing prosody requested by the user (requested prosody information accepting part (113)). The system generates intermediate prosody information representing intermediate prosody between the reference prosody and the requested prosody (intermediate prosody information generator (114)). The system executes a speech synthesis process to synthesize speech based on the generated intermediate prosody information and the stored speech element information (speech synthesizer (116)).
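
Generating intermediate prosody can be sketched as a linear blend between the reference and requested prosody values (for example, per-syllable F0). The linear rule, the maximum-deviation cap used as a stand-in for "not deteriorating naturalness too much", and the function names are assumptions; the abstract does not specify how the intermediate prosody is computed.

```python
import numpy as np


def intermediate_prosody(reference, requested, alpha: float) -> np.ndarray:
    """Linear blend: alpha = 0 gives the reference prosody, alpha = 1 the requested one."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reference = np.asarray(reference, dtype=float)
    requested = np.asarray(requested, dtype=float)
    return (1.0 - alpha) * reference + alpha * requested


def largest_safe_alpha(reference, requested, max_deviation: float) -> float:
    """Largest alpha that keeps every value within max_deviation of the reference."""
    diff = np.max(np.abs(np.asarray(requested, float) - np.asarray(reference, float)))
    return 1.0 if diff <= max_deviation else float(max_deviation / diff)
```

For a reference of [100, 100] Hz, a request of [140, 80] Hz, and a 20 Hz cap, the blend factor becomes 0.5 and the intermediate prosody is [120, 90] Hz.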

Owner:NEC CORP

Hybrid predictive model for enhancing prosodic expressiveness

Inactive | US9484016B2 | Speech recognition; Speech synthesis; Algorithm; Medicine

Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium.
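
The exemplar-based half of the hybrid model can be sketched as a nearest-neighbor lookup: training features are stored alongside their exemplar prosody values, and runtime features are compared against them. The parametric feature extractor is not shown (features are assumed precomputed), and the Euclidean distance, the k-neighbor averaging, and the class name are assumptions.

```python
import numpy as np


class ExemplarProsodyPredictor:
    """Stores (features, prosody) pairs from training data; predicts by similarity."""

    def __init__(self, train_features, train_prosody):
        self.features = np.asarray(train_features, dtype=float)
        self.prosody = np.asarray(train_prosody, dtype=float)

    def predict(self, runtime_features, k: int = 1) -> np.ndarray:
        x = np.asarray(runtime_features, dtype=float)
        dists = np.linalg.norm(self.features - x, axis=1)
        nearest = np.argsort(dists)[:k]  # indices of the k closest exemplars
        return self.prosody[nearest].mean(axis=0)
```

Returning real exemplar values (rather than a smoothed regression output) is one way such a hybrid could preserve more prosodic expressiveness.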

Owner:IBM CORP

System and method for cross-speaker style transfer in text-to-speech and training data generation

Active | US11361753B2 | Speech recognition; Speech synthesis; Acoustics; Computer science

Systems are configured to generate spectrogram data characterized by the voice timbre of a target speaker and the prosody style of a source speaker by converting a waveform of source speaker data to phonetic posteriorgram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to use and/or train a machine learning model for generating spectrogram data and to train a neural text-to-speech model with the generated spectrogram data.

Owner:MICROSOFT TECH LICENSING LLC

Speech synthesis apparatus and method

Active | US11170755B2 | Reduce discontinuity; Stable prosody; Speech synthesis; Acoustics

The present disclosure relates to a speech synthesis apparatus and method that can remove discontinuity between phoneme units when generating a synthesized sound from the phoneme units, thereby implementing natural utterances and producing a high-quality synthesized sound having stable prosody.

Owner:SK TELECOM CO LTD

Depression Auxiliary Detection Method and Classifier Based on Acoustic Features and Sparse Mathematics

Active | CN107657964B | Improve recognition rate; Easy to implement; Psychotechnic devices; Speech recognition; Image manipulation; Network model

The invention belongs to the technical field of speech processing and image processing, and discloses an auxiliary detection method and a classifier for depression based on acoustic features and sparse mathematics, in which depression is discriminated by jointly recognizing speech and facial emotion. Glottal signal estimation is performed with an inverse filter; global analysis is applied to the speech signal, feature parameters are extracted, and their timing and distribution characteristics are analyzed to find the prosody of speech with different emotions as the basis for emotion recognition. MFCCs are used as feature parameters to analyze the speech signal to be processed, multiple sets of training data are collected from the recorded data, and a neural network model is built for discrimination. A sparse representation algorithm based on orthogonal matching pursuit (OMP) is used to obtain a sparse linear combination of test samples, with which facial emotions are discriminated and classified; these results are linearly combined with the speech recognition results to obtain a final probability for each data point. The depression recognition rate is greatly improved, at low cost.
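
The final fusion step, a linear combination of the speech-based and face-based class probabilities, can be sketched as below. The equal default weight and the renormalization are assumptions; the abstract does not give the combination weights, and the MFCC, neural network, and OMP stages are not shown.

```python
import numpy as np


def fuse_probabilities(p_speech, p_face, w_speech: float = 0.5) -> np.ndarray:
    """Weighted linear combination of two class-probability vectors, renormalized."""
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    p_speech = np.asarray(p_speech, dtype=float)
    p_face = np.asarray(p_face, dtype=float)
    fused = w_speech * p_speech + (1.0 - w_speech) * p_face
    return fused / fused.sum(axis=-1, keepdims=True)
```

With probabilities [0.8, 0.2] from speech and [0.4, 0.6] from the face at equal weights, the fused result is [0.6, 0.4].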

Owner:NORTHWEST UNIV +1

Systems and methods for integrating recorded content

Active | CN110557589B | Television system details; Television conference systems; Content adaptation; Voice transformation

It would be desirable to have audio and/or video systems and processing tools that can automatically record audio/video and analyze such recordings to capture material that may be relevant to the user. In one or more embodiments disclosed herein, recordings may be condensed using one or more tools, including but not limited to: converting speech to text and searching for relevant content, keywords, etc.; detecting and analyzing speakers and/or classifying content (e.g., events in a conversation); removing non-substantive content (e.g., silence and other extraneous content); adjusting audio to increase playback speed; using rhythm and other markers in the audio to identify areas of interest; performing segmentation and clustering; using pseudo-random or random samples to select content; and other methods of extracting information to provide a summary or representation of the recorded content for user review.
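
One of the listed tools, removing silence, can be sketched with a frame-energy threshold. The 20 ms frame size, the -40 dBFS threshold, and the function name are hypothetical choices, and samples are assumed to be floats scaled to [-1, 1].

```python
import numpy as np


def remove_silence(signal, sample_rate: int, frame_ms: float = 20.0,
                   threshold_db: float = -40.0) -> np.ndarray:
    """Drop frames whose RMS level (dB relative to full scale 1.0) is below threshold.

    Trailing samples that do not fill a whole frame are discarded.
    """
    frame = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(signal) // frame
    frames = np.asarray(signal[:n_frames * frame], dtype=float).reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12))  # floor avoids log(0)
    return frames[level_db > threshold_db].reshape(-1)
```

Increasing playback speed without changing pitch would need a separate time-scale modification step, which is not shown here.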

Owner:BAIDU USA LLC
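One of the listed tools, removing silence, can be illustrated with a simple frame-energy gate. The sketch below is an assumption about how such a step might look (fixed-length frames, an RMS threshold, and hypothetical defaults of 160 samples per frame and a 0.02 threshold); it is not the method claimed in the patent.

```python
import math


def remove_silence(samples, frame_len=160, threshold=0.02):
    """Drop fixed-length frames whose RMS energy is at or below `threshold`.

    `samples` are floats in [-1, 1]; frame_len=160 is 10 ms at 16 kHz (hypothetical defaults).
    A trailing partial frame is judged on its own RMS.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > threshold:
            kept.extend(frame)
    return kept


# Two silent frames, one loud frame, one near-silent frame.
audio = [0.0] * 320 + [0.5] * 160 + [0.001] * 160
speech_only = remove_silence(audio)
print(len(speech_only), len(speech_only) / len(audio))  # prints: 160 0.25
```

A fixed threshold is the simplest choice; real recordings with varying background noise would usually need an adaptive threshold or a trained voice-activity detector.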

Audio detection method, device, electronic device and readable storage medium

Active · CN111312231B · Improve accuracy, Speech recognition, Information processing, Audio frequency

The present application relates to the technical field of information processing and discloses an audio detection method and device, an electronic device, and a readable storage medium. The audio detection method includes: receiving, from a terminal, the audio to be detected and the text corresponding to the audio; aligning the audio with the text to obtain the start and end times, in the audio, of each of the phonemes corresponding to the text; extracting a phoneme feature vector for each phoneme in the audio, and obtaining an audio sequence feature of the audio based on the start and end times of each phoneme; obtaining the prosody detection result of the audio from the phoneme feature vectors and the audio sequence feature, where the prosody detection result includes the accent features and pause features of the audio; and returning the prosody detection result to the terminal, so that the terminal displays the text corresponding to the accent and pause features. The audio detection method provided by the present application can improve the accuracy of prosody detection results.

Owner:TENCENT TECH (SHENZHEN) CO LTD
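The alignment step yields start and end times for each phoneme. As a rough illustration of how pause features could be read off such an alignment, the sketch below treats any gap of at least `min_pause` seconds between consecutive phonemes as a pause. The segment format, the 0.15 s threshold, and the function names are hypothetical, and the learned accent prediction described in the abstract is not modeled.

```python
from typing import List, Tuple

# (phoneme, start_s, end_s), e.g. as produced by a forced aligner (assumed format)
Segment = Tuple[str, float, float]


def phoneme_durations(segments: List[Segment]) -> List[float]:
    """Duration of each aligned phoneme, a simple per-phoneme sequence feature."""
    return [round(end - start, 3) for _, start, end in segments]


def pause_features(segments: List[Segment], min_pause: float = 0.15) -> List[Tuple[int, float]]:
    """Return (index of the phoneme before the pause, pause length in seconds) for long gaps."""
    pauses = []
    for i in range(len(segments) - 1):
        gap = segments[i + 1][1] - segments[i][2]
        if gap >= min_pause:
            pauses.append((i, round(gap, 3)))
    return pauses


segments = [("n", 0.00, 0.08), ("i", 0.08, 0.20), ("h", 0.50, 0.57), ("ao", 0.57, 0.80)]
print(phoneme_durations(segments))  # [0.08, 0.12, 0.07, 0.23]
print(pause_features(segments))     # [(1, 0.3)]
```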

Clockwork hierarchical variational encoder

Pending · CN112005298A · Neural learning methods, Speech synthesis, Syllable, Pitch contour

A method (400) for representing an intended prosody in synthesized speech (152) includes receiving a text utterance (320) having at least one word (250), and selecting an utterance embedding (260) for the text utterance. Each word in the text utterance has at least one syllable (240) and each syllable has at least one phoneme (230). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features (232) of each phoneme of the syllable with a corresponding prosodic syllable embedding (245) for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames (280) based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.

Owner:GOOGLE LLC
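The abstract describes turning a predicted syllable duration into a sequence of fixed-length pitch frames. A minimal sketch of only that bookkeeping step follows. In the patent the contour is predicted by the network; here it is stood in for by a straight line between two hypothetical F0 values, and the 5 ms frame length is an assumption.

```python
import math


def pitch_frames(duration_ms, f0_start_hz, f0_end_hz, frame_ms=5.0):
    """Cover a syllable's predicted duration with fixed-length frames and give each frame a pitch.

    The linear contour between f0_start_hz and f0_end_hz is a placeholder for a predicted contour.
    """
    n_frames = max(1, math.ceil(duration_ms / frame_ms))  # round up so frames cover the syllable
    frames = []
    for i in range(n_frames):
        t = (i + 0.5) / n_frames  # centre of frame i, as a fraction of the syllable
        frames.append(f0_start_hz + (f0_end_hz - f0_start_hz) * t)
    return frames


print(pitch_frames(20, 100.0, 140.0))  # [105.0, 115.0, 125.0, 135.0]
```

Because the frame count depends on the predicted duration while each frame has a fixed length, longer syllables simply produce more frames, which is the property the abstract emphasizes.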

Prosodic labeling method, device and equipment

Active · CN109326281B · Improve efficiency, Improve accuracy, Natural language data processing, Speech recognition, Acoustics, Speech sound

The invention provides a prosody labeling method, device, and equipment. The method comprises: obtaining voice data for a text to be labeled; determining prosody information of the voice data from the voice data, the prosody information indicating the pause durations in the voice data; and labeling the text with prosodic symbols according to the prosody information of the voice data. The method improves both the efficiency and the accuracy of prosody labeling.

Owner:Beijing Haitian Ruisheng Science Technology Ltd. (北京海天瑞声科技股份有限公司)
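The labeling step maps pause durations measured in the voice data to prosodic symbols in the text. The abstract gives neither the symbol set nor the thresholds, so the sketch below borrows the `#1` to `#4` break-level notation used in some Mandarin TTS corpora and uses made-up duration thresholds; both are assumptions, as are the function names.

```python
from typing import List

# Hypothetical (upper bound in seconds, symbol) pairs, checked in order.
BREAK_LEVELS = [(0.05, "#1"), (0.20, "#2"), (0.50, "#3")]


def break_symbol(pause_s: float) -> str:
    """Map the pause after a word to a prosodic-break symbol."""
    for upper, symbol in BREAK_LEVELS:
        if pause_s < upper:
            return symbol
    return "#4"  # longest pauses, e.g. the end of a sentence


def label_text(words: List[str], pauses: List[float]) -> str:
    """Insert a break symbol after each word, given the pause measured after it in the audio."""
    return "".join(word + break_symbol(p) for word, p in zip(words, pauses))


print(label_text(["jintian", "tianqi", "henhao"], [0.02, 0.3, 0.8]))  # jintian#1tianqi#3henhao#4
```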

Popular searches

Human language, First language, Prosody, Language analysis, Graphical user interface, User interface, Software tool, Voice pitch, File format, Human–computer interaction

© 2025 PatSnap. All rights reserved.