Mixed speech recognition method and device, storage medium and electronic device

A mixed speech recognition technology, applied in speech recognition, speech analysis, instruments, etc. It addresses problems in related technologies such as increased recognition time and high cost, for which no effective solution had been found, and achieves labor cost savings and reliable quality.

Active Publication Date: 2021-07-23

深圳市北科瑞讯信息技术有限公司



Problems solved by technology

For a Chinese-English mixed acoustic model, a large amount of Chinese-English mixed speech and annotation data is required as training material. Compared with Chinese (single-language) training data, however, Chinese-English mixed data is very scarce, and retraining a dedicated acoustic model for Chinese-English mixed recognition is relatively costly. Merging the Chinese and English phoneme sets, which are the modeling units of the Chinese and English acoustic models, is also a difficult problem. If a conventional English acoustic model (a mapping between English words and English phonemes) is used for recognition instead, the models must be switched, which increases recognition time.

In addition, there is a problem that the English pronunciation of Chinese-speaking speakers is different from that of English-speaking speakers.

[0004] No effective solution has yet been found for the above problems in the related technologies.

Examples


Embodiment 1

[0036] The method embodiment provided in Embodiment 1 of the present application may be executed in a server, a computer, a mobile phone, a speech recognition device, a recording pen, or a similar computing device. Taking execution on a mobile phone as an example, Figure 1 is a block diagram of the hardware structure of a mobile phone according to an embodiment of the present invention. As shown in Figure 1, the mobile phone may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile phone may also include a transmission device 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in Figure 1 is only illustrative and does not limit the structure of the above-mentio...

Embodiment 2

[0084] In this embodiment, a hybrid speech recognition device is also provided, which is used to realize the above embodiments and preferred implementation modes, and what has been explained will not be repeated. As used below, the term "module" may be a combination of software and / or hardware that realizes a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or a combination of software and hardware are also possible and contemplated.

[0085] Figure 8 is a structural block diagram of a mixed speech recognition device according to an embodiment of the present invention. As shown in Figure 8, the device includes an acquisition module 80, a first extraction module 82, and a first identification module 84, wherein:

[0086] The acquisition module 80 is used to obtain the mixed speech to be subjected to phoneme recognition, wherein the mixed speech comprises Chinese word ...
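The three-module structure can be pictured as three pluggable callables wired in sequence. The following is only a minimal sketch: the class name, the field names, and the stand-in lambdas are hypothetical, and the roles assigned to modules 82 and 84 are inferred from their names and from the method steps, since the description of those modules is cut off here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MixedSpeechRecognitionDevice:
    """Hypothetical wiring of the three modules named in Figure 8."""
    acquire: Callable[[], str]             # acquisition module 80
    extract: Callable[[str], List[str]]    # first extraction module 82 (role assumed)
    identify: Callable[[str], List[str]]   # first identification module 84 (role assumed)

    def run(self) -> Dict[str, List[str]]:
        # Obtain the input, pull out the English words, then map each
        # word to a phoneme sequence.
        speech = self.acquire()
        return {word: self.identify(word) for word in self.extract(speech)}


# Stand-in implementations: the "speech" is a space-separated transcript,
# and the identifier just returns one symbol per letter as a placeholder.
device = MixedSpeechRecognitionDevice(
    acquire=lambda: "今天 开 meeting",
    extract=lambda s: [t for t in s.split() if t.isascii()],
    identify=lambda w: list(w),
)
result = device.run()
```

Keeping each module behind a plain callable means a software or hardware implementation can be swapped in without changing the surrounding flow, which matches the statement that a "module" may be software, hardware, or both.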

Embodiment 3

[0097] The embodiment of the present application also provides an electronic device. Figure 9 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 9, it includes a processor 91, a communication interface 92, a memory 93, and a communication bus 94, wherein the processor 91, the communication interface 92, and the memory 93 communicate with one another through the communication bus 94, and the memory 93 is used to store computer programs;

[0098] The processor 91, when executing the program stored in the memory 93, implements the following steps: obtaining the mixed speech to be subjected to phoneme recognition, wherein the mixed speech includes Chinese words and English words; extracting English non-abbreviated words from the mixed speech; and using a first preset grapheme-sequence-to-phoneme-sequence (G2P) model to identify first phoneme information of the English non-abbreviated words, wherein the ...
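The extraction step can be illustrated on text. The sketch below assumes the mixed utterance is available as a transcript and treats any all-uppercase word of two or more letters as an abbreviation. Both are assumptions made for illustration; the source does not say how abbreviations are detected or at what stage extraction happens, and the function names are hypothetical.

```python
import re
from typing import List

# Runs of Latin letters or runs of CJK unified ideographs.
TOKEN_RE = re.compile(r"[A-Za-z]+|[\u4e00-\u9fff]+")


def is_abbreviation(word: str) -> bool:
    """Assumed rule: all-uppercase words of 2+ letters are abbreviations."""
    return len(word) >= 2 and word.isupper()


def extract_non_abbreviated_english(text: str) -> List[str]:
    """Return the English words in `text` that are not abbreviations, in order."""
    words = []
    for token in TOKEN_RE.findall(text):
        # Chinese tokens are non-ASCII and are skipped here.
        if token[0].isascii() and not is_abbreviation(token):
            words.append(token.lower())
    return words


words = extract_non_abbreviated_english("我们的CEO今天开meeting讨论project")
```

In this example "CEO" is set aside as an abbreviation, while "meeting" and "project" are passed on to the G2P step.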


Abstract

The invention provides a mixed speech recognition method and device, a storage medium, and an electronic device. The method comprises: obtaining mixed speech to be subjected to phoneme recognition, wherein the mixed speech comprises Chinese words and English words; extracting English non-abbreviated words from the mixed speech; and recognizing first phoneme information of the English non-abbreviated words using a first preset grapheme-sequence-to-phoneme-sequence (G2P) model, wherein the first preset G2P model is obtained by training on decoding results of Chinese phonemes and comprises a mapping sequence between English words and Chinese phonemes. The invention saves labor cost, pursues acoustically highly similar mapping annotations, and achieves an English pronunciation scheme of reliable quality. It solves the technical problem in related technologies of low efficiency in phoneme recognition of mixed speech.
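The idea of deriving English-word-to-Chinese-phoneme mappings from decoding results can be sketched with a simple frequency lexicon. This is a deliberately simplified stand-in, not the patent's G2P training procedure: it keeps the most frequent decoded phoneme sequence per word and cannot generalize to unseen words the way a trained G2P model would. The function name, the sample data, and the pinyin-like phoneme symbols are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_lexicon(
    decoded: Iterable[Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Map each English word to its most frequently decoded phoneme sequence."""
    counts = defaultdict(Counter)
    for word, phonemes in decoded:
        # Tuples are hashable, so each distinct sequence can be counted.
        counts[word.lower()][tuple(phonemes)] += 1
    return {w: list(c.most_common(1)[0][0]) for w, c in counts.items()}


# Hypothetical decoding results: each English word paired with the Chinese
# phoneme sequence a Chinese acoustic model produced for one recording of it.
decoded_results = [
    ("hello", ["h", "a", "l", "ou"]),
    ("hello", ["h", "a", "l", "ou"]),
    ("hello", ["h", "e", "l", "ou"]),
    ("ok", ["ou", "k", "ei"]),
]
lexicon = build_lexicon(decoded_results)
```

Because the phoneme labels come from decoding rather than from manual annotation, no annotator has to write the mappings by hand, which is the labor saving the abstract refers to.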

Description

Technical field

[0001] The present invention relates to the field of speech recognition, and in particular to a mixed speech recognition method and device, a storage medium, and an electronic device.

Background technique

[0002] In related technologies, Chinese-English mixed speech recognition refers to automatic speech recognition (ASR) in which the speaker's speech contains both Chinese and English. As English becomes more and more widespread, mixing Chinese and English in conversation has gradually become common for most Chinese people. In Chinese-English dialogue among Chinese speakers, Chinese remains the main language. According to the type of switching between Chinese and English, such speech can be divided into "intra-sentence switching", in which English words are interspersed in Chinese sentences, and "inter-sentence switching", that is, there is a switch...


Application Information


IPC(8): G10L15/08, G10L15/06, G10L15/02, G10L15/00

CPC: G10L15/08, G10L15/06, G10L15/02, G10L15/005, G10L2015/025

Inventor: 黄石磊, 王昕, 程刚

Owner: 深圳市北科瑞讯信息技术有限公司
