Mixed voice recognition method and device, storage medium, and electronic device

A mixed speech recognition technology, applied in speech recognition, speech analysis, instruments, etc. It addresses problems such as increased recognition time, high cost, and the lack of an effective solution, and it achieves reliable quality and labor cost savings while resolving the low efficiency of phoneme recognition.

Active Publication Date: 2022-03-08

深圳市北科瑞讯信息技术有限公司

Problems solved by technology

For a Chinese-English mixed acoustic model, a large amount of Chinese-English mixed speech and annotation data is required as training material. Compared with monolingual Chinese training data, however, Chinese-English mixed data is very scarce, and retraining a dedicated acoustic model for Chinese-English mixed recognition is relatively costly. Merging the Chinese and English phoneme sets, which are the modeling units of the Chinese and English acoustic models, is also difficult. If a conventional English acoustic model (a mapping between English words and English phonemes) is used for recognition, the system must also switch models, which increases recognition time.

In addition, the English pronunciation of native Chinese speakers differs from that of native English speakers.

[0004] No effective solution has yet been found for the above problems in the related art.

Examples

Embodiment 1

[0036] The method embodiment provided in Embodiment 1 of the present application may be executed on a server, a computer, a mobile phone, a speech recognition device, a recording pen, or a similar computing device. Taking a mobile phone as an example, Figure 1 is a block diagram of the hardware structure of a mobile phone according to an embodiment of the present invention. As shown in Figure 1, the mobile phone may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile phone may also include a transmission device 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in Figure 1 is for illustration only and does not limit the structure of the above-men...

Embodiment 2

[0084] This embodiment also provides a mixed speech recognition device, which is used to implement the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.

[0085] Figure 8 is a structural block diagram of a mixed speech recognition device according to an embodiment of the present invention. As shown in Figure 8, the device includes an acquisition module 80, a first extraction module 82, and a first identification module 84, wherein:

[0086] The acquisition module 80 is used to obtain the mixed speech whose phonemes are to be recognized, wherein the mixed speech comprises Chinese word ...

Embodiment 3

[0097] An embodiment of the present application also provides an electronic device. Figure 9 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 9, the device includes a processor 91, a communication interface 92, a memory 93, and a communication bus 94, wherein the processor 91, the communication interface 92, and the memory 93 communicate with one another through the communication bus 94, and the memory 93 is used to store computer programs.

[0098] The processor 91, when executing the program stored in the memory 93, implements the following steps: obtaining the mixed speech whose phonemes are to be recognized, wherein the mixed speech includes Chinese words and English words; extracting English non-abbreviated words from the mixed speech; and using a first preset grapheme-to-phoneme (G2P) model to identify first phoneme information of the English non-abbreviated words, wherein the ...

Abstract

The present invention provides a mixed speech recognition method and device, a storage medium, and an electronic device. The method includes: obtaining mixed speech whose phonemes are to be recognized, wherein the mixed speech includes Chinese words and English words; extracting English non-abbreviated words from the mixed speech; and using a first preset grapheme-to-phoneme (G2P) model to identify first phoneme information of the English non-abbreviated words, wherein the first preset G2P model is trained on Chinese phoneme decoding results and includes mapping sequences between English words and Chinese phonemes. The invention saves labor costs while pursuing mapping labels with high acoustic similarity, achieving an English pronunciation scheme of reliable quality. It solves the technical problem in the related art of inefficient phoneme recognition of mixed speech.
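
The flow summarized in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that do not come from the patent text: it operates on a text transcript rather than on audio, it treats all-uppercase tokens as abbreviations, and a small hypothetical dictionary (`TOY_G2P`) stands in for the trained G2P model. The phoneme labels are placeholders, not output of the patent's model.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for the trained G2P model: it maps an English word
# to a sequence of Chinese phoneme labels. The entries are placeholders.
TOY_G2P = {
    "hello": ["h", "a", "l", "ou"],
    "email": ["i", "m", "ei", "l"],
}

# Runs of Latin letters; Chinese characters act as separators.
ENGLISH_WORD = re.compile(r"[A-Za-z]+")


def is_abbreviation(word: str) -> bool:
    """Heuristic (an assumption): all-uppercase tokens are abbreviations."""
    return len(word) > 1 and word.isupper()


def extract_non_abbreviated(text: str) -> List[str]:
    """Return the English words in a mixed transcript that are not abbreviations."""
    return [w for w in ENGLISH_WORD.findall(text) if not is_abbreviation(w)]


def to_chinese_phonemes(word: str) -> Optional[List[str]]:
    """Look up the Chinese phoneme sequence for a word, or None if unknown."""
    return TOY_G2P.get(word.lower())


if __name__ == "__main__":
    transcript = "我用GPU跑模型然后发email给你"
    for word in extract_non_abbreviated(transcript):
        print(word, to_chinese_phonemes(word))
```

Because English words are mapped onto Chinese phoneme labels, a single Chinese acoustic model could in principle score both languages, which is consistent with the stated goal of avoiding a model switch.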

Description

Technical field

[0001] The present invention relates to the field of speech recognition, and in particular to a mixed speech recognition method and device, a storage medium, and an electronic device.

Background

[0002] In the related art, Chinese-English mixed speech recognition refers to automatic speech recognition (ASR) of speech in which the speaker uses both Chinese and English. As English becomes increasingly widespread, mixing Chinese and English in conversation has become common for most Chinese speakers. In such conversations, Chinese remains the main language. By the type of switching between Chinese and English, mixed speech can be divided into "intra-sentence switching", in which English words are interspersed in Chinese sentences, and "inter-sentence switching", that is, Chinese There is a switch...
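
As a rough illustration of the first switching type described above (English words interspersed in a Chinese sentence), the sketch below flags transcript sentences that contain both Chinese characters and English words. The function names and the punctuation-based sentence splitter are assumptions made for this example, not part of the patent.

```python
import re
from typing import List

CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")
ENGLISH_WORD = re.compile(r"[A-Za-z]+")
# Simplistic sentence boundary: Chinese or Western end-of-sentence marks.
SENTENCE_END = re.compile(r"[。！？!?.]+")


def split_sentences(text: str) -> List[str]:
    """Split a transcript into non-empty, stripped sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def has_intra_sentence_switch(sentence: str) -> bool:
    """True if one sentence contains both Chinese characters and English words."""
    has_chinese = CHINESE_CHAR.search(sentence) is not None
    has_english = ENGLISH_WORD.search(sentence) is not None
    return has_chinese and has_english


if __name__ == "__main__":
    text = "今天很忙。See you tomorrow. 我们开个meeting吧！"
    for sentence in split_sentences(text):
        print(sentence, has_intra_sentence_switch(sentence))
```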

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/08, G10L15/06, G10L15/02, G10L15/00

CPC: G10L15/08, G10L15/06, G10L15/02, G10L15/005, G10L2015/025

Inventor: 黄石磊, 王昕, 程刚

Owner: 深圳市北科瑞讯信息技术有限公司
