Domain language model construction method and device, computer equipment and storage medium

A language model construction technology, applied in speech analysis, speech recognition, instruments, etc., which addresses the problem that suitable domain corpora are not easy to find.

Pending Publication Date: 2020-11-27
SUNING CLOUD COMPUTING CO LTD
Cites: 0; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] At present, there are generally two commonly used methods for constructing language models suited to specific scenarios. One is to directly collect corpus from the relevant domain for training; the other is to combine the trained language model with a general language model according to a certain weight to increase its generalization ability. Both methods require a large amount of domain training corpus, but it is not easy to find domain corpus that fits the scenario.
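One common way to realize the weighted combination in [0003] is linear interpolation of conditional probabilities. The sketch below assumes that form (the text here does not specify the exact scheme); the function name `interpolate`, the weight `lam`, and the toy probabilities are hypothetical, and each toy model is a fully specified distribution over the next word after `<s>`.

```python
def interpolate(p_domain, p_general, lam):
    """Linearly interpolate two conditional distributions keyed by (history, word).

    P(w | h) = lam * P_domain(w | h) + (1 - lam) * P_general(w | h);
    a key missing from one model is treated as probability 0 in that model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    keys = set(p_domain) | set(p_general)
    return {k: lam * p_domain.get(k, 0.0) + (1.0 - lam) * p_general.get(k, 0.0)
            for k in keys}


# Toy distributions over the word following "<s>" (hypothetical values).
p_dom = {("<s>", "play"): 0.8, ("<s>", "weather"): 0.2}
p_gen = {("<s>", "play"): 0.1, ("<s>", "weather"): 0.4, ("<s>", "news"): 0.5}
mixed = interpolate(p_dom, p_gen, lam=0.7)
print(sorted(mixed.items()))
```

Because both inputs sum to 1 for the history `<s>`, the mixture does too; the catch noted in [0003] is that `p_dom` must still be estimated from a sizeable domain corpus.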



Examples


Embodiment Construction

[0048] In order to make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, but not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

[0049] It should be noted that, unless the context clearly requires otherwise, the words "include", "comprise" and similar words throughout the specification and claims should be interpreted in an inclusive rather than an exclusive or exhaustive sense; that is, as "including but not limited to". In addition, in the description of the present invention, it should be...



Abstract

The invention discloses a domain language model construction method and apparatus, a computer device and a storage medium, and belongs to the technical field of speech recognition. The method comprises the steps of converting a general language model into an equivalent first WFSA (weighted finite-state automaton) network; screening out of the first WFSA network, according to a preset number of domain corpora, an optimal path meeting a preset condition, so as to construct a second WFSA network; and normalizing the second WFSA network and converting the normalized second WFSA network into a domain language model. The advantage of the method is that, even when domain training corpora are insufficient, a domain language model that fits a specific scenario and retains general generalization ability can be constructed quickly.
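The three steps in the abstract can be illustrated on a toy bigram model. This is a minimal sketch under stated assumptions, not the patented implementation: the general LM is taken to be a backoff bigram model encoded as a WFSA in the common way (one state per word history, word arcs weighted by -log P(w | h), and an epsilon arc to a shared backoff state weighted by -log of the backoff weight); the "optimal path" for a domain sentence is assumed to be its lowest-cost accepting path; the "preset condition" is assumed to be a cost threshold `max_cost`; and "normalizing" is assumed to mean renormalizing each state's retained outgoing arc probabilities to sum to 1. The names `lm_to_wfsa`, `best_path`, `build_domain_wfsa`, `normalize`, `EPS`, `BACKOFF` and all probabilities are hypothetical. Writing the result back out as an N-gram file is omitted.

```python
import math
from collections import defaultdict

EPS = "<eps>"          # hypothetical epsilon (backoff) label
BACKOFF = "<backoff>"  # hypothetical shared unigram/backoff state


def lm_to_wfsa(unigram, bigram, backoff):
    """Encode a backoff bigram LM as a WFSA with weights -log p (tropical semiring)."""
    arcs = defaultdict(list)  # state -> [(label, weight, next_state)]
    for w, p in unigram.items():
        arcs[BACKOFF].append((w, -math.log(p), w))
    for (h, w), p in bigram.items():
        arcs[h].append((w, -math.log(p), w))
    for h, alpha in backoff.items():
        arcs[h].append((EPS, -math.log(alpha), BACKOFF))
    return arcs


def best_path(wfsa, sentence):
    """Cheapest path reading sentence[1:] from state sentence[0]; returns (cost, arcs)."""
    frontier = {sentence[0]: (0.0, [])}
    for word in sentence[1:]:
        # Optionally follow one epsilon (backoff) arc before consuming `word`.
        expanded = dict(frontier)
        for state, (cost, path) in frontier.items():
            for label, w, nxt in wfsa.get(state, []):
                if label == EPS and cost + w < expanded.get(nxt, (math.inf,))[0]:
                    expanded[nxt] = (cost + w, path + [(state, label, w, nxt)])
        step = {}
        for state, (cost, path) in expanded.items():
            for label, w, nxt in wfsa.get(state, []):
                if label == word and cost + w < step.get(nxt, (math.inf,))[0]:
                    step[nxt] = (cost + w, path + [(state, label, w, nxt)])
        if not step:
            return math.inf, []  # sentence is not accepted
        frontier = step
    return min(frontier.values(), key=lambda t: t[0])


def build_domain_wfsa(wfsa, corpus, max_cost):
    """Keep only arcs on best paths with cost <= max_cost (assumed 'preset condition')."""
    kept = defaultdict(set)
    for sentence in corpus:
        cost, path = best_path(wfsa, sentence)
        if cost <= max_cost:
            for state, label, w, nxt in path:
                kept[state].add((label, w, nxt))
    return kept


def normalize(kept):
    """Renormalize each state's retained outgoing arc probabilities to sum to 1."""
    result = {}
    for state, arcs in kept.items():
        total = sum(math.exp(-w) for _, w, _ in arcs)
        result[state] = sorted((label, math.exp(-w) / total, nxt) for label, w, nxt in arcs)
    return result


# Toy general model and domain sentences (all values hypothetical).
unigram = {"play": 0.2, "music": 0.2, "song": 0.1, "weather": 0.2, "</s>": 0.3}
bigram = {("<s>", "play"): 0.5, ("<s>", "weather"): 0.3,
          ("play", "music"): 0.6, ("music", "</s>"): 0.7}
backoff = {"<s>": 0.2, "play": 0.4, "music": 0.3, "song": 0.5, "weather": 0.5}

general = lm_to_wfsa(unigram, bigram, backoff)
domain_corpus = [["<s>", "play", "music", "</s>"], ["<s>", "play", "song", "</s>"]]
domain = normalize(build_domain_wfsa(general, domain_corpus, max_cost=6.0))
for state, arcs in sorted(domain.items()):
    print(state, arcs)
```

In this reading, the second WFSA keeps only arcs that the domain sentences actually traverse while its probabilities still come from the general model, so a small set of domain sentences can reshape the distribution; that is one plausible interpretation of how the method sidesteps a large domain corpus.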

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, and in particular to a method, device, computer equipment and storage medium for constructing a domain language model.

Background art

[0002] Most speech recognition schemes are based on language models. The most commonly used model for language model training is the N-Gram model, a statistical language model; generally speaking, the larger the training corpus, the better the model performs. As applications extend into more specialized scenarios, it is often necessary to build language models that both meet the needs of a specific scenario and retain generalization ability, which places higher demands on corpus selection.

[0003] At present, there are generally two commonly used methods for constructing language models that meet specific scenarios. One is to directly collect relevant domain corpus for training, and the other is to integrate th...
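To make concrete why [0002] ties N-Gram quality to corpus size, here is a minimal maximum-likelihood bigram (N = 2) estimator. It is a sketch assuming pre-tokenized sentences and the conventional `<s>`/`</s>` sentence-boundary markers; the function name `train_bigram` and the toy corpus are hypothetical.

```python
from collections import Counter


def train_bigram(corpus):
    """Maximum-likelihood bigram estimates: P(w | h) = c(h, w) / c(h).

    No smoothing is applied, so any bigram absent from the corpus has probability 0.
    """
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            history_counts[h] += 1
            pair_counts[(h, w)] += 1
    return {(h, w): c / history_counts[h] for (h, w), c in pair_counts.items()}


corpus = [["play", "music"], ["play", "song"], ["check", "weather"]]
model = train_bigram(corpus)
print(model[("<s>", "play")], model[("play", "music")])
print(model.get(("play", "weather"), 0.0))  # unseen bigram
```

Every word pair missing from the corpus gets probability 0 here, which is why small domain corpora yield weak N-Gram models; practical toolkits add smoothing (for example Kneser-Ney) and backoff to redistribute some mass to unseen events.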


Application Information

IPC(8): G10L15/06
CPC: G10L15/06; G10L2015/0631
Inventors: 张旭华 (Zhang Xuhua), 齐欣 (Qi Xin), 孙泽明 (Sun Zeming), 朱林林 (Zhu Linlin), 王宁 (Wang Ning)
Owner: SUNING CLOUD COMPUTING CO LTD