A language model optimization method and device

A language model optimization technology, applied in speech analysis, speech recognition, semantic analysis and similar fields. It addresses the problem of low sentence probability in language models, with the effect of optimizing the language model and improving the user experience.

Active Publication Date: 2020-07-03

BEIJING SINOVOICE TECH CO LTD

6 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

[0005] The present invention provides a language model optimization method and device to solve the problem in the prior art that the language model yields a low sentence probability.

Examples

Embodiment 1

[0024] Referring to Figure 1, a flow chart of the steps of a language model optimization method according to Embodiment 1 of the present invention is shown.

[0025] The language model optimization method provided by the embodiment of the present invention includes the following steps:

[0026] Step 101: Obtain a first word vector and a second word vector from the training corpus.

[0027] Here, the first word vector is the vector of the first word and the second word vector is the vector of the second word. The second word appears in the corpus with a lower probability than the first word, and the first word and the second word are semantically similar.

[0028] Step 102: Calculate the cosine of the angle between the first word vector and the second word vector.

[0029] With the word vectors obtained, the cosine can be calculated using the formula for the cosine of the angle between two vectors.
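Step 102 can be illustrated with a minimal sketch, assuming the word vectors are plain lists of floats. The function name and the example vectors are hypothetical and are not taken from the patent.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional vectors for a frequent word and a rarer synonym.
first_word_vector = [1.0, 2.0, 2.0]
second_word_vector = [2.0, 4.0, 4.0]
print(cosine_similarity(first_word_vector, second_word_vector))
```

A value close to 1 indicates that the two vectors point in nearly the same direction, which is how the method reads semantic similarity from the vectors.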

[0030] The current comm...

Embodiment 2

[0044] Referring to Figure 2, a flow chart of the steps of a language model optimization method according to Embodiment 2 of the present invention is shown.

[0045] The language model optimization method provided by the embodiment of the present invention includes the following steps:

[0046] Step 201: Train on the corpus to generate word vectors and a language model.

[0047] Here, the language model includes a plurality of words, the logarithm of the occurrence probability of each word, a plurality of word groups, and the logarithm of the occurrence probability of each word group. The word vectors are the vectors corresponding to each word.
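The shape of the language model described in [0047] might look like the following sketch. It is a simplification under stated assumptions: word groups are taken to be adjacent word pairs, probabilities are plain relative frequencies with base-10 logarithms, and the word-vector training of Step 201 is omitted. `ToyLanguageModel`, `train_toy_model` and the tiny corpus are hypothetical names and data.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class ToyLanguageModel:
    word_logprob: dict   # word -> log10 of its relative frequency
    group_logprob: dict  # (word, next_word) -> log10 of the pair's relative frequency

def train_toy_model(sentences):
    """Count words and adjacent word pairs, then store log10 relative frequencies."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    return ToyLanguageModel(
        word_logprob={w: math.log10(c / total_words) for w, c in word_counts.items()},
        group_logprob={g: math.log10(c / total_pairs) for g, c in pair_counts.items()},
    )

# Hypothetical corpus: "glad" is the rarer synonym of "happy".
corpus = ["i am happy", "i am glad", "i am happy today"]
model = train_toy_model(corpus)
print(model.word_logprob["happy"], model.group_logprob[("am", "glad")])
```

In this toy corpus the pair ("glad", "today") never occurs, so it has no entry at all, which is the kind of gap the optimization method is meant to fill.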

[0048] Step 202: Obtain the first word vector and the second word vector from the training corpus.

[0049] Wherein, the first word vector is the vector of the first word, the second word vector is the vector of the second word, the probability of the second word appearing in the corpus is lower than the probability of the first word appe...

Embodiment 3

[0072] Referring to Figure 3, a structural block diagram of a language model optimization device according to Embodiment 3 of the present invention is shown.

[0073] The language model optimization device provided by the embodiment of the present invention includes: a first acquisition module 301, configured to acquire a first word vector and a second word vector from the training corpus, wherein the first word vector is the vector of the first word, the second word vector is the vector of the second word, the probability of the second word appearing in the corpus is lower than that of the first word, and the first word and the second word are semantically similar; a first calculation module 302, configured to calculate the cosine of the angle between the first word vector and the second word vector; a second acquisition module 303, configured to obtain the logarithm of the occurrence probability of the first word group in the language model; wherein...

Abstract

A language model optimization method provided by the present invention comprises: obtaining a first word vector and a second word vector from a training corpus, and calculating the cosine of the angle between the first and second word vectors; obtaining the logarithm of the occurrence probability of a first word group in a language model; combining a second word and a third word to generate a second word group; calculating the logarithm of the occurrence probability of the second word group from the logarithm of the occurrence probability of the first word group and the cosine value; and adding the second word group, together with the logarithm of its occurrence probability, to the language model. By training distributed word representations on the corpus, the language model and the word vectors can be obtained simultaneously. The word vectors provide similarity information between two words, and using this information to adjust the conditional probabilities in an N-gram language model optimizes the language model, thereby improving the user experience.
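The final steps of the abstract can be sketched as below. The visible text does not give the formula that combines the first word group's log probability with the cosine, so this sketch assumes, purely for illustration, that the new probability is the existing one scaled by the cosine (log10 P2 = log10 P1 + log10 cos). It also assumes that the first word group pairs the first word with the third word and that the third word follows in the group. The function name and values are hypothetical.

```python
import math

def add_similar_word_group(group_logprob, first_word, second_word, third_word, cosine):
    """Add (second_word, third_word) based on the entry for (first_word, third_word).

    Assumption for illustration only: log10 P2 = log10 P1 + log10(cosine).
    The patent's actual formula is not shown in the text above.
    """
    if not 0.0 < cosine <= 1.0:
        raise ValueError("this sketch only handles cosine values in (0, 1]")
    first_group = (first_word, third_word)
    second_group = (second_word, third_word)
    second_logprob = group_logprob[first_group] + math.log10(cosine)
    group_logprob[second_group] = second_logprob  # add the new group to the model
    return second_logprob

# Hypothetical model entry and cosine value.
group_logprob = {("happy", "today"): -1.0}
new_logprob = add_similar_word_group(group_logprob, "happy", "glad", "today", cosine=0.5)
print(new_logprob)
```

Under this assumed rule a cosine of 1 copies the log probability unchanged, and smaller cosines give the new word group a lower probability than the one it was derived from.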

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a language model optimization method and device.

Background technique

[0002] The N-gram language model is the most commonly used language model in speech recognition at this stage, and can be obtained by performing statistical calculations on the word-segmented text. This model is based on the Markov assumption that the occurrence probability of the Nth word in a sentence is related to the previous N-1 words. It is widely used in natural language processing, and its main purpose is to judge the sentence probability of a certain sentence.

[0003] However, the N-gram language model itself has the defect of semantic isolation, that is, it cannot recognize the connection between different words, and the model parameters are determined only by statistical information. For example, we can understand that "happy" and "happy" are two words with similar semantics, so the p...
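The Markov assumption in [0002] can be made concrete with a minimal bigram (N = 2) sketch, in which each word depends only on the word before it. The log probability tables are hypothetical values chosen for illustration, and the sketch deliberately has no smoothing or backoff.

```python
def sentence_logprob(tokens, unigram_logprob, bigram_logprob):
    """log10 probability of a sentence under a bigram model.

    log P(w1..wn) = log P(w1) + sum of log P(w_i | w_{i-1}).
    A missing table entry raises KeyError, since nothing is smoothed here.
    """
    total = unigram_logprob[tokens[0]]
    for previous, current in zip(tokens, tokens[1:]):
        total += bigram_logprob[(previous, current)]
    return total

# Hypothetical log10 probabilities.
unigram_logprob = {"i": -0.5}
bigram_logprob = {("i", "am"): -0.25, ("am", "happy"): -0.125}

print(sentence_logprob(["i", "am", "happy"], unigram_logprob, bigram_logprob))
```

Because the tables come only from counts, a sentence that swaps in a rarer synonym has no matching entry and cannot be scored in this sketch, which mirrors the semantic isolation described in [0003].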

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/183, G10L15/06, G06F40/30, G06F16/33

CPC: G06F16/3344, G06F16/3346, G06F40/30, G10L15/063, G10L15/183

Inventor: 李健, 殷子墨, 张连毅, 武卫东

Owner: BEIJING SINOVOICE TECH CO LTD
