
Machine translation model training method and device, medium and electronic equipment

A machine translation model training technology in the computer field. It addresses the poor translation quality for languages with scarce corpus resources, which results from the high cost of manual annotation, and it improves the training effect by avoiding data sparsity.

Pending Publication Date: 2022-04-15
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

Because manual annotation is costly, large-scale parallel corpora cannot be obtained for languages with scarce corpus resources, so neural machine translation systems translate those languages poorly.
High-quality parallel corpora often exist for only a small number of languages. For some resource-poor languages, it is difficult to find or obtain usable parallel corpora from the Internet.
A lack of corpus resources causes data sparsity during model training, which degrades the model's translation quality for languages with scarce corpus resources.




Detailed Description of the Embodiments

[0077] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of example embodiments to those skilled in the art.

[0078] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of the embodiments of the application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of the specific details, or other methods, components, devices, steps, etc. may be employed. In other instances, well-known methods,...



Abstract

The invention belongs to the field of computer technology, is applied to natural language processing, and relates in particular to a machine translation model training method and device, a computer-readable medium, and electronic equipment. The machine translation model training method comprises: obtaining a training data set and a candidate data set; training the machine translation model on the corpora in the training data set, and updating the word lists corresponding to the corpora in the training data set during training; updating the word list corresponding to a low-resource corpus using the word list corresponding to a high-resource corpus, and calculating the update completion degree of the word list corresponding to the low-resource corpus; and adding each low-resource corpus whose word-list update completion degree is greater than a preset threshold to the training data set. This avoids data sparsity when training on the low-resource corpus, makes full use of the rich data of the high-resource corpus, and thereby improves the training effect for the language to be trained that corresponds to the low-resource corpus.
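
The selection step described in the abstract can be sketched as follows. This is only a minimal illustration, not the claimed implementation: the abstract does not define how the update completion degree is calculated, so the sketch assumes it is the fraction of entries in a low-resource word list that have already been updated through training on high-resource corpora. The function names, the threshold value, and the toy word lists are all hypothetical.

```python
from typing import Dict, List, Set


def update_completion(low_vocab: Set[str], updated: Set[str]) -> float:
    """Assumed definition: share of the low-resource word list already updated."""
    if not low_vocab:
        return 0.0
    return len(low_vocab & updated) / len(low_vocab)


def select_candidates(
    candidate_vocabs: Dict[str, Set[str]],
    updated: Set[str],
    threshold: float,
) -> List[str]:
    """Return the candidate corpora whose completion degree exceeds the threshold."""
    return sorted(
        name
        for name, vocab in candidate_vocabs.items()
        if update_completion(vocab, updated) > threshold
    )


# Hypothetical toy data: subword entries per candidate low-resource corpus.
candidates = {
    "lang_a": {"_na", "_to", "ma", "ri"},
    "lang_b": {"_xq", "zz", "_to", "kk"},
}
# Entries assumed to have been updated while training on high-resource corpora.
updated_entries = {"_na", "_to", "ma", "en", "de"}

# lang_a has 3 of 4 entries updated (0.75); lang_b has 1 of 4 (0.25).
selected = select_candidates(candidates, updated_entries, threshold=0.5)
print(selected)
```

Under these assumptions, only the corpus whose word list is mostly covered would be added to the training data set; a real system would repeat this check as training proceeds, so that further low-resource corpora can qualify later.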

Description

Technical Field

[0001] The application belongs to the field of computer technology, and in particular relates to a machine translation model training method, a machine translation model training device, a computer-readable medium and electronic equipment.

Background

[0002] The general neural machine translation model is based on the end-to-end Encoder-Decoder (encoding-decoding) framework, and generally requires large-scale parallel corpus for model training. However, due to the high cost of manual annotation, large-scale parallel corpus cannot be obtained for languages with scarce corpus resources, which makes the translation quality of neural machine translation systems for languages with scarce corpus resources poor.

[0003] High-quality parallel corpora often exist only in a small number of languages. For some languages that lack resources, it is difficult to find or obtain available parallel corpora from the Internet. Lack of corpus resources will ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/08, G06K9/62, G06F40/289, G06F40/216
CPC: G06N3/08, G06F40/289, G06F40/216, G06F18/214
Inventors: 孟凡东, 张明亮
Owner: TENCENT TECH (SHENZHEN) CO LTD