A method for suffix translation based on bag-of-words multi-objective learning

A bag-of-words, multi-objective learning technique for machine translation. It addresses sensitivity to low-frequency word categories and mix-ups during replacement, with the aim of ensuring accuracy and fluency, maintaining sentence integrity, and improving training efficiency.

Active Publication Date: 2021-05-28
NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD

AI Technical Summary

Problems solved by technology

The prior approach is sensitive to low-frequency word categories. When a sentence contains several similar low-frequency words, the post-translation processing and replacement step is prone to mixing them up.

Embodiment Construction

[0050] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0051] Glossary:

[0052] BPE: Byte Pair Encoding;

[0053] UNKi: Unknown-i, the i-th out-of-vocabulary word;

[0054] RNN: Recurrent Neural Network;

[0055] CNN: Convolutional Neural Network;

[0056] GRU: Gated Recurrent Unit;

[0057] LSTM: Long Short-Term Memory;

[0058] Encoder: represents text, speech, or images as vectors by means of a neural network (such as a recurrent neural network);

[0059] Attention: establishes a correspondence between the target side and the source side, that is, a weight between each target-side word and every source-side word;

[0060] Decoder: generates the output word by word, choosing the most probable word at each step, through vector o...

Abstract

The invention discloses a suffix translation method based on bag-of-words multi-objective learning. By fusing the suffix method with the bag-of-words method, the target translations of low-frequency words, obtained by pre-translation or dictionary lookup, are fed into the neural translation model for effective learning, so that low-frequency word translation and full-text translation are learned at the same time. In the method, the target translation is placed at the end of the sentence as a suffix, which preserves the fluency of the original bilingual sentence pair while still supplying a translation prompt. At the same time, placing the target translation in a bag of words allows sub-objective learning to proceed even when the order of the low-frequency words and their translations changes. The invention combines the advantages of the suffix and bag-of-words methods to strengthen learning for low-frequency words: the suffix method feeds the target translations of low-frequency words into the translation model as interactive prompts through a soft mechanism, and the bag-of-words method uses subtask learning to penalize output translations that omit the prompt information.
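
The abstract does not give the form of the bag-of-words sub-objective, so the following is only a minimal sketch of one way such a penalty could look. It assumes the decoder exposes a probability distribution over words at every step; the names `bag_of_words_penalty`, `combined_loss`, `step_probs`, and `weight`, as well as the summed-probability scoring, are illustrative assumptions and are not taken from the patent.

```python
import math
from typing import Dict, List


def bag_of_words_penalty(step_probs: List[Dict[str, float]],
                         prompt_words: List[str],
                         eps: float = 1e-9) -> float:
    """Order-insensitive penalty for prompt words missing from the output.

    Assumption: each prompt word is scored by summing its probability over
    all decoding steps (clipped to 1.0). The penalty is the mean negative
    log of that score, so the position of the word does not matter.
    """
    if not prompt_words:
        return 0.0
    total = 0.0
    for word in prompt_words:
        score = min(1.0, sum(p.get(word, 0.0) for p in step_probs))
        total += -math.log(max(score, eps))
    return total / len(prompt_words)


def combined_loss(translation_nll: float,
                  bow_penalty: float,
                  weight: float = 1.0) -> float:
    """Multi-objective loss: main translation loss plus weighted sub-objective."""
    return translation_nll + weight * bow_penalty


# Toy decoder outputs: one probability table per decoding step.
with_prompt = [{"the": 0.8, "aspirin": 0.05}, {"aspirin": 0.9, "the": 0.05}]
without_prompt = [{"the": 0.9}, {"patient": 0.9}]

low = bag_of_words_penalty(with_prompt, ["aspirin"])
high = bag_of_words_penalty(without_prompt, ["aspirin"])
print(low < high)  # an output that omits the prompt word is penalized more
```

Because the score sums over steps, reversing the decoding steps leaves the penalty unchanged, which is the order-insensitive property the bag-of-words view relies on. The weighting between the two objectives is a free parameter in this sketch.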

Description

Technical field

[0001] The invention relates to the field of machine translation, and in particular to the translation of low-frequency words such as domain terms, proper nouns, and named entities in a neural machine translation system. The target translation of each low-frequency word is placed at the end of the sentence as an interactive translation prompt, and the bag-of-words model is then used to set sub-learning objectives. These two stages of enhanced learning improve the translation quality of the low-frequency words and of the sentence as a whole.

Background technique

[0002] The core idea of current mainstream corpus-based machine translation methods is to learn bilingual conversion knowledge from a large-scale corpus. This makes it difficult to translate low-frequency words, such as domain terms, proper nouns, and named entities, that appear rarely or never in the corpus. In addition, since the target translation corresponding to the above-mentio...
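
The suffix prompt described in paragraph [0001] can be illustrated with a short sketch. It assumes a tokenized source sentence and a lookup table of low-frequency words with their target translations (from pre-translation or a dictionary). The separator token `<sfx>`, the function name `add_suffix_prompt`, and the sample lexicon are hypothetical; the source does not specify how the suffix is delimited.

```python
from typing import Dict, List

SEP = "<sfx>"  # hypothetical separator token; not specified by the source


def add_suffix_prompt(source_tokens: List[str],
                      lexicon: Dict[str, str],
                      sep: str = SEP) -> List[str]:
    """Append target translations of low-frequency words as a sentence suffix.

    The original token order is left untouched, so the source sentence stays
    fluent. Prompts follow the separator in the order in which the
    low-frequency words occur in the sentence.
    """
    prompts: List[str] = []
    for tok in source_tokens:
        if tok in lexicon:
            # A translation may span several tokens, so split on whitespace.
            prompts.extend(lexicon[tok].split())
    if not prompts:
        return list(source_tokens)
    return list(source_tokens) + [sep] + prompts


# Sample lexicon of low-frequency terms (illustrative data only).
lexicon = {"阿司匹林": "aspirin", "南京": "Nanjing"}
sentence = ["患者", "在", "南京", "服用", "阿司匹林"]
print(add_suffix_prompt(sentence, lexicon))
```

Keeping the prompts outside the sentence body, rather than substituting them in place, is what lets the model treat them as soft hints instead of hard replacements.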

Application Information

IPC(8): G06F40/58, G06F40/295, G06F40/284, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045, G06F18/251
Inventor 张学强, 董晓飞, 曹峰, 石霖, 孙明俊
Owner NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD