A Data Augmented Machine Translation Method Based on Similar Words and Synonym Replacement

A machine translation and synonym-replacement technology, applied in the field of natural language processing or conversion, that addresses the lack of large-scale, high-quality bilingual parallel corpora, the resulting poor model performance, and the difficulty of building a high-performance machine translation system, thereby improving translation quality.

Active Publication Date: 2022-08-09
GLOBAL TONE COMM TECH
Cites: 7. Cited by: 0.

AI Technical Summary

Problems solved by technology

[0004] (1) Large-scale, high-quality bilingual parallel corpora are difficult to obtain, and constructing them through human translation is costly.
[0005] (2) The lack of large-scale, high-quality bilingual parallel corpora leaves neural machine translation models for low-resource ("small") languages with insufficient training data and poor performance, making it difficult to build a high-performance machine translation system.
For low-resource languages, both skilled translators and translation systems are very scarce.

Examples

Embodiment Construction

[0035] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0036] As shown in Figure 1, the data-augmented machine translation method based on similar-word and synonym replacement provided by the embodiment of the present invention includes the following steps:

[0037] S101: Exploit the property that trained word vectors end up well clustered to obtain high-quality similar-word lists and synonym lists;

[0038] S102: Build the similar-word lists and synonym lists from word vectors obtained while training on a high-resource ("large") language, then replace similar words and synonyms in the scarce low-resource ("small") language data;

[0039] S103: Expand the parallel corpus of the small...
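Steps S101 and S102 can be illustrated with a minimal sketch. It is not the patent's implementation: the toy vectors, the cosine-similarity threshold, and the function names build_similar_word_table and augment_pair are all hypothetical, and the sketch assumes that replacement is applied to the high-resource side of each sentence pair while the other side is left unchanged.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Toy word vectors (invented for illustration). In practice these would come
# from a model trained on a large high-resource corpus.
TOY_VECTORS: Dict[str, Vector] = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.88, 0.12, 0.01],
    "house": [0.0, 0.2, 0.9],
    "home": [0.02, 0.25, 0.88],
    "the": [0.3, 0.9, 0.1],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similar_word_table(
    vectors: Dict[str, Vector], threshold: float = 0.9
) -> Dict[str, List[str]]:
    """Map each word to the other words whose vectors lie close to it.

    The threshold is an assumed hyperparameter, not a value from the patent.
    """
    table: Dict[str, List[str]] = {}
    for word, vec in vectors.items():
        scored = [
            (cosine(vec, other), cand)
            for cand, other in vectors.items()
            if cand != word
        ]
        table[word] = [c for s, c in sorted(scored, reverse=True) if s >= threshold]
    return table


def augment_pair(
    source: List[str], target: List[str], table: Dict[str, List[str]]
) -> List[Tuple[List[str], List[str]]]:
    """Create new sentence pairs by swapping one source token at a time.

    The target sentence is reused unchanged (an assumption of this sketch).
    """
    pairs = []
    for i, token in enumerate(source):
        for substitute in table.get(token, []):
            pairs.append((source[:i] + [substitute] + source[i + 1:], target))
    return pairs
```

Calling augment_pair(["the", "big", "house"], target_tokens, build_similar_word_table(TOY_VECTORS)) yields one new pair per replaceable token, which is how a small parallel corpus could grow without additional human translation.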

Abstract

The invention belongs to the technical field of natural language processing or conversion and discloses a data-augmented machine translation method based on similar-word and synonym replacement. The method exploits the clustering of word vectors to obtain similar-word lists and synonym lists; builds these lists from word vectors obtained while training on a high-resource language, then replaces similar words and synonyms in the scarce low-resource language data; and trains a low-resource neural machine translation model with an attention mechanism on the expanded training data. With sufficient data, the parameters of the neural translation model can be learned well, and the problem of unknown words in neural machine translation is alleviated, so that the translation quality of the model improves. When the quality of the whole network on the development set no longer improves significantly, the network parameters are considered well learned.
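The stopping criterion mentioned in the abstract (training ends once quality on the development set no longer improves significantly) can be sketched as a simple plateau check. This is an illustrative assumption about how such a criterion might be coded: the function name train_until_plateau and the parameters min_delta, patience, and max_epochs are hypothetical and do not come from the patent.

```python
from typing import Callable, List


def train_until_plateau(
    run_epoch: Callable[[], float],
    min_delta: float = 0.1,
    patience: int = 2,
    max_epochs: int = 100,
) -> List[float]:
    """Run training epochs until the development-set score stops improving.

    run_epoch trains for one epoch and returns a development-set score
    (higher is better, e.g. BLEU). Training stops after `patience`
    consecutive epochs that fail to beat the best score by `min_delta`.
    """
    history: List[float] = []
    best = float("-inf")
    stale = 0
    for _ in range(max_epochs):
        score = run_epoch()
        history.append(score)
        if score > best + min_delta:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history
```

What counts as "significant" is a tuning choice; here it is expressed through min_delta and patience.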

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing or conversion, and in particular relates to a data-augmented machine translation method based on similar-word and synonym replacement.

Background

[0002] At present, the existing technologies commonly used in the industry are as follows: With the improvement of computing power and the application of big data, deep learning has been applied more widely, and Neural Machine Translation (NMT) based on deep learning has attracted more and more attention. As a research hotspot of artificial intelligence, machine translation has very important scientific research value and practical value. In the field of NMT, one of the most commonly used translation models is the attention-based encoder-decoder model. The main idea is to encode the sentence to be translated (hereinafter collectively referred to as the "source sentence") into a vector representation through an en...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/58, G06F40/247, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/58
Inventor: 汪一鸣, 熊德意, 秦文杰, 程国艮
Owner: GLOBAL TONE COMM TECH