
Statistical machine translation method based on fuzzy tree-to-accurate tree rule

A statistical machine translation technology based on accurate trees, classified under instruments, calculations, and special data processing applications. It addresses the problems that tree-to-tree models cannot match, let alone significantly surpass, the phrase-based translation model, and that they neglect useful rules.

Active Publication Date: 2011-07-06
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Experiments with the tree-to-tree model show that it cannot even match the phrase-based translation model [Brooke Cowan, Ivona Kucerova and Michael Collins, 2006. A discriminative model for tree-to-tree translation. In Proc. of EMNLP, pages 232-241.]. Some scholars attributed part of this poor performance to the use of the 1-best syntactic parse tree on both the source and target sides, which causes severe data sparsity. Two improved models followed: one based on tree sequence alignment [... A Tree Sequence Alignment-based Tree-to-Tree Translation Model. In Proc. of ACL 2008, pages 559-567.] and one based on packed forests, in which both the source and target sides use a shared syntactic forest instead of the 1-best parse tree [Yang Liu, Yajuan Lv and Qun Liu, 2009. Improving tree-to-tree translation with packed forests. In Proc. of ACL-IJCNLP 2009, pages 558-566.]. Although these improved models clearly outperform the original, even with syntactic forests on both sides the tree-to-tree model still cannot significantly surpass the phrase-based translation model, let alone the string-to-tree model, which uses no syntactic information on the source side.
Other scholars have suggested that the main reason for the poor performance of the tree-to-tree model lies in rule extraction and decoding: the strict requirement for exact syntax trees on both the source and target sides causes a large number of very useful rules to be discarded during rule extraction, and often leaves the decoder unable to find any matching rule.
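The contrast between strict and relaxed matching described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the label-count representation of a rule's source side, and the add-k smoothing are all hypothetical and are not taken from the patent, whose actual matching criterion is not shown in this excerpt.

```python
from typing import Dict


def exact_match(rule_label: str, span_label: str) -> bool:
    """Strict constraint: the rule applies only if the labels are identical."""
    return rule_label == span_label


def fuzzy_match_score(rule_label_counts: Dict[str, int],
                      span_label: str,
                      smoothing: float = 0.1) -> float:
    """Soft constraint (assumed scheme): add-k smoothed relative frequency of
    span_label among the source-side labels observed with the rule."""
    labels = set(rule_label_counts) | {span_label}
    total = sum(rule_label_counts.values()) + smoothing * len(labels)
    return (rule_label_counts.get(span_label, 0) + smoothing) / total


# Hypothetical rule whose source side was seen 8 times as NP and 2 times as VP.
observed = {"NP": 8, "VP": 2}
print(exact_match("NP", "VP"))            # strict matching rejects the rule
print(fuzzy_match_score(observed, "VP"))  # soft matching keeps it with a lower score
print(fuzzy_match_score(observed, "PP"))  # unseen label: small but non-zero score
```

With strict matching, a span labelled VP or PP finds no applicable rule; with a soft score, the rule stays available and the decoder can weigh the mismatch instead of failing.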



Examples


Specific embodiment

[0033] 1. Automatic word segmentation, automatic word alignment and automatic syntactic analysis for bilingual sentence pairs. The specific implementation is as follows:

[0034] Automatically segment the source language and target language sentences in each bilingual sentence pair to obtain the word segmentation results for the source side and the target side. If neither the source language nor the target language is Chinese, word segmentation is not required. If either language is Chinese, Chinese word segmentation must be applied. There are many methods for segmenting Chinese. In the embodiment of the present invention, we use ICTCLAS, a commonly used open source Chinese word segmentation tool. ICTCLAS can be downloaded for free at the following URL:

[0035] http://ictclas.org/i...
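The segmentation decision in [0034] can be sketched as follows. This is a minimal illustration, not the embodiment's code: `char_segmenter` is a hypothetical stand-in that emits one token per character, used only so the example runs without ICTCLAS, and detecting Chinese by the basic CJK Unified Ideographs range is an assumption.

```python
import re
from typing import Callable, List, Tuple

# Basic CJK Unified Ideographs block; an assumed heuristic for "contains Chinese".
_CJK = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(sentence: str) -> bool:
    return _CJK.search(sentence) is not None


def char_segmenter(sentence: str) -> List[str]:
    """Hypothetical placeholder for a real segmenter such as ICTCLAS:
    returns one token per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]


def tokenize(sentence: str,
             segmenter: Callable[[str], List[str]] = char_segmenter) -> List[str]:
    """Segment only when the sentence contains Chinese; otherwise split on whitespace."""
    if contains_chinese(sentence):
        return segmenter(sentence)
    return sentence.split()


def preprocess_pair(src: str, tgt: str,
                    segmenter: Callable[[str], List[str]] = char_segmenter
                    ) -> Tuple[List[str], List[str]]:
    """Apply the same decision to both sides of a bilingual sentence pair."""
    return tokenize(src, segmenter), tokenize(tgt, segmenter)


print(preprocess_pair("我喜欢茶", "I like tea"))
```

In practice the `segmenter` argument would wrap whatever real segmentation tool is used; the surrounding logic stays the same.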


Abstract

The invention relates to a statistical machine translation method based on fuzzy tree-to-accurate tree rules, and in particular to a method that makes full and appropriate use of source-side syntactic structure knowledge to improve the quality of statistical machine translation built on a string-to-tree translation model. The method comprises the steps of: performing word segmentation, automatic word alignment and syntactic analysis on the bilingual sentence pairs; automatically extracting fuzzy tree-to-accurate tree translation rules from the word-aligned parse trees of the bilingual sentence pairs; estimating the probabilities of the extracted translation rules, and training a target-side language model; designing the criterion for matching the source-side syntactic structure against the fuzzy tree-to-accurate tree translation rules, and estimating the matching probability; and designing the optimization objective of the translation model, and using the fuzzy tree-to-accurate tree translation rules and the target-side language model to search for the target translation of a test sentence. The effectiveness of the method is verified on the Chinese-to-English translation task of the international machine translation evaluation.
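The probability estimation step in the abstract can be illustrated with relative-frequency estimation, a common choice for translation rules. The excerpt does not say which estimator the invention uses, so this is an assumption, and the rule strings below are made-up placeholders rather than rules from the patent.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A rule is simplified here to (source side, target tree fragment), both plain strings.
Rule = Tuple[str, str]


def estimate_rule_probs(rules: Iterable[Rule]) -> Dict[Rule, float]:
    """Relative-frequency estimate P(target | source) = count(source, target) / count(source)."""
    rules = list(rules)
    pair_counts = Counter(rules)
    source_counts = Counter(src for src, _ in rules)
    return {(src, tgt): n / source_counts[src]
            for (src, tgt), n in pair_counts.items()}


# Hypothetical extracted rule instances, one entry per occurrence in the corpus.
extracted = [("a", "A"), ("a", "A"), ("a", "B"), ("b", "C")]
for rule, prob in estimate_rule_probs(extracted).items():
    print(rule, round(prob, 3))
```

A full system would typically estimate several such features (for example in both translation directions) and combine them with the language model score during search.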

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and is a novel statistical machine translation method based on fuzzy tree-to-accurate tree rules.

Background

[0002] Statistical machine translation is a technology that uses statistical methods to automatically learn translation rules from parallel bilingual corpora and applies these rules to automatically translate test sentences. Statistical machine translation has progressed through word-based and phrase-based translation models, and translation models based on syntactic structure have now become a research hotspot. The translation system achieved the best score in the 2009 International Machine Translation Evaluation and significantly surpassed the very popular phrase-based translation system. The string-to-tree translation model is one of the best statistical machine translation models currently available. Figure 1: An example based on a string-to...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/27
Inventors: Zong Chengqing (宗成庆), Zhang Jiajun (张家俊)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI