
Automatic extraction and filtration method for Chinese-English phrase translation pairs

A technology for the automatic extraction and filtering of phrase translation pairs, applied in special-purpose data processing applications, instruments, electronic digital data processing, and related fields. It addresses the problems that syntactic-tree constraints are too strict to meet recall requirements and that excessive phrase extraction wastes storage space.

Active Publication Date: 2009-07-15
INST OF AUTOMATION CHINESE ACAD OF SCI
Cites: 0; Cited by: 14

AI Technical Summary

Problems solved by technology

It is well known that, first, the accuracy of syntactic-tree generation is itself a problem and, second, syntactic-tree constraints are too strict to meet recall requirements. Consequently, most syntax-based systems in practice retain all phrases and use syntactic knowledge only to provide reordering information.

Embodiment Construction

[0040] Various details involved in the technical solution of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be pointed out that the described embodiments are only intended to facilitate the understanding of the present invention, rather than limiting it in any way.

[0041] All algorithms of the present invention are implemented in C++ on a machine with the following configuration: Pentium 4 processor, 2.0 GHz CPU clock, and 8 GB of memory. The GIZA++ toolkit used in the present invention must be run under the Linux operating system.

[0042] The present invention provides a phrase extraction and filtering algorithm that improves on existing phrase extraction methods and yields high-precision phrase pairs.

[0043] The basic idea is that, for the current sentence pair, the present invention combines the word alignment generated for it by GIZA++, a...
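As background for [0043]: the conventional way to obtain phrase pairs from a GIZA++ word alignment is to enumerate every source span, take the smallest target span its links cover, keep the pair only if no word in that target span is aligned outside the source span, and then also emit variants of the target span extended over unaligned neighbouring words. The sketch below shows that standard consistency-based extraction, not the patent's own procedure. The function name `extract_phrases`, the `max_len` limit, and the 0-based (source, target) link format are assumptions for illustration.

```python
from typing import List, Set, Tuple

# (source_start, source_end, target_start, target_end), all inclusive 0-based indices
PhrasePair = Tuple[int, int, int, int]


def extract_phrases(src_len: int, tgt_len: int,
                    links: List[Tuple[int, int]],
                    max_len: int = 7) -> Set[PhrasePair]:
    """Hypothetical helper: enumerate phrase pairs consistent with a word alignment.

    `links` holds (source_index, target_index) alignment points, such as a
    symmetrized GIZA++ alignment converted to 0-based indices (assumption).
    """
    tgt_aligned = [False] * tgt_len
    for _, t in links:
        tgt_aligned[t] = True

    pairs: Set[PhrasePair] = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(src_len, s1 + max_len)):
            # Minimal target span covered by links from source span [s1, s2].
            covered = [t for s, t in links if s1 <= s <= s2]
            if not covered:
                continue  # the source span has no aligned word
            t1, t2 = min(covered), max(covered)
            # Consistency: no target word in [t1, t2] may align outside [s1, s2].
            if any(t1 <= t <= t2 and not s1 <= s <= s2 for s, t in links):
                continue
            # Extend the target span over unaligned words at both edges.
            a = t1
            while a >= 0 and (a == t1 or not tgt_aligned[a]):
                b = t2
                while b < tgt_len and (b == t2 or not tgt_aligned[b]):
                    if b - a < max_len:
                        pairs.add((s1, s2, a, b))
                    b += 1
                a -= 1
    return pairs


src = ["我", "喜欢", "苹果"]
tgt = ["I", "like", "the", "apples"]
links = [(0, 0), (1, 1), (2, 3)]  # "the" has no alignment link
for s1, s2, t1, t2 in sorted(extract_phrases(len(src), len(tgt), links)):
    print(" ".join(src[s1:s2 + 1]), "|||", " ".join(tgt[t1:t2 + 1]))
```

This prints nine pairs. The unaligned "the" is attached both to "喜欢" ("like the") and to "苹果" ("the apples"). In longer sentences with more unaligned words these extensions multiply, which is the storage and noise problem the abstract attributes to unbounded expansion over unaligned words.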


Abstract

The present invention provides an automatic extraction and filtering method for Chinese-English phrase translation pairs. The method comprises the following steps: extracting, from the original Chinese-English bilingual sentence pair, the feature information used to divide it into chunks (language blocks) and to filter candidate phrases; determining chunk-division anchor points according to the different kinds of feature information, and dividing the original Chinese-English sentence pair into a plurality of monolingual chunks; extracting candidate phrases within the chunks using the word-alignment information of the original bilingual sentence pair; and filtering the generated candidate phrases according to feature information such as their generation frequency, to produce the required phrase pairs. Because phrases are extracted by traversing the chunks, the invention is particularly effective against the unbounded expansion of phrases over unaligned (null) words. This effectively reduces the storage demand caused by an excessive number of extracted phrases and filters out many noisy phrases. The invention can also generate multiple groups of phrases directly from the fixed word alignment of the current sentence pair, thereby increasing the recall of phrase pairs while maintaining the required precision.
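The chunk-constrained extraction and frequency filtering described in the abstract can be illustrated with a small sketch. The abstract does not say how anchor points are derived from the feature information, which features besides generation frequency are used, or how unaligned words are treated, so the following are assumptions: anchors are given as source positions that start a new chunk, a candidate's source span must lie inside a single chunk, target spans are not extended over unaligned words, and filtering is one global count threshold `min_count`. The names `chunk_spans`, `extract_in_chunks`, and `filter_by_frequency` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Links = List[Tuple[int, int]]  # (source_index, target_index), 0-based
Pair = Tuple[str, str]          # (source phrase, target phrase)


def chunk_spans(src_len: int, anchors: Sequence[int]) -> List[Tuple[int, int]]:
    """Split source positions 0..src_len-1 into inclusive chunk spans.

    Assumption: each anchor is a source position that starts a new chunk.
    """
    starts = sorted({0, *(a for a in anchors if 0 < a < src_len)})
    ends = [s - 1 for s in starts[1:]] + [src_len - 1]
    return list(zip(starts, ends))


def extract_in_chunks(src: List[str], tgt: List[str], links: Links,
                      anchors: Sequence[int]) -> List[Pair]:
    """Extract alignment-consistent pairs whose source span lies in one chunk."""
    pairs: List[Pair] = []
    for c1, c2 in chunk_spans(len(src), anchors):
        for s1 in range(c1, c2 + 1):
            for s2 in range(s1, c2 + 1):
                covered = [t for s, t in links if s1 <= s <= s2]
                if not covered:
                    continue  # no aligned word in this source span
                t1, t2 = min(covered), max(covered)
                # Reject spans whose target side is also aligned outside [s1, s2].
                if any(t1 <= t <= t2 and not s1 <= s <= s2 for s, t in links):
                    continue
                # Assumption: the target span is NOT extended over unaligned
                # words, which keeps the number of candidates small.
                pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


def filter_by_frequency(corpus: Iterable[List[Pair]], min_count: int = 2) -> Dict[Pair, int]:
    """Keep candidate pairs generated at least `min_count` times across the corpus."""
    counts = Counter(p for sentence_pairs in corpus for p in sentence_pairs)
    return {p: c for p, c in counts.items() if c >= min_count}


# Usage on two toy sentence pairs with monotone one-to-one alignments.
diag = [(0, 0), (1, 1), (2, 2)]
s1 = extract_in_chunks(["我", "喜欢", "苹果"], ["I", "like", "apples"], diag, anchors=[2])
s2 = extract_in_chunks(["他", "喜欢", "苹果"], ["he", "likes", "apples"], diag, anchors=[])
print(filter_by_frequency([s1, s2]))  # {('苹果', 'apples'): 2}
```

Because the anchor at position 2 closes the first chunk, "喜欢 苹果" is never proposed for the first sentence. Only "苹果 ||| apples" survives the threshold, since it is the only candidate generated in both sentences.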

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to methods for statistical machine translation, cross-language information retrieval, and the automatic extraction and filtering of bilingual phrases.

Background art

[0002] With the advent of the globalized information age, the problem of overcoming language barriers is becoming increasingly pressing. Using computers to translate automatically between languages has become a common challenge for all of humanity. At present, statistical methods dominate machine translation research, and among them phrase-based translation models are the most mature. The basic idea of phrase-based statistical machine translation is to take the phrase as the basic unit of translation. Because phrases encode both the choice of translated words and local word-order adjustments, they can better solve the problem of local contex...
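To make "the phrase as the basic unit of translation" concrete, here is a toy illustration with a hypothetical phrase table and a greedy, monotone, left-to-right longest-match lookup. Real phrase-based decoders score competing segmentations with translation and language-model probabilities, allow reordering, and search with a beam; none of that is modeled here. The function name `greedy_phrase_translate` and the table contents are invented for the example.

```python
from typing import Dict, List


def greedy_phrase_translate(words: List[str],
                            table: Dict[str, str],
                            max_len: int = 4) -> str:
    """Toy monotone translation: cover the source left to right with the
    longest phrase found in `table`; unknown words are copied through."""
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            out.append(words[i])  # out-of-vocabulary word: copy it unchanged
            i += 1
    return " ".join(out)


# Hypothetical phrase table; "吃 苹果" is stored as a whole phrase.
table = {"我": "I", "喜欢": "like", "吃 苹果": "eating apples", "苹果": "apples"}
print(greedy_phrase_translate(["我", "喜欢", "吃", "苹果"], table))  # I like eating apples
```

The multi-word entry "吃 苹果" carries the word choice "eating" in context, which a word-by-word lookup of "吃" alone could not supply.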

Application Information

IPC(8): G06F17/27; G06F17/28
Inventors: 宗成庆 (Zong Chengqing), 周玉 (Zhou Yu)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI