Sentence alignment method for bilingual parallel corpuses

A sentence alignment method for bilingual parallel corpora, applied in the field of language translation processing. It addresses the poor alignment quality that results from relying on a single type of alignment parameter, with the aim of improving alignment accuracy.

Status: Inactive
Publication Date: 2017-11-24
北京同文世纪科技有限公司

AI Technical Summary

Problems solved by technology

Existing sentence alignment methods align poorly because each relies on a single kind of alignment parameter.



Embodiment Construction

[0046] Sentence alignment of bilingual parallel corpora means establishing a sentence-level alignment between bilingual texts, that is, determining which sentence(s) in the source language text and which sentence(s) in the target language text are translations of each other.

[0047] For example, let S be the original text and T the translated text, with $S = s_1 s_2 \ldots s_m$ and $T = t_1 t_2 \ldots t_n$. The task is to find $A = a_1 a_2 \ldots a_r$, where $a_i = (s_j \ldots s_k,\; t_p \ldots t_q)$. That is, find a sequence of paired source-language and target-language fragments such that the source fragment $s_j \ldots s_k$ and the translation fragment $t_p \ldots t_q$ are translations of each other and cannot be split into any finer sentence-level alignment. Such a pair is called an aligned sentence pair, or sentence bead. In most cases, a source sentence corresponds to a target sentence, and a sentence pair consists of a source sentence and a target sente...
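The alignment sequence A described above can be made concrete with a small data structure. The following is a minimal sketch, not the patent's own representation: the `Bead` type, the half-open index ranges, and the `is_valid_alignment` helper are all hypothetical names introduced here for illustration. It assumes the beads appear in text order, cover every sentence exactly once, and may leave one side empty (an insertion or deletion), which the source does not state explicitly.

```python
from typing import List, Tuple

# Hypothetical representation: a bead pairs a half-open range of source
# sentence indices [s_start, s_end) with a half-open range of target
# sentence indices [t_start, t_end).
Bead = Tuple[Tuple[int, int], Tuple[int, int]]


def is_valid_alignment(beads: List[Bead], m: int, n: int) -> bool:
    """Return True if the beads cover source sentences 0..m-1 and target
    sentences 0..n-1 in order, with no gap, overlap, or empty bead."""
    src_pos, tgt_pos = 0, 0
    for (s_start, s_end), (t_start, t_end) in beads:
        # Each bead must start exactly where the previous one ended.
        if s_start != src_pos or t_start != tgt_pos:
            return False
        # Ranges must not run backwards.
        if s_end < s_start or t_end < t_start:
            return False
        # A bead must contain at least one sentence on some side.
        if s_end == s_start and t_end == t_start:
            return False
        src_pos, tgt_pos = s_end, t_end
    return src_pos == m and tgt_pos == n


# Four source sentences aligned to four target sentences:
# a 1-1 bead, a 2-1 bead, and a 1-2 bead.
beads: List[Bead] = [((0, 1), (0, 1)), ((1, 3), (1, 2)), ((3, 4), (2, 4))]
print(is_valid_alignment(beads, 4, 4))
```

Here `m` and `n` are the sentence counts of the source and target texts, and r, the number of beads, is simply `len(beads)`.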



Abstract

The invention provides a sentence alignment method for bilingual parallel corpora. The method comprises the following steps: A, obtaining a bilingual probability distribution dictionary comprising word translation pairs between a source language and a target language together with their translation probabilities; B, constructing a dynamic programming matrix according to the numbers of source-language and target-language sentences in the text to be aligned, and, from the dynamic programming matrix and the bilingual probability distribution dictionary, determining evaluation scores under different alignment modes based on sentence length information, word information and the word translation probabilities; C, determining, according to the evaluation scores, an alignment path under each alignment mode whose evaluation score is greater than a specified threshold; and D, determining the alignment path sequence of the source-language and target-language sentences of the text to be aligned according to the alignment paths. The method helps improve the precision of automatic sentence alignment for bilingual parallel corpora.
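Steps A to D can be sketched as a small dynamic programming aligner. This is an illustrative sketch under stated assumptions, not the patented scoring: the source does not give the score formula, the set of alignment modes, or the threshold value, so `MODES`, `length_score`, `lexical_score`, `bead_score`, the 0.5 weight, and the 0.3 threshold are all hypothetical choices made here. The toy dictionary stands in for the bilingual probability distribution dictionary of step A. Insertion and deletion modes are exempted from the threshold so that a complete path always exists, which is also an assumption.

```python
from typing import Dict, List, Tuple

Sentence = List[str]                     # a tokenized sentence
Lexicon = Dict[Tuple[str, str], float]   # (source word, target word) -> probability
Bead = Tuple[Tuple[int, int], Tuple[int, int]]

# Assumed alignment modes (source sentences, target sentences per bead).
MODES = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]
NEG_INF = float("-inf")


def length_score(src: List[Sentence], tgt: List[Sentence]) -> float:
    """Ratio of the shorter to the longer token count (1.0 when equal)."""
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    return min(ls, lt) / max(ls, lt) if max(ls, lt) else 0.0


def lexical_score(src: List[Sentence], tgt: List[Sentence], lexicon: Lexicon) -> float:
    """Average over source words of the best translation probability
    to any word on the target side."""
    src_words = [w for s in src for w in s]
    tgt_words = [w for t in tgt for w in t]
    if not src_words or not tgt_words:
        return 0.0
    best = (max(lexicon.get((sw, tw), 0.0) for tw in tgt_words) for sw in src_words)
    return sum(best) / len(src_words)


def bead_score(src, tgt, lexicon: Lexicon, weight: float = 0.5) -> float:
    """Hypothetical evaluation score mixing length and lexical evidence."""
    return weight * length_score(src, tgt) + (1 - weight) * lexical_score(src, tgt, lexicon)


def align(src: List[Sentence], tgt: List[Sentence], lexicon: Lexicon,
          threshold: float = 0.3) -> List[Bead]:
    m, n = len(src), len(tgt)
    # Step B: (m+1) x (n+1) matrix of best cumulative scores.
    best = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            for di, dj in MODES:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or best[pi][pj] == NEG_INF:
                    continue
                score = bead_score(src[pi:i], tgt[pj:j], lexicon)
                # Step C: keep a pairing mode only if its score exceeds the threshold.
                if di and dj and score <= threshold:
                    continue
                if best[pi][pj] + score > best[i][j]:
                    best[i][j] = best[pi][pj] + score
                    back[i][j] = (di, dj)
    # Step D: trace the best path back from the last cell.
    beads: List[Bead] = []
    i, j = m, n
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append(((i - di, i), (j - dj, j)))
        i, j = i - di, j - dj
    return beads[::-1]


# Step A stand-in: a toy English-French dictionary with made-up probabilities.
lexicon: Lexicon = {
    ("the", "le"): 0.9, ("cat", "chat"): 0.9, ("sleeps", "dort"): 0.8,
    ("it", "il"): 0.7, ("raining", "pleut"): 0.8,
    ("outside", "dehors"): 0.8, ("today", "aujourd'hui"): 0.9,
}
src = [["the", "cat", "sleeps"], ["it", "is", "raining", "outside", "today"]]
tgt = [["le", "chat", "dort"], ["il", "pleut"], ["dehors", "aujourd'hui"]]
print(align(src, tgt, lexicon))  # [((0, 1), (0, 1)), ((1, 2), (1, 3))]
```

In this toy run the second source sentence is paired with two target sentences (a 1-2 bead), because the combined length and dictionary evidence scores higher than any 1-1 pairing of the remaining sentences.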

Description

Technical field

[0001] The invention relates to the technical field of language translation processing, and in particular to a sentence alignment method for bilingual parallel corpora.

Background

[0002] Sentence alignment determines which sentence(s) in the source language text and which sentence(s) in the target language text are translations of each other, that is, it finds the mapping between sentences in bilingual texts. The difficulty is that sentences in bilingual texts can map many-to-many, which makes mismatches likely.

[0003] Currently, sentence alignment methods in the prior art include methods based on sentence length, methods based on word alignment or character string alignment, methods based on offset position alignment, and the like. These methods rely on sentence length, sentence position, or sentence length ratio information between the two languages. However, since the sentence alignment method...
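To illustrate the length-based family of prior-art methods mentioned in the background, the sketch below computes a length-match cost in the style of the Gale and Church length model. It is shown only as an example of a single-parameter approach, not as part of the invention. The function name `length_match_cost` is hypothetical, and the defaults `c = 1.0` (expected target characters per source character) and `s2 = 6.8` (variance per character) are illustrative values that would need to be estimated for a real language pair.

```python
import math


def length_match_cost(len_src: int, len_tgt: int,
                      c: float = 1.0, s2: float = 6.8) -> float:
    """Negative log of the two-sided normal tail probability for the
    length difference; lower cost means the lengths are more compatible."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    # Normalize the length difference by its expected standard deviation.
    mean = (len_src + len_tgt / c) / 2.0
    delta = (len_tgt - len_src * c) / math.sqrt(mean * s2)
    # Two-sided tail of a standard normal: 2 * (1 - Phi(|delta|)).
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    # Guard against log(0) for extreme mismatches.
    return -math.log(max(tail, 1e-300))


# Similar lengths cost less than very different lengths.
print(length_match_cost(100, 102) < length_match_cost(100, 150))
```

Because this cost depends on length alone, two unrelated sentences of similar length receive a low cost, which is the kind of limitation the background attributes to single-parameter methods.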


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28
CPC: G06F40/51, G06F40/58
Inventors: 刘强, 彭蓉
Owner: 北京同文世纪科技有限公司