Bilingual sentence automatic alignment method and device

An automatic sentence alignment technology, applied in the field of information technology, that aligns bilingual sentences with high precision even when sentence pairs for training a translation model are lacking.

Active Publication Date: 2021-04-16
TSINGHUA UNIV
12 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention solves the problem of aligning sentences in articles from a specific field, or between two specific languages, for which sentence pairs for training a translation model are lacking.


Examples

Embodiment Construction

[0059] Embodiments of the present invention will be described below with reference to the accompanying drawings. Those skilled in the art would recognize that the described embodiments can be modified in various ways or combinations thereof without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and not intended to limit the scope of the claims. Also, in this specification, the drawings are not drawn to scale, and like reference numerals denote like parts.

[0060] Figure 1 is a flowchart of the bilingual sentence automatic alignment method provided by the present invention. The method comprises the following steps:

[0061] S1. Obtain a set of article pairs, each consisting of two bilingual aligned articles; divide each article into sentences; and record the length of each sentence and its relative position in the article.

[0062] Specifically, the article ...
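Step S1 can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent text: the punctuation-based splitter, measuring length in characters, defining relative length as a sentence's share of the article's total characters, and defining relative position as the centre of the sentence's slot, (i + 0.5) / n. The names `SentenceStats` and `sentence_stats` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceStats:
    text: str
    rel_length: float    # assumed: share of the article's total characters
    rel_position: float  # assumed: (index + 0.5) / number of sentences


# Assumed splitter: break after ., !, ? or their CJK counterparts.
# It will also split inside abbreviations and decimals; a real system
# would need a language-aware sentence segmenter.
_SPLIT = re.compile(r"(?<=[.!?。！？])\s*")


def sentence_stats(article: str) -> List[SentenceStats]:
    """Split an article into sentences and compute per-sentence statistics."""
    sentences = [s.strip() for s in _SPLIT.split(article) if s.strip()]
    total = sum(len(s) for s in sentences)
    n = len(sentences)
    return [
        SentenceStats(s, len(s) / total, (i + 0.5) / n)
        for i, s in enumerate(sentences)
    ]


if __name__ == "__main__":
    for st in sentence_stats("Hello world. Hi!"):
        print(st)
```

Under these assumptions the relative lengths of one article sum to 1, which is convenient if they are later used as the amounts of information to be transferred.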

Abstract

The invention discloses a bilingual sentence automatic alignment method and device. The method comprises: obtaining a set of article pairs, each pair comprising a source language article S and a target language article T; dividing each article into sentences and computing the relative length of each sentence and its relative position in the article; determining the word similarity between each sentence s_i in the source language article S and each sentence t_j in the target language article T using a word vector model; calculating the distance between sentences in S and sentences in T from the inter-sentence word similarity, the difference in relative sentence length, and the difference in relative position within the article; taking the relative length of each sentence as its information amount; building an information transfer optimization model that minimizes the sum of the products of distance and information amount; and solving the model to establish the alignment relationship. The invention thus converts sentence alignment into a search for an optimal transportation strategy, in which all information of the source language article is transferred into the target language article with minimum work.
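The information transfer model described in the abstract has the shape of a classical transportation problem, and one way to write it down is sketched here. The notation is assumed for illustration and does not come from the patent: a_i and b_j are the relative lengths (information amounts) of s_i and t_j, p_i and q_j their relative positions, w_ij the inter-sentence word similarity, γ_ij the amount of information moved from s_i to t_j, and D an unspecified combining function, since this excerpt does not give the distance formula. The marginal constraints are the standard transportation constraints, assumed here to express that all source information is transferred.

```latex
% Assumed notation; D is left unspecified because the excerpt gives no formula.
\begin{aligned}
d_{ij} &= D\bigl(w_{ij},\; |a_i - b_j|,\; |p_i - q_j|\bigr) \\[4pt]
\min_{\gamma}\quad & \sum_{i}\sum_{j} \gamma_{ij}\, d_{ij} \\
\text{s.t.}\quad & \sum_{j} \gamma_{ij} = a_i \quad \text{for every source sentence } s_i, \\
& \sum_{i} \gamma_{ij} = b_j \quad \text{for every target sentence } t_j, \\
& \gamma_{ij} \ge 0 .
\end{aligned}
```

Under this reading, a natural (assumed) way to read off the alignment is to link s_i and t_j whenever the optimal γ_ij is positive, which allows one-to-many and many-to-one alignments when a long sentence corresponds to several shorter ones.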

Description

Technical field

[0001] The invention relates to the field of information technology, in particular to a bilingual sentence automatic alignment method and device.

Background technique

[0002] Existing bilingual word alignment techniques are mainly divided into three categories: rule-based word alignment techniques, supervised word alignment techniques and unsupervised word alignment techniques. Rule-based word alignment technology relies on artificial rules and is highly dependent on the characteristics of the language itself. Supervised word alignment technology relies on existing dictionaries or aligned sentences in the corresponding field. These dictionaries and a large number of sentence pairs do not exist in specific fields or between some languages that are not particularly mainstream. The unsupervised word alignment technique obtains the word vector spaces of two languages, and obtains aligned word vectors by aligning the two spaces.

[0003] The existing sentence...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/289, G06F40/58
Inventor: 俞声, 罗声旋
Owner: TSINGHUA UNIV