
Low-resource language entity extraction method based on bilingual word vectors

An entity extraction and word vector technology, applied in the field of information extraction, that addresses the lack of annotated corpora for low-resource languages and enables unsupervised learning of entity extraction.

Active Publication Date: 2019-08-09
TONGJI UNIV
7 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a low-resource language entity extraction method based on bilingual word vectors. It takes the semantic features of the language into account in the low-resource language entity extraction task and addresses the problem of unsupervised learning for low-resource language entity extraction.



Examples


Embodiment Construction

[0027] The present invention will be described below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific examples described here are only used to explain the present invention, not to limit the present invention.

[0028] The present invention proposes a low-resource language entity extraction method based on bilingual word vectors. It combines bilingual word vectors, a reinforcement learning model, and a model transfer method to achieve unsupervised entity extraction in the target language, and it enriches the target-language text information with text information from the resource-rich source language.
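The role of bilingual word vectors in model transfer can be illustrated with a minimal sketch. It is not the patented model: the vectors, the example words, the `ENTITY`/`O` labels, and the nearest-centroid classifier are all assumptions made for illustration, and the reinforcement learning component is omitted. The sketch only shows that, once source and target words live in one vector space, a decision rule fitted on labeled source words can be applied to unlabeled target words.

```python
import math

# Hypothetical toy bilingual word vectors: source-language and target-language
# words share one vector space. The values are made up for illustration.
BILINGUAL_VECTORS = {
    "london": [0.90, 0.10],   # source language
    "paris": [0.80, 0.20],    # source language
    "walk": [0.10, 0.90],     # source language
    "eat": [0.20, 0.80],      # source language
    "londres": [0.88, 0.12],  # stand-in target-language word
    "comer": [0.15, 0.85],    # stand-in target-language word
}

# Labels exist only for source-language words.
SOURCE_LABELS = {"london": "ENTITY", "paris": "ENTITY", "walk": "O", "eat": "O"}


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def train_centroids(labels, vectors):
    """Fit one centroid per label using source-language words only."""
    grouped = {}
    for word, label in labels.items():
        grouped.setdefault(label, []).append(vectors[word])
    return {label: mean_vector(vecs) for label, vecs in grouped.items()}


def predict(word, centroids, vectors):
    """Label a word by its most similar centroid in the shared space."""
    vector = vectors[word]
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


centroids = train_centroids(SOURCE_LABELS, BILINGUAL_VECTORS)
for target_word in ("londres", "comer"):
    print(target_word, predict(target_word, centroids, BILINGUAL_VECTORS))
```

No target-language labels are used at any point; the transfer depends entirely on how well the two languages are aligned in the shared space.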

[0029] Figure 1 shows the flow of the low-resource language entity extraction method based on bilingual word vectors. Each step of the method is described in detail below:

[0030] Step 1: Build a bilingual comparable corpus by obtaining public texts in the source and target languages.

[0031] Due to the lack of data i...
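Step 1 can be sketched as follows, with the caveat that the description is truncated here and does not say how the public texts are collected or matched. The sketch assumes each document already carries a topic key (for example, a shared event or article identifier); the function name `build_comparable_corpus` and the sample data are hypothetical. A comparable corpus pairs texts on the same topic in both languages without requiring them to be translations of each other.

```python
from collections import defaultdict


def build_comparable_corpus(source_docs, target_docs):
    """Group documents by topic and keep topics covered in both languages.

    Each input is an iterable of (topic_key, text) pairs. The topic key is
    an assumption of this sketch, not something specified by the source.
    """
    by_topic = defaultdict(lambda: {"source": [], "target": []})
    for topic, text in source_docs:
        by_topic[topic]["source"].append(text)
    for topic, text in target_docs:
        by_topic[topic]["target"].append(text)
    # A topic is usable only if both languages contribute at least one text.
    return {
        topic: texts
        for topic, texts in by_topic.items()
        if texts["source"] and texts["target"]
    }


# Hypothetical sample data: placeholder strings stand in for real articles.
source_docs = [
    ("earthquake-2019", "source-language article about the earthquake"),
    ("election", "source-language article about the election"),
]
target_docs = [
    ("earthquake-2019", "target-language article about the earthquake"),
    ("festival", "target-language article about the festival"),
]

corpus = build_comparable_corpus(source_docs, target_docs)
print(sorted(corpus))
```

Topics covered in only one language are dropped, since they give no cross-language signal for learning bilingual word vectors.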


Abstract

The invention provides a low-resource language entity extraction method based on bilingual word vectors. The semantic features of languages are taken into account in the low-resource language entity extraction task, and the problem of unsupervised learning for low-resource language entity extraction is addressed. The method comprises three stages: (1) constructing bilingual word vectors based on a comparable corpus; (2) constructing a source language entity extraction model; and (3) constructing a target language entity extraction model. Compared with the prior art, the invention introduces reinforcement learning and bilingual word vectors into the low-resource language entity extraction task for the first time, addressing the lack of annotated entity extraction corpora for low-resource languages. The bilingual word vectors effectively represent the word-meaning characteristics of cross-language text, which addresses two problems in model transfer: insufficient semantic information in the low-resource language, and the inability to transfer semantic information directly between languages. In addition, the reinforcement learning approach enables unsupervised learning of the low-resource language entity extraction task.

Description

Technical field

[0001] The invention relates to information extraction within the fields of artificial intelligence and natural language processing, and in particular to a multilingual entity extraction method.

Background technique

[0002] Entity extraction aims to mine knowledge that users care about from unstructured text in cyberspace, such as names of people, places, and institutions, or entities with domain characteristics. For languages such as English, which are widely used online and studied by many researchers, a large number of labeled entity extraction training corpora already exist, and models based on machine learning and deep learning can be used for supervised entity extraction. Cyberspace also contains a large number of low-resource languages, such as Chinese and Japanese, which have fewer annotated corpora, and manually annotating training data consumes a great deal of manpower and time. Therefore, the traditional supervised entity extr...


Application Information

IPC(8): G06F16/332, G06F17/27, G06F17/28, G06N3/04, G06N3/08
CPC: G06F16/332, G06N3/08, G06F40/56, G06F40/58, G06F40/30, G06N3/044, G06N3/045, Y02D10/00
Inventor 谭成翔校娅黄超赵雪延徐潜朱文烨
Owner TONGJI UNIV