Method and device for constructing Tibetan language question and answer corpus
The invention relates to corpus and Tibetan language technology and is applied in the field of big data processing. It addresses problems such as the lack of corpus data for Tibetan question answering, the failure to build a large-scale Tibetan question-and-answer corpus, and insufficiency
Examples
Embodiment 1
[0023] Figure 1 shows a flow chart of a method for constructing a Tibetan question-and-answer corpus in an embodiment of the present invention. The method includes:
[0024] Step 101, taking a Tibetan triple entity as the central word entity, and obtaining all triples related to the central word entity;
[0025] Step 102, mapping every entity in those triples to a label, yielding correspondences between entities and labels;
[0026] Step 103, constructing a Tibetan question-and-answer corpus according to the correspondences and the central word entity.
[0027] In step 101, a Tibetan triple entity can be randomly selected as the central word entity. As shown in Figure 2, the selected triple entity is Father, >
[0028] The labels in step 102 include shallow labels and deep labels. Shallow labels are not related to triple attributes and are generally coarse types such as people, places, and organizations. Deep labels are related to triple attributes, such as the time of death,...
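Steps 101 to 103 can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy triples, the English placeholder names standing in for Tibetan entities, the `SHALLOW` lookup table, the rule that a deep label is the relation under which an entity appears as an object, and the question template wording are all assumptions made for illustration.

```python
# Hypothetical toy data: English placeholders stand in for Tibetan entities.
TRIPLES = [
    ("PersonA", "father", "PersonB"),
    ("PersonA", "birthplace", "PlaceX"),
    ("PersonA", "time_of_death", "1950"),
    ("PersonC", "birthplace", "PlaceY"),
]

# Assumed shallow labels: coarse types that do not depend on any relation.
SHALLOW = {
    "PersonA": "person", "PersonB": "person", "PersonC": "person",
    "PlaceX": "place", "PlaceY": "place",
}


def related_triples(center, triples):
    """Step 101: keep every triple whose subject or object is the central entity."""
    return [t for t in triples if center in (t[0], t[2])]


def label_entities(triples, shallow):
    """Step 102: map each entity to a shallow label and a set of deep labels.

    Assumption: a deep label is the relation under which the entity
    appears as an object, e.g. "time_of_death".
    """
    labels = {}
    for subj, rel, obj in triples:
        for entity in (subj, obj):
            labels.setdefault(
                entity, {"shallow": shallow.get(entity), "deep": set()}
            )
        labels[obj]["deep"].add(rel)
    return labels


def build_qa(center, triples):
    """Step 103: one template question per triple whose subject is the center."""
    return [
        (f"What is the {rel} of {subj}?", obj)
        for subj, rel, obj in triples
        if subj == center
    ]


center = "PersonA"
neighborhood = related_triples(center, TRIPLES)
labels = label_entities(neighborhood, SHALLOW)
qa_pairs = build_qa(center, neighborhood)
```

In this sketch the entity `"1950"` has no shallow label but carries the deep label `time_of_death`, which mirrors the distinction drawn in paragraph [0028].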
Embodiment 2
[0037] Embodiment 2 of the present invention builds on the construction of the Tibetan question-and-answer corpus in the above embodiment and adds a scheme for optimizing the natural sentences in the constructed corpus.
[0038] Figure 3 shows a schematic diagram of optimizing natural sentences in the Tibetan question-and-answer corpus in the embodiment of the present invention. The natural sentences in the Tibetan question-and-answer corpus include template questions and real questions. The specific optimization steps are shown in Figure 4:
[0039] Step 201, calculating the vector of template questions in the Tibetan question-and-answer corpus and the vector of real questions in the Tibetan question-and-answer corpus;
[0040] Specifically, the word2vec tool is used to obtain a vector for each word, and the word vectors are added dimension by dimension to obtain the sentence vector expression. The vector expression of the template question is marked as Z, and the ...
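The summation in step 201 can be sketched as follows. This is a minimal illustration, assuming the word vectors have already been trained (for example with a word2vec tool) and are available as a plain dictionary; the `EMBEDDINGS` table, its values, and the choice to skip out-of-vocabulary tokens are hypothetical and not taken from the patent.

```python
# Hypothetical pre-trained word vectors (3 dimensions for readability).
EMBEDDINGS = {
    "who": [1.0, 0.0, 0.5],
    "is": [0.0, 0.25, 0.0],
    "father": [0.5, 0.5, 0.5],
}


def sentence_vector(tokens, embeddings):
    """Sum the word vectors of a tokenized sentence, dimension by dimension."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            # Assumption: tokens without a vector are skipped.
            continue
        total = [a + b for a, b in zip(total, vec)]
    return total


# Z plays the role of the template-question vector from paragraph [0040].
Z = sentence_vector(["who", "is", "father"], EMBEDDINGS)
```

The same function would be applied to each real question to obtain its vector in the same space.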
Embodiment 3
[0047] Embodiment 3 of the present invention builds on the optimized Tibetan question-and-answer corpus of the above-mentioned Embodiment 2, adds a scheme for expanding the Tibetan question-and-answer corpus, and trains an end-to-end neural network.
[0048] As shown in Figure 5, the specific scheme includes:
[0049] Corpus construction: quadruples are constructed from the Tibetan question-and-answer corpus built according to Embodiment 1 and the valid template questions obtained in Embodiment 2, where each quadruple is ordered as subject, relation, object, and question;
[0050] Encoding stage: the TransE algorithm is used to obtain the vector expressions of entities and relations in the Tibetan question-and-answer corpus, giving subject vector expressions, relation vector expressions, and object vector expressions, and triple word vectors are formed based on the subject vector expressions, relational vector expressio...
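The quadruple layout and the TransE idea referenced above can be sketched in Python. TransE models a relation as a translation in vector space, so a plausible triple satisfies subject + relation ≈ object and is scored by the distance between the two sides. The sketch below shows only that scoring function, not training; the `Quadruple` type, the toy vectors, and the question text are hypothetical, and how the patent combines the three vectors into a triple word vector is not shown because the source text is truncated at that point.

```python
import math
from typing import NamedTuple


class Quadruple(NamedTuple):
    """One training item, ordered as subject, relation, object, question."""
    subject: str
    relation: str
    obj: str
    question: str


# Hypothetical 2-dimensional vectors, as if already learned by TransE.
ENTITY_VECS = {
    "PersonA": [1.0, 0.0],
    "PersonB": [1.5, 1.0],
    "PlaceX": [4.0, 4.0],
}
RELATION_VECS = {"father": [0.5, 1.0]}


def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; smaller means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


item = Quadruple("PersonA", "father", "PersonB", "What is the father of PersonA?")

# Distance for the stored triple versus a corrupted triple with a wrong object.
correct = transe_distance(
    ENTITY_VECS[item.subject], RELATION_VECS[item.relation], ENTITY_VECS[item.obj]
)
corrupted = transe_distance(
    ENTITY_VECS[item.subject], RELATION_VECS[item.relation], ENTITY_VECS["PlaceX"]
)
```

During TransE training, a margin-based loss pushes `correct` below `corrupted`; here the toy vectors are chosen so that this already holds.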