An Iterative Method for Text Sequences for Semantic Understanding
A technology for text sequence and semantic understanding, which is applied in the field of text sequence iteration for semantic understanding, and can solve problems such as high cost of reproduction, scalability to be improved, and low efficiency
- Summary
- Abstract
- Description
- Claims
- Application Information
AI Technical Summary
Problems solved by technology
Method used
Image
Examples
specific Embodiment approach 1
[0034] Specific implementation mode one: combine figure 1 Describe this embodiment, a text sequence iteration method for semantic understanding in this embodiment, specifically prepared according to the following steps:
[0035] Step 1. Extract background knowledge base triples and original text triples, and the original text is used to verify the model;
[0036] As the name implies, the knowledge concept is a unit that expresses a complete concept information. As mentioned in 4.2, it is expressed in the form of triples in this model. In order to enable triples to fully express the semantic information in the text, we use Semantic Role Labeling (SRL) to extract the backbone information of the sentences in the text [19](Liu T, Che W, Li S, et al. Semantic rolelable system using maximum entropy classifier[C] / / Proceedings of the NinthConference on Computational Natural Language Learning. Association for Computational Linguistics, 2005:189-192.), mainly extraction A triplet suc...
specific Embodiment approach 2
[0042] Specific implementation mode two: the difference between this implementation mode and specific implementation mode one is: the background knowledge base triplet and the original text triplet are extracted in the described step one; The specific process is:
[0043] The experimental data set comes from the Internet text classification corpus provided by Sogou Lab. After preliminary filtering (filtering by artificial settings, filtering out illegal characters in the article and articles with a long text length), the number of available texts is 17,199. There are 9 categories of texts in the Internet text classification corpus, namely finance, IT, health, sports, tourism, education, recruitment, culture, and military. 200 articles are randomly selected for each type of text as test corpus, a total of 1800 original texts. The extraction tool uses Harbin The LTP language technology platform of the Social Computing and Information Retrieval Research Center of the Industrial Un...
specific Embodiment approach 3
[0046] Specific embodiment three: what this embodiment is different from specific embodiment one or two is: in the described step 3, the weight value of original text triple is set to 1, with original text triple as search starting point, by Cosine similarity (cosine similarity) to calculate the semantic similarity between the real number vector of the original text triplet and the real number vector of the background knowledge base triplet; the specific process is:
[0047] The semantic similarity formula between the real number vector of the original text triplet and the real number vector of the background knowledge base triplet is:
[0048]
[0049] In the formula, A is the real number vector of the triplet in the original text, B is the real number vector of the triplet in the background knowledge base, θ is the angle between A and B, · is the inner product of the vector, * is the multiplication, n is the dimension of the vector Number, positive integer, ||A|| is the n...
PUM
Abstract
Description
Claims
Application Information
- R&D Engineer
- R&D Manager
- IP Professional
- Industry Leading Data Capabilities
- Powerful AI technology
- Patent DNA Extraction
Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.
© 2024 PatSnap. All rights reserved.Legal|Privacy policy|Modern Slavery Act Transparency Statement|Sitemap|About US| Contact US: help@patsnap.com