Extraction Method of Entity Attributes and Attribute Values Based on Multi-granularity Semantic Blocks
A method for extracting entity attributes and attribute values, applied in natural language data processing, database retrieval, network data retrieval, and related fields. It targets problems such as incomplete semantics.
Examples
Embodiment 1
[0069] Step 1: Construct the corpus for extracting entity attributes and attribute values.
[0070] Web crawlers built with Python, Selenium, and PhantomJS collect entry pages from Wikipedia, Baidu Encyclopedia, and Hudong Baike (Interactive Encyclopedia) and save them to the local computer, forming the corpus for entity attribute and attribute value extraction. Free text is then extracted from each webpage: the title and free text are kept, and information such as navigation and pictures is removed. For example, for the entity Forbidden City, its entry pages in Wikipedia, Baidu Encyclopedia, and Hudong Baike are collected and saved to the local computer.
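The free-text extraction described in [0070] can be illustrated with a minimal sketch. It assumes the page HTML has already been downloaded (the Selenium/PhantomJS crawling step is not shown) and uses only Python's standard `html.parser`. The class name `FreeTextExtractor`, the helper `extract_free_text`, and the set of skipped tags are hypothetical choices for illustration, not part of the patent.

```python
from html.parser import HTMLParser


class FreeTextExtractor(HTMLParser):
    """Collects the page title and paragraph text, ignoring navigation,
    scripts, and similar non-content regions (assumed tag set)."""

    SKIP_TAGS = {"nav", "script", "style", "header", "footer", "figure"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._skip_depth = 0   # > 0 while inside a skipped region
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract_free_text(html):
    """Return (title, list of paragraph strings) for one saved entry page."""
    parser = FreeTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.paragraphs


# Made-up sample page standing in for a saved encyclopedia entry.
sample = (
    "<html><head><title>Forbidden City</title></head><body>"
    "<nav><p>Home</p></nav>"
    "<p>The <b>Forbidden City</b> is a palace complex.</p>"
    "<img src='a.png'><script>var x = 1;</script>"
    "</body></html>"
)
title, paragraphs = extract_free_text(sample)
```

Real encyclopedia pages mark navigation and images in site-specific ways, so the skipped tag set would likely need adjusting per source.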
[0071] Step 2: Perform word segmentation, part-of-speech tagging, and phrase recognition on the free-text sentences in the attribute and attribute value extraction corpus.
[0072] Use the word segmentation and part-of-speech tagging tool of Harbin Institute ...
Embodiment 2
[0115] A system for extracting entity attributes and attribute values based on multi-granularity semantic blocks, built on the above method. As shown in Figure 2, it includes a corpus collection module, a word segmentation and phrase recognition module, a semantic role labeling module, a dependency syntactic analysis module, a semantic dependency analysis module, an attribute knowledge extraction module based on semantic role granularity, an attribute knowledge extraction module based on phrase granularity, an attribute knowledge extraction module based on word granularity, and an attribute knowledge classification module. The corpus collection module is connected with the word segmentation and phrase recognition module, the semantic role labeling module, the dependency syntactic analysis module, and the semantic dependency analysis module. The word segmentation and phrase recognition module and the semantic role labeling module are each connected to the attribute knowledge extraction module based on semantic role granularity; the w...
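The module connections stated in [0115] can be sketched as a small dependency graph. This covers only the connections given before the text is cut off, and it assumes each "connected" relation means that the downstream module consumes the upstream module's output. The identifiers and the helper `execution_order` are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each module to the set of modules it is assumed to depend on.
# Only connections stated explicitly in the text are listed.
PIPELINE = {
    "word_segmentation_and_phrase_recognition": {"corpus_collection"},
    "semantic_role_labeling": {"corpus_collection"},
    "dependency_syntactic_analysis": {"corpus_collection"},
    "semantic_dependency_analysis": {"corpus_collection"},
    "semantic_role_granularity_extraction": {
        "word_segmentation_and_phrase_recognition",
        "semantic_role_labeling",
    },
}


def execution_order(pipeline):
    """Return one valid order in which the modules could be run."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
```

Under these assumptions, any valid order starts with corpus collection and runs the semantic-role-granularity extraction only after both word segmentation and semantic role labeling have finished.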