Microblog text normalization method based on context graph random walk and phonetic configuration codes
Random walk and context techniques, applied in the computer field, address problems such as the inability to meet the normalization requirements of Chinese microblog text.
Embodiment Construction
[0041] The present invention is a microblog text normalization method based on context graph random walk and phonetic-graphic codes. The overall process is shown in Figure 1 and includes the following steps:
[0042] Step 1: Segment the Chinese microblog (Weibo) text into words.
[0043] Step 2: Use a standard dictionary to identify non-standard words in the microblog text and extract the context of each word.
[0044] Step 3: Construct a context graph from the words, the contexts corresponding to each word, and the co-occurrence counts of each word with its contexts.
[0045] Step 4: Perform a random walk on the context graph to obtain a context-based set of normalization candidates for each non-standard word.
[0046] Step 5: Derive the phonetic-graphic code of each word from the phonetic-graphic codes of its individual Chinese characters.
[0047] Step 6: For each non-standard word, extract a phonetic-graphic code feature vector, input it into the phonetic-graphic code model, and output...
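Steps 2 to 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the excerpt does not define what a "context" is or how the walk terminates, so the sketch assumes that a context is the pair of neighbouring tokens, that the graph is bipartite (words on one side, contexts on the other, edges weighted by co-occurrence count), and that each walk stops at the first dictionary word it reaches. The toy corpus, the dictionary, and the function names `build_context_graph` and `random_walk_candidates` are hypothetical, and the input is assumed to be already segmented (Step 1).

```python
import random
from collections import Counter, defaultdict


def build_context_graph(sentences):
    """Build a bipartite word/context graph weighted by co-occurrence counts.

    Assumption: the context of a token is its (left, right) neighbour pair.
    """
    word_to_ctx = defaultdict(Counter)
    ctx_to_word = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]  # sentence boundary markers
        for i in range(1, len(padded) - 1):
            ctx = (padded[i - 1], padded[i + 1])
            word_to_ctx[padded[i]][ctx] += 1
            ctx_to_word[ctx][padded[i]] += 1
    return word_to_ctx, ctx_to_word


def weighted_choice(counter, rng):
    """Pick one key of a Counter with probability proportional to its count."""
    items = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(items, weights=weights, k=1)[0]


def random_walk_candidates(start, word_to_ctx, ctx_to_word, dictionary,
                           walks=200, max_hops=4, seed=0):
    """Rank dictionary words by how often walks from `start` end on them.

    Assumption: a walk alternates word -> context -> word and stops at the
    first dictionary word it reaches, or after `max_hops` hops.
    """
    if start not in word_to_ctx:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = Counter()
    for _ in range(walks):
        word = start
        for _ in range(max_hops):
            ctx = weighted_choice(word_to_ctx[word], rng)
            word = weighted_choice(ctx_to_word[ctx], rng)
            if word in dictionary:
                hits[word] += 1
                break
    return hits.most_common()


# Toy, pre-segmented corpus (hypothetical data).
sentences = [
    ["你", "在", "说", "神马", "呢"],
    ["你", "在", "说", "什么", "呢"],
    ["他", "在", "说", "什么", "呢"],
]
dictionary = {"你", "他", "在", "说", "什么", "呢"}

word_to_ctx, ctx_to_word = build_context_graph(sentences)

# Step 2: words missing from the standard dictionary are non-standard.
nonstandard = sorted(w for w in word_to_ctx if w not in dictionary)

# Step 4: context-based candidate set for each non-standard word.
candidates = {
    w: random_walk_candidates(w, word_to_ctx, ctx_to_word, dictionary)
    for w in nonstandard
}
print(candidates)
```

In this toy corpus the non-standard word 神马 shares its only context with 什么, so 什么 is the only candidate the walk can reach. On real data the candidate list would be longer and noisier, which is presumably why the method goes on to score words with phonetic-graphic codes in Steps 5 and 6.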