Word segmentation method, word segmentation device, named entity recognition method and named entity recognition system
A word segmentation method and related word segmentation technology, applicable to fields such as instruments, electrical digital data processing, and calculation. It addresses the problem of low word segmentation accuracy, reducing video memory usage while ensuring and improving accuracy.
Examples
Embodiment 1
[0037] As shown in Figure 1, this embodiment provides a word segmentation method comprising the following steps:
[0038] Step 1 (S1 in the figure): build a dictionary.
[0039] The word segmentation method provided in this embodiment is especially suited to Chinese sentences, so a Chinese word segmenter is constructed here. Jieba is a widely used Chinese word segmentation tool, and its dictionary is used as the basis for the segmenter's dictionary: rarely used words are deleted, while correct, commonly used words are kept as far as possible, to reduce the size of the segmenter. For simplicity, the jieba dictionary can also be used directly as the segmenter's dictionary without any processing.
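The dictionary-building step might be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the whitespace-separated `word frequency tag` line layout used by jieba's dictionary file, and it assumes that "not commonly used" can be approximated by a frequency threshold. The function name `load_dictionary`, the parameter `min_freq`, and the sample entries and frequencies are all hypothetical.

```python
from typing import Dict, Iterable


def load_dictionary(lines: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Parse 'word freq [tag]' lines and keep words with freq >= min_freq.

    The threshold rule is an assumption; the source only says that
    uncommon words are deleted, without stating a criterion.
    """
    dictionary: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0]
        try:
            freq = int(parts[1])
        except ValueError:
            continue  # skip lines whose frequency field is not an integer
        if freq >= min_freq:
            dictionary[word] = freq
    return dictionary


# Hypothetical sample entries; the frequencies are made up for illustration.
SAMPLE = """\
公司 5000 n
餐饮 800 n
是 90000 v
一 80000 m
家 30000 q
罕见词 2 n
"""

# Using the dictionary as-is (the "no processing" option).
full = load_dictionary(SAMPLE.splitlines())
# Pruning rarely used words to shrink the segmenter.
pruned = load_dictionary(SAMPLE.splitlines(), min_freq=10)
print(len(full), len(pruned))
```

With a real dictionary file, the same function could be fed the lines of an opened file instead of the inline sample.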
[0040] Step 2: based on the dictionary built in Step 1, generate a prefix tree (trie) for the sentence to be segmented, realize efficient word graph s...
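Because the remainder of this step is cut off in the text above, the following is only a generic sketch of how a prefix tree and a word graph are commonly combined, not the procedure claimed in the patent. It builds a character-level trie from a dictionary and then, for each start position in the sentence, records where dictionary words beginning at that position end. The names `TrieNode`, `build_trie`, `build_word_graph`, the toy dictionary, and the sample sentence are hypothetical; the single-character fallback for positions with no dictionary match is likewise an assumption.

```python
from typing import Dict, Iterable, List


class TrieNode:
    """One node of a character-level prefix tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> TrieNode:
    """Insert every dictionary word into a trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def build_word_graph(sentence: str, root: TrieNode) -> Dict[int, List[int]]:
    """Map each start index to the exclusive end indices of words found there.

    Positions with no dictionary match fall back to a single character
    (an assumption made for this sketch).
    """
    graph: Dict[int, List[int]] = {}
    for start in range(len(sentence)):
        ends: List[int] = []
        node = root
        for end in range(start, len(sentence)):
            node = node.children.get(sentence[end])
            if node is None:
                break  # no dictionary word continues with this character
            if node.is_word:
                ends.append(end + 1)
        if not ends:
            ends.append(start + 1)
        graph[start] = ends
    return graph


# Toy dictionary and sentence, chosen only for illustration.
trie = build_trie(["北京", "天安门", "天安", "爱"])
sentence = "我爱北京天安门"
graph = build_word_graph(sentence, trie)
for start, ends in graph.items():
    print(start, [sentence[start:end] for end in ends])
```

Each trie walk stops as soon as no dictionary word can continue, so the scan does not test every substring of the sentence against the dictionary.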
Embodiment 2
[0057] Referring to Figure 3, this embodiment provides a named entity recognition method that uses the word segmentation method described in Embodiment 1. Specifically, the named entity recognition method includes the following steps:
[0058] Step 10: segment the sentence to be recognized according to the method described in Embodiment 1, obtaining the words that make up the sentence; these words form a word sequence. Again taking the sentence "Xiang'eqing is a catering company" as the sentence to be recognized, where "Xiang'eqing" is an unregistered (out-of-vocabulary) word, the segmentation result obtained with the method of Embodiment 1 is: Xiang|E|Qing|is|One|Home|Dining|Company.
[0059] Step 20: input the word sequence obtained from word segmentation into a pre-trained word-sequence-based NER model and output the recognition result, that is, the named entities identified in the sentence to be recognized.
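The two steps can be wired together roughly as sketched below. This is an illustration under several assumptions, not the patent's model: the segmenter and the NER model are replaced by hard-coded stand-ins (`fake_segment`, `fake_tag_words`), the model is assumed to emit one BIO tag per word, and the `ORG` label is hypothetical. The source does not state the tagging scheme or the label set.

```python
from typing import Callable, List, Optional, Tuple

Entity = Tuple[List[str], str]  # (words of the entity, entity type)


def decode_bio(words: List[str], tags: List[str]) -> List[Entity]:
    """Collapse per-word BIO tags into entity spans (BIO is an assumption)."""
    entities: List[Entity] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((current, current_type))
            current, current_type = [word], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(word)
        else:
            if current:
                entities.append((current, current_type))
            current, current_type = [], None
    if current:
        entities.append((current, current_type))
    return entities


def recognize(
    sentence: str,
    segment: Callable[[str], List[str]],
    tag_words: Callable[[List[str]], List[str]],
) -> List[Entity]:
    """Step 10: segment the sentence. Step 20: tag the word sequence."""
    words = segment(sentence)
    tags = tag_words(words)
    if len(tags) != len(words):
        raise ValueError("the model must return exactly one tag per word")
    return decode_bio(words, tags)


# Stand-ins for the segmenter of Embodiment 1 and the pre-trained NER model.
def fake_segment(sentence: str) -> List[str]:
    return ["Xiang", "E", "Qing", "is", "One", "Home", "Dining", "Company"]


def fake_tag_words(words: List[str]) -> List[str]:
    return ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "O", "O"]


entities = recognize("Xiang'eqing is a catering company", fake_segment, fake_tag_words)
print(entities)
```

In this sketch the unregistered word arrives as three single-character tokens, and a word-level tagger can still reassemble them into one entity span.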
[0060] Among th...