Dictionary-based forward maximum-matching method for Chinese word segmentation by successively adding one character
A Chinese word segmentation and maximum matching technology, applied in electrical digital data processing, special-purpose data processing applications, instruments, and similar fields. It addresses problems such as incorrect segmentation, slow segmentation, and limitations tied to the maximum matching word length, with the effect of shortening segmentation time and improving segmentation accuracy.
Examples
Embodiment 1
[0022] Embodiment 1: As shown in Figures 1-3, a dictionary-based forward maximum-matching Chinese word segmentation method that successively adds one character comprises the following steps:
[0023] Step 1. Rough segmentation: remove punctuation marks, spaces, dates, numbers, English letters and other marks from the text to be segmented. Let the text to be processed be A, and divide it into a set of N short text sequences S_i (0 < i ≤ N), where each S_i is a short text: A = {S_1, S_2, S_3, ..., S_N};
[0024] Step 2. As shown in Figure 2, the short texts produced by rough segmentation are read in sequence, one by one, each denoted S_i. Let each sentence sequence S_i consist of m characters W_ij (0 < j ≤ m), so that S_i = <W_i1 W_i2 W_i3 ... W_im>;
[0025] Step 3. The roughly segmented text S_i is segmented into words, as shown in Figure 2.
[0026] 1) Set a word segmentation search length L slightly smaller than the maximum word length in the dictionary; L is generally slightly smaller than the maximum word ...
Embodiment 2
[0034] Embodiment 2: As shown in Figures 1-3, a dictionary-based forward maximum-matching Chinese word segmentation method that successively adds one character comprises the following steps:
[0035] Set a word segmentation search length L slightly smaller than the maximum word length in the dictionary, and let the character string to be segmented be S = s_1 s_2 s_3 s_4 ... s_i. Starting from the beginning of the sentence, take the first two characters s_1 s_2 and check whether s_1 s_2 is a word in the dictionary. If it is not, s_1 is treated as a single-character word and segmented out; the length pointer of the searched text then advances by one character to the third character, and s_2 s_3 is looked up in the dictionary in a new round of search and matching. If s_1 s_2 is a word in the dictionary, one more character is appended and s_1 s_2 s_3 is checked. If s_1 s_2 s_3 is not a word in the dictionary, it indicat...
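The procedure described in this embodiment might be sketched as follows. Because the paragraph above is truncated, two behaviours in the sketch are assumptions rather than statements from the source: when the extended candidate is not a dictionary word, the last matched candidate is emitted as a word, and a candidate is never extended beyond the search length L. The function name `segment`, its parameters, and the sample dictionary are hypothetical.

```python
from typing import List, Set


def segment(s: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Forward segmentation that grows a candidate one character at a time."""
    words = []
    i = 0
    while i < len(s):
        # Start with the two-character candidate at position i.
        end = i + 2
        if end > len(s) or s[i:end] not in dictionary:
            # Not a dictionary word (or only one character is left):
            # emit the character at position i as a single-character word.
            words.append(s[i])
            i += 1
            continue
        # Assumption: keep appending one character while the longer candidate
        # is still a dictionary word and stays within the search length L.
        while end < len(s) and end - i < max_len and s[i:end + 1] in dictionary:
            end += 1
        # Assumption: once extension fails, emit the last matched candidate.
        words.append(s[i:end])
        i = end
    return words


if __name__ == "__main__":
    # Hypothetical dictionary, for illustration only.
    sample_dictionary = {"今天", "天气", "特别", "特别好"}
    print(segment("今天天气特别好", sample_dictionary, max_len=7))
```

Storing the dictionary as a set keeps each lookup at expected constant time, so the cost of the sketch is dominated by the number of candidates probed.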
Embodiment 3
[0036] Embodiment 3: As shown in Figures 1-3, a dictionary-based forward maximum-matching Chinese word segmentation method that successively adds one character comprises the following steps:
[0037] Step 1. Read the text to be segmented and roughly segment the input text at obvious separators such as punctuation, numbers, Western characters, and charts, dividing it into short texts; for example, one resulting short text is "today's weather is particularly good";
[0038] Step 2. The roughly segmented short text is the object of further segmentation. The word segmentation search length is set to L = 7, where L is chosen to be less than the maximum word length in the dictionary, which here is 12;
[0039] Step 3. Take the first two characters, "today", of a roughly segmented short text and search for a match in the dictionary. Since "today" exists in the dictionary, one character is added to the length pointer of the searched tex...
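The steps of this embodiment can be followed with a small trace that records every dictionary lookup. The sketch rests on several assumptions: the example sentence is back-translated as 今天天气特别好, the dictionary contents are invented for illustration, and, since the paragraph above is truncated, the handling of a failed extension (emit the last matched candidate) is also assumed. The names `trace_segment`, `DICTIONARY`, and `search_len` are hypothetical.

```python
from typing import List, Set, Tuple


def trace_segment(
    text: str, dictionary: Set[str], search_len: int
) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """Return (words, lookups); lookups records each dictionary probe."""
    words: List[str] = []
    lookups: List[Tuple[str, bool]] = []

    def is_word(candidate: str) -> bool:
        hit = candidate in dictionary
        lookups.append((candidate, hit))
        return hit

    i = 0
    while i < len(text):
        end = i + 2
        if end > len(text) or not is_word(text[i:end]):
            # No two-character match: emit a single-character word.
            words.append(text[i])
            i += 1
            continue
        # Add one character at a time, up to the search length L.
        while end < len(text) and end - i < search_len and is_word(text[i:end + 1]):
            end += 1
        words.append(text[i:end])
        i = end
    return words, lookups


if __name__ == "__main__":
    DICTIONARY = {"今天", "天气", "特别"}  # hypothetical dictionary
    words, lookups = trace_segment("今天天气特别好", DICTIONARY, search_len=7)
    for candidate, hit in lookups:
        print(candidate, "found" if hit else "not found")
    print(words)
```

With this assumed dictionary, the first probe is the two-character candidate 今天 ("today"), which is found, and the next probe is the three-character extension 今天天, which is not.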