Historical classics word segmentation method based on word alignment
A word segmentation method for historical classics that uses word alignment technology, applied in the field of electrical digital data processing. It aims to address ineffective segmentation of such texts and to improve segmentation accuracy.
Examples
Embodiment 1
[0025] In this embodiment, Eclipse is used as the development platform and Java as the development language. The method is applied to a parallel corpus of 4,145 ancient Chinese and vernacular Chinese sentence pairs from "The Benji of Qin Shihuang", "The Benji of Qin", "The Benji of Xiang Yu", "The Benji of Gaozu" and "The Benji of Lu Hou". The specific process is as follows:
[0026] Step 1: Segment the modern Chinese side of the parallel corpus into words, and split the ancient Chinese side character by character. Then align the ancient Chinese with the modern Chinese using IBM Model 3.
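The ancient Chinese side of Step 1 can be prepared with a few lines of Java. The following is a minimal sketch, not the patent's implementation: it assumes each ancient Chinese token is a single character, and the class name `AncientSplitter` and its method names are hypothetical. It iterates over Unicode code points rather than `char` values so that rare characters outside the Basic Multilingual Plane are not cut in half. Segmenting the modern Chinese side is not shown, because the source does not say which segmenter is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative and do not come from the patent.
final class AncientSplitter {
    private AncientSplitter() {}

    /** Returns one token per Unicode code point, skipping whitespace. */
    static List<String> splitByCharacter(String sentence) {
        List<String> tokens = new ArrayList<>();
        sentence.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Joins tokens with single spaces, giving one whitespace-tokenized sentence per line. */
    static String toAlignerLine(List<String> tokens) {
        return String.join(" ", tokens);
    }
}
```

Calling `AncientSplitter.toAlignerLine(AncientSplitter.splitByCharacter(sentence))` on each ancient Chinese sentence yields space-separated lines, a format that many word-alignment tools accept as input.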
[0027] Step 2: Preprocess the alignment results obtained in Step 1 to eliminate the interference of punctuation marks and adverbs:
[0028] (1) Check the alignment results obtained in Step 1 one by one, and delete any result whose alignment probability is less than or equal to zero, that consists of a single ancient Chinese character, or whose corresponding modern Chinese consists of non-Chinese characters;
[0029] (2) Check the part of speech of two wor...
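Part of the filtering in sub-step (1) can be sketched in Java as follows. This is an illustration under stated assumptions rather than the patent's code: the `AlignmentPair` container and its fields are hypothetical, and "non-Chinese" is read here as "not made up entirely of Han-script characters", which also removes punctuation. The sketch covers only the probability test and the non-Chinese test. The single-character condition and the part-of-speech check of sub-step (2) are left out, because the source does not give enough detail to reproduce them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter; class, field, and method names are illustrative only.
final class AlignmentFilter {
    private AlignmentFilter() {}

    /** One alignment result: an ancient token, its modern counterpart, and a probability. */
    static final class AlignmentPair {
        final String ancient;
        final String modern;
        final double probability;

        AlignmentPair(String ancient, String modern, double probability) {
            this.ancient = ancient;
            this.modern = modern;
            this.probability = probability;
        }
    }

    /** True if the text is non-empty and every code point belongs to the Han script. */
    static boolean isAllHan(String text) {
        return !text.isEmpty()
                && text.codePoints()
                       .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    /** Keeps pairs with a positive probability whose modern side is entirely Han (assumed reading). */
    static List<AlignmentPair> filter(List<AlignmentPair> pairs) {
        List<AlignmentPair> kept = new ArrayList<>();
        for (AlignmentPair pair : pairs) {
            if (pair.probability > 0.0 && isAllHan(pair.modern)) {
                kept.add(pair);
            }
        }
        return kept;
    }
}
```

Using `Character.UnicodeScript` rather than a fixed code-point range means that full-width punctuation such as "，" and "。" is rejected, since those characters belong to the Common script rather than Han.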