Vietnamese multi-category word disambiguation method based on a combined approach
A combined-method technology for multi-category (concurrent) words, applied in semantic analysis, natural language translation, natural language data processing, and related fields. It addresses problems such as the disambiguation of concurrent words, poor generalization performance, and the low accuracy of Vietnamese part-of-speech tagging, to achieve the effect of identifying...
Examples
Embodiment 1
[0068] Embodiment 1: As shown in Figures 1-5, the specific steps of the Vietnamese concurrent word disambiguation method based on the combined approach are as follows:
[0069] Step1. First, the Vietnamese sentence-level part-of-speech tagged corpus is combined with the Vietnamese concurrent word dictionary to extract the Vietnamese concurrent word field library; then, based on the characteristics of the Vietnamese language and of concurrent words, the Vietnamese disambiguation features are obtained;
[0070] Step2. Apply the maximum entropy statistical analysis method to the Vietnamese concurrent word field corpus formed in the Vietnamese concurrent word field library, and obtain the maximum entropy Vietnamese concurrent word disambiguation model;
[0071] Step3. Use the conditional random field...
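As an illustration of Step2, the following is a minimal sketch of a maximum entropy classifier (multinomial logistic regression trained by plain gradient ascent) that chooses a part-of-speech tag for one ambiguous word from context features. The source does not state the feature templates, tag set, or training procedure, so the feature strings (`w=`, `prev=`, `next=`), the tags `N` and `V`, the toy samples for the word "đá", and all function names are assumptions made for this example only.

```python
import math
from collections import defaultdict


def predict_proba(weights, feats, labels):
    """Return P(tag | features) under a log-linear (maximum entropy) model."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit weights by gradient ascent on the conditional log-likelihood.

    The gradient for each (feature, tag) pair is the empirical feature
    count minus the count expected under the current model.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0
                for y in labels:
                    grad[(f, y)] -= probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def predict(weights, feats, labels):
    """Pick the tag with the highest conditional probability."""
    probs = predict_proba(weights, feats, labels)
    return max(labels, key=lambda y: probs[y])


# Hypothetical toy data: "đá" can be a noun (stone) or a verb (to kick).
LABELS = ["N", "V"]
SAMPLES = [
    (["w=đá", "prev=hòn"], "N"),
    (["w=đá", "prev=viên"], "N"),
    (["w=đá", "next=bóng"], "V"),
    (["w=đá", "prev=đang"], "V"),
]
weights = train_maxent(SAMPLES, LABELS)
```

Calling `predict(weights, ["w=đá", "next=bóng"], LABELS)` returns the tag the model considers most likely for that context. A production system would more likely use an established maximum entropy toolkit and a much richer feature set than this sketch.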
Embodiment 2
[0075] Embodiment 2: As shown in Figures 1-5, this embodiment of the Vietnamese concurrent word disambiguation method based on the combined approach is the same as Embodiment 1, wherein:
[0076] As a preferred solution of the present invention, the specific steps of Step1 are:
[0077] Step1.1. First, use a web crawler program to crawl a Vietnamese webpage corpus from the Internet;
[0078] Step1.2. After filtering and denoising the crawled Vietnamese webpage corpus, construct a Vietnamese text-level corpus and store it in the database;
[0079] The present invention takes into account that the crawled Vietnamese webpage corpus contains noise such as duplicate webpages and webpage tags, and that this noise is useless. Therefore, operations such as filtering and denoising are needed to obtain a high-quality text-level corpus containing only Vietnamese, which is stored in the database to facilitate data managemen...
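The filtering and denoising described in Step1.2 can be sketched as follows. This is only an illustrative outline under stated assumptions: the source does not describe its actual filters, so the markup stripping by regular expression, the exact-duplicate check by hash, the diacritic-ratio heuristic for deciding that a page is Vietnamese, the `min_ratio` threshold, and every function name here are hypothetical. Storing the result in a database is omitted.

```python
import hashlib
import html
import re
import unicodedata

# Remove <script>/<style> blocks entirely, then any remaining tags.
SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

# Letters typical of Vietnamese orthography (assumed heuristic, NFC form).
VI_CHARS = set(unicodedata.normalize(
    "NFC",
    "ăâđêôơưàáảãạằắẳẵặầấẩẫậèéẻẽẹềếểễệìíỉĩị"
    "òóỏõọồốổỗộờớởỡợùúủũụừứửữựỳýỷỹỵ",
))


def strip_markup(page):
    """Drop webpage tags and collapse whitespace, returning NFC text."""
    text = SCRIPT_RE.sub(" ", page)
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def looks_vietnamese(text, min_ratio=0.05):
    """Heuristic: enough letters carry Vietnamese-specific diacritics."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return False
    hits = sum(1 for c in letters if c in VI_CHARS)
    return hits / len(letters) >= min_ratio


def build_text_corpus(pages):
    """Return de-duplicated, markup-free texts that look Vietnamese."""
    seen, corpus = set(), []
    for page in pages:
        text = strip_markup(page)
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if not text or digest in seen or not looks_vietnamese(text):
            continue
        seen.add(digest)
        corpus.append(text)
    return corpus
```

Hashing the cleaned text only catches exact duplicates; near-duplicate pages would need a similarity measure, which this sketch does not attempt.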
Embodiment 3
[0088] Embodiment 3: As shown in Figures 1-5, this embodiment of the Vietnamese concurrent word disambiguation method based on the combined approach is the same as Embodiment 2, wherein:
[0089] As a preferred solution of the present invention, the specific steps of Step1.5 are:
[0090] Step1.5.1. Retrieve the Vietnamese sentence-level part-of-speech tagged corpus from the database of Step1.4;
[0091] Step1.5.2. Collect Vietnamese entries from websites and dictionaries to form a Vietnamese dictionary;
[0092] Step1.5.3. From the Vietnamese dictionary obtained in Step1.5.2, manually screen and extract entries to obtain the Vietnamese concurrent word dictionary;
[0093] Step1.5.4. Using a manually written concurrent word extraction program, combined with the Vietnamese concurrent word dictionary from Step1.5.3, the Vietnamese sentence-level part-of-speech tagged corpus obtained in Step1....
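The visible part of Step1.5.4 suggests a program that scans the tagged corpus for words listed in the concurrent word dictionary. A minimal sketch of such an extraction pass is given below. Because the step is truncated in the source, the data layout (sentences as lists of `(word, tag)` pairs, the dictionary as a mapping from word to candidate tags), the context window, the tag names, and the function name are all assumptions for illustration.

```python
def extract_concurrent_instances(tagged_sentences, concurrent_dict, window=1):
    """Collect every occurrence of a dictionary-listed concurrent word.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    concurrent_dict:  maps a lowercased word to its set of candidate tags.
    Returns one record per occurrence with the corpus tag and its context.
    """
    instances = []
    for sent in tagged_sentences:
        for i, (word, tag) in enumerate(sent):
            key = word.lower()
            if key not in concurrent_dict:
                continue
            left = [w for w, _ in sent[max(0, i - window):i]]
            right = [w for w, _ in sent[i + 1:i + 1 + window]]
            instances.append({
                "word": word,
                "tag": tag,  # tag assigned in the annotated corpus
                "candidates": sorted(concurrent_dict[key]),
                "left": left,
                "right": right,
            })
    return instances


# Hypothetical toy input: "đá" as a verb (to kick) and as a noun (stone).
TAGGED = [
    [("Nó", "P"), ("đá", "V"), ("bóng", "N")],
    [("hòn", "Nc"), ("đá", "N"), ("to", "A")],
]
CONCURRENT = {"đá": {"N", "V"}}
instances = extract_concurrent_instances(TAGGED, CONCURRENT)
```

Each record keeps the neighbouring words so that context features for disambiguation can be derived from it later.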