Industry dictionary generating method and device
An industry dictionary technology, applied in the field of industry dictionary generation methods and devices, that addresses the time-consuming, laborious, and costly process of generating industry dictionaries, thereby saving production costs and improving efficiency.
Embodiment approach
[0033] Specifically, one implementation of step 12 includes:
[0034] Step 121: preprocess the document collection to obtain a collection of word sequences;
[0035] Here, preprocessing mainly refers to performing word segmentation on each document in the document collection, that is, splitting each document into a series of words. Unlike English, where the spaces between words serve as natural delimiters, Chinese text has no explicit delimiter between words. For the industry dictionary generation device to process Chinese documents automatically, the documents must therefore first be segmented into a series of words. The word segmentation may use a dictionary-based method or a statistics-based method. Since the accuracy of word segmentation has a certain impact on the quality of the final industr...
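The source does not say which dictionary-based algorithm is used. As one illustration only, the sketch below applies forward maximum matching, a common dictionary-based segmentation method, to turn each document into a word sequence as in step 121. The function names `forward_max_match` and `preprocess` and the toy dictionary are hypothetical; a real system would more likely use an established segmenter and a much larger lexicon.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    # The longest dictionary word bounds the match window (at least 1 char).
    max_len = max([len(w) for w in dictionary] + [1])
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def preprocess(documents: list[str], dictionary: set[str]) -> list[list[str]]:
    """Hypothetical step-121 helper: one word sequence per document."""
    return [forward_max_match(doc, dictionary) for doc in documents]


if __name__ == "__main__":
    toy_dict = {"行业", "词典", "生成", "方法"}
    print(preprocess(["行业词典生成方法", "行业的词典"], toy_dict))
```

Run as a script, this prints `[['行业', '词典', '生成', '方法'], ['行业', '的', '词典']]`; the unknown character 的 falls back to a single-character token. Greedy matching is simple and fast but cannot resolve ambiguous splits, which is one reason statistics-based segmentation is listed as an alternative.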
Specific embodiment approach
[0054] Further, a specific implementation of step 13, which obtains the relevant candidate terms, includes:
[0055] Step 131: the industry dictionary generation device uses a statistical algorithm, such as the chi-square test or information gain, to calculate the correlation between each candidate term and the industry category to which it belongs; the chi-square test is preferred.
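Information gain, the alternative named in step 131, scores a term by how much knowing whether a document contains it reduces uncertainty about the document's industry category: IG(t) = H(C) - P(t)·H(C | t) - P(not t)·H(C | not t). The sketch below rests on assumptions not stated in the source: each document is represented as a (set of words, industry label) pair, entropy is measured in bits, and the helper names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts: list[int]) -> float:
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(term: str, docs: list[tuple[set[str], str]]) -> float:
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t) over labelled docs."""
    n = len(docs)
    if n == 0:
        return 0.0
    # Category counts among documents with / without the term, and overall.
    with_term = Counter(label for words, label in docs if term in words)
    without_term = Counter(label for words, label in docs if term not in words)
    all_labels = Counter(label for _, label in docs)
    n_with = sum(with_term.values())
    return (entropy(list(all_labels.values()))
            - n_with / n * entropy(list(with_term.values()))
            - (n - n_with) / n * entropy(list(without_term.values())))


if __name__ == "__main__":
    docs = [({"钢材", "价格"}, "steel"), ({"钢材"}, "steel"),
            ({"股票"}, "finance"), ({"价格", "股票"}, "finance")]
    print(information_gain("钢材", docs))  # term separates the two categories
    print(information_gain("价格", docs))  # term appears equally in both
```

On this toy corpus the script prints 1.0 for 钢材, which removes all category uncertainty, and 0.0 for 价格, which carries no information about the category.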
[0056] The principle of the chi-square test is as follows: first assume that the two variables are independent (the null hypothesis), then measure the deviation between the observed values and the theoretical (expected) values to decide whether the hypothesis holds. If the deviation is very small, it is attributed to sampling error and the null hypothesis is accepted, that is, the two variables are considered independent; otherwise, the null hypothesis is rejected, that is, the two variables are considered correlated. On the issue of calculating the correlation between candidate te...
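The source text is truncated before its own formulation, so the following is an assumption rather than the patent's method. A common way to apply this test to term-category correlation in text classification builds a 2×2 table of document counts and computes χ² = N(AD - BC)² / ((A+B)(C+D)(A+C)(B+D)). The sketch uses that standard form, the same hypothetical (set of words, label) document representation, and an illustrative function name `chi_square`. A score of 0 means the observed counts match what independence predicts; larger scores mean a larger deviation.

```python
def chi_square(term: str, category: str,
               docs: list[tuple[set[str], str]]) -> float:
    """Chi-square statistic between a term and a category.

    Contingency table of document counts:
        A: in category, contains term     B: outside category, contains term
        C: in category, lacks term        D: outside category, lacks term
    """
    a = b = c = d = 0
    for words, label in docs:
        has_term = term in words
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A whole row or column is empty: the statistic is undefined,
        # so treat it as no evidence of correlation.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    docs = [({"钢材", "价格"}, "steel"), ({"钢材"}, "steel"),
            ({"股票"}, "finance"), ({"价格", "股票"}, "finance")]
    for term in ("钢材", "价格"):
        print(term, chi_square(term, "steel", docs))
```

On this toy corpus the script prints `钢材 4.0` and `价格 0.0`. To reproduce the accept/reject decision described above, the score can be compared against the chi-square critical value for one degree of freedom (about 3.84 at the 5% significance level); many feature-selection pipelines instead simply rank candidate terms by score.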