Method and system for constructing a classification corpus, and server equipped with the system
A method for constructing a classification corpus, applied in the field of natural language processing. It addresses problems such as the inability to classify data, and it reduces subjective human influence, shortens the time required, and reduces the degree of manual participation.
Examples
Embodiment 1
[0050] This embodiment provides a method for constructing a classification corpus. The method comprises the following steps:
[0051] Obtain the target data to be classified, and obtain category description data according to actual needs;
[0052] Calculate the text similarity between the target data to be classified and the determined category description data, and select the text similarity calculation method that achieves the maximum accuracy;
[0053] Use the selected text similarity calculation method to calculate the similarity between the target data to be classified and the determined category description data, and assign each item of target data to the category with which it has the maximum calculated similarity;
[0054] Perform deep matching on the classified target data and the determined category description data to obtain a first classifi...
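Steps [0051] to [0053] can be illustrated with a minimal sketch. It is not the patent's implementation: the two similarity measures (Jaccard and cosine over whitespace tokens), the small labeled sample used to measure accuracy, the example category descriptions, and all function names are assumptions made for illustration. The source does not say which similarity methods are compared or how accuracy is measured.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def jaccard(a, b):
    # Jaccard similarity over token sets (hypothetical candidate method).
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    # Cosine similarity over token counts (hypothetical candidate method).
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, categories, sim):
    # Assign the text to the category whose description is most similar.
    return max(categories, key=lambda name: sim(text, categories[name]))


def accuracy(sim, labeled, categories):
    # Fraction of labeled samples that the method classifies correctly.
    hits = sum(classify(text, categories, sim) == label for text, label in labeled)
    return hits / len(labeled)


def select_method(methods, labeled, categories):
    # Pick the name of the similarity method with the highest accuracy.
    return max(methods, key=lambda name: accuracy(methods[name], labeled, categories))


# Illustrative data only; not taken from the patent.
categories = {
    "software": "software engineer develop code programming",
    "sales": "sales representative sell customers market",
}
labeled = [
    ("senior software engineer", "software"),
    ("regional sales representative", "sales"),
]
methods = {"jaccard": jaccard, "cosine": cosine}

best = select_method(methods, labeled, categories)
label = classify("java programming engineer", categories, methods[best])
```

Keeping the candidate methods in a dictionary means another similarity measure can be added to the comparison without changing the selection or classification code.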
Embodiment 2
[0071] Refer to Figure 2, which is a schematic flowchart of a method for constructing a classification corpus in another embodiment. As shown in Figure 2, the construction method of the classification corpus specifically includes the following steps:
[0072] S1', obtain the target data to be classified through a web crawler system.
[0073] For example, the recruitment information of all domestic listed companies published on 51job, Zhaopin, ChinaHR and Liepin from August 2014 to August 2015 is obtained through the web crawler system. This recruitment information is the target data to be classified.
[0074] S2', define the classification system according to actual needs to obtain category description data. In this embodiment, the "Occupational Classification Code of the People's Republic of China" is used as the basis for classification. There ar...
Embodiment 3
[0086] This embodiment provides a system 1 for constructing a classification corpus. Refer to Figure 3, which is a schematic diagram of the principle structure of the system in an embodiment. As shown in Figure 3, the construction system 1 includes: a data acquisition module 10, a category acquisition module 11, a first processing module 12, a first classification module 13, a second processing module 14, a selection module 15, a second classification module 16, a third processing module 17, a determination module 18, a third classification module 19, and a testing module 20.
[0087] The data acquisition module 10 is used to acquire target data to be classified through a web crawler system.
[0088] For example, the recruitment information of all domestic listed companies published on 51job, Zhaopin, ChinaHR and Liepin from August 2014 to August 2015 was obtained through the web crawler...