Method and device for automatically constructing cross-language classification rules
An automatic cross-language rule construction technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc. It addresses the unbearable workload and high cost of rule construction, with the effect of reducing labor cost and workload.
Examples
Embodiment 1
[0046] The existing preliminary filtering rules mainly include two types. One is the D rule, which is used to filter out pages: when the features of a page match the rule, the page is filtered out and does not enter the subsequent classifier stage. The other is the C rule, which is used to retain pages: when the features of a page hit the rule, the page is retained and enters the subsequent classifier stage, and a page that does not hit any rule is filtered out. Either type of preliminary filtering rule can be regarded as a feature judgment expression. Each judgment condition in the expression is one of two kinds: whether a certain feature contains a given keyword, or whether the value of a certain feature is greater than (or less than) a certain value. The judgment conditions are joined by "and" or "or", and parentheses may be used in the expression to change the priority of the logical operations. In any case, a feature jud...
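The feature judgment expressions described above can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the page representation (a dictionary of features) and the helper names `contains`, `greater_than`, `all_of`, `any_of`, `passes_d_rules` and `passes_c_rules` are all assumptions introduced here.

```python
from typing import Callable, Dict, Iterable, Union

# Assumption: a page is modelled as a mapping from feature name to value.
Page = Dict[str, Union[str, float]]
Condition = Callable[[Page], bool]


def contains(feature: str, keyword: str) -> Condition:
    """Condition: the named feature contains the keyword."""
    return lambda page: keyword in str(page.get(feature, ""))


def greater_than(feature: str, threshold: float) -> Condition:
    """Condition: the named feature's numeric value exceeds the threshold."""
    return lambda page: float(page.get(feature, 0)) > threshold


def all_of(*conditions: Condition) -> Condition:
    """Join conditions with "and"; nesting plays the role of parentheses."""
    return lambda page: all(c(page) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Join conditions with "or"."""
    return lambda page: any(c(page) for c in conditions)


def passes_d_rules(page: Page, d_rules: Iterable[Condition]) -> bool:
    """D rules filter: a page matching any D rule is dropped."""
    return not any(rule(page) for rule in d_rules)


def passes_c_rules(page: Page, c_rules: Iterable[Condition]) -> bool:
    """C rules retain: a page is kept only if it hits at least one C rule."""
    return any(rule(page) for rule in c_rules)


if __name__ == "__main__":
    # Hypothetical rule: title contains "flights" and
    # (link_count > 10 or url contains "travel").
    rule = all_of(
        contains("title", "flights"),
        any_of(greater_than("link_count", 10), contains("url", "travel")),
    )
    page = {"title": "cheap flights to Paris", "link_count": 12}
    print(passes_c_rules(page, [rule]))  # the page hits the rule as a C rule
    print(passes_d_rules(page, [rule]))  # the same rule used as a D rule
```

Nesting `all_of` and `any_of` stands in for the parentheses of the written expression, so no separate operator-precedence handling is needed in this sketch.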
Embodiment 2
[0133] Figure 3 is a structural diagram of the device for automatically constructing classification rules across languages provided by Embodiment 2 of the present invention. As shown in Figure 3, the device may include: a rule transformation unit 300, a keyword determination unit 310, a candidate word determination unit 320, a candidate word selection unit 330 and a rule replacement unit 340.
[0134] The rule transformation unit 300 is configured to transform the classification rules of the source language to obtain one or more AND relationship rules, and provide each AND relationship rule as a current AND relationship rule to the keyword determination unit 310.
[0135] Specifically, by parsing the rule expressions and applying the distributive law of logical operations, the rules can first be transformed into disjunctive normal form, and the disjunctive normal form can then be split into several AND relationship rules.
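The transformation in [0135] can be illustrated with a minimal sketch, assuming a rule has already been parsed into nested tuples of the form `("and", ...)` or `("or", ...)` with plain strings as judgment conditions. The tuple representation and the function name `split_into_and_rules` are hypothetical, chosen here for illustration; the patent does not specify a data structure.

```python
from itertools import product
from typing import List, Tuple, Union

# Assumption: a rule is either a condition name (str) or a tuple
# ("and" | "or", operand, operand, ...). Negation is not modelled.
Rule = Union[str, tuple]


def split_into_and_rules(rule: Rule) -> List[Tuple[str, ...]]:
    """Convert a rule to disjunctive normal form and return its disjuncts.

    Each returned tuple is one AND relationship rule: all of its
    conditions must hold together.
    """
    if isinstance(rule, str):
        return [(rule,)]

    op, *operands = rule
    parts = [split_into_and_rules(operand) for operand in operands]

    if op == "or":
        # The disjuncts of an "or" are the disjuncts of its operands.
        return [conj for part in parts for conj in part]

    if op == "and":
        # Distributive law: pick one disjunct from every operand and
        # merge them into a single conjunction, for every combination.
        result = []
        for combo in product(*parts):
            merged: List[str] = []
            for conj in combo:
                for cond in conj:
                    if cond not in merged:  # drop repeated conditions
                        merged.append(cond)
            result.append(tuple(merged))
        return result

    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    # A and (B or C)  ->  (A and B) or (A and C)
    for and_rule in split_into_and_rules(("and", "A", ("or", "B", "C"))):
        print(" AND ".join(and_rule))
```

Each tuple returned corresponds to one AND relationship rule that could be handed on for further processing one at a time.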
[0136] The keyword determination unit 310 is...