Attribute weighting method based on decision tree and text classification method

A text classification and decision tree technology, applied in text database clustering/classification, unstructured text data retrieval, and special data processing applications, that addresses the high time complexity of existing attribute weighting methods and their inapplicability to high-dimensional text data.

Status: Inactive
Publication Date: 2015-08-05
CHINA UNIV OF GEOSCIENCES (WUHAN)
View PDF1 Cites 12 Cited by

AI Technical Summary

Problems solved by technology

The heuristic search process of the CFS attribute weighting method has excessively high time complexity, which makes it unsuitable for high-dimensional text data, where the number of dimensions can exceed ten thousand.

Embodiment Construction

[0054] The present invention is further described below with reference to an embodiment.

[0055] The present invention provides a decision-tree-based attribute weighting method comprising the following steps:

[0056] (1) For a known training document set D, any document d in D is represented as a word vector $d = \langle w_1, w_2, \ldots, w_m \rangle$, where $w_i$ is the i-th word in document d and m is the number of words in document d;

[0057] The information gain ratio of each attribute in the training document set D is calculated with the following formula:

[0058] $\mathrm{GainRatio}(D, w_i) = \dfrac{\mathrm{Gain}(D, w_i)}{\mathrm{Spl}\ldots}$
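The formula above is cut off in this excerpt, so the following is only a minimal sketch that assumes the standard C4.5 definition of gain ratio (information gain divided by split information). It also assumes each word is treated as a binary attribute (present or absent in a document), which the excerpt does not state. The names `entropy` and `gain_ratio` are illustrative and do not come from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(docs, labels, word):
    """Gain ratio of the binary attribute 'word occurs in the document'.

    docs   : list of token lists, one per document
    labels : list of class labels, one per document
    Assumption: standard C4.5 gain ratio = information gain / split information.
    """
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    # A word found in every document, or in none, cannot split the set.
    if not present or not absent:
        return 0.0
    total = len(labels)
    parts = (present, absent)
    gain = entropy(labels) - sum(len(p) / total * entropy(p) for p in parts)
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in parts)
    return gain / split_info


# Toy training set (made up for illustration).
docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "match"]]
labels = ["sport", "sport", "politics", "politics"]
ratios = {w: gain_ratio(docs, labels, w) for w in ("goal", "match", "team")}
```

In this toy set, "goal" separates the two classes perfectly while "match" carries no class information, so a tree built with this criterion would test "goal" before "match".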

Abstract

The invention provides a decision-tree-based attribute weighting method. The method first builds a decision tree using the information gain ratio criterion and then computes a weight for each attribute from the minimum depth at which that attribute is tested in the tree. The invention also provides three text classification methods that use this attribute weighting: a multinomial naive Bayes method, a complementary naive Bayes method, and a combined multinomial and complementary naive Bayes method. The attribute weighting method improves the classification accuracy of the original naive Bayes text classifiers while preserving the simplicity and time complexity of the original naive Bayes algorithm.
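The depth-based weighting step can be sketched as follows. This is a hedged illustration, not the patented formula: the excerpt does not give the mapping from depth to weight, so the function `1 + 1/sqrt(depth)` used here is a placeholder assumption, as are the nested-dictionary tree layout and the names `min_test_depths` and `depth_weights`. Untested attributes are assumed to keep a weight of 1.

```python
import math
from collections import deque


def min_test_depths(tree):
    """Map each attribute to the shallowest depth at which the tree tests it (root = 1).

    Hypothetical tree layout: internal nodes are {"attr": name, "children": [...]},
    leaves are {"label": class_name}.
    """
    depths = {}
    queue = deque([(tree, 1)])
    while queue:
        node, depth = queue.popleft()
        attr = node.get("attr")
        if attr is None:  # leaf node: nothing is tested here
            continue
        # Breadth-first order guarantees the first visit is the shallowest test.
        depths.setdefault(attr, depth)
        for child in node["children"]:
            queue.append((child, depth + 1))
    return depths


def depth_weights(tree, vocabulary):
    """Assign larger weights to attributes tested nearer the root.

    Assumption: weight = 1 + 1/sqrt(min depth) for tested attributes, 1 otherwise.
    """
    depths = min_test_depths(tree)
    return {
        w: 1.0 + 1.0 / math.sqrt(depths[w]) if w in depths else 1.0
        for w in vocabulary
    }


# Hand-built toy tree (made up for illustration).
tree = {
    "attr": "goal",
    "children": [
        {"label": "sport"},
        {
            "attr": "vote",
            "children": [
                {"label": "politics"},
                {"attr": "team", "children": [{"label": "sport"}, {"label": "politics"}]},
            ],
        },
    ],
}
weights = depth_weights(tree, ["goal", "vote", "team", "match"])
```

Any decreasing function of depth would fit the description equally well; the point of the sketch is only that attributes tested closer to the root receive more weight.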

Description

Technical field

[0001] The invention relates to a decision-tree-based attribute weighting method and a text classification method, and belongs to the technical field of data mining classification in artificial intelligence.

Background art

[0002] The naive Bayes text classifier is often used for text classification problems because of its simplicity and efficiency, but its attribute independence assumption, while making it efficient, degrades its classification performance to some extent. Given a document d represented as a word vector $\langle w_1, w_2, \ldots, w_m \rangle$, Multinomial Naive Bayes (MNB), Complementary Naive Bayes (CNB), and the combined model of both (OVA) classify document d using Equations 1, 2, and 3, respectively.

[0003] $c(d) = \arg\max_{c \in C} [\, \log \ldots$
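Because Equation 1 is cut off in this excerpt, the following is only a sketch of a textbook multinomial naive Bayes classifier. It assumes Laplace smoothing and assumes that an attribute weight simply multiplies the word's frequency term in the log-likelihood sum. The names `train_mnb` and `classify` and the `weights` parameter are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels):
    """Estimate log priors and log word probabilities (Laplace smoothing assumed)."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    log_prior, log_cond = {}, {}
    for c in class_counts:
        log_prior[c] = math.log((class_counts[c] + 1) / (len(docs) + len(class_counts)))
        total = sum(word_counts[c].values())
        log_cond[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_cond


def classify(doc, log_prior, log_cond, weights=None):
    """Return the class maximizing log P(c) + sum_i W_i * f_i * log P(w_i | c).

    weights maps a word to its attribute weight W_i; missing words default to 1,
    which reduces to unweighted multinomial naive Bayes.
    """
    weights = weights or {}
    freqs = Counter(doc)

    def score(c):
        return log_prior[c] + sum(
            weights.get(w, 1.0) * f * log_cond[c][w]
            for w, f in freqs.items()
            if w in log_cond[c]  # skip words never seen in training
        )

    return max(log_prior, key=score)


# Toy training set (made up for illustration).
docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "match"]]
labels = ["sport", "sport", "politics", "politics"]
log_prior, log_cond = train_mnb(docs, labels)
predicted = classify(["goal", "team"], log_prior, log_cond, weights={"goal": 2.0})
```

Keeping the weights as a plain multiplier leaves training unchanged and adds only one multiplication per word at classification time, which is consistent with the stated goal of preserving the simplicity of naive Bayes.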

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventors: 蒋良孝 (Jiang Liangxiao), 张伦干 (Zhang Lungan), 李超群 (Li Chaoqun)
Owner: CHINA UNIV OF GEOSCIENCES (WUHAN)