The invention discloses a web-based text classification mining
system and a web-based text classification mining method. The
system mainly comprises a text preprocessing module, a word segmentation module and a classification algorithm module. The text preprocessing module automatically screens specific information from the texts to be classified, preprocesses that information and filters out irrelevant information so that each text is represented effectively. The word segmentation module segments the texts into words, finds the attribute words of each text and prepares for the selection of feature words. The classification algorithm module carries out feature selection to obtain an optimal feature subset, or computes the corresponding probabilities from the data provided by a training-result file, compares these probabilities to obtain the class with the maximum probability, draws a conclusion and finally stores the conclusion in the file. By using hypertext markup language (HTML) tag weights, the system overcomes the shortcoming of the conditional independence assumption of the naive Bayes algorithm, improves the classifier and can improve the recall and precision of data mining.
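The preprocessing and word segmentation stages can be sketched with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions: "irrelevant information" is taken to mean script and style content, and each word is labeled with the HTML tag it appears in so that a later stage can weight it. The tag sets, the stop-word list and the names `TagTextExtractor`, `segment` and `preprocess` are hypothetical. `segment` is a plain regex tokenizer for English text and is not a substitute for a dictionary-based segmenter on Chinese pages.

```python
import re
from html.parser import HTMLParser

# Hypothetical choice: the contents of these tags are treated as irrelevant.
SKIP_TAGS = {"script", "style", "noscript"}
# Hypothetical choice: text inside these tags is labeled so it can be weighted later.
TRACKED_TAGS = {"title", "h1", "h2", "h3", "b", "strong"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}
# A tiny illustrative stop-word list (English only, hypothetical).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


class TagTextExtractor(HTMLParser):
    """Collect visible text chunks, each labeled with its innermost tracked tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open (non-void) tags
        self.chunks = []  # (tag, text); tag is "body" when no tracked tag is open

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in SKIP_TAGS for t in self.stack):
            return  # drop whitespace and irrelevant (script/style) content
        tag = next((t for t in reversed(self.stack) if t in TRACKED_TAGS), "body")
        self.chunks.append((tag, text))


def segment(text):
    """Regex word segmentation: a stand-in for a real (e.g. Chinese) segmenter."""
    return [w for w in TOKEN_RE.findall(text.lower()) if w not in STOPWORDS]


def preprocess(html):
    """Turn one web page into a list of (tag, word) pairs."""
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    return [(tag, w) for tag, text in parser.chunks for w in segment(text)]
```

Keeping the tag next to every word, rather than flattening the page to plain text, is what makes tag-based weighting possible in the classification stage.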
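The abstract does not name the criterion used to obtain the optimal feature subset. Chi-square scoring is one common choice and is swapped in here purely as an assumption. `chi_square_scores` and `select_features` are hypothetical helpers that work on the segmented word lists of labeled training documents, using document presence (not counts) in a 2x2 contingency table per term and class.

```python
from collections import defaultdict


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over the classes.

    docs: list of token lists; labels: list of class names (same length).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    classes = set(labels)
    class_size = {c: labels.count(c) for c in classes}
    scores = {}
    for term in vocab:
        # Number of documents of each class that contain the term.
        has = defaultdict(int)
        for s, c in zip(doc_sets, labels):
            if term in s:
                has[c] += 1
        df = sum(has.values())
        best = 0.0
        for c in classes:
            a = has[c]                 # term present, class c
            b = df - a                 # term present, other classes
            cc = class_size[c] - a     # term absent, class c
            d = n - df - cc            # term absent, other classes
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - b * cc) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])
```

Taking the maximum over classes favors terms that are strongly associated with any single class; averaging over classes is an equally plausible variant.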
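The classification step can be sketched as a multinomial naive Bayes classifier in which each word occurrence is weighted by the HTML tag it appeared in, the training result is kept as a JSON file, and the class with the maximum (log) probability is chosen. The abstract does not give the weighting formula, so this is only one possible reading: the `TAG_WEIGHTS` values, the Laplace smoothing parameter `alpha`, and the functions `train`, `classify`, `save_model` and `load_model` are all hypothetical. Inputs are lists of `(tag, word)` pairs.

```python
import json
import math
from collections import defaultdict

# Hypothetical weights: a word in a more prominent HTML tag counts as more evidence.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "b": 1.2, "strong": 1.2, "body": 1.0}


def train(samples, vocab=None, alpha=1.0):
    """Fit a tag-weighted multinomial naive Bayes model (hypothetical scheme).

    samples: list of (pairs, label), pairs being (tag, word) tuples.
    vocab: optional feature subset; words outside it are ignored.
    Returns a JSON-serialisable dict of log priors and smoothed log likelihoods.
    """
    doc_count = defaultdict(int)
    word_mass = defaultdict(lambda: defaultdict(float))  # label -> word -> weighted count
    for pairs, label in samples:
        doc_count[label] += 1
        for tag, word in pairs:
            if vocab is None or word in vocab:
                word_mass[label][word] += TAG_WEIGHTS.get(tag, 1.0)
    if vocab is not None:
        words = sorted(vocab)
    else:
        words = sorted({w for counts in word_mass.values() for w in counts})
    total_docs = sum(doc_count.values())
    model = {"vocab": words, "prior": {}, "likelihood": {}}
    for label, n_docs in doc_count.items():
        mass = word_mass[label]
        denom = sum(mass.values()) + alpha * len(words)  # Laplace smoothing
        model["prior"][label] = math.log(n_docs / total_docs)
        model["likelihood"][label] = {
            w: math.log((mass.get(w, 0.0) + alpha) / denom) for w in words
        }
    return model


def classify(model, pairs):
    """Return (best_label, scores); scores are unnormalised log posteriors."""
    vocab = set(model["vocab"])
    scores = {}
    for label, log_prior in model["prior"].items():
        score = log_prior
        for tag, word in pairs:
            if word in vocab:  # words unseen in training are ignored
                score += TAG_WEIGHTS.get(tag, 1.0) * model["likelihood"][label][word]
        scores[label] = score
    best = max(scores, key=scores.get)  # class with the maximum probability
    return best, scores


def save_model(model, path):
    """Store the training result as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    """Load a training result previously written by save_model."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Working in log space avoids underflow when many word probabilities are multiplied; since the logarithm is monotonic, the class with the largest log score is also the class with the maximum probability.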