Method for automatically classifying text documents by utilizing an ontology

An automatic text document classification technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as the cumbersome collection of training documents, the lack of consideration of semantic relationships between words, and classification accuracy that is difficult to improve. Its effects are to save the training and learning process, improve accuracy, and enrich concept content.

Status: Inactive
Publication Date: 2011-01-12
Owner: JIANGSU T Y ENVIRONMENTAL ENERGY +1
Cites: 4. Cited by: 40.

AI Technical Summary

Problems solved by technology

[0005] 1) Training a classifier with traditional machine learning methods requires manually collecting a large set of already classified text documents, which is very cumbersome. Moreover, each different set of classification categories requires manually collecting a different set of text documents to train the classifier.
[0006] 2) Traditional machine learning methods do not consider the semantic relationships between words, so it is difficult to improve classification accuracy.


Embodiment Construction

[0038] The present invention is now further described with reference to the accompanying drawings:

[0039] We have implemented the proposed method for classifying text documents using an ontology in the Java and Perl languages. The specific implementation process is as follows:

[0040] The text document classification method using an ontology is divided into the following four steps:

[0041] Step 1: Construct the keyword set of each text document. The KEA algorithm is used to extract a weighted keyword set for each text document in the collection to be classified. Specifically, for each text document $d_i$ in the text document collection $D = \{d_1, d_2, \ldots, d_{|D|}\}$ (where $|D|$ is the number of text documents in $D$), first, using naive Bayesian estimation, by considering the frequency tf×idf of words (existing words) appearing in text documents, the average position Occurrence of wo...
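The tf×idf feature mentioned in Step 1 can be illustrated with a minimal sketch. This is not the KEA algorithm itself and not the patent's Java/Perl implementation: it only computes a tf×idf weight per word and keeps the top-weighted words as a keyword set. The naive Bayesian step and the position-based feature, whose description is truncated above, are deliberately left out. The function names `tokenize`, `tfidf_weights`, and `weighted_keywords` are hypothetical, and Python is used only for brevity.

```python
import math
import re


def tokenize(text):
    # Assumption: lowercase alphabetic tokens are good enough for a sketch.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(docs):
    """Return one {word: tf*idf} dict per document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word occurs.
    doc_freq = {}
    for tokens in tokenized:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    weights = []
    for tokens in tokenized:
        counts = {}
        for word in tokens:
            counts[word] = counts.get(word, 0) + 1
        doc_weights = {}
        for word, count in counts.items():
            tf = count / len(tokens)                  # relative term frequency
            idf = math.log(n_docs / doc_freq[word])   # rarer words score higher
            doc_weights[word] = tf * idf
        weights.append(doc_weights)
    return weights


def weighted_keywords(docs, k=3):
    """Keep the k highest-weighted words of each document (ties broken alphabetically)."""
    result = []
    for doc_weights in tfidf_weights(docs):
        ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append(dict(ranked[:k]))
    return result


docs = ["ontology ontology text", "text document"]
keyword_sets = weighted_keywords(docs, k=1)
```

In this toy collection, "text" occurs in every document, so its idf and therefore its weight is zero, while "ontology" becomes the single keyword of the first document.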

Abstract

The invention relates to a method for automatically classifying text documents by utilizing an ontology, comprising the following steps: first, the feature information of a text document is expressed as a weighted keyword set; then, the feature information of a classification directory is expressed by an ontology that has undergone ontology disambiguation and ontology expansion; the ontology is transformed into a weighted word-sense set by analyzing its structural characteristics; finally, the semantic similarity between the keyword set of the text document and the weighted word-sense set of the ontology is calculated with the Earth Mover's Distance method, from which the similarity between the text document and the classification directory is calculated; the text documents are then classified and ranked according to that similarity. With the method of the invention, text documents can be classified automatically and the accuracy of text document classification can be improved.
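The Earth Mover's Distance between two weighted sets can be sketched as a small transportation problem. The following is an illustrative sketch only, not the patent's implementation: it normalizes both weight sets to total mass 1 and solves the transport with successive shortest paths (Bellman-Ford) in pure Python. The function name `earth_movers_distance` and the ground-distance matrix `dist` are assumptions; how the patent measures the distance between a keyword and a word sense, and how it turns the result into a similarity value, is not stated in this excerpt.

```python
def earth_movers_distance(supply, demand, dist):
    """EMD between two weighted sets; dist[i][j] is a hypothetical ground distance."""
    n, m = len(supply), len(demand)
    total_s, total_d = sum(supply), sum(demand)
    supply = [w / total_s for w in supply]   # normalize both sides to mass 1
    demand = [w / total_d for w in demand]

    src, snk, size = n + m, n + m + 1, n + m + 2
    edges = []                               # [target, capacity, cost]; k ^ 1 is the reverse edge
    graph = [[] for _ in range(size)]

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0.0, -cost])

    for i in range(n):
        add_edge(src, i, supply[i], 0.0)
    for j in range(m):
        add_edge(n + j, snk, demand[j], 0.0)
    for i in range(n):
        for j in range(m):
            add_edge(i, n + j, float("inf"), dist[i][j])

    eps, inf = 1e-12, float("inf")
    flow, cost = 0.0, 0.0
    while flow < 1.0 - 1e-9:
        # Bellman-Ford: cheapest residual path from source to sink.
        best = [inf] * size
        prev_edge = [-1] * size
        best[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if best[u] == inf:
                    continue
                for k in graph[u]:
                    v, cap, c = edges[k]
                    if cap > eps and best[u] + c < best[v] - eps:
                        best[v], prev_edge[v], changed = best[u] + c, k, True
            if not changed:
                break
        if best[snk] == inf:
            break
        # Find the bottleneck along the path, then push that much mass.
        push, v = 1.0 - flow, snk
        while v != src:
            k = prev_edge[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = snk
        while v != src:
            k = prev_edge[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        flow += push
        cost += push * best[snk]
    return cost                              # total mass is 1, so cost is the EMD


# Hypothetical example: one document keyword against two word senses.
example = earth_movers_distance([1.0], [0.5, 0.5], [[2.0, 4.0]])
```

A smaller distance means the keyword set and the word-sense set are closer, so a ranking by similarity would place the classification directory with the smallest distance first. For larger sets, a linear-programming or dedicated optimal-transport solver would likely be a better choice than this simple loop.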

Description

Technical field

[0001] The invention relates to a method for automatically classifying text documents by using an ontology, and belongs to the fields of computer information processing, information retrieval and the like. It is suitable for fast and accurate automatic classification of massive numbers of network text documents.

Background technique

[0002] Text document classification has long been a focus of attention because it improves the efficiency of organizing text documents and better supports users in browsing and finding information. At first, text documents were classified manually, but as text document resources kept growing, manual classification became impossible, so automatic text document classification technology has become the focus of research.

[0003] Text document classification is generally divided into three stages: first, the feature information of the text document and the classification directory is extracted; then, the classifier calculates t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventors: 郭雷, 方俊
Owner: JIANGSU T Y ENVIRONMENTAL ENERGY