Method for automatically classifying text documents by utilizing ontology

A text document automatic classification technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses several problems: classification accuracy is difficult to improve, collecting training documents manually is cumbersome, and semantic relationships between words are not considered. Its effects are eliminating the training and learning process, improving accuracy, and enriching concept content.

Inactive Publication Date: 2011-01-12

JIANGSU T Y ENVIRONMENTAL ENERGY +1

Cites: 4 | Cited by: 40

AI Technical Summary

Problems solved by technology

[0005] 1) Training a classifier with traditional machine learning methods requires manually collecting a large set of pre-classified text documents. This is very cumbersome, and a different document set must be collected manually to train the classifier for each set of classification categories;

[0006] 2) Traditional machine learning methods do not consider the semantic relationships between words, which makes it difficult to improve classification accuracy.

Embodiment Construction

[0038] The present invention is further described below with reference to the accompanying drawing:

[0039] The method for classifying text documents using an ontology proposed by the present invention has been implemented in Java and Perl. The specific implementation process is as follows:

[0040] The ontology-based text document classification method consists of the following four steps:

[0041] Step 1: Construct the keyword set of each text document. The KEA algorithm is used to extract a weighted keyword set for each text document in the collection to be classified. Specifically, for each text document $d_i$ in the text document collection $D = \{d_1, d_2, \ldots, d_{|D|}\}$ (where $|D|$ denotes the number of text documents in $D$), Naive Bayes estimation is first applied, considering the frequency tf×idf of words (existing words) appearing in the text document and the average position Occurrence of wo...
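The Step 1 description is truncated above. The following Python sketch therefore only illustrates the general idea under stated assumptions. It computes the two features the text names (the tf×idf of a word and its average relative position in the document) and combines them with a Naive Bayes posterior, in the style of KEA. The discretization cut points, the per-bin likelihood tables, the prior, and all function names (`kea_features`, `keyword_weight`, `weighted_keywords`) are hypothetical placeholders. In KEA such tables are learned from documents with known keyphrases, and the real KEA also handles multi-word phrases, stemming, and stop words.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphabetic tokens; stemming and stop-word removal are omitted.
    return re.findall(r"[a-z]+", text.lower())

def kea_features(doc, corpus):
    """Return {word: (tfidf, avg_relative_position)} for one tokenized document.

    `corpus` is a list of token sets and must include the set of `doc`'s tokens,
    otherwise a word's document frequency could be zero."""
    n = len(doc)
    counts = Counter(doc)
    positions = {}
    for i, word in enumerate(doc):
        positions.setdefault(word, []).append(i / n)
    feats = {}
    for word, count in counts.items():
        df = sum(1 for d in corpus if word in d)
        tfidf = (count / n) * math.log(len(corpus) / df)
        feats[word] = (tfidf, sum(positions[word]) / len(positions[word]))
    return feats

# Hypothetical cut points and per-bin likelihoods P(bin | keyword) and
# P(bin | not keyword); a trained model would estimate these from data.
TFIDF_CUTS = [0.02, 0.05]
POS_CUTS = [0.3, 0.7]
P_TFIDF = {True: [0.1, 0.3, 0.6], False: [0.6, 0.3, 0.1]}
P_POS = {True: [0.5, 0.3, 0.2], False: [0.3, 0.35, 0.35]}
PRIOR_KEY = 0.1  # hypothetical prior probability that a word is a keyword

def bin_index(value, cuts):
    # Number of cut points the value exceeds, i.e. its discretized bin.
    return sum(value > c for c in cuts)

def keyword_weight(tfidf, pos):
    """Naive Bayes posterior P(keyword | features), assuming independent features."""
    t, p = bin_index(tfidf, TFIDF_CUTS), bin_index(pos, POS_CUTS)
    yes = PRIOR_KEY * P_TFIDF[True][t] * P_POS[True][p]
    no = (1 - PRIOR_KEY) * P_TFIDF[False][t] * P_POS[False][p]
    return yes / (yes + no)

def weighted_keywords(doc, corpus, top_k=5):
    """Top-k (word, weight) pairs for one document, by descending weight."""
    scored = {w: keyword_weight(*f) for w, f in kea_features(doc, corpus).items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For example, with `tokens = [tokenize(t) for t in texts]` and `corpus = [set(t) for t in tokens]`, the call `weighted_keywords(tokens[0], corpus)` returns up to five `(word, weight)` pairs for the first document. These pairs play the role of the weighted keyword set that the later steps compare against the ontology.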

Abstract

The invention relates to a method for automatically classifying text documents using an ontology, comprising the following steps. First, the feature information of a text document is represented by a weighted keyword set. Next, the feature information of a classification directory is represented by an ontology that has undergone ontology disambiguation and ontology expansion, and this ontology is transformed into a weighted word-sense set by analyzing its structural characteristics. Then, the Earth Mover's Distance method is used to calculate the semantic similarity between the keyword set of the text document and the weighted word-sense set of the ontology, from which the similarity between the text document and the classification directory is further calculated. Finally, text documents are classified and ranked according to their similarity to the classification directory. The method classifies text documents automatically and improves classification accuracy.
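The similarity and classification steps in the abstract can be sketched as follows. This is a minimal Python illustration, not the patented implementation. Both the weighted keyword set and the weighted word-sense set are normalized to total weight 1. The Earth Mover's Distance is then computed as a min-cost transportation problem, solved with successive shortest paths and Bellman-Ford. Finally, the distance is converted to a similarity with the hypothetical mapping 1 / (1 + EMD). The ground distance between a keyword and a word sense, the example data, and the helper names `emd`, `similarity`, `rank_documents`, and `classify` are assumptions; the source does not specify them.

```python
INF = float("inf")

def emd(weights_a, weights_b, dist, eps=1e-12):
    """Earth Mover's Distance between two weighted sets (weights normalized to 1).

    dist(i, j) is the ground distance between item i of A and item j of B."""
    a = [w / sum(weights_a) for w in weights_a]
    b = [w / sum(weights_b) for w in weights_b]
    m, n = len(a), len(b)
    # Nodes: 0 = source, 1..m = items of A, m+1..m+n = items of B, m+n+1 = sink.
    size, src, snk = m + n + 2, 0, m + n + 1
    graph = [[] for _ in range(size)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(src, 1 + i, a[i], 0.0)
        for j in range(n):
            add_edge(1 + i, 1 + m + j, INF, dist(i, j))
    for j in range(n):
        add_edge(1 + m + j, snk, b[j], 0.0)

    total_cost, remaining = 0.0, 1.0
    while remaining > eps:
        # Bellman-Ford shortest path; residual reverse edges have negative cost.
        best, prev = [INF] * size, [None] * size
        best[src] = 0.0
        for _ in range(size - 1):
            updated = False
            for u in range(size):
                if best[u] == INF:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > eps and best[u] + cost < best[v] - eps:
                        best[v], prev[v], updated = best[u] + cost, (u, k), True
            if not updated:
                break
        if prev[snk] is None:
            break
        # Push the bottleneck amount of earth along the cheapest path.
        push, v = remaining, snk
        while v != src:
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = snk
        while v != src:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_cost += push * best[snk]
        remaining -= push
    return total_cost  # total flow is 1, so the total cost equals the EMD

def similarity(keywords, senses, ground):
    """Hypothetical mapping of EMD to a similarity in (0, 1]; inputs are non-empty."""
    words, concepts = list(keywords), list(senses)
    d = emd([keywords[w] for w in words], [senses[c] for c in concepts],
            lambda i, j: ground(words[i], concepts[j]))
    return 1.0 / (1.0 + d)

def rank_documents(docs, senses, ground):
    """Sort documents by descending similarity to one category's word-sense set."""
    scores = {name: similarity(kw, senses, ground) for name, kw in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def classify(keywords, categories, ground):
    """Assign a document to the category whose word-sense set is most similar."""
    return max(categories, key=lambda c: similarity(keywords, categories[c], ground))
```

As a usage example, a trivial ground distance such as `lambda w, c: 0.0 if w == c else 1.0` treats matching terms as free to transport and all others as maximally distant. A realistic system would instead derive the ground distance from semantic relations between words and concepts. Weights are normalized so the transportation problem is balanced. The min-cost-flow formulation was chosen only to keep the sketch dependency-free.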

Description

Technical Field

[0001] The invention relates to a method for automatically classifying text documents using an ontology, and belongs to fields such as computer information processing and information retrieval. It is suitable for fast and accurate automatic classification of massive numbers of web text documents.

Background Art

[0002] Text document classification has long been a focus of attention because it improves the efficiency of organizing text documents and better supports users in browsing and finding information. At first, text documents were classified manually. As text document resources grew, however, manual classification became infeasible, and automatic text document classification became a focus of research.

[0003] Text document classification generally proceeds in three stages: first, the feature information of the text document and of the classification directory is extracted; then, the classifier calculates t...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30; G06F17/27

Inventors: 郭雷 (Guo Lei), 方俊 (Fang Jun)

Owner: JIANGSU T Y ENVIRONMENTAL ENERGY