System and method for automatically classifying text

An automatic text classification technology, applied in the field of text classification, that addresses inaccurate retrieved information, slow execution speed, and limited range of coverage.

Inactive Publication Date: 2006-06-29
CONSONA CRM A WASHINGTON CORP
49 Cites, 138 Cited by

AI Technical Summary

Benefits of technology

"The present invention provides a method for simultaneously classifying documents into multiple categories. This is done by associating features with each category and document, and then using a mathematical formula to calculate a score for each document based on the features it contains. The score is used to determine which category the document should be placed in. The method can also be used to manually or automatically associate features with categories. The technical effect of this invention is to improve the efficiency and accuracy of document classification."
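The "mathematical formula" mentioned above is described in the abstract as a sum of products of feature counts and feature weights. A minimal way to write that down, assuming the symbols below (they are illustrative notation, not taken from the patent text), is:

```latex
\[
S(c, d) \;=\; \sum_{f \in F_c} n_{f,d} \, w_{f,c}
\]
```

Here \(F_c\) is assumed to denote the set of features associated with category \(c\), \(n_{f,d}\) the number of times feature \(f\) appears in document \(d\), and \(w_{f,c}\) the weight expressing the degree of association between \(f\) and \(c\). Under this reading, document \(d\) is placed in category \(c\) when \(S(c, d)\) exceeds a predetermined threshold.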

Problems solved by technology

While this approach is efficient, it suffers from problems relating to the inaccuracy of the retrieved information.
While deep linguistic processing improves accuracy based upon an analysis of the meaning of input text, speed of execution is slow and range of coverage is limited.
This is especially problematic when such techniques are applied to large volumes of text.
Text classification systems which rely upon rule-based techniques also suffer from a number of drawbacks.
The most significant drawback is that such systems require a significant amount of knowledge engineering to develop a working system appropriate for a desired text classification application.
It becomes more difficult to develop an application using rule-based systems because individual rules are time-consuming to prepare, and require complex interactions.
Exacerbating the problem is the fact that in most applications, training data is hard to locate, often does not provide adequate coverage of the categories, and is difficult and time-consuming for people to categorize, requiring manual effort by experts in the subject area (who are usually scarce and expensive resources).
Further, badly categorized training data or correctly categorized training data with extraneous or unusual vocabulary degrades the statistical model, causing the resulting classifier to perform poorly.
Of the prior art systems that utilize training data, most do not have the capability to interactively take advantage of human knowledge.
History has shown that a person will often know what results from sound training data, what results from poor training data, and what may not be adequately expressed in the training data.
Those prior art systems that do utilize user input do not allow users to directly affect the quantified relationship between vocabulary features and classification categories, but simply allow the user to change the training data.
These classifiers cannot interact; consequently, one cannot benefit from the other.

Embodiment Construction

[0023] In the following detailed description, reference is made to the accompanying drawings that form a part thereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. This embodiment is described in sufficient detail to enable those skilled in the art to practice the invention and it is to be understood that other embodiments may be utilized and that algorithmic changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limited sense.

[0024] Turning first to the nomenclature of the specification, the detailed description which follows is represented largely in terms of processes and symbolic representations of operations performed by conventional computer components, including a central processing unit (CPU), memory storage devices for the CPU, and connected pixel-oriented display devices. These operations include the manipulation of ...

Abstract

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
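The steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the category names, feature weights, tokenizer regex, threshold value, and function names (`tokenize`, `category_scores`, `classify`) are all hypothetical, and the grouping of categories into perspectives is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical category -> {feature: weight} table. The weights stand in for the
# "degree of association" between a feature and a category; values are made up.
CATEGORY_WEIGHTS = {
    "printers": {"printer": 2.0, "toner": 1.5, "paper": 0.5},
    "networking": {"router": 2.0, "cable": 1.0, "paper": 0.1},
}


def tokenize(text):
    """Parse a document into unique lowercase tokens with occurrence counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def category_scores(counts, category_weights):
    """Score each category as the sum of (feature count * feature weight)."""
    return {
        category: sum(counts[feature] * weight for feature, weight in weights.items())
        for category, weights in category_weights.items()
    }


def classify(text, category_weights, threshold):
    """Return (category, score) pairs above the threshold, highest score first."""
    scores = category_scores(tokenize(text), category_weights)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(category, score) for category, score in ranked if score > threshold]


if __name__ == "__main__":
    doc = "The printer is out of toner and paper. Replace the toner."
    print(classify(doc, CATEGORY_WEIGHTS, threshold=1.0))
```

Because every category is scored independently, a single document can clear the threshold for several categories at once, which matches the summary's point about classifying into multiple categories simultaneously.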

Description

RELATED APPLICATIONS

[0001] The following application is relied upon and hereby incorporated by reference in this application:

[0002] U.S. Provisional Application No. 60/206,975, entitled “System and Method for Providing a Text Classifier,” bearing attorney docket no. 07569-6004-00000.

TECHNICAL FIELD

[0003] The present invention is directed to text classification and, more particularly, to a computer-based system for text classification that provides a resource that can be utilized by external applications for text classification.

BACKGROUND

[0004] Throughout the entire period of recorded history, people have memorialized their thoughts, actions, hopes and dreams on a daily basis. Prior to the latter part of the 20th century, this recorded history was typically written for exchange between human beings without any expectation that the information would be stored in a machine or otherwise converted into a machine-readable format. At that time, archives of this information resided in ...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/30, G06F15/18, G06F17/00, G06F17/21, G06F40/20
CPC: G06F17/218, G06F17/27, G06F17/30616, G06F17/30707, G06F16/353, G06F16/313, G06F40/117, G06F40/20
Inventor: UKRAINCZYK, IGOR; COPPERMAN, MAX; HUFFMAN, SCOTT B.
Owner: CONSONA CRM A WASHINGTON CORP