Document classification method and document classification device

A document classification method and device in the field of document classification technology. The approach addresses the high labor costs and the degraded classification accuracy and effectiveness of existing systems, and improves adaptability and classification performance.

Active Publication Date: 2014-02-12
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

The disadvantage of this method is that users must fully understand the classification system in order to provide accurate classification information, which increases their operating costs. In practice, some users do not provide classification information carefully, and the system cannot detect this, so the accuracy of the classification system suffers.
In practical applications, labor costs often limit the number of labeled samples that can be selected. The extracted classification features are then inaccurate, which degrades classification to some extent.
In addition, because information is updated rapidly, the labeled corpus must be updated as well to maintain classification quality, and continuously obtaining labeled corpus data also requires substantial labor.


Embodiment Construction

[0053] First, a document classification method provided by an embodiment of the present invention is described. The method may include the following steps:

[0054] extracting the feature text of the target document, and using the feature text to form a search condition;

[0055] performing a search using the search condition to obtain a corresponding search result;

[0056] calculating the text similarity between the target document and the search result;

[0057] obtaining the classification result of the target document according to the calculated text similarity and the classification information of the search result.
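
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `CORPUS`, the frequency-based feature extraction, the term-overlap search, the cosine similarity measure, and the similarity-weighted vote are all hypothetical choices made for the example, since the text here does not specify them.

```python
import math
from collections import Counter, defaultdict

# Hypothetical corpus of documents that already carry classification information.
CORPUS = [
    ("stock market shares trading investors profit", "finance"),
    ("bank interest rates loan investors market", "finance"),
    ("football match goal team league season", "sports"),
    ("tennis tournament player match season win", "sports"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def extract_feature_text(text, top_k=10):
    """Step 1: take the most frequent tokens as the feature text."""
    counts = Counter(tokenize(text))
    return [token for token, _ in counts.most_common(top_k)]


def search(query_terms, corpus, limit=3):
    """Step 2: return classified documents that share a term with the query."""
    terms = set(query_terms)
    hits = [(text, cat) for text, cat in corpus if terms & set(tokenize(text))]
    return hits[:limit]


def cosine_similarity(a, b):
    """Step 3: cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(target, corpus):
    """Step 4: similarity-weighted vote over the search results' categories."""
    query = extract_feature_text(target)
    scores = defaultdict(float)
    for text, category in search(query, corpus):
        scores[category] += cosine_similarity(target, text)
    return max(scores, key=scores.get) if scores else None


print(classify("investors watch the stock market as shares fall", CORPUS))
```

In this sketch a target document with no search hits yields `None`; a real system would need its own fallback for that case.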

[0058] The solution of the embodiment of the present invention rests on the following premise: some documents already exist (inside or outside the application platform) that have themselves been classified, that is, they carry classification information with a high degree of confidence. Then, for...

Abstract

The invention discloses a document classification method and a document classification device. The method includes the following steps: extracting the feature text of a target document and using the feature text to form a search condition; searching with the search condition to obtain a relevant search result; calculating the text similarity between the target document and the search result; and obtaining a classification result for the target document according to the calculated text similarity and the classification information of the search result. The method classifies a new document by using the classification information of existing documents, based on the similarity between texts. Because documents with similar text content have a high probability of belonging to the same category, a classification result with high confidence can be obtained through statistical computation over the classifications of texts similar to the current text.

Description

Technical field

[0001] The invention relates to the field of computer application technology, in particular to a document classification method and device.

Background art

[0002] With the development of Internet technology, the amount of information on the Internet has exploded. To make better use of this information, it must be managed effectively. Document classification is currently a widely used management technique. It refers to determining a category for each document in a document collection according to the content or certain attributes of the document. In this way, users can conveniently browse documents in a specific category, and can also find documents more easily by limiting the search scope.

[0003] For massive document resources, it is obviously unrealistic to classify documents entirely by hand. At present, on some U...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35, G06F16/951
Inventor: 徐兴军
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD