Document classification method based on variance

A variance-based document classification technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the inability of existing methods to extract a representative feature value set, build a classification model, or classify documents effectively, and it achieves high efficiency while ensuring classification accuracy.

Inactive Publication Date: 2014-10-29
INFORMATION RES INST OF SHANDONG ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, these classification algorithms depend on the topic: before the domain is determined, they cannot establish an effective macro-level classification model for documents without topic restrictions.
This problem arises when distinguishing literary works from scientific and technological documents: it is impossible to classify effectively whether a document is a scientific and technological document, a novel, or an essay.
The fields and themes covered by literary works and scientific literature are unbounded, and the fields and themes of different document types overlap. As a result, accurate category definitions cannot be provided during the training phase, and no feature value set that represents the categories can be extracted, so it is difficult to build a classification model for this problem.



Embodiment Construction

[0023] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0024] Topic-based classification methods designed for a specific field cannot solve the problem of classifying scientific and technological literature, which involves unlimited fields and topics, together with two typical kinds of literary work: novels and prose.

[0025] Figure 1 shows the principle diagram of the variance-based document classification method of the present invention, which comprises a sample training stage and a document classification stage. Figure 2 shows the flow chart of the sample training stage. The principle is as follows: first, samples of three different document types (scientific and technological literature, novels, and prose) are collected from various document libraries such as scientific and technological e-books, technical patent databases, libraries, book bars, etc. With the suppor...


Abstract

The invention discloses a document classification method based on variance. The method comprises the steps of: (a) gathering a sufficient number of scientific and technical documents, novels, and essays of known category as training samples; (b) segmenting the text into words with an existing word segmentation method and calculating word frequencies; (c) normalizing the word frequencies; (d) calculating the variance of the word frequencies in each document; (e) deriving the intervals of the word frequency variances; (f) processing the documents to be classified; (g) determining which interval each variance falls into; and (h) obtaining the document category from that interval. Because the wording characteristics of scientific and technical documents, novels, and essays produce different word frequency variances, the method can classify documents automatically, reasonably, and scientifically, achieving high efficiency while ensuring classification accuracy. This lays a theoretical foundation for classifying existing literature into scientific and technical documents, novels, and essays, and provides a complete, scientific method.
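
Steps (a) to (h) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions because the text here does not specify them: the regex tokenizer stands in for the "existing word segmentation method", normalization is assumed to be count divided by document length, and each category's interval is assumed to be the minimum and maximum variance observed in its training samples. The function names (`tokenize`, `frequency_variance`, `train_intervals`, `classify`) are hypothetical.

```python
import re
from collections import Counter
from statistics import pvariance


def tokenize(text):
    # Assumption: a simple regex tokenizer stands in for the "existing word
    # segmentation method"; Chinese text would need a real word segmenter.
    return re.findall(r"\w+", text.lower())


def frequency_variance(text):
    """Steps (b)-(d): variance of one document's normalized word frequencies."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document contains no words")
    # Assumption: normalize each word count by the document length.
    freqs = [c / total for c in counts.values()]
    return pvariance(freqs)


def train_intervals(samples):
    """Step (e): samples maps category -> list of texts; returns (low, high)."""
    intervals = {}
    for category, texts in samples.items():
        variances = [frequency_variance(t) for t in texts]
        # Assumption: the interval is the observed min and max per category.
        intervals[category] = (min(variances), max(variances))
    return intervals


def classify(text, intervals):
    """Steps (f)-(h): categories whose interval contains the variance."""
    v = frequency_variance(text)
    return [cat for cat, (low, high) in intervals.items() if low <= v <= high]
```

`classify` returns a list rather than a single label because intervals built this way may overlap or may not cover a new document's variance at all; how such cases are resolved is not stated in this text, so the sketch leaves that decision to the caller.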

Description

Technical field

[0001] The present invention relates to a document classification method based on variance, and more specifically to a method that distinguishes scientific and technological documents, novels, and prose according to the differences in word frequency variance caused by their characteristic word usage.

Background technique

[0002] With the development of Internet technology, the document resources available on the network are constantly enriched. They include literary works such as novels and essays that enrich people's spiritual life, and scientific and technological documents that provide knowledge and lay the foundation for scientific research. This crystallization of wisdom and technology is the precious wealth of human civilization. However, with the advent of the era of big data, the exponential growth of massive resources poses challenges for the effective org...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27, G06F17/30
Inventor: 赵燕清, 魏墨济, 朱世伟, 于俊凤, 李晨, 蔡斌雷, 王蕾, 冯海洲, 王爱萍
Owner: INFORMATION RES INST OF SHANDONG ACAD OF SCI