Web text classification method based on consistency clustering

A text classification technology based on consistency clustering, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the poor performance and long running time of existing approaches, achieves strong robustness, improves quality, and overcomes the curse of dimensionality.

Inactive Publication Date: 2013-04-17
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

However, text data is massive, high-dimensional, and sparse, so a single traditional clustering algorithm not only performs poorly on text data but also takes a long time to run.


Examples


Embodiment Construction

[0032] The present invention will be further described below in conjunction with the accompanying drawings and specific implementation examples.

[0033] The invention provides a new method for Web document classification that yields high-quality text classification results and is strongly robust. The text data is clustered multiple times, each time on only some of its dimensions, to obtain multiple high-quality basic clustering results; these basic clustering results are then fused within a consistency clustering framework to obtain the final text classification result. Both the basic clustering results and the consistency clustering are computed with the fast clustering method. By transforming the utility function, the invention recasts the consistency clustering problem as a problem that the fast clustering method can solve, which makes the method easy for those skilled in the art to use.
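The fusion step described in [0033] can be illustrated with a small sketch. The excerpt does not give the utility function or the exact transformation, so everything below is an assumption made for illustration: each basic clustering is one-hot encoded, the encodings are concatenated into a binary matrix, and a plain K-means-style loop with squared Euclidean distance is run on that matrix. The names `one_hot_encode`, `kmeans`, and `sq_dist` are hypothetical helpers, not part of the patent.

```python
def one_hot_encode(base_labelings):
    """Concatenate one-hot encodings of r basic clusterings.

    base_labelings: list of r label lists, each of length n.
    Returns an n x (k_1 + ... + k_r) binary matrix as a list of rows.
    """
    n = len(base_labelings[0])
    rows = [[] for _ in range(n)]
    for labels in base_labelings:
        cluster_ids = sorted(set(labels))
        for i, lab in enumerate(labels):
            rows[i].extend(1.0 if lab == c else 0.0 for c in cluster_ids)
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Three made-up basic clusterings of six texts that partly disagree.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
consensus = kmeans(one_hot_encode(base), k=2)
print(consensus)
```

In this toy input the first three texts and the last three texts end up in separate consensus clusters, even though only one of the three basic clusterings splits them that way exactly.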

[0034] A web t...


Abstract

The invention discloses a Web text classification method based on consistency clustering. The method comprises the following steps: inputting text data comprising n texts; carrying out word segmentation on the n texts according to a preset lexicon containing m words; constructing an m-dimensional vector space model of each text from the number of times each lexicon word occurs in that text, and combining the n texts to form an n*m vector space matrix; randomly extracting an n*m' sub-matrix from the n*m vector space matrix, where m' is less than m, and carrying out clustering analysis on the n*m' sub-matrix; repeating the extraction and clustering steps r times, until r clustering analysis results are obtained; and carrying out clustering analysis again on the r clustering analysis results, thereby obtaining the final clustering result, which represents the classification relationship among the n texts, and classifying the n texts accordingly. The method overcomes the curse of dimensionality in clustering analysis and can analyze large amounts of text data; it is especially suitable for information security applications such as online public opinion monitoring.
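As a minimal sketch of the first steps in the abstract, the snippet below builds the n*m count matrix from a lexicon and draws one random n*m' sub-matrix. The sample texts, the lexicon, the helper names `build_vsm` and `random_submatrix`, and the use of whitespace splitting in place of real word segmentation are all assumptions made for illustration.

```python
import random
from collections import Counter


def build_vsm(texts, lexicon):
    """Return an n x m matrix; entry (i, j) counts lexicon word j in text i."""
    matrix = []
    for text in texts:
        # Whitespace splitting stands in for a real word segmenter (assumption).
        counts = Counter(text.lower().split())
        matrix.append([counts[word] for word in lexicon])
    return matrix


def random_submatrix(matrix, m_sub, rng):
    """Keep m_sub randomly chosen columns (m_sub < m) of every row."""
    m = len(matrix[0])
    if not 0 < m_sub < m:
        raise ValueError("m_sub must satisfy 0 < m_sub < m")
    cols = sorted(rng.sample(range(m), m_sub))
    return cols, [[row[c] for c in cols] for row in matrix]


# Made-up example data: n = 4 texts, m = 6 lexicon words.
texts = [
    "stock market prices rise",
    "market prices fall as stock drops",
    "team wins football match",
    "football team loses match",
]
lexicon = ["stock", "market", "prices", "football", "team", "match"]

vsm = build_vsm(texts, lexicon)
cols, sub = random_submatrix(vsm, 3, random.Random(0))
print(vsm)
print(cols, sub)
```

Calling `random_submatrix` r times with different random draws would give the r feature subsets on which the basic clusterings are run; words outside the lexicon (such as "rise" or "as") are simply ignored.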

Description

Technical field

[0001] The invention relates to a text classification method, in particular a text classification method based on consistency clustering. It belongs to the fields of data mining, machine learning and business intelligence, is aimed especially at clustering massive, heterogeneous, high-dimensional data, and can be used for knowledge fusion and knowledge reuse.

Background technique

[0002] As one of the important information carriers, text data is growing at an alarming rate, especially with the spread and popularity of the Internet. How to quickly and effectively find the information that meets one's needs in this vast and complex body of information is a huge challenge. As a key technology for processing and organizing large amounts of text data, text classification can largely solve the problems caused by information explosion and information clutter. Based on the assumption that documents of the same type have a high degree of similarity and...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 吴俊杰, 刘洪甫, 李红, 韩小汀
Owner: BEIHANG UNIV