
Fuzzy data mining based automatic classification method of Chinese web pages

A fuzzy data mining and automatic classification technology, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses non-standard web page authoring and the high time cost and complexity of web page classification, reducing labor costs and improving efficiency.

Status: Inactive. Publication date: 2010-08-04
NANJING UNIV OF POSTS & TELECOMM
Cites: 3. Cited by: 78.

AI Technical Summary

Problems solved by technology

[0005] 2) Chinese web pages contain a lot of "noise": many are not standardized and include advertisements, annotations, and other extraneous information. The pages must be preprocessed to clean up their content and extract the information users are interested in.
[0006] 3) Most current research on Chinese web page classification relies on feature selection and vector representation of the pages, followed by the KNN (k-nearest neighbor) classification algorithm. This approach incurs a high time cost and high complexity.



Examples


Embodiment Construction

[0048] The technical scheme of the invention is described in detail below with reference to the accompanying drawings:

[0049] The present invention proposes a technical framework for the automatic classification of Chinese web pages based on fuzzy data mining, and specifies a fuzzy classification algorithm for web pages in detail, as shown in Figure 3. As the figure shows, the system is divided into three layers, from bottom to top: the data acquisition layer, the business logic layer, and the presentation layer.

[0050] The data acquisition layer extracts Chinese text from web pages using content-based rules. Preprocessing begins by obtaining the HTML source code of the web page. Testing and analysis show that web pages tend to include redundant information such as tags, script code, advertisement and image links, designer notes, function declarations, and copyright information. Noise inf...
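The preprocessing step described in [0050] can be illustrated with a minimal sketch. The patent's actual content rules are not given in this excerpt, so the rules below (drop script and style blocks, drop HTML comments, strip remaining tags, keep runs of CJK ideographs) and the function name `extract_chinese` are assumptions chosen for illustration only.

```python
import re
from html import unescape
from typing import List

# Hypothetical noise rules; the patent's own content rules are not shown in this excerpt.
# Matches whole <script>/<style> blocks and HTML comments (designer notes).
_NOISE_BLOCKS = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>|<!--.*?-->",
    re.IGNORECASE | re.DOTALL,
)
# Matches any remaining tag, including links and image tags.
_TAGS = re.compile(r"<[^>]+>")
# Matches runs of characters in the CJK Unified Ideographs block.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_chinese(html_source: str) -> List[str]:
    """Return the runs of Chinese characters left after removing markup noise."""
    without_blocks = _NOISE_BLOCKS.sub(" ", html_source)
    without_tags = _TAGS.sub(" ", without_blocks)
    text = unescape(without_tags)  # decode entities such as &nbsp;
    return _CJK_RUN.findall(text)
```

Regular expressions keep the sketch dependency-free; a production system would more likely use a real HTML parser, since regex-based tag stripping is fragile on malformed pages.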



Abstract

The invention discloses an automatic classification method for Chinese web pages based on fuzzy data mining. The method comprises four parts: construction of a classification word description library, preprocessing of new web pages, a fuzzy classification matcher, and fuzzy judgment of the classification result. Using fuzzy comprehensive judgment, a fuzzy classification operation is applied to the training-set feature word vector fuzzy matrix, generated when the classification word description library is built, and to the web page feature word fuzzy vector, generated by the new web page preprocessing part. The fuzzy classification of the Chinese web page is then obtained through fuzzy judgment of the classification result. The method improves classification efficiency, addresses the problem of classification fuzziness, is readily extensible and simple to operate, and is easy to popularize.
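As a rough illustration of fuzzy comprehensive judgment, the sketch below composes a page's feature word fuzzy vector A with a feature-word-by-category fuzzy matrix R to obtain category memberships B, then ranks the categories. The excerpt does not say which composition operator the patent uses; the classical max-min operator is assumed here, and the names `fuzzy_compose` and `rank_categories` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuzzy_compose(page_vector: Sequence[float],
                  class_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Max-min composition B = A o R, where b_j = max_i min(a_i, r_ij).

    page_vector:  membership of each feature word in the new page (length n).
    class_matrix: n rows (feature words) by m columns (categories).
    Assumption: max-min is used; the patent excerpt does not name the operator.
    """
    if not class_matrix or len(page_vector) != len(class_matrix):
        raise ValueError("page_vector and class_matrix must have the same number of rows")
    n_classes = len(class_matrix[0])
    return [
        max(min(a, row[j]) for a, row in zip(page_vector, class_matrix))
        for j in range(n_classes)
    ]


def rank_categories(memberships: Sequence[float],
                    labels: Sequence[str]) -> List[Tuple[str, float]]:
    """Sort categories by membership degree, highest first."""
    return sorted(zip(labels, memberships), key=lambda pair: pair[1], reverse=True)
```

Returning the full ranking rather than a single label keeps the result fuzzy: a page can be reported as belonging to several categories with different degrees, and the top entry corresponds to the maximum-membership decision.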

Description

Technical field

[0001] The present invention concerns fuzzy data mining and a method for automatically classifying Chinese web pages based on it. The data mining knowledge and the fuzzy web page classification method involve technical fields such as automatic web page acquisition, Chinese web page preprocessing, Chinese word segmentation and keyword frequency analysis, and fuzzy classification of Chinese web pages.

Background

[0002] With the rapid development of Internet and Web technology, the number of web pages on the Internet keeps growing. The increasing popularity of the network and the explosive growth in the number of Internet users have made the behavior of network users complex and diverse. How to properly analyze, manage, and issue warnings about the behavior of network users is an urgent problem. Faced with the massive amount of information on the Internet, how to filter out the informat...


Application Information

IPC(8): G06F17/30, G06F17/27
Inventors: 孙雁飞, 姚蓓丽, 张顺颐, 王攀
Owner: NANJING UNIV OF POSTS & TELECOMM