
A webpage classification method, terminal equipment and storage medium

The method belongs to web page classification technology and applies to neural learning methods, website content management, and web data retrieval. It addresses three problems of existing methods: they are not widely applicable to web page data, their application scope is limited, and their classification error rate is high. The method mitigates the sparse web page feature problem and offers a wide application range and good recognition results.

Active Publication Date: 2022-04-29
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

Existing webpage classification methods have the following deficiencies:

(1) Methods that compare webpage content or URLs usually require a large-scale comparison library. They are inflexible, have a high error rate, and generalize poorly.

(2) Methods that build a classification model from webpage content mostly rely on a single kind of feature, for example only text information or only image information, as the feature representation of a webpage. The information structure of webpage content is diverse, and some webpages contain only text or only pictures. A single-structure representation therefore cannot fully represent the content of a webpage, ignores the information carried by other structural data, often misses key information, and makes the features sparser.

Consequently, classification methods based on single-structure data cannot be applied widely to all web page data and cannot solve the feature sparsity problem. Their scope of application is very limited, and the model's effectiveness cannot be guaranteed.



Examples


Embodiment 1

[0030] This embodiment of the present invention provides a webpage classification method. As shown in Figure 1, the method includes the following steps:

[0031] S1: Collect multiple types of web pages, construct a graph structure from at least two types of features in each web page, label the type of each web page, and form a training set from all the type-labeled graph structures.

[0032] Constructing the graph structure involves constructing nodes and constructing edges. The nodes in this embodiment include picture nodes corresponding to the picture type, text nodes corresponding to the text type, and webpage nodes corresponding to the webpage structure type. As shown in Figure 2, nodes beginning with "O" represent different webpage nodes, nodes beginning with "W" represent different text nodes, and nodes beginning with "P" represent different picture nodes.
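As a rough illustration of the node naming above, the following Python sketch builds one small labeled graph per page. The `PageGraph` class, the `build_page_graph` helper, the node id scheme, the example labels, and in particular the edge rule (linking each webpage node to its own text and picture nodes) are assumptions made for illustration only; they are not taken from the embodiment, whose edge construction is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PageGraph:
    """Undirected graph for one web page, plus its type label (hypothetical type)."""
    nodes: list = field(default_factory=list)  # node ids such as "O1", "W1_1", "P1_1"
    edges: set = field(default_factory=set)    # each edge is a frozenset of two node ids
    label: str = ""

    def add_node(self, node_id):
        if node_id not in self.nodes:
            self.nodes.append(node_id)

    def add_edge(self, a, b):
        self.add_node(a)
        self.add_node(b)
        self.edges.add(frozenset((a, b)))


def build_page_graph(page_id, texts, pictures, label):
    """Build a graph with one "O" node, one "W" node per text, one "P" node per picture.

    Assumed edge rule: the webpage node is linked to each of its text and
    picture nodes. The source does not confirm this rule.
    """
    graph = PageGraph(label=label)
    page_node = f"O{page_id}"
    graph.add_node(page_node)
    for i, _ in enumerate(texts, start=1):
        graph.add_edge(page_node, f"W{page_id}_{i}")
    for i, _ in enumerate(pictures, start=1):
        graph.add_edge(page_node, f"P{page_id}_{i}")
    return graph


# Step S1 in miniature: a training set of type-labeled graphs (labels are made up).
training_set = [
    build_page_graph(1, ["win a prize", "click here"], ["banner.png"], "advertisement"),
    build_page_graph(2, ["daily news"], [], "news"),
]
```

A page with no pictures, like the second one, still yields a usable graph, which is the point of combining at least two feature types rather than relying on one.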

[0033] 1. Picture node

[0034] In this embodiment, the picture nodes use ...

Embodiment 2

[0076] The present invention also provides a webpage classification terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the method embodiment described in Embodiment 1.

[0077] Further, as one feasible implementation, the webpage classification terminal device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The webpage classification terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the composition described above is only an example of the webpage classification terminal device and does not limit it; the device may include more or fewer components than the above, ...


Abstract

The present invention relates to a webpage classification method, terminal equipment and a storage medium. The method includes: S1: collecting multiple types of webpages, constructing a graph structure from at least two types of features in each webpage, labeling the type of each webpage, and forming a training set from all the type-labeled graph structures; S2: constructing a graph convolutional neural network model, training it on the training set, and using the trained model as the webpage classification model; S3: for a webpage to be classified, constructing its graph structure from the same at least two types of features described in step S1, and then determining the webpage type corresponding to that graph structure with the webpage classification model. The invention learns from heterogeneous information in webpages, such as text and pictures, to build the webpage classification model. Compared with existing webpage classification methods, it overcomes the limitations of methods based on a single data structure and markedly alleviates the problem of sparse webpage features.
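To make steps S2 and S3 more concrete, here is a minimal pure-Python sketch of a single graph convolution followed by a graph-level prediction. It assumes the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), mean pooling over nodes, and a linear output layer; the abstract only says "graph convolutional neural network", so the layer count, pooling, feature sizes, and all function names here are assumptions. The weights are hand-picked rather than trained, and training itself is not shown.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def classify_graph(adj, features, w_hidden, w_out):
    """Graph-level prediction: GCN layer, mean-pool the nodes, linear scores, argmax."""
    hidden = gcn_layer(normalized_adjacency(adj), features, w_hidden)
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    scores = matmul([pooled], w_out)[0]
    return scores.index(max(scores)), scores


# Toy graph: node 0 is a webpage node linked to one text node and one picture node.
adj = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # made-up 2-dimensional node features
w_hidden = [[1.0, 0.0], [0.0, 1.0]]              # untrained, illustrative weights
w_out = [[1.0, 0.0], [0.0, 1.0]]                 # two hypothetical webpage types

predicted_type, scores = classify_graph(adj, features, w_hidden, w_out)
```

In a real system the weights would be learned from the labeled training set in S2, and S3 would run only the forward pass shown in `classify_graph` on the graph built for the new webpage.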

Description

Technical field

[0001] The invention relates to the field of webpage classification, in particular to a webpage classification method, terminal equipment and a storage medium.

Background technique

[0002] With the rapid spread of Internet technology, Internet applications are booming. High-quality, personalized content keeps emerging, and more and more users can share rich network resources. At the same time, illegal and criminal activities hide among them: large amounts of false information, advertising, Internet fraud and other illegal and non-compliant information are published online, seriously endangering the property of users. Discovering and identifying this kind of harmful text information and cleaning up cyberspace requires an efficient, intelligent web page analysis method.

[0003] The content information structure of the web page is diverse, with pictures, texts, videos a...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/958, G06K9/62, G06V10/764, G06V10/774, G06V10/82, G06N3/04, G06N3/08
CPC: G06F16/958, G06N3/049, G06N3/08, G06N3/045, G06F18/241, G06F18/214
Inventor: 陈志明, 赵建强, 庄灿波, 刘晓芳, 曾鹏
Owner: XIAMEN MEIYA PICO INFORMATION