Bad webpage recognition method based on URL

A webpage identification technology, applied in data exchange networks, special data processing applications, instruments, etc., that addresses problems such as the inability to cope with new sites, large delays, and high method complexity.

Inactive Publication Date: 2010-04-07

XI AN JIAOTONG UNIV

0 Cites, 66 Cited by


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0014] 2. Methods based on image and streaming media recognition are widely applicable, but they process a large amount of data, are highly complex, introduce a large delay, and consume substantial bandwidth, so they are not suitable for real-time recognition and processing in a network environment.

The disadvantage of this method is that it has poor flexibility and cannot cope with new sites.

[0016] 4. At present, there is no literature on identifying bad web pages through URL analysis and semantic understanding, so this invention fills that gap and provides a new idea for quickly identifying bad web pages.

Method used


Examples

Embodiment Construction

[0066] In order to understand the present invention more clearly, the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0067] As shown in Figure 1, URL identification begins in the preprocessing module, which filters out special characters and extracts the parts that actually matter for identification, such as the suffix, the main domain name, and the host name. The URL is then checked against the exclusive suffixes (.gov, .edu): if it has one of them, it is directly judged to be a normal URL; otherwise it passes to the next step. In the main judgment process, the domain name part is segmented and its features are extracted, and features are also extracted from the host name part. A combined classifier then classifies the extracted features. If the result is a normal URL, it is further confirmed by subsequent tools. If it is bad, the user is directly prohibited from accessing ...
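The flow described in [0067] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the keyword list, the feature names, the three return labels, and the `keyword_rule` stand-in for the combined classifier are all hypothetical, and the suffix is simplified to the last dot-separated label of the host.

```python
import re
from urllib.parse import urlsplit

EXCLUSIVE_SUFFIXES = {".gov", ".edu"}      # the two suffixes named in [0067]
SENSITIVE_WORDS = ("porn", "sex", "xxx")   # hypothetical keyword list

def preprocess(url):
    """Filter special characters and split the host into suffix, main domain, host name."""
    url = url.strip().lower()
    if "://" not in url:
        url = "http://" + url              # urlsplit needs a scheme to locate the host
    host = urlsplit(url).hostname or ""
    host = re.sub(r"[^a-z0-9.-]", "", host)
    labels = [label for label in host.split(".") if label]
    return {
        "suffix": "." + labels[-1] if labels else "",
        "main_domain": labels[-2] if len(labels) >= 2 else "",
        "host_name": ".".join(labels[:-2]),
    }

def extract_features(parts):
    """Hypothetical features: keyword hits plus simple structural counts."""
    domain = parts["main_domain"]
    host_name = parts["host_name"]
    return {
        "sensitive_hits": sum(domain.count(word) for word in SENSITIVE_WORDS),
        "domain_digits": sum(ch.isdigit() for ch in domain),
        "domain_hyphens": domain.count("-"),
        "host_labels": len(host_name.split(".")) if host_name else 0,
    }

def identify(url, is_bad):
    """Exclusive-suffix shortcut first, then the classifier decides."""
    parts = preprocess(url)
    if parts["suffix"] in EXCLUSIVE_SUFFIXES:
        return "normal"
    if is_bad(extract_features(parts)):
        return "blocked"
    return "needs_confirmation"            # normal result, confirmed later by other tools

def keyword_rule(features):
    """Stand-in for the combined classifier (assumption, for illustration only)."""
    return features["sensitive_hits"] > 0
```

Passing the classifier in as a callable keeps the preprocessing and suffix shortcut independent of whichever classifier is actually used.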


Abstract

The invention discloses a URL-based method for recognizing bad webpages. The method judges whether a URL belongs to a pornographic website through semantic analysis of the URL's primary domain and structural analysis of the whole URL. Two kinds of features contained in the URL, sensitive-string features and structural features, are extracted as the basis for the judgment, and a discriminator that combines the final comprehensive features with the SVM algorithm performs binary classification to obtain the result. The method can assist other recognition methods in quickly recognizing bad websites and providing a healthy Internet environment. Because the judgment is made without obtaining the web content, it offers a highly efficient new approach to recognizing pornographic websites.
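To make the classification step concrete, the sketch below trains a linear SVM on two-dimensional feature vectors using stochastic subgradient descent on the regularized hinge loss. The abstract does not state the kernel, the training procedure, or the exact features, so the linear model, the hyperparameters, the `[sensitive_string_hits, structure_score]` layout, and the training data are all assumptions made for illustration; in practice a library SVM would more likely be used.

```python
import random

def train_linear_svm(X, y, epochs=2000, eta=0.01, lam=0.001, seed=0):
    """Fit weights w and bias b by subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # The regularization term shrinks w on every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            # The hinge loss contributes a subgradient only when the margin is violated.
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 (bad URL) or -1 (normal URL) for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Made-up training data: [sensitive_string_hits, structure_score].
X = [[2.0, 0.5], [1.0, 0.8], [3.0, 0.6], [0.0, 0.2], [0.0, 0.4], [0.0, 0.1]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
```

Calling `predict(w, b, features)` on a new URL's feature vector then gives the two-class decision described in the abstract.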

Description

Technical field

[0001] The invention relates to a method for filtering bad information on the Internet, and in particular to a method for identifying bad webpages based on their URLs. The method draws on machine learning: the final discrimination is accomplished by applying feature extraction and classification techniques.

Background art

[0002] With the rapid development of the Internet, harmful content has spread along with it, and the emergence of a large number of pornographic web pages has seriously affected the healthy development of young people. In recent years, research on the automatic identification of pornographic content has made remarkable progress. After a novelty search, the applicant retrieved two patents related to the present invention on the automatic identification of pornographic content, which are:

[0003] 1. Multifunctional management system for network pornography and bad information detection [000...

Claims


Application Information


Patent Type & Authority: Applications (China)

IPC(8): H04L12/24, G06F17/30

Inventor: 郑庆华, 骞雅楠, 刘均, 常晓, 吴朝晖, 蒋路

Owner: XI AN JIAOTONG UNIV
