A URL-based webpage classifier construction method and its classification method
A webpage classifier construction and classification technology in the field of information security. It addresses problems such as encryption or equivalent replacement, and achieves low false positive and false negative rates, improved classification accuracy, and simple operation.
Examples
Embodiment 1
[0039] The invention discloses a method for constructing a URL-based webpage classifier. As shown in Figure 1, the steps are as follows:
[0040] Step S1: obtain the URLs of a plurality of webpages, label each URL with its webpage attribute, and use each labeled URL as a training sample to form a training sample set. In this embodiment, the training sample set includes a certain number of URLs whose webpage attribute is malicious and a certain number of URLs whose webpage attribute is benign.
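Step S1 amounts to pairing each URL with its webpage attribute. The following is a minimal sketch, assuming a binary attribute encoded as the integers `MALICIOUS = 1` and `BENIGN = 0`; the names `TrainingSample` and `build_training_set` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical label encoding; the source only says each URL is marked
# with a webpage attribute (malicious or benign).
MALICIOUS = 1
BENIGN = 0


@dataclass(frozen=True)
class TrainingSample:
    url: str
    label: int  # webpage attribute: MALICIOUS or BENIGN


def build_training_set(malicious_urls: Iterable[str],
                       benign_urls: Iterable[str]) -> List[TrainingSample]:
    """Step S1 sketch: label each URL and collect it as a training sample."""
    samples = [TrainingSample(u, MALICIOUS) for u in malicious_urls]
    samples += [TrainingSample(u, BENIGN) for u in benign_urls]
    return samples


if __name__ == "__main__":
    training_set = build_training_set(
        ["http://bad.example/login.php?id=1"],
        ["https://www.example.com/index.html"],
    )
    print(len(training_set))  # prints 2
```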
[0041] Step S2: segment each training sample in the training sample set into words at the selected characters, and then convert it into a word vector.
[0042] In this embodiment, the selected characters are "?", "=", ".", "&", "-" and "#"; that is, each training sample is segmented at every occurrence of "?", "=", ".", "&", "-" or "#". For ex...
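The segmentation in step S2 can be sketched with a single regular-expression split on the six listed characters. The patent text here does not say which word-vector model follows segmentation, so the conversion below uses a plain bag-of-words count vector purely as an assumption; `segment_url`, `build_vocabulary` and `to_word_vector` are hypothetical names. Note that "/" and ":" are not in the separator list, so a token such as `http://www` stays whole.

```python
import re
from collections import Counter
from typing import Dict, List

# Character class of the separators listed in [0042]: ? = . & - #
_SPLIT_RE = re.compile(r"[?=.&\-#]")


def segment_url(url: str) -> List[str]:
    """Split a URL at the separator characters, dropping empty tokens."""
    return [tok for tok in _SPLIT_RE.split(url) if tok]


def build_vocabulary(urls: List[str]) -> Dict[str, int]:
    """Assign an index to every distinct token seen in the training URLs."""
    vocab: Dict[str, int] = {}
    for url in urls:
        for tok in segment_url(url):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_word_vector(url: str, vocab: Dict[str, int]) -> List[int]:
    """Bag-of-words count vector (an assumption; the source does not name
    the word-vector model). Tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, count in Counter(segment_url(url)).items():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = count
    return vec


if __name__ == "__main__":
    print(segment_url("http://www.example.com/a.php?id=1&x=2#top"))
    # ['http://www', 'example', 'com/a', 'php', 'id', '1', 'x', '2', 'top']
```

A learned embedding (for example word2vec) could replace the count vector without changing the segmentation step.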
Embodiment 2
[0062] This embodiment discloses a method for constructing a URL-based webpage classifier that differs from the method of Embodiment 1 only in the following respects:
[0063] In this embodiment, after the training sample set is obtained in step S1, the method further includes a step of deduplicating the training sample set, as shown in Figure 3. The details are as follows: first, select an initial value of N and take the first N characters of each training sample in the training sample set; among URLs that share the same first N characters, only one is retained after deduplication. Then judge whether the total number of training samples in the training sample set is less than or equal to a threshold. If not, reduce N and repeat the same processing until the total number of training samples is less than or equal to the threshold; for deduplication pro...
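The prefix-based deduplication loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the initial N (`initial_n`), the amount N is reduced by each round (`step`), and keeping the first URL seen for each prefix are all hypothetical choices, since the excerpt does not specify them. The loop also stops at N = 1 if the threshold still has not been reached, a safeguard added here rather than taken from the source.

```python
from typing import List


def prefix_deduplicate(urls: List[str], n: int) -> List[str]:
    """Keep only the first URL for each distinct first-n-character prefix."""
    seen = set()
    kept = []
    for url in urls:
        prefix = url[:n]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(url)
    return kept


def deduplicate_to_threshold(urls: List[str], threshold: int,
                             initial_n: int = 50, step: int = 5) -> List[str]:
    """Repeat prefix deduplication, shrinking N until len <= threshold.

    initial_n and step are hypothetical defaults; the source gives no values.
    Stops at N = 1 even if the sample count is still above the threshold.
    """
    n = initial_n
    samples = prefix_deduplicate(urls, n)
    while len(samples) > threshold and n > 1:
        n = max(1, n - step)
        samples = prefix_deduplicate(samples, n)
    return samples
```

Because each pass keeps the first occurrence of every prefix, re-running the pass on the already reduced set with a smaller N gives the same result as running it on the original set with that N.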