Crawler data identification method, system and device

A crawler data identification method, applied in the Internet field, that addresses problems such as the high maintenance effort and limited effectiveness of existing crawler data identification methods.

Pending Publication Date: 2020-07-03
CHINANETCENT TECH
2 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

However, it takes a lot of effort to maintain the UserAgent blacklist and IP address database. Crawler data can also bypass these ...

Embodiment Construction

[0018] To make the purpose, technical solution, and advantages of the present application clearer, the technical solution is described below clearly and completely with reference to specific implementations of the application and the corresponding drawings. The described implementations are only some, not all, of the implementations of this application. All other implementations obtained by persons of ordinary skill in the art on the basis of these implementations, without creative effort, fall within the scope of protection of this application.

[0019] This application provides a method for identifying crawler data. Referring to Figure 1 and Figure 2, the method may include the following steps.

[0020] S1: Obtain site map data of a target website, and generate a vector graph of the site map data.
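
Step S1 can be pictured as turning the site map into a graph structure. The sketch below is only an illustration under stated assumptions: it assumes the site map is a sitemaps.org-style XML document, and it links each URL path to its parent path, an edge rule chosen here for demonstration and not taken from the application. The names `sitemap_paths`, `build_site_graph`, and `EXAMPLE_SITEMAP` are hypothetical.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style documents (assumed input format).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_paths(sitemap_xml: str) -> list[str]:
    """Return the URL paths listed in the <loc> elements of a site map."""
    root = ET.fromstring(sitemap_xml)
    return [
        urlparse(loc.text.strip()).path or "/"
        for loc in root.iter(f"{SITEMAP_NS}loc")
    ]


def build_site_graph(paths: list[str]) -> dict[str, set[str]]:
    """Build an undirected adjacency map over URL paths.

    Assumption for illustration: every path is connected to its parent
    path, so "/news/a.html" is linked to "/news", and "/news" to "/".
    """
    graph: dict[str, set[str]] = {"/": set()}
    for path in paths:
        parts = [p for p in path.split("/") if p]
        parent = "/"
        for depth in range(len(parts)):
            node = "/" + "/".join(parts[: depth + 1])
            graph.setdefault(node, set())
            graph[parent].add(node)
            graph[node].add(parent)
            parent = node
    return graph


# Hypothetical site map for a made-up site.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/news/a.html</loc></url>
  <url><loc>https://example.com/news/b.html</loc></url>
</urlset>"""

graph = build_site_graph(sitemap_paths(EXAMPLE_SITEMAP))
```

An adjacency map keeps the example dependency-free; a real system might instead use a graph library or derive edges from hyperlinks between pages.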

[0021] In this implementation manner, the target websi...

Abstract

The invention discloses a crawler data identification method, system, and device. The method comprises the following steps: obtaining the site map data of a target website and generating a vector graph of the site map data; obtaining session data of the target website and, based on the requests contained in the session data, mapping the session data to a sub-graph of the vector graph; and adding a session label to the session data, the session label indicating whether the session data is crawler data, and training a preset classifier based on the session label and the sub-graph to obtain a classifier that distinguishes crawler data from non-crawler data. With the technical scheme provided by the invention, crawler data can be identified effectively.
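
The session-mapping and training steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the method of the application: the site graph, the two coverage features, and the small logistic-regression trainer standing in for the "preset classifier" are all assumptions, and every name (`SITE_GRAPH`, `session_subgraph`, `features`, `train_logistic`, `is_crawler`) is hypothetical.

```python
import math

# Hypothetical site graph: path -> neighbouring paths.
SITE_GRAPH = {
    "/": {"/news", "/shop"},
    "/news": {"/", "/news/a", "/news/b"},
    "/news/a": {"/news"},
    "/news/b": {"/news"},
    "/shop": {"/", "/shop/1", "/shop/2"},
    "/shop/1": {"/shop"},
    "/shop/2": {"/shop"},
}


def session_subgraph(paths, graph):
    """Map the paths requested in one session onto the nodes and edges they touch."""
    nodes = {p for p in paths if p in graph}
    edges = {frozenset((a, b)) for a in nodes for b in graph[a] if b in nodes}
    return nodes, edges


def features(paths, graph):
    """Assumed features: node coverage, edge coverage, and a constant bias term."""
    nodes, edges = session_subgraph(paths, graph)
    total_edges = sum(len(neigh) for neigh in graph.values()) // 2
    return [len(nodes) / len(graph), len(edges) / total_edges, 1.0]


def train_logistic(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient steps."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * v for w, v in zip(weights, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            weights = [w + lr * (y - pred) * v for w, v in zip(weights, x)]
    return weights


def is_crawler(paths, graph, weights):
    """Classify a session as crawler data when its linear score is positive."""
    score = sum(w * v for w, v in zip(weights, features(paths, graph)))
    return score > 0.0


# Made-up labelled sessions: label 1 marks crawler data, 0 non-crawler data.
labelled_sessions = [
    (["/", "/news", "/news/a", "/news/b", "/shop", "/shop/1", "/shop/2"], 1),
    (["/", "/shop", "/shop/1", "/shop/2", "/news", "/news/a"], 1),
    (["/", "/news", "/news/a"], 0),
    (["/shop/1"], 0),
]
X = [features(paths, SITE_GRAPH) for paths, _ in labelled_sessions]
y = [label for _, label in labelled_sessions]
weights = train_logistic(X, y)
```

Calling `is_crawler(["/news/a"], SITE_GRAPH, weights)` then scores a new session against the learned weights. The coverage features reflect one plausible intuition, that a crawler tends to touch a larger share of the site graph than a human visitor; the source does not state which sub-graph properties the classifier actually uses.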

Description

Technical Field

[0001] The invention relates to the technical field of the Internet, in particular to a method, system, and device for identifying crawler data.

Background

[0002] With the continuous development of Internet technology, the amount of information on the network is growing explosively. Crawler technology can be used to obtain web page content automatically, so that the required information can be filtered quickly out of a large volume of information. In practice, crawlers may include legitimate crawlers such as search engines, and may also include malicious crawlers that illegally collect data. To prevent the server from being attacked by malicious crawlers, access data must be screened so that crawler data can be singled out for further analysis.

[0003] Currently, crawler data can be identified or restricted by adding a UserAgent blacklist, restricting IP address access frequency, identifying device fin...

Application Information

IPC(8): G06F16/951, G06F16/955, G06F16/958
CPC: G06F16/951, G06F16/9566, G06F16/958
Inventor: 陈志勇, 王凤杰, 赵志文
Owner: CHINANETCENT TECH