Variable URL-based crawler recognition method

A crawler identification method based on a variable URL, applied in the network field. It addresses the misjudgment that arises when scarce IP addresses are shared by many users, and it avoids large-scale false positives.

Status: Inactive. Publication date: 2017-03-22
北京知道未来信息技术有限公司

AI Technical Summary

Problems solved by technology

Conventional crawler identification, which is based on the request frequency of each source IP, is now difficult to keep using. The main problem is that IP addresses are increasingly scarce, so a large number of users often share the same IP. If the request source IP is still used as the basis for identification, misjudgments occur, and some normal users' IPs are identified as crawler IPs.
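
The misjudgment can be illustrated with a minimal sketch of the frequency-based approach. The threshold value, the function name, and the IP addresses below are assumptions made for illustration and do not come from the patent text.

```python
from collections import Counter

# Hypothetical threshold: maximum requests allowed per IP in one counting window.
REQUEST_THRESHOLD = 100


def flag_crawlers_by_frequency(request_ips, threshold=REQUEST_THRESHOLD):
    """Return the set of source IPs whose request count exceeds the threshold."""
    counts = Counter(request_ips)
    return {ip for ip, n in counts.items() if n > threshold}


# Illustrative addresses from the documentation ranges, not real hosts.
shared_ip = "203.0.113.7"       # 30 ordinary users behind one shared address
dedicated_ip = "198.51.100.20"  # one ordinary user on a dedicated address

# Every user makes only 5 requests, yet the shared address accumulates 150.
log = [shared_ip] * (30 * 5) + [dedicated_ip] * 5

flagged = flag_crawlers_by_frequency(log)
```

In this sketch no individual user behaves like a crawler, but the shared address exceeds the threshold and all 30 users behind it would be treated as one crawler.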

Detailed Description of the Embodiments

[0020] The preferred embodiments will be described in detail below in conjunction with the accompanying drawings. It should be emphasized that the following description is only exemplary and not intended to limit the scope of the invention and its application.

[0021] The process flow of the invention is shown in Figure 1. The entry page of the site (such as the home page) always contains a link. While the user's mouse is not over the link, the link points to the "detection URL resource". The web page monitors mouse behavior, and when the user hovers over the link, its URL is changed to the "valid URL resource", so normal users' access is not affected. A request for the detection URL resource therefore indicates crawler behavior, which distinguishes crawlers from normal users.
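
The link behavior in paragraph [0021] can be modeled as a small state machine. This is only a sketch of the idea, not the patent's implementation: in a real page the switch would be done by a script reacting to mouse events, and the class name, method names, and URLs here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariableLink:
    """Models a link whose target depends on whether the mouse is over it."""

    detection_url: str    # default target, only reached without a hover
    valid_url: str        # target a normal user reaches after hovering
    hovered: bool = False

    @property
    def href(self) -> str:
        # The valid URL is exposed only while the mouse is over the link.
        return self.valid_url if self.hovered else self.detection_url

    def on_mouse_enter(self) -> None:
        self.hovered = True

    def on_mouse_leave(self) -> None:
        self.hovered = False


# Hypothetical URLs for illustration.
link = VariableLink(detection_url="/detect/entry", valid_url="/catalog")

# A crawler reads the link target from the page without any mouse activity.
crawler_target = link.href

# A normal user hovers over the link before clicking it.
link.on_mouse_enter()
user_target = link.href
link.on_mouse_leave()
```

Because a click is always preceded by a hover, the user only ever follows the valid URL, while a client that extracts links without mouse events follows the detection URL.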

[0022] As shown in Figure 2, the web page always contains a detection URL, and this URL is changed to the correct URL only when the user hovers over the l...

Abstract

The invention discloses a crawler identification method based on a variable URL. The method comprises the following steps: 1) setting a variable URL link in a designated web page, wherein the link corresponds to two access resources, a detection URL resource and a valid URL resource; 2) when the variable URL link is triggered, checking how it was triggered: if it was triggered by a mouse, switching the link to its corresponding valid URL resource; otherwise, leaving the link pointed at its default target, namely the detection URL resource; and 3) when the detection URL resource is accessed, marking the source IP of that access as a crawler. The method can accurately pinpoint specific sources so that they can be blocked, without affecting normal user access.
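
Step 3 of the abstract, the server-side marking, might look like the following minimal sketch. The registry class, its method, and the paths are assumptions for illustration; the patent text shown here does not specify how the marking is stored or acted upon.

```python
class CrawlerRegistry:
    """Marks a source IP as a crawler once it requests a detection URL."""

    def __init__(self, detection_paths):
        self.detection_paths = set(detection_paths)
        self.crawler_ips = set()

    def observe(self, source_ip: str, path: str) -> bool:
        """Record one request; return True if the source is marked as a crawler."""
        if path in self.detection_paths:
            self.crawler_ips.add(source_ip)
        return source_ip in self.crawler_ips


# Hypothetical paths and documentation-range addresses.
registry = CrawlerRegistry(detection_paths=["/detect/entry"])

user_marked = registry.observe("198.51.100.20", "/catalog")       # valid URL
crawler_marked = registry.observe("203.0.113.7", "/detect/entry")  # detection URL
later_marked = registry.observe("203.0.113.7", "/catalog")         # stays marked
```

Unlike a frequency threshold, the decision here depends on which resource was requested, not on how many requests came from the address.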

Description

Technical field

[0001] The invention relates to a crawler identification method based on a variable URL and belongs to the technical field of networks.

Background

[0002] Crawler identification is usually based on the request frequency of a single IP. Each IP represents one request source, and a request-frequency threshold is set. If a source's request frequency exceeds this threshold, the source is considered to be a crawler. This technique is now difficult to keep using. The main problem is that IP addresses are increasingly scarce, so a large number of users often share the same IP. If the request source IP is still used as the basis for identification, misjudgments occur, and some normal users' IPs are identified as crawler IPs.

Summary of the invention

[0003] In view of the technical problems in the prior art, the object of the present invention is to provide a crawler identification method based on a variable URL. ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventors: 陈剑, 张宇杰
Owner: 北京知道未来信息技术有限公司