Method and system for identifying web crawler, storage medium and electronic apparatus

A technology for identifying web crawlers, applied in the field of information processing. It addresses problems such as the failure to identify crawlers and achieves more effective crawler identification.

Status: Inactive; Publication Date: 2019-01-15
WUHAN DOUYU NETWORK TECH CO LTD
Cites: 7; Cited by: 1

AI Technical Summary

Problems solved by technology

For example, one anti-crawler measure gives a link a hidden attribute: normal users cannot see or click the link, but a crawler will still crawl it. However, a crawler developer who studies the link's attributes carefully can make the crawler recognize such a link and avoid crawling it. The crawler thereby bypasses the detection strategy and is not identified.




Embodiment Construction

[0034] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments.

[0035] As shown in Figure 1, an embodiment of the present invention provides a method for identifying a web crawler. The method includes the following steps:

[0036] S1. Create multiple invalid links for detecting web crawler behavior.

[0037] The embodiment of the present invention first creates a number of invalid links. "Invalid" means that the links provide no information for normal web page access; normal users generally cannot see them and will not click them. Invalid links are used only to detect crawler behavior, and they are usually hidden as far as possible to keep normal users from clicking and accessing them.

[0038] The link address of an invalid link can be obtained by adding a field to an existing address to mark the link as invalid, or by changing a certain field of a normal address to an in...
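The first address-construction option above (adding a field to an existing address) can be sketched with Python's standard `urllib.parse` module. This is a minimal sketch under assumptions: the marker field name `trap`, the `token` value, and the helper names `make_invalid_link` and `is_invalid_link` are hypothetical, since the patent excerpt does not name the field it adds. The second option (changing a field of a normal address) is left out because its description is truncated.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical marker field; the patent does not name the field it adds.
TRAP_FIELD = "trap"


def make_invalid_link(normal_url: str, token: str) -> str:
    """Derive an invalid (trap) link by appending a marker field to a normal address."""
    parts = urlsplit(normal_url)
    # Keep the existing query parameters in order, then add the marker field.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((TRAP_FIELD, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_invalid_link(url: str) -> bool:
    """Return True if the requested URL carries the marker field."""
    query = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    return TRAP_FIELD in query
```

For example, `make_invalid_link("https://example.com/room/123?lang=en", "a1b2")` yields `https://example.com/room/123?lang=en&trap=a1b2`. The server can then call `is_invalid_link` on each requested URL to spot accesses to trap links.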



Abstract

The invention discloses a method and a system for identifying a web crawler, a storage medium, and an electronic apparatus, relating to the field of information processing. The method for identifying a web crawler comprises the following steps: creating a plurality of invalid links for detecting web crawler behavior; using a link address provided by an invalid link, creating a first link that hides its link attribute information, and creating a second link that has the same color as the HTML page background and is therefore invisible to the naked eye; inserting the first link and the second link into the HTML page; and reporting the operation information recorded on the HTML page to the server. The server judges whether the operation information has been received; if not, the visitor is determined to be a web crawler. If it has been received, the server further judges whether the first link and the second link have been accessed; if so, the visitor is determined to be a web crawler, and if not, the access is determined to be normal. The invention better prevents crawlers from bypassing the crawler detection strategy and identifies web crawlers more effectively.
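The server-side judgment described in the abstract can be sketched as a small decision function. This is a minimal sketch under assumptions: `SessionRecord` and `classify` are hypothetical names, and the way the server collects reported operation information and link accesses per visitor is not specified in this excerpt. Because the abstract's "access to the first link and the second link" is ambiguous, the sketch treats access to either trap link as a sign of a crawler.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    """Hypothetical per-visitor record kept by the server."""
    # True once the HTML page has reported its recorded operation information.
    operation_info_received: bool = False
    # Addresses this visitor has requested.
    visited_links: set[str] = field(default_factory=set)


def classify(record: SessionRecord, first_link: str, second_link: str) -> str:
    """Return "crawler" or "normal", following the abstract's decision flow."""
    # Step 1: no operation information was reported, so the visitor is a crawler.
    if not record.operation_info_received:
        return "crawler"
    # Step 2: operation information arrived, but a hidden trap link was requested.
    # Assumption: access to either trap link is enough.
    if first_link in record.visited_links or second_link in record.visited_links:
        return "crawler"
    # Step 3: operation information arrived and no trap link was touched.
    return "normal"
```

Checking the reported operation information first means a visitor that never runs the page is flagged even if it carefully avoids every trap link, which is how the two checks together make the detection harder to bypass.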

Description

Technical field

[0001] The invention relates to the field of information processing, and in particular to a method for identifying web crawlers, a storage medium, electronic equipment, and a system.

Background art

[0002] At present, many websites use anti-crawler technology to keep crawler traffic from crowding out normal website visits. Common anti-crawler techniques check the Headers fields of a user request to determine whether the request comes from a browser. They also count the number of visits from each IP address: for example, if an IP address makes a very large number of visits within a period of time, it can be judged to be a crawler. Another approach generates web page information dynamically, for example by using code to generate part of the page content instead of serving a static page.

[0003] However, each of the above anti-crawler strategies eventually meets a corresponding countermeasure. For example...
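The IP visit-counting approach mentioned in the background can be illustrated with a sliding-window counter. This is a generic sketch of that prior-art technique, not part of the invention: the class name `IpRateCounter`, the 60-second window, and the 100-request threshold are illustrative assumptions, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import defaultdict, deque


class IpRateCounter:
    """Count requests per IP address inside a sliding time window (illustrative thresholds)."""

    def __init__(self, window_seconds: float = 60.0, max_requests: int = 100):
        self.window = window_seconds
        self.max_requests = max_requests
        # One deque of request timestamps per IP address, oldest first.
        self._hits: dict[str, deque] = defaultdict(deque)

    def record(self, ip: str, now: float) -> bool:
        """Record one request at time `now`; return True if the IP exceeds the threshold."""
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.max_requests
```

With `IpRateCounter(window_seconds=10, max_requests=3)`, four requests from one IP at times 0 to 3 return `False, False, False, True`. As paragraph [0003] notes, a crawler can defeat this kind of check, for example by spreading requests across many IP addresses.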


Application Information

IPC(8): G06F21/56; G06F16/958; G06F16/953
CPC: G06F21/566
Inventors: 周志刚 (Zhou Zhigang), 陈少杰 (Chen Shaojie), 张文明 (Zhang Wenming)
Owner: WUHAN DOUYU NETWORK TECH CO LTD