Network crawling method, terminal and storage medium
A network crawling method in the field of web crawler technology. It addresses the problem that the same proxy IP is limited in the number or frequency of crawls it can perform, and achieves the effect of avoiding waste.
Examples
Embodiment 1
[0053] Figure 1 is a flow chart of the web crawler method provided by Embodiment 1 of the present invention. Depending on requirements, the execution order of the steps in the flow chart can be changed, and some steps can be omitted.
[0054] 101: Store multiple proxy IPs acquired at preset time intervals in a preset proxy IP pool.
[0055] In this embodiment, a proxy IP pool is preset in the local database, and the acquired proxy IPs are added to the pool for use by crawlers. Proxy IPs can be found on proxy IP websites available on the Internet, and the list can be collected manually or automatically by a separate small crawler. Proxy IPs can also be purchased from a third-party service provider and then added to the preset proxy IP pool.
[0056] In this embodiment, the proxy information of a proxy IP may include, but is not limited to: IP address, name and port.
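Step 101 can be sketched as follows. This is a minimal illustration and not the implementation described in the patent: it assumes SQLite as the local database, and the names `ProxyInfo`, `ProxyPool`, `add_many` and the table name `proxy_pool` are hypothetical. A scheduler running at the preset time interval would call `add_many` with each newly acquired batch.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyInfo:
    """Proxy information named in the text: IP address, port and name."""
    ip: str
    port: int
    name: str = ""


class ProxyPool:
    """Hypothetical proxy IP pool backed by a local SQLite database."""

    def __init__(self, db_path=":memory:"):
        # Assumption: SQLite stands in for the "local database".
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS proxy_pool ("
            "ip TEXT NOT NULL, port INTEGER NOT NULL, name TEXT, "
            "PRIMARY KEY (ip, port))"
        )

    def add_many(self, proxies):
        """Add a batch of acquired proxies, skipping ones already stored."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR IGNORE INTO proxy_pool (ip, port, name) "
                "VALUES (?, ?, ?)",
                [(p.ip, p.port, p.name) for p in proxies],
            )

    def all(self):
        """Return every stored proxy, ordered by IP string and port."""
        rows = self.conn.execute(
            "SELECT ip, port, name FROM proxy_pool ORDER BY ip, port"
        )
        return [ProxyInfo(ip, port, name) for ip, port, name in rows]
```

The composite primary key on `(ip, port)` is a design choice of this sketch, so that a proxy acquired again in a later interval is not stored twice.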
[0057] In this embodiment, it is possi...
Embodiment 2
[0075] Figure 2 is a flow chart of the web crawler method provided by Embodiment 2 of the present invention. Depending on requirements, the execution order of the steps in the flow chart can be changed, and some steps can be omitted.
[0076] 201: Store multiple proxy IPs acquired at preset time intervals in a preset proxy IP pool.
[0077] Step 201 in this embodiment is the same as step 101 in Embodiment 1, and will not be described in detail here.
[0078] 202: Verify each proxy IP in the proxy IP pool one by one, and determine whether it has the first validity.
[0079] In this embodiment, the proxy IP undergoing the first validity verification is referred to as the proxy IP to be verified. The proxy IP to be verified is used to access a search engine (e.g., Google or Baidu) to check whether a response from the search engine is obtained. If a response from the search engine is obtained, it indicates that the proxy IP to be verified has the fi...
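The first validity check in step 202 might look like the sketch below. It is an illustration under stated assumptions, not the patent's implementation: the function names, the default URL and the 5-second timeout are hypothetical, and any network or HTTP error (including an HTTP error status) is treated as "no response". The `fetch` parameter is injectable so that the check can be exercised without network access.

```python
import http.client
import urllib.request


def fetch_via_proxy(ip, port, url, timeout=5.0):
    """Request url through the given proxy and return the HTTP status code."""
    proxy_url = f"http://{ip}:{port}"
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.status


def has_first_validity(ip, port, url="https://www.baidu.com",
                       fetch=fetch_via_proxy):
    """Return True if the search engine responds through the proxy.

    Assumption: any OSError (which covers URLError, HTTPError and
    timeouts) or malformed HTTP reply counts as no response.
    """
    try:
        fetch(ip, port, url)
    except (OSError, http.client.HTTPException):
        return False
    return True


def filter_first_valid(proxies, fetch=fetch_via_proxy):
    """Check (ip, port) pairs one by one and keep those that pass."""
    return [(ip, port) for ip, port in proxies
            if has_first_validity(ip, port, fetch=fetch)]
```

In a test, `fetch` can be replaced by a stub that returns a status code for working proxies and raises `OSError` for dead ones.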
Embodiment 3
[0137] Figure 3 is a functional block diagram of a preferred embodiment of the web crawler device of the present invention.
[0138] In some embodiments, the web crawler device 30 runs in a terminal. The web crawler device 30 may include a plurality of functional modules composed of program code segments. The program code of each segment in the web crawler device 30 can be stored in a memory and executed by at least one processor to perform the web crawling method (see Figure 1 and its related description for details).
[0139] In this embodiment, the web crawler device 30 of the terminal can be divided into multiple functional modules according to the functions it performs. The functional modules may include: a storage module 301, a judging module 302, a recording module 303, a selection module 304 and a crawling module 305. The module referred to in the present invention refers to a series of computer program segments that can be executed by at least...