The invention provides a method for intelligent, distributed crawling of network information with login verification. The method comprises the following steps: when it is determined that the target page data of a website can be obtained only after login verification, obtaining the corresponding login information from a database, logging in automatically through a browser, and submitting the verification information; starting a timed task, using the cookie to access the web page designated by the timed task, and performing keep-alive processing; starting a network packet capture tool, accessing the corresponding target page according to business requirements, analyzing the HTTP (Hypertext Transfer Protocol) messages, customizing a crawler script, and determining the amount of data each task is to crawl; and having a main node issue a broadcast to notify the corresponding task nodes and distribute the crawler script, whereupon each task node starts, applies for a task from the main node's task queue, crawls data according to the task it has been assigned, and stores the crawled target data in a queue so that the data can be written to the database in batches. With this method, a protected page can be logged in to and accessed automatically, and a fast, scalable, integrated distributed web crawler framework capable of automatically generating the crawling script is obtained.
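
The task distribution and batch storage steps can be illustrated with a minimal single-process sketch. It is not the claimed implementation: threads stand in for the distributed task nodes, in-memory queues stand in for the main node's task queue and the result queue, SQLite stands in for the database, and the names `fetch_page`, `worker`, `flush_in_batches`, `run_crawl` and `BATCH_SIZE` are hypothetical. The real crawl step (login, cookie handling, HTTP requests) is replaced by a placeholder.

```python
import queue
import sqlite3
import threading

BATCH_SIZE = 3  # hypothetical batch size for database writes


def fetch_page(url):
    """Placeholder for the real crawl step; returns a (url, body) pair."""
    return (url, f"content of {url}")


def worker(task_queue, result_queue):
    """Task node: apply for tasks until the task queue is empty."""
    while True:
        try:
            url = task_queue.get_nowait()
        except queue.Empty:
            return
        result_queue.put(fetch_page(url))


def flush_in_batches(result_queue, conn, batch_size=BATCH_SIZE):
    """Drain the result queue into the database, one batch per commit."""
    batches = 0
    batch = []
    while True:
        try:
            batch.append(result_queue.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            conn.executemany("INSERT INTO pages VALUES (?, ?)", batch)
            conn.commit()
            batches += 1
            batch = []
    if batch:  # write the final, partially filled batch
        conn.executemany("INSERT INTO pages VALUES (?, ?)", batch)
        conn.commit()
        batches += 1
    return batches


def run_crawl(urls, n_workers=2):
    """Main node: fill the task queue, start task nodes, store results."""
    task_queue = queue.Queue()
    result_queue = queue.Queue()
    for url in urls:
        task_queue.put(url)

    threads = [
        threading.Thread(target=worker, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, body TEXT)")
    batches = flush_in_batches(result_queue, conn)
    rows = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    conn.close()
    return rows, batches
```

Calling `run_crawl` with a list of URLs returns the number of stored rows and the number of batch commits. In an actual distributed deployment the two in-memory queues would presumably be replaced by a shared message queue reachable from every task node.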