The invention provides a method and system for parallel square crossing network data collection. The method comprises the following steps: S1, a collection method is determined; and S2, tasks are updated at regular intervals. If the common URL list collection method is determined, the data and metadata of each webpage to be downloaded are downloaded directly according to a URL list; if the square crossing collection method is determined, the webpages to be downloaded are searched for by means of a crossing keyword list, and the data and metadata of those webpages are then downloaded. A scanning inspection is carried out on all webpages from the first layer to the current layer of the downloaded webpages, and when the last modification time of a newly downloaded webpage is later than the last modification time of the previously downloaded webpage, data collection is carried out on the newly downloaded webpage and the webpage data records are updated. By means of the method and system for parallel square crossing network data collection, the potential risks and costs of multithreading technology are avoided, the potential risks and costs of multithreaded parallel collection are lowered, parallel collection of mass data can be carried out more stably and more efficiently, and the efficiency of data reading and querying is improved.
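
The choice between the two collection methods and the modification-time check can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Page`, `collect`, `update_records`, the method labels "url_list" and "square_crossing", and the injected `fetch` and `search` callables are all hypothetical, and parallel execution and layer-by-layer scanning are left out. Storing a page that has no earlier record is also an assumption, since the abstract only describes the case where an earlier copy exists.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Sequence


@dataclass
class Page:
    """Downloaded webpage: its data plus the one metadata field used here."""
    url: str
    data: str
    last_modified: datetime


# Hypothetical callables supplied by the caller; no real network API is implied.
Fetch = Callable[[str], Page]          # downloads data and metadata for one URL
Search = Callable[[str], List[str]]    # returns candidate URLs for one keyword


def collect(method: str,
            fetch: Fetch,
            url_list: Sequence[str] = (),
            keywords: Sequence[str] = (),
            search: Optional[Search] = None) -> List[Page]:
    """Step S1: pick the collection method, then download each target page."""
    if method == "url_list":
        # Common URL list method: download directly from the given list.
        targets = list(url_list)
    elif method == "square_crossing":
        # Square crossing method: find pages through the crossing keyword list.
        if search is None:
            raise ValueError("the square crossing method needs a search function")
        targets, seen = [], set()
        for keyword in keywords:
            for url in search(keyword):
                if url not in seen:    # skip URLs found by an earlier keyword
                    seen.add(url)
                    targets.append(url)
    else:
        raise ValueError(f"unknown collection method: {method}")
    return [fetch(url) for url in targets]


def update_records(records: Dict[str, Page], fresh: Iterable[Page]) -> List[str]:
    """Replace a record only when the new copy was modified later.

    Returns the URLs whose records were written. Pages without an earlier
    record are stored as well (an assumption of this sketch).
    """
    updated = []
    for page in fresh:
        old = records.get(page.url)
        if old is None or page.last_modified > old.last_modified:
            records[page.url] = page
            updated.append(page.url)
    return updated
```

In such a sketch, a scheduler would call `collect` and then `update_records` at each regular interval (step S2), so that only pages whose last modification time has advanced are written back to the data records.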