The invention relates to the field of Internet technology, in particular to a crawler technology based on webpage crawling. After the URL (uniform resource locator) link addresses are initialized, the technology comprises the following steps: (1) reading, starting from a given entry point, the URL link address at the head of the running queue by using a crawler thread assigned in a load-balanced manner; (2) judging whether the URL link address already exists, stopping the crawl of that address if it does, and otherwise crawling it and placing it in a completion queue; (3) extracting the webpages corresponding to the URL link addresses placed in the completion queue; (4) filtering the URL link addresses found in the extracted webpages, keeping the effective URL link addresses and writing them into the running queue, and returning to step (1) to repeat the steps. According to the technology, corresponding resources are crawled from
the Internet, and the URL link addresses are rewritten and stored, so that Internet information is acquired in a targeted manner based on the objects that users set in the tasks they create. In addition, multi-machine parallel crawling, multi-task scheduling, resumption of crawling from a breakpoint, distributed crawler management, and crawler control can be implemented.
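
Steps (1) to (4) can be illustrated with a minimal single-threaded sketch in Python. It is not the claimed implementation: the names `crawl`, `is_effective`, and `LinkExtractor`, the `fetch` callback, and the `max_pages` limit are all hypothetical, the filter rule (keep absolute http or https addresses) is an assumption, and "already exists" is assumed to mean "already present in the completion queue". Load-balanced thread assignment, multi-machine operation, and breakpoint resumption are omitted.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags from one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_effective(url):
    # Assumed filter rule: keep only absolute http(s) addresses.
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def crawl(seed_urls, fetch, max_pages=100):
    """Run the queue loop; `fetch` is any callable mapping a URL to HTML text."""
    running = deque(seed_urls)  # running queue
    completed = []              # completion queue, in crawl order
    seen = set()                # membership index over `completed`
    pages = {}                  # extracted webpages keyed by URL
    while running and len(completed) < max_pages:
        url = running.popleft()          # step (1): read the head of the queue
        if url in seen:                  # step (2): address already exists
            continue
        html = fetch(url)                # step (2): otherwise crawl it
        seen.add(url)
        completed.append(url)
        pages[url] = html                # step (3): keep the webpage
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:        # step (4): filter and re-enqueue
            absolute, _fragment = urldefrag(urljoin(url, href))
            if is_effective(absolute) and absolute not in seen:
                running.append(absolute)
    return completed, pages
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP client, so the same function can be exercised with an in-memory dictionary of pages or wired to a real downloader.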