The invention discloses a data crawling method based on distributed crawler technology and relates to the technical field of data crawling. The method comprises the following steps: S1, specifying a URL: starting from a given address, locating the address that ultimately holds the data to be captured, and obtaining the corresponding index code; and S2, initiating a request: splicing URLs from the codes obtained in step S1 and judging whether each URL points to the required data; if the URL is a data page, a callback to the detail page is invoked, and otherwise the search for data URLs continues in a loop. The method adopts focused crawling and scripts. Regular expressions, JSON data, and data with frequencies such as year, quarter and month are used in the capture process, and incremental updating and batch insertion into an Oracle database are carried out. Content is processed and screened during webpage capture so that, as far as possible, only webpage information relevant to the requirements is captured, and it can be conveniently checked whether a character string matches a given pattern.
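
Steps S1 and S2 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the base URL, the `data-code` attribute, the `records` JSON key, the period formats, and all function names are hypothetical assumptions chosen for illustration, and the `fetch` function is injected so that the loop can be exercised without network access. Distribution across workers and the Oracle batch insertion are omitted.

```python
import json
import re
from collections import deque
from typing import Callable, List, Optional

# Hypothetical entry address; the source does not name a site.
BASE_URL = "https://example.com/data"

# Assumed markup: index pages expose index codes in a data-code attribute.
CODE_PATTERN = re.compile(r'data-code="([A-Za-z0-9]+)"')

# Assumed period formats used to tell data frequencies apart.
PERIOD_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "quarter": re.compile(r"\d{4}Q[1-4]"),
    "month": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
}


def frequency_of(period: str) -> Optional[str]:
    """Return the first frequency whose pattern matches the whole string."""
    for name, pattern in PERIOD_PATTERNS.items():
        if pattern.fullmatch(period):
            return name
    return None


def splice_url(code: str) -> str:
    """S2: build a request URL from an index code obtained in S1."""
    return f"{BASE_URL}?code={code}"


def is_data_page(body: str) -> bool:
    """Assumption: a data page is a JSON object with a 'records' key."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and "records" in payload


def parse_detail(body: str) -> List[dict]:
    """Detail callback: extract the records from a data page."""
    return json.loads(body)["records"]


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 100) -> List[dict]:
    """Follow index codes until data pages are reached (breadth-first)."""
    queue = deque([start_url])
    seen = {start_url}
    records: List[dict] = []
    fetched = 0
    while queue and fetched < max_pages:
        body = fetch(queue.popleft())
        fetched += 1
        if is_data_page(body):
            records.extend(parse_detail(body))  # call back to the detail page
            continue
        # Not a data page: keep searching through the codes it lists.
        for code in CODE_PATTERN.findall(body):
            url = splice_url(code)
            if url not in seen:
                seen.add(url)
                queue.append(url)
    return records


# In-memory stand-in for a website, used only to exercise the loop.
SAMPLE_PAGES = {
    BASE_URL: '<a data-code="A01">GDP</a><a data-code="A02">CPI</a>',
    splice_url("A01"): '<a data-code="A0101">annual</a>',
    splice_url("A02"): '{"records": [{"code": "A02", "period": "2020Q1"}]}',
    splice_url("A0101"): '{"records": [{"code": "A0101", "period": "2020"}]}',
}

if __name__ == "__main__":
    for record in crawl(BASE_URL, SAMPLE_PAGES.__getitem__):
        print(record["code"], frequency_of(record["period"]))
```

In a real deployment `fetch` would issue an HTTP request, and the returned records would be filtered against the keys already stored before being written in batches, in line with the incremental updating described above.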