The invention discloses a method for implementing a directional (focused) crawler for a designated e-commerce website and belongs to the field of web data collection. It aims to improve the crawler's parsing efficiency and crawling accuracy, to reduce the crawler failure rate caused by changes in website content, and to increase the readability and robustness of the code. Building on a general-purpose crawler, the method manages the order of tasks with a queue and uses a thread pool management mechanism to parse website content in multiple threads, thereby improving crawling efficiency. Python is used as the implementation language, and information on a designated web page is extracted by combining CSS (Cascading Style Sheets) selectors with regular expressions, which improves the crawler's parsing efficiency, readability, and fault tolerance. The result is a focused crawler dedicated to parsing store commodity information on the designated e-commerce website, with improved efficiency, crawling accuracy, adaptability, and robustness. The method provides a stable and convenient data source for e-commerce price analysis.
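
The combination described above (a task queue, a thread pool, and selector-plus-regex extraction) might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the sample pages, the `h1.title` and `span.price` selectors, and every function name are hypothetical, network fetching is replaced by an in-memory lookup, and a tiny `tag.class` matcher built on the standard library stands in for a full CSS selector engine.

```python
import queue
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Hypothetical pages standing in for downloaded HTML (assumption, not from the source).
PAGES = {
    "https://shop.example/item/1":
        '<div><h1 class="title">Kettle</h1><span class="price">Price: $19.90</span></div>',
    "https://shop.example/item/2":
        '<div><h1 class="title">Toaster</h1><span class="price">Now only $34.50!</span></div>',
}

# The regular expression pulls the numeric part out of whatever text the selector returns.
PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


class ClassTextExtractor(HTMLParser):
    """Collects text inside elements matching a simple 'tag.class' selector.

    Simplified stand-in for a CSS selector engine: one tag, one class,
    and it assumes the matched element contains no unclosed void tags.
    """

    def __init__(self, selector):
        super().__init__()
        self.tag, self.cls = selector.split(".", 1)
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []    # one text string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == self.tag and self.cls in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks[-1] += data


def select_text(html, selector):
    """Return the text of every element matching the 'tag.class' selector."""
    parser = ClassTextExtractor(selector)
    parser.feed(html)
    return parser.chunks


def parse_item(url):
    """Extract title and price; missing fields become None instead of raising."""
    html = PAGES.get(url, "")  # placeholder for a real HTTP fetch
    titles = select_text(html, "h1.title")
    prices = select_text(html, "span.price")
    match = PRICE_RE.search(prices[0]) if prices else None
    return {
        "url": url,
        "title": titles[0].strip() if titles else None,
        "price": float(match.group()) if match else None,
    }


def worker(tasks, results):
    """Pull URLs from the shared task queue until it is empty."""
    while True:
        try:
            url = tasks.get_nowait()
        except queue.Empty:
            return
        results.put(parse_item(url))


def crawl(urls, workers=4):
    """Queue the URLs, parse them on a thread pool, and return items keyed by URL."""
    tasks, results = queue.Queue(), queue.Queue()
    for url in urls:
        tasks.put(url)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, tasks, results) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    items = {}
    while not results.empty():
        item = results.get()
        items[item["url"]] = item
    return items
```

Calling `crawl(list(PAGES))` returns one record per URL. In this sketch the selector narrows the search to a single element, so the regular expression only has to cope with that element's text; if the wording around the price changes, the pattern still finds the number, and if the element disappears the field is reported as `None` rather than stopping the crawl.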