The invention discloses a highly decoupled method for dynamically managing a crawler. The method comprises the following steps: the crawler is divided into two stages, data analysis and new-target generation; the rules of both stages for a given collection target are compiled into JSON data according to a protocol, and the JSON data are stored on the host end; the host starts a task and, according to a resource scheduling algorithm, dispatches it through a message queue module to a client with sufficient resources; the client receives the task information, converts it into executable information through the crawler protocol core, runs that executable information through a crawler running module, and obtains the data; finally, the host end receives the data together with any new tasks, and stores them to update the task pool. Because the host end is separated from the crawler server, the coupling of the system is reduced. Separating the crawler's functions in this way also reduces the complexity of the crawler server and allows the host end to be modified while the distributed crawler system is running, which enables targeted control and management. The whole module therefore has a decoupled, extensible design, which enhances the robustness and stability of the overall framework.
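A minimal sketch of the two-stage rule idea follows. It assumes a hypothetical JSON schema in which the data-analysis rules (`parse`) and the new-target rule (`generate`) are both regular expressions; the names `ExecutableTask`, `compile_rule`, and `run_task` are illustrative assumptions, not the protocol or modules defined by the invention. The page is passed in as a string so the example runs without network access.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ExecutableTask:
    """Hypothetical executable form of one JSON rule document."""
    parsers: Dict[str, Callable[[str], List[str]]]  # stage 1: data analysis
    generator: Callable[[str], List[str]]           # stage 2: new-target generation


def compile_rule(rule_json: str) -> ExecutableTask:
    """Stand-in for the 'protocol core': turn JSON rules into callables.

    Assumes every rule is a regular expression with one capture group;
    a real protocol might also allow XPath or CSS selectors.
    """
    rule = json.loads(rule_json)
    parsers = {
        field: re.compile(pattern).findall
        for field, pattern in rule["parse"].items()
    }
    generator = re.compile(rule["generate"]).findall
    return ExecutableTask(parsers=parsers, generator=generator)


def run_task(task: ExecutableTask, page: str) -> Tuple[dict, list]:
    """Stand-in for the 'running module': apply both stages to one page."""
    data = {field: parse(page) for field, parse in task.parsers.items()}
    new_targets = task.generator(page)
    return data, new_targets


# Host side: the rule is written as a dict and serialized to JSON for storage.
rule_json = json.dumps({
    "target": "example-list",
    "parse": {"title": r"<h2>(.*?)</h2>"},
    "generate": r'href="(/page/\d+)"',
})

# Client side: deserialize, compile, and run against a fetched page.
page = '<h2>First</h2><a href="/page/2">next</a><h2>Second</h2>'
data, new_targets = run_task(compile_rule(rule_json), page)
print(data)         # {'title': ['First', 'Second']}
print(new_targets)  # ['/page/2']
```

Keeping the rules as data rather than code is what lets the host change a target's behaviour without redeploying the crawler server: only the JSON document changes.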
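The host-side loop (task pool, resource-aware dispatch, and collection of results and new tasks) can be sketched in a similarly hedged way. This assumes each client advertises a hypothetical `free_slots` count as its resource metric and uses a simple greedy rule (send to the client with the most free slots) as a stand-in for the unspecified resource scheduling algorithm. An in-process `queue.Queue` per client stands in for the message queue module, and results are handed back by a direct method call rather than through a return queue.

```python
import queue
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Client:
    name: str
    free_slots: int  # assumed resource metric reported by the client
    inbox: "queue.Queue[str]" = field(default_factory=queue.Queue)


class Host:
    """Hypothetical host end: owns the task pool and dispatches tasks."""

    def __init__(self, clients: List[Client]) -> None:
        self.clients = clients
        self.pending: deque = deque()  # task pool: targets awaiting dispatch
        self.seen: set = set()         # every target ever added to the pool
        self.results: Dict[str, dict] = {}

    def add_tasks(self, urls: List[str]) -> None:
        # Update the task pool, skipping targets that were already added.
        for url in urls:
            if url not in self.seen:
                self.seen.add(url)
                self.pending.append(url)

    def pick_client(self) -> Optional[Client]:
        # Greedy stand-in for the scheduling algorithm.
        best = max(self.clients, key=lambda c: c.free_slots)
        return best if best.free_slots > 0 else None

    def dispatch(self) -> int:
        sent = 0
        while self.pending:
            client = self.pick_client()
            if client is None:
                break  # no client currently has sufficient resources
            client.inbox.put(self.pending.popleft())
            client.free_slots -= 1
            sent += 1
        return sent

    def collect(self, client: Client, url: str, data: dict,
                new_urls: List[str]) -> None:
        # Store the data, free the client's slot, and enqueue new targets.
        self.results[url] = data
        client.free_slots += 1
        self.add_tasks(new_urls)


a = Client("a", free_slots=2)
b = Client("b", free_slots=1)
host = Host([a, b])
host.add_tasks(["/page/1", "/page/2", "/page/3", "/page/4"])
first_round = host.dispatch()   # a takes two tasks, b takes one, /page/4 waits
done = a.inbox.get()            # client "a" finishes "/page/1"
host.collect(a, done, {"title": ["First"]}, ["/page/2", "/page/5"])
second_round = host.dispatch()  # "/page/4" goes to the freed slot on "a"
print(first_round, second_round, list(host.pending))  # 3 1 ['/page/5']
```

Because all scheduling state lives on the host, the selection rule in `pick_client` or the contents of the task pool could be changed while clients keep consuming from their queues, which is the kind of runtime control the abstract describes. The `seen` set prevents a target reported again as a "new task" (here `/page/2`) from being scheduled twice.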