
Domain name data mining method and device and Redis server

A data mining and server technology, applied in the field of network security, which can solve problems such as task accumulation and data redundancy

Pending Publication Date: 2021-02-26
BEIJING ANBOTONG TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In existing schemes, each host first saves all the domain name data it collects locally, and a parsing program later de-duplicates the data collected by all hosts. When there are many crawling tasks, this leads to task accumulation and data redundancy. To solve this problem, the present application discloses a domain name data mining method, a device, and a Redis server through the following embodiments.



Examples


Embodiment Construction

[0056] In existing schemes, each host first saves all the domain name data it collects locally, and a parsing program later de-duplicates the data collected by all hosts. When there are many crawling tasks, this leads to task accumulation and data redundancy. To solve this problem, the present application discloses a domain name data mining method, a device, and a Redis server through the following embodiments.

[0057] As shown in Figure 1, the domain name data mining method disclosed in the embodiments of the present application involves interaction between two ends: multiple hosts 100 and the Redis server 200.

[0058] The first embodiment of the present application discloses a domain name data mining method applied to any host. As shown in the workflow diagram of Figure 2, the method includes:

[0059] Step S101, establishing a connection...


Abstract

The embodiment of the invention discloses a domain name data mining method, a device, and a Redis server. When applied to any host, the method comprises: establishing a connection with the Redis server; running a plurality of crawler threads and controlling each crawler thread to obtain a crawling task from a task distribution module in turn and execute it; parsing the response data returned after a target crawler thread executes a target crawling task to obtain the corresponding target domain name information; sending the target domain name information to a domain name deduplication module, which judges whether it duplicates a domain name that has already been crawled; if so, discarding the target domain name information; and if not, storing it in a local storage module. Because repeatedly collected domain name data is discarded before storage, the deduplication problem of distributed crawlers is solved effectively, and subsequent task accumulation and data redundancy are avoided.
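The deduplicate-before-store flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `InMemorySetStore`, `SEEN_KEY`, `run_host`, and the injected `fetch` callable are hypothetical, and lowercasing domain names before comparison is an added assumption. The in-memory store stands in for the Redis server so the example runs without one; its `sadd` method mirrors the return convention of the Redis SADD command (the number of members newly added), so a connected redis-py client could plausibly be passed in its place.

```python
import threading
from queue import Queue, Empty


class InMemorySetStore:
    """Hypothetical stand-in for a shared Redis set.

    sadd() returns how many of the given values were newly added,
    which is the same convention as the Redis SADD command.
    """

    def __init__(self):
        self._sets = {}
        self._lock = threading.Lock()

    def sadd(self, key, *values):
        with self._lock:
            members = self._sets.setdefault(key, set())
            added = 0
            for value in values:
                if value not in members:
                    members.add(value)
                    added += 1
            return added


SEEN_KEY = "crawled_domains"  # hypothetical key name, not from the source


def is_new_domain(store, domain):
    # One atomic call both checks for and records the domain.
    return store.sadd(SEEN_KEY, domain.lower()) == 1


def crawler_thread(tasks, store, local_storage, storage_lock, fetch):
    while True:
        try:
            task = tasks.get_nowait()  # take the next crawling task
        except Empty:
            return
        for domain in fetch(task):  # fetch() returns parsed domain names
            if is_new_domain(store, domain):
                with storage_lock:
                    local_storage.append(domain.lower())
            # otherwise the duplicate is discarded before storage


def run_host(task_list, store, fetch, num_threads=4):
    tasks = Queue()
    for task in task_list:
        tasks.put(task)
    local_storage = []
    storage_lock = threading.Lock()
    threads = [
        threading.Thread(
            target=crawler_thread,
            args=(tasks, store, local_storage, storage_lock, fetch),
        )
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return local_storage
```

Because the membership check and the insert happen in a single set operation, two threads (or two hosts sharing the same store) that find the same domain at the same time cannot both store it.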

Description

Technical field

[0001] The present application relates to the technical field of network security, in particular to a domain name data mining method, a device, and a Redis server.

Background technique

[0002] At present, domain name data is mined mainly by means of web crawlers, that is, by writing crawler scripts to obtain domain name data. The basic workflow is as follows: first, some URLs (Uniform Resource Locators) are selected as seed URLs and placed in a queue to be crawled; the crawler script then takes seed URLs from the queue, accesses the websites by simulating manual browsing, stores and parses the crawled web page HTML (Hypertext Markup Language) data, and uses the newly parsed links as the seed URLs for the next layer of crawling.

[0003] In order to improve crawling efficiency, the same crawler program is usually executed on multiple hosts for distributed crawling of domain na...
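The basic single-host crawler workflow described in the background (seed URLs in a queue, fetch, parse, enqueue new links as the next layer) can be sketched roughly as follows. This is a generic breadth-first illustration under stated assumptions, not code from the application: `crawl`, `LinkParser`, `max_pages`, and the injected `fetch_html` callable (which returns page HTML or `None`) are all hypothetical names, and only `<a href>` links are considered.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch_html, max_pages=100):
    """Breadth-first crawl; returns the set of host names seen in links."""
    queue = deque(seed_urls)  # queue of URLs waiting to be crawled
    visited = set()
    domains = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch_html(url)  # assumed to return None on failure
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            parts = urlparse(link)
            if parts.scheme not in ("http", "https") or not parts.hostname:
                continue
            domains.add(parts.hostname)
            if link not in visited:
                queue.append(link)  # becomes a URL for the next layer
    return domains
```

Injecting `fetch_html` keeps the sketch free of network access; in practice it would wrap whatever HTTP client the crawler uses.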


Application Information

IPC(8): H04L29/12
CPC: H04L61/103, H04L61/4511
Inventor: 柳开江
Owner: BEIJING ANBOTONG TECH CO LTD