A crawling path planning method and device

A path planning technology, applied in the Internet field, that addresses the increased crawling burden, low efficiency, and complex web page code, with the effects of reducing the crawling burden, improving efficiency, and ensuring comprehensiveness.

Active Publication Date: 2019-11-15
BEIJING QIYI CENTURY SCI & TECH CO LTD
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] For the first method, planning an accurate crawling path requires the developer to analyze and study the web page code; because web page code is complex, this is inefficient.
[0005] For the second method, although the developers' workload is reduced, the website contains redundant pages, so crawling the entire site directly downloads too many useless pages, which increases the crawling burden.

Embodiment Construction

[0054] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0055] To improve the efficiency of path planning and reduce the crawling burden, the embodiments of the present invention provide a crawling path planning method and device.

[0056] The crawling path planning method provided by an embodiment of the present invention is introduced first.

[0057] Referring to Figure 1, Figure 1 is a schematic flowchart of a crawling path planning method provided by an embodiment of the present invention, and the m...

Abstract

The embodiments of the invention disclose a method and device for planning a crawling path. The method comprises: crawling, according to a preset crawling strategy and starting from a preset entrance page, the pages of the website corresponding to that entrance page; collecting the page features of each crawled page and recording the path instance from the preset entrance page to each crawled page; selecting, according to the recorded path instances and the page features of each crawled page, the path instances that reach pages similar to a preset target page; and performing path planning according to the selected path instances and the page features of each page in them, thereby generating a path planning result. Applying the method and device disclosed in the embodiments of the invention can increase path planning efficiency and reduce the crawling burden.
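The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation and makes several loud assumptions: the website is a hypothetical in-memory link graph instead of real HTTP fetching, the "preset crawling strategy" is taken to be a depth-limited breadth-first search, similarity to the target page is measured with Jaccard similarity over feature sets, and the page characteristic used for planning is a coarse page-type label (home, list, detail). All names (Page, crawl, select_paths, plan) are hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """A crawled page (hypothetical model, not the patent's data structure)."""
    page_type: str        # assumed coarse page characteristic, e.g. "list" or "detail"
    features: frozenset   # assumed collected page features, e.g. DOM/URL tokens
    links: tuple          # outgoing same-site URLs found on the page


def crawl(site, entrance, max_depth=3):
    """Breadth-first crawl from the entrance page (assumed crawling strategy).

    Returns a dict mapping each reached URL to one recorded path instance,
    i.e. the list of URLs from the entrance page to that page.
    """
    paths = {entrance: [entrance]}
    queue = deque([entrance])
    while queue:
        url = queue.popleft()
        if len(paths[url]) > max_depth:
            continue  # page stays recorded, but its links are not followed
        for link in site[url].links:
            if link in site and link not in paths:
                paths[link] = paths[url] + [link]
                queue.append(link)
    return paths


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_paths(site, paths, target_features, threshold=0.5):
    """Keep the path instances whose last page is similar to the target page."""
    return [path for url, path in paths.items()
            if jaccard(site[url].features, target_features) >= threshold]


def plan(site, selected):
    """Generalize the selected path instances into page-type sequences,
    ranked by how many instances support each sequence."""
    counts = Counter(tuple(site[url].page_type for url in path) for path in selected)
    return counts.most_common()


# Hypothetical toy site: an entrance page, list pages and detail pages.
site = {
    "/": Page("home", frozenset({"nav", "banner"}), ("/list?p=1", "/about")),
    "/about": Page("static", frozenset({"nav", "text"}), ()),
    "/list?p=1": Page("list", frozenset({"nav", "items"}),
                      ("/item/1", "/item/2", "/list?p=2")),
    "/list?p=2": Page("list", frozenset({"nav", "items"}), ("/item/3",)),
    "/item/1": Page("detail", frozenset({"nav", "title", "video"}), ()),
    "/item/2": Page("detail", frozenset({"nav", "title", "video"}), ()),
    "/item/3": Page("detail", frozenset({"nav", "title", "video"}), ()),
}
target = frozenset({"title", "video", "player"})  # features of the preset target page

paths = crawl(site, "/")
selected = select_paths(site, paths, target)
print(plan(site, selected))
# [(('home', 'list', 'detail'), 2), (('home', 'list', 'list', 'detail'), 1)]
```

On the toy site, the planned sequences indicate that target-like detail pages are reached through list pages, so a directional crawler could follow only those page types and skip pages such as /about. A real implementation would also need feature extraction from downloaded HTML, which this sketch leaves out.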

Description

Technical Field

[0001] The invention relates to the technical field of the Internet, in particular to a crawling path planning method and device.

Background Art

[0002] Web crawlers automatically extract and download web pages from the World Wide Web for search engines and are an important part of search engines. At present, web crawlers have become the main means of collecting massive amounts of information and data from the Internet, and many excellent open-source crawler frameworks have appeared. Web crawlers fall into two main categories: search crawlers for search engines, whose crawling target is the entire Internet, and directional crawlers, whose crawling target is a specific subset of all websites, or even a single website. There are currently two implementation methods for directional crawlers that crawl web pages from a single website: one is to define and plan accurate and executable crawling path result...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/951
CPC: G06F16/951
Inventor: 张煜苒帅伟良
Owner: BEIJING QIYI CENTURY SCI & TECH CO LTD