Infinite layer collection method based on Web page

A web page collection method, applied in store-and-forward switching systems, electrical components, transmission systems, etc. It addresses the problems of heavy computer resource consumption and the inability to use multi-threading, with the effects of reducing server load, easily ensuring accuracy, and saving network bandwidth.

Inactive Publication Date: 2009-04-08

赵洪宇

Citations: cites 0, cited by 18


AI Technical Summary

Problems solved by technology

Although this kind of program is simple, when the page at a URL contains many links, every recursive call pushes the unfinished work onto the program's call stack, so the program consumes a large amount of computer resources during execution.

In addition, this kind of program cannot use multi-threading.

Therefore, this approach is not used in efficient collection programs.
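To make the problem concrete, here is a minimal Python sketch of the recursive style of collection described above. It runs on a hypothetical in-memory link graph (a dict from URL to linked URLs) rather than live HTTP requests, and the function names are illustrative only. Each nested link adds a frame to the call stack, so a long enough chain of pages exhausts CPython's default recursion limit of 1000 frames.

```python
import sys


def recursive_collect(graph, url, visited):
    """Recursive traversal: recurse into every unvisited link on the page.

    graph is a hypothetical dict mapping a URL to the URLs linked from it.
    Each call stays on the stack until all of its descendants are collected.
    """
    visited.add(url)
    for link in graph.get(url, []):
        if link not in visited:
            recursive_collect(graph, link, visited)
    return visited


def make_chain(n):
    """Build a chain of n pages in which page i links only to page i + 1."""
    return {f"/p{i}": [f"/p{i + 1}"] for i in range(n - 1)}


if __name__ == "__main__":
    # 100 pages need about 100 nested frames: fine.
    print(len(recursive_collect(make_chain(100), "/p0", set())))
    # A chain deeper than the recursion limit overflows the stack.
    try:
        recursive_collect(make_chain(5 * sys.getrecursionlimit()), "/p0", set())
    except RecursionError:
        print("stack exhausted")
```

Stack depth here tracks link depth, which the collector does not control; moving pending URLs into an explicit queue removes that dependency.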


Detailed Description of Embodiments

[0019] The present invention takes the entry address of a given website as the initial URL for traversal. Based on the Web page acquisition model, it traverses all links in a web page that conform to the model and keeps expanding along those links to the required web pages. It then distinguishes the characteristics of the pages these links point to, filters out noise according to the acquisition model, and performs multi-level link analysis to extract the content users care about.
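The text does not define the acquisition model's format, so the following is only a sketch under an assumption: the model is represented here as a hypothetical AcquisitionModel with a regular expression that links must match plus a list of file suffixes treated as noise. It shows how "links that conform to the model" and "noise filtering" could be expressed, not how the patent implements them.

```python
import re
from dataclasses import dataclass


@dataclass
class AcquisitionModel:
    """Hypothetical stand-in for the Web page acquisition model.

    allowed:        regular expression a link must match (from its start) to be followed
    noise_suffixes: resource types treated as noise and never followed
    """
    allowed: str
    noise_suffixes: tuple = (".jpg", ".png", ".gif", ".css", ".js")

    def conforms(self, url: str) -> bool:
        path = url.split("?", 1)[0].lower()  # ignore the query string
        if path.endswith(self.noise_suffixes):
            return False
        return re.match(self.allowed, url) is not None

    def filter_links(self, links):
        """Return conforming links in their original order, without duplicates."""
        seen = set()
        kept = []
        for link in links:
            if link not in seen and self.conforms(link):
                seen.add(link)
                kept.append(link)
        return kept


if __name__ == "__main__":
    model = AcquisitionModel(allowed=r"https?://news\.example\.com/")
    print(model.filter_links([
        "http://news.example.com/a.html",
        "http://news.example.com/logo.png",
        "http://ads.example.net/banner.html",
        "http://news.example.com/a.html",
    ]))
    # ['http://news.example.com/a.html']
```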

[0020] Before network information collection begins, the website entry address is specified and used as the starting URL of the traversal. When the collection program reaches a web page, it analyzes the page according to the acquisition model and adds the relevant links to the link queue; at the same time, it analyzes the page content and stores the page in the page library. The program framework is as follows...
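The link queue and page library described in [0020] can be sketched as a breadth-first loop. This is a minimal illustration, not the patent's program framework (which is elided above): fetch and conforms are hypothetical caller-supplied hooks, the page library is a plain dict, and links are parsed with Python's standard html.parser.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect (href, link text) for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, link_text) tuples
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect(start_url, fetch, conforms, max_pages=1000):
    """Breadth-first collection driven by an explicit link queue.

    fetch(url) returns the page HTML or None; conforms(url) decides whether a
    link fits the acquisition model. Both are hypothetical hooks supplied by
    the caller. Returns the page library as a {url: html} dict.
    """
    link_queue = deque([start_url])
    queued = {start_url}          # every URL ever queued, to avoid repeats
    page_library = {}
    while link_queue and len(page_library) < max_pages:
        url = link_queue.popleft()
        html = fetch(url)
        if html is None:          # unreachable page: skip it
            continue
        page_library[url] = html  # put the page into the page library
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href, _link_text in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in queued and conforms(absolute):
                queued.add(absolute)
                link_queue.append(absolute)
    return page_library


if __name__ == "__main__":
    site = {  # hypothetical in-memory site standing in for HTTP fetches
        "http://example.com/": '<a href="a.html">A</a> <a href="/b.html">B</a>',
        "http://example.com/a.html": '<a href="b.html">B again</a>',
        "http://example.com/b.html": "<p>leaf</p>",
    }
    library = collect("http://example.com/", site.get,
                      lambda u: u.startswith("http://example.com/"))
    print(sorted(library))
    # ['http://example.com/', 'http://example.com/a.html', 'http://example.com/b.html']
```

Because pending URLs live in a queue rather than on the call stack, memory grows with the number of queued links instead of with link depth, and the same queue could in principle be shared by several worker threads (for example through queue.Queue); the sketch stays single-threaded for clarity.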


Abstract

The invention relates to an unlimited-layer acquisition method based on Web pages, comprising the following steps: (1) specify the entry page address StartURL for web page acquisition; (2) analyze each URL on the page, and if a URL is a relative path, complete it using the entry address StartURL to convert it into an absolute path; (3) judge whether the entry address StartURL is the superior of the URL: if so, start downlink acquisition and keep expanding downward; if not, stop expanding. During acquisition and expansion, for each URL, the words in the web page are cyclically matched and extracted, the links on the page are searched, and both the words on each link and the words in the page the link points to are extracted and stored, so that all links of the page are traversed and web pages are acquired to an unlimited number of layers. With this method, multi-level link analysis can be carried out according to user requirements, the content the user cares about can be extracted, and network information can be acquired efficiently.
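Steps (2) and (3) of the abstract can be sketched with Python's urllib.parse. Two assumptions apply: "superior" is read here as "same scheme and host, with the URL's path lying under the directory that contains StartURL", which the abstract does not spell out; and relative URLs are resolved against StartURL as the abstract states, whereas a general-purpose crawler would normally resolve them against the page on which they appear. The helper names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def to_absolute(url, start_url):
    """Step (2): complete a relative URL against the entry address StartURL."""
    return urljoin(start_url, url)  # absolute URLs pass through unchanged


def is_superior(start_url, url):
    """Step (3), under an assumed reading of "superior": same scheme and host,
    and the URL's path lies under the directory containing StartURL."""
    start, target = urlsplit(start_url), urlsplit(url)
    if (start.scheme, start.netloc.lower()) != (target.scheme, target.netloc.lower()):
        return False
    base_dir = start.path.rsplit("/", 1)[0] + "/"
    return target.path.startswith(base_dir)


def expand(start_url, raw_urls):
    """Return the absolute URLs that qualify for downward expansion."""
    result = []
    for raw in raw_urls:
        absolute = to_absolute(raw, start_url)
        if is_superior(start_url, absolute):
            result.append(absolute)  # downlink acquisition continues here
        # otherwise expansion stops for this URL
    return result


if __name__ == "__main__":
    start = "http://example.com/news/index.html"
    print(expand(start, [
        "2009/a.html",                     # relative, under /news/
        "/about.html",                     # same host, outside /news/
        "http://other.org/news/x.html",    # different host
        "http://example.com/news/b.html",  # absolute, under /news/
    ]))
    # ['http://example.com/news/2009/a.html', 'http://example.com/news/b.html']
```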

Description

Technical field

[0001] The invention relates to a method for collecting web pages.

Background art

[0002] Network information is usually collected with the help of search engines. A typical commercial search engine consists of four parts: a searcher, an indexer, a retriever, and a user interface. Generally speaking, the searcher is a network robot program, known as a Robot. Starting from the URL of an initial page or site, it traverses the Internet and automatically discovers web page information. When it enters a hypertext page, it uses the HTML markup structure to search for information and obtain URL links pointing to other hypertext pages, selects the next site to visit according to some algorithm, and then moves on to that site to continue collecting information. The indexer's function is to understand the data gathered by the searcher, extract index items from it, and build an index library for representing data documents and ...
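The indexer's role of extracting index items and building an index library is commonly realized as an inverted index. The sketch below is a generic illustration of that structure, not something the patent specifies: it assumes index items are simply lower-cased words and that a query returns pages containing every query word. The function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word (index item) to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs containing every word of the query (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is untouched
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    idx = build_index({"u1": "Web page collection", "u2": "Page library"})
    print(sorted(lookup(idx, "page")))      # ['u1', 'u2']
    print(sorted(lookup(idx, "web page")))  # ['u1']
```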


Application Information


IPC(8): H04L29/08; H04L12/54

Inventors: 赵洪宇 (Zhao Hongyu), 袁青霞 (Yuan Qingxia), 李闻 (Li Wen), 阮振中 (Ruan Zhenzhong)

Owner: 赵洪宇 (Zhao Hongyu)
