Web page classification storage system and method

A web page classification and storage technology, applied in the Internet field. It addresses the problems that whole-network search ignores the needs of vertical search, cannot distinguish web page categories, and suffers from many interference factors; its effect is to reduce those interference factors.

Active Publication Date: 2016-03-02
BEIJING QIHOO TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] Existing whole-network search largely ignores the needs of vertical search: it cannot distinguish web page categories, and it processes every page according to essentially the same principles.
Consequently, the web pages crawled during a whole-network search are all stored in a uniform manner rather than by category. When pages of different categories are put together for pattern recognition, there are too many interference factors and the result is difficult to predict.
For vertical search to use the results of a whole-network search, those results must be classified and stored by web page category, which facilitates pattern recognition of the page frame when classifying web pages.



Examples


Embodiment Construction

[0034] Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0035] The flow of the web page classification storage method of this embodiment is shown in Figure 1 and includes:

[0036] Step S110: Extract the page frame of a pre-acquired web page and calculate the page frame ID. The pre-acquired web page may be one crawled by a whole-network search. The method of extracting the page frame of the webpage is: extracting the page frame of the webpa...
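The text is cut off before it describes how the page frame is extracted or how the ID is calculated, so the following is not the patent's method. It is only a minimal sketch of what step S110 could look like under two stated assumptions: that a "page frame" is the sequence of HTML tags with text and attributes stripped, and that the ID is a SHA-1 hash of that sequence. The names `FrameExtractor`, `extract_page_frame`, and `page_frame_id` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class FrameExtractor(HTMLParser):
    """Collects the tag skeleton of a page, ignoring text and attributes.

    Assumption: this skeleton stands in for the "page frame"; the source
    does not define the real extraction method.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately dropped so only structure remains.
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")


def extract_page_frame(html: str) -> str:
    """Return the tag-only skeleton of an HTML document as one string."""
    parser = FrameExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.tokens)


def page_frame_id(frame: str) -> str:
    """Assumed ID scheme: hex SHA-1 digest of the frame string."""
    return hashlib.sha1(frame.encode("utf-8")).hexdigest()
```

Under these assumptions, two pages that share a template but differ in their text produce the same frame and therefore the same ID, which is the property the classification relies on.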


Abstract

The invention discloses a web page classification storage system, which relates to the technical field of the Internet. The system comprises a page frame ID computing module and a page frame storage module. The page frame storage module checks whether a directory named after the page frame ID exists in the current subdirectory; if the directory exists, the page frame is stored under that ID directory; if it does not exist, a directory named after the page frame ID is created and the page frame is then stored under it. The invention also discloses a web page classification storage method. The system and the method store web pages of the same class under the same directory, solving the problem that whole-network search results cannot be stored by web page class. Because the search results are stored by web page class, interference factors for vertical search are reduced during page frame recognition.
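The storage rule described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the patented implementation: the function name `store_page_frame`, the `page_name` parameter, and the choice to write the frame as a UTF-8 text file are assumptions, and the frame ID is taken as an already computed string.

```python
from pathlib import Path


def store_page_frame(sub_dir: Path, frame_id: str, page_name: str, frame: str) -> Path:
    """Store a page frame under a directory named after its frame ID.

    sub_dir   -- the current subdirectory to search in
    frame_id  -- the precomputed page frame ID (used as the directory name)
    page_name -- hypothetical file name for this page's frame
    frame     -- the page frame content to store
    """
    id_dir = sub_dir / frame_id
    # If no directory named after the ID exists yet, create it first.
    if not id_dir.is_dir():
        id_dir.mkdir(parents=True)
    # In both cases the frame ends up under the ID directory.
    target = id_dir / page_name
    target.write_text(frame, encoding="utf-8")
    return target
```

Pages whose frames share an ID land in the same directory, so a later pattern-recognition pass can read one directory and see only pages of one class.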

Description

Technical field

[0001] The invention relates to the field of Internet technology, in particular to a web page classification storage system and method.

Background technique

[0002] Search technology falls into two basic categories. One is whole-network search, which is based on the entire Internet: it crawls all web pages (currently with a limited crawl depth within a site, generally without processing JavaScript, and handling only some dynamic pages) and then processes and analyzes them. The other is vertical search, which crawls and analyzes only pages of a certain category, for example image search, video search, blog search, forum search, or news search. Most vertical searches currently work from seeds (also called list pages). Vertical search processing can be divided into two parts: one is finding seeds; the other is finding specific product pages from the seed pages, that is, pages of different categories (pictures, video...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 卢宏林
Owner: BEIJING QIHOO TECH CO LTD