
Web page classification storage system and method

A web page classification and storage technology in the Internet field. It addresses three problems: whole-network search disregards the needs of vertical search, it cannot distinguish web page categories, and it therefore introduces many interference factors. The technology reduces those interference factors.

Active Publication Date: 2013-01-30
BEIJING QIHOO TECH CO LTD
Cites: 2; cited by: 11

AI Technical Summary

Problems solved by technology

[0003] Existing whole-network search basically does not consider the needs of vertical search: it cannot distinguish web page categories, and it processes every page according to essentially the same principles.
Consequently, the web pages crawled by whole-network search are all stored together rather than by web page category. If pages of different categories are grouped together for pattern recognition, there are too many interference factors and the results are difficult to predict.
If vertical search is to reuse the results of whole-network search, those results must first be classified by web page category and stored by category, so that the web page frame can be recognized by pattern recognition when web pages are classified. Otherwise, when pages from a site are grouped together indiscriminately for pattern recognition, there are too many interference factors and the results are unpredictable.



Embodiment Construction

[0034] Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

[0035] The flow of the method for classifying and storing web pages in this embodiment is shown in Figure 1 and includes the following steps:

[0036] Step S110: extract the page frame of the pre-acquired web page and calculate the page frame ID. The pre-acquired web page may be a web page crawled by whole-network search. The page frame of the web page is extracted as follows: extract the page frame of the web page accord...
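The extraction rule for step S110 is cut off above, so the following is only a minimal sketch under stated assumptions: it approximates the page frame as the ordered sequence of HTML tags with all text and attributes removed, and derives the page frame ID as an MD5 digest of that sequence. The helper names `extract_page_frame` and `page_frame_id`, and the choice of MD5, are hypothetical and not taken from the patent.

```python
import hashlib
from html.parser import HTMLParser


class _FrameExtractor(HTMLParser):
    """Collects the tag skeleton of a page, discarding text and attributes.

    Assumption: the "page frame" is approximated by the ordered sequence of
    start and end tags; the patent's actual extraction rule is elided.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep only the tag name; attributes such as href/src are dropped.
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")


def extract_page_frame(html: str) -> str:
    """Hypothetical page frame: the page's tag sequence without any text."""
    parser = _FrameExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.tokens)


def page_frame_id(frame: str) -> str:
    """Hypothetical page frame ID: MD5 hex digest of the frame string."""
    return hashlib.md5(frame.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "<html><body><h1>News A</h1><p>Text one</p></body></html>"
    b = "<html><body><h1>News B</h1><p>Other text</p></body></html>"
    # Same template, different text: the two IDs are equal.
    print(page_frame_id(extract_page_frame(a)) == page_frame_id(extract_page_frame(b)))
```

Because text is discarded, two pages built from the same template map to the same ID, which is what allows the storage step to group them together. A production system would likely need a more tolerant notion of "same frame" (for example, ignoring repeated list items), which this sketch does not attempt.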



Abstract

The invention discloses a web page classification storage system in the technical field of the Internet. The system comprises a page frame ID computing module and a page frame storage module. The page frame storage module is specifically configured to search the current subdirectory for a directory named with the page frame ID: if such a directory exists, the page frame is stored under that ID directory; if it does not, a directory named with the page frame ID is created first and the page frame is then stored under it. The invention also discloses a web page classification storage method. The system and method store web pages of the same class under the same directory, which solves the problem that whole-network search results cannot be stored by web page class; and because the search results are stored by class, interference factors for vertical search are reduced during page frame recognition.
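The storage logic in the abstract (look for a directory named with the page frame ID, create it if it is missing, then store under it) can be sketched with Python's `pathlib`. The function name `store_page`, the root directory, and the one-file-per-page layout are illustrative assumptions; the abstract does not specify them.

```python
import tempfile
from pathlib import Path


def store_page(current_dir: Path, frame_id: str, file_name: str, content: str) -> Path:
    """Store content under a subdirectory of current_dir named after frame_id.

    Hypothetical sketch of the page frame storage module in the abstract.
    """
    id_dir = current_dir / frame_id
    # Search the current directory for a directory named with the page frame ID.
    if not id_dir.is_dir():
        # Not found: create a directory named with the page frame ID first.
        id_dir.mkdir(parents=True, exist_ok=True)
    # In either case, store the content under the ID directory.
    target = id_dir / file_name
    target.write_text(content, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store_page(root, "3f2a", "page1.html", "<html>...</html>")
        store_page(root, "3f2a", "page2.html", "<html>...</html>")
        # Both pages end up in the same ID directory.
        print(sorted(p.name for p in (root / "3f2a").iterdir()))
```

Passing `exist_ok=True` keeps the create step safe if another crawler worker creates the same ID directory between the existence check and the `mkdir` call.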

Description

Technical field
[0001] The invention relates to the technical field of the Internet, and in particular to a system and method for classifying and storing web pages.
Background
[0002] Search technology falls into basically two categories. The first crawls all web pages on the whole Internet (currently the crawl depth within a site is limited, JavaScript is generally not processed, and only some dynamic pages are processed) and then processes and analyzes those pages; this is web search, that is, whole-network search. The second is vertical search, which crawls and analyzes only pages of a certain category, such as image search, video search, blog search, forum search, and news search. Most vertical search is currently done on the basis of seeds (also known as listing pages). Vertical search processing can be divided into two parts: finding seeds, and finding specific product pages from the seed pages, that is, pages of ...


Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/30
Inventor: 卢宏林
Owner: BEIJING QIHOO TECH CO LTD