Webpage denoising system and method based on maximum similarity matching

The invention applies similarity matching and webpage technology in the Internet field. It addresses the complexity and low efficiency of machine learning approaches, and achieves wide adaptability and good applicability.

Inactive Publication Date: 2013-06-19
SHANGHAI JIAOTONG UNIV
Cites: 3, Cited by: 0

AI Technical Summary

Problems solved by technology

Existing methods are based on machine learning, which is too complex to be efficient.


Examples

Embodiment Construction

[0023] The present invention is described in detail below with reference to the accompanying drawing and an embodiment. This embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and specific operating process are given, but the protection scope of the present invention is not limited to the embodiment described below.

[0024] As shown in Figure 1, this embodiment includes: a webpage acquisition module 101, a preprocessing module 102, a webpage DOM generating feature tree module 103, a feature tree maximum similarity matching module 104 and an aggregation evaluation module 105, wherein: the webpage acquisition module 101 is connected with the preprocessing module 102 and transmits the webpage code data; the preprocessing module 102 is connected with the webpage acquisition module 101 and transmits the preprocessed target webpage; the preprocessing module 102 is connected with the webpage DOM generating fea...

Abstract

The invention discloses a webpage denoising system and method based on maximum similarity matching, belonging to the technical field of the Internet. The system comprises a webpage acquisition module, a preprocessing module, a webpage DOM (document object model) generation feature tree module, a feature tree maximum similarity matching module and an aggregation evaluation module, wherein the webpage acquisition module is connected with the preprocessing module and transmits webpage code data; the preprocessing module is connected with the webpage acquisition module and transmits a preprocessed target webpage; the preprocessing module is connected with the webpage DOM generation feature tree module and transmits preprocessed webpage data; the webpage DOM generation feature tree module is connected with the feature tree maximum similarity matching module and transmits feature tree data; the feature tree maximum similarity matching module is connected with the aggregation evaluation module and transmits a webpage content block candidate set; and finally, the aggregation evaluation module outputs the webpage content block. The invention is well suited to the majority of content-type websites.
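The data flow between the five modules can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function and class names are hypothetical, the "acquisition" stage simply takes a string, and the matching stage is a placeholder that scores each div by its text length, because the listing does not disclose how the patent's maximum similarity matching or aggregation evaluation actually work.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no closing tag


class Node:
    """One node of a simplified feature tree (hypothetical structure)."""
    def __init__(self, tag, parent=None, attrs=None):
        self.tag, self.parent = tag, parent
        self.attrs = dict(attrs or [])
        self.children, self.text = [], ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current, attrs)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def acquire(page_source):
    # Acquisition stage: a real system would fetch the page; here it is passed in.
    return page_source


def preprocess(html):
    # Preprocessing stage (assumed cleanup): drop script/style blocks and comments.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    return re.sub(r"(?s)<!--.*?-->", "", html)


def build_feature_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def text_length(node):
    return len(node.text) + sum(text_length(child) for child in node.children)


def match_candidates(root):
    # PLACEHOLDER for the maximum similarity matching stage: every div becomes
    # a candidate, scored by text length. This is not the patented matching.
    candidates, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.tag == "div":
            candidates.append((text_length(node), node))
        stack.extend(node.children)
    return candidates


def aggregate(candidates):
    # Aggregation stage (assumed rule): keep the highest-scoring candidate.
    return max(candidates, key=lambda c: c[0])[1] if candidates else None


def denoise(page_source):
    cleaned = preprocess(acquire(page_source))
    return aggregate(match_candidates(build_feature_tree(cleaned)))
```

Calling `denoise(html_string)` returns the selected `Node`, or `None` when the page has no div. Keeping each stage as a separate function mirrors the module boundaries in the abstract, so the placeholder scoring can be swapped for a real matcher without touching the other stages.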

Description

Technical field

[0001] The present invention relates to a system and method in the field of Internet technology, in particular to a webpage denoising system and denoising method based on maximum similarity matching of LCS (Longest Common Subsequence) feature trees.

Background technique

[0002] With the continuous development of Internet technology, Internet information has grown explosively. How to find core subject information in massive amounts of webpage information has become an active topic in Web research today. A webpage generally contains some content blocks, but in addition to these it often contains navigation bars, copyright information, announcements and various forms of advertisements, which exist for commercial purposes or for the convenience of users. Such information, which has nothing to do with the theme, may be referred to as webpage noise blocks. How to reduce the noise in web pages is of great signi...
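To make the LCS idea concrete, the following Python sketch computes the longest common subsequence length of two sequences and uses it to pick the most similar candidate. It shows the standard dynamic-programming LCS only. Representing a feature tree as a flat sequence of tag names, the normalisation `2 * LCS / (len(a) + len(b))`, and the names `similarity` and `best_match` are assumptions for illustration; the text above does not specify how the patent derives or compares its feature trees.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)          # DP row for the previous element of a
    for x in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical sequences, 0.0 for disjoint ones."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def best_match(target, candidates):
    """Return the candidate sequence with the highest similarity to target."""
    return max(candidates, key=lambda c: similarity(target, c))


# Hypothetical feature sequences: tag names read off two page blocks.
target = ["div", "p", "span"]
candidates = [["div", "ul", "li"], ["div", "p", "span", "a"]]
print(best_match(target, candidates))
```

The row-by-row table keeps memory proportional to the shorter dimension rather than the full product, which matters when many block pairs are compared.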

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 宋鳌, 周军, 马玲, 安然, 罗传飞
Owner: SHANGHAI JIAOTONG UNIV