Web page pattern recognition method and visual structure learning method based on deep learning

A deep learning and pattern recognition technology, applied in the Internet field, that addresses the difficulty of achieving good recognition results in pattern recognition, with the effect of reducing feature loss and ensuring convergence.

Active Publication Date: 2019-02-19
HYLANDA INFORMATION TECH
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, because natural language is highly abstract, it is difficult for machines to achieve ideal results in plain-text pattern recognition.

Examples

Embodiment Construction

[0025] The technical solution is described in detail below:

[0026] The web page pattern recognition method based on deep learning of the present invention comprises the following steps:

[0027] A. Take the unformatted text, that is, the text source code of the web page in Hypertext Markup Language (HTML), as the algorithm input;

[0028] B. Preprocess the HTML text source code of the web page;

[0029] C. Design stacked denoising autoencoders (SDAE) as the deep learning algorithm for web page features. A neural network language model (NNLM) initializes the input of the SDAE: the output of the previous step is fed to the NNLM to obtain an initial feature vector for the text source code, and this initial feature vector is used as the input of the stacked n...
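The feature-learning step C can be illustrated with a small sketch of greedy layer-wise training for stacked denoising autoencoders. This is not the patent's implementation: the masking noise, sigmoid units, tied encoder/decoder weights, squared-error loss, and all names (`DenoisingAutoencoder`, `train_stack`) are assumptions chosen for brevity, and the toy input vectors merely stand in for the initial feature vectors that the NNLM would supply.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DenoisingAutoencoder:
    """One layer: corrupt the input, encode, decode, reduce reconstruction error."""

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        self.corruption, self.lr = corruption, lr
        # Tied weights (assumption): the decoder reuses W transposed.
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_o = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(self.n_hidden))
                        + self.b_o[i])
                for i in range(self.n_in)]

    def loss(self, data):
        """Mean squared reconstruction error on clean (uncorrupted) inputs."""
        total = 0.0
        for x in data:
            z = self.decode(self.encode(x))
            total += sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        return total / len(data)

    def train_step(self, x):
        # Masking noise: zero each component with probability `corruption`.
        x_noisy = [0.0 if self.rng.random() < self.corruption else xi for xi in x]
        h = self.encode(x_noisy)
        z = self.decode(h)
        # Backpropagate 0.5 * sum((z - x)^2); the target is the CLEAN input x.
        d_o = [(zi - xi) * zi * (1.0 - zi) for zi, xi in zip(z, x)]
        d_h = [sum(self.W[j][i] * d_o[i] for i in range(self.n_in))
               * h[j] * (1.0 - h[j]) for j in range(self.n_hidden)]
        for j in range(self.n_hidden):
            for i in range(self.n_in):
                # Tied weights receive a decoder term and an encoder term.
                grad = d_o[i] * h[j] + d_h[j] * x_noisy[i]
                self.W[j][i] -= self.lr * grad
            self.b_h[j] -= self.lr * d_h[j]
        for i in range(self.n_in):
            self.b_o[i] -= self.lr * d_o[i]

    def fit(self, data, epochs=200):
        for _ in range(epochs):
            for x in data:
                self.train_step(x)


def train_stack(data, layer_sizes, epochs=200):
    """Greedy layer-wise training: each layer learns from the previous layer's codes."""
    layers, inputs = [], data
    for k, n_hidden in enumerate(layer_sizes):
        dae = DenoisingAutoencoder(len(inputs[0]), n_hidden, seed=k)
        dae.fit(inputs, epochs)
        layers.append(dae)
        inputs = [dae.encode(x) for x in inputs]  # clean codes feed the next layer
    return layers, inputs


# Toy vectors standing in for NNLM output (hypothetical data).
toy = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1],
       [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
layers, features = train_stack(toy, layer_sizes=[4, 2], epochs=100)
```

Here `features` holds one learned vector per input; in the described method such vectors would then be passed to a classification algorithm.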

Abstract

The invention discloses a deep learning-based web page pattern recognition method. The method comprises the following steps: taking the text source code of a web page, written in Hypertext Markup Language as unformatted text, as the algorithm input; performing word segmentation on the source code; designing a stacked denoising autoencoder (SDAE) as the feature learning algorithm for the web page; initializing the input of the SDAE with a neural network language model; classifying the text feature vectors learned by the SDAE with a classification algorithm; and outputting the classification result. The invention further discloses a visual structure learning method, in which the structure of the HTML text source code is learned with a machine learning method and the visual structure of the web page is then learned from it. Both methods process an artificial language with natural language processing techniques, and perform feature learning on the HTML text source code with a deep learning method and a neural network language model. With these methods, the page patterns of various websites, such as blogs, forums, and information sites, can be recognized precisely.
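The segmentation of the HTML source into tokens could look like the sketch below. The abstract does not say how segmentation is done, so the token scheme here (tag names, attribute names prefixed with `@`, and lowercased words from text nodes) and the helper names `tokenize_html` and `build_vocab` are assumptions for illustration; only Python's standard `html.parser` and `re` modules are used.

```python
import re
from html.parser import HTMLParser


class SourceTokenizer(HTMLParser):
    """Collects a flat token sequence from HTML source code."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")
        for name, _value in attrs:
            # Keep attribute names only; values are dropped in this sketch.
            self.tokens.append(f"@{name}")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Split text nodes into lowercased word tokens.
        self.tokens.extend(re.findall(r"\w+", data.lower()))


def tokenize_html(source):
    """Return the token sequence for one HTML document."""
    parser = SourceTokenizer()
    parser.feed(source)
    parser.close()
    return parser.tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


page = '<div class="post"><p>Hello, world</p></div>'
tokens = tokenize_html(page)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
```

The resulting id sequence is the kind of input a neural network language model could embed. Text in languages written without spaces, such as Chinese, would need a dedicated word segmenter in place of the `\w+` pattern.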

Description

Technical field

[0001] The present invention relates to the technical field of the Internet, in particular to a web page pattern recognition method and a visual structure learning method based on deep learning.

Background technique

[0002] Web page recognition is a key step in data acquisition systems and search engines. At present, the following methods are mostly adopted for web page identification: manually setting identification rules to identify web pages or identifying web pages based on other non-deep learning machine learning methods.

[0003] However, due to the highly abstract nature of natural language, it is difficult for machines to achieve ideal recognition results for plain text pattern recognition.

Contents of the invention

[0004] The technical problem to be solved by the present invention is to provide a web page pattern recognition method and a visual structure learning method based on deep learning.

[0005] The technical scheme that the present inv...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/955, G06N3/08, G06F17/27
CPC: G06F16/955, G06F40/284, G06N3/08
Inventor: 李志杰, 刘丽丽, 张作职
Owner: HYLANDA INFORMATION TECH