Web page clickability recognition method and device based on the TAN tree-shaped naive Bayesian algorithm

A technology combining a Bayesian algorithm with a recognition method, applied in the field of clickability recognition methods and devices based on the TAN (tree-augmented naive Bayes) algorithm. It addresses the problems of heavy manual participation and low intelligence, and achieves reduced manual intervention and high reusability.

Active Publication Date: 2021-08-24
智言科技(深圳)有限公司
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

Heavy manual participation and low intelligence

Embodiment Construction

[0048] The technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

[0049] Referring to Figure 1, Figure 1 is a flow chart of the web page clickability recognition method based on the TAN tree-shaped naive Bayesian algorithm. The present invention provides a web page clickability recognition method based on the TAN tree-shaped naive Bayesian algorithm, comprising:

[0050] Step S10: control a browser to acquire the target source web page, crawl the page's data, and construct a label node tree from the acquired data.
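Step S10 can be sketched as follows. The browser-control part is omitted: this minimal sketch assumes the rendered page source is already available as a string, and reads "label node tree" as a tree of HTML tag nodes. `LabelNode`, `build_label_tree` and `walk` are hypothetical names, and Python's standard `html.parser` stands in for whatever parser the invention actually uses.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

# HTML elements that never have a closing tag, so they must not stay on the open stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass(eq=False)
class LabelNode:
    """One node of the (hypothetical) label node tree: an HTML tag and its attributes."""
    tag: str
    attrs: dict
    parent: Optional["LabelNode"] = field(default=None, repr=False)
    children: list = field(default_factory=list)
    text: str = ""


class LabelTreeBuilder(HTMLParser):
    """Builds a label node tree from page source using the stdlib HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = LabelNode("#root", {})
        self.stack = [self.root]  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        node = LabelNode(tag, dict(attrs), parent=self.stack[-1])
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            current = self.stack[-1]
            current.text = (current.text + " " + data.strip()).strip()


def build_label_tree(page_source: str) -> LabelNode:
    builder = LabelTreeBuilder()
    builder.feed(page_source)
    builder.close()
    return builder.root


def walk(node: LabelNode):
    """Yield every node below `node` in document order."""
    for child in node.children:
        yield child
        yield from walk(child)
```

For example, `[n.tag for n in walk(build_label_tree('<div><a href="/next">Next</a><span>Info</span><br></div>'))]` gives `['div', 'a', 'span', 'br']`; each node keeps its attributes and a link to its parent, which later steps need for parent-child features.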

[0051] Step S20: calculate the probability of each node feature of the label node tree unde...
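According to the abstract, step S20 uses the naive Bayesian method to compute the conditional probability of each node feature under the clickable and non-clickable categories. A minimal sketch, assuming each labelled training node is described by a dict of discrete features (the feature names `tag` and `has_href` and the class labels are hypothetical) and assuming Laplace smoothing, which the excerpt does not mention:

```python
from collections import Counter, defaultdict

CLASSES = ("clickable", "non_clickable")  # hypothetical label names


def fit_conditional_probs(samples, alpha=1.0):
    """
    samples: list of (features: dict[str, str], label) pairs, label in CLASSES.
    Returns smoothed class priors P(c) and a function giving P(feature=value | c).
    """
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of each value
    vocab = defaultdict(set)             # feature -> values observed in training
    for feats, label in samples:
        for name, value in feats.items():
            value_counts[(name, label)][value] += 1
            vocab[name].add(value)

    n = len(samples)
    priors = {c: (class_counts[c] + alpha) / (n + alpha * len(CLASSES)) for c in CLASSES}

    def cond_prob(name, value, c):
        # Laplace smoothing over the k values observed for this feature.
        k = len(vocab[name])
        return (value_counts[(name, c)][value] + alpha) / (class_counts[c] + alpha * k)

    return priors, cond_prob
```

With training nodes `a`/`button` labelled clickable and `div`/`span` labelled non-clickable, `cond_prob("has_href", "yes", "clickable")` is (1 + 1) / (2 + 2) = 0.5, while the same value under `non_clickable` drops to (0 + 1) / (2 + 2) = 0.25.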

Abstract

The invention discloses a method and system for identifying clickable elements of web pages based on the TAN tree-shaped naive Bayesian algorithm. Step S10: control a browser to acquire the target source web page, crawl the page's data, and construct a label node tree from the acquired data; step S20: using the naive Bayesian method, calculate the conditional probability of each node feature of the label node tree under the clickable and non-clickable categories; step S30: from the clickable conditional probability of each node, calculate the conditional mutual information value of each parent-child node pair under the clickable and non-clickable categories, and use it as the weight of the corresponding edge; step S40: according to the weights, determine the node with a higher probability of being clickable, and click that node. Specific behaviors involved in the present invention, such as data crawling and clicking, do not need to be defined manually, which reduces manual intervention. With artificial intelligence assistance, less manual intervention is needed during crawling, and the trained model can adapt to most target sources, giving high reusability.
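Step S30 weights each parent-child edge by conditional mutual information. The excerpt does not give the formula, so this sketch uses the standard empirical definition from tree-augmented naive Bayes, I(X; Y | C) = Σ P(x, y, c) · log[ P(x, y | c) / (P(x | c) · P(y | c)) ], and assumes each training observation is a triple (parent feature value, child feature value, class label). The function name is hypothetical, and how step S40 turns the weights into a click decision is not specified in the excerpt, so it is left out.

```python
import math
from collections import Counter


def conditional_mutual_information(triples):
    """
    Empirical I(X; Y | C) in nats from (x, y, c) observations, where x is a
    parent node's feature value, y its child's feature value, c the class label.
    """
    n = len(triples)
    n_xyc = Counter(triples)
    n_xc = Counter((x, c) for x, _, c in triples)
    n_yc = Counter((y, c) for _, y, c in triples)
    n_c = Counter(c for _, _, c in triples)

    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # P(x,y,c) * log[P(x,y|c) / (P(x|c) P(y|c))]
        #   = (n_xyc / n) * log[n_xyc * n_c / (n_xc * n_yc)]
        cmi += (count / n) * math.log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi
```

If the child's feature is independent of the parent's within each class the weight is 0; if it is fully determined by the parent's feature, as in `[(0, 0, "k"), (1, 1, "k")]`, the weight is ln 2 ≈ 0.693, so strongly coupled parent-child pairs receive heavier edges.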

Description

Technical field

[0001] The present invention relates to a web page clickability recognition method and device based on the TAN tree-shaped naive Bayesian algorithm.

Background art

[0002] Current data crawling mainly relies on two approaches: directly simulating HTTP requests, and programmatically operating a browser.

[0003] Directly simulating HTTP requests means forging HTTP protocol messages, such as the headers and request parameters, so that the target source website treats the requests as coming from a normal user and returns the corresponding data. The drawback of this approach is that the corresponding header information must be forged for each target source, and when the header information is encrypted, for example when CAPTCHA verification or user login is required before data can be obtained, the header information is difficult to forge. In addition, when the content of the same target source needs to be switched, the link or parameters in the req...
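The prior-art approach in [0003] amounts to building requests whose headers imitate an ordinary browser. A minimal sketch with Python's standard `urllib.request`; the URL, query parameters, header values and the function name `build_simulated_request` are illustrative assumptions, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Illustrative header values only; real sites may check many more fields.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Referer": "https://example.com/",
}


def build_simulated_request(url: str, params: dict) -> Request:
    """Prepare (but do not send) a GET request that imitates a normal browser user."""
    query = urlencode(params)
    return Request(f"{url}?{query}", headers=BROWSER_HEADERS, method="GET")
```

Passing the result to `urllib.request.urlopen` would fetch the page. As [0003] notes, the headers have to be tailored to each target source, and once CAPTCHA verification or login is required they become difficult to forge.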

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/951; G06N7/00
CPC: G06F16/951; G06N7/01
Inventors: 周柳阳, 张南迪, 许皓天
Owner: 智言科技(深圳)有限公司