Website navigation bar information extraction method and device, electronic equipment and storage medium

The technology relates to website navigation bar information extraction. It addresses problems such as non-standard code writing, which makes navigation bar extraction difficult or causes it to fail, and it improves the accuracy and efficiency of extraction.

Pending Publication Date: 2020-09-04
深圳市小满科技有限公司

AI Technical Summary

Problems solved by technology

[0002] To present corporate culture, products, introductions, contact details and similar information, corporate official websites usually display links to key information in a navigation bar at the top or left of the page. Building an accurate content index for such a website requires extracting the navigation bar information, but extraction is difficult because HTML is a permissive language and page code is often written in non-standard ways.
[0003] The existing technology uses the NAV tag method, but this method extracts navigation bar node information accurately only when the page uses HTML5 and the developer strictly follows the development manual specification. Therefore, for non-HTML5 pages or irregularly written code, the accuracy of the extracted navigation bar node information is low, or extraction fails altogether.
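
The limitation described in [0003] can be illustrated with a minimal sketch. It assumes that the NAV tag method amounts to collecting the links nested inside `<nav>` elements; the class name `NavLinkParser` and the function `extract_nav_links` are hypothetical and are not taken from the patent.

```python
from html.parser import HTMLParser


class NavLinkParser(HTMLParser):
    """Collect (text, href) pairs for <a> elements nested inside <nav>."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0        # number of <nav> elements currently open
        self.current_href = None  # href of the <a> being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None


def extract_nav_links(html):
    """Return the links found inside <nav>; empty if the page has no <nav>."""
    parser = NavLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


html5_page = '<nav><a href="/about">About</a><a href="/products">Products</a></nav>'
legacy_page = '<div id="menu"><a href="/about">About</a></div>'
print(extract_nav_links(html5_page))   # links are found
print(extract_nav_links(legacy_page))  # nothing is found: no <nav> element
```

The second call returns an empty list even though the page clearly has a menu, which is the failure mode on non-HTML5 or irregular pages that motivates the additional methods.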

Examples

Embodiment 1

[0065] Figure 1 is a flow chart of the method for extracting website navigation bar information provided by Embodiment 1 of the present invention.

[0066] In this embodiment, the method for extracting website navigation bar information can be applied to an electronic device. For an electronic device that needs to extract navigation bar information, the extraction function provided by the method of the present invention can be integrated directly on the device, or can run on the device in the form of a software development kit (SDK).

[0067] As shown in Figure 1, the method for extracting website navigation bar information specifically includes the following steps. Depending on requirements, the order of the steps in the flow chart may be changed, and some steps may be omitted.

[0068] S11: Download the main page sourc...

Embodiment 2

[0126] Figure 2 is a structural diagram of the device for extracting website navigation bar information provided by Embodiment 2 of the present invention.

[0127] In some embodiments, the website navigation bar information extraction device 20 may include a plurality of functional modules composed of program code segments. The program code of each segment can be stored in the memory of the electronic device and executed by the at least one processor to extract the navigation bar information of the website (see the description of Figure 1 for details).

[0128] In this embodiment, the website navigation bar information extraction device 20 can be divided into multiple functional modules according to the functions it performs. The functional modules may include: an analysis module 201, an elimination module 202, an extraction module 203, a combination module 204, a deduplication fil...
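
As a rough illustration of how a combination step and a deduplication step might fit together, the sketch below merges candidate link lists produced by several extraction methods and drops repeated targets. The function name `combine_and_dedupe`, the `(text, href)` data shape, and the normalization rules (resolve relative URLs, ignore fragments and trailing slashes, skip links without text) are all assumptions made for illustration; the source does not specify them.

```python
from urllib.parse import urldefrag, urljoin


def combine_and_dedupe(candidate_lists, base_url):
    """Merge (text, href) lists and drop repeats, keeping first-seen order.

    Two links count as duplicates when they resolve to the same absolute
    URL after the fragment and any trailing slash are ignored (assumed rule).
    """
    seen = set()
    merged = []
    for candidates in candidate_lists:
        for text, href in candidates:
            url, _fragment = urldefrag(urljoin(base_url, href))
            key = url.rstrip("/")
            # Assumed filter: skip links with no visible text.
            if not text.strip() or key in seen:
                continue
            seen.add(key)
            merged.append((text.strip(), url))
    return merged


from_nav = [("About", "/about"), ("Products", "/products/")]
from_density = [
    ("About us", "https://example.com/about#top"),
    ("", "/empty"),
    ("Contact", "/contact"),
]
print(combine_and_dedupe([from_nav, from_density], "https://example.com"))
```

Keeping the first-seen entry gives priority to whichever method is listed first, which is a design choice of this sketch rather than something stated in the source.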

Embodiment 3

[0187] Figure 3 is a schematic structural diagram of the electronic device provided by Embodiment 3 of the present invention. In a preferred embodiment of the present invention, the electronic device 3 includes a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.

[0188] Those skilled in the art should understand that the structure of the electronic device shown in Figure 3 does not limit the embodiments of the present invention. The structure can be a bus structure or a star structure, and the electronic device 3 can also include more or fewer hardware or software components than shown in the figure, or a different arrangement of components.

[0189] In some embodiments, the electronic device 3 is an electronic device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessor...

Abstract

The invention relates to the technical field of text extraction. It provides a website navigation bar information extraction method and device, electronic equipment and a storage medium. The method comprises the following steps: downloading the source code of the main page and of any subpage under the domain name of the enterprise website to be extracted, obtaining a first HTML code and parsing it into a first node DOM tree, and obtaining a second HTML code and parsing it into a second node DOM tree; removing the external links from the first node DOM tree and the second node DOM tree to obtain a third node DOM tree and a fourth node DOM tree; extracting navigation bar information using an NAV tag method, an A tag density method, a maximum public area method and a keyword link block method; then performing deduplication and filtering, calculating a node score for each node, and outputting the navigation bar information of the enterprise website. Because the navigation bar information is extracted through the NAV tag method, the A tag density method, the maximum public area method and the keyword link block method together, the accuracy and efficiency of extracting navigation bar information from the page are improved.
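
Two of the steps named in the abstract, parsing HTML into a node tree with external links removed and locating a link-dense block, can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the density definition (share of non-whitespace text that sits inside `<a>` elements), the thresholds `min_links` and `min_density`, and every class and function name below are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.children, self.text = [], ""


class TreeBuilder(HTMLParser):
    """Build a very small DOM-like tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def remove_external_links(node, base_url):
    """Drop <a> nodes whose resolved host differs from the site's host.
    Non-HTTP targets such as mailto: resolve to an empty host and are dropped too."""
    site = urlparse(base_url).netloc
    kept = []
    for child in node.children:
        href = child.attrs.get("href")
        if child.tag == "a" and href and urlparse(urljoin(base_url, href)).netloc != site:
            continue
        remove_external_links(child, base_url)
        kept.append(child)
    node.children = kept


def stats(node, inside_link=False):
    """Return (link_count, link_chars, total_chars) for a subtree."""
    inside_link = inside_link or node.tag == "a"
    chars = len("".join(node.text.split()))  # non-whitespace characters only
    links = 1 if node.tag == "a" else 0
    link_chars, total = (chars if inside_link else 0), chars
    for child in node.children:
        l, lc, t = stats(child, inside_link)
        links, link_chars, total = links + l, link_chars + lc, total + t
    return links, link_chars, total


def densest_link_block(root, min_links=3, min_density=0.8):
    """Return the node with the most links among those dense enough, or None."""
    best, best_links = None, 0
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(reversed(node.children))  # pre-order traversal
        if node.tag == "a":
            continue
        links, link_chars, total = stats(node)
        if links >= min_links and total and link_chars / total >= min_density:
            if links > best_links:
                best, best_links = node, links
    return best
```

A possible use, with a hypothetical page and base URL: call `root = parse(html)`, then `remove_external_links(root, "https://example.com/")`, then `densest_link_block(root)` to obtain the candidate navigation container. Recomputing `stats` for every node is quadratic in the worst case, which is acceptable for a sketch but would be cached in practice.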

Description

Technical field

[0001] The invention relates to the technical field of text extraction, in particular to a method, device, electronic equipment and storage medium for extracting information from a navigation bar of a website.

Background

[0002] To present corporate culture, products, introductions, contact details and similar information, corporate official websites usually display links to key information in a navigation bar at the top or left of the page. Building an accurate content index for such a website requires extracting the navigation bar information, but extraction is difficult because HTML is a permissive language and page code is often written in non-standard ways.

[0003] The existing technology uses the NAV tag method, but this method requires the page to use the HTML5 version and the developer strictly follows the development manual specification to...

Application Information

IPC(8): G06F16/958, G06F16/954
CPC: G06F16/986, G06F16/954
Inventor 祁俊辉
Owner 深圳市小满科技有限公司