Automatic BBS (bulletin board system) page acquisition method

An automatic page acquisition technology, applied in special data processing applications, website content management, instruments, etc. It addresses problems such as the difficulty of formulating unified rules, the need for real-time rule updates, and abnormal data, with the effect of providing an efficient solution, optimizing the system structure, and simplifying the acquisition process.

Active Publication Date: 2015-02-04
西安烽火软件科技有限公司
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0006] 1. The BBS page parsing template must be configured manually, and the XPath of each piece of information must be specified by hand.
[0007] 2. It is difficult to formulate uniform rules for capturing massive amounts of website information. The parsing template is generally configured separately for each site, so the workload is heavy.
[0008] 3. This creates a large amount of follow-up rule maintenance, and the rules must be updated promptly whenever a site is revised.
[0009] 4. If the revision of a BBS site is not discovered in time, data collection for that site becomes abnormal.
[0010] These technical defects are particularly prominent in large-scale acquisition systems, and new technical methods are urgently needed to replace manual maintenance work.

Examples

Embodiment Construction

[0039] The invention discloses a method for automatically collecting BBS pages, comprising the following steps: step 1, collect all element information of the BBS page; step 2, cross-compare the node elements in the system library; step 3, if two node names are the same, compare their numbers of nodes; step 4, once the node names and numbers are confirmed to be the same, identify the two cross-compared nodes as the current floor nodes; step 5, record the XPath of the floor node (XPath, the XML Path Language, is used to locate a part of an XML document), and complete the segmentation of post floors, the XPath extraction of floor content, and general information collection.
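Steps 2 to 5 can be illustrated with a minimal Python sketch. It is one possible reading of the summary above, not the patented implementation: it assumes that a "floor" is one post in a thread, that "number of nodes" means the count of descendant elements, and that floors are sibling nodes under a common parent. The names `signature`, `find_floor_xpath`, and `SAMPLE` are hypothetical, and the sample page is deliberately well-formed so that the standard-library XML parser can read it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, well-formed sample page with three posts ("floors").
SAMPLE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="thread">
  <div class="post"><span>alice</span><p>First post</p></div>
  <div class="post"><span>bob</span><p>Reply one</p></div>
  <div class="post"><span>carol</span><p>Reply two</p></div>
</div>
</body></html>"""


def signature(node):
    """Node name plus descendant-element count (assumed similarity key)."""
    return node.tag, sum(1 for _ in node.iter()) - 1


def find_floor_xpath(root, min_floors=2):
    """Return (xpath, nodes) for the largest group of siblings that share
    the same node name and the same number of descendant nodes."""
    best = (None, [])

    def walk(parent, path):
        nonlocal best
        groups = defaultdict(list)
        for child in parent:
            groups[signature(child)].append(child)
        for (tag, size), nodes in groups.items():
            # Ignore leaf nodes; keep the largest matching sibling group.
            if size > 0 and len(nodes) >= min_floors and len(nodes) > len(best[1]):
                best = (f"{path}/{tag}", nodes)
        seen = defaultdict(int)
        for child in parent:
            seen[child.tag] += 1  # positional index among same-name siblings
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return best


if __name__ == "__main__":
    xpath, floors = find_floor_xpath(ET.fromstring(SAMPLE))
    print(xpath)
    for floor in floors:
        print(floor.find("span").text, "|", floor.find("p").text)
```

Comparing whole descendant counts is a strict test: two posts whose bodies differ in markup would not match. A real collector would likely need a looser similarity measure, which the summary does not specify.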

[0040] Specifically, the present invention comprises the following steps:

[0041] (1) Access the target BBS post page from the Internet and obtain the page byte stream.

[0042] (2) Parse the byte stream into a JDOM object, which contains the Elements corresponding to all the HTML tags, an...
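Steps (1) and (2) can be sketched in Python as follows. The source uses JDOM, a Java library; this sketch substitutes the Python standard library (`urllib.request`, `html.parser`, `xml.etree.ElementTree`) purely for illustration. The names `fetch_page_bytes`, `HtmlTreeBuilder`, and `parse_html_bytes`, the User-Agent string, and the default `utf-8` encoding are all assumptions; many BBS sites use other encodings such as GBK.

```python
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def fetch_page_bytes(url, timeout=10.0):
    """Step (1): return the raw byte stream of the page at `url`."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "bbs-collector-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


class HtmlTreeBuilder(HTMLParser):
    """Builds an ElementTree from HTML tags (a stand-in for the JDOM object)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        attrib = {key: (value or "") for key, value in attrs}
        parent = self.stack[-1] if self.stack else self.root
        if parent is None:
            node = self.root = ET.Element(tag, attrib)
        else:
            node = ET.SubElement(parent, tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        node = self.stack[-1]
        if len(node):  # text after a child element goes into its tail
            node[-1].tail = (node[-1].tail or "") + data
        else:
            node.text = (node.text or "") + data


def parse_html_bytes(page_bytes, encoding="utf-8"):
    """Step (2): decode the byte stream and parse it into an element tree."""
    builder = HtmlTreeBuilder()
    builder.feed(page_bytes.decode(encoding, errors="replace"))
    builder.close()
    return builder.root
```

A caller would chain the two, for example `parse_html_bytes(fetch_page_bytes(url))`, and then walk the returned tree. Unlike a full HTML parser, this builder does not repair badly nested markup beyond closing the nearest matching open tag.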

Abstract

The invention discloses an automatic BBS (bulletin board system) page acquisition method. The method comprises the following steps: step 1, acquiring all element information of a BBS page; step 2, cross-comparing the node elements in a system library; step 3, comparing the numbers of nodes if the node names are identical; step 4, after confirming that the node names and the numbers of nodes are both identical, identifying the two cross-compared nodes as the current floor nodes; step 5, recording the XPath of the floor node, segmenting the floors of the post, extracting the XPath of the floor content, and acquiring the general information. Because the HTML structure is analyzed automatically, a majority of BBS sites can be processed, acquisition development efficiency is effectively improved, the structure of the acquisition system is optimized, the acquisition flow is simplified, and a new, efficient solution is provided for large-scale acquisition systems.

Description

Technical field

[0001] The invention relates to the technical field of Internet BBS processing, and in particular to a method for automatically collecting BBS pages.

Background technique

[0002] With the advancement of science and technology, Internet information has entered an era of explosive growth and diversity, and the Internet has become a huge information base. Automated Internet information collection saves considerable resources in information gathering, resource integration, capital utilization, and human investment. It is widely used in industry portal information collection, competitor intelligence gathering, website content system construction, vertical search, public opinion monitoring, scientific research, and other fields.

[0003] A conventional BBS (Bulletin Board System) runs service software on a computer, allowing user terminals to connect through the network, upload and download data, and exchange information with other users. The page parsing template of each...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/958
Inventors: 沈文凯, 瞿伟
Owner: 西安烽火软件科技有限公司