Agent-Based Intrusive Social Data Collection Method

A social data collection method applied in the field of information collection. It addresses the problems of frequent manual operation, complex implementation, and incomplete data, and it improves data collection efficiency, avoids repeated collection, and keeps operation simple.

Active Publication Date: 2022-03-15
USTC SINOVATE SOFTWARE
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

Descriptions of how to collect WeChat official account articles exist on the Internet, but most are incomplete or give only a brief overview. Among related patents, some implementations are complicated, while others obtain the data through interaction between a browser and the Internet.
[0003] One existing technique for collecting WeChat official account articles uses Sogou WeChat as the entry point. Its disadvantages are: (1) anti-crawler restrictions, which require the help of IP proxies and a CAPTCHA-solving platform; (2) the collected article links are not permanent; (3) the numbers of likes, reads, and comments on an article cannot be collected; (4) only the 10 most recent articles can be collected. Another technique uses the interface provided by the material management feature of the WeChat public platform. Its disadvantages are: (1) login is cumbersome, requiring the user to log in and scan a code to confirm; (2) anti-crawler restrictions mean that frequent operations lead directly to a ban; (3) the like, read, and comment counts still cannot be obtained from the collected article links.

Examples

Embodiment Construction

[0036] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative work all belong to the protection scope of the present invention.

[0037] As shown in Figures 1-2, the present invention is an agent-based intrusive social data collection method comprising the following steps:

[0038] Step S1: Start the scheduler's scheduled task, fetch the official accounts from the database, put them into Redis, and deduplicate them;
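
The deduplication in Step S1 might look roughly like the sketch below. It is an illustration under stated assumptions, not the patent's implementation: a plain Python set stands in for a Redis set, and the function name `enqueue_accounts` and the account identifiers are hypothetical.

```python
from typing import Iterable, List, Set


def enqueue_accounts(accounts: Iterable[str], seen: Set[str]) -> List[str]:
    """Return the accounts that have not been seen before, in input order.

    `seen` is an in-memory stand-in (assumption) for a Redis set. Each new
    account is recorded in `seen` so that later calls skip it.
    """
    fresh: List[str] = []
    for account in accounts:
        if account not in seen:
            seen.add(account)
            fresh.append(account)
    return fresh


if __name__ == "__main__":
    seen_accounts: Set[str] = set()
    # The duplicate "acct_a" within one batch is dropped.
    print(enqueue_accounts(["acct_a", "acct_b", "acct_a"], seen_accounts))
    # "acct_b" was recorded by the previous call, so only "acct_c" is new.
    print(enqueue_accounts(["acct_b", "acct_c"], seen_accounts))
```

Keeping the seen-set outside the function mirrors the idea that the deduplication state lives in a shared store and survives across scheduler runs.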

[0039] Step S2: Periodically take addresses out of Redis, put them into the RabbitMQ queue, and start the WeChat crawler program;
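
The hand-off in Step S2 can be sketched as moving a bounded batch from a pending store into a work queue. This is a simplified illustration, not the patent's code: a `deque` stands in for the Redis store and `queue.Queue` stands in for the RabbitMQ queue, and the function name `move_batch` and the `batch_size` parameter are assumptions.

```python
import queue
from collections import deque
from typing import Deque


def move_batch(pending: Deque[str], work_queue: "queue.Queue[str]", batch_size: int) -> int:
    """Move up to `batch_size` addresses from `pending` to `work_queue`.

    `pending` stands in for the Redis store and `work_queue` for the
    RabbitMQ queue (both are assumptions). Returns the number moved.
    """
    moved = 0
    while pending and moved < batch_size:
        work_queue.put(pending.popleft())
        moved += 1
    return moved


if __name__ == "__main__":
    pending_addresses: Deque[str] = deque(["addr_1", "addr_2", "addr_3"])
    crawler_queue: "queue.Queue[str]" = queue.Queue()
    # A scheduler would call this on a timer; here it is called once.
    print(move_batch(pending_addresses, crawler_queue, batch_size=2))
```

Bounding each batch is one way to keep the crawler from being flooded between scheduler ticks; the source does not say whether the original method limits batch size.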

[0040] S...

Abstract

The invention discloses an agent-based intrusive social data collection method, which relates to the field of information collection. The system comprises a WeChat client, a proxy server, a program server, and a WeChat server. A packet capture tool obtains the data packet returned by the server to the client, injects JS into it, and returns it to the client. The JS code executes automatically when the client loads the page, so that the browser establishes a connection with the program, and the program sends instructions to the browser to control the entire collection process. The invention loads more data through pull-down operations, captures the article links, and then opens each detail link to obtain the article content and the numbers of likes, reads, and comments. The collected official account article data is comprehensive, the operation is simple, and data collection efficiency is improved.
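
The injection step described in the abstract can be illustrated with a small function that rewrites an intercepted HTML response before it reaches the client. This is a sketch under assumptions, not the patent's implementation: the function name `inject_script` is hypothetical, the response is treated as a plain string, and the injected snippet is a placeholder rather than the actual control code.

```python
import re

# Matches a closing body tag regardless of letter case.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_script(html: str, js_code: str) -> str:
    """Insert an inline <script> just before the first closing body tag.

    If the page has no closing body tag, the script is appended at the end
    so that it still runs when the page loads.
    """
    tag = f"<script>{js_code}</script>"
    match = _BODY_CLOSE.search(html)
    if match is None:
        return html + tag
    return html[:match.start()] + tag + html[match.start():]


if __name__ == "__main__":
    page = "<html><body><p>article list</p></body></html>"
    # "connectToProgram();" is a placeholder for the injected control code.
    print(inject_script(page, "connectToProgram();"))
```

Placing the script at the end of the body is one common choice, since the rest of the page has been parsed by the time it runs; the source does not specify where the JS is inserted.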

Description

Technical field

[0001] The invention belongs to the field of information collection, and in particular relates to an agent-based intrusive social data collection method.

Background

[0002] With the rapid development of the Internet, the network has become the most important means for people to obtain information, and as the amount of data keeps growing, how to obtain and use this data effectively has become a critical problem. Information collection technology can obtain the specific data that users want more precisely, but large-scale information collection has also driven the rise of anti-crawler technology, making data collection more and more difficult. As WeChat is the mainstream social software, WeChat official account articles have become an important source for information collection. There are three existing entry points for collecting WeChat public account articles: (1) Sogou WeChat, (2) the interface provided by the material man...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): H04L67/02, H04L67/141, H04L67/56, H04L9/40, G06F16/951
Inventor: 李森, 李凌悦, 苏磊
Owner: USTC SINOVATE SOFTWARE