Method and system for collecting internet data in blocks according to priorities

A technique for prioritizing and collecting data, applied to network data retrieval, network data indexing, and other database retrieval. It addresses the problems of excessive collection scope, low efficiency, and data redundancy, with the effects of avoiding redundancy, improving utilization efficiency, and delivering high value.

Pending Publication Date: 2019-06-11
上海浪潮云计算服务有限公司

AI Technical Summary

Problems solved by technology

[0005] The technical task of the present invention is to provide a method and system for collecting Internet data in blocks according to priority, in order to solve the data redundancy and low efficiency caused by an excessive collection range in current Internet data collection, and so to collect website data efficiently.

Embodiment Construction

[0037] A method for collecting Internet data in blocks according to priority: the collection method is adjusted to the characteristics of the website and the collection topic; the analysis and collection topics are summarized; then, according to the collection topics, the correlation between the data to be collected and the collection topics and/or the priority of valid data is judged in advance; the data is divided along the priority dimension and the correlation dimension; and the data is collected in blocks in a certain order.

[0038] Wherein:

[0039] Judging the correlation between the data to be collected and the collection topic means analyzing the correlation between the website data and the direction of the collection topic, and determining the range of data related to the collection topic.

[0040] Judging the priority of valid data means that in addition to analyzing the correlation between website data ...
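The steps in [0037] to [0040] can be illustrated with a minimal Python sketch. The excerpt does not say how correlation or priority is computed, so everything here is an assumption made for illustration: the `Page` record, the keyword-overlap `relevance` measure, the per-section `priority` table, and the `min_relevance` threshold are all hypothetical names and choices, not the patent's method.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Page:
    """Hypothetical record for one piece of website data to be collected."""
    url: str
    text: str
    section: str  # assumed site-section label, e.g. "news" or "forum"


def relevance(page, topic_keywords):
    """Toy correlation measure: fraction of topic keywords found in the text."""
    if not topic_keywords:
        return 0.0
    words = set(page.text.lower().split())
    hits = sum(1 for kw in topic_keywords if kw.lower() in words)
    return hits / len(topic_keywords)


def priority(page, section_priority):
    """Assumed priority rule: lower number is collected earlier.

    Sections missing from the table are placed after all known ones.
    """
    return section_priority.get(page.section, len(section_priority))


def plan_blocks(pages, topic_keywords, section_priority, min_relevance=0.5):
    """Divide pages along the correlation and priority dimensions.

    Returns a list of blocks, ordered from highest to lowest priority.
    """
    # Correlation dimension: keep only data inside the topic's range.
    relevant = [p for p in pages
                if relevance(p, topic_keywords) >= min_relevance]
    # Priority dimension: order by priority, then by descending relevance.
    relevant.sort(key=lambda p: (priority(p, section_priority),
                                 -relevance(p, topic_keywords)))
    # One block per priority level; groupby relies on the sort above.
    return [list(group)
            for _, group in groupby(
                relevant, key=lambda p: priority(p, section_priority))]


if __name__ == "__main__":
    pages = [
        Page("a", "cloud pricing news", "news"),
        Page("b", "cat pictures", "forum"),
        Page("c", "cloud pricing discussion", "forum"),
        Page("d", "pricing update", "news"),
    ]
    blocks = plan_blocks(pages, ["cloud", "pricing"], {"news": 0, "forum": 1})
    for i, block in enumerate(blocks):
        print(i, [p.url for p in block])
```

In this sketch, data unrelated to the topic is filtered out before anything is fetched, which is one way to read the stated goal of avoiding redundant collection; a real collector would fetch each block in turn and could stop once enough valid data has been gathered.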

Abstract

The invention discloses a method and a system for collecting Internet data in blocks according to priority, belonging to the field of big data application and analysis. The method comprises the following steps: adjusting the collection method according to the characteristics of the website and the collection topic; summarizing the analysis and collection topics; judging in advance, according to the collection topics, the correlation between the data to be collected and the collection topics and/or the priority of valid data; dividing the data along reasonable dimensions; and collecting the data in blocks in a certain order. The reasonable dimensions comprise a priority dimension and a correlation dimension. The invention also discloses a system for collecting Internet data in blocks according to priority, comprising a data analysis module, a collection hierarchy division module, a collection result judgment module, and a data inspection module. The invention avoids redundancy in the collected data, allows website data to be collected and analyzed efficiently, and reduces research and development, data storage, and data analysis costs.

Description

Technical field

[0001] The invention relates to the field of big data application and analysis, in particular to a method and system for collecting Internet data in blocks according to priority.

Background

[0002] Today's society is an era of technology shaped by artificial intelligence, cloud computing, big data, and the like. All walks of life are thriving because of the massive explosion of data. In the past 10 years, almost every industry has been affected to some degree by this change. Technology permeates every field and has become an essential element of every processing unit. Its foundation, data mining, is particularly important, and so naturally is research on data acquisition technology.

[0003] Traditional data collection generally captures the information of the entire website at one time, and then removes the interfering data through subsequent cleaning operations. This method will not only increase data interference, but ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951
Inventor: 宋娇
Owner: 上海浪潮云计算服务有限公司