A real-time structured data comparison system based on massive data

A structured-data and massive-data technology, applied in the field of data processing. It addresses the limited processing capacity and high cost of existing comparison systems, which make them unsuitable for real-time, fast matching of massive data, and it removes the dependence on physical memory conditions.

Active Publication Date: 2017-12-12
NANJING FIBERHOME STARRYSKY CO LTD

AI Technical Summary

Problems solved by technology

With an average rule size of 1 KB, 10 million rules consume about 10 GB of memory, so a server with a standard 32 GB of physical memory can hold at most about 30 million rules. Comparing against hundreds of millions of rules requires a server with more than 100 GB of physical memory. Because the processing capacity of a single machine is limited, the number of machines has to be derived from the total volume of massive data and the per-machine processing capacity, so the cost is too high.
In principle, as the number of rules grows, the physical memory of a single server will always be exhausted eventually.
[0004] For these reasons, current data comparison systems are not suited to real-time, fast matching of massive data against a large number of rules.
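The memory figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, binary units (1 KB = 1024 bytes) are an assumption, and it ignores the memory taken by the operating system and by index structures.

```python
# Back-of-the-envelope check of the rule memory figures (illustrative sketch).
KB = 1024        # assumption: binary units
GB = 1024 ** 3

def rules_memory_gb(num_rules, avg_rule_bytes=1 * KB):
    """Approximate memory, in GB, needed to hold num_rules rules."""
    return num_rules * avg_rule_bytes / GB

def max_rules(physical_memory_gb, avg_rule_bytes=1 * KB):
    """Upper bound on rules that fit if all memory were used for rules."""
    return physical_memory_gb * GB // avg_rule_bytes

if __name__ == "__main__":
    print(f"10 million rules:  {rules_memory_gb(10_000_000):.1f} GB")
    print(f"200 million rules: {rules_memory_gb(200_000_000):.1f} GB")
    print(f"32 GB server holds at most {max_rules(32):,} rules")
```

The upper bound for a 32 GB server comes out slightly above 30 million because it assumes every byte is available for rules, which is not the case in practice.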

Embodiment Construction

[0035] The distributed comparison system disclosed in the present invention hashes the rules to the comparison server cluster by content, using a predetermined hash algorithm. It also introduces a Bloom filter, which is loaded with the rule content and placed at the entry point for massive data. The rule content is used for a coarse screening of the massive data: data that do not pass the filter are discarded directly, and data that pass are distributed to the comparison servers by the predetermined hash algorithm. Each comparison server then completes the precise comparison through a secondary hash.
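The flow in paragraph [0035] can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the `BloomFilter` class, the `load_rule`, `route`, and `compare` names, the use of SHA-256, the filter size, and the three-server cluster are all assumptions. Each "comparison server" is modeled as a Python `dict`, whose built-in hashing stands in for the secondary hash.

```python
import hashlib

def _digest(value: str, salt: str = "") -> int:
    """Deterministic 64-bit hash of a string (SHA-256 is an assumption)."""
    raw = hashlib.sha256((salt + value).encode("utf-8")).digest()
    return int.from_bytes(raw[:8], "big")

class BloomFilter:
    """Small Bloom filter: no false negatives, some false positives."""
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        return [_digest(key, salt=str(i)) % self.num_bits
                for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

NUM_SERVERS = 3                                # hypothetical cluster size
servers = [dict() for _ in range(NUM_SERVERS)] # one dict per comparison server
bloom = BloomFilter()                          # coarse screen at the data entry

def route(key):
    """Predetermined hash: pick the comparison server for a key."""
    return _digest(key) % NUM_SERVERS

def load_rule(key, rule):
    """Rule center: hash the rule to a server and register it in the filter."""
    servers[route(key)][key] = rule
    bloom.add(key)

def compare(key):
    """Comparison client: coarse screen, then precise lookup on one server."""
    if not bloom.might_contain(key):
        return None                            # discarded at the entry point
    return servers[route(key)].get(key)        # exact match via secondary hash

if __name__ == "__main__":
    load_rule("alice@example.com", {"rule_id": 1})
    print(compare("alice@example.com"))
    print(compare("bob@example.com"))
```

Because a Bloom filter never reports a stored key as absent, discarding at the entry point cannot drop a true match; the occasional false positive only costs one extra lookup on a comparison server.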

[0036] At the underlying technology level, the distributed comparison system mainly comprises hashed storage of large rule sets, fast in-memory loading of large rule sets, fast matching of massive data based on Bloom filters, and precise matching of massive data against large rule sets.

[0037] Below in conjunction with accompanying drawing, technical scheme...

Abstract

The invention discloses a real-time structured data comparison system based on massive data. The system comprises a rule center server, an external rule server, a comparison client, a comparison server cluster, and a Bloom filter. The external rule server, the comparison client, and the comparison server cluster are each connected to the rule center server, and the Bloom filter is arranged in the rule center server. The external rule server transmits rules to the rule center server. The rule center server computes a hash of each rule and uses it to distribute the rules uniformly across the comparison server cluster, which loads the received rules into its physical memory. The rule center server also loads the content used for matching in each rule into the Bloom filter. Once the rule center server has finished processing the rules, the Bloom filter is synchronized to the comparison client, which loads the Bloom filter information into its own memory. The comparison client receives massive data in real time and returns an accurate comparison result in real time. This makes a low-cost real-time comparison system with a large number of rules possible.
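Since the comparison client holds only the Bloom filter in memory, its footprint depends on the filter size rather than on the full rule set. The sketch below uses the standard Bloom filter sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The source does not state a false-positive rate or a filter size, so the 1% rate and the function names here are assumptions chosen for illustration.

```python
import math

def bloom_bits(num_keys, false_positive_rate):
    """Bits needed for num_keys at the target false-positive rate."""
    return math.ceil(-num_keys * math.log(false_positive_rate)
                     / (math.log(2) ** 2))

def bloom_hashes(num_bits, num_keys):
    """Number of hash functions that minimizes the false-positive rate."""
    return max(1, round(num_bits / num_keys * math.log(2)))

if __name__ == "__main__":
    n = 100_000_000          # 100 million rule keys
    p = 0.01                 # assumed 1% false-positive rate
    m = bloom_bits(n, p)
    print(f"filter size: {m / 8 / 1024 ** 2:.0f} MiB, "
          f"hash functions: {bloom_hashes(m, n)}")
```

Under these assumptions the filter for 100 million keys needs on the order of a hundred megabytes, compared with roughly 100 GB for the rules themselves at 1 KB each.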

Description

Technical field

[0001] The invention discloses a real-time structured data comparison system based on massive data, and relates to the technical field of data processing.

Background

[0002] Structured data comparison is based on matching rules against massive data. Such a system is used to match specified data quickly and accurately within massive data (the amount of data processed every day reaches the 100-million level). To ensure fast, real-time matching, the rules are stored in the physical memory of the computer. The number of rules the comparison system can load is limited by the size of that physical memory: the smaller the physical memory, the fewer rules can be loaded. The speed of comparison and matching is in turn limited by the number of rules: the more rules there are, the slower the matching. The data comparison includes the following steps: 1) quickly parse the rules and load them into the...

Application Information

Patent Type & Authority Patents(China)
IPC(8): G06F17/30
CPC: G06F16/2255, G06F16/24569, G06F16/24578, G06F16/27
Inventor 李金龙顾晓波杨俊卢兴杨
Owner NANJING FIBERHOME STARRYSKY CO LTD