Data query method and device, electronic equipment and storage medium

A data query method and device, applied to electronic equipment, storage media, and real-time data querying under large-scale data volumes. It addresses the high resource consumption and other shortcomings of large-scale data statistics and data deduplication, thereby increasing the value and significance of query results.

Pending Publication Date: 2021-08-24
西安交大捷普网络科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Elasticsearch (ES for short) is a distributed full-text search engine built on Lucene. By improving its data storage mechanism and filtering performance, it achieves fast queries to a certain extent. However, it has obvious shortcomings: at large data volumes, searching, filtering, and aggregation analysis of data for different business needs consume considerable resources. To keep the business running normally, the entire aggregation analysis therefore needs to be optimized to provide a better query service.



Examples


Embodiment 1

[0032] As shown in Figure 1, a real-time data query method comprises:

[0033] Obtaining a query request for real-time data and determining the size of the query's target data;

[0034] If the amount of target data that the query request points to is less than a first threshold, obtaining the query content from Elasticsearch (hereinafter ES for convenience) or ClickHouse;

[0035] If the amount of target data that the query request points to is greater than the first threshold, ClickHouse deduplicates the data and counts the total, the resulting data are taken out and fed into ES one by one for filtering and sub-aggregation, and the aggregation results are summarized and returned.
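The threshold routing and the two-stage path described above can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: the threshold value, the function names, and the in-memory lists standing in for ClickHouse and ES are all hypothetical. The source only covers "less than" and "greater than", so sending the equal case down the two-stage path is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical value: the source does not state a concrete first threshold.
FIRST_THRESHOLD = 1_000_000


def route_query(target_rows: int, threshold: int = FIRST_THRESHOLD) -> str:
    """Pick a processing path from the amount of target data."""
    if target_rows < threshold:
        return "direct"      # answer straight from ES or ClickHouse
    return "two_stage"       # ClickHouse dedup/count, then ES sub-aggregation


def two_stage_aggregate(
    rows: List[Dict[str, str]], key: str, sub_key: str
) -> Tuple[int, Dict[str, Dict[str, int]]]:
    """Simulate the large-data path on an in-memory list of records."""
    # Stage 1 (ClickHouse's role): deduplicate the key and count distinct values.
    distinct_values = sorted({row[key] for row in rows})
    total = len(distinct_values)

    # Stage 2 (ES's role): feed each value in one by one, filter on it,
    # and run the sub-aggregation over the matching records.
    summary: Dict[str, Dict[str, int]] = {}
    for value in distinct_values:
        matching = [row for row in rows if row[key] == value]
        summary[value] = dict(Counter(row[sub_key] for row in matching))

    # The summarized results are returned together with the distinct count.
    return total, summary
```

In a real deployment the two stages would be queries against the two engines; the split shown here only mirrors the division of labor the paragraph describes.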

[0036] Nested aggregation aggregates multiple fields in sequence. For example, the "gender" field is aggregated first and the "age" field is then aggregated within it (sub-aggregation); that is, one aggregation is nested within another aggregati...
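The gender-then-age example can be written as an Elasticsearch request body in which one `terms` aggregation carries another under its `aggs` key. The helper below is a minimal sketch; the function name and the `by_<field>` bucket names are hypothetical, and it only builds the request dictionary without contacting a cluster.

```python
from typing import Any, Dict, List


def nested_terms_aggs(fields: List[str]) -> Dict[str, Any]:
    """Build a terms aggregation nested in the order given by `fields`."""
    aggs: Dict[str, Any] = {}
    # Work from the innermost field outwards so each level wraps the next.
    for field in reversed(fields):
        level: Dict[str, Any] = {"terms": {"field": field}}
        if aggs:
            level["aggs"] = aggs  # attach the sub-aggregation
        aggs = {f"by_{field}": level}
    # "size": 0 asks for aggregation buckets only, not matching documents.
    return {"size": 0, "aggs": aggs}


body = nested_terms_aggs(["gender", "age"])
```

Here `body` groups by "gender" first and, inside each gender bucket, groups again by "age".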

Embodiment 2

[0044] As shown in Figure 2, before the query request described in Embodiment 1 is obtained, the target data collected in real time is split by data type and stored in different Kafka topics. A topic is the basic unit of Kafka write operations. Producers (such as various network security devices) publish data (such as security event logs) to a chosen topic, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group; consumer instances can run in multiple processes or on multiple machines. ClickHouse and ES, the data consumers in this embodiment, consume data from the same topic through the Flink stream processing engine and store it separately. ClickHouse stores only the field data that participates in aggregation analysis.

[0045] Kafka is a distributed messaging system that supports partitioning and multiple replicas. Its biggest feature is that it can proces...
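The ingestion step can be sketched without a running Kafka or Flink cluster by modeling topics as in-memory lists. Everything named here is hypothetical: the `events.<type>` topic naming, the `data_type` field, and the set of aggregation fields are assumptions chosen for illustration, since the source does not specify them.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set

# Hypothetical: fields that take part in aggregation analysis.
AGG_FIELDS: Set[str] = {"src_ip", "event_type"}


def topic_for(record: Dict[str, Any]) -> str:
    """Map a record to a topic name by its data type (naming is assumed)."""
    return f"events.{record['data_type']}"


def split_by_type(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group collected records into per-type topics (lists stand in for Kafka)."""
    topics: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        topics[topic_for(record)].append(record)
    return dict(topics)


def project_for_clickhouse(
    record: Dict[str, Any], agg_fields: Set[str] = AGG_FIELDS
) -> Dict[str, Any]:
    """Keep only aggregation fields for ClickHouse; ES would store the full record."""
    return {name: value for name, value in record.items() if name in agg_fields}
```

Both consumers read the same topic; only the projection applied before storage differs between them.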

Embodiment 3

[0055] As shown in Figure 3, a data query device is provided, comprising:

[0056] A query receiving module, configured to obtain a real-time query request for data and parse it to obtain the aggregation analysis dimensions;

[0057] A query judging module, configured to judge whether the amount of target data that the query request points to is greater than a preset first threshold;

[0058] A query processing module, configured to initiate the corresponding data aggregation analysis according to the amount of target data that the query request points to, and to return the aggregation result.

[0059] Preferably, the query processing module is used for:

[0060] If the amount of target data that the query request points to is less than the first threshold, obtaining the query content from Elasticsearch or ClickHouse;

[0061] If the amount of target data pointed to by the query request is greater than the first threshold, ClickHouse deduplicates and counts the total amou...
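The three modules of the device can be wired together roughly as follows. This is a sketch under assumptions: the class names mirror the module names in the text, but the request fields (`group_by`, `target_rows`), the injected handler callables, and the idea that the request already carries the target data amount are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class QueryRequest:
    dimensions: List[str]   # aggregation analysis dimensions
    target_rows: int        # amount of target data (assumed known up front)


class QueryReceivingModule:
    def parse(self, raw: Dict[str, Any]) -> QueryRequest:
        """Parse a raw request into aggregation dimensions and data amount."""
        return QueryRequest(list(raw["group_by"]), int(raw["target_rows"]))


class QueryJudgingModule:
    def __init__(self, first_threshold: int) -> None:
        self.first_threshold = first_threshold

    def exceeds(self, request: QueryRequest) -> bool:
        """True when the target data amount is greater than the first threshold."""
        return request.target_rows > self.first_threshold


class QueryProcessingModule:
    def __init__(
        self,
        direct: Callable[[QueryRequest], Any],
        two_stage: Callable[[QueryRequest], Any],
    ) -> None:
        self.direct = direct          # single-engine path (ES or ClickHouse)
        self.two_stage = two_stage    # ClickHouse dedup/count, then ES sub-aggregation

    def run(self, request: QueryRequest, large: bool) -> Any:
        """Dispatch to the matching aggregation path and return its result."""
        handler = self.two_stage if large else self.direct
        return handler(request)
```

Injecting the two handlers keeps the dispatch logic independent of any particular client library for ES or ClickHouse.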


Abstract

The invention discloses a real-time data query method, device, equipment, and storage medium. ES and ClickHouse consume data from the same source at the same time and store it separately, and different engines respond according to the amount of target data that each query request points to. This overcomes the obvious disadvantages of ES in data deduplication and counting while making full use of its flexibility in nested aggregation. Rapid aggregation analysis is carried out on very large-scale data and the result is returned, achieving a near-real-time effect and improving the value and significance of the data query results.

Description

Technical field

[0001] The invention belongs to the technical field of data analysis, and in particular relates to a real-time data query method, device, electronic equipment, and storage medium for large-scale data volumes.

Background

[0002] With the advent of the big data era, traditional data analysis methods face great challenges, on the one hand because of the explosive growth in data volume and on the other because of the increase in data types. Efficient request response is crucial to the effective delivery of big data services. To process certain specific queries and data mining applications quickly, the database needs to perform statistical analysis on some data fields along various dimensions or combinations of dimensions, such as computing sums, counts, maximum values, minimum values, or other custom statistical functions over grouped data, aggregating them to obtain specific overviews of som...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2455, G06F16/215, G06F16/2458
CPC: G06F16/215, G06F16/2455, G06F16/2462
Inventor: 李福宜赵彦林李周王平陈宏伟何建锋
Owner: 西安交大捷普网络科技有限公司