Identifying variance in distributed systems

Distributed-system variance technology, applied to large data set processing, addresses two problems: processing queries on large data sets is time-consuming and resource-intensive, and manually selected sampling rates are unsuitable for some automated applications. It selects a sampling rate automatically and efficiently, at the cost of trading off some processing efficiency.

Active Publication Date: 2021-11-23
QUANTCAST CORP
Cites: 12 | Cited by: 0

AI Technical Summary

Benefits of technology

This approach allows for rapid and efficient determination of a sampling rate, adapting to data fluctuations and ensuring accurate results, thereby optimizing throughput and supporting timely decision-making in automated environments.

Problems solved by technology

Processing queries on large data sets can be time consuming and resource intensive.
Often, sampling rates are selected and adjusted manually, which may not be suitable for some automated applications.

Embodiment Construction

[0015] Embodiments of the invention provide an automatic and efficient method for determining a sampling rate for a sequence of data processing jobs. To optimize throughput, a prior art processing system may generate balanced or nearly balanced partitions for processing in a distributed or parallel system. In contrast, embodiments of the invention purposely configure unbalanced buckets (i.e., unbalanced partitions) and assign an unbalanced data load (e.g. an unbalanced bucket) to each of a plurality of processing units. The intermediate results obtained from the unbalanced partitions enable the system to rapidly and efficiently determine a sampling rate for subsequent processing jobs, which offsets the inconvenience of having some processing units complete their tasks before others due to variation in the data loads across the processing units. Advantageously, the invention can adapt to fluctuations in the quality and quantity of data, as subsequent data processing jobs can process sa...
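
As an illustration of the unbalanced partitioning described in paragraph [0015], the following is a minimal Python sketch, not the patented implementation. The bucket fractions, the SHA-256 based mapping of keys onto the range [0, 1), and the names `FRACTIONS`, `bucket_bounds`, `key_position` and `assign_bucket` are all assumptions made for the example; the source only states that the buckets cover different fractions of the key range.

```python
import hashlib
from bisect import bisect_right

# Hypothetical bucket fractions (an assumption); they must sum to 1.
FRACTIONS = [0.5, 0.25, 0.125, 0.125]


def bucket_bounds(fractions):
    """Return the cumulative upper bound of each bucket within [0, 1]."""
    bounds, total = [], 0.0
    for fraction in fractions:
        total += fraction
        bounds.append(total)
    return bounds


def key_position(key: str) -> float:
    """Map a key to a stable position in [0, 1).

    Assumption: keys are spread over the range with a hash. The top
    53 bits of the digest are used so the float result is exact.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_bucket(key: str, bounds) -> int:
    """Return the index of the unbalanced bucket that owns this key."""
    index = bisect_right(bounds, key_position(key))
    # Clamp in case rounding left the last bound slightly below 1.0.
    return min(index, len(bounds) - 1)
```

With these fractions, roughly half of all keys land in bucket 0 and roughly one eighth in each of the last two buckets, so the processing unit that owns bucket 0 receives about four times the load of the unit that owns bucket 3.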

Abstract

Adaptive sampling. Data comprising pairings of data values with lists of data keys are received. The range of possible values of the data keys is partitioned into unbalanced buckets, with at least two of the unbalanced buckets representing different fractions of the range. Each unbalanced bucket is assigned to a respective processing unit selected from a plurality of processing units. The pairings are processed by the processing units, with each processing unit generating an intermediate result. The intermediate results are combined to generate a comprehensive result. A sampling error is determined by scaling an unbalanced bucket's intermediate result according to its corresponding fraction and comparing the scaled intermediate result to the comprehensive result. An unbalanced bucket having a sampling error less than a sampling error threshold is selected. The selected unbalanced bucket's corresponding fraction is selected as a sampling rate for a second data processing job.
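
The steps in the abstract can be sketched in Python as follows. This is a simplified single-process illustration under stated assumptions, not the patented method: each record is reduced to a hypothetical `(position, value)` pair, where `position` is the key's location in [0, 1); the job is a plain sum; the sampling error is taken as a relative difference; and when several buckets qualify, the smallest fraction is chosen. The abstract only requires that the selected bucket's error be below the threshold, and the name `select_sampling_rate` is invented for the example.

```python
def select_sampling_rate(records, fractions, threshold):
    """Pick a sampling rate from unbalanced buckets.

    records:   iterable of (position, value), position in [0, 1)
    fractions: share of the key range covered by each bucket (sums to 1)
    threshold: maximum acceptable relative sampling error
    Returns (rate, errors); rate is None if no bucket qualifies.
    """
    # Cumulative upper bound of each bucket within the key range.
    bounds, total = [], 0.0
    for fraction in fractions:
        total += fraction
        bounds.append(total)

    # Each "processing unit" (simulated here by a list slot) sums the
    # values whose keys fall in its bucket: the intermediate results.
    intermediate = [0.0] * len(fractions)
    last = len(bounds) - 1
    for position, value in records:
        for i, upper in enumerate(bounds):
            if position < upper or i == last:
                intermediate[i] += value
                break

    # Combine the intermediate results into the comprehensive result.
    comprehensive = sum(intermediate)
    if comprehensive == 0:
        raise ValueError("comprehensive result is zero; error undefined")

    # Scale each intermediate result by its fraction and compare the
    # scaled estimate with the comprehensive result.
    errors = [
        abs(part / fraction - comprehensive) / abs(comprehensive)
        for part, fraction in zip(intermediate, fractions)
    ]

    # Assumption: prefer the smallest qualifying fraction, since it
    # gives the cheapest sample for the next job.
    qualifying = [f for f, e in zip(fractions, errors) if e < threshold]
    rate = min(qualifying) if qualifying else None
    return rate, errors
```

For evenly spread data, every bucket's scaled estimate matches the comprehensive result, so the smallest fraction is returned as the sampling rate. For heavily skewed data, no bucket may qualify and the function returns `None`, in which case a caller would presumably keep processing the full data set.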

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. Non-Provisional application Ser. No. 15/640,007 entitled “Identifying Variance in Distributed Systems” by Scott S. McCoy, filed on Jun. 30, 2017, which is hereby incorporated by reference in its entirety.

BACKGROUND

Technical Field

[0002] This invention pertains in general to processing of large data sets and in particular to adaptively setting sampling rates.

Description of Related Art

[0003] Processing queries on large data sets can be time consuming and resource intensive. In some cases, excellent estimates can be made using a subset of the data (e.g. sampled data), which can reduce the resource requirements of a query and expedite the production of results. However, the reliability of the results can depend on the size and quality of the sample, and it is important to select a good sampling rate (e.g. the ratio of the size of the sample to the size of the full data set). Furthermore, for some appli...

Application Information

Patent Type & Authority: Patents (United States)
IPC(8): G06F16/23, G06F16/2455
CPC: G06F16/2365, G06F16/24568, G06F16/2462
Inventor: MCCOY, SCOTT S.
Owner: QUANTCAST CORP