Two-stage online sampling method based on the MapReduce model

A two-stage sampling technique based on the MapReduce model, applied in the field of online data sampling. It addresses the loss of unbiasedness in the estimation algorithm and the resulting loss of accuracy in the estimation results: it ensures the randomness of the sample, eliminates the influence of bias, and ensures the unbiasedness and effectiveness of the estimates.

Active Publication Date: 2020-06-02

SICHUAN XW BANK CO LTD

Cites: 8 | Cited by: 0


AI Technical Summary

Problems solved by technology

At any point during query processing, blocks with small aggregate values are more likely than others to appear in the observed sample set. The samples therefore cannot be treated as independent and identically distributed random variables, which undermines the unbiasedness of the estimation algorithm and reduces the accuracy of its estimates.
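The bias described above can be illustrated with a small simulation. This is a minimal sketch under a hypothetical model that the source does not specify: mappers process blocks in parallel, and a block's processing time is taken to equal its aggregate value, so blocks with small values finish first. `snapshot_mean` and its parameters are illustrative names, not part of the patent.

```python
import heapq
import random
import statistics


def snapshot_mean(values, workers, t, seed=0):
    """Mean of the block values that have finished processing by time t.

    Hypothetical model (not from the source): blocks are picked up in random
    order by `workers` parallel mappers, and a block's processing time equals
    its aggregate value, so small-valued blocks tend to finish first.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)                  # blocks are scheduled in random order
    free_at = [0.0] * workers           # min-heap: time at which each mapper is free
    heapq.heapify(free_at)
    finished = []
    for v in order:
        start = heapq.heappop(free_at)  # earliest available mapper
        end = start + v                 # assumed: processing time == value
        heapq.heappush(free_at, end)
        if end <= t:                    # visible in a snapshot taken at time t
            finished.append(v)
    return statistics.mean(finished) if finished else float("nan")


values = [1] * 50 + [100] * 50          # true mean is 50.5
print(snapshot_mean(values, workers=100, t=50))    # prints 1
print(snapshot_mean(values, workers=100, t=1000))  # prints 50.5
```

With every block running in parallel, a snapshot at t = 50 contains only the 50 small blocks, so the observed mean is 1 instead of the true 50.5; only once every block has finished does the snapshot mean match the true mean.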

Method used

Two-stage sampling: on the map side, a cluster sampler samples whole data blocks as the sample unit; on the reduce side, before processing begins, an accept-reject sampler corrects the probability with which each data block is selected.

Embodiment Construction

[0027] As shown in Figure 1, the two-stage online sampling method based on the MapReduce model of the present invention comprises:

[0028] A. First-stage sampling: when the MapReduce model receives input data from the upstream data node and initializes it, a cluster sampler is set up on the map side before online processing begins. Each data block is treated as one cluster, and sampling is performed with the data block as the sample unit. The cluster sampler maintains a random data-block queue for each data table; a random queue contains data blocks corresponding to multiple data tables, and each random queue corresponds to one mapper. The order of all data blocks in a random queue is randomized. Each time the map side schedules a task, it designates a mapper; when that mapper requests input data from the upstream data node, it iterates over its corresponding random queue and returns the data block at the...
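The first-stage sampler can be sketched roughly as follows. This is a hypothetical illustration: `ClusterSampler`, `blocks_by_mapper` and `next_block` are invented names, the queues are keyed by mapper only (the per-table bookkeeping described above is omitted), and serving blocks from the front of each shuffled queue is an assumption, because the source text breaks off before saying which block is returned.

```python
import random
from collections import deque


class ClusterSampler:
    """Hypothetical first-stage cluster sampler.

    Each data block is one cluster (the sample unit). The blocks assigned to a
    mapper are shuffled once into a random queue, and the mapper then pulls
    blocks one at a time. Serving from the front of the queue is an
    assumption: the source text is truncated at this point.
    """

    def __init__(self, blocks_by_mapper, seed=None):
        rng = random.Random(seed)
        self.queues = {}
        for mapper_id, blocks in blocks_by_mapper.items():
            shuffled = list(blocks)
            rng.shuffle(shuffled)               # randomize block order once
            self.queues[mapper_id] = deque(shuffled)

    def next_block(self, mapper_id):
        """Return this mapper's next block, or None when its queue is empty."""
        queue = self.queues[mapper_id]
        return queue.popleft() if queue else None


sampler = ClusterSampler({"m0": ["b0", "b1", "b2"], "m1": ["b3", "b4"]}, seed=42)
block = sampler.next_block("m0")
while block is not None:
    print("m0 processes", block)
    block = sampler.next_block("m0")
```

Shuffling each queue once up front means every mapper reads its blocks in a random order, so the blocks processed so far (a prefix of a random permutation) form a simple random sample of that mapper's blocks, which is cluster sampling with the block as the unit.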

Abstract

The invention relates to a two-stage online sampling method based on the MapReduce model. The method comprises the following steps: step 1, first-stage sampling: before the MapReduce model performs online processing on the map side, a cluster sampler is set up and sampling is performed with the data block as the sample unit; step 2, in the query stage of the MapReduce model, an estimate of the query result is obtained and, for a given confidence level, the width of the confidence interval is calculated; step 3, second-stage sampling: before the reduce side starts processing, an accept-reject sampler corrects the probability with which each data block is selected by the reduce side; and step 4, the discarded map-side output is aggregated in a recycle bin on the reduce side and added to the snapshot result obtained from the accepted data blocks, yielding the actual result of the aggregation query. The method ensures the randomness of the sample without increasing network transmission cost, provides effective statistical estimates, eliminates the bias that data skew introduces into statistical estimation, and ensures the unbiasedness and effectiveness of query estimates.
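Steps 3 and 4 can be illustrated with a small accept-reject sketch. The source does not give the acceptance rule, so the one below is an assumption: each block carries a known or estimated `propensity` (its relative chance of having reached the reducer early) and is accepted with probability `q_min / q_i`. If a block's chance of arriving is proportional to its propensity, every block then ends up accepted with the same overall probability. Rejected blocks go to a recycle bin instead of being dropped. `reduce_side` and its parameters are hypothetical names.

```python
import random


def reduce_side(blocks, propensity, rng=None):
    """Hypothetical second-stage accept-reject sampler with a recycle bin.

    blocks:     list of (block_id, partial_aggregate) pairs from the mappers
    propensity: block_id -> relative chance that the block arrived early
                (assumed known or estimated; the source does not say how)
    Returns (snapshot of accepted blocks, recycle-bin total, exact total).
    """
    rng = rng or random.Random()
    q_min = min(propensity[block_id] for block_id, _ in blocks)
    accepted, recycle_bin = [], []
    for block_id, value in blocks:
        # Accept with probability q_min / q_i, thinning over-represented blocks.
        # If arrival chance is proportional to propensity, every block ends up
        # accepted with the same overall probability.
        if rng.random() < q_min / propensity[block_id]:
            accepted.append(value)
        else:
            recycle_bin.append(value)   # excluded from the estimate, not lost
    snapshot = sum(accepted)
    recycled = sum(recycle_bin)         # step 4: aggregate the recycle bin
    return snapshot, recycled, snapshot + recycled


blocks = [("a", 5), ("b", 7), ("c", 3)]
propensity = {"a": 1.0, "b": 2.0, "c": 4.0}
print(reduce_side(blocks, propensity, random.Random(1)))
```

Whatever the random draws, `snapshot + recycled` always equals the full sum, which is how step 4 can recover the exact aggregate while only the accepted blocks feed the estimate.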

Description

Technical field

[0001] The invention relates to an online data sampling method, specifically a two-stage online sampling method based on the MapReduce model.

Background

[0002] With the digitization of information, the volume of data worldwide has grown explosively, and data mining and analysis based on big data has become a focus of wide attention across many fields. Online Aggregation (OLA) provides a way to quickly return approximate results computed from sample data, meeting the requirements of real-time processing and fast user interaction. Compared with offline batch processing, online aggregation can return, within a short time, an estimated result together with its confidence interval at a given confidence level, and it keeps returning approximate results as processing continues, with the quality of the estimates improving as more...
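For the estimate-plus-confidence-interval output that online aggregation provides (and that step 2 of the abstract computes), a textbook CLT-based estimator for a SUM query over sampled blocks looks roughly like the sketch below. It assumes the processed blocks form a simple random sample without replacement of `total_blocks` blocks and applies a finite-population correction; this is a standard choice, not necessarily the formula the patent uses, and `estimate_sum` is a hypothetical name.

```python
import math
import statistics


def estimate_sum(block_sums, total_blocks, confidence=0.95):
    """Estimate a SUM query from a simple random sample of block totals.

    Returns (estimate, confidence-interval width). Assumes the sampled blocks
    are a simple random sample without replacement of `total_blocks` blocks.
    """
    n = len(block_sums)
    if n < 2 or n > total_blocks:
        raise ValueError("need 2 <= sampled blocks <= total_blocks")
    mean = statistics.mean(block_sums)
    sd = statistics.stdev(block_sums)                           # sample standard deviation
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    fpc = math.sqrt(1 - n / total_blocks)                       # finite-population correction
    half_width = z * total_blocks * sd / math.sqrt(n) * fpc
    return total_blocks * mean, 2 * half_width


print(estimate_sum([10, 12, 8, 10], total_blocks=100))  # partial sample
print(estimate_sum([10, 12, 8, 10], total_blocks=4))    # every block seen
```

When every block has been processed (n equals the total), the correction factor is zero and the interval collapses to the exact answer; with fewer blocks, the width shrinks roughly as 1/sqrt(n) as more blocks arrive.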

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06F16/2458

CPC: G06F16/2462, G06F16/2471

Inventor: 谭皓予 (Tan Haoyu)

Owner: SICHUAN XW BANK CO LTD
