Data processing method and device and electronic equipment

A data processing method and device, applied in the field of data processing, that address problems such as reduced operating efficiency and wasted computing resources, and that improve read, write, and computing efficiency.

Pending Publication Date: 2021-10-19
深圳市云网万店科技有限公司

AI Technical Summary

Problems solved by technology

Shuffle will involve disk IO and network IO, which will greatly aff



Examples


Embodiment 1

[0057] Specifically, to apply the data processing method disclosed in this application to set operations such as intersection and difference, a computing architecture can be built in advance. As shown in Figure 1, it includes a UDF module, a preprocessing module, and an intersection and difference module.

[0058] The UDF (user-defined function) module can implement, based on RoaringBitmap, aggregation functions such as merging integers into a bitmap and merging bitmaps, as well as functions such as bitmap initialization, intersection, union, difference, and cardinality. It also includes combined operations, such as bitmap_and_count (cardinality of the AND result), which saves one round of serialization/deserialization overhead. In addition, it can support some special scenarios, such as bitmap_weight_and_count (weighted cardinality of the AND result), which supports concurrent calculation of the accumulated value of the product of complex bitmaps and weights. ...
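The saving from a combined operation such as bitmap_and_count can be illustrated with a minimal sketch. It is not the application's implementation: plain Python integers stand in for RoaringBitmap, and the helpers `to_bitmap`, `serialize`, and `deserialize` are hypothetical names introduced only for this illustration. The two-step path serializes the intermediate AND result and deserializes it again before counting, while the combined function counts directly.

```python
# Sketch only: Python ints are used as bitmaps in place of RoaringBitmap.
# to_bitmap, serialize and deserialize are hypothetical helper names.

def to_bitmap(ids):
    """Set bit i for every non-negative integer id i."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap

def serialize(bitmap: int) -> bytes:
    """Encode the bitmap as little-endian bytes (at least one byte)."""
    return bitmap.to_bytes((bitmap.bit_length() + 7) // 8 or 1, "little")

def deserialize(data: bytes) -> int:
    return int.from_bytes(data, "little")

def bitmap_and(a: bytes, b: bytes) -> bytes:
    """AND of two serialized bitmaps; the result is serialized again."""
    return serialize(deserialize(a) & deserialize(b))

def bitmap_count(a: bytes) -> int:
    """Cardinality of one serialized bitmap."""
    return bin(deserialize(a)).count("1")

def bitmap_and_count(a: bytes, b: bytes) -> int:
    """Cardinality of the AND, without serializing the intermediate result."""
    return bin(deserialize(a) & deserialize(b)).count("1")

first = serialize(to_bitmap([1, 2, 3, 5]))
second = serialize(to_bitmap([2, 3, 4]))

# Two-step path: the intermediate bitmap is serialized, then deserialized.
two_step = bitmap_count(bitmap_and(first, second))
# Combined path: same answer with one fewer serialize/deserialize round trip.
combined = bitmap_and_count(first, second)
```

Both paths return the same cardinality; the combined function simply never materializes the intermediate bitmap as bytes.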

Embodiment 2

[0091] Corresponding to the above embodiment, as shown in Figure 4, the application provides a data processing method, the method comprising:

[0092] 410. Acquire a first data set to be processed and a second data set to be processed. The first data set to be processed includes a plurality of first data to be processed, and the second data set to be processed includes a plurality of second data to be processed; each piece of data to be processed includes a target field and a corresponding field value;

[0093] 420. Determine the buckets corresponding to the first data to be processed and to the second data to be processed according to the field values of their respective target fields;

[0094] Preferably, determining the buckets corresponding to the first data to be processed and to the second data to be processed according to the field values of their respective target fields includes:

[0095] 421. Perform ...

Embodiment 3

[0115] Corresponding to the above embodiments, as shown in Figure 5, the present application provides a data processing device, the device comprising:

[0116] An acquisition module 510, configured to acquire a first data set to be processed and a second data set to be processed. The first data set to be processed includes a plurality of first data to be processed, and the second data set to be processed includes a plurality of second data to be processed; each piece of data to be processed includes a target field and a corresponding field value;

[0117] A dividing module 520, configured to determine the buckets corresponding to the first data to be processed and to the second data to be processed according to the field values of their respective target fields;

[0118] A generating module 530, configured to generate a first bitmap corresponding to the bucket according to the corresponding first data to be processed and generate a sec...


Abstract

The invention discloses a data processing method, a data processing device, and an electronic device. The method comprises the following steps: a first to-be-processed data set and a second to-be-processed data set are obtained, where the first to-be-processed data set comprises multiple pieces of first to-be-processed data and the second to-be-processed data set comprises multiple pieces of second to-be-processed data; the buckets corresponding to the first to-be-processed data and to the second to-be-processed data are determined according to the field values of the target fields included in the data; for each bucket, a first bitmap is generated from the corresponding first to-be-processed data and a second bitmap is generated from the corresponding second to-be-processed data; and a set operation is performed on the first bitmap and the corresponding second bitmap of each bucket to generate a target calculation result. This avoids the Shuffle that is triggered when associated data is distributed across different buckets and must be fetched from other buckets during the set calculation.
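The flow summarized in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the bucket function (CRC32 of the field value modulo a fixed bucket count), the constant `NUM_BUCKETS`, the field name `user_id`, and all function names are hypothetical, and Python integers again stand in for RoaringBitmap. The point it shows is that when both data sets use the same bucket function on the same target field, equal values always land in the same bucket, so each bucket can be intersected on its own without fetching data from any other bucket.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch

def bucket_of(field_value: int) -> int:
    """Assumed bucket rule: deterministic hash of the value, modulo bucket count."""
    return zlib.crc32(str(field_value).encode()) % NUM_BUCKETS

def build_bucket_bitmaps(records, target_field):
    """Return {bucket: bitmap}, setting one bit per target-field value."""
    bitmaps = defaultdict(int)
    for record in records:
        value = record[target_field]
        bitmaps[bucket_of(value)] |= 1 << value
    return bitmaps

def intersect_by_bucket(first, second):
    """AND the two bitmaps of each bucket; no cross-bucket access is needed."""
    return {b: first[b] & second[b] for b in first.keys() & second.keys()}

def to_ids(bitmaps):
    """Decode every bucket's bitmap back into a sorted list of values."""
    ids = []
    for bitmap in bitmaps.values():
        position = 0
        while bitmap:
            if bitmap & 1:
                ids.append(position)
            bitmap >>= 1
            position += 1
    return sorted(ids)

first_set = [{"user_id": i} for i in (1, 2, 3, 5, 8)]
second_set = [{"user_id": i} for i in (2, 3, 4, 8, 9)]

first_bitmaps = build_bucket_bitmaps(first_set, "user_id")
second_bitmaps = build_bucket_bitmaps(second_set, "user_id")
common = to_ids(intersect_by_bucket(first_bitmaps, second_bitmaps))
```

Union or difference would follow the same per-bucket pattern with `|` or `& ~` in place of `&`, with buckets present on only one side handled explicitly.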

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a data processing method, device and electronic equipment.

Background

[0002] Analyzing the delivery effect of crowd packages is the process of quantifying that effect along different analysis dimensions (behavior dimensions such as browsing, favoriting, adding to cart, and purchasing, or model dimensions such as cognition, interest, purchase, and loyalty) and in different analysis directions (stores, brands, etc.). This process may be tracked for several days and involves frequent intersection and union processing.

[0003] Under normal circumstances, enterprises use a computing engine (Spark, Presto, Flink, etc.) based on a storage-computing separation architecture to perform the offline calculation of the above intersection, union, and difference processing. However, in such an offline computing process, the JOIN o...


Application Information

IPC(8): G06Q30/02, G06F16/22, G06F16/2458
CPC: G06Q30/0201, G06F16/2282, G06F16/2465
Inventor: 汪凯于为建王志伟李成孙迁
Owner: 深圳市云网万店科技有限公司