Cache value-based Spark cache elimination method and system

A cache-value technique, applied in the field of big data computing, addresses problems such as ignored future reuse counts, inaccurate computation-cost estimates, and weakened computing power, so as to improve computing speed, optimize memory resource utilization, and reduce running time.

Pending Publication Date: 2020-12-01
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, this Spark cache elimination method uses the time taken to generate an RDD to represent the RDD's computation cost, which is not accurate enough: data transmission between nodes in the cluster may suffer network delays, and nodes running at full load may have weakened computing power. In addition, this elimination mechanism ignores the number of future reuses of a Block, which instead increases the system's overhead for re-fetching and recomputing the Block.



Examples


Embodiment Construction

[0040] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0041] The present invention provides a Spark cache elimination method and system based on cache value. The usage of each RDD and Block in a Job is obtained from the DAG of the Spark Job, and a cache value is defined for each RDD and Block. In this specification, a high cache value means the following: an RDD or Block that is used most frequently and whose computation is closest has a high cache value; if the cache value of a Block is zero, it means that the ...
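As a rough illustration of obtaining RDD usage from a Job's DAG, the sketch below counts how many downstream RDDs read each RDD. The dictionary representation of the DAG, the function name `reference_counts`, and the RDD names are all hypothetical; the source text is truncated before it defines the cache value itself, so this sketch covers only the usage-counting step and does not reproduce the patent's model.

```python
from collections import Counter


def reference_counts(dag):
    """Count how many downstream RDDs read each RDD.

    `dag` maps an RDD name to the list of parent RDD names it is
    computed from (a hypothetical, simplified view of a Spark Job DAG).
    """
    counts = Counter({rdd: 0 for rdd in dag})
    for parents in dag.values():
        for parent in parents:
            counts[parent] += 1
    return dict(counts)


# Hypothetical job: two branches reuse `cleaned`, then join.
job_dag = {
    "raw": [],
    "cleaned": ["raw"],
    "by_user": ["cleaned"],
    "by_item": ["cleaned"],
    "joined": ["by_user", "by_item"],
}

counts = reference_counts(job_dag)
for rdd, uses in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{rdd}: read by {uses} downstream RDD(s)")
```

In this toy DAG, `cleaned` is read twice, which is the kind of reuse information a cache value could be built on.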


Abstract

The invention relates to the field of big data computing, in particular to a cache value-based Spark cache elimination method and system. The method comprises the steps of: obtaining an initial cache value of each RDD according to a cache value model based on RDD information; sorting the RDDs using an improved quicksort algorithm to obtain an RDD sequence; storing the computation results of the RDDs in the RDD sequence into cluster node memory in order of initial cache value from high to low; updating the RDD information when each Stage finishes during dynamic task execution; when node memory is insufficient, calculating the cache value of each cached Block using a cache value model based on Block information; and eliminating the Blocks with low cache value to release memory space. According to the method, the RDDs with the highest cache value are kept in memory and unused Blocks are cleared in time, so that computation speed is increased, RDD recomputation overhead is reduced, and memory resource utilization is optimized.
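The elimination step in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the abstract does not give the cache value formula, so `cache_value` below is a placeholder (remaining uses multiplied by an estimated recomputation cost), and Python's built-in `sorted` stands in for the improved quicksort. The `BlockInfo` fields, the function `evict_until_fits`, and the block names and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    name: str
    size_mb: int
    remaining_uses: int     # assumed: how many later stages still read it
    recompute_cost: float   # assumed: cost estimate in arbitrary units


def cache_value(info):
    # Placeholder model (an assumption, not the patent's formula).
    return info.remaining_uses * info.recompute_cost


def evict_until_fits(cached, needed_mb, capacity_mb):
    """Evict lowest-value blocks until `needed_mb` fits in memory.

    Returns (kept, evicted), with `kept` ordered from high to low value.
    """
    kept = sorted(cached, key=cache_value, reverse=True)
    evicted = []
    used = sum(b.size_mb for b in kept)
    while kept and used + needed_mb > capacity_mb:
        victim = kept.pop()  # last element has the lowest cache value
        used -= victim.size_mb
        evicted.append(victim)
    return kept, evicted


# Illustrative cached blocks on one node with 100 MB of storage memory.
cached = [
    BlockInfo("rdd_3_0", 40, remaining_uses=0, recompute_cost=5.0),
    BlockInfo("rdd_7_1", 30, remaining_uses=2, recompute_cost=8.0),
    BlockInfo("rdd_5_2", 20, remaining_uses=1, recompute_cost=3.0),
]

kept, evicted = evict_until_fits(cached, needed_mb=50, capacity_mb=100)
print("kept:", [b.name for b in kept])
print("evicted:", [b.name for b in evicted])
```

Under this placeholder model, a block with no remaining uses has the lowest value and is the first candidate for eviction, which matches the abstract's aim of clearing unused Blocks in time.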

Description

Technical field

[0001] The invention relates to the field of big data computing, in particular to a Spark caching method and system based on the cache value of RDDs (Resilient Distributed Datasets) and Blocks.

Background

[0002] In today's big data era, the amount of data is increasing exponentially, and big data processing is receiving more and more attention. More and more applications and scientific research projects are based on processing and analyzing huge data sets, and to process this massive amount of data quickly, a number of big data computing frameworks have emerged, such as the MapReduce parallel computing model for large-scale data processing, the open source big data computing framework Hadoop, and the Spark framework. Given the limitations of the Hadoop framework in multiple application domains and big data processing scenarios, such as large-scale structured data, graph data, and streaming data, Apache Spark has currently become a unif...


Application Information

IPC(8): G06F16/2455, G06F16/2458
CPC: G06F16/24552, G06F16/2471
Inventor: 熊安萍, 杨孟达, 田野, 龙林波, 蒋溢
Owner CHONGQING UNIV OF POSTS & TELECOMM