Cache value-based Spark cache elimination method and system

A cache-value technology applied in the field of big data computing. It addresses problems such as ignoring how many times data will be reused in the future, inaccurate estimation of computation cost, and reduced computing power, thereby improving computing speed, optimizing memory utilization, and reducing running time.

Pending Publication Date: 2020-12-01

CHONGQING UNIV OF POSTS & TELECOMM

0 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

However, the existing Spark cache elimination method uses the time taken to generate an RDD to represent the RDD's computation cost, which is not accurate enough: data transmission between nodes in the cluster may suffer network delays, and nodes running at full capacity may have reduced computing power. In addition, this elimination mechanism ignores how many times a block will be reused in the future, and instead increases the system's overhead of re-fetching and recomputing the block.


Embodiment Construction

[0040] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0041] The present invention provides a Spark cache elimination method and system based on cache value. The usage of RDDs and Blocks in a Spark Job is obtained from the Job's DAG, and the cache values of RDDs and Blocks are defined from it. In this specification, an RDD or Block has a high cache value when it is used most frequently and its next use in the computation is closest; if the cache value of a Block is zero, it means that the ...
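The usage statistics that [0041] derives from the Job's DAG can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's cache value model (which is not reproduced here): it assumes the DAG has already been flattened into an ordered list of stages, each listing the RDD ids it reads, and it uses a hypothetical score, the number of remaining uses divided by (distance to the next use + 1), chosen only so that frequently used, soon-needed RDDs score high and RDDs with no remaining use score zero. The function name `cache_values` is illustrative.

```python
from typing import Dict, List


def cache_values(stages: List[List[int]], current: int) -> Dict[int, float]:
    """Hypothetical cache value for every RDD id that appears in the job.

    stages: the job's stages in execution order; stages[i] lists the RDD ids stage i reads.
    current: index of the next stage to run; stages before it have finished.
    """
    values: Dict[int, float] = {}
    for rdd in {r for stage in stages for r in stage}:
        # Remaining stages that still read this RDD (future reuse).
        future = [i for i in range(current, len(stages)) if rdd in stages[i]]
        if not future:
            # No remaining reuse: cached data for this RDD is worth nothing.
            values[rdd] = 0.0
            continue
        distance = future[0] - current  # 0 means the very next stage needs it
        # More future uses and a nearer next use both raise the (hypothetical) value.
        values[rdd] = len(future) / (distance + 1)
    return values


if __name__ == "__main__":
    job = [[1], [1, 2], [2, 3], [3]]
    print(dict(sorted(cache_values(job, current=2).items())))
    # prints {1: 0.0, 2: 1.0, 3: 2.0}
```

Recomputing the scores with a larger `current` after each stage finishes mirrors the idea that usage information changes as the job runs; RDD 1 above drops to zero once no remaining stage reads it.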

Abstract

The invention relates to the field of big data computing, in particular to a cache value-based Spark cache elimination method and system. The method comprises the following steps: obtaining an initial cache value for each RDD from a cache value model based on RDD information; sorting the RDDs with an improved quicksort algorithm to obtain an RDD sequence; storing the computation results of the RDDs in the sequence into cluster node memory in descending order of initial cache value; updating the RDD information at the end of each Stage during dynamic task execution; when node memory is insufficient, calculating the cache value of each cached Block with a cache value model based on Block information; and evicting the Blocks with low cache values to free memory. With this method, the RDDs with the highest cache values are kept in memory and unused Blocks are cleared promptly, which increases computation speed, reduces RDD recomputation overhead, and improves memory utilization.
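The admit-by-value and evict-lowest-value steps in the abstract can be sketched for a single node as below. This is a hedged illustration, not the patented implementation: the abstract does not say what makes its quicksort "improved", so an ordinary quicksort is used in its place; cache values are supplied by the caller because the patent's RDD and Block models are not shown; and the rule that a new Block never evicts data worth at least as much is an assumption. `Block`, `ValueCache`, `admit_in_order`, `update_values`, `put`, and the block ids are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    block_id: str
    size_mb: int
    value: float  # cache value from some model (hypothetical; not the patent's model)


def quicksort_desc(items: List[Block]) -> List[Block]:
    """Ordinary quicksort by descending cache value (stand-in for the 'improved' variant)."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2].value
    higher = [b for b in items if b.value > pivot]
    equal = [b for b in items if b.value == pivot]
    lower = [b for b in items if b.value < pivot]
    return quicksort_desc(higher) + equal + quicksort_desc(lower)


class ValueCache:
    """Hypothetical per-node memory store: admit high-value data, evict low-value data."""

    def __init__(self, capacity_mb: int) -> None:
        self.capacity_mb = capacity_mb
        self.blocks: Dict[str, Block] = {}

    def free_mb(self) -> int:
        return self.capacity_mb - sum(b.size_mb for b in self.blocks.values())

    def admit_in_order(self, candidates: List[Block]) -> None:
        # Cache results from the highest to the lowest initial value while they fit.
        for block in quicksort_desc(candidates):
            if block.size_mb <= self.free_mb():
                self.blocks[block.block_id] = block

    def update_values(self, new_values: Dict[str, float]) -> None:
        # Called when a Stage finishes: refresh values, then clear blocks worth zero.
        for block_id, value in new_values.items():
            if block_id in self.blocks:
                self.blocks[block_id].value = value
        for block_id in [k for k, b in self.blocks.items() if b.value == 0]:
            del self.blocks[block_id]

    def put(self, block: Block) -> List[str]:
        """Store a block, evicting lower-value blocks if memory is insufficient.

        Returns the evicted block ids. If evicting every lower-value block would
        still not make room, nothing is evicted and the new block is not cached.
        """
        needed = block.size_mb - self.free_mb()
        victims: List[str] = []
        freed = 0
        for old in sorted(self.blocks.values(), key=lambda b: b.value):
            if freed >= needed:
                break
            if old.value >= block.value:
                break  # assumption: never evict data worth at least as much
            victims.append(old.block_id)
            freed += old.size_mb
        if freed < needed:
            return []
        for block_id in victims:
            del self.blocks[block_id]
        self.blocks[block.block_id] = block
        return victims


if __name__ == "__main__":
    cache = ValueCache(capacity_mb=100)
    cache.admit_in_order([
        Block("rdd_1_0", 40, 3.0),
        Block("rdd_2_0", 50, 1.0),
        Block("rdd_3_0", 30, 2.0),
    ])
    print(sorted(cache.blocks))  # prints ['rdd_1_0', 'rdd_3_0']
    print(cache.put(Block("rdd_4_0", 40, 2.5)))  # prints ['rdd_3_0']
```

In the usage example, `rdd_2_0` is skipped at admission because the two higher-value results already fill 70 of the 100 MB, and the later `put` evicts only the lowest-value block needed to make room.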

Description

Technical field

[0001] The invention relates to the field of big data computing, in particular to a Spark caching method and system based on the cache values of RDDs (Resilient Distributed Datasets) and Blocks.

Background art

[0002] In today's big data era, the amount of data is growing exponentially, and big data processing is receiving increasing attention. To process this massive data quickly, more and more applications and research projects rely on huge data sets for processing and analysis, and several big data computing frameworks have emerged, such as the MapReduce parallel computing model for large-scale data processing, the open-source big data computing framework Hadoop, and the Spark framework. Because of the limitations of the Hadoop framework in many application domains and big data processing scenarios, such as large-scale structured data, graph data, and streaming data, Apache Spark has currently become a unif...

Application Information

IPC(8): G06F16/2455; G06F16/2458

CPC: G06F16/24552; G06F16/2471

Inventors: 熊安萍 (Xiong Anping), 杨孟达 (Yang Mengda), 田野 (Tian Ye), 龙林波 (Long Linbo), 蒋溢 (Jiang Yi)

Owner CHONGQING UNIV OF POSTS & TELECOMM
