Column calculation optimization method based on Spark SQL

An optimization method and execution-plan technology for Spark SQL-based column calculation optimization, with the effects of reducing overhead, accelerating computation, and shortening computation time.

Pending Publication Date: 2022-03-04

西安烽火软件科技有限公司 (Xi'an Fenghuo Software Technology Co., Ltd.)

AI Technical Summary

Problems solved by technology

Existing approaches cannot use CPU-based and GPU-based optimization capabilities within the same workload at the same time.

Embodiment

[0041] Example: a column calculation optimization method based on Spark SQL comprises the following steps:

[0042] S1, unified memory management: an Arrow-based unified data management mechanism is provided. After file data is loaded from disk into an in-memory Arrow structure, multiple plug-ins can access and compute on it, and computations such as Shuffle are also implemented on Arrow. Arrow serves as the in-memory carrier of RDDs, so that in-memory data is shared across multiple tasks and multiple plug-ins;
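Step S1 can be pictured with a minimal sketch in plain Python, assuming nothing about the patent's actual implementation. It does not use Apache Arrow or Spark: a hypothetical `ColumnBatch` stores each column as one contiguous typed buffer (`array.array`), and two hypothetical plug-in functions read the same buffer through `memoryview`, so neither makes a copy. Real Arrow buffers, schemas and Shuffle support are not modeled.

```python
from array import array


class ColumnBatch:
    """Hypothetical columnar batch: one contiguous typed buffer per column.

    Only mimics the idea of a shared in-memory columnar structure;
    it is not the Apache Arrow library.
    """

    def __init__(self, **columns):
        # 'd' = 64-bit floats, 'q' = signed 64-bit integers
        self.columns = {
            name: array("d" if any(isinstance(v, float) for v in values) else "q", values)
            for name, values in columns.items()
        }

    def view(self, name):
        # memoryview exposes the existing buffer without copying it
        return memoryview(self.columns[name])


# Two hypothetical "plug-ins" that compute on the same in-memory column.
def sum_plugin(batch, name):
    return sum(batch.view(name))


def filter_plugin(batch, name, threshold):
    # returns the row positions whose value exceeds the threshold
    return [i for i, v in enumerate(batch.view(name)) if v > threshold]


batch = ColumnBatch(id=[1, 2, 3, 4], price=[9.5, 20.0, 3.25, 41.0])
print(sum_plugin(batch, "price"))                          # 73.75
print(filter_plugin(batch, "price", 10))                   # [1, 3]
print(batch.view("price").obj is batch.columns["price"])   # True: same buffer, no copy
```

Both plug-ins see the same memory, which is the property S1 relies on when several tasks and plug-ins work on one loaded dataset.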

[0043] S2, unified scheduling of heterogeneous computing resources: the optimizer and plug-ins of Spark SQL are extended to implement a heterogeneous resource scheduling mechanism driven by data characteristics, comprising the following steps:

[0044] S2-1, scheduling optimization based on field data characteristics: numerical computations are preferentially scheduled to the CPU, while computations on feature-vector data and long strings are scheduled to the GPU;
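The S2-1 rule can be expressed as a small device-selection function. This is a hedged sketch: the field categories, the `FieldStats` type and the `LONG_STRING_THRESHOLD` value are hypothetical, because the source does not define what counts as a "long" string or how field characteristics are collected.

```python
from dataclasses import dataclass

# Hypothetical threshold: the source does not say what counts as a "long" string.
LONG_STRING_THRESHOLD = 64


@dataclass
class FieldStats:
    kind: str                # assumed categories: "numeric", "vector" or "string"
    avg_length: float = 0.0  # average string length; only used for strings


def choose_device(field: FieldStats) -> str:
    """Pick a device for computations on one field, following S2-1."""
    if field.kind == "numeric":
        return "CPU"  # numerical computation: CPU first
    if field.kind == "vector":
        return "GPU"  # feature-vector data: GPU
    if field.kind == "string" and field.avg_length >= LONG_STRING_THRESHOLD:
        return "GPU"  # long strings: GPU
    return "CPU"      # assumption: anything else stays on the CPU


schema = {
    "amount": FieldStats("numeric"),
    "embedding": FieldStats("vector"),
    "title": FieldStats("string", avg_length=12),
    "document": FieldStats("string", avg_length=900),
}
for name, stats in schema.items():
    print(name, choose_device(stats))
```

Short strings fall back to the CPU in this sketch only because the source names just three cases (numerical data, feature vectors and long strings).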

[0045] S2-2, scheduling optimization combined with computation characteristics: tasks with a large amount...

Abstract

The invention discloses a column calculation optimization method based on Spark SQL (Structured Query Language), comprising the following steps: S1, unified memory management: an Arrow-based unified data management mechanism is established, and after file data is loaded from disk into an in-memory Arrow structure, various plug-ins access and compute on it; S2, unified scheduling of heterogeneous computing resources: the optimizer and plug-ins of Spark SQL are extended to implement a heterogeneous resource scheduling mechanism based on data characteristics; and S3, rule matching is performed on the Spark SQL logical execution plan, and a physical execution plan that mixes CPU and GPU operators is generated on top of the unified Arrow memory structure. With this method, the Arrow columnar format compresses memory usage, the garbage-collection (GC) overhead of in-memory computation on the JVM is avoided, computational efficiency improves, Spark SQL operators are arranged into an optimal mixed CPU/GPU execution plan using a cost-based optimization method, and the overall computation time is reduced.
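Step S3 can be illustrated with a minimal sketch, assuming a linear (pipeline-shaped) logical plan, a hypothetical rule table listing which devices may execute each operator, made-up per-operator cost estimates, and a fixed CPU-GPU transfer penalty. The source says only that a cost optimization method is used; the dynamic program below is a swapped-in illustration of cost-based mixed placement, not the patented algorithm.

```python
# Hypothetical rule table: devices on which each logical operator may run.
RULES = {
    "Scan": ["CPU"],
    "Filter": ["CPU", "GPU"],
    "Aggregate": ["CPU", "GPU"],
    "Sort": ["CPU", "GPU"],
}

# Assumed penalty for moving a columnar batch between CPU and GPU memory.
TRANSFER_COST = 5.0


def plan_physical(logical, cost):
    """Choose a device per operator so that the total estimated cost is minimal.

    logical: operator names of a linear logical plan, in execution order.
    cost: dict mapping (operator, device) to an estimated execution cost.
    Returns (total_cost, [(operator, device), ...]).
    """
    best = {}  # device -> (cheapest total ending on that device, placement)
    for op in logical:
        new_best = {}
        for dev in RULES[op]:  # rule matching: only devices the rules allow
            step = cost[(op, dev)]
            if not best:  # first operator: nothing to transfer from
                new_best[dev] = (step, [(op, dev)])
                continue
            candidates = [
                (total + (TRANSFER_COST if prev != dev else 0.0) + step,
                 placement + [(op, dev)])
                for prev, (total, placement) in best.items()
            ]
            new_best[dev] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


costs = {
    ("Scan", "CPU"): 10.0,
    ("Filter", "CPU"): 8.0, ("Filter", "GPU"): 2.0,
    ("Aggregate", "CPU"): 12.0, ("Aggregate", "GPU"): 3.0,
    ("Sort", "CPU"): 4.0, ("Sort", "GPU"): 6.0,
}
total, placement = plan_physical(["Scan", "Filter", "Aggregate", "Sort"], costs)
print(total)      # 26.0
print(placement)  # Scan on CPU; Filter, Aggregate and Sort on GPU
```

In this example Sort stays on the GPU although it is cheaper on the CPU in isolation: moving the batch back would add the transfer penalty and raise the total from 26.0 to 29.0.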

Description

Technical field

[0001] The present invention relates to the technical field of cluster computing systems, and specifically to a column calculation optimization method based on Spark SQL.

Background art

[0002] Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs for Java, Scala, Python and R, and also supports a rich set of higher-level tools, including the Spark SQL module for SQL and structured data processing, as shown in the architecture diagram of Figure 1.

[0003] Spark SQL is an SQL engine built on the open-source parallel computing framework Spark; it provides SQL-based data query and analysis in big-data environments.

[0004] a) Spark SQL is based on RDDs, and its data processing can express the SQL data model; Spark SQL provides row-based data processing and analysis capabilities;

[0005] b) Spark SQL relies on Spark's parallel computing model for scheduling and processing, and Spark is scheduled to perform ...
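Paragraph [0004] describes Spark SQL's row-based processing, which the invention replaces with column-oriented computation. The sketch below contrasts the two layouts in plain Python, using the standard `array` module as a stand-in for Arrow; `rows_to_columns` and the example schema are hypothetical and not part of Spark or Arrow.

```python
from array import array


def rows_to_columns(rows, schema):
    """Transpose row-based records into one typed buffer per column.

    rows: list of tuples, one tuple per record (row-based layout).
    schema: list of (column_name, array_typecode) pairs, in tuple order.
    """
    columns = {name: array(code) for name, code in schema}
    for row in rows:
        for (name, _), value in zip(schema, row):
            columns[name].append(value)
    return columns


rows = [(1, 9.5, 3), (2, 20.0, 1), (3, 3.25, 7)]
schema = [("id", "q"), ("price", "d"), ("qty", "q")]
cols = rows_to_columns(rows, schema)

# Row-based: every record is visited even though only one field is needed.
row_total = sum(r[1] for r in rows)
# Columnar: the aggregation scans a single contiguous buffer.
col_total = sum(cols["price"])
print(row_total, col_total)  # 32.75 32.75
```

Both layouts give the same answer; the columnar one touches only the `price` buffer, which is what makes column-at-a-time processing cache-friendly and suitable for GPU offload.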

Application Information

Patent type & authority: Application (China)

IPC (8): G06F16/242; G06F16/245; G06F9/48; G06F9/50

CPC: G06F16/2433; G06F16/24569; G06F9/4881; G06F9/5016; G06F9/505

Inventors: 李华蓉 (Li Huarong), 赵智峰 (Zhao Zhifeng), 李岩 (Li Yan), 苏锋 (Su Feng), 陈芒芒 (Chen Mangmang)

Owner: 西安烽火软件科技有限公司 (Xi'an Fenghuo Software Technology Co., Ltd.)
