Automatic optimization method for Apache Spark application based on historical task analysis

A technique based on historical task analysis, applied in the fields of instruments, electrical digital data processing, and hardware monitoring, that addresses the problem of existing approaches not making full use of the Spark resource manager.

Active Publication Date: 2018-07-06
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

However, the previously proposed machine learning framework is relatively simple and treats system performance entirely as a black-box model. There is room for improvement in the efficiency of parameter-selection experiments and in model accuracy, and the framework does not make full use of the Spark resource manager's resource allocation patterns or of known cluster hardware information.



Examples


Embodiment Construction

[0016] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0017] The present invention makes full use of known cluster hardware resource information, selects and implements a scheme that efficiently searches for optimal parameters in a high-dimensional parameter space, and then screens out configuration parameters with better performance under the established hierarchical gray-box time prediction model, thereby achieving automatic Spark parameter tuning.
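The idea of screening candidate configurations against a time prediction model can be illustrated with a minimal sketch. The text here does not say which search scheme the invention uses, so plain random search is swapped in as a stand-in, and `toy_predict_time` is a purely hypothetical cost function rather than the hierarchical gray-box model. The candidate values in `SPACE` are illustrative assumptions; only the Spark property names are real.

```python
import random
from typing import Callable, Dict, List

Space = Dict[str, List[int]]
Config = Dict[str, int]

# Illustrative candidate values (assumed, not taken from the patent).
SPACE: Space = {
    "spark.executor.cores": [1, 2, 4],
    "spark.executor.instances": [2, 4, 8],
    "spark.sql.shuffle.partitions": [50, 200, 800],
}


def toy_predict_time(cfg: Config) -> float:
    """Hypothetical stand-in for a learned run-time prediction model."""
    slots = cfg["spark.executor.cores"] * cfg["spark.executor.instances"]
    partitions = cfg["spark.sql.shuffle.partitions"]
    # Made-up cost: more parallel slots help, and partition counts far
    # from an arbitrary sweet spot of 200 are penalized.
    return 1000.0 / slots + 0.05 * abs(partitions - 200)


def random_search(
    space: Space,
    predict_time: Callable[[Config], float],
    n_samples: int = 200,
    seed: int = 0,
) -> Config:
    """Sample configurations and keep the one with the lowest predicted time."""
    rng = random.Random(seed)
    best: Config = {}
    best_time = float("inf")
    for _ in range(n_samples):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        predicted = predict_time(candidate)
        if predicted < best_time:
            best, best_time = candidate, predicted
    return best


best_config = random_search(SPACE, toy_predict_time)
```

Because every candidate is scored by the model rather than by running the job, the cost of a search step is one model evaluation; a real system would replace both the sampler and the cost function.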

[0018] c...



Abstract

The invention provides an automatic optimization method for Apache Spark applications based on historical task analysis. The method comprises the following steps: the task submission interface of the Spark application is encapsulated; whether a layered grey-box time prediction model of the Spark application exists is determined; a database is accessed, and the layered grey-box time prediction model is read and updated; the user selects whether to optimize; if so, optimized parameters are generated, and if not, the task is executed with the original parameters; and a shell command is called to run the task. The method makes full use of known cluster hardware resource information, selects and implements an efficient search scheme for optimal parameters in a high-dimensional parameter space, screens out configuration parameters with excellent performance under the established layered grey-box time prediction model, and thereby achieves automatic optimization of Spark parameters.
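The control flow in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: `plan_submission`, the in-memory `model_store` dictionary (standing in for the database), and the `suggest` callback are all hypothetical names. The abstract does not say what happens when no model exists yet, so falling back to the original parameters is an assumption. The only real interface used is the `spark-submit --conf key=value` command-line form.

```python
from typing import Any, Callable, Dict, List, Optional

Params = Dict[str, str]


def build_submit_command(
    app_path: str, params: Params, app_args: Optional[List[str]] = None
) -> List[str]:
    """Build a spark-submit argument list with one --conf per parameter."""
    cmd = ["spark-submit"]
    for key in sorted(params):
        cmd += ["--conf", f"{key}={params[key]}"]
    cmd.append(app_path)
    cmd += app_args or []
    return cmd


def plan_submission(
    app_id: str,
    app_path: str,
    original: Params,
    model_store: Dict[str, Any],
    optimize: bool,
    suggest: Callable[[Any, Params], Params],
) -> List[str]:
    """Choose parameters for a job and return the command that would run it."""
    # Hypothetical lookup: a dict stands in for the database of models.
    model = model_store.get(app_id)
    if model is not None and optimize:
        # The user opted in and a model exists: generate optimized parameters.
        params = suggest(model, original)
    else:
        # Assumed fallback: run with the original parameters.
        params = original
    return build_submit_command(app_path, params)


original = {"spark.executor.memory": "2g"}
tuned = lambda model, params: {**params, "spark.executor.memory": "4g"}

without_model = plan_submission("wc", "wordcount.py", original, {}, True, tuned)
with_model = plan_submission("wc", "wordcount.py", original, {"wc": object()}, True, tuned)
```

The returned list can be handed to `subprocess.run` to launch the job; keeping command construction separate from execution makes the parameter choice easy to inspect before anything runs.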

Description

Technical field

[0001] The invention belongs to the technical field of computer software, and in particular relates to an automatic tuning method for Apache Spark applications based on historical task analysis.

Background

[0002] Apache Spark is a widely used open-source, general-purpose big data processing platform. It is simple to program and computes quickly, so it is widely used in industry. However, the Apache Spark system has about 150 configuration parameters whose values affect the system's performance, so they must be configured with care. Given the complexity of the Spark system itself and the potential interactions between its parameters, relying solely on manual parameter tuning is not realistic. Moreover, manual tuning cannot make individual adjustments for each task according to its characteristics, which makes automatic parameter tuning technology of con...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F11/34
CPC: G06F11/3419, G06F11/3476
Inventor: 石胜飞, 高宏, 王宏志, 巢泽敏
Owner: HARBIN INST OF TECH