Data-driven adaptive checkpoint optimization method

A data-driven optimization technology, applied in the computer field, that addresses problems such as unnecessary checkpoint overhead and increased lost computation, and achieves the effect of improving availability and reducing fault-tolerance overhead.

Inactive Publication Date: 2021-03-05
JIANGNAN INST OF COMPUTING TECH
Cites: 0. Cited by: 3.

AI Technical Summary

Problems solved by technology

The existing fixed-interval checkpoint mechanism does not take real system reliability into account. Checkpoints that are too frequent cause unnecessary checkpoint overhead, while checkpoints that are too sparse increase the amount of lost computation.


Embodiment

[0017] Embodiment: a data-driven adaptive checkpoint optimization method, comprising the following steps:

[0018] S1. Take the serious-fault records in the fault history database as samples. Using the CPU number as the index, sort the fault samples of each CPU node in ascending order of fault occurrence time, and take the absolute value of the difference between the occurrence times of adjacent fault samples on each CPU node as that node's failure interval samples. With the failure interval samples as input data, estimate the Weibull distribution parameters by the maximum likelihood method to obtain the failure interval distribution model of each CPU node. The density function of the Weibull distribution is:

$$ f(t) = \frac{m}{\eta}\left(\frac{t}{\eta}\right)^{m-1} \exp\left[-\left(\frac{t}{\eta}\right)^{m}\right], \quad t \ge 0 $$

where m is the shape parameter and η is the characteristic life;

[0019] S2. According to th...
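Step S1 above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the record layout `(cpu_id, fault_time)` and the function names `failure_intervals` and `fit_weibull_mle` are assumptions made here, and the shape parameter is found by plain bisection on the standard Weibull likelihood equation.

```python
import math
from collections import defaultdict


def failure_intervals(records):
    """Group (cpu_id, fault_time) records by CPU and return, per CPU,
    the gaps between consecutive faults (hypothetical record layout)."""
    by_cpu = defaultdict(list)
    for cpu_id, fault_time in records:
        by_cpu[cpu_id].append(fault_time)
    intervals = {}
    for cpu_id, times in by_cpu.items():
        times.sort()  # ascending order of fault occurrence time
        gaps = [abs(b - a) for a, b in zip(times, times[1:])]
        if gaps:
            intervals[cpu_id] = gaps
    return intervals


def fit_weibull_mle(samples, lo=1e-3, hi=100.0, iters=200):
    """Return (m, eta): maximum likelihood estimates of the Weibull shape
    parameter m and characteristic life eta for positive samples."""
    if len(samples) < 2 or min(samples) <= 0 or min(samples) == max(samples):
        raise ValueError("need at least two distinct positive samples")
    n = len(samples)
    logs = [math.log(t) for t in samples]
    mean_log = sum(logs) / n
    t_max = max(samples)
    scaled = [t / t_max for t in samples]  # keeps x**m within [0, 1]

    def score(m):
        # Likelihood equation for m; increasing in m, zero at the MLE.
        weights = [x ** m for x in scaled]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / m - mean_log

    for _ in range(iters):  # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    m = 0.5 * (lo + hi)
    eta = t_max * (sum(x ** m for x in scaled) / n) ** (1.0 / m)
    return m, eta
```

A CPU node with fewer than two distinct intervals cannot be fitted this way, so `fit_weibull_mle` raises `ValueError` for it; how the method treats such nodes is not stated in the text available here.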

Abstract

The invention discloses a data-driven adaptive checkpoint optimization method, which comprises the following steps: calculating the CPU node failure distribution by taking the fault data in a fault history database as samples and applying the maximum likelihood estimation method; establishing an application failure distribution model from the CPU node failure distributions; calculating the optimal checkpoint interval from the failure distribution model; directing the system to take checkpoints at the optimal checkpoint interval; updating the failure distribution of the affected CPU nodes when a new fault occurs; and returning to step 2. Using real fault data and an adaptive distribution-model optimization algorithm, the application-level failure distribution model and its parameters are dynamically updated and the application's checkpoint interval is continuously re-optimized, so that the checkpoint interval is adjusted adaptively and dynamically, the checkpoint fault-tolerance model is refined, and the fault-tolerance performance of the system is improved. As a result, the checkpoint-based fault-tolerance overhead is reduced and the availability of the system is greatly improved.
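The loop described in the abstract can be sketched as below. The text available here does not give the application-level model or the optimization criterion, so everything specific in this sketch is an assumption: nodes are taken to fail independently, the application is taken to fail when its first node fails, each checkpoint segment is treated as starting with a fresh failure clock, and the interval is chosen by grid search over a caller-supplied candidate list. All function names are hypothetical.

```python
import math


def job_survival(t, node_params):
    """P(no node has failed by time t) for independent Weibull nodes.
    node_params is a list of (m, eta) pairs (assumed series system)."""
    return math.exp(-sum((t / eta) ** m for m, eta in node_params))


def expected_segment_time(tau, ckpt_cost, restart_cost, node_params, steps=2000):
    """Expected wall time to complete tau of work plus one checkpoint,
    retrying from the previous checkpoint after each failure."""
    length = tau + ckpt_cost
    h = length / steps
    # Trapezoidal integral of the survival function over [0, length].
    area = 0.5 * (job_survival(0.0, node_params) + job_survival(length, node_params))
    area += sum(job_survival(i * h, node_params) for i in range(1, steps))
    area *= h
    r_end = job_survival(length, node_params)
    if r_end == 0.0:
        return math.inf  # the segment essentially never completes
    return (area + restart_cost * (1.0 - r_end)) / r_end


def optimal_interval(ckpt_cost, restart_cost, node_params, candidates):
    """Pick the candidate interval with the best useful-work fraction."""
    def efficiency(tau):
        return tau / expected_segment_time(tau, ckpt_cost, restart_cost, node_params)
    return max(candidates, key=efficiency)


def reoptimize_on_fault(params_by_cpu, cpu_id, refitted, job_cpus,
                        ckpt_cost, restart_cost, candidates):
    """On a new fault: store the refitted (m, eta) for the affected CPU,
    rebuild the application model, and recompute the interval."""
    params_by_cpu[cpu_id] = refitted
    node_params = [params_by_cpu[c] for c in job_cpus]
    return optimal_interval(ckpt_cost, restart_cost, node_params, candidates)
```

In this sketch the refitted parameters are passed in by the caller, which keeps the update step separate from whatever estimator produces them; a shorter characteristic life on any node of the application shortens the chosen interval.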

Description

Technical field

[0001] The invention relates to a data-driven adaptive checkpoint optimization method and belongs to the technical field of computers.

Background technique

[0002] In any fault-tolerant system that uses checkpoint/restart technology, a balance must be struck between checkpoint overhead and recomputation overhead in order to achieve fault tolerance at the lowest cost. Checkpoint intervals that are too short or too long both increase the system's fault-tolerance overhead and the amount of lost computation. An optimal checkpoint strategy minimizes checkpoint overhead and improves fault-tolerance efficiency.

[0003] At present, checkpoint systems adopt a fixed-interval checkpoint mechanism, and the interval is generally set manually from experience. This method does not take real system reliability into account; too frequent or too sparse checkpoints will cause unnecessary checkpoint overhead or increase the amount of lost ...
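The balance described in the background can be illustrated with Young's classical first-order approximation. This is a textbook result brought in for illustration, not the method of this patent, and it assumes exponentially distributed failures with mean time between failures $M$, a checkpoint cost $C$, and an interval $\tau$ with $C \ll \tau \ll M$. The numbers in the last line are hypothetical.

```latex
% Fraction of time wasted: checkpoint cost per interval plus, on average,
% half an interval of lost work per failure.
W(\tau) \approx \frac{C}{\tau} + \frac{\tau}{2M}

% Setting the derivative to zero gives the interval that balances the two terms.
\frac{dW}{d\tau} = -\frac{C}{\tau^{2}} + \frac{1}{2M} = 0
\quad\Longrightarrow\quad
\tau_{\mathrm{opt}} = \sqrt{2CM}

% Hypothetical example: C = 60 s, M = 86400 s (24 h).
\tau_{\mathrm{opt}} = \sqrt{2 \cdot 60 \cdot 86400}\ \mathrm{s} \approx 3220\ \mathrm{s} \approx 54\ \mathrm{min}
```

The first term grows when checkpoints are too frequent and the second when they are too sparse, which is why a manually chosen fixed interval that ignores the actual failure behavior can be costly in either direction.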

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/07
CPC: G06F11/0724, G06F11/0757, G06F11/079
Inventor: 刘睿涛, 宋长明, 钱宇, 龚道永, 刘沙, 李伟东, 张宏宇
Owner: JIANGNAN INST OF COMPUTING TECH