On the modelling of optimal coordinated checkpoint period in supercomputers
Abstract
This work revises current assumptions adopted in checkpoint modelling and
evaluates their impact on the attained prediction of the optimal coordinated single-level
checkpoint period. An accurate a priori assessment of the optimal checkpoint period for
a given computing facility is necessary, as this period drives the overhead incurred by frequent
checkpointing and, as a result, the drop in steady-state resource availability.
The present study discusses the impact of the order of approximation used in single-level
coordinated checkpoint modelling and extends previous results on
the optimal checkpoint period to explore the effects of the checkpoint rate on the
cluster performance under total execution time and energy consumption policies, and
in terms of resource availability. With current technology, a prescribed checkpoint rate
implies a critical cluster size above which the attained availability
is too poor for the cluster to be a cost-effective platform. Some guidelines for cluster
sizing are therefore indicated.
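
The link between checkpoint period, availability and a critical cluster size can be illustrated with a minimal sketch. It is not the model developed in this work: it assumes the classical first-order (Young-style) approximation, in which the optimal period is sqrt(2 C M) for a checkpoint cost C and a system mean time between failures M, it neglects restart time, and it assumes M scales as the per-node MTBF divided by the node count. All function and parameter names are hypothetical.

```python
import math


def young_period(checkpoint_cost, mtbf):
    """First-order optimal checkpoint period, sqrt(2 * C * M) (assumed model)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def first_order_waste(period, checkpoint_cost, mtbf):
    """Fraction of time lost to checkpoint overhead plus expected rework.

    First-order estimate only: C / period for writing checkpoints and
    period / (2 * M) for work redone after a failure; restart time is ignored.
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


def availability(nodes, node_mtbf, checkpoint_cost):
    """Estimated useful fraction of machine time at the first-order optimum."""
    # Assumption: independent node failures, so system MTBF = node MTBF / nodes.
    system_mtbf = node_mtbf / nodes
    period = young_period(checkpoint_cost, system_mtbf)
    waste = first_order_waste(period, checkpoint_cost, system_mtbf)
    return max(0.0, 1.0 - waste)


def critical_size(node_mtbf, checkpoint_cost, min_availability):
    """Largest node count whose first-order availability meets the target.

    At the optimum the waste is sqrt(2 * C * N / M_node), so
    1 - sqrt(2 * C * N / M_node) >= A  gives  N <= M_node * (1 - A)**2 / (2 * C).
    """
    bound = node_mtbf * (1.0 - min_availability) ** 2 / (2.0 * checkpoint_cost)
    return math.floor(bound)


if __name__ == "__main__":
    # Illustrative values only (seconds); not taken from the paper.
    node_mtbf, checkpoint_cost = 1.0e7, 50.0
    for nodes in (100, 1000, 10000):
        print(nodes, availability(nodes, node_mtbf, checkpoint_cost))
    print(critical_size(node_mtbf, checkpoint_cost, 0.75))
```

Under these assumptions the availability falls as the square root of the node count, which is why a fixed checkpoint cost leads to a maximum useful cluster size. A higher-order model would shift the numbers, not the qualitative trend.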

