

Please use this identifier to cite or link to this item: http://documenta.ciemat.es/handle/123456789/818

Title: Job migration in HPC clusters by means of checkpoint/restart
Authors: Rodríguez-Pascual, Manuel
Cao, Jiajun
Moríñigo, José A.
Cooperman, Gene
Mayo-García, Rafael
Keywords: Checkpoint–restart
Dynamic job migration
Exascale clusters
Publication date: 2019
Publisher: Springer
Citation: M. Rodríguez-Pascual, J. Cao, J.A. Moríñigo, G. Cooperman, R. Mayo-García. Job migration in HPC clusters by means of checkpoint/restart. The Journal of Supercomputing 75, 6517-6541 (2019)
Abstract: Until now, jobs running on HPC clusters were tied to the node where their execution started. We have removed that limitation by integrating a user-level checkpoint/restart library into a resource manager, fully transparently to both the user and the running application. This opens the door to a whole new set of tools and scheduling possibilities based on the fact that jobs can be migrated, checkpointed, and restarted in a different place or at a different time, while providing fault tolerance for every job running on the cluster. This is of utmost importance in the next generation of exascale HPC clusters, where the increasing complexity of efficient scheduling makes it challenging to obtain the degree of parallelism demanded by the applications.
URI : http://documenta.ciemat.es/handle/123456789/818
Appears in collections: Technology Articles

Files in this item:

File: Artículo_Revisado_Abril2019.pdf
Size: 257.62 kB
Format: Adobe PDF

Items in Docu-menta are protected by a Creative Commons license, with rights reserved.

