Development of algorithm for efficient pipeline execution of batch jobs on computer cluster

5 Jul 2016, 14:30
15m
LIT Conference Hall

Sectional reports 8. High performance computing, CPU architectures, GPU, FPGA

Speaker

Mr Yury Tipikin (Saint Petersburg University)

Description

The reliability and stability of parallel high-performance computing jobs are becoming an increasingly pressing problem as the number of cluster nodes grows. Existing solutions rely mainly on checkpointing, an inefficient process of dumping RAM to stable storage. On very large supercomputers this approach may be completely unacceptable. In this study, I examined a model of distributed computing, the Actor model, and on this basis developed an algorithm for processing batch jobs on a cluster that restores the state of an interrupted computation without checkpoints. The algorithm is part of a computing model that I call the "computational kernels model", named after its core component, the computational kernel. This work describes all the components of the new model, its internal processes, benefits, and potential problems.

Primary author

Mr Yury Tipikin (Saint Petersburg University)
