Optimizing replication policy in Desktop Grid in case of batch task service

2 Jul 2014, 14:50
20m
407 (LIT JINR)

Russia, 141980 Moscow region, Dubna, JINR
Sectional reports, Section 7 - Desktop grid technologies and volunteer computing

Speaker

Mr Alexander Rumyantsev (Institute of Applied Mathematical Research, Karelian Research Centre of RAS)

Description

A Desktop Grid is a simple yet efficient alternative to a high-performance computing cluster for certain types of computational problems (mainly those that require no communication between computation nodes). Volunteer computing can give researchers high computational power at minimal cost. The known drawback of this approach is the instability of computational power in a Desktop Grid: both the number of nodes in the Grid and their availability over time vary. This motivates a replication-based approach to task service, in which the server sends a single workunit with the same data to several nodes. Replication increases redundancy and, while it improves the probability of obtaining a successful result in time, it also wastes resources, proportionally increasing the computational time of the project. One of the most probable causes of a failed workunit calculation is violation of the so-called deadline, which typically means that the node was too busy with other tasks (Desktop Grid calculations run at a lower priority in the operating system). After such a failure the server has to retransmit the same workunit to another node (the number of such attempts is often limited). Moreover, in some cases the nodes of a Desktop Grid may be untrustworthy, which calls for a quorum mechanism: the server waits for a certain number or percentage of identical results before marking a workunit as "done". A mathematical model of such calculations is presented for the case of a quorum mechanism, non-zero deadline violation probability, replication and batch service (when a workunit consists of several tasks for the same input data). Conditions are found under which optimal replication reduces the project completion time. Note that when batch service is prohibited (a single task per workunit), replication always extends the project completion time.
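
The interplay of replication, deadline violation and quorum described above can be illustrated with a minimal sketch. This is not the model presented in the talk. It assumes, purely for illustration, that each replica independently misses its deadline with the same probability `p_miss`, that all results returned in time agree with each other, and that a failed round is retried from scratch with no retry limit. The function and parameter names are hypothetical.

```python
from math import comb


def quorum_success_probability(replicas: int, quorum: int, p_miss: float) -> float:
    """Probability that at least `quorum` of `replicas` copies return in time.

    Assumption (not from the abstract): replicas miss the deadline
    independently, each with probability `p_miss`.
    """
    if not 1 <= quorum <= replicas:
        raise ValueError("quorum must satisfy 1 <= quorum <= replicas")
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be a probability")
    p_ok = 1.0 - p_miss
    # Binomial tail: sum over k = quorum .. replicas timely results.
    return sum(
        comb(replicas, k) * p_ok**k * p_miss ** (replicas - k)
        for k in range(quorum, replicas + 1)
    )


def expected_rounds(replicas: int, quorum: int, p_miss: float) -> float:
    """Expected number of dispatch rounds until the quorum is reached.

    Assumption (not from the abstract): a failed round is repeated from
    scratch with no limit on retries, so the count is geometric.
    """
    p_round = quorum_success_probability(replicas, quorum, p_miss)
    if p_round == 0.0:
        return float("inf")
    return 1.0 / p_round


if __name__ == "__main__":
    # Compare a few replication levels for a quorum of 2 and p_miss = 0.3.
    for r in (2, 3, 4, 5):
        p = quorum_success_probability(r, 2, 0.3)
        print(r, round(p, 4), round(expected_rounds(r, 2, 0.3), 4))
```

Under these assumptions, a higher replication level raises the chance of meeting the quorum in one round, but every round consumes `replicas` node slots. The sketch shows only the success-probability side of the trade-off and does not capture project completion time or batch service.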

Primary author

Mr Alexander Rumyantsev (Institute of Applied Mathematical Research, Karelian Research Centre of RAS)
