Simulation of job execution in distributed heterogeneous computing infrastructures

27 Oct 2022, 17:05
15m
Conference hall (MLIT, JINR)

Oral Information Technology

Speaker

Igor Pelevanyuk (Joint Institute for Nuclear Research, Plekhanov Russian University of Economics)

Description

Successful execution of a single computing job demonstrates that the software works correctly. However, when the same job has to be executed thousands of times, other issues may arise. Nowadays, distributed heterogeneous computing infrastructures are widely used for this type of workload. The main issue when running large workloads on them is network limitations. These limits may be imposed at different levels: server, cluster, and storage. With a limited network, there is a threshold beyond which adding CPU resources does not increase the job execution rate. The purpose of this work is to create a software platform for simulating job execution in distributed computing infrastructures that can predict the job execution rate in a real infrastructure and show an efficient distribution of jobs among computing clusters. The limiting factors considered are the number and performance of CPUs, network speed, RAM size, and disk size. The software platform has been developed and tested. It was written in Python; InfluxDB is used for storing and visualizing the results.
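
The network threshold mentioned above can be illustrated with a minimal steady-state sketch. This is not the platform described in the talk. It is a simplified model that assumes every job reads a fixed amount of input data over one shared link and then computes for a fixed time on one core. The function names, parameters, and numbers (`job_rate`, `saturation_cores`, 600 s of CPU time, 1000 MB of input, 500 MB/s of bandwidth) are hypothetical and chosen only for illustration. The bandwidth parameter stands for the tightest of the server, cluster, and storage limits.

```python
def job_rate(cores, cpu_seconds_per_job, input_mb_per_job, bandwidth_mb_per_s):
    """Steady-state jobs per second for one cluster (hypothetical model).

    The rate is capped either by the available cores or by the shared
    network link, whichever is the tighter limit.
    """
    cpu_bound = cores / cpu_seconds_per_job                # jobs/s the cores can finish
    network_bound = bandwidth_mb_per_s / input_mb_per_job  # jobs/s the link can feed
    return min(cpu_bound, network_bound)


def saturation_cores(cpu_seconds_per_job, input_mb_per_job, bandwidth_mb_per_s):
    """Core count beyond which extra cores no longer raise the job rate."""
    return bandwidth_mb_per_s * cpu_seconds_per_job / input_mb_per_job


if __name__ == "__main__":
    # Assumed workload: 600 s of CPU and 1000 MB of input per job, 500 MB/s link.
    for cores in (100, 200, 400, 800):
        print(cores, job_rate(cores, 600.0, 1000.0, 500.0))
    print("saturation at", saturation_cores(600.0, 1000.0, 500.0), "cores")
```

Under these assumed numbers the rate grows with the core count up to 300 cores and then stays at 0.5 jobs per second, so doubling from 400 to 800 cores gives no gain. A full simulation would also have to account for RAM and disk size, which this sketch ignores.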

Primary author

Igor Pelevanyuk (Joint Institute for Nuclear Research, Plekhanov Russian University of Economics)
