Distributed virtual cluster management system

11 Sept 2018, 16:00
15m
Conference Hall

Sectional reports: 8. High performance computing, CPU architectures, GPU, FPGA

Speaker

Dr Vladimir Korkhov (St. Petersburg State University)

Description

An effective cluster management system is key to solving many problems that arise in distributed computing. The widespread adoption of computing clusters has driven the active development of task management systems; however, many issues, in particular those related to migration, remain unresolved.

In this paper, we consider the organization of a virtual cluster built with OS-level virtualization, together with issues related to the dynamic migration of individual processes and containers. Within a cluster, this task is complicated by stringent requirements, the large number of state parameters of processes and containers, and the possible use of specialized hardware. Restoring the state of a container on another node is a complex task that requires designing and implementing multiple subsystems. Migrating containers and processes is much harder than migrating virtual machines because they are tightly integrated with the OS and may access individual hardware components directly, so the state of individual OS subsystems must be restored. A traditional virtual machine, by contrast, works with virtual hardware provided by the hypervisor, and the state of the guest OS is contained within the VM itself. Migration of processes and containers is currently an actively developing area.

We will present and discuss a technique for managing distributed heterogeneous virtual clusters using OS-level virtualization; a technique for ensuring reliability, fault tolerance and load balancing of computing clusters through the dynamic migration of tasks within a virtual cluster; and an architecture for a task-specific virtual computer network for a computing cluster that minimizes data-exchange overhead.
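The abstract does not specify how a migration target is chosen when balancing load. As a minimal sketch, assuming a hypothetical greedy threshold policy (not the authors' method), the planner below moves the largest containers off any node whose utilization exceeds a threshold onto the least-utilized node that can absorb them without itself crossing the threshold. The names `Node`, `plan_migrations` and `threshold`, and the single scalar "load" per container, are illustrative assumptions; the migration itself (checkpointing, transferring and restoring container state) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical model: one scalar capacity per node and one scalar load per container.
    name: str
    capacity: float
    containers: dict = field(default_factory=dict)  # container id -> load

    @property
    def load(self) -> float:
        return sum(self.containers.values())

    @property
    def utilization(self) -> float:
        return self.load / self.capacity


def plan_migrations(nodes, threshold=0.8):
    """Hypothetical greedy policy: return a list of (container, source, target) moves."""
    plan = []
    # Handle the most overloaded nodes first.
    for src in sorted(nodes, key=lambda n: n.utilization, reverse=True):
        # Try the largest containers first so that fewer migrations are needed.
        for cid, load in sorted(src.containers.items(), key=lambda kv: kv[1], reverse=True):
            if src.utilization <= threshold:
                break
            # Only nodes that stay at or below the threshold after the move qualify.
            targets = [n for n in nodes
                       if n is not src and (n.load + load) / n.capacity <= threshold]
            if not targets:
                continue
            dst = min(targets, key=lambda n: n.utilization)
            # Update the in-memory view of the cluster and record the move.
            del src.containers[cid]
            dst.containers[cid] = load
            plan.append((cid, src.name, dst.name))
    return plan


if __name__ == "__main__":
    cluster = [Node("n1", 10, {"a": 5, "b": 4}), Node("n2", 10, {"c": 2}), Node("n3", 10)]
    print(plan_migrations(cluster))
```

In this example, `n1` starts at 90% utilization, so the largest container `a` is moved to the idle node `n3`, after which every node is at or below 80%. A real scheduler would also weigh migration cost (state size, network traffic, specialized hardware on the target), which is exactly what makes container migration harder than VM migration.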

Primary authors

Mr Amissi Cubahiro (Saint Petersburg Electrotechnical University "LETI", Russia)
Mr Vladimir Gaiduchok (Saint Petersburg Electrotechnical University "LETI", Russia)

Co-authors

Prof. Alexander Degtyarev (Professor)
Ms Magdalyne Kamande (St Petersburg State Electrotechnical University)
Dr Vladimir Korkhov (St. Petersburg State University)