A system for monitoring computing cluster resources

30 Oct 2024, 15:05
15m
3-310 (MLIT)

Oral (Information Technology)

Speaker

Gennadii Karpov

Description

The heterogeneous HybriLIT platform is used at JINR for a range of computational tasks, including collecting and processing experimental data and modeling and simulating physical processes. Its main component is the Govorun supercomputer, which consists of more than a hundred servers that continuously run user programs and communicate with each other over the network. Operating it requires monitoring the state of the computing nodes: CPU and GPU load, RAM and persistent storage usage, network traffic, and the temperature of hardware components. Tools are therefore needed to monitor the status and usage of the components of a heterogeneous platform.

When developing the solution, the advantages and disadvantages of previously used systems were taken into account.
The result is a software product consisting of server and client parts that allows the state of the platform to be monitored in real time through a user-friendly graphical interface.
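To make the server and client split concrete, the following is a minimal sketch rather than the system presented in the talk. It assumes a client that gathers a few per-node metrics with the Python standard library and a server that flags heavily loaded nodes. The names `NodeSnapshot`, `collect_snapshot`, `encode`, `decode`, and `overloaded`, the JSON encoding, and the load threshold are all hypothetical. GPU load, network traffic, and temperatures would need additional, platform-specific sources and are omitted.

```python
import json
import os
import shutil
import socket
import time
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class NodeSnapshot:
    """One reading from a compute node. All field names are illustrative."""
    host: str
    timestamp: float
    cpu_count: int
    load_1m: Optional[float]   # None where the OS exposes no load average
    disk_used_fraction: float  # 0.0 to 1.0 for the filesystem holding `path`


def collect_snapshot(path: str = ".") -> NodeSnapshot:
    """Client side: read a few metrics available from the standard library."""
    try:
        load_1m = os.getloadavg()[0]
    except (AttributeError, OSError):  # e.g. Windows has no getloadavg
        load_1m = None
    usage = shutil.disk_usage(path)
    return NodeSnapshot(
        host=socket.gethostname(),
        timestamp=time.time(),
        cpu_count=os.cpu_count() or 1,
        load_1m=load_1m,
        disk_used_fraction=usage.used / usage.total if usage.total else 0.0,
    )


def encode(snapshot: NodeSnapshot) -> str:
    """Serialize a snapshot for transport (JSON is an assumed wire format)."""
    return json.dumps(asdict(snapshot))


def decode(payload: str) -> NodeSnapshot:
    """Server side: rebuild a snapshot from its JSON form."""
    return NodeSnapshot(**json.loads(payload))


def overloaded(snapshots: List[NodeSnapshot],
               max_load_per_core: float = 1.0) -> List[str]:
    """Server side: hosts whose 1-minute load per core exceeds the threshold."""
    return [
        s.host
        for s in snapshots
        if s.load_1m is not None and s.load_1m / s.cpu_count > max_load_per_core
    ]
```

In such a design the client would call `collect_snapshot()` periodically and send `encode(...)` to the server, which would keep the latest snapshot per host for display in the graphical interface.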

Primary authors

Gennadii Karpov, Maksim Skazkin (Student), Mr Maxim Zuev (MLIT JINR), Дмитрий Беляков (JINR)