10th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2023)

Name: 10th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2023)
Start: 2023-07-03T09:00:00+03:00
End: 2023-07-07T23:05:00+03:00

3–7 Jul 2023

Europe/Moscow timezone

Support

grid2023@jinr.ru

Automated Analysis and Monitoring of Scientific HTC Jobs on Distributed Heterogeneous Computing Resources

6 Jul 2023, 16:30

15m

Room 310

Distributed Computing Systems

Ms Anna Ilina (Joint Institute for Nuclear Research)

Executing millions of scientific high-throughput computing (HTC) jobs on distributed heterogeneous computing resources poses challenges in observing their status and behavior after their completion. To address this, an approach was developed to analyze jobs using scatter plots, showcasing the dependency between job durations and the relative performance of CPU cores they were assigned to. Subsequently, a specialized system was created to automate this analysis process. The system regularly collects relevant data regarding finished jobs within the DIRAC infrastructure.
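The scatter-plot idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record fields (`job_id`, `cluster`, `cpu_power`, `duration_s`), the sample values, and the `tolerance` threshold are all hypothetical. The outlier check also rests on an assumption that the abstract does not state: for jobs with a similar workload, duration should be roughly inversely proportional to the relative performance of the CPU core, so the product of the two should stay close to a typical value.

```python
from collections import defaultdict
from statistics import median

# Hypothetical job records; field names and values are illustrative only,
# not the schema used by the system described above.
JOBS = [
    {"job_id": 1, "cluster": "cluster-a", "cpu_power": 10.0, "duration_s": 3600.0},
    {"job_id": 2, "cluster": "cluster-a", "cpu_power": 20.0, "duration_s": 1800.0},
    {"job_id": 3, "cluster": "cluster-b", "cpu_power": 15.0, "duration_s": 2400.0},
    {"job_id": 4, "cluster": "cluster-b", "cpu_power": 15.0, "duration_s": 7200.0},
]


def scatter_points(jobs):
    """Group (cpu_power, duration_s) points by cluster, one scatter series each."""
    series = defaultdict(list)
    for job in jobs:
        series[job["cluster"]].append((job["cpu_power"], job["duration_s"]))
    return dict(series)


def suspicious_jobs(jobs, tolerance=0.5):
    """Return ids of jobs whose duration * cpu_power product deviates from the
    median product by more than `tolerance` (a fraction of the median).

    Assumption: similar jobs need a similar amount of "work", so a large
    deviation hints at a problem on the worker node rather than a slow core.
    """
    work = {job["job_id"]: job["duration_s"] * job["cpu_power"] for job in jobs}
    typical = median(work.values())
    return [jid for jid, w in work.items() if abs(w - typical) / typical > tolerance]
```

With the sample data, jobs 1 to 3 lie on the same inverse-proportional curve, while job 4 ran three times longer than a core of its performance would suggest, so `suspicious_jobs(JOBS)` flags only job 4.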

A web application was developed using the Django web framework on the server side and the HTML+CSS+JavaScript stack on the client side. It offers the tools and filters needed to highlight different aspects of the operation, such as final status, processors used, cluster names, and the submitting user. The Highcharts JavaScript library was used to visualize the results. After investigating several approaches, it was decided to store the data in CSV files. The web application uses these datasets as a data source for analysis.
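The data flow from CSV file to chart can be sketched in plain Python, without Django, to keep the example self-contained. This is a sketch under assumptions, not the system's code. The CSV column names, the sample rows, and the helper names (`load_jobs`, `filter_jobs`, `highcharts_scatter_options`) are hypothetical. The resulting dictionary follows the general shape of a Highcharts options object (`chart.type`, `xAxis`, `yAxis`, `series`), which a server-side view could serialize to JSON for the client.

```python
import csv
import io
import json

# Hypothetical CSV layout; the real column names are not given in the abstract.
SAMPLE_CSV = """job_id,status,cluster,user,cpu_power,duration_s
1,Done,cluster-a,alice,10.0,3600
2,Failed,cluster-a,bob,20.0,120
3,Done,cluster-b,alice,15.0,2400
"""


def load_jobs(csv_file):
    """Read job records from an open CSV file; every value stays a string."""
    return list(csv.DictReader(csv_file))


def filter_jobs(rows, **criteria):
    """Keep rows whose columns equal every given criterion, e.g. status="Done"."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def highcharts_scatter_options(rows):
    """Build a Highcharts-style options dict with one scatter series per cluster."""
    by_cluster = {}
    for r in rows:
        point = [float(r["cpu_power"]), float(r["duration_s"])]
        by_cluster.setdefault(r["cluster"], []).append(point)
    return {
        "chart": {"type": "scatter"},
        "title": {"text": "Job duration vs. CPU core performance"},
        "xAxis": {"title": {"text": "Relative CPU core performance"}},
        "yAxis": {"title": {"text": "Job duration, s"}},
        "series": [{"name": name, "data": data} for name, data in by_cluster.items()],
    }


rows = load_jobs(io.StringIO(SAMPLE_CSV))
done = filter_jobs(rows, status="Done")
options_json = json.dumps(highcharts_scatter_options(done))
```

Several filters can be combined in one call, for example `filter_jobs(rows, status="Done", user="alice")`. Keeping the filter step separate from the chart-building step means the same CSV dataset can feed different views.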

The developed system has proven to be invaluable, enabling the identification of issues on remote servers and demonstrating performance disparities among different computing resources. It facilitates efficient monitoring and analysis of HTC jobs, improving the overall understanding of their execution behavior.

Ms Anna Ilina (Joint Institute for Nuclear Research), Igor Pelevanyuk (Joint Institute for Nuclear Research)

Automated Analysis and Monitoring of Scientific HTC Jobs.pptx
