The 8th International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018)

Start: 2018-09-10T08:00:00+03:00
End: 2018-09-14T19:05:00+03:00

10–14 Sept 2018

Europe/Moscow timezone

Support

grid2018@jinr.ru

THE BIGPANDA MONITORING SYSTEM ARCHITECTURE

11 Sept 2018, 15:30

15m

406B

Sectional reports 2. Operation, monitoring, optimization in distributed computing systems

Tatiana Korchuganova (National Research Tomsk Polytechnic University)

Large-scale scientific projects currently in operation involve unprecedented amounts of data and computing power. For example, the ATLAS experiment at the Large Hadron Collider (LHC) collected 140 PB of data over the course of Run 1; during the ongoing Run 2 this volume has been growing at a rate of ~800 MB/s and recently reached 350 PB. Processing and analyzing such amounts of data demands the development of complex operational workflow and payload systems, along with the construction of cutting-edge computing facilities. In the ATLAS experiment, a key element of workflow management is the Production and Distributed Analysis system (PanDA). It consists of several core components, one of which is the monitoring. The monitoring is responsible for providing a comprehensive and coherent view of the tasks and jobs executed by the system, from high-level summaries to detailed drill-down job diagnostics. The BigPanDA monitoring has been in production since mid-2014 and continuously evolves to meet increasing demands for functionality and growing payload scales. Today it effectively keeps track of more than 2 million jobs per day distributed over 170 computing centers worldwide in the largest instance of the BigPanDA monitoring: the ATLAS experiment. In this paper we describe the monitoring architecture and its principal features.

Mr Aleksandr Alekseev (National Research Tomsk Polytechnic University), Dr Alexei Klimentov (Brookhaven National Lab), Siarhei Padolski (Brookhaven National Lab), Tatiana Korchuganova (National Research Tomsk Polytechnic University), Torre Wenaus (Brookhaven National Lab)

Slides

2018_GRID_BigPanDAmon_architecture.pdf
