BigPanDA Experience on Titan for the ATLAS Experiment at the LHC

13 Sept 2018, 08:30
30m
LIT Conference Hall

Plenary reports

Speaker

Dr Alexei Klimentov (Brookhaven National Lab)

Description

The ATLAS experiment at the LHC uses the PanDA software for workload management on distributed grid resources. BigPanDA, an effort funded by the US Department of Energy (DOE-ASCR), was launched to extend PanDA to HPC resources. Through this effort, ATLAS today uses over 25 million hours monthly on the Titan supercomputer at Oak Ridge National Laboratory. Many challenges were met and overcome in using HPCs for ATLAS simulations. ATLAS uses two different operational modes at Titan. The traditional mode uses allocations, which require software innovations to fit the low-latency requirements of experimental science. New techniques were implemented to shape large jobs using allocations on a leadership-class machine. In the second mode, high-priority work is constantly sent to Titan to backfill high-priority leadership-class jobs. This has resulted in impressive gains in the overall utilization of Titan, while benefiting the physics objectives of ATLAS. For both modes, BigPanDA has integrated traditional grid computing with HPC architecture. This talk will summarize the innovations that made it possible to use Titan successfully for LHC physics goals.

Primary author

Dr Alexei Klimentov (Brookhaven National Lab)

Co-authors

Mr Danila Oleynik (JINR LIT)
Dr Jack Wells (Oak Ridge National Laboratory)
Kaushik De (University of Texas at Arlington)
Dr Ruslan Mashinistov (NRC KI)
