Integration Of PanDA Workload Management System With Supercomputers for ATLAS

7 Jul 2016, 10:30
30m
LIT Conference Hall

Plenary reports

Speaker

Mr Danila Oleynik (JINR LIT)

Description

The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data-driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment relies on a heterogeneous distributed computational infrastructure. ATLAS uses the PanDA (Production ANd Distributed Analysis) Workload Management System (WMS) to manage the workflow for all data processing at over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3+ petaFLOPS, the next LHC data-taking runs will require more resources than Grid computing can possibly provide. To meet these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources, such as the opportunistic use of supercomputers. We will describe a project aimed at integrating the PanDA WMS with supercomputers in the United States, Europe, and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility, the Mira supercomputer at the Argonne Leadership Computing Facility, and others). In our talk we will consider different approaches to ATLAS data processing on supercomputers: using a dedicated allocation of supercomputer time, working in backfill mode, and multi-step processing. Special attention will be devoted to the ATLAS Event Service (AES) on HPC and to the multi-job pilot.
We will present our recent accomplishments in running PanDA on supercomputers and demonstrate our ability to use PanDA as a portal, independent of the computing facilities' infrastructure, for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bioinformatics and astro-particle physics.

Primary author

Mr Danila Oleynik (JINR LIT)

Co-authors

Dr Alexei Klimentov (Brookhaven National Lab)
Mr Fernando Barreiro (University of Texas at Arlington)
Dr Jack Wells (Oak Ridge National Laboratory)
Dr Kaushik De (University of Texas at Arlington)
Dr Paul Nilsson (Brookhaven National Laboratory)
Dr Sergey Panitkin (Brookhaven National Laboratory)
Dr Shantenu Jha (Rutgers University)
Dr Tadashi Maeno (Brookhaven National Laboratory)
Dr Torre Wenaus (Brookhaven National Laboratory)
Dr Wen Guan (WISC)
