The ATLAS Production System Predictive Analytics service: an approach for intelligent task analysis

13 Sept 2018, 14:15
15m
406B


Sectional reports 3. Middleware and services for production-quality infrastructures

Speaker

Mikhail Titov (National Research Centre «Kurchatov Institute»)

Description

The second generation of the Production System (ProdSys2) of the ATLAS experiment (LHC, CERN), together with the workload management system PanDA (Production and Distributed Analysis), is a complex set of computing components responsible for defining, organizing, scheduling, starting and executing payloads in a distributed computing infrastructure. ProdSys2/PanDA handles all stages of (re)processing, analysis and modeling of raw and derived data, as well as simulation of physical processes and of the detector's operation using Monte Carlo methods. The prototype of ProdSys2 Predictive Analytics (P2PA) is an essential part of the growing analytical service for ProdSys2 and will play a key role in ATLAS computing. P2PA applies tools such as Time-To-Complete (TTC) estimation to units of processing (i.e., tasks, chains and groups of tasks) in order to monitor the processing state and rate and to highlight abnormal operations and executions (e.g., to discover stalled processes). It uses machine learning methods to obtain predictive models and metrics that characterize the current state of the system and how it changes over the near term.
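
As a rough illustration of how a TTC estimate could be used to highlight stalled processing, the sketch below flags any task whose elapsed time exceeds its predicted duration by a tolerance factor. This is a minimal sketch under stated assumptions, not the P2PA implementation: the `Task` record, its field names, the `find_stalled` helper and the `tolerance` value of 1.5 are all hypothetical, and the predicted durations are made-up numbers standing in for the output of a trained model.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; the fields are illustrative, not the ProdSys2 schema."""
    task_id: int
    elapsed_hours: float        # time since the task started
    predicted_ttc_hours: float  # predicted total duration (assumed to come from a model)


def find_stalled(tasks, tolerance=1.5):
    """Return ids of tasks whose elapsed time exceeds tolerance * predicted duration."""
    return [
        t.task_id
        for t in tasks
        if t.elapsed_hours > tolerance * t.predicted_ttc_hours
    ]


# Made-up example data, for illustration only.
tasks = [
    Task(1, elapsed_hours=10.0, predicted_ttc_hours=12.0),
    Task(2, elapsed_hours=40.0, predicted_ttc_hours=20.0),
    Task(3, elapsed_hours=29.0, predicted_ttc_hours=20.0),
]
print(find_stalled(tasks))
```

A fixed multiplicative tolerance is the simplest possible rule; a real service would more plausibly derive the threshold from the uncertainty of the prediction itself.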

Primary author

Mikhail Titov (National Research Centre «Kurchatov Institute»)

Co-authors

Dr Alexei Klimentov (Brookhaven National Lab)
Mr Dmitry Golubkov (Institute for High Energy Physics)
Mr Mikhail Borodin (The University of Iowa (US))
