Speaker
Mr
Mikhail Titov
(National Research Centre «Kurchatov Institute»)
Description
The workflow management process should be controlled by a dedicated service that can forecast processing time dynamically, based on the status of the processing environment and of the workflow itself, and react immediately to any abnormal behaviour of the execution process. Such a situational-awareness analytic service would make it possible to monitor the execution process, detect the source of any malfunction, and optimize the management process.
The stated service for the second generation of the ATLAS Production System (ProdSys2, an automated scheduling system) is based on a predictive analytics approach that estimates the duration of data processing (in ProdSys2 terms, a task or a chain of tasks) for later use in decision-making processes. Machine learning ensemble methods are chosen to estimate the completion time (i.e., “Time To Complete”, TTC) of every production task and chain of tasks, so that “abnormal” task processing times warn of a possible failure state of the system. This is the primary phase of the service, and its precision is crucial.
The first implementation of this analytic service already includes the Task TTC Estimator tool. It is designed to provide a comprehensive set of options for adjusting the analysis process and to allow its functionality to be extended.
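The idea of estimating TTC with an ensemble and treating an overrun as a warning sign can be illustrated with a minimal sketch. It is not the ProdSys2 implementation: the abstract does not name the ensemble method or the task features, so the sketch assumes a bagged ensemble of simple linear fits on a single hypothetical feature (number of events), invented sample durations, and an arbitrary tolerance factor of 1.5. All function names are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x to (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x equal): fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def train_ensemble(history, n_models=50, seed=0):
    """Bagging: fit one line per bootstrap resample of the task history."""
    rng = random.Random(seed)
    return [fit_line(rng.choices(history, k=len(history)))
            for _ in range(n_models)]


def estimate_ttc(ensemble, n_events):
    """Return the mean ensemble prediction and the spread between members."""
    preds = [a + b * n_events for a, b in ensemble]
    return statistics.fmean(preds), statistics.pstdev(preds)


def is_abnormal(elapsed, ttc, tolerance=1.5):
    """Flag a task whose elapsed time exceeds the estimate by the tolerance
    factor (assumed value, not taken from the abstract)."""
    return elapsed > tolerance * ttc


# Hypothetical history: (number of events, duration in hours).
history = [(100, 2.1), (200, 3.9), (300, 6.2), (400, 8.0), (500, 9.8)]
ensemble = train_ensemble(history)
ttc, spread = estimate_ttc(ensemble, 350)
print(f"estimated TTC: {ttc:.2f} h (ensemble spread {spread:.2f} h)")
print("abnormal after 20 h:", is_abnormal(20.0, ttc))
```

The spread between ensemble members gives a rough indication of how much the estimate can be trusted. A production service would presumably use richer task features and a tuned threshold.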
Primary author
Mr
Mikhail Titov
(National Research Centre «Kurchatov Institute»)
Co-authors
Dr
Alexei Klimentov
(Brookhaven National Lab)
Mr
Dmitry Golubkov
(Institute for High Energy Physics)
Mr
Fernando Barreiro Megino
(University of Texas at Arlington)
Mr
Ivan Tertychnyy
(National Research Centre «Kurchatov Institute»)
Maksim Gubin
(Tomsk Polytechnic University)
Mr
Mikhail Borodin
(The University of Iowa (US))