Adaptive Pilot Framework for Distributed Workload Execution in the SPD Online Filter System

29 Oct 2025, 14:15
15m
4th floor, 456 (MLIT)

Oral Information Technology

Speaker

Леонид Романычев

Description

Pilot systems are widely used in distributed computing as a flexible mechanism for dynamic workload management and resource allocation. They have proven effective in large-scale experiments and high-performance environments thanks to their scalability and adaptability. However, the absence of a common abstraction and unified best practices has led to a variety of implementations, often with limited interoperability.
In this presentation, we will explore the architectural principles and operational models underlying pilot frameworks, with special attention to late binding, in which a task is bound to a resource only after a pilot is already running there; this key feature supports efficient resource utilization and adaptive task scheduling. We introduce our implementation tailored for the SPD experiment: a two-layer solution combining a pilot process and a monitoring daemon. The system employs multithreading to ensure effective scheduling, supervision, and reporting. We will share practical lessons learned from deploying this framework in the SPD online filter system, emphasizing its impact on distributed workload execution.
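
The late-binding and multithreading ideas above can be illustrated with a minimal Python sketch. It is not the SPD implementation: the names Pilot, collect_reports, and demo are hypothetical, an in-process queue stands in for a real task service, and the monitoring layer is reduced to a function that drains a report queue (in a real deployment it would presumably be a separate daemon process).

```python
import queue
import threading
import time


class Pilot:
    """Hypothetical pilot: one thread executes payloads, another reports liveness."""

    def __init__(self, pilot_id, task_queue, reports, heartbeat_period=0.05):
        self.pilot_id = pilot_id
        self.task_queue = task_queue
        self.reports = reports
        self.heartbeat_period = heartbeat_period
        self._finished = threading.Event()

    def _execute(self):
        while True:
            try:
                # Late binding: the payload is chosen only now, when this
                # pilot already holds a slot and is ready to run it.
                name, func, args = self.task_queue.get_nowait()
            except queue.Empty:
                break  # no work left, so the pilot releases its slot
            try:
                outcome = ("done", func(*args))
            except Exception as exc:  # a failing payload must not kill the pilot
                outcome = ("failed", repr(exc))
            self.reports.put(("result", self.pilot_id, name, outcome))
        self._finished.set()

    def _heartbeat(self):
        # Emit at least one heartbeat, then repeat until execution finishes.
        while True:
            self.reports.put(("heartbeat", self.pilot_id, None, time.monotonic()))
            if self._finished.wait(self.heartbeat_period):
                break

    def run(self):
        threads = [
            threading.Thread(target=self._execute),
            threading.Thread(target=self._heartbeat),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def collect_reports(reports):
    """Stand-in for the monitoring layer: summarize results, count heartbeats."""
    summary, heartbeats = {}, 0
    while True:
        try:
            kind, pilot_id, name, payload = reports.get_nowait()
        except queue.Empty:
            return summary, heartbeats
        if kind == "heartbeat":
            heartbeats += 1
        else:
            summary[name] = (pilot_id, payload)


def demo():
    tasks = queue.Queue()
    for n in range(6):
        tasks.put((f"task-{n}", pow, (n, 2)))
    tasks.put(("bad-task", int, ("not a number",)))  # deliberately failing payload

    reports = queue.Queue()
    pilots = [Pilot(f"pilot-{i}", tasks, reports) for i in range(2)]
    runners = [threading.Thread(target=p.run) for p in pilots]
    for r in runners:
        r.start()
    for r in runners:
        r.join()
    return collect_reports(reports)


if __name__ == "__main__":
    summary, heartbeats = demo()
    for name in sorted(summary):
        print(name, summary[name])
    print("heartbeats received:", heartbeats)
```

Which pilot ends up running which task is not fixed in advance and may differ between runs; that is the point of late binding, since work flows to whichever pilot is free at the moment.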

Author

Леонид Романычев
