25–29 Sept 2017
Montenegro, Budva, Becici
Europe/Podgorica timezone

Resource sharing based on HTCondor for multiple experiments

28 Sept 2017, 16:55
15m
Conference Hall (Montenegro, Budva, Becici)

Splendid Conference & SPA Resort, 85315 Becici, Montenegro Hotel Splendid
Sectional
Computing for Large Scale Facilities (LHC, FAIR, NICA, SKA, PIC, XFEL, ELI, etc.)
Distributed Computing. GRID & Cloud computing

Speaker

Dr Jingyan Shi (INSTITUTE OF HIGH ENERGY PHYSICS, Chinese Academy of Sciences)

Description

HTCondor, a scheduler focused on high-throughput computing, has become increasingly popular in high energy physics (HEP) computing. The HTCondor cluster at the computing center of the Institute of High Energy Physics (IHEP) in China runs more than 10,000 CPU cores and supports several HEP experiments, such as JUNO, BES, ATLAS and CMS. The worker nodes owned by the experiments are managed by HTCondor. A shared pool comprising the worker nodes contributed by all HEP experiments has been created to absorb the peak computing demand that different experiments generate at different times. To manage the shared pool, a database stores the cluster's information, including node and group attributes. The cluster manager can adjust these attributes, which are published to both the scheduler servers and the worker nodes over HTTP. A monitoring watchdog has been developed to check the health of the worker nodes and report it to the database. Both servers and worker nodes update their own configuration based on the attributes published by the database. Overall resource utilization of the cluster has risen from 50% to more than 80% since the shared pool was created.
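
The attribute-driven configuration described in the abstract can be pictured with a minimal Python sketch. It is not the IHEP implementation: the `NodeAttrs` schema, the `NodeOwnerGroup` attribute name, and the choice to match jobs on `TARGET.AcctGroup` are all assumptions made for illustration. The sketch takes the attributes a database might hold for one node and renders an HTCondor-style configuration fragment that the node could fetch over HTTP.

```python
from dataclasses import dataclass, field


@dataclass
class NodeAttrs:
    """Attributes a central database might hold for one node (hypothetical schema)."""
    name: str
    owner_group: str                                    # experiment that owns the node
    shared_groups: list = field(default_factory=list)   # groups admitted via the shared pool
    healthy: bool = True                                # last status reported by the monitor


def accepted_groups(node: NodeAttrs) -> list:
    """Return the groups whose jobs the node accepts, owner first; empty if unhealthy."""
    if not node.healthy:
        return []
    groups = [node.owner_group]
    for group in node.shared_groups:
        if group not in groups:
            groups.append(group)
    return groups


def render_config(node: NodeAttrs) -> str:
    """Render a configuration fragment for the node (assumed layout, not IHEP's)."""
    groups = accepted_groups(node)
    if groups:
        # Assumption: jobs carry their group in the AcctGroup job attribute.
        start = " || ".join(f'(TARGET.AcctGroup =?= "{g}")' for g in groups)
    else:
        # An unhealthy node refuses all new jobs.
        start = "False"
    lines = [
        f'NodeOwnerGroup = "{node.owner_group}"',            # hypothetical custom attribute
        "STARTD_ATTRS = $(STARTD_ATTRS) NodeOwnerGroup",     # advertise it in the machine ad
        f"START = {start}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(NodeAttrs("wn001", "juno", shared_groups=["bes"])))
```

In this sketch, changing `shared_groups` or `healthy` in the database is enough to change which jobs a node accepts the next time it fetches its fragment; a real deployment would also have to make the HTCondor daemons reload their configuration.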

Primary author

Dr Jingyan Shi (INSTITUTE OF HIGH ENERGY PHYSICS, Chinese Academy of Sciences)