Description
The Jiangmen Underground Neutrino Observatory (JUNO) is a major international neutrino experiment located in Kaiping City, Guangdong Province, southern China. To support its large-scale data processing needs, JUNO has adopted a distributed computing model based on the Worldwide LHC Computing Grid (WLCG) architecture. The JUNO distributed computing infrastructure includes collaborating sites in China, Italy, France, and Russia.
To ensure the stability, efficiency, and accountability of this international computing network, we have developed a monitoring system tailored to JUNO’s distributed computing environment. The system continuously tracks the operational status of computing sites and core services, and accounts for cumulative resource usage across all participating centers. Using a dedicated workflow management tool, it runs site-level Service Availability Monitoring (SAM) tests and aggregates the resulting diagnostic and performance metrics.
The system currently provides real-time data collection and interactive visualization in several critical areas: site availability and reliability, data transfer performance, computing and storage resource statistics, and the status of essential grid and cloud services. This monitoring framework supports the daily operation and performance analysis of JUNO’s distributed computing system.
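As an illustration of how site availability and reliability could be derived from periodic SAM test results, the following is a minimal sketch. It assumes the commonly used WLCG-style definitions (availability excludes intervals with unknown status; reliability additionally excludes scheduled downtime). The abstract does not state the formulas JUNO uses, and the status labels and the function name `availability_reliability` are hypothetical.

```python
from collections import Counter

# Hypothetical status labels, one per SAM test interval at a site.
OK = "ok"
FAILED = "failed"
SCHEDULED_DOWNTIME = "scheduled_downtime"
UNKNOWN = "unknown"


def availability_reliability(statuses):
    """Return (availability, reliability) as fractions, or None if undefined.

    Assumed definitions (WLCG-style, not confirmed for JUNO):
      availability = ok / (total - unknown)
      reliability  = ok / (total - unknown - scheduled_downtime)
    """
    counts = Counter(statuses)
    known = len(statuses) - counts[UNKNOWN]
    in_service = known - counts[SCHEDULED_DOWNTIME]

    availability = counts[OK] / known if known else None
    reliability = counts[OK] / in_service if in_service else None
    return availability, reliability


# Example: 24 hourly test results for one hypothetical site.
hourly = [OK] * 20 + [FAILED] * 2 + [SCHEDULED_DOWNTIME] + [UNKNOWN]
availability, reliability = availability_reliability(hourly)
print(f"availability={availability:.3f} reliability={reliability:.3f}")
```

Separating the two metrics means a site is not penalized in its reliability figure for maintenance that was announced in advance, while the availability figure still reflects the capacity actually usable during the period.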