Andrey Kondratyev
(JINR)
9/11/18, 1:30 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
As the volume of information obtained in the ALICE experiment at the Large Hadron Collider grows, the requirements on detector data acquisition systems increase, for example the need for higher throughput. One possible way to address this problem is to use the TRB-3 platform. The solution is a deep modernization of the existing data acquisition model.
Mr
Anatoly Minzov
(MPEI)
9/11/18, 1:45 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
The article discusses ensuring the integrity of information over long periods of time. This task has not been raised before. However, experience shows that over long periods, information in electronic archives can change uncontrollably or even disappear. Attacks on the integrity of electronic archives can also be targeted. This requires the creation of...
Mr
Mikhail Mineev
(JINR)
9/11/18, 2:00 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
Trigger information is an important part of the ATLAS event data. In the EventIndex, trigger information is collected for various use cases, including event selection and overlap counting. Decoding the trigger information from the event records, where it is stored as a bit mask, requires additional input from the conditions metadata database, as trigger configurations evolve with time. It depends on...
Mr
Vladimir Gaiduchok
(Saint Petersburg Electrotechnical University "LETI", Russia)
9/11/18, 2:15 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
Linux networking performance is excellent, and the Linux networking stack is highly effective. Many options can be configured for different cases. This article is devoted to the implementation and configuration of the networking stack. Questions of effective configuration will be discussed, and possible problems and solutions will be shown. Different options...
Mr
Qingbao Hu
(IHEP)
9/11/18, 2:30 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
The online computer room of the LHAASO experiment is located at high altitude in a poor natural environment. As there are no permanent resident maintenance personnel, an automatic operation and maintenance system needs to be deployed for remote management.
According to the characteristics of the LHAASO cluster management, we have designed a distributed monitoring...
Mr
Igor Pelevanyuk
(JINR)
9/11/18, 2:45 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
The Tier-1 center for CMS at JINR has been operating successfully since 2015. Monitoring is an important aspect of ensuring its performance. Hardware monitoring of the Tier-1 center was introduced at construction time and has been upgraded continually along with the center. The scientific community makes use of the resources through Grid services that depend on lower-level services. A dedicated...
Tatiana Korchuganova
(National Research Tomsk Polytechnic University)
9/11/18, 3:30 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
Currently running large-scale scientific projects involve unprecedented amounts of data and computing power. For example, the ATLAS experiment at the Large Hadron Collider (LHC) collected 140 PB of data over the course of Run 1; this volume has been increasing at a rate of ~800 MB/s during the ongoing Run 2 and has recently reached 350 PB. Processing and analysis of such amounts of data demands...
Mr
Aleksandr Alekseev
(National Research Tomsk Polytechnic University)
9/11/18, 3:45 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
The BigPanDA monitoring system is a Web application created to deliver real-time analytics covering many aspects of the ATLAS experiment's distributed computing. The system serves about 35000 requests daily and provides critical information used as input for various decisions: from distribution of the payload among available resources to issue tracking related to any of 350k jobs running...
Ms
Grigorieva Maria
(NRC KI)
9/11/18, 4:00 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
ATLAS is the largest experiment at the LHC. It generates vast volumes of scientific data accompanied by auxiliary metadata. These metadata represent all stages of data processing and Monte-Carlo simulation, as well as characteristics of the computing environment, such as software versions and infrastructure parameters, detector geometry and calibration values. The systems responsible for data...
Mr
Andrei Kazymov
(JINR), Mr
Evgeny Alexandrov
(JINR)
9/11/18, 4:15 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
The ATLAS EventIndex collects event information from data both at CERN and Grid sites. It uses the Hadoop system to store the results, and web services to access them. Its successful operation depends on a number of different components that have to be monitored constantly to ensure continuous operation of the system. Each component has completely different sets of parameters and states and...
Igor Tkachenko
(NRC "Kurchatov Institute")
9/11/18, 4:30 PM
2. Operation, monitoring, optimization in distributed computing systems
Plenary reports
The issues of development and modernization of the Tier-1 center at the National Research Center "Kurchatov Institute" are considered in accordance with the changing requirements of experiments at the Large Hadron Collider. Increasing requirements for computing resources, driven by an increase in simulations, led to an increase in their volumes, which in turn required the development of...
Victoria Matskovskaya
9/11/18, 4:45 PM
2. Operation, monitoring, optimization in distributed computing systems
Sectional reports
The high energy physics community needs metrics that characterize the resource usage of the experiments' workloads in enough detail that the impact of changes in the infrastructure or the workload implementations can be quantified precisely enough to guide design decisions towards improved efficiency. This model has to express the resource utilization of the workloads in...
Ivan Kashunin
(JINR)
9/11/18, 5:00 PM