Speaker
Dr
Ilija Vukotic
(University of Chicago)
Description
The ATLAS Data analytics effort is focused on creating systems which provide the ATLAS ADC with new capabilities for understanding distributed systems and overall operational performance. These capabilities include: warehousing information from multiple systems (the production and distributed analysis system - PanDA, the distributed data management system - Rucio, the file transfer system, various monitoring services etc. ); providing a platform to execute arbitrary data mining and machine learning algorithms over aggregated data; satisfy a variety of use cases for different user roles; host new third party analytics services on a scalable compute platform. We describe the implemented system where: data sources are existing RDBMS (Oracle) and Flume collectors; a Hadoop cluster is used to store the data; native Hadoop and Apache Pig scripts are used for data aggregation; and R for in-depth analytics. Part of the data is indexed in ElasticSearch so both simpler investigations and complex dashboards can be made using Kibana.
Primary author
Dr
Ilija Vukotic
(University of Chicago)
Co-authors
Mr
Lincoln Bryant
(University of Chicago)
Dr
Robert Gardner
(University of Chicago)