9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021)

Start: 2017-12-03T12:00:00+03:00
End: 2021-07-09T19:05:00+03:00

5–9 Jul 2021

Europe/Moscow timezone

Support

grid2021@jinr.ru

Data analysis platform for stream and batch data processing on hybrid computing resources

6 Jul 2021, 14:00

15m

407 or Online - https://jinr.webex.com/jinr/j.php?MTID=m573f9b30a298aa1fc397fb1a64a0fb4b

Sectional reports: 9. Big data Analytics and Machine learning

Ivan Kadochnikov (JINR, PRUE)

The modern Big Data ecosystem provides tools for building a flexible platform that processes both data streams and batch datasets. The operation of modern giant particle physics experiments, as well as the services that many individual physics researchers rely on, generates and transfers large quantities of semi-structured data. It is therefore promising to apply cutting-edge technologies to study these data flows and to make service provisioning more effective.
In this work, we describe the structure and implementation of our data analysis platform, built around an Apache Spark cluster. Since Spark version 3 now officially supports GPU computing, we propose an architecture change that makes use of these more performant resources while preserving the platform's functionality, which is based on mainstream Big Data software. Adding GPU support also required moving the computing resource management infrastructure from Apache Mesos to Kubernetes. Finally, to demonstrate the features and operation of the system, we use the task of network packet analysis for security monitoring and anomaly detection, in both batch and stream mode.
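As a rough illustration of the Spark 3 on Kubernetes setup with GPUs, the sketch below assembles Spark 3 resource-scheduling properties that request GPUs for executors and turns them into a `spark-submit` command line. The property names are documented Spark settings. The API server URL, container image, discovery-script path, application file `packet_analysis.py`, and the helpers `gpu_spark_conf` and `spark_submit_args` are hypothetical placeholders, not the authors' actual configuration. It also assumes the cluster exposes GPUs through the NVIDIA device plugin (`nvidia.com/gpu`).

```python
"""Sketch: Spark 3 GPU scheduling properties for a job submitted to Kubernetes.

All concrete values (API server URL, image, amounts, script and app paths)
are hypothetical placeholders, not the configuration used by the authors.
"""

# Hypothetical Kubernetes API server and Spark container image.
K8S_MASTER = "k8s://https://k8s-api.example.org:6443"
SPARK_IMAGE = "registry.example.org/spark-gpu:3.1.1"


def gpu_spark_conf(executors: int, gpus_per_executor: int = 1,
                   tasks_per_gpu: int = 1) -> dict:
    """Return Spark 3 resource-scheduling properties for GPU executors on Kubernetes."""
    if executors < 1 or gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("executors, gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        "spark.executor.instances": str(executors),
        "spark.kubernetes.container.image": SPARK_IMAGE,
        # Vendor prefix: on Kubernetes, Spark combines it with "gpu"
        # to request the nvidia.com/gpu resource for executor pods.
        "spark.executor.resource.gpu.vendor": "nvidia.com",
        # Number of GPUs each executor pod requests.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # GPU share claimed per task; 1/tasks_per_gpu lets tasks share a GPU.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
        # Script that reports the GPU addresses visible inside the executor
        # (hypothetical path inside the assumed container image).
        "spark.executor.resource.gpu.discoveryScript":
            "/opt/spark/examples/src/main/scripts/getGpusResources.sh",
    }


def spark_submit_args(conf: dict, app: str) -> list:
    """Build a spark-submit argument list in cluster mode on Kubernetes."""
    args = ["spark-submit", "--master", K8S_MASTER, "--deploy-mode", "cluster"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    conf = gpu_spark_conf(executors=2)
    # local:// points at a file already inside the container image.
    print(" ".join(spark_submit_args(conf, "local:///opt/spark/work-dir/packet_analysis.py")))
```

These properties only make Spark schedule GPUs. Tasks still need GPU-aware code (for example, a GPU DataFrame library or NVIDIA's RAPIDS Accelerator for Apache Spark) to actually run on them.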

Authors: Ivan Kadochnikov (JINR, PRUE); Sergey Belov (Joint Institute for Nuclear Research, PRUE); Vladimir Korenkov (JINR, PRUE); Roman Semenov (JINR, PRUE); Petr Zrelov (JINR, PRUE)

analysis-platform_grid2021.pdf

analysis-platform_grid2021.pptx
