Grid datasets popularity estimation using gradient boosting

7 Jul 2016, 13:30
15m
406A

Sectional reports 10. Databases, Distributed Storage systems, Big data Analytics

Speaker

Daria Chernyuk (Andreevna)

Description

In this paper we present a machine-learning-based technique for estimating access patterns of grid datasets. We analyzed three years of historical data from the Kurchatov Institute Tier1 site and applied a gradient boosting algorithm to predict a dataset's near-future access pattern from its previous access statistics. We show that the method is effective for estimating ATLAS data popularity. It can be used to optimize grid storage in two ways: by moving unpopular datasets to tape storage to save disk space, and by increasing the number of replicas of popular datasets to reduce access latency.
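
As a rough illustration of the idea, the sketch below fits a tiny gradient boosting regressor (decision stumps, squared loss) in plain Python and maps the predicted access count to a storage action. It is not the authors' implementation: the abstract does not state the features, the library, or any thresholds. The feature layout (access counts in the three previous weeks), the toy data, and the names fit_gbm, predict, storage_action, cold_below and hot_above are all assumptions made for this example.

```python
# Toy gradient boosting sketch. Everything here is hypothetical:
# the data, the features, and the thresholds are NOT from the paper.

def fit_stump(X, residuals):
    """Return the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct feature values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:] if best else None


def stump_predict(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean


def fit_gbm(X, y, n_rounds=100, learning_rate=0.1):
    """Boost stumps on residuals, starting from the mean of the targets."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (t - p)^2 the negative gradient is the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can be split any further
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def storage_action(predicted, cold_below=1.0, hot_above=20.0):
    """Map a predicted access count to one of the two optimizations."""
    if predicted < cold_below:
        return "move to tape"
    if predicted > hot_above:
        return "add replica"
    return "keep"


# Hypothetical rows: accesses in each of the three previous weeks.
X = [
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [2, 1, 0],
    [5, 6, 8], [10, 12, 15], [20, 18, 25], [30, 35, 40],
]
# Hypothetical targets: accesses in the following week.
y = [0, 0, 0, 1, 9, 16, 24, 42]

model = fit_gbm(X, y)
for row in X:
    p = predict(model, row)
    print(row, round(p, 2), storage_action(p))
```

A production setup would more likely use an established gradient boosting library, richer per-dataset features, and a held-out time window for validation; the pure-Python version is used here only to keep the example self-contained.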

Primary authors

Anton Teslyuk (Borisovich)
Daria Chernyuk (Andreevna)
