Description
The rapid growth in data volume and complexity has exposed the limitations of traditional storage solutions. While data lakes offer scalable handling of large and unstructured datasets, they fall short in integrating data across distributed sites. This is a critical requirement for modern workflows such as machine learning, which demand seamless, aggregated access to diverse data sources.
In this study, we present an efficient implementation of an architecture called Data Mesh, designed to unify independent data sites into a cohesive storage ecosystem. Our approach combines hierarchical storage techniques with advanced virtualization technologies. By deploying virtual container clusters and dynamic migration services, Data Mesh achieves high agility and scalability, enabling efficient data placement and real-time access across dispersed repositories.
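The abstract does not say how storage tiers are defined or when data moves between them. As a rough illustration only, the sketch below assumes three tiers (`HOT`, `WARM`, `COLD`) and a simple access-recency policy with arbitrary thresholds; these names, thresholds, and the `plan_migrations` helper are hypothetical and are not taken from the Data Mesh design. It shows how a migration service could compare each dataset's current tier with a target tier and emit the moves needed.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
    HOT = 0    # e.g. NVMe or SSD
    WARM = 1   # e.g. HDD
    COLD = 2   # e.g. object store or tape


@dataclass
class DatasetStats:
    name: str
    tier: Tier                   # tier the dataset currently lives on
    seconds_since_access: float  # time since the last read or write


def target_tier(stats: DatasetStats, hot_window: float = 3600.0,
                warm_window: float = 7 * 24 * 3600.0) -> Tier:
    """Choose a tier from access recency (thresholds are assumptions)."""
    if stats.seconds_since_access <= hot_window:
        return Tier.HOT
    if stats.seconds_since_access <= warm_window:
        return Tier.WARM
    return Tier.COLD


def plan_migrations(datasets):
    """Return (name, from_tier, to_tier) for every dataset on the wrong tier."""
    moves = []
    for ds in datasets:
        dest = target_tier(ds)
        if dest != ds.tier:
            moves.append((ds.name, ds.tier, dest))
    return moves


if __name__ == "__main__":
    datasets = [
        DatasetStats("training-images", Tier.COLD, 120.0),
        DatasetStats("sensor-logs-2023", Tier.HOT, 30 * 24 * 3600.0),
        DatasetStats("model-checkpoints", Tier.WARM, 2 * 24 * 3600.0),
    ]
    for name, src, dst in plan_migrations(datasets):
        print(f"{name}: {src.name} -> {dst.name}")
```

Here recently used training data is promoted to the fast tier, stale logs are demoted, and the checkpoints stay where they are. A production policy would likely weigh size, cost, and cross-site bandwidth as well; recency is used only to keep the sketch small.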
Central to our design is a distributed metadata layer that maintains a virtual representation of all data assets. An integration service orchestrates metadata synchronization and governs the interaction between hierarchical storage tiers and migration mechanisms. The resulting unified virtual data plane facilitates seamless data discovery, governance, and analysis without compromising the autonomy of individual sites.
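To make the idea of a "virtual representation of all data assets" more concrete, here is a minimal single-process sketch of a metadata catalog. It assumes that each asset has a global logical name mapped to a (site, tier, path) location, that sites register only their own assets, and that the migration service updates the catalog after a move completes. The classes `VirtualCatalog`, `Asset`, and `Location` and their methods are hypothetical; the real layer is distributed and would also need synchronization and consistency mechanisms that this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Location:
    site: str   # autonomous site that physically holds the data
    tier: str   # storage tier within that site, e.g. "hot" or "cold"
    path: str   # site-local path or URI


@dataclass
class Asset:
    name: str                  # global logical name, independent of location
    location: Location
    tags: Dict[str, str] = field(default_factory=dict)
    version: int = 0           # incremented on every recorded migration


class VirtualCatalog:
    """Hypothetical global view of assets held by many independent sites."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}

    def register(self, site: str, name: str, tier: str, path: str, **tags: str) -> None:
        # A site may publish or update only names it owns (site autonomy).
        existing = self._assets.get(name)
        if existing is not None and existing.location.site != site:
            raise ValueError(f"{name!r} is already owned by another site")
        self._assets[name] = Asset(name, Location(site, tier, path), dict(tags))

    def resolve(self, name: str) -> Optional[Location]:
        # Clients use logical names; the catalog says where the bytes live now.
        asset = self._assets.get(name)
        return asset.location if asset else None

    def discover(self, **tags: str) -> List[str]:
        # Cross-site discovery: every asset whose tags match all given key/values.
        return sorted(a.name for a in self._assets.values()
                      if all(a.tags.get(k) == v for k, v in tags.items()))

    def record_migration(self, name: str, new_location: Location) -> int:
        # Called by a migration service once the copy is complete and verified.
        asset = self._assets[name]
        asset.location = new_location
        asset.version += 1
        return asset.version


if __name__ == "__main__":
    catalog = VirtualCatalog()
    catalog.register("site-a", "imagenet-subset", "cold", "/archive/imagenet", domain="vision")
    catalog.register("site-b", "lidar-scans", "hot", "/fast/lidar", domain="vision")
    catalog.register("site-b", "clinical-notes", "warm", "/data/notes", domain="nlp")
    print(catalog.discover(domain="vision"))  # ['imagenet-subset', 'lidar-scans']
    catalog.record_migration("imagenet-subset", Location("site-a", "hot", "/fast/imagenet"))
    print(catalog.resolve("imagenet-subset"))
```

Because applications address data by logical name rather than by physical path, a migration between tiers or sites only changes the catalog entry, which is one plausible way a metadata layer can decouple data placement from data access.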
Data Mesh represents the next step in the evolution of storage architectures, addressing the needs of large-scale, multi-site projects. It offers a dynamic, scalable, and integrated platform capable of supporting demanding machine learning and analytics applications in complex environments.