Speaker
Mr
Alexander Novikov
(National Research Centre "Kurchatov Institute")
Description
This report presents an approach to high-throughput, parallel, pipelined data processing and its software implementation, using remote sensing of the Earth from satellites (up to 1 Tb of data daily) as an example. The system is designed to process a potentially infinite data flow as it arrives in real time. The data flow can be processed by a set of subtasks (pipeline steps), each requiring one or more input files, interconnected in a highly customizable way according to a set of file parameters. These parameters can be external file properties or internal ones (metadata headers), for example sensor name, type, date, time, and geolocation; in addition, a file may have newer versions and may be optional. New pipelines can be added, configured, and tuned on the fly; they share the available cluster resources according to priorities and can work on new data or on the same data. The system uses local shared data storage for runtime computation, and remote data sources and backup storage for results. Support for access and transport protocols for data at different locations can be extended programmatically. A special description language adapts flexibly to different computational tasks and applied solutions, reducing the time needed to deploy a solution and obtain results. A pool of cloud resources, scaled dynamically by the system, increases usage efficiency. The system manages pipelines, the lifetime of subtask instances and virtual machines, pipeline synchronization, statistics collection, and error detection and automatic correction. The system is designed for sustained operation, with failover in any part of its operation.
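As an illustration of how a pipeline step might collect its input files by their parameters, the following is a minimal sketch, not the system's actual implementation or description language. All names here (FileInfo, Step, offer, the "sensor-A" sensor, the file names and the file kinds) are hypothetical and chosen only for this example. It assumes that files are grouped by sensor name and date, that a newer version of a file replaces an older one, and that a step becomes ready as soon as all of its required inputs have arrived, without waiting for optional ones.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical description of an incoming file and its parameters."""
    name: str
    sensor: str
    kind: str       # hypothetical file type, e.g. "radiance"
    date: str       # ISO date string
    version: int = 1


@dataclass
class Step:
    """Hypothetical pipeline step that fires once all required kinds arrive."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()
    # Pending inputs grouped by (sensor, date); each group maps kind -> file.
    pending: dict = field(default_factory=lambda: defaultdict(dict))

    def offer(self, f: FileInfo):
        """Register a file; return the input set if the step became ready."""
        if f.kind not in self.required | self.optional:
            return None  # this step does not consume files of this kind
        key = (f.sensor, f.date)
        group = self.pending[key]
        current = group.get(f.kind)
        # Assumption: only the newest version of each input is kept.
        if current is None or f.version > current.version:
            group[f.kind] = f
        if self.required.issubset(group):
            return self.pending.pop(key)
        return None


# Usage: the step waits until both required inputs for one sensor and date exist.
step = Step("calibrate", required=frozenset({"radiance", "geolocation"}))
first = step.offer(FileInfo("r_v1.dat", "sensor-A", "radiance", "2015-06-01"))
step.offer(FileInfo("r_v2.dat", "sensor-A", "radiance", "2015-06-01", version=2))
ready = step.offer(FileInfo("g_v1.dat", "sensor-A", "geolocation", "2015-06-01"))
```

In this sketch the first two calls return None because the geolocation file is still missing; the third call returns the complete input set, holding the newer radiance file. A real scheduler would also have to decide what to do when a newer version arrives after a step has already run, which this example deliberately leaves out.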
Primary author
Mr
Alexander Novikov
(National Research Centre "Kurchatov Institute")
Co-authors
Mr
Alexey Poyda
(National Research Centre "Kurchatov Institute")
Vasilij Aulov
(National Research Centre "Kurchatov Institute")