Overlapping Computation and Communication in Matrix-Matrix Multiplication Algorithm for Multiple GPUs

8 Jul 2021, 16:15
15m
403 or Online - https://jinr.webex.com/jinr/j.php?MTID=mf93df38c8fbed9d0bbaae27765fc1b0f

Sectional reports 5. High Performance Computing HPC

Speaker

Yea Rem Choi (HSE)

Description

In this talk, we discuss an optimal strategy for a parallel matrix-matrix multiplication algorithm that minimizes the time-to-solution. The strategy consists in finding the algorithm parameters that best overlap the multiplications of separate tiles on each GPU with the data transfers between GPUs. The new algorithm developed for multi-GPU nodes is discussed [1]. We analyze the correlation between the optimal parameters of the algorithm and the hardware specifications (e.g. the floating-point performance and the memory bandwidth). The results are illustrated by benchmarks run on different NVIDIA GPUs connected via PCIe or NVLink.
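
As a rough illustration of why the best tile size may depend on the ratio of floating-point performance to link bandwidth, the following is a minimal sketch of a toy double-buffering cost model. It is not the algorithm of [1]: the functions `pipelined_time` and `best_tile`, the cost formulas, and all hardware figures are assumptions made for illustration only. The model assumes that each step transfers one square tile of doubles and computes one tile product, and that a transfer can be fully hidden behind the previous product.

```python
def pipelined_time(n, tile, flops, bandwidth, bytes_per_elem=8):
    """Toy estimate of the time for (n/tile)^3 tile products with double buffering.

    n          matrix dimension (assumed divisible by tile)
    tile       tile dimension
    flops      sustained floating-point rate, FLOP/s (assumed constant)
    bandwidth  inter-GPU link bandwidth, bytes/s (assumed constant)
    """
    steps = (n // tile) ** 3
    t_comp = 2 * tile**3 / flops                    # one tile-by-tile product
    t_comm = bytes_per_elem * tile**2 / bandwidth   # one tile transfer
    # The first transfer and the last product have nothing to overlap with;
    # every other step costs the slower of the two overlapped activities.
    return t_comm + (steps - 1) * max(t_comp, t_comm) + t_comp


def best_tile(n, flops, bandwidth, candidates):
    """Return the candidate tile size with the smallest modeled time."""
    valid = [t for t in candidates if n % t == 0]
    return min(valid, key=lambda t: pipelined_time(n, t, flops, bandwidth))


# Hypothetical hardware figures, chosen only to show the trend.
N = 16384
PEAK_FLOPS = 7e12        # assumed double-precision rate, FLOP/s
SLOW_LINK = 16e9         # assumed PCIe-like bandwidth, bytes/s
FAST_LINK = 50e9         # assumed NVLink-like bandwidth, bytes/s
CANDIDATES = [256, 512, 1024, 2048, 4096, 8192]

tile_slow = best_tile(N, PEAK_FLOPS, SLOW_LINK, CANDIDATES)
tile_fast = best_tile(N, PEAK_FLOPS, FAST_LINK, CANDIDATES)
```

In this toy model, small tiles leave the pipeline communication-bound, while large tiles increase the part of the first transfer that cannot be hidden, so the preferred tile sits near the point where the two step costs balance. A faster link moves that point toward smaller tiles. Real GPUs also lose efficiency on small tiles, which the sketch ignores.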

[1] Choi Y. R., Nikolskiy V., Stegailov V. Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink // 2020 Global Smart Industry Conference (GloSIC). – IEEE, 2020. – P. 354-361.

Primary authors
