9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021)

Start: 2017-12-03T12:00:00+03:00
End: 2021-07-09T19:05:00+03:00

5–9 Jul 2021

Europe/Moscow timezone

Support: grid2021@jinr.ru

Research of improving the performance of explicit numerical methods on the x86 and ARM CPU

8 Jul 2021, 14:30

15m

403 or Online - https://jinr.webex.com/jinr/j.php?MTID=mf93df38c8fbed9d0bbaae27765fc1b0f

Sectional reports 5. High Performance Computing HPC

Speaker: Vladislav Furgailo

Explicit numerical methods are used to solve and simulate a wide range of mathematical problems, many of which originate from mathematical models of physical processes. However, simulations over large model domains can require an enormous number of floating-point operations, and run times of several months or more are possible even on large HPC systems.
The vast majority of HPC systems in operation today are powered by x86 and ARM CPUs [1]. Our aim is to investigate methods of increasing the computational speed of simulations on CPUs and to compare performance and energy efficiency on x86 and ARM CPUs. In this work we used a high-order finite-difference time-domain (FDTD) method to solve the 3D acoustic wave equation.
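
The abstract does not state the discretization, so the following is only a minimal sketch under stated assumptions: the scalar pressure form of the acoustic wave equation u_tt = c^2 (u_xx + u_yy + u_zz) with constant sound speed c, a uniform grid step h, a second-order leapfrog scheme in time and a fourth-order central difference in space. The names IDX, laplacian4 and fdtd_step are hypothetical; boundary conditions and the stability (CFL) limit on dt are not handled.

```c
#include <stddef.h>

/* Sketch under assumptions (not the authors' code): pressure-form acoustic wave
   equation, constant c, uniform step h, 4th order in space, 2nd order in time. */

/* Linear index into an nx*ny*nz array stored with k (the z index) varying fastest. */
#define IDX(i, j, k, ny, nz) \
    (((size_t)(i) * (size_t)(ny) + (size_t)(j)) * (size_t)(nz) + (size_t)(k))

/* Fourth-order central-difference weights for a second derivative (times h^2). */
static const double C0 = -5.0 / 2.0;
static const double C1 = 4.0 / 3.0;
static const double C2 = -1.0 / 12.0;

/* Fourth-order approximation of the 3D Laplacian of u at interior point (i, j, k).
   The point must have a halo of two cells on every side. */
static double laplacian4(const double *u, int i, int j, int k,
                         int ny, int nz, double inv_h2)
{
    double s = 3.0 * C0 * u[IDX(i, j, k, ny, nz)];
    s += C1 * (u[IDX(i - 1, j, k, ny, nz)] + u[IDX(i + 1, j, k, ny, nz)]
             + u[IDX(i, j - 1, k, ny, nz)] + u[IDX(i, j + 1, k, ny, nz)]
             + u[IDX(i, j, k - 1, ny, nz)] + u[IDX(i, j, k + 1, ny, nz)]);
    s += C2 * (u[IDX(i - 2, j, k, ny, nz)] + u[IDX(i + 2, j, k, ny, nz)]
             + u[IDX(i, j - 2, k, ny, nz)] + u[IDX(i, j + 2, k, ny, nz)]
             + u[IDX(i, j, k - 2, ny, nz)] + u[IDX(i, j, k + 2, ny, nz)]);
    return s * inv_h2;
}

/* One explicit leapfrog step: u_next = 2 u - u_prev + (c dt)^2 * lap(u).
   The two-cell halo on each face of u_next is left untouched. */
static void fdtd_step(double *u_next, const double *u, const double *u_prev,
                      int nx, int ny, int nz, double c, double dt, double h)
{
    const double k2 = (c * dt) * (c * dt);
    const double inv_h2 = 1.0 / (h * h);
    for (int i = 2; i < nx - 2; ++i)
        for (int j = 2; j < ny - 2; ++j)
            for (int k = 2; k < nz - 2; ++k) {
                const size_t p = IDX(i, j, k, ny, nz);
                u_next[p] = 2.0 * u[p] - u_prev[p]
                          + k2 * laplacian4(u, i, j, k, ny, nz, inv_h2);
            }
}
```

After each call the three buffers would be rotated (u_prev takes u, u takes u_next) before the next step. The interior triple loop is the iteration space on which SIMD and loop tiling would operate.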
For HPC, in addition to parallel computing, we exploited CPU capabilities such as SIMD computing (AVX on x86 and NEON on ARM) [2] and the hierarchical structure of the CPU cache memory to optimize data locality. To improve data locality we used loop tiling [3], a transformation that changes the order in which the iteration space is traversed. Our work considers a number of tiling optimization algorithms, with test calculations on the x86 and ARM architectures. In particular, we considered recursive and non-recursive cube tiling [4] and the ZCube data-locality optimization.
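
To illustrate the SIMD part, the sketch below vectorizes a fourth-order second difference along the innermost (contiguous) direction with AVX on x86 and NEON on ARM, and falls back to scalar code when neither extension is enabled at compile time. The function d2_row, single precision and the use of unaligned loads are assumptions for illustration, not the authors' kernels. With GCC or Clang the AVX path is only compiled when -mavx (or a -march that implies it) is given; the NEON path is used when __ARM_NEON is defined, which is the default on AArch64.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Illustrative sketch: 4th-order weights for a second derivative (times h^2). */
static const float C0 = -5.0f / 2.0f;
static const float C1 = 4.0f / 3.0f;
static const float C2 = -1.0f / 12.0f;

/* Hypothetical helper: out[k] = h^2 * d2u/dz2 for k = 2 .. n-3 along one row.
   out[0], out[1], out[n-2] and out[n-1] are not written. */
void d2_row(float *out, const float *u, size_t n)
{
    if (n < 5) return;          /* no point has a full two-cell halo */
    size_t k = 2;
    const size_t end = n - 2;   /* first index that is not updated */
#if defined(__AVX__)
    /* AVX: 8 floats per 256-bit register; neighbour loads are unaligned. */
    const __m256 c0 = _mm256_set1_ps(C0), c1 = _mm256_set1_ps(C1), c2 = _mm256_set1_ps(C2);
    for (; k + 8 <= end; k += 8) {
        __m256 t = _mm256_mul_ps(c0, _mm256_loadu_ps(u + k));
        t = _mm256_add_ps(t, _mm256_mul_ps(c1, _mm256_add_ps(_mm256_loadu_ps(u + k - 1),
                                                             _mm256_loadu_ps(u + k + 1))));
        t = _mm256_add_ps(t, _mm256_mul_ps(c2, _mm256_add_ps(_mm256_loadu_ps(u + k - 2),
                                                             _mm256_loadu_ps(u + k + 2))));
        _mm256_storeu_ps(out + k, t);
    }
#elif defined(__ARM_NEON)
    /* NEON: 4 floats per 128-bit register. */
    const float32x4_t c0 = vdupq_n_f32(C0), c1 = vdupq_n_f32(C1), c2 = vdupq_n_f32(C2);
    for (; k + 4 <= end; k += 4) {
        float32x4_t t = vmulq_f32(c0, vld1q_f32(u + k));
        t = vaddq_f32(t, vmulq_f32(c1, vaddq_f32(vld1q_f32(u + k - 1), vld1q_f32(u + k + 1))));
        t = vaddq_f32(t, vmulq_f32(c2, vaddq_f32(vld1q_f32(u + k - 2), vld1q_f32(u + k + 2))));
        vst1q_f32(out + k, t);
    }
#endif
    /* Scalar remainder, or the whole row when no SIMD extension is enabled. */
    for (; k < end; ++k)
        out[k] = C0 * u[k] + C1 * (u[k - 1] + u[k + 1]) + C2 * (u[k - 2] + u[k + 2]);
}
```

The vector loops handle full registers and the scalar loop handles the leftover points, so the same function is correct for any row length and on either architecture.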
We found that ZCube increases the performance of SIMD computations on the ARM CPU [5] and speeds up tiled computations on both CPU architectures. As expected, non-recursive tiling performs better than recursive tiling on both architectures because it causes fewer CPU cache misses. Finally, we found that the ARM CPU has a performance-to-energy efficiency factor 12 times higher than that of the x86 CPU.
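
The difference between the two tiling strategies can be sketched as follows: non-recursive cube tiling visits fixed B×B×B blocks in lexicographic order, whereas the recursive variant keeps halving the longest edge of the box until its volume drops below a leaf threshold. The 7-point kernel and the names sweep_box, tiled_nonrecursive, tiled_recursive, B and leaf are hypothetical; the sketch does not reproduce the ZCube ordering or the authors' measured cache behaviour.

```c
#include <stddef.h>

/* Illustrative sketch of two traversal orders; not the authors' implementation. */
#define IDX(i, j, k, ny, nz) \
    (((size_t)(i) * (size_t)(ny) + (size_t)(j)) * (size_t)(nz) + (size_t)(k))

/* Hypothetical kernel: 2nd-order 7-point Laplacian (grid units) at one point. */
static void kernel(double *out, const double *u, int i, int j, int k, int ny, int nz)
{
    const size_t p = IDX(i, j, k, ny, nz);
    out[p] = u[IDX(i - 1, j, k, ny, nz)] + u[IDX(i + 1, j, k, ny, nz)]
           + u[IDX(i, j - 1, k, ny, nz)] + u[IDX(i, j + 1, k, ny, nz)]
           + u[IDX(i, j, k - 1, ny, nz)] + u[IDX(i, j, k + 1, ny, nz)]
           - 6.0 * u[p];
}

/* Plain triple loop over the box [i0,i1) x [j0,j1) x [k0,k1). */
static void sweep_box(double *out, const double *u, int ny, int nz,
                      int i0, int i1, int j0, int j1, int k0, int k1)
{
    for (int i = i0; i < i1; ++i)
        for (int j = j0; j < j1; ++j)
            for (int k = k0; k < k1; ++k)
                kernel(out, u, i, j, k, ny, nz);
}

static int imin(int a, int b) { return a < b ? a : b; }

/* Non-recursive cube tiling: fixed B*B*B tiles over the interior, clipped at the edges. */
void tiled_nonrecursive(double *out, const double *u, int nx, int ny, int nz, int B)
{
    for (int ti = 1; ti < nx - 1; ti += B)
        for (int tj = 1; tj < ny - 1; tj += B)
            for (int tk = 1; tk < nz - 1; tk += B)
                sweep_box(out, u, ny, nz, ti, imin(ti + B, nx - 1),
                          tj, imin(tj + B, ny - 1), tk, imin(tk + B, nz - 1));
}

/* Recursive tiling: split the longest edge in half until volume <= leaf. */
static void rec(double *out, const double *u, int ny, int nz,
                int i0, int i1, int j0, int j1, int k0, int k1, long leaf)
{
    const int di = i1 - i0, dj = j1 - j0, dk = k1 - k0;
    if (di <= 0 || dj <= 0 || dk <= 0) return;
    if ((long)di * dj * dk <= leaf) {
        sweep_box(out, u, ny, nz, i0, i1, j0, j1, k0, k1);
        return;
    }
    if (di >= dj && di >= dk) {
        const int m = i0 + di / 2;
        rec(out, u, ny, nz, i0, m, j0, j1, k0, k1, leaf);
        rec(out, u, ny, nz, m, i1, j0, j1, k0, k1, leaf);
    } else if (dj >= dk) {
        const int m = j0 + dj / 2;
        rec(out, u, ny, nz, i0, i1, j0, m, k0, k1, leaf);
        rec(out, u, ny, nz, i0, i1, m, j1, k0, k1, leaf);
    } else {
        const int m = k0 + dk / 2;
        rec(out, u, ny, nz, i0, i1, j0, j1, k0, m, leaf);
        rec(out, u, ny, nz, i0, i1, j0, j1, m, k1, leaf);
    }
}

void tiled_recursive(double *out, const double *u, int nx, int ny, int nz, long leaf)
{
    if (leaf < 1) leaf = 1;  /* guarantees every split makes progress */
    rec(out, u, ny, nz, 1, nx - 1, 1, ny - 1, 1, nz - 1, leaf);
}
```

Both variants compute exactly the same values as a plain sweep over the interior; only the traversal order, and therefore cache reuse, differs. The tile size and leaf volume would have to be tuned to the cache sizes of each CPU.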
In this respect, it would be of interest to extend our experiments to ARM clusters and to further improve the performance of both non-recursive and recursive tiling.
References

[1] http://www.top500.org/
[2] S. M. et al., “Vector instructions to enable efficient synchronization and parallel reduction operations,” U.S. Patent WO2009120981A2, Oct. 2009.
[3] J. Xue, “On tiling as a loop transformation,” Parallel Processing Letters, vol. 07, no. 04, pp. 409–424, 1997.
[4] V. Furgailo, A. Ivanov, and N. Khokhlov, “Research of techniques to improve the performance of explicit numerical methods on the CPU,” pp. 79–85, Sep. 2019.
[5] J. Bakos, Embedded Systems: ARM Programming and Optimization. Elsevier Science, 2015.

Authors: Vladislav Furgailo, Egor Elchinov, Nikolay Khokhlov (MIPT)

Presentation materials: Research of improving the performance of explicit numerical methods on the x86 and ARM CPU.pdf
