Description
Explicit numerical methods are used to solve and simulate a wide range of mathematical problems, many of which originate from mathematical models of physical phenomena. However, simulations with large model spaces can require a tremendous number of floating-point calculations, and run times of several months or more are possible even on large HPC systems.
The vast majority of HPC systems in the field today are powered by x86 and ARM CPUs [1]. Our aim is to investigate methods of increasing computational speed for simulations on CPUs and to compare performance and energy efficiency on x86 and ARM CPUs. In our work, we used a high-order finite-difference time-domain (FDTD) method to solve the 3D acoustic equation.
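To make the kind of kernel under discussion concrete, the following is a minimal sketch of one explicit FDTD time step. It is an illustration, not the scheme used in this work: it assumes the scalar second-order form of the acoustic equation, a leapfrog update in time, fourth-order central differences in space, and a constant Courant factor. The names `idx`, `fdtd_step`, and `courant2` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat index into an nx*ny*nz grid, x varies fastest. */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return (k * ny + j) * nx + i;
}

/* One leapfrog step of p_tt = c^2 * laplacian(p) (assumed form of the
 * acoustic equation). Space uses 4th-order central differences, so a halo
 * of 2 cells on every side is read but never written.
 * courant2 = (c * dt / h)^2 is assumed constant over the grid. */
void fdtd_step(const float *prev, const float *cur, float *next,
               size_t nx, size_t ny, size_t nz, float courant2)
{
    /* Standard 4th-order second-derivative weights. */
    const float c0 = -5.0f / 2.0f, c1 = 4.0f / 3.0f, c2 = -1.0f / 12.0f;
    const size_t sy = nx;       /* stride between neighbours in y */
    const size_t sz = nx * ny;  /* stride between neighbours in z */

    assert(nx > 4 && ny > 4 && nz > 4);

    for (size_t k = 2; k < nz - 2; ++k) {
        for (size_t j = 2; j < ny - 2; ++j) {
            for (size_t i = 2; i < nx - 2; ++i) {
                const size_t m = idx(i, j, k, nx, ny);
                /* Discrete Laplacian multiplied by h^2. */
                const float lap =
                    3.0f * c0 * cur[m]
                    + c1 * (cur[m - 1] + cur[m + 1]
                          + cur[m - sy] + cur[m + sy]
                          + cur[m - sz] + cur[m + sz])
                    + c2 * (cur[m - 2] + cur[m + 2]
                          + cur[m - 2 * sy] + cur[m + 2 * sy]
                          + cur[m - 2 * sz] + cur[m + 2 * sz]);
                next[m] = 2.0f * cur[m] - prev[m] + courant2 * lap;
            }
        }
    }
}
```

Each output point reads 13 input values spread over five z-planes, which is why the traversal order and cache behaviour matter so much for this class of kernel.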
In conjunction with parallel computing, we used CPU capabilities such as SIMD computing (AVX on x86 and NEON on ARM) [2] and the hierarchical structure of the CPU cache memory to optimize data locality. To improve data locality, we changed the order in which the iteration space is traversed, a technique known as loop tiling [3]. Our work considers a number of tiling optimization algorithms, with test calculations for the x86 and ARM architectures. In particular, we considered recursive and non-recursive cube-tiling [4] and the ZCube data locality optimization.
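As an illustration of loop tiling, the sketch below sweeps a 3D grid twice: once in plain order and once in cubic tiles. It is a simplified, non-recursive spatial tiling of a single sweep and is not taken from the algorithms in [4]. The 7-point `point_update` kernel merely stands in for the real FDTD update, and all function names and the `tile` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* 7-point update at flat index m; a stand-in for the real FDTD kernel. */
static float point_update(const float *in, size_t m, size_t nx, size_t ny)
{
    return in[m - 1] + in[m + 1] + in[m - nx] + in[m + nx]
         + in[m - nx * ny] + in[m + nx * ny] - 6.0f * in[m];
}

/* Reference sweep: interior points visited in plain k, j, i order. */
void sweep_plain(const float *in, float *out,
                 size_t nx, size_t ny, size_t nz)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t k = 1; k < nz - 1; ++k)
        for (size_t j = 1; j < ny - 1; ++j)
            for (size_t i = 1; i < nx - 1; ++i) {
                const size_t m = (k * ny + j) * nx + i;
                out[m] = point_update(in, m, nx, ny);
            }
}

/* Tiled sweep: the same interior points, visited one
 * tile x tile x tile cube at a time so that the data touched by a cube
 * can stay in cache while it is being processed. */
void sweep_tiled(const float *in, float *out,
                 size_t nx, size_t ny, size_t nz, size_t tile)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3 && tile > 0);
    for (size_t kk = 1; kk < nz - 1; kk += tile)
        for (size_t jj = 1; jj < ny - 1; jj += tile)
            for (size_t ii = 1; ii < nx - 1; ii += tile) {
                /* Clip the cube at the grid boundary. */
                const size_t k_end = min_sz(kk + tile, nz - 1);
                const size_t j_end = min_sz(jj + tile, ny - 1);
                const size_t i_end = min_sz(ii + tile, nx - 1);
                for (size_t k = kk; k < k_end; ++k)
                    for (size_t j = jj; j < j_end; ++j)
                        for (size_t i = ii; i < i_end; ++i) {
                            const size_t m = (k * ny + j) * nx + i;
                            out[m] = point_update(in, m, nx, ny);
                        }
            }
}
```

Both sweeps evaluate the same expression at every point, so their outputs match exactly; only the visiting order differs. A suitable `tile` value depends on the cache sizes of the target CPU and would have to be found by measurement.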
We found that ZCube increases the performance of SIMD computations on the ARM CPU [5] and speeds up computation with tiling on both CPU architectures. As expected, we also found that non-recursive tiling performs better than recursive tiling on both CPU architectures because it incurs fewer CPU cache misses. Finally, we found that the ARM CPU has a performance-to-energy efficiency factor 12 times higher than that of the x86 CPU.
It would therefore be of interest to extend our experiments to ARM cluster computing and to further improve the performance of non-recursive and recursive tiling.
References
- http://www.top500.org/
- S. M. et al., “Vector instructions to enable efficient synchronization and parallel reduction operations,” U.S. Patent WO2009120981A2, Oct. 2009.
- J. Xue, “On tiling as a loop transformation,” Parallel Processing Letters, vol. 07, no. 04, pp. 409–424, 1997.
- V. Furgailo, A. Ivanov, and N. Khokhlov, “Research of techniques to improve the performance of explicit numerical methods on the CPU,” pp. 79–85, 09 2019.
- J. Bakos, Embedded Systems: ARM Programming and Optimization. Elsevier Science, 2015.