The Spark and Hadoop ecosystem includes a wide variety of components and can be integrated with virtually any tool used for Big Data today. From release to release, the developers of these frameworks optimize the internals of the components and make them more flexible and convenient to use.
Still, ever since MapReduce was introduced as a programming model and the first Hadoop releases appeared, data skew has been and remains the main problem of distributed data processing. Data skew leads to performance degradation: an overall slowdown of application execution and idle resources. The newest Spark versions can handle this situation out of the box. However, upgrading tool versions and the corresponding application logic is often not an option for large projects whose development started years ago.
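As a hedged illustration of that out-of-the-box handling: in Spark 3.x, Adaptive Query Execution (AQE) can detect and split skewed join partitions at runtime through configuration alone. The sketch below is a minimal example; the two threshold values are assumptions to be tuned, not recommendations.

```scala
// Minimal sketch, assuming Spark 3.x: AQE detects skewed partitions
// during shuffle joins and splits them into smaller tasks at runtime.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("skew-handling-sketch")
  .config("spark.sql.adaptive.enabled", "true")          // AQE (on by default since 3.2)
  .config("spark.sql.adaptive.skewJoin.enabled", "true") // split skewed join partitions
  // A partition is considered skewed if it is this many times larger than
  // the median AND exceeds the byte threshold; both values are assumptions.
  .config("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
  .config("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256m")
  .getOrCreate()
```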
In this article, we consider approaches to optimizing the execution of a SQL query in the presence of data skew, using a concrete example with HDFS and Spark SQL 2.3.2.
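For orientation before the worked example: on versions such as Spark 2.3.2, which predate AQE, the classical workaround for a skewed join is manual key salting. The sketch below is illustrative only, with toy data and hypothetical names (`facts`, `dims`, `key`, `saltBuckets`); the approach developed later in the article may differ in detail.

```scala
// A sketch of manual key salting for pre-AQE Spark (e.g. 2.3.2).
// All table and column names are hypothetical, chosen for the example.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("salting-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Toy data: ~90% of the large side's rows share a single hot key (0),
// so a plain join would funnel most of the data into one task.
val facts = spark.range(1000000).select(
  when(rand() < 0.9, lit(0L)).otherwise(col("id") % 10).as("key"),
  col("id").as("value"))
val dims = (0L until 10L).map(k => (k, s"dim_$k")).toDF("key", "name")

val saltBuckets = 16 // hypothetical value; tune to the observed skew

// Spread rows of the large side across saltBuckets random sub-keys.
val saltedFacts = facts.withColumn("salt", (rand() * saltBuckets).cast("int"))

// Replicate every small-side row once per salt value so each sub-key matches.
val saltedDims = dims.withColumn("salt",
  explode(array((0 until saltBuckets).map(lit): _*)))

// Join on (key, salt): the hot key no longer lands in a single task.
val joined = saltedFacts.join(saltedDims, Seq("key", "salt")).drop("salt")
joined.groupBy("name").count().show()
```

The trade-off is that the small side is replicated `saltBuckets` times, so salting pays off only when the cost of that replication is lower than the cost of the straggler task it removes.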