11th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2025)

Start: 2025-07-07T09:00:00+03:00
End: 2025-07-11T18:00:00+03:00

7–11 Jul 2025

Europe/Moscow timezone

Support

grid2025@jinr.ru

Impact of anonymization level on the resilience of dataset clusters in big data

10 Jul 2025, 16:45

15m

Room 420

Sectional talk

Round Table on the Areas of Work of the SPbSU-JINR Joint Scientific and Educational Laboratory

Speaker: Mr Александр Дик

Personal data anonymization is an important step in dataset preprocessing, especially when dealing with sensitive information. However, the impact of this process on clustering quality remains poorly understood. The presented study analyzes how different anonymization techniques affect clustering results. The experimental part of the work applies the ISODATA, maximin distance (Maximin), and hierarchical clustering algorithms to different datasets. The results demonstrate that, for a limited number of features, depersonalization contributes to a clearer separation of the resulting clusters while preserving the overall data structure and its trend. These findings point to a future problem concerning the risks of personal data de-identification.
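The kind of comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' method or data: the eight 2-D points are a hand-made toy set, anonymization is modeled as simple generalization (each value is replaced by the midpoint of a fixed-width bin), and `generalize`, `maximin_centers`, `assign`, `scatter`, and the `fraction` threshold are hypothetical names. Only the maximin distance algorithm is sketched; ISODATA and hierarchical clustering are not shown.

```python
import math
from itertools import combinations


def generalize(points, bin_width):
    """Toy anonymization (an assumption): replace each value by its bin midpoint."""
    return [tuple((math.floor(v / bin_width) + 0.5) * bin_width for v in p)
            for p in points]


def maximin_centers(points, fraction=0.5):
    """Maximin-distance heuristic: keep adding the point farthest from all
    current centers while that distance exceeds `fraction` of the mean
    pairwise distance between the centers found so far."""
    centers = [points[0]]
    # The second center is the point farthest from the first one.
    centers.append(max(points, key=lambda p: math.dist(p, centers[0])))
    while True:
        candidate = max(points,
                        key=lambda p: min(math.dist(p, c) for c in centers))
        gap = min(math.dist(candidate, c) for c in centers)
        mean_center_dist = (
            sum(math.dist(a, b) for a, b in combinations(centers, 2))
            / math.comb(len(centers), 2)
        )
        if gap <= fraction * mean_center_dist:
            return centers
        centers.append(candidate)


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]


def scatter(points, centers, labels):
    """Mean distance from each point to its assigned center."""
    return sum(math.dist(p, centers[k]) for p, k in zip(points, labels)) / len(points)


# Hand-made toy data (hypothetical): two groups of four points each.
raw = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.1), (1.1, 0.7),
       (9.1, 8.9), (8.8, 9.3), (9.4, 9.0), (8.9, 8.6)]
anon = generalize(raw, bin_width=2.0)

raw_centers = maximin_centers(raw)
anon_centers = maximin_centers(anon)
raw_labels = assign(raw, raw_centers)
anon_labels = assign(anon, anon_centers)
raw_scatter = scatter(raw, raw_centers, raw_labels)
anon_scatter = scatter(anon, anon_centers, anon_labels)

print("same partition:", raw_labels == anon_labels)
print("within-cluster scatter, raw vs anonymized:", raw_scatter, anon_scatter)
```

The toy data is constructed so that each group falls inside a single bin, so generalization shrinks the within-cluster scatter without changing the partition. With other bin widths or data that straddles a bin boundary the outcome can differ, so this sketch shows only how such a comparison might be set up, not that the reported effect holds in general.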

Author: Mr Александр Дик

Co-authors: Alexander Bogdanov (St. Petersburg University, St. Petersburg, Russia); Mr Egor Savkov (Concern Avrora Scientific and Production Association JSC); Jasur Kiyamov; Nadezhda Shchegoleva (St. Petersburg University, St. Petersburg, Russia); Dr Геннадий Дик

Presentation materials: ДУбна 2025 - Кластеризация .pdf
