Impact of anonymization level on the resilience of dataset clusters in big data

Speaker

Mr Александр Дик

Description

Personal data anonymization is an important step in dataset preprocessing, especially when dealing with sensitive information. However, the impact of this process on clustering quality remains poorly understood. This study analyzes how different anonymization techniques affect clustering results. The experimental part of the work applies the ISODATA, maximin distance (Maximin), and hierarchical clustering algorithms to several datasets. The results demonstrate that, for a limited number of features, depersonalization contributes to a clearer separation of the resulting clusters while preserving the overall data structure and its trends. These findings point to a potential future problem related to the risks of personal data de-identification.
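To make the experimental setup more concrete, the following is a minimal sketch of one common textbook formulation of the maximin distance algorithm, applied to a small synthetic dataset before and after a simple generalization step. It is not the authors' implementation: the rounding-based `generalize` function, the `threshold` parameter, the helper names, and the synthetic data are all assumptions made for illustration, and the study's actual anonymization techniques and datasets are not specified here.

```python
import math
import random


def generalize(points, step):
    """Round every coordinate to the nearest multiple of `step`.

    Assumption: a crude stand-in for anonymization by generalization.
    """
    return [tuple(round(x / step) * step for x in p) for p in points]


def maximin_centers(points, threshold=0.5):
    """Pick cluster centers with the maximin distance heuristic.

    A new center is added while the point farthest from its nearest center
    is farther than `threshold` times the mean distance between centers.
    """
    centers = [points[0]]
    while True:
        # Distance from each point to its nearest existing center.
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        idx = max(range(len(points)), key=nearest.__getitem__)
        if len(centers) >= 2:
            pairs = [math.dist(a, b)
                     for i, a in enumerate(centers) for b in centers[i + 1:]]
            limit = threshold * sum(pairs) / len(pairs)
        else:
            limit = 0.0  # with one center, any distinct point becomes the second
        if nearest[idx] <= limit:
            return centers
        centers.append(points[idx])


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def partition(labels):
    """Turn a label list into a set of index groups, ignoring label order."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return {frozenset(g) for g in groups.values()}


# Hypothetical data: three well-separated two-feature groups.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
raw = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
       for cx, cy in blob_centers for _ in range(30)]
coarse = generalize(raw, step=2.0)

raw_labels = assign(raw, maximin_centers(raw))
coarse_labels = assign(coarse, maximin_centers(coarse))
print(len(set(raw_labels)), len(set(coarse_labels)))
print(partition(raw_labels) == partition(coarse_labels))
```

Comparing the two partitions, rather than the raw labels, avoids depending on the order in which centers happen to be chosen. On data this cleanly separated the grouping is expected to survive generalization; real datasets with more features may behave differently.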

Author

Mr Александр Дик

Co-authors

Alexander Bogdanov (St. Petersburg University, St. Petersburg, Russia)
Mr Egor Savkov (Concern Avrora Scientific and Production Association JSC)
Jasur Kiyamov
Nadezhda Shchegoleva (St. Petersburg University, St. Petersburg, Russia)
Dr Геннадий Дик
