Privacy-Preserving Multi-Party Keyword-Based Classification of Unstructured Text Data

Ian Pépin, Furkan Alaca, Farhana Zulkernine.

International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT) 2024, April 29 - May 1, Abu Dhabi, United Arab Emirates.

Abstract:

Collaborative data analytics and text classification are required in numerous application domains including health data analytics, which has strict privacy constraints. However, sharing data with other parties widens the attack surface and thus increases the risk of the data being compromised. Research on applications of secure multi-party computation to health data analytics has mostly focused on structured text or numeric data. In this work, we formulate a privacy-preserving multi-party scheme for performing keyword-based classification of unstructured text data, with a case study on medical text data. Our scheme uses arithmetic secret sharing and we implement it using the CrypTen framework. We also devise techniques that significantly reduce computation time without impacting accuracy, enhancing the practical feasibility of our approach.
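
For readers unfamiliar with arithmetic secret sharing, the following is a minimal illustrative sketch in plain Python. It is not the scheme from the paper and does not use CrypTen; the ring size, the function names, and the keyword-hit counts are all assumptions made for illustration. It shows only the basic idea: a value is split into random-looking additive shares, parties can add shared values locally, and only the combined result is ever reconstructed.

```python
import secrets

# Ring size for the shares; 2**64 is an assumption chosen for illustration.
MODULUS = 2 ** 64


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared values share-by-share; no party sees either input."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Hypothetical inputs: each party's private count of keyword matches.
hits_a = 3
hits_b = 5

shares_a = share(hits_a, 2)
shares_b = share(hits_b, 2)

# Each party adds the two shares it holds; only the total is revealed.
total_shares = add_shared(shares_a, shares_b)
total = reconstruct(total_shares)
```

Any individual share is uniformly random on its own, so holding one share of a count reveals nothing about that count. A full protocol also needs secure multiplication and comparison, which this sketch omits.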
