Little is Enough: Improving Privacy by Sharing Labels in Federated Semi-Supervised Learning

Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoff Webb, Michael Kamp

May, 2024

Abstract

In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns. A wide range of federated learning approaches have been proposed in the literature to train models locally at each client without sharing their sensitive local data. Most of these approaches share local model parameters, soft predictions on a public dataset, or a combination of both. This, however, still discloses private information and restricts local models to those that lend themselves to training via gradient-based methods. To reduce the amount of shared information, we propose to share only hard labels on a public unlabeled dataset and to use a consensus over the shared labels as pseudo-labels for the clients. The resulting federated co-training approach empirically improves privacy substantially without compromising on model quality. At the same time, it allows us to use local models that do not lend themselves to the parameter aggregation used in federated learning, such as (gradient boosted) decision trees, rule ensembles, and random forests.
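The consensus step described in the abstract can be illustrated with a minimal sketch. It assumes the consensus is a per-example majority vote over the hard labels that clients send for the public unlabeled dataset; the function name `consensus_labels`, the tie-breaking rule, and the toy labels are hypothetical and chosen only for illustration, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def consensus_labels(client_labels: Sequence[Sequence[int]]) -> List[int]:
    """Per-example majority vote over the hard labels sent by each client.

    client_labels[i][j] is client i's hard label for public example j.
    Ties are broken toward the smallest label (an arbitrary, illustrative choice).
    """
    consensus = []
    for votes in zip(*client_labels):  # one tuple of votes per public example
        counts = Counter(votes)
        top = max(counts.values())
        # Among the labels with the highest vote count, pick the smallest.
        consensus.append(min(label for label, n in counts.items() if n == top))
    return consensus


# Three hypothetical clients label four public examples.
shared = [
    [0, 1, 2, 1],
    [0, 1, 1, 0],
    [1, 1, 2, 2],
]
pseudo_labels = consensus_labels(shared)
```

In such a scheme, each client would then add the public examples with these pseudo-labels to its local training set and retrain. Because only integer labels are exchanged, the local model can be any classifier, including tree-based models that have no parameters to average.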

Publication

arXiv

Computer Science - Machine Learning

Amr Abourayya

PhD Student

Jens Kleesiek

Professor of Translational Image-guided Oncology

Michael Kamp

Team Lead Trustworthy Machine Learning