When the training data for machine learning are highly personal or sensitive, collaborative approaches can help a collective of stakeholders to train a model together without having to share any data. But risks to the privacy of that data remain. This Perspective provides an overview of potential attacks on collaborative machine learning and how these threats could be addressed.
Despite the rapid increase in the data available to train machine-learning algorithms in many domains, several applications suffer from a paucity of representative and diverse data. The medical and financial sectors, for example, are constrained by legal, ethical, regulatory and privacy concerns preventing data sharing between institutions. Collaborative learning systems, such as federated learning, are designed to circumvent such restrictions and provide a privacy-preserving alternative by eschewing data sharing and relying instead on the distributed remote execution of algorithms. However, such systems are susceptible to malicious adversarial interference attempting to undermine their utility or divulge confidential information. Here we present an overview and analysis of current adversarial attacks and their mitigations in the context of collaborative machine learning. We discuss the applicability of attack vectors to specific learning contexts and attempt to formulate a generic foundation for adversarial influence and mitigation mechanisms. Moreover, we show that a number of context-specific learning conditions are exploited in a similar fashion across all settings. Lastly, we provide a focused perspective on open challenges and promising areas of future research in the field.
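To make the "distributed remote execution" that federated learning relies on concrete, the following is a minimal sketch of one federated-averaging round: each client trains on its own private data and only model weights, never the data, travel to the server. The function names (`local_step`, `fedavg_round`) and the toy linear model are illustrative assumptions, not the protocol analysed in this Perspective.

```python
# Minimal sketch of one federated-averaging (FedAvg) round.
# Assumptions: a toy linear-regression model and synthetic client data;
# real deployments add secure aggregation, compression, sampling, etc.
import numpy as np

rng = np.random.default_rng(0)

def local_step(w, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient descent on its private data.
    Only the updated weights leave the client, never (X, y)."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """Server step: average client updates, weighted by local dataset size."""
    updates = [(local_step(w_global, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(n * w for w, n in updates) / total

# Three clients holding private samples from the same underlying model.
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ w_true + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print("estimated weights:", w)  # approaches w_true without pooling any data
```

Note that the shared weight updates themselves are the attack surface discussed here: an adversary observing or manipulating them can attempt inference or poisoning even though the raw data never leave the clients.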