Federated Learning and Differential Privacy: a step-change in private AI
- OctaiPipe, the Federated Edge AI platform for Industrial IoT, has integrated differential privacy into its platform to safeguard against various types of privacy attacks.
- Through differential privacy, Federated Learning platforms can further mitigate the risk of these attacks by adding calibrated noise to the data – a privacy-by-design approach that strengthens security across a range of critical systems.
Introduction
The Internet of Things is becoming increasingly stitched into the fabric of critical infrastructure across many industries – from energy and utilities to manufacturing and transportation. Industrial IoT, supercharged by advances in AI, is helping to drive new productivity gains, but the distribution of devices across networks demands a rethink of how data is secured, especially in sensitive industries. Federated Learning (FL) is a privacy-preserving machine learning technique that aims to tackle this issue by moving the model to the data, rather than the other way around. The most important quality of a trustworthy Federated Learning infrastructure is that it provides a private, secure and scalable platform. However, even with this distributed and decentralised approach, there are often still concerns that the shared model results may be reverse engineered.
Differential privacy (DP) addresses this by adding noise to the data, ensuring that individual data points cannot be distinguished or reverse engineered from aggregated results, maintaining privacy while still enabling data collection and analysis. By implementing differential privacy in its Federated Learning platform, OctaiPipe has strengthened this privacy guarantee, making it much more difficult for attackers to breach privacy and enabling AI builders and users to operate in sensitive and regulated domains.
Solution
While AI in general remains in a fluid regulatory state, as national and international bodies work out how best to govern it, it is incumbent on those who use it to ensure the integrity and privacy of their data.
OctaiPipe’s study shows that privacy can be baked into AI platforms by design. Differential privacy is a well-researched and well-tested method for keeping sensitive data in FL infrastructure safe. It is a mathematically rigorous framework for releasing statistical information about datasets while protecting the privacy of individual data subjects. It enables a data holder to share aggregate patterns of the group while limiting information that is leaked about specific individuals.[1]
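To make the idea concrete, the sketch below shows the classic Laplace mechanism applied to a simple count query. This is a generic illustration of how differential privacy calibrates noise to a query's sensitivity and a privacy budget, not OctaiPipe's implementation; the function name, the query and the chosen budget are assumptions for the example.

```python
# Generic illustration of the Laplace mechanism (not OctaiPipe's code):
# release an aggregate statistic with noise calibrated to the query's
# sensitivity and a privacy budget epsilon, so no single record can be
# confidently inferred from the published result.
import numpy as np

def laplace_count(data: np.ndarray, predicate, epsilon: float) -> float:
    """Release a differentially private count of records matching `predicate`.

    Adding or removing one record changes the count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    true_count = int(np.sum(predicate(data)))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: count sensor readings above a threshold with a budget of epsilon = 1.0
readings = np.random.normal(50.0, 10.0, size=1_000)
private_count = laplace_count(readings, lambda x: x > 60.0, epsilon=1.0)
print(f"Noisy count: {private_count:.1f}")
```

A smaller epsilon means more noise and a stronger privacy guarantee; a larger epsilon means less noise and more accurate outputs – the same trade-off that appears when DP is applied to model updates in Federated Learning.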
However, as OctaiPipe’s research indicates, the application of differential privacy is most mature for neural network architectures. In emerging and specialised areas such as federated XGBoost (FL-XGBoost), the application of DP is less well understood, and potential attack strategies have not been fully explored. Consequently, the use of DP in FL-XGBoost remains an area that requires further investigation and implementation.
OctaiPipe implements advanced, state-of-the-art features such as local and global differential privacy with flat and adaptive clipping, specifically for neural network architectures. In local DP, noise is added on the client side to the model updates before they are shared, preserving privacy and confidentiality even if a server breach occurs, though this may slightly reduce model utility and accuracy. In contrast, global DP adds noise on the server side after aggregation, safeguarding privacy even in the event of a breach in the communication channel between the server and clients.
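The sketch below illustrates where the noise is injected in each variant for a federated averaging round, using flat clipping of the update's L2 norm. It is a minimal illustration under assumed names (`clip_norm`, `noise_multiplier`), not OctaiPipe's API.

```python
# Minimal sketch of local vs. global DP for federated averaging
# (illustrative only; parameter names are assumptions for this example).
import numpy as np

def clip_update(update: np.ndarray, clip_norm: float) -> np.ndarray:
    """Flat clipping: rescale the update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def local_dp_update(update: np.ndarray, clip_norm: float, noise_multiplier: float) -> np.ndarray:
    """Local DP: each client clips and perturbs its own update, so the server
    never sees an un-noised contribution (protects against a server breach)."""
    clipped = clip_update(update, clip_norm)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

def global_dp_aggregate(client_updates, clip_norm: float, noise_multiplier: float) -> np.ndarray:
    """Global DP: the server clips each contribution, averages, then adds noise
    once to the aggregate (protects the released model even if the
    server-client channel is compromised)."""
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    aggregate = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm / len(client_updates),
                             size=aggregate.shape)
    return aggregate + noise
```

Because global DP adds noise only once to the aggregate, the effective noise per client shrinks as more clients participate, whereas in local DP every client's update carries its own noise.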
This highlights the privacy/accuracy trade-off that must be considered when developing and deploying neural networks. It also offers a way to manage that trade-off so that companies – whether manufacturing OEMs, public utility bodies, or energy companies – can safely leverage the extra productivity that comes with Federated Learning.
Results
OctaiPipe’s implementation offers several encouraging takeaways:
- OctaiPipe’s implementations of global DP and local DP behave as expected: accuracy rises and falls with increases and decreases in the privacy budget. Local DP performed within the same accuracy range as global DP because the number of clients was relatively small; as the number of clients grows, global DP should theoretically perform better, since noise is added only once, on the server side.
- Between flat and adaptive clipping, the latter performed slightly better in the case of local DP, as it tunes the clipping threshold to each client’s dataset.
- Noise should be added so that the privacy budget lies between 0.75 and 2 per FL round, which proved to be the optimal range for maintaining the privacy guarantee while preserving a suitable level of accuracy (see the calibration sketch after this list).
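The sketch below illustrates the two knobs behind these results: calibrating Gaussian noise to a target per-round privacy budget, and adapting the clipping threshold from observed client update norms. It is an assumption-laden illustration, not OctaiPipe's implementation; the classical Gaussian-mechanism formula used here is only a rough guide (it is tight for epsilon below 1), and production systems typically rely on a privacy accountant.

```python
# Illustrative calibration sketch (assumptions, not OctaiPipe's code).
import numpy as np

def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise standard deviation for the classical (epsilon, delta) Gaussian
    mechanism: sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon.
    Rough guide only; tight for epsilon <= 1."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon

def adapt_clip_norm(clip_norm: float, update_norms, target_quantile: float = 0.5,
                    lr: float = 0.2) -> float:
    """Adaptive clipping: nudge the threshold geometrically so that roughly
    `target_quantile` of client update norms fall below it."""
    frac_below = np.mean([n <= clip_norm for n in update_norms])
    return clip_norm * np.exp(-lr * (frac_below - target_quantile))

# Example: per-round budget epsilon = 1.0, delta = 1e-5, clip threshold 1.0
sigma = gaussian_sigma(epsilon=1.0, delta=1e-5, sensitivity=1.0)
print(f"Per-round noise std for epsilon=1.0: {sigma:.2f}")
```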
Conclusion
What this shows is a path forward for companies that want to implement IoT within their critical infrastructure while staying assured of strong security for data across their networks. Beyond the technical detail, this approach to differential privacy enables organisations to keep their sensitive data more secure while unlocking new opportunities to scale. It allows critical-infrastructure data teams to confidently build and orchestrate networks of intelligent devices – ultimately improving the way we live and work across multiple domains.
[1] “Differential Privacy: A Historical Survey”, https://www.semanticscholar.org/paper/Differential-Privacy-%3A-A-Historical-Survey-Hilton-Cal/4c99097af05e8de39370dd287c74653b715c8f6a