Description
Synthetic data is often hailed as the future of safe data access, but in practice it is not enough for a method to be mathematically private and analytically useful: if legal and privacy teams do not understand the guarantees, they cannot confidently approve its use. This creates a critical but underexplored tension between cutting-edge privacy techniques and real-world operational requirements, chief among them the need for explainability.
The UK’s Office for National Statistics (ONS) have overcome this challenge and present a generalisable framework for productionising high-fidelity, privacy-preserving synthetic data, designed to meet the UK’s legal and regulatory standards – including UK GDPR and the Statistics and Registration Service Act 2007 – while remaining explainable to non-technical stakeholders.
The framework is built around an adapted version of the MST (Maximum Spanning Tree) method, a state-of-the-art approach to differentially private data synthesis. We demonstrate the technical adaptations we made to MST to allow stakeholder involvement from the outset, and how we reframed key parameter choices, such as $(\epsilon,\, \delta)$, in terms of familiar disclosure controls like cell suppression.
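For readers less familiar with the parameters, the standard textbook definition of $(\epsilon,\, \delta)$-differential privacy is reproduced here as background. This is the general definition, not a statement of the ONS adaptation; the symbols $\mathcal{M}$, $D$, $D'$ and $S$ are notation introduced for this illustration only. A randomised mechanism $\mathcal{M}$ is $(\epsilon,\, \delta)$-differentially private if, for every pair of data sets $D$ and $D'$ that differ in a single record, and for every set of possible outputs $S$,

$$
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\, \Pr[\mathcal{M}(D') \in S] + \delta .
$$

Read informally, adding or removing any one person's record changes the probability of any released output by at most a factor of $e^{\epsilon}$, apart from a small additive slack $\delta$. For $\epsilon = 1$ that factor is $e \approx 2.72$.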
We illustrate this framework by generating an $\epsilon = 1$ differentially private synthetic version of a linked census and death register data set, providing robust measures of the data's utility, alongside insights into how the ONS are now using this data to enable cross-government data sharing.