Speaker
Description
The practice of Output Statistical Disclosure Control (OSDC) has developed largely by consensus, a situation now being challenged by several factors. The first is the near-Cambrian explosion in the number and scope of Trusted Research Environments, as many domains move away from the ‘download’ model of enabling research. The second is the accompanying proliferation of different forms of output requested (including AI models trained on sensitive data). The third is the growth in tools for (semi-)automated assistance in the OSDC process; because these tools arise from different domains, they often differ in the types of risk they check for and the range of mitigations they apply.
In this paper we describe the development of a formal specification of queries, risks, and mitigations. It builds on the taxonomy described in the ‘Statbarn’ framework, but also provides a basis for encompassing the risks posed by machine learning models. The specification has several features that we hope will assist the OSDC community. First, it provides easy-to-understand graphical representations intended to spark debate, encourage consensus-building, and support training.
Second, it uses an extension of the W3C ‘Data Privacy Vocabulary’, which makes it both human-readable and machine-actionable. We will describe how this has been used to create a ‘reference implementation’ via a refactoring of the SACRO toolkit.
Third, it creates a rigorous basis for making systematic and grounded comparisons between OSDC tools (such as Tau/Mu-Argus, SACRO, and DataSHIELD) and the mitigations offered by various ‘privacy preserving’ technologies.