15–17 Oct 2025
Poblenou Campus Auditorium
Europe/Zurich timezone

Confidentiality and disclosure risk from administrative data

16 Oct 2025, 17:20
14m
In-Person
Poblenou Campus Auditorium, Barcelona, Spain

Roc Boronat, 138 08018 Barcelona

Speaker

Gillian Raab

Description

Recent years have seen increasing pressure to allow information derived from administrative data to be used to inform policy; see, for example, the Sturrock Report (2024). Several organisations have been set up in the UK to develop policies to facilitate this. When data access is given to researchers who are not part of the organisation that owns the data, there is a concern that disclosure risk may compromise the privacy of the data subjects, who may be individuals or organisations. The possibility of a breach of confidentiality depends crucially on how the data are made available. The options form a spectrum from free availability on a website to highly restricted access that requires a formal application to use the data in a controlled and monitored environment. Many intermediate types of access exist, including those where the data held by the data owner are modified to reduce their disclosure potential. Methods of reducing disclosure risk include statistical disclosure control (SDC) procedures as well as the creation of synthetic versions of the original data.
Metrics to assess disclosure risk are required to evaluate any possible loss of confidentiality, but there is little consensus on how they should be used. Two types of disclosure risk are usually defined:
1. Identity disclosure: learning that an individual or organisation is present in a data set
2. Attribute disclosure: learning something new about a subject whom the enquirer believes to be in the administrative data
Metrics for both types of risk require the specification of quasi-identifiers (QIs): items that an intruder might be expected to know about the subject being investigated.
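As a rough illustration of how quasi-identifiers enter such metrics, the sketch below computes two simple proxies on a toy data set. It is not the set of measures discussed in the presentation: the function names, the toy records, and the choice of proxies (the share of records unique on their quasi-identifier combination, and the share whose quasi-identifier group has a single value of a sensitive variable) are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def key_of(record, qis):
    """Return the record's combination of quasi-identifier values."""
    return tuple(record[q] for q in qis)


def identity_risk(records, qis):
    """Share of records that are unique on their quasi-identifiers.

    Illustrative proxy for the first type of risk: a unique combination
    lets an intruder who knows those values single out one record.
    """
    counts = Counter(key_of(r, qis) for r in records)
    unique = sum(1 for r in records if counts[key_of(r, qis)] == 1)
    return unique / len(records)


def attribute_risk(records, qis, target):
    """Share of records whose quasi-identifier group has one target value.

    Illustrative proxy for the second type of risk: if every record in
    the group shares the same target value, knowing the quasi-identifiers
    reveals that value.
    """
    values = defaultdict(set)
    for r in records:
        values[key_of(r, qis)].add(r[target])
    disclosive = sum(1 for r in records if len(values[key_of(r, qis)]) == 1)
    return disclosive / len(records)


# Hypothetical toy data; variable names and values are invented.
records = [
    {"age_band": "30-39", "sex": "F", "region": "North", "benefit": "yes"},
    {"age_band": "30-39", "sex": "F", "region": "North", "benefit": "yes"},
    {"age_band": "40-49", "sex": "M", "region": "South", "benefit": "no"},
    {"age_band": "50-59", "sex": "F", "region": "South", "benefit": "yes"},
    {"age_band": "50-59", "sex": "F", "region": "South", "benefit": "no"},
]
qis = ["age_band", "sex", "region"]

id_risk = identity_risk(records, qis)
attr_risk = attribute_risk(records, qis, "benefit")
print(f"identity proxy: {id_risk:.2f}, attribute proxy: {attr_risk:.2f}")
```

In this toy example one record in five is unique on the quasi-identifiers, while three in five sit in a group with a single value of the sensitive variable, which shows that the two types of risk can differ substantially on the same data. Which records to include in the denominator and what threshold to treat as acceptable are left open here.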
This presentation will review several aspects of such metrics, including which records they should be applied to, how they should be standardised, and practical issues such as how thresholds for the metrics should be decided. Illustrations will use data that have been subjected to SDC as well as synthetic data.

Authors

Co-author

Chris Dibben (University of Edinburgh)
