15–17 Oct 2025
Poblenou Campus Auditorium
Europe/Zurich timezone

Machine learning methods to detect Correct Perturbation

16 Oct 2025, 14:35
14m
In-Person
Poblenou Campus Auditorium, Barcelona, Spain

Roc Boronat, 138 08018 Barcelona

Speaker

Iain Dove (Office for National Statistics)

Description

Trusted Research Environments have historically used rounding and thresholding as the recommended disclosure control methods for exports of population data. However, for some datasets within ONS Trusted Research Environments, perturbation is allowed in combination with thresholding. Code has been made available so that researchers can create perturbed outputs using a specific level of noise.
This creates a problem: how can export checkers tell whether an output has been correctly perturbed? Even with supporting information showing the raw counts, it is not obvious whether a researcher has used the right method and parameters to create the perturbed counts for export.
To this end, machine learning methods were trialled on a set of synthetic training data (n=5000). The training data was created using the perturbation code so that the datasets would resemble genuine exports. Five different types were produced; 50% were generated with the ‘correct’ method and parameters. Logistic Regression, XGBoost, Random Forest, K Nearest Neighbours, Naive Bayes and Support Vector Machine models were trained and evaluated.
This paper explores the results and how these models could be applied in the Trusted Research Environment context.
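
The training setup described above can be illustrated with a minimal sketch. It is not the authors' code: the noise model (rounded Gaussian noise), the noise levels CORRECT_SD and WRONG_SD, the two summary features, and every function name are assumptions made for illustration, and only one kind of incorrect perturbation is simulated. Of the six model families listed, only a plain-Python logistic regression is shown, to keep the example free of dependencies.

```python
import math
import random

CORRECT_SD = 2.0  # hypothetical "approved" noise level (assumption)
WRONG_SD = 4.0    # hypothetical misconfigured noise level (assumption)

def make_example(rng, correct, n_cells=40):
    """Simulate one export; return ([features], label), label 1 = correct."""
    sd = CORRECT_SD if correct else WRONG_SD
    raw = [rng.randint(10, 500) for _ in range(n_cells)]
    perturbed = [r + round(rng.gauss(0, sd)) for r in raw]
    diffs = [abs(p - r) for p, r in zip(perturbed, raw)]
    # Features: mean and maximum absolute difference from the raw counts.
    return [sum(diffs) / n_cells, float(max(diffs))], int(correct)

def standardise(rows, means=None, sds=None):
    """Scale features to zero mean and unit variance (fit if no stats given)."""
    if means is None:
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
               for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, xi):
    """Return 1 if the export is classified as correctly perturbed."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0

def run_experiment(n=5000, seed=0):
    """Generate n synthetic exports (half correct) and return test accuracy."""
    rng = random.Random(seed)
    data = [make_example(rng, correct=(i % 2 == 0)) for i in range(n)]
    rng.shuffle(data)
    split = int(0.8 * n)
    train, test = data[:split], data[split:]
    X_train, means, sds = standardise([f for f, _ in train])
    X_test, _, _ = standardise([f for f, _ in test], means, sds)
    w, b = train_logistic(X_train, [label for _, label in train])
    hits = sum(predict(w, b, xi) == label
               for xi, (_, label) in zip(X_test, test))
    return hits / len(test)

if __name__ == "__main__":
    print(f"held-out accuracy: {run_experiment():.3f}")
```

In this toy setting the two classes differ only in noise scale, so simple summary features separate them easily. Genuine exports with several kinds of incorrect perturbation would likely need richer features and the stronger models named in the abstract.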

Author

Samantha Trace (United Kingdom of Great Britain and Northern Ireland)
