In 2023, CASD introduced a system to detect exports that do not comply with statistical secrecy [1]. This approach, based on generating features from groups of exported files and training a boosting model, showed promise, but its precision left room for improvement. The system relied on historical data from past Statistical Disclosure Control (SDC) expert reviews: the experts' decisions (Accepted/Refused) served as labels, and structured features were extracted from the reviewed exports. The primary goal was to identify situations that could put confidential data at risk and to trigger alerts for expert review. This dataset also provided a foundation for data augmentation techniques that enhance model robustness.
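The training setup described above can be sketched as follows. This is a minimal illustration, not CASD's actual pipeline: the feature names, the toy labeling rule, and the scikit-learn `GradientBoostingClassifier` choice are all assumptions standing in for the real feature extraction and boosting model.

```python
# Hedged sketch: a boosting model trained on structured features of past
# exports, with expert decisions (Accepted/Refused) as labels.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical structured features extracted from each reviewed export:
X = np.column_stack([
    rng.integers(1, 20, n),    # number of files in the export
    rng.random(n),             # share of numeric columns
    rng.integers(0, 50, n),    # minimum observed cell count
])
# Toy labeling rule standing in for past expert decisions: exports whose
# minimum cell count is small tend to be refused.
y = (X[:, 2] < 11).astype(int)  # 1 = Refused, 0 = Accepted

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
risk_scores = clf.predict_proba(X_test)[:, 1]  # probability of refusal
```

At inference time, `risk_scores` would drive the alerts: exports above a chosen threshold are routed to an SDC expert for manual review.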
An initial improvement to the system involved shifting the compliance prediction from the export level (a group of files) to the individual file level. This refinement allowed the model to be trained on a larger dataset, improving detection accuracy. The transition also necessitated a new approach to file representation and analysis, particularly through the use of structural components.
In this paper, we propose a further enhancement to increase the system's reliability. By leveraging Large Language Models (LLMs), we introduce new predictive features that enrich the risk-rating model. In particular, LLMs make it possible to detect key attributes previously missing from the system, such as identifying columns that contain headcounts and verifying that their values comply with minimum-count requirements.
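Once the LLM has flagged which columns hold headcounts, the minimum-value check itself is deterministic. The sketch below illustrates this split; the `detect_headcount_columns` stub (a keyword heuristic standing in for the LLM call) and the threshold value are illustrative assumptions, not the paper's actual rule.

```python
# Hedged sketch: semantic column detection (here a stub) feeding a
# deterministic minimum-count check.
import pandas as pd

MIN_COUNT = 11  # example threshold; actual SDC rules vary by data source

def detect_headcount_columns(df: pd.DataFrame) -> list[str]:
    """Stand-in for an LLM call that labels columns by semantic type."""
    return [c for c in df.columns
            if "count" in c.lower() or "effectif" in c.lower()]

def minimum_count_violations(df: pd.DataFrame) -> dict[str, int]:
    """Per headcount column, count cells falling below the threshold."""
    return {
        col: int((df[col] < MIN_COUNT).sum())
        for col in detect_headcount_columns(df)
    }

table = pd.DataFrame({"region": ["A", "B", "C"],
                      "employee_count": [120, 7, 45]})
print(minimum_count_violations(table))  # {'employee_count': 1}
```

Keeping the numerical rule outside the LLM makes the resulting feature auditable: the model only decides *which* columns the rule applies to.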
Beyond numerical validation, LLMs also enhance the system’s ability to analyze textual content, determining whether exported data consists of code, structured datasets, or free-text documents. A crucial aspect is assessing the human readability of exported content, ensuring that flagged files can be manually reviewed when necessary. By improving transparency and traceability, this integration strengthens both the reliability and interpretability of statistical secrecy compliance checks.
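The content-type and readability features described above can be sketched with a simple heuristic standing in for the LLM classification; the token lists and the binary-detection logic are illustrative assumptions only.

```python
# Hedged sketch: classify exported content as code, structured dataset,
# or free text, and assess whether it is human-readable at all.
def content_features(raw: bytes) -> dict:
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Undecodable bytes cannot be manually reviewed as text.
        return {"kind": "binary", "human_readable": False}
    readable = all(ch.isprintable() or ch in "\n\t\r " for ch in text)
    first_line = text.splitlines()[0] if text else ""
    # Crude type guess standing in for an LLM classification:
    if any(tok in text for tok in ("def ", "import ", "SELECT ")):
        kind = "code"
    elif ";" in first_line or "," in first_line:
        kind = "structured dataset"
    else:
        kind = "free text"
    return {"kind": kind, "human_readable": readable}

print(content_features(b"region;employee_count\nA;120"))
# {'kind': 'structured dataset', 'human_readable': True}
```

Files flagged as non-readable (e.g. binary blobs) are exactly those an expert cannot inspect directly, which is why this feature matters for the manual-review path.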
[1] Rigaud, Titouan, et al. "Checking Data Outputs from Research Works: A Mixed Method with AI and Human Control." United Nations Economic Commission for Europe (UNECE) Conference of European Statisticians (CES): Expert Meeting on Statistical Data Confidentiality, 2023.