15–17 Oct 2025
Poblenou Campus Auditorium
Europe/Zurich timezone

Output SDC Measures and User Behaviour Detection in microdata.no

16 Oct 2025, 09:15
14m
In-Person
Poblenou Campus Auditorium, Barcelona, Spain

Roc Boronat, 138 08018 Barcelona

Speaker

Vidar Klungre (Statistics Norway)

Description

The traditional approach to accessing register-based microdata requires researchers to apply for data on a project-by-project basis, a time-consuming process as each application must be manually reviewed and approved before the relevant data can be extracted and handed out.
A more flexible approach is to grant a broader range of researchers from authorized institutions quick access to a secure, microdata-based remote analysis platform where they can conduct their research. For such a platform to adhere to the Five Safes framework, it should not expose the microdata directly to users but instead provide access only through a given scripting language. However, this does not fully mitigate the disclosure risk, as the output from user queries, especially when queries are executed sequentially and their results combined, could still be used to infer confidential information in the microdata. Therefore, additional measures are needed to secure the output, ideally with minimal loss of utility.
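
As a rough illustration of why combined query outputs can be risky, the following toy Python sketch shows a so-called differencing attack. The data, the names, and the total_income function are invented for this example and have no connection to microdata.no or its scripting language.

```python
# Toy illustration of a differencing attack; all data below is made up.
incomes = {"A": 410_000, "B": 520_000, "C": 380_000, "D": 1_250_000}


def total_income(names):
    """Aggregate query: returns only a sum, never an individual row."""
    return sum(incomes[n] for n in names)


everyone = ["A", "B", "C", "D"]
everyone_but_d = ["A", "B", "C"]

# Two aggregates that each look harmless on their own...
q1 = total_income(everyone)
q2 = total_income(everyone_but_d)

# ...but whose difference is exactly D's confidential income.
inferred_d = q1 - q2
print(inferred_d)
```

Neither query returns a single record, yet subtracting one result from the other isolates one person's value. This is the kind of sequential, combined use of output that additional safeguards are meant to address.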

In this paper, we present the confidentiality measures implemented in the analysis tool microdata.no, a dynamic microdata remote analytics platform providing researchers access to Norwegian register data.
While microdata.no does not display the microdata directly to users, it offers significant flexibility through an expressive scripting language with commands for selecting variables, filtering units, creating, removing or modifying variables, and executing statistical procedures like summarization, tabulation, grouping, and regression. Given the language's high expressivity, additional output safeguards are necessary to prevent disclosure of sensitive information. In line with this goal, we have incorporated automatic output SDC measures in the platform, including, but not limited to, population-based noise on unit counts, suppression of small groups, winsorization, and reduced output precision. These measures minimize distortion for analyses of large populations, which have higher analytical value, while providing greater protection for smaller groups, where the risk of disclosure is greater. To complement the output SDC measures, we have implemented high-level detection mechanisms to examine user-submitted scripts, aiming to identify patterns indicating potential misuse. The insights gained from our work are applicable beyond microdata.no, providing a foundation for implementing similar safeguards in other flexible microdata-based platforms.
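
To make the listed output measures more concrete, here is a minimal Python sketch of how small-group suppression, winsorization, bounded noise on counts, and reduced precision might be combined for a single summary statistic. It is not the microdata.no implementation: every threshold, function name, and the noise mechanism itself are assumptions chosen purely for illustration.

```python
import math
import random

# All parameters below are hypothetical, not microdata.no's actual settings.
MIN_GROUP_SIZE = 5      # groups smaller than this are suppressed
MAX_NOISE = 2           # bound on the integer noise added to unit counts
SIGNIFICANT_DIGITS = 3  # precision kept in reported statistics


def winsorize(values, lower_pct=0.02, upper_pct=0.98):
    """Clip values outside the given quantiles to the quantile values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def noisy_count(count, rng):
    """Add bounded integer noise; the relative distortion shrinks as the
    count grows, so large populations are barely affected."""
    return max(0, count + rng.randint(-MAX_NOISE, MAX_NOISE))


def round_significant(x, digits=SIGNIFICANT_DIGITS):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - magnitude)


def protected_summary(values, rng):
    """Return a protected (count, mean), or None if the group is too small."""
    if len(values) < MIN_GROUP_SIZE:
        return None  # suppression of small groups
    clipped = winsorize(values)
    mean = sum(clipped) / len(clipped)
    return noisy_count(len(values), rng), round_significant(mean)


rng = random.Random(0)
print(protected_summary([1, 2, 3], rng))          # too small: suppressed
print(protected_summary(list(range(1, 101)), rng))
```

In this sketch a noise bound of 2 changes a count of 100 by at most 2 percent, and a count of 100,000 by a negligible fraction, while a group of three units is not reported at all. This mirrors the stated goal of little distortion for large populations and stronger protection for small ones.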

Author

Vidar Klungre (Statistics Norway)
