15–17 Oct 2025
Poblenou Campus Auditorium
Europe/Zurich timezone

Contribution List

44 out of 44 displayed
  1. Dr Basheer Kalash (Luxembourg National Data Service (LNDS))
    15/10/2025, 09:45

    We implement a synthetic data generation framework on a pseudonymized subset of the 2021 Census data for Luxembourg. Focusing on seven categorical variables—including ordered age and education—we drop unique records upfront to mitigate the risk of singling out. Synthetic data are produced via the CART method in the synthpop package. Utility is measured using the propensity score mean-squared...

  2. Emma Fössing (Institute for Employment Research, Nuremberg, Germany)
    15/10/2025, 10:00

    Disseminating synthetic data enables easy access to data that retains statistical similarities to the original data if access to sensitive data is restricted. However, the model employed when generating the synthetic data may influence the structure of the data, potentially affecting subsequent predictive analysis. This paper empirically investigates whether the choice of synthesis model...

  3. Jui Andreas Tang (Germany)
    15/10/2025, 10:25

    The demand for georeferenced data is increasing, while sharing proprietary location data poses privacy and confidentiality challenges. This study investigates the use of synthetic data generators (SDGs) to protect sensitive locations in georeferenced datasets. We propose transforming spatial coordinates into a one-dimensional index via a Hilbert space-filling curve, thereby preserving local...
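
As a rough illustration of the coordinate-to-index step described above, the sketch below maps a point to a position along a Hilbert curve using the standard grid-based conversion. It is not the authors' implementation: the abstract is truncated, so the grid resolution (`order`), the bounding box, and the helper names `to_cell` and `hilbert_index` are all assumptions made here for illustration.

```python
def hilbert_index(x: int, y: int, order: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next, finer level is traversed consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(value: float, lo: float, hi: float, order: int) -> int:
    """Discretise a coordinate in [lo, hi] to a grid cell (hypothetical helper)."""
    n = 1 << order
    frac = (value - lo) / (hi - lo)
    return min(n - 1, max(0, int(frac * n)))


# Example: index a point inside an assumed bounding box on a 256 x 256 grid.
order = 8
col = to_cell(2.19, 2.0, 2.3, order)    # longitude-like coordinate, assumed bounds
row = to_cell(41.40, 41.3, 41.5, order)  # latitude-like coordinate, assumed bounds
index = hilbert_index(col, row, order)
```

Cells that are consecutive along the curve are always neighbours on the grid, which is the locality property that makes a one-dimensional index usable as an input to a tabular synthesizer.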

  4. Felix Ritchie (University of the West of England)
    15/10/2025, 10:40

    The Trusted Research Environment (TRE, or Research Data Environment, RDC) has been the great success story of data access this century. By providing highly secure yet flexible access, the TRE has enabled research use of the most sensitive data. In its turn the development of the TRE has led to significant developments in research data governance, particularly output disclosure control. The TRE...

  5. Luis Del Vasto Terrientes (Universitat Rovira i Virgili)
    15/10/2025, 11:25

    Differential privacy (DP) has become the de facto data protection mechanism due to its strong privacy guarantees. The mathematical foundation of $\epsilon$-DP is based on the principle that the presence or absence of any record in a data set should not influence the protected result by more than an exponential factor determined by the parameter $\epsilon$. Even though DP was originally...
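
For reference, the principle stated above corresponds to the usual formal condition for $\epsilon$-DP. The symbols $M$ (the randomized mechanism), $D, D'$ (data sets differing in one record) and $S$ (a set of possible outputs) are introduced here for illustration and do not appear in the abstract:

$$
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S]
\quad \text{for all neighbouring } D, D' \text{ and all measurable } S .
$$

A small $\epsilon$ forces the two output distributions to be nearly indistinguishable; as $\epsilon$ grows, the bound $e^{\epsilon}$ loosens exponentially.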

  6. Mark Elliot
    15/10/2025, 11:40

    Teaching versions of datasets are an important component of the data discovery pipeline. These datasets often serve as an introduction to the data for potential users, allowing them to explore the data and assess the relevance of a dataset to their needs. However, in cases where source data is only available in restricted settings, such as trusted research environments (TREs), then capacity to...

  7. Roman Müller (University of Applied Sciences and Arts Northwestern Switzerland)
    15/10/2025, 12:00

    This contribution addresses the intersection of statistical disclosure control and the special requirements of psychological research. By way of example, we show the unique sensitivity and complexity of empirical data from psychological research and the problems and possibilities of anonymizing them.

    The replication crisis in psychology (Open Science Collaboration, 2015; Camerer et al., 2018) has...

  8. Yannik Garcia Ritz (Germany)
    15/10/2025, 12:15

    When evaluating the scientific worth of microdata, formally anonymized data provide maximum research potential. But these data can only be accessed onsite via Remote Execution or Safe Centers, which offer little convenience for data users. In contrast, factually anonymized data can be accessed from the institutional workspace (offsite access, e.g. Scientific Use Files; SUFs) but offer less...

  9. Julien Jamme (France)
    15/10/2025, 14:30

    Statistical institutes face a major challenge when transitioning from suppressive to perturbative disclosure control methods: how to objectively calibrate protection parameters. While the Cell Key Method (CKM) effectively protects frequency tables by adding controlled noise, selecting optimal parameters remains a serious challenge. We present an evidence-based framework for calibrating CKM's...

  10. Mr Lars-Erik Almberg (Statistics Sweden)
    15/10/2025, 14:45

    Data in tables published for the Swedish R&D survey in the business enterprise sector (BERD) were previously protected by cell suppression to prevent disclosure of sensitive information. In order to avoid cell suppression, key respondents were asked to sign waivers allowing the publication of their data. However, consent was rarely given to disseminate cells where an enterprise’s data...

  11. Mr Peter-Paul De Wolf, Ms Sarah Giessing (Germany)
    15/10/2025, 15:10

    The package τ-ARGUS is a widely used, EU-funded open-source tool for disclosure control in tabular data. It is automatable via batch functionality and, as an open-source package, it is supposed to be transparent and easy to adapt. As there is growing demand for specific τ-ARGUS functionalities to be provided “as a service”, the paper will discuss pros and cons of a fundamental revision of the...

  12. Mr Shunsuke Kato (National Statistics Center / Statistical Research and Training Institute, Ministry of Internal Affairs and Communications)
    15/10/2025, 15:20

    In many countries, perturbative methods are increasingly used to protect privacy in official statistics. The U.S. Census Bureau has applied differential privacy, specifically Zero-Concentrated Differential Privacy (zCDP), when producing statistical tables from 2020 Census data as well as Privacy-Protected Microdata Files (PPMFs) as a...

  13. Simon Kolb (Destatis)
    15/10/2025, 16:05

    The Cell Key Method (CKM) is commonly used by statistics agencies to release tabular data. This paper compares the utility of a new open-source synthetic data tool, SynDiffix, with CKM for very fine-grained geographic data. SynDiffix is designed to have strong anonymity even when used by non-experts, and aims for high accuracy while maintaining strong anonymity. We compare the utility of...

  14. Mr Tomasz Klimanek (Poznań University of Economics and Business)
    15/10/2025, 16:20

    The presentation outlines the practical application and evaluation of the targeted record swapping (TRS) method in the context of the 2021 Polish Census (NSP2021), specifically for population data dissemination within a 1 km² grid framework. The method, recommended by Eurostat, was employed to address statistical disclosure control (SDC) requirements while preserving data utility. The talk...

  15. Dr Violeta Calian (Statistics Iceland)
    15/10/2025, 16:30

    Limiting statistical disclosure with standard methods, while preserving utility and accuracy, is the pragmatic goal for official statistics, whereas differential privacy is regarded as a practical goal for technological institutions which use or even stream data and involve open-source tools and libraries.
    In this paper we explore the potential of using Bayesian methods to both estimate...

  16. Aleksandra Bujnowska (Eurostat)
    16/10/2025, 09:05

    This article presents recommendations on data sharing written in the framework of the third phase of the G20 Data Gaps Initiative (DGI-3).
    The recommendations were written by the task team led by Eurostat and the ECB, bringing together representatives of 21 countries.
    The recommendations comprise amongst others: definitions of terms, general principles of data sharing, modalities of...

  17. Vidar Klungre (Statistics Norway)
    16/10/2025, 09:15

    The traditional approach to accessing register-based microdata requires researchers to apply for data on a project-by-project basis, a time-consuming process as each application must be manually reviewed and approved before the relevant data can be extracted and handed out.
    A more flexible approach is to grant a broader range of researchers from authorized institutions quick access to a...

  18. Prof. Christine Choirat (Federal Statistical Office)
    16/10/2025, 09:30

    National Statistical Offices collect massive volumes of data to fulfill their missions. These data fuel the generation of regional, national, and international statistics across various sectors. However, their immense potential remains largely untapped due to strict and legitimate privacy regulations. In this context, Lomas is a novel open-source platform designed to realize the full potential...

  19. Gyula István Bálint
    16/10/2025, 09:55

    In this paper, we address the current issues and challenges of Research Data Centres, using the Hungarian Central Statistical Office’s model, and propose possible improvements to further enhance efficiency. To do so, we take a three-way standpoint, considering the viewpoints of SDC Experts, Researchers and the User...

  20. Ana Esteban (Banco de España)
    16/10/2025, 10:10

    The International Network for Exchanging Experience on Statistical Handling of Granular Data (INEXDA) is a collaborative project involving central banks, the ECB, Eurostat, and other international organizations and national statistical institutes, with strong support from the BIS. The primary goal of INEXDA is to facilitate the exchange of experiences related to the statistical handling of...

  21. Daniel Boller (World Bank Group)
    16/10/2025, 10:55

    Establishing standardized Statistical Disclosure Control (SDC) processes is vital as data-sharing demands increase, anonymization techniques advance, and principles for privacy preservation continue to develop. In response, we present an SDC Architecture to systematically plan, implement, and document SDC of microdata with the objective to improve consistency, transparency, and adaptability in...

  22. Dimitrios Avouris Kalamas (Ministry of Foreign Affairs of the Hellenic Republic)
    16/10/2025, 11:10

    Ensuring secure, efficient, and transparent data-sharing mechanisms is a key challenge for modern public administration, particularly when handling sensitive microdata for policy development and evaluation. The Ministry of Foreign Affairs of the Hellenic Republic (MFA) has implemented an innovative Strategic and Operational Planning (SOP) function, incorporating digital tools and Business...

  23. Cédric Hansen (CASD)
    16/10/2025, 12:05

    The Secure Data Access Center (CASD) platform provides controlled access to sensitive administrative data for research purposes, ensuring strict adherence to confidentiality regulations. Operating in an offline environment, CASD enhances data security by minimizing external vulnerabilities while allowing researchers to access and analyze this data in their secure and totally isolated...

  24. Jim Smith (University of the West of England Bristol)
    16/10/2025, 12:20

    The practice of Output Statistical Disclosure Control has developed largely by consensus, a situation which is being challenged by a number of factors. First of these is the almost Cambrian Explosion in the number and scope of Trusted Research Environments as many domains move away from the ‘download’ model of enabling research. A second challenge is the accompanying proliferation of...

  25. Ms Madelon Hulsebos (Centrum Wiskunde & Informatica (CWI))
    16/10/2025, 12:35

    The Humanitarian Data Exchange (HDX) is an open platform managed by the Centre for Humanitarian Data designed to facilitate the sharing of humanitarian data among organizations and improve decision-making and response efforts in humanitarian crises. As part of its role in managing HDX, the Centre recognizes the various types of sensitive data being collected and used by partners to...

  26. Iain Dove (Office for National Statistics)
    16/10/2025, 14:35

    Trusted research environments have historically used rounding and thresholding as the recommended disclosure control method for exports of population data. However, within ONS Trusted Research Environments, for some datasets, perturbation is allowed in combination with thresholding. Code has been made available so researchers can create perturbed outputs using a specific level of noise.
    This...
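
To make the rounding-and-thresholding rule mentioned above concrete, here is a minimal sketch. The threshold of 10 and rounding base of 5 are assumed values chosen for illustration, and `protect_count` is a hypothetical helper; the abstract does not state the parameters or code used in ONS environments.

```python
from typing import Dict, Optional


def protect_count(count: int, threshold: int = 10, base: int = 5) -> Optional[int]:
    """Suppress counts below the threshold, otherwise round to the nearest multiple of base."""
    if count < threshold:
        return None  # suppressed: too few contributors to release
    # Integer arithmetic avoids Python's round-half-to-even behaviour.
    return base * ((count + base // 2) // base)


def protect_table(table: Dict[str, int], threshold: int = 10, base: int = 5) -> Dict[str, Optional[int]]:
    """Apply the same rule to every cell of a frequency table."""
    return {cell: protect_count(n, threshold, base) for cell, n in table.items()}


# Example with made-up cell counts.
released = protect_table({"area_a": 7, "area_b": 12, "area_c": 13})
```

Perturbation, by contrast, would add random noise to each count before release, so a real workflow would combine a noise step with the threshold check rather than the deterministic rounding shown here.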

  27. Titouan Rigaud (CASD Secure Data Hub)
    16/10/2025, 14:50

    In 2023, CASD introduced a system to detect exports that do not comply with statistical secrecy [1]. This approach, based on feature generation from groups of exported files and the training of a boosting model, showed promise, but its precision left room for improvement. The system relied on historical data from past Statistical Disclosure Control (SDC) expert reviews, where decisions (Accepted/Refused)...

  28. Krish Muralidhar (University of Oklahoma)
    16/10/2025, 15:40

    ε-Differential privacy (DP) is a popular privacy model that has been promoted as the de facto standard in most data intensive areas. However, the selection of the privacy parameter ε (also called budget) in applications of DP remains an open challenge. Even though the meaning and implications of the value of ε are not fully understood, it is clear that large budget values are less...

  29. Jonathan Latner (Institute for Employment Research (IAB))
    16/10/2025, 15:55

    This paper evaluates disclosure risk measures for synthetic data generated by CART-based models, using both a controlled simulated dataset and publicly available data. We find that common disclosure risk measures may fail to detect disclosure risks and, in some cases, misrepresent actual disclosure risks. Additionally, CART-based models, while maintaining high statistical utility, may...

  30. Marieke de Vries (Netherlands (Kingdom of the))
    16/10/2025, 16:40

    The rise in access to public data on the internet, and specifically online social networks (OSNs), is causing new pressures on the statistical disclosure control of microdata. Currently at Statistics Netherlands, a criterion is applied that looks at three properties of variables: rarity, visibility and searchability. Underlying this criterion are, similar to other methods used to assess the...

  31. Dr Sonakshi Garg, Vicenç Torra (Umeå University)
    16/10/2025, 16:55

    Government statistical agencies increasingly rely on sensitive tabular data to guide evidence-based policymaking, yet restrictions on data access hinder research and transparency. Synthetic data generated with Generative Adversarial Networks (GANs) offers a promising solution, but conventional GANs often produce unrealistic tables or fail to preserve the statistical relationships that matter...

  32. Gillian Raab
    16/10/2025, 17:20

    Recent years have seen an increased pressure to allow information derived from administrative data to be used to inform policy; see for example the Sturrock Report, 2024. Several organisations have been set up in the UK to develop policies to facilitate this. When data access is given to researchers, who are not part of the organisation that owns the data, there is a concern that there may...

  33. Prof. Mark Elliot (University of Manchester)
    16/10/2025, 17:35

    Introduction: The de-identification of unstructured free-text data is important for sharing large amounts of healthcare information generated by electronic health records, publications and clinical trials. To automate this process, information extraction (IE) and natural language processing (NLP) are essential tools. However, evaluating NLP performance in de-identification requires...

  34. Hege Marie Bøvelstad (Norway)
    17/10/2025, 09:05

    At Statistics Norway, the methodology department is responsible for the internal education of staff. Traditionally, SDC training has been offered on demand, with courses held at most once a year. For a statistician who is newly employed or interested in applying a new SDC method, waiting up to a year for training is both impractical and unproductive. In addition, Statistics Norway has...

  35. Jesús González (Mexico)
    17/10/2025, 09:20

    National Statistical Offices (NSOs) are pivotal in producing data essential for analyzing sociodemographic and economic trends. Most national statistical laws enshrine the confidentiality of collected data to safeguard citizens’ privacy. Yet, these laws frequently lack specific, actionable measures to enforce this principle. Consequently, supplementary normative frameworks are critical to...

  36. Dr Thijs Benschop (World Bank Group)
    17/10/2025, 09:35

    To reflect key advances in statistical disclosure control (SDC), we present a revised and unified version of the World Bank’s microdata anonymization guides. The World Bank previously published three separate guides: one on SDC theory and two practice guides for implementing microdata anonymization using the R package sdcMicro, both via command line and the sdcApp GUI. These guides have been...

  37. Owen Daniel (Office for National Statistics)
    17/10/2025, 10:20

    Synthetic data is often hailed as the future of safe data access – but in practice, it is insufficient for a method to be mathematically private or analytically useful: if legal and privacy teams do not understand the guarantees, they cannot confidently allow its use. This creates a critical but underexplored tension between cutting-edge privacy techniques and real-world operational...

  38. Weiqi Wong
    17/10/2025, 10:35

    With the need for deeper analysis and more granular data, statistical offices must place a greater focus on measures to mitigate the risks of statistical data disclosure. There can, however, be tensions between users who require granular data and the need for statistical offices not to disclose information of the data subjects.

    However, this tension may not be most obvious to some user...
