Description
When evaluating the scientific value of microdata, formally anonymized data offer the greatest research potential. However, such data can only be accessed onsite, via Remote Execution or Safe Centers, which offers data users little convenience. In contrast, factually anonymized data can be accessed from the user's institutional workspace (offsite access, e.g., Scientific Use Files; SUFs), but they offer less analytic potential because stricter anonymization methods are applied, such as coarsening or removing information. In this conflict between analytic potential and the need for anonymization, Remote Desktop access bridges onsite and offsite access for scientists. The Research Data Centers of the Statistical Offices of the Federation and the Federal States in Germany recently introduced Remote SUFs as a step towards reconciling user satisfaction and data confidentiality: via Remote Desktop, microdata with more analytic potential than offsite SUFs can be used conveniently from an approved institutional workspace, subject to technical and organizational data protection measures. Accordingly, the anonymization applied to Remote SUFs is stricter than for onsite data but less strict than for offsite SUFs. However, further research is needed to find the "sweet spot" for the extent of anonymization applied.
Established methods such as removing or coarsening variables often fail to achieve factual anonymity when spatial or sensitive microdata are to be provided and analyzed. As these traditional methods appear to be exhausted, data synthesis may serve as an alternative or supplementary anonymization method. In particular, partial data synthesis based on classification and regression trees (CART) appears promising for achieving factual anonymity while enabling researchers to analyze Remote SUFs enriched with such spatial or sensitive variables. The present research aims to use CART-based partial data synthesis as an additional anonymization method to generate an enhanced Remote SUF for the German microcensus 2019. The potential gain in utility and loss of privacy are then assessed relative to the already published Remote SUF for the microcensus 2019, which was anonymized using traditional methods. In addition to established metrics for evaluating utility and risk, recently proposed risk-assessment metrics are applied (Raab et al., 2024).
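To make the idea of CART-based partial synthesis more concrete, the following is a minimal, self-contained Python sketch. It is not the procedure used for the microcensus. It assumes the common leaf-sampling variant (as in, e.g., the R package synthpop): a tree is grown for one sensitive variable using the unchanged variables as predictors, and each record's sensitive value is then replaced by a value drawn at random from the original values in that record's leaf. The function names, the parameters `min_leaf` and `max_depth`, and the Gini splitting rule are illustrative assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of categorical labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_leaf):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            # Hypothetical privacy knob: every leaf must keep >= min_leaf donors.
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(X, y, min_leaf=5, depth=0, max_depth=4):
    """Grow a classification tree; each leaf stores its observed y values as donors."""
    split = best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None:
        return {"donors": list(y)}
    j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in li], [y[i] for i in li], min_leaf, depth + 1, max_depth),
        "right": grow([X[i] for i in ri], [y[i] for i in ri], min_leaf, depth + 1, max_depth),
    }


def leaf_of(tree, row):
    """Drop a record down the tree and return the leaf it lands in."""
    while "donors" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def synthesize(X, y, min_leaf=5, seed=0):
    """Partial synthesis: only y is replaced; the predictors X are released unchanged."""
    rng = random.Random(seed)
    tree = grow(X, y, min_leaf)
    return [rng.choice(leaf_of(tree, row)["donors"]) for row in X]
```

In this sketch, a larger `min_leaf` pools more donors per leaf and so adds randomness (more protection, less fidelity). A pure leaf, however, reproduces the original value exactly, which is one reason why disclosure risk has to be measured on the synthesized file rather than assumed.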