15–17 Oct 2025
Poblenou Campus Auditorium
Europe/Zurich timezone

In defence of scientific use files

15 Oct 2025, 10:40
14m
In-Person
Poblenou Campus Auditorium, Barcelona, Spain

Roc Boronat, 138 08018 Barcelona

Speaker

Felix Ritchie (University of the West of England)

Description

The Trusted Research Environment (TRE, or Research Data Centre, RDC) has been the great success story of data access this century. By providing highly secure yet flexible access, the TRE has enabled research use of the most sensitive data. In turn, the TRE has driven significant advances in research data governance, particularly in output disclosure control. The TRE is the sexy new future of data access.

And yet it is the Scientific Use File (SUF) that remains the workhorse of academic research. SUFs are files made available under licence to authorised users, to hold and analyse on their own machines. These are valuable assets to researchers who do not need the detail in TRE data; moreover, they are accessible to users who would not normally be granted access to TREs, such as undergraduate students or non-academic researchers. For the UK Data Archive, which provides both SUFs for download and more detailed files in a TRE, the volume of use of SUFs far exceeds that of the TRE.

However, there have been concerns about the future of the SUF. A methodological review of confidentiality protection suggested that new technologies and methods (such as large-scale computing and AI, combined with the availability of social media data) can reverse-engineer any de-identification technique. An international conference on microdata access suggested that SUFs have a limited future, and the head of a major data archive has made similar comments. It has been suggested that, if de-identification techniques are no longer robust, all personal research data should be available only through TREs.

This would pose significant challenges to the research and data services communities. First, it would significantly increase the administrative burden for all parties. Second, supporting TRE analysis imposes a resource cost on data services. Third, TRE access is effectively unavailable to those who cannot manage the increased administrative burden. Finally, the logic of this argument is not just that SUFs are untenable, but that all ‘anonymised’ data is ultimately either identifiable or of such low specificity as to lose any research value.

This position holds only as long as the argument is limited to the risks to personal privacy. Less often considered, but equally important, is the offsetting benefit to the public of making such data available. Without considering both sides of the equation, arguments about confidentiality risk have little validity.

This paper re-examines the balance between risk and utility in the light of recent developments in theory (on solidarity in public life) and empirical evidence (on the public acceptability of small levels of risk in return for research benefit), using the 5 Safes framework as a guide. This re-examination strengthens the case for a balanced view, but we argue that there is a significant gap in our understanding of how the public views confidentiality risk.

Author

Felix Ritchie (University of the West of England)

Co-authors

Aida Sanchez (University College London)
Cristina Magder (University of Essex)
Elizabeth Green (University of the West of England)
Francesco Tava (University of the West of England)
Richard Welpton (United Kingdom of Great Britain and Northern Ireland)