15–17 Oct 2025
Poblenou Campus Auditorium
Europe/Zurich timezone

Producing synthetic teaching datasets using evolutionary algorithms

15 Oct 2025, 11:40
10m
In-Person
Poblenou Campus Auditorium, Barcelona, Spain

Roc Boronat, 138, 08018 Barcelona

Speaker

Mark Eliot

Description

Teaching versions of datasets are an important component of the data discovery pipeline. These datasets often serve as an introduction to the data for potential users, allowing them to explore the data and assess its relevance to their needs. However, where source data is only available in restricted settings, such as trusted research environments (TREs), the capacity to produce such datasets is limited.
Responding to this challenge, this paper reports on a project that has developed software to produce synthetic datasets tailored for specific teaching purposes. The software uses already cleared (and published) analytical outputs as the basis for synthesis, without access to the original data. Unlike generic synthetic datasets, the datasets created are designed solely to reproduce those specific outputs. The software, which is available as an open access app, is described, and three case study examples are presented. Issues arising, such as marginal disclosure risk, are then discussed, as are other possible use cases of the software.
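
As a rough illustration of the general idea, and not the project's actual software, the sketch below evolves a small synthetic dataset until its cross-tabulation matches a published table of counts. Everything in it is an assumption made for illustration: the target table `TARGET`, the variable categories, the fitness measure (total absolute difference in cell counts), and the simple keep-the-best-half selection scheme.

```python
import random
from collections import Counter

# Hypothetical "published output": cell counts of a cleared two-way table.
# The categories and numbers are invented for illustration only.
TARGET = {("F", "yes"): 30, ("F", "no"): 20, ("M", "yes"): 35, ("M", "no"): 15}
CATEGORIES = list(TARGET)
N_ROWS = sum(TARGET.values())


def fitness(dataset):
    """Total absolute gap between the dataset's cross-tab and TARGET (0 = exact match)."""
    counts = Counter(dataset)
    return sum(abs(counts[cell] - TARGET[cell]) for cell in CATEGORIES)


def mutate(dataset, rng):
    """Return a copy with one randomly chosen row replaced by a random category pair."""
    child = list(dataset)
    child[rng.randrange(len(child))] = rng.choice(CATEGORIES)
    return child


def evolve(pop_size=20, generations=2000, seed=0):
    """Evolve candidate datasets; the original data is never consulted, only TARGET."""
    rng = random.Random(seed)
    population = [
        [rng.choice(CATEGORIES) for _ in range(N_ROWS)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break  # the best candidate already reproduces the table exactly
        # Keep the better half unchanged and refill with mutated copies of survivors.
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return min(population, key=fitness)


synthetic = evolve()
print("remaining gap:", fitness(synthetic))
print("synthetic cross-tab:", dict(Counter(synthetic)))
```

Because the fitness function only measures agreement with the published table, the resulting rows carry no information beyond that table, which is the sense in which such a dataset reproduces only the specific outputs. A real tool would presumably need to handle several outputs at once (for example regression coefficients as well as tables), which this sketch does not attempt.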

Author

Co-authors

Dr Claire Little (University of Manchester)
Prof. Richard Allmendinger (University of Manchester)
