Speaker
Description
When access to sensitive data is restricted, disseminating synthetic data offers easy access to data that retains the statistical properties of the original data. However, the model used to generate the synthetic data may influence the structure of that data and may thereby affect subsequent predictive analyses. This paper empirically investigates whether the choice of synthesis model affects the performance of predictive models trained on synthetic data.
Each synthesis model is used to generate synthetic data, which is subsequently analyzed using predictive models of the same types as the synthesis models. We evaluate whether the choice of synthesis model influences the performance of these predictive models. For example, CART prediction models might perform systematically better on synthetic data generated using CART models than they perform on the original data. We test this hypothesis using extensive simulations and real data applications.
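The comparison described above can be illustrated with a small sketch. It is not the authors' implementation: the toy data-generating process, the helper names (`synthesize_cart`, `synthesize_linear`, `cart_test_mse`), the choice to synthesize only the outcome while keeping the predictors fixed, and the use of scikit-learn are all assumptions made for illustration. A CART prediction model is trained on the original data, on CART-synthesized data, and on data synthesized with a linear model, and each fit is scored on held-out original data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy "original" data (assumed): a step in x0 plus a linear effect of x1.
n = 2000
X = rng.uniform(-2, 2, size=(n, 2))
y = np.where(X[:, 0] > 0, 2.0, -1.0) + 0.5 * X[:, 1] + rng.normal(0, 0.5, n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)


def synthesize_cart(X, y, rng):
    """CART synthesis of y: resample observed outcomes within each leaf."""
    tree = DecisionTreeRegressor(min_samples_leaf=20, random_state=0).fit(X, y)
    leaves = tree.apply(X)  # leaf index of every record
    y_syn = np.empty_like(y)
    for leaf in np.unique(leaves):
        idx = np.flatnonzero(leaves == leaf)
        y_syn[idx] = rng.choice(y[idx], size=idx.size, replace=True)
    return y_syn


def synthesize_linear(X, y, rng):
    """Parametric synthesis of y: linear prediction plus Gaussian noise."""
    lm = LinearRegression().fit(X, y)
    fitted = lm.predict(X)
    resid_sd = np.std(y - fitted, ddof=X.shape[1] + 1)
    return fitted + rng.normal(0, resid_sd, size=y.size)


def cart_test_mse(X_fit, y_fit):
    """Train a CART predictor and score it on held-out original data."""
    model = DecisionTreeRegressor(min_samples_leaf=20, random_state=0)
    model.fit(X_fit, y_fit)
    return mean_squared_error(y_test, model.predict(X_test))


results = {
    "original": cart_test_mse(X_train, y_train),
    "cart_synthetic": cart_test_mse(
        X_train, synthesize_cart(X_train, y_train, rng)
    ),
    "linear_synthetic": cart_test_mse(
        X_train, synthesize_linear(X_train, y_train, rng)
    ),
}
for name, mse in results.items():
    print(f"{name}: {mse:.3f}")
```

A single run like this says little on its own. A study of the kind described would repeat the synthesis many times, vary the data-generating process, and cross every synthesis model with every prediction model before drawing conclusions.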