Description
Government statistical agencies increasingly rely on sensitive tabular data to guide evidence-based policymaking, yet restrictions on access to these data hinder research and transparency. Synthetic data generated with Generative Adversarial Networks (GANs) offers a promising solution, but conventional GANs often produce unrealistic tables or fail to preserve the statistical relationships that matter most to policymakers. In this paper, we introduce a knowledge-informed GAN framework designed specifically for the needs of statistical agencies. Our approach embeds domain knowledge into the generative process in three ways: by enforcing statistical constraints, by preserving cross-attribute correlations, and by leveraging structured priors from probabilistic graphical models. Unlike standard methods, this integration ensures that the generated data not only mimic the observed distributions but also respect the dependencies inherent in census-like datasets. To safeguard confidentiality, we further extend the model with differential privacy mechanisms. Empirical evaluations on benchmark tabular datasets show that our method achieves superior utility-privacy trade-offs. These results suggest that knowledge-informed GANs can offer agencies a practical way to share high-quality synthetic microdata while upholding strong privacy protections.