Description
Users increasingly expect to talk to statistics in plain language, while official statistics must remain authoritative, confidentiality-preserving, and fully verifiable. The Statistical Office of the Republic of Serbia (SORS) is developing an AI dissemination chatbot for its official channels under a trust-first design: only public content protected by statistical disclosure control (SDC), transparent referencing and provenance, and zero tolerance for invented indicators, sources, or numeric values. This practical application demonstrates that the main determinants of GenAI quality in official statistics are not model capabilities but the maturity of the standards, metadata discipline, and concept management that the system can reliably draw on.
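One way to enforce the rule that no numeric value may be invented is a post-generation check: every number in a drafted answer must match a value actually returned by the data layer. The sketch below is a minimal illustration under stated assumptions, not the SORS implementation. The function names `extract_numbers` and `ungrounded_numbers` are hypothetical, the number pattern is deliberately simple, and the values used are placeholders rather than published statistics.

```python
import re
from decimal import Decimal

# Deliberately simple pattern: unsigned integers and dot-decimals such as 2023 or 12.5.
# Assumption: thousands separators, signs and Serbian-style decimal commas
# (e.g. "12,5") are normalized before this check runs.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> set[Decimal]:
    """Return every numeric literal in the text as a Decimal (so 12.50 == 12.5)."""
    return {Decimal(token) for token in NUMBER_RE.findall(text)}


def ungrounded_numbers(answer: str, allowed_values: list[str]) -> set[Decimal]:
    """Numbers in the drafted answer that match no value returned by the data layer.

    allowed_values should include observation values and dimension labels
    such as the reference period, otherwise years would be flagged too.
    """
    allowed = {Decimal(v) for v in allowed_values}
    return extract_numbers(answer) - allowed


# Placeholder values for illustration only, not published statistics.
print(ungrounded_numbers("The indicator for 2023 was 12.5 percent.", ["12.5", "2023"]))  # set()
print(ungrounded_numbers("It rose to 13.1 in 2023.", ["12.5", "2023"]))  # {Decimal('13.1')}
```

A non-empty result would cause the draft to be blocked or regenerated rather than published. Comparing as `Decimal` rather than as strings avoids false alarms when the model writes 12.50 for a stored 12.5.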
This paper identifies the requirements for standardized data and metadata that make GenAI outputs accurate and auditable, and describes an enterprise architecture pattern that separates semantic retrieval from constrained data access to minimize the risk of hallucination. We focus on canonical concept identity, machine-readable disaggregation through explicit dimensions and enumerations, standardized description templates as retrieval infrastructure, multilingual and multi-script governance, and provenance treated as a response contract. We also discuss how AI can streamline the production and maintenance of standardized metadata within a controlled governance loop: proposing description drafts, mining user vocabulary for synonym governance, detecting concept collisions, and generating regression tests.
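The separation between semantic retrieval and constrained data access can be sketched roughly as follows. This is a minimal illustration, not the SORS architecture. The concept ID `UNEMPLOYMENT_RATE`, the dimension codes, the dataset ID `DATASET_X` and the value are hypothetical placeholders. A plain synonym lookup, covering English and both Serbian scripts, stands in for embedding-based semantic retrieval.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical concept registry: one canonical ID per concept, with governed
# synonyms in English and in Serbian Latin and Cyrillic scripts.
CONCEPTS = {
    "UNEMPLOYMENT_RATE": {"unemployment rate", "stopa nezaposlenosti", "стопа незапослености"},
}

# Hypothetical dimension enumerations: only these codes may reach the data layer.
DIMENSIONS = {
    "SEX": {"T", "M", "F"},
    "TIME_PERIOD": {"2022", "2023"},
}

# Hypothetical fact table keyed by (concept ID, sorted dimension pairs).
# The value is a placeholder, not a published SORS figure.
FACTS = {
    ("UNEMPLOYMENT_RATE", (("SEX", "T"), ("TIME_PERIOD", "2023"))): ("12.5", "DATASET_X"),
}


@dataclass(frozen=True)
class Answer:
    """Response contract: a value never travels without its provenance."""
    concept_id: str
    dimensions: tuple[tuple[str, str], ...]
    value: str
    dataset_id: str


def resolve_concept(user_text: str) -> str | None:
    """Stand-in for semantic retrieval: map user wording to a canonical concept ID."""
    text = user_text.casefold()
    for concept_id, synonyms in CONCEPTS.items():
        if any(s.casefold() in text for s in synonyms):
            return concept_id
    return None


def fetch(concept_id: str | None, dimensions: dict[str, str]) -> Answer | None:
    """Constrained data access: reject any dimension or code outside the enumerations."""
    for dim, code in dimensions.items():
        if code not in DIMENSIONS.get(dim, set()):
            raise ValueError(f"Unknown dimension or code: {dim}={code}")
    key = (concept_id, tuple(sorted(dimensions.items())))
    hit = FACTS.get(key)
    if hit is None:
        return None  # no data: the chatbot should say so instead of producing a number
    value, dataset_id = hit
    return Answer(concept_id, key[1], value, dataset_id)


concept = resolve_concept("Kolika je stopa nezaposlenosti u 2023?")
print(fetch(concept, {"SEX": "T", "TIME_PERIOD": "2023"}))
```

The design choice is that only the retrieval step is fuzzy. The data layer accepts nothing but canonical concept IDs and enumerated codes. As a result, a retrieval mistake surfaces either as an explicit "no data" outcome or as a wrong concept whose provenance is visible, never as an invented number.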