Speaker
Description
Disruptive change—particularly the rapid adoption of generative AI, machine learning, and multi-source statistics—demands standards and architectures that make official statistics interoperable, transparent, and repeatable. A common barrier is inconsistent use of official geographic identifiers across disseminated datasets and metadata, which increases manual reconciliation, introduces linkage errors, and limits machine-actionability for third-party AI consumption.
This contribution presents an automated, standards-oriented approach implemented at INEGI to harmonize official geospatial identifiers across statistical outputs without requiring ad-hoc transformations in each production pipeline. The approach (i) validates identifier values against the official geographic framework and territorial hierarchies, (ii) detects inconsistencies in variable naming, definitions, and geographic levels, and (iii) assigns compliance tags to datasets that meet institutional specifications, enabling seamless integration at multiple geographic granularities.
A key enabler is the systematic use of the institutional metadata catalogue: metadata descriptions and past harmonization decisions are consolidated into a structured training resource that supports automated recognition and validation of geographic identification variables and their intended levels. The resulting pattern reduces redundant integration work, strengthens multi-source governance, and provides a practical building block for AI-ready dissemination by ensuring that geographic semantics are consistent, well-described, and reliably consumable by humans and machines.