26–28 May 2026
Europe/Zurich timezone

Semantic Croissant, CDIF and AI-Driven Data Annotations

Speaker

Slava Tykhonov (CODATA)

Description

This talk will introduce the Semantic Croissant ecosystem created around the Croissant for Machine Learning standard, with a focus on ontology alignment with ML and the linkage of metadata to external controlled vocabularies through the Cross-Domain Interoperability Framework (CDIF). It will also highlight how these components support semantic consistency and interoperability across research domains.
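As a rough illustration of what linking metadata to external controlled vocabularies can look like, the sketch below builds a Croissant-style JSON-LD field description in plain Python. It is not taken from the talk: the `ex:` prefix, the `ex:concept`, `ex:unit` and `ex:fields` keys, the helper `describe_field` and all example.org URIs are hypothetical, and the property names are not checked against the Croissant or CDIF specifications.

```python
import json

# Prefixes for the JSON-LD context. The "ex" namespace is hypothetical and
# stands in for whatever controlled vocabulary a project actually uses.
CONTEXT = {
    "sc": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
    "ex": "https://example.org/vocab/",
}


def describe_field(name, description, data_type, concept_uri, unit_uri=None):
    """Build a Croissant-style field description linked to vocabulary terms.

    The "ex:concept" and "ex:unit" keys are illustrative assumptions,
    not properties defined by Croissant or CDIF.
    """
    field = {
        "@type": "cr:Field",
        "name": name,
        "description": description,
        "dataType": data_type,
        "ex:concept": {"@id": concept_uri},
    }
    # Only attach a unit when one is known for the variable.
    if unit_uri is not None:
        field["ex:unit"] = {"@id": unit_uri}
    return field


temperature = describe_field(
    name="air_temperature",
    description="Air temperature measured at the station",
    data_type="sc:Float",
    concept_uri="https://example.org/vocab/AirTemperature",
    unit_uri="https://example.org/vocab/DegreeCelsius",
)

document = {
    "@context": CONTEXT,
    "@type": "sc:Dataset",
    "name": "example-dataset",
    "ex:fields": [temperature],
}

print(json.dumps(document, indent=2))
```

Pointing each field at a term URI rather than a free-text label is what lets two datasets from different domains be recognised as describing the same variable.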

Participants will also be introduced to Nectar Publisher, a human-in-the-loop platform integrated into the Dataverse data repository. The platform enables researchers to combine Large Language Models (LLMs) with knowledge graphs to support AI-assisted data annotation. It uses generative AI to extract and describe variables in detail, including units of measurement, classes, attributes, and properties, while maintaining human oversight and control.
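The human-in-the-loop idea described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Nectar Publisher's actual design or API: the `Suggestion` class, the `review` function and the sample drafts are all hypothetical, and the drafts are hard-coded where a real system would obtain them from an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Suggestion:
    """A machine-drafted annotation for one variable (hypothetical structure)."""
    variable: str
    description: str
    unit: Optional[str]
    status: str = "pending"


def review(suggestions: List[Suggestion],
           decide: Callable[[Suggestion], bool]) -> List[Suggestion]:
    """Apply a reviewer's decision to every draft and return the accepted ones.

    Nothing is accepted unless `decide` approves it, which keeps the
    final say with the human reviewer.
    """
    accepted = []
    for suggestion in suggestions:
        suggestion.status = "accepted" if decide(suggestion) else "rejected"
        if suggestion.status == "accepted":
            accepted.append(suggestion)
    return accepted


# Hard-coded stand-ins for drafts that a language model might propose.
drafts = [
    Suggestion("temp_c", "Air temperature", "degree Celsius"),
    Suggestion("col_7", "Unknown column", None),
]

# Stand-in for an interactive reviewer: approve only drafts that carry a unit.
published = review(drafts, lambda s: s.unit is not None)
print([s.variable for s in published])
```

In practice the `decide` callback would be replaced by an interactive review step, so that every generated description is inspected before it reaches the repository.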

The presentation is relevant for stakeholders interested in AI-enabled data management, semantic integration, and the development of interoperable, next-generation research infrastructures.
