5th Summer Datathon on Linguistic Linked Open Data

Sponsored and organised by

The main goal of the SD-LLOD datathon is to give people from industry and academia practical knowledge of how to apply Linked Open Data technology to Linguistics and Language Technology. The ultimate goal is to enable participants to migrate their own (or others’) linguistic data and publish them as Linked Data on the Web and/or develop applications on top of Linguistic Linked Data (LLD). One of the main focus points this year will be the use of deep learning and neural approaches to/from LLD.

This datathon series is unique in its topic worldwide and builds on the success of the previous editions in 2015 and 2017 in Cercedilla (Spain), 2019 in Dagstuhl (Germany), and 2022 in Cercedilla again. This edition is supported by COST (European Cooperation in Science and Technology) through NexusLinguarum, the “European network for Web-centred linguistic data science” COST Action CA18209.


During the datathon, participants will be able to:

  • Generate their own Linguistic Linked Data from existing data sources, using visual tools like VocBench and community standards like OntoLex-Lemon.
  • Apply semantic technologies (linked data, knowledge graphs, RDF, SPARQL) to the field of language resources and learn about their benefits and applications for specific use cases, particularly those involving multilingual and/or multimodal aspects.
  • Explore the potential use of embeddings, machine learning, and deep learning techniques in combination with Linguistic Linked Data.


During the datathon, sessions will be organised to cover topics such as:

  • Ontologies and Linked Data
  • The Lexicon Model for Ontologies (OntoLex-Lemon)
  • Integrating documents, annotations and NLP tools with Linked Data and RDF using Web Annotation and NIF
  • Knowledge Graph embeddings and language resources
  • Neural approaches for linguistic data

The programme of the summer datathon will contain three types of sessions:

  1. Seminars to explain theoretical aspects and discuss selected topics.
  2. Hands-on sessions to introduce the basic foundations of each topic, method, and technique, which participants will apply directly through different practical assignments.
  3. Datathon sessions, where participants will work in groups of 3-5 on miniprojects that apply what they have learned and involve the generation and/or use of Linguistic Linked Data.

Participants are encouraged to propose a “miniproject” related to the topics of the datathon, which may include datasets to be converted into linked data. In this edition, we particularly encourage miniprojects that involve machine learning, deep learning, or embedding techniques. A selection of proposals will form the basis for the miniprojects that participants will work on during the datathon sessions. Participants who do not propose a miniproject, or whose miniproject is not selected, will be able to join another miniproject. There will be an award for the best miniproject.


Jorge Gracia, University of Zaragoza, Spain
Christian Chiarcos, University of Augsburg
Dagmar Gromann, University of Vienna
Thierry Declerck
Milan Dojchinovski, CTU, Prague / DBpedia, Germany

Local organisers

Ana Ostroški Anić, Institute of Croatian Language and Linguistics
Kristina Despot, Institute of Croatian Language and Linguistics


The datathon is a sponsored event, and it has no registration fee, but participants are expected to cover the cost of their meals and accommodation at the castle residence. As part of the registration process, applicants are invited to submit a short abstract of their ideas for the datathon (a miniproject proposal, e.g., a description of possible resources to be converted, linked or reused during the datathon, ideas for use cases, etc.).

The general price of accommodation, including all meals, will be around 550 EUR, to be paid on site or via bank transfer. The stay runs from Sunday evening to Friday afternoon, but extensions can be requested individually. The final price is still to be confirmed.

Register here.

Registration will close on 01/05/2023. More than twenty travel grants for attendees will be provided by NexusLinguarum (covering accommodation, meals and travel expenses). More details will follow soon.

Important dates (tentative)

Registration opens: 13/02/2023

Registration closes: 01/05/2023

Notification: 04/05/2023

Datathon: 11/06/2023 to 16/06/2023

COVID-19 statement

The datathon is planned as an in-person event. The local organisers are committed to guaranteeing a safe event. Note that there might be some COVID rules to comply with at the time of the event. These will be announced in due course.


Mehwish Alam, Institut Polytechnique de Paris, France
Michael Cochez, Vrije Universiteit, the Netherlands
Manuel Fiorelli, University of Rome Tor Vergata, Italy
Katerina Gkirtzou, Athena Research Center, Greece
Hugo Gonçalo Oliveira, University of Coimbra, Portugal
Max Ionov, University of Cologne
Diego Moussallem, Paderborn University
Armando Stellato, University of Rome Tor Vergata
Andon Tchechmedjiev, IMT École des Mines d’Alès


Marco Passarotti, Università Cattolica del Sacro Cuore


The datathon will be held in person at Castle Lužnica near Zaprešić, Croatia.

About LLOD and the SD-LLOD datathon series

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data cloud was conceived and is maintained by the Open Linguistics Working Group (OWLG) of Open Knowledge International, and has since been a focal point of activity for several W3C community groups, research projects and infrastructure efforts.

To a large extent, LLOD development has been driven forward by international workshops and accompanying hackathons, organized, for example, in the context of the workshops on Multilingual Linked Open Data for Enterprises (MLODE) in 2012 and 2014 in Leipzig, Germany. Since 2015, these have been organized as biennial summer schools: the first Summer Datathon on Linguistic Linked Open Data (SD-LLOD’15) was held in June 2015 in Cercedilla, Madrid, Spain, as was the second (SD-LLOD’17) in July 2017. The 2019 edition was organized in conjunction with, and held before, the 2nd International Conference on Language, Data and Knowledge (LDK-2019, May 20th-22nd, Leipzig, Germany).

Notable outcomes of earlier editions include the first installment of the LLOD cloud and the LLOD cloud diagram (as a result of MLODE-2012), a large number of converted resources, and numerous scientific publications and thesis projects that build on successful miniprojects, experiments or case studies conducted at or initiated during previous SD-LLOD datathons.


SD-LLOD-23 Slides


SD-LLOD-23 Materials