5th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-23)

Outcome

During the datathon, participants will be able to:

Generate their own Linguistic Linked Data from existing data sources, using visual tools such as VocBench and community standards such as OntoLex-Lemon.
Apply semantic technologies (linked data, knowledge graphs, RDF, SPARQL) to the field of language resources, and learn about their benefits and applications in specific use cases, particularly those involving multilingual and/or multimodal aspects.
Explore the potential use of embeddings, machine learning, and deep learning techniques in combination with Linguistic Linked Data.

Topics

During the datathon, sessions will be organised to cover topics such as:

Ontologies and Linked Data
The Lexicon Model for Ontologies (OntoLex-Lemon)
Integrating documents, annotations and NLP tools with Linked Data and RDF using Web Annotation and NIF
Knowledge Graph embeddings and language resources
Neural approaches for linguistic data

The programme of the summer datathon will contain three types of sessions:

Seminars to explain theoretical aspects and discuss selected topics.
Hands-on sessions to introduce the basic foundations of each topic, method, and technique, which participants will apply directly through different practical assignments.
Datathon sessions, in which participants will work in groups of 3-5 on miniprojects, applying what they have learned to the generation and/or use of Linguistic Linked Data.

Participants are encouraged to propose a “miniproject” related to the topics of the datathon, which might include datasets to be converted into linked data. In this edition, we particularly encourage miniprojects that combine Linguistic Linked Data with machine learning, deep learning, or embedding techniques. A selection of the proposals will form the basis for the miniprojects that participants will work on during the datathon sessions. Participants who do not propose a miniproject, or whose miniproject is not selected, will be able to join another one. There will be an award for the best miniproject.

Programme

Organisers

**Jorge Gracia**
University of Zaragoza, Spain

**Christian Chiarcos**
University of Augsburg, Germany

**Dagmar Gromann**
University of Vienna, Austria

**Milan Dojchinovski**
CTU, Prague / DBpedia, Germany

Local organisers

**Ana Ostroški Anić**
Institute of Croatian Language and Linguistics, Croatia

**Kristina Despot**
Institute of Croatian Language and Linguistics, Croatia

Registration

The datathon is a sponsored event with no registration fee, but participants are expected to cover the cost of their meals and accommodation at the castle residence. As part of the registration process, applicants are invited to submit a short abstract of their ideas for the datathon (a miniproject proposal, e.g., a description of possible resources to be converted, linked or reused during the datathon, ideas for use cases, etc.).

The general price of accommodation, including all meals, will be around 550.00 EUR, to be paid on site or by bank transfer. The stay runs from Sunday evening to Friday afternoon, but extensions can be requested individually. The final price is yet to be confirmed.

~~Register here~~.

Registration will close on 01/05/2023. NexusLinguarum will provide more than twenty travel grants for attendees, covering accommodation, meals and travel expenses. More details will follow soon.

Important dates (tentative)

Registration opens: 13/02/2023

Registration closes: 01/05/2023

Notification: 04/05/2023

Datathon: 11/06/2023 to 16/06/2023

COVID-19 statement

The datathon is planned as a physical event. The local organisation is committed to guaranteeing a safe event. Note that COVID-19 rules may be in force at the time of the event; these will be announced in due course.

Tutors and Speakers

**Mehwish Alam**
Institut Polytechnique de Paris, France

**Michael Cochez**
Vrije Universiteit Amsterdam, the Netherlands

**Manuel Fiorelli**
University of Rome Tor Vergata, Italy

**Katerina Gkirtzou**
Athena Research Center, Greece

**Hugo Gonçalo Oliveira**
University of Coimbra, Portugal

**Max Ionov**
University of Cologne, Germany

**Diego Moussallem**
Paderborn University, Germany

**Armando Stellato**
University of Rome Tor Vergata, Italy

**Andon Tchechmedjiev**
IMT École des Mines d’Alès, France

Invited Speaker

**Marco Passarotti**
Università Cattolica del Sacro Cuore

Venue

The datathon will be held physically at Castle Lužnica near Zaprešić, Croatia.

About LLOD and the SD-LLOD datathon series

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data cloud was conceived and is maintained by the Open Linguistics Working Group (OWLG) of Open Knowledge International, and has since been a focal point of activity for several W3C community groups, research projects and infrastructure efforts.

To a large extent, LLOD development has been driven forward by international workshops and accompanying hackathons, for example those organized in the context of the workshops on Multilingual Linked Open Data for Enterprises in 2012 and 2014 in Leipzig, Germany. Since 2015, these events have been organized as biennial summer schools: the first Summer Datathon on Linguistic Linked Open Data (SD-LLOD’15) was held in June 2015 in Cercedilla, Madrid, Spain, as was the second Summer Datathon on Linguistic Linked Open Data (SD-LLOD’17) in July 2017. The 2019 edition was organized in conjunction with, and held before, the 2nd International Conference on Language, Data and Knowledge (LDK-2019, May 20-22, Leipzig, Germany).

Notable outcomes of earlier datathon editions include the first installment of the LLOD cloud and the LLOD cloud diagram (as a result of MLODE-2012), a large number of converted resources, and numerous scientific publications and thesis projects that build on successful miniprojects, experiments or case studies conducted at or initiated during previous SD-LLOD datathons.