The SD-LLOD datathon aims to give people from industry and academia practical knowledge of how to apply Linked Open Data technology to Linguistics and Language Technology. The ultimate goal is to enable participants to migrate their own (or others’) linguistic data and publish them as Linked Data on the Web and/or develop applications on top of Linguistic Linked Data (LLD). One of the main focus points this year will be the use of deep learning and neural approaches in combination with LLD.
This datathon series is unique in its topic worldwide and continues from the success of the previous editions in 2015 and 2017 in Cercedilla (Spain), 2019 in Dagstuhl (Germany), and 2022 in Cercedilla again. This edition is supported by COST (European Cooperation in Science and Technology) through NexusLinguarum, the “European network for Web-centred linguistic data science” COST Action CA18209.
During the datathon, participants will be able to:
- Generate their own Linguistic Linked Data from existing data sources, using visual tools like VocBench and community standards like OntoLex-Lemon.
- Apply semantic technologies (linked data, knowledge graphs, RDF, SPARQL) to the field of language resources and learn about their benefits and applications for specific use cases, particularly those involving multilingual and/or multimodal aspects.
- Explore the potential use of embeddings, machine learning, and deep learning techniques in combination with Linguistic Linked Data.
During the datathon, sessions will be organised to cover topics such as:
- Ontologies and Linked Data
- The Lexicon Model for Ontologies (OntoLex-Lemon)
- Integrating documents, annotations and NLP tools with Linked Data and RDF using Web Annotation and NIF
- Knowledge Graph embeddings and language resources
- Neural approaches for linguistic data
The programme of the summer datathon will contain three types of sessions:
- Seminars to explain theoretical aspects and discuss selected topics.
- Hands-on sessions to introduce the basic foundations of each topic, method, and technique, which participants will apply directly through different practical assignments.
- Datathon sessions, where participants will work in groups of 3-5 on miniprojects, applying what they have learned to the generation and/or use of Linguistic Linked Data.
Participants are encouraged to propose a “miniproject” related to the topics of the datathon, which might include datasets to be converted into linked data. In this edition, we particularly encourage miniprojects that involve machine learning, deep learning, or embedding techniques. A selection of proposals will form the basis for the miniprojects that participants will work on during the datathon sessions. Participants who do not propose a miniproject, or whose miniproject is not selected, will be able to join another miniproject. There will be an award for the best miniproject.
The datathon is a sponsored event, and it has no registration fee, but participants are expected to cover the cost of their meals and accommodation at the castle residence. As part of the registration process, applicants are invited to submit a short abstract of their ideas for the datathon (a miniproject proposal, e.g., a description of possible resources to be converted, linked or reused during the datathon, ideas for use cases, etc.).
The general price of accommodation, including all meals, will be around 550 EUR, to be paid onsite or via bank transfer. The stay is from Sunday evening to Friday afternoon, but extensions can be requested individually. The final price is to be confirmed.
Registration will close on 01/05/2023. More than twenty travel grants for attendees will be provided by NexusLinguarum (covering accommodation, meals and travel expenses). More details will follow soon.
Important dates (tentative)
Registration opens: 13/02/2023
Registration closes: 01/05/2023
Datathon: 11/06/2023 to 16/06/2023
The datathon is planned as a physical event. The local organisation is committed to guaranteeing a safe event. Note that there might be some COVID rules to comply with at the time of the event. These will be announced in due course.
TUTORS AND SPEAKERS
GETTING TO THE VENUE
There is no direct public transport connection from Zagreb to the venue. The easiest way to reach Castle Lužnica is to take a taxi from the airport, which should cost 25-30 EUR.
Arriving at Zaprešić by public transport
The closest airport is Zagreb Franjo Tuđman Airport. Here you can find information on how to get to Zagreb from the airport. From Zagreb, you can either take the train to Zaprešić from the main train station, or take tram 2 or 6 to Črnomerec and then bus 172 from there. Castle Lužnica is only a couple of minutes away from Zaprešić bus terminal, so you can take a taxi from there.
ABOUT LLOD AND THE SD-LLOD DATATHON SERIES
In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data cloud was conceived and is maintained by the Open Linguistics Working Group (OWLG) of Open Knowledge International, and has since been a focal point of activity for several W3C community groups, research projects and infrastructure efforts.
To a large extent, LLOD development has been driven forward by international workshops and accompanying hackathons, as organized, for example, in the context of the workshops on Multilingual Linked Open Data for Enterprises in 2012 and 2014 in Leipzig, Germany. Since 2015, these events have been organized in the form of biennial summer schools: the first Summer Datathon on Linguistic Linked Open Data (SD-LLOD’15) was held in June 2015 in Cercedilla, Madrid, Spain, as was the second Summer Datathon on Linguistic Linked Open Data (SD-LLOD’17) in July 2017. The 2019 edition was organized in conjunction with, and held before, the 2nd International Conference on Language, Data and Knowledge (LDK-2019, May 20th-22nd, Leipzig, Germany).
Notable outcomes of earlier datathon editions include the first installment of the LLOD cloud and the LLOD cloud diagram (as a result of MLODE-2012), a large number of converted resources, and numerous scientific publications and thesis projects that build on successful mini-projects, experiments or case studies conducted at or initiated during previous SD-LLOD datathons.