As of January 2026, Utrecht University’s ArtLab will formally become part of the Centre for Digital Humanities (CDH). ArtLab is an academic heritage laboratory that combines advanced imaging and 3D technologies with expertise in material art history.
Its integration into the CDH will support ArtLab’s continued growth and enable a broadening towards new digital humanities themes and applications. Bringing together humanities researchers who work with innovative digital methods creates opportunities for cross-fertilisation and intellectual exchange.
About the ArtLab
At ArtLab, researchers and students work on location using mobile equipment. They develop accessible research applications, provide training for professionals, and collaborate closely with national and international external partners. ArtLab aspires to be the first laboratory in the Netherlands – and beyond – where material objects and digital methodologies are brought together for the study of art and culture.
Data School is looking for an enthusiastic knowledge valorisation officer/team leader for 16 to 28 hours per week, working on project management, business development, and supervising colleagues.
As a temporary employee covering maternity leave, you will initially fill this position and coordinate Data School’s ongoing projects. There is room for your own input and creative ideas in the form of business development and acquisition.
The NFDI consortium Text+ annually funds so-called cooperation projects, each limited to one calendar year, with the aim of continuously expanding the data and services offered by Text+ and making them available to the research communities in the long term, or of using the data and services available in Text+ for innovative research questions.
The 2025 cooperation projects have now been completed and will present themselves and their results to interested communities in an engaging virtual event on 17 February 2026, 10:00–12:00.
The following projects will present themselves at the event:
Text+ interfaces to the interview collections in Oral-History.Digital (text+oh.d)
Building an open digital collection of historical music-theoretical texts from the German-speaking world, based on examples from the 19th century (DigiMusTh)
LOD role modelling from the indexes of regesta works on the Middle Ages (LRM)
Glossarium Graeco-Arabicum – Open Data (GlossGA-OD)
The Belvedere Research Center is pleased to present the eighth edition of its conference series on the digital transformation of art museums. A keynote lecture, four thematic online sessions, an on-site workshop, and a panel discussion will shed light on current developments, ethical tensions, and concrete examples from practice. At its centre is the question of how museums can take on digital responsibility and actively contribute to strengthening an open, reflective information culture.
For this year’s keynote lecture, Oonagh Murphy, Senior Lecturer in Digital Culture and Society at Goldsmiths, University of London, will present her talk “Responsible AI as a Cultural Imperative”. The full programme and all relevant information, including free registration, can be found on our website: https://www.belvedere.at/digitalmuseum2026
Our brand-new Spring 2026 Training Programme offers a range of exciting new workshops and lectures, including Qualtrics, Small Language Models, and AI & investigative journalism.
Whether you are taking your first steps in the digital humanities or looking to deepen your expertise, our free workshops and lectures provide fresh perspectives and hands-on learning opportunities.
Some sessions are open to all, while others are reserved for staff and students from the Faculty of Humanities and other UU faculties. We look forward to welcoming you in one – or several – of these workshops and lectures.
The sixth workshop on Resources for African Indigenous Languages (RAIL) was held on 10 November 2025 at the CSIR International Convention Centre in Pretoria, South Africa. It was co-located with the Digital Humanities Association of Southern Africa (DHASA) 2025 conference, which took place from 11 to 14 November 2025.
Folktales are literary forms that reveal the soul of a society; they express its wishes, desires, hopes, and beliefs about the world. They feature fictional characters and situations and were mostly oral traditions before they were written down. According to Cynthia McDaniel (1993), folktales can be used in all disciplines to convey knowledge and communicate ideas; they serve as an inherent vehicle for intergenerational communication that prepares and assigns roles and responsibilities to different generations in their communities. They are more pedagogic devices than literary pieces. They cultivate universal values such as compassion, generosity, and honesty while disapproving of attributes such as cruelty, greed, and dishonesty. To illustrate McDaniel's claims, this paper first uses the ideational metafunctional framework found in Systemic Functional Linguistics, which expresses clausal experiences and content from a grammatical perspective, coupled with syntagmatic analysis, which describes the text (folktale) in the chronological order reported by the storyteller. Second, the presentation uses a textual metafunctional framework that fulfills the thematic function of the clause, coupled with paradigmatic analysis, in which the folkloristic text's patterns are regrouped more analytically to reveal the text's latent content, or theme. Voyant Tools, a web-based text reading and analysis environment designed to facilitate the analysis of various text formats, was used to extract and analyze data from a Sesotho folktale and to illustrate how folktales may be integrated with technology for research and educational purposes. This paper employed a descriptive research design that incorporates qualitative (content analysis) and quantitative (statistical analysis) methodologies to analyze and interpret the story. Through Voyant, it is observed that the story is built out of 191 Sesotho word formations; through the ideational analysis, that the storyteller employed more material process types than mental process types; and through the textual interpretation, that oral literature retains value in our daily lives and that folktales may play a significant role in interpreting sociopolitical events in contemporary communities.
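Voyant runs in the browser, so its analysis is not reproduced here; as a rough stand-in for the kind of surface statistics the paper reports (total words and distinct word forms, such as the 191 word formations mentioned above), a minimal sketch might look like the following. The file name is a hypothetical placeholder, not the paper's data.

```python
# Minimal stand-in for Voyant-style surface statistics; "folktale.txt"
# is a hypothetical plain-text file, not the study's actual corpus.
from collections import Counter
import re

with open("folktale.txt", encoding="utf-8") as f:
    text = f.read()

tokens = re.findall(r"\w+", text.lower())  # crude whitespace/punctuation tokenisation
counts = Counter(tokens)

print("total words:", len(tokens))
print("distinct word forms:", len(counts))
print("most frequent:", counts.most_common(10))
```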
South Africa, with its twelve official languages, is an inherently multilingual country. As such, speakers of many of the languages have been in direct contact. This has led to a cross-over of words and phrases between languages. In this article, we provide a methodology to identify words that are (potentially) borrowed from another language. We test our approach by trying to identify words that moved from English into Sesotho (or potentially the other way around). To do this, we start with a bilingual Sesotho-English dictionary (Bukantswe). We then develop a lexicographic comparison method that takes a pair of lexical items (English and Sesotho) and computes a range of distance metrics. These distance metrics are applied to the raw words (i.e., comparing orthography), but using the Soundex algorithm, an approximate phonological comparison can be made as well. Unfortunately, Bukantswe does not contain complete annotation of loan words, so a quantitative evaluation is not currently possible. We provide a qualitative analysis of the results, which shows that many loan words can be found, but in some cases lexical items that have a high similarity are not loan words. We discuss different situations related to the influence of orthography, phonology, syllable structure, and morphology. The approach itself is language independent, so it can also be applied to other language pairs, e.g., Afrikaans and Sesotho, or more related languages, such as isiXhosa and isiZulu.
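As an illustration of the kind of comparison described above (a sketch, not the authors' implementation), the following code computes an orthographic similarity on the raw words and an approximate phonological similarity via a simplified Soundex. The word pairs are illustrative examples of suspected Sesotho-English loanword candidates, not entries drawn from Bukantswe.

```python
import difflib

# Digit classes of the classic Soundex algorithm.
_SOUNDEX = {c: d for cs, d in [
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
] for c in cs}

def soundex(word: str) -> str:
    """Four-character Soundex code (simplified: h/w treated like vowels)."""
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    digits = ["0" if c not in _SOUNDEX else _SOUNDEX[c] for c in word]
    code, prev = [word[0].upper()], digits[0]
    for d in digits[1:]:
        if d != "0" and d != prev:
            code.append(d)
        prev = d
    return ("".join(code) + "000")[:4]

def similarity(a: str, b: str) -> tuple[float, float]:
    """(orthographic, approximate phonological) similarity, both in [0, 1]."""
    ortho = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
    phono = difflib.SequenceMatcher(None, soundex(a), soundex(b)).ratio()
    return ortho, phono

# Hypothetical candidate pairs; a real run would iterate over the dictionary.
for sesotho, english in [("buka", "book"), ("sekolo", "school"), ("tafole", "table")]:
    print(sesotho, english, similarity(sesotho, english))
```

Pairs with high phonological but low orthographic similarity are the interesting cases here, since loanwords are typically adapted to Sesotho orthography and syllable structure.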
The critical lack of structured terminological data for South Africa’s official languages hampers progress in multilingual NLP, despite the existence of numerous government and academic terminology lists. These valuable assets remain fragmented and locked in non-machine-readable formats, rendering them unusable for computational research and development. Mafoko addresses this challenge by systematically aggregating, cleaning, and standardising these scattered resources into open, interoperable datasets. We introduce the foundational Mafoko dataset, released under the equitable, Africa-centered NOODL framework. To demonstrate its immediate utility, we integrate the terminology into a Retrieval-Augmented Generation (RAG) pipeline. Experiments show substantial improvements in the accuracy and domain-specific consistency of English-to-Tshivenda machine translation for large language models. Mafoko provides a scalable foundation for developing robust and equitable NLP technologies, ensuring South Africa’s rich linguistic diversity is represented in the digital age.
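To make the RAG step concrete, here is a minimal sketch of terminology-augmented prompting under stated assumptions: the glossary entries are invented placeholders rather than actual Mafoko records, and a real pipeline would retrieve terms from the full dataset instead of a hard-coded dictionary.

```python
# Sketch of a simple retrieval step: look up source-language terms in a
# glossary and inject the matches into the translation prompt.
GLOSSARY = {  # English term -> Tshivenda equivalent (illustrative placeholders)
    "parliament": "phalamennde",
    "budget": "mugaganyagwama",
}

def build_prompt(source: str) -> str:
    hits = {en: tv for en, tv in GLOSSARY.items() if en in source.lower()}
    term_block = "\n".join(f"- {en} -> {tv}" for en, tv in hits.items())
    return (
        "Translate the following English sentence into Tshivenda.\n"
        f"Use these approved term translations where applicable:\n{term_block}\n\n"
        f"Sentence: {source}"
    )

print(build_prompt("The parliament approved the national budget."))
```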
Contemporary scholarship increasingly recognises the need to document the growing corpus of African literature being produced and distributed via social media and other online platforms. In African literature and the future, Ogundipe (2015) declared that: In the search for a viable path for the future of African literature, a well-crafted vision of the future and effective strategies to engender transformation are imperative. This raises the practical application of the digital space, the internet and related innovative technology as new paradigms of knowledge to African literary engagement. But the absence of a critical standard remains a bane of this development. To address this critical imperative and further explore the prevalence of such works, I collected a dataset to find examples of literary trends and key recent examples of significant works, informed by Moretti's scholarship on distant reading. The dataset focuses on poetry written by younger South African authors from the Born Free Generation, in line with my broader research. The main purpose of this paper is to present my findings and the theoretical and methodological framework that informed them. The paper concludes by briefly proposing some possible means of expanding this research and proposing a large-scale online archival project.
In times of collective discomfort and dissatisfaction, people often find solace in shared adversity on social media platforms like X (formerly known as Twitter). These platforms offer a unique window into the public’s emotions and viewpoints concerning common challenges. In 2022, South Africa experienced an electricity crisis, during which the country was subjected to rolling blackouts, commonly known as load-shedding, by Eskom, the country’s primary electricity provider, to prevent a national electricity grid shutdown. This study conducted a data-driven exploration of the public discourse surrounding Eskom and loadshedding on X using natural language processing and data science techniques. The dataset utilised for this study comprised tweets containing keywords related to Eskom and loadshedding. The study delved into the topics of discussion by applying topic modelling techniques to uncover latent themes within the discourse. The topics were analysed through a multifaceted lens to unpack and highlight patterns within the sentiments, emotions, and biases that underpin conversations related to loadshedding and Eskom. A notable inclusion in the analysis was the incorporation of sarcasm classifications, which enhanced the interpretation of the emotion and sentiment within the topics discussed. The findings uncovered from the analysis were contrasted with loadshedding-related events in 2022 to understand the public discourse as the electricity crisis escalated. The methodology of this study provides a framework for utilising natural language processing techniques to uncover and examine the perspectives of a collective within discourse related to events of shared interest.
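By way of illustration only (this is not the study's pipeline, and the example texts are invented), a minimal topic-modelling step with scikit-learn might look like this:

```python
# Minimal LDA topic-modelling sketch over a handful of invented
# tweet-like texts; a real run would use the full keyword-filtered corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # invented examples, not data from the study
    "eskom announces stage 6 loadshedding again tonight",
    "another blackout my food is going off thanks eskom",
    "grid maintenance delays push loadshedding to stage 4",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for i, comp in enumerate(lda.components_):
    top = [terms[j] for j in comp.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```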
Do you have a question related to digital humanities? Drop by our weekly walk-in hours every Thursday from 14:00 to 15:00. All humanities staff and students are welcome, whether you are a beginner or working at an advanced level.
Sessions take place in person in room 0.32 in the University Library City Centre, with one exception: on 19 March 2026, the walk-in hour will be online only.
Data Brief. 2025 Nov 8;63:112246. doi: 10.1016/j.dib.2025.112246. eCollection 2025 Dec.
ABSTRACT
Anime is an influential medium with global popularity, combining visual aesthetics with narrative depth and offering potential applications in content analysis, style transfer, and emotion recognition within computer vision research. Despite its widespread appeal, publicly available anime character datasets remain scarce. To address this gap, we propose the Attack on Titan: Anime Image Dataset, derived from the popular series Attack on Titan, to support anime-focused computer vision research. The dataset comprises 4041 high-quality images divided into 14 classes, each representing a prominent character from the series. These images are manually collected through high-resolution screenshots, capturing a wide range of character poses, expressions, costumes, and backgrounds. The dataset is suitable for various computer vision tasks, including character recognition, emotion detection, style classification, and domain adaptation.
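As a hypothetical usage sketch (the directory name and layout are assumptions, since the dataset's packaging is not described here), loading a 14-class image set that follows the common one-folder-per-class convention would look like this:

```python
# Assumes one sub-directory per character class under the root folder;
# the path is a placeholder, not a documented part of the dataset.
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = torchvision.datasets.ImageFolder("attack_on_titan_dataset/", transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
print(len(dataset), dataset.classes)  # expect 4041 images across 14 classes
```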
Standards form the basis of sustainable digital research. They ensure that data remain interpretable, findable, archivable, and interoperable over the long term. This is especially central in the digital humanities, where complex language-based resources are created and processed: without consistent standards, technical and conceptual losses threaten the reusability and scholarly significance of data.
The volume addresses a number of areas that are highly relevant to work with language-related data in research institutions, infrastructure projects, and DH consortia in the German-speaking world:
Metadata and annotations
Metadata structure and contextualise research data. The contributions show how metadata secure findability, documentation, and long-term reusability, especially for complex, multi-layered annotations.
Long-term archiving
Digital research data require sustainable storage and format strategies. The volume discusses how standardised workflows and transparent documentation practices keep data usable over many years.
Audiovisual resources
Spoken language, audiovisual materials, and multimodal data place particular demands on formats, transcription, and annotation. The contributions explain established standards and practical challenges in this area.
Character encoding and language varieties
Consistent character encoding is an essential prerequisite for working with text data. The volume explains typical encoding problems and shows why standardised procedures are indispensable for interoperability.
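A two-line example of the kind of typical encoding problem meant here: UTF-8 bytes mistakenly decoded with the wrong codec produce mojibake.

```python
# UTF-8 bytes read back as Latin-1: a classic source of garbled text.
data = "für".encode("utf-8")
print(data.decode("latin-1"))  # -> fÃ¼r (garbled)
print(data.decode("utf-8"))    # -> für (correct)
```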
Entity linking
The semantic interlinking of data collections is becoming ever more important. Entity linking can connect heterogeneous resources with one another and substantially improve searchability and analytical potential.
Why standards matter more today than ever
With the adoption of new technologies, including AI-based analysis methods and large language models, new opportunities arise, but also new challenges. The digital humanities are committed to transparency, reproducibility, and sustainability, and standards are essential to all three.
The developers of the CDH Research Software Lab (RSLab) are currently working on a new upload feature for the text search and exploration tool Textcavator (formerly I-Analyzer). For the final development phase, they are looking for researchers who would like to test this new functionality.
Do you have a dataset you would like to use in Textcavator? Sign up for the pilot and help the RSLab further improve the tool!
What does the new upload feature do?
The upload feature allows researchers to add their own dataset directly to Textcavator. This makes the tool even more accessible and easier to use. The developers are now in the final development stage and would like to test the feature in practice with users.
Who can see my dataset?
You decide. In Textcavator, you can specify for each dataset whether it is:
publicly accessible,
available only within the university,
restricted to a specific group, or
visible only to yourself.
Who can participate in the pilot?
All researchers within the Faculty of Humanities and other faculties at Utrecht University are welcome to participate.
What kind of data can you upload?
Textcavator is designed for collections of texts. You can upload your own research data or an open access dataset you want to use in Textcavator. Both small and large datasets are welcome.
Data must be provided in a CSV or Excel file. The developers can advise you on structuring or cleaning your data if needed.
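For orientation, here is a minimal example of producing a one-document-per-row CSV with pandas. The column names are assumptions for illustration, not Textcavator's required schema; as noted above, the developers can advise on the exact structure.

```python
# Sketch: write a clean, one-document-per-row CSV; field names are
# illustrative placeholders, not a documented Textcavator schema.
import pandas as pd

docs = pd.DataFrame({
    "date": ["1901-05-04", "1902-11-12"],
    "title": ["Letter to the editor", "Market report"],
    "content": ["Full text of the first document...", "Full text of the second..."],
})
docs.to_csv("my_corpus.csv", index=False)
```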
Aim of the pilot
The RSLab has been developing Textcavator since 2017 for the Faculty of Humanities at Utrecht University. The tool is designed to make text search and exploration as accessible and easy as possible. With the new upload feature, Textcavator will become even more efficient.
The pilot is intended to test this new functionality. The developers are ready to help if anything is unclear or not working properly. Your feedback will be used to further improve the upload feature.
The developers would also love to hear your ideas: which features are missing? What kind of support would be useful? What could be improved?
What’s in it for you?
You can explore your own dataset using all Textcavator’s features.
You make a contribution to a more powerful and user-friendly text search and exploration tool for all researchers at Utrecht University (with a focus on the humanities) and beyond. Unlike many other tools, Textcavator is open source and non-commercial. By joining this pilot, you contribute to an accessible, high-quality research tool developed for and with researchers. Both large and small research projects will benefit from your input.
Sign up
Register for the pilot before 15 February 2026 by emailing cdh@uu.nl. After registering, you will receive further instructions.
I-Analyzer is now called Textcavator, a name that better reflects the flagship tool of the CDH Research Software Lab (RSLab). In addition to the new name, the RSLab is introducing an upload feature for adding your own dataset, as well as several new corpora. The Centre for Digital Humanities spoke with scientific developer Luka van der Plas about the updates.
Why the name change from I-Analyzer to Textcavator?
‘Textcavator better reflects what the tool actually does than the name I-Analyzer,’ says Van der Plas. ‘It is primarily designed for exploring texts, rather than for conducting in-depth analysis. Excavator literally means a digging machine, but it is also used figuratively to mean digging into something. And that is exactly what the tool does: it retrieves information from texts.’
What can you do with Textcavator?
‘At its core, it is a search engine: you can search a dataset using keywords that are relevant to your research. It is a comprehensive tool for finding what you are looking for. That is why we refer to it as a text search and exploration tool, rather than a text-mining tool. Afterwards, you can carry out more extensive analyses yourself—qualitative or quantitative—using the search results you download from Textcavator.’
‘The analysis tools we offer within Textcavator—simple statistics and basic visualisations—are intended to help you search as effectively as possible, not to conduct your actual research. You can filter by time period or category and bookmark documents. We also provide visualisations and statistics to help refine search queries. These show, for example, how a search term is distributed across categories or time periods, or which words frequently appear in its context. Depending on the dataset, we also offer more advanced features, such as Word Embeddings and Named Entity Recognition.’
New logo Textcavator
Many text exploration tools already exist. Why did you choose to develop your own?
‘The RSLab began developing what was then called I-Analyzer in 2017. This allowed us to tell researchers: we already have a working tool. We only need to load your dataset into it and perhaps add a button or two. That way, we can support even small projects with limited funding, which I find very rewarding.’
‘Working open source is also important to us. And because we develop the tool within the university, we are not driven by profit: we are truly here for the researcher. We work closely with researchers to develop Textcavator, although external users can also use it. We wanted a tool that is not overly technical, can accommodate many different types of datasets, and is suitable for all disciplines within the humanities.’
Are all datasets in Textcavator public?
‘No. We prefer to make data public, but that is not always possible. Cultural data is often protected by copyright. That is why, when uploading, you can decide who gets access: everyone, only the university, a specific (research) group, or just yourself.’
Which new corpora have been added?
‘In collaboration with the University Library, we have added several new corpora from the publisher Gale, including nineteenth-century British and American newspapers and magazines such as Punch and Illustrated London News. These are great additions to the newspaper corpora we already offer, such as The Guardian and The Times.’
How do you decide which corpora to add?
‘Many researchers bring their own data—collected or cleaned for their research. In addition to joint acquisitions with the University Library, we also occasionally add public corpora for which there is wide demand, such as the KB newspaper corpus, DBNL, Gallica and Le Figaro.’
Which research projects have used Textcavator?
‘The largest is People & Parliament, a leading project in political history, conducted with the University of Jyväskylä (Finland). They needed an efficient tool to search a vast collection of parliamentary debates from across Europe.’
‘Another example is Traces of Sound, a much smaller project. For this, we built a proof of concept in Textcavator using a small set of sources and annotations related to references to sound. This helped the researcher submit a larger grant proposal.’
How accessible is Textcavator for beginners?
‘We specifically focus on researchers with little experience in text and data mining. The tool is designed to be as user-friendly as possible. For those who want more, additional features are available. But even more advanced features, such as Named Entity Recognition, can be used without extensive technical knowledge.’
You are working on an upload feature. What does it entail?
‘The new upload feature allows researchers to add their own datasets directly to Textcavator. Currently, this is always done by us, which makes researchers dependent on our available time. We are now in the final development phase and are therefore organizing a pilot to test the feature together with the research community.’
What do you hope to learn from this pilot?
‘One goal is to identify any bottlenecks. Textcavator is designed for highly diverse data, which can also complicate things. We want to ensure everything works smoothly and clearly before opening the feature to everyone. During the pilot we will receive feedback and can step in immediately if something is unclear or not yet working properly.’
‘We also think it is important that the feature truly aligns with researchers’ needs. For example: how much should be filled in automatically, and how much should users be able to configure themselves? In which file formats would they like to upload their data? Instead of speculating about this behind closed doors, we want to ask users directly.’
Who can participate in the pilot?
‘We are looking for a broad group of researchers. Anyone with data they would like to add to Textcavator can take part. This may be their own research data, but also an open access dataset. A small Excel file with a hundred documents is just as welcome as a large dataset. The only requirement is that you can clean or format the data yourself, if needed.’
Which features could be added if there is demand?
‘In the short term, we aim to make the process user-friendly and accessible, focusing on small adjustments such as additional guidance and feedback. In the long term, we are considering larger expansions, such as more file formats, or even manual data entry.’
‘There are also features already in Textcavator that are not yet offered through the form, such as adding images or word embeddings. These could be valuable additions, but they also make the upload process more complex for researchers.’
What have you learned from developing a tool for so many disciplines?
‘The biggest challenge is maintaining clarity for the user. We continue to add new features, but we want to prevent the interface from becoming overwhelming. It is a constant balance between accessibility and technical possibilities.’
‘And what strikes me is how similar the needs of researchers in the humanities and social sciences actually are. You might expect them to require very different tools, but in practice that is not the case.’
The Centre for Digital Humanities (CDH) invites permanent academic staff of the Faculty of Humanities at Utrecht University to apply for the position of CDH affiliate. Affiliates act as ambassadors and liaisons within their departments, supporting the ambitions of the CDH. Currently, we have three openings in the departments History and Art History, Philosophy and Religious Studies, and Languages, Literature and Communication.
About the role
As CDH affiliate, you will help strengthen the connection between your department and digital humanities. You will do this by:
identifying training and educational needs within your department;
contributing to the development and integration of computational components in BA and MA programmes;
advising colleagues on funding applications that include a digital humanities component;
fostering collaboration between researchers and IT specialists;
contributing to community-building activities.
In addition to these core tasks, each affiliate will pursue an individual project as part of the position.
Affiliates meet periodically with the CDH programme team to strengthen collaboration and exchange.
Position details
The position is open to all permanent academic staff;
Appointment is for a maximum term of 3 years;
The workload is 0.1 fte, funded by the CDH and deducted from teaching duties;
Affiliates have access to the CDH infrastructure, a proportionate budget, and support staff for organizing activities.
New candidates will be appointed by April 2026 and will start in the 2026-2027 academic year.
Application procedure
You can self-nominate and apply directly to the CDH. You can apply for a period of 1 to 3 years, depending on the scope of your proposed project.
Your application should include:
A short CV
A maximum one-page application outlining your initial ideas for an individual project and explaining how your expertise and time will contribute to the CDH strategy and ambitions through this project.
Projects may focus on, for example, organising events or workshops, consortium building, developing relevant educational modules, large SSH infrastructure grant applications, or building networks and projects with external parties.
Your project outline should address:
Problem statement (urgency/relevance and ability to tackle a current challenge);
Aims (addressing current challenges and contributing to lasting change);
Feasibility within allocated time and budget.
About the CDH
The CDH aims to empower all Faculty of Humanities staff and students by enriching their digital competencies and fostering an ethical and critical approach to digital humanities and AI. The CDH does this, among other ways, by offering a wide range of tailored courses, grants, consultancy sessions, and walk-in hours; by connecting humanities researchers and DH specialists; and by deploying an in-house team of research engineers with humanities backgrounds.
More information
Do you have questions about the position or application procedure? Please contact cdh@uu.nl.
Sci Data. 2025 Dec 1;12(1):1804. doi: 10.1038/s41597-025-06241-9.
ABSTRACT
This dataset offers a comprehensive digital catalogue of 483 archaeological settlement sites in western Anatolia dating to the Middle and Late Bronze Age (c. 2000-1200 BCE). Compiled over a decade, it brings together evidence from excavation reports, systematic surveys, historical sources, and remote sensing. Each site is georeferenced and described through a standardized set of metadata, including chronological attribution, site function, material culture, bibliographic references, and associated ancient mineral resources. The dataset is published on Zenodo as a collection of openly accessible files, structured with consistent keys that ensure integration across records. To enhance semantic interoperability, settlement entries are linked to external reference datasets such as open knowledge bases, enabling opportunities for comparative, geospatial, and interdisciplinary research spanning archaeology, digital humanities, and historical geography. By combining standardized metadata with semantic linking, the resource facilitates reuse within broader digital infrastructures. It thereby provides a transparent, openly licensed foundation for analyzing regional settlement systems and encourages more comprehensive approaches to the study of Bronze Age Anatolia.
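A hypothetical access sketch (the file and column names are assumptions, not taken from the published Zenodo record) showing the kind of filtering that standardized, georeferenced metadata enables:

```python
# Placeholder file and field names; adjust to the actual Zenodo files.
import pandas as pd

sites = pd.read_csv("western_anatolia_sites.csv")  # assumed file name
# e.g. filter Late Bronze Age sites and inspect linked identifiers
lba = sites[sites["period"].str.contains("Late Bronze", na=False)]
print(len(lba), "LBA sites")
print(lba[["site_name", "latitude", "longitude", "wikidata_id"]].head())
```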
This article explores the process of developing a digital annotation methodology that allows for the creation of contextual arguments out of entity annotations for both texts and images based on user-defined data models.
Evaluative infrastructures in Indian higher education continue to marginalise digital scholarly work by equating academic value with print-based authorship and closure. This proposed framework imagines something better and more equitable.