
Textcavator renewed: new name, new upload feature, and new corpora

By masch001
12 December 2025, 18:21

I-Analyzer is now called Textcavator, a name that better reflects the purpose of the flagship tool of the CDH Research Software Lab (RSLab). In addition to the new name, the RSLab is introducing an upload feature for adding your own dataset, as well as several new corpora. The Centre for Digital Humanities spoke with scientific developer Luka van der Plas about the updates.

Why the name change from I-Analyzer to Textcavator?

‘Textcavator better reflects what the tool actually does than the name I-Analyzer,’ says Van der Plas. ‘It is primarily designed for exploring texts, rather than for conducting in-depth analysis. Excavator literally means a digging machine, but it is also used figuratively to mean digging into something. And that is exactly what the tool does: it retrieves information from texts.’

What can you do with Textcavator?

‘At its core, it is a search engine: you can search a dataset using keywords that are relevant to your research. It is a comprehensive tool for finding what you are looking for. That is why we refer to it as a text search and exploration tool, rather than a text-mining tool. Afterwards, you can carry out more extensive analyses yourself—qualitative or quantitative—using the search results you download from Textcavator.’

‘The analysis tools we offer within Textcavator—simple statistics and basic visualisations—are intended to help you search as effectively as possible, not to conduct your actual research. You can filter by time period or category and bookmark documents. We also provide visualisations and statistics to help refine search queries. These show, for example, how a search term is distributed across categories or time periods, or which words frequently appear in its context. Depending on the dataset, we also offer more advanced features, such as Word Embeddings and Named Entity Recognition.’
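To give a sense of what such downstream analysis might look like, here is a minimal sketch in Python, assuming a hypothetical CSV export with a date column; the actual Textcavator export format may differ.

```python
# Minimal sketch of a downstream analysis on search results exported from
# Textcavator. The file name and the "date" column are assumptions for
# illustration, not the actual export format.
import pandas as pd

results = pd.read_csv("textcavator_results.csv", parse_dates=["date"])

# How is the search term distributed over time? Count matching documents per year.
hits_per_year = results.groupby(results["date"].dt.year).size()
print(hits_per_year)
```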

The new Textcavator logo

Many text exploration tools already exist. Why did you choose to develop your own?

‘The RSLab began developing what was then called I-Analyzer in 2017. This allowed us to tell researchers: we already have a working tool. We only need to load your dataset into it and perhaps add a button or two. That way, we can support even small projects with limited funding, which I find very rewarding.’

‘Working open source is also important to us. And because we develop the tool within the university, we are not driven by profit: we are truly here for the researcher. We work closely with researchers to develop Textcavator, although external users can also use it. We wanted a tool that is not overly technical, can accommodate many different types of datasets, and is suitable for all disciplines within the humanities.’

Are all datasets in Textcavator public?

‘No. We prefer to make data public, but that is not always possible. Cultural data is often protected by copyright. That is why, when uploading, you can decide who gets access: everyone, only the university, a specific (research) group, or just yourself.’

Which new corpora have been added?

‘In collaboration with the University Library, we have added several new corpora from the publisher Gale, including nineteenth-century British and American newspapers and magazines such as Punch and Illustrated London News. These are great additions to the newspaper corpora we already offer, such as The Guardian and The Times.’

How do you decide which corpora to add?

‘Many researchers bring their own data—collected or cleaned for their research. In addition to joint acquisitions with the University Library, we also occasionally add public corpora for which there is wide demand, such as the KB newspaper corpus, DBNL, Gallica and Le Figaro.’

Which research projects have used Textcavator?

‘The largest is People & Parliament, a leading project in political history, conducted with the University of Jyväskylä (Finland). They needed an efficient tool to search a vast collection of parliamentary debates from across Europe.’

‘Another example is Traces of Sound, a much smaller project. For this, we built a proof of concept in Textcavator using a small set of sources and annotations related to references to sound. This helped the researcher in submitting a larger grant proposal.’

How accessible is Textcavator for beginners?

‘We specifically focus on researchers with little experience in text and data mining. The tool is designed to be as user-friendly as possible. For those who want more, additional features are available. But even more advanced features, such as Named Entity Recognition, can be used without extensive technical knowledge.’

You are working on an upload feature. What does it entail?

‘The new upload feature allows researchers to add their own datasets directly to Textcavator. Currently, this is always done by us, which makes researchers dependent on our available time. We are now in the final development phase and are therefore organizing a pilot to test the feature together with the research community.’

What do you hope to learn from this pilot?

‘One goal is to identify any bottlenecks. Textcavator is designed for highly diverse data, which can also complicate things. We want to ensure everything works smoothly and clearly before opening the feature to everyone. During the pilot we will receive feedback and can step in immediately if something is unclear or not yet working properly.’

‘We also think it is important that the feature truly aligns with researchers’ needs. For example: how much should be filled in automatically, and how much should users be able to configure themselves? In which file formats would they like to upload their data? Instead of speculating about this behind closed doors, we want to ask users directly.’

Who can participate in the pilot?

‘We are looking for a broad group of researchers. Anyone with data they would like to add to Textcavator can take part. This may be their own research data, but also an open access dataset. A small Excel file with a hundred documents is just as welcome as a large dataset. The only requirement is that you can clean or format the data yourself, if needed.’

Which features could be added if there is demand?

‘In the short term, we aim to make the process user-friendly and accessible, focusing on small adjustments such as additional guidance and feedback. In the long term, we are considering larger expansions, such as more file formats, or even manual data entry.’

‘There are also features already in Textcavator that are not yet offered through the form, such as adding images or word embeddings. These could be valuable additions, but they also make the upload process more complex for researchers.’

What have you learned from developing a tool for so many disciplines?

‘The biggest challenge is maintaining clarity for the user. We continue to add new features, but we want to prevent the interface from becoming overwhelming. It is a constant balance between accessibility and technical possibilities.’

‘And what strikes me is how similar the needs of researchers in the humanities and social sciences actually are. You might expect them to require very different tools, but in practice that is not the case.’

Currently, scientific developers Luka van der Plas, Jelte van Boheemen, Mees van Stiphout and Ben Bonfil are working on Textcavator alongside their other projects.

Read more about Textcavator here.

Read more and sign up for the pilot here.


Peering through the paint – Revealing the Lucia Master

By masch001
27 October 2025, 23:27

Sjors Nab won the 2025 Humanities RMA Thesis Prize for his technical art-historical research on the Master of the Legend of Saint Lucy. Layer by layer, he explored the paintings of this fifteenth-century Flemish painter, whose real name remains unknown. The Centre for Digital Humanities spoke with him about his research and the techniques he used.

Photo KIK-IRPA

When Sjors Nab was staring at an infrared image of a fifteenth-century painting at three in the morning, he suddenly spotted what looked like a medieval smiley face in one of the underlayers: a simple circle with two eyes, a nose, a mouth, and two big ears. “Someone drew that 550 years ago in the spot where an angel was supposed to be,” Nab laughs. “Later, paint was applied over it. It was never meant to be seen. And suddenly, I’m looking right at it. That’s still amazing.”

A ‘smiley’ as an underdrawing for an angel – IRR (Infrared Reflectography) detail. Master of the Legend of Saint Lucy, Mary, Queen of Heaven, ca. 1485-1500. National Gallery of Art, Washington. Photo and IRR by NGA, Washington.

You call yourself a technical and digital art historian. What does that mean?

“I study paintings using techniques such as infrared and X-ray imaging.” He shows a series of images of the same painting. “Look — here the arch runs slightly higher than in the underpainting, and here the water is painted a bit lower than in the underdrawing.”

You studied the paintings of the Master of the Legend of Saint Lucy. What drew you to him?

“The Master of the Legend of Saint Lucy — not exactly a name that rolls off the tongue,” Nab laughs. “He’s named after the painting The Legend of Saint Lucy. This master was enormously successful in his day and worked for patrons across Europe. Yet in recent decades he has often been dismissed as a copyist or craftsman, not a true artist.”

Master of the Legend of Saint Lucy, The Legend of Saint Lucy, 1480. Sint-Jakobskerk, Brugge. Photo: KIK-IRPA.

“Van Eyck, Van der Weyden, Bosch, and Bruegel are the big names of fifteenth-century Flemish painting. But I wondered: are they so well-known because they were truly the best, or simply because their names survived? More than three-quarters of the paintings from that period can no longer be linked to a specific painter’s name.”

“Of the ten largest surviving Flemish panel paintings from the fifteenth century, two are by the Master of the Legend of Saint Lucy — or, in short, the Lucia Master. This painter, who hasn’t always been highly regarded in the literature, had a remarkably large production. I wanted to know: do the seventy paintings attributed to him actually belong together? What remains of that corpus if I look purely at the technical aspects of his painting style?”

How did you manage to collect all those works, including the infrared and X-ray images?

“That involved a lot of negotiating and pleading. I spent the first month doing nothing but emailing. Luckily, I had the full support of my thesis supervisor, Daantje Meuwissen, who has a lot of experience with this kind of research. My previous internship at the Royal Institute for Cultural Heritage in Brussels was a key advantage. I had contact with curators at the Prado in Madrid and Museum Boijmans Van Beuningen in Rotterdam. The Prado curator found my research interesting, which meant I could tell other museums: the Prado is already on board! Another advantage was that I was willing to publish my thesis under embargo. In the end, I was surprised by how open most institutions were to collaboration. After all, I was just a master’s student emailing museums around the world.”

What do these techniques reveal?

“An X-ray allows you to see through the entire painting, including the wooden panel support. Infrared images show the underdrawing best. I look from back to front: the drawing was made directly on the panel, then came the underpainting, followed by a first layer of paint to define light and shadow, and finally ten to twelve layers of oil paint.”

Different layers visualised by Sjors Nab, based on the St. Nicholas Altarpiece, ca. 1486–1493. Groeningemuseum, Bruges.

“To compare these layers accurately, you need the highest-quality images available. The seventy works I collected varied enormously in quality — from state-of-the-art research photos to a single black-and-white image from 1910. Sometimes you need to draw a few guiding lines. Mostly, it’s a matter of looking for a very long time, comparing all those little squiggles. It’s a kind of guided meditation, really.”

And what have you learned from looking that way?

“The changes between layers reveal how a painter worked. Dieric Bouts, for example, we jokingly call the tap-dance master: in his underdrawings you sometimes see eight hundred feet side by side. The Lucia Master, by contrast, followed a very refined process of gradual improvement — an arm becoming slightly thinner with each layer, tiny dots and strokes to indicate shadow. You see continuous small, inventive adjustments. That’s what I call a creative process, not merely copying.”

Where two little angels were eventually painted, only one angel was prepared in the underdrawing – IRR (Infrared Reflectography) and X-Ray detail. Master of the Legend of Saint Lucy, Mary, Queen of Heaven, ca. 1485-1500. National Gallery of Art, Washington. Photo, IRR, and X-Ray by NGA, Washington.

Are all those works rightly attributed to him?

“I now have a good sense of his working method and noticed a technical consistency throughout. You can even see it in his hesitations: the Lucia Master is always fiddling with the little roof above Mary and Child — the arch rises and falls again. So most attributions in art history turned out accurate. But there are two important works that, in my view, don’t fit. Their underdrawings are extensively revised with five or six stages and with endless charcoal reworking. In his other works, there are always just one or two stages. That process is completely different from his other paintings.”

The thesis prize jury wrote: ‘The significance of his study extends beyond this particular painter, because his method within technical art history can also be applied to other oeuvres.’ What does your approach add to existing methods?

“In technical art history, there’s a strong drive toward the newest technologies and highest resolutions. That yields excellent results, but such projects cost millions. I believe you can also achieve a great deal using existing materials if you systematically bring them together and study them consistently.”

What are your next plans?

“I’m going to work as lab manager at Utrecht University’s ArtLab, an innovative research and teaching centre for this kind of study. We’ll be collaborating closely with museums and heritage institutions to tackle their research questions.”

Read more about the Humanities Thesis Prize winners
Read more about the ArtLab


Interview with CDH director Hugo Quené: past, present and future of the CDH

By yara
5 September 2024, 17:25
Hugo Quené

In 2020, the Centre for Digital Humanities (CDH) at Utrecht University was officially launched. In this interview, prof. dr. Hugo Quené, Professor of Quantitative Methods of Empirical Research in the Humanities and CDH founding director, reflects on the past four years since its inception, and shares his thoughts on the future. How should digital humanities evolve in the coming years? And what is his vision for the CDH?

You have been involved in the Centre for Digital Humanities since before its foundation. What motivates you to take on the role of CDH director?

Honestly, there are many aspects of my work that I enjoy. Most importantly, it is the relevance of digital humanities and the opportunity to work alongside bright, like-minded individuals towards shared goals. It is as straightforward as this: we need computational methods and AI-driven tools to ensure the survival of the humanities. From a historical perspective, failing to embrace technological changes can lead to irrelevance. We are facing a lot of new challenges, such as the rise of generative AI, and as a society, we need to respond. At the CDH, we strive to be proactive, encouraging others within the Faculty of Humanities to recognise the value of these developments and nudge them towards change.

The CDH is continuously working to accelerate the integration of digital humanities in academic research and education. Do you also see any potential dangers in these technological developments?

In my inaugural lecture, I used the example of the tape recorder. This ground-breaking innovation revolutionised musical practices, as well as linguistic and ethnographic research. A similar transformation is occurring now within the digital humanities, particularly with AI and machine learning.

“Technological developments will inevitably impact the future of your humanities teaching and research.”

We are approaching a tipping point where digital methods will be a standard approach in humanities. There is no doubt that we will lose some traditional research methods in the humanities during this transition, e.g. phonetic transcription by ear and hand. For me, the only rational response is to embrace these changes and learn about them, rather than fearing them. Being overly optimistic is not helpful either. So, I advocate for a balanced approach: remaining cautious about the technology itself, critically reflecting on the accompanying changes, while also embracing the new and innovative opportunities they offer us as humanities scholars and educators.

What inspired you and your team to establish the Centre for Digital Humanities at Utrecht University?

In 2019, we began the process of establishing the CDH. As Utrecht University has a very large Faculty of Humanities, many individuals were already involved in various digital and computational projects at the time. However, these efforts were spread out, quite diverse, and largely disconnected. Our goal was to bring all these people together to foster closer collaboration and stronger synergy in their work, with a shared purpose across different branches of the humanities.

Centre for Digital Humanities

The Centre for Digital Humanities consists of the Research Software Lab, Data School, Institute for Language Sciences Labs and Humanities IT. The CDH also has affiliated members and an advisory board, and works in close collaboration with the Digital Humanities Team at the University Library, where the Digital Humanities Workspace is also hosted.

Looking back over the past four years, what successes have you achieved since the launch of the CDH in 2020?

Firstly, our Research Software Lab has grown significantly since the inception of the CDH and remains a rare example of its kind. Unlike the typical setup where software developers are placed within IT departments or at the central university level, our developers work directly alongside humanities researchers. This approach has attracted considerable interest from other faculties and universities.

“Our Research Software Lab is uniquely specialised, as all our scientific developers bring a strong humanities background to their work.”

Like other developers, they enjoy creating software. However, our developers also hold degrees in the humanities and have conducted their own digital research in this field. This allows them to collaborate effectively with humanities researchers, diving deeper into the theoretical background and research design. As a result, their work is often a co-production of knowledge and software.

Another thing I am really proud of is the realisation of the Digital Humanities Workspace at the Utrecht University Library, and the weekly walk-in hours we host there. This initiative would not have been possible without the close collaboration with the University Library. Together with their DH Team, we form a group of digital humanities experts capable of tackling complex issues from multiple perspectives. The weekly walk-in hours are open to all UU humanities staff and students, providing easy and direct access. Consultations are also available outside the DH walk-in hours on a case-by-case basis. You can come to us with any questions, whether you are a beginner or an expert. For example, we often provide personalised consultancy where the CDH advises on the computational and quantitative aspects of a funding proposal, aiming to increase your chances of success.

The Digital Humanities Workspace

Our most recent success is something I have been passionate about since getting involved in digital humanities. As a teaching professor and a former director of education, I realised the need to integrate digital humanities into our curriculum. My idea was that if we train students in digital methods, they will naturally gravitate towards these topics, thus encouraging teachers to incorporate digital humanities into their research and teaching. So we set out to change the intended learning outcomes (eindtermen) and learning trajectories (leerlijnen) to include ‘digital literacy’ in the form of digital methods and skills.

Starting this academic year 2024/2025, ‘digital literacy’ will become an intended learning outcome in all bachelor programs within the UU Faculty of Humanities, beginning with first-year courses. This change also requires teachers to familiarise themselves with digital humanities, if they have not already done so. They need to prepare students for today’s world, where digital corpora, audio-visual material, and tools such as ChatGPT are already a reality.

Hugo Quené will speak at the CDH seminar for teachers on 16 September, 2024, about implementing ‘digital literacy’ in humanities education.

“Students will enter a world where digital tools and methods are everywhere.”

The CDH played a crucial role in the process of getting ‘digital literacy’ implemented in all humanities programs: we created a clear roadmap and provided detailed documentation on how to implement the changes. Let me stress that this success absolutely was not mine alone—it was a team effort at the CDH and we owe a great deal to prof. dr. Iris van der Tuin and the CDH affiliated members, who meticulously tailored ‘digital literacy’ to fit each specific discipline within the humanities. Other universities and organisations admire our success in adapting the overall intended learning outcomes and trajectories—it is something no one else has achieved yet. I have been invited by various institutions to share our approach.

Does the CDH team also provide direct instruction to humanities staff and students?

Yes, we do. From the very beginning, the CDH has offered a staff education program twice per academic year, in close collaboration with the DH Team at the Utrecht University Library. While the program was initially focused on supporting humanities staff, it has since expanded to include students, particularly with the introduction of ‘digital literacy’ as an important component in their studies. Our education program is now open and free for all participants from the Faculty of Humanities (UU).

The CDH education program is unique in its size, scope, and focus. We offer diverse courses tailored to the specific digital methods relevant to each discipline, recognising that participants do not need to know everything about ‘digital humanities’—just what is pertinent to their field. If our standard offerings do not meet a particular need, we can provide custom workshops or assist with funding for further external training. We can also provide instructors to teach in humanities courses.

Most of our courses and workshops are designed for beginners in digital humanities, though they often attract participants at all levels, including professors. Unlike some other centres that focus on specialised or niche courses, we deliberately aim to broaden the reach of digital humanities and involve as many people as possible.

“Digital humanities is not just for specialists; it is essential for everyone in today’s digital world.”

It seems like the CDH is making great progress. Do you also foresee obstacles for the CDH in the future?

One of the challenges is that ‘the digital’ does not come naturally to many who choose humanities as their field of study. Some students enter the Faculty of Humanities specifically to avoid working with computers, which can lead to resistance among both students and staff. They may feel that digital technology is not relevant to their projects.

As digital humanities experts at the CDH, our role is to demonstrate that digital tools are indeed highly relevant to the entire humanities discipline. Many of the principles behind digital methods, such as formalisation through coding schemes, are already familiar concepts in the humanities. For instance, using multiple-coloured markers to highlight a text is similar to basic digital analysis, where each colour represents something different. Today, tasks like these can be easily done on a computer or with AI, making digital technology less foreign to the humanities than it might seem. The CDH tries to address this epistemological distance through leading by example: if we can learn these tools and methods, then so can you.

“We demonstrate to humanities staff and students that computational methods and quantitative analyses are far more accessible than they might imagine.”

A significant challenge facing both the CDH and the entire Faculty of Humanities is the current financial uncertainty. The future of academic funding in the Netherlands, particularly in the humanities and social sciences, is unclear, making it difficult for us to prepare for the coming years. We had outlined a detailed strategy for the CDH from 2024 to 2028, but potential budget cuts may require adjustments. Regardless of financial difficulties, I expect that computational and quantitative methods will become increasingly important at our faculty and in society.

What is your vision for the future of digital humanities and the role of the CDH over the next four years?

Our vision for the future of the CDH is to further enhance interdisciplinary research and teaching. We aim to increase collaboration within our faculty, with other faculties, and across universities. As we move towards a greater reliance on computational methods, it is crucial to emphasise the importance of Open Science. Over the next four years, I also want to focus more on societal relevance and impact. Data School is a great example of how digital humanities can be leveraged for public engagement.

Example of Data School’s work, more information on the Data School website

Moving forward, we will continue integrating digital literacy as an intended learning outcome in all humanities education programs. Our role at the CDH is to equip bachelor program coordinators and directors of education with the tools, examples, and guidelines needed to effectively implement ‘digital literacy’ in both content and form. We have just begun this transformation, starting with the inclusion of digital literacy in all first-year courses across thirty different humanities bachelor programs. Next, we need to extend this integration to all second- and third-year bachelor programs, as well as master’s programs at the Faculty of Humanities. This process is extensive and will evolve alongside each new cohort of students.

If we are talking about a wishlist, one priority would be to have all CDH teams under the same roof. Currently, we are spread across different buildings, and although we can connect quite easily, sharing a single co-working space would certainly enhance both our collaboration and the overall impact of our work. Having said that, whether we work on the same street or in the same building: all of us at the CDH will continue to support and foster the computational perspective in the humanities.


If you wish to find out more about the Centre for Digital Humanities, you can visit cdh.uu.nl or contact the CDH at cdh@uu.nl.

Registration for the CDH Education Program for the fall of 2024 is still open for staff and students at the Faculty of Humanities at Utrecht University. You can register for free workshops, courses and lectures in digital humanities here:

CDH workshop: Introduction to NVivo (FULL, extra edition on 2 Dec 2024)
18 Nov, 13:00–16:00
This workshop is fully booked. If you are interested in participating, please register for the extra edition of this Introduction to NVivo workshop on 2 December 2024. Explore the…

CDH hands-on lecture: Introduction to Data Ethics
21 Nov, 15:15–16:45
During this hands-on lecture, Iris Muis, Lead Operations at Data School, will talk about data ethics and the Fundamental Rights and Algorithms Impact Assessment (FRAIA). The lecture will be followed…

CDH workshop: Network visualization – introductory Gephi/Python workshop
25 Nov, 13:00–17:00
Jeroen Bakker, researcher at Data School, and Jelte van Boheemen, research software engineer at the Research Software Lab of the Centre for Digital Humanities, will introduce you to the core concepts of…

CDH online guest lecture by Tobias Blanke: Deep Culture – Living with Difference in the Age of Deep Learning
28 Nov, 15:15–16:15
Prof. Tobias Blanke, University Professor of Artificial Intelligence and Humanities at the University of Amsterdam, will discuss his new project: ‘Deep Culture – Living with Difference in the Age of…

CDH workshop: Introduction to NVivo
2 Dec, 13:00–16:00
Explore the fundamentals of qualitative data analysis with NVivo in this entry-level workshop led by Julia Straatman, researcher at Data School. This session is designed to familiarize you with the basic…

Interview with Tijmen Baarda (CDH), Leonard Rutgers & Stefan Dingemans: interactive map of Jewish migration in Europe in Roman times

By yara
20 June 2024, 21:04
From left to right: Stefan Dingemans, Leonard Rutgers, Tijmen Baarda. All photos used in this interview were taken by Annemiek van der Kuil

“Digital research is the future, also in humanities”

Professor Leonard Rutgers and his colleagues research how Jewish migrants spread across Europe in Roman times. Thanks to a grant from the FAIR Research IT programme, the team was able to digitise a large number of sources and incorporate them into an interactive map. Other scholars can now use that map for their research.

When you are researching the Roman period, says Leonard Rutgers, you need to be lucky with the sources that have been handed down: books, inscriptions, what is left of buildings. “Often the record is like a Swiss cheese, full of holes. Fortunately, many sources on Jewish communities have survived. That is why we know that Jews were already living in Europe two thousand years ago.”

As a professor of Late Antiquity, Leonard Rutgers is concerned with the historical spread of the Jewish people across our continent. “This group has kept its own identity throughout the centuries and can therefore be traced, as opposed to other migrant groups in Antiquity, which cannot be traced archeologically within two to three generations. My colleagues and I wanted to take the next step in our research: what underlying patterns can be recognized in Jewish migration?”

Playing with data

The team applied for a grant from the Utrecht University FAIR Research IT Innovation Fund (see text box). With the help of the grant, they wanted to convert the information from all these hundreds of ancient sources into an interactive, digital map showing the migration movements. “We have lots of archeological material, but as long as that information is scattered across several places, you are unable to discover underlying patterns. Nor can you ‘play’ with the data. You can if you digitise the data.”

What is the FAIR Research IT Innovation Fund?
Utrecht University wants every research team to be well supported in the field of research IT. That is why there is the FAIR Research IT Innovation Fund. Scientists can receive a contribution for projects that improve the IT infrastructure of scientific research. Think of projects that ensure there is enough storage capacity for data, or of the development of tools and services that help researchers in their work. When selecting projects, the FAIR and open science principles form the guideline. For instance, other researchers must be able to easily reuse the knowledge and solutions.

Two thousand documents

The team got the grant, and the historians could get to work. Master’s student Stefan Dingemans was tasked with creating an overview of all sources with so-called ‘Jewish settlement evidence’: proof that Jews lived in a particular spot in the Mediterranean in Roman times.

“These sources can be found in dozens of thick books and countless journal articles,” says Stefan Dingemans. “I collected all useful data in one large Excel sheet. What is the origin of the source, what period are we talking about, and what type of source is it? Think for instance of inscriptions on tombs, or the archeological remains of a synagogue. I also included translations of inscriptions. Finally I arrived at a dataset with almost two thousand documents; in our discipline, that is an awful lot.”

“The interactive map shows interesting patterns, providing evidence for our assumptions”

To bring out the underlying migration patterns, the researchers had to take an extra step, because an Excel sheet with two thousand records is not very clear. That is why Leonard Rutgers and Stefan Dingemans asked the Centre for Digital Humanities for help. Its scientific developers were given the job of combining the humanities data with digital methodologies.

From Excel sheet to map

“Stefan asked me to visualize his dataset on a map,” explains developer Tijmen Baarda. “So we started experimenting. Soon that resulted in a beautiful map!” The dataset contains coordinates of all the places where sources about Jewish migrants were found in Classical Antiquity. And the team made use of an existing dataset with coordinates of known places in Antiquity. Tijmen Baarda: “By combining the two, we were able to place all data neatly on the map.”
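As a hedged sketch of that combination step, the join and the map could look like the following in Python; all file and column names are illustrative assumptions, not the project’s actual schema.

```python
# Hedged sketch of the combination described above: match each record in
# the settlement-evidence sheet to coordinates from a gazetteer of ancient
# places, then plot markers on a map. File and column names are invented
# for the example.
import pandas as pd
import folium

evidence = pd.read_excel("settlement_evidence.xlsx")   # e.g. place, period, source_type
gazetteer = pd.read_csv("ancient_places.csv")          # e.g. place, lat, lon

# Keep only records for which the gazetteer provides coordinates.
located = evidence.merge(gazetteer, on="place", how="inner")

m = folium.Map(location=[41.9, 12.5], zoom_start=4)    # roughly centred on Rome
for _, row in located.iterrows():
    folium.Marker([row["lat"], row["lon"]],
                  popup=f"{row['place']} ({row['period']})").add_to(m)
m.save("settlement_map.html")
```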

Do you need help with digitisation or data management? Go to the CDH!
Humanities researchers, teachers and students can also go to the weekly walk-in hours of the Centre for Digital Humanities (CDH). You can also sign up for CDH events or get technical support for your humanities research or education. Just come to a walk-in hour or contact the CDH and we will help you on your way!

Migration patterns become visible

The results are promising, the researchers are happy to say. Leonard Rutgers: “The interactive map shows all kinds of interesting patterns, and provides evidence for the assumptions we already had. For instance, that most Jewish migrants travelled to large cities and easily accessible areas. That makes sense: migrants today also travel to the places that offer the best chance of a better life, and that was no different in the Roman period.” The researchers also put their map next to maps of travel routes used in ancient times. “For example, we saw that most people chose short routes with intermediate stops. That can also be explained: you get to places in between where you can make money to support yourself. Moreover, we discovered that people flocked to areas in the ancient world that were flourishing economically. That meant big cities like Rome, Athens and Alexandria.”

“It is not yet common practice in humanities to conduct data-driven research”

In addition, the map reveals cultural patterns, the professor continues. “We see that the Jewish community integrated strongly into the Roman Empire and took over the local culture. And apparently the ‘receiving’ society was open to this.” For this reason the Roman Empire enjoyed such a long period of success, he emphasizes: it was one big integration machine. “We know that this Roman migration and integration policy came to an end in the course of the fifth century. And that is exactly what we see reflected on our map: from that time onwards the migration of Jews decreased at lightning speed.”

Historians put two-thousand-year-old migration movements on an interactive map with the help of the CDH Research Software Lab

Data search with interface

Nice findings, but that did not mean the team was done. The grant from the FAIR Research IT Innovation Fund was also intended for publishing the data FAIR (see text box). Tijmen Baarda: “We wanted to make the dataset freely available on the internet: Findable and Accessible, the F and A from the acronym FAIR. But the published research data also had to be Interoperable and Reusable, so that other researchers can work with the data in the future.”

What is FAIR?
Utrecht University stands for open science. FAIR data is a part of that. The FAIR principles are a series of instructions for researchers, aimed at storing and publishing research data in the best possible way. The acronym FAIR stands for Findable, Accessible, Interoperable and Reusable. For you, as a researcher, FAIR has several benefits. To make it easier for you, we have developed FAIR Cheatsheets at Utrecht University. There is also the Publishing and Sharing Data Guide.

In collaboration with Stefan Dingemans, Tijmen Baarda and his IT colleagues are building an interface that allows users to easily search and edit the underlying data. “For example, you can filter by the source’s year of origin, or by language – there are inscriptions in Latin, Greek and Aramaic. Researchers more experienced in digital research methods can download the underlying dataset and use it in other applications.”

The research team studying a map

New research methods for historians

It is not yet common practice in humanities to conduct data-driven research, professor Rutgers stresses. “That is why our project had another goal, namely to try out this method within our discipline. Beforehand we were not sure if something useful would come out of it. In that respect, I am really pleased with how things went. Because digital research is the future, also in humanities.”

Tijmen Baarda adds: “In humanities you often have to deal with medium-sized datasets: too big to study manually, too small to run real statistical analyses on. Often no research methods have been developed for these datasets. This project was a step in that direction.”

Link with other data research
For his master’s thesis Stefan Dingemans linked the dataset of this project to worldwide data about land use through the ages. Geologist Kees Klein Goldewijk has been collecting and analysing that data since the 1990s. He recently made a large amount of data available via the Hyde Portal. Kees Klein Goldewijk also received a grant from the FAIR Research IT Innovation Fund for this project. “By connecting our data on migration flows to data on climate, for instance, we discovered more interesting things,” says Stefan Dingemans. “The increase and decrease in the number of migrants show a clear correlation with changes in the climate. For example, we saw in the Levant, an area east of the Mediterranean, that many people moved to the cities in periods of severe drought.” This research is still in its infancy, Stefan Dingemans stresses. “But these initial results are promising.”
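A minimal sketch of that kind of linkage, assuming hypothetical per-period tables rather than the actual HYDE or project data:

```python
# Hedged sketch: align counts of settlement evidence per period with an
# environmental indicator and measure their correlation. All file and
# column names are illustrative assumptions.
import pandas as pd

migration = pd.read_csv("evidence_per_period.csv")  # e.g. period, n_records
climate = pd.read_csv("climate_per_period.csv")     # e.g. period, aridity_index

merged = migration.merge(climate, on="period")
print(merged["n_records"].corr(merged["aridity_index"]))  # Pearson correlation
```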

“There are undoubtedly more historians, linguists or cultural scholars whose hard disk contains an Excel sheet full of data”

Now it is your turn

The team is currently writing a scientific paper about its methods and findings, so colleagues can build on the knowledge and experience gained. Discussions about a larger, international follow-up study are already taking place.

Leonard Rutgers, Stefan Dingemans and Tijmen Baarda have a clear message for fellow researchers who are not familiar with digital data research: just do it! Tijmen Baarda: “There are undoubtedly more historians, linguists or cultural scientists at Utrecht University whose hard disk contains an Excel sheet full of data. You can go to RDM Support with your questions about research data. They are there to offer help and support. Are you a humanities scholar? Then you can also go to the Centre for Digital Humanities. My colleagues and I are happy to help you on your way!”

This interview was originally published on the Utrecht University website on 20 June 2024 as part of a series showcasing projects that received a grant from the FAIR Research IT Innovation Fund. This fund gives scientists a grant for projects that contribute to more FAIR research and data.

SURF interview: prof. dr. Antal van den Bosch on AI and GPT-NL

By yara
19 June 2024, 19:05

Prof. dr. Antal van den Bosch has given an interview to SURF, the cooperative association of Dutch educational and research institutions. The interview is titled ‘AI lets scientists work magic with language’.

In the interview, he speaks about his career in AI and his thoughts on the future of AI. He also shares his ideas on the Netherlands’ own open AI-language model: GPT-NL.

In 2023, the Netherlands started the development of GPT-NL as a joint effort led by SURF, the Netherlands Forensic Institute (NFI) and the Dutch Organization for Applied Scientific Research (TNO) in close collaboration with the academic sector.

You can read the full SURF interview with prof. dr. Antal van den Bosch here.

Prof. dr. Antal van den Bosch is one of the Centre for Digital Humanities’ affiliated members as well as a member of our advisory board. He is a professor of Language, Communication and Computation at Utrecht University.

UU interview: Core team Human Artificial Intelligence

By yara
13 June 2024, 00:28

An interview with Dr. Pim Huijnen and Dr. Evelyn Wan has been published on Intranet, the internal website for Utrecht University staff. Together with Dr. Tejaswini Deoskar and Dr. Dominik Klein, Pim and Evelyn make up the core team Human Artificial Intelligence. The interview is part of the series: ‘educational innovation through the eyes of the four core teams’.

The core team Human Artificial Intelligence, from left: Dr. Pim Huijnen, Dr. Tejaswini Deoskar, Dr. Evelyn Wan, and Dr. Dominik Klein.

The article talks about the role of the core team and their ideas about AI in the humanities, on which the team also wrote a position paper. They emphasize the inherent interdisciplinarity of AI and Pim explains their focus on the humane aspect of AI: “in line with our human norms and values, and with a critical humanities view”. Evelyn concludes: “We have to prepare our students for the contemporary context that we live in”.

Read the full interview here (only accessible to Utrecht University staff).

Interview with Karin van Es and Dennis Nguyen: Exploring the potential of data donations

By masch001
26 March 2024, 23:44

Data donation presents a new and unique approach to collecting digital trace data, liberating researchers from platform dependencies and restrictions. A consortium of six Dutch universities has now started building its own digital data donation infrastructure. Within this initiative, Karin van Es and Dennis Nguyen from the Faculty of Humanities at Utrecht University, supported by Laura Boeschoten (UU) and Niek de Schipper (UvA) of the national D3I team, are conducting two pilot studies: one focusing on the streaming platform Netflix, and the other on the AI chatbot ChatGPT. In this interview, Karin van Es and Dennis Nguyen elaborate on the project’s significance.

Why is a data donation platform necessary?

Karin van Es: ‘It is increasingly hard for researchers to get access to data on online behavior, as big tech companies and online platforms restrict access to their data. The digital data donation infrastructure D3I, funded by PDI-SSH, helps researchers navigate this challenge by asking participants to donate their ‘digital traces’ for research purposes.’

What are data donations?

Van Es: ‘Under the GDPR, individuals have the right to obtain a copy of their personal data held by data processors, delivered as Data Download Packages (DDPs). Data donation offers a new method for collecting such data, giving researchers access to digital trace data donated to their project by participants. Participants request their DDPs, download them to their own devices, and share them with researchers. They don’t donate all the data, just the features of interest to the researcher. Moreover, participants can select what data they eventually want to donate. This method for data collection opens up new avenues for academic research while taking ethical measures to ensure responsible data practices. By leveraging this technique, scholars gain access to datasets that were previously inaccessible.’
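As a rough illustration of that selection step, a script could strip a DDP down to the agreed features before anything is shared; the file layout and field names below are invented for the example, and the D3I infrastructure may implement this differently.

```python
# Hedged sketch of the feature-selection idea: keep only the fields of
# interest from a Data Download Package before sharing anything with the
# researchers. File layout and field names are hypothetical.
import json

FIELDS_OF_INTEREST = {"title", "date_watched"}  # hypothetical feature names

with open("ddp.json", encoding="utf-8") as f:
    ddp = json.load(f)

# Retain only the selected features from each record.
donation = [{k: v for k, v in record.items() if k in FIELDS_OF_INTEREST}
            for record in ddp.get("viewing_activity", [])]  # hypothetical key

with open("donation.json", "w", encoding="utf-8") as f:
    json.dump(donation, f, ensure_ascii=False, indent=2)
```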

What are you exploring with these pilots?

Van Es: ‘For the first pilot, we received 129 Netflix DDPs and surveys with the help of panel recruitment company Ipsos I&O. Netflix is often credited with having disrupted the traditional media landscape through its sophisticated data collection and analysis capabilities, and the perceived effectiveness of its recommendation system. However, the implications of these innovations are still not fully understood. Traditionally, Netflix has kept its viewership data closely guarded, but a recent shift towards greater transparency has emerged. Yet, they still only share information they want to share. As such, Netflix maintains control over the narrative, shaping public understanding of binge-watching, content popularity, and diversity within its catalog. How then can we critically explore these practices? Our pilot critically examines Netflix’s narratives, challenging its selective transparency and shedding light on the dynamics of streaming consumption.’

Dennis Nguyen: ‘We are currently setting up the second pilot, focusing on acquiring ChatGPT DDPs from university students. We are exploring how to conduct the research project in the most ethically sound manner, with input from the privacy officer and the ethics board. The impact of ChatGPT on learning objectives and educational approaches, both in the classroom and academia at large, poses intriguing questions. What is missing in the cycles of hype and fear about GenAI in education is empirical research into the situation on the ground. How students use it in different contexts has only been tentatively explored, mostly through surveys. Limited research is available on how they perceive and harness GenAI services such as ChatGPT for study-related purposes. Researching student interactions through their actual user data collected by ChatGPT/OpenAI opens new paths for better understanding how GenAI entered higher education from a student perspective.’

What’s next?

Nguyen: ‘So far, we find the opportunities really encouraging and we are looking forward to actually working with the data. We are curious to further explore the willingness of participants to donate to research and the quality of the data available. The ultimate goal of this project is making this method more widely available to other researchers. More and more materials, including ethics approval procedures and data management, will be made available online to support others using data donations for research projects in the future.’

About

Karin van Es is Affiliate and Impact Specialist at the Centre for Digital Humanities (CDH). She works as associate professor of Media and Culture Studies at Utrecht University and is part of the GenAI in Education Humanities taskforce.

Dennis Nguyen is assistant professor in computational methods and digital literacy at Media and Culture studies at Utrecht University.

Interested in learning more on data donation? Visit the D3I symposium which also includes a short course on how to prepare your data donations study.

Interview Lorena De Vita: Using Transkribus to decrypt the diaries of a German jurist

By masch001
27 February 2024, 22:23
A number of Otto Küster’s diaries. Photo: Lorena De Vita

During her research Lorena De Vita stumbled upon the personal diaries, spanning from 1932 to 1989, of Otto Küster, a German jurist who dedicated a significant portion of his professional career to seeking reparations for Holocaust survivors. His handwriting, however, was almost illegible. De Vita participated in a Transkribus workshop offered by the UB and CDH and put together a research team to start transcribing the diaries.

On 20 March 2024 De Vita and her postdoc Laura Fahnenbruck will deliver a CDH lecture. This is followed by an optional Transkribus workshop provided by the DH-UB Team.

Data School on impact: ‘We want to take part in shaping the digital society’

By masch001
26 February 2024, 22:44

Data School’s Mirko Schäfer and Iris Muis are keen to push for a broader definition of impact at university level. Although they do not find the time to publish as often as they would like, their research is having significant impact, with their work used in Austria, Sweden, Finland and Greece.

Schäfer: ‘If you really want to achieve an impact, then a trickle down from universities to society will fall short. It is our ambition not only to study society, but also to help shape it.’

Read the interview

FAIR Research IT interview with Berit Janssen

By masch001
23 February 2024, 19:04

This interview is part of a series conducted by UU FAIR Research IT, aimed at showcasing the impact of the projects it supports. The TextMiNER project, led by the CDH Research Software Lab, has been made possible through a FAIR Research IT Innovation Grant.

‘You get much more out of a text corpus with Named Entity Recognition’

I-Analyzer helps you to easily search and visualize text corpora. Soon, its Named Entity Recognition functionality will also allow you to search quickly and easily for entities such as place names, persons or organisations, and to visualize these Named Entities. The functionality is being integrated into I-Analyzer through the TextMiNER project, and contributes to FAIR working and Open Science.

Suppose you are doing research into the theme ‘collaboration’ in English newspapers. You open the I-Analyzer research software and enter your search query for the corpus of The Times. Soon all kinds of data appear on your screen: you see which texts mention something about collaboration, how often it occurs and when. You scroll through some articles and at a glance you see all the places in the text where the word ‘collaboration’ occurs, because they are marked by colours. What’s more: on a topographical map you can see all place names mentioned in these articles.

Until recently this scenario was only a dream for some researchers, but soon that will change, thanks to Research Software Engineer Berit Janssen and her colleagues from the Research Software Lab (RSLab) team of the UU Centre for Digital Humanities. “Our team supports humanities researchers who are using software, for instance as part of large corpus studies on newspaper articles,” Berit Janssen explains. “We advise on the right software for this and also develop software ourselves on request.” The team built I-Analyzer: a tool for searching and visualizing text corpora. It can be used to analyze a large collection of texts from a bird’s eye view (distant reading). “Useful if, for instance, you want to know where your search terms occur, or how the different types of texts are distributed within the corpus. This way you can answer your research question or make a selection that you then analyze further with ‘close reading’, zooming in on the details.”

At the moment you will mainly find newspaper corpora in I-Analyzer, but also, for instance, court records and parliamentary data. “Researchers can work directly with these corpora. If they want to work on another corpus, they submit a request to us.” I-Analyzer has recently become open source, so anyone can use it with their own dataset. This contributes to Open Science. “We are working towards making it easier for researchers to enter corpora without using code. In this way accessibility is increased.”

Named Entity Recognition (NER)

Dr. Berit Janssen, photo by Annemiek van der Kuil, PhotoA

Although researchers gained a lot with the arrival of I-Analyzer, Berit Janssen and her colleagues noticed a need that was not being met. “Researchers regularly asked: can we do something with Named Entity Recognition (NER)? Historians wanted to do research into place names in newspaper articles during the Second World War. Sifting manually through a text corpus takes up a lot of time. NER offers the option of searching a corpus automatically for so-called ‘entities’, such as place names, persons, brand names or years. Although this introduces some more mistakes, it allows you to analyze more texts.” A useful functionality, then, but it has to be available of course. This is how the RSLab team came up with the idea of applying to the FAIR Research IT Innovation Fund for the TextMiNER project. “To make research with Named Entities in I-Analyzer possible, we need to enhance the data. That means that we go through all texts in a corpus in one go and label all Named Entities. For that we use a model within the spaCy software. We store the labels and make them visible in I-Analyzer, so researchers can work with them.”

What is Named Entity Recognition?
Named Entity Recognition (NER) is performed with machine learning models. These models are trained on large amounts of data in which humans have annotated ‘named entities’. This allows the models to predict where place names, personal names and other entities are located in new data.
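As a minimal sketch of such a labelling pass with spaCy (the TextMiNER pipeline itself may differ in model choice and storage format):

```python
# Minimal sketch of batch entity labelling with spaCy, in the spirit of
# the enhancement step described above. Assumes the small English model
# is installed: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
texts = ["The Times reported from London on the debates in Parliament."]

for doc in nlp.pipe(texts):          # nlp.pipe processes texts in batches
    for ent in doc.ents:
        # Each entity comes with a label and character offsets, which can
        # be stored and later highlighted in the interface.
        print(ent.text, ent.label_, ent.start_char, ent.end_char)
```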

Benefits for researchers

Thanks to this project, I-Analyzer users will soon get direct access to the NER functionality. Ideal for researchers working with large amounts of textual data, Berit Janssen says. “For instance, a researcher wanted to track the concept of Fairtrade. In that case, NER can be used to analyze which chocolate brands are mentioned in English newspapers over time. Another research project dealt with family companies. As companies are labeled too, you can search for them in the corpus of annual proceedings of Dutch companies.”

In short, this functionality offers many options for researchers. What exactly can they expect when they start working with it? “Suppose you search a corpus in I-Analyzer. Then you see which entities have been found, and you can apply statistics to this data. Maybe you are comparing several periods: what occurs more frequently, and when? Or you zoom in on the retrieved data: you see the sentence containing the entity, along with its label and content.” The team also wants to represent the Named Entities visually, including histograms and geographical maps on which place names are plotted. In addition, all entities are colour-marked in the texts. In this respect it is important to note that you can only search Named Entities in the enhanced corpora in I-Analyzer, Berit Janssen emphasizes. “Unless you know how to program yourself. The code we are developing is open source, so researchers will soon be able to apply Named Entity Recognition to their text corpora themselves.”


Working FAIR

TextMiNER fits in well with a FAIR way of working, Berit Janssen explains. “For instance, NER makes certain aspects of data easier to find. This project also makes this functionality accessible to researchers in a reusable way. As a team we always try to develop software that can be used more than once and has lasting value. NER is a method that can be used by various disciplines. That is why a project such as TextMiNER is so great: we are systematically creating a solution that helps many researchers.” Moreover, TextMiNER increases ‘interoperability’, because several research techniques are linked, such as filtering, analysing and viewing Named Entities. Does I-Analyzer also allow easy collaboration or the replication of research? “In principle, yes. I-Analyzer can also be accessed by researchers outside Utrecht University. However, both parties then need to have access to the corpus. Unfortunately, not all data within I-Analyzer is in the public domain. This is because the owner of a newspaper corpus is often a publisher, with whom each university has to enter into its own agreement. Enhancements of non-public data are therefore difficult to share. As a researcher you might still be able to indicate which Named Entities have been found without sharing the text itself, but that must be coordinated with the copyright owner.”

Seizing the initiative

Without the Innovation Fund this project could not have got off the ground quickly, according to Berit Janssen. “We have seen for quite some time now that researchers really want this. Some even had some budget, but never enough to enhance large amounts of data, let alone to represent it visually. The Innovation Fund offered a splendid opportunity to finally take this up.” The good thing about this grant, according to her, is that they could apply for it as Research Software Engineers. “Many funds require a researcher as applicant. Now we could take our own initiative to create this solution.” The budget covers the time the development requires. Currently, Berit Janssen is the one mainly working on this. “First I am setting up a pilot by preparing data. I mainly test how we can store the labels in such a way that they can be retrieved quickly.”

From 2024 onwards several team members will start working on the project. “Then we will apply NER on a large scale and develop the visual representation of Named Entities.” Which corpus will be singled out? “Probably The Times. Newspaper corpora are of interest to many researchers. Moreover, this corpus was the first one in I-Analyzer. After that, the debates held in the Dutch parliament might be a good option, since they are publicly available.” In any case, the RSLab keeps exploring how it can help researchers further with smart solutions. “Take for instance the representation of Named Entities on a map. Researchers really wanted that, but we always had to disappoint them. Soon, however, we will be able to say: yes, we can do that!”

Would you like to start working with I-Analyzer? Get direct access here or have a look at the I-Analyzer GitHub pages and the TextMiNER project.

About the FAIR Research IT Innovation Fund
Utrecht University wants each research team to be well supported in the field of research IT. One of the ways to achieve this is through the FAIR Research IT Innovation Fund. Scientists can receive a grant for projects which, for instance, improve the IT infrastructure of scientific research. You may think of projects that enable enough storage capacity for data, or of the development of tools and services that help researchers in their work. FAIR and open science principles are the guidelines when selecting projects. Other researchers must be able to easily and quickly reuse the knowledge and solutions.

This article was originally published here at uu.nl.
