
KU Leuven

A Story from the Research Trenches: Erasmus+ in Mannheim

18 October 2024, 16:19

As part of our blog series, “Stories from the Research Trenches,” we often invite researchers and colleagues to share their personal experiences. For this installment, we are delighted to have our colleague André Davids from KU Leuven Library of Economics and Business tell us about his recent Erasmus+ stay at the University of Mannheim. André talks specifically about the opportunity to explore Optical Character Recognition (OCR) tools, a topic on which Faculty of Arts researchers often seek advice. Read about André’s experience learning about various OCR software options, his takeaways on how things are done at the University of Mannheim, and his impressions of the city itself.

Meanwhile, somewhere else: Erasmus+ in Mannheim

Hello, I am André, and in March 2023, as part of the Erasmus+ program, I spent five days at the University of Mannheim. Why did I choose Mannheim? At the Library of Economics and Business I work with OCR (Optical Character Recognition), among other things, and some online research quickly made clear that Mannheim is very active in that field.

I was warmly received at the library by Stefan Weil, one of the most active current developers of the OCR software Tesseract. He told me a lot about the university and the city, but also introduced me to the world of Linux, Ubuntu, and Debian. In addition, I was able to experiment with various OCR tools (Tesseract, eScriptorium, Pero-OCR) and received more information about the OCR-D project.

In Mannheim, they primarily work on the further development of open-source software. Additionally, they offer support to students and researchers in using this software. Once a month, they organize an open online OCR consultation hour in collaboration with the University of Heidelberg, where anyone can ask their OCR-related questions. The “clients” are mainly researchers, but also library staff from other universities.

Also interesting to mention: The library has a room, the ExpLAB, which is dedicated to brainstorming, Design Thinking, etc. This room is fully equipped for brainstorming sessions, but also has Eye-Tracking Stations, Virtual Reality glasses, etc., which can be used by both students and staff.

This Erasmus+ experience not only enriched my knowledge about OCR but also about the city and university. Although Mannheim is a well-known city, I didn’t know much about it myself. Due to its architecture, it was chosen by the Allies in 1940 as a place to experiment with air raids and complete city destruction. As a result, there wasn’t much left of the city after World War II, and it had to be rebuilt. After long debates, the Baroque Palace (Barockschloss) was also rebuilt. Luckily so, because in 1967, the University of Mannheim could establish itself there. This building, with its width of 450 meters, is the second-largest baroque palace in Europe, after Versailles (but – and this is important – it has one more window than Versailles).

[Image: Baroque palace, Mannheim. A large palace in baroque style with a flag flying above the center entrance.]

Navigating the city was quite a challenge since the city center has no street names but has been divided into squares since the 17th century. The most striking street is the one in front of the university, the “Kurpfälzer Meile der Innovationen” (Palatinate Mile of Innovation), which has 42 bronze plaques on the ground honoring famous innovators such as Carl Benz (automobile), Karl von Drais (precursor to the bicycle), Werner von Siemens, and others. Maybe an idea for KU Leuven?

What stuck with me most in terms of their work culture is the Teams channel called “Mittagessen” (lunch). This is where colleagues arrange lunch plans. This is also how I met a colleague who, as a student, did Erasmus at KU Leuven. I still don’t fully understand their working hours. Apparently, they work 40 hours a week, but I was always the first one there and one of the last to leave… Maybe they calculate time differently there. Everywhere is different, but a lot is still familiar. I look back very positively on my trip to another library and can highly recommend it to everyone.

Also worth watching is the university library’s introductory video.

Training: “From Paper to Files – How to Efficiently Digitize (Historical) Documents”

26 October 2022, 22:02

Program

 

The Faculty of Arts and KU Leuven Libraries jointly organize a 2-day training (22/11 and 23/11) focusing on “self-made” document digitization for researchers and staff members. Different modules (which can be followed independently of each other) will teach you: (1) how to photograph archival materials to ensure the best possible quality; (2) how to create an ideal research data management (RDM) workflow for the digitized materials; and (3) how to apply Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) to automatically transcribe text from images.

Practical Details

Module 1 (day one): Digitization and Research Data Management

  • Date: Tuesday 22 November
  • Time: 10:00-16:00 (1 hour reserved for lunch on your own)
  • Location: Room M00.E67 – collaborative learning room, Leercentrum AGORA

Module 2 (day two): OCR and HTR

  • Date: Wednesday 23 November
  • Time: 9:30-15:30 (1 hour reserved for lunch on your own)
  • Location: Colloquium, University Library

Want to join?

The workshops will take place in person and will be hands-on in nature. To register for one or both modules, click here.

Webinar: Transcription and OCR tool Transkribus on May 31 (in Dutch)

19 May 2022, 17:58

Are you a Dutch speaker who needs to transcribe old or new handwritten materials, or to run optical character recognition (OCR) on print materials? Check out this upcoming webinar on Transkribus in Dutch, taking place on May 31, 2022, at 16h CEST:

This webinar by Dr. Annemieke Romein gives an overview of the basics of Transkribus in Dutch. You will learn how to upload documents to Transkribus, run the layout analysis, make manual transcriptions to generate training data, use the automated recognition, which public models we offer, how training your own model works, and how to search your documents for specific words and phrases. We will go through the workflow step by step, and you will have the chance to ask questions via the chat.

You do not need to register to take part in this webinar (it will last about 45 minutes plus time for questions); you can access it via this link: https://youtu.be/xe-OTS48FK

Recap: March 2022 DH Virtual Discussion Group

1 April 2022, 14:47

The first meeting of the spring 2022 edition of the DH Virtual Discussion Group for ECRs in Belgium kicked off on Monday 21 March with a presentation from Gianluca Valenti (University of Liège). We had a total of twenty attendees—some new faces and some familiar ones—who all contributed to an engaging conversation about digital humanities. 

This session followed our standard format, which opens with a greeting from the organizers, Julie M. Birkholz (KBR and Ghent University), Margherita Fantoli (KU Leuven), and Leah Budke (KU Leuven). This is followed by our networking session, where new and returning attendees can introduce themselves in a small group, talk about their interests and experiences in DH, and get to know others in the community. This networking moment also allows those of us who already know each other to catch up over a coffee or tea before the main presentation starts and to welcome new members into our community. After the networking moment, the group comes back together to share any upcoming DH events or opportunities. The main event follows, when a member of our community gives us a behind-the-scenes look at a digital project, workflow, or tool.

Gianluca’s “under-the-hood” presentation was titled “Modern Letters and Text Analysis: The ‘EpistolarITA’ Project” and discussed the importance of epistolary texts in historical research. As Gianluca explained, a wealth of correspondence is available to researchers today, but we still lack adequate tools to engage with these materials to the fullest extent. The EpistolarITA project aims to fill this gap by developing the EpistolarITA database, which brings together fifteenth- through seventeenth-century Italian letters and allows users to perform statistical analysis on this corpus. Users can compare a target text to the texts in the corpus, and the database returns similar texts ranked in order of their similarity. To accomplish this, the algorithm uses a number of techniques, including TF-IDF, Word2Vec, and named-entity recognition. The advantage of the database, as Gianluca demonstrated, is that it allows users to draw connections and see patterns that they might otherwise miss. While the full text of the letters is not made available due to copyright restrictions, users can still run text analysis on these materials and obtain results that would otherwise require many visits to the archives, plus the additional work of building the infrastructure that makes this type of analysis possible.
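For readers curious what “ranking texts by similarity” can look like in practice, here is a minimal sketch of just one of the techniques mentioned, TF-IDF, combined with cosine similarity. It is not the EpistolarITA code: the function names, the smoothed IDF formula, and the three sample “letters” are all assumptions made up for illustration, and the sketch leaves out Word2Vec and named-entity recognition entirely.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would do more)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            # (the smoothing is an assumption of this sketch).
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(target, corpus):
    """Return (corpus index, score) pairs, most similar text first."""
    docs = [tokenize(text) for text in corpus] + [tokenize(target)]
    vectors = tfidf_vectors(docs)
    target_vec = vectors[-1]
    scored = [(idx, cosine(target_vec, vec))
              for idx, vec in enumerate(vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented placeholder texts, not taken from the EpistolarITA corpus.
corpus = [
    "vostra signoria illustrissima mi scrive di venezia",
    "ho ricevuto la lettera di vostra signoria",
    "il prezzo del grano a firenze",
]
target = "la lettera di vostra signoria illustrissima"

for idx, score in rank_similar(target, corpus):
    print(f"{score:.3f}  {corpus[idx]}")
```

The second placeholder text ranks first because it shares the most weighted vocabulary with the target, while the third shares no words with it and scores zero.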

The EpistolarITA database is still in the process of being populated, but the official publication is expected this spring. For now, the project site and database are entirely in Italian, but the team hopes to make an English translation available in the future.


If a look behind the scenes of a digital project sounds interesting to you, we would be delighted to have you join us for our next DH Virtual Discussion Group meeting on Monday 25 April from 15h-16h30 CEST! In this session, Montaine Denys from the Flanders Heritage Libraries will take us behind the scenes of the Flanders Heritage Libraries’ digitization projects. Montaine’s talk, titled “Managing the Evaluation of OCR Quality in Flemish Newspaper Collections,” will include a discussion of the project workflow, the creation of a “ground truth” dataset, interpreting results, and the specific challenges they have faced and the lessons they have learned while undertaking this project.  
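As background for that session: a common way to score OCR output against a “ground truth” transcription is the character error rate (CER), the edit distance between the two texts divided by the length of the ground truth. The sketch below is a generic illustration under that assumption and says nothing about the method Flanders Heritage Libraries actually uses; the function names and sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth, ocr_output):
    """Edit distance divided by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


# Hypothetical example: the OCR engine misreads "w" as "vv".
print(character_error_rate("nieuws", "nieuvvs"))
```

A CER of 0 means the OCR output matches the ground truth exactly; the value can exceed 1 when the output is much longer than the ground truth.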

To join us for this session or any future sessions, all you need to do is register for our mailing list. Once registered, you will receive all future emails, including the links to the Zoom meetings. These links are distributed via email on the morning of the event.

The DH Virtual Discussion Group is designed to be a low-threshold way for researchers, particularly early career researchers, to come together and learn about digital humanities. Everyone is welcome to attend and absolutely no DH expertise is required. To see a full overview of this spring’s sessions, click here. If there is a session that seems of interest to you, please do join us! 
