AI for Humanities

2 October 2025, 21:10

As part of our blog series, “Stories from the Research Trenches,” we often invite researchers and colleagues to share their personal experiences and opinions. In this blog post, our colleague Miara Fraikin, AI specialist at KU Leuven Libraries and lecturer in architectural history, shares her perspective on the role of AI in the humanities.

With the launch and rapid adoption of ChatGPT, (Generative) Artificial Intelligence is quite abruptly changing the ways we study, research, and work. Based on the research paper ‘Working with AI: Measuring the Occupational Implications of Generative AI’ from Microsoft, media coverage quickly concluded that translators and historians were the most likely to be replaced by AI, with 98% and 91% of their tasks, respectively, able to be taken over by (generative) artificial intelligence. In a blog published on historici.nl, I argued that AI can do a lot, but it can never replace a historian. Why? Read it for yourself here!

While historians won’t be replaced, the research from Microsoft does suggest that the work of historians will change. For those willing to embrace the new possibilities, this could well be a change for the better. Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) are already well established, and their impact should not be underestimated. Instead of having to read all the published building accounts of Louis XIV searching for mentions of the word ‘chambre,’ a simple search of the documents saved me at least a week’s worth of time during my own PhD research. While preparing an article, I used Transkribus and Copilot Chat to decipher an illegible word in a handwritten manuscript, which put the document in a completely new light. AI is also helping librarians, archivists, and museum professionals describe their collections more quickly and extensively, which in turn makes the historian’s research data more accessible. And look at the Studium.ai and NIKAW projects to see the wealth of possibilities that researchers at KU Leuven are currently exploring for AI and humanities research.

In the future, long days spent in the archive – reviewing document after document, sometimes with no result – might not be as common. And while I understand that this is part of the charm of being a historian, imagine a future where AI enables us to delve into vast collections of historical data, uncovering historical phenomena on grander scales and in greater detail, with new techniques opening up new lines of inquiry. This is something to be excited about!

A Story from the Research Trenches: Erasmus+ Experience in Barcelona

16 January 2025, 19:44

As part of our blog series, “Stories from the Research Trenches,” we often invite researchers and colleagues to share their personal experiences. For this installment of the series, we are delighted to have our colleague Marleen Marynissen from KU Leuven Libraries Nexus Research Data Management team sharing about her recent Erasmus+ experience in Barcelona.

A Journey of Collaboration and Learning: My Erasmus+ Experience in Barcelona

In October 2024, I had the opportunity to participate in the Catalan University Libraries Erasmus Staff Week in Barcelona. This five-day event, held from October 7th to 11th, brought together library professionals from across Europe to exchange experiences, foster collaboration, and explore the evolving role of university libraries in the field of open science.

The week kicked off with an international coffee break and an icebreaker activity, setting a friendly and collaborative tone for the days ahead. Hosted by the Consortium of University Services of Catalonia (CSUC), the event provided a platform for participants to introduce themselves and share their expectations. It was also very interesting and inspiring to see how CSUC facilitates shared services and infrastructures among Catalan universities and research centers, enhancing their efficiency. The first day concluded with a guided tour of Barcelona – allowing us to discover the city’s vibrant culture.

The next day we went to the Universitat Politècnica de Catalunya (UPC), where we explored the research support services offered by the libraries. We learned about the Library’s Research Café, user training programs, and cultural programming in collaboration with UPCArt. This day also featured the first round of participant presentations, including my own, titled “Empowering Open Science, promoting FAIR dataset publication through documentation and metadata enhancement.”

On the third day, we visited the Universitat Pompeu Fabra (UPF), where we focused on teaching support services. We were introduced to La Factoria, a support service for digital production managed by the library and IT. The day also included engaging presentations and discussions.

At the Universitat Autònoma de Barcelona (UAB), we delved into open education and citizen science initiatives. One of the highlights was a collaborative task focused on creating an open educational resource, which allowed us to explore the opportunities and challenges of open education in practice. After visiting the UAB’s facilities and enjoying several participant presentations, we concluded the day with a social dinner.

On the final day we had a session to reflect on our shared experiences and key takeaways. The program concluded with a visit to the Catalan National Library.

The Catalan University Libraries Erasmus Staff Week in Barcelona was an unforgettable experience. Each location we visited enriched the program by showcasing its unique approach to library services and open science. The participant presentations offered fresh perspectives and it was really nice to meet and exchange ideas with colleagues from across Europe. Of course, the beautiful and dynamic city of Barcelona added an extra charm to the entire event. This week was a perfect blend of learning, networking, and cultural discovery, and I am grateful for the chance to be part of it.

A Story from the Research Trenches: Erasmus+ in Mannheim

18 October 2024, 16:19

As part of our blog series, “Stories from the Research Trenches,” we often invite researchers and colleagues to share their personal experiences. For this installment of the series, we are delighted to have our colleague André Davids from KU Leuven Library of Economics and Business share about his recent Erasmus+ stay at the University of Mannheim. André talks specifically about the opportunity to explore Optical Character Recognition (OCR) tools, a topic that Faculty of Arts researchers often seek advice about. Read about André’s experience learning about various OCR software options, his takeaways on how they do things at the University of Mannheim, and his impressions about the city itself.

Meanwhile, somewhere else: Erasmus+ in Mannheim

Hello, I am André, and in March 2023, as part of the Erasmus+ program, I spent five days at the University of Mannheim. Why did I choose Mannheim? My work at the Library of Economics and Business involves, among other things, OCR (Optical Character Recognition), and after some online research it quickly became clear that they are very actively engaged in that field there.

I was warmly received at the library by Stefan Weil, one of the most active current developers of the OCR software Tesseract. He told me a lot about the university and the city, but also introduced me to the world of Linux, Ubuntu, Debian. In addition, I was able to experiment with various OCR software (Tesseract, eScriptorium, Pero-OCR) and received more information about the OCR-D Project.

In Mannheim, they primarily work on the further development of open-source software. Additionally, they offer support to students and researchers in using this software. Once a month, they organize an open online OCR consultation hour in collaboration with the University of Heidelberg, where anyone can ask their OCR-related questions. The “clients” are mainly researchers, but also library staff from other universities.

Also interesting to mention: The library has a room, the ExpLAB, which is dedicated to brainstorming, Design Thinking, etc. This room is fully equipped for brainstorming sessions, but also has Eye-Tracking Stations, Virtual Reality glasses, etc., which can be used by both students and staff.

This Erasmus+ experience not only enriched my knowledge about OCR but also about the city and university. Although Mannheim is a well-known city, I didn’t know much about it myself. Due to its architecture, it was chosen by the Allies in 1940 as a place to experiment with air raids and complete city destruction. As a result, there wasn’t much left of the city after World War II, and it had to be rebuilt. After long debates, the Baroque Palace (Barockschloss) was also rebuilt. Fortunately so, because in 1967 the University of Mannheim was able to establish itself there. This building, 450 meters wide, is the second-largest baroque palace in Europe, after Versailles (but – and this is important – it has one more window than Versailles).

A large palace in baroque style with a flag flying above the center entrance.

Baroque palace, Mannheim.

Navigating the city was quite a challenge since the city center has no street names but has been divided into squares since the 17th century. The most striking street is the one in front of the university, the “Kurpfälzer Meile der Innovationen” (Palatinate Mile of Innovation), which has 42 bronze plaques on the ground honoring famous innovators such as Carl Benz (automobile), Karl von Drais (precursor to the bicycle), Werner von Siemens, and others. Maybe an idea for KU Leuven?

What stuck with me most in terms of their work culture is the Teams channel called “Mittagessen” (lunch). This is where colleagues arrange lunch plans. This is also how I met a colleague who, as a student, did Erasmus at KU Leuven. I still don’t fully understand their working hours. Apparently, they work 40 hours a week, but I was always the first one there and one of the last to leave… Maybe they calculate time differently there. Everywhere is different, but a lot is still familiar. I look back very positively on my trip to another library and can highly recommend it to everyone.

Also interesting to see is the university library’s introductory video.

Interview Series: In Conversation with BiblioTech Hackathon Participants

18 November 2023, 02:34

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Lode Moens, BiblioTech Hackathon participant. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Lode’s group, called StudentsBlock, worked on the Magister Dixit dataset, which features a collection of handwritten lecture notes from the Old University of Leuven (1425-1797). You can learn more about the team’s work by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

Five people stand in front of a stone building with columns, smiling towards the camera.

Members of the StudentsBlock team pose for a photo at the reception of the closing event.

What first interested you in the hackathon? Have you done one before? What is your background?

I’d never done a hackathon before; since this was the first one directly associated with the Faculty of Arts, [I decided to participate]. I was interested in it because my friend was going to apply, and she asked if I was interested. I’m not a huge ICT nerd but wanted to improve my skills, so it sounded like a good opportunity. So I looked at the datasets, and they sounded interesting. [I know] Professor Fantoli; I do some student work for her and thought why not. I have no experience programming, but I’m always eager to learn new stuff. I study Greek and Latin and a bit of Theology as part of my master’s.

What was your primary concern when beginning the project? The project? The process?

My first concern was whether we were going to find a subject. We had a lot of ideas, but not many of them were super straightforward. So it took a while to establish a clear idea. Even then, it was a bit chaotic, but it was nice to see people taking the lead, and our team leader did a good job at coordinating everything.

What kind of audience did you have in mind?

Our project was to optimize an existing database of student notes. Our first target audience was researchers – students, professors, PhD students, etc. of the Classics, history, philosophy, theology, law, basically anything that the notes represent, as well as anyone doing research on the Old University of Leuven for a Bachelor’s or Master’s thesis. Secondly, just people interested in the Old University and the courses. Yannick and Linde [team members] were also both familiar with notes from the University and are trying to reconstruct the old student and professor notes. So that was the less abstract audience. Very topical and applicable. The jury asked us what we would do if we couldn’t read Latin, but transcribing all this material would take some time.

How did you establish your methodology and approach to the data set? Were you inspired by any other platforms or projects?

We had a PhD student on our team who comes from the Faculty of Economics and is now working in DH. She absolutely loves data and was able to figure out what we should do. We had three people working on the XML file. They cleaned the data and structured it. We had a specific goal in mind; we wanted to structure the data according to students, dates, etc., but we had some unorganized data. We also had two people working on the presentation. One was writing the HTML for the Toledo page [KU Leuven’s educational platform], and I myself took care of communication. So, the project was quite structured and everyone knew what they had to do. People used their backgrounds to work together. We had someone doing DH, Greek, and our team leader, Daria, is from LECTIO. So we knew what was possible to accomplish. We had a lot of meetings – two each day – so communication was efficient.

When did you reach out to the experts?

We reached out to ask if we could use the CSS profile of Toledo, as well as the Python script. So not in the very beginning, but during the first step.

What was the brainstorming process like at the “Meet the Data, Meet the People” event before the hackathon officially kicked off? Was there a clear vision from the beginning?

There was not a clear vision. Daria’s first idea was to work on doodles – so what students wrote in the margins of their notes. But the manuscripts we had were neat and more polished than expected. So if they talked about how to construct a globe, there would be technical drawings but no funny doodles. We were going to reconstruct the lecture cycles, but that was a bit too narrow. But it was great to meet the people before the stress of the actual hackathon. Once the hackathon started, it was chaotic. So having [established] personal connections first, then the technical skills, was a nice approach. One or two people also came [to the “Meet the Data, Meet the People” event] and had to withdraw – so it was good for them to get that first taste, and we didn’t immediately start working with them.

Were there any issues you ran into when cleaning up the data or creating the database?

There were some problems with the cleaning process because of the dates; not all manuscripts had dates in the metadata. So then we had to try to figure out what should be included in our data but we weren’t sure how to structure it. We had some conflicting ideas on how to target this. There were some Python problems which had to be recoded too. I also reached out to an old professor of Daria’s to take a second look at the Latin and help correct it.

How was the idea for the Toledo-inspired website first conceptualized and then implemented?

At the opening event we were throwing ideas out and someone said “oh we could visualize it as a Toledo page.” The idea stuck, but there was a practical problem: to be able to make it a decent page, we needed one student and three courses for the tiles [on the homepage of the platform]. That was something we managed to do with our dataset. It was kind of all or nothing. Once we saw it was possible, the writing started: the CSS profile, the underlying HTML. One person had a clear idea of what it could look like. Then it was a matter of filling it with content descriptions, side notes, history. We actually had someone who did a Master’s in History and so we were able to use some of his thesis topic and really had some fun with it. We had a structured dataset so we wanted to present it in a fun way. Toledo was a natural decision since it was familiar to so many students. All in all, it was a result of hard work.

How would you describe your experience?

It was very nice to work with people from so many backgrounds: math, history, philosophy, Dutch; you came into contact with people you wouldn’t normally meet. There were challenges in ways of working, ideas, etc., but in the end we worked well. We also all had a drink after so we all got along well [laughs] and we’re all pretty friendly now. It was interesting to be a part of the hackathon without a strong ICT background. It was really interesting to see what can be possible, especially when watching the other teams’ presentations. Our own dataset didn’t lean towards visualization like some of the other teams, but we still were able to learn a lot about the university, which brought the data to life.

How could you use these skills for future research?

I do textual research, so I look at lots of manuscripts. At this point, a model can’t really be trained for OCR because there are too few manuscripts, and handwriting, regional differences, etc. make it difficult to analyze in that way. Medieval Latin is also not easily analyzed because the syntax is different and sometimes one manuscript has more authority than another. Sometimes it can also be wrong. The difficulty now is interpreting the Latin, and there is also difficulty in interpreting the data. As a classicist, the language should always be the main objective. DH can be a tool or asset, but it is not the main objective. I’ve been testing out different DH approaches, and, while I still like to work with it manually, there are always chances that my research will become more digital.

What kind of tips would you give to a team doing their first hackathon? Any tips for someone from your background?

Try to apply your strengths – know your strengths and weaknesses. Try to look at the possibilities and what would be interesting to you on the subject of DH. I’m hoping to learn Python but also Old Norse and Arabic. It can be very stimulating to think about how to solve problems in a new way, and it might even be a bridge to build an interest in DH. And, don’t panic… there will be times when there is a lot of chaos but just look at what you have and where you want to go and just keep moving.

Interview Series: In Conversation with BiblioTech Hackathon Participants

4 November 2023, 00:27

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Ivania Donoso Guzmán, BiblioTech Hackathon participant. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Ivania’s group, called the Poststars, worked on the historical postcards dataset, featuring over 35,000 old Belgian postcards. You can learn more about the Poststars’ project by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

A dark-haired young woman speaks into a microphone while standing behind a podium.

Ivania presents her team’s project during the closing event of the BiblioTech Hackathon.

What first interested you in the hackathon? Have you done one before? What is your background?

I had done one before, but it was the more traditional type where you work all day and all night. I didn’t like it [laughs]. So when I saw this opportunity to work with data, and it was over the course of a few days, it really interested me. I wasn’t sure whether or not to participate since I didn’t know who I would work with and what their experience was, but I had a friend who wanted to participate so we signed up. In the end he couldn’t make it, but I thought it was a good experience! Even though I was the one with more experience in programming, it didn’t feel like it was a bad thing. I really liked that there were people from different backgrounds. As for the dataset, I actually love to collect postcards, so this dataset interested me. I had also worked with digital humanities before because I participated in a project which tried to explain a political process that was happening in my country. I had worked with historians, sociologists, journalists, and political scientists, and that experience was very interesting to me. So I wasn’t afraid to work with people from another background; it was actually an exciting opportunity for me.

I’m an engineer. I got my degree in Chile and France. I’m officially an IT engineer, but I’m not very interested in the industrial and high-level side. I was always more interested in computer science, user interfaces, AI, user relationships, etc., so that’s what my master’s focused on. I’ve worked for two years in computer science, and now I’m doing my PhD. I’m working with Prof. Katrien Verbert on user and AI relationships, specifically explainable AI.

What was your primary concern when beginning the project?

My main concern was that there were limitations in the time frame. As far as technical competence goes, I didn’t know what to expect, but I realized the other members’ skills were an asset. Additionally, there were lots of people supporting us throughout the process.

What kind of audience did you have in mind?

At first, we wanted to really understand the connection between the postcards and the places where the photos were taken. We wanted to look beyond the postcard itself. This would have been really interesting, but it didn’t work out. We ended up creating a search engine where you could write a sentence—for example, “children playing outside in the park”—and it would return postcards that matched. It was a really interesting approach for a search engine because usually search engines look at metadata, but with our project, the search engine analyzed the image itself. When we developed the project, we had in mind someone who would be interested in exploring the postcard set as a dataset for future research projects.

How did you establish your methodology and approach to the data set? Were you inspired by any other platforms or projects?

Our brainstorming process involved thinking about how we could contextualize the postcards: time period, location, etc. We also were interested in exploring the destinations the postcards were sent to. So we could collect and organize information about where the postcard was from and where it went. We also thought about the usage of the postcard: postcards used to be a way to communicate with people. It was cheaper than a telegram and would arrive before you returned home. But exploring these ideas wasn’t possible because the metadata didn’t contain that information, and it was really hard to find this information. We decided to work on the search engine. For this, we were inspired by current AI models and their usages. CLIP, a model created by OpenAI, had good examples of how it worked, and it matched some of our ideas, so we decided to use it.

How did you create the search engine for the website?

Our team leader, Prof. Tim Van de Cruys, suggested we use the CLIP model. It brings images and texts into the same representation space. So we first created a mathematical representation for each postcard in the dataset, and then the engine would convert the search entry to a representation in the same space. From there, it would produce results by finding the closest matches. There were some cases when it didn’t match; for some sentences there weren’t any matches. But I think that for this dataset, which is very image-based, it was quite useful.
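
[Editorial note: for readers curious how such a CLIP-based search works in practice, here is a minimal sketch, not the Poststars’ actual code. It assumes the openly available openai/clip-vit-base-patch32 model via the Hugging Face transformers library; the postcard file paths and the query sentence are placeholders.]

```python
# Minimal illustration (not the team's code): free-text search over postcard images with CLIP.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# 1. Embed every postcard image once and keep the (normalized) vectors.
postcard_paths = ["postcards/0001.jpg", "postcards/0002.jpg"]  # placeholder paths
images = [Image.open(p).convert("RGB") for p in postcard_paths]
image_inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    image_embeds = model.get_image_features(**image_inputs)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

# 2. Embed the free-text query into the same vector space.
query = "children playing outside in the park"
text_inputs = processor(text=[query], return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

# 3. Rank postcards by cosine similarity between the query and each image.
scores = (image_embeds @ text_embeds.T).squeeze(1)
for path, score in sorted(zip(postcard_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```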

How did you use your professional/academic experiences to help with the hackathon?

I used some approaches I had learned from my experience as a software engineer and data scientist. We used GitHub to share and collaborate. Specifically, the project tracking tools helped us stay up to date with what everyone was doing. I had experience putting systems in production (putting them online), so I was able to combine what everyone had done into one system and make it available.

Did you apply any of the topics you are studying for your PhD to the project?

Not really [laughs]. But I have now proposed three master’s topics that would work with these datasets. I see so much potential for enriching the data. One idea is a date analyzer for the postcards. For this, we could try to use an AI model to predict the time frame of the postcard. In some cases, we have dates, but they’re not complete. Sometimes there is an x in the date, or it’s incomplete. So, we want to train a model that can predict the dates. The model could also predict aspects of the postcard, including color, image, shape, and quality, and then the researcher can confirm manually if it is right. The other project ideas were more about how to transcribe what was written on the postcards. Understanding what is written on the postcards can be hard since they are handwritten and messy. Creating an interface to help researchers add more information to the postcards would allow the postcards to be analyzed better. [These projects] would relate to my PhD because they provide explanations of predictions made by an AI model. People have to make decisions, but the AI decisions have consequences for research.

Was there any main motivator/goal that encouraged the team when things didn’t go as expected?

Everyone had this level of commitment where we wanted to do something cool. Nobody was like “we have to work every day all day.” But there was also this idea that we would do what we could. We were very solution-oriented, and we were able to adjust and move on quickly.

What were some of the roadblocks you faced?

There were some things we wanted to focus on, like the text on the postcards, that we couldn’t. There was also no information in the metadata about the stamps. So the metadata—or the lack of it—presented a problem. 

What kind of tips would you give to a team doing their first hackathon?

Enjoy it! It’s an opportunity to learn, so don’t stress about it. Make sure to think about how to fix things if it doesn’t work; don’t feel defeated. Move forward. In a hackathon, if it doesn’t work, then you need to keep moving on so that you have a project. It’s better to try and to not give up immediately. Try two or three times, and if it doesn’t work, then move forward. The proposal of the hackathon was just to do something new since nobody has been working very closely with all of these datasets. So anything you do is contributing – anything you do is better than nothing.

Do you have any final comments?

I really enjoyed the hackathon. Some of my data visualization students were in my group, which was fun. I loved the technical support. We didn’t have to worry as much about the data because it was well documented. We didn’t have to worry about getting access to it or how to handle such big datasets because that infrastructure was already in place. That made it very fast to experiment. I thought it was great!

Interview Series: In Conversation with BiblioTech Hackathon Participants

27 October 2023, 00:48

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Daria Kondakova, BiblioTech Hackathon group leader. Daria is the Research Manager at LECTIO, the KU Leuven Institute for the Study of the Transmission of Texts, Ideas and Images in Antiquity, the Middle Ages, and the Renaissance. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Daria’s group, called StudentsBlock, worked on the Magister Dixit dataset, which features a collection of handwritten lecture notes from the Old University of Leuven (1425-1797). You can learn more about the team’s work by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

A group of five people stand facing the camera and smiling.

Some members of the StudentsBlock team pose together during the reception of the closing event.

What first interested you in the hackathon? Have you done one before? What is your background?

I’m a Classicist, and I’m currently working with Digital Humanities. I’ve never done a hackathon before but was always interested in how they work, so this seemed like an interesting opportunity. The Magister Dixit dataset was specifically why I chose to do it because there is a project on it at LECTIO. It was a nice way to get to know the dataset better, especially since Magister Dixit has a long project history, and I’ve only been with LECTIO for one year. 

I had an interest in digital humanities before; I did programming whenever I could, then found DH in 2014/2015. I wouldn’t call myself a digital humanist, though, since I only use DH methods and my background is very traditional. My background is in Classics. My PhD is in Classics, and I mostly do things in Greek and Latin, specifically 300 BCE to 200 CE. I don’t really work on Early Modern stuff. So, in a way, the only link between the Magister Dixit collection and my research is the fact that it’s in Latin. But for the project this helped because I also knew about the history, layout, color fonts, and how to pull the data. There were things I would have liked to explore more. So this project was more connected to my technical interests. I’m very open minded when it comes to DH and don’t worry about the relevance of something and whether or not I will work on it again – it’s just fun.

Can you expand on a topic you wish you could have explored more? Do you mean the doodles in the margins of the notes?

Yeah, there weren’t many doodles and the student notes weren’t like what you expect now. It was more of a syllabus rather than actual notes. It wasn’t spontaneous production in the same way that your class notes are today. It was more formal. There were illustrations, but we also didn’t really know what to do with those. The data extraction approach presented us with a less shiny option, but it was more practical and usable. The design of the team’s project website was great, but the data was just a simple table.

What was your primary concern when beginning the project? Interface? Usability?

I’m accustomed to working with texts, so when beginning to work with Magister Dixit I wasn’t expecting it to be so metadata-heavy and was thinking we would work directly with the student notes. Once I realized we couldn’t approach the dataset in the way I was expecting, I was a bit concerned about how we could adapt our approach. I wasn’t that much concerned about the technical aspect as it seemed that after the first meeting we would be able to work together. My main concern was just finding out what to do with the metadata.

Before beginning the project, were there any other platforms or projects which you were planning to use as inspiration?

Not really. We decided we would do the metadata analysis and we had the help of Jarrik Van Der Biest, an expert on the dataset, who is working on the student notes for his PhD. A few other people also knew about student life and the content of the course, which helped us frame our question. So we started with reconstructing the lecture cycles. Ultimately, we did this by creating a Toledo-like website, mirroring the e-learning environment [Toledo] that current KU Leuven students use.

How did you apply your knowledge from working at LECTIO to the hackathon? Were there any tools or methodologies that were translated to this dataset?

My job is to support and implement DH in the projects. As such, I have to have a good overview of what is out there and bring people and tools together. This is something that worked well, but I also learned a lot from the team members, too. Even as a team leader, I learned a lot; I didn’t want this to be a classroom exercise. We had someone who worked with Open Refine, which I had not used before, and it worked really well since Python can be hard to understand sometimes. My role was less about suggesting or incorporating tools and more about bringing people together. There was an intrinsic connection with what I do at LECTIO just by listening to the team members and seeing how all the tools worked together. In the last few days, I also joined in with the data analysis and with the help of two other team members, I was able to brush the dust off my coding skills. The main connection to LECTIO, I think, was the subject matter and just seeing, on a smaller scale, the challenges and advantages of having a variety of people with different backgrounds working on a single project.

Do you think participating in a hackathon as a student would have benefitted you in your career?

Definitely. I would have loved to be able to do that. I did a group project when I was doing a DH class during my master’s. There wasn’t enough commitment and things just didn’t work; we didn’t learn from each other. But with a hackathon, if you stick around, you learn a lot from your team and other teams. The final presentation, for example, I took a lot away from. There we had the opportunity to figure out what else we could have done and what would have worked. As a student, it’s great to have a result. There’s an output. You’re not doing it just for the sake of doing it, and you have the support of others as you’re working on it. So it would have been great to be able to do this as a student.

What kind of practical knowledge do you think can be gained from a hackathon like this? 

Open Refine was a great tool that I learned about. I also learned a lot about the MARC 21 XML tagging scheme from the Library of Congress since we had to go through the tags. I also got a better understanding of the material of the Magister Dixit collection without even going through the actual content. I knew metadata is a treasure trove of information, but I only knew that in theory, so seeing it first-hand was more practical. Project management was also a skill I worked on. Other projects I’ve done were more extended and involved a lot of reflection and checking things. Having 10 days was a challenge, especially when combining it with my job at LECTIO, as I had to check in and stay continuously involved.

Was there any main motivator/goal that encouraged the team when things didn’t go as expected?

I don’t know. There was a moment in the middle when it felt like everything was falling apart. Meeting online was very hard, and it was difficult to get everyone together anyway. I used Gathertown to have virtual drop-in rooms, which I think helped. We did feel like a team and wanted to work together, and during the rush we were all working on the same thing. Checking with Jarrik towards the end was very helpful as he was very encouraging and remarked on the project’s practicality and usefulness for researchers. Getting that feedback motivated us to improve the platform, and it felt good to know that we have a practical product. We were also kind of competitive and wanted to win [laughs].

What kind of tips would you give to a team doing their first hackathon?

Be open and ask questions. That was something that worked well for us. Sometimes people ask questions a bit too late. Reaching out to the expert pool earlier also would have been great, because everyone has questions, and we need to address them to move forward. Be proactive. It is better to have five ideas of which only one works than to not have any at all. Try to find something you would like to do. Even if it’s just observing something you don’t feel equipped to do, follow along and see what you can learn from your teammates. Use your strengths. Have fun! You shouldn’t approach a hackathon like an exam (but you do still have to be committed).

For team leaders, I learned a lot about how to organize groups. I would remind myself to notice when things don’t work. I had an idea in my head and then when it stopped working, I got stuck and had to become flexible quickly. And I thought I was flexible, but I realized I really wasn’t [laughs]. 

Any further comments?

I really enjoyed it. I didn’t realize it could be so fun! I really liked the sense of relevance for the library and seeing the datasets be used. I really appreciate the team of organizers and hope we do it again. I would love to be a team leader again, or maybe a team member. I would love to see the Magister Dixit collection used again, and we’re hoping to organize a way to discuss the other approaches and to further develop the database our team built. I think a lot of the projects have a lot of potential.

Preprints: Where are we now?

14 May 2024, 19:42

The term “preprint” is actually used for two related, but still slightly different, things. The term can refer to an author’s original manuscript (of an article, a book chapter, or a complete book) as it is submitted for publication (hence also known as “the submitted version” of the text). This submitted version typically remains private, whereas later versions of the text (revised after peer review and/or copy-edited by the publisher) are made available, either behind a paywall or in Open Access. However, the term preprint can also refer to the first public version of a text, which is disseminated before formal peer review has taken place and which afterwards might or might not be developed into a more traditional publication. This second meaning of preprint is thus basically identical to what is known as “working papers” in disciplines like economics, law, and political science. To put it succinctly: the first meaning of the term preprint refers to a manuscript of an article, a chapter, or a book before publication; the second meaning – typically only used for articles – is considered to be the first public version of a text and is therefore oftentimes treated as a publication in its own right. Both meanings of the term have in common that they refer to a text which has not (yet) been submitted to formal peer review.

Lately, the second meaning of the term preprint has become more dominant, not least because the habit of disseminating articles before they have been peer-reviewed is becoming more widespread.

Preprints can be distributed through designated preprint servers, i.e. online repositories where researchers share articles before they have undergone formal peer review. Preprint servers are often connected with a specific discipline, such as medRxiv (health sciences) or bioRxiv (biology), or region, such as AfricArXiv, and typically guarantee some basic form of quality control such as a plagiarism check before the text is accepted for publication as a preprint. However, preprints can also be shared using general repositories which are discipline-agnostic (like Zenodo) and/or platforms which accept all kinds of research outputs (such as the CORE repository of Humanities Commons), and which do not perform such basic quality checks. Preprints typically get a permanent identifier (such as a DOI) and are indexed by services such as Google Scholar, Open Science Framework (OSF) Preprints, or Web of Science’s Preprint Citation Index.

As noted, the practice of disseminating preprints is on the rise. In some disciplines, such as astronomy and mathematics, up to 35% of articles start out as preprints, which are seen as an important instrument for Open Scholarship (as preprints can always be shared openly), as a way of speeding up research (since the dissemination of research results is no longer slowed down by pre-publication peer review), and as a way to establish the priority of discoveries. Preprints also make other innovations in scholarly communication possible (such as open peer review or the publish-review-curate approach – topics which deserve a blog post of their own) and call into question the exorbitant prices of journal subscriptions and article processing charges. Recent research by Brierley et al. and Davidson et al. in the context of the COVID-19 pandemic even brought to light that differences between preprint and final versions of articles published in biomedical journals are limited, which gives cause to reconsider the time and money spent to develop a preprint into a journal article.

Recommended reading:

J. Bosman et al. (2022), New Developments in Preprinting and Preprint Review, DOI: 10.5281/zenodo.7040997

K. Hettne et al. (2021), A Practical Guide to Preprints: Accelerating Scholarly Communication, DOI: 10.5281/zenodo.5600535

L. Mesotten – J. Berckmans (2022), To preprint or not to preprint? KU Leuven researchers share their thoughts on the (dis)advantages of preprint publishing, https://www.kuleuven.be/open-science/what-is-open-science/scholarly-publishing-and-open-access/schol-pub/interview-preprints

Interview Series: In Conversation with BiblioTech Hackathon Participants

18 September 2023, 22:13

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Annelore Knoors, BiblioTech Hackathon participant and student in the Advanced Master in Digital Humanities. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Annelore’s group, the ChaoTech Warriors, worked on the wartime posters dataset featuring proclamations issued by the German General Government in Belgium during World War I. You can learn more about the ChaoTech Warriors’ project by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

Three people stand facing the camera in a boxing stance. They are smiling.

Members of the ChaoTech Warriors team pose for a playful team photo.

What first interested you in the hackathon? Have you done one before? What is your background?

I hadn’t done one before, but since the DH Master only lasts one year I wanted to take the opportunity to learn more, especially since there are so many different types of analysis and approaches. Also, the interdisciplinarity of the hackathon was an interesting opportunity. I think my team had a historian, an informatics student, a PhD student, and two linguistics students. I myself have a linguistics and literature background and now do DH.

Why did you choose the DH Master?

I was curious to know about the recent developments within my field, and other fields, since DH allows you to move beyond your initial field. This happened for me as I did the linguistics program. There are some classes open to the whole Faculty of Arts that have to do with data, and those really caught my attention.

What were your expectations of the hackathon before beginning? Did you feel like you would be well-equipped for the project?

I didn’t think I would have enough skills to contribute to the team, especially since the hackathon started in March, so I had only had one semester of classes. But through the Introduction to DH course, I got a great basic knowledge. With technical things and analysis you always need to look at things precisely, so I knew what I had to look for. We did have to do a lot since we lost a few members, but the remaining members were able to accomplish a lot, even without much previous knowledge.

Any concerns?

Not really. I hadn’t considered that so many people would drop out, but we managed well enough, and I think we had a great result even with the limited team members.

What was the group brainstorming process like for this project?

First, we noticed that people in the team had different approaches to the theme in relation to their profiles. So we really used our imaginations and were able to think non-traditionally. In the end we found a combination of ideas.

How did you establish your methodology and approach to the dataset? Were you inspired by any other platforms or projects?

I did the topic modeling, which I had already done for the Introduction to DH class, when we worked with different journal titles. I did topic models on the titles and institutions’ names. So that was nice because I could go in and only had to change a few things. We also made a timeline, which I had experience doing for an art history course, so I knew the platform was accessible and good for less technical users.

What was your role in the project and how was it different/in line with what you were expecting?

On a conceptual level, I thought I would be a beginner DH’er and people would have to teach me things. But I was more of the expert DH person. I could really teach the other team members like the informatics students with no background in humanities, and the others who didn’t know a lot about the digital side. So it was nice to be able to share what I had been learning for the past six or seven months. They told me they really liked learning that way.

What was the most challenging part of the hackathon?

I don’t think there was anything that was that big of a hurdle, honestly. I think the most difficult thing was the OCR that we wanted to correct; the layout wasn’t right. The timing of the presentation also meant that we had to tone down our ideas. Maybe if we had a bigger group we could have tackled a bigger project, but we were limited.

Was there any main motivator/goal that encouraged the team when things didn’t go as expected?

We basically got a lot of confidence in knowing that we can just go for it. 

What kind of advice would you give to a team doing their first hackathon?

I think in the starting phase, don’t be afraid to sign up. Even if you don’t have any experience in the field, it’s a great opportunity to learn about data and meet people from other programs. You also have a team leader and the experts to help. It’s a great time to learn, and you get a lot of help.

Also, if you’re considering doing the DH Master’s, it’s a great opportunity to get to know the people in the DH community. The community is really strong. You get to know a lot of people, talk to them at the reception, etc. It’s great to get to know the people and the subject matter. 

What kind of advice would you give to someone from the DH Master’s?

I would tell them that they will learn a lot from the experience, and that they shouldn’t underestimate themselves.


Interview Series: In Conversation with BiblioTech Hackathon Participants

14 November 2023, 21:10

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Tom Gheldof, BiblioTech Hackathon group leader. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Tom’s group, called the ChaoTech Warriors, worked on the wartime posters dataset, featuring proclamations issued by the German General Government in Belgium during World War I. You can learn more about the ChaoTech Warriors’ project by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

The image is a two-picture collage. The left image shows three people: a woman, a man, and another woman. They are standing in a boxing stance with their hands in fists by their shoulders. They are smiling.

The ChaoTech Warriors team show their warrior stances during the closing event of the hackathon. On the right, Tom animatedly delivers his team’s presentation.

What first interested you in the hackathon? Have you done one before? What is your background?

I am an ancient historian by training and have a degree in journalism and cultural studies, after which I completed training in what is now called digital humanities. When I started working as a researcher, I realized I could use my programming and digital skills more than my scientific skills. So when a position opened up with DARIAH (a European consortium for Digital Research Infrastructure for the Arts and Humanities), they were looking for someone with a balanced profile. I’ve always liked to balance digital tools and methods, even 10 years ago, so it’s a personal and professional interest. Now I am the day-to-day coordinator for CLARIAH-VL – the collaboration of DARIAH and CLARIN in Flanders. I’m mainly involved in helping researchers working in digital humanities where possible by using my main interests: historical databases and linked open data.

I can’t remember how I first got included in the hackathon [laughs]. The organizers had already planned that I would be participating, but through more casual meetings, the question came up whether I would be interested to take part as an expert or as a team leader. I had already participated in a hackathon at a conference before, but it was a lot more condensed. I enjoyed that hackathon in a way that I hadn’t expected. Mainly there were a lot of researchers, and it was the first time I worked in a small group with [people from] such different backgrounds. We worked with our own materials, which made it more ambitious, yet a lot more straightforward. With two days, you get right into it and have to keep to a timeline, making it a great experience.

What was your primary concern when beginning the project? Interface? Usability? What kind of audience did you have in mind?

My primary concern was whether everyone would show up [laughs]. I also had to prepare myself beforehand, because even though I am used to being a coordinator, I didn’t know what the background or level of enthusiasm of the group would be like. Also, the wartime posters corpus was beyond my familiarity and comfort zone as a researcher, so I wanted to familiarize myself. I began by exploring the corpus on the digital platform and had a look at the metadata.  I looked at it as if I were a contestant to see what it would be like, then I decided to let the team give their ideas first.

How did you establish your methodology and approach to the data set? Were you inspired by any other platforms or projects?

The first thing I noticed was that the quality of the digitization was not 100%. While most of the automatic translations were okay to read, we encountered problems with the metadata. I reflected on my past experiences with some other digitization projects, services, and tools that might have the potential to improve the translations. I asked my group members which ones they were familiar with and how they would approach such a challenging corpus. In their answers, it became clear that the members brought valuable backgrounds to the table, particularly from fields like linguistics, Natural Language Processing (NLP), and philosophy. The first gathering during the “Meet the Data, Meet the People” event was very exploratory. During that meeting, we just brainstormed about potential tools and methods.

How did the workflow and work distribution change once some group members could no longer participate?

I should mention that our group was struck by a bit of misfortune. Two of the eight group members cancelled their participation, and they were the ones who knew the most about the ManGO platform and the High Performance Computing (HPC) infrastructure. Additionally, one of the more technically skilled members, who had followed the explanation of the platform, also cancelled later on. So out of the eight, only four participated over the full 10 days. But even with a small group, we were motivated. I took encouragement from their motivation and expertise. I tried to encourage them to go beyond their comfort zones to explore the other tools that they might not have considered at first. This was very positive, but also a lot more work for each of us. In the end, we overcame the challenges, but we did have to downsize our ambitions because of the workload. All in all, I think the members were pleased with the outcomes and with being able to build up their skills by using tools they hadn’t used before.

In the poster it says there were advantages and disadvantages to having a small corpus. What were those?

Our corpus was composed of 171 posters, so we were able to familiarize ourselves with it very quickly. We noted that it was mostly multilingual, and asked ourselves if we could do anything with that. And regarding the metadata: could we scrape it for multilingual use? One of the disadvantages was that we couldn’t use some of the more ambitious tools. So, for example, exploring the use of the HPC infrastructure wasn’t really feasible, and our laptops weren’t able to handle those computing tasks.

Was there any main motivator/goal that encouraged the team when things didn’t go as expected?

I must give credit to my team members for pursuing their personal interests. A lot of the members entered the hackathon not knowing if they wanted to pursue a research position or do the Advanced Master in Digital Humanities. For them, it was a great way to find out whether they would want to pursue this further. Speaking for my group as a whole, the more you delve into the research corpus, the more you can see what can come out of it. Despite not having a lot of experience in the subject area, I was impressed by the language, content, formation, and layout of the posters as we explored a wide variety of research questions.

What kind of tips would you give to a team leader participating in their first hackathon?

I would again give the group as much responsibility as possible. You don’t need to worry, as a team leader, about whether they have the skills or digital competence; they will delve into it out of their own interest. By giving them responsibility, they might start exploring tools they would otherwise have handed off to someone else. Don’t underestimate the creativity of the team members. At our brainstorming session, we ended up with a variety of approaches after talking for only one hour. Out of their creativity came the most interesting research questions. I also benefited from how the main organizers said not to be afraid to reach out to the experts. We waited too long to do that. We had a great pool of experts from the library, Faculty of Arts, ICT department, PhD researchers, etc., and I was surprised to learn how far their support reached. I think we might have benefited more if we had reached out to them at the beginning. I will admit, I did want the team members to have full responsibility first so that they wouldn’t immediately reach out to the experts, but it’s a double-edged sword. Still, an expert could have made up for a missing team member. And according to feedback from my team, they really enjoyed the collective motivation to continue and learning the new tools and methods that they applied.

Final Remarks

Thank you to the organizers for the flawless organization. It was a great balance of personal and online meetings, and, as a DH event, it worked really well. I really enjoyed the final event and was pleased with the quality of the posters, including our own. So for myself, it was a very pleasant experience, and I would not hesitate to be a candidate again as a team leader or even as a team member, just to see how fun that can be.

Interview Series: In Conversation with BiblioTech Hackathon Participants

27 October 2023, 22:51

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Sandra Elpers, BiblioTech Hackathon participant and Bachelor student in Theology. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Sandra’s group, the Illuminators, worked on the Bible of Anjou dataset. The Bible of Anjou is one of the most important and valuable pieces in the collection of KU Leuven’s Maurits Sabbe Library, the library of the Faculty of Theology and Religious Studies. The Bible of Anjou dates to 1340 and is a unique and beautiful illuminated manuscript created at the Royal Court of Naples. To learn more about the manuscript and the Illuminators’ hackathon project, have a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

 

Two photos are placed side by side. In the left photo, a woman in a white sweater smiles while standing in front of a microphone. In the right photo, a group of people stand together smiling towards the camera.

Sandra presents the Illuminators’ final project during the closing event of the hackathon. On the right, the Illuminators pose for a team photo.

 

What first interested you in the hackathon? Have you done one before? Can you tell us more about your background?

This was my first hackathon and what interested me were the datasets. I also liked the fact that participating in the hackathon was available to all students. Since I’m only a bachelor’s student, it was a nice way to get launched into the program; originally I started studying French and Latin, but then I switched over to theology. For the hackathon, I was interested in an academic sense, but I also thought it was a good opportunity to learn new skills.

What was your primary concern when beginning the project?

Maybe my lack of technical skills. I found myself in a group with people who already had a lot of skills and didn't really know where we were going with it. But this worry was resolved quickly by just trusting the others in my team. We lifted each other up.

What most excited you about working with this dataset?

The beauty of the object. As an illuminated manuscript, it is aesthetically very beautiful.

Were you familiar with the Bible of Anjou before the hackathon?

I was not familiar with the Bible of Anjou itself. I was familiar with the Bible and Bible translations, but for this specific piece, I had only heard the name and knew it was a big deal in the Maurits Sabbe Library.

What was the brainstorming process like for this project?

We had a lot of ideas come up originally, based on both the images and the text, but the text and the potential of OCR were more uncertain. With the images, though, we knew that with the animals we could make a sort of “medieval zoo.” Looking at the illustrations of musical instruments and categorizing the illustrations were some of the other immediate ideas.

What was your primary audience for this project?

I don’t think we really had an audience in mind when developing the project… it was more general. We just wanted to make the images work.

Were you inspired by any other platforms or projects?

We weren’t really inspired by anything specifically. One of the main aspects of our project involved creating GIFs of the illustrations in the manuscript. For this, we just came up with the idea and then researched how to do it. Researching the options told us it would be possible, and we just wanted to do it.
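For readers curious how such GIFs can be put together, here is a minimal sketch using the Pillow library, assuming the cropped illustrations have already been exported as image files; the folder name, sizes, and timings below are purely illustrative and not the team's actual setup.

```python
# Minimal sketch: assemble cropped manuscript illustrations into an animated GIF.
# Folder, sizes, and timings are illustrative assumptions, not the team's code.
from pathlib import Path
from PIL import Image

frames = [Image.open(p).convert("RGB").resize((400, 400))
          for p in sorted(Path("illustrations/animals").glob("*.png"))]

if frames:
    frames[0].save(
        "medieval_zoo.gif",
        save_all=True,              # write an animated, multi-frame GIF
        append_images=frames[1:],   # the remaining frames follow the first
        duration=800,               # milliseconds per frame
        loop=0,                     # loop forever
    )
```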

How were you able to apply knowledge from your studies to the project?

I was mostly learning as I went. But for the poster presentation, we did look into the historical context of the Bible of Anjou, and I saw some things that I already knew from my studies, so there was a bit of overlap. One interesting takeaway: I went to the library and had a great talk about the book, its contents, and the different locations it had been stored in. This also helped us contextualize its history.

Was your role in the project different or in line with what you were expecting?

My expectations were different. I tried to not have too many since it was my first hackathon. The experience turned out to be mostly in line with what I thought it could be. Since my technical skills were limited, I figured I’d have to learn some new things, and since I’m strong with presentations, it wasn’t too surprising that I ended up presenting our project during the closing event.

What kind of advice would you give to a team doing their first hackathon?

Try to be flexible and available. It was nice having 10 days and not having to meet every day, but it was necessary to see everyone from time to time. The Microsoft Teams environment was also very important for attending meetings online. The organization within the Teams chat was good, so there was a good stream of communication.

What kind of advice would you give to someone in your field, specifically?

I would tell them that they definitely have something to offer to the team. If not the technical skills, then it’s at least a way to provide more perspective. We all had different ideas and brought new ideas and perspectives.

 

 

 

Interview Series: In Conversation with BiblioTech Hackathon Participants

2023年10月20日 20:54

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Heike Pauli, BiblioTech Hackathon participant and MA student in linguistics. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Heike’s group, the Digital Peripatetics, worked on the Lovaniensia dataset. Lovaniensia comprises the old academic collection, featuring the work published by members of the Old University of Leuven and output linked to the university spanning the period between 1425 and 1797. You can learn more about the team’s project by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

Two images side by side. The left image depicts a woman in a pink shirt speaking into a microphone and gesturing to her left. The right image depicts three people, a man and two women, facing the camera and smiling.

Heike delivers her team’s project presentation at the closing event of the hackathon. On the right, three members of the Digital Peripatetics team pose for a team picture.

What first interested you in the hackathon? Have you done one before? What is your background?

I had not done one before. My initial interest came from the fact that it was an opportunity to combine my passion for Latin, Greek, and AI; you don't see that overlap often, and before the hackathon I hadn't seen a place where I could combine those interests. This was a unique opportunity for that. Then a few of my professors mentioned it, and I kept hearing about it, probably three times, so I took it as a sign [laughs].

What was your primary concern when beginning the project?

I would say, for me, the OCR. But that turned out to be a surprising concern: we had to skip it entirely due to the quality and the number of errors. Apart from that, we didn't have a lot of trouble. Once we established that the OCR would be a challenge, we all just worked together. It was a missed opportunity, because there was a lot we could have done and a lot available, but since we decided to look at the metadata and not the OCR, it was a bit restricting. I would have liked to look at the text more. There is still a lot of room to explore this data.

What was your primary audience for this project?

Our target audience was mostly people interested in the Old University. Our group leader was a prime example of the type of person we were catering to. We tried to look at the university itself: what was being created, who was there. So, the target audience was basically the people who were in the group [laughs]. It was almost like we were creating a resource for ourselves.

How did you establish your methodology and approach to the dataset? Were you inspired by any other platforms or projects?

Indeed, I think some people were inspired by past projects. We just had to start with the metadata. We did the usual stuff like making a word cloud and other things you would typically do for Natural Language Processing (NLP); it was pretty exploratory. During the introduction at the “Meet the Data, Meet the People” event, the explanation of metadata was also really helpful. And it just started with the basic approach, and then, from there, we knew where to go. We used the tools we already had. I was at first disappointed that we couldn’t make a crazy new tool or code, but it’s challenging to do that in 10 days.
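As an illustration of the kind of exploratory step described here, the sketch below builds a word cloud from title metadata with the wordcloud package; the file name and column name are assumptions for the sake of the example, not the team's actual workflow.

```python
# Minimal sketch: an exploratory word cloud over collection metadata.
# The CSV file name and the "title" column are illustrative assumptions.
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

metadata = pd.read_csv("lovaniensia_metadata.csv")
text = " ".join(metadata["title"].dropna())

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```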

What was the brainstorming process like for this project?

We had a lot of ideas at first. At “Meet the Data, Meet the People” everyone was writing all over the brainstorming paper. Then we had to tone it down to make it coherent and achievable. We picked three or four tasks, which was only the tip of the iceberg. We had to think about what was manageable, so the different analyses were the best approach.

What was your role in the project and how was it different/in line with what you were expecting?

I was expecting that I’d need to program a lot, but I’m not an expert or anything. So it wasn’t necessary; we found that a lot of tools exist which could help with that. I focused more on the language. It was nice because we all had a place where we could best apply our skills. Some people were really great at social network analysis and worked on that, and my job was to work on, or maybe get mad at, the OCR [laughs]. Sometimes it was really accurate and helpful and then other times there were errors. At the end, I found my place in the language analysis side. So in looking at my background, the Natural Language Processing was an obvious approach. And I also loved presenting the project at the closing event. We divided the tasks according to what people wanted to do. At first it was hard to organize it that way, but it helped make it coherent and a singular project.

How did you use your academic experience to help with the hackathon? 

The easiest part was the Latin. We were the most academically prepared for this. For me, it also helped to see that I could do some useful stuff with my interest in computational linguistics; I already knew a bit of Python, so it was nice to apply my DH skills and see that they are useful.

Was there any main motivator/goal that encouraged the team when things didn’t go as expected?

It helped that we all had a background and were interested in the dataset ourselves, so that was our main encouragement. It was within our field and we all had a common love for the material.

What kind of advice would you give to a team doing their first hackathon?

Just do it! Don’t overthink it. Even though there was a prize, the main interest was to get knowledge and to explore the data. We didn’t worry too much, honestly. Sometimes there can be frustrations or things won’t work, but those aren’t necessarily failures and you can explore new approaches.

What kind of advice would you give to someone in your field, specifically?

You can bring a new perspective even if you don’t know about technology. You can help those with a technological background if you’re open to it. There’s a stereotype for people in the Classics that we don’t know about computers and only care about books [laughs], but you don’t have to become a professional programmer during or after the hackathon. The knowledge you gain could be helpful for research, your career, anything. It’s not just for people who know how to program. It’s an option for everyone. Blur the lines.

Final comments:

I found it very useful to have a chance to explore all the fields I’m interested in. I would consider participating again. The 10 days was also a good timeframe in the grand scheme of things.

Interview Series: In Conversation with BiblioTech Hackathon Participants

2023年9月7日 18:24

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and André Davids, BiblioTech Hackathon participant and employee at KU Leuven Libraries Economics and Business. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” André’s group, the Demographic Dynamo, worked on the historical census dataset. You can learn more about the team’s project by having a look at their project poster in the BiblioTech Zenodo community. To read more about the hackathon and the results, you can visit the BiblioTech website.

Three men stand facing one another; two receive a paper certificate from the third.

André Davids and Stijn Carpentier, two members of the Demographic Dynamo team, receiving their prize from Demmy Verbeke, jury member.

What first interested you in the hackathon? Have you done one before? What is your background?

It was my first hackathon. I saw an advertisement for it and thought it looked interesting, but I didn't sign up at first because I don't have a technical background. I was convinced by Nele Gabriëls, who enthusiastically encouraged me to join. As I provided one of the datasets, the industrial countings from 19th-century Belgium, I was already familiar with some of the data. At the time this dataset was created, Belgium had a top-class statistician, Adolphe Quetelet, and the Belgian statistics were the first to be opened up to the public and to research. That became a model for many other countries, which is also why the Belgian countings are interesting abroad. I've been converting these over to Excel files, and universities often contact me regarding this dataset. I wanted to see up close the potential of a dataset like this for researchers from Belgium and abroad.

I started working at KU Leuven Libraries in January 2000. It was mostly by coincidence. I graduated with a degree in Information and Library Science and then came to Leuven to fully learn Dutch. By coincidence, I found a job at KU Leuven; it was actually my first job interview. My first task was to be at the information desk, and then I switched to cataloging and then acquisition. Now I do digitization. So, I’ve been here for 23 years, but my job is always changing; there was never a day when I didn’t want to come to work.

What was your primary concern when beginning the project?

My primary concern, before seeing who was in my group, was that we wouldn't have enough technical skills to do something nice. My group was made up of mostly researchers, and I saw very quickly that my role was a bit ad hoc and that they knew a lot. We started on our project quite late. At the beginning, we weren't sure what to do with the data and had to make a selection. I was already very familiar with the data, so I helped them select the tables. Five days after we received our datasets, we had a meeting and decided what to do with them. At the beginning, I was a bit stressed because time was passing and we didn't have a concrete goal. After that meeting, though, we found direction. So even though we started late, the technical members were very efficient.

How did you establish your methodology and approach to the data set? Were you inspired by any other platforms or projects?

Nope. The teammate who came up with the idea we went with had a dynamic map of Belgium, so she said we could adapt that. Then we looked at the data to see what could be used for the map. We saw that we had tables that showed internal Belgian migration. By putting this on a map, you could watch how people moved from one place, such as Brussels, to another. It was actually quite impressive to me.
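As a rough illustration of such a dynamic migration map, the sketch below draws flows between places with the folium library; the place coordinates and migrant counts are invented placeholders, not figures from the census tables.

```python
# Minimal sketch: plot migration flows between Belgian cities on an interactive map.
# The coordinates and counts below are invented examples, not the team's data.
import folium

moves = [
    # (origin, destination, number of migrants) -- illustrative values only
    (("Brussels", 50.8503, 4.3517), ("Antwerp", 51.2194, 4.4025), 1200),
    (("Ghent", 51.0543, 3.7174), ("Brussels", 50.8503, 4.3517), 800),
]

m = folium.Map(location=[50.85, 4.35], zoom_start=8)
for (o_name, o_lat, o_lon), (d_name, d_lat, d_lon), count in moves:
    folium.PolyLine(
        [(o_lat, o_lon), (d_lat, d_lon)],
        weight=max(1, count // 200),            # thicker line = more migrants
        tooltip=f"{o_name} -> {d_name}: {count}",
    ).add_to(m)
m.save("belgian_migration.html")
```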

What was the brainstorming process like for this project?

At the “Meet the Data, Meet the People” event, we actually spent most of the time coming up with a name [laughs]. We didn’t have a lot of ideas at the first meeting. And it took a few days to find that direction. But since I knew the data, I was able to share my knowledge with the team.

Were there any obstacles the team had to overcome?

Well, some of the members were absent because they were attending conferences. The people who were the strongest technically were not able to be at the closing event where we presented the project. I could speak mostly to the data, but luckily Stijn volunteered to present since he already does that pretty often. I told him he should be a salesman [laughs]. We were worried that there would be more technical questions. Neither of us could really speak to the code, but luckily the questions didn’t go there.

What was your role in the project and how was it different/in line with what you were expecting?

I don’t know what I expected. I had no idea what the outcome would be. I was pleasantly surprised that we had a dynamic product. But given the time frame, I wasn’t expecting to do something like that. Still, it was nice to see how quickly people with a technical background can do something.

What advice would you give to someone who is hesitant to participate in a hackathon due to their background?

I was initially afraid since I don’t have a lot of technical skills. But now I would say that everyone has something that they can bring. In my case, I knew the data and was willing to put a lot of time into the project. But if someone mentioned to me that they didn’t want to participate because of their technical skills, I would just say “[it] doesn’t matter. Other people will have the technical skills.” Just the fact that someone is willing to participate means that they will have ideas to bring and can contribute in their own way.

Did you apply any skills from your career to the project? Did you take away any knowledge you can now use for your job?

I didn’t really get new technical skills since I didn’t really have any preliminary technical knowledge. When we started working on the industrial counting project, though, we were focused on the numbers, not the categories. Eventually, as a researcher, I decided to start working on the categories. So I started using color codes for the tables and when doing this project, my teammate said that this actually helped him a lot with the code. So now, in the context of my work, I make sure to use the same colors since it will help the people working on the data in the future.

Also, last week, two researchers came to me and were asking for new volumes. I showed them our dynamic map because even though they need the volumes, showing our map shows them the possibilities of the industrial data.

What are some benefits of a hackathon for someone with a career in the GLAM sector?

It was a nice experience. Because we won the hackathon it was nice to have that, as well. It was nice to see that the data we had been working on could produce a winning project. But even if we hadn’t won, I would have come away with a good feeling. Even though I’m quite competitive [laughs], in this case, I was just happy to have something to show. The census data is usable for a ton of different fields and faculties and there was already interest in the data. So there’s still a lot of potential in the development of that. And it’s nice to see that people are interested in the data and what we generated even before this project.

What kind of tips, as the winning team, would you give to a team doing their first hackathon?

I feel that, in this case, winning was not our main objective. If we had more time, we could have done a lot. Just have an enjoyable experience and don’t think too much about winning. I’m not even sure if we would have had a better project if we had wanted to win in the first place. I don’t think we would have because of the pressure. But even if I could restart and pick a team, I would still work with this team and this dataset.

Interview Series: In Conversation with BiblioTech Hackathon Participants

2023年8月11日 19:23

The following is an interview between Alisa Grishin, Artes Research intern 2022-2023, and Fien Messens, BiblioTech Hackathon participant. The hackathon took place in March 2023. It was a 10-day event and included a pre-hackathon orientation moment called “Meet the Data, Meet the People.” Fien’s group, God Save the Tweets!, worked on the contemporary news media dataset featuring tweets including the hashtags #queueforthequeen, #abolishthemonarchy, and #queenelizabeth during a short span of time around the death of Queen Elizabeth II in September 2022. You can learn more about the team’s work by having a look at their project poster in the BiblioTech Zenodo community. You can also find data related to the technical aspects of the project in the God Save the Tweets! GitHub repository. To read more about the hackathon, view the full photo album, and discover the teams’ results, you can visit the BiblioTech website.

 

A group of people stand in front of the KU Leuven university library building. They are waving to the camera.
The “God Save the Tweets!” team at the closing event of the 2023 BiblioTech Hackathon.

 

What first interested you in the hackathon? Have you done one before? What is your background?

When I first learned about the hackathon, I was immediately drawn to the opportunity of tackling a challenging problem within a limited time frame and collaborating with a team of individuals with specific expertise from a variety of backgrounds. I have a background in art history and digital humanities, especially with born-digital collections in the GLAM scene. I can archive and preserve a tweet in a stable way, but I have little knowledge of how to analyze the output. So this hackathon provided me with a platform where I could learn about this through trial and error.

What was your primary concern when beginning the project? Interface? Usability?

We analyzed the tweets from the first 10 days after the passing of Queen Elizabeth II. Our primary concern at the outset was the sheer volume of the data. We were confronted with a vast corpus comprising multiple languages, but we swiftly narrowed it down to English only. Despite this refinement, we still faced the challenge of dealing with a substantial dataset of 300,000 posts. Our main objective became comprehending and effectively managing this extensive stream of tweets by employing suitable methodologies.

What kind of audience did you have in mind?

That’s a tricky question.

The project posed a complex challenge as we used text analysis, sentiment analysis and data mining techniques. In addition, we also created a tweet generator, utilizing our existing corpus to craft newly generated tweets. The primary objective of our tweet generator was to cater to users who sought to share opinions and partake in engaging discussions on Twitter, fostering a sense of community and/or collaboration. Our target audience encompassed not only researchers but also a wider public audience, aiming to provide a platform for diverse individuals to express themselves and facilitate meaningful (or not so meaningful) interactions. In general, we were aiming for a broad public, not just researchers.

How did you establish your methodology and approach to the data set? Were you inspired by any other platforms or projects?

The methodology revolved around text analysis, also known as text mining. To ensure relevance, we filtered the English language tweets using specific keywords and then employed sentiment analysis techniques to analyze the data.  In addition to drawing inspiration from other sentiment analysis projects, we benefited from the expertise of one team member who had previously explored sentiment analysis methodologies during a digital humanities course.
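A minimal sketch of that filtering step might look as follows, assuming the tweets sit in a table with "lang" and "text" columns; the file name, column names, and keyword list are assumptions for illustration, not the team's actual code.

```python
# Minimal sketch: keep only English-language tweets that match relevant keywords.
# File name, column names, and keywords are illustrative assumptions.
import pandas as pd

tweets = pd.read_csv("queen_tweets.csv")
keywords = ["queue", "monarchy", "queen elizabeth"]

english = tweets[tweets["lang"] == "en"]
pattern = "|".join(keywords)
relevant = english[english["text"].str.contains(pattern, case=False, na=False)]
print(f"{len(relevant)} relevant English tweets kept out of {len(tweets)}")
```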

When it came to the tweet generator, we found inspiration in various existing tweet bots. We especially liked @TorfsBot, the tweet bot of the former rector of KU Leuven, which had publicly available source code that proved immensely helpful. While we drew insights from existing bots, we primarily relied on our own experiences and creatively adapted concepts from other tweet bots to shape our unique implementation.

Why did you choose to focus on the most active users as opposed to a larger pool? Was it just to reduce the size of the data, or was there another reason?

We focused on the most active users because it helped us get a grasp on who was actively participating in the online discussions. We observed an over-use of specific hashtags, where certain users exploited them to promote unrelated content. So, looking at the top 10 or so active users allowed us to better understand the overall sentiments surrounding the Queen's passing. This approach helped us filter out the excessive use of certain hashtags and reduce their impact, and it allowed us to focus on the overall sentiment analysis without being influenced by the promotion of unrelated tweets or ideas that were going viral at the time.
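Continuing the previous sketch, selecting the most active accounts could be as simple as the following; the "username" column and the intermediate file name are again assumptions about how the dataset is structured.

```python
# Minimal sketch: pick out the 10 most active accounts from the filtered tweets.
# The file name and "username" column are illustrative assumptions.
import pandas as pd

relevant = pd.read_csv("relevant_english_tweets.csv")   # output of the previous step
top_users = relevant["username"].value_counts().head(10)
top_user_tweets = relevant[relevant["username"].isin(top_users.index)]
print(top_users)
```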

How did you classify words/users on a -1.00 to 1.00 scale?

We utilized a sentiment analysis algorithm that employed a numerical scale ranging from -1.00 to 1.00. A score closer to -1.00 indicated a negative sentiment, while a score closer to 1.00 indicated a positive sentiment. The algorithm took into consideration various linguistic and contextual factors, such as keywords, to determine the polarity of sentiments expressed in the text. It is important to highlight that one of our team members had expertise in sentiment analysis, which significantly enriched our project.
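The interview does not name the specific algorithm the team used, but NLTK's VADER analyzer is one widely used option whose "compound" score falls on the same -1.00 (negative) to 1.00 (positive) scale; here is a minimal sketch with placeholder sentences.

```python
# Minimal sketch: score example texts on a -1.00 to 1.00 sentiment scale with VADER.
# This is one common choice, not necessarily the algorithm the team used.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

for text in ["We will miss her dearly.", "Abolish the monarchy now!"]:
    score = analyzer.polarity_scores(text)["compound"]
    print(f"{score:+.2f}  {text}")
```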

Given the size of our team, we were able to allocate tasks efficiently. I primarily focused on developing the website, while other team members handled tasks such as presentation creation, analysis, and more. This division of responsibilities within the team proved to be highly beneficial, ensuring a smooth workflow and allowing each member to contribute their expertise to the project.

Was there any main motivator/goal that encouraged the team when things didn’t go as expected?

Digital humanities is all about embracing the adventure of trial and error, and boy, did we dive headfirst into it! Our incredible team leader, Leen Sevens, was our ultimate cheerleader, always keeping our spirits high with her GIFs and emojis. But it didn’t stop there – the organizing team of the hackathon also rocked our chat with their infectious motivation. Knowing that we had this amazing support system, cheering us on and having our backs throughout the process, made the journey all the more exciting and rewarding. It’s safe to say we had a real dream team!

What were some of the roadblocks you faced?

Our initial plan was to generate tweets for the tweet generator in the backend, but it proved to be more complex than anticipated. Not only would it have required purchasing additional storage, but it simply didn't work out as intended. Thankfully, we had a solid plan B in place. We resorted to manually creating the tweets for our generator and seamlessly integrating them into the HTML and JavaScript system.

Timing also presented its own challenges. With numerous tasks and responsibilities on our plate, the 10-day timeframe seemed like a whirlwind. However, there was a silver lining to this intense rush. It brought the entire team together, all of us uniting our efforts to meet the deadline. In a way, it fostered a sense of community among us, reinforcing our shared commitment to achieving our common goal.

What kind of tips would you give to a team doing their first hackathon?

I’ve already written some down [laughs].

  1. Plan, divide, and conquer. Let folks explore what tickles their brain and make it a learning experience for everyone.
  2. Communication is key. Regularly touch base, share ideas, and foster an environment of open dialogue.
  3. Embrace the beautiful chaos of errors, because that’s where the magic happens!
  4. Have fun and enjoy the process. It’s a great opportunity to learn from each other and work towards one shared goal.

Interview: Professor Fred Truyen in Conversation with Artes Research Intern Alisa Grishin

2023年5月9日 21:55

In March of 2023, Artes Research Intern Alisa Grishin interviewed Professor Fred Truyen of the Faculty of Arts. As a professor in the Digital Humanities and Cultural Studies programs, Prof. Truyen has also been involved in numerous projects related to digitization, archives, databases, and other manifestations of DH. In this interview, he was asked about best practices for organizations, industry structures and barriers, as well as some general reflections on the development of the humanities.


Based on your background in ICT, how do you think there could be a better relationship between more traditionalist approaches to the humanities with newer developments, such as Open AI? Why do you think certain barriers still exist?

I think with Open AI we have come to a pivotal moment. There has been a long tradition of research in linguistics, engineering, and mathematics leading up to this point. We are starting to realize that we will be operating in a more intelligent environment, and it is not our kind of intelligence. Our digital environment is becoming a more responsive, anthropocentric environment; it allows us to build up our world and our world views in our own image. I think it is a pivotal moment, and it forces us to rethink many of the approaches we use daily, which will become much more mediated and information-based. But for me, it is always a rediscovery of fundamental humanistic skills. You know that I studied philosophy, and I remember that at a certain moment there was this idea that philosophy is just talk and that with the social sciences we would now do all these things empirically. And that is true: you can do a lot in the social sciences, and they have made a quantum leap in progress and insights, which also shows in the kind of innovations that have emerged in these disciplines. Statistically, we can see that these are very innovative domains. But in my view, the fundamental paradoxes of humanity, which have been captured in the basic philosophical context, have never been superseded by any kind of science whatsoever. So they have never been solved. And perhaps we shouldn't have the ambition to solve them. The great things of the past have taught us to appreciate the limited position of humanity in the world. Our understanding of what we are doing here, why, and where it comes from is essentially very limited; even substantial progress in personology and neuroscience doesn't solve the core paradoxes. That's why I think that critical thinking in the humanities will always stay with us. We will always have recourse to expressions like art and music to partly cope with this, and to the understanding that it is just part of our destiny. There will never be total clarity.

And from a more practical perspective, what do you think the potential role of hackathons, such as BiblioTech, could have in improving the relationship between more scientific disciplines and the humanities?

I think a hackathon is a fantastic example of interdisciplinarity. You have an element of serendipity: you bring people together at random, mix profiles, set an open theme, and try to see what can come out of it. By trying to make something very practical, you are not only exchanging ideas, you are finding things that work. And in that sense, it is always interesting when there are interdisciplinary themes. That is the charm of something like digital humanities: you work with people with very different backgrounds to find solutions that you want to explore and implement. I also think that libraries are an excellent place to do so. I have an extremely warm heart for libraries, museums, and archives. The library was meant as a place to safeguard and consolidate knowledge. You want to safeguard intellectual production, primarily books, and make it accessible. This absolutely honorable mission is still behind libraries today. They were very early to understand that this would transcend the book, that it would include every new means of knowledge expression, and that they had to go digital and move into multimedia formats. That is why many libraries have always been at the forefront of organizing these kinds of hackathons, and their mission is also closely related to what Wikipedia is trying to do. It is a natural environment. Many people might think that the heritage sector is a kind of mummified sector of people doing the same thing over and over again [laughs]. It is absolutely the contrary. I have never experienced such a dynamic environment, with people constantly rethinking their roles, even when these roles have very formal missions, like archives, which partly have very legal missions. They have certain duties that they simply cannot set aside, yet they have an extreme ambition to rethink the way in which their role evolves: to come up with new ideas about what they should archive, how, who should decide what counts as an archive, what conditions can be granted, all kinds of questions. Technology gives them a new environment, but it is still the same question.

You also started going into my third question: “how do you see the intersection between Cultural Heritage (CH) and Digital Humanities (DH) in your line of work?” Archives and heritage can be seen as quite traditional approaches to managing our history. Can you elaborate on the importance of incorporating DH in these sectors?

Yeah, there's a clear interest from CH in the DH profiles. They evidently need people with a sufficient humanities background to fully understand the mission, the collections, the preservation practices, etc. You cannot just train that overnight. Most of these professionals in the sector have a quite solid background in the humanities. The same goes for implementing digital solutions: you cannot just produce an engineer in one year [laughs]. So we need to bridge very, very distant disciplines that require deep training, and the DH profile is an interface profile. In my sector, the heritage sector, we see this mostly in more junior positions, and we hope that they will be the next generation of middle management. There is a very high demand, situated mostly at the junior-master level, that heritage institutions are now taking on board. And they come from a variety of domain expertise, but with a sufficient understanding and vision of digital technologies and digital reasoning. We see a trickle-down effect. If you look at the digital humanities, you see techniques that were first built up in computational linguistics, then corpus linguistics, then broader language strategies. Now we are taking hold of history, because historians also work on texts, and so there is a whole battery of tools that you can simply take over from linguistics and apply to historical research. Technologies have also come over from the social sciences; I think of everything that has to do with social network analysis and all the tools there. So we see, from certain disciplines, a trickle-down of technologies to broader humanities fields. My message is: don't be afraid, it will never change the core of the humanities. That has a perennial value and will be part of society in 100 years. It's not as if we are going to end up with statisticians and engineers and robots [laughs]. It will be a very important part, but not the only part [laughs].

And in a broader sense, what do you wish the public and the academic world and beyond knew about DH?

Well, I think they should know that the humanities translate readily into the concepts and problems we face today; they are part of critical thinking about today's challenges. So you need this humanities background, and the humanities disciplines have the capacity to adapt to the requirements of today's science. Python is part of the scientific language and, just like English, has become a lingua franca for scientific research. These are part of the way scientific work and scientific thinking are done, and so a digital humanities student has sufficient background to express themselves in that language and to assess and discuss with developers how something can be implemented, without being a software engineer, because that is not the intention. When I started with ICT at the university, many of my colleagues who were responsible for ICT in the humanities faculties came from the humanities and were partly self-taught. So I hope that in many sectors people understand that digital humanities students have this capacity to bridge expertise and to make a coherent analysis of the different aspects involved, but still with a problem-oriented perspective behind it.

What do you think makes the humanities so adaptable? Is it the fact that it’s so abstract and focused on critical reasoning and perhaps isn’t as confined to theories?

I think it has to do with that. It's like learning a second language: it's always beneficial if you master your own language first, because then you have a head start when you want to learn another one, and it's the same here. We want students with a solid humanities background, so they need to have proper training, and that's why we are not offering it [the Master of Digital Humanities] as an initial master's at this moment. We had an excellent doctoral defense half a year ago from a student from architectural engineering who does excavations in Turkey with a very interdisciplinary team of biologists, sociologists, anthropologists, archaeologists, and historians. So it's a very interesting mix, and she focused on the negotiations between these different teams. How can they experience the excavation site as a common object of study? How does it work? Because they use different time scales, they use different terminologies. They have thoroughly different basic concepts, often using similar words but meaning quite different things. For me, that is digital humanities: trying to understand how small differences in concepts can lead to difficulties in implementations. This can force revisions of procedures; it's a kind of humanities engineering. It's not true engineering, but it's humanities engineering. This is really at the core of what a digital humanist should be doing: helping in this translation of data into information and how it emerges.

And then going off of that, what would you say to a student coming from a traditional humanities background who’s interested in DH, but maybe a little hesitant because they don’t have the exact training?

Be confident. Be confident in your own skills; with your training in the humanities, you are well prepared to take this step. So be confident and be bold. It's our philosophy that we want to empower students. Because the frustration could be that you have this rich historical literacy or art background and you are faced with the challenges of today and the dominating role of technology. You are forced into a consumer role, and that can be very frustrating. You feel powerless because you don't speak the language of technology exactly. And to this we say: yes, you can learn this language, you have learned languages before. You can make this step and you can become a proactive user. You can work in teams that change technologies. To give an example, in heritage there is the challenge of adapting the contents that we digitize and putting them out in the open. Many of these archives are written in languages that are not adapted to today's times, and if you think about the colonization context, you understand the depth of the problem. Many of these descriptions are racist, and we shouldn't be shy about addressing this. Now the challenge is: how can we meet the demand for Open Access? How can we give the same collections to these people in a non-hurtful, contributive way? Well, manually, we can't do this. It's all technology: databases, information systems, artificial intelligence. But who is leading the teams? Historians, anthropologists, sociologists, and lawyers. These [entities] are leading the way in developing the ontological approaches. What kind of software do we need? How can we give co-governance? How can we enable communities to actually participate in the development of the technologies that they will be using?

How can museums and other cultural organizations better incorporate DH into their work and mission? Have you noticed a significant increase in cultural institutions using DH methods? Do you have any predictions or any ideas on the direction of that? 

The sector actually is embracing this. It's the role of the European Commission to stimulate innovation in these domains. What we lack a bit in Europe is “platformization”: the big platforms are of course not in Europe, and that's a sore point that still needs to be addressed. But the sector is embracing the development of databases, imaging techniques to scan objects, digital tools for crowdsourcing and engaging audiences, 3D technology, and augmented reality applications. Mid-size to large heritage institutions all have a very decent digital component to their work, but they often still lack an overarching digital strategy. That is what we are working on now.

AI is also part of many of the current projects being funded today. I'm currently involved in a project called AI4Europeana; the name says it all, but there were already AI projects in the past. And the reason is simple. The reality of AI is very, very neutral. So in research projects we are also thinking about how we can reduce the “black box” character of AI a bit. How can we give people more confidence in trusting what the algorithm is doing? You see that when you look at AI programs in the heritage sector: they are often about rebuilding trust relationships, and about using digital environments for that by looking at how we can re-establish transparency.

So that’s our role from the humanities: to challenge the bias, too. I’m now involved in a project which is called DE-BIAS for exactly that reason, as we are trying to see in the heritage sector where the bias is in the data that we accumulate and that we are now publishing through digital means.

Are there then some challenges in DH in terms of sustainability and preservation of digital materials? How do you prolong the benefits of these approaches?

The heritage institutions that are now safeguarding and preserving will gradually contain more and more digital artifacts. So there is already a movement underway to think about how we can acquire digital assets and how we can safeguard digital production. 

The core roles of a library, a museum, and an archive in relation to digital artifacts will include the preservation of digital artifacts: preserving old versions of software, preserving CD-ROMs, you name it. And it's increasingly challenging. We already saw this with, let's say, the recorders of the 80s; these are now objects for preservation in an audio-visual archive. These are quite challenging things, and a lot of technological thinking is needed to arrive at proper preservation methodologies for them. You have analog film tape, but you also have videotape, and videotape is already the next step in the preservation problem. And then we get to the truly digital formats, which are a whole history on their own. So historians of the future, instead of treating manuscripts, early books, early prints, etc., will be treating tapes. And I'm not even sure that you can open some older files in the current version of Windows. These are all challenges that lie ahead. There are conferences about digital preservation, and a lot of disciplines are emerging in that sector. But yeah, I guess the work is just never done. It's just a matter of adapting to whatever is next in line for preservation.

Shifting gears a little bit to briefly focus on cultural policy, are there any improvements that can be made to European cultural policies as to encourage DH? Or has Europe, at the policy level, been relatively receptive to digital approaches due to an interest in innovation?

Yeah, certainly with reference to Europe, it has to do with the specific mandate that the European Commission has and the balance that it needs to keep with the Member States, which is always a very fraught relationship. And so the idea has always been that culture is something that we export, something that attracts people to us, something that we have a long history in, and so we should be at the vanguard of what happens in culture. This connects to the idea of innovation, because innovation is truly at the heart of the Commission's mission. They have always thought that there is a nice mix between innovation and culture, and this means that, for example, when you look at heritage at the European level, it's often discussed in close connection with the creative industries: how is Europe's heritage part of the new creative industries and how can we valorize it? So on cultural policy, I think it has been colored a bit by technology, by innovation, at the European level.

For example, the rediscovery of craftsmanship that you see throughout Europe. You see this integrated in digital approaches. There are many quite interesting projects which are trying to marry look, quality, craftsmanship, the rediscovery of natural, agricultural processes, and seeing how digital technologies can be helpful in that context. For example, there are many YouTube channels on how to do crafts, so that there is a natural way of self-organizing knowledge transfer on very local things. The digital space mediates between the diaspora and the local. And that’s also empowerment because communities can protect their tradition and their sociality. 

Are there common internal concerns when approaching a DH project? Any legal, structural, and sustainability concerns?

Speaking from the viewpoint of the Commission rather than from DH: I think their main concern is the human resources, the competencies. Do we have the right profiles? Do they have interdisciplinary skills? If you need a conservator, an engineer, or a graphic designer, that is a variety of roles that you need to integrate. That is something totally different from your classic preservation work or classic digitization work, which is very focused. A DH project goes broader. And then the real scare is: can we pull it off? Can we really bring the right people together and can we pay for the resources? That's why I think we need a mindset shift. In the beginning, when organisations went into digital projects, they were side projects: we need a new cataloging system; we need a website; we need a kind of community forum; we need a social media manager. It was all fragmented, small-scale things, and now we see that the digital permeates every aspect of your business. And that is a scary thing, because you realize that you need very, very different competencies than before.

What predictions did you have about DH which have not yet been realized, any technology that never took off, methods that didn’t become mainstream etcetera?

Ohh. That's a difficult one. I think we waited very long for the breakthrough that we now see with generative AI. It was a much longer route, and credit to my colleagues who held on! I don't think we had misconceptions about what would happen; I think we always took into account that adoption would be slow, but again, in the 21st century, this is pivotal. In academia, for the humanities, it was the monograph in literature that was important: the article you wrote by yourself as a single author. That took very long to go away, but today it has dramatically changed. The same if you look at PhD proposals. We waited the whole last quarter of the 20th century for PhDs to become more interdisciplinary. Now, in most disciplines, it's just a given.

I’m not sure that there are a lot of promises that didn’t come through. In linguistics, language technology took longer than expected. It then went through more probabilistic statistical analysis and that brought about the Googles of this world. But now we see that again.

So nothing really surprises you too much about the direction that technology is taking?

Well, what we couldn't imagine was the power of imaging technologies and the world's response to them. And now we are even talking about microdrones in your body. So these are technologies that were underestimated. And it's used everywhere now, also in archaeology and on heritage sites. So that's a type of technology that has very rapidly taken hold and that we had underestimated.

And then what do you think the future of DH looks like? 

Well, there is always this debate about whether Digital Humanities will just become humanities or whether the humanities will become digital humanities. I think it’s a silly question. For example, photography came from scientific research in optics and chemistry. Chemistry advances made photography possible. So it came from science and it’s a technology that we all adopted and that we don’t even question. We don’t see it even as technology. The core of the humanities is linked to the process of human existence; it will exist forever. It’s not something that was part of modernity. We now speak about digital humanists, but the “digital” is such a broad term. Think about quantum computing, etc. That could bring us a totally new way of approaching technology. Will we still call it digital then? Or will we talk about quantum humanists? I don’t know, but I don’t care. It will be humanist.

 

The Digital Humanities journey of Sara Cosemans: using digital research methods to deal with information overload

2022年11月9日 18:40

Sara Cosemans is a Doctor-Assistant in Cultural History at KU Leuven and a part-time Assistant Professor in the School of Social Sciences at UHasselt. She developed this digital method during her PhD at KU Leuven, together with data scientists Philip Grant and Ratan Sebastian and computational linguist Marc Allassonnière-Tang. She tweets at @SaraCoos. This blog post was written by Sara Cosemans and originally published on Refugee History. You can find the original post here.

Sara Cosemans

Digital technology has transformed archival research. Instead of painstakingly taking notes, most historians today take digital pictures or scans of archival documents. The advantages are undeniable: it reduces the time spent in any one archive, which makes it possible to visit more archives, and it enhances accuracy. However, there are downsides, too. Access can lead to excess. The sheer volume of paper has increased exponentially since the invention of the typewriter in the late nineteenth century; computers, digitisation, and online access have only accelerated this growth. Hence, historians often end up with enormous collections of research photos on their personal devices. Many struggle to properly process their digital material. The problem of multitude is no longer situated during but after the archival visit.

The struggle became real in my own research on international refugee policy in the 1970s – the era when computerized telegrams were introduced. For my PhD project on the international resettlement of Ugandan Asians in 1972, the Chileans in 1973, and the Vietnamese between 1975 and 1979, I collected approximately 100,000 digital documents from both physical archives (approximately twenty archives across eight countries on five continents) and digital ones (mostly the American State Department’s Access to Archival Database or AAD). Here, I share my experiences of dealing with such abundant and diverse information and the opportunities for historians offered by ‘distant reading’ techniques.

Digital research methods, such as Natural Language Processing (NLP), can provide solutions for large corpora, but they require digitised material with high accuracy. Most digital historical research therefore deals with sources that are ‘born digitally’ (such as the electronic telegrams in AAD) or with crowd-sourced data that are manually inserted into digital databases. I needed to combine information from digitally born sources (20%) with a large corpus of typewritten (>75%) and handwritten (<5%) sources. Manual transcription into a database was only possible for the handwritten files; the typewritten corpus was simply too large. Moreover, the quality of the typewritten material varied starkly, from good to barely readable. The challenge was to deal with this information overload through a hybrid digital method, which could enhance the analysis without major distortions caused by the internal diversity of the material.

Together with an interdisciplinary team, I designed a three-step method based on augmented intelligence. We used digital methods to analyse the files, while ‘keeping the historian in the loop’. Every step was thus verified and tested against historical knowledge acquired by more traditional research methods, such as close reading and discourse analysis.

Step 1: Digitising documents

Optical Character Recognition (OCR), a technique used to convert analogue text into digitised text, has improved markedly, but research shows that a threshold of 80% correctly recognised characters is necessary for digital analysis. The variation in the typewritten sources (due to low-quality paper, ink, preservation, etc.) affected my corpus enormously. Some archives – particularly the archives of the United Nations High Commissioner for Refugees in Geneva – showed considerably poorer OCR quality (<80%), while others, mostly the national archives of the UK, US, Vietnam, Chile and Uganda, had high degrees of consistency and quality (>90%). But what effect did this have on our analysis?
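As an aside, such a character-accuracy figure can be estimated against a small hand-corrected sample with nothing more than the Python standard library; the sketch below is illustrative only, and the example strings are placeholders rather than project data.

```python
# Minimal sketch: estimate character-level OCR accuracy against a manual transcription.
# Example strings are placeholders; compare the result against the 80% threshold.
from difflib import SequenceMatcher

def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Share of characters in the ground truth that the OCR output matches."""
    matcher = SequenceMatcher(None, ocr_text, ground_truth)
    matching = sum(block.size for block in matcher.get_matching_blocks())
    return matching / max(len(ground_truth), 1)

sample_ocr = "Tbe High Comm1ssioner for Refugees"
sample_truth = "The High Commissioner for Refugees"
print(f"{character_accuracy(sample_ocr, sample_truth):.0%} of characters recognised correctly")
```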

To verify whether the digitised material was usable for digital research, we compared the OCR text with the digitally born telegrams from the American State Department. We selected a test sample including 4,679 records from the UNHCR Archives in Geneva, 7,272 records from the British National Archives, and 8,646 electronic cables from AAD, all in English. We then used topic modelling to compare the results between the three collections. We found that the computer was able to distinguish between various themes within the archives related to the research question, focusing on the resettlement of the Ugandan Asians, Chileans, and Vietnamese between 1972 and 1979. For each group, the computer could identify one or more topics. It also identified four topics related to policy making (two related to UNHCR, and one each to the UK and US policies). A last topic isolated all documents unrelated to resettlement that had been inserted by mistake (see graph). This division occurred automatically, without human intervention, and thus showed that the computer correctly understood the input. Moreover, we found that the computer identified the same themes (‘topics’) across all archives, both in the digital and the typewritten sources and regardless of the OCR quality. This result indicated that all collections were suitable for digital analysis.

Visualisation of the topics per year and per archive (AAD = American electronic cables, TNA = British National Archives, UNHCR = UNHCR Archives). The yellow topics relate to the Ugandan Asian case, the green to Chile, the red to Vietnam, the grey to UNHCR policy, the purple to British and American policies; blue indicates the anomaly of the mistakenly inserted data.
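The post does not specify which topic-modelling implementation was used; the sketch below shows the general shape of such an analysis with scikit-learn's LDA on a toy stand-in corpus, where the documents, parameters, and topic count are illustrative assumptions only.

```python
# Minimal sketch: fit a topic model and print the top words per topic.
# The toy corpus stands in for the ~20,000 test records; it is not project data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "resettlement of ugandan asians expelled from uganda to the uk",
    "chilean refugees after the coup seeking asylum in europe",
    "vietnamese boat people resettled from camps in southeast asia",
    "unhcr policy discussion on freedom of movement and resettlement quotas",
] * 25   # repeated only so the vectorizer has enough data to work with

vectorizer = CountVectorizer(max_df=0.9, min_df=5, stop_words="english")
counts = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=4, random_state=0)
doc_topics = lda.fit_transform(counts)   # one topic distribution per document

# Print the top words of each topic so a historian can label them by theme
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-8:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```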
 
Step 2: Hypothesis formation

Historians have the appropriate disciplinary knowledge to judge the quality of results derived by technology, but they also come to their research with preconceived notions and biases, which can blind them to results and outcomes other than the ones they had in mind. We combined technological solutions with interdisciplinary communication and iteration to come to a model that can supplement and support data exploration and hypothesis formation. While the backbone of this research has a solid historiographical-theoretical underpinning and a thorough understanding of the data through close reading, distant reading can be used to point historians in new directions, especially when the potential corpus of sources is larger than a single researcher can read in a lifetime.

We used another Natural Language Processing method, namely clustering, to find out how the topics related to each other. One key finding was the centrality of human rights, most particularly the right to freedom of movement, in policy discussions regarding refugee resettlement. These debates occurred in relation to the freedom of movement of Ugandan Asians towards their former imperial metropole, the UK, but the topics also showed how this discussion surfaced in the subsequent resettlement periods. This result was unexpected and led to new research questions.
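The clustering step is not detailed in the post; one straightforward reading, sketched below, is to cluster the topic-word distributions from the fitted model of the earlier sketch and see which topics sit close together. The cluster count and distance metric are assumptions for illustration.

```python
# Minimal sketch: hierarchical clustering of the topics from the earlier LDA sketch.
# Assumes `lda` is the fitted LatentDirichletAllocation model from that sketch.
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

topic_vectors = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
distances = pdist(topic_vectors, metric="cosine")   # pairwise distances between topics
tree = linkage(distances, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")  # group the topics into 2 clusters
print(labels)                                       # topics sharing a label sit close together
```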

Step 3: Analysis

The output of our method was a ‘reading list’ of the most important pages per topic. Instead of starting the daunting task of combing through approximately 100,000 documents in an unstructured way, my reading method became highly structured. I used the pages in the reading list to detect interesting lemmas for further full-text search and explored the material, including sources in other languages than English, through traditional complementary methods.
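One way such a per-topic reading list could be derived from the document-topic matrix of the earlier sketch is shown below; the page identifiers are placeholders for the archival references, and the cut-off of five pages per topic is an arbitrary choice for illustration.

```python
# Minimal sketch: build a 'reading list' of the highest-scoring pages per topic.
# Assumes `doc_topics` is the document-topic matrix from the earlier LDA sketch;
# the page identifiers are placeholders, not the project's archival references.
import numpy as np

page_ids = [f"page_{i:05d}" for i in range(doc_topics.shape[0])]

top_n = 5
for topic in range(doc_topics.shape[1]):
    best = np.argsort(doc_topics[:, topic])[::-1][:top_n]
    print(f"Topic {topic}: " + ", ".join(page_ids[i] for i in best))
```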

Digital methods can help in managing the overload of digital sources historians currently face. They can gauge the quality of the digitisation before one delves into the material. And they can reveal patterns that are potentially indiscernible to a human reader. However, these solutions are not value-neutral. They still reflect the heuristic process of the researcher, and the inherent biases, gatekeeping and colonial legacies within the archives. Moreover, digital technologies, often engineered to perform best in an English-language environment, add their own biases to the process. Therefore, we opted for augmented rather than artificial intelligence: the computer points to interesting areas for close reading, but all interpretation, and any mistakes, remain the historian’s responsibility; it supports the analysis without taking it over from the historian.

This digital research project was partly funded by the Research Foundation – Flanders (Fonds voor Wetenschappelijk Onderzoek, FWO). Sara Cosemans is grateful to her interdisciplinary team, without whom she would have never managed to conduct the digital analysis. She also wants to thank the editors of RefugeeHistory.org for their helpful suggestions.

Libraries and Diamond Open Access

2022年4月13日 15:21

The following is the edited text of a statement given by Demmy Verbeke at the webinar “The Diamond Open Access Model: what impact on research?”, organized by Academia Europaea Cardiff, KU Leuven Libraries and the Young Academy of Europe on March 28, 2022.

Academic libraries have a responsibility in the context of Diamond Open Access on at least two levels.

For more than a decade now, librarians worldwide have played a role in promoting OA, explaining the various options to make academic work freely available to all, highlighting the pros and cons of the various routes towards OA, etc. This advocacy work has lately become more and more interwoven with talking about funder compliance or about things like block grants, OA funds and read-and-publish deals. However, we need to be very careful that the latter does not turn librarians into salesmen for the publishers with whom their universities have this kind of agreement. The thing we always need to remember is that academic librarians do not work for publishers; they work for their institutions and serve the scholarly community, so they need to talk about the full diversity of OA possibilities. They owe it to their profession to provide an analysis which is as objective as possible of the pros and cons of the various OA approaches, so that authors can make up their own minds about whom they want to entrust with the dissemination of their research results.

In that context, it is important that librarians also talk about Diamond OA and give the full picture: not only by mentioning the main thing that scholars associate with Diamond OA, namely that this is an approach to scholarly publishing which does not charge fees to either authors or readers, but also by stressing the second element of the characterization used in the recent Action Plan for Diamond Open Access, namely that these are community-driven, academic-led and academic-owned publishing initiatives. This is important because it makes an essential difference in the financial model behind initiatives of this sort, and it is the reason why scholars, funders and institutions alike should not only foster Diamond OA but should even prioritize it over other approaches.

The second responsibility is not only to talk and inform about Diamond OA but also to support it financially. Personally, I have very little patience for the argument "we do not have the budget to support Diamond OA programs". Most university libraries in the Western world have multi-million budgets, whether they receive additional block grants for OA or not. I find it hard to believe that it would be impossible to find a few grand in those budgets for Diamond OA. I do, however, understand and sympathize completely with the realization that we need to rethink our budgets in order to make room for Diamond OA. Both the acquisition and the cataloguing processes of libraries are still completely geared either towards the traditional model of publishing behind a paywall or towards publishers who have found a way to shape their OA offer in such a way that it almost appears as business as usual, for instance through read-and-publish deals. As a result, there is a big risk that library budgets are completely hoovered up by a combination of buying paywalled content and spending money on the privately-owned, for-profit approach to OA. This means that, if libraries want to financially support Diamond OA, they either need to prioritize it in the sense that they first spend the available budget on Diamond OA, then on paywalled content, then on for-profit OA; or they need to make much clearer distinctions in their budgets and set aside a percentage for Diamond OA, a percentage for paywalled content and a percentage for for-profit OA. The added challenge is that they also need to stick to that division. If the price tag of any of those three categories increases – and, by the way, I guarantee that the price tag for for-profit OA will increase – then they cannot move money around from one category to another without first having a thorough discussion acknowledging that this implies a policy change.

I consider both of these responsibilities of academic librarians in the context of Diamond OA to be an obvious continuation of the role they have been playing in the field of scholarly communication for generations. Librarians are not in the business of telling researchers what to do and how to distribute the results of their work. But that does not cancel out the fact that researchers turn to librarians for guidance in this, either by making explicit appeals to the expertise within libraries to provide support and advice, or implicitly by observing which choices libraries make in their collection building and adapting their own publishing practices accordingly. Similarly, research libraries have a long tradition of funding the market for academic publishing. Library budgets pay for the acquisition of monographs, for standing orders for series and for subscriptions to journals and databases. So it is natural that these same libraries are now called upon to act as funders for publishing in OA. And just as librarians were entrusted to make wise budget choices in a traditional system of acquisition of content behind a paywall, they should now be entrusted to make wise budget choices in how to support OA publishing. I, for one, am convinced that librarians will ensure much better value for money, and thus do a much better job for their institutions and the scholarly community which they serve, if they favor academic-led approaches towards OA without author fees over for-profit approaches towards OA based on publication-level payments.

Opening The Future: A new funding model for OA monographs

2021年8月31日 19:30

Opening the Future is a collective subscription model for OA books. Libraries can sign up for its membership scheme, which means that they grow their collections and support Open Access at the same time. The objective is to raise small contributions from a large number of academic libraries, so that no single institution bears a disproportionate burden.

How does it work?

A library subscribes to a backlist package of non-OA books offered by a publisher. The publisher makes this backlist package of non-OA books available to subscribers only (in other words: books in this package remain non-OA), but uses the subscription money to publish new books in OA. These new books are thus made available to everyone in OA, benefitting scholars and institutions around the world.

How it started and how it’s going

Opening the Future was launched by the COPIM project: an international partnership of researchers, universities, librarians, open access book publishers and infrastructure providers, supported by the Research England Development Fund (REDFund) as a major development project in the Higher Education sector with significant public benefits, and by Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin. Central European University Press (CEUP) piloted this model and was recently joined by Liverpool University Press. Both programs are funded by KU Leuven: membership for the CEUP program is financed by KU Leuven Libraries Artes, whereas membership for the Liverpool University Press program is funded via the KU Leuven Fund for Fair OA.

In June 2021 it was announced that Opening the Future had been shortlisted as a finalist for an ALPSP Award for Innovation in Publishing. ALPSP (the Association of Learned and Professional Society Publishers) is the international trade association which represents non-profit scholarly publishing. The winner of the award will be announced during the association’s annual conference on 15-17 September.



Open Access terminology (bis)

2021年3月25日 19:39

A previous post shed some much-needed light on the complex world of OA terminology. It certainly helps to be familiar with these names, although we cannot claim that all possible questions can be answered in one post. What about Black OA, Rogue OA, Radical OA and Platinum OA? Can Bronze OA actually be called OA? And is Elsevier really talking about the same approach to Diamond OA as the early advocates of community-driven, APC-free OA do when it plans a Diamond OA Journals conference?

It is necessary to get familiar with this OA terminology. However, we should also be mindful of the risk that we spend so much time and energy on definitions that we lose sight of the essence – not unlike what happens with trying to define what a predatory journal is, which deflects attention from the real problems of scholarly publishing (version of this article openly archived here). Another approach would be to start the discussion from two simple distinctions.

The first is the difference between open archiving and open publishing.

  • With open archiving, the actor is the author. He or she publishes something, and then takes the additional actions of archiving an electronic version of that publication in a repository and making the archived version freely available to all. Quite regularly, the version which the author has archived and made available is not identical to the published version (e.g. an accepted version instead of the published version), and the archived version is not available at the same time as the published version (e.g. it is made openly available after a delay of 12 months). Open archiving thus offers the advantages of OA at no cost to the author (who can always find a free repository to archive his or her text in), but it requires an additional action by the author and exists as a system parallel to actual publishing.
  • With open publishing, the actor is the publisher. At the time of publication, the publisher immediately makes the final version of the text openly available. A publisher typically does not do this for free, and somebody needs to cover the publication costs. This can be done by charging the author, by charging an academic institution or a research funder, or by having the publication costs covered by a group of supporters, typically a consortium of university libraries.

The second is the difference between a for-profit and a non-profit approach to scholarly publishing.

  • In a for-profit approach, the goal is to realize incoming funds which are higher than the actual publication costs. The profit which is thus realized is not reinvested fully in the scholarly community but used to reward shareholders in the publishing business. Scholarly publishing has great potential for being a profit-bearing enterprise, because most of the skilled workers in the production chain (i.e. the researchers) offer the fruits of their labor (producing manuscripts, performing peer review, editorial work) for free, because there is a stable market of customers (i.e. university libraries), and because prestige is such a big factor in academic publishing (so that publishers who have attained a good reputation can realize very high mark-ups).
  • The non-profit approach rejects the premise that profit should be made on the dissemination of research results. At its core is the conviction that scholarly knowledge is a common good and should thus be shared by all. It objects to the fact that a small group (i.e. shareholders of a publishing company) would profit from investments made with public money (both in employing researchers and in subsidizing university libraries) and therefore maintains that if incoming funds are higher than publication costs, they should be reinvested in the scholarly community.

Not Only Transformative Agreements

2021年3月16日 19:04

More and more institutions and consortia of libraries conclude so-called read-and-publish deals or transformative agreements with legacy publishers. But this new incarnation of the big deal is not without its critics. The hard-line opposition argues that transformative agreements hamper progress and should therefore be avoided at all costs. A less radical approach is to make sure that the available budget is not spent exclusively on transformative agreements but is also used to support alternatives, fostering diversity of business models in the market of academic publishing.

The hard line

Transformative agreements diminish rather than stimulate diversity and equality in scholarly communication, are unnecessary in certain disciplines, might worsen the state of the market, and stimulate vendor lock-in.

Let’s look at the last argument in a bit more detail. If the negotiations leading towards a transformative agreement are successful (which is only possible if they are very well prepared – and that comes at great expense), they might lead to a deal with a legacy publisher including OA at about the same cost as an earlier subscription arrangement. Hoorah! However, by concluding such transformative agreements, academic institutions demonstrate that they are able and willing to pay above production cost for publication services. What is more: by doing so, they finance legacy publishers to further develop services concerning research data management, bibliometrics, and other aspects of scholarly communication.

So what will happen in the next negotiation round? If (and this is again a very expensive if) negotiations go well, academic institutions might even be able to drive down the price of the publication services offered. But they will have to pay additionally, and handsomely, for the other services. They will feel obliged to do so (1) because these legacy publishers will dominate the marketplace even more than before, (2) because the services these publishers offer will be more attractive and user-friendly than anything else on the market (since academic institutions, unwittingly but generously, provided the budget to develop them), and (3) because legacy publishers will be able to lure academic institutions into new forms of big deals, packaging scholarly communication services in ways that will seem easier and cheaper than obtaining those services separately.

For this reason, and many more, transformative agreements should actually be considered librarian malpractice.

What if academic institutions were to invest the budget, as well as the time, energy and talent, they currently waste on transformative agreements in community-owned alternatives? Alternatives that foster diversity rather than monopoly and support bibliodiversity and multilingualism, thus providing a more global and democratic approach. Alternatives which involve working with partners who do not insist on vendor lock-in and who operate in service of the academic community (rather than in the service of their shareholders). Would that not mean that we would finally see, in the words of Eloy Rodrigues, the return of universities and scholars “to the driver’s seat of scholarly communications”?

Back to reality

Even if it seems a naïve dream to expect a general commitment to this approach, is it not smart to safeguard part of the budget to invest in alternatives, thus keeping the market healthy and our choices open? Even when we do it in a very small way, let’s say – as argued by David W. Lewis – by putting aside 2.5% of the total library budget to support open and community-owned infrastructure (and if you don’t know where to start, the Global Sustainability Coalition for Open Science Services can most certainly help). An investment of 2.5% seems little, and perhaps not something to be proud of (since the implication is that you spend 97.5% of your budget on scholarly communication infrastructure which is closed and/or privately-owned). But it is a start.

As Head of KU Leuven Libraries Artes, Demmy Verbeke is responsible for collections and services for the Arts and Humanities. Demmy is a strong believer in Fair Open Access, serves on the editorial board of the Journal of Librarianship and Scholarly Communication and is, together with Laura Mesotten, responsible for the day-to-day management of the KU Leuven Fund for Fair OA.
 

 

Interview: Professor Martin Kohlrausch about the KU Leuven Fund for Fair Open Access

2021年2月25日 21:46

In order to boost Open Access publications, the KU Leuven Fund for Fair Open Access helps finance OA books published by Leuven University Press. Professor Martin Kohlrausch shares his experiences about publishing his book in OA.

Your book is published open access thanks to the support of the KU Leuven Fund for Fair Open Access. How did the open access publication process go? What makes open access so attractive for you/your book? Have you thus far noticed that your book reaches a wider audience?

There were mainly two reasons why I decided to go for open access. First, I expected that with open access my book could reach a much broader audience than a print edition and, since its theme, ‘modernity’, is global, also a global audience. Second, I was hoping for a quick ‘absorption’ of the book. Judging from the very high download numbers, my expectations have been exceeded. Moreover, these numbers provide me, as the author, with quite telling insights into where the book is downloaded (and hopefully read). I still believe it is important, however, to also have a print edition and to be able to communicate the results of my research the ‘classic’ way.

Continue reading at the ‘Author’s Corner’ of Leuven University Press: Martin Kohlrausch | Brokers of Modernity. East Central Europe and the Rise of Modernist Architects, 1910-1950

 
