
Recap: How do you do it? A behind-the-scenes look at research workflows (2024)

12 December 2024, 00:02

Every academic year, the HDYDI (How Do You Do It?) event on research data workflows signals the start of the Digital Scholarship Module. Through a series of sessions and (mini-)workshops, Artes Research aims to guide students through the complexities of scholarship in the digital age, from Open Science to Research Data Management and beyond. At the HDYDI kick-off event, three researchers from the Faculty of Arts lift the curtain on their own research workflow and offer a behind-the-scenes look at the ways in which they approach their research, the data they engage with, and the tools they use in doing so. The goal of this session is to provide examples of more advanced workflows for the first-year PhD researchers as they embark on their own research journey. Hopefully this recap of the session can spark some inspiration for you!


Seb Verlinden – Using Obsidian as a note-taking tool for literature

The first speaker, Seb Verlinden, is a second-year PhD candidate in medieval history. Under the supervision of Maïka De Keyzer and Bart Vanmontfort, Seb is studying the long-term landscape changes – mainly in the form of gradual desertification – that characterize the Campine region, one of the driest areas in Belgium. Particular focus is on the impact of eighteenth-century drainage in the region.

Seb’s talk concerns an issue that all researchers can relate to, regardless of the relative complexity of their project – that of taking notes. It is true, as Seb highlights, that every researcher has their own unique workflow, often relying on a combination of tools that makes sense for them (in his case, QGIS, FileMaker Pro, MAXQDA, and spreadsheet software). But at the heart of any research process is the need to organize one’s thoughts, and this is where note-taking apps can make a real difference. So, what are some of the options out there?

Zotero is a possible solution – one we’ve already discussed elsewhere on this blog. As a reference manager first and foremost, Zotero has the potential to become a researcher’s living library, a knowledge base covering all relevant literature. It also has great capabilities for annotating PDFs, especially with its new 7.0 update. What you’re missing in the context of note-taking, however, is the big picture. Seb aptly points out that using Zotero to make notes is like putting post-its in books: you have no real overarching structure, and no way to easily link notes across books.

Other tools are likewise flawed. Lots of researchers use Microsoft Word to take notes, even though it is primarily tailored to mid-length longform text. As a result, it is easy to lose track of notes unless you’re willing to navigate multiple files, and it tends to grow slow and cumbersome because it is busy handling layout. It is, simply put, unintuitive for this purpose.

This is why Seb puts forward another solution, one that he believes to be faster, better automated, and easier to use: Obsidian. A widely supported and free tool, Obsidian does have its advantages: in contrast to both Microsoft Word and Zotero, it uses open-source file formats (.md or Markdown files, written in an accessible markup language) and it is full-text searchable and provides a structured overview of notes. Moreover, it offers a versatile workspace, allowing you to go as simple or as complex as you like – especially with the addition of supported plugins. One such plugin, in fact, allows your Obsidian environment to easily interoperate with your Zotero library (including references, bibliographies, and PDF annotations), which is particularly useful.

Seb ends his talk by highlighting another key benefit in using Obsidian. By introducing links in your notes, it is possible to cross-reference other notes within your system with minimal user effort; and through the use of tags, you can generate another layer of structure. Obsidian then uses this information to visualize the relations between your different notes, automatically creating a network of clusters that correspond to certain topics of interest. This way, it expands the possibilities of the data without the need for the researcher to make any real effort – a great reason to think about using Obsidian for your own note-taking needs!
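
To give a concrete (and entirely invented) example of what such a note looks like under the hood: an Obsidian note is just a Markdown file in a folder, so you could even generate one with a few lines of Python. The vault path, note title, wiki-links, and tags below are placeholders, not Seb’s actual notes.

```python
from pathlib import Path

# Hypothetical vault location and note content; adjust to your own setup.
vault = Path.home() / "ObsidianVault"
vault.mkdir(exist_ok=True)

note = """# Drainage in the Campine region

Eighteenth-century drainage works accelerated desertification in parts of
the Campine (see [[Desertification]] and [[Campine literature overview]]).

#landscape-history #drainage #campine
"""

# Obsidian notes are plain Markdown (.md) files, so "creating a note"
# is nothing more than writing a text file into the vault folder.
(vault / "Drainage in the Campine region.md").write_text(note, encoding="utf-8")
```

The double-bracket links and hashtags are exactly what Obsidian later uses to build the graph of connected notes.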

Seb showcased his own network of notes, automatically clustered by Obsidian. This way, he can visually grasp the connections between different topics of interest!

Laura Soffiantini – Managing linguistic and historical data. A PhD workflow using FileMaker

Laura Soffiantini is the second speaker: as a PhD researcher at the Cultural Studies Research Group, she is currently analyzing the geographical representation of Greece in Pliny the Elder’s Naturalis Historia. With the help of her supervisor Margherita Fantoli, Laura intends to shed new light on the way in which Greece was perceived in Flavian-era Rome. In order to do so, she has to manage a varied mix of linked data – textual, linguistic, and historical – as part of her daily routine.

Grappling with 37 books of a classical encyclopedia, and dealing with data in different formats and with different qualities (actual text, numeric coordinates, symbols, etc.), Laura realized the importance of proper Research Data Management. It enables aggregating, manipulating, analyzing, and comparing your data more efficiently throughout – and even beyond – the research process. Indeed, a challenge faced by many researchers is the retrieval of data collected or processed at an earlier time, with the aim of relating it to “new” data. In this context, Laura provides a look at her own research workflow.

The primary strategy in managing your data, she remarks, is to structure it. By adding structure to your data, you can parse it more easily and return to it without issues, even in later phases of your project. Software like Obsidian is indispensable for this purpose, but it’s also good to think about using tabular formats like .csv (an open plain text format) as a way to organize your data. A useful tool put forward here is pandas, a Python library designed to help manage and analyze data derived from such .csv files. That might sound technical, but Laura assures us that, even if you have no background in programming, pandas is a very accessible and convenient tool for handling tabular files.
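
As a small illustration of what that looks like in practice – not Laura’s actual code, and with invented file and column names – loading and exploring a .csv file with pandas can be as short as this:

```python
import pandas as pd

# Placeholder file and column names, invented for illustration.
tokens = pd.read_csv("tokens.csv")

print(tokens.head())                            # quick look at the first rows
print(tokens["lemma"].value_counts().head(10))  # ten most frequent lemmas
print(tokens[tokens["book"] == 4])              # filter the rows for one book
```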

Having thought about what data she worked with (an essential step for every researcher), Laura adopted an initial workflow in three parts. She first started out with .json files containing Pliny’s text, which she converted into tabular .csv files, adding data related to the lemmatization of the corpus, part-of-speech tagging, and references to book and chapter positions. Subsequently, she thought about grouping this data into different categories, which she assigned to different columns – such that there is a column titled “book_chapter”, one titled “lemma”, and so on. Finally, Laura assigned identifiers to the information contained in these files; she explains she wasn’t aware of the importance of such identifiers at the start of the project, but now realizes they form a crucial part of managing tabular data.
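
A minimal sketch of this kind of conversion step might look like the following; the input structure and field names are assumptions for illustration, not the project’s real data:

```python
import json
import pandas as pd

# Hypothetical input: a JSON file with one record per token, in the spirit of
# the files Laura describes; structure and field names are assumptions.
with open("pliny_book4.json", encoding="utf-8") as f:
    records = json.load(f)

df = pd.DataFrame(records)

# Keep the information grouped into named columns and give every row a stable
# identifier, so the table can later be linked to other tables.
df["token_id"] = range(1, len(df) + 1)
df = df[["token_id", "book_chapter", "token", "lemma", "pos"]]

df.to_csv("pliny_book4_tokens.csv", index=False)
```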

As a result, Laura ended up with multiple .csv files, which she then related to each other using FileMaker (with the expert assistance of Mark Depauw and Tom Gheldof). One table, for instance, contains a list of all the Latin words used (the tokens, e.g. urbs) alongside their identifier, book number, lemma, and a possible identifier linking to the Trismegistos database of ancient texts. Another contains the lemma along with its part-of-speech tag (e.g. proper noun) and meaning (e.g. “city”). By linking the different files through the use of identifiers – the keys to the data – Laura built a relational database that is easily managed and organized through FileMaker. The resulting dataset is at the core of her research project.
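
The same relational logic can be illustrated outside FileMaker, for instance with a pandas merge on the shared identifier. The two tiny tables below are invented stand-ins for Laura’s real ones:

```python
import pandas as pd

# Two hypothetical tables: one row per token, one row per lemma,
# linked through a shared lemma_id key.
tokens = pd.DataFrame({
    "token_id": [1, 2],
    "token": ["urbs", "urbem"],
    "book_chapter": ["4_1", "4_2"],
    "lemma_id": [101, 101],
})
lemmas = pd.DataFrame({
    "lemma_id": [101],
    "lemma": ["urbs"],
    "pos": ["noun"],
    "meaning": ["city"],
})

# The identifier acts as the key that relates the two tables,
# much like a relationship between tables in FileMaker.
linked = tokens.merge(lemmas, on="lemma_id", how="left")
print(linked[["token", "book_chapter", "lemma", "meaning"]])
```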

The main takeaway Laura wants to leave us with is that it is important to create an environment in which you can efficiently collect, store, manipulate, and analyze your data. This should not come at the cost of traditional approaches and methodologies – in fact, you can add to them to create a better workflow as a whole!

Laura showed us some examples of how she used specific identifiers to connect tabular files and create a relational database in FileMaker.

Zakaria El Houbba – Obsidian as part of the research workflow

The third and final speaker is Zakaria El Houbba, third-year PhD candidate in Arabic Studies. Zakaria’s project, supervised by Arjan Post, focuses on the pre-modern relation between Islamic jurisprudence and Sufism, and in particular on the way in which these two strands are united in the figure of Aḥmad Zarrūq. In doing so, the research aims to come to a theory of applied legal epistemology in Zarrūq’s Sufism.

By discussing his own workflow in detail, Zakaria intends to highlight a number of key takeaways revolving around the idea of the “second brain”. Because we are so deeply involved with knowledge gathering on a daily basis, and constantly receive input from various sources (whether academic or not), we run the risk of being overwhelmed by a flood of information. When you use software to carry that burden for you, you can save your own brainpower for actual critical thinking rather than secondary tasks like categorizing information. This way, you’re effectively constructing what’s referred to as a second brain.

In this context, Zakaria also makes use of Obsidian, though he approaches it from a very different angle than Seb. Zakaria doesn’t actually enter all of his notes into Obsidian – he first uses an app like Microsoft OneNote as a “vault” to record random, non-processed thoughts, which he periodically goes through to think about how they fit in his project. He then sorts these thoughts and puts them in corresponding folders (relating to certain projects, classes, issues, etc.) in order to process them properly in Obsidian. Zakaria emphasizes that it’s fine to keep it simple and take it slow, focusing on what you specifically need from the note-taking environment so as not to get overwhelmed by all the options and information.

There are more tools Zakaria uses in his workflow – in fact, he says, there is a constant conversation between himself, Obsidian, Zotero, and ChatGPT. He uses Zotero to make notes and highlight text when reading articles, which he imports into Obsidian and categorizes using tags. Afterwards, he copies those highlights from Obsidian into ChatGPT, asking it to take up the role of copy editor and summarize the text. The resulting summary, which he critically revises, is then given a place in Obsidian once again.

Next to the powerful visualization capabilities discussed by Seb, Zakaria explains that Obsidian can also be used to create subpages within notes to explain terms and concepts, provide brief biographies of important figures, and so on. These “subnotes” can be linked back to in other notes as well, resulting in a kind of personalized Wikipedia for your research topic. This can also be helpful when you’re following classes on a certain topic or revising your own teaching material!

Finally, speaking of teaching material, Zakaria points us to a couple of helpful AI tools that can be used to process video files, such as recorded lectures or talks – whether you attended them or gave them yourself. One such tool is NoteGPT, which essentially functions as a transcriber and summarizer of recordings. You can revise and copy the resulting transcriptions and summaries into Obsidian as well, further expanding the scope of your second brain. Brisk Teaching serves a similar purpose as NoteGPT, but can also be used to turn a video into a PowerPoint presentation, which can be very convenient and time-saving. By thus constructing a workflow, gradually accumulating relevant information through different tools, it becomes much easier to manage your research.
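
If you prefer an open-source route to the same idea, a speech-to-text library such as OpenAI’s Whisper can produce a transcript that you then paste into Obsidian yourself. The sketch below is an illustration of that alternative, not a tool Zakaria mentioned, and the file names are placeholders:

```python
import whisper  # pip install openai-whisper; also requires ffmpeg

# Load a small pretrained model and transcribe a recorded lecture.
# Whisper accepts common audio/video formats via ffmpeg.
model = whisper.load_model("base")
result = model.transcribe("recorded_lecture.mp4")

# Save the transcript as a Markdown note that can be dropped into an Obsidian vault.
with open("recorded_lecture_transcript.md", "w", encoding="utf-8") as f:
    f.write("# Transcript: recorded lecture\n\n" + result["text"])
```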

The home tab of Zakaria’s Obsidian environment. As both he and Seb explained, you can make it as simple or complex as you like – try to make it a welcoming space for your daily research workflow!

The workflows of the presenters reveal both similarities and differences, but there’s one thing all three can agree on – what’s important is to find a workflow that works for you. To that end, take inspiration from some of the tools and processes described here, but always make sure they support your specific research methods. This was emphasized in the questions as well: don’t feel pressured to adopt a tool like Obsidian, but try it out and see if it accommodates your needs. Who knows, you might uncover a more efficient workflow or see your data from a new perspective.

Happy holidays from the Artes Research team, and may your data be blessed in the year to come! 🎄

Recap: How do you do it? A behind-the-scenes look at research workflows (2023)

6 December 2023, 17:55

Each academic year, we at Artes Research kick off the Digital Scholarship Module – a training for first-year PhD researchers at the Faculty of Arts – with a session dedicated to research data workflows. Three researchers from the Faculty of Arts offer a behind-the-scenes look at their research workflows by outlining how they approach and structure their research, the tools they use, and with what kind of data they are working. The goal of this session is to provide examples of more advanced workflows for the first-year PhD researchers as they embark on their research journey. Hopefully this recap of the session can spark some inspiration for you!

Vicente Parrilla López – Plain text and structured notetaking

Vicente’s research, which is in the field of musicology, focuses on reviving the Renaissance practice of improvised counterpoint. Apart from being a PhD researcher, he is also a musician and recorder player himself. In his research workflow, Vicente consistently seeks out tools to enhance efficiency and further streamline the structure of his work.

Vicente introduced us to the versatility and accessibility of plain text files, highlighting the benefit of this file format, as it is universally usable across various computers and software platforms. One drawback, however, lies in readability due to the absence of text formatting and smaller typography. Fortunately, applications like iA Writer, which allow users to use markdown to apply additional formatting, address this issue.

There is a wide array of digital tools for structured notetaking out there. In addition to iA Writer, other examples include Obsidian and Notion. The key is to choose the tool that suits your needs and preferences best.

Vicente highlights the advantages of using plain text files for structured notetaking in conjunction with applications like iA Writer:

  • Distraction-free writing: plain text notetaking ensures an undisturbed writing experience with basic formatting; once you are finished, you can preview or export your text, for example as HTML or PDF output.
  • Versatility: plain text files are very adaptable; they can be exported to various formats such as HTML for websites, DOC for Microsoft Word, or PDF, and even converted into other file types such as Python, Java, JSON, CSS, XML, and LaTeX, among others.
  • Interconnectedness: notetaking tools like these often incorporate a tagging system that facilitates connections between concepts and ideas.
  • Search capability: these tools also offer robust search functionalities, ensuring swift and efficient retrieval of desired information.
 

An important aspect of Vicente’s notetaking workflow is the integration of structured metadata. Vicente implements a dedicated metadata section at the beginning of each note, enhancing the categorization and contextualization of his notes. In general, adding metadata in a systematic way offers several advantages. By recording key details like creation date, authorship, and related keywords, metadata enriches a note by adding surrounding context. Additionally, metadata enhances searchability by allowing the user to search for specific information or themes across an entire note repository. Lastly, structured metadata can foster collaboration between various users but also across different projects.
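
As a rough sketch of the idea – the field names and values are invented, and Vicente’s own template may look different – a note with a structured metadata header can be generated like this:

```python
from datetime import date
from pathlib import Path

# Invented metadata fields; adapt them to whatever details you want to track.
metadata = {
    "title": "Improvised counterpoint sources",
    "created": date.today().isoformat(),
    "author": "V. Parrilla",
    "keywords": "counterpoint, renaissance, improvisation",
}

# A YAML-style front matter block at the top of the note keeps the metadata
# both human-readable and easy to search or parse later.
front_matter = "---\n" + "\n".join(f"{k}: {v}" for k, v in metadata.items()) + "\n---\n\n"
body = "Notes on sixteenth-century counterpoint treatises go here.\n"

Path("improvised-counterpoint-sources.md").write_text(front_matter + body, encoding="utf-8")
```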

Vicente also introduced us to the concept of text expanders. The purpose of this type of software is to replace designated keystrokes, known as ‘shortcuts’ or ‘abbreviations,’ with expanded text segments. Its strength lies in expediting the writing process by swiftly inserting frequently used words or phrases into articles, grant applications, and more. It can also help to easily integrate standardized metadata and bibliographic entries. Using the text expander software allows Vicente to have a streamlined writing experience. When used systematically, it also helps him create consistency across various documents. Moreover, the program saves him the time that would be spent on manually inserting phrases or words he uses frequently in his research and writing.
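
Dedicated text expanders work system-wide while you type, which a blog post cannot reproduce, but the toy snippet below illustrates the underlying substitution principle with made-up abbreviations:

```python
# A toy illustration of the substitution idea behind text expanders.
# The abbreviations below are invented examples.
EXPANSIONS = {
    ";ic": "improvised counterpoint",
    ";kul": "KU Leuven Faculty of Arts",
}

def expand(text: str) -> str:
    # Replace every known shortcut with its expanded form.
    for shortcut, full in EXPANSIONS.items():
        text = text.replace(shortcut, full)
    return text

print(expand("My project on ;ic is based at ;kul."))
# -> "My project on improvised counterpoint is based at KU Leuven Faculty of Arts."
```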

Stijn Carpentier – Digitized source material and distant reading

Within the Negotiating Solidarity project, Stijn’s research aims to uncover and contextualize the wide variety of contacts between actors within Belgian civil society and the rapidly growing influx of foreign guest workers from the 1960s to the 1990s. Despite labeling himself as a hobbyist in the Digital Humanities realm, Stijn presented to us an inspirational workflow where he merges historical research with digital tools.  

Stijn’s journey into DH was triggered by his source material. For his research, he wanted to explore how guest workers in Belgium were communicating about their activities and their ideas through periodicals and other types of serial sources. As the term suggests, serial sources are published at regular intervals, resulting in an overwhelming volume of material that cannot always be read entirely during the timeframe of a PhD project. Consequently, Stijn sought an efficient method to comprehensively analyze this extensive array of sources without having to read them all in full.

The first step to achieve this goal was digitization. Stijn encountered both undigitized and poorly OCR’d digitized sources, prompting him to undertake the digitization process himself. However, digitization is time-consuming; hence, Stijn emphasizes the importance of collaboration with the archives or institutions housing the materials. They may offer assistance in digitizing the content or provide access to their scanning equipment and OCR software. Stijn stresses that while digitized sources offer many advantages such as searchability, it remains crucial to engage with the physical materials. Understanding the contextual nuances of their creation and preservation is imperative, rather than treating them merely as isolated PDF files.

Once he tackled the first hurdle of digitization, Stijn delved into distant reading, a text analysis method enabling insights into vast corpora without the need for exhaustive reading. To conduct this analysis, he used the software AntConc.

AntConc is a free, cross-platform tool for corpus analysis. There are also other tools with similar features, such as Voyant, Hyperbase, and Sketch Engine.

Upon uploading his documents to AntConc, Stijn could perform basic word searches and proximity-based word analysis. The tool also enables tracking keyword mentions over time, which helps to get an overview of patterns and how they evolved. As a result, Stijn could efficiently extract core ideas from an extensive corpus, a task that would have been impossible for him to complete during his PhD if he were using close reading methods. Such tools not only extract information but also foster creativity in research, encouraging novel perspectives on the research material that might otherwise remain unexplored.
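
For readers who like to see the mechanics, a very rough, scripted equivalent of such a keyword search – not a replacement for AntConc, and with an invented folder and search term – could look like this:

```python
import re
from pathlib import Path

# Assuming a folder of OCR'd plain-text issues of a periodical (path is a placeholder).
corpus_dir = Path("ocr_texts")
keyword = "gastarbeider"  # example search term

for txt_file in sorted(corpus_dir.glob("*.txt")):
    text = txt_file.read_text(encoding="utf-8", errors="ignore")
    # Show each hit with a little context on either side
    # (a crude keyword-in-context view).
    for match in re.finditer(keyword, text, flags=re.IGNORECASE):
        start, end = max(match.start() - 40, 0), match.end() + 40
        print(f"{txt_file.name}: ...{text[start:end]}...")
```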

Stijn concluded by comparing Digital Humanities to a Swiss army knife: it is like a versatile tool that doesn’t necessarily need to be the focal point of your project but serves as a valuable instrument for exploring both your sources and your research domain. Beyond that, DH facilitates connections with peers. Belgium boasts a vibrant Digital Humanities community, offering ample opportunities for networking and learning from a diverse group of experts and enthusiasts.

If you want to get involved in the DH community in Belgium you can join the DH Virtual Discussion group for Early Career Researchers. The discussion group meets on a monthly basis via MS Teams. Each meeting features a presentation from a member of the Belgian DH community, a moment to share DH-related news, and a chance to network.

Tom Gheldof – A day in the (tool) life

Tom Gheldof is the CLARIAH-VL coordinator at the Faculty of Arts. Throughout the years, he was involved in several projects in the field of Digital Humanities such as the Trismegistos project at the Research Unit of Ancient History. Currently, he is a scientific researcher of the ‘CLARIAH-VL: Advancing the Open Humanities Service Infrastructure’ project that aims at developing and enhancing digital tools, practices, resources, and services for researchers in many fields of the humanities.

Tom provided an insider’s view of his typical day, shedding light on the various tools he employs:

  • Identification: to introduce himself, Tom showcased his ORCID iD, a persistent digital identifier that sets researchers apart regardless of name similarities. It serves as a central hub to which you can link all of your research output. Not only does it boost the visibility of your work, it also streamlines administrative tasks, as you only need to update one platform that you can then connect with your funder, publishers, etc.
  • Text recognition: given that Tom’s research relies on manuscripts, he has familiarized himself with automated text recognition. His primary tool for this is Transkribus, a platform that uses machine learning technology to automatically decipher handwritten and printed texts. Through a transcription editor, users within the Transkribus community transcribe historical documents, training the system to recognize diverse text forms – be it handwritten, typewritten, or printed – across various languages, predominantly European.
  • Annotation: Tom relies on Recogito for his research on place names. This online annotation tool offers a user-friendly interface for both texts and images. Recogito provides a personalized workspace to upload, collect, and organize diverse source materials such as texts, images, and tabular data. Moreover, it facilitates collaborative annotation and interpretation of these resources.
  • Coding: for coding tasks, Tom uses Visual Studio Code, a free coding editor compatible with multiple programming languages. To collaborate and access code with open licenses, he turns to GitHub, a repository where people share their code, fostering a collaborative coding environment.
  • Relational databases: Tom has a lot of expertise when it comes to building relational databases. A relational database allows you to represent complex datasets and the connections between and within different types of data. He uses the FileMaker environment, which has broad functionalities and permits export of the data to any other format.
Tom has given training sessions about relational databases in general, and FileMaker in particular, in the past. An overview of existing training material can be found on the DH@rts website.
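
FileMaker is a commercial environment, but the relational idea itself is generic: tables linked through keys and queried with joins. The sketch below shows that idea with Python’s built-in sqlite3 module, using an invented miniature schema rather than anything from Tom’s projects:

```python
import sqlite3

# A generic illustration of the relational idea (not Tom's FileMaker setup):
# two tables linked through a key, queried with a join.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE places (place_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attestations (
        attestation_id INTEGER PRIMARY KEY,
        place_id INTEGER REFERENCES places(place_id),
        source TEXT
    );
    INSERT INTO places VALUES (1, 'Alexandria');
    INSERT INTO attestations VALUES (10, 1, 'example papyrus reference');
""")

# The join follows the place_id key from attestations back to places.
for row in con.execute("""
    SELECT a.source, p.name
    FROM attestations AS a JOIN places AS p ON a.place_id = p.place_id
"""):
    print(row)
```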

To familiarize yourself with these and similar tools and methods, Tom recommends exploring the tutorials that are available at The Programming Historian, a DH journal that offers novice-friendly, peer-reviewed instructional guides.

Through trial-and-error, the presenters have figured out their workflow, which can hopefully inspire you to tailor your personalized data management processes. However, they all emphasized that the best research workflow is the one that works for you. For further inspiration when it comes to DH and research data, consider joining DH Benelux 2024, hosted by KU Leuven. This year’s conference, with the theme “Breaking Silos, Connecting Data: Advancing Integration and Collaboration in Digital Humanities”, is sure to bring much more inspiration when it comes to organizing, manipulating, and sharing research data.

Recap: DH Virtual Discussion Group – Fall 2022 Edition

10 January 2023, 21:51

In December we closed off our fifth edition of the DH Virtual Discussion Group, during which we heard from three different early career researchers in our community. The Discussion Group series was jointly organized this semester by Prof. Margherita Fantoli (KU Leuven Faculty of Arts), Dr. Sven Lieber (KBR), Prof. Julie M. Birkholz (KBR Digital Research Lab and Ghent CDH), and myself, Dr. Leah Budke (KU Leuven Libraries Artes). This semester’s edition was yet another engaging and inspiring series of talks and discussions centering on many different aspects of DH research.

In October, Paavo Van der Eecken from the University of Antwerp introduced us to his PhD project and detailed the decisions that went into developing the annotation methods for a large corpus of images from nineteenth-century children’s literature. For a recap of this session and to learn more about Paavo’s work, you can read the full recap blog post here.

November brought us an engaging presentation from Houda Lamqaddam on the concept of digital satellites. This was our largest group for the fall edition with an attendance of 27, which led to a dynamic Q&A moment at the end of the meeting. Houda’s presentation took us behind the scenes of her work on Project Cornelia. This project is a hybrid research engine which focuses on bridging art history and computer science. As such, it uses and develops datasets and develops data retrieval and visualization tools to use with seventeenth-century Flemish tapestry and paintings.

From this starting point, Houda’s presentation raised some important questions about the creation and long-term preservation of digital tools. As Houda explained, questions of maintenance and sustainability are at the heart of digital humanities research, especially when it comes to the development of new digital research tools.

To conceptualize the relationship between digital research tools and their role in specific research projects and beyond, Houda introduced the concept of “digital satellites” to us. There are three defining characteristics to this concept: digital satellites are (1) “built to support a specific goal including communication, exploration or analysis, (2) they are artificial in nature, i.e. they exist in highly non-digital spaces, and (3) they orbit around more grounded research material.”[1] Houda also emphasized the importance of the end of the satellite’s life, likening this phase to a type of “space debris.” This phase also deserves thought and preparation; according to Houda, we should be asking ourselves how digital research tools can be useful for us during this phase as well. Some of the steps that Houda defines to support long-term function of satellites include (1) archiving the design, (2) archiving process and requirements, (3) making code available in open access, and (4) considering removing tools that no longer function.[2]

Houda’s presentation sparked much thought and interesting discussion. While we often see how new tools are developed or how existing tools are integrated into various research projects during the DH Virtual Discussion Group presentations, Houda’s presentation encouraged us to think more critically about the long lifecycle of these tools. This is of particular importance when we think of digital scholarship as a concept that also encompasses research data management and scholarly publishing. Documentation and extensive archiving of research output, in this case in the form of digital research tools, is of key importance. Moreover, publishing code in open access has the benefit of not only allowing more access but also creating transparency and giving insight into the research process and development of specific tools.

During our final session in December, we heard from Laura Soffiantini about her research on the extraction of formulae from ancient Latin funerary inscriptions. As Laura explained during her presentation, funerary inscriptions are typically written or carved into stone, pottery, or metal. This is a vast amount of material, but Laura’s corpus for this project was narrowed to texts written on tombstones. The pieces of these texts that Laura is particularly interested in – the formulae – are multi-word sentences in semi-fixed or fully-fixed forms which are used as stock expressions. Links exist between these words, and a word cannot be removed from the phrase without altering its meaning.

Laura was particularly interested in applying computational methods to this large corpus of textual data to test whether or not these methods hold any potential for the recognition and extraction of formulae. She relied on two approaches to test this: (1) a rule-based approach with a manually annotated dataset of funerary formulae and (2) a frequency-based analysis and extraction determined by word combinations.

Laura’s research gave us some insight into the application of different computational methods with a specific linguistic focus. The comparison showed that both methods hold potential for future application, but that neither was foolproof 100% of the time. As a result, Laura’s analysis is a great example of the importance of methods-based research. In addition to presenting us with interesting research, it also helped our group to think about the digital methods we apply in humanities research and the way we justify these decisions in our work.

On behalf of the organizers, I would like to thank all of our presenters from the fall 2022 edition as well as all of those who attended the sessions. We enjoyed learning more about these projects, catching up with and learning from others in our community, and welcoming new people into our DH circle! If you would like to join us for the spring 2023 edition of the Discussion Group, you can join our mailing list here. When we have the spring program finalized, we will also publish the details here on Scholarly Tales.

[1] Houda Lamqaddam, “In Search of Meaning: Thinking Information Visualization within Art History Research” (2022), 93.

[2] Lamqaddam, 98–99.

 

Recap: How do you do it? A behind-the-scenes look at research workflows

8 December 2022, 16:14

To kick off the Digital Scholarship Module, a training for first-year PhD researchers at the Faculty of Arts, we at Artes Research hosted a training session dedicated to research data workflows. Three researchers from the Faculty of Arts offered a behind-the-scenes look at their research workflows by outlining how they approach and structure their research, the tools they use, and with what kind of data they are working. The goal of the session was to provide examples of more advanced workflows for the first-year PhD researchers as they embark on their research journey.

Elisa Nelissen: applying digital tools throughout the entire research workflow

Elisa is a PhD researcher under the supervision of Jack McMartin, working on the interdisciplinary project “The Circulation of Science News in the Coronavirus Era” in collaboration with the KU Leuven Institute for Media Studies. Her research focuses primarily on how science news about COVID-19 vaccines travels to and from Flanders, and the inter- and intralingual translations it is subject to.

Elisa started off the session by introducing us to the tools she applies during various steps of her research workflow, leaving us with plenty of food for thought.

Literature collection

For collecting all the literature that holds potential relevance for her research, Elisa uses Zotero as it has some very interesting features such as full text searches (which makes it easy to look up specific concepts), highlighting and color coding interesting sections or terms, etc.

Go check out our blog posts about Zotero if you are interested in learning more!

Reading literature and tracking progress

After gathering the relevant literature, reading all the collected material naturally follows. Here, Elisa had a very useful tip for those who, just like her, easily lose focus when reading a text: why not try turning text into audio files? This helps Elisa to follow the text more closely and take notes while listening. She also keeps close track of her reading progress by using the productivity application Notion. Apart from creating reading lists, Notion also helps her to keep an overview of her project’s progress, upcoming tasks, etc.
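
If you want to try the text-to-audio tip yourself, one possible (and purely illustrative) route is a text-to-speech library such as pyttsx3; this is not necessarily the tool Elisa uses, and the file names are placeholders:

```python
import pyttsx3  # pip install pyttsx3; uses the speech engine of your operating system

# Turn a text file into an audio file so you can listen instead of (or while) reading.
text = open("article_to_read.txt", encoding="utf-8").read()

engine = pyttsx3.init()
engine.setProperty("rate", 170)                 # speaking rate, adjust to taste
engine.save_to_file(text, "article_to_read.mp3")  # output format depends on your system
engine.runAndWait()                             # blocks until the file is written
```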

Data collection

For collecting her data, too, Elisa had to acquaint herself with new digital tools. A first important type of data for her research is news articles. As Elisa did not yet know how to code, she followed some online courses on Python to learn the basic skills needed. Thanks to this, she can now scrape websites for the metadata of news articles. Another important element in her data collection is conducting interviews, where she finds it very important to invest in proper recording systems and equipment to guarantee the usability of the material.
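
As a hedged sketch of what scraping article metadata can involve – not Elisa’s actual scripts – the snippet below fetches a page and reads a few common metadata tags. The URL is a placeholder, and you should always check a site’s terms of use and robots.txt before scraping:

```python
import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

# Placeholder URL; replace with a page you are allowed to scrape.
url = "https://www.example.com/news/some-article"
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Many news sites expose title, description, and publication date as <meta> tags.
metadata = {
    "title": soup.title.string if soup.title else None,
    "description": (soup.find("meta", attrs={"name": "description"}) or {}).get("content"),
    "published": (soup.find("meta", attrs={"property": "article:published_time"}) or {}).get("content"),
}
print(metadata)
```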

If you are a KU Leuven student or staff member, you can borrow audiovisual equipment from the lending service of LIMEL for free! Check out all the details here.

Next to interviews, she conducts surveys with Qualtrics. The best advice here is to test your surveys thoroughly. Once sent out, you cannot change the survey questions anymore, so you have to be sure that the chosen questions will deliver the needed results.

KU Leuven researchers can purchase a Qualtrics license through ICTS, more information can be found here.

Data analysis

First, in order to correctly organize and analyze all the collected data from news articles, Elisa felt the need to build a relational database in FileMaker. It helps her to organize her data, compare texts, and keep track of her overall workflow.

If you are interested in knowing more about FileMaker, check out this training session given by Tom Gheldof in the context of the DH training series organized by the Faculty of Arts.

Secondly, for transcribing the conducted interviews, she uses Sonix, an automated transcription service. It offers good-quality transcriptions that you can edit yourself afterwards. Elisa stresses the importance of anonymizing your interviews before sending them in, to make sure you do not unwillingly share any personal data! Lastly, for coding the interviews she uses NVivo.

KU Leuven researchers can purchase an NVivo license through ICTS, more information can be found here.

To conclude her talk, Elisa left us with a useful tip: it might be interesting to try out a different browser (in her case Sigma) as this might give you new perspectives about how to structure and manage your daily work.

Sara Cosemans: using digital research methods to deal with information overload

Sara is a Doctor Assistant in Cultural History at KU Leuven and a part-time Assistant Professor in the School of Social Sciences at UHasselt. The digital method discussed below was developed during her PhD at KU Leuven, together with data scientists Philip Grant, Ratan Sebastian, and computational linguist Marc Allassonnière-Tang. Learn more about her digital approach in this blog post.


Sara’s presentation was based on her PhD project entitled “The Internationalization of the Refugee Problem. Resettlement from the Global South during the 1970s”, which initially started off as a very analogue project. However, when facing some serious challenges, Sara started to explore digital methods. Her journey was one of trial and error, with a lot of investment in, on the one hand, educating herself in how to use digital tools, and, on the other hand, building a network of digital experts to collaborate with.

Sara’s project required a lot of archival visits in various countries. When going to the archives, she did not yet know what she was looking for exactly, making it necessary to scan every piece of information that held potential relevance. An analysis of the content would have to wait. By the time she had finalized her archival visits, however, she had ended up with an unimaginably large corpus of about 100,000 pages. She quickly realized that she would never be able to read everything and needed to come up with a digital solution.

To photograph the archival documents, Sara used her iPad as this had a big enough storage capacity and rendered high quality pictures. By using ABBYY FineReader she could subsequently apply Optical Character Recognition (OCR), which converted these photographs into fully text-searchable documents.
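
ABBYY FineReader is a commercial product; as an illustration of the same OCR step with open-source components (an assumption, not Sara’s setup), the snippet below runs Tesseract over a folder of photographs:

```python
from pathlib import Path

import pytesseract   # pip install pytesseract; requires the Tesseract OCR engine
from PIL import Image  # pip install pillow

# Placeholder folder of photographed archival pages.
photo_dir = Path("archive_photos")

for photo in sorted(photo_dir.glob("*.jpg")):
    # Recognize the text on each photograph and save it next to the image.
    text = pytesseract.image_to_string(Image.open(photo), lang="eng")
    photo.with_suffix(".txt").write_text(text, encoding="utf-8")
```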

We recently organized a two-day workshop devoted to OCR, you can download the slides on this webpage that collects information and resources about the DH trainings offered by the Faculty of Arts.

The next question, however, was how to search through all these files. A first idea was to build a relational database in FileMaker, which would mean entering all the metadata of the files coming from different institutions into the database, with the ultimate goal of making relations between those files. Unfortunately, entering the metadata was so time-consuming that it could only be completed for one institution. Therefore, she needed to come up with another solution. Since all her photographs were now searchable documents, a first quick way to find information that she was already expecting to find was simply using the Ctrl + F function. But how can you find what you don’t already know? Here, natural language processing (NLP) proved to be the solution.
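
That whole-corpus Ctrl + F can also be scripted once the files are plain text. The sketch below, with a placeholder folder and search term, prints every matching line across the corpus:

```python
from pathlib import Path

# Search every OCR'd text file in a folder for a known term
# (folder name and term are placeholders).
corpus_dir = Path("ocr_output")
term = "resettlement"

for txt_file in sorted(corpus_dir.glob("*.txt")):
    lines = txt_file.read_text(encoding="utf-8", errors="ignore").splitlines()
    for line_number, line in enumerate(lines, start=1):
        if term.lower() in line.lower():
            print(f"{txt_file.name}:{line_number}: {line.strip()}")
```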

Since Sara did not have the time to learn natural language processing methods like topic modelling and clustering herself, she invested her energy in networking at DH conferences, which led to finding researchers who were very eager to work with her data. They developed a Google Colaboratory notebook in Python to do topic modelling on all files, determine topics, and make visualizations. They then created reading lists of the most important documents so that Sara could start by reading those files. This close reading made it possible for Sara to find new topics, which she could then explore further in other documents by using her Ctrl + F method.
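
The notebook itself is not reproduced here, but a minimal, generic sketch of topic modelling – using scikit-learn rather than whatever the team actually chose, and with a tiny placeholder corpus – gives a sense of the moving parts:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder corpus: in practice these would be the OCR'd archival documents.
documents = [
    "resettlement of refugees from the global south",
    "unhcr reports on refugee camps and quota policies",
    "parliamentary debates on migration and labour",
]

# Turn the documents into a word-count matrix, then fit a small topic model.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Print the most characteristic words per topic.
words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```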

Sara concluded by saying that while she needed digital methods to make her research manageable and to help her find relevant connections, the analysis of the material still depended completely on her. The computer will never fully replace the close-reading, deep-thinking historian.

Mariana Montes: reproducibility and versioning as two important keys to a successful coding project

Mariana’s main research interests lie in corpus linguistics and cognitive semantics. The goal of her PhD project is methodological triangulation of distributional methods (namely, comparing vector space semantics, behavioral profiles and traditional lexicographical analysis), with case studies in English and Dutch. Some of the tools developed and used within the project can be found on her personal webpage. She recently also started working at ICTS, where she supports research data management.

Mariana’s interest in digital methods and tools was sparked when studying languages, for which she needed to acquaint herself with statistics and programming. During her talk she therefore stressed the importance of challenging yourself to learn new skills and to use new digital tools. Over the past years, she has actively helped fellow researchers in their process of trying out new methods to achieve greater efficiency in their work.

Her main expertise is in R. She showed us how R can be used in multiple ways throughout your research: creating plots, making interactive reports, presenting slides, coding workflows, and so forth. On her blog, Mariana wrote an interesting piece about how you can implement R-project tools in your workflow.

Mariana also underlined how your work should be reproducible for both yourself and other people. During her research, Mariana experimented a lot with running various pieces of code, trying out different clustering algorithms, etc. She ended up forgetting how she reached her results, making it necessary to double- or triple-check everything. Therefore, she started to carefully register all steps in her workflow in order to put into words the reasoning behind her coding. This way, she could answer questions like “What decisions did I make, and why?”. Mariana has written more extensively about how your old, current, and future self might not understand your decisions in this insightful blog post.

In the same vein, Mariana highlighted how versioning can be a true life-saver. For this, she uses Git. Git allows you to control versions, keep track of the differences between files, retrieve files that were removed, and take a snapshot of the state of your files at a given time. This way, you create an online backup that you can also share with other people.

KU Leuven hosts its own GitLab, you can find more detailed information here.

To conclude with an important message that was shared throughout all the presentations: doing a PhD, despite popular belief, should not be done in isolation. Instead, you should look for potential ways to connect with other researchers. A willingness to make the process of developing the dissertation visible can only help to improve the project and stimulate collaborations, which might lead to solving the problems you are facing or opening up new research avenues and generating new perspectives.

Recap: DH Virtual Discussion Group for ECRs in Belgium 24/10

25 October 2022, 18:17

Yesterday we kicked off our fall 2022 edition of the DH Virtual Discussion Group for ECRs in Belgium. This edition marks our fifth semester of the discussion group, which is jointly organized this year by Prof. Margherita Fantoli (KU Leuven Faculty of Arts), Dr. Sven Lieber (KBR), Prof. Julie M. Birkholz (KBR Digital Research Lab and Ghent CDH), and myself, Dr. Leah Budke (KU Leuven Libraries Artes). Our first session featured a behind-the-scenes presentation by PhD candidate Paavo Van der Eecken from the University of Antwerp. We had a total of 15 attendees during this group meeting, which contributed to a great networking environment and stimulating discussion inspired by Paavo’s work.

Paavo’s presentation, “Viewing Between the Lines: Annotating Sensitive Attributes in Illustrated Children’s Literature,” was based on his PhD project focusing on nineteenth and early twentieth-century children’s literature in Dutch. More specifically, Paavo’s project examines representation in illustrations along the lines of age, race, class, and gender. He shared insight with us into how he developed the annotation strategies for the images in his corpus, which was provided by the DBNL and comprises more than 1000 children’s books.

We learned from Paavo’s presentation that conducting large-scale analyses like his can be greatly aided by the use of digital tools, in this case image annotation tools. For the annotations, Paavo explored the use of a number of different annotation tools, and he stressed the importance of taking time to try different tools. In the end, Paavo settled on the VGG Image Annotator tool.

While Paavo does some of the annotations himself, he is also assisted by other annotators. Annotating and analyzing a corpus of this size with multiple people contributing requires a standardized approach, and the VGG Image Annotator tool allows for this. As Paavo explained, he was able to develop a list of labels to populate the image annotator tool, which annotators could then select from to label the images they were working on. If they happen to come across a problematic image – that is to say, one that cannot easily be categorized – they log it in a shared spreadsheet for follow-up.
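
As a small illustration of how such annotation output can be monitored – the export format and column names below are assumptions, not the project’s actual schema – a few lines of Python can tally how often each label has been used:

```python
import csv
from collections import Counter

# Assumed export: one row per annotated region, with a "label" column.
# File name and column name are placeholders.
label_counts = Counter()
with open("annotations_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        label_counts[row["label"]] += 1

for label, count in label_counts.most_common():
    print(f"{label}: {count}")
```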

The behind-the-scenes talk generated plenty of food for thought. Our attendees this time had questions about the research workflow of the project, the intricacies of analyzing a concept like (historic) representations of race, and the possibility of automating the image tagging (through the use of AI).

 

Our next meeting will take place on Monday 14 November, 15h00-16h30 CET. Join our mailing list to receive the link on the day of the session! If you would like to see the full schedule for the upcoming sessions including abstracts (post will be updated as abstracts are confirmed), please check here.

Recap: Zotero workshop and plugin recommendations

7 July 2022, 22:14

Recap Zotero workshop

On the 16th of June the Artes Research team joined forces with the research support services from the Faculty of Law to help PhD researchers get started with Zotero, a free, open-source reference manager. The session was a success with an attendance of around 30 researchers, who were all eager to dive into the world of reference management.

*Please note, these materials were created based on version 6 of the Zotero software; the interface received a major update with version 7. While the interface looks different, Zotero 7 still has the same features as previous versions, however, some plug-ins may no longer be supported.

We kicked off the workshop with an introduction to the basic functionalities of Zotero. Nele Noppe of the Artes Research team shared her knowledge on the various ways to gather references into a personal Zotero library, gave an in-depth explanation of how to organize a library with tags, folders, and collections, and zoomed in on several topics such as the integration of Zotero in Word, which allows you to create in-text citations, footnotes, and bibliographies, among other things. Of particular interest was the discussion of Zotero plugins. Zotero has a very wide range of plugins that add extra functionality and connect Zotero with other software and platforms. As there are numerous plugins, we have made a list of plugin recommendations with a short explanation of their use, which we have included in this post (see below).

This was followed by an introduction to Juris-M by Patrick Allo from the Faculty of Law. Juris-M is a Zotero offshoot with extra features for law research and multilingual bibliographies.

During the next part, the participants had the chance to put this new information into practice. Everyone was assigned to a breakout room with a group leader. During the next hour, Leah Budke, Nele Noppe, Patrick Allo, and Rebecca Burke guided their groups through some basic exercises in Zotero. In preparation for this workshop, the researchers had been asked to gather a few sources to experiment with. These sources were used to test out the functionalities of Zotero, including how to import a reference with the Zotero browser plugin, how to create a bibliography, and more.

If you weren’t able to attend the workshop, you can still consult the documentation provided by Nele Noppe. Her presentation is available on Zenodo here. We also have a few blogposts on how to get started with Zotero. Here you can find part 1 and part 2. Keep an eye out on the blog for future Zotero workshops or check out the Kubic classroom sessions.

Plugin recommendations

Plugin recommendations from Nele Noppe and Patrick Allo 

  • For everyone: ZotFile, which adds all sorts of handy PDF functionality to Zotero. If you notice that a PDF-related feature described in the presentation linked above doesn’t seem to appear in your Zotero, it’s probably because you don’t have ZotFile installed yet. 
  • For those who work with a lot of PDFs that are not searchable because the PDF pages are just images, not text you can select: Zotero OCR, which performs optical character recognition on PDFs to make them searchable. Setup has a number of technical steps, but the instructions are very clear. 
  • If you do your writing in a program that isn’t Microsoft Word: Zotero word processor plugins to cite Zotero items from a range of other programs.  
  • If your writing program isn’t covered by a plugin: First, double-check the list of plugins to make sure there’s nothing that covers your situation. If there’s nothing, you can still use the plugin RTF/ODF Scan for Zotero to create a workaround and connect your favorite writing program with your Zotero library. 
  • For those who currently have a big library of PDFs in a manual folder-based system and want to switch to Zotero: Zotero Folder Import 
  • To quickly start bringing order to a Zotero library: Zotero Storage Scanner looks for missing attachments and duplicate items. 
  • For people who want to use Zotero for note-taking: Better Notes has useful functionality for organizing your notes and visualizing connections between them.
  • For those who want to apply simple but inspiring data visualizations to their Zotero library: Zotero Voyant Export connects Zotero with the Voyant visualization software. 
  • If you work in LaTeX: Better BibTeX for Zotero is for you. The plugin also enables integration with some additional writing programs, though we haven’t tested this.

Recap: March 2022 DH Virtual Discussion Group

1 April 2022, 14:47

The first meeting of the spring 2022 edition of the DH Virtual Discussion Group for ECRs in Belgium kicked off on Monday 21 March with a presentation from Gianluca Valenti (University of Liège). We had a total of twenty attendees—some new faces and some familiar ones—who all contributed to an engaging conversation about digital humanities. 

This session followed our standard format, which opens with a greeting from the organizers, Julie M. Birkholz (KBR and Ghent University), Margherita Fantoli (KU Leuven), and Leah Budke (KU Leuven). This is followed by our networking session where new and returning attendees can introduce themselves in a small group, tell about their interests and experiences in DH, and get to know others in the community. This networking moment also allows those of us who already know each other to catch up and enjoy a coffee or tea before the main presentation starts and to welcome new members into our community. After the networking moment, the group comes back together to share any upcoming DH events or opportunities. The main event follows, when a member of our community gives us a behind-the-scenes look at a digital project, workflow, or tool. 

Gianluca’s “under-the-hood” presentation was titled “Modern Letters and Text Analysis: The ‘EpistolarITA’ Project” and discussed the importance of epistolary texts in historical research. As Gianluca explained, today there is a wealth of correspondence available to researchers, but we still lack adequate tools to engage with these materials to the fullest extent. The EpistolarITA project aims to fill this gap and to contribute to scholarly efforts to exploit historical epistolary texts through the development of the EpistolarITA database. The database brings together fifteenth- through seventeenth-century Italian letters and allows users to perform statistical analysis on this corpus. As Gianluca explained in his presentation, the database allows readers to compare a target text to the texts in the corpus. The database can then return similar texts, ranking them in order of their similarity. To accomplish this, the algorithm uses a number of different techniques, including TF-IDF, Word2Vec, and named entity recognition. The advantage of using the database, as Gianluca demonstrated, is that it allows users to draw connections or to see patterns that they might not otherwise see. While the full text of the letters is not made available due to copyright restrictions, users are still able to perform text analysis on these materials and to obtain results that would otherwise require many visits to the archives, on top of the work that goes into creating the infrastructure which allows this type of text analysis.
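
To give a sense of how the TF-IDF part of such a ranking works – a generic sketch with invented English examples, not the EpistolarITA implementation – scikit-learn can vectorize a corpus and rank it by similarity to a target text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus and target letter; in the real project these are Italian letters.
corpus = [
    "letter about the purchase of silk in venice",
    "letter concerning a dispute over land near florence",
    "letter about shipping silk and wool to venice",
]
target = "a letter on the silk trade with venice"

# Vectorize the corpus and the target together, then rank corpus texts by similarity.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus + [target])
similarities = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for rank, idx in enumerate(similarities.argsort()[::-1], start=1):
    print(f"{rank}. similarity {similarities[idx]:.2f}: {corpus[idx]}")
```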

The EpistolarITA database is still in the process of being populated, but the official publication is expected this spring. For now, the project site and database are entirely in Italian, but the team hopes to make an English translation available in the future.


If a look behind the scenes of a digital project sounds interesting to you, we would be delighted to have you join us for our next DH Virtual Discussion Group meeting on Monday 25 April from 15h-16h30 CEST! In this session, Montaine Denys from the Flanders Heritage Libraries will take us behind the scenes of the Flanders Heritage Libraries’ digitization projects. Montaine’s talk, titled “Managing the Evaluation of OCR Quality in Flemish Newspaper Collections,” will include a discussion of the project workflow, the creation of a “ground truth” dataset, interpreting results, and the specific challenges they have faced and the lessons they have learned while undertaking this project.  

To join us for this session or any future sessions all you need to do is register for our mailing list. Once registered, you will receive all future emails, including the links to the Zoom meetings. These links are distributed via email the morning of the event. 

The DH Virtual Discussion Group is designed to be a low-threshold way for researchers, particularly early career researchers, to come together and learn about digital humanities. Everyone is welcome to attend and absolutely no DH expertise is required. To see a full overview of this spring’s sessions, click here. If there is a session that seems of interest to you, please do join us! 
