
Interesting digital humanities data sources

August 26, 2025 12:00

I bookmark sources of data that seem interesting for digital humanities teaching and research:

  • showing humanists what data & datafication in their fields can look like
  • having interesting examples when teaching data-using tools
  • trying out new data tools

I’m focusing on sharing bookmarks with data that’s already in a spreadsheet or similar structured format, rather than e.g.

  • collections of digitized paper media, which also count as data and are worth exploring, like Josh Begley’s racebox.org, which links to full PDFs of US Census surveys re: race and ethnicity over the years; or
  • 3D data, like my colleague Will Rourk’s on historic architecture and artifacts, including a local Rosenwald School and at-risk former dwellings of enslaved people

Don’t forget to cite datasets you use (e.g. build on, are influenced by, etc.)!

And if you’re looking for community, the Journal of Open Humanities Data is celebrating its 10th anniversary with a free, global virtual event on 9/26 including “lightning talks, thematic dialogues, and community discussions on the future of open humanities data”.

Data is being destroyed

U.S. fascists have destroyed or put barriers around a significant amount of public data in just the last 8 months. Check out Laura Guertin’s “Data, Interrupted” quilt blog post, then the free DIY Web Archiving zine by me, Quinn Dombrowski, Tessa Walsh, Anna Kijas, and Ilya Kreymer for a novice-friendly guide to helping preserve the pieces of the Web you care about (and why you should do it rather than assuming someone else will). The Data Rescue Project is a collaborative effort meant “to serve as a clearinghouse for data rescue-related efforts and data access points for public US governmental data that are currently at risk. We want to know what is happening in the community so that we can coordinate focus. Efforts include: data gathering, data curation and cleaning, data cataloging, and providing sustained access and distribution of data assets.”

Interesting datasets

The Database of African American and Predominantly White American Literature Anthologies

By Amy Earhart

“Created to test how we categorize identities represented in generalist literature anthologies in a database and to analyze the canon of both areas of literary study. The dataset creation informs the monograph Digital Literary Redlining: African American Anthologies, Digital Humanities, and the Canon (Earhart 2025). It is a highly curated small data project that includes 267 individual anthology volumes, 107 editions, 319 editors, 2,844 unique individual authors, and 22,392 individual entries, and allows the user to track the shifting inclusion and exclusion of authors over more than a hundred-year period. Focusing on author inclusion, the data includes gender and race designations of authors and editors.”

National UFO Reporting Center: “Tier 1” sighting reports

Via Ronda Grizzle, who uses this dataset when teaching Scholars’ Lab graduate Praxis Fellows how to shape research questions to match available data, and how to understand datasets as subjective and choice-based. I know UFOs sound like a funny topic, and they can be, but there are also lots of interesting inroads, like the language people use reflecting hopes, fears, imagination, otherness, and certainty. It’s a good teaching dataset because there aren’t too many fields per report, and those fields include mappable data, timeline-able data, narrative text, and a very subjective, interesting one (a taxonomy of UFO shapes). nuforc.org/subndx/?id=highlights

The Pudding

Well-researched, contextualized, beautifully designed data storytelling on fun or meaningful questions, with an emphasis on cultural data and how to tell stories with data (including personally motivated stories, which I think are both inspiring for students and great to have as examples of how to do this work critically). pudding.cool

…and its Ham4Corpus use

Shirley Wu’s interactive visualization for The Pudding of every line in Hamilton uses my ham4corpus dataset (along with data from other sources); it might be a useful example of how an afternoon’s work with open-access data (Wikipedia, lyrics) and some simple scripted data cleaning and formatting can produce foundations for research and visualization.

Responsible Datasets in Context

Dirs. Sylvia Fernandez, Miriam Posner, Anna Preus, Amardeep Singh, & Melanie Walsh

“Understanding the social and historical context of data is essential for all responsible data work. We host datasets that are paired with rich documentation, data essays, and teaching resources, all of which draw on context and humanities perspectives and methods. We provide models for responsible data curation, documentation, story-telling, and analysis.” Four rich datasets (as of August 2025), each including a data essay, the ability to explore the data on the site, and programming and discussion exercises for investigating and understanding the data. Datasets: US national park visit data, gender violence at the border, ~1,000 early 20th-century poems from African American periodicals, and the top 500 “greatest” novels according to OCLC records of the novels most held by libraries. responsible-datasets-in-context.com

Post45 Data Collective

Eds Melanie Walsh, Alexander Manshel, J.D. Porter

“A peer-reviewed, open-access repository for literary and cultural data from 1945 to the present”, offering 11 datasets (as of August 2025) useful in investigations such as how book popularity & literary canons get manufactured. Includes datasets on “The Canon of Asian American Literature”, “International Bestsellers”, “Time Horizons of Futuristic Fiction”, and “The Index of Major Literary Prizes in the US”. The project ‘provides an open-access home for humanities data, peer reviews data so scholars can gain institutional recognition, and DOIs so this work can be cited’: data.post45.org/our-data.html

CBP and ICE databases

Via Miriam Posner: A spreadsheet containing all publicly available information about CBP and ICE databases, from the American Immigration Council americanimmigrationcouncil.org/content-understanding-immigration-enforcement-databases

Data assignment in The Critical Fan Toolkit

By Cara Marta Messina

Messina’s project (which prioritizes ethical critical studies of fan works and fandom) includes this model teaching assignment on gathering and analyzing fandom data, and understanding the politics of what is represented by this data. Includes links to 2 data sources, as well as Destination Toast’s “How do I find/gather data about the ships in my fandom on AO3?”.

(Re:fan studies, note that there is/was an Archive of Our Own dataset—but it was created in a manner seen as invasive and unethical by AO3 writers and readers. Good to read about and discuss with students, but I do not recommend using it as a data source for those reasons.)

Fashion Calendar data

By Fashion Institute of Technology

Fashion Calendar was “an independent, weekly periodical that served as the official scheduling clearinghouse for the American fashion industry” from 1941 to 2014; 1972-2008’s Fashion International and 1947-1951’s Home Furnishings are also included in the dataset. The site allows manipulation of the data (including graphing and mapping) as well as download as JSON. fashioncalendar.fitnyc.edu/page/data

Black Studies Dataverse

With datasets by Kenton Rambsy et al.

Found via Kaylen Dwyer. “The Black Studies Dataverse contains various quantitative and qualitative datasets related to the study of African American life and history that can be used in Digital Humanities research and teaching. Black studies is a systematic way of studying black people in the world – such as their history, culture, sociology, and religion. Users can access the information to perform analyses of various subjects ranging from literature, black migration patterns, and rap music. In addition, these .csv datasets can also be transformed into interactive infographics that tell stories about various topics in Black Studies.” dataverse.tdl.org/dataverse/uta-blackstudies

Netflix Movies & Shows

kaggle.com/datasets/shivamb/netflix-shows

Billboard Hot 100 Number Ones Database

By Chris Dalla Riva

Via Alex Selby-Boothroyd: Gsheet by Chris Dalla Riva with 100+ data fields for every US Billboard Hot 100 Number One song since August 4th, 1958.

Internet Broadway Database

Found via Heather Froehlich: “provides data, publishes charts and structured tables of weekly attendance and ticket revenue, additionally available for individual shows”. ibdb.com

Structured Wikipedia Dataset

Wikimedia released this dataset sourced from their “Snapshot API which delivers bulk database dumps, aka snapshots, of Wikimedia projects—in this case, Wikipedia in English and French languages”. “Contains all articles of the English and French language editions of Wikipedia, pre-parsed and outputted as structured JSON files using a consistent schema compressed as zip” huggingface.co/datasets/wikimedia/structured-wikipedia. Do note there has been controversy in the past around Hugging Face scraping material for AI/dataset use without author permission, and differing understandings of how work published in various ways on the web is owned. (I might have a less passive description of this if I went and reminded myself what happened, but I’m not going to do that right now.)

CORGIS: The Collection of Really Great, Interesting, Situated Datasets project

By Austin Cory Bart, Dennis Kafura, Clifford A. Shaffer, Javier Tibau, Luke Gusukuma, Eli Tilevich

Visualizer and exportable files for a lot of interesting datasets on all kinds of topics.

FiveThirtyEight’s data

I’m not a fan for various reasons, but their data underlying various political, sports, and other stats-related articles might still be useful: data.fivethirtyeight.com. Or look at how and what they collect and include in their data, and what subjective choices and biases those reveal :)

Zine Bakery zines

I maintain a database of info on hundreds of zines related to social justice, culture, and/or tech topics for my ZineBakery.com project—with over 60 metadata fields (slightly fewer for the public view) capturing descriptive and evaluative details about each zine. Use the … icon then “export as CSV” to use the dataset (I haven’t tried this yet, so let me know if you encounter issues).

OpenAlex

I don’t know much about this yet, but it looked cool and is from a non-profit that builds tools to help with the journal racket (Unsub for understanding the value of “big deals” and alternatives, Unpaywall for OA article finding). “We index over 250M scholarly works from 250k sources, with extra coverage of humanities, non-English languages, and the Global South. We link these works to 90M disambiguated authors and 100k institutions, as well as enriching them with topic information, SDGs, citation counts, and much more. Export all your search results for free. For more flexibility use our API or even download the whole dataset. It’s all CC0-licensed so you can share and reuse it as you like!” openalex.org

Bonus data tools, tutorials

Matt Lincoln’s salty: “When teaching students how to clean data, it helps to have data that isn’t too clean already. salty offers functions for “salting” clean data with problems often found in datasets in the wild, such as pseudo-OCR errors, inconsistent capitalization and spelling, invalid dates, unpredictable punctuation in numeric fields, missing values or empty strings”.

The Data-Sitters Club for smart, accessible, fun tutorials and essays on computational text analysis for digital humanities.

Claudia Berger’s blog post on designing a data physicalization—a data quilt!—as well as the final quilt and free research zine exploring the data, its physicalization process, and its provocations.

The Pudding’s resources for learning & doing data journalism and research

See also The Critical Fan Toolkit by Cara Marta Messina (discussed in datasets section above), which offers both tools and links to interesting datasets.

Letterpress data, not publicly available yet…

I maintain a database of the letterpress type, graphic blocks/cuts, presses, supplies, and books related to book arts owned by me or by Scholars’ Lab. I have a very-in-progress website version I’m slowly building, without easily downloadable data, just a table view of some of the fields.

I also have a slice of this viewable online and not as downloadable data: just a gallery of the queerer letterpress graphic blocks I’ve collected or created. But I could get more online if anyone was interested in teaching or otherwise working with it?

I also am nearly done developing a database of the former VA Center for the Book: Book Arts Program’s enormous collection of type, which includes top-down photos of each case of type. I’m hoping to add more photos of example prints that use each type, too. If this is of interest to your teaching or research, let me know, as external interest might motivate me to get to the point of publishing sooner.

Designing a Data Physicalization: A love letter to dot grid paper

February 11, 2025 13:00

Claudia Berger is our Virtual Artist-in-Residence 2024-2025; register for their April 15th virtual talk and a local viewing of their data quilt in the Scholars’ Lab Common Room!

This year I am the Scholars’ Lab’s Virtual Artist-in-Residence, and I’m working on a data quilt about the Appalachian Trail. I spent most of last semester doing the background research for the quilt and this semester I get to actually start working on the quilt itself! Was this the best division of the project? Maybe not. But it is what I could do, and I am doing everything I can to get my quilt to the Lab by the event in April. I do work best with a deadline, so let’s see how it goes. I will be documenting the major steps in this project here on the blog.

Data or Design first?

This is often my biggest question: where do I even start? I can’t start the design until I know what data I have. But I also don’t know how much data I need until I do the design. It is really easy to get trapped in this stage, which may be why I didn’t start actively working on this part of the project until January. It can be daunting.

N.B. For some making projects this may not apply because the project might be about a particular dataset or a particular design. I started with a question though, and needed to figure out both.

However, like many things in life, it is a false binary. You don’t have to fully get one settled before tackling the other; go figure. I came up with a design concept: a quilt made up of nine equally sized blocks in a 3x3 grid. Then I just needed to find enough data to go into nine visualizations. I made a list of the major themes I was drawn to in my research and went about finding data that could fall into these categories.

A hand-written list about a box divided into nine squares, with the following text: AT Block Ideas: demographics, % land by state, Emma Gatewood, # miles, press coverage, harassment, Shenandoh, displacements, visit data, Tribal/Indig data, # of tribes, rights movements, plants on trail, black thru-hikers
What my initial planning looks like.

But what about the narrative?

So I got some data. It wasn’t necessarily nine datasets for each of the quilt blocks but it was enough to get started. I figured I could get started on the design and then see how much more I needed, especially since some of my themes were hard to quantify in data. But as I started thinking about the layout of the quilt itself I realized I didn’t know how I wanted people to “read” the quilt.

Would it be left to right and top down like how we read text (in English)?

A box divided into 9 squares numbered from left to right and top to bottom:  
1, 2, 3  
4, 5, 6  
7, 8, 9

Or in a more boustrophedon style, like how a river flows in a continuous line?

A box divided into 9 squares numbered from left to right and top to bottom: 1, 2, 3; 6, 5, 4; 7, 8, 9

Or should I make it so it can be read in any order and so the narrative makes sense with all of its surrounding blocks? But that would make it hard to have a companion zine that was similarly free-flowing.

So instead, I started to think more about quilts and ways narrative could lend itself to some traditional layouts. I played with the idea of making a large log cabin quilt. Log cabin patterns create a sort of spiral: they are built starting from the center, with pieces added around the outside. This is a pattern I’ve used in knitting and sewing before, but not in data physicalizations.

A log cabin quilt plan, where each additional piece builds off of the previous one.
A template for making a log cabin quilt block by Nido Quilters

What I liked most about this idea is it has a set starting point in the center, and as the blocks continue around the spiral they get larger. Narratively this let me start with a simpler “seed” of the topic and keep expanding to more nuanced visualizations that needed more space to be fully realized. The narrative gets to build in a more natural way.

A plan for log cabin quilt. The center is labeled 1, the next piece (2) is below it, 3 is to the right of it, 4 is on the top, and 5 is on the side. Each piece is double the size of the previous one (except 2, which is the same size as 1).

So while I had spent time fretting about whether to start with the data or the design of the visualizations, what I really needed to think through first was: what is the story I am trying to tell? And how can I make the affordances of quilt design work with my narrative goals?

I make data physicalizations because the form prioritizes narrative and interpretation over the “truth” of the data, and I had lost that as I got bogged down in the details. For me, narrative comes first, and I use the data and the design to support the narrative.

Time to sketch it out

This is my absolute favorite part of the whole process. I get to play with dot grid paper and all my markers, what’s not to love? Granted, I am a stationery addict at heart. So I really do look for any excuse to use all of the fun materials I have. But this is the step where I feel like I get to “play” the most. While I love sewing, once I get there I already have the design pretty settled. I am mostly following my own instructions. This is where I get to make decisions and be creative with how I approach the visualizations.

(I really find dot grid paper to be the best material to use at this stage. It gives you a structure to work with that ensures things are even, but it isn’t as dominating on the page as full grid paper. Of course, this is just my opinion, and I love nothing more than doodling geometric patterns on dot grid paper. But using it really helps me translate dimensions to fabric, and I can do my “measuring” here. For this project I am envisioning a quilt roughly 3 feet square. The inner block, Block 1, is 12 x 12 inches, so each grid square represents 3 inches.)

There is no one set way to approach this; this is just documentation of how I like to do it. If this doesn’t resonate with how you like to think about your projects, that is fine! Do it your own way. But I design the way I write, which is to say extremely linearly. I am not someone who can write by jumping around a document. I like to know the flow, so I start at the beginning and work my way to the end.

Ultimately, for quilt design, my process looks like this:

  1. Pick the block I am working on
  2. Pick which of the data I have gathered is a good fit for the topic
  3. Think about what is the most interesting part of the data, if I could only say one thing what would that be?
  4. Are there any quilting techniques that would lend themselves to the nature of the data or the topic? For example: appliqué, English paper piecing, half-square triangles, traditional quilt block designs, etc.
  5. Once I have the primary point designed, are there other parts of the data that work well narratively? And is there a design way to layer it?

For example, take this block on the demographics of people who complete thru-hikes of the trail, using annual surveys conducted since 2016. (Since there was no survey in 2020, and that year fell at the center of the grid, I made that square an average of all of the reported years, using a different color to differentiate it.)

I used the idea of the nine-patch block as my starting point, although I adapted it to a base grid of 16 (4x4) patches to better fit the dimensions of the visualization. I used the nine-patch idea to show the gender split (white being men, and green being all other answers, such as women, nonbinary, etc.). If it were a 50-50 split, 8 of the patches in each grid would be white, but that is never the case. I liked using the grid because it is easy to count the patches in each one, and by trying to make symmetrical or repetitive designs it is more obvious where the split isn’t balanced.

A box divided into 9 squares, with each square having its own green and white checkered pattern using the dot grid of the paper as a guide. The center square is brown and white. On top of each square is a series of horizontal or vertical lines ranging from four to nine lines.

But I also wanted to include the data on the reported race of thru-hikers. The challenge here is that it is a completely different scale. While the gender split on average is 60-40, the average percentage of non-white hikers is 6.26%. In order to not confuse the two, I decided to use a different technique to display the data, relying on stitching instead of fabric. I felt this let me use two different scales at the same time, that are related but different. I could still play with the grid to make it easy to count, and used one full line of stitching to represent 1%. Then I could easily round the data to the nearest .25% using the grid as a guide. So the more lines in each section, the more non-white thru-hikers there were.

My last step, once I have completed a draft of the design, is to ask myself, “is this too chart-y?” It is really hard sometimes to avoid the temptation to essentially make a bar chart in fabric, so I like to challenge myself to see if there is a way I can move away from more traditional chart styles. Now, one of my blocks is essentially a bar chart, but since it was the only one and it really successfully highlighted the point I was making I decided to keep it.

A collection of designs using the log cabin layout made with a collection of muted highlighters. There are some pencil annotations next to the sketches.
These are not the final colors that I will be using. They will probably all be changed once I dye the fabric and know what I am working with.

Next steps

Now, the design isn’t final. Choosing colors is a big part of the look of the quilt, so my next step is dyeing my fabric! I am hoping to have a blog post about the process of dyeing raw silk with plant-based dyes by the end of February. (I need deadlines; this will force me to get that done…) Once I have all of those colors I can return to the design and decide which colors will go where. More on that later. In the meantime, let me know if you have any questions about this process! Happy to do a follow-up post as needed.

A #mincomp method for data display: CSV to pretty webpage

January 15, 2025 13:00

(Note: Brandon is going to blog about related work! Will link here once that’s live.)

This is a post to tell y’all about a neat little web development thing that’s allowed me to easily make (and keep updated!) nifty things displaying kinds of data related to both professional development (easy CV webpage and printable format generation!) and bibliography/book arts (an online type specimen book, based on an easily-updatable Gsheet backend!). If you aren’t interested in the code, do just skim to see the photos showing the neat webpage things this can make.

Screenshot of a type specimen webpage created with Jekyll and a CSV of data
Figure 1: Screenshot of a type specimen webpage created with Jekyll and a CSV of data.

Screenshot of a CV webpage created with Jekyll and a CSV of data
Figure 2: Screenshot of a CV webpage created with Jekyll and a CSV of data.

Jekyll (skip this section if you know what Jekyll is)

Jekyll is a tool for making websites that sits in a middle ground between using a complex tool like WordPress or Drupal (a content management system, aka CMS) and completely coding each page of your website in HTML by hand; I think Jekyll sites are easier to create and manage than either extreme. It’s set up to follow principles of “minimal computing” (aka #mincomp), a movement toward making technical things more manageably scoped, with an emphasis on accessibility in its various senses. For example, using website development tools that keep the size of your website files small lets folks with slow internet still access your site.

If you want to know more about Jekyll, I’ve written peer-reviewed pieces on the what, why, and how to learn to make your own Jekyll-generated DH websites—suitable for folks with no previous web development experience!—as well as (with co-author Brandon Walsh) how to turn that into a collaborative research blog with a review workflow (like how ScholarsLab.org manages its blog posts). Basically, Jekyll requires some webpage handcoding, but:

  • takes care of automating bits that you want to use across your website so you don’t have to paste/code them on every page (e.g. your header menu; see the sketch after this list)
  • lets you reuse and display pieces of text (e.g. blog posts, events info, projects) easily across the website (like how ScholarsLab.org has interlinked blog posts, author info, people bio pages, and project pages linking out to people and blog posts involved with that project)
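
To make that concrete, here’s a minimal sketch of the kind of thing Jekyll automates; the file names and menu links below are made-up examples for illustration, not pulled from my or ScholarsLab.org’s actual code. You write the header menu once as an “include,” and every page built with the layout gets it:

    <!-- _includes/header.html : the shared menu, written once (hypothetical example) -->
    <nav>
      <a href="/">Home</a>
      <a href="/blog/">Blog</a>
    </nav>

    <!-- _layouts/default.html : every page using this layout pulls the menu in -->
    <html>
      <body>
        {% include header.html %}
        {{ content }}  <!-- Jekyll drops each individual page or post in here -->
      </body>
    </html>

Change header.html once, and every page on the site updates the next time Jekyll builds.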

DATA PLOP TIME

The cool Jekyll thing I’ve been enjoying recently is that you can easily make webpages doing things with info from a spreadsheet. I am vaguely aware that may not sound riveting to some people, so let me give you examples of specific uses:

  • I manage my CV info in a spreadsheet (a Gsheet, so I have browser access anywhere), with a row per CV item (e.g. invited talk, published article)
  • I also keep a record of the letterpress type and cuts (letterpress illustrations) owned by SLab and by me in a Gsheet

I periodically export these Gsheets as a CSV file, and plop the CSV file into a /_data folder in a Jekyll site I’ve created. Then, I’ve coded webpages to pull from those spreadsheets and display that info.

Screenshot of my letterpress specimen Gsheet
Figure 3: Screenshot of my letterpress specimen Gsheet

Data Plop Op #1: Online Letterpress Type Specimen Book

You don’t need to understand the code in the screenshot below; just skim it, and then I’ll explain:

Screenshot of some of the code pulling my letterpress Gsheet data into my Jekyll webpage
Figure 4: Screenshot of some of the code pulling my letterpress Gsheet data into my Jekyll webpage

I include this screenshot to show what’s involved in coding a webpage that displays data from a CSV. What it shows is how I’m able to call a particular spreadsheet column’s data by just typing a short Liquid template tag, rather than pasting in the actual contents of the spreadsheet! LOTS of time saved, and when I edit the spreadsheet to add more rows of data, I just need to re-export the CSV and the website automatically updates to include those edits. For example, in the above screenshot, my CSV has a column that records whether a set of letterpress type is “type high” or not (type high = .918”, the standard height that lets you letterpress print more easily with different typefaces in one printing, or use presses that are set to a fixed height). In the code, I just place that tag where I want it in the webpage; you can see I’ve styled it to be part of a bullet list (using the “<li>” tag that creates lists).

In the screenshot, I also use some basic logic to display different emoji, depending on what’s in one of the CSV columns. My “uppercase” column says whether a set of letterpress type includes uppercase letters or not. My code pulls that column and checks whether a given row (i.e. set of letterpress type or cut) says Uppercase = yes or no, then displays an emoji checkmark instead of “yes” and an emoji red X instead of “no”.
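
Here’s a rough, hedged sketch of what that kind of Liquid code can look like; the file name and column names (letterpress.csv, name, type_high, uppercase) are illustrative stand-ins rather than my actual field names:

    <!-- Assumes a CSV exported to _data/letterpress.csv with columns like: name, type_high, uppercase -->
    <ul>
      {% for item in site.data.letterpress %}
        <li>
          {{ item.name }}. Type high: {{ item.type_high }}.
          {% if item.uppercase == "yes" %}✅{% else %}❌{% endif %} uppercase letters included.
        </li>
      {% endfor %}
    </ul>

Each pass through the loop is one row of the spreadsheet, so adding rows to the Gsheet and re-exporting the CSV is all it takes to grow the page; nothing in the HTML needs to change.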

Here’s how one CSV line displayed by my specimen book webpage looks (I haven’t finished styling it, so it doesn’t look shiny and isn’t yet live on my very drafty book arts website):

Screenshot of a webpage displaying letterpress Gsheet data in a nicely designed grid of boxes

And I was also able to code a table version, pulling from the same data:

Screenshot of a webpage displaying letterpress Gsheet data in a nicely designed table format

If the code discussion is confusing, the main takeaway is that this method lets you

  1. keep data that’s easier to manage in a spreadsheet in a spreadsheet, instead of coding it into a webpage file; and
  2. easily display stuff from that spreadsheet, without needing to make a copy of the data that could become disjoint from the spreadsheet if you forget to update both exactly the same.

Data Plop Op #2: Keeping your CV updated

I used to manage my CV/resume as Google Docs, but that quickly turned into a dozen GDocs all with different info from different ways I’d edited what I included for different CV-needing opportunities. When I had a new piece of scholarship to add, it wasn’t clear which GDoc to add it to, or how to make sure CV items I’d dropped from one CV (e.g. because it needed to focus on teaching experience, so I’d dropped some less-applicable coding experiences from it) didn’t get forgotten when I made a CV that should include them.

UGH.

A happy solution: I have 1 CV Gsheet, with each row representing a “CV line”/something I’ve done:

Screenshot of a Gsheet containing CV data

I periodically export that CSV and plop it into a Jekyll site folder. Now, I can do 2 cool things. The first is the same as the letterpress specimen book: just styling and displaying Gsheet data on the web. This lets me have webpages showing both a full version of my CV and a short version, and theoretically other pages (e.g. a page coded to display a CV that only includes xyz categories):

Screenshot of a webpage displaying a CV

And! I’ve also coded a printable CV. This uses a separate CSS stylesheet that fits how I want a printed CV to look different from a website, e.g. don’t break up a CV line item between two pages, don’t include the website menu/logo/footer. Same text as above, styled for printing:

Screenshot of a webpage displaying a CV, with styling that looks like it would print to make a nice-looking printed CV
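
If you’re curious what the print-only styling involves, here’s a minimal sketch of the idea (the class names are hypothetical stand-ins, not my site’s actual ones); the media="print" attribute means these rules only apply when the page is printed:

    <!-- In the layout: styles that browsers only apply when printing -->
    <style media="print">
      .site-header, .site-nav, .site-footer { display: none; }  /* drop the website menu/logo/footer */
      .cv-item { break-inside: avoid; }                          /* don't split a CV line item across pages */
    </style>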

When I need a whittled-down CV that fits a page limit, or that just shows my experience in one area and not others I’m skilled in, I can just make a CSV deleting the unneeded lines—my spreadsheet has category and subcategory columns making it easy to sort these, and also to tag lines that could appear in different sections depending on CV use (e.g. sometimes a DH project goes under a peer-reviewed publication section, and sometimes it goes under a coding section, since I want my publication section to only include longform writing). But I always add new lines to the same core Gsheet, so I don’t lose track of what I’ve recorded for future CV inclusion.
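
The “only include xyz categories” idea uses the same data-plop approach plus Liquid’s where filter. A minimal sketch, assuming hypothetical category, year, and title columns in _data/cv.csv:

    <!-- Sketch of a CV page that pulls only one category from _data/cv.csv (column names are assumptions) -->
    {% assign teaching_items = site.data.cv | where: "category", "teaching" %}
    <h2>Teaching</h2>
    <ul>
      {% for item in teaching_items %}
        <li class="cv-item">{{ item.year }}: {{ item.title }}</li>
      {% endfor %}
    </ul>

The cv-item class here is the same hypothetical hook the print-stylesheet sketch above uses to keep each CV line on one printed page.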

I currently don’t have this CV website online—I just run it locally when I need to generate a printable CV. But I’ll be adding it to my professional site once I have a bit more time to finish polishing the styling!

In conclusion

Jekyll + CSV files =

Screenshot of a letterpress cut consisting of a repeating row of 5 images; the image that repeats is a hand giving a thumbs-up next to the text "way to go!"

(One of the letterpress cuts recorded by my specimen book Gsheet/webpage, as discussed above!)

ISAM 2024 Conference Report

December 17, 2024 18:07

Each year, educators, students, and staff of university makerspaces gather to share research, ideas, and projects at the International Symposium on Academic Makerspaces (ISAM) conference. This was the first year since its founding in 2016 that the conference was held internationally, at the University of Sheffield in England. It was, perhaps, the international appeal that convinced several SLab Makerspace Technologists to submit a paper or project to the conference. Unsurprisingly (because these students are amazing), all of the papers and projects were accepted for the conference.

It was a great conference, a fun trip, and we all did great on our presentations. The most unfortunate thing was that Link Fu came down with COVID two days before the trip and was too sick to travel with us. Resourceful as always, she recorded her part of the presentation and we were able to play that during our session.

by J. Phan and J. Truong

Recommending Makerspace Best Practices Based On Visualization of Student Use Data

by Holly Zhou and Ammon E. Shepherd

Typewriter Poetics: Creating Collaborative Memory Maps

by Qiming (Link) Fu and Ammon E. Shepherd

Mutualism between Interdisciplinary Student Organizations and Makerspaces: The Nutella Effect

Limited Letterpress Synonym Finder

December 15, 2024 13:00

I coded a quick web app for a particular book arts need: the Limited Letterpress Synonym Finder. If you too only have 1 x A-Z letterpress type on hand (i.e. just the 26 characters of the alphabet, 1 sort per letter) and want to figure out what you can print without needing to carefully position (register) your paper and do multiple pressings between moving the letters around, you can enter words here to see only those synonyms you’re able to print (i.e. only synonyms using no letter more than once).

Screenshot of the Limited Letterpress Synonym Finder webpage linked in the post, which says "Limited Letterpress Synonym Finder. For when you only have 1 x A-Z type on hand. Finds synonyms for the word you input, removes any that use any letter more than once, then displays the rest. (Only works with single-word inputs, not phrases.)" There is a field to enter words, with the word "glow" entered in this example screenshot, followed by a "Find that subset of synonyms" button. There is a list of matching non-multiple-same-letter synonyms for "glow" shown, containing the words burn, beam, shine, gleam, and lambency. Below is a retro internet logo image:  on a black background, the text "Limited Letterpress: Synonym Finder" is in a glowing green neon Old English font.

Reframing AI with the Digital Humanities

December 5, 2024 13:00

A version of this piece will be an open-access chapter in a volume by invited speakers at the 10/23/2024 “Reimagining AI for Environmental Justice and Creativity” conference, co-organized by Jess Reia, MC Forelle, and Yingchong Wang and co-sponsored by UVA’s Digital Technology for Democracy Lab, Environmental Institute, and School of Data Science. I had more to say, but this was what I managed inside the word limit!

I direct the Scholars’ Lab, a digital humanities (DH) center that has led and collaborated on ethical, creative experimentation at the intersections of humanities, culture, and tech at the University of Virginia since 2006. A common definition of DH encompasses both using digital methods (such as coding and mapping) to explore humanities research questions (such as concerns of history, culture, and art); and asking humanities-fueled questions about technology (such as ethical design review of tools like specific instances of AI). I always add a third core feature of DH: a set of socially just values and community practices around labor, credit, design, collaboration, inclusion, and scholarly communication, inseparable from best-practice DH.

I write this piece as someone with expertise in applicable DH subareas—research programming, digital scholarly design, and the ethical review of digital tools and interfaces—but not as someone with particular experience related to ML, LLMs, or other “AI” knowledges (at the levels that matter, e.g. code-review level, CS-journal-reading). A field of new and rapidly evolving tools means true expertise in the capabilities and design of AI is rare; often we are either talking about secondhand experiences of these tools (e.g. “Microsoft Co-Pilot let me xyz”) or about AI as a shorthand for desired computing capabilities, unfounded on familiarity with current research papers or understanding of codebases. (A values-neutral claim: science fiction authors without technical skillsets have helped us imagine, and later create.)

Convergence on the term “data science” has both inspired new kinds of work, and elided contributions of the significantly overlapping field of library and information studies. Similarly, “AI” as the shorthand for the last few years’ significant steps forward in ML (and LLMs in particular) obscures the work of the digital humanities and related critical digital research and design fields such as Science and Technology Studies (STS). When we use the term “AI”, it’s tempting to frame our conversations as around a Wholly New Thing, focusing on longer-term technical aspirations uninhibited by practical considerations of direct audience needs, community impacts, resources. While that’s not necessarily a bad way to fuel technological creativity, it’s too often the only way popular conversations around AI proceed. In one research blog post exploring the moral and emotional dimensions of technological design, L.M. Sacasas lists 41 questions we can ask when designing technologies, from “What sort of person will the use of this technology make of me?” to “Can I be held responsible for the actions which this technology empowers? Would I feel better if I couldn’t?” We don’t need to reinvent digital design ethics for AI—we’ve already got the approaches we need (though those can always be improved).

When we frame “AI” as code, as a set of work discrete but continuous with a long history of programming and its packagings (codebase, repo, library, plugin…), it’s easier to remember we have years of experience designing and analyzing the ethics and societal impacts of code—so much so that I’ve started assuming people who say “LLM” or “ML” rather than “AI” when starting conversations are more likely to be conversant with the specifics of current AI tech at the code level and CS-journal-reading level, as well as its ethical implications. The terms we use for our work and scholarly conversations are strategic: matching the language of current funding opportunities, job ads. We’ve seen similar technologically-vague popularizing on terms with past convergences of tech interest too, including MOOCs, “big data”, and the move from “humanities computing” to the more mainstreamed “digital humanities”.

Digital humanities centers like our Scholars’ Lab offer decades of careful, critical work evaluating existing tools, contributing to open-source libraries, and coding and designing technology in-house—all founded on humanities skills related to history, ethics, narrative, and more strengths necessary to generative critique and design of beneficial tech. Some of the more interesting LLM-fueled DH work I’ve seen in the past couple years has involved an AI first- or second-pass at a task, followed by verification by humans—for situations where the verification step is neither more onerous nor more error-prone than a human-only workflow. For example:

  • the Marshall Project had humans pull out interesting text from policies banning books in state prisons, used AI to generate useful summaries of these, then had humans check those summaries for accuracy
  • Scholars Ryan Cordell and Sarah Bull tested ChatGPT’s utility in classifying genres of historical newspaper and literary text from dirty OCR and without training data, and in OCR cleanup, with promising results
  • My Scholars’ Lab colleague Shane Lin has been exploring AI applications for OCRing text not well-supported by current tools, such as writing in right-to-left scripts
  • Archaeologists restoring the HMS Victory applied an AI-based algorithm to match very high-resolution, highly detailed images stored in different locations to areas of a 3D model of the ship

Alongside any exploration of potential good outcomes, we also need to attend to whether potential gains in our understanding of the cultural record, or in how we communicate injustice and build juster futures, are worth the intertwined human and climate costs of this or other tech.

One of DH’s strengths has been its focus on shared methods and tools across disciplines, regardless of differences in content and disciplinary priorities, with practitioners regularly attending interdisciplinary conferences (especially unusual within the humanities) and discussing overlapping applications of tools across research fields. DH experts also prioritize non-content-agnostic conversations, prompted by the frequency with which we borrow and build on tools created for non-academic uses. For example, past Scholars’ Lab DH Fellow Ethan Reed found utility in adapting a sentiment analysis tool from outside his field to exploring the emotions in Black Arts Poetry works, but also spent a significant portion of his research writing critiquing the biased results based on the different language of sentiment in the tool’s Rotten Tomatoes training dataset. (ML training sets are an easy locus for black boxing biases, context, and creator and laborer credit—similar to known issues with text digitization work, as explored by Aliza Elkin’s troublingly gorgeous, free Hand Job zine series capturing Google Books scans that accidentally caught the often non-white, female or non-gender-conforming hands of the hidden people doing the digitizing.)

We already know where to focus to produce more beneficial, less harmful, creative digital tools: social justice. At the Reimagining AI roundtable, my table’s consensus was that issues of power and bias are key not just to reducing ML harms, but to imagining and harnessing positive potential. Key areas of concern included climate terrorism (e.g. reducing the energy costs of data centers), racism (e.g. disproportionate negative impacts on BIPoC compounding existing economic, labor, and police violence threats), human rights (e.g. provision of a universal basic income easing concerns about areas ML may beneficially offset human labor), and intertwined ableist and computing access issues (e.g. AI search-result “slop” is terrible for screen readers, low-bandwidth internet browsing). In our existing scholarly fields and advocacy goals, where are current gaps in terms of abilities, resources, scale, efficiencies, audiences, ethics, and impacts? After identifying those major needs, we’re better positioned to explore how LLMs might do good or ill.
