普通视图

Received before yesterday

The OpenAI API documentation is very bad

作者shane-lin
2025年12月8日 13:00

The OpenAI API docs are very bad. In my experience as a coder, I’ve come across my share of bad documentation. Typically, this is because the documentation is poorly organized, too spare, or missing coverage. Or it’s because the design of the API itself is badly conceived, inconsistent, or contains the accumulated cruft of years (or decades!) of bloat and abandoned features.

But I can’t recall ever seeing documentation that contains code samples that are both wrong and also syntactically wrong. It’s bad enough that it comes across as documentation written by GPT–and not even a recent model.

Take this example, part of an entry under the “Core Concepts” section:

context = [
    { "role": "role", "content": "What is the capital of France?" }
]
res1 = client.responses.create(
    model="gpt-5",
    input=context,
)

// Append the first responses output to context
context += res1.output

// Add the next user message
context += [
    { "role": "role", "content": "And it's population?" }
]

res2 = client.responses.create(
    model="gpt-5",
    input=context,
)

The Python code sample here is not syntactically correct. The comments use the ‘//’ convention of C/Java/Javascript in-line comments, rather than Python’s ‘#’. Additionally, OpenAI has the concept of a role, which indicates who (e.g. the system, the user, the model’s responder) is “speaking.” The string “role” is not a valid value for this and making an API call with it results in an error:

openai.BadRequestError: Error code: 400 - {‘error’: {‘message’: “Invalid value: ‘role’. Supported values are: ‘assistant’, ‘system’, ‘developer’, and ‘user’.”, ‘type’: ‘invalid_request_error’, ‘param’: ‘input[3]’, ‘code’: ‘invalid_value’}}

So, there are a total of 7 code statements in this sample, including the comments, and 4 of them have errors. The thing is, GPT-5 is actually pretty good at writing code. It’s even capable of executing Python code in an internal environment. We can see this facility in action by simply asking ChatGPT to debug the code from the OpenAI documentation.

ChatGPT response indicating the two errors from the OpenAI API documentation

This is a mode of LLM use that I haven’t had a lot of luck with, but here it pinpoints the two errors perfectly.

When documentation is bad in a common fashion, it typically creates a frustrating programming experience. And, to be clear, the OpenAI docs are bad in some of those ways too. But the sheer lack of care it demonstrates is both shocking for all the ways that Tech has integrated AI into our world and, frankly, majestic. Like making a horse consul or completely blowing up the system of global trade.

Our Journey to Praxathon

2025年4月18日 12:00

My cohort just finished our second week of Praxathon and I wanted to reflect on the development of our project and how we ended up focusing on conducting text analysis of the UVa students’ satirical publication, The Yellow Journal.

For me, this project started back in 2018 when I was accepted into The Yellow Journal as a second year undergraduate student at UVa. The Yellow Journal is an anonymously-published satirical newspaper that has operated on and off since 1913. Undergraduate students know The Yellow Journal for its members’ semesterly tradition of disrupting libraries during the first day of finals by raucously distributing the publication while masked and wearing all yellow… and often blasting Yellow by Coldplay or Black and Yellow by Wiz Khalifa on giant speakers. I started my tenure as a satirical writer with the headline and article below:

Hardest Part of Getting Accepted into the Comm School is Needing to Replace All of Your Friends, Student Says

As the season of applying to the McIntire School of Commerce approaches for second years, older students reflect on their prior application experiences. Kody, a fourth year in the Comm school, explains that the application itself was easy; he had no doubt in his mind that he would get in. The hardest part was letting go of all of his non-Comm friends afterwards. “I just can’t let failure into my life,” Kody explains. “Once you’re in the Comm School, you have to start setting standards for your friends, and most of my friends weren’t meeting mine.” Kody was on the fence about keeping his Batten friends, but eventually decided against it. “Hanging out with them is bad for optics, in my opinion,” Kody stated. “While Batten kids are also good at networking, I can’t let their morals get in my way. They’re all about government intervention… hey dummies, what about the invisible hand?” Drew, an Economics major, elaborates on his ended friendship with Kody: “The minute my roommate Kody got accepted, he turned to me and asked me to move out. I was heartbroken, we had been living together since first year. In fact, he’s also my cousin. But I understand… it had to be done.” Drew wasn’t sure if it was worth it to even continue college after his rejection from Comm. To him, having no diploma at all is better than getting an non-Comm Economics degree.

Outside of writing headlines and articles, Yellow Journal members were also in the midst of digitizing and archiving the entire history of the paper on our Google Drive. The publication started in 1913, but it was only published regularly starting in 1920 and then was subsequently banned in 1934 by the UVa administration due to its anonymity. The publication then resumed in 1987, having its own office next to The Cavalier Daily with a modest amount of revenue from selling ad placements. The paper was discontinued again in 1999, but a group of students revived it in 2010 which resulted in its current, ongoing iteration.

In late 2019, I realized that we were approaching 100 years since The Yellow Journal was published regularly and I applied to a few grants that could possibly fund a special anniversary issue. I wanted to use the extensive archive work that members had so painstakingly organized for future members to look back on. The idea was to publish some highlights from our archive, especially the jokes that still remained relevant today. With quarantine in March 2020, however, interest from my collaborators waned and I eventually abandoned that project. I knew that I wanted to return to working on a project about The Yellow Journal someday because it provided such unique insight on the student experience of the University. Also, even 100 years later, many of the early issues are still so funny.

My position as a former member of The Yellow Journal was definitely the reason that the subject was brought up as a possible topic for our Praxathon, but I don’t think this project would have necessarily worked with other cohorts. The final section on our charter is titled “Make Learning a Playful Process.” That was a big goal of our cohort: to approach the work in a fun, lighthearted way. I wasn’t completely sure about the viability of that pledge when we first wrote the charter. I didn’t know the rest of my cohort well at the time and I was still very operating in “traditional graduate classroom” mode. As we are approaching the end of the year, however, I think I can now safely say that we made every single part of Praxis fun and playful. I spend a good portion of my time in Praxis attempting to stifle my laughter at Oriane’s 10,000 things to commit to Github, Shane’s river drawing, or Brandon attempts to find new phrases because we accidentally made him insecure about saying “for what it’s worth.”

When I first pitched The Yellow Journal as an idea for Praxathon, I was mainly thinking about how it made sense as a project in a practical way: we already had access to high quality digitized records of all of the issues. The scope seemed manageable and it did not require too much preparatory work. As we’ve progressed in the project, I’ve slowly realized why it resonated with us as a group beyond logistics. Since we’re all graduate students at UVa, we are all familiar with and invested in the University’s history (especially told from a student perspective). We want to have fun with the material, which has led to many instances of us sitting in the fellows lounge and reading funny headlines out loud to each other.

Most of all, I think that the way we’ve developed the project has played into our individual and collective strengths. I never even thought about looking at student records from the 1920s and 30s but Gramond, being an incredible historian and lover of data, introduced us to that possibility. Oriane has done some amazing research on the history of the University at the time period that we’re looking at and, more generally, on analyzing satire. Because of her research of poetry, Amna was already interested in many of the text analysis methods that we’re using so she has expertly led us in thinking about how to apply those to The Yellow Journal. Kristin, as always, has shown herself to be an amazing problem solver, ready to tackle any coding task with such resolve and creativity. I just love assigning tasks to people so I have commandeered our Trello board.

Our poster will hopefully be done in the next few weeks, but it is clear to me now that the process, or journey, through the Praxathon is much more important than the end product. As I read through our charter again, I realize how true to our goals we’ve been and how interdisciplinary (and fun!) our final project is.

Print-Ready Web CV

2025年2月3日 13:00

I keep my running CV on my website for a few reasons. It keeps everything in one place. It’s handy to point students towards when they have questions about how to list things on their own CV. It lets me pull in some quick stats on my blog posts using Jekyll. But I’ve always run into problems when it comes time to submit my CV as an actual document. Copying the page over to Microsoft Word brings in all the detritus of my web styling and structure, and I have to dutifully edit those elements out before submission.

I described this problem to Jeremy Boggs, our Head of R&D, and he immediately suggested that I look into making a print stylesheet for my CV page. I knew you could use CSS to style the ways your web page gets printed, but I’d never actually played around with it before. Now I’ve got things going such that I have far less to do when I go to submit my CV. This post documents that process.

The first thing I needed to do was get a way to preview what I was looking at. By default, your developer environment won’t render your print styles unless you go to print a page and look at the preview that pops up. I followed this set of instructions for getting Google Chrome to emulate my print styles in the browser as I worked.

Now that I can see my work, my first real step is to make a new print stylesheet and link it to my site in the head of my default template. This print stylesheet is fairly specialized to a single page, so I wrap the reference in an if statement so that it is only included on that particular page.

_includes/head.html

{% if page.title == "CV" %}
  <link rel="stylesheet" href="{{ site.baseurl }}/styles/print.css">
{% endif %}

Now that the stylesheet is included, I start to build up a styles/print.css file one piece at a time based on the things I want to change. First off was hiding web-only materials like the nav bar and the masthead.

styles/print.css

  /* Hide web-only content  */
  header, nav, h1.page-title, #web-title{
    display: none;
  }

But I actually do want some material as a masthead. I implement this by actually having some content on the web that is only rendered to the browser if it is printed. This becomes a part of the masthead, contained in my default layout for my jekyll site.

_layouts/default.html

<div class="print-only" id="print-title">Brandon Walsh | walsh.brandon.michael@gmail.com | @walshbr</div>

And then it only appears when printed by specifying a print-only class for those elements that I only want when printing.

styles/print.css

  .print-only{
    /* Display print-only content */
    display: block;
  }

The web version of my CV does not span the whole width of the page, which is good for readability on the web but a problem when printing. So these settings create a more typical one-inch margin for the document. Another interesting issue I ran into was that some printers by default will include metadata - date, page number, time - on the page for printing. The margin settings below cut that off.

styles/print.css

  div.container.content {
    /* normalize content sizing for printing */
    max-width: none !important;
  }

  @page {
    /* hide printer-specific information that would otherwise get added */
    margin:0;
    padding: 1in;
  }

And then finally I got down to the actual task of setting up some basic stylings that make it a little less web and a little more print. I switch fonts, change the text size, and revert to default URL text decorations for the sake of the genre. In the past I’ve found that my link styles, especially, look very strange when copied over to a print document.

styles/print.css

  html, h2, h3, #print-title{
    /* set print-ready font and text size */
      font-family: Georgia, 'Times New Roman', Times, serif; 
      font-size: 12px;
  }

  a{
    /* Normalize URL colors */
      text-decoration: underline;
      color: blue;
  }

One stretch goal that I haven’t fully implemented: I commonly need a quick way to cut my CV down. There’s no good way to do it programmatically with any real level of precision, but Jeremy showed me how to do a rough cut that works well in a pinch. I’m using Jekyll by default, which will give ID’s to all of my headers that match the titles. Jeremy showed me how to use CSS selectors to selectively hide whole batches of content based on those ID’s. The following CSS would hide all of my service commitments. Not really super useful to do a lot of, but maybe helpful to know about!

styles/print.css

/* example for how to hide specific sections */
/* 
h2#professional-service-and-affiliations,
h2#professional-service-and-affiliations+ul,
h3#local-service-washington-and-lee,
h3#local-service-washington-and-lee+ul,
h3#local-service-university-of-virginia+ul{
  display:none;
} */

That’s it for now. Much more that I could do, but this serves my needs nicely. And here’s a quick side-by-side of the first printed page to see how the new print.css sheet stacks up.

First the original print, which is a pretty close copy of the web version:

original printed cv

And now the new one with a print stylesheet incorporated. Much more usable as a CV! I could save it as a PDF to submit.

printed cv with a stylesheet - looks much more like a cv!

I’ve pasted the full contents of all the relevant files as they stand in case you’re interested in replicating.

_includes/head.html

{% if page.title == "CV" %}
  <link rel="stylesheet" href="{{ site.baseurl }}/styles/print.css">
{% endif %}

_layouts/default.html

<div class="print-only" id="print-title">Brandon Walsh | walsh.brandon.michael@gmail.com | @walshbr</div>

styles/print.css

@media print {
  
  /* Hide web-only content  */
  header, nav, h1.page-title, #web-title{
    display: none;
  }

  .print-only{
    /* Display print-only content */
    display: block;
  }

  div.container.content {
    /* normalize content sizing for printing */
    max-width: none !important;
  }

  @page {
    /* hide printer-specific information that would otherwise get added */
    margin:0;
    padding: 1in;
  }

  html, h2, h3, #print-title{
    /* set print-ready font and text size */
      font-family: Georgia, 'Times New Roman', Times, serif; 
      font-size: 12px;
  }

  a{
    /* Normalize URL colors */
      text-decoration: underline;
      color: blue;
  }

/* example for how to hide specific sections */
/* 
h2#professional-service-and-affiliations,
h2#professional-service-and-affiliations+ul,
h3#local-service-washington-and-lee,
h3#local-service-washington-and-lee+ul,
h3#local-service-university-of-virginia+ul{
  display:none;
} */

}

A #mincomp method for data display: CSV to pretty webpage

2025年1月15日 13:00

(Note: Brandon is going to blog about related work! Will link here once that’s live.)

This is a post to tell yall about a neat little web development thing that’s allowed me to easily make (and keep updated!) nifty things displaying kinds of data related to both professional development (easy CV webpage and printable format generation!) and bibliography/book arts (an online type speciment book, based on an easily-updatable Gsheet backend!). If you aren’t interested in the code, do just skim to see the photos showing the neat webpage things this can make.

Screenshot of a type specimen webpage created with Jekyll and a CSV of data
Figure 1: Screenshot of a type specimen webpage created with Jekyll and a CSV of data.

Screenshot of a CV webpage created with Jekyll and a CSV of data
Figure 2: Screenshot of a CV webpage created with Jekyll and a CSV of data.

Jekyll (skip this section if you know what Jekyll is)

Jekyll is a tool for making websites that sit in a middle ground between using a complex tool like WordPress or Drupal (a content management system, aka CMS) or completely coding each page of your website in HTML by hand, and I think easier to create and manage than either extreme. It’s set up to follow principles of “minimal computing” (aka #mincomp), which is a movement toward making technical things more manageably scoped with an emphasis on accessibility for various meanings of that. For example, using website development tools that keep the size of your website files small lets folks with slow internet still access your site.

If you want to know more about Jekyll, I’ve written peer-reviewed pieces on the what, why, and how to learn to make your own Jekyll-generated DH websites—suitable for folks with no previous web development experience!—as well as (with co-author Brandon Walsh) how to turn that into a collaborative research blog with a review workflow (like how ScholarsLab.org manages its blog posts). Basically, Jekyll requires some webpage handcoding, but:

  • takes care of automating bits that you want to use across your website so you don’t have to paste/code them on every page (e.g. you header menu)
  • lets you reuse and display pieces of text (e.g. blog posts, events info, projects) easily across the website (like how ScholarsLab.org has interlinked blog posts, author info, people bio pages, and project pages linking out to people and blog posts involved with that project)

DATA PLOP TIME

The cool Jekyll thing I’ve been enjoying recently is that you can easily make webpages doing things with info from a spreadsheet. I am vaguely aware that may not sound riveting to some people, so let me give you examples of specific uses:

  • I manage my CV info in a spreadsheet (a Gsheet, so I have browser access anywhere), with a row per CV item (e.g. invited talk, published article)
  • I also keep a record of the letterpress type and cuts (letterpress illustrations) owned by SLab and by me in a Gsheet

I periodically export these Gsheets as a CSV file, and plop the CSV file into a /_data folder in a Jekyll site I’ve created. Then, I’ve coded webpages to pull from those spreadsheets and display that info.

Screenshot of my letterpress specimen Gsheet
Figure 3: Screenshot of my letterpress specimen Gsheet

Data Plop Op #1: Online Letterpress Type Specimen Book

You don’t need to understand the code in the screenshot below; just skim it, and then I’ll explain:

Screenshot of some of the code pulling my letterpress Gsheet data into my Jekyll webpage
Figure 4: Screenshot of some of the code pulling my letterpress Gsheet data into my Jekyll webpage

I include this screenshot to show what’s involved to code a webpage that displays data from a CSV. What this shows is how I’m able to call a particular spreadsheet column’s data by just typing “”, rather than pasting in the actual contents of the spreadsheet! LOTS of time saved, and when I edit the spreadsheet to add more rows of data, I just need to re-export the CSV and the website automatically updates to include those edits. For example, in the above screenshot, my CSV has a column that records whether a set of letterpress type is “type high” or not (type high = .918”, the standard height that lets you letterpress print more easily with different typefaces in one printing, or use presses that are set to a fixed height). In the code, I just place “” where I want it in the webpage; you can see I’ve styled it to be part of a bullet list (using the “<li>” tag that creates lists).

In the screenshot, I also use some basic logic to display different emoji, depending on what’s in one of the CSV columns. My “uppercase” column says whether a set of letterpress type includes uppercase letters or not. My code pulls that column (“”) and checks whether a given row (i.e. set of letterpress type or cut) says Uppercase = yes or no; then displays an emoji checkmark instead of “yes”, and emoji red X instead of “no”.

Here’s how one CSV line displayed by my specimen book webpage looks (I haven’t finished styling it, so it doesn’t look shiny and isn’t yet live on my very drafty book arts website):

Screenshot of a webpage displaying letterpress Gsheet data in a nicely designed grid of boxes

And I was also able to code a table version, pulling from the same data:

Screenshot of a webpage displaying letterpress Gsheet data in a nicely designed table format

If the code discussion is confusing, the main takeaway is that this method lets you

  1. manage data that’s easier to manage in a spreadsheet, in a spreadsheet instead of coded in a webpage file; and
  2. easily display stuff from that spreadsheet, without needing to make a copy of the data that could become disjoint from the spreadsheet if you forget to update both exactly the same.

Data Plop Op #2: Keeping your CV updated

I used to manage my CV/resume as Google Docs, but that quickly turned into a dozen GDocs all with different info from different ways I’d edited what I included for different CV-needing opportunities. When I had a new piece of scholarship to add, it wasn’t clear which GDoc to add it to, or how to make sure CV items I’d dropped from one CV (e.g. because it needed to focus on teaching experience, so I’d dropped some less-applicable coding experiences from it) didn’t get forgotten when I made a CV that should include them.

UGH.

A happy solution: I have 1 CV Gsheet, with each row representing a “CV line”/something I’ve done:

Screenshot of a Gsheet containing CV data

I periodically export that CSV and plop it into a Jekyll site folder. Now, I can do 2 cool things: the first is the same as the letterpress specimen book, just styling and displaying Gsheet data on the web. This lets me have both webpages showing a full version of my CV, and a short version of my CV, and theoretically other pages (e.g. code a page to display a CV that only includes xyz categories):

Screenshot of a webpage displaying a CV

And! I’ve also coded a printable CV. This uses a separate CSS stylesheet that fits how I want a printed CV to look different from a website, e.g. don’t break up a CV line item between two pages, don’t include the website menu/logo/footer. Same text as above, styled for printing:

Screenshot of a webpage displaying a CV, with styling that looks like it would print to make a nice-looking printed CV

When I need a whittled down CV that fits a page limit, or that just shows my experience in one area and not others I’m skilled in, I can just make a CSV deleting the unneeded lines—my spreadsheet ahs category and subcategory columns making it easy to sort these, and also to tag lines that could appear in different sections depending on CV use (e.g. sometimes a DH project goes under a peer-reviewed publication section, or sometimes it goes under a coding section as I want my publication section to only include longform writing). But I add new lines always to the same core Gsheet, so I don’t get confused about what I’ve remembered to record for future CV inclusion where.

I currently don’t have this CV website online—I just run it locally when I need to generate a printable CV. But I’ll be adding it to my professional site once I have a bit more time to finish polishing the styling!

In conclusion

Jekyll + CSV files =

Screenshot of a letterpress cut consisting of a repeating row of 5 images; the image that repeats is a hand giving a thumbs-up next to the text "way to go!"

(One of the letterpress cuts recorded by my specimen book Gsheet/webpage, as discussed above!)

Limited Letterpress Synonym Finder

2024年12月15日 13:00

I coded a quick web app for a particular book arts need: Limited Letterpress Synonym Finder. If you too also only have 1xA-Z letterpress type on hand (ie just the 26 characters of the alphabet, 1 sort per letter) and what to figure out what you can print without needing to carefully position (register) your paper and do multiple pressings between moving the letters around, you can enter words here to see only those synonyms you’re able to print (i.e. only synonyms using no more than 1 of each A-Z letter).

Screenshot of the Limited Letterpress Synonym Finder webpage linked in the post, which says "Limited Letterpress Synonym Finder. For when you only have 1 x A-Z type on hand. Finds synonyms for the word you input, removes any that use any letter more than once, then displays the rest. (Only works with single-word inputs, not phrases.)" There is a field to enter words, with the word "glow" entered in this example screenshot, followed by a "Find that subset of synonyms" button. There is a list of matching non-multiple-same-letter synonyms for "glow" shown, containing the words burn, beam, shine, gleam, and lambency. Below is a retro internet logo image:  on a black background, the text "Limited Letterpress: Synonym Finder" is in a glowing green neon Old English font.

Reframing AI with the Digital Humanities

2024年12月5日 13:00

A version of this piece will be an open-access chapter in a volume by invited speakers at the 10/23/2024 “Reimagining AI for Environmental Justice and Creativity” conference, co-organized by Jess Reia, MC Forelle, and Yingchong Wang and co-sponsored by UVA’s Digital Technology for Democracy Lab, Environmental Institute, and School of Data Science. I had more to say, but this was what I managed inside the word limit!

I direct the Scholars’ Lab, a digital humanities (DH) center that’s led and collaborated on University of Virginia ethical, creative experimentation at the intersections of humanities, culture, and tech since 2006. A common definition of DH encompasses both using digital methods (such as coding and mapping) to explore humanities research questions (such as concerns of history, culture, and art); and asking humanities-fueled questions about technology (such as ethical design review of tools like specific instances of AI). I always add a third core feature of DH: a set of socially just values and community practices around labor, credit, design, collaboration, inclusion, and scholarly communication, inseparable from best-practice DH.

I write this piece as someone with expertise in applicable DH subareas—research programming, digital scholarly design, and the ethical review of digital tools and interfaces—but not as someone with particular experience related to ML, LLMs, or other “AI” knowledges (at the levels that matter, e.g. code-review level, CS-journal-reading). A field of new and rapidly evolving tools means true expertise in the capabilities and design of AI is rare; often we are either talking about secondhand experiences of these tools (e.g. “Microsoft Co-Pilot let me xyz”) or about AI as a shorthand for desired computing capabilities, unfounded on familiarity with current research papers or understanding of codebases. (A values-neutral claim: science fiction authors without technical skillsets have helped us imagine, and later create.)

Convergence on the term “data science” has both inspired new kinds of work, and elided contributions of the significantly overlapping field of library and information studies. Similarly, “AI” as the shorthand for the last few years’ significant steps forward in ML (and LLMs in particular) obscures the work of the digital humanities and related critical digital research and design fields such as Science and Technology Studies (STS). When we use the term “AI”, it’s tempting to frame our conversations as around a Wholly New Thing, focusing on longer-term technical aspirations uninhibited by practical considerations of direct audience needs, community impacts, resources. While that’s not necessarily a bad way to fuel technological creativity, it’s too often the only way popular conversations around AI proceed. In one research blog post exploring the moral and emotional dimensions of technological design, L.M. Sacasas lists 41 questions we can ask when designing technologies, from “What sort of person will the use of this technology make of me?” to “Can I be held responsible for the actions which this technology empowers? Would I feel better if I couldn’t?” We don’t need to reinvent digital design ethics for AI—we’ve already got the approaches we need (though those can always be improved).

When we frame “AI” as code, as a set of work discrete but continuous with a long history of programming and its packagings (codebase, repo, library, plugin…), it’s easier to remember we have years of experience designing and analyzing the ethics and societal impacts of code—so much so that I’ve started assuming people who say “LLM” or “ML” rather than “AI” when starting conversations are more likely to be conversant with the specifics of current AI tech at the code level and CS-journal-reading level, as well as its ethical implications. The terms we use for our work and scholarly conversations are strategic: matching the language of current funding opportunities, job ads. We’ve seen similar technologically-vague popularizing on terms with past convergences of tech interest too, including MOOCs, “big data”, and the move from “humanities computing” to the more mainstreamed “digital humanities”.

Digital humanities centers like our Scholars’ Lab offer decades of careful, critical work evaluating existing tools, contributing to open-source libraries, and coding and designing technology in-house—all founded on humanities skills related to history, ethics, narrative, and more strengths necessary to generative critique and design of beneficial tech. Some of the more interesting LLM-fueled DH work I’ve seen in the past couple years has involved an AI first- or second-pass at a task, followed by verification by humans—for situations where the verification step is neither more onerous nor more error-prone than a human-only workflow. For example:

  • the Marshall Project had humans pull out interesting text from policies banning books in state prisons, used AI to generate useful summaries of these, then had humans check those summaries for accuracy
  • Scholars Ryan Cordell and Sarah Bull tested Chat GPT’s utility in classifying genres of historical newspaper and literary text from dirty OCR and without training data, and in OCR cleanup, with promising results
  • My Scholars’ Lab colleague Shane Lin has been exploring AI applications for OCRing text not well-supported by current tools, such as writing in right-to-left scripts
  • Archaeologists restoring the HMS Victory applied an AI-based algorithm to match very high-resolution, high-detailed images stored in different locations to areas of a 3D model of the ship Alongside any exploration of potential good outcomes, we need to also attend to whether potential gains in our understanding of the cultural record, or how we communicate injustice and build juster futures, are worth the intertwined human and climate costs of this or other tech.

One of DH’s strengths has been its focus on shared methods and tools across disciplines, regardless of differences in content and disciplinary priorities, with practitioners regularly attending interdisciplinary conferences (especially unusual within the humanities) and discussing overlapping applications of tools across research fields. DH experts also prioritize non-content-agnostic conversations, prompted by the frequency with which we borrow and build on tools created for non-academic uses. For example, past Scholars’ Lab DH Fellow Ethan Reed found utility in adapting a sentiment analysis tool from outside his field to exploring the emotions in Black Arts Poetry works, but also spent a significant portion of his research writing critiquing the biased results based on the different language of sentiment in the tool’s Rotten Tomatoes training dataset. (ML training sets are an easy locus for black boxing biases, context, and creator and laborer credit—similar to known issues with text digitization work, as explored by Aliza Elkin’s troublingly gorgeous, free Hand Job zine series capturing Google Books scans that accidentally caught the often non-white, female or non-gender-conforming hands of the hidden people doing the digitizing.)

We already know where to focus to produce more beneficial, less harmful, creative digital tools: social justice. At the Reimagining AI roundtable, my table’s consensus was that issues of power and bias are key not just to reducing ML harms, but to imagining and harnessing positive potential. Key areas of concern included climate terrorism (e.g. reducing the energy costs of data centers), racism (e.g. disproportionate negative impacts on BIPoC compounding existing economic, labor, and police violence threats), human rights (e.g. provision of a universal basic income easing concerns about areas ML may beneficially offset human labor), and intertwined ableist and computing access issues (e.g. AI search-result “slop” is terrible for screen readers, low-bandwidth internet browsing). In our existing scholarly fields and advocacy goals, where are current gaps in terms of abilities, resources, scale, efficiencies, audiences, ethics, and impacts? After identifying those major needs, we’re better positioned to explore how LLMs might do good or ill.

Praxis is Invading My Life…In A Good Way

2024年10月4日 12:00

I have always been intrigued by the process of coding. Part of this intrigue came from watching action and adventure television shows or movies that usually featured a tech person or hacker. Whether serving the interests of the protagonist(s), the antagonist(s), or sometimes both either willingly or by force, the tech person would navigate a dark screen full of numbers and letters that I now understand involves coding.

Recently, I have been rewatching the 2016 reboot version of the MacGyver television series which features a character named Riley who is an expert computer hacker who uses her skills to benefit the covert Phoenix Foundation. In one episode, the city of Los Angeles is under a ransomware attack by a hacker that causes a citywide blackout and takes control of a nuclear power plant that threatens the surrounding area. Upon recognizing the name N3mesis encoded within the ransomware code, Riley reveals that she helped write the code with two friends during her illicit hacking days before her work with the Phoenix, meaning one of her former friends was responsible for the attacks.

N3MESIS Image

As the name N3mesis visibly appeared on the screen within the code, I had an epiphany from our conversations during the Code Lab on September 19 about how one can leave notes within the code to alert others of changes. In other similar TV shows and movies that I have watched involving coding, there have been references to messages hidden within the code that the protagonists have used to successfully confront the imminent threat or danger. Perhaps a month ago, these references to coding would have gone over my head, so it was refreshing to make connections to Praxis from popular culture.

Within television shows and movies, the image of a computer hacker features someone typing very fast, sometimes in an isolated location, and manipulating code to accomplish a task, whether for altruistic or selfish reasons, as seen in the image below. (Thanks Brandon for the introduction to HackerTyper.

Image of Random Codes on HackerTyper

Yet, as the exercises in writing algorithms in plain English have shown, there is a lot more involved in coding. With the tutorials to Git, GitHub, and Visual Studio Code, Shane, Jeremy, and Brandon emphasized that coding starts with the basics, learning and understanding commands like add, commit, pull, push, and reset.

With exercises in Code Lab such as the icebreaker assignment using Git, I believe it serves as a metaphor for the collaboration exercise. We add by contributing to a project, we commit to our changes, we push those changes to a larger context that includes the work of others, and then we pull to help merge those changes. If necessary, we may have to reset things and start the process over again.

The activity among our cohort was a bit of trial and error, such as accidentally clearing someone else’s work when seeking to make new changes. However, with assistance from members of the Scholar’s Lab, we eventually made progress in making changes without errors as reflected in GitHub. This icebreaker exercise serves as a metaphor for collaboration in general. Our cohort contributed to one project and sometimes we face challenges like inadvertently clearing someone else’s work. However, we can easily correct mistakes (or in the case of GitHub, see previous versions of our work) and we can take lessons about these mistakes to avoid making them in the future.

Praxis is invading my life…but in a good way. I hope to continue to make connections between my Praxis life, my academic life, and my personal life, such as the connections between coding and popular movie and television culture. I plan to take advantage of the opportunities of these experiences within the Praxis Program.

Making your command line a tiny bit better

2024年7月20日 12:00

There are tons of resources for customizing how your command line looks and works, including:

What follows are some small changes I made to my command line today, that I thought might be useful to others. These work if you, like me, are using a Mac, and Applications > Utilities > Terminal app as your command line tool.

Screenshot of the author's command line app, showing the prompt now contains a rainbow emoji

Colors & tidiness

terminal > settings > profiles

Choose one of the color themes on the left sidebar. Then, to the right, use the “cursor” options to set your preference (bottom of “text” tab; I use a blinking block as most easily visible).

Under the “window” tab, deselect “restore text when reopening windows” so quitting and restarting gives me a clean screen. Use “title” if you want to put something fun at the top of your terminal window (I use “maple commands” to remind me of my pup Maple).

Add emojis

.zshrc

This part only works if your shell is .zsh, which is default on newer Macs. To check, enter echo $0 on the command line; it should print “-zsh” if you’re using .zsh.

Two options for working with this file!

Option A: enter nano /Users/wyatt/.zshrc in the command line, except change out “wyatt” to say your username instead.

Option B: Open a Finder window and hit command-shift-period. This toggles hidden files so they’re visible.

Navigate to [your computer]> Users > [your username] > .zshrc (it will appear in light gray text, possibly sorted to the bottom of the file folder). Open the .zshrc file in a text editor.

Paste the following text into the file, and save. If Terminal is already open, quit and reopen to see the changes.

# Customize prompt
PS1=$'\n''🌈 %~ %# > '

# alias to print the whole jekyll serve-watch command
alias aaa="bundle exec jekyll serve --watch"

The “PS1” line customizes the “prompt” (i.e. stuff at the start of the line where you enter commands).

$'\n' tells it to always leave a blank line on top, which visually helps me when I’m scrolling back through the window to figure out why I’m getting an error message, for example.

'🌈 %~ %# > ' starts the prompt with the rainbow emoji (you can paste in any other emoji you’d like!), followed by the file path you’re currently inside, followed by a > symbol and space (to visually help you see where the prompt ends and your past input commands start).

The “alias aaa” prompt sets things so that instead of needing to type out the command I use most often, I just type “aaa” instead. (bundle exec jekyll serve --watch builds and serves Jekyll sites locally, so that you can add/edit them and preview changes before pushing those to the Web.)

❌