
New research shows everyone prefers human writers, including AI!

Meredith Martin, Professor of English and CDH Faculty Director, and Wouter Haverals, CDH Postdoctoral Research Associate, have published a pre-print revealing a striking pattern: both humans and AI systems show strong bias based on perceived authorship rather than actual content quality. The researchers built a dataset of stylistic rewrites inspired by Raymond Queneau's "Exercises in Style," then asked 556 people and a panel of LLMs a simple question: which of these two texts better captures a specific style—the original or the AI's attempt? Crucially, they manipulated the labels, sometimes telling evaluators the AI-generated text was human-written and vice versa. They found that the same text received significantly different ratings depending solely on its attributed author, with AI models showing 2.5 times stronger bias than humans toward content labeled as human-authored. This suggests AI systems have absorbed cultural assumptions that devalue machine creativity.

This research was supported by the Princeton Language and Intelligence (PLI) Seed Grant Program.

  •  

Welcome back, CDH community!


I hope that you had a chance to unplug a bit. This year's return to campus feels particularly consequential—it’s a moment when destabilizing shifts in higher education are intersecting with the dizzying pace of AI development. I'm reminded of the insights shared by Ted Chiang and Nnedi Okorafor, the speculative fiction writers we invited to Princeton last year as part of CDH’s 10th anniversary. Their perspectives on AI—ranging from critique to curious exploration to grounded optimism—remind us that the future of AI depends not on the technology itself, but on how we choose to shape and use it within our communities, institutions, and lives.

The role of humanistic thought in shaping AI lies at the heart of our ongoing "Humanities for AI" initiative. Building on last year's momentum, we're launching an ambitious new project for 2025-26: Modeling Culture: New Humanities Practices in the Age of AI. This collaborative effort features three interconnected components: a year-long faculty and graduate seminar, a public lecture series, and an open-access curriculum for humanities researchers. Developed in partnership with talented colleagues from Princeton and beyond, this project aims to offer specifically humanistic frameworks for engaging with AI both creatively and critically.

As we look ahead to our eleventh year as a Center, I’m impressed with the diversity of scholarship that the CDH continues to foster. CDH affiliates published two significant books last spring—Grant Wythoff's A User's Guide to the Age of Tech and my own Poetry's Data: Digital Humanities and the History of Prosody. Our Research Software Engineering team collaborated with faculty and supported digital humanities across Princeton: they developed custom software pipelines that advance historical research, published high-quality humanities datasets, adapted open-source tools for Princeton researchers exploring manuscript culture, transformed an existing database project for computational analysis, published peer-reviewed articles, mentored award-winning computer science undergraduate independent work, and taught project management skills to the broader Princeton community. And I am pleased to announce that long-time collaborator Marina Rustow’s incredible Princeton Geniza Lab has become a CDH Affiliate Lab, highlighting our role as both a research catalyst and a research infrastructure.

As Princeton’s computational humanities scholarship continues to evolve, we’re delighted to announce an expansion of CDH’s faculty leadership team: Paul Vierthaler, in East Asian Studies and hired through the Interdisciplinary Data Science initiative, will serve as CDH's inaugural Associate Faculty Director. Paul brings a wealth of experience in text mining and natural language processing for late imperial Chinese literature, and will be teaching exciting undergraduate and graduate CDH courses this year. In addition, Elizabeth Margulis (Professor of Music and Director of Princeton’s Music Cognition Lab) will be our new Director of the Graduate Certificate. Paul and Elizabeth’s expertise and experience will be invaluable as we navigate the challenges and opportunities of the year ahead, and I am encouraged to have their support.

We’re equally proud to see how CDH students are meeting the AI moment. Over the last year, they’ve used AI tools on projects ranging from analyzing Sigmund Freud's handwriting to identifying patterns in nineteenth-century Chinese painting. CDH students won awards for developing Large Vision-Language Models (LVLMs) to analyze historical documents, traveled to Kenya to create technologies for African languages, and explored digital mapping practices to illuminate contested urban histories in Greece.

We are excited to continue to collaborate with you on questions of how to engage computational technologies within the humanities. Ahead of our call for new research software collaborations this spring, we encourage you to reach out with project ideas through our consultation program. In addition, we are excited to announce a new offering this year: the Digital Humanities Accelerator. This program offers personalized project design and strategy support for humanities faculty with digital research projects at any stage and is accepting applications on a rolling basis.

As always, I continue to welcome all faculty and students—regardless of discipline or technical experience—to engage with CDH through our projects, programs, and events.

We look forward to an exciting year ahead.

Warmly,
Meredith Martin
Faculty Director, Center for Digital Humanities

Modeling Culture: New Humanities Practices in the Age of AI

A year-long seminar for faculty and grads with a public lecture series, culminating in a comprehensive and accessible curriculum for advanced humanities researchers.


Humanities for AI

Foregrounding the centrality of the humanities in the development, use, and interpretation of the field broadly known as “AI”

  •  

Meredith Martin publishes Poetry’s Data: Digital Humanities and the History of Prosody

We are excited to announce the publication of CDH Faculty Director Meredith Martin’s new book: Poetry’s Data: Digital Humanities and the History of Prosody (Princeton University Press, 2025).

Martin’s book explores the intersections between two histories: the history of prosody in English and the history of digital humanities. As Martin shows, the link between these two narratives is, appropriately, “poetry’s data,” information that illuminates the changing definitions of poetry, its sounds, linguistic elements, and more.

To tell these stories, Martin explains how the Princeton Prosody Archive (PPA), a web application that would become the CDH’s flagship project—and, in turn, the Center for Digital Humanities itself—emerged from her research on meter.

“Poetry's Data is a love letter to the CDH,” Martin explained. “It narrates how the CDH was born out of the sense that humanities scholars deserved more respect and more infrastructure for their work in the modern information age.”

Martin writes that to understand the history of prosody—what scholars think makes poetry poetry—she needed an archive of materials. In collaborating to build a digital database of these materials, Martin produced, unearthed, analyzed, and argued from data, and in doing so learned about not only the history of prosody but also the way we access that history.

“Poetry’s data is also metadata—how we find information about poems,” Martin writes, “and it involves the transformations of a variety of formats into data so that we can find (or fail to find) poems in a digital environment” (10).

The concept of mediation looms large in Martin’s work. Of course, researchers, like many readers, read poems—and writing about poems—online. Moreover, technological advancements have led to new ways of doing the work of research. Databases like the PPA bring together sources that might be separated by old classifications (changing definitions of literary vs. non-literary works, for example), but they too produce their own kinds of categories—in some cases determined by the corporations who own them.

“Because we live and research in this technologically mediated landscape, our old models of reading and researching—methods that presume an autonomous, single scholar gathering resources and making claims—no longer hold,” Martin argues. “We need to theorize both the embeddedness of our sources inside multiple layers of mediation and how we are situated inside an information ecosystem that demands our active participation” (3–4).

As Martin explains, the new models of reading and research reveal that collaboration, interdisciplinarity, and behind-the-scenes work are critical to the project of research. Accordingly, each chapter of the book includes the title of one or two of the PPA’s “Collections” in brackets. Titles such as “How We Classify [Linguistic]” and “How We Argue [Original Bibliography]” reinforce the connection between Martin’s work as the author of a monograph, a widely accepted form of scholarship, and the collaborative work of building the PPA and the CDH, which supports researchers in creating exciting but often unacknowledged scholarly interventions.

“When humanities scholars do not cite or acknowledge the databases in which they encounter a digitized historical page (citing, instead, its print form in an archive they have never visited),” Martin explained, “they invisibilize the scholarly labor of turning that page into data; similarly, we do not acknowledge the scholarly interventions of creating digital scholarly sources in our outdated structures of tenure and promotion.”

Similarly, Martin notes that her involvement with the Historical Poetics reading group (whose participants do not all identify as digital humanists) and her contributions to building the PPA and CDH run parallel to each other both chronologically and thematically.

“Just as I don’t believe we can navigate the new information environment as literary critics and believe we are in any way alone, so too did I learn that I am able to come to understand the material mediations of nineteenth-century poetry only with a group of devoted colleagues” (16).

These values continue to inform Martin’s projects. In addition to directing the CDH, Martin is at work on two co-authored books: Data Work in the Humanities, with former CDH postdoctoral associate and Weld Fellow Zoe LeBlanc, now assistant professor in the School of Information Sciences at the University of Illinois Urbana-Champaign; and Walled Gardens, with CDH Project Manager Mary Naydan and Lead RSE Rebecca Sutton Koeser. She also serves as co-PI on the PLI-funded “Exercises in Literary Style” project with Associate Research Scholar and Perkins Fellow Wouter Haverals.

“I could not have helped to create the PPA without everyone past and present at the CDH,” Martin explained, “and I could not have argued in Poetry's Data that modern humanities research is fundamentally collaborative, and that we need to re-conceive it as such, without learning this first-hand from and with these same colleagues.”

  •  

CDH Faculty Director discusses the evolution of the humanities’ engagement with data at the DARIAH annual conference

In June, CDH Faculty Director Meredith Martin gave the keynote address at the DARIAH-EU Annual Event in Lisbon, Portugal. The topic of the conference was Workflows: Digital Methods for Reproducible Research Practices in the Arts and Humanities. In her talk, “Worked Up About Data,” Martin looked at the history of humanists’ use of data in their scholarship, and at how its complexity has shaped, and continues to shape, the development of methodologies and workflows for data work in humanities fields.

In the piece below, Martin elaborates on aspects of her talk, connecting some of her key ideas to her forthcoming book projects, Poetry’s Data: Digital Humanities and the History of Prosody (2025) and Data Work (with Zoe LeBlanc, 2026).


Art by María Medem

In your talk, you discussed how humanists’ engagement with data continues to sometimes be uncomfortable or even controversial. Why do you think that is still the case today?

In general, humanists who aren’t already working in DH are averse to the concept of “data.” Miriam Posner has shown that most humanities researchers refer to their sources or evidence as “research materials” whereas in other fields, a variety of research materials are classified as data. We humanists are most comfortable with the broadest possible definition of data—a way to find our books in a library system. We are only slightly less comfortable with arguments like Laura McGrath’s, Mark McGurl’s, or Melanie Walsh’s, that use book data—sales—to think about how books make cultural meaning. At the same time, fields that I might classify as “pre-computational” rely heavily on this kind of “sociological” book data. In fact, William St Clair’s The Reading Nation in the Romantic Period (2004) changed the way that scholars of the nineteenth century made claims about the “influence” of particular books. In many ways this was a blow to some of the looser claims of New Historicism. St Clair used book data, but he didn’t call it data even though he quantified it: the “information” of book prices, print runs, intellectual property law, and readership information came from “publishing and printing archives.” No real uproar there from literary scholars—in fact the book has been incredibly influential in the field. But book data from last year? How could that possibly qualify as literary history or literary analysis?

Part of the resistance is that the word “data” itself is something we know to be evil—it’s stolen from us, we are datafied, there are data breaches, we don’t have data privacy, our data is “harvested” and sold without our knowledge, etc., etc. Datafication has had and continues to have terrible consequences for U.S. citizens, particularly when that data is about race or class or age or gender. Critical data studies shines a light on these issues—thinking carefully about how we are tracked and monetized and at what cost. And our archives—our literary archives that we may have digitized and put online or that Google may have digitized and put online—are also datafied and harvested and then sold back to us in the form of chatbots many don’t want or need.

But those of us who work in computational and data-driven humanities see our understanding of our own materials as data as part of a crucial way that the humanities can alter the direction of data science. Our expertise in choosing how to navigate our archives—with all of that necessary metadata that guides us via catalogs or archival descriptions—can and should extend to noticing why and how our information infrastructure guides what we know and how we know it.

Some of the best literary and historical research draws on the insights of critical archival studies, a field that warns us that we must bring our understanding of archival incompleteness and historical bias to our source materials before we make claims. Now that seems a long way from data, and the relationship between archive and data is particularly tricky in the history of information science, but some of the best work in historical data, like Jessica Marie Johnson’s and Ian Milligan’s, shows how the datafication of both people and of archives shapes how we know what we know. That’s a tricky thing to admit, because it shows that as scholars we are already trapped in a system of knowledge production and dissemination that is not so different from that more monetized version of datafication where we accept cookies without thinking and then shrug as the shoes we looked at once follow us around on every browser window. What if we thought about literary or historical knowledge production in the same way? It would mean acknowledging that information retrieval systems and our modern research libraries’ ability to subscribe to large-scale databases where many of our materials have now been digitized are part of this ickier, datafied landscape, and that we aren’t escaping to some pure non-monetized realm when we conduct our research. I think acknowledging that discomfort, that we’ve accepted the cookies and that the acceptance is incompatible with some dreamed-of feeling of autonomy as scholars, is the underbelly of the resistance to the “data.” Of course this is just one reason! Zoe [LeBlanc] and I explore several angles as to why “data work” is undervalued and even scoffed at in the humanities in our new book.

Meredith Martin delivers a speech in a lecture hall

Do you think the rise of generative AI is changing humanists’ relationship to data work?

I do. I think that because people are interacting with a human-like chat agent, it is somehow allowing them the freedom to think about how the chatbot works and be curious about combinations of tokens in ways that they may never have been creative about, say, search optimization algorithms or how and why JSTOR works the way it does. My hope is that we take the opportunity to introduce data literacy broadly at the same time as we push for AI literacy – AI is data + machine learning, and the former is just as important as the latter.

We can do a lot of work empowering humanists to understand how much they have to offer by way of expertise if we demystify the choices we make when we make humanities materials into machine-readable form. By foregrounding the interpretive decisions behind how we choose to classify our data and by educating humanists about the kinds of data standards that already exist and have been used for interoperability by information scientists for some time, we can show humanities scholars that they are already capable data workers.

How did your talk touch upon the topic of workflows, which was the theme of the DARIAH Annual Event?

I wanted to spend a bit more time with workflows than I did, but in essence I wanted to make a case that we can acknowledge that this work is hard and emotional work. There is emotion in the kinds of interpretive choices we make when we are working in the archives of vulnerable populations or the archives of slavery, for example, where building emotional reckoning and careful consideration is part of the process and becomes part of the workflow. There, the places where there is an eddy or a stopper (a tangle that you can’t quickly agree on or move past) are the most productive in humanities data work; all the scholars we interviewed for our book noticed them as the places where we learn the most about the world.

I also wanted to think more about the understanding of iterations and revision in workflows as something akin to writing or drafting. Sometimes data work means that you have to go back and start over and redo thousands and thousands of cells on a spreadsheet because your question has become more nuanced or your understanding of the material has changed. These are humanistic processes, but they can also teach any scholar who works with data to slow down. Humanists are comfortable with slow and excruciating workflows, nuance, tangles, complications.

We can also benefit greatly from higher-level project management and dividing our work into smaller chunks with more achievable goals. But there’s a tension between one kind of workflow with data (say, where you just “get” or “make” a dataset without taking responsibility for it or putting it into context) and what Zoe and I think about as fundamentally humanities data work, and we think that all data-driven work would benefit from the more responsible, contextual, and even collaborative way that we propose. Or, to put it another way, yes, you can agree on an entity that is good enough in order to start analyzing your data, but as humanists we need to write about that wavering, that interpretive decision, to make clear that our choices in how we constructed our dataset are just as crucial as our analysis.

And that is hard work that can be really tiring and time consuming, but I wanted to draw attention to that and honor it, especially for my DARIAH community that spends so much time helping people through the process. It’s okay to be frustrated and to start over. We don’t have to pretend that workflows protect us from emotion—if anything, I think that they should normalize it.

group photo of DARIAH Annual Event attendees

The DARIAH-EU Annual Event brings together DH scholars from all over the world.

In your presentation, you drew connections between the work of Victorian prosodists and those of data-driven humanists today. How has your work as a scholar of English prosody shaped your scholarship in digital humanities?

Okay, for that answer I think you need to just read Poetry’s Data: Digital Humanities and the History of Prosody, which will be published in May 2025! But briefly: it’s related to the answer above. We are making choices about what we look at. Gerard Manley Hopkins, a Victorian poet, wrote in his journal in the early 1870s: “what you look hard at seems to look hard at you.” He’s trying to describe what he sees as the binding energy (what he calls “instress”) between our attention and patterns in nature (or man-made patterns) that can remind us of our connection to the divine (what he calls “inscape”). We live in an era completely guided by pattern-making mechanisms and machines. I’m not arguing for the aesthetic value of a spreadsheet over that of a sonnet (though I put the two in parallel in my fall course), but I am interested in how Hopkins and other writers who wanted to measure both poetry and language find themselves at the same intersection as data-driven humanists: a choice between understanding meaning as existing in the pattern itself, versus understanding meaning as residing in the relationship between the pattern and the person looking at it. How, and why, do we decide to interpret a pattern one way versus another way? These stakes, for me, are a way to get at the tension between the disciplines of literary studies and data science, which, when I look at it this way, doesn’t feel like much of a tension at all.

Your slides integrate color and art in a provocative and striking way. Can you tell us more about the artist, and why you chose these images to accompany your talk?

María Medem is everywhere, and once you know what her illustrations look like you’ll start seeing them all the time! She’s an artist from Seville I’ve followed for some time. I might have come across her work first in the New York Times or Wired? But she tends to illustrate articles about our relationships with technology and with nature. I find her colors make me slow down and dwell, which was the feeling I was trying to get across in my talk.

Quotation with an illustration of a person in a sunset to the right

Art by María Medem

Where did you see connections or resonance between your talk and the other presentations at the DARIAH Annual Event? What were some of your takeaways from the conference?

I felt very at home at the conference and admired all the papers I heard and all the posters I saw. To go back to the topic of emotion, one thing in my title and in the tenor of my talk that maybe resonated with folks was the issue of how to value data work, which, just to clarify, requires agreed-upon workflows. Some papers presented workflows and their challenges, others talked more about data work and interpretive labor, but some of the best conversations showed how people were worked up that this interpretive knowledge was not something that was being seen, valued, credited, recognized, or counted for promotion in either academic, library, or Research Software Engineering communities.

How has being a DARIAH cooperating partner influenced your research or the work of the CDH?

We admire DARIAH’s commitment to advancing knowledge across boundaries—linguistic, cultural, national, and disciplinary. I think being in their orbit has helped us be strategic about our research areas and their potential impact. From New Languages for NLP to Humanities for AI (which includes our African Languages Technologies project) to our Humanities RSE program, we understand that our work, when we do it right, will have lasting impact globally. Though we don’t have the European umbrella funding or research organizations to generate cross-institutional collaborations, we bring that ethos of collaboration to everything we do.

  •  

Happy birthday, CDH! 🥳

Dear CDH Community,

We are thrilled to mark a major milestone this year: the CDH is 10! 🎂 It has been a remarkable decade, and we are incredibly proud of our team’s research accomplishments and the vibrant community we have all built together.

Since opening our doors in 2014, the CDH has launched countless projects and programs that help Princeton scholars think critically, creatively, and deeply about how data, algorithms, and information systems shape our understanding of the human experience. Each year, we collaborate with dozens of faculty, librarians, archivists, graduate students, and undergraduates on research and teaching that pushes the boundaries of our fields and professions. For the past decade, we’ve grown a truly interdisciplinary community and inspired new ways of thinking and working together.

Looking ahead, we are excited about what the future holds. At a time when the explosion of generative AI is transforming many areas of our lives, the thoughtful, deliberate, and critical approach to technology that defines CDH work is more essential than ever. As part of our 10th-anniversary celebrations, we’ve chosen “Humanities for AI” as our theme for the year. Through a series of events, conversations, and projects, we’ll explore how humanistic approaches are crucial for developing more fair and inclusive technologies.

This year’s programming will explore a range of topics, from computer vision to art and AI to poetry as data. Our fall speaker series kicks off with a focus on the intersection of global tech with local and diverse cultures, specifically highlighting the rich linguistic traditions of Africa. The African Languages in the Age of AI series will bring leading experts to Princeton to discuss how scholars and communities are creating data, models, and software for African languages. The series begins on Thursday, September 19 at 4:30 p.m. with a talk by Vukosi Marivate (University of Pretoria), titled “A New Agenda for African Languages x AI: Everything, Everywhere, All At Once” (RSVP). Other scholars and thinkers we will be hearing from this year include Lauren Tilton, Matthew Kirschenbaum, Nnedi Okorafor, and Ted Chiang.

Finally, we invite you to join us on Monday, September 23 at 4:30 p.m. at our space on the B Floor of Firestone Library for our annual Open House, which will also be our birthday party. Come meet our amazing team, learn about our projects, programs, and research groups, and find out how you can get involved with our offerings for graduate and undergraduate students.

Follow us on social media at @PrincetonCDH, check out our newly redesigned CDH website, and sign up for our newsletter to stay updated on all of our activities and events.

Warmly,

Meredith Martin
Faculty Director, Center for Digital Humanities
Director, Graduate Certificate in Digital Humanities
Princeton University

Natalia Ermolaev
Executive Director, Center for Digital Humanities
Princeton University

  •