
When the Algorithm Disagrees With Your Eyes

April 27, 2026, 12:00

Digital images are in constant motion. They traverse various platforms, feeds, databases, and archives, often reappearing in modified forms. Through my research on digital art, I have recognized this phenomenon as more than a mere feature of online dissemination. It constitutes both a methodological challenge and a perceptual issue.

What appears to be a single image may, in actuality, exist as a collection of various versions: cropped, compressed, recoloured, or reposted without proper attribution. Although these differences may seem insignificant at first glance, they give rise to a question that is more complex to answer than it initially appears.

      Under what circumstances can two images be considered identical?

That question became the basis of my assignment for the CodeLab course in my ongoing Praxis Fellowship Program. Using Python with the ImageHash and Pillow libraries in VS Code, I built a small tool to test how visual similarity might be measured across images that have changed through circulation. What started as an exercise became a way of thinking through something larger: what does it mean for a computer to recognize an image, and does that match what we mean when we say two images are the same?

The approach

The tool uses the imagehash library to compute perceptual hashes and compare images by visual similarity.1 Unlike cryptographic hashing, which changes entirely if even a single byte changes, perceptual hashing captures how an image looks. Two visually similar images should produce similar hashes; unrelated images should not.

After generating the comparison data, I modified the script to export results as JSON and render them as an HTML page. Instead of raw values, the interface ranked each image against the reference, displayed a distance score, and grouped results into categories from “nearly identical” to “different from the original.” The script processed files in the images/ folder, saved results to version_results.json, and generated output in results.html.

Image variant comparison

Figure 1. HTML interface showing ranked comparison of image variants against the reference image. See https://jimgaconcept.github.io/image-versioning-demo/

The dataset

The reference image is a digitized hand-drawn cartoon illustration made with pen and ink and watercolor on paper. This detail turned out to matter a great deal. I compared it to two modified copies (resized and compressed), one digitally recreated version, and three visually unrelated images, to test whether the tool could distinguish genuine variants from unrelated works.2

Results

The two modified versions, resized and compressed, both scored between 0 and 2, confirming their close relationship to the reference. The three unrelated images all scored above 20, well outside any similarity range. The digitally recreated version (Fig. 1) scored 18, placing it in the category that the interface labeled as different from the original.

That score of 18 was the result I did not expect, and the one worth thinking about most carefully.

What the computer sees, and what we see

The recreated image and the original share the same subject, composition, and color palette. A human viewer encountering both would almost certainly recognize them as versions of the same thing. The algorithm did not. With a score of 18, the recreation landed closer to the unrelated images than to the two modified copies, which scored between 0 and 2.

The reason lies in what each image actually is at the data level. The original is a scan of a physical drawing, and its pixel data carries the texture of its medium: the grain of the paper, the way ink spreads at the edges of marks, the tonal variation of pigment on a physical surface. The digital recreation was built entirely within Photoshop and saved as a JPEG. Even a faithful digital reconstruction is made from digital brushes and algorithmically generated marks. There is no paper grain, no ink bleed. The two images look the same to us, but their underlying data structures are built from entirely different material.

This is a version of what computer vision researchers call the cross-depiction problem: the gap between human visual recognition, which operates on meaning and composition, and machine recognition, which operates on statistical patterns in pixel data. My experiment gave that abstract problem a specific, personal form. What appears identical to the human eye may share almost nothing in common at the data level. The computer is not seeing the image. It is reading a numerical structure, and two images that represent the same thing visually can be built from entirely different data, depending on how and where they were made.

This relates to a broader discourse within the field of digital humanities. As Drucker (2013) has articulated, digitization constitutes not merely a neutral representation but rather a form of interpretation. Factors such as resolution, lighting conditions, and the medium of capture all influence the transformation of an image into data.3 My findings exemplify this argument concretely. The scanned watercolor and the Photoshop recreation are not simply two variants of the same image; rather, they represent two distinct interpretations, which the algorithm processes accordingly.

If we are building archival systems or image databases that rely on computational similarity to group and relate works, we need to ask whose sense of “the same image” is being encoded. A tool trained on pixel-level data will consistently separate a scanned physical artwork from its digital recreation, not because they are different images in any humanistic sense, but because they are different kinds of data.

Limitations and what comes next

Perceptual hashing assesses visual similarity at the data level. It does not establish authorship, confirm provenance, or account for contextual factors. Results may also differ depending on the hashing algorithm employed, since different implementations weight visual features differently. This tool serves as one component within a broader interpretive framework, not as a substitute for human judgment.

This assignment yielded an insight that is both simple and profound: the computer and the human eye do not see the same things, even when looking at the same image. That gap between data and meaning is where the most compelling questions in digital art history reside. As Burdick et al. (2012) suggest, the significance of computational tools in the humanities lies not in their capacity to resolve questions, but in their ability to render certain questions newly answerable.4 This experience surfaced a question I did not know I had.

The live output and ranked visualization are at the project web interface. Full code is on GitHub.


  1. The imagehash library was developed by Johannes Buchner: https://github.com/JohannesBuchner/imagehash. Distance between hashes is computed using Hamming distance. See Hamming, R.W. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29(2), 147–160. doi:10.1002/j.1538-7305.1950.tb00463.x 

  2. The distance thresholds used (0 for near-identical, 1–5 for minor modification, 6–10 for significant transformation, above 10 for visually distinct) are derived from standard imagehash benchmarks and calibrated through iterative testing against the dataset. 

  3. Drucker, J. (2013). Is there a “digital” art history? Visual Resources, 29(1–2), 5–13. doi:10.1080/01973762.2013.761106. The argument that digitisation is interpretive rather than neutral runs throughout the article and is developed across pp. 5–8. 

  4. Burdick, A., Drucker, J., Lunenfeld, P., Presner, T., and Schnapp, J. (2012). Digital_Humanities. MIT Press. The claim is consistent with the book’s central thesis; p. 14 is the closest anchor. 


Seeing, Describing, and Imagining: Human and Machine Vision in the Humanities

January 3, 2026, 13:00

Framing the Workshop: Vision, Interpretation, and Context

In recent years, digital tools have quietly transformed how we experience and interpret images in museums, classrooms, and research settings. As an art historian working at the intersection of art history, digital media, and visual culture, I am drawn to examining how people translate visual experience into words, and how that process compares with machine analysis of the same images. I am especially interested in creating spaces that invite us to pause, pay closer attention, and make the act of interpretation visible, rather than treating images or technologies as self-evident.

Seeing, Describing, and Imagining originated from a simple, low-stakes classroom exercise I first encountered while serving as a teaching assistant in a course on formal and visual analysis taught by my advisor. Watching students work through the challenge of turning what they were seeing into words made it clear how tentative and negotiable description can be. That experience stayed with me and prompted me to rethink the exercise beyond the classroom, adapting it into a workshop format.

The workshop aims to create a shared, practice-based method for visual analysis that can be applied in various settings, from visual analysis courses to digital humanities labs, while staying rooted in art-historical approaches to looking.

From Looking to Language: Description and Interpretation

The workshop is conceived as a hands-on, collaborative way of exploring how images move between seeing, describing, and imagining. It is designed to begin with a simple exercise. Participants would look closely at an artwork and translate what they see into words. Working in pairs, one person would study the artwork and describe it in detail, while the other would create a quick line sketch using only that description, without ever seeing the image itself.

This phase aims to slow the process in a constructive way. Participants are encouraged to reflect on the act of describing itself: What do you choose to mention first, and why? Which parts of the artwork are hardest to put into words? These questions are designed to show that description is never neutral. Emphasis, order, and omission all influence how an image is understood.

When sketches and original artworks are placed side by side, the workshop is designed to shift from creating to comparing. Instead of viewing differences as mistakes, participants are encouraged to explore what moments of similarity and difference may reveal about the connection between image and text. The aim is not to fix these gaps but to use them as a way to think about how seeing, knowing, and describing are linked in art history practice.

Human–Machine Translation: AI, Images, and Visual Convention

Starting from this analog foundation, the workshop is structured to move into a digital phase by introducing AI text-to-image systems. Participants would revisit and refine their descriptions before entering them into an AI model such as DALL·E or Adobe Firefly. The resulting AI-generated image would then be placed alongside the original artwork and the participant-created sketch as a third object for comparison.

Rather than evaluating which image is better or more accurate, this stage emphasizes observation. Participants are encouraged to ask what kinds of visual patterns might emerge across AI-generated images. Which elements seem emphasized, simplified, or made more uniform across different outputs? Looking across multiple results is meant to create space for noticing patterns without assuming in advance what those patterns will be.

Existing scholarship by authors such as Kate Crawford, Safiya Umoja Noble, Ruha Benjamin, and Johanna Drucker suggests that AI systems are shaped by the datasets they are trained on, the ways information is classified, and the cultural assumptions embedded in those systems. Drawing on these works, the workshop is designed to create conditions where such influences could become visible through hands-on engagement rather than explanation. As participants compare images, the process opens up the possibility of exploring whether familiar visual conventions emerge, particularly when prompts involve artworks or visual traditions that are not widely represented in large image datasets. What becomes noticeable is deliberately left open and expected to take shape through comparison rather than as a predetermined outcome.

The workshop also introduces a reverse process, moving from image to text. Participants would upload an artwork into an AI vision tool and examine how the system translates the image into language. Reading these AI-generated descriptions alongside participants’ own interpretive accounts is intended to prompt reflection on differences in tone, emphasis, and confidence, and to raise questions about how uncertainty functions in human versus machine descriptions.

Staying with the Process: Open-Ended Inquiry and Reflection

Taken together, Seeing, Describing, and Imagining is framed as an open-ended inquiry rather than a demonstration. Prompt writing and refinement are approached not as purely technical tasks but as interpretive acts, similar to the analytical frameworks art historians use when working with images. While elements of the workshop align with existing practices in art history education, digital humanities, and critical AI studies, Seeing, Describing, and Imagining brings these approaches together in a distinctive sequence that foregrounds interpretation as an active, negotiated process involving both human and machine systems of vision.

The workshop is designed to foster attentiveness, curiosity, and careful comparison. It encourages participants to stay with the process and to observe what may emerge as images move between eyes, words, algorithms, and back again. In this way, both human and machine vision are presented not as stable endpoints, but as ongoing, context-dependent practices shaped by history, culture, and interpretation.

Works Cited

  • Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity Press, 2019.
  • Crawford, Kate. Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. New Haven, CT: Yale University Press, 2021.
  • Drucker, Johanna. Graphesis: Visual Forms of Knowledge Production. Cambridge, MA: Harvard University Press, 2014.
  • Noble, Safiya Umoja. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: New York University Press, 2018.

Digital Artefacts Series: Concept & Format

November 19, 2025, 13:00

From Shrine to Screen: Reimagining Ìbejì Through Analog and Digital Lenses

Analog Photograph of Twins

Fig. 1. Taiwo holding a multiple-printed photograph representing herself and her deceased twin sister.1

In a black-and-white photograph from the 1970s (fig. 1), a young girl stands before a cracked mud wall, clutching a framed image that seems to speak beyond words. The light falls softly across her face, illuminating both her stillness and the quiet intensity of what she holds. The girl, Taiwo, holds a photograph depicting herself beside her deceased twin sister. Her sister died before such a portrait could be taken when they were still babies, but the photographer transformed imagination into reality by printing Taiwo’s image twice, side by side, within a single frame. What the camera captures and the darkroom reproduces is not simply likeness but longing made visible—a visual invocation of return. Within the frame, two toddlers sit side by side, identical in posture and dress, summoned into being through the darkroom’s alchemy of double exposure.

Here, the photograph becomes more than representation: it is a vessel of memory, a surrogate body standing in for the lost twin. The image performs the ritual labor once carried by the carved ìbejì her mother would have cherished, transforming silver salts and pigment into a spiritual medium. In this quiet act of holding, the boundaries between presence and absence blur. One twin is gone, yet through this image—tinted by grief, devotion, and the faint shimmer of hand-applied color—her spirit endures, luminous within the photograph’s fragile surface.

In Yorùbá cosmology, where twins (ìbejì)2 are regarded as sacred, such portraits exceed mere commemoration. They function as ritual technologies: visual acts of spiritual equilibrium, mourning, and metaphysical repair. In that moment, the darkroom becomes a shrine; the photograph, a surrogate body. Before photography, the Yorùbá carved wooden figures (ère ìbejì) to represent departed twins, embodying presence through stylized form (fig. 2). These sculptural surrogates served as tactile conduits between the living and the spirit world, each polished and adorned as if alive.

Ere Ibeji - wood sculpture

Fig. 2. Ere Ibeji with Beaded Gown (Yoruba twin figure), Wood, fabric, glass beads, string, metal, pigment, H: 36.0 cm, W: 9.5 cm, D: 9.0 cm. Fowler Museum at UCLA. https://jstor.org/stable/community.12004960.

Fast forward to 2018: a new image materializes—sharp, hyperreal, and unmistakably digital. Created by Bénédicte Kurzen and Sanne De Wilde3, their collaborative series on Yorùbá twins revisits this visual and spiritual terrain through the lens of contemporary technology. In this photograph (fig. 3), a young girl meets the viewer’s gaze, her likeness mirrored and doubled through software. The symmetry is deliberate—an homage to twinship, rendered not in the darkroom but on the digital screen. Photoshop replaces the enlarger; code performs the ritual labor.

Twin Image - digital manipulation

Fig. 3. Twins at Igbo-Ora, Nigeria, Digital image by Bénédicte Kurzen and Sanne De Wilde as part of the series Land of Ìbejì. Published in The Guardian, 12 May 2019.

From hand-carved ère ìbejì to analog portraiture to digital manipulation, the act of duplication no longer merely restores presence—it extends it, transforming remembrance into possibility and ritual into a new form of technological devotion.

Through such digital reanimations, the dialogue between ritual and reproduction extends beyond the material to the virtual. Artists continue to navigate this liminal space, where ancestral cosmologies encounter algorithmic systems and the act of remembrance becomes a gesture of creative resistance. These works trace a continuum of visual thought that resists erasure by adapting across media. Memory, in this context, is not an archive of the past but a living process—reconfigured, remixed, and projected into the digital future.

Notes
  1. Sprague, Stephen. “Yoruba Photography: How the Yoruba See Themselves,” African Arts 12, no. 1 (1978): 253.
  2. For further discussion of twin images in Yoruba traditions, see George Chemeche, John Pemberton, and John Picton, Ìbejì: The Cult of Yoruba Twins. Hic Sunt Leones II. Milan: 5 Continents Editions, 2003.
  3. Bénédicte Kurzen (b. 1980) and Sanne De Wilde (b. 1987) are award-winning photographers whose collaborative projects, including Land of Ìbejì, merge documentary and conceptual practices to explore cross-cultural mythologies, identity, and perception through experimental and visually poetic storytelling.