Received today (12 April 2026) - DSH (Digital Scholarship in the Humanities)

Movers and haters: dynamic network visualization and analysis of an historical French controversy

13 January 2026, 08:00
Abstract
This article presents the results of a novel, mixed-methods project that studied an eighteenth-century French debate about education reform using approaches from Social Network Analysis and contemporary Controversy Mapping, alongside traditional humanities methodologies. Toggling between close reading of over 200 texts, and distant reading of the whole debate as a network, revealed that this little-known polemic—which we call the Querelle des collèges—was one of many early modern quarrels, and that its central actors were not straightforwardly those one might have expected. SNA uncovered the paradoxical role in the debate of the celebrity philosophe, Jean-Jacques Rousseau, as well as the previously overlooked participation of politicians, little-known teachers, and chronologically unexpected points of reference in this late-Enlightenment debate. Moreover, we underscore the value of analysing an historical controversy not just as a static network, considered at its end state, but as a dynamic entity that develops and changes over time. We show how a network can be dynamically visualized, to serve both exploratory and communicative ends. Ultimately, this article proposes a ‘bifocal reading’ method for historical controversy mapping, and identifies directions for future comparative research that might help us understand the dynamics of controversies from the past to the present.

From romance to reality: lexical and topic evolution in Chinese popular lyrics through digital humanities approaches

13 January 2026, 08:00
Abstract
This study explores the lexical and topic evolution of Chinese popular music lyrics from 2000 to 2025, reflecting changing public sentiments and broader socio-cultural transitions. While prior research has largely overlooked non-Western lyric corpora, this study addresses that gap by constructing the Chinese Popular Music Diachronic Corpus, comprising 1,560 representative popular songs sampled across twenty-five years. Using digital humanities approaches, it integrates word frequency analysis, readability metrics, and BERTopic-based topic modelling to trace lexical and topic evolution of lyrics over time. Findings reveal that high-frequency words consistently revolve around emotional expression, individual introspection, and interpersonal dynamics. Type-token ratio (TTR) has increased significantly, indicating growing lexical diversity, while textual complexity shows fluctuation, reflecting stylistic shifts in song writing. Topic analysis identifies twelve major topics, including romantic love, future aspirations, and urban life, with topic structures evolving from natural and temporal abstraction to emotional concreteness and psychological introspection. Recent years show a significant rise in negative emotional topics and self-referential artistic motifs. This study contributes a novel methodological framework for interdisciplinary research at the intersection of music, language, and society, and underscores the value of digital humanities tools in mapping collective emotions and cultural change through large-scale lyric analysis.
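The type-token ratio mentioned above can be illustrated with a minimal sketch. It assumes the lyrics have already been segmented into word tokens (the abstract does not say which segmenter was used), and the two token lists are invented toy data, not drawn from the corpus.

```python
def type_token_ratio(tokens):
    """Return the number of distinct tokens divided by the total token count."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical, pre-segmented lyric fragments (illustrative only).
early_lyric = ["爱", "你", "爱", "我", "爱", "你"]
late_lyric = ["城市", "夜晚", "孤独", "的", "我", "我"]

for label, tokens in [("early", early_lyric), ("late", late_lyric)]:
    print(label, round(type_token_ratio(tokens), 3))
```

Raw TTR falls as texts get longer, so comparisons across years are only meaningful for samples of similar length or with a length-normalized variant; which variant the study used is not stated in the abstract.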

Investigating the structural formation system of modern simplified Chinese characters from a complex network perspective

13 January 2026, 08:00
Abstract
Chinese characters hold pivotal significance in the development of Eastern civilization. While their structural formation exhibits a systematic nature, system science approaches are seldom seen in analyzing the formation system. Therefore, based on complex network methods, this study aims to examine macro-scale network features and the essential system properties of modern simplified Chinese characters. Specifically, three types of networks were constructed: co-occurrence, directed, and weighted networks, along with their random counterparts. In these networks, primitive components served as nodes, and their relationships as edges. Subsequently, seventeen general network metrics were measured to analyze the network features, enabling deeper discussion of the system properties. Research results show that component networks exhibit five distinct network features compared to their random counterparts, including small-world feature, scale-free feature, disassortative mixing, high centrality, and hierarchical organization. These features demonstrate that the formation system displays three properties, namely complexity, robustness, and economy, which emerge respectively from organized, close-knit, and efficient component combinations. These findings serve as a significant supplement to the empirical research on the structural formation system of Chinese characters.

Topic counts and quality in topic models for historic corpora

13 January 2026, 08:00
Abstract
Topic modelling methods enable the identification of potential topics within a corpus of historical texts; in particular, they enable the identification of latent topics that are not described just by a single word. Like so many computational methods for the automatic processing of historical text corpora, they come with a number of parameters with which the method can be tuned and adapted. Each change in the settings of any of these parameters will generate a new set of topics that will differ in larger or smaller ways and which may be qualitatively better or worse. One of the main parameters for tuning topic models is setting the number of topics to be generated. In this article, we present an analysis of the impact of the number of topics on the quality of topic models for two historical text corpora. Two manual evaluation approaches are combined with an automated evaluation metric, and based on the results, we propose a formalized process for choosing the final set of parameters for a topic model. The process ensures the quality of the final model, while minimizing the amount of manual evaluation work. The more structured process also allows for better documentation of the parameter choices and in that way enables better replicability of any research using topic models.

Decoding AI discourse: contrastive analysis of media representations in German and Chinese contexts (2018–23) using machine learning techniques

11 January 2026, 08:00
Abstract
Previous studies have primarily focused on artificial intelligence (AI) discourse within specific language media, with limited contrastive analyses across different cultural contexts. This study analyzes the representation of AI in German and Chinese media discourses from 2018 to 2023, employing a modified version of Ruth Wodak’s discourse analysis framework alongside advanced machine learning methods. Our findings indicate that both German and Chinese media concentrate on AI issues pertinent to their regions. Chinese media adopt a perspective strategy by frequently quoting political figures, particularly President Xi Jinping, and consistently maintain a positive stance on AI. Conversely, German media, especially after the launch of ChatGPT, highlight high-tech figures and adopt a more critical and cautious approach toward AI. These differences in media discourses arise from distinct media cultural systems shaped by their respective contexts. In China, media outlets are party-affiliated and promote a narrative framing AI as a national strategic endeavor crucial for economic growth, reflecting governmental viewpoints. In contrast, media from Germany, Austria, and Switzerland present diverse perspectives on AI, expressing significant concerns about its potential risks. This study offers valuable insights for interpreting and formulating AI policies across different nations.

Digital technologies for Early Modern Portuguese manuscripts: an experiment with two ‘Handwritten Text Recognition’ software applications

9 January 2026, 08:00
Abstract
This article presents the results of an experiment involving the automatic transcription of sixteenth-century Portuguese Inquisition manuscripts using two Handwritten Text Recognition (HTR) applications. The experiment focused on modelling the handwriting of a single scribe and comparing the machine-generated transcriptions with human-made diplomatic transcriptions, from a palaeographic perspective. The tools used were Transkribus (developed by ReadCoop) and Lapelinc Transcriptor (developed by the Corpus Linguistics Laboratory at the State University of Southwest Bahia, Brazil), both based on machine learning techniques. The results reveal that palaeographic expertise plays a crucial role not only in interpreting the output but also in improving HTR model performance. By addressing both practical and scholarly challenges, this study contributes to ongoing discussions in the field of so-called digital palaeography, particularly regarding Early Modern Portuguese manuscripts.

Measuring sounds from the East: digital approaches to commonality and specificity in Chinese Mandarin pop lyrics

8 January 2026, 08:00
Abstract
Since China’s reform and opening-up in 1978, its popular music industry has experienced rapid development and emerged as a significant component of global contemporary pop music. This study conducts a systematic analysis of contemporary Chinese Mandarin pop lyrics (1978–2019) with digital methodologies. A stylometric measurement, the Busemann coefficient, was employed to demonstrate the lyrics’ high textual activity from both static and dynamic perspectives. This finding supports the lyrics’ stylistic essence as narratives. Besides, sentiment analysis of lyrics reveals an overall positive tone, though a significant watershed occurred around the year 2000, marking a divergence between the two periods. Furthermore, analyses with topic modeling demonstrate diversified topic distribution patterns across these periods, reflecting both the universality of popular culture and the distinctive Chinese characteristics shaped by cultural and temporal contexts.

Exploring the cross-cultural influence of Song Jiangnan Chan Buddhist poetry on Goryeo Han poetry through text visualization

7 January 2026, 08:00
Abstract
This study examines the influence of Jiangnan Chan poetry on Goryeo Han poetry through a comparative analysis of three representative poets from four dimensions: vocabulary imagery, spatiotemporal deconstruction, thematic evolution, and emotional expression. It proposes a text visualization-based methodology employing Term Frequency–Inverse Document Frequency (TF-IDF)-based keyword extraction, Latent Dirichlet Allocation (LDA)-based topic modeling, and sentiment analysis, supplemented by neologism mining and Kimi Intelligent Assistant (KIMI)-based word segmentation using AI large models. The results reveal four key findings: (1) both poetries share similar vocabulary and imagery centered on nature and Chan Buddhism; (2) both reflect an interweaving of eternal perspective and historical consciousness, alongside a dialectic between cosmic vastness and human reality; (3) in terms of thematic evolution, the original focus on “Nature and Chan” in Jiangnan poetry was transformed in Goryeo Han poetry into a trinitarian framework of “national protection, Dharma propagation, and social salvation”; and (4) Jiangnan poetry emphasizes transcendence and emptiness, while Goryeo Han poetry incorporates depictions of secular life and historical memory. The findings also suggest that the proposed methodology is effective in improving the accuracy of Chinese word segmentation for ancient poetry texts. This study underscores the value of quantitative comparative analysis in visualizing the intercultural diffusion and transformation of Chan Buddhism poetry.
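As a rough illustration of TF-IDF keyword extraction, the sketch below scores each term by its relative frequency in one document times the log of the inverse document frequency. This is one common weighting among several, and the abstract does not specify which variant the authors used. The two tiny "corpora" and their tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Map each document name to a {term: tf-idf score} dictionary.

    docs: dict of document name -> list of tokens.
    tf is the relative term frequency; idf is log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical token lists standing in for two poetry collections.
toy_docs = {
    "jiangnan": ["mountain", "moon", "chan", "moon"],
    "goryeo": ["mountain", "nation", "dharma", "nation"],
}

toy_scores = tf_idf(toy_docs)
for name, term_scores in toy_scores.items():
    top_term = max(term_scores, key=term_scores.get)
    print(name, top_term)
```

A term shared by every document (here "mountain") scores zero, which is why TF-IDF highlights vocabulary that separates the collections rather than imagery they have in common.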

A GIS view of the word orders of numeral bases and numeral classifiers in Kuki-Chin languages

1 January 2026, 08:00
Abstract
Tibeto-Burman (TB) languages are known for their diversity in numeral systems and classifiers. This paper investigates the Kuki-Chin (KC, also called South-Central Tibeto-Burman) languages in TB’s Northeast Indian Areal Group and provides a comprehensive description of the four types of numeral systems: base-final, base-initial, base-split, and no-base, and the four types of classifier systems: CL-final, CL-initial, CL-split, and no-CL. A thorough survey of the literature, aided by fieldwork, enables a GIS view of the distribution of the different types of languages. Also, KC languages conform to Greenberg’s Universal 20A: N does not come between Num and CL, and the numeral base and classifier harmonize in word order. We propose the following hypothesis for Proto-KC (PKC) and Proto-TB (PTB) to account for the variation in numeral bases and classifiers in KC, that is, PKC, like PTB, is base-initial and without numeral classifiers, and the current variation in numeral bases and numeral classifiers in KC is due to horizontal external influence via language contact. Bayesian phylogenetic inference tests, however, show mixed results, as PKC is likely to be base-initial, as expected, but CL-initial, and PTB is likely to be base-final but CL-initial. Thus, we plan to conduct a comprehensive survey of all TB languages and further explore the state of PTB in terms of numeral bases and classifiers.

Reading the unreadable: creating a dataset of 19th century English newspapers using image-to-text language models

30 December 2025, 08:00
Abstract
Oscar Wilde said, ‘The difference between literature and journalism is that journalism is unreadable, and literature is not read’. Unfortunately, the digitally archived journalism of Oscar Wilde’s 19th century often has no or poor quality Optical Character Recognition (OCR), reducing the accessibility of these archives and making them unreadable both figuratively and literally. This paper helps address the issue by performing OCR on ‘The Nineteenth Century Serials Edition’ (NCSE), an 84k-page collection of 19th-century English newspapers and periodicals, using Pixtral 12B, a pre-trained image-to-text language model. The OCR capability of Pixtral was compared to four other OCR approaches, achieving a median character error rate of 1%, 5x lower than the next best model. The resulting NCSE v2.0 dataset features improved article identification, high-quality OCR, and text classified into four types and seventeen topics. The dataset contains 1.4 million entries, and 321 million words. Example use cases demonstrate analysis of topic similarity, readability, and event tracking. NCSE v2.0 is freely available to encourage historical and sociological research. As a result, 21st-century readers can now share Oscar Wilde’s disappointment with 19th-century journalistic standards, reading the unreadable from the comfort of their own computers.
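For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below follows that convention. Whether the paper normalizes or preprocesses text in the same way is an assumption, and the sample strings are invented.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: one misread character in a ten-character word.
print(character_error_rate("journalism", "journa1ism"))
```

Because the denominator is the reference length, a very noisy hypothesis can yield a rate above 1.0.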

Pre-digitally decoded Alice’s adventures: an early record of readability computations

30 December 2025, 08:00
Abstract
This article offers a new reading of Lewis Carroll’s Alice’s Adventures in Wonderland and Through the Looking-Glass, proposing that Alice’s journey is an illustration of the process of learning to read. Using metrics, including mean length of utterance, lexical profiling, and the Gunning Fog Index, this study evaluates the linguistic progression of Alice’s utterances and their readability to trace parallels between her experiences and the stages of reading development: pictorial, phonological, and orthographic. Alice in Wonderland symbolizes a “pre-reading” phase, while Through the Looking-Glass represents her progression into reading fluency and meta-linguistic play, showcasing how Carroll, who reportedly struggled with reading himself, strategically designed the complexity of the text to align with the cognitive development of the reader. By combining literary analysis with computational tools, the study reveals the paradox of reading in Carroll’s work: a simultaneous breaking and following of linguistic norms. It may also enhance understanding of Carroll’s literary innovation in how his narratives mirrored the cognitive development of the reader and situate his work as a precursor of modern readability frameworks.
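The Gunning Fog Index named above is 0.4 × (average sentence length in words + percentage of "complex" words of three or more syllables). The sketch below is a simplified version: the syllable counter is a crude vowel-group heuristic, and it skips the usual exclusions for proper nouns, compounds, and common suffixes, so its values will differ from those of a full implementation and from whatever tool the study used.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (a rough heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(text):
    """Simplified Gunning Fog Index for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    average_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (average_sentence_length + percent_complex)


print(round(gunning_fog("The cat sat. The dog ran."), 2))
print(round(gunning_fog("Curiouser and curiouser."), 2))
```

Very short utterances, such as single lines of dialogue, produce unstable scores, so per-utterance values are best read in aggregate.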

Making trans-regional, cross-platform streaming catalogues a reality: Arguing for linked SVOD open data

29 December 2025, 08:00
Abstract
Despite their steady growth as a mainstream way of consuming media, subscription-based video on-demand (SVOD) platforms have remained under-studied in a specific yet core aspect of their service: their contents. Potential quantitative studies have been prevented by the unstable nature of their catalogues, both in time and space, and the lack of will to publish publicly accessible catalogues of these platforms’ works. This hinders many potential Digital Humanities projects pertaining to modern digital media, so that research is lacking for want of data. After studying available data catalogues and their caveats, we suggest and detail the implementation of a simple model proposal to standardize, archive and eventually publish such metadata, proving that all technical aspects and benefits are already well-defined as well as commonly used, and that this kind of data would be particularly suited for Digital Humanities projects and Semantic Web standards. We thus further argue for more public initiatives to support work on open media catalogues, and push for more data releases from major SVOD platforms that would benefit both scholars and the general public without much burden for their service.

How inquisitive was medieval inquisition? A network-analytical approach to information flow in the trials for Brandenburg-Pomeranian Waldensians (late 14th c.)

25 December 2025, 08:00
Abstract
In this study, we analyse a medieval inquisitorial campaign by conceptualizing it as an information process. We investigate how investigative decision-making was structured by testimony-driven data gathering. Our case study is Peter Zwicker’s well-documented 1393–4 anti-Waldensian inquisition in Stettin. We explore the reconstruction of the inquisitor’s strategy by examining the sequencing of interrogations and subsequent actions based on suspects’ names appearing in previous testimonies. We assess the extent to which the process was adaptive, with suspects summoned dynamically based on new testimonies versus being guided by pre-existing knowledge. We apply network analysis and temporal visualization to incriminations operationalized as network data and use statistical methods to map the feedback between information retrieval and decision-making. Our analysis follows sequences of interrogations where deponents incriminated others on specific dates. This allows us to identify inquisitorial responses to accumulated data, distinguishing between planned strategies and reactive decisions based on new testimony. The challenge of missing data adds complexity and theoretical engagement. A substantial portion of the depositions is lost, yet we can estimate the original volume, enabling an assessment of the impact of data loss. We employ data imputation simulations to test how missing records might obscure evidence of follow-up strategies. The results indicate that network visualization must be complemented by statistical analysis. Comparisons between deponents’ testimony types reveal an interplay between structured pre-planning and selective incorporation of new intelligence. By conceptualizing inquisitorial work as a dynamic information process, this study proposes a novel methodological framework for analysing historical trial documents.

Metricizing diaspora dualities: a comparative quantitative analysis of poetry and prose in Southeast Asian Chinese literature

22 December 2025, 08:00
Abstract
The intersection of quantitative literary studies and diaspora research presents critical opportunities for examining linguistic divergence between immigrant literatures and their cultural origins. While digital humanities scholarship has increasingly embraced computational methods, systematic applications in comparative diaspora studies—particularly those focusing on Southeast Asian Chinese communities—remain underdeveloped. This study bridges this gap through a corpus-driven analysis of 372 literary works (comprising 240 poems and 132 prose pieces) using natural language processing pipelines, hierarchical clustering, and one-way analysis of variance. The common features we proposed have played a greater role in distinguishing Southeast Asian Chinese diaspora new poetry from Chinese mainland poetry. We found that Southeast Asian Chinese diaspora new poetry is not as natural as that of the Chinese mainland, showing greater freedom and diversity in vocabulary selection. Specifically, the vocabulary usage rate of Southeast Asian diaspora new poetry is relatively high, while the grammar usage rate is low. These computationally derived patterns not only establish measurable criteria for distinguishing diaspora literatures through machine-learning classifiers, but also challenge conventional assumptions about linguistic “naturalness” in cross-cultural contexts. Our methodology, which integrates computational stylistics with diaspora theory, provides a replicable analytical framework for studying literary migration phenomena, thereby advancing both digital humanities methodologies and Southeast Asian literary studies.

Digital extraction and segmentation of intangible cultural heritage paper-cut patterns

16 December 2025, 08:00
Abstract
As a unique intangible cultural heritage in Chinese traditional culture, paper-cut art is now facing the dilemma of inheritance and development, and with the death of paper-cut artists and the damage and loss of paper-cut works, some paper-cut types also disappear. Therefore, the digital protection of paper-cut art is urgent. This research is based on the improved genetic algorithm adaptive optimization of Canny operator threshold and Grab-Cut algorithm to achieve intelligent extraction and segmentation of intangible cultural heritage paper-cut patterns. First, the collected paper-cut images are smoothed by bilateral filtering to improve the image quality. Second, based on the Canny operator optimized by the improved genetic algorithm, the overall contour of the paper-cut pattern is extracted. Then, the Grab-Cut algorithm is designed to segment the contours of decoupage design elements in a targeted way, and the vector image is processed by CDR software to obtain an independent editable vector image. Finally, the contour extraction experiments of different kinds of paper-cut images are compared by different algorithms. The results show that the method proposed in this article can effectively detect the true edge of the pattern in paper-cut images and complete the extraction of the pattern contour, and the accuracy of the segmentation pixels of each design element of paper-cut pattern is greater than 96 per cent. It provides a new method for the digital protection and innovative application of intangible cultural heritage paper-cut art.

Text, author, and reader: mutations of Arabic creativity in the digital age

16 December 2025, 08:00
Abstract
The triad of the reader, author, and text is undergoing significant transformations with the advent of the new digital paradigm and the emergence of born-digital literary forms. Since the late 1980s, when electronic literary studies began to take shape, ‘reader agency’ has expanded (Murray 2018: 6–7), and authorship has evolved into a distributed, shared, and collaborative activity between the author and the reader/interactor. With the advancement of digital communication technologies and the rise of artificial intelligence (AI), this triad has been assigned even more advanced capacities and roles. This study examines the dynamics of interaction and collaboration within the reader–author–text triad, analysing their manifestations in several Arabic literary texts against the backdrop of English texts. The primary aim is to trace the evolution of reading, authorship, and textuality as they transition from traditional literacy to digitality, offering a conceptualization and definition of their new roles and horizons in Arabic literature. An interdisciplinary approach, grounded in digital literary studies and cultural critique, will be employed. The theoretical framework will address related concepts such as interactivity, distributed authorship, and augmented and immersive reading. The practical analysis will explore how these creative tactics and dynamics have opened new avenues and expanded the boundaries of authorship and reading in various Arabic literary texts.

Diff-BAM: a generalized adaptive diffusion model for cultural heritage image inpainting

13 December 2025, 08:00
Abstract
Cultural heritage, as a precious carrier of history and culture, embodies profound artistic and historical value. However, over time, cultural heritage items are vulnerable to varying degrees of damage due to improper preservation. Existing image-inpainting methods have limitations in detail recovery and inpainting efficiency, making it difficult to meet the demands for high precision and efficiency. This article proposes an image-inpainting method for cultural heritage based on adaptive denoising and multi-scale filtering (Diff-BAM). The model improves inpainting effectiveness and efficiency by dynamically adjusting the reverse denoising process and introduces an edge-aware strategy to address the issue of rough edges in inpainting the image. Additionally, the model incorporates a multi-scale filtering mechanism within the U-Net architecture to further enhance the inpainting details. Due to the lack of publicly available Thang-ka datasets, the article uses a self-built dataset and a landscape painting dataset for experimental validation. Experimental results show that Diff-BAM outperforms the mainstream methods in terms of inpainting details and efficiency. This study demonstrates the potential application of image inpainting technologies in the field of digital culture for the preservation of cultural heritage, providing an efficient and precise solution for artifact inpainting.