Received yesterday: 5 June 2026

Story from the Research Trenches: Bonnie Varlet on Transforming Research Workflows with Zotero

4 June 2026, 17:28

As part of our blog series, “Stories from the Research Trenches,” we often invite researchers and colleagues to share their personal experiences. For this post, we are delighted to hand the floor to Bonnie Varlet from the KU Leuven Cultural Studies Research Group, who offers a closer look at how she integrates Zotero into her research workflow.

Written by Bonnie Varlet

New technologies, like machine learning systems, are being deployed across a wide range of institutional settings. Hospitals use them to flag diagnoses, and archives use them to catalog their collections. Modern machine learning has made great leaps in its capabilities. However, these tools do not exist in a vacuum.

My research looks at what governs the relation between institutions and the technology they use. For example, when a machine learning system is introduced into a new operational environment, it changes workflows, changes what skills are required, and creates new dependencies that did not previously exist. Validation processes that were designed for human-scale output volumes become inadequate when a system can produce ten times as much in the same amount of time. Accountability structures built around individual judgment become harder to maintain when outputs are generated algorithmically. At the same time, organizations also change technology. Institutional priorities shape which systems get acquired and how they are used, and informal workarounds created by staff can become de facto operating procedures. The system produced by this process is often meaningfully different from the system initially deployed.

These processes do not happen independently. They influence each other; the interaction is iterative, and its effects compound. There is currently no widely adopted methodology for tracking this relationship in comparable and replicable terms. Most existing research either examines technology deployment in isolation or analyzes governance structures without tracing their operational consequences. The interaction itself remains largely unmapped, which makes it difficult for organizations, regulators, and researchers to fully understand how technology and institutions shape each other in practice. My work aims to help develop a way of systematically observing these interactions as they take shape in real operational contexts.

Tackling a project like this, especially my first one done independently during my Fulbright, was also a lesson in how small logistical problems can throw you off course. Over the course of the project, I collected dozens of papers, books, website links, and other sources. At the start, when it was only a couple of papers, it was manageable, but as the project matured, it became increasingly difficult to stay organized. This became particularly challenging given the breadth of the topic, which required me to move between technical material, governance literature, and case-based examples.

At that point, I was lucky to have resources available through KU Leuven, such as the Artes Research team, where I was introduced to tools that could help manage my workflow. I decided to try Zotero, which was easy to set up and start using immediately. What changed right away was that I stopped getting lost in my sea of sources. All my papers, books, and links were kept in one place dedicated to my project, and I did not have to go back and look up publication details because the browser extension stored that information when I saved a new source.

As I explored Zotero further, I also shifted how I organized my work. Because it makes it easy to tag and sort sources, I grouped them according to how I used them. Sources used for case studies were given a case study tag, while sources that provided foundational knowledge were grouped separately. Since my project is broken up into different parts, this made it easier to see where I was pulling information from and how it influenced my analysis. In a project that tries to trace relationships between governance decisions and technical systems, being able to clearly track how different types of sources contributed to different parts of the argument was particularly useful.

I also began annotating and brainstorming directly within the same program, instead of splitting my workflow across multiple tools. For example, I would highlight a quote in a source, save it, and add a note explaining how or why it was useful. This made it easier to trace my thought process and how I arrived at certain conclusions, both for myself and in the final project.

Looking back, I wish I had reached out sooner for help managing my workflow, not only with this specific software but also with the more general question of how to structure a research project of this scope. I spent a significant amount of time at the start reinventing approaches to organizing my work, rather than focusing on the project itself.

I would not treat exploring research management tools as a last resort. No matter the field, the people around you have likely encountered similar challenges and found better ways to address them. Trying something new partway through a project should not feel like a disruption. In my case, it was what allowed the second half of the project to go substantially better than the first, and made it easier to carry out research that depends on systematically tracing complex relationships between institutions and technology. It also made it possible to more clearly trace how different sources, ideas, and cases connect—something that is central to my research itself, which focuses on understanding how relationships between institutions and technology take shape over time.

New Publication Details Cooperative Broadband and Community-Supported Infrastructure

5 June 2026, 03:55

DH Strategist and head of CDH graduate programs Grant Wythoff has published a new chapter in the latest Debates in DH volume, Critical Infrastructure Studies and Digital Humanities. The chapter, Alternative Infrastructures for Digital Equity: Community-Based Internet Access, details the creation of Philly Community Wireless (PCW), a community-controlled network. PCW offers free public Wi-Fi across the Philadelphia neighborhoods of Norris Square, Fairhill, and Kensington. Today, dozens of homes and businesses host PCW rooftop antennas. The network provides access to 50+ city blocks and 7,600+ monthly users. It is maintained by dozens of volunteers, community partners, and a dedicated staff.

PCW was first launched in the summer of 2020 with the support of a Rapid Response Grant from the Princeton Humanities Council and student interns from the Pace Center for Civic Engagement’s RISE Program. During the Covid pandemic’s earliest days, a group of organizers, researchers, librarians, and neighbors — including Wythoff and the chapter’s co-authors Alex Wermer-Colan, Devren Washington, and Allan Gomez — gathered to address the problem of internet accessibility at a time when most of daily life had moved online. Their goals were to expand internet access, grow tech literacy, and build community autonomy.

The chapter reflects on the lessons learned in building PCW’s technical and social infrastructures, puts concepts from digital humanities and community technology into conversation, and finally, gestures toward the future of community-supported infrastructure. The authors write:

Our question is how infrastructure cooperatives that provide community mesh networks for urban environments can facilitate collaboration between neighbors while fostering community consensus on network architecture, organizational structure, and the geographic redistribution of digital resources. Ultimately, our theory of change is that access leads to adoption in a fuller social sense: if one empowers the communities most affected by the biases and harms of the tech industry to own and control last-mile internet infrastructure in their neighborhood, such communities will be equipped not just to use broadband technology, but to advocate for better outcomes in the way that such technologies are utilized, governed, and regulated.

You can read the open access Manifold edition of the chapter now: Alternative Infrastructures for Digital Equity: Community-Based Internet Access

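Conference Notice | 10th China Conference on Language Intelligence and 2nd Jinghu Forum on Language Brain-Computer Interfaces

5 June 2026, 00:00

Xu Hui (徐惠), 2026-06-05 00:00, Jiangsu

From 3 to 5 July, Sichuan International Studies University (SISU) will host the Language Intelligence Conference, focusing on frontier discussions of language intelligence and brain-computer interfaces.

Reposted from “川外学坛”

About the conference

To promote cross-disciplinary innovation in language intelligence and brain-computer interfaces, and to foster academic exchange and disciplinary development, the 10th China Conference on Language Intelligence and 2nd Jinghu Forum on Language Brain-Computer Interfaces (第十届中国语言智能大会暨第二届语言脑机接口镜湖论坛) will be held at Sichuan International Studies University on 3-5 July 2026. The conference is hosted by the Chinese Association for Artificial Intelligence (CAAI, https://www.caai.cn/) and the China Research Center for Language Intelligence, and organized by the CAAI Language Intelligence Technical Committee, the School of Language Intelligence (School of General Education) at Sichuan International Studies University, and the Beijing Key Laboratory of Key Technologies for AI+ Domain Applications.

The conference will feature a dedicated “Deans’ Forum on Building Undergraduate Programs in Language Intelligence” and a “Young Scholars’ Forum”, and will invite well-known experts and scholars in language intelligence and brain-computer interfaces to give keynote speeches. Leaders, experts, scholars, and colleagues from universities, research institutions, primary and secondary schools, and industry are warmly invited to attend.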

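Organizing bodies

Hosts

Chinese Association for Artificial Intelligence (中国人工智能学会)

China Research Center for Language Intelligence (中国语言智能研究中心)

Organizers

CAAI Language Intelligence Technical Committee (CAAI语言智能专委会)

School of Language Intelligence (School of General Education), Sichuan International Studies University (四川外国语大学语言智能学院(通识教育学院))

Beijing Key Laboratory of Key Technologies for AI+ Domain Applications (AI+领域应用关键技术北京市重点实验室)

Co-organizers

Chongqing Association for Artificial Intelligence (重庆市人工智能学会)

International Joint Research Institute for Language Brain-Computer Interfaces, Shapingba District, Chongqing (重庆市沙坪坝区国际语言脑机接口联合研究院)

Supporting organization

Committee on Language Disorder Rehabilitation, China Association of Rehabilitation of Disabled Persons (中国残疾人康复协会语言障碍康复专业委员会)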

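Preliminary agenda

Check-in location

沙磁公馆 (Shaci Gongguan), Shapingba District, Chongqing

Main conference topics

1. Scientific research on language intelligence

2. Research on building the disciplinary system of language intelligence

3. Building undergraduate programs in language intelligence

4. Language brain-computer interfaces

5. Language intelligence education

6. Development and application of language-education AI agents for primary and secondary schools

7. Other related topics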

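Keynote speakers (list being updated)

(in alphabetical order by surname in pinyin)

Hou Wensheng (侯文生), Chongqing University

Jiang Meng (姜孟), Sichuan International Studies University

Li Zhoujun (李舟军), Beihang University

Lin Hongfei (林鸿飞), Dalian University of Technology

Liu Jie (刘杰), North China University of Technology

Lü Xueqiang (吕学强), Beijing Information Science and Technology University

Wang Guoyin (王国胤), Chongqing Normal University

Wang Lidan (王丽丹), Southwest University

Wu Zhuang (吴庄), Guangdong University of Foreign Studies

Yao Dezhong (尧德中), University of Electronic Science and Technology of China

Yin Ming (尹明), Yunnan University of Finance and Economics

Yu Zhengtao (余正涛), Kunming University of Science and Technology

Zhao Chen (赵晨), Guangdong University of Foreign Studies

Zhou Qiang (周强), Tsinghua University

Zhou Jianshe (周建设), Capital Normal University

Zhou Junsheng (周俊生), Nanjing Normal University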

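Registration

1. Fees: No conference fee is charged; participants cover their own travel, meals, and accommodation (the organizers provide hotel recommendations).

2. How to register: Scan the QR code below and fill in the registration form.

3. Size: To ensure quality, the number of participants will be limited.

4. Contact phone: 18182211733 (Teacher Jiang); 15023537602 (Teacher Hao)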

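Hotel information (for reference only)

1. 沙磁公馆 (Shaci Gongguan):

About 458 yuan per night (the actual booking price applies)

Phone: 17723999508 (Manager Yu)

2. 沙磁时光酒店 (Shaci Shiguang Hotel):

About 336 yuan per night (the actual booking price applies)

Phone: 17723999508 (Manager Yu)

3. 维也纳3好酒店 (Vienna 3 Best Hotel):

About 308 yuan per night (the actual booking price applies)

Phone: 15523222536 (Manager Wu)

4. 桔子酒店 (Orange Hotel):

About 430 yuan per night (the actual booking price applies)

Phone: 13696702083 (Manager Wang)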

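How to sign up

Please scan the QR code below and fill in the attendance reply form:

During the conference, the CAAI Language Intelligence Technical Committee plans to prepare the establishment of an “Academic Expert Committee on Language Intelligence” and a “Youth Working Committee”. Experts, scholars, and colleagues are warmly invited to attend and join. To join (click the link to download and fill in the relevant forms):

Academic Expert Committee on Language Intelligence:

学术专家委员会招募启事.docx (recruitment notice for the Academic Expert Committee)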

Registration Now Open for the EADH 2026 Conference

4 June 2026, 21:23

Registration is now open for the 2026 Conference of the European Association for Digital Humanities (EADH), which will take place at the Jagiellonian University in Kraków, Poland, from 15 to 19 September 2026.

As one of the leading international events in the field of Digital Humanities, the conference will bring together researchers, scholars, and practitioners from across Europe and beyond to discuss current developments, innovative projects, and emerging challenges in the discipline.

The programme will feature keynote lectures by Melissa Terras and Tomasz Majkowski, an ERC-sponsored plenary session, and a rich selection of contributions, including 60 long papers, 60 short papers, and 40 posters.

Early Bird registration rates are available until 30 June 2026. The basic conference fee includes participation in all conference sessions, coffee breaks, and lunches. A full registration option is also available and includes the conference banquet and a guided tour of the University Museum.

For registration, please visit:
https://eadh2026.systemcoffee.pl/

For more information about the conference programme and venue:
https://eadh2026.confer.uj.edu.pl/

Received before yesterday

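Repost | Guanlan Visiting Scholars Salon (No. 16) in Review: Communication, Care, and Institutions: Interdisciplinary Practice in the Medical Humanities

3 June 2026, 09:00

2026-06-03 09:00, Guangdong

On 26 May, the 16th Guanlan Visiting Scholars Salon (观澜·驻访学者沙龙) was held in Building 5 of Wenxueyuan (文学园) on our university’s Shenzhen campus. The event was initiated and convened by Professor Xu Yiru (徐翌茹), an on-campus visiting scholar at our institute for the spring 2026 semester. Faculty and students from Université Lumière Lyon 2 in France, Guangzhou University of Chinese Medicine, and twelve schools, departments, and affiliated hospitals of our university held in-depth discussions on the theme “Communication, Care, and Institutions: Interdisciplinary Practice in the Medical Humanities”.

Professor Xu Yiru chairing the salon

Roundtable 1

Teaching Innovation and Practice in the Course “Introduction to Clinical Medicine and the Medical Humanities”

In the first roundtable, Zhang Kunsong (张昆松), chief physician in hepatobiliary and pancreatic surgery at the First Affiliated Hospital of Sun Yat-sen University, spoke about reforming the teaching of medical humanities courses. Zhang systematically reviewed the two-stage reform that the team has pursued over five years in building the medical humanities curriculum. In the first stage, the team drew on strong clinical teaching staff to build a distinctive teaching system and introduced new course assessment mechanisms. In the second stage, the team deepened and upgraded the reform, integrating multidisciplinary resources to build a tiered framework for medical humanities education covering all year groups and all program lengths. The team is also developing a new “AI + medical humanities” teaching model to support smart medical education. Zhang concluded that medical humanities education needs immersive experience as its core lever, shifting the teaching model from teachers “teaching books” toward targeted “educating people”.

Dr. Lin Lin (林琳) of the Sixth Affiliated Hospital of Sun Yat-sen University and Associate Professor Guo Cong (郭聪) of the School of International Studies (国际翻译学院), Sun Yat-sen University, then offered comments. Dr. Lin held that medical humanities education is a systematic undertaking running through the whole of medical training, and that it matters for shaping medical students’ professional confidence, averting risks to the profession’s development, and strengthening the humanistic foundations of the medical sector. Associate Professor Guo acknowledged that the university’s medical translation courses still face teaching difficulties such as imperfect assignment design and imprecise assessment of students’ learning. She said she would draw fully on this course reform’s experience in cultivating students’ professional confidence and in reflective practice teaching to further improve the training program.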

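Roundtable 2

Social Science Research and Policy Implementation: The Case of Building Inclusive Hospice Care

In the second roundtable, Professor Cheng Yu (程瑜) of the Zhongshan School of Medicine used the building of inclusive hospice care as a case to explain the core mission of social science: to be grounded in people’s livelihoods and to help policy take effect. Professor Cheng explained that hospice care upholds the ideal of “a body free of pain, a heart free of regret, and a dignified death”; it is a basic human right to a good death, and it can effectively prevent excessive end-of-life treatment and save medical resources. At present, China’s hospice care pilots are being gradually extended, but they still face problems such as weak public understanding of death, insufficient service provision, and an incomplete legal and ethical framework. Professor Cheng’s team has developed a dual-track service model spanning the health and civil affairs systems and built an integrated medical-social-family service system, putting into practice the commitment of social science research to stay rooted in reality, serve the public, and help policy take effect.

Xia Wei (夏薇), vice dean of the School of Nursing, Sun Yat-sen University, and Associate Professor Wang Nan (王楠) of the School of Government (政治与公共事务管理学院), Sun Yat-sen University, then offered comments. Vice Dean Xia noted that hospice care programs are complex, spanning four dimensions (the individual patient, family members, multiple social actors, and the full course of time), and that their implementation is full of practical contradictions. Associate Professor Wang strongly identified with Professor Cheng’s exploration across disciplines, sectors, and levels of government, and held that breaking down the barriers between academia, practice, and government takes great determination, courage, and capacity for action. Wang added that the forum could strengthen young scholars’ confidence in their research, and encouraged them to hold fast to the commitment expressed in the line “位卑未敢忘忧国” (“though humble in station, I dare not forget my concern for the country”).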

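Roundtable 3

Care Communication in Institutional Elder Care: Forms of Interaction, Identity Construction, and Social Participation

In the third roundtable, Associate Professor Guo Weiwei (郭薇薇) of Université Lumière Lyon 2 drew on fieldwork in nursing homes to discuss communication mechanisms, identity construction, and practical dilemmas in institutional elder care. She observed that mechanical caregiving and inappropriate language readily provoke resistance from older residents, whereas warm, equal, two-way interaction improves care outcomes. She focused on the widespread phenomenon of “elderspeak” (“老年语”) in care settings, stressing that this well-meaning special register carries implicit stereotypes about old age and readily provokes resistance. Drawing on Goffman’s theory, she argued that the particular setting of the nursing home causes residents’ previous social identities to fade, so that they must reconstruct multiple identities within an unequal care relationship. She concluded that the dilemmas of elder care stem from the conflict between institutions’ standardized, task-oriented logic and older people’s individual emotional needs, and proposed integrating task-oriented and social interaction to build a care system centered on older people’s agency and dignity. Balancing standardized management with humanistic care under resource constraints, she added, remains the core problem the sector urgently needs to solve.

Associate Professor Yi Li (易利) of the School of Foreign Languages, Sun Yat-sen University, and Associate Research Fellow Fu Longwen (符隆文) of the Seventh Affiliated Hospital of Sun Yat-sen University then offered comments. Associate Professor Yi said the study breaks with the one-dimensional view that “communication in elder care is merely a tool for efficient care”, stressing that communication in elder care settings is not only a nursing technique but also an important process through which the parties construct identities and uphold older people’s agency and dignity. Associate Research Fellow Fu noted that linguistics, sociology, and anthropology are all good at uncovering hidden communication problems in everyday care, and that linguistics, which relies on directly analyzable corpus evidence, can effectively help practitioners reflect on shortcomings in their communication.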

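The institute’s on-campus visiting scholars for the spring 2026 semester also took an active part in the discussion. Professor Li Yanhong (李艳红) of the School of Journalism and Communication, Professor Zhou Guomei (周国梅) of the Department of Psychology, Professor Li Zhen (李珍) of the School of Marxism, and Associate Professor Li Zhi (李智) of the Department of History, all at Sun Yat-sen University, together with Zhou Yinhua (周殷华), deputy director of the Center for Clinical Medical Humanities Research at the Seventh Affiliated Hospital of Sun Yat-sen University, and Ye Fen (叶芬) of Guangzhou University of Chinese Medicine, brought multidisciplinary perspectives to an in-depth exchange on elder care, medical humanities education, applications of intelligent technology, and end-of-life care. The participants all advocated individualized humanistic care and attention to the physical and mental health of older people and frontline caregivers. They held that artificial intelligence can serve only as an auxiliary tool in care, and that the core value of humanistic care is irreplaceable. They also fully acknowledged the achievements of the medical humanities practice course, set out frankly the practical challenges of implementing it, and called for deeper interdisciplinary collaboration to help medical humanities ideas take root and to keep improving teaching and service systems.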

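Finally, Professor Cao Jiaqi (曹家启) of the Department of History, Sun Yat-sen University, an on-campus visiting scholar at the institute in the autumn 2025 semester, shared a historian’s perspective in his closing remarks. Professor Cao held that scholarly attention to the medical humanities, hospice care, and related topics is a constructive response to real social difficulties, but that birth, aging, illness, and death, like many social problems, are constant and inherent features of life. He stressed that building the medical humanities has far-reaching significance, that cultivating humanistic literacy should not be confined to classroom teaching, and that medical students and practitioners need above all to keep cultivating and upholding their humanistic commitments through long-term clinical practice.

Professor Cao Jiaqi giving the closing remarks

Group photo of the 16th Guanlan Visiting Scholars Salon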

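Source: Institute for Advanced Studies in Humanities, Sun Yat-sen University (中山大学人文高等研究院)

Editor: Chen Yini (陈旖旎)

First review: Cai Yifeng (蔡一峰)

Second review: Lin Geng (林耿), Chen Shishi (陈诗诗)

Final approval and release: Zhang Wei (张伟)
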
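Repost | Symposium on “Liberal Education in the Age of Artificial Intelligence and Its Challenges” Held at Sun Yat-sen University’s Shenzhen Campus

3 June 2026, 09:00

2026-06-03 09:00, Guangdong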

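On 29 May 2026, the symposium “Liberal Education in the Age of Artificial Intelligence and Its Challenges” (“人工智能时代的博雅教育及其挑战”) was held in Room 202, the 致用·雅集 space, at the Institute of Humanities and Social Sciences on Sun Yat-sen University’s Shenzhen campus. The meeting was co-hosted by Boya College (博雅学院), the Institute for Advanced Studies in Humanities, and the Institute of Humanities and Social Sciences of Sun Yat-sen University. Invited guests included Gan Yang (甘阳), chair professor at Xinya College, Tsinghua University; Professor Sun Xiangchen (孙向晨), vice chair of the Academic Committee and head of the Faculty of Humanities at Fudan University; and Associate Professor Tang Jie (唐杰), vice dean of Boya College, Chongqing University. Together with experts and scholars from Sun Yat-sen University, they discussed the core question of how liberal education and the age of AI can develop together, and explored paths and strategies for cultivating top innovative talent in the AI era. The meeting was chaired by Professor Chen Jianhong (陈建洪), dean of Boya College, Sun Yat-sen University.

AI technology is profoundly changing the underlying logic of education, the ways knowledge is transmitted, and models of talent cultivation. In the face of this change, the participating experts agreed that liberal education must not be weakened in the AI era; on the contrary, its irreplaceable “human-centered value” should be brought out all the more.

Professor Gan Yang of Xinya College, Tsinghua University, pointed out that critical thinking, original thought, and ethical judgment are core capacities that AI cannot replace. He stressed the need to be alert to students’ anxieties and to adjust teaching approaches promptly: strengthening small-class teaching, teacher-student interaction, and group discussion, with an emphasis on improving students’ depth of thought, oral expression, and ability to think on their feet, so as to meet future uncertainty.

Professor Sun Xiangchen of Fudan University argued that AI, while offering convenience, removes the “friction” that learning needs. Sun called for keeping up the close reading of classic texts and, through embodied activities such as group discussion, 耕读 (combining farm work with study), and study tours, strengthening real interaction and intellectual exchange between people and between people and texts. This can offset the abstracting impact of technology and safeguard the “weight” and humanistic warmth of education in the age of algorithms.

Associate Professor Tang Jie of Boya College, Chongqing University, pointed out that AI is good at supplying “form” and “efficiency” but cannot provide “meaning” grounded in lived experience. Through sustained reading of the classics, reading groups, and practical projects, with the residential college (书院) as the vehicle, teaching should strengthen emotional bonds and embodied experience between teachers and students and, through slow-paced, deep immersion, help students recover the sense of time and of experience that technology has compressed.

In the open discussion, participants broadly agreed that the rapid iteration and wide application of AI both open up broad space for cultivating top innovative talent and bring unprecedented challenges. The core capacities that liberal education strengthens, such as curiosity and imagination, observation and perception, and aesthetic sense and analytical judgment, are all the more precious in the AI era. Education should shift from transmitting knowledge to cultivating abilities, with student growth at the center; it should explore the potential of technology to empower education and, even more, dig deeply into the humanistic values of an era of coexistence with AI.

In closing remarks, Dean Chen Jianhong said that, in the face of the AI wave, Boya College should keep pursuing forward-looking exploration in education and teaching reform, focusing on curriculum reform, the changing role of teachers, and improved teaching methods, in order to form a sustainable and distinctive new path for talent cultivation under the liberal education ideal and to contribute a vivid “Boya approach” to the university’s reform of its talent cultivation model. During the meeting, the guests also visited the Institute of Humanities and Social Sciences and the library.

The symposium brought together the insights and educational experience of leading Chinese universities, built important consensus on innovating the ideas of liberal education and reforming its curriculum in the AI era, and offered new thinking and new momentum for the transformation of higher education in the new era.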

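Source: Institute for Advanced Studies in Humanities, Sun Yat-sen University (中山大学人文高等研究院)

Editor: Chen Yini (陈旖旎)

First review: Cai Yifeng (蔡一峰)

Second review: Lin Geng (林耿), Chen Shishi (陈诗诗)

Final approval and release: Zhang Wei (张伟)
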
AIUCD 2026 Gets Under Way: Programme, Companion, and Useful Information

3 June 2026, 12:52

Today, 3 June 2026, the 15th Annual Conference of the Associazione per l’Informatica Umanistica e la Cultura Digitale (AIUCD) opens in Cagliari, dedicated to Digitale e Public Engagement: pratiche e prospettive nelle Digital Humanities (Digital and Public Engagement: Practices and Perspectives in the Digital Humanities).

The conference spans three days of talks, parallel sessions, posters, discussion, opportunities to meet, and association business. To help participants get the most out of it, new tools are available this year to make it easier to navigate the programme, the sessions, and real-time announcements.

The conference in numbers

AIUCD 2026 confirms the vitality of the Italian and international Digital Humanities community. The call for papers drew 151 proposals; at the end of the review process, 134 contributions were accepted, an acceptance rate of 88.7%.

The proposals involve 371 author bylines, an average of 2.46 authors per contribution, and about 163 research bodies, including universities, institutes, and centres. There are also 50 candidates for the Premio Gigliozzi.

The thematic distribution gives a detailed picture of the lines of research currently active in the AIUCD community: DH and co-construction, archives and editions, memory, history and heritage, data and representation, digital textualities, and other contributions that cut across the Digital Humanities.

The main point of access is the conference website:

AIUCD 2026 Cagliari
https://aiucd2026.unica.it/

From the site you can reach the full programme, the poster session, local information, updates, and the Conference Companion.

The Conference Companion

One of the new features of AIUCD 2026 is the Conference Companion, a digital guide designed to make it easier to find your way around the programme, posters, sessions, local information, and useful tools.

The Companion can be viewed in a browser and installed on a phone as a web app, so that the programme is always at hand during the conference, even offline.

Through the Companion you can follow the sessions, browse the poster session, access local information, and put together your own agenda for each day more easily.

The conference Telegram broadcast

Also new this year, we invite you to join the Telegram broadcast channel reserved for conference participants, which will be used for live communications, instant updates, news, and practical instructions before and during the conference.

The Lectio Magistralis

One of the central events of the conference will be the Lectio Magistralis by Philip Crowe of University College Dublin.

His talk will touch on themes that bring together heritage, public engagement, digital tools, sustainability, community, and co-design.

The Lectio will be discussed and commented on by Francesca Tomasi (Università di Bologna) and Arianna Ciula (King’s College London).

The poster session

This year, too, the poster session is a central space of the conference: a moment of direct dialogue among researchers, scholars, and professionals in the field.

The dedicated page lists the selected contributions, their authors, and the relevant thematic areas. The poster session will also be an important opportunity to learn about work in progress, emerging projects, and research perspectives that run through the different strands of the Digital Humanities.

The AIUCD members’ assembly

The AIUCD members’ assembly, a central moment in the life of the association, will also take place during the conference.

The assembly is scheduled for Thursday 4 June at 17:30, in Aula Capitini.

Taking part in the assembly is particularly important: it will be the occasion to share updates on the association’s activities, discuss upcoming lines of work, and reflect together on the future of the AIUCD community.

Local information

The website and the Companion also provide practical information about the conference venue, getting around, the city, and useful places for those attending the three days in Cagliari.

A dedicated section is devoted to exploring Cagliari, with selected places and suggestions for the breaks between sessions.

Enjoy the conference

AIUCD 2026 will be an important occasion not only to discuss research, tools, practices, and perspectives in the Digital Humanities, but also to come together as a community: in the panels, in the poster sessions, in informal moments, and at the association’s annual assembly.

We wish everyone a good conference, and we invite participants to follow updates through the website, the Companion, the Telegram broadcast, and the AIUCD channels.

The article AIUCD 2026 al via: programma, Companion e informazioni utili comes from AIUCD.

Speaking Up, Taking Part: Young DHers and the Life of AIUCD

2 June 2026, 19:23

On 24 March 2026, a reflection that began at the AIUCD2025 conference in Verona was relaunched on the AIUCD mailing list: how can the association encourage its younger members to take a more active part in its life?

To gather proposals, observations, and needs ahead of the AIUCD2026 conference in Cagliari and its assembly, a short survey was opened with a free-text field for suggestions and the option for respondents to say whether or not their name could be published. On 29 March 2026, after the first responses, the discussion was relaunched on the list; here we offer a first summary of the contributions collected.

The responses received so far show that the issue does not concern the internal life of the association alone: talking about young people’s participation also means asking about the conditions under which one enters a scholarly community, the possibility of speaking up, the recognition of interdisciplinary work, the precarity of academic and professional careers, and the concrete ways in which an association can become a genuinely accessible space.

A first recurring element is trust. Active involvement can happen only if those approaching a field of study feel welcomed and recognized. Many young people come to the Digital Humanities after training in disciplinary settings where exposing oneself can seem risky: asking a question, proposing an idea, joining a debate, or taking an initiative can be perceived as acts to be weighed with extreme caution.

This difficulty does not necessarily originate within AIUCD, which several responses in fact describe as a welcoming and positive environment. It does originate, however, in educational paths, in the disciplinary fields people come from, in academic hierarchies, and in the material conditions of intellectual work. The question is therefore not only how to “involve” young people, but how to build the conditions in which they can feel entitled to speak up and take part. One of the survey responses insists on precisely this point:

Personally, I agree with those who have raised a “structural” educational problem in Italy, one that discourages people from putting forward their own opinions and ideas (and the very ability to formulate them) when dealing with those hierarchically above them, an attitude that those who have taken courses in other countries, including European ones, have found to be very different elsewhere.

The proposal that follows is very concrete:

We would need practical strategies to defuse the perception of hierarchy in working groups, promoting opportunities for direct exchange. One example could be to include a “youth” quota within the governing and organizing bodies, so that the exchange can be mutually fruitful (that is, also inviting the “old guard” to engage beyond their circle of acquaintances).

The theme of participation is also bound up with the institutional recognition of the Digital Humanities. One response notes that DH is not yet fully recognized either as an autonomous field or as a cross-cutting methodology applied to the disciplines:

I also agree about the problem of DH, which is neither recognized as a field in its own right (with its own SSD [settore scientifico-disciplinare]) nor valued as a cross-cutting methodology applied to the disciplines (whose contributions should be assessed at least as fully disciplinary). This is a problem that really does penalize people in evaluations.

The same response links this issue to the more general question of interdisciplinarity:

It is a broader problem that concerns interdisciplinarity itself, which is heavily penalized in Italy. But without invoking grand theories, I believe that anyone who approaches DH already has a vocation for overcoming sectarianism, and so I repeat that the biggest problem, in my view, remains the one raised at the outset.

Alongside these structural problems, the responses offer operational proposals that can readily be discussed within the association. The first is the creation of a dedicated space for younger members:

Create a “youth group” that is involved in planning and carrying out AIUCD’s activities.

The decisive point, from this perspective, is that participation should not be merely consultative or symbolic. A youth group could become a stable space for proposals, discussion, and planning, but it would need to be tied to the actual programming of the association’s activities.

Another response shifts attention to the concrete recognition of work:

Act as an intermediary for small but paid assignments.
Give them responsibility in organizing events.

This suggestion matters because it avoids treating participation only as voluntary availability. For those in the early stages of a career, even limited assignments can produce skills, experience, visibility, and recognition. Where possible, moreover, the work should also be paid, precisely because material conditions affect the ability to participate.

One response brings another need into sharp focus: creating spaces where it is possible to put oneself forward even without a finished result.

I am glad to take up this reflection, drawing also on my experience as a doctoral student. Thanks to AIUCD and the mailing list I have had several opportunities for training and exchange that have been very useful to me; in particular, the latest conference in Verona was a moment rich in stimuli, exchanges, and possibilities for growth, which nevertheless risk dissipating over time.

The problem here is not a lack of opportunities, but how accessible they really are:

What I perceive, and share, is that it is not only a matter of creating more opportunities to participate, but of making them genuinely accessible, especially for younger people who often do not yet (and never) feel “ready enough”. Personally, one of the greatest difficulties is precisely finding spaces where it is legitimate to put oneself forward even with work that is not yet final.

From this comes a precise proposal:

In this sense, it could be useful to complement the more structured moments with “low-threshold” occasions: short talks on work in progress and less formalized moments of exchange, which make it easier to start participating. At the same time, I believe that more direct and informal forms of exchange can also make a big difference: peer discussion, opportunities for preliminary feedback, and also forms of light, non-hierarchical mentoring that help people find their bearings and feel less “out of place”, especially in the early stages.

This passage points in a clear direction: not every space in the association needs to take the form of a conference, a finished paper, or a mature article. There can also be spaces for work in progress, methodological doubts, experiments, initial hypotheses, and requests for feedback. For many people at the start of their path, being able to present something unfinished can be the first step toward more stable participation.

The same response also insists on continuity:

In this direction, I think it could help to strengthen more continuous occasions as well, such as small working groups, informal discussions, or regular spaces for exchange, which allow relationships to be built over time and make participation less episodic, and therefore more sustainable and productive.

A final issue, a very clear-cut one, concerns precarity. One response invites us not to read participation only in terms of individual motivation:

In my view, this is not so much a problem internal to AIUCD, which I perceive instead as a welcoming and positive environment, as a problem tied to the working context in which many young members find themselves today.

The difficulty is described as the effect of a condition of permanent urgency:

Constant precarity, work paced by deadlines, and the need to keep investing mental and physical energy in the search for the next contract produce a condition of permanent urgency. In this context, the time and resources available for active participation in an association are unfortunately very limited.

And the conclusion is just as explicit:

Although AIUCD is an inclusive and stimulating community, I believe that the uncertainty and urgency that characterize many professional paths limit the possibility of constant involvement.

During the days of the conference, and after the first version of this article was published (03.06.2026), the following response also arrived, which stresses the creation of training opportunities and events:

Organize training, outreach activities, and events in which young members can put themselves to the test, exchanging not only ideas but also, and above all, doubts and difficulties. I would like these occasions to be a space for helping one another concretely and also for sharing skills.

The responses collected therefore point to some possible lines of work:

  • create a youth group involved in planning and carrying out activities;
  • assign concrete, limited, and, where possible, paid tasks (as was the case, for example, with the work commissioned for the new AIUCD website);
  • give real organizational responsibilities in events, working groups, the blog, communication, training, and documentation;
  • provide low-threshold spaces for work in progress, preliminary ideas, and methodological discussions;
  • encourage moments of peer exchange;
  • experiment with forms of light, non-hierarchical mentoring;
  • make working groups and informal spaces for exchange more continuous;
  • discuss the possibility of a structured presence of younger people in the governing and organizing bodies;
  • continue working, also at the public and institutional level, for the recognition of the Digital Humanities and of interdisciplinarity.

These proposals will be brought to the attention of the Cagliari assembly (4 June 2026) to take stock and to see which ideas can be turned into new concrete actions.

AIUCD is a young community not only because many doctoral students, research fellows, and early-career researchers take part in its activities, but also because the Digital Humanities continue to be a field under construction. For this reason the participation of younger people is not a side issue. It is a condition for the future of the association and for the growth of DH in Italy.

Those who wish to contribute can still do so through the survey or by posting to the list. Every proposal, however small, can help build forms of participation that are more accessible, more continuous, and better recognized.

The article Prendere parola, prendere parte: giovani DHer e la vita di AIUCD comes from AIUCD.

Graduate Fellows Scan, Model, and Map: New Discoveries from Sermons to Ballet

4 June 2026, 00:47

This spring, six CDH Graduate Fellows arrived with their research in progress, asking six different questions across multiple disciplines ranging from History and Comparative Literature to Music and English. Their findings? Working with data and computational methods rarely unfolded the way they expected, and while oftentimes arduous, the labor uncovered “strange and beautiful” discoveries.

"We are receiving more competitive applications to the Graduate Fellowship than ever before," said Grant Wythoff, who directs CDH graduate student programs. "Some of these emerging scholars bring knowledge of data curation standards and machine learning methods. Others are tuned into the latest debates on AI's political and epistemological impacts. The mix of voices makes for an incredibly exciting group dynamic."

Cecelia Ramsey (French and Italian) came to the CDH with a question about literary afterlives: what makes a book experience a revival many years after its initial release? To study this at scale, she worked with BiblioBase, a database of the nineteenth-century Bibliographie de la France, tracking gaps between editions and reeditions and looking for patterns in a book's reintroduction.

The data was messy—inconsistent titles, variable spellings of authors' names—and rather than cleaning the inconsistencies away, Cecelia explored them as an opportunity to learn more about the nature of reeditions and the format of the Bibliographie itself. "Interacting with the messy data taught me how slippery the very name of a work can be," she reflected.

It was also her first substantial engagement with DH methods, and she described the fellowship's atmosphere as essential. When Grant Wythoff opened the semester by telling the cohort it was normal not to know things—that the DH world is so interdisciplinary that all scholars often feel that way—it changed what was possible. "This introduction made it a space where it's normal to ask questions, to learn, and to just be openly curious," Cecelia said. "What a gift."


A pedigree chart tracing the lineage of Ignez de Guiné, the matriarch of several prominent Portuguese families.

Amanda Pinheiro (History) has been working with a database developed over the last ten years, containing 115,545 baptismal, notarial, and judicial documents from ten villages in colonial Brazil. These records detail the eighteenth- and nineteenth-century lives of roughly 6,000 individuals who inhabited the south of Brazil and may have migrated through the frontier zone between the Spanish and Portuguese empires. The issue: that number is inflated by duplicate names, with varying spellings and characteristics across documents. Her fellowship project used Splink, a Python library for probabilistic record linkage, to calculate the statistical likelihood that two records refer to the same individual—generating a unique identifier for each person so she could cross-reference this database with her new archival findings.


What surprised her was how much human judgment, or as she describes it, laborious decision-making, the automation required. "I realized that automation necessarily requires diligent and continuous manual labor," Amanda reflected. "The two are interconnected and walk hand-in-hand in the digital realm." With Wouter Haverals’ (Associate Research Scholar, CDH; Perkins Fellow, Humanities Council) guidance, she completed Splink tests and generated an analytical report on the quality of her datasets—one she hopes to publish and add to her metadata in the future.
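The idea behind probabilistic record linkage can be illustrated with a minimal Fellegi-Sunter-style sketch in plain Python. It does not use Splink's API, and it is not Amanda's pipeline: the field names, the m/u probabilities, the prior, and the sample records are all hypothetical values chosen only for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical m/u probabilities (illustrative values, not estimated from data):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "name": {"m": 0.90, "u": 0.05},
    "village": {"m": 0.80, "u": 0.20},
    "birth_year": {"m": 0.85, "u": 0.02},
}

def agrees(field, a, b):
    """Decide whether two field values agree; names are compared fuzzily."""
    if field == "name":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= 0.85
    return a == b

def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over all fields (the match weight)."""
    weight = 0.0
    for field, p in FIELD_PARAMS.items():
        if agrees(field, rec_a[field], rec_b[field]):
            weight += math.log2(p["m"] / p["u"])
        else:
            weight += math.log2((1 - p["m"]) / (1 - p["u"]))
    return weight

def match_probability(weight, prior=0.01):
    """Turn a match weight into a posterior probability, given a prior match rate."""
    odds = (prior / (1 - prior)) * 2 ** weight
    return odds / (1 + odds)

# Hypothetical records: a and b differ only by a spelling variant of the name.
a = {"name": "Joao Pereira", "village": "Viamao", "birth_year": 1752}
b = {"name": "Joam Pereira", "village": "Viamao", "birth_year": 1752}
c = {"name": "Maria da Silva", "village": "Viamao", "birth_year": 1760}

w_ab = match_weight(a, b)  # positive: evidence for the same person
w_ac = match_weight(a, c)  # negative: evidence against
```

Pairs whose probability falls between a "clear match" and a "clear non-match" threshold are the ones that still need the manual decisions described above.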

Amy Weng (English) asked whether a seventeenth-century English preacher's confessional affiliation—Anglican, Nonconformist, or Catholic—leaves a detectable fingerprint in their printed sermons. Her project, Godly and Learned Divines (GoLD), represented each of 809 preachers across 2,877 books as a 1,000-dimensional vector built from scripture citations, named references, topic modeling, and entity types, then trained a Random Forest classifier to predict denomination. The model achieved an F1-score (a machine learning metric used to evaluate the performance of a classification model) of 0.80 for Anglicans, 0.51 for Nonconformists, and 0.12 for Catholics.
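For readers new to the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows the arithmetic; the counts in the usage line are hypothetical, and the unweighted (macro) mean of the three reported per-class scores is computed here only as an illustration, since the article does not report it.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_scores) / len(per_class_scores)

example = f1_score(tp=8, fp=2, fn=2)       # hypothetical counts
reported = macro_f1([0.80, 0.51, 0.12])    # per-class scores quoted in the text
```

A low score for a small class, such as the Catholic preachers here, pulls the macro average down even when the majority class is predicted well.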


Wikidata-Linkable Preachers in EEBO-TCP

Clustered based on the distribution of topics, named entities, and scriptural references in their sermons

The results were striking. Place of education turned out to be the least important feature—far less predictive than the types of sources a preacher reached for. "Bible versions matter more than the proportions of Bible divisions," Amy concluded, "and ancient entities once again outrank medieval and contemporary references. Generally, godly learnedness—patterns of referencing scripture—distinguishes preachers across confessional divides more than overall learnedness."

Amy credited Jacob Murel (Research Software Engineer, Classics) for sustained mentorship on using large language models for orthographic standardization, and Wouter Haverals for introducing her to Wikidata reconciliation.

Pierre Azou (French and Italian) examines the relationship between literature and political violence in his doctoral research and found himself drawn to the “digital sphere” as the space where the questions he studies in published books are being reconfigured. For his fellowship project investigating the link between "manliness" and insecurity in contemporary French public discourse, he turned to two foundational texts in the French debate on masculinity: Élisabeth Badinter's XY, de l'identité masculine (1992) and Éric Zemmour's Le Premier Sexe (2006). These works start from opposing premises, one theorizing a fragile masculinity, the other insisting it is strong but under siege, yet both bind manliness tightly to a language of threat and crisis.

Using keyness analysis and topic modeling in Python, Pierre compared the density of each author's clusters of insecurity-related words (fear, violence, war, crisis, domination). He identified each author's statistically distinctive vocabulary and examined the semantic neighborhoods of shared terms. The most productive approach was contextual: looking at what words appear near a key shared term like virilité in each text. "It turns out the same word lives in completely different semantic environments in Badinter and Zemmour," Pierre noted.
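Keyness is often measured with a log-likelihood (G2) statistic that compares a word's frequency in one text against another. The sketch below is one common way to compute it, not necessarily the measure Pierre used; the token lists in the usage lines are toy data.

```python
import math
from collections import Counter

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood (G2) keyness of a word seen freq_a times in A and freq_b times in B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    if freq_a:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

def keyness(tokens_a, tokens_b, top_n=5):
    """Words relatively more frequent in A than in B, ranked by G2."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    scored = []
    for word, freq in counts_a.items():
        if freq / n_a > counts_b[word] / n_b:
            scored.append((word, log_likelihood(freq, counts_b[word], n_a, n_b)))
    return sorted(scored, key=lambda pair: -pair[1])[:top_n]

# Toy token lists (hypothetical, not drawn from either book).
text_a = ["crise"] * 5 + ["homme"] * 5
text_b = ["homme"] * 5 + ["force"] * 5
distinctive_a = keyness(text_a, text_b)
```

A word used equally often in both texts scores zero, which is why shared terms such as virilité call for the contextual reading described above rather than a frequency comparison.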

A technical challenge gave him pause early on—preprocessing French text that contained English-language citations required combining stopword lists (words like “a,” “the,” “and” or “un,” “le,” “et”) and filtering bibliographic noise—but his more substantive reflection was on what data cleaning actually does. "It reminded me that cleaning decisions in DH are more than purely technical, as they also shape the findings."


Nathaniel Gallant (Comparative Literature) studies the relationship between Buddhism and the history of dramatic and poetic theory across Japanese and Tibetan literary traditions. In his daily research, he relies on well-developed digital tools and databases built for pre-modern Japanese sources—resources that reflect years of philological groundwork by scholars who came before him. For Tibetan studies, that infrastructure is still being built. DH projects in the field are scattered across academic, non-profit, and private spheres, with no centralized view of what exists or where the gaps are.

Nathaniel’s fellowship project addressed that directly: he created a database cataloging existing DH projects in Tibetan studies, with visualizations mapping networks of funding sources, text archives, OCR and LLM development projects, and institutional stakeholders. The goal was to understand current patterns in project development and identify potential directions for future text-digitization projects, particularly in the history of Tibetan literature and poetry.


Each issue of the Yearbook of the Imperial Theaters includes detailed lists (spiski) of creators and artists.

Rachel Glodo (Music) is reconstructing the world of the Imperial Ballet in the Russian Silver Age through eighteen volumes of the Yearbook of the Imperial Theaters (1890–1908), elaborate annual retrospectives documenting productions, performers, choreographers, musicians, designers, and administrators across St. Petersburg and Moscow. The challenge was getting that data off the page and into a form that a researcher could query. Rachel used optical character recognition (OCR) to convert nineteenth-century printed Cyrillic into machine-readable text, while Andy Janco (Digital Scholarship Specialist) developed custom Python scripts, based on her project design, to convert images of lists and tables into structured spreadsheets.

What Rachel hadn't anticipated was how much the project would begin with physical, analog labor. "The most challenging part of my project wasn't the implementation of DH methodologies," she said, "but the quotidian task of scanning and saving thousands of images spanning 18 volumes." She described it as "the strange and beautiful juxtaposition of 'distant' and 'close' readings that characterizes DH." And she was surprised by how much the technology itself shifted between her original proposal and the start of the fellowship—she ended up using an entirely different processing strategy than she had planned, with Christine Roughan (Postdoctoral Research Associate, CDH/MARBAS) and Andy as crucial partners in identifying her priorities and methods.

The eureka moment came when they ran the Python scripts together for the first time. "All the hours of scanning documents, mental grappling, design, and redesign suddenly crystallized into something coherent, beautiful, and, almost miraculously, exactly what I needed," she recalled. "It was a glorious moment."


A record of all productions on the Imperial stages, including ballets and operas.

Throughout the semester, the cohort's monthly sessions became as important as the technical work itself. "The regular meetings provided me with more productive time and space to learn about digital tools than scheduling different consultations could have," Amanda said. For Cecelia, the cross-disciplinary exchange was its own kind of finding: "It's exciting to step outside your discipline and be invited into someone else's world while it's still in the making—while they're still experimenting and puzzling through the challenges."

Interested in applying for a Graduate Fellowship? Visit here, or head to the CDH Graduate Program page to see more opportunities for graduate students.

Related posts

Graduate Fellowships

A one-semester studio for workshopping research in progress.


Announcing the Fall 2026 CDH Graduate Fellows

June 3, 2026, 21:11

We are delighted to announce the recipients of the Fall 2026 CDH Graduate Fellowship. Grad Fellows are mentored by CDH staff to employ computational and data-driven methods in their research. As a cohort, fellows explore tools, methods, and best practices that will benefit them throughout their careers.

Please join us in congratulating this cohort:

Maximilian Diemer (History) is researching the origins of meritocratic practice in eighteenth-century France and Britain, with a focus on professionalisation in the army and royal service.

Aneka Kazlyna (History) examines the transmission and transformation of scientific knowledge between the Islamic world and early modern Europe, with a particular focus on Arabic and Latin astronomical/astrological and mathematical texts.

Shiqi Pan (East Asian Studies) studies the medieval history of the Huai River region in China, exploring it as an internal frontier shaped by environmental, political, and cultural change.

Benjamin Price (Art and Archaeology) studies art, anarchism, and histories of science in late-nineteenth century France.

Sun Shen (Politics) is studying presidential influence in the making of U.S. foreign policy.

Filippo Ugolini (East Asian Studies) examines the discourse of romance in mid-to-late Tang China (8th–9th century) at the intersection of literary analysis, gender studies, and socio-economic history.

Tirzah Anderson (History) is researching Afro-Indigenous worldmaking in Indian Territory and Oklahoma from the 1890s through the 1930s.

We look forward to working with this remarkable cohort and sharing more about their projects and fellowship outcomes as the year unfolds.


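Lecture Announcement | Professor Rania Huntington (韩瑞亚), University of Wisconsin–Madison: Cross-Cultural Teaching of Liaozhai zhiyi in North American Universities

June 4, 2026, 00:00

Xu Hui (徐惠) · 2026-06-04 00:00 · Jiangsu

Lecture at Nanjing Normal University on June 10: Professor Rania Huntington shares her experience of teaching Liaozhai zhiyi across cultures abroad.

Reposted from “南师国教”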

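Topic

Cross-Cultural Teaching of Liaozhai zhiyi (《聊斋志异》) in North American Universities

Speaker

Professor Rania Huntington (韩瑞亚), University of Wisconsin–Madison

Discussants

Han Shi (韩石), Associate Professor, College of International Culture and Education, Nanjing Normal University

Xu Zhenglong (徐正龙), Associate Professor, College of International Culture and Education, Nanjing Normal University

Qian Huizhen (钱慧真), Associate Professor, College of International Culture and Education, Nanjing Normal University

Yang Juan (杨娟), Associate Professor, College of International Culture and Education, Nanjing Normal University

Moderator

Sun Xiaosu (孙晓苏), Associate Professor, College of International Culture and Education, Nanjing Normal University

Time

Wednesday, June 10, 2026, 3:00 p.m.

Venue

Room 108, Building 200, Suiyuan Campus, Nanjing Normal University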

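Abstract:

Liaozhai zhiyi (《聊斋志异》, Strange Tales from a Chinese Studio) consists of short pieces with novel and varied content, which makes it well suited to language and culture teaching in cross-cultural settings. Drawing on more than thirty years of teaching experience in North American universities, this lecture discusses how to integrate Liaozhai zhiyi and the Chinese zhiguai (accounts of the strange) tradition into courses such as “Elementary Classical Chinese”, “Selected Readings in Classical Literature”, “Gods, Ghosts, and Monsters of Asia”, and “Asian Speculative Fiction”. It focuses on practical strategies at two levels: first, how to connect a scholar's own research on zhiguai with broader teaching areas; second, how to guide international students from different linguistic, cultural, and disciplinary backgrounds in reading and understanding Liaozhai zhiyi and other zhiguai fiction. The lecture aims to offer teachers and students in international Chinese language education, comparative literature, and translation studies workable ideas for course design and models of cross-cultural teaching.

About the speaker

Rania Huntington (韩瑞亚) is Professor of Chinese Literature in the Department of Asian Languages and Cultures at the University of Wisconsin–Madison and an adjunct professor at the College of Chinese Language and Literature, Wuhan University. She received her PhD from the Department of East Asian Languages and Civilizations at Harvard University and has also studied at Nanjing University and Nankai University. Her research field is Ming and Qing fiction, with particular interest in zhiguai literature, literature and memory, and literature and geography. Her major works include Alien Kind: Foxes and Late Imperial Chinese Narrative (Harvard University Asia Center, 2004; Chinese translation 《异类:狐狸与中华帝国晚期的叙事》, 中西书局, 2019) and Ink and Tears: Memory, Mourning, and Writing in the Yu Family (《墨与泪:俞氏家族的记忆、哀悼与书写》, University of Hawaii Press, 2021).

About the discussants

Han Shi (韩石) is an associate professor at the College of International Culture and Education, Nanjing Normal University. Han mainly teaches classical Chinese literature, readings in cultural classics, and Chinese language courses for international students in China, and specializes in classical Chinese literature.

Xu Zhenglong (徐正龙), PhD in literature, is an associate professor at the College of International Culture and Education, Nanjing Normal University, working in international Chinese language education. Xu has promoted Chinese language teaching and run teacher training in the United States, Indonesia, and other countries, was chief editor of 《老外在中国》 and 《问鼎HSK》, and contributed to 《中国历史常识》, 《菲律宾华语课本》, and other titles.

Qian Huizhen (钱慧真) is an associate professor and master's supervisor at the College of International Culture and Education, Nanjing Normal University. Qian's main research areas are the overseas spread of Chinese, language contact, and the history of Ming-Qing exegetics. Qian has led one National Social Science Fund project, one Ministry of Education social science project, and two Jiangsu Provincial Social Science Fund projects; has published four monographs, including 《惠栋训诂研究》 and 《<荷谷朝天记>校注》; and has published more than 20 articles in journals such as 《古汉语研究》, 《语言研究》, and 《古籍整理研究学刊》.

Yang Juan (杨娟), PhD in literature, is an associate professor at the College of International Culture and Education, Nanjing Normal University, and a former Chinese director of the Confucius Institute at the Universidad de Congreso, Argentina. Yang's research covers international Chinese language education and the transmission of Chinese language and culture overseas. Yang has led one Ministry of Education social science project, one provincial social science project, and one department-level project, and has published one monograph and more than 10 articles.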

Text and images | 国教院 (College of International Culture and Education)

Layout | Zhao Zimeng (赵梓萌)

Review | Sun Xumin (孙绪敏), Sun Daogong (孙道功)

比特人文

Submissions: dhbase@126.com

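A Fuzzy Coreference Resolution Method for Party History Documents Integrating Semantic Understanding and Knowledge Graph Reasoning

June 3, 2026, 12:43

Original · Ran Lingyu (冉凌宇) · 2026-06-03 12:43 · Beijing

Fuzzy coreference resolution; Party history documents; temporal knowledge graph; graph neural network; pre-trained language model

When reposting, please note “published in 《数字人文研究》 (Digital Humanities Research), 2025, No. 4”. Reference format: 冉凌宇.融合语义理解与图谱推理的党史文献模糊指代消解方法[J].数字人文研究,2025,5(4):84-98. The full-text PDF is available on CNKI, Wanfang, and the editorial office website (http://dhr.ruc.edu.cn); notes and references are omitted here.

A Fuzzy Coreference Resolution Method for Party History Documents Integrating Semantic Understanding and Knowledge Graph Reasoning

Ran Lingyu (冉凌宇)

Abstract: Party history documents make extensive use of aliases and substitute designations and contain complex implicit associations, which poses serious challenges for their intelligent processing. This study proposes a fuzzy coreference resolution method that combines multi-strategy semantic understanding with dynamic knowledge graph reasoning to address three problems in this domain: the semantic gap, temporal evolution, and evidence sparsity. The method builds a Party history domain dictionary covering more than ten thousand entities, together with an alias-to-real-name mapping base, to inject prior knowledge. It fine-tunes a pre-trained language model with a negative sampling strategy guided by the domain dictionary, strengthening the model's sensitivity to domain-specific expressions. Finally, on a self-built temporal knowledge graph, it applies a time-constrained graph neural network reasoning algorithm to mine implicit associations and check consistency. Experiments show that the method reaches a combined F1 of 80.6% on standard evaluation metrics, significantly outperforming existing baseline models, and that it can uncover deeper historical associations. The results have been integrated into a visual prototype system, providing a reliable intelligent tool for Party history research.

Keywords: fuzzy coreference resolution; Party history documents; temporal knowledge graph; graph neural network; pre-trained language model

About the author: Ran Lingyu (冉凌宇) is a lecturer at the School of Marxism, Chongqing University of Posts and Telecommunications. Email: jadecrane@139.com.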

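0 Introduction

Party history documents are a precious record of the Communist Party of China's century of struggle. Their distinctive textual features, including a high density of aliases, intricate substitute designations, and networks of implicit relationships hidden between the lines, give them great historical value but also pose serious challenges for present-day collation and research. General Secretary Xi Jinping's call to “make good use of red resources, carry forward the red genes, and pass the red country on from generation to generation” underscores the importance and urgency of systematically collating and accurately interpreting these documents. Ambiguous reference is common in Chinese document processing generally, but compared with ordinary Chinese texts, the problem in Party history documents has marked domain-specific and historical complexity, chiefly in the following respects.

First, alias use is systematic and political. Party figures often used multiple aliases because of underground work, political campaigns, and similar circumstances, and these aliases carry a definite historical and political context. For example, “伍豪” (Wu Hao) refers specifically to Zhou Enlai during the Great Revolution period; such mappings change over time and are not simple synonym substitutions. Second, substitute designations are highly context-dependent. The entities behind designations such as “井冈山部” (the Jinggangshan unit) and “中央区” (the Central District) change with the historical period, and they can only be interpreted accurately in light of the specific time, place, and organizational background, in sharp contrast with the relatively stable naming conventions of modern documents. Third, implicit associations follow a historical logic. Associations in Party history documents are often stated obliquely and must be inferred from external knowledge such as the chronology of historical events and the course of organizational change, which places higher demands on a model's temporal reasoning and multi-source knowledge fusion. Finally, the language is period-bound. The documents make heavy use of terms, abbreviations, and metaphors from particular historical periods (such as “教条宗派”, the “dogmatist faction”), whose meanings have drifted from modern Chinese, making semantic understanding harder.

Take the experience of the Yan'an University Library: to collate the early official newspapers of the CPC Central Committee, it devoted substantial staff time over many years to deciphering, proofreading, and verification; progress was slow and omissions were hard to avoid. Identifying dense designations and implicit relationships depends heavily on expert experience, which is difficult to scale and further constrains deeper knowledge mining and use. How to overcome the efficiency limits of manual processing and achieve intelligent, accurate processing of Party history documents has therefore become an important and urgent scholarly and engineering problem. This matters greatly for deepening Party history study and carrying forward the red genes, and it is a key foundation for advancing Party history research and document development in the new era.

In sum, the core technical challenges of intelligent processing of Party history documents center on two key problems: fuzzy coreference resolution and implicit association mining. Fuzzy coreference resolution aims to identify the varied expressions in a text that point to the same entity and group them correctly; a typical case is the unified identification of a figure's aliases, substitute designations, and different forms of address. Implicit association mining seeks to discover deeper relationships that are not stated explicitly but can be obtained by logical inference, such as inferring organizational affiliation or historical influence from behavior patterns or overlaps in time and place.

Solving these two problems requires overcoming three technical difficulties. The first is semantic complexity: the language of the documents is highly period-specific, so a model must deeply integrate historical background and domain knowledge. The second is temporal dynamics: relationships among people and the influence of events change as history unfolds, so a model needs time-aware reasoning. The third is evidence sparsity: key clues are often scattered across many isolated records, so the system needs multi-hop reasoning and joint analysis while also coping with the incompleteness and contradictions of the historical record itself. The common root of these difficulties is that domain knowledge is hard to represent and inject effectively. High-quality domain dictionaries and entity mapping bases must be built, and the semantic mismatch between symbolic knowledge and vector representations must be resolved; this remains a key bottleneck that current techniques have not fully overcome.

To address these core challenges, this study proposes a multi-level solution that combines domain knowledge, pre-trained language models, and temporal knowledge graph reasoning. Its core innovations are a domain-adapted fine-tuning strategy and a time-constrained graph reasoning algorithm, designed to tackle systematically the distinctive challenges of fuzzy coreference resolution in Party history documents.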

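1 Related Work and Theoretical Foundations

This section reviews the core technical approaches to fuzzy coreference resolution and their applications in specific domains, assesses the contributions and limitations of existing work, and clarifies what is distinctive about the present research problem and how it is approached.

1.1 The Technical Evolution of Coreference Resolution: From Rules to Deep Semantic Understanding

Research on coreference resolution has undergone a marked methodological evolution. Early rule-based methods (such as the Hobbs algorithm) relied on syntactic rules written by linguists; they were interpretable but adapted poorly to linguistic variation and large-scale text. With the development of machine learning, statistical methods (such as decision trees and conditional random fields) learned regularities from annotated data and improved processing capability, but they depended heavily on manual feature engineering and generalized poorly in annotation-sparse domains such as historical documents. The current mainstream consists of methods based on deep learning and pre-trained models. Pre-trained language models such as BERT and GPT acquire deep contextual semantic representations by training on large corpora and have achieved breakthroughs on a number of general coreference resolution tasks.

These general models nevertheless face serious challenges on Party history documents: the high density of aliases, substitute designations, and period-specific expressions in historical texts creates a pronounced “semantic gap”, and general models lack awareness of domain-specific knowledge. In recent years, the research frontier has moved further toward incorporating external knowledge and handling complex referential phenomena such as bridging anaphora and discourse deixis. Research tailored to the characteristics of Chinese has also attracted attention, particularly on “zero anaphora” (reference with no overt linguistic form), for which an annotation framework and taxonomy based on rhetorical-syntactic trees have been proposed, offering a new perspective on Chinese discourse understanding. These advances provide important points of reference for designing a deep model that incorporates domain knowledge.

1.2 Knowledge Graph Reasoning: From Static Association to Temporal Dynamic Mining

Knowledge graph reasoning aims to complete missing facts. Early representation learning models such as TransE and ComplEx model entity relations through operations in vector space, but they mainly handle static binary relations. For the dynamically evolving relations in Party history documents, such methods fall short.

Recent advances in graph neural networks (GNNs) offer new tools for relational reasoning: a GNN learns node representations by aggregating neighborhood information through message passing. Temporal knowledge graph reasoning, which models how entity relations change over time, is currently an active topic. Some studies embed time information into the graph structure or into the GNN message-passing process to capture this dynamic evolution. Existing methods, however, mostly assume large, well-structured graphs, whereas graphs built from Party history documents often suffer from sparse entities, vague relations, and a lack of precise time annotations, so applying advanced models directly yields limited results.

1.3 Intelligent Processing of Red Documents: General Methods and Domain Limitations

Party history documents are a core component of “red documents”, and central and local institutions have carried out large-scale work on collating and publishing them and building thematic databases. Research on red documents is now moving from basic source collation toward deeper theoretical interpretation, which creates a pressing need for intelligent, fine-grained processing. Automatically and accurately extracting relationships among people, threads of events, and intellectual connections from massive document collections has become a key technical bottleneck for realizing the value of red documents and supporting the development of the discipline. For the special domain of CPC history documents, however, intelligent processing research is still in its infancy. Existing studies either emphasize general techniques while neglecting the domain-specific features of Party history documents, such as aliases, substitute designations, and organizational succession, or concentrate on archival digitization and database construction while leaving deep semantic understanding and implicit association reasoning underexplored. In particular, there is no end-to-end solution that simultaneously overcomes the semantic gap, temporal evolution, and evidence sparsity.

In sum, current research leaves the following gaps. First, insufficient integration of techniques: there is no end-to-end framework that tightly couples deep semantic understanding (in particular, domain-adapted fine-tuning of pre-trained models) with dynamic temporal knowledge graph reasoning. Second, inadequate domain adaptation: existing coreference models do not effectively inject and exploit the large body of prior knowledge in the Party history domain (such as alias-to-real-name mappings and organizational succession) to close the semantic gap. Third, weak temporal modeling: in the reasoning stage, most methods do not build time constraints into the model as a core mechanism, and so cope poorly with the dynamic evolution of entity relations in Party history. This study therefore proposes a multi-level solution that combines domain knowledge, pre-trained model fine-tuning, and temporal graph reasoning.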

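2 Building a Multi-Strategy Fused Model for Fuzzy Coreference Resolution

The proposed solution adopts a layered, progressive architecture. It first builds a dedicated Party history dictionary and rule base to inject prior knowledge into the model, addressing the domain specificity of terms and designations. It then applies a domain-adapted fine-tuning strategy to a language model pre-trained on large corpora, strengthening its capture of semantic representations and contextual dependencies in Party history texts. Finally, it places the recognized entities and relations in a dynamic knowledge graph with time attributes and applies a time-constrained graph reasoning algorithm to mine implicit relations and check consistency across passages and documents.

The framework's core innovations are twofold. The first is the domain-adapted fine-tuning strategy: during fine-tuning, a negative-sample construction method guided by the Party history dictionary and an entity-aware masking mechanism are introduced, so that the model does not rely on general semantic representations alone but attends explicitly to in-domain entities, aliases, and typical expression patterns, narrowing the semantic gap between general models and Party history documents. The second is the time-constrained graph reasoning algorithm, which embeds time information as a “first-class citizen” in the message passing and representation learning of the graph neural network, so that relational inference strictly respects the order in which historical events occurred and their periods of validity. For example, only relations that could have existed within a given time window are admitted into a reasoning path. This overcomes the associative ambiguity and broken chains of evidence caused by temporal evolution and provides a more reliable and interpretable computational framework for discovering implicit relations.

2.1 Overall Technical Framework

The overall technical framework is an end-to-end pipeline (Figure 1). It starts from raw Party history text, passes through a series of progressively deeper computational modules, and outputs resolved entity references and mined implicit historical associations.

Figure 1. Technical roadmap of the multi-strategy fused fuzzy coreference resolution model

The pipeline begins with the text preprocessing and domain dictionary matching module (Module 1 in Figure 1). This module uses rule- and dictionary-based methods to clean and structure the raw text, including basic natural language processing operations such as sentence splitting, word segmentation, and part-of-speech tagging. More importantly, it performs fast pattern matching against the pre-built Party history entity dictionary and alias-to-real-name mapping base. For example, when strings such as “伍豪” (Wu Hao) or “周翔宇” (Zhou Xiangyu) appear in the text, the system can immediately map them to the entity “周恩来” (Zhou Enlai) and attach the corresponding label. This dictionary-based matching gives the downstream deep semantic model a strong domain prior and also markedly reduces its computational load and the degree of ambiguity. Efficient keyword matching techniques long proven in information retrieval, such as the Aho-Corasick automaton and other multi-pattern matching algorithms, show that this kind of matching can provide a reliable preprocessing basis for later, more complex models.

The preliminarily annotated text then enters the fine-grained semantic model module (Module 2 in Figure 1), where a pre-trained language model fine-tuned for the domain, such as RoBERTa or ERNIE, computes deep semantic representations and performs coreference resolution. Specifically, the model receives the text sequence with domain labels, captures context-sensitive token representations through its multi-layer Transformer structure, and pays particular attention to potential referring expressions and ambiguous phrasings not covered by the dictionary. Fine-tuning uses a negative sampling strategy guided by the domain dictionary, for example deliberately constructing samples that confuse aliases and real names, to sharpen the model's discrimination of Party-history-specific expressions. Its core scoring function can be expressed formally as a likelihood estimate over candidate coreference chains: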

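$$P(y \mid x) = \mathrm{softmax}\left(W h_{[\mathrm{CLS}]} + b\right)$$

Here $P(y \mid x)$ is the probability distribution over predicted coreference chains $y$ given the input text $x$; $W$ is the weight matrix of the linear transformation layer; $h_{[\mathrm{CLS}]}$ is the contextual representation vector of the special token output by the pre-trained model, which captures the semantics of the whole input sequence; and $b$ is the bias vector. The model is optimized by maximizing the likelihood of the correct coreference chain. This design lets the model use the general language understanding gained in pre-training while adapting, through domain fine-tuning, to the particular semantic environment of Party history documents.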

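Finally, the entities and relations produced by the semantic model are passed to the temporal knowledge graph construction and reasoning module (Module 3 in Figure 1). Here the system dynamically builds a knowledge graph with time attributes from the entity recognition and relation extraction results, in which every fact triple is assigned a timestamp or time-range label. The time-constrained graph neural network reasoning algorithm then runs, aggregating neighborhood information along the time axis through message passing. For example, when inferring the likely relationship network of a historical figure in a given period, the algorithm automatically filters out edges that fall outside that time window, so that the inference is both semantically sound and temporally consistent. Its graph convolution operation can be written as: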

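$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}\, W^{(l)} h_j^{(l)}\Big)$$

Here $h_i^{(l+1)}$ is the feature representation of node $i$ at layer $l+1$; $\sigma$ is a nonlinear activation function such as ReLU; $N(i)$ is the set of neighbors of node $i$ under the temporal constraint; $\alpha_{ij}$ is the attention weight between nodes $i$ and $j$, whose computation incorporates a temporal consistency constraint so that only temporally plausible neighbors are considered; $W^{(l)}$ is the learnable weight matrix of layer $l$; and $h_j^{(l)}$ is the feature representation of neighbor $j$ at layer $l$. This time-aware graph reasoning mechanism can overcome the sparse temporal evidence and dynamically evolving relations common in Party history documents, providing a reliable computational framework for discovering implicit historical associations.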

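2.2 Importing Domain Knowledge

The core of domain knowledge import is building a high-quality, high-coverage Party history dictionary and rule base, the foundation that lets the whole system accurately identify aliases, substitute designations, and implicit relations. Construction begins with a systematic review of authoritative Party history sources, including but not limited to 《中国共产党历史大事记》, 《中共党史人物传记》, and the series published by the Central Party Literature Press (中央文献出版社). Through close manual reading combined with expert verification, standard entity names, historical event names, major institution names, and place names are extracted to form a base entity dictionary. Each entity is given a unique identifier and linked to attribute information such as year of birth, changes of post, and dates of major activities. For example, alias-to-real-name correspondences such as “毛润之” (Mao Runzhi) to “毛泽东” (Mao Zedong) and “李德胜” (Li Desheng) to “毛泽东” are systematically collected from 《中共党史人物别名录》. In addition, chains of historical name changes for organizations are built from 《中国共产党组织史资料》, such as the affiliations and changing functions of the “中共中央北方局” (Northern Bureau of the CPC Central Committee) and the “中共中央华北局” (North China Bureau of the CPC Central Committee) in different periods.

Building the alias-to-real-name mapping base requires more than a static lookup table; it must take full account of when and where aliases were used in their historical context. Each mapping therefore carries a validity-period field and a note on the context of use. For example, “伍豪” (Wu Hao) was used as an alias of Zhou Enlai mainly from the Great Revolution period to the early Yan'an period, whereas “胡公” (Hu Gong) was his usual designation during underground work in Shanghai. Such time- and place-constrained mappings greatly improve the accuracy of the downstream reasoning module. The mapping base is built through a semi-automated process: an initial mapping table is drawn up from existing sources, and an algorithm then performs match validation and conflict detection across a large body of Party history documents. When an alias is found that may point to different people in different periods, it is automatically flagged as a conflict requiring manual review. The validation step can be expressed formally as: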

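$$\mathrm{Verify}(\mathit{alias}, \mathit{entity}, t) = \begin{cases} 1, & \exists\,(\mathit{alias}, \mathit{entity}, t_{start}, t_{end}) \in K \ \text{such that}\ t_{start} \le t \le t_{end} \\ 0, & \text{otherwise} \end{cases}$$

Here Verify is the validation function, whose output is 1 or 0 for pass or fail; alias is the alias string to be validated; entity is the standard entity name to be validated; $t$ is the time context of the current document, expressed as a numeric year; $K$ is the alias-to-real-name knowledge base, a collection of mapping records; ∃ is the existential quantifier, meaning that at least one record satisfies the condition; and $t_{start}$ and $t_{end}$ are the start and end times of alias use in a mapping record, which together define the alias's valid period. The formula states that validation passes only if the knowledge base contains a matching mapping record and the current time context $t$ falls within the valid period defined by that record.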
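A minimal Python sketch of the Verify check is given below. The `AliasRecord` type, the `verify` function name, and the sample knowledge base are assumptions made for illustration; in particular, the year ranges are placeholders and should not be read as historical claims.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AliasRecord:
    alias: str
    entity: str
    t_start: int  # first year in which the alias is considered valid
    t_end: int    # last year in which the alias is considered valid

# Hypothetical knowledge base K; year ranges are illustrative placeholders.
K = [
    AliasRecord("伍豪", "周恩来", 1927, 1937),
    AliasRecord("李德胜", "毛泽东", 1947, 1949),
]

def verify(alias, entity, t, kb):
    """Return 1 if some record maps alias to entity and covers year t, else 0."""
    return int(any(
        r.alias == alias and r.entity == entity and r.t_start <= t <= r.t_end
        for r in kb
    ))

def candidates(alias, t, kb):
    """All entities the alias may refer to in year t; more than one signals a conflict."""
    return sorted({r.entity for r in kb if r.alias == alias and r.t_start <= t <= r.t_end})
```

An alias that yields more than one candidate for the same year would be the kind of case flagged for manual review.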

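For the typical reference patterns that recur in Party history documents, a multi-level matching rule base is also needed. These rules include simple string matches, such as “陕北” (northern Shaanxi) standing for “陕甘宁边区” (the Shaanxi-Gansu-Ningxia Border Region), as well as context-based inference rules, for instance that “红一方面军” (the First Front Red Army) and “中央红军” (the Central Red Army) should be treated as the same entity when they appear in a context around 1935. The rule base is written in a declarative syntax for ease of maintenance and extension. Each rule consists of a trigger pattern, a constraint, and a mapping action. A typical place-designation rule, for example, can be written as: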

PATTERN: ["陕北", "陕甘宁"]

CONSTRAINT: year >= 1937 AND year <= 1947

ACTION: MAP_TO("陕甘宁边区")

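In this rule, PATTERN is the pattern-matching keyword; the list ["陕北", "陕甘宁"] that follows gives the text patterns to match, so the rule may fire when either word appears in the text. CONSTRAINT is the constraint keyword; the logical expression year >= 1937 AND year <= 1947 means the rule is activated only when the document's date falls between 1937 and 1947, where year is a system variable holding the time information extracted from document metadata or content. ACTION is the action keyword; MAP_TO("陕甘宁边区") means that when the pattern matches and the constraint holds, the system maps the matched text to the standard entity “陕甘宁边区”.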
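One way to evaluate such a rule is sketched below in Python. The `Rule` class and `apply_rules` function are hypothetical names introduced for illustration; the article does not describe how its rule engine is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Rule:
    patterns: List[str]               # PATTERN: strings that may trigger the rule
    constraint: Callable[[int], bool] # CONSTRAINT: predicate over the document year
    target: str                       # ACTION: standard entity passed to MAP_TO

# The place-designation rule from the text, expressed as data.
RULES = [
    Rule(["陕北", "陕甘宁"], lambda year: 1937 <= year <= 1947, "陕甘宁边区"),
]

def apply_rules(text: str, year: int, rules: List[Rule] = RULES) -> List[Tuple[str, str]]:
    """Return (matched_text, standard_entity) pairs for every rule that fires."""
    hits = []
    for rule in rules:
        if not rule.constraint(year):
            continue  # outside the rule's valid period
        for pattern in rule.patterns:
            if pattern in text:
                hits.append((pattern, rule.target))
    return hits
```

Keeping the constraint as a separate predicate mirrors the three-part structure of the declarative rule, so new rules can be added as data without changing the matching loop.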

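Together, the rule base and the dictionary are the core carriers of domain knowledge. Embedding them in the preprocessing and semantic computation modules markedly improves the system's accuracy on complex references in Party history documents, supplies the downstream deep learning model with a strong domain prior, and makes the whole system more interpretable, a reliability guarantee that is indispensable in processing historical documents.

2.3 Semantic Resolution Module Based on a Pre-trained Model

The semantic resolution module must first convert unstructured Party history text into a normalized input format that the model can process. Its preprocessing builds on the domain dictionary annotations output by the preceding module. Specifically, each text passage is converted into a richly annotated sequence that includes the original tokens, part-of-speech tags, entity type labels, and time annotations extracted from document metadata. The handling of time information is especially important: we apply a uniform time normalization that converts all date expressions into a standard timestamp format and automatically generates a time context vector for each document passage: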

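$$T = [\,t_{\min},\ t_{\max}\,]$$

Here $T$ is the time context vector, $t_{\min}$ is the earliest possible time of the events described in the passage, and $t_{\max}$ is the latest possible time; the square brackets denote a vector, in this case a two-dimensional vector with two time elements. For annotation, mentions are tagged with the BIO scheme, extended with a time dimension: each entity mention is labeled with both its type and its time attribute. For example, “伍豪(1932)” indicates that the mention occurs in a 1932 context. This design lets the model explicitly learn how reference patterns change over time.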

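For the base pre-trained model we chose RoBERTa rather than the original BERT, mainly because RoBERTa outperforms BERT on many natural language understanding tasks thanks to its improved training strategy, which removes the Next Sentence Prediction task and trains with larger batches for longer. More importantly, RoBERTa's dynamic masking means that the model sees the same text under different masking patterns across training epochs, which is particularly suited to settings like Party history documents where training data are relatively scarce, and it helps the model generalize. In addition, the effectiveness of RoBERTa pre-training on large-scale Chinese corpora has been confirmed by a number of studies.

To meet the particular needs of coreference resolution in Party history documents, we designed a domain-adaptive fine-tuning strategy. At its core is a multi-task learning framework that jointly optimizes two related tasks: coreference chain prediction and temporal consistency verification. The loss is a weighted multi-task loss: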

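$$\mathcal{L} = \mathcal{L}_{coref} + \lambda\, \mathcal{L}_{temp}$$

Here $\mathcal{L}_{coref}$ is the main loss for coreference resolution, a standard cross-entropy loss; $\mathcal{L}_{temp}$ is the auxiliary loss for temporal consistency verification, a contrastive loss; and $\lambda$ is a hyperparameter that balances the two tasks. The coreference loss is defined as: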

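$$\mathcal{L}_{coref} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\,\log p_{ij}$$

Here $N$ is the number of training samples; $C$ is the number of candidate coreference chain classes; $y_{ij}$ is an indicator that equals 1 when sample $i$ belongs to class $j$ and 0 otherwise; and $p_{ij}$ is the model's predicted probability that sample $i$ belongs to class $j$.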
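The arithmetic of the weighted multi-task loss can be checked with a few lines of plain Python. This is only a numerical sketch: the function names and the default value of `lam` are assumptions, the temporal loss is passed in as a precomputed number rather than implemented, and a real training setup would use a deep learning framework.

```python
import math

def cross_entropy(labels, probs):
    """Mean cross-entropy; labels are class indices, probs are per-class probabilities."""
    n = len(labels)
    return -sum(math.log(probs[i][labels[i]]) for i in range(n)) / n

def total_loss(l_coref, l_temp, lam=0.5):
    """Weighted sum of the main loss and the auxiliary temporal loss (lam is hypothetical)."""
    return l_coref + lam * l_temp

# Two samples, two candidate classes, uniform predictions.
l_coref = cross_entropy([0, 1], [[0.5, 0.5], [0.5, 0.5]])
loss = total_loss(l_coref, l_temp=0.2, lam=0.5)
```

With uniform predictions over two classes the cross-entropy equals ln 2, which gives a quick sanity check on the implementation.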

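For constructing positive and negative samples, we use guided sampling based on the domain dictionary. Positive samples include well-established alias correspondences for historical figures, such as “伍豪” (Wu Hao) and “周恩来” (Zhou Enlai). Negative samples include deliberately constructed temporal-conflict samples, such as a reference to the “总书记” (General Secretary) in a text dated 1920, when the Communist Party of China had not yet been founded, and confusable samples that are semantically similar but actually distinct, such as the easily confused organization names “中央局” (Central Bureau) and “中央分局” (Central Sub-bureau). This way of constructing samples ensures that the model learns not only semantic similarity but also temporal constraints and fine-grained semantic differences, improving both the accuracy and the robustness of coreference resolution on Party history documents.

2.4 Temporal Knowledge Graph Construction and Reasoning Module

The core of this module is a knowledge representation and reasoning framework that can capture dynamic historical change. Its graph schema uses an extended temporal quadruple rather than the traditional triple: each fact is represented as (head entity, relation, tail entity, timestamp), where the timestamp may be either a specific point in time or a time interval. This lets the graph accurately record time-bound facts such as “Mao Zedong served as Chairman of the CPC Central Military Commission from 1935 to 1943”. For storage, we use a temporal graph database in which every entity and relation carries a valid-time attribute, together with a dedicated time index that supports efficient time-range queries. The system can thus quickly retrieve the subgraph for a given period, providing the data basis for subsequent temporal reasoning.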
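The quadruple structure and the time-range query can be sketched with an in-memory list in Python. The `TemporalFact` type and `facts_in_window` function are hypothetical names; the article's system uses a temporal graph database with a time index, which this sketch does not attempt to reproduce. The second fact is an invented placeholder.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TemporalFact:
    head: str
    relation: str
    tail: str
    start: int  # first valid year
    end: int    # last valid year; equal to start for a point in time

def facts_in_window(facts: List[TemporalFact], t_from: int, t_to: int) -> List[TemporalFact]:
    """Return the facts whose validity interval overlaps [t_from, t_to]."""
    return [f for f in facts if f.start <= t_to and f.end >= t_from]

facts = [
    TemporalFact("毛泽东", "担任", "中共中央军委主席", 1935, 1943),  # example from the text
    TemporalFact("人物A", "任职于", "机构B", 1928, 1930),            # hypothetical placeholder
]

subgraph_1940 = facts_in_window(facts, 1940, 1941)
```

Representing a point in time as an interval with equal endpoints lets one overlap test serve both kinds of timestamp.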

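Implicit association reasoning uses a temporal graph neural network model (T-GNN) that we designed for this purpose. It adds a time-aware message-passing mechanism to a conventional graph neural network. The core idea is to impose a time constraint during information aggregation, so that only temporally plausible neighbors take part in message passing. Specifically, the representation update for each entity node $i$ at time $t$ can be written as: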

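$$h_i^{(l+1)}(t) = \sigma\Big(W_{self}^{(l)}\, h_i^{(l)}(t) + \sum_{j \in N(i)} \alpha_{ij}(t)\, W_{neigh}^{(l)}\, h_j^{(l)}(t)\Big)$$

Here $h_i^{(l)}(t)$ is the feature representation vector of entity $i$ at layer $l$ of the network and time $t$, where $l$ is the layer depth; $\sigma$ is a nonlinear activation function, such as ReLU or Sigmoid, which gives the model its nonlinear expressive power; $W_{self}^{(l)}$ and $W_{neigh}^{(l)}$ are the learnable weight matrices of layer $l$ used, respectively, to update the node's own state and to aggregate neighbor states; $N(i)$ is the set of neighbors of entity $i$; $\alpha_{ij}(t)$ is a computed time-aware attention weight that measures the importance of neighbor $j$ to node $i$ at time $t$, with a value between 0 and 1; and $h_j^{(l)}(t)$ is the feature representation of neighbor $j$ at layer $l$ and time $t$.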

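$\alpha_{ij}(t)$ is the time-aware attention weight, computed as:

$$\alpha_{ij}(t) = \frac{\exp\Big(\mathrm{LeakyReLU}\big(a^{\top}\big[\,W h_i^{(l)}(t) \,\Vert\, W h_j^{(l)}(t) \,\Vert\, W_t \tau_t\,\big]\big)\Big)}{\sum_{k \in N(i)} \exp\Big(\mathrm{LeakyReLU}\big(a^{\top}\big[\,W h_i^{(l)}(t) \,\Vert\, W h_k^{(l)}(t) \,\Vert\, W_t \tau_t\,\big]\big)\Big)}$$

Here $\tau_t$ is the time feature vector representing the time context; $W$ and $W_t$ are learnable parameter matrices that map node features and time into the same vector space; $a$ is the parameter vector of the attention mechanism, used to compute the attention energy; $\Vert$ denotes vector concatenation, which combines features from different sources; LeakyReLU is a modified activation function that lets a small gradient pass for negative values, which helps mitigate vanishing gradients; and $k$ is the summation index, ranging over all neighbors of node $i$.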

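In the link prediction task we use a time-constrained scoring function. For a candidate triple $(h, r, t)$ at time $\tau$, the existence score is:

$$f(h, r, t, \tau) = -\left\lVert M_r h_\tau + r_\tau - M_r t_\tau \right\rVert_2^2$$

Here $h_\tau$, $t_\tau$, and $r_\tau$ are the representation vectors of the head entity, tail entity, and relation at time $\tau$. These vectors are computed by a forward pass of the T-GNN model on the given time slice. $M_r$ is a learnable transformation matrix associated with relation $r$, and $\lVert \cdot \rVert_2^2$ is the squared L2 norm, which measures the Euclidean distance between vectors. The overall reasoning procedure is as follows.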

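The algorithm first extracts from the temporal knowledge graph the subgraph within the time window, ensuring that the relevant time context is covered. It then uses the T-GNN model to compute representations of all entities at time $\tau$. Finally, it computes the time-constrained link prediction score for each candidate entity and returns the most likely result. This approach supports accurate temporal link prediction and can also uncover implicit relations such as “which organizations a person may have been associated with in a given period”, giving Party history research deeper analytical support.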
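The three steps described above (time-window subgraph, entity representations, candidate scoring) can be sketched in plain Python as follows. This is an illustrative sketch, not the author's pseudocode: one round of mean aggregation stands in for the T-GNN, the relation matrix is taken to be the identity, and all names, vectors, and facts are toy assumptions.

```python
def subgraph(facts, tau, window=1):
    """Keep facts (head, rel, tail, start, end) valid within [tau - window, tau + window]."""
    return [f for f in facts if f[3] <= tau + window and f[4] >= tau - window]

def aggregate(embeddings, edges):
    """One round of mean aggregation over time-valid neighbors (stand-in for T-GNN)."""
    neighbors = {entity: [] for entity in embeddings}
    for head, _, tail, _, _ in edges:
        neighbors[head].append(tail)
        neighbors[tail].append(head)
    updated = {}
    for entity, vec in embeddings.items():
        vectors = [vec] + [embeddings[n] for n in neighbors[entity]]
        updated[entity] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return updated

def score(h_vec, r_vec, t_vec):
    """Negative squared Euclidean distance between h + r and t (identity M_r assumed)."""
    return -sum((h + r - t) ** 2 for h, r, t in zip(h_vec, r_vec, t_vec))

def predict_tail(head, relation, tau, facts, embeddings, rel_vecs, candidates, window=1):
    """Return the (candidate, score) pair with the highest time-constrained score."""
    edges = subgraph(facts, tau, window)
    reps = aggregate(embeddings, edges)
    scored = [(c, score(reps[head], rel_vecs[relation], reps[c])) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

# Toy data: the only known edge lies outside the 1935 window, so it is ignored.
embeddings = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
rel_vecs = {"r": [1.0, 0.0]}
old_facts = [("A", "r", "C", 1920, 1921)]
best = predict_tail("A", "r", 1935, old_facts, embeddings, rel_vecs, ["B", "C"])
```

Filtering edges before aggregation is what makes the prediction time-constrained: an edge valid only in 1920 and 1921 contributes nothing to a query about 1935.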

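3 Experimental Analysis, Case Studies, and System Application

3.1 Experimental Setup and Baseline Models

To evaluate the proposed method systematically, we built a dataset specifically for fuzzy coreference resolution in Party history documents. It draws on volumes 1 and 2 of 《中国共产党历史》 published by the Central Party Literature Press, 《毛泽东年谱》, and 100 pre-1949 internal Party documents selected from the Central Archives. Three Party history experts from the School of Marxism at Chongqing University of Posts and Telecommunications annotated these documents in detail over six months. They marked all entity mentions and their coreference chains, and additionally annotated the spatio-temporal context of each mention as well as implicit associations that are hard to find by surface string matching. The final dataset contains 12,857 document passages, 45,632 entity mentions, and 9,741 coreference chains; complex cases involving aliases and substitute designations account for 37.5%, reflecting the particular difficulty of processing Party history documents.

The dataset was split following standard machine learning practice: it was randomly divided into training, validation, and test sets in a 7:1.5:1.5 ratio, with the proportions of document types (monographs, chronicles, and documents) kept roughly the same in each set to avoid distribution bias.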
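A split of this kind, random but balanced by document type, can be sketched in Python as below. The function name, the seed, and the toy items are assumptions for illustration; the article does not say how the split was implemented.

```python
import random
from collections import defaultdict

def stratified_split(items, key, ratios=(0.7, 0.15, 0.15), seed=42):
    """Split items into train/val/test so each document type keeps roughly the same ratio."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    train, val, test = [], [], []
    for group in groups.values():
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy passages tagged with a document type (hypothetical counts).
passages = [("monograph", i) for i in range(100)] + [("chronicle", i) for i in range(40)]
train, val, test = stratified_split(passages, key=lambda p: p[0])
```

Splitting within each document type first, and only then concatenating, is what keeps the type proportions aligned across the three sets.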

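Annotation quality was controlled through a strict process: (1) Standardization: a detailed annotation guide for entities and coreference in Party history documents (《党史文献实体与共指标注指南》) was compiled, specifying annotation standards and boundary cases for aliases, substitute designations, and implicit associations. (2) Iteration and training: annotation proceeded in multiple rounds; after each round, experts discussed disputed cases and updated the guide, and annotators received uniform training. (3) Cross-checking: each document was annotated by one expert and checked by another, and annotation consistency was quantified by requiring inter-annotator agreement of at least 0.85 on a random subsample.

Disagreements between annotators were handled as follows: the two annotating experts first discussed the case; if they could not agree, it was referred to a third, senior Party history researcher acting as arbiter, who made the final ruling on the basis of the historical sources and the guide, and the case was added to the annotation guide as a worked example.

On data availability: because of the authority and sensitivity of the Party history documents involved, the original full-text corpus and the complete annotated dataset cannot be released openly for now, in keeping with document management regulations. To support scholarly exchange and reproducibility, however, we plan to provide, after the paper is published and in strict compliance with data security and privacy rules, a de-identified sample dataset, the complete annotation guide, and the entity dictionary on the project homepage (URL to be determined). Researchers may also contact the author through formal academic collaboration channels to request restricted access to part of the data.

For evaluation we adopt the combination of MUC, B-CUBED, and CEAF, three metrics widely accepted in the coreference resolution community. The MUC metric was first proposed at the MUC-6 conference, and Luo (2005) systematically analyzed how it evaluates mention-pair consistency by counting the minimum number of link operations between coreference chains. The B³ metric was proposed by Amigó et al.; its core idea is to average precision and recall computed per item, and it was later brought into coreference resolution to evaluate link consistency at the mention level. The CEAF metric, also proposed by Luo, finds the optimal alignment between the entity clusters output by the system and those in the reference annotation (constrained entity alignment) and computes the F1 of that alignment. The three metrics measure coreference performance from different angles and complement one another. We report the F1 of each metric and use the average of the three F1 values as the overall score, which gives a rounded picture of the model's behavior on different kinds of coreference errors.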
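As an illustration of the per-mention averaging behind B³, the sketch below computes B³ precision, recall, and F1 for gold and predicted clusters, plus the three-metric average used as the overall score. It assumes that both clusterings cover the same mentions; the cluster contents are toy data, and MUC and CEAF are not implemented here.

```python
def b_cubed(gold_clusters, pred_clusters):
    """B-cubed precision, recall, and F1, assuming both clusterings share the same mentions."""
    gold_of = {m: frozenset(c) for c in gold_clusters for m in c}
    pred_of = {m: frozenset(c) for c in pred_clusters for m in c}
    mentions = gold_of.keys() & pred_of.keys()
    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions) / len(mentions)
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def combined_f1(muc_f1, b3_f1, ceaf_f1):
    """Unweighted mean of the three F1 values, used as the overall score."""
    return (muc_f1 + b3_f1 + ceaf_f1) / 3

# Toy clusters over four mentions a, b, c, d.
gold = [["a", "b", "c"], ["d"]]
pred = [["a", "b"], ["c", "d"]]
p, r, f = b_cubed(gold, pred)
```

Because every mention contributes its own precision and recall, a single wrongly merged mention (here c joined with d) lowers the score without wiping out credit for the correctly linked pair.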

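For a fair and comprehensive comparison we chose three groups of representative baselines. The first represents traditional rule-based and statistical methods: the coreference module of the Stanford CoreNLP system and the feature-engineered Berkeley Coreference System. The second represents deep learning methods: end-to-end neural models that perform well on the CoNLL-2012 shared task, and fine-tuned models based on standard BERT and RoBERTa. The third consists of methods customized for historical documents: a rule system with temporal constraints and a statistical model combined with a domain dictionary. All baselines use the same training, validation, and test sets, and their hyperparameters were tuned by grid search, to ensure a fair comparison and credible results.

3.2 Results and Analysis

In the main comparison, systematic quantitative evaluation shows a clear advantage for the proposed multi-strategy fused model on fuzzy coreference resolution in Party history documents. As Table 1 shows, our model outperforms all baselines on the three metrics MUC, B-CUBED, and CEAF, reaching the highest combined F1 of 82.3%, an improvement of 7.2 percentage points over the best baseline, RoBERTa+CRF. This supports the effectiveness of the overall approach of combining domain knowledge, deep learning, and temporal reasoning.

Table 1. Performance comparison of models on the test set (%)

Notably, the model's gain in recall is especially marked. This matters in processing Party history documents, because finding all potential coreference relations is often both harder and more valuable than precisely identifying a subset of them. As Pradhan et al. note in discussing robust linguistic analysis, high recall in historical document analysis usually means that a system captures more valuable implicit associations.

In the ablation study, we verified the contribution of the three core modules one at a time by controlling variables, using four configurations: the full model, the model without the domain dictionary module, the model without pre-trained model fine-tuning, and the model without the graph reasoning module. The results show clearly that every module makes an indispensable contribution to final performance.

Table 2. Ablation results (average F1, %)

Removing the domain dictionary module causes the largest drop, 7.0 percentage points, which highlights the foundational role of domain knowledge in processing Party history documents: when handling highly domain-specific aliases and substitute designations, a model without prior knowledge makes many misjudgments. Removing the fine-tuning strategy lowers performance by 3.6 percentage points, indicating that although general pre-trained models provide a strong basis for semantic representation, the lack of domain adaptation still limits their performance on Party history documents. Removing the graph reasoning module lowers performance by 3.1 percentage points, demonstrating the value of temporal reasoning for discovering implicit associations. This finding is consistent with Luo's widely cited work on coreference evaluation metrics, whose core idea suggests that introducing appropriate constraints (such as spatio-temporal constraints) can markedly improve the accuracy of historical document analysis.

These ablation results support the necessity and effectiveness of the multi-strategy fused approach. Each module addresses a specific challenge in fuzzy coreference resolution for Party history documents, and it is their combination that lets the system cope with semantic complexity, temporal dynamics, and evidence sparsity together, providing a complete and efficient solution for the intelligent processing of Party history documents.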

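3.3 Case Studies

A highly representative case comes from the analysis of a report on work in the Soviet areas in issue 24 of 《红旗周报》 (Red Flag Weekly), 1931 (Table 3). The text contains several substitute designations, such as “朱毛红军” (the Zhu-Mao Red Army), “井冈山部” (the Jinggangshan unit), and “中央区” (the Central District), as well as abbreviated organization names such as “特委” (special committee) and “前委” (front committee). The text preprocessing and domain dictionary matching module first identifies “井冈山” (Jinggangshan) as a place entity and links it to the standard entity “井冈山革命根据地” (the Jinggangshan Revolutionary Base Area), while the alias-to-real-name mapping base splits “朱毛” and maps it to the two entities “朱德” (Zhu De) and “毛泽东” (Mao Zedong). The fine-grained semantic model module then finds, from context, that “井冈山部” and “朱毛红军” corefer, with a confidence score of 0.92. The reason is that during fine-tuning the model learned that “部” is often short for “部队” (troops) in military documents, and “朱毛红军” was the customary name in the Jinggangshan period for the Fourth Army of the Workers' and Peasants' Red Army. Finally, the temporal knowledge graph reasoning module queries the graph for the document's date, 1931, and finds that by then Mao Zedong had left Jinggangshan for southern Jiangxi and western Fujian to open up the Central Soviet Area. The system therefore infers that “中央区” in the text most likely refers to the newly opened Central Revolutionary Base Area rather than the older Jinggangshan area. This inference is confirmed by a graph path query: the 1931 subgraph contains the two edges (毛泽东, 任职于, 中央苏区), meaning that Mao Zedong held office in the Central Soviet Area, and (中央苏区, 别名, 中央区), meaning that the Central Soviet Area was also known as the Central District.

Another typical case concerns mining the implicit relations behind “教条宗派” (the dogmatist faction) and “经验宗派” (the empiricist faction) in an internal Party study document from the Yan'an Rectification Movement in 1942. Initial analysis can only identify the two terms as abstract concept entities and cannot link them directly to specific people. By analyzing context, the semantic resolution module finds that the document repeatedly mentions phrases such as “莫斯科回来的同志” (comrades back from Moscow) and “山沟里的马克思主义” (Marxism from the mountain valleys), and with the domain dictionary maps them to two groups, the “留苏派” (Soviet-returned group) and the “本土派” (home-grown group). The temporal graph reasoning module then runs a multi-hop query on the 1942 subgraph. It first finds a set of candidate people through relations such as (王明, 属于, 留苏派), meaning that Wang Ming belongs to the Soviet-returned group, and (毛泽东, 倡导, 山沟里的马克思主义), meaning that Mao Zedong advocates Marxism from the mountain valleys. It then computes the co-occurrence frequency of these people with concepts such as “教条主义” (dogmatism) and “经验主义” (empiricism) in historical documents, along with their centrality in the organizational network. In the end the system finds a strong association between “教条宗派” and Soviet-returned leaders such as Wang Ming and Bo Gu, with an association confidence of 0.87, while “经验宗派” is closely associated with home-grown leaders with extensive practical experience such as Zhou Enlai and Peng Dehuai, with a confidence of 0.79. This finding is highly consistent with Jin Chongji's analysis of the period in 《二十世纪中国史纲》.

Table 3. Analysis process for the typical cases

These two cases show how the system's modules work together to move step by step from surface text to implicit association mining. The reasoning relies on semantic understanding and is also tightly bound to the historical spatio-temporal background and organizational network relations, yielding highly credible conclusions and offering Party history research insights that are difficult to reach through traditional manual reading.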

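3.4 Prototype System Application

Building on the multi-strategy fused model, we developed a visual prototype system for Party history researchers. It wraps the complex algorithmic process in an intuitive interactive tool, putting AI-assisted historical research into practice. The main interface has a three-column layout. The left column is the document upload and preprocessing area, where users can batch-import Party history documents in TXT or PDF format; the system parses the text automatically and calls the domain dictionary matching module to produce initial entity annotations. The central column is the intelligent reading interface, which uses color coding to highlight different entity types in the text: for example, blue for people, green for organizations, and orange for places, while aliases and substitute designations are marked with a distinctive flashing border to draw the researcher's attention. When the user hovers over any annotated entity, a floating window shows its standardized name, a short biography, and all its occurrences in the current document. This design should markedly reduce the time researchers spend cross-checking the same entity across documents.

The right column is the dynamic knowledge graph visualization panel, the most innovative part of the prototype. It displays in real time the temporal knowledge graph extracted from the text and enriched by reasoning. Using the timeline slider at the top, researchers can select any historical period, and the graph changes to the subgraph for that period, clearly showing the relations among people, events, and places. For example, when the timeline is set to 1935, the graph highlights the organizational relations of the key meetings held during the Long March; dragging it to 1945 switches the view to the network of central leadership bodies after the Seventh National Congress. This dynamic temporal visualization makes plain the historical changes that a static graph struggles to show. As Beck et al. note in their survey of dynamic graph visualization, interactive exploration of time-varying networks can greatly improve users' understanding of how complex systems evolve.

The system also offers one-click generation of an analysis report, which automatically summarizes the core entities, key relations, and their distribution over time in the documents and flags potential contradictions or uncertain inferences that need manual review. This design draws on the efficiency of AI in processing large volumes of data while leaving final judgment to domain experts, in keeping with the principle of human-machine collaboration. The prototype is shown in Figure 2.

Figure 2. Prototype interface of the intelligent analysis system for Party history documents (schematic)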

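4 Conclusion and Outlook

For the problem of fuzzy coreference resolution in Party history documents, caused by dense aliases, substitute designations, and implicit associations, the multi-level technical framework proposed here, combining domain knowledge, pre-trained language models, and temporal graph reasoning, proved effective in experiments. On standard evaluation metrics the method reaches a combined F1 of 80.6%, significantly better than existing baselines, confirming its strong performance in handling the semantic complexity, temporal dynamics, and evidence sparsity of Party history documents.

At the theoretical level, the core value of this work lies in building a new paradigm for intelligent text processing based on deep coupling with domain knowledge. This paradigm advances the integration of Marxist theory, history, and information science, turning the internal logic of historical source criticism into external constraints on a computable model, and contributes a representative case of knowledge-driven AI to the digital humanities. The dictionary-guided fine-tuning method offers a transferable technical route for closing the “semantic gap” common in specialized domains and explores new possibilities for symbolic knowledge and statistical semantic models to jointly improve interpretability. Further, the time-constrained graph reasoning algorithm provides a general computational framework with strict temporal awareness for modeling knowledge in dynamically evolving systems such as history, and it may inform basic research on temporal knowledge representation and reasoning.

At the practical level, the results show potential for extension in several directions. The most direct application is in supporting the digitization and intelligent transmission of red documents, providing key technical support for next-generation core archives as they move from “digital storage” to “intelligent interpretation”. On that basis, the method can support smart Party-building and new learning and education platforms, enabling knowledge services such as linked queries, intelligent question answering, and tracing of historical threads. The framework itself is readily transferable and can be extended to specialized domains with similar textual features, such as military history, local gazetteers, and the collation of ancient books, and it lays a foundation for future analysis of multimodal historical sources. From a broader perspective, by mining implicit associations at scale across massive document collections, the method may provide a data-driven analytical tool for exploring long-term, structural patterns of historical change, opening new paths for historical research.

Despite this progress, the current study has several limitations: for example, its handling of extremely rare designations and its modeling of global coherence in long documents both leave room for improvement. Future work will introduce active learning to capture long-tail cases more effectively, design cross-document attention models to strengthen long-range dependency modeling, and extend fusion reasoning over multi-source heterogeneous data, with the aim of moving intelligent analysis systems for historical documents toward greater depth and breadth.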

Layout: Fan Junjun (樊军君)

First review: Xu Bishan (徐碧姗)

Second review: Duan Jingyi (段婧怡)

Final review: Xia Cuijuan (夏翠娟)

A dataset of geographic entities and relationships from Song Dynasty texts on Lin'an

June 1, 2026, 18:00

Sci Data. 2026 May 30. doi: 10.1038/s41597-026-07527-2. Online ahead of print.

ABSTRACT

The automatic extraction of geographical entities and spatial relationships from historical texts is a fundamental task for Named Entity Recognition (NER) and relation extraction (RE), with important implications for historical geography and digital humanities. Classical Chinese documents describing ancient cities pose particular challenges due to archaic language, implicit spatial expressions, and complex entity hierarchies. In this study, we present a manually annotated dataset designed for joint geographical entity and spatial relationship extraction from texts related to Lin'an, the capital of the Southern Song Dynasty. The dataset consists of 18 in-domain and 1 out-of-distribution historical documents comprising approximately one million Chinese characters, annotated with 24 categories of geographical entities and 34 types of spatial relationships. This dataset provides a valuable resource for advancing NER and spatial relation extraction in historical texts and supports future research in historical Geographic Information Systems (GIS), cultural geography, and digital heritage reconstruction.

PMID:42225712 | DOI:10.1038/s41597-026-07527-2

Interview Series: In Conversation with BiblioTech Hackathon Participants

By Sam Goven
June 2, 2026, 17:36

The following interview was conducted by Sam Goven, a master’s student in Journalism at KU Leuven, with Luisa Ripoll-Alberola, team leader of the BiblioTech Hackathon project Captacats. Luisa is a PhD candidate at Leipzig University working on the Horizon Europe funded MECANO project. Luisa’s team, Captacats, worked with the travelogues collection. You can learn more about the team’s work by having a look at their project poster in the BiblioTech Zenodo community and by visiting their project website.

The BiblioTech Hackathon is a 10-day event organized by KU Leuven Libraries and the Faculty of Arts. Students, researchers, and staff members of KU Leuven worked in multidisciplinary teams with digitized collections from KU Leuven Libraries. The theme of the 2026 edition was travel, which was reflected in the selected datasets: historical postcards and historical travelogues. More information about the hackathon and its results can be found on the BiblioTech 2026 website.

Team Captacats with their project poster during the closing event of the BiblioTech Hackathon.

Congratulations again on your team winning the prize for most original project! To start, could you tell us a bit about your background, what first interested you in the hackathon, and whether you had participated in one before?

I’m currently a PhD student in Digital Humanities, working on the MECANO project. I had never participated in a hackathon before, but I knew that I wanted to take part in one at some point. There’s a very large Digital Humanities hackathon in Helsinki every year, with five or six different datasets, but participating there can be quite expensive.

While I was doing a research stay here in Leuven, I learned about the BiblioTech Hackathon. It really felt like the stars were aligning, because it was the perfect situation. As I mentioned, I was already thinking about joining a hackathon, and having the opportunity not only to participate but also to be a team leader was exactly what I was looking for. It allowed me to take part in a Digital Humanities activity in a more informal setting, which I really liked.

Could you describe your project and your output in a nutshell?

We created a prototype web visualization called ShipAdvisor, which is loosely inspired by modern platforms like TripAdvisor, but focused on historical Mediterranean travel routes. Using travelogues from the eighteenth and nineteenth centuries, the tool allows users to navigate different routes and see how travelers at the time rated places and journeys.

Through the visualization, users can explore which routes were most popular and how perceptions of safety and danger varied across regions. These perceptions were shaped not only by environmental factors such as weather, but also by historical phenomena like Mediterranean piracy. In terms of design and approach, we drew inspiration from digital humanities platforms such as ArcGIS StoryMaps and Itiner‑e.

You mentioned that you were the team leader in your group. What did that role involve, and was it in line with what you expected? Did you find it difficult to lead the team throughout the project?

I have to say it was actually super easy. I was very, very lucky with my team; they were all extremely motivated. Supporting them felt very natural and light. We had quite a lot of meetings during the process, but it never felt forced; everything just happened quite organically.

As a team leader, I didn’t want to take up too much space. I really wanted the group to feel horizontal and collaborative. However, in the first few days, when people were still a bit shy, I think it helped for the team leader to propose ideas, bring different ideas together, and guide things slightly in that sense. Beyond that, my role was more about offering support, often acting as a bridge between the participants and the experts. Before reaching out to the experts, I was there to help where I could.

Overall, it was, as I said, very easy, and it never felt like an artificial hierarchy or like I was in a superior position. It really felt like teamwork.

At the ‘Meet the Data, Meet the People’ event, you were introduced to the data for the first time. How did the brainstorming process go?

At the beginning, we had four or five main ideas. Our approach was to take some time after the first day to reflect on them individually, and then meet again the following Monday to make a decision. During that meeting, we decided to go with the idea of ShipAdvisor, mainly because it allowed us to integrate many different elements.

For example, we could look at which routes were more affected by piracy, which was a particular interest for some of the team members, while others wanted to work with illustrations. The concept really allowed for different approaches to come together within the same interface.

At first, it can feel a bit overwhelming. You think, “I need to produce something, but I’m not yet sure what that will be.” But because everyone in the team was so motivated, we ended up arriving at a solid idea quite naturally.

What kind of audience did you have in mind when working on your project and the website? Who should be able to use it?

We mainly had the general public in mind. We didn’t want the website to require any specific background knowledge, whether technical or academic. The idea was that anyone could use it, including people who are simply curious and want to explore the corpus in a different way.

Did you run into any problems during the hackathon, and how did you tackle them?

File coordination was probably the trickiest part. At the beginning, we planned to use all the infrastructure the library was offering, such as the computing cluster. In the end, though, we didn’t really use it. One reason was that the team had different levels of technical expertise, and for some people the computing cluster felt like too much to handle. As a result, everyone ended up working in their own way and sharing files through the Teams group instead.

That approach worked, but it wasn’t always ideal. At times it felt a bit overwhelming to navigate, because we had many documents and different versions circulating. Sometimes people were working in parallel, and you had to wait for the latest version from a teammate before you could continue your own work. Our file‑sharing setup certainly wasn’t the most structured solution, but in the end it worked for us.

You mentioned that this was the first hackathon you participated in. Do you feel you picked up any new skills along the way, and how might you use them in future research?

As a PhD student in Digital Humanities, I mainly work with text analysis. My thesis focuses on the reception of ancient authors in academic prose and academic discourse, so my work is very text‑based. Before this hackathon, I had never really worked with geographical data.

That made this project especially interesting for me, because in my own research I don’t often have the opportunity to work with spatial data. The hackathon gave me the chance to explore that a bit, experiment with different tools, and see how geographical data could be integrated into a digital humanities project.

What kind of advice would you give to someone who might be hesitant to participate in their first hackathon?

I think one of the biggest insecurities people often have is feeling that they don’t have enough technical skills to participate. What I would say is that the support provided by the library and the pool of experts was truly incredible; you were never really on your own. You were always supported, both by the experts and by your teammates.

People with less technical experience found other important roles within the team. That could be doing more close reading, contributing to the final analysis, or working on the design of the poster. I would definitely encourage anyone who feels insecure about their technical background to take part. First of all, you learn a lot. Second, as I’ve said, you’re never alone; you’re very well supported by both the experts and the team. And finally, even if you don’t feel fully comfortable at first, you will definitely find meaningful ways to contribute to the group.

And what kind of advice would you give to a future team leader of a hackathon team?

I would say: don’t stress too much. I remember feeling quite insecure at times about our final outcome, but in the end, whatever you produce is going to be fine. In reality, the hackathon is meant to be fun, and not a competition.

What really matters is not the end product, but the process: working together, learning new things, and enjoying the experience. That’s what makes it valuable.

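Korean Historical Corpus DB, ‘ᄎᆞ자쎠’

By Baro
June 3, 2026, 01:07

https://find.xn--gt1b.xyz/search

Welcome to ‘ᄎᆞ자쎠’, the Korean historical corpus database!

Search a corpus of about 1,000 texts covering Middle Korean and Early Modern Korean, including seokdok gugyeol (석독구결) materials.

The post Korean Historical Corpus DB, ‘ᄎᆞ자쎠’ first appeared on KADH / 한국디지털인문학협의회 (Korean Association for Digital Humanities).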

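[Special Lecture] Chinese Language Education Society Summer Vacation AI Competency Intensive Program 2026 @ 한국중국어교육학회 (Korea Association of Chinese Language Education), 2026.06.27.-28.

By Baro
June 2, 2026, 19:40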

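To the members of the Korea Association of Chinese Language Education (한국중국어교육학회)

Greetings, members.

We thank you for your continued interest in and support of the Korea Association of Chinese Language Education, and we sincerely wish you good health and academic growth. To help Chinese-language instructors strengthen their professional expertise and build their capacity to use AI, the association has prepared the “2026 Summer Vacation Special Lecture.”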

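[Special Lecture Overview]

■ Program information link: Chinese Language Education Society Summer Vacation AI Competency Intensive Program 2026

https://aicourse.up.railway.app/index.html

■ Program title: AI Competency Intensive Program: Designing the Future of Chinese Language Education

■ Date and time: Saturday, June 27 to Sunday, June 28, 2026, 10:00 a.m. to 5:00 p.m.

■ Eligibility: members of the Korea Association of Chinese Language Education who have paid the 2026 annual fee, or lifetime members (first 30 applicants, first come, first served)

■ Membership registration and annual fee payment

Membership registration: https://kacle.jams.or.kr/co/main/jmMain.kci

Annual fee: KRW 30,000

Lifetime fee: KRW 300,000 (members who pay the lifetime fee are exempt from the annual fee)

Deposit account: Toss Bank (토스뱅크) 1001-6993-9125 (account holder: 구현아)

※ When making the deposit, please enter the depositor name in the format “홍길동 연회비” (your name followed by 연회비, “annual fee”).

■ Format: live online lectures via Zoom

※ The access link will be sent later to registered participants only.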

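This special lecture will be given by Professor 박정원 of Hankuk University of Foreign Studies (한국외국어대학교) on the theme “AI Competency Intensive Program: Designing the Future of Chinese Language Education.” It will run over two days, Saturday, June 27 and Sunday, June 28, 2026, and we expect it to be a meaningful opportunity to explore new directions for Chinese language education in a rapidly changing educational environment and to further strengthen teaching skills.

The lecture will be held online via Zoom, so we ask for your interest and active participation even during the summer vacation.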

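[Application Information]

■ Application period: Thursday, June 4 to Wednesday, June 10, 2026

■ How to apply: complete the attached application form and submit it to chinedu@hanmail.net

  • Baro: If you wish to participate, please request the application form directly from the email address above.

For any other inquiries, please contact chinedu@hanmail.net and we will be glad to assist you.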

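We wish all members good health and academic achievement, and we look forward to sharing a meaningful time of learning and exchange at this special lecture.

Thank you.

Sincerely, the Korea Association of Chinese Language Education

The post [Special Lecture] Chinese Language Education Society Summer Vacation AI Competency Intensive Program 2026 @ 한국중국어교육학회 (Korea Association of Chinese Language Education), 2026.06.27.-28. first appeared on KADH / 한국디지털인문학협의회 (Korean Association for Digital Humanities).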

Not only text, not only data … but what then? ‘Theorizing’ through practices in digital scholarly editing, Digital History, and Computational Literary Studies. A report.

June 2, 2026, 19:31

Authors (in alphabetical order)

Original blog post: https://dhtheorien.hypotheses.org/2442

Introduction

On February 25, 2026, as part of the annual Digital Humanities conference (DHd), the working group AG Digital Humanities Theorie held a panel centered on one question: how do digital research practices themselves become the starting point for theoretical reflection? The discussion brought together Laura Untner (Freie Universität Berlin) and Alexa Lucke (Universität Siegen) from Computational Literary Studies (CLS), Silke Schwandt (Universität Bielefeld) and Christian Wachter (Universität Münster) from Digital History, and Philipp Hegel (Akademie der Wissenschaften und der Literatur Mainz) from digital scholarly editing. It showed that theoretical work in the Digital Humanities (DH) can be understood not only as an external scaffold but also as something that grows out of practice itself. What follows summarizes some aspects of the debate and invites readers to think again about the entanglement of theory and practice, and of theory and empirical work, in DH.

What is theory in the Digital Humanities?

In the humanities, theory was long closely tied to text-based argumentation: reading, interpreting, and writing. In DH this focus shifts: knowledge emerges not only in traditional publications but also in data models, lines of code, and visualizations. Theory and theory building increasingly depend on technical conditions, so that their implicit assumptions and prerequisites are not always visible. At the same time, the panel stressed that theoretical work in DH has three central dimensions: the methodological (how do we operationalize concepts such as ‘genre’ or ‘authorship’?), the epistemological (what conditions of knowledge underlie our data models?), and the technical (how do infrastructures such as APIs or databases shape our understanding of research objects?). In this context, theory need not be understood as an obstacle; it can instead be understood as a tool for productivity and transparency, especially when it is made explicit.

Practices of theorizing in the subfields

For Computational Literary Studies (CLS), the panel’s epistemological focus was on how modeling processes make (or can make) theoretical assumptions visible. Text and (meta)data models such as genre taxonomies or concepts of authorship were not treated as neutral perspectives on categories of literary history; instead, they were discussed in terms of their status between (often normative) theoretical constructs and empirical findings. Digital hermeneutics (Möbus et al. 2025), for example, was described as attempting to decode (latent) presuppositions in data and datasets and thereby to debunk the myth of ‘raw data’ (Gitelman 2013). An extreme example of theoretical reflection is the WEMI model (Work-Expression-Manifestation-Item), whose relational structure shows how strongly our understanding of ‘text’ is shaped by theoretical presuppositions. The role of formalization received particular emphasis: panelists asked critically why certain literary theories (such as structuralism and formalism) are considered more formalizable than others, and whether CLS should focus more strongly on the processual nature of theory building.

With regard to Digital History, it became clear that the digital did not simply bring new methods but triggered a reflexive turn (König 2021). Traditional practices such as source criticism or narrative representation gain a new dimension through digital tools: source criticism, for instance, must reflect on uncertainties in data models, while visualizations such as network analyses lead to ‘productive irritations’ that challenge established narratives. A key moment is the realization that Digital History is not ‘history in bits’ but an engagement with the conditions under which historical knowledge is produced today. Whether the transition from analog to digital media is merely a media transfer or a profound transformation was especially controversial.

For digital scholarly editing, it became especially clear how technical developments make theoretical questions possible in the first place. Digital editions emerged, for example, from the merging of heterogeneous building blocks whose primary purpose was not the creation of digital editions: from microelectronics (the integrated circuit, 1958) through encoding standards based on SGML (from about 1987) to web-based forms of presentation (HTTP, 1991) and presumably transformer models (2017). Each of these technologies brought implicit modeling decisions with it, such as where the boundary between ‘text’ and ‘apparatus’ runs or how facsimiles make the genesis of a document visible. A central thesis was that digital editions can no longer be understood only as ‘results’ but as dynamic research environments in which theory and practice are inseparably linked.

Discursive syntheses

Across all the fields involved, the discussion showed that ‘theory’ in DH is often implicitly embedded in infrastructures, data models and schemas, code and algorithms, and much more, so that it becomes visible only in test and boundary situations. Theoretical reflection arises, for example, when historians discuss data models with computer scientists, or when literary scholars find that their operationalizations cannot be used or executed computationally.

A further focus was the plurality of theoretical approaches in connection with the question of formalizability: despite (quasi-)standards such as the TEI Guidelines, almost every project makes its own adjustments to the model and thereby pulls the idea of standardization apart. The panelists agreed above all that collaboration, especially in interdisciplinary teams, creates a space for theoretical reflection that is often missing in traditional single disciplines, which may also be due to a certain pressure to reach agreement on the matter at hand (keyword: ‘interoperability’).

The role of visualizations was also particularly emphasized: tables, networks, or interactive maps make visible not only data but also the modeling decisions behind them. The presentation of textual variants in digital editions, for instance, shows how strongly our understanding of ‘variants’ depends on the technical possibilities of representation. At the same time, it should be noted that visualizations also come with concealing effects.

Outlook: Theory as a task for the community

The discussion culminated in a call to anchor theoretical work more systematically in DH. This includes concrete steps such as integrating theoretical reflection even more strongly into project descriptions, promoting methodological pluralism in third-party funding applications, and developing training formats that introduce historians and literary scholars to theory-driven and data-driven ways of thinking. One promising approach is the idea of establishing critical data practice as a standard, that is, continuous reflection on how algorithms, interfaces, and infrastructures shape our knowledge and are themselves an expression of our knowledge.

If theory is understood as an integral part of research practices, DH can build an even sturdier bridge between humanistic ‘depth’ and technical innovation. The key lies in making the invisible work of theorizing (in the lab, in the code, in the data models) more conscious. Only then does it become clear: theory is not the opposite of practice but its driving force (just as practice is the driving force of theory).

The AG Digital Humanities Theorie thanks the organizing team of the DHd conference and all discussants in the plenary for their questions, suggestions, and comments. The working group will continue to work on the topic and therefore invites everyone interested, especially those from other areas of DH, to take part, so that the debate can be extended beyond CLS, digital scholarly editing, and Digital History. To do so, you are very welcome to email the working group’s convenors: Jonathan D. Geiger (jonathan.geiger@adwmainz.de), Rabea Kleymann (rabea.kleymann@phil.tu-chemnitz.de), and Alexa Lucke (Alexa.Lucke@uni-siegen.de).

References

Gitelman, Lisa (ed.) (2013). “Raw Data” is an Oxymoron. Cambridge, MA: The MIT Press. DOI: https://doi.org/10.7551/mitpress/9302.001.0001.

König, Mareike (2021). “Die digitale Transformation als reflexiver turn: Einführende Literatur zur digitalen Geschichte im Überblick.” Neue Politische Literatur 66, no. 1: 37–60. DOI: https://doi.org/10.1007/s42520-020-00322-2.

Möbus, Dennis et al. (eds.) (2025). Digital Hermeneutics II: Sources, Analysis, Interpretation, Annotation, and Curation. (Special Issue, Lecture Notes in Computer Science, LNCS, vol. 14566), Heidelberg et al.: Springer. DOI: https://doi.org/10.1007/978-3-032-08697-6.
