[1]Liu Wei and Ye Ying, “On the Technical System and Theoretical Structure of Digital Humanities,” Journal of Library Science in China, no. 5, 2017.
[2]Roberto Busa, “The Annals of Humanities Computing: The Index Thomisticus,” Computers and the Humanities, vol. 14, 1980, pp. 83-90.
[3]Bi Wentao, “Image Generation of Tang Poetry,” master’s thesis, Southeast University, 2022.
[4]A. Radford, J. W. Kim et al., “Learning Transferable Visual Models from Natural Language Supervision,” International Conference on Machine Learning, PMLR, 2021, pp. 8748-8763; Aditya Ramesh, Prafulla Dhariwal et al., “Hierarchical Text-Conditional Image Generation with CLIP Latents,” arXiv preprint, arXiv: 2204.06125, 2022.
[5]Nataniel Ruiz et al., “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22500-22510; Amir Hertz et al., “Prompt-to-Prompt Image Editing with Cross Attention Control,” arXiv preprint, arXiv: 2208.01626, 2022; Omri Avrahami et al., “SpaText: Spatio-Textual Representation for Controllable Image Generation,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18370-18380.
[6]Nupur Kumari et al., “Multi-Concept Customization of Text-to-Image Diffusion,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1931-1941.
[7]Lvmin Zhang, Anyi Rao, and Maneesh Agrawala, “Adding Conditional Control to Text-to-Image Diffusion Models,” Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836-3847; Yufan Zhou et al., “Shifted Diffusion for Text-to-Image Generation,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10157-10166; Zhengyuan Yang et al., “ReCo: Region-Controlled Text-to-Image Generation,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14246-14255.
[8]James Betker et al., “Improving Image Generation with Better Captions,” 2023, https://cdn.openai.com/papers/dall-e-3.pdf, accessed on May 12, 2025.
[9]Konpat Preechakul et al., “Diffusion Autoencoders: Toward a Meaningful and Decodable Representation,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10619-10629.
[10]Hila Chefer et al., “Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models,” ACM Transactions on Graphics, vol. 42, 2023, pp. 1-10.
[11]R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684-10695.
[12]Edward J. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models,” arXiv preprint, arXiv: 2106.09685, 2021.
[13]Haiyan Zhao et al., “Explainability for Large Language Models: A Survey,” ACM Transactions on Intelligent Systems and Technology, vol. 15, 2024, pp. 1-38.
[14]Long Ouyang et al., “Training Language Models to Follow Instructions with Human Feedback,” Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 27730-27744.
[15]See OpenAI, “GPT-4 Technical Report,” arXiv preprint, arXiv: 2303.08774, 2023.
[16]Anthropic, “Introducing Claude,” https://www.anthropic.com/index/introducing-claude, accessed on May 12, 2025.
[17]R. Anil et al., “PaLM 2 Technical Report,” arXiv preprint, arXiv: 2305.10403, 2023.
[18]E. Mansimov, E. Parisotto, J. L. Ba, and R. Salakhutdinov, “Generating Images from Captions with Attention,” ICLR, 2016; S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative Adversarial Text to Image Synthesis,” International Conference on Machine Learning, PMLR, 2016, pp. 1060-1069; H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas, “StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks,” Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5907-5915.
[19]T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, “AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1316-1324; A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, “Zero-Shot Text-to-Image Generation,” International Conference on Machine Learning, 2021, pp. 8821-8831; M. Ding, Z. Yang, W. Hong, W. Zheng, C. Zhou, D. Yin, J. Lin, X. Zou, Z. Shao, H. Yang et al., “CogView: Mastering Text-to-Image Generation via Transformers,” Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 19822-19835.
[20]Ian Goodfellow et al., “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
[21]H. Cao, C. Tan et al., “A Survey on Generative Diffusion Model,” arXiv preprint, arXiv: 2209.02646, 2022; S. Frolov, T. Hinz, F. Raue, J. Hees, and A. Dengel, “Adversarial Text-to-Image Synthesis: A Review,” Neural Networks, vol. 144, 2021, pp. 187-209; R. Zhou, C. Jiang, and Q. Xu, “A Survey on Generative Adversarial Network-Based Text-to-Image Synthesis,” Neurocomputing, vol. 451, 2021, pp. 316-336.
[22]Jonathan Ho, Ajay Jain, and Pieter Abbeel, “Denoising Diffusion Probabilistic Models,” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 6840-6851.
[23]A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models,” ICML, 2022.
[24]C. Saharia, W. Chan, S. Saxena et al., “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding,” arXiv preprint, arXiv: 2205.11487, 2022.
[25]A. Radford, J. W. Kim et al., “Learning Transferable Visual Models from Natural Language Supervision,” ICML, 2021.
[26]A. Abuzayed, H. Al-Khalifa, “BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique,” Procedia Computer Science, vol. 189, 2021, pp. 191-194.
[27]T. Brown, B. Mann et al., “Language Models are Few-Shot Learners,” Advances in Neural InformationProcessing Systems, 2020.
[28]C. Raffel, N. Shazeer et al., “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,” Journal of Machine Learning Research, vol. 21, 2020, pp. 1-67.
[29]A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical Text-Conditional Image Generation with CLIP Latents,” arXiv preprint, arXiv: 2204.06125, 2022.
[30]Jason Lee, Kyunghyun Cho, and Douwe Kiela, “Countering Language Drift via Visual Grounding,” EMNLP, 2019; Yuchen Lu, Soumye Singhal, Florian Strub, Aaron Courville, and Olivier Pietquin, “Countering Language Drift with Seeded Iterated Learning,” International Conference on Machine Learning (ICML), 2020.
[31]James Kirkpatrick, Razvan Pascanu et al., “Overcoming Catastrophic Forgetting in Neural Networks,” Proceedings of the National Academy of Sciences, vol. 114, 2017, pp. 3521-3526; Dingcheng Li, Zheng Chen, Eunah Cho, Jie Hao, Xiaohu Liu, Fan Xing, Chenlei Guo, and Yang Liu, “Overcoming Catastrophic Forgetting During Domain Adaptation of Seq2seq Language Generation,” NAACL, 2022; Joan Serra, Didac Suris, Marius Miron, and Alexandros Karatzoglou, “Overcoming Catastrophic Forgetting with Hard Attention to the Task,” International Conference on Machine Learning, 2018, pp. 4548-4557.
[32]Robin Rombach et al., “High-Resolution Image Synthesis with Latent Diffusion Models,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684-10695.
[33]Jing Jiang, Yiran Ling et al., “Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry,” arXiv preprint, arXiv: 2407.06196, 2024.
[34]Vivian Liu, Lydia B. Chilton, “Design Guidelines for Prompt Engineering Text-to-Image Generative Models,” Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1-23.