An improved attention guided convolutional neural network and transformer hybrid model for emotion classification in traditional Chinese paintings
Sci Rep. 2026 May 21. doi: 10.1038/s41598-026-52522-7. Online ahead of print.
ABSTRACT
Traditional Chinese paintings pose unique challenges for computational emotion analysis due to culturally specific aesthetic principles that differ fundamentally from Western art paradigms. This study proposes an attention-guided CNN-Transformer hybrid model that integrates local feature extraction with global contextual modeling. The architecture employs spatial, channel, and cross-attention modules to fuse CNN-extracted brushwork details with Transformer-captured compositional relationships. Evaluated on a dataset of 7842 traditional Chinese paintings across seven emotion categories (tranquility, melancholy, vigor, elegance, desolation, joy, and solemnity), the model achieves 91.4% classification accuracy. Comparative experiments demonstrate superior performance over ResNet-101, DeiT-B, and ConvNeXt-T baselines. Ablation studies confirm the critical role of the attention-guidance module, while visualization analysis reveals alignment with traditional art theory principles. These results provide empirical support for domain-specific architectural designs in culturally sensitive visual analysis within Han Chinese literati painting traditions; generalizability to broader artistic domains remains an important direction for subsequent investigation.
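The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of fusing local (CNN-style) features with global (Transformer-style) tokens through cross-attention, followed by a channel gate. Everything here is an assumption made for illustration: the function names `cross_attention` and `channel_attention` are hypothetical, the query/key/value projections are taken to be the identity, the channel gate is a parameter-free simplification of a squeeze-and-excitation block, and the spatial attention module is omitted. It should not be read as the authors' architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(local_tokens, global_tokens):
    """Each local token attends over all global tokens.

    Assumption: identity Q/K/V projections (a real model would learn them).
    The attended context is added back to the local token (residual fusion).
    """
    d = len(local_tokens[0])
    fused = []
    for q in local_tokens:
        # Scaled dot-product scores between this query and every global key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in global_tokens
        ]
        weights = softmax(scores)
        # Weighted sum of global tokens (used as values).
        context = [
            sum(w * v[j] for w, v in zip(weights, global_tokens))
            for j in range(d)
        ]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def channel_attention(tokens):
    """Parameter-free channel gate (hypothetical simplification).

    Averages each channel over all tokens, squashes with a sigmoid,
    and rescales that channel in every token.
    """
    d = len(tokens[0])
    means = [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[t[j] * gates[j] for j in range(d)] for t in tokens]


# Toy example: two local tokens, three global tokens, two channels each.
local_feats = [[1.0, 0.0], [0.0, 1.0]]
global_feats = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
fused_feats = channel_attention(cross_attention(local_feats, global_feats))
```

In this sketch the local tokens act as queries and the global tokens as keys and values, so each brushwork-level feature is enriched with composition-level context before the channel gate reweights the result; the reverse direction, or a bidirectional scheme, would be an equally plausible reading of the abstract.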
PMID:42168395 | DOI:10.1038/s41598-026-52522-7