Author:
Affiliation:
Abstract: Sentence punctuation and named entity recognition are important steps in collating and publishing ancient Chinese books. In recent years, with the development of artificial intelligence technology, automatic punctuation has made considerable progress, and named entity recognition has received increasing attention. Considering the knowledge dependence between these tasks, this paper proposes a joint learning method based on deep neural networks. First, we pre-train a language model on a large-scale ancient Chinese corpus to equip it with grammatical and semantic knowledge of ancient Chinese. Second, we introduce a joint learning mechanism that enables the model to learn multiple tasks simultaneously, and we use a data augmentation strategy to alleviate the shortage of training data. With a single model, our method can automatically label several types of tags, including punctuation, quotation marks, book names, place names, person names, and dynasties, with high accuracy. On a multi-domain test set, our method achieves an F1 score higher than 94% on automatic sentence segmentation, 85% on automatic punctuation, and 87% on named entity recognition. A system based on our method is publicly accessible at https://seg.shenshen.wiki/.
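To make the idea of labeling punctuation and entities with a single model more concrete, the following is a minimal sketch of how per-character outputs from such a model might be merged into readable text. It is not the implementation described in this paper. The tag scheme is an assumption made for illustration: each character carries a punctuation tag (the mark that follows it, or an empty string) and a BIO entity tag such as `B-PER` or `I-PER`. The function name `decode_joint_tags` and the tag names are hypothetical.

```python
from typing import List, Tuple


def decode_joint_tags(
    chars: List[str],
    punct_tags: List[str],
    entity_tags: List[str],
) -> Tuple[str, List[Tuple[str, str]]]:
    """Merge per-character punctuation tags and BIO entity tags.

    Assumed (hypothetical) tag scheme:
      punct_tags[i]  : punctuation mark to insert after chars[i], or "".
      entity_tags[i] : "O", "B-<TYPE>", or "I-<TYPE>".
    Returns the punctuated text and a list of (type, entity_text) pairs.
    """
    if not (len(chars) == len(punct_tags) == len(entity_tags)):
        raise ValueError("all three sequences must have the same length")

    text_parts: List[str] = []
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_chars: List[str] = []

    def flush() -> None:
        # Close the entity that is currently open, if any.
        nonlocal current_type, current_chars
        if current_type is not None:
            entities.append((current_type, "".join(current_chars)))
        current_type, current_chars = None, []

    for ch, punct, ent in zip(chars, punct_tags, entity_tags):
        if ent.startswith("B-"):
            flush()
            current_type, current_chars = ent[2:], [ch]
        elif ent.startswith("I-") and current_type == ent[2:]:
            current_chars.append(ch)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity.
            flush()
        text_parts.append(ch + punct)

    flush()
    return "".join(text_parts), entities


# Illustrative input: hand-written tags, not real model output.
chars = list("孟子見梁惠王王曰")
punct_tags = ["", "", "", "", "", "。", "", ":"]
entity_tags = ["B-PER", "I-PER", "O", "B-PER", "I-PER", "I-PER", "O", "O"]

text, entities = decode_joint_tags(chars, punct_tags, entity_tags)
print(text)
print(entities)
```

Keeping the two tag sequences aligned to the same characters is one simple way to let a shared encoder serve both tasks; other label designs are equally possible.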