Using the isiZulu GF Resource Grammar for morphological annotation

31 December 2025, 08:00

The isiZulu GF Resource Grammar (ZRG) enables syntactic parsing using the GF runtime system. To do so, the ZRG implicitly encodes rich morphosyntactic information about isiZulu. In this paper we show how this information can be made explicit by adapting the way the grammar linearises GF abstract syntax trees. The result is annotated text, which can support natural language processing of an under-resourced, morphologically complex language like isiZulu in various ways.

Bootstrapping Siswati lexical resources from isiZulu

31 December 2025, 08:00

IsiZulu and Siswati are closely related languages that share significant morphosyntactic characteristics. Systematic differences between these languages have been identified at the phonological and morphosyntactic levels. Because both languages are resource-scarce, this similarity has been exploited to bootstrap computational language resources at the morphological and syntactic levels. In this work, we investigate the feasibility of adapting lexical items in a computational lexicon from isiZulu to Siswati. We use Grammatical Framework resource grammars for both languages to analyse and transform lexical items, which are then evaluated against a parallel term list. An iterative process yields a success rate of 70.5%, indicating that this approach is largely viable as a means of significantly reducing the manual effort needed to develop lexicons for computational resources for Siswati.

Generation of segmented isiZulu text

19 February 2024, 08:00

The complex morphology, conjunctive orthography and widespread occurrence of morphophonological alternation in the Nguni languages have given rise to several efforts towards morphological segmentation of tokens of Nguni languages. For supervised methods, annotated data is required, which currently exists as canonically segmented data in the NCHLT corpus and surface segmented data in the Ukwabelana corpus. In this paper, we present a method and segmentation strategy based on a computational grammar for isiZulu. The grammar, which itself has some limitations in processing speed and robustness to unexpected input, is used to create a new set of segmentations for the tokens of the Ukwabelana corpus.

By training various models with the same architecture but on different datasets, we first show that our approach enables us to match the performance of a model trained on pre-existing data. We also show that our approach provides the flexibility to determine a suitable segmentation strategy and to generate data that reflects this strategy.

Bringing Children’s Dictionaries to Digital Life

19 February 2024, 08:00

South Africa is facing a literacy crisis, with the latest PIRLS results showing that 8 out of 10 learners cannot read for basic comprehension by the time they leave the foundation phase. In this climate, strategies are needed to help educators harness the available resources to maximum effect. However, most teaching resources are not digitally available, and even fewer are in formats that can readily be used in natural language applications.
The Ngiyaqonda! project aims to provide an interactive, multimodal digital environment within which learners can practise their reading and writing skills. Computational grammars and speech technology are combined in a mobile application to facilitate the transition from oral competency in a language to written competency. In this paper, we show how words from a multilingual dictionary for foundation phase learners can be brought to digital life within the Ngiyaqonda! application to enhance the learning experience of core concepts and vocabulary.
We use the official foundation phase CAPS English-isiZulu dictionary (Mbatha et al. 2018) to ensure that the content of the computational grammars is aligned with relevant learning outcomes. The result is a fully parallel, multilingual computational grammar that is aligned at the semantic level, ready to be included in the Ngiyaqonda! application.
