Abstract:
Most African languages are under-resourced and hence suffer
from data sparsity: the lack of sufficient digital corpora makes
data-driven methods ineffective for developing language
technology resources.
However, the availability of digital devices and ubiquitous
computing means that these low-density languages need
language resources for practical applications. Therefore, this
paper describes the engineering of a Swahili grammar using
Grammatical Framework (GF), a rapid grammar-writing tool
and formalism. A rule-based, morphology-driven approach was
used, in which the morphology is developed first, followed by
the syntax. The grammar was evaluated with the standard BLEU and
PER metrics, yielding encouraging scores of 77.95% and 9.46%,
respectively. The work is a
significant step for the low-resourced Swahili language, since it
provides a morphological analyzer and interlingua-based machine
translation within the GF ecosystem, which supports the analysis
and generation of the language. Finally, the grammar lays a
foundation for developing controlled natural language
applications on top of it and provides a platform for
extracting bilingual corpora for use in data-driven methods.