Abstract:
The under-resourced Kikamba language has few language technology tools,
because the more efficient and popular data-driven approaches to building
them suffer from data sparseness caused by the lack of digitized corpora. To
address this challenge, we have developed a computational grammar for Kikamba
within the multilingual Grammatical Framework (GF) toolkit, which follows an
interlingua-based, rule-based approach to translation. We developed the grammar
using a morphology-driven strategy: we first wrote regular expressions for
morphological inflection and then developed the syntax rules. We evaluated the
grammar on one hundred sentences in both English and Kikamba, obtaining an
encouraging 4-gram BLEU score of 83.05% and a position-independent error rate
(PER) of 10.96%. This work contributes several language technology resources
for Kikamba: multilingual machine translation, a morphological analyzer, a
computational grammar that provides a platform for developing multilingual
applications, and the ability to generate a variety of bilingual corpora
pairing Kikamba with every language currently defined in GF, which makes it
easier to experiment with data-driven approaches.
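For readers unfamiliar with the two evaluation metrics, the following Python sketch shows one common way to compute them. It is an illustration under stated assumptions, not the evaluation script used in this work (the abstract does not specify one): it assumes whitespace-tokenized sentences, a single reference translation per sentence, corpus-level BLEU with uniform weights over 1- to 4-grams and no smoothing, and PER defined as the bag-of-words mismatch between hypothesis and reference, normalized by the number of reference words. The function names `corpus_bleu` and `corpus_per` are hypothetical, and both return fractions rather than percentages.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (hypothetical helper).

    Assumptions: one reference per sentence, uniform weights, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


def corpus_per(hypotheses, references):
    """Position-independent error rate (hypothetical helper).

    Words are compared as bags, so word order is ignored entirely.
    """
    errors = ref_words = 0
    for hyp, ref in zip(hypotheses, references):
        # Words shared by hypothesis and reference, regardless of position.
        correct = sum((Counter(hyp) & Counter(ref)).values())
        # Missing reference words plus any surplus hypothesis words are errors.
        errors += len(ref) - correct + max(0, len(hyp) - len(ref))
        ref_words += len(ref)
    return errors / ref_words
```

Because PER ignores word order, a hypothesis that only swaps two words keeps a PER of 0, whereas BLEU counts matching n-grams and therefore drops; for a five-word sentence with its first two words swapped, no 4-gram matches and unsmoothed BLEU falls to 0. Reporting both metrics thus separates lexical choice from word-order errors.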