Open Kreol Project - Translation
Philosophy
We present the first open-source digital language preservation project in Mauritius for our native language, Mauritian Creole. The philosophy behind the Open Kreol Project is in line with Meta's No Language Left Behind (NLLB) initiative, which embodies the vision of ensuring equitable access to information and communication by developing advanced AI models capable of translating low-resource languages, such as Mauritian Creole, with high accuracy.
Mauritian Creole is one of the 15 French-based creoles and has the second-largest number of speakers (1.2 million), after Haitian Creole (7 million). Yet there is currently no reliable way of translating between Mauritian Creole and other languages. By building on the NLLB initiative, the Open Kreol Project aims to bridge this gap and provide high-quality translation data for Mauritian Creole. This will not only preserve our linguistic heritage but also ensure that our language thrives in the digital age.
Open source projects are essential as they democratize technology, making advanced tools and resources accessible to everyone, regardless of economic background. These initiatives empower communities by fostering collaboration, innovation, and local solutions tailored to specific needs. By uplifting marginalized groups, open source projects contribute to social equity, ensuring that no community is left behind in the digital revolution.
Project Goals
- To create a comprehensive, open-source dataset tailored for translation purposes, ensuring high-quality translation data.
- To develop a cutting-edge, open-source machine translation model, available for public download via HuggingFace.
- To establish a compact Python framework for effectively training and fine-tuning translation models, compatible with the HuggingFace API.
- To cultivate a collaborative environment within the AI community in Mauritius, placing a strong emphasis on AI education to drive further innovation.
Project Timeline
The development phase has yielded models that surpass previous open-source work. Additionally, a substantial amount of Kreol data has been sourced from Mauritian references, and a foundational portion of the Python framework has been completed.
Despite these achievements, the quality of the English-Kreol pairs remains suboptimal. To address this, we are seeking the collaboration of the Kreol-speaking community to annotate the translations and correct irregularities in them.
Following the annotation of posted batches, an updated model will be trained, evaluated, and subsequently made publicly available alongside the data used in its training.
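The annotation step described above can be pictured with a small, self-contained Python sketch. It is not the project's actual framework: the `TranslationPair` fields, the function names, and the sample sentences are all hypothetical and chosen only for illustration. It shows one possible way to split English-Kreol pairs into batches, apply annotator corrections, and keep only the reviewed pairs for the next round of training.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TranslationPair:
    """One English-Kreol sentence pair (field names are illustrative)."""
    pair_id: int
    english: str
    kreol: str
    reviewed: bool = False


def make_batches(pairs, batch_size):
    """Split pairs into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pairs = list(pairs)
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def apply_annotations(batch, corrections, approved=()):
    """Return a new batch with annotator feedback applied.

    corrections maps pair_id -> corrected Kreol text; approved holds the
    pair_ids an annotator accepted unchanged. Other pairs stay unreviewed.
    """
    result = []
    for pair in batch:
        if pair.pair_id in corrections:
            fixed = corrections[pair.pair_id].strip()
            result.append(replace(pair, kreol=fixed, reviewed=True))
        elif pair.pair_id in approved:
            result.append(replace(pair, reviewed=True))
        else:
            result.append(pair)
    return result


def training_ready(pairs):
    """Keep only reviewed pairs with non-empty text on both sides."""
    return [p for p in pairs
            if p.reviewed and p.english.strip() and p.kreol.strip()]


# Illustrative data only: pair 1 carries a French spelling to be corrected.
pairs = [
    TranslationPair(1, "Hello", "Bonjour"),
    TranslationPair(2, "Thank you", "Mersi"),
    TranslationPair(3, "Hello", "Bonzour"),
]
batches = make_batches(pairs, batch_size=2)
annotated = apply_annotations(batches[0], corrections={1: "Bonzour"}, approved={2})
print(training_ready(annotated + batches[1]))
```

Keeping the pairs immutable and returning new batches means the original, unannotated data is never overwritten, which makes it straightforward to publish the training data alongside each updated model.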
Future avenues for advancement include the development of models with fewer parameters, the creation of a ChatGPT-style conversational model for Kreol, and the enhancement of website translation capabilities.