Annotation Manual
Overview
The objective of the annotation task is to provide high-quality translation data so that the model can learn appropriate translations. Current models still tend to translate word by word because there is too little data to provide context.
The task covers four sentence lengths, as each contributes differently to fine-tuning. Short sentences focus more on syntactic correctness, while long sentences also contribute to broad context learning.
Fields
- Input Text: Source-language input to the model (for example, an English sentence).
- Predicted Text: Target-language output of the model (for example, a Creole sentence translated by the current SOTA model).
- Rating: A score from 1 to 5 indicating how good the translation is.
- Suggested Text: Text entered by the annotator to correct the Predicted Text.
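As an illustration only, the four fields above could be represented as a single record. The sketch below is a minimal Python example, not the format used by the annotation tool: the class name `AnnotationRecord`, the snake_case field names, and the choice to make the suggested text optional are all assumptions, and the example strings are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotationRecord:
    """One annotated translation (hypothetical structure, for illustration)."""

    input_text: str        # Input Text: source-language sentence
    predicted_text: str    # Predicted Text: model output in the target language
    rating: int            # Rating: integer score from 1 to 5
    # Assumption: the suggested text may be left empty; the manual does not say.
    suggested_text: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject non-integer ratings (bool is a subclass of int, so exclude it).
        if isinstance(self.rating, bool) or not isinstance(self.rating, int):
            raise TypeError("rating must be an integer")
        # Enforce the 1 to 5 scale described in the Fields list.
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


# Placeholder values; real records would hold actual sentences.
record = AnnotationRecord(
    input_text="Good morning",
    predicted_text="<model output>",
    rating=3,
    suggested_text="<annotator correction>",
)
```

Validating the rating at construction time means an out-of-range score is caught when the record is created rather than later during training.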
Rating
We propose 5 rating levels to help our machine learning models understand the differences between the Predicted Text and Suggested Text.
- 1: Completely wrong; makes no sense.
- 2: Some words are right.
- 3: Most words are right; context and grammar might be slightly off.
- 4: Perfect word-by-word translation with close-to-perfect grammar.
- 5: Perfect translation with perfect grammar; the sentence should sound as if spoken by a local (not a word-by-word translation).
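The five levels can be sketched as an enumeration so that scores are stored as integers while remaining readable. This is a minimal Python sketch under two assumptions: the levels map to scores 1 through 5 in the order listed above (1 being the worst), and the member names such as `COMPLETELY_WRONG` are hypothetical labels chosen for this example.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Rating levels; assumes the list order corresponds to scores 1 to 5."""

    COMPLETELY_WRONG = 1
    SOME_WORDS_RIGHT = 2
    MOST_WORDS_RIGHT = 3
    WORD_BY_WORD = 4
    NATURAL = 5


# Descriptions taken from the rating list in this manual.
RATING_DESCRIPTIONS = {
    Rating.COMPLETELY_WRONG: "Completely wrong, makes no sense.",
    Rating.SOME_WORDS_RIGHT: "Got some words right.",
    Rating.MOST_WORDS_RIGHT: "Got most words right, context and grammar might be slightly off.",
    Rating.WORD_BY_WORD: "Perfect word by word translation, close to perfect grammar.",
    Rating.NATURAL: "Perfect translation, sounds like spoken by a local. Perfect grammar.",
}


def describe(score: int) -> str:
    """Return the description for a score; raises ValueError outside 1 to 5."""
    return RATING_DESCRIPTIONS[Rating(score)]
```

Calling `describe(4)` returns the word-by-word description, and any score outside the scale fails immediately because `Rating(score)` rejects it.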
Nuances
In everyday use, most speakers of Kreol Morisyen do not follow formal grammatical rules. However, such rules exist and were established by linguists, Kreol advocates and the government. We aspire to get translations as close as possible to standardized Kreol Morisyen.
One could argue that there are multiple ways of saying the same thing in Kreol Morisyen. That is true, and annotators should write what they believe a local speaker would say. The diversity in sentence structure is an asset for more robust learning.