Project Types - AI4Bharat/Shoonya GitHub Wiki

Project Types

Shoonya currently supports the project types described below. Every project consists of a set of tasks that have to be annotated.

Contextual Sentence Verification

This project type is used to verify whether a given sentence is clean, that is, grammatically correct and free of errors. The context from which the sentence was taken is also shown. The annotator can edit the sentence to correct it and then set its quality status accordingly. The cleaned sentence can be used in further annotation projects. The quality statuses that can be assigned to a sentence are: Clean, Profane, Difficult vocabulary, Ambiguous sentence, Context incomplete, Corrupt. View this video to understand it in detail.
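Because the six quality statuses form a closed set, an annotation tool would naturally model them as an enumeration. The sketch below is a hypothetical Python model of one verification task; the names `QualityStatus`, `SentenceVerification` and `corrected_sentence` are illustrative assumptions, not Shoonya's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical model: names are illustrative, not Shoonya's real schema.
class QualityStatus(Enum):
    CLEAN = "Clean"
    PROFANE = "Profane"
    DIFFICULT_VOCABULARY = "Difficult vocabulary"
    AMBIGUOUS_SENTENCE = "Ambiguous sentence"
    CONTEXT_INCOMPLETE = "Context incomplete"
    CORRUPT = "Corrupt"


@dataclass
class SentenceVerification:
    context: str             # passage the sentence was taken from (shown read-only)
    sentence: str            # sentence as originally given to the annotator
    corrected_sentence: str  # annotator's edited version (may equal `sentence`)
    status: QualityStatus

    def was_edited(self) -> bool:
        """True if the annotator changed the sentence."""
        return self.corrected_sentence != self.sentence

    def usable_downstream(self) -> bool:
        """Assumption: only sentences marked Clean feed later projects."""
        return self.status is QualityStatus.CLEAN
```

Looking statuses up by their display label (for example `QualityStatus("Corrupt")`) keeps the stored value identical to the text annotators see.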

Contextual Sentence Verification and Domain Classification

This project type is used to verify whether a given sentence is clean, that is, grammatically correct and free of errors. The context from which the sentence was taken is also shown. The annotator can edit the sentence to correct it and then set its quality status accordingly. The quality statuses that can be assigned to a sentence are: Clean, Profane, Difficult vocabulary, Ambiguous sentence, Context incomplete, Corrupt.

For every sentence that the annotator marks as 'Clean', a list of domains is shown and the annotator has to select the domain the sentence belongs to. The cleaned sentence can be used in further annotation projects.
View this video to understand it in detail.
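The rule that a domain is chosen only for Clean sentences can be expressed as a consistency check on each submitted annotation. This is a hypothetical sketch: the function `validate_annotation` and the example domain names are assumptions for illustration, and the status enum is repeated so the snippet runs on its own.

```python
from enum import Enum
from typing import Optional


class QualityStatus(Enum):
    CLEAN = "Clean"
    PROFANE = "Profane"
    DIFFICULT_VOCABULARY = "Difficult vocabulary"
    AMBIGUOUS_SENTENCE = "Ambiguous sentence"
    CONTEXT_INCOMPLETE = "Context incomplete"
    CORRUPT = "Corrupt"


def validate_annotation(
    status: QualityStatus, domain: Optional[str], allowed_domains: set[str]
) -> None:
    """Raise ValueError if the domain field is inconsistent with the status."""
    if status is QualityStatus.CLEAN:
        # Clean sentences must carry exactly one domain from the offered list.
        if domain not in allowed_domains:
            raise ValueError(f"Clean sentences need a domain from {sorted(allowed_domains)}")
    elif domain is not None:
        # The domain list is never shown for non-Clean sentences.
        raise ValueError("a domain is only selected for Clean sentences")
```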

Contextual Translation Editing

This project type is used for translating a given sentence from one language into another. The context from which the sentence was taken is also shown in the source language, along with a machine translation of the sentence. The annotator can either edit the machine translation to correct it or replace it with a new translation. View this video to understand it in detail.
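A translation-editing task can be pictured as a record that starts out with the machine translation and ends with whatever the annotator submits. The sketch below is a hypothetical model; `TranslationTask`, `submit` and `kept_machine_translation` are illustrative names, not part of Shoonya.

```python
from dataclasses import dataclass


@dataclass
class TranslationTask:
    source_language: str
    target_language: str
    context: str              # surrounding text, shown in the source language
    source_sentence: str
    machine_translation: str  # pre-filled MT output shown to the annotator
    final_translation: str = ""

    def submit(self, text: str) -> None:
        """Store the annotator's translation (edited MT or a fresh one)."""
        if not text.strip():
            raise ValueError("translation must not be empty")
        self.final_translation = text.strip()

    def kept_machine_translation(self) -> bool:
        """True if the annotator accepted the MT output unchanged."""
        return self.final_translation == self.machine_translation.strip()
```

Recording whether the MT output was kept unchanged is one simple way to measure how much post-editing a project needed.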

Semantic Textual Similarity Scale5

This project type is used for rating the translation of a sentence from one language into another. Annotators rate each translation pair on a 5-point scale, using integer scores from 0 to 4. View this video to understand it in detail.
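Since valid ratings are exactly the integers 0 through 4, a submitted score can be checked before it is stored. This is a minimal sketch; the function name `validate_sts_score` is a hypothetical illustration, not a Shoonya API.

```python
def validate_sts_score(score: int) -> int:
    """Accept only integer scores on the 0-4 (five-point) scale."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be an integer")
    if not 0 <= score <= 4:
        raise ValueError("score must be between 0 and 4 inclusive")
    return score
```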

Conversation Translation Editing

This project type provides a common template for annotators to translate an entire conversation in a single task. Each conversation can have multiple speakers, and each speaker can have multiple messages. View this video to understand it in detail.
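Keeping the whole conversation in one task means the messages stay in order and each one carries its speaker. The hypothetical model below assumes a flat, ordered list of messages; `Message`, `ConversationTask` and their fields are illustrative names, not Shoonya's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    speaker_id: str
    text: str
    translation: str = ""  # filled in by the annotator


@dataclass
class ConversationTask:
    # One task holds the whole conversation, in its original order.
    messages: list[Message] = field(default_factory=list)

    def speakers(self) -> list[str]:
        """Distinct speakers in order of first appearance."""
        return list(dict.fromkeys(m.speaker_id for m in self.messages))

    def untranslated(self) -> list[int]:
        """Indices of messages that have not been translated yet."""
        return [i for i, m in enumerate(self.messages) if not m.translation.strip()]
```

An empty `untranslated()` result would indicate that every message in the conversation has a translation.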

Single Speaker Audio Transcription Editing

This project type is used for transcribing a given audio file, which can be in any language. Each audio file is a single task. The annotator selects the regions of the audio that contain speech and then enters the transcript for each of those regions. A single audio file can have multiple regions. View this video to understand it in detail.
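Each annotated region is essentially a time span plus its transcript, so a submission can be sanity-checked against the audio's duration. The sketch below assumes times are stored in seconds; `Region` and `validate_regions` are hypothetical names, not Shoonya's schema.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: float          # seconds from the start of the audio
    end: float            # seconds; must be greater than start
    transcript: str = ""  # text the annotator typed for this region


def validate_regions(regions: list[Region], duration: float) -> None:
    """Check that each region lies inside the audio and has a transcript."""
    for i, r in enumerate(regions):
        if not 0 <= r.start < r.end <= duration:
            raise ValueError(f"region {i} ({r.start}-{r.end}s) is empty or outside 0-{duration}s")
        if not r.transcript.strip():
            raise ValueError(f"region {i} has no transcript")
```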

Audio Segmentation

This project type is used for segmenting a given audio file by speaker. The audio file can be in any language, and each audio file is a single task. The annotator selects a speaker tag and then selects the regions of the audio in which that speaker is talking. A single audio file can have multiple regions. View this video to understand it in detail.
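The output of a segmentation task can be viewed as a list of speaker-tagged time spans, from which per-speaker statistics follow directly. This is a hypothetical sketch assuming times in seconds; `SpeakerRegion` and `speaking_time` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpeakerRegion:
    speaker: str  # speaker tag chosen by the annotator
    start: float  # seconds
    end: float    # seconds


def speaking_time(regions: list[SpeakerRegion]) -> dict[str, float]:
    """Total seconds attributed to each speaker tag."""
    totals: dict[str, float] = defaultdict(float)
    for r in regions:
        if r.end <= r.start:
            raise ValueError(f"invalid region {r}")
        totals[r.speaker] += r.end - r.start
    return dict(totals)
```

For example, regions S1 0.0-2.0s, S2 2.0-3.5s and S1 3.5-4.0s give `{"S1": 2.5, "S2": 1.5}`.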

Audio Transcription Editing

This project type is used for transcribing a given audio file, which can be in any language. Each audio file is a single task. The annotator is shown the audio with its regions already selected and each region tagged with its speaker. The annotator listens to each region and enters its transcript. A single audio file can have multiple regions. View this video to understand it in detail.
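Here the region boundaries and speaker tags arrive pre-filled, and only the transcripts are edited, so tracking progress reduces to finding regions without text. The model below is a hypothetical sketch; `TaggedRegion`, `pending_regions` and `is_complete` are illustrative names, not Shoonya's API.

```python
from dataclasses import dataclass


@dataclass
class TaggedRegion:
    start: float          # seconds; fixed before the annotator sees the task
    end: float            # seconds; also fixed
    speaker: str          # pre-filled speaker tag
    transcript: str = ""  # the only field the annotator edits


def pending_regions(regions: list[TaggedRegion]) -> list[int]:
    """Indices of regions that still have no transcript."""
    return [i for i, r in enumerate(regions) if not r.transcript.strip()]


def is_complete(regions: list[TaggedRegion]) -> bool:
    """True once every region has a transcript."""
    return not pending_regions(regions)
```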