Shoonya Workflow - AI4Bharat/Shoonya GitHub Wiki

Shoonya User Management

Login in Shoonya is handled through Djoser. Users join Shoonya through the invite management system: an invited user receives an email containing a sign-up link. During sign-up, password validation is handled by Django's password validators. Once a user has logged in and is authorized, their session is managed using JWT (JSON Web Token). At the backend, apart from handling authorization and the session management token, the user management module supports basic CRUD (Create, Read, Update and Delete) operations on its Django model.

Shoonya Organization Management

At the top of the user and data management hierarchy in Shoonya is the organization, which has a group of workspaces, users, projects, and tasks belonging to it. An organization has an organization owner who has access to all the data within the organization. At the backend, the organization management module supports basic CRUD operations on its Django model. It also has an analytics feature that can generate reports at the project level as well as the user level for any selected date range.

Shoonya Workspace Management

Within an organization, there can be multiple workspaces, each holding a group of similar projects. Each workspace has its own workspace manager. At the backend, the workspace management module supports basic CRUD operations on its Django model, along with features for assigning a manager to a workspace and archiving a workspace. It also has an analytics feature that can generate reports at the project level for any selected date range.
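The hierarchy described so far can be pictured with a small sketch. It uses plain dataclasses rather than Django models, and every class and field name here is an assumption made for illustration, not Shoonya's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    title: str


@dataclass
class Workspace:
    name: str
    manager: Optional[str] = None  # hypothetical: manager identified by email
    is_archived: bool = False
    projects: List[Project] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    owner: str
    workspaces: List[Workspace] = field(default_factory=list)

    def all_projects(self) -> List[Project]:
        """The organization owner can reach every project in every workspace."""
        return [p for ws in self.workspaces for p in ws.projects]


org = Organization(name="Example Org", owner="owner@example.com")
ws = Workspace(name="Translation", manager="manager@example.com")
ws.projects.append(Project(title="English to Hindi"))
org.workspaces.append(ws)
```

Archiving a workspace would, in this sketch, simply set `is_archived` to `True`.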

Shoonya Data Management

The annotation data, meaning both the data to be annotated and the annotation results, is stored in Shoonya in the form of datasets. A dataset instance, which is created first, is a named set of data items belonging to a specific dataset type. A dataset is a logical table with a defined schema of fields and corresponding metadata. For each column in every dataset, Shoonya stores a parent_id, which points to the annotation data row in its source dataset, and metadata, which records how the data in that row was created. The metadata states whether the data was annotated by users (with a link pointing to the annotation id) or created by a function (with the arguments passed to that function). As an example scenario, consider an annotation task that involves verifying the machine translation of a given English sentence into an Indian language. A dataset instance named ‘English to Hindi Translation Pair’ is created first. Then a dataset of type ‘Translation Pair’ is created, with its dataset instance id pointing to the ‘English to Hindi Translation Pair’ dataset instance.
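A minimal sketch of the example scenario above follows. It assumes that the parent pointer and the metadata sit alongside each stored item; the class names, field names, and metadata keys are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DatasetInstance:
    instance_id: int
    name: str
    dataset_type: str


@dataclass
class TranslationPairItem:
    item_id: int
    instance_id: int                 # points to the owning dataset instance
    input_text: str
    output_text: str
    parent_id: Optional[int] = None  # id of the source data row, if any
    metadata: Dict[str, Any] = field(default_factory=dict)


instance = DatasetInstance(
    instance_id=1,
    name="English to Hindi Translation Pair",
    dataset_type="Translation Pair",
)

item = TranslationPairItem(
    item_id=10,
    instance_id=instance.instance_id,
    input_text="How are you?",
    output_text="<Hindi translation>",  # placeholder, not real data
    parent_id=3,
    # Hypothetical metadata: this item came from a user annotation.
    metadata={"created_by": "annotation", "annotation_id": 42},
)
```

An item produced by a function instead of a user could carry something like `{"created_by": "function", "arguments": {...}}` in the same field.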

Shoonya Project Management

Within a workspace, there will be multiple projects, typically sharing either a project type (say, translation) or a language. A project acts as a human annotation task definition with a predesigned user interface (UI) and a pre-mapped schema for input and output sources. The project management module uses a project registry to explicitly specify the project specifications for different annotation types like translation, OCR, or monolingual data collection:

  • Project Type - Monolingual Translation, Translation Editing, OCR Annotation, etc.
  • Project Mode - Data Annotation or Data Collection
  • Label Studio template to be used for UI
  • Enable Task Reviews - Enabling annotations done by annotators to be reviewed by reviewers
  • Input Dataset
    • Class - Source of the input data like Sentence Text or Translation Pair
    • Fields - The fields to be used as input like the language and the text to be annotated
  • Output Dataset
    • Class - The dataset to which the annotation output has to be exported, which can either be the same as the input source or a different dataset. For example, an OCR annotation can result in data that has to be stored in a dataset having blocks of text.
    • Fields - The annotation result to be exported to the dataset
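The registry fields listed above could be written down as a nested mapping. The sketch below uses a Python dict with made-up key names, template file name, and field names; it is not the format of Shoonya's actual registry, only an illustration of how the pieces relate.

```python
# Keys every registry entry is assumed to need in this sketch.
REQUIRED_KEYS = {
    "project_mode",
    "label_studio_template",
    "input_dataset",
    "output_dataset",
}

# Hypothetical registry with a single project type.
PROJECT_REGISTRY = {
    "TranslationEditing": {
        "project_mode": "Annotation",
        "label_studio_template": "translation_editing.xml",
        "enable_task_reviews": True,
        "input_dataset": {
            "class": "TranslationPair",
            "fields": ["input_language", "input_text", "machine_translation"],
        },
        "output_dataset": {
            "class": "TranslationPair",
            "fields": ["output_text"],
        },
    },
}


def missing_keys(project_type: str) -> set:
    """Return the required keys that a registry entry does not define."""
    spec = PROJECT_REGISTRY[project_type]
    return REQUIRED_KEYS - set(spec)
```

Here the input and output datasets share a class, matching the case where the annotation output is written back to the same kind of dataset it was read from.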

Project management starts with creating a project by sampling a set of data from a dataset that holds the data to be annotated. When a project is created, each data row of its input data is populated as an annotation task in the task model, and the project is in ‘Draft’ status. Once language experts/annotators are assigned to it, the project is published and moves to ‘Published’ status. The language experts can work on the annotation tasks only after the project is published. A project moves to ‘Archived’ status if it is explicitly archived by the workspace manager, the organization owner, or the admin. Once the annotation tasks of a project are completed, their annotation outputs can be exported to the output dataset. At the backend, the project management module supports this flow and allows basic CRUD operations on its Django model. It also has an analytics feature that can generate reports on the performance of its users for any selected date range.
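The project lifecycle above can be sketched as a small state machine. The class, method, and role names are hypothetical, and the rule that publishing requires at least one assigned annotator is an assumption drawn from the description, not a confirmed backend check.

```python
from enum import Enum


class ProjectStatus(Enum):
    DRAFT = "Draft"
    PUBLISHED = "Published"
    ARCHIVED = "Archived"


# Roles allowed to archive a project, per the description above.
ARCHIVE_ROLES = {"workspace_manager", "organization_owner", "admin"}


class Project:
    def __init__(self, title: str):
        self.title = title
        self.status = ProjectStatus.DRAFT  # every new project starts as a draft
        self.annotators = []

    def publish(self) -> None:
        # Assumption: a project is published only after annotators are assigned.
        if self.status is not ProjectStatus.DRAFT:
            raise ValueError("only a draft project can be published")
        if not self.annotators:
            raise ValueError("assign at least one annotator before publishing")
        self.status = ProjectStatus.PUBLISHED

    def archive(self, role: str) -> None:
        if role not in ARCHIVE_ROLES:
            raise PermissionError(f"role {role!r} cannot archive a project")
        self.status = ProjectStatus.ARCHIVED

    def can_annotate(self) -> bool:
        """Annotation work is possible only on a published project."""
        return self.status is ProjectStatus.PUBLISHED
```

For example, calling `publish()` on a fresh project raises `ValueError` until a name is appended to `annotators`.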

Shoonya Task Management

A project has a set of annotation tasks belonging to it. The task and annotation models work together to store the data to be annotated, the annotation result, the users who annotated, reviewed, and superchecked it, the metadata, and the task status.

Each annotation has its own annotation status, and the task status changes based on changes in the annotation status of its associated annotations. An annotation can be of three types, indicated by the annotation_type field: annotator's annotation, reviewer's annotation, and superchecker's annotation (values 1, 2 and 3 respectively).
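The three annotation_type values can be expressed as an integer enumeration. The values 1, 2 and 3 come from the description above; the enum and member names are illustrative.

```python
from enum import IntEnum


class AnnotationType(IntEnum):
    ANNOTATOR = 1     # annotator's annotation
    REVIEWER = 2      # reviewer's annotation
    SUPERCHECKER = 3  # superchecker's annotation


def describe(annotation_type: int) -> str:
    """Turn a stored annotation_type value into a readable label."""
    return AnnotationType(annotation_type).name.lower()
```

An `IntEnum` keeps the stored integers comparable with plain numbers, so `AnnotationType.REVIEWER == 2` holds.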

Each annotation has the date and time when it was created, when it was annotated for the very first time and when it was updated most recently. These fields remain the same irrespective of the annotation type. The annotation creation time indicates the date and time when a task was pulled by the user. The annotated at time shows the very first time a task was marked as 'labeled' by the annotator, marked as 'accepted (with major/minor changes)' or 'to be revised' by the reviewer or marked as 'validated (with changes)' or 'rejected' by the superchecker.
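The behaviour of the three timestamps can be sketched as follows. The field names, the lower-case status strings, and the `set_status` helper are assumptions for illustration; the point is only that the annotated-at time is written once, while the updated-at time moves on every change.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Statuses that count as a completed pass for each role, per the paragraph above.
COMPLETED_STATUSES = {
    "labeled",
    "accepted",
    "accepted_with_minor_changes",
    "accepted_with_major_changes",
    "to_be_revised",
    "validated",
    "validated_with_changes",
    "rejected",
}


@dataclass
class Annotation:
    status: str
    created_at: datetime                     # when the task was pulled
    annotated_at: Optional[datetime] = None  # first completed pass only
    updated_at: Optional[datetime] = None    # most recent change

    def set_status(self, status: str, now: datetime) -> None:
        self.status = status
        self.updated_at = now
        if status in COMPLETED_STATUSES and self.annotated_at is None:
            self.annotated_at = now


t0, t1, t2, t3 = (
    datetime(2023, 1, 1, hour, tzinfo=timezone.utc) for hour in (9, 10, 11, 12)
)
ann = Annotation(status="unlabeled", created_at=t0)
ann.set_status("draft", t1)    # a draft does not set annotated_at
ann.set_status("labeled", t2)  # first time labeled: annotated_at is set
ann.set_status("labeled", t3)  # later edit: only updated_at moves
```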

Task Status

The task status is initially ‘Incomplete’. Upon submission of the annotation by the annotator, it changes to ‘Annotated’. If the project has task reviews enabled, the task moves to 'Reviewed' status once it is reviewed and accepted by a reviewer. If the reviewer skips a task or does not accept its annotation, the task remains in 'Annotated' status. If the project has superchecking enabled, the task moves to 'Superchecked' status once it is superchecked and accepted by a superchecker. If the superchecker skips a task or does not accept its annotation, the task remains in 'Reviewed' status. Once the final correct annotation of a task is exported to the dataset, the task status changes to 'Exported'.
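One way to read the rules above is as a function that derives the task status from the statuses of the task's annotations. This is a sketch under that reading; the function name and the lower-case status strings are assumptions, and the real backend may update the task status incrementally instead.

```python
from typing import Optional

ACCEPTED = {"accepted", "accepted_with_minor_changes", "accepted_with_major_changes"}
VALIDATED = {"validated", "validated_with_changes"}


def derive_task_status(
    annotation_status: str,
    review_status: Optional[str] = None,
    supercheck_status: Optional[str] = None,
    exported: bool = False,
) -> str:
    """Pick the furthest stage that the task's annotations have reached."""
    if exported:
        return "Exported"
    if supercheck_status in VALIDATED:
        return "Superchecked"
    if review_status in ACCEPTED:
        return "Reviewed"
    if annotation_status == "labeled":
        return "Annotated"
    return "Incomplete"
```

For a labeled task that the reviewer has skipped, `derive_task_status("labeled", "skipped")` gives `'Annotated'`, matching the rule that a skipped review leaves the task where it was.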

For review-disabled projects, the annotations of all tasks in 'Annotated' status will be exported to the dataset. For all review-enabled projects, the annotations of all tasks in 'Reviewed' status will be exported to the dataset.

For all superchecking-enabled projects, the annotations of all tasks in 'Superchecked' status will be exported to the dataset.
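The export rules can be condensed into a short helper. The sketch assumes that superchecking takes precedence when both reviews and superchecking are enabled, which the text implies but does not state; the function names and the dict-based task representation are hypothetical.

```python
from typing import Dict, List


def exportable_status(review_enabled: bool, supercheck_enabled: bool) -> str:
    """Return the task status whose annotations are eligible for export."""
    if supercheck_enabled:  # assumption: supercheck outranks review
        return "Superchecked"
    if review_enabled:
        return "Reviewed"
    return "Annotated"


def tasks_to_export(
    tasks: List[Dict], review_enabled: bool, supercheck_enabled: bool
) -> List[Dict]:
    """Filter tasks down to those in the exportable status."""
    wanted = exportable_status(review_enabled, supercheck_enabled)
    return [task for task in tasks if task["status"] == wanted]
```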

Annotation Status for Annotator Workflow

Once an annotator pulls a new batch of tasks for annotating, an empty annotation is created for each pulled task with annotation status 'Unlabeled'. Once the task is annotated, its annotation status changes to 'Labeled' and the associated task status changes to 'Annotated'. If the task is not annotated and is instead skipped, its annotation status changes to ‘Skipped’. If a task annotation is not fully complete and the user wants to finish it later, it can be saved under the 'Draft' annotation status. For annotation statuses 'Draft' and 'Skipped', the task status remains 'Incomplete'. Any task sent back to the annotator for revision by the reviewer ('To Be Revised' status) also remains in 'Incomplete' status.
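Pulling a batch, as described above, amounts to taking some unassigned tasks and creating one empty 'Unlabeled' annotation per task. The sketch below shows that step with plain lists and dicts; the function name, the batch-size parameter, and the dict keys are all assumptions.

```python
from typing import Dict, List


def pull_batch(
    unassigned_task_ids: List[int], annotator: str, batch_size: int
) -> List[Dict]:
    """Take up to batch_size tasks from the pool and create empty annotations."""
    batch = unassigned_task_ids[:batch_size]
    del unassigned_task_ids[:batch_size]  # pulled tasks leave the pool
    return [
        {
            "task_id": task_id,
            "annotator": annotator,
            "annotation_type": 1,   # annotator's annotation
            "result": [],           # empty until the annotator submits
            "status": "unlabeled",
        }
        for task_id in batch
    ]


pool = [101, 102, 103, 104, 105]
annotations = pull_batch(pool, "annotator@example.com", batch_size=2)
```

After this call, `pool` holds the three remaining task ids and `annotations` holds two empty annotations for tasks 101 and 102.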

The user can utilize the 'Annotation Notes' to save any notes on the task.

Review Annotation Status for Reviewer Workflow

When a reviewer pulls a new batch of tasks for reviewing, an empty annotation is created for the pulled tasks with review annotation status as 'Unreviewed'.

The reviewer can accept the annotator's annotation as it is by marking the task as 'Accepted', in which case the review annotation status becomes 'Accepted'. The reviewer can also make changes to the annotator's annotation and then accept it, marking the task as either 'Accepted with Minor Changes' or 'Accepted with Major Changes' depending on the extent of the changes made. Only tasks marked with one of these accepted review annotation statuses move to the 'Reviewed' task status.

A reviewer can also skip a task, in which case, it will be given a review annotation status as 'Skipped'. When a reviewer marks a task as 'Draft' so as to continue editing it later, that task gets a review annotation status as 'Draft'. The task will continue to remain under 'Annotated' when it is marked as draft or skipped by the reviewer.

The reviewer can add some 'Review Notes' to the task either for communicating with the annotator or for his own reference.

A reviewer can also mark a task as 'To Be Revised', in which case it is sent to the annotator again for revision. In this case, both the annotator's annotation status and the review annotation status are marked as 'To Be Revised', and the task status changes from 'Annotated' to 'Incomplete'. The annotator can annotate the 'To Be Revised' task again or communicate with the reviewer through the 'Annotation Notes' feature. After the annotator annotates the 'To Be Revised' task, the annotation status goes back to 'Labeled' for the annotator and 'Unreviewed' for the reviewer, and the task status changes to 'Annotated'.
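The reviewer workflow, including the 'To Be Revised' round trip, can be sketched as two small functions acting on a task record. The dict keys, function names, and lower-case status strings are hypothetical; only the transitions themselves come from the description above.

```python
ACCEPT_ACTIONS = {
    "accepted",
    "accepted_with_minor_changes",
    "accepted_with_major_changes",
}


def apply_review(task: dict, action: str) -> dict:
    """Apply a reviewer action to a task that is currently 'Annotated'."""
    if action in ACCEPT_ACTIONS:
        task["review_status"] = action
        task["task_status"] = "Reviewed"
    elif action in {"draft", "skipped"}:
        task["review_status"] = action  # task status stays 'Annotated'
    elif action == "to_be_revised":
        # Both annotations are flagged and the task goes back to the annotator.
        task["annotation_status"] = "to_be_revised"
        task["review_status"] = "to_be_revised"
        task["task_status"] = "Incomplete"
    else:
        raise ValueError(f"unknown review action: {action!r}")
    return task


def resubmit_annotation(task: dict) -> dict:
    """The annotator re-annotates a task that was sent back for revision."""
    task["annotation_status"] = "labeled"
    task["review_status"] = "unreviewed"
    task["task_status"] = "Annotated"
    return task
```

A task sent back with `apply_review(task, "to_be_revised")` returns to the review queue only after `resubmit_annotation(task)` is called.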

Supercheck Annotation Status for Superchecker Workflow

When a superchecker pulls a new batch of tasks for superchecking, an empty annotation is created for each pulled task with supercheck annotation status 'Unvalidated'.

The superchecker can validate the reviewer's annotation and accept it as it is by marking the task as 'Validated', in which case the supercheck annotation status becomes 'Validated'. The superchecker can also make changes to the reviewer's annotation and then accept it by marking the task as 'Validated with Changes'. Only tasks whose supercheck annotation status is 'Validated' or 'Validated with Changes' move to the 'Superchecked' task status.

A superchecker can also skip a task, in which case, it will be given a supercheck annotation status as 'Skipped'. When a superchecker marks a task as 'Draft' so as to continue editing it later, that task gets a supercheck annotation status as 'Draft'. The task will continue to remain under 'Reviewed' when it is marked as draft or skipped by the superchecker.

The superchecker can add some 'Supercheck Notes' to the task either for communicating with the reviewer or for his own reference.

A superchecker can also mark a task as 'Rejected', in which case it will be sent to the reviewer again for revision. In this case, both the supercheck annotation status as well as review annotation status will be marked as 'Rejected' and the task status will change from 'Reviewed' to 'Annotated'. The reviewer annotates the 'Rejected' task again or can communicate with the superchecker through the 'Review Notes' feature. After the reviewer annotates the 'Rejected' task, it goes back to 'Accepted (with major or minor changes)' for the reviewer and 'Unvalidated' for the superchecker and the task status changes to 'Reviewed'.
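The superchecker's actions can be summarized as a lookup table from action to effect. This is a table-driven sketch with assumed names and lower-case status strings; it starts from a task in 'Reviewed' status and covers only the transitions described above.

```python
from typing import Tuple

# supercheck action -> (new review status or None to keep it, new task status)
SUPERCHECK_EFFECTS = {
    "validated": (None, "Superchecked"),
    "validated_with_changes": (None, "Superchecked"),
    "draft": (None, "Reviewed"),
    "skipped": (None, "Reviewed"),
    "rejected": ("rejected", "Annotated"),  # sent back to the reviewer
}


def apply_supercheck(review_status: str, action: str) -> Tuple[str, str, str]:
    """Return (review status, supercheck status, task status) after an action."""
    new_review_status, task_status = SUPERCHECK_EFFECTS[action]
    return (new_review_status or review_status, action, task_status)
```

Only 'rejected' touches the reviewer's annotation status; every other action leaves it as the reviewer set it.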

At the backend, the task management module supports the above-mentioned flow through basic CRUD operations to be performed on its Django model.