Technology Used - AI4Bharat/Shoonya GitHub Wiki

Shoonya Backend

Built using Python and Django REST Framework, Shoonya Backend stores, retrieves, and modifies data for user, workspace, organization, project, and task management, as well as for annotation. It will be further enhanced to automate tasks such as data extraction, machine translation generation, and audio cleanup.

Shoonya Frontend

Built using Material UI, Shoonya Frontend handles user, workspace, organization, project, and task management in the user interface. It uses the frontend package of Label Studio, an open-source data labeling tool, to let language experts/annotators perform data annotation. Label Studio Frontend is a JavaScript web app developed using React and MobX-state-tree (MST).

Installing Frontend and Backend

Shoonya Backend and Shoonya Frontend can each be installed by following the steps in the README.md of their respective GitHub repositories.

Shoonya Annotation Templates

Shoonya uses Label Studio templates to show different UIs to the annotators/language experts, depending on the annotation task to be done. These templates are JavaScript XML (JSX) files that use tags to control the visual layout of the annotation page and to specify the data to be annotated. Label Studio Frontend supports both built-in templates and custom templates developed by the user. In the current version of Shoonya, the OCR annotation task uses an existing Label Studio template, while the translation tasks (monolingual translation and translation editing) use custom templates.
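To make the role of the tags concrete, here is a minimal sketch in Python that parses a small Label Studio style template and lists the data fields and annotation controls it declares. The template is a hypothetical illustration and not one of Shoonya's actual templates: the field names `input_text` and `output_text` are assumptions, while `View`, `Header`, `Text`, and `TextArea` are standard Label Studio tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation-editing template (not taken from Shoonya).
# A value starting with "$" refers to a field of the task data.
TEMPLATE = """
<View>
  <Header value="Source sentence"/>
  <Text name="input_text" value="$input_text"/>
  <Header value="Translation"/>
  <TextArea name="output_text" toName="input_text" rows="5" editable="true"/>
</View>
"""


def data_fields(template: str) -> list[str]:
    """Return the task data fields the template displays ("$field" values)."""
    root = ET.fromstring(template.strip())
    fields = []
    for element in root.iter():
        value = element.get("value", "")
        if value.startswith("$"):
            fields.append(value[1:])
    return fields


def control_tags(template: str) -> list[tuple]:
    """Return (tag, name, toName) for every tag that annotates another tag."""
    root = ET.fromstring(template.strip())
    return [
        (element.tag, element.get("name"), element.get("toName"))
        for element in root.iter()
        if element.get("toName") is not None
    ]


if __name__ == "__main__":
    print(data_fields(TEMPLATE))
    print(control_tags(TEMPLATE))
```

In this sketch the `Text` tag determines which data is shown to the annotator, and the `TextArea` tag, linked to it through `toName`, collects the annotation.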

Shoonya Backend Models

Shoonya Backend is built using Django REST Framework and Python, so each database table is represented by a Django model whose attributes correspond to the table's fields. All the Django models pertaining to user, workspace, organization, project, task, and data management are present here, along with the specifications of each model attribute.

Database

PostgreSQL is the database used for Shoonya.

Shoonya Backend API

The APIs pertaining to user, workspace, organization, project, task, and data management are present here. Apart from the basic CRUD (Create, Read, Update, and Delete) operations, the following APIs perform other operations:

  • A function API for splitting a block of text into separate sentences and storing each sentence in a separate dataset called ‘Sentence Text’
  • A function API for storing a block of text annotated from an OCR document in a separate dataset called ‘Block Text’
  • A project API for archiving and publishing a project
  • A project API for exporting a project
  • A project API for pulling new items to a project
  • A project API for downloading a project
  • A project API for generating analytics
  • A create project API that automatically creates annotation tasks for a project during project creation itself
  • A project API for generating review tasks
  • A task API for assigning users to a task
  • A users API for generating invites for new Shoonya users
  • Users APIs for creating, refreshing, and verifying JWTs (JSON Web Tokens)
  • A workspace API for assigning a workspace manager
  • A workspace API for archiving a workspace
  • A workspace API for generating analytics
  • An organization API for generating analytics
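
As an illustration of the sentence-splitting operation listed above, the following Python sketch splits a block of text at sentence-ending punctuation. It is not Shoonya's implementation: the function name `split_sentences` and the regular expression are assumptions chosen for this example, and the rule is deliberately naive (an abbreviation such as "Dr." would be treated as a sentence end). The danda (।) is included because it ends sentences in several Indian languages.

```python
import re

# Assumed rule: a sentence ends at ".", "!", "?" or the danda "।"
# when that character is followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?।])\s+")


def split_sentences(block: str) -> list[str]:
    """Split a block of text into sentences, dropping empty pieces."""
    parts = _SENTENCE_BOUNDARY.split(block.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(sentence)
```

Each returned sentence could then be stored as its own record, in the way the ‘Sentence Text’ dataset is described above.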