Toloka - TrentoCrowdAI/crowdhub-api GitHub Wiki
Yandex.Toloka
Toloka is the crowdsourcing service of Yandex.
This document briefly describes the entities of the service and gives some notes on using the web interface and the API.
Sandbox
Like Amazon MTurk, Toloka offers a sandboxed version of its website. The interface and the API are the same as on the production website, so switching between the testing environment and the production one only requires changing the URL.
The URL of the sandbox is https://sandbox.toloka.yandex.com and to test a simple scenario we need to register at least two accounts:
- one for the requester (to create a project);
- one for the performer (to complete assignments);
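Since only the URL differs between the two environments, the switch can be kept in one place. The following is a minimal sketch: the sandbox host is the one given above, while the production host and the helper names `base_url` and `endpoint` are assumptions made for illustration.

```python
# Sandbox host, as given in this document.
SANDBOX_URL = "https://sandbox.toloka.yandex.com"
# Assumption: the production host is not stated in this document.
PRODUCTION_URL = "https://toloka.yandex.com"


def base_url(sandbox: bool = True) -> str:
    """Return the host to use; defaults to the sandbox, which is safer for testing."""
    return SANDBOX_URL if sandbox else PRODUCTION_URL


def endpoint(path: str, sandbox: bool = True) -> str:
    """Join a path onto the selected host (hypothetical helper)."""
    return base_url(sandbox) + "/" + path.lstrip("/")
```

Keeping `sandbox=True` as the default means production has to be requested explicitly.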
Documentation
Entities
Project
A project has N Task pools, and it defines the graphical interface for the tasks.
A project has two sets of attributes called "input" and "output" data:
- input data: text and links that form the question;
- output data: the fields submitted by the performer;
The interface of a task is created using HTML, CSS and JS. Handlebars expressions are also supported in the HTML part, and some useful components are already defined.
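To make the input/output split concrete, here is a rough sketch of how the two sets of attributes could be described and checked for a hypothetical image-labelling project. The field names (`image`, `label`), the spec layout and the helper `missing_fields` are illustrative assumptions, not taken from the Toloka documentation.

```python
# Hypothetical specification for an image-labelling project.
# Input data: what forms the question shown to the performer.
input_spec = {"image": {"type": "url", "required": True}}
# Output data: what the performer submits.
output_spec = {"label": {"type": "string", "required": True}}


def missing_fields(values: dict, spec: dict) -> list:
    """Return the names of required fields that are absent from `values`."""
    return [
        name
        for name, rules in spec.items()
        if rules.get("required") and name not in values
    ]


task_input = {"image": "https://example.com/cat.jpg"}
print(missing_fields(task_input, input_spec))  # no required input is missing
print(missing_fields({}, output_spec))         # the answer has not been given yet
```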
Task Pool
A task pool contains N Tasks and it defines the following properties:
- name and description;
- reward given to the performer for each task page/suite;
- time per task;
- set of attributes and rules to manage the quality control:
- overlap: number of performers asked to complete each task;
- skills required by the performer to complete a task;
- parameters to configure the smart mixing mode;
Training Task Pool
A training task pool is like a normal pool, but it only contains tasks of the training type (correct answer + hint). Task pages/suites in this type of pool do not have a reward.
NOTE: These task pools can only be created through the web interface. It is not possible to create a training task pool using the API.
Task
There are 3 types of tasks:
- main: a task composed only of the input data. Most tasks will be of this type;
- control (golden data): a question with the correct answer. These tasks are given to performers to check their performance;
- training: similar to control tasks, but it requires a hint in addition to the correct answer;
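The three types differ only in what accompanies the input data, which the following sketch tries to show. The key names `input_values`, `known_solutions` and `message_on_unknown_solution` are assumptions about how a task could be represented and should be checked against the API reference; `task_type` is a hypothetical helper.

```python
def task_type(task: dict) -> str:
    """Classify a task as main, control or training from the fields it carries."""
    has_answer = bool(task.get("known_solutions"))
    has_hint = bool(task.get("message_on_unknown_solution"))
    if has_answer and has_hint:
        return "training"  # correct answer + hint
    if has_answer:
        return "control"   # correct answer only (golden data)
    return "main"          # input data only


main_task = {"input_values": {"image": "https://example.com/a.jpg"}}
control_task = {
    "input_values": {"image": "https://example.com/b.jpg"},
    "known_solutions": [{"output_values": {"label": "cat"}}],
}
training_task = {
    "input_values": {"image": "https://example.com/c.jpg"},
    "known_solutions": [{"output_values": {"label": "dog"}}],
    "message_on_unknown_solution": "Look at the shape of the ears.",
}
```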
Task suite (API) / Task page (Web interface)
In the web interface this entity is called Task page, whereas in the API it is called Task suite. This entity describes a set of tasks inside a page shown to the performer.
Task pages/suites can be created manually or automatically (smart mixing), letting Toloka choose which tasks to insert in a page. Parameters to configure the smart mixing mode are stored in the Task pool entity.
NOTE: If you request to create pages automatically, it will no longer be possible to create them manually, and vice versa. If you want to mix the two types of page generation, you need to create two different task pools for the same project.
Assignment
This entity is created automatically when a performer answers a task suite/page. It contains the tasks of the suite/page, the id of the performer and the answer.
Skills
Skills can be created by the requester. Skills can be public or private and can be assigned:
- automatically: using the quality control rules;
- after the performer has completed the training;
- manually;
Web interface
Training task pool
As stated before, a training task pool can be created only through the web interface.
Task pages
The web interface allows data to be uploaded only in TSV format. When uploading data, Toloka will ask how to create the pages:
- by empty row;
- specifying the number of tasks per page;
- automatically (smart mixing);
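The first two options can be illustrated locally with a short sketch that groups TSV rows into pages. It only imitates the grouping and is not Toloka's own implementation; the column name `INPUT:image` and the function names are assumptions.

```python
import csv
import io


def pages_by_empty_row(tsv_text: str) -> list:
    """Group TSV rows into pages, starting a new page at every empty row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    pages, current = [], []
    for row in reader:
        if not any(cell.strip() for cell in row):  # an empty row closes the page
            if current:
                pages.append(current)
                current = []
            continue
        current.append(dict(zip(header, row)))
    if current:  # last page, not followed by an empty row
        pages.append(current)
    return pages


def pages_by_count(rows: list, tasks_per_page: int) -> list:
    """Group rows into pages holding a fixed number of tasks each."""
    return [rows[i:i + tasks_per_page] for i in range(0, len(rows), tasks_per_page)]


sample = (
    "INPUT:image\n"
    "https://example.com/a.jpg\n"
    "https://example.com/b.jpg\n"
    "\n"
    "https://example.com/c.jpg\n"
)
pages = pages_by_empty_row(sample)
```

With the sample above, the empty row separates the first two tasks from the third, giving two pages.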
API
Task suites and tasks
When using the API you can choose to:
- create a task: task suites are automatically created using the smart mixing mode;
- create a task suite: you decide which tasks to include in each page;
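The difference between the two options shows up mainly in the shape of the request. The sketch below only builds the URL and the JSON body without sending anything; the `/api/v1/tasks` and `/api/v1/task-suites` paths and the payload keys are assumptions to be checked against the API reference, and a real call would also need authentication, which is not covered here.

```python
# Sandbox host from this document; the "/api/v1" prefix is an assumption.
API_ROOT = "https://sandbox.toloka.yandex.com/api/v1"


def create_task_request(pool_id: str, input_values: dict) -> tuple:
    """Single task: Toloka groups tasks into suites itself (smart mixing)."""
    payload = {"pool_id": pool_id, "input_values": input_values}
    return API_ROOT + "/tasks", payload


def create_task_suite_request(pool_id: str, tasks_input_values: list) -> tuple:
    """Task suite: the requester decides which tasks appear together on a page."""
    payload = {
        "pool_id": pool_id,
        "tasks": [{"input_values": values} for values in tasks_input_values],
    }
    return API_ROOT + "/task-suites", payload


# "123" is a placeholder pool id.
url, body = create_task_suite_request(
    "123",
    [{"image": "https://example.com/a.jpg"}, {"image": "https://example.com/b.jpg"}],
)
```

As noted for task pages/suites, the two approaches cannot be mixed within the same pool, so a given pool would use only one of these builders.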