A Quick Tutorial: Working with GA4GH TES API - microsoft/ga4gh-tes GitHub Wiki
The GA4GH Task Execution Service (TES) API is a standardized interface for executing and managing computational tasks on various computing environments. This tutorial provides a simple guide to help you get started with the TES API, including how to create, submit, and manage tasks.
Prerequisites
Before you begin, ensure you have the following:
- A running TES server.
- Basic knowledge of RESTful APIs and JSON.
- A tool like curl or a REST client (Postman, Insomnia, etc.) to interact with the API.
Overview of TES API
The TES API allows you to submit, track, and manage computational tasks. Each task typically defines inputs, execution resources, commands, and outputs. The API endpoints you'll interact with include:
- POST /v1/tasks: Submit a new task.
- GET /v1/tasks/{id}: Retrieve information about a specific task.
- GET /v1/tasks: List tasks.
- DELETE /v1/tasks/{id}: Cancel or delete a task.
1. Submitting a Task
The first step is submitting a task to the TES API. A task typically includes metadata, input files, the commands to execute, and where to store the outputs.
Here’s an example of a task submission using curl:
curl -X POST https://tes.example.com/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"name": "Example Task",
"inputs": [
{
"url": "/myContainer/input.txt",
"path": "/data/input.txt",
"type": "FILE"
}
],
"outputs": [
{
"path": "/data/output.txt",
"url": "/myContainer/output.txt",
"type": "FILE"
}
],
"resources": {
"cpu_cores": 4,
"ram_gb": 16,
"preemptible": true
},
"executors": [
{
"image": "ubuntu:latest",
"command": [
"bash", "-c", "cat input.txt > output.txt"
],
"workdir": "/data"
}
]
}'
This task runs a simple cat command inside an Ubuntu container to copy an input file to an output file.
- inputs: Specifies the files needed for the task.
- outputs: Defines where the task’s outputs should be stored.
- executors: Contains the details of the command(s) to be executed in a container, including the Docker image.
- resources: Specifies the CPU cores and RAM required for the task, and whether it may run on preemptible compute.
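The fields above can also be assembled in code, which avoids hand-written JSON mistakes such as stray commas. The following is a minimal sketch, not part of any TES client library: `build_task` and its parameters are hypothetical names, and it assumes that `url` is the storage location and `path` is the location inside the container, as in the inputs of the example.

```python
import json


def build_task(name, image, command, inputs, outputs,
               cpu_cores=1, ram_gb=1, preemptible=False, workdir=None):
    """Assemble a TES task document as a plain dict (hypothetical helper).

    inputs and outputs are lists of (storage_url, container_path) pairs.
    """
    executor = {"image": image, "command": list(command)}
    if workdir is not None:
        executor["workdir"] = workdir
    return {
        "name": name,
        "inputs": [{"url": u, "path": p, "type": "FILE"} for u, p in inputs],
        "outputs": [{"url": u, "path": p, "type": "FILE"} for u, p in outputs],
        "resources": {
            "cpu_cores": cpu_cores,
            "ram_gb": ram_gb,
            "preemptible": preemptible,
        },
        "executors": [executor],
    }


task = build_task(
    name="Example Task",
    image="ubuntu:latest",
    command=["bash", "-c", "cat input.txt > output.txt"],
    inputs=[("/myContainer/input.txt", "/data/input.txt")],
    outputs=[("/myContainer/output.txt", "/data/output.txt")],
    cpu_cores=4,
    ram_gb=16,
    preemptible=True,
    workdir="/data",
)

# json.dumps always emits valid JSON, so the result can be passed to curl -d.
payload = json.dumps(task, indent=2)
print(payload)
```

The printed document can be saved to a file and sent with `curl -d @task.json`, or posted directly from a script.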
Example Response
A successful task submission will return a task ID:
{
"id": "12345"
}
You can now use this ID to track the task’s progress.
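The submission and the ID extraction can be scripted as well. This is a sketch using only the Python standard library; the function names are hypothetical, the base URL is the placeholder host used in this tutorial, and authentication is left out because it depends on how your server is deployed.

```python
import json
import urllib.request


def build_submit_request(base_url, task):
    """Build the POST /v1/tasks request without sending it."""
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_task_id(response_body):
    """Extract the task ID from the submission response."""
    doc = json.loads(response_body)
    task_id = doc.get("id")
    if not task_id:
        raise ValueError("response did not contain a task id")
    return task_id


def submit_task(base_url, task, timeout=30):
    """Send the task and return its ID (performs a network call)."""
    request = build_submit_request(base_url, task)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_task_id(response.read())
```

With a reachable server, `submit_task("https://tes.example.com", task)` would return the ID string. Splitting request construction from sending keeps the first two functions usable without network access.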
2. Checking Task Status
Once the task is submitted, you can retrieve its status using the task ID:
curl -X GET https://tes.example.com/v1/tasks/12345
Example response:
{
"id": "12345",
"state": "RUNNING",
"logs": [
{
"start_time": "2024-09-17T10:00:00Z",
"end_time": "",
"system_logs": [],
"outputs": []
}
]
}
- state: Indicates the current status of the task (e.g., QUEUED, RUNNING, COMPLETE, EXECUTOR_ERROR, SYSTEM_ERROR).
- logs: Includes information about when the task started and finished, along with any system logs.
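In practice the status check is usually repeated until the task finishes. The sketch below polls until a terminal state is reached. The set of terminal states is an assumption based on the TES state names, and `get_task` is a hypothetical callable that you supply (for example, a function that performs the GET request above and returns the parsed JSON).

```python
import time

# Assumed terminal states; adjust if your server reports others.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_task(get_task, task_id, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_task(task_id) until the task reaches a terminal state.

    Returns the final task document, or raises TimeoutError after max_polls.
    """
    for attempt in range(max_polls):
        task = get_task(task_id)
        if task.get("state", "UNKNOWN") in TERMINAL_STATES:
            return task
        if attempt < max_polls - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")


# Demonstration with canned responses instead of a real server.
canned = iter([
    {"id": "12345", "state": "QUEUED"},
    {"id": "12345", "state": "RUNNING"},
    {"id": "12345", "state": "COMPLETE"},
])
final = wait_for_task(lambda _id: next(canned), "12345", interval=0)
print(final["state"])  # prints COMPLETE
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to try out without a server.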
3. Listing All Tasks
You can list all the tasks submitted to the TES server using the following endpoint:
curl -X GET https://tes.example.com/v1/tasks
This returns the tasks known to the server with their current states. In the default (minimal) view, each entry carries only the task ID and state:
{
"tasks": [
{
"id": "12345",
"state": "RUNNING"
},
{
"id": "12346",
"state": "COMPLETE"
}
]
}
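On a server with many tasks the list is returned in pages. The TES specification describes `page_size` and `page_token` query parameters and a `next_page_token` field in the response. The sketch below follows those tokens until the list is exhausted; `fetch_page` is a hypothetical callable that takes the query parameters as a dict and returns the parsed JSON of one `GET /v1/tasks` response.

```python
def iter_tasks(fetch_page, page_size=100):
    """Yield every task by following next_page_token across pages."""
    page_token = None
    while True:
        params = {"page_size": page_size}
        if page_token:
            params["page_token"] = page_token
        page = fetch_page(params)
        yield from page.get("tasks", [])
        page_token = page.get("next_page_token")
        if not page_token:
            break  # no further pages


# Demonstration with two canned pages instead of a real server.
PAGES = {
    None: {"tasks": [{"id": "12345", "state": "RUNNING"}], "next_page_token": "p2"},
    "p2": {"tasks": [{"id": "12346", "state": "COMPLETE"}]},
}


def fake_fetch(params):
    return PAGES[params.get("page_token")]


ids = [t["id"] for t in iter_tasks(fake_fetch)]
print(ids)  # prints ['12345', '12346']
```

Because `iter_tasks` is a generator, callers can stop early (for example, after finding a particular task) without fetching the remaining pages.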
4. Canceling or Deleting a Task
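To cancel a task that is running or queued, you can use the DELETE method as shown below. Note that the GA4GH TES specification itself defines cancellation as POST /v1/tasks/{id}:cancel, so check which form your server accepts: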
curl -X DELETE https://tes.example.com/v1/tasks/12345
If the task was successfully canceled or deleted, the response will be:
{
"message": "Task canceled"
}
Example Use Case: Running a Bioinformatics Workflow
In a bioinformatics context, the TES API can be used to run tasks such as:
- Aligning sequencing data
- Variant calling
- Data preprocessing for machine learning
You could configure the inputs as large sequencing files (BAM/FASTQ), define the appropriate Docker container with the relevant bioinformatics tool (e.g., BWA, GATK), and submit the task to the TES server. Outputs could be the processed results, which are saved back to a cloud bucket.
curl -X POST https://tes.example.com/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"name": "BWA Alignment",
"inputs": [
{
"url": "/myContainer/inputs/input.fastq",
"path": "/data/input.fastq",
"type": "FILE"
}
],
"outputs": [
{
"path": "/data/aligned.bam",
"url": "/myContainer/outputs/aligned.bam",
"type": "FILE"
}
],
"executors": [
{
"image": "biocontainers/bwa:v0.7.17_cv1",
"command": [
"bwa", "mem", "input.fastq", "output.bam"
],
"workdir": "/data"
}
]
}'
Note that this example is simplified: a real bwa mem run also needs an indexed reference genome as an additional input and writes SAM to standard output, so the command must include the reference and a redirect or conversion step to produce aligned.bam.
Conclusion
The GA4GH TES API provides a standardized and flexible way to submit, manage, and track computational tasks across various environments. By using the endpoints described above, you can automate task submissions, monitor their progress, and manage large-scale computational pipelines with ease.
For more details, check out the GA4GH TES API specification.