Python API Specification - GenomicsDB/GenomicsSampleAPIs GitHub Wiki

API Specification

The APIs and their input and output formats are described below. Examples of Python code using these APIs are under GenomicsSampleAPIs/python_api/example.

getNumSamples

API to get the total number of samples stored in Tile DB.
Input: None
Output: Number of samples
Example:

num_samples = api.getNumSamples()  

getValidPositions

API to get the ranges where valid data is available in Tile DB. The API returns a string in JSON format; use Python's json library to convert the string to a dictionary object.
Example:

import json
data = json.loads(api.getValidPositions("5", 1, 1000))  

Input:

| Parameter | Type | Description |
|---|---|---|
| chromosome | string | name of the contig (first column in VCF) |
| start | long | start of the range to search for valid positions |
| end | long | end of the range to search for valid positions |

Output:

| Parameter | Type | Description |
|---|---|---|
| data | json-dict<string, list<long>> | dictionary with keys = ['indices', 'POSITION', 'END']. indices is a list of sample IDs that have valid data in the queried range. POSITION and END are lists of start and end positions: for each index i, valid data exists between POSITION[i] and END[i]. |
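
The sketch below shows one way the returned dictionary might be consumed. The JSON string is hypothetical sample data standing in for the return value of api.getValidPositions, and the helper name valid_ranges is illustrative only. It assumes the three lists are parallel, so that the i-th entry of each list describes the same range.

```python
import json

# Hypothetical response in the documented format; a real string would come
# from api.getValidPositions("5", 1, 1000).
response = (
    '{"indices": [0, 2, 2], '
    '"POSITION": [100, 150, 900], '
    '"END": [120, 150, 950]}'
)


def valid_ranges(json_string):
    """Return (sample_id, start, end) tuples from a getValidPositions result.

    Assumption: 'indices', 'POSITION' and 'END' are parallel lists, so the
    i-th entry of each list refers to the same range.
    """
    data = json.loads(json_string)
    return list(zip(data["indices"], data["POSITION"], data["END"]))


ranges = valid_ranges(response)
for sample_id, start, end in ranges:
    print(f"sample {sample_id}: valid data in [{start}, {end}]")
```

A sample ID may appear more than once in the result when that sample has several valid ranges inside the queried window, as sample 2 does in the sample data.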

getPosition

API to get the values of the requested attributes at a given position in Tile DB. Like getValidPositions, the API returns a string in JSON format, which Python's json library can convert to a dictionary object.
Example:

import json
data = json.loads(api.getPosition("5", 500, ["REF", "QUAL"]))  

Input:

| Parameter | Type | Description |
|---|---|---|
| chromosome | string | name of the contig (first column in VCF) |
| position | long | position to fetch the data from |
| Attributes | list<string> | list of attributes that need to be fetched from Tile DB |

Output:

| Parameter | Type | Description |
|---|---|---|
| data | json-dict<string, list<attribute_type>> | dictionary with keys = ['indices', 'POSITION', 'END', 'attribute0', ...]. indices is a list of sample IDs that have valid data at the queried position. POSITION and END are lists of start and end positions: for each index i, valid data exists between POSITION[i] and END[i]. Each attribute is a list whose elements have the type of that attribute. |
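
As a hedged illustration of the output format, the sketch below regroups the parallel lists into one record per sample. The JSON string is hypothetical sample data in place of a real api.getPosition result, and records_by_sample is an illustrative helper, not part of the API. It assumes every list in the dictionary has the same length and ordering as indices.

```python
import json

# Hypothetical response in the documented format; a real string would come
# from api.getPosition("5", 500, ["REF", "QUAL"]).
response = json.dumps({
    "indices": [0, 3],
    "POSITION": [500, 480],
    "END": [500, 520],
    "REF": ["A", "A"],
    "QUAL": [30.5, 12.0],
})


def records_by_sample(json_string, attributes):
    """Map each sample ID to a dict of its POSITION, END and attribute values.

    Assumption: all lists are parallel to 'indices'. If a sample ID occurs
    more than once, the last occurrence wins.
    """
    data = json.loads(json_string)
    records = {}
    for i, sample_id in enumerate(data["indices"]):
        record = {"POSITION": data["POSITION"][i], "END": data["END"][i]}
        for name in attributes:
            record[name] = data[name][i]
        records[sample_id] = record
    return records


records = records_by_sample(response, ["REF", "QUAL"])
print(records[3]["QUAL"])
```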

NOTE:

  1. & in ALT refers to <NON-REF>
  2. indices are sample IDs that can be used to construct scipy sparse matrices. For example, to construct a single-column scipy sparse matrix from the QUAL values at a position:
import json  
from scipy.sparse import csc_matrix  
data = json.loads(api.getPosition("5", 500, ["REF", "QUAL"]))  
row = data['indices'] # sample IDs start from 0 and serve as row indices  
col = [0] * len(row) # all values go into a single column  
qual_matrix = csc_matrix((data['QUAL'], (row, col)))  

getPosition - multi-positions

getPosition is also overloaded to take a list of contigs and list of positions as input, and returns the values for all the queried positions.

Example
The example below queries for position 500 in both contigs 5 and 6.

import json
data = json.loads(api.getPosition(["5", "6"], [500, 500], ["REF", "QUAL"]))  

Input:

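| Parameter | Type | Description |
|---|---|---|
| chromosome | list<string> | names of the contigs (first column in VCF), one per queried position |
| position | list<long> | positions to fetch the data from, one per contig entry |
| Attributes | list<string> | list of attributes that need to be fetched from Tile DB |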

Output:

| Parameter | Type | Description |
|---|---|---|
| data | json-dict<string, list<attribute_type>> | dictionary with the following format { contig : { POSITION : {'indices' : [results], 'POSITION': [results], 'END':[results], 'attribute0': [...], 'attribute1': [...], ...}, next_POSITION : {...}}, next_contig: {...} } |
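
The nested format can be walked with two loops, as in the sketch below. The response string is hypothetical sample data, and flatten_results is an illustrative helper rather than part of the API. Because JSON object keys are always strings, the position keys come back from json.loads as strings and are converted to integers here. The sketch also assumes the attribute lists are parallel to indices.

```python
import json

# Hypothetical response in the documented nested format; a real string would
# come from api.getPosition(["5", "6"], [500, 500], ["REF", "QUAL"]).
response = json.dumps({
    "5": {"500": {"indices": [0, 1], "POSITION": [500, 500],
                  "END": [500, 500], "REF": ["A", "A"], "QUAL": [30.0, 45.0]}},
    "6": {"500": {"indices": [1], "POSITION": [500],
                  "END": [500], "REF": ["G"], "QUAL": [50.0]}},
})


def flatten_results(json_string, attributes):
    """Return one (contig, position, sample_id, values) tuple per sample.

    Assumption: within each position entry, the attribute lists are parallel
    to 'indices'.
    """
    data = json.loads(json_string)
    rows = []
    for contig, positions in data.items():
        for position, result in positions.items():
            for i, sample_id in enumerate(result["indices"]):
                values = {name: result[name][i] for name in attributes}
                # JSON keys are strings, so convert the position to an int.
                rows.append((contig, int(position), sample_id, values))
    return rows


rows = flatten_results(response, ["REF", "QUAL"])
for row in rows:
    print(row)
```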

getSampleNames

API to get the sample names corresponding to the sample IDs returned by the APIs above.
Example:

sample_names = api.getSampleNames(sample_ids)  

Input:

| Parameter | Type | Description |
|---|---|---|
| Sample IDs | list | list of sample IDs |

Output:

| Parameter | Type | Description |
|---|---|---|
| Sample Names | list | list of sample names |
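
A small sketch of how the sample IDs from a query could be paired with their names. The lists below are hypothetical stand-ins for data['indices'] and for the return value of api.getSampleNames, and names_by_id is an illustrative helper. It assumes the names are returned in the same order as the IDs passed in, which this page does not state explicitly.

```python
# Hypothetical 'indices' list from an earlier query; IDs may repeat.
indices = [2, 0, 2, 5]

# De-duplicate and sort the IDs before asking for their names.
sample_ids = sorted(set(indices))

# Hypothetical stand-in for: sample_names = api.getSampleNames(sample_ids)
sample_names = ["sample_A", "sample_C", "sample_F"]


def names_by_id(ids, names):
    """Build an ID -> name lookup.

    Assumption: names[i] is the name of the sample whose ID is ids[i].
    """
    if len(ids) != len(names):
        raise ValueError("expected one name per sample ID")
    return dict(zip(ids, names))


lookup = names_by_id(sample_ids, sample_names)
print([lookup[i] for i in indices])
```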