Readers - labsquare/cutevariant GitHub Wiki

Cutevariant imports variant data through an abstract class called AbstractReader. You can inherit from it to support any custom format that contains variants. VCF readers are already implemented and are located in the readers package.

Methods to override

Three methods are meant to be overridden (get_samples() is optional when your data has no samples):

AbstractReader.get_fields()

Yield fields as dictionaries with the following structure:

name (str): the field name
category (str): the table the field belongs to. One of variants, annotations, or samples. #TODO rename
description (str): the definition of the field
type (str): the field type as a Python type name (str, int, float, bool)
constraint (str, optional): SQL constraint, e.g. NOT NULL

{
    "name": "chr",
    "category": "variants",
    "description": "chromosome",
    "type": "str",
    "constraint": "NOT NULL",
}
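As a quick sanity check while writing a reader, the field structure above can be verified with a small helper. This is only a sketch: check_field is a hypothetical function, not part of cutevariant, and it assumes that the four keys other than constraint are required and that the categories and types listed above are the only ones allowed.

```python
# Allowed values taken from the field description above.
# Assumption: every key except "constraint" is required.
REQUIRED_KEYS = {"name", "category", "description", "type"}
CATEGORIES = {"variants", "annotations", "samples"}
TYPES = {"str", "int", "float", "bool"}


def check_field(field):
    """Return a list of problems found in one field dictionary (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - field.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if field.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {field.get('category')!r}")
    if field.get("type") not in TYPES:
        problems.append(f"unknown type: {field.get('type')!r}")
    return problems


field = {
    "name": "chr",
    "category": "variants",
    "description": "chromosome",
    "type": "str",
    "constraint": "NOT NULL",
}
print(check_field(field))  # []
```

Running such a check over everything a reader's get_fields() yields can catch a misspelled category or type before the import starts.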

AbstractReader.get_variants()

Yield variants as dictionaries with the following structure:

chr (str): chromosome name
pos (int): position on the chromosome
ref (str): reference base
alt (str): alternative base
field n (type): any additional field declared by get_fields with category variants
annotations (list):
- transcript (str): Transcript name
- gene (str): gene name
- field n (type): any additional field declared by get_fields with category annotations
samples (list):
- name (str): name of sample
- gt (int): genotype of the variant for the sample (0: homozygous wild type, 1: heterozygous, 2: homozygous mutant, -1: unknown)

{
    "chr": "11",
    "pos": 125010,
    "ref": "T",
    "alt": "A",
    "dp": 32,
    "annotations": [
        {"transcript": "NM_234234", "gene": "CFTR", "in_exon": True, "pathogen_score": 0.2},
        {"transcript": "NM_234235", "gene": "CFTR", "in_exon": False, "pathogen_score": 0.5},
    ],
    "samples": [{"name": "sacha", "gt": 1, "af": 0.4}],
}
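When the source format stores genotypes as VCF-style strings such as 0/1 or 1|1, they have to be mapped to the integer gt encoding described above. The helper below is a minimal sketch of that mapping; genotype_to_gt is a hypothetical name, not a cutevariant function, and it assumes diploid calls, returning -1 for anything else.

```python
import re


def genotype_to_gt(genotype):
    """Map a VCF-style genotype string to the gt code used in the samples list.

    0: homozygous wild type, 1: heterozygous, 2: homozygous mutant, -1: unknown.
    Assumption: only diploid calls are handled; everything else is unknown.
    """
    # VCF separates alleles with "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", genotype)
    if len(alleles) != 2 or "." in alleles:
        return -1
    if alleles[0] == "0" and alleles[1] == "0":
        return 0
    if alleles[0] == alleles[1]:
        return 2
    return 1


for text in ["0/0", "0|1", "1/1", "./."]:
    print(text, genotype_to_gt(text))
```

A call with two different alternative alleles, such as 1/2, is treated as heterozygous here; a real reader may need a different convention.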

AbstractReader.get_samples()

Return a list of sample names. If your data has no samples, you do not need to override this method.
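For a VCF-like format, the sample names can be read from the column header line, where they follow the nine fixed columns (#CHROM through FORMAT). The function below is a sketch of what a get_samples() implementation might delegate to; samples_from_header is a hypothetical helper and sample2 is a made-up name.

```python
def samples_from_header(header_line):
    """Return the sample names found in a VCF column header line.

    The first nine tab-separated columns (#CHROM ... FORMAT) are fixed;
    every column after them is a sample name.
    """
    columns = header_line.rstrip("\n").split("\t")
    return columns[9:]


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsacha\tsample2\n"
print(samples_from_header(header))
```

A header with no sample columns yields an empty list, which matches the case where the file carries no samples.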

Example

The FakeReader below is a good starting point:

from .abstractreader import AbstractReader

class FakeReader(AbstractReader):
    def __init__(self):
        super().__init__(None)

    def get_variants(self):
        yield {
            "chr": "11",
            "pos": 125010,
            "ref": "T",
            "alt": "A",
            "annotations": [
                {"transcript": "NM_234234", "gene": "CFTR"},
                {"transcript": "NM_234235", "gene": "CFTR"},
            ],
            "samples": [{"name": "sacha", "gt": 1}],
        }

        yield {
            "chr": "12",
            "pos": 125010,
            "ref": "T",
            "alt": "A",
            "annotations": [
                {"transcript": "NM_234234", "gene": "CFTR"},
                {"transcript": "NM_234235", "gene": "CFTR"},
            ],
            "samples": [{"name": "sacha", "gt": 1}],
        }

        yield {
            "chr": "13",
            "pos": 125010,
            "ref": "T",
            "alt": "A",
            "annotations": [
                {"transcript": "NM_234234", "gene": "CFTR"},
                {"transcript": "NM_234235", "gene": "CFTR"},
            ],
            "samples": [{"name": "sacha", "gt": 1}],
        }

    def get_fields(self):
        yield {
            "name": "chr",
            "category": "variants",
            "description": "chromosome",
            "type": "str",
            "constraint": "NOT NULL",
        }
        yield {
            "name": "pos",
            "category": "variants",
            "description": "position",
            "type": "int",
            "constraint": "NOT NULL",
        }

        yield {
            "name": "ref",
            "category": "variants",
            "description": "reference base",
            "type": "str",
            "constraint": "NOT NULL",
        }
        yield {
            "name": "alt",
            "category": "variants",
            "description": "alternative base",
            "type": "str",
            "constraint": "NOT NULL",
        }

        yield {
            "name": "gt",
            "category": "samples",
            "description": "genotype",
            "type": "int",
        }

        yield {
            "name": "af",
            "category": "samples",
            "description": "allele frequency",
            "type": "float",
        }

        yield {
            "name": "gene",
            "category": "annotations",
            "description": "gene name",
            "type": "str",
        }

        yield {
            "name": "transcript",
            "category": "annotations",
            "description": "gene transcripts",
            "type": "str",
        }

    def get_samples(self):
        return ["sacha"]

Usage

AbstractReader takes a device (an open file-like object) as input.

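# FakeReader as defined above takes no device. A file-based reader
# would instead be constructed with an open file such as open("yourfile", "r").
reader = FakeReader()
for variant in reader.get_variants():
    print(variant)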
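To show how a reader might actually consume its device, here is a sketch for a made-up tab-separated format with one variant per line (chr, pos, ref, alt). Everything in it is an assumption for illustration: TsvReader and the file layout are hypothetical, the AbstractReader class is a local stand-in so the snippet runs on its own, and the self.device attribute name is assumed rather than taken from cutevariant. The annotations and samples keys are left out for brevity; whether cutevariant accepts variants without them is not stated on this page.

```python
import io


class AbstractReader:
    """Local stand-in for cutevariant's AbstractReader (assumption)."""

    def __init__(self, device):
        self.device = device  # assumed attribute name


class TsvReader(AbstractReader):
    """Hypothetical reader for lines of the form chr<TAB>pos<TAB>ref<TAB>alt."""

    def get_fields(self):
        columns = [
            ("chr", "chromosome", "str"),
            ("pos", "position", "int"),
            ("ref", "reference base", "str"),
            ("alt", "alternative base", "str"),
        ]
        for name, description, type_name in columns:
            yield {
                "name": name,
                "category": "variants",
                "description": description,
                "type": type_name,
                "constraint": "NOT NULL",
            }

    def get_variants(self):
        self.device.seek(0)  # allow the device to be read more than once
        for line in self.device:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and the header
            chrom, pos, ref, alt = line.split("\t")
            yield {"chr": chrom, "pos": int(pos), "ref": ref, "alt": alt}

    def get_samples(self):
        return []  # this format carries no samples


# io.StringIO plays the role of an open file here.
device = io.StringIO("#chr\tpos\tref\talt\n11\t125010\tT\tA\n12\t125010\tT\tA\n")
reader = TsvReader(device)
variants = list(reader.get_variants())
for variant in variants:
    print(variant)
```

Using an in-memory io.StringIO instead of a real file keeps the reader easy to test, since anything that iterates over lines and supports seek() can serve as the device.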