VQL language - labsquare/cutevariant GitHub Wiki

VQL is a Domain Specific Language to perform several action on a cutevariant project. The main purpose is to filters variants in the same fashion as a SQL query. VQL language can be run from the user interface vql plugin or directly in command line.

SELECT clause

SELECT clause is used to choose which fields are displayed.

  • Display two fields from variants tables
SELECT chr, pos FROM variants
  • Display annotations fields from variants tables
SELECT chr, pos, annotations.impact FROM variants
# You can omit annotations prefix 
SELECT chr, pos, impact FROM variants
  • Display genotype field from "boby" sample
SELECT sample['boby'].gt FROM variants
# If you omit the field, by default it will takes gt 
SELECT sample['boby'] FROM variants

Special fields

there is special computed fields which allow to perform operation on genotype

  • count_hom : count how many homozygous mutant within samples
  • count_ref : count how many homozygous wild within samples
  • count_het : count how many heterozygous mutant within samples
  • count_var : count how many variation (homozygous or heterozygous) within samples

For instance, with a variant dataset of two sample (boby and raymond), the following lines are equivalents :

SELECT chr, pos FROM variants WHERE sample["boby"].gt = 1 AND sample["raymond"].gt = 2
SELECT chr, pos FROM variants WHERE count_hom = 1 AND count_het=1

If your dataset has been imported with a pedigree, you can also filter with case and control status of samples

  • case_count_hom : count how many homozygous mutant within samples
  • case_count_ref : count how many homozygous wild within samples
  • case_count_het : count how many heterozygous mutant within samples
  • control_count_hom : count how many homozygous mutant within samples
  • control_count_ref : count how many homozygous wild within samples
  • control_count_het : count how many heterozygous mutant within samples

For instance, with 4 affected samples and 6 normal samples, look for all variant which are heterozygous in affected samples and homozygous in normal samples :

SELECT chr, pos FROM variants WHERE case_count_ref = 4 AND control_count_hom = 6

WHERE clause

WHERE clause is used to filter variants according to condition rules.

List of accepted operators

  • Comparative operators:
WHERE field = "value" 
WHERE field = 234
WHERE field != 324
WHERE field > 34
WHERE field < 30
  • Membership operators:
WHERE field IN ("CFTR", "boby")
WHERE field IN WORDSET["mylist"]  # See WORDSETS
WHERE field HAS "exonic" (apply to list-like fields, *eg.* `WHERE consequence HAS "stop_gained"`)
  • String comparison operators (apply mostly to string-like fields):
WHERE field LIKE "test" (, *eg.* `WHERE gene LIKE "NTRK%"`, to filter variants on genes NTRK1, NTRK2...)
WHERE field NOT LIKE "test" (opposite of 'LIKE' operator)
WHERE field ~ "\d+"  # Regular expression 
  • Special cases using 'NULL' SQL private word:
WHERE field IS NULL (the 'NULL' being a field with NO defined value ; 0 != NULL) 
WHERE field IS NOT NULL (opposite of 'IS NULL' expression)

Examples using WHERE

  • Filters fields using WHERE:
SELECT chr,pos FROM variants WHERE pos > 3 
  • WHERE can be a nested condition of OR/AND statements:
SELECT chr,pos FROM variants WHERE (pos > 3 AND pos < 100) OR (impact = 'HIGH')
  • WHERE supports regular expression. But this can be computing intensive:
SELECT chr,pos FROM variants WHERE transcript ~ "NM.+"
  • HAS operator looks for a word in a list separated by semicolon (exonic;intronic)
SELECT chr,pos FROM variants WHERE consequence HAS "exonic"
  • You can also filter using inline wordSet or a created one:
SELECT chr,pos FROM variants WHERE gene IN ("CFTR", "GJB2")
SELECT chr,pos FROM variants WHERE gene IN WORDSET["genelist"] 

CREATE clause

CREATE clause is used to create a selection or tables of variants. For instance the following code create a new selection named myselection

CREATE myselection FROM variants WHERE gene = 'CFTR' 
# You can now select variants from CFTR gene without a filters 
SELECT chr,pos FROM myselection

You can also create a selection using set operation performed on 2 different selections. Actually union, intersection and difference are supported

# Intersection between A and B 
CREATE myselection = A & B 
# Union between A and B 
CREATE myselection = A | B 
# Difference between A and B 
CREATE myselection = A - B 

You can also make an intersection with a bed file :

CREATE myselection FROM variants INTERSECT "/home/boby/panel.bed"  

IMPORT clause

VQL supports importation of different features. Currently, it supports only word WORDSETS. For example, using a simple text file containing a list of words :

IMPORT WORDSETS "gene.txt" AS mygenes
SELECT chr, pos FROM variants WHERE gene in WORDSET["mygenes"]

DROP clause

This clause remove selection or wordset accoding to their names.

# Remove selection named myselection
DROP SELECTIONS myselection

# Remove wordset named mygenes
DROP WORDSETS mygenes