How to query sample metadata - Illumina/Polaris GitHub Wiki

Introduction

As described in Data Structure, sample metrics are:

You can query any of these, but the last two are the most convenient.

Hail queries

Querying sample metadata with Hail works exactly like querying multi-sample VCF data with Hail.
An additional advantage of Hail is that you can combine queries on variants and on metadata in the same analysis.

CSV file queries

  • Set up a (possibly very small) execution environment as described in the samtools tutorial.

  • Query the CSV file with any standard Linux tool, for example:

cd "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults/GVCF_Genotyper/Files"

# Find the column number for "Non-synonymous SNVs": cut the header line off at
# that column name, then count the commas left in front of it
COLUMN=$(sed "s/Non-synonymous SNVs.*/,/" metadata.header.csv | grep -o , | wc -l)
echo $COLUMN

# Get the value of this metric for sample HG03445 (sample ID is the first column)
grep "^HG03445," metadata.csv | cut -d , -f $COLUMN
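The sed trick above matches the metric name anywhere in the header line, so it can pick the wrong column if the name also occurs inside another column name, and if the name is missing it silently returns the total comma count instead of failing. As a more defensive alternative, here is a minimal sketch of a hypothetical `get_metric` helper (not part of Polaris) that uses `awk` to locate the column by exact header match. It assumes, as the pipeline above does, that metadata.header.csv holds a single comma-separated header line, that the sample ID is the first column of metadata.csv, and that no field contains a quoted comma; the two file-name arguments are optional and default to those names.

```shell
# Hypothetical helper, not part of Polaris: print one metric for one sample.
# Usage: get_metric SAMPLE "METRIC NAME" [HEADER_CSV] [DATA_CSV]
# Assumes a single-line header file, sample ID in column 1 of the data file,
# and no quoted commas inside fields. Note that awk -v interprets backslash
# escapes, so metric names containing "\" would need extra escaping.
get_metric() {
  local sample="$1" metric="$2"
  local header="${3:-metadata.header.csv}" data="${4:-metadata.csv}"
  awk -F, -v sample="$sample" -v metric="$metric" '
    # Tolerate Windows (CRLF) line endings in either file
    { sub(/\r$/, "") }
    # First file (header): remember the column whose name equals the metric
    # exactly (if a name repeats, the last match wins)
    FNR == NR {
      for (i = 1; i <= NF; i++) if ($i == metric) col = i
      next
    }
    # Second file (data): fail with status 2 if the metric is not in the header
    !col { print "metric not found: " metric > "/dev/stderr"; exit 2 }
    # Print the metric for every row whose first field is the sample ID
    $1 == sample { print $col; found = 1 }
    # Fail with status 3 if the sample never appeared
    END {
      if (col && !found) { print "sample not found: " sample > "/dev/stderr"; exit 3 }
    }
  ' "$header" "$data"
}
```

Run from the same Files directory, `get_metric HG03445 "Non-synonymous SNVs"` should print the same value as the grep/cut pipeline whenever the header contains exactly one column with that name. The non-zero exit statuses make it safe to use in scripts that loop over many samples or metrics.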