How to query sample metadata - Illumina/Polaris GitHub Wiki
As described in Data Structure, sample metrics are:
- individually stored in each Whole Genome Sequencing appResult
- aggregated in the metadata.csv and metadata.header.csv files of the GVCF Genotyper appResult
- included in the Hail VDS data structure
You can query any of these, but the last two are the most convenient.
Querying sample metadata with Hail works exactly like querying multi-sample VCF data with Hail.
An additional advantage of Hail is that you can combine queries on variants and metadata.
- Set up a (possibly very small) execution environment as described in the samtools tutorial.
- Query the CSV file with any standard Linux tool:
cd "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults/GVCF_Genotyper/Files"
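# Find the column number for "Non-synonymous SNVs": cut the header off at that
# name, leaving one comma per column up to and including it, then count the commas
COLUMN=$(sed "s/Non-synonymous SNVs.*/,/" metadata.header.csv | grep -o , | wc -l)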
echo $COLUMN
# Get the value of this metric for sample HG03445
grep -P "^HG03445," metadata.csv | cut -d , -f $COLUMN
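The two-step lookup above can also be wrapped in a small reusable function. The following is only a sketch: `get_metric` is a hypothetical helper name, not part of Polaris or BaseSpace. It assumes that metadata.header.csv contains a single comma-separated line of column names, that the sample ID is the first column of metadata.csv, and that no field contains a quoted comma.

```shell
# Hypothetical helper (not part of Polaris): print one metric for one sample.
# Usage: get_metric SAMPLE_ID "METRIC NAME" [DIRECTORY]
# Assumes a one-line header file, the sample ID in column 1, and no quoted commas.
get_metric() {
    awk -F, -v sample="$1" -v metric="$2" '
        NR == FNR {
            # First file (the header): remember which column holds the metric
            for (i = 1; i <= NF; i++) if ($i == metric) col = i
            next
        }
        # Second file (the data): print that column for the requested sample
        col && $1 == sample { print $col; found = 1 }
        # Exit with a non-zero status if the sample or the metric was not found
        END { exit !found }
    ' "${3:-.}/metadata.header.csv" "${3:-.}/metadata.csv"
}
```

From within the GVCF_Genotyper/Files directory, `get_metric HG03445 "Non-synonymous SNVs"` should print the same value as the commands above. Matching the column by its exact name avoids accidental matches on longer metric names that merely contain the search string.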