How to query BAM files with samtools - Illumina/Polaris GitHub Wiki
This page describes how to query the BaseSpace BAM files of the Polaris 1 Diversity Cohort with samtools, without downloading them in full.
Prerequisite: access to the data in BaseSpace.
We set up an Amazon EC2 instance in the same region as the data, for faster data transfer and lower latency.
- Launch an Amazon EC2 instance in the Frankfurt region (eu-central-1)
- You can use a very small instance.
- AMI: Ubuntu Server 16.04 LTS
- Install BaseMount and samtools
sudo bash -c "$(curl -L https://basemount.basespace.illumina.com/install)"
sudo apt install -y samtools
- Authenticate with BaseSpace to access the data remotely
mkdir BaseSpace
basemount --api-server=https://api.euc1.sh.basespace.illumina.com BaseSpace
# Open the URL that basemount prints in the browser you usually use to log in to BaseSpace
# Check that you see the data
ls "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults/"
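Before running queries, it can help to confirm that the setup steps above succeeded. The following is a minimal sketch, not part of the original instructions: `preflight` is a hypothetical helper name, and it assumes the mount point is the `BaseSpace` directory created above unless another path is passed.

```shell
# Hypothetical helper: check that the tools are installed and that the
# Polaris project is visible under the BaseMount mount point.
preflight() {
  local mount_point="${1:-BaseSpace}"   # assumed default: directory created above
  local results="$mount_point/Projects/Polaris 1 Diversity Cohort/AppResults"
  local tool

  # Both commands must be resolvable before anything else can work.
  for tool in basemount samtools; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      return 1
    fi
  done

  # The AppResults directory only appears once authentication has succeeded.
  if [ ! -d "$results" ]; then
    echo "project not visible under $mount_point" >&2
    return 1
  fi

  echo "ready"
}
```

Calling `preflight` from the directory that contains `BaseSpace` should print `ready` once the mount is working; any other outcome points at the step to revisit.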
You are now ready to run your first samtools query.
# Let's choose sample HG01707
cd "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults/HG01707/Files"
# Fetch BAM headers
samtools view -H HG01707_S1.bam
# How many alignments overlap the BEST2 gene?
samtools view HG01707_S1.bam 19:12862516-12869272 | wc -l
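The same count can be obtained with the `-c` option of `samtools view`, which prints the number of matching records instead of the records themselves. The sketch below wraps that in a hypothetical helper, `count_alignments`, so the query can be repeated for other samples. It assumes every sample follows the `<sample>/Files/<sample>_S1.bam` layout seen for HG01707, which may not hold for all samples, and that a BAM index sits next to each BAM file, which region queries require.

```shell
# Hypothetical helper: count alignments overlapping a region for one sample.
# Assumption: BAM files are laid out as <results_dir>/<sample>/Files/<sample>_S1.bam,
# as for HG01707; other samples may use a different suffix.
count_alignments() {
  local results_dir="$1" sample="$2" region="$3"
  local bam="$results_dir/$sample/Files/${sample}_S1.bam"

  # -c prints only the number of matching records, so no pipe to wc is needed.
  samtools view -c "$bam" "$region"
}
```

For example, `count_alignments "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults" HG01707 19:12862516-12869272`, run from the directory that contains the mount point, should report the same number as the `wc -l` pipeline above.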