How to query BAM files with samtools - Illumina/Polaris GitHub Wiki

Introduction

This page describes how to query the BaseSpace BAM files of the Polaris 1 Diversity Cohort with samtools, without downloading them in full.

Prerequisite: access to the data in BaseSpace.

Setting up your execution environment

We set up an Amazon EC2 instance in the same region as the data, which gives faster data transfer and lower latency.

  • Launch an Amazon EC2 instance in the Frankfurt region (a.k.a. eu-central-1)

    • A very small instance is sufficient.
    • AMI: Ubuntu Server 16.04 LTS
  • Install BaseMount and samtools

 sudo bash -c "$(curl -L https://basemount.basespace.illumina.com/install)"
 sudo apt install -y samtools
  • Authenticate with BaseSpace to access the data remotely
 mkdir BaseSpace
 basemount --api-server=https://api.euc1.sh.basespace.illumina.com BaseSpace
 # Open the URL printed by basemount in the browser you usually use to log in to BaseSpace
 
 # Check that you see the data
 ls "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults/"

You are now ready to run your first samtools query.
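
Before running the first query, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the original instructions: `missing_tools` and `mount_has_entries` are hypothetical helper names introduced here for illustration, and the check only assumes that a working mount shows at least one entry under the mount point.

```shell
# Sketch: print the name of every given command that is not on PATH.
# missing_tools is a hypothetical helper, not part of samtools or BaseMount.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Sketch: succeed only if the directory exists and contains at least one entry.
# Assumption: an unmounted BaseSpace directory is empty or missing.
mount_has_entries() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}
```

For example, `missing_tools samtools basemount` prints nothing when both tools are installed, and `mount_has_entries BaseSpace/Projects` returns a non-zero status if the mount point is empty.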

Running Samtools

# Let's choose sample HG01707
cd "BaseSpace/Projects/Polaris 1 Diversity Cohort/AppResults/HG01707/Files"

# Fetch BAM headers
samtools view -H HG01707_S1.bam

# How many alignments overlap the BEST2 gene?
samtools view HG01707_S1.bam 19:12862516-12869272 | wc -l
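
The region query above prints every matching record and counts lines with `wc -l`. As an alternative sketch, `samtools view -c` reports the count directly. Region queries also need an index for the BAM file, so the helper below checks for one first. `make_region` and `count_region` are hypothetical helper names, and the index location (`FILE.bam.bai` or `FILE.bai` next to the BAM) is an assumption about how the files are laid out.

```shell
# Sketch: build a samtools region string from chromosome, start and end.
# make_region and count_region are hypothetical helpers for illustration.
make_region() {
    printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# Sketch: count alignments overlapping a region, after checking for an index.
# Assumption: the index sits next to the BAM as FILE.bam.bai or FILE.bai.
count_region() {
    local bam="$1" region="$2"
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
        echo "no .bai index found for $bam" >&2
        return 1
    fi
    # -c makes samtools print the number of matching records, not the records
    samtools view -c "$bam" "$region"
}
```

Run from the sample's `Files` directory, `count_region HG01707_S1.bam "$(make_region 19 12862516 12869272)"` should report the same number as the `wc -l` pipeline.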