average - MiraldiLab/maxATAC GitHub Wiki

Averaging the signal from multiple .bigwig files

maxatac average schematic

Overview

In many experimental protocols, you will have biological replicates. For the maxATAC publication, we had multiple biological replicates available for our ATAC-seq data from transformed cell lines such as GM12878. We aggregate the signal across replicates using the average signal at 1bp resolution.

The maxatac average function uses pybigwig to average the signal across multiple input .bigwig files. This function is not specific to ATAC-seq and can be used to average the signal across multiple .bigwig files from any experimental source.

Required flags

There are two required flags.

  • -i : Input bigwig files
  • -n : Output bigwig filename. A .bw will be added to this string.

Optional flags

There are 5 optional flags described in more detail in the average readme. The optional flags allow for filtering of chromosomes and usages of custom chromosome sizes files.

Tutorial

These instructions assume you are using the hg38 genome.

You will need at least two bigwig files as input. The other required piece of information is the name you want to use as the output bigwig filename.

For this example, we will use the IMR-90 cell line which has two biological replicates: SRX10474876 and SRX10474877. These data can be found in the maxATAC testing data repository for the average function.

We currently do not test the .bigwig files to make sure that they contain the same chromosomes sets that want to be averaged. Therefore, you have to make sure the chromosomes that you want to average are found in both .bigwig files.

Averaging two biological replicates

The maxatac average function is fairly simple and if you are using data aligned to the hg38 genome, you will not need any optional flags.

For example, if you have two biological replicates for ATAC-seq data, you can average their signal into a single file. There are many other aggregation functions that could be used, but maxATAC has only been tested using the average signal across replicates.

We will use the maxATAC testing data for the average function as practice.

maxatac average -i SRX10474876_Tn5_slop20_blacklisted.bw SRX10474877_Tn5_slop20_blacklisted.bw -o ./IMR-90_bio_rep_average -n IMR-90_all_chrom.bw

A truncated version of the output is shown below. The first few dozen lines print the parameters, filenames, and chromosomes that will be output. The last lines print progress information about the current chromosome being processed. It took ~18 minutes to run this function on a 2019 2.6 GHz 6-core Macbook with 16 GB of memory.

                             _______       _____ 
                          /\|__   __|/\   / ____|
 _ __ ___   __ ___  __   /  \  | |  /  \ | |     
| '_ ` _ \ / _` \ \/ /  / /\ \ | | / /\ \| |     
| | | | | | (_| |>  <  / ____ \| |/ ____ \ |____ 
|_| |_| |_|\__,_/_/\_\/_/    \_\_/_/    \_\_____|
                                                 
                                                 

[2022-08-02 16:45:13,803]
Averaging 2 bigwig files. 
Input bigwig files: 
   - SRX10474876_Tn5_slop20_blacklisted.bw
   - SRX10474877_Tn5_slop20_blacklisted.bw
Output name: IMR-90_Tn5_slop20
Output directory: ./IMR-90_bio_rep_average 
Output filename: /IMR-90_bio_rep_average/IMR-90_Tn5_slop20.bw
Restricting to chromosomes: 
   - chr1
   - chr2
   - chr3
   - chr4
   - chr5
   - chr6
   - chr7
   - chr8
   - chr9
   - chr10
   - chr11
   - chr12
   - chr13
   - chr14
   - chr15
   - chr16
   - chr17
   - chr18
   - chr19
   - chr20
   - chr21
   - chr22

[2022-08-02 16:45:13,816]
Opening /IMR-90_bio_rep_average/IMR-90_Tn5_slop20.bw and writing header.
[2022-08-02 16:45:13,816]
Averaging values for chromosome: chr1
[2022-08-02 16:45:34,698]
Writing average values for chromosome: chr1
[2022-08-02 16:46:11,357]

...

[2022-08-02 16:52:48,875]
Averaging values for chromosome: chr9
[2022-08-02 16:52:59,714]
Writing average values for chromosome: chr9
[2022-08-02 17:02:45,783]
Total averaging time: 0:17:31.

[2022-08-02 17:02:45,783]
Results saved to: /IMR-90_bio_rep_average/IMR-90_Tn5_slop20.bw

Averaging specific chromosomes from two biological replicates

In some analysis, you only want to include specific chromosomes in the output file. Use the -c flag to select specific chromosomes to average.

maxatac average -i SRX10474876_Tn5_slop20_blacklisted.bw SRX10474877_Tn5_slop20_blacklisted.bw -n IMR-90 -o ./test -c chr1

Expected output should only report the desired chromosome being processed:

                             _______       _____ 
                          /\|__   __|/\   / ____|
 _ __ ___   __ ___  __   /  \  | |  /  \ | |     
| '_ ` _ \ / _` \ \/ /  / /\ \ | | / /\ \| |     
| | | | | | (_| |>  <  / ____ \| |/ ____ \ |____ 
|_| |_| |_|\__,_/_/\_\/_/    \_\_/_/    \_\_____|
                                                 
                                                 

[2022-08-03 12:02:02,908]
Averaging 2 bigwig files. 
Input bigwig files: 
   - SRX10474876_Tn5_slop20_blacklisted.bw
   - SRX10474877_Tn5_slop20_blacklisted.bw
Output name: IMR-90
Output directory: /test
Output filename: /test/IMR-90.bw
Restricting to chromosomes: 
   - chr1

[2022-08-03 12:02:02,977]
Opening /test/IMR-90.bw and writing header.
[2022-08-03 12:02:02,977]
Averaging values for chromosome: chr1
[2022-08-03 12:02:25,872]
Writing average values for chromosome: chr1
[2022-08-03 12:04:08,752]
Total averaging time: 0:2:5.

[2022-08-03 12:04:08,754]
Results saved to: /test/IMR-90.bw

Averaging specific chromosomes from two biological replicates using a custom chromosomes sizes

If you want to use custom chromosomes sizes that are specific to your reference genome, then you can use the -cs flag.

maxatac average -i SRX10474876_Tn5_slop20_blacklisted.bw SRX10474877_Tn5_slop20_blacklisted.bw -n IMR-90 -o ./test -c chrY -cs hg38.custom.chrom.sizes

Tests

There are currently three tests that are available for maxatac average. The tests are three variations on averaging based on different chromosomes sets. They use the data presented in the tutorial.

Inputs

The inputs are located in the /data/average_data directory.

  • SRX10474876_Tn5_slop20_blacklisted.bw
  • SRX10474877_Tn5_slop20_blacklisted.bw

Test description

  1. test_average_single_chrom(): Tests whether two files can be averaged together and output results for only chr22. Run if you want to check whether there is an issue with the most basic requirements to run maxatac average.
  2. test_average_multi_chrom(): Tests whether two files can be averaged together and output only results for chr22 and chr3. Run if you want to check whether there is an issue with writing multiple chromosomes to a bigwig.
  3. test_average_all_chrom(): Tests whether two files can be averaged together for all chromosomes. Run if you want to test a full chromosome set or test parallel processing functionality.