average - MiraldiLab/maxATAC GitHub Wiki
.bigwig
files
Averaging the signal from multiple Overview
In many experimental protocols, you will have biological replicates. For the maxATAC publication, we had multiple biological replicates available for our ATAC-seq data from transformed cell lines such as GM12878. We aggregate the signal across replicates using the average signal at 1bp resolution.
The maxatac average
function uses pybigwig
to average the signal across multiple input .bigwig
files. This function is not specific to ATAC-seq and can be used to average the signal across multiple .bigwig
files from any experimental source.
Required flags
There are two required flags.
-i
: Input bigwig files-n
: Output bigwig filename. A.bw
will be added to this string.
Optional flags
There are 5 optional flags described in more detail in the average readme. The optional flags allow for filtering of chromosomes and usages of custom chromosome sizes files.
Tutorial
These instructions assume you are using the hg38 genome.
You will need at least two bigwig files as input. The other required piece of information is the name you want to use as the output bigwig filename.
For this example, we will use the IMR-90 cell line which has two biological replicates: SRX10474876 and SRX10474877. These data can be found in the maxATAC testing data repository for the average function.
We currently do not test the .bigwig
files to make sure that they contain the same chromosomes sets that want to be averaged. Therefore, you have to make sure the chromosomes that you want to average are found in both .bigwig
files.
Averaging two biological replicates
The maxatac average
function is fairly simple and if you are using data aligned to the hg38 genome, you will not need any optional flags.
For example, if you have two biological replicates for ATAC-seq data, you can average their signal into a single file. There are many other aggregation functions that could be used, but maxATAC has only been tested using the average signal across replicates.
We will use the maxATAC testing data for the average function as practice.
maxatac average -i SRX10474876_Tn5_slop20_blacklisted.bw SRX10474877_Tn5_slop20_blacklisted.bw -o ./IMR-90_bio_rep_average -n IMR-90_all_chrom.bw
A truncated version of the output is shown below. The first few dozen lines print the parameters, filenames, and chromosomes that will be output. The last lines print progress information about the current chromosome being processed. It took ~18 minutes to run this function on a 2019 2.6 GHz 6-core Macbook with 16 GB of memory.
_______ _____
/\|__ __|/\ / ____|
_ __ ___ __ ___ __ / \ | | / \ | |
| '_ ` _ \ / _` \ \/ / / /\ \ | | / /\ \| |
| | | | | | (_| |> < / ____ \| |/ ____ \ |____
|_| |_| |_|\__,_/_/\_\/_/ \_\_/_/ \_\_____|
[2022-08-02 16:45:13,803]
Averaging 2 bigwig files.
Input bigwig files:
- SRX10474876_Tn5_slop20_blacklisted.bw
- SRX10474877_Tn5_slop20_blacklisted.bw
Output name: IMR-90_Tn5_slop20
Output directory: ./IMR-90_bio_rep_average
Output filename: /IMR-90_bio_rep_average/IMR-90_Tn5_slop20.bw
Restricting to chromosomes:
- chr1
- chr2
- chr3
- chr4
- chr5
- chr6
- chr7
- chr8
- chr9
- chr10
- chr11
- chr12
- chr13
- chr14
- chr15
- chr16
- chr17
- chr18
- chr19
- chr20
- chr21
- chr22
[2022-08-02 16:45:13,816]
Opening /IMR-90_bio_rep_average/IMR-90_Tn5_slop20.bw and writing header.
[2022-08-02 16:45:13,816]
Averaging values for chromosome: chr1
[2022-08-02 16:45:34,698]
Writing average values for chromosome: chr1
[2022-08-02 16:46:11,357]
...
[2022-08-02 16:52:48,875]
Averaging values for chromosome: chr9
[2022-08-02 16:52:59,714]
Writing average values for chromosome: chr9
[2022-08-02 17:02:45,783]
Total averaging time: 0:17:31.
[2022-08-02 17:02:45,783]
Results saved to: /IMR-90_bio_rep_average/IMR-90_Tn5_slop20.bw
Averaging specific chromosomes from two biological replicates
In some analysis, you only want to include specific chromosomes in the output file. Use the -c
flag to select specific chromosomes to average.
maxatac average -i SRX10474876_Tn5_slop20_blacklisted.bw SRX10474877_Tn5_slop20_blacklisted.bw -n IMR-90 -o ./test -c chr1
Expected output should only report the desired chromosome being processed:
_______ _____
/\|__ __|/\ / ____|
_ __ ___ __ ___ __ / \ | | / \ | |
| '_ ` _ \ / _` \ \/ / / /\ \ | | / /\ \| |
| | | | | | (_| |> < / ____ \| |/ ____ \ |____
|_| |_| |_|\__,_/_/\_\/_/ \_\_/_/ \_\_____|
[2022-08-03 12:02:02,908]
Averaging 2 bigwig files.
Input bigwig files:
- SRX10474876_Tn5_slop20_blacklisted.bw
- SRX10474877_Tn5_slop20_blacklisted.bw
Output name: IMR-90
Output directory: /test
Output filename: /test/IMR-90.bw
Restricting to chromosomes:
- chr1
[2022-08-03 12:02:02,977]
Opening /test/IMR-90.bw and writing header.
[2022-08-03 12:02:02,977]
Averaging values for chromosome: chr1
[2022-08-03 12:02:25,872]
Writing average values for chromosome: chr1
[2022-08-03 12:04:08,752]
Total averaging time: 0:2:5.
[2022-08-03 12:04:08,754]
Results saved to: /test/IMR-90.bw
Averaging specific chromosomes from two biological replicates using a custom chromosomes sizes
If you want to use custom chromosomes sizes that are specific to your reference genome, then you can use the -cs
flag.
maxatac average -i SRX10474876_Tn5_slop20_blacklisted.bw SRX10474877_Tn5_slop20_blacklisted.bw -n IMR-90 -o ./test -c chrY -cs hg38.custom.chrom.sizes
Tests
There are currently three tests that are available for maxatac average
. The tests are three variations on averaging based on different chromosomes sets. They use the data presented in the tutorial.
Inputs
The inputs are located in the /data/average_data
directory.
SRX10474876_Tn5_slop20_blacklisted.bw
SRX10474877_Tn5_slop20_blacklisted.bw
Test description
test_average_single_chrom()
: Tests whether two files can be averaged together and output results for onlychr22
. Run if you want to check whether there is an issue with the most basic requirements to runmaxatac average
.test_average_multi_chrom()
: Tests whether two files can be averaged together and output only results forchr22
andchr3
. Run if you want to check whether there is an issue with writing multiple chromosomes to a bigwig.test_average_all_chrom()
: Tests whether two files can be averaged together for all chromosomes. Run if you want to test a full chromosome set or test parallel processing functionality.