Testing Your Installation - GarrettJenkinson/informME GitHub Wiki

informME is distributed with a small but complete "toy model" for testing and debugging your local installation and for familiarizing yourself with the tool. The reference genome consists of five chromosomes of 10 kb each. The WGBS reads were simulated so that the resulting mean methylation level is known in advance and the cancer sample exhibits genome-wide hypomethylation. Processing the entire toy example takes about 15 minutes. Once informME has been installed with install.sh, follow the steps below to run the toy example:

  1. If you are on a server that uses environment modules to load dependencies, load MATLAB and SAMtools:
module load matlab
module load samtools
  1. Reference Genome. Run the following:
cd informME/src/bash_src/parseBamFile/fastaToCpg/main
./main.sh

and then run ls -lthr to check that a file CpGlocationChrX.mat with a size of approximately 3.2K has been created for each of the five chromosomes:

total 76K
-rw-rw-r-- 1 usr usr  49K Apr 20 17:23 toy_genome.fa
-rwxrwxr-x 1 usr usr 1.1K Jun  5 15:43 main.sh
-rw-rw-r-- 1 usr usr 3.2K Jun  5 15:50 CpGlocationChr1.mat
-rw-rw-r-- 1 usr usr 3.2K Jun  5 15:50 CpGlocationChr2.mat
-rw-rw-r-- 1 usr usr 3.1K Jun  5 15:50 CpGlocationChr3.mat
-rw-rw-r-- 1 usr usr 3.1K Jun  5 15:50 CpGlocationChr4.mat
-rw-rw-r-- 1 usr usr 3.1K Jun  5 15:50 CpGlocationChr5.mat
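
The check above can also be scripted rather than read off the listing. The following is a minimal sketch, not part of informME: `check_cpg_files` is a hypothetical helper name, and it assumes only the file names shown in the listing (CpGlocationChr1.mat through CpGlocationChr5.mat). It reports each file's size in bytes so it can be compared with the roughly 3.2K shown above, and returns a non-zero status if any file is missing or empty.

```shell
#!/usr/bin/env bash
# Sketch only: check_cpg_files is a hypothetical helper, not shipped with informME.
# It looks for CpGlocationChr1.mat ... CpGlocationChr5.mat in the given directory
# (default: current directory) and prints the size of each file in bytes.
check_cpg_files() {
  local dir="${1:-.}" status=0 chr file
  for chr in 1 2 3 4 5; do
    file="$dir/CpGlocationChr${chr}.mat"
    if [ -s "$file" ]; then
      # wc -c reports the byte count; tr strips padding that some platforms add
      printf 'ok       %s (%s bytes)\n' "$file" "$(wc -c < "$file" | tr -d ' ')"
    else
      printf 'MISSING  %s\n' "$file"
      status=1
    fi
  done
  return "$status"
}

# Usage, from the directory in which main.sh was run:
#   check_cpg_files .
```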

  1. Generate input matrices by running the following:
cd informME/src/bash_src/parseBamFile/getMatrices/main
./main.sh

and then run ls -lthrR out/ to check that files toy_normal_pe_matrices.mat and toy_cancer_pe_matrices.mat with a size of approximately 70K have been created for each of the five chromosomes:

out/:
total 20K
drwxrwxr-x 2 usr usr 4.0K Jun  5 15:53 chr5
drwxrwxr-x 2 usr usr 4.0K Jun  5 15:53 chr4
drwxrwxr-x 2 usr usr 4.0K Jun  5 15:53 chr3
drwxrwxr-x 2 usr usr 4.0K Jun  5 15:52 chr2
drwxrwxr-x 2 usr usr 4.0K Jun  5 15:52 chr1

out/chr5:
total 144K
-rw-rw-r-- 1 usr usr 69K Jun  5 15:53 toy_cancer_pe_matrices.mat
-rw-rw-r-- 1 usr usr 69K Jun  5 15:52 toy_normal_pe_matrices.mat

out/chr4:
total 136K
-rw-rw-r-- 1 usr usr 67K Jun  5 15:53 toy_cancer_pe_matrices.mat
-rw-rw-r-- 1 usr usr 67K Jun  5 15:52 toy_normal_pe_matrices.mat

out/chr3:
total 144K
-rw-rw-r-- 1 usr usr 70K Jun  5 15:53 toy_cancer_pe_matrices.mat
-rw-rw-r-- 1 usr usr 70K Jun  5 15:52 toy_normal_pe_matrices.mat

out/chr2:
total 144K
-rw-rw-r-- 1 usr usr 70K Jun  5 15:52 toy_cancer_pe_matrices.mat
-rw-rw-r-- 1 usr usr 69K Jun  5 15:52 toy_normal_pe_matrices.mat

out/chr1:
total 144K
-rw-rw-r-- 1 usr usr 72K Jun  5 15:52 toy_cancer_pe_matrices.mat
-rw-rw-r-- 1 usr usr 72K Jun  5 15:51 toy_normal_pe_matrices.mat
  1. Run informME using the following:
cd informME/src/bash_src/informME_run/main
./main.sh

and then run ls -lthrR out/ to check that analysis files for the normal, cancer, and pooled models, each with a size of approximately 68K, have been created for each of the five chromosomes:

out/:
total 20K
drwxrwxr-x 2 usr usr 4.0K Jun  5 16:01 chr5
drwxrwxr-x 2 usr usr 4.0K Jun  5 16:01 chr4
drwxrwxr-x 2 usr usr 4.0K Jun  5 16:00 chr3
drwxrwxr-x 2 usr usr 4.0K Jun  5 16:00 chr2
drwxrwxr-x 2 usr usr 4.0K Jun  5 15:59 chr1

out/chr5:
total 204K
-rw-rw-r-- 1 usr usr 68K Jun  5 16:01 toy_pooled_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:59 toy_cancer_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:56 toy_normal_analysis.mat

out/chr4:
total 204K
-rw-rw-r-- 1 usr usr 68K Jun  5 16:01 toy_pooled_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:58 toy_cancer_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:56 toy_normal_analysis.mat

out/chr3:
total 204K
-rw-rw-r-- 1 usr usr 68K Jun  5 16:00 toy_pooled_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:57 toy_cancer_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:55 toy_normal_analysis.mat

out/chr2:
total 204K
-rw-rw-r-- 1 usr usr 68K Jun  5 16:00 toy_pooled_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:57 toy_cancer_analysis.mat
-rw-rw-r-- 1 usr usr 68K Jun  5 15:55 toy_normal_analysis.mat

out/chr1:
total 216K
-rw-rw-r-- 1 usr usr 69K Jun  5 15:59 toy_pooled_analysis.mat
-rw-rw-r-- 1 usr usr 69K Jun  5 15:56 toy_cancer_analysis.mat
-rw-rw-r-- 1 usr usr 69K Jun  5 15:54 toy_normal_analysis.mat
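
The per-chromosome listings in this step and the previous one follow the same layout: an out/ directory containing chr1 through chr5, each holding the same set of files. A minimal sketch of an automated check is given below. `check_per_chr_outputs` is a hypothetical helper, not part of informME, and it assumes only the directory and file names visible in the listings. It tests that each file exists and is non-empty; it does not validate sizes or contents.

```shell
#!/usr/bin/env bash
# Sketch only: check_per_chr_outputs is a hypothetical helper, not shipped with informME.
# Usage: check_per_chr_outputs OUT_DIR FILE [FILE ...]
# Verifies that OUT_DIR/chr1 ... OUT_DIR/chr5 each contain every named file, non-empty.
check_per_chr_outputs() {
  local out_dir="$1" status=0 chr name file
  shift
  for chr in 1 2 3 4 5; do
    for name in "$@"; do
      file="$out_dir/chr${chr}/$name"
      if [ ! -s "$file" ]; then
        printf 'MISSING  %s\n' "$file"
        status=1
      fi
    done
  done
  if [ "$status" -eq 0 ]; then
    printf 'all expected files present under %s\n' "$out_dir"
  fi
  return "$status"
}

# From informME/src/bash_src/parseBamFile/getMatrices/main:
#   check_per_chr_outputs out toy_normal_pe_matrices.mat toy_cancer_pe_matrices.mat
# From informME/src/bash_src/informME_run/main:
#   check_per_chr_outputs out toy_normal_analysis.mat toy_cancer_analysis.mat toy_pooled_analysis.mat
```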
  1. Generate bedGraph output for the single-sample analysis, then check that the mean methylation level is approximately 0.8 for the normal sample and 0.5 for the cancer sample, using the files MML-toy_normal.bed and MML-toy_cancer.bed respectively. The commands below skip the first (header) line and average the fourth column over the remaining lines:
cd informME/src/bash_src/analysis/singleAnalysis/singleMethAnalysisToBed/main
./main.sh
awk 'NR>1{total+=$4}END{if(NR>1) print total/(NR-1)}' out/MML-toy_normal.bed
awk 'NR>1{total+=$4}END{if(NR>1) print total/(NR-1)}' out/MML-toy_cancer.bed

Also run ls -lthr out/ and confirm that the following files are present with similar sizes:

total 236K
-rw-rw-r-- 1 usr usr  113 Jun  5 16:02 VAR-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 TURN-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 RDE-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 NME-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 MSI-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 MML-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 METH-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 ESI-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 ENTR-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 CAP-toy_normal.bed
-rw-rw-r-- 1 usr usr 7.7K Jun  5 16:02 VAR-toy_cancer.bed
-rw-rw-r-- 1 usr usr 8.1K Jun  5 16:02 TURN-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 RDE-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 NME-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 MSI-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 MML-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 METH-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 ESI-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 ENTR-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 CAP-toy_cancer.bed
-rw-rw-r-- 1 usr usr  161 Jun  5 16:02 VAR-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 TURN-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 RDE-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 NME-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 MSI-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 MML-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 METH-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 ESI-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 ENTR-toy_pooled.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:02 CAP-toy_pooled.bed
  1. Generate bedGraph output for the differential analysis, then check that the mean JSD is approximately 0.68, using the file JSD-toy_normal-VS-toy_cancer.bed. The command below skips the first (header) line and averages the fourth column over the remaining lines:
cd informME/src/bash_src/analysis/diffAnalysis/diffMethAnalysisToBed/main
./main.sh
awk 'NR>1{total+=$4}END{if(NR>1) print total/(NR-1)}' out/JSD-toy_normal-VS-toy_cancer.bed

Also run ls -lthr out/ and confirm that the following files are present with similar sizes:

total 80K
-rw-rw-r-- 1 usr usr 8.0K Jun  5 16:05 JSD-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:05 dRDE-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 8.3K Jun  5 16:05 dNME-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 8.0K Jun  5 16:05 DMU-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:05 dMSI-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 8.0K Jun  5 16:05 dMML-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 8.3K Jun  5 16:05 DEU-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 7.9K Jun  5 16:05 dESI-toy_normal-VS-toy_cancer.bed
-rw-rw-r-- 1 usr usr 8.0K Jun  5 16:05 dCAP-toy_normal-VS-toy_cancer.bed
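
The "approximately 0.8 / 0.5 / 0.68" checks in the last two steps can be turned into pass/fail tests. The sketch below is an illustration, not part of informME: `bed_mean` and `check_bed_mean` are hypothetical helpers, and the tolerance is an assumption chosen by the caller, since the text above only says "approximately". As in the commands above, the first line of each file is treated as a header and the value is read from the fourth column. The comparison is done in awk because shell arithmetic has no floating-point support.

```shell
#!/usr/bin/env bash
# Sketch only: bed_mean and check_bed_mean are hypothetical helpers, not shipped with informME.

# bed_mean FILE
# Prints the mean of column 4 over all lines after the first (header) line.
bed_mean() {
  awk 'NR > 1 { total += $4; n++ } END { if (n > 0) printf "%.4f\n", total / n }' "$1"
}

# check_bed_mean FILE EXPECTED TOLERANCE
# Succeeds if |mean - EXPECTED| <= TOLERANCE. TOLERANCE is the caller's assumption.
check_bed_mean() {
  local mean
  mean="$(bed_mean "$1")"
  if [ -z "$mean" ]; then
    echo "no data rows in $1"
    return 1
  fi
  echo "$1: mean = $mean (expected about $2)"
  awk -v m="$mean" -v e="$2" -v t="$3" \
    'BEGIN { d = m - e; if (d < 0) d = -d; exit !(d <= t) }'
}

# From the singleMethAnalysisToBed/main directory (tolerance 0.05 is an assumed value):
#   check_bed_mean out/MML-toy_normal.bed 0.8 0.05
#   check_bed_mean out/MML-toy_cancer.bed 0.5 0.05
# From the diffMethAnalysisToBed/main directory:
#   check_bed_mean out/JSD-toy_normal-VS-toy_cancer.bed 0.68 0.05
```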

This concludes the walkthrough of the toy model included in the repository.