ASEperRegion - molgenis/systemsgenetics GitHub Wiki

#This module is used to determine ASE per region

It requires the following:

  1. A number of individuals with ASreads files
  2. Phased genotype data for these individuals in VCF format, tabix-indexed
  3. A test and gene region file

In addition, some supporting information needs to be supplied. A user should be able to derive it from the inputs above.

For the ASreads files you need a corresponding ASlocations file, the same one used in the steps for [ASEperSNP] and explained in [Basic Usage].

In combination with the phased genotypes, you will also need a coupling file which maps the ASreads files to their respective individuals.

##The gene region file

The gene region file specifies which part of the genome is used for testing (the test region) and which part is used for gathering allele-specific information (the gene region).

The gene region file is tab-separated, with the following columns:

1. Region name (for the user to decide)
2. Chromosome
3. Start of the gene region
4. End of the gene region
5. Start of the test region (optional)
6. End of the test region (optional)

If the test region (columns 5 and 6) is not specified, the gene region is also the test region.

The gene region is the region from which the AS information in the ASreads files is collected. This information is then compared to a SNP in the test region.
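The column layout above can be illustrated with a short parsing sketch. This is a minimal example and not the tool's own parser: the `Region` type and `parse_region_line` function are hypothetical names, the sample rows are made-up data, and coordinates are assumed to be integers.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """One row of the gene region file (hypothetical helper type)."""
    name: str
    chromosome: str
    gene_start: int
    gene_end: int
    test_start: int
    test_end: int


def parse_region_line(line: str) -> Region:
    """Parse one tab-separated row; columns 5 and 6 are optional."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (4, 6):
        raise ValueError(f"expected 4 or 6 columns, got {len(fields)}")
    name, chromosome = fields[0], fields[1]
    gene_start, gene_end = int(fields[2]), int(fields[3])
    if len(fields) == 6:
        test_start, test_end = int(fields[4]), int(fields[5])
    else:
        # No test region given: the gene region doubles as the test region.
        test_start, test_end = gene_start, gene_end
    return Region(name, chromosome, gene_start, gene_end, test_start, test_end)


# Made-up example rows, not real annotation data.
with_test = parse_region_line("regionA\t1\t1000\t5000\t500\t6000")
without_test = parse_region_line("regionB\t2\t2000\t3000")
print(with_test)
print(without_test)
```

The second row has only four columns, so its test region is filled in from the gene region, matching the rule described above.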

##Testing for ASEperRegion

One can test for ASE per region with the following command:

    java -jar cellTypeSpecificAlleleSpecificExpression.jar \
          -A ASEperRegion \
          -O testoutput \
          -L ASreadsAllIndividuals.txt \
          -C Coupling.txt \
          -G phasedVCF.vcf.gz \
          -R GenomicRegions.txt

-A denotes the action to take.

-O denotes the base name of the output files. A suffix is appended depending on the result type (_BetaBinomial_results.txt, _Binomial_results.txt and _dispersionFile.txt). Cell type specific ASEperRegion is currently not supported, but will be implemented.

-L is the ASlocations file.

-C is the coupling between the ASfiles and individuals.

-G is the location of the VCF containing the phased genotype information.

-R is the region file.
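For scripted runs, the same invocation can be assembled programmatically. The sketch below only builds the argument list and the expected output file names from the flags and suffixes described above. `build_command` and `expected_outputs` are hypothetical helper names, and the file names are the placeholders from the example command. It does not run the tool.

```python
from typing import List

# Suffixes appended to the -O base, as listed in the option description.
OUTPUT_SUFFIXES = (
    "_BetaBinomial_results.txt",
    "_Binomial_results.txt",
    "_dispersionFile.txt",
)


def build_command(jar: str, output_base: str, as_locations: str,
                  coupling: str, vcf: str, regions: str) -> List[str]:
    """Return the ASEperRegion command as an argument list."""
    return [
        "java", "-jar", jar,
        "-A", "ASEperRegion",   # action to take
        "-O", output_base,      # base name of the output files
        "-L", as_locations,     # ASlocations file
        "-C", coupling,         # coupling file
        "-G", vcf,              # phased, tabix-indexed VCF
        "-R", regions,          # gene region file
    ]


def expected_outputs(output_base: str) -> List[str]:
    """List the output file names derived from the -O base."""
    return [output_base + suffix for suffix in OUTPUT_SUFFIXES]


command = build_command(
    "cellTypeSpecificAlleleSpecificExpression.jar",
    "testoutput",
    "ASreadsAllIndividuals.txt",
    "Coupling.txt",
    "phasedVCF.vcf.gz",
    "GenomicRegions.txt",
)
print(" ".join(command))
print(expected_outputs("testoutput"))
```

Once the input files exist, the list can be passed to `subprocess.run(command, check=True)` to start the analysis.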

##Interpreting results

The results for ASEperRegion are structured in the following way per column:

1.  Chromosome
2.  Start and end of the test region, as "<START>-<END>"
3.  Start and end of the gene region, as "<START>-<END>"
4.  Region name
5.  P value of the region compared to the test SNP
6.  Chi squared value of the region compared to the test SNP
7.  Number of heterozygous gene SNPs
8.  Number of reads on the reference allele compared to the (first) test SNP
9.  Number of reads on the alternative allele compared to the (first) test SNP
10. Binomial ratio (reference compared to alternative)
11. Genotype of the (first) test SNP, as "[<ref>, <alt>]"
12. Position of the test SNP(s)
13. Name of the test SNP(s)

Please note that a results line can contain more than one test SNP. This happens because multiple SNPs are sometimes in the same phase on the genome; to reduce computational time, these are pooled together. The data in columns 8, 9, 10 and 11 are based on the test SNP that is written first in columns 12 and 13.
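A results line can be split into named fields following the column list above. This sketch assumes the results file is tab-separated and that read counts are written as integers, neither of which is stated explicitly here. The field names and the sample line are invented for illustration. Columns 11 to 13 are kept as raw strings because their exact formatting for pooled SNPs is not specified.

```python
# Field names are hypothetical labels for the 13 columns described above.
RESULT_COLUMNS = [
    "chromosome", "test_region", "gene_region", "region_name",
    "p_value", "chi_squared", "n_het_gene_snps",
    "ref_reads", "alt_reads", "binomial_ratio",
    "test_snp_genotype", "test_snp_positions", "test_snp_names",
]


def split_range(text):
    """Turn "<START>-<END>" into a (start, end) tuple of ints."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_result_line(line, sep="\t"):
    """Parse one results line into a dict keyed by RESULT_COLUMNS."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(RESULT_COLUMNS):
        raise ValueError(
            f"expected {len(RESULT_COLUMNS)} columns, got {len(fields)}"
        )
    row = dict(zip(RESULT_COLUMNS, fields))
    row["test_region"] = split_range(row["test_region"])
    row["gene_region"] = split_range(row["gene_region"])
    for key in ("p_value", "chi_squared", "binomial_ratio"):
        row[key] = float(row[key])
    for key in ("n_het_gene_snps", "ref_reads", "alt_reads"):
        row[key] = int(row[key])
    # Columns 11-13 stay as raw strings (format for pooled SNPs not assumed).
    return row


# Invented sample line with arbitrary values, not real output.
sample = "\t".join([
    "1", "500-6000", "1000-5000", "regionA",
    "0.01", "6.63", "3", "120", "80", "0.6",
    "[A, G]", "1234", "rs0000",
])
row = parse_result_line(sample)
print(row["region_name"], row["test_region"], row["p_value"])
```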
