3 conditions DEG - Kan-E/RNAseqChef GitHub Wiki

3 conditions DEG detects and visualizes differentially expressed genes (DEGs) using EBSeq multi-comparison analysis.
Note: Multiple comparison analysis takes 5-10 minutes.

Setting

Input format

The input format and settings for "3 conditions DEG" are the same as for pair-wise DEG.
If you have already read the description of pair-wise DEG, you can skip the rest of this section.

Two raw count data formats can be used as input, and a previously saved result file can be used as a third option.

1. raw count data

The analysis can only be performed with raw count data if the following conditions are fulfilled:

- The file contains only the three groups of data to be analyzed.
- The replicate number is separated from the group name by an underscore "_" (e.g., Control_1).
- The underscore "_" is not used for anything else in the sample names.
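The naming conditions above can be checked before uploading. The following is a minimal sketch, not RNAseqChef's own code: it assumes that sample names look like `Group_1`, and the function names and example column names are hypothetical.

```python
from collections import defaultdict


def split_sample_name(name):
    """Split 'Group_1' into ('Group', 1); raise if the name breaks the convention."""
    parts = name.split("_")
    # Exactly one underscore, a non-empty group name, and a numeric replicate.
    if len(parts) != 2 or not parts[0] or not parts[1].isdigit():
        raise ValueError(f"unexpected sample name: {name!r}")
    return parts[0], int(parts[1])


def group_samples(sample_names):
    """Collect replicate numbers per group and require exactly three groups."""
    groups = defaultdict(list)
    for name in sample_names:
        group, replicate = split_sample_name(name)
        groups[group].append(replicate)
    if len(groups) != 3:
        raise ValueError(f"expected 3 groups, found {len(groups)}")
    return dict(groups)


# Hypothetical column names of a raw count table.
columns = ["Control_1", "Control_2", "TreatA_1", "TreatA_2", "TreatB_1", "TreatB_2"]
groups = group_samples(columns)
```

A name such as `Wild_type_1` would be rejected by this check because it contains a second underscore.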

2. raw count data + metadata

This format can be used if the above conditions are not fulfilled, for example, if the sample name is an accession number, or if the raw count data contain extra information that is not the subject of analysis.
Metadata must contain the following information:

- The first column contains the sample names used in the raw count data (e.g., accession numbers).
- The second column contains the group and replicate name assigned to the sample in the first column (e.g., Control_1).
- The third and subsequent columns do not affect the analysis.
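The role of the metadata can be illustrated with a short sketch. It is not the tool's implementation. It assumes that the first metadata column is mapped onto the second, and that samples missing from the metadata are left out of the analysis. The accession-like names below are hypothetical.

```python
def redefine_columns(count_columns, metadata_rows):
    """Map original sample names to new names for samples listed in the metadata."""
    # Only the first two metadata columns are used; later columns are ignored.
    mapping = {row[0]: row[1] for row in metadata_rows}
    # Keep the order of the count table; drop samples the metadata does not list.
    return {col: mapping[col] for col in count_columns if col in mapping}


# Hypothetical sample names in the raw count data (one extra, unused sample).
count_columns = ["SRR0000001", "SRR0000002", "SRR0000003", "SRR0000099"]

# Hypothetical metadata rows: original name, new name, free-text note.
metadata_rows = [
    ["SRR0000001", "Control_1", "batch A"],
    ["SRR0000002", "TreatA_1", "batch A"],
    ["SRR0000003", "TreatB_1", "batch B"],
]

renamed = redefine_columns(count_columns, metadata_rows)
```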

3. Recode.Rdata

If you have previously run 3 conditions DEG and obtained the "Recode.Rdata" file by clicking the 'Download summary' button, you can use this option to skip the time-consuming EBSeq analysis.

Species

Selecting the species of the dataset enables the following analyses:

- Conversion to gene symbols if the gene names are Ensembl IDs
- Enrichment analysis
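To check in advance whether gene names look like Ensembl gene IDs, a pattern test such as the following can be used. This is only a sketch based on the general format of Ensembl gene stable IDs ("ENS", an optional species prefix, "G", and 11 digits, optionally followed by a version suffix). How RNAseqChef itself recognizes the IDs is not described here, and the function name is hypothetical.

```python
import re

# "ENS" + optional species prefix (e.g. "MUS") + "G" + 11 digits + optional ".version"
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def looks_like_ensembl_gene_id(name):
    """Return True if the name matches the Ensembl gene stable ID format."""
    return bool(ENSEMBL_GENE_ID.match(name))


names = ["ENSG00000141510", "ENSMUSG00000059552", "ENSG00000141510.17", "TP53"]
flags = [looks_like_ensembl_gene_id(n) for n in names]
```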

Cut-off conditions

Three types of thresholds can be set: fold change, FDR, and base mean.
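As an illustration of how three such thresholds act together, the sketch below filters a small table of genes. The field names, the default threshold values, the symmetric treatment of up- and down-regulation, and the example numbers are all assumptions made for this example. They do not describe how RNAseqChef or EBSeq compute or apply these values.

```python
import math


def passes_cutoffs(gene, fc_cutoff=2.0, fdr_cutoff=0.05, basemean_cutoff=0.0):
    """Return True if a gene passes the fold change, FDR, and base mean cut-offs."""
    # Compare on the log2 scale so that a 4-fold decrease counts like a 4-fold increase.
    log2fc = math.log2(gene["fold_change"])
    return (
        abs(log2fc) >= math.log2(fc_cutoff)
        and gene["fdr"] <= fdr_cutoff
        and gene["base_mean"] >= basemean_cutoff
    )


# Hypothetical result rows.
genes = [
    {"id": "geneA", "fold_change": 4.0, "fdr": 0.01, "base_mean": 120.0},
    {"id": "geneB", "fold_change": 1.2, "fdr": 0.001, "base_mean": 300.0},  # fold change too small
    {"id": "geneC", "fold_change": 0.25, "fdr": 0.2, "base_mean": 80.0},    # FDR too high
    {"id": "geneD", "fold_change": 0.2, "fdr": 0.03, "base_mean": 2.0},     # base mean too low
    {"id": "geneE", "fold_change": 0.25, "fdr": 0.01, "base_mean": 50.0},
]

selected = [g["id"] for g in genes if passes_cutoffs(g, basemean_cutoff=10.0)]
```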

Option: normalized count input

The base mean cut-off can be set using the uploaded normalized count data, such as TPM counts.
The y-axis of the boxplot can be displayed using the uploaded normalized count data.
Note: Uploading raw count data is still required.

Output

Input Data

The uploaded raw count data are displayed.
In the case of the "Raw count data + metadata" format, the raw count data are displayed after being re-defined using the uploaded metadata.

Result overview

Three types of clustering analyses are performed: principal component analysis (PCA), multidimensional scaling (MDS), and hierarchical clustering with the ward.D2 method.
An EBSeq multi-comparison analysis is performed to detect the DEGs. Scatter plots and heatmaps are displayed as a result of the DEG analysis.
By visualizing DEGs under all three conditions, the characteristics of gene expression under each condition are extracted.
The result table of the DEG analysis is also displayed.

GOI (Genes of interest) profiling

A scatter plot is shown.
By selecting genes from the GOI list, only the GOI are labeled among the points displayed in the scatter plot.
The x- and y-axis ranges can be adjusted freely using the slide bar.
A heatmap and boxplot of the genes selected from the GOI list are displayed.