WGCNA stage 1: Data input and preprocessing, sample clustering and soft threshold choice - Persilian/WGCNA GitHub Wiki

Prepare your workspace

Create a directory called WGCNA and copy your TMM-normalized expression matrix into it.
Copy the WGCNA_stage1.R script into the same directory and edit it so that it points to your working directory and your expression matrix file.

Run the first stage of the WGCNA pipeline

Run the following command in an environment with R version 3.6 or later:

Rscript WGCNA_stage1.R

Output

The script creates the sub-directories "Data" and "Plots" in your working directory. Standard output reports how many genes were excluded from the WGCNA analysis because of low expression or zero variance. The filtered expression matrix used for network construction is saved to "Data/dataInput.RData" and is used by the following stages of WGCNA.
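The exact filtering criteria are defined inside WGCNA_stage1.R. The Python sketch below only illustrates the idea under stated assumptions: the input is a mapping from gene ID to normalized values, and `min_mean` is a hypothetical cut-off for "too little expression" that may not match the script. Zero-variance genes are dropped because Pearson correlation, which WGCNA builds its network from, is undefined for a constant vector.

```python
from statistics import mean, pvariance

def filter_genes(expr, min_mean=1.0):
    """Split genes into kept and excluded sets.

    expr: dict mapping gene ID -> list of normalized expression values,
          one value per sample.
    min_mean: hypothetical low-expression cut-off (an assumption, not
              necessarily the criterion used by WGCNA_stage1.R).
    """
    kept, excluded = {}, []
    for gene, values in expr.items():
        # Constant genes have an undefined correlation with every other gene.
        zero_variance = pvariance(values) == 0
        # Genes whose average expression is below the cut-off count as lowly expressed.
        low_expression = mean(values) < min_mean
        if zero_variance or low_expression:
            excluded.append(gene)
        else:
            kept[gene] = values
    return kept, excluded

# Toy values for illustration only.
expr = {
    "geneA": [5.2, 6.1, 4.8, 7.0],
    "geneB": [3.0, 3.0, 3.0, 3.0],  # zero variance
    "geneC": [0.1, 0.0, 0.3, 0.2],  # low expression
}
kept, excluded = filter_genes(expr)
print(f"{len(excluded)} genes excluded: {excluded}")  # 2 genes excluded: ['geneB', 'geneC']
```

Reporting the excluded count, as the script does on standard output, makes it easy to notice when a cut-off removes far more genes than expected.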

This file holds your TMM-normalized expression matrix, formatted for network construction by the WGCNA_stage2.R script.
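WGCNA functions expect expression data with samples in rows and genes in columns, whereas expression matrices are often stored with genes in rows. Assuming that this reorientation is part of what "formatted" means here (the script may do more), a minimal Python sketch of it looks like this; the function name is hypothetical.

```python
def to_samples_by_genes(expr):
    """Reorient a gene -> per-sample values mapping into WGCNA's layout.

    Returns (gene_ids, rows), where rows[j][i] is the value of gene i in sample j.
    Assumes every gene has the same number of samples, in the same order.
    """
    gene_ids = list(expr)
    # zip(*...) groups the j-th value of every gene, giving one row per sample.
    rows = [list(sample) for sample in zip(*expr.values())]
    return gene_ids, rows

# Toy values for illustration only: two genes measured in three samples.
expr = {"geneA": [5.2, 6.1, 4.8], "geneD": [2.0, 2.5, 1.9]}
gene_ids, rows = to_samples_by_genes(expr)
print(gene_ids)  # ['geneA', 'geneD']
print(rows)      # [[5.2, 2.0], [6.1, 2.5], [4.8, 1.9]]
```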

The soft-threshold test results are also written to "Data/sft.txt". Use them to choose a soft-thresholding power for WGCNA_stage2.R, following the WGCNA authors' tutorials. In general, choose the lowest power that still satisfies the approximate scale-free topology (SFT) criterion of R^2 > 0.9.

If your expression data come from very few samples (fewer than 10), no power may reach this criterion. In that case you may need to choose either the maximum soft-thresholding power of 30, or a very low power whose SFT fit has a large negative R^2 (e.g. -0.9) but which still produces a network with high connectivity (columns mean.k, median.k and max.k in sft.txt).
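For background: in an unsigned WGCNA network the adjacency between genes i and j is a_ij = |cor(x_i, x_j)|^β, where β is the soft-thresholding power, so larger powers suppress weak correlations more strongly but also lower connectivity. The Python sketch below applies the "lowest power with R^2 > 0.9" rule to a table like sft.txt. It assumes the file is tab-separated with an unquoted header row whose names match the columns of WGCNA's `pickSoftThreshold()$fitIndices` (Power, SFT.R.sq, slope, truncated.R.sq, mean.k., median.k., max.k.); adjust the names if your sft.txt differs. Following the WGCNA tutorials, R^2 is signed as -sign(slope) * SFT.R.sq, so a fit with a positive slope counts as negative. The function names are hypothetical.

```python
import csv
import io

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def choose_soft_power(sft_file, r2_cutoff=0.9):
    """Return the lowest tested power whose signed SFT R^2 exceeds r2_cutoff.

    sft_file: file-like object holding a tab-separated table (assumed layout).
    Returns None if no tested power reaches the cut-off, e.g. with very few samples.
    """
    rows = csv.DictReader(sft_file, delimiter="\t")
    for row in sorted(rows, key=lambda r: int(r["Power"])):
        # Signed R^2 is positive only when the log-log slope is negative.
        signed_r2 = -sign(float(row["slope"])) * float(row["SFT.R.sq"])
        if signed_r2 > r2_cutoff:
            return int(row["Power"])
    return None

# Toy table for illustration only; these are not real results.
header = ["Power", "SFT.R.sq", "slope", "truncated.R.sq", "mean.k.", "median.k.", "max.k."]
toy_rows = [
    [1, 0.12, 0.85, 0.90, 400.0, 380.0, 700.0],
    [2, 0.45, -0.60, 0.88, 150.0, 130.0, 350.0],
    [4, 0.86, -1.40, 0.95, 40.0, 30.0, 150.0],
    [6, 0.93, -1.70, 0.97, 15.0, 10.0, 80.0],
    [8, 0.95, -1.80, 0.98, 7.0, 4.0, 50.0],
]
toy_sft = "\n".join("\t".join(map(str, r)) for r in [header] + toy_rows) + "\n"

power = choose_soft_power(io.StringIO(toy_sft))
print(power)  # 6
```

If the function returns None, fall back to the small-sample guidance above and check mean.k, median.k and max.k before settling on a power.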