Sample Sheet - sups-k/methylation GitHub Wiki

Steps to follow before running the pipeline:

These headers are required and must be spelled exactly as shown:
- Sample_Name - Name of the patient - First string in the file name
- Sample_Group - RA or Healthy
- Sentrix_ID - 10-digit number - Second string in the file name
- Sentrix_Position - Third string in the file name
These headers are optional when the initial file is created but required for analysis using the pipeline:
- age
- sex
- smoking status
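As a rough illustration, the required headers could be checked with a few lines of shell before running the pipeline. This is only a sketch: the file name samplesheet.csv, the comma delimiter, the helper name check_headers and the sample values are all assumptions, not something the pipeline prescribes.

```shell
#!/usr/bin/env bash
# Sketch: check that the first line of a sample sheet contains the required headers.
# Assumptions: comma-separated file, headers on line 1, hypothetical file name.
set -eu

check_headers() {
    sheet="$1"
    missing=0
    for col in Sample_Name Sample_Group Sentrix_ID Sentrix_Position; do
        # Split the header row into one name per line and look for an exact match.
        if ! head -n 1 "$sheet" | tr ',' '\n' | grep -qx "$col"; then
            echo "missing header: $col"
            missing=1
        fi
    done
    return "$missing"
}

# Demo with made-up placeholder values.
workdir=$(mktemp -d)
printf 'Sample_Name,Sample_Group,Sentrix_ID,Sentrix_Position\n' > "$workdir/samplesheet.csv"
printf 'GSM0000001,RA,1234567890,R01C01\n' >> "$workdir/samplesheet.csv"
check_headers "$workdir/samplesheet.csv" && echo "all required headers present"
```

Because the match is exact, a header such as sample_name or Sentrix_Id is reported as missing.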
Select all the data files, copy them, and paste into a text file named names.txt. This writes out the path of every file.
Find and replace "GSM" with "\nGSM" so that each file name starts on its own line.
grep "GSM" names.txt > out.txt
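In out.txt, find "Grn" and replace with "Red". This makes the two lines for each sample identical, so the next command can drop the duplicate (uniq only removes duplicates that are next to each other).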
uniq out.txt > uniq_names.txt
cut -d_ -f 1 uniq_names.txt > sample_names.txt
cut -d_ -f 2 uniq_names.txt > sentrix_ID.txt
cut -d_ -f 3 uniq_names.txt > sentrix_Pos.txt
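The file name steps above can also be run end to end in the shell. The following is a minimal sketch under stated assumptions: the file names (including the .idat.gz extension) are made-up placeholders, and awk and sed stand in for the manual find-and-replace steps.

```shell
#!/usr/bin/env bash
# Sketch of the file name steps with placeholder data.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Made-up file names, pasted as one space-separated line (assumed format).
printf '%s ' \
    GSM0000001_1234567890_R01C01_Grn.idat.gz \
    GSM0000001_1234567890_R01C01_Red.idat.gz \
    GSM0000002_1234567890_R02C01_Grn.idat.gz \
    GSM0000002_1234567890_R02C01_Red.idat.gz > names.txt

# Replace "GSM" with "\nGSM", keep only the GSM lines, then turn "Grn" into "Red"
# so that both lines of each sample become identical.
awk '{ gsub(/GSM/, "\nGSM"); print }' names.txt | grep "GSM" | sed 's/Grn/Red/' > out.txt

# uniq drops adjacent duplicates, leaving one line per sample.
uniq out.txt > uniq_names.txt

# Split each name on "_" into the three sample sheet columns.
cut -d_ -f 1 uniq_names.txt > sample_names.txt
cut -d_ -f 2 uniq_names.txt > sentrix_ID.txt
cut -d_ -f 3 uniq_names.txt > sentrix_Pos.txt

cat sample_names.txt
```

If the pasted names are not already grouped by sample, sorting out.txt before uniq keeps the duplicate lines adjacent.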
Go to the GEO page of the study.
Click on "Analyze with GEO2R" at the bottom.
Select the whole page and copy it (CTRL+A, then CTRL+C).
Paste into a text file named table.txt and remove any text that is not part of the sample table. The columns are tab-separated.
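Find "-" and replace it with nothing (an empty string). Save the file.
To remove the extra line breaks after each row, first mark the start of every row: replace "GSM" with "$GSM".
Replace "\n" with nothing (an empty string). This joins the whole table into one line.
Replace "$" with "\n" so that each row starts on its own line again, then delete the empty first line. Save.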
cut -f 7 table.txt > age.txt
cut -f 8 table.txt > sex.txt
cut -f 9 table.txt > smoke.txt
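The table clean-up can be sketched in the shell as well. This is an illustration only: the table rows are made-up placeholders (columns 2 to 6 are dummies), the script assumes that "$" appears nowhere else in the table, and sed, tr and awk stand in for the manual find-and-replace steps.

```shell
#!/usr/bin/env bash
# Sketch of the table clean-up steps with placeholder data.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Made-up tab-separated rows, each followed by a stray blank line.
printf 'GSM0000001\tc2\tc3\tc4\tc5\tc6\t54\tF\tnon-smoker\n\n' > table.txt
printf 'GSM0000002\tc2\tc3\tc4\tc5\tc6\t61\tM\tsmoker\n\n' >> table.txt

# 1. Remove "-" and mark the start of every row with "$".
# 2. Delete all line breaks.
# 3. Turn each "$" back into a line break.
# 4. Drop the empty line left at the top (awk 'NF' skips empty lines).
sed -e 's/-//g' -e 's/GSM/$GSM/g' table.txt \
    | tr -d '\n' \
    | tr '$' '\n' \
    | awk 'NF' > table_clean.txt
mv table_clean.txt table.txt

# Extract the metadata columns, using the column numbers from the steps above.
cut -f 7 table.txt > age.txt
cut -f 8 table.txt > sex.txt
cut -f 9 table.txt > smoke.txt

cat smoke.txt
```

The column numbers 7, 8 and 9 depend on how the table was pasted, so it is worth checking a few rows of table.txt before trusting the output files.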