Sample Sheet - sups-k/methylation GitHub Wiki
Steps to follow before running the pipeline:
- Download the raw IDAT files from GEO.
- Make sure that the Sample_Sheet.csv file is also downloaded.
- If Sample_Sheet.csv is not present, you must create it.
- The Sample Sheet and the files must be in the same folder.
- Run the pipeline.
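Before running the pipeline, a small shell check can confirm that the folder holds both the IDAT files and Sample_Sheet.csv. This is a sketch, not part of the pipeline: the function name check_idat_folder is hypothetical, and it assumes the files are plain .idat files named like GSM1052071_5730053055_R01C01_Grn.idat.

```shell
# Hypothetical helper: check that folder $1 (default: current folder)
# contains IDAT files and a Sample_Sheet.csv next to them.
check_idat_folder() {
  dir="${1:-.}"
  # At least one .idat file must be present (ls fails if the glob matches nothing)
  if ! ls "$dir"/*.idat >/dev/null 2>&1; then
    echo "No .idat files found in $dir" >&2
    return 1
  fi
  # The sample sheet must sit in the same folder as the IDAT files
  if [ ! -f "$dir/Sample_Sheet.csv" ]; then
    echo "Sample_Sheet.csv is missing from $dir" >&2
    return 1
  fi
  echo "OK: $dir is ready"
}
```

For example, check_idat_folder /path/to/idat_folder prints an OK line only when both conditions hold.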
Example IDAT file name: GSM1052071_5730053055_R01C01_Grn.idat
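Each IDAT file name is made of underscore-separated fields, and three of the sample sheet columns come from the first three fields. The sketch below (parse_idat_name is a hypothetical name) splits one name with cut, as the commands further down do, and prints Sample_Name, Sentrix_ID and Sentrix_Position as one comma-separated line.

```shell
# Hypothetical helper: split an IDAT file name into Sample_Sheet.csv fields.
# Layout: <Sample_Name>_<Sentrix_ID>_<Sentrix_Position>_<Grn|Red>.idat
parse_idat_name() {
  # Drop any folder path and the .idat extension
  base=$(basename "$1" .idat)
  sample_name=$(printf '%s\n' "$base" | cut -d_ -f 1)
  sentrix_id=$(printf '%s\n' "$base" | cut -d_ -f 2)
  sentrix_pos=$(printf '%s\n' "$base" | cut -d_ -f 3)
  printf '%s,%s,%s\n' "$sample_name" "$sentrix_id" "$sentrix_pos"
}

parse_idat_name GSM1052071_5730053055_R01C01_Grn.idat
# prints GSM1052071,5730053055,R01C01
```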
Steps to create the Sample_Sheet.csv file:
- The following headers are required and must be spelled exactly as shown:
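- Sample_Name - name of the patient sample, i.e. the GSM accession (e.g. GSM1052071) - first field of the file name
- Sample_Group - RA or Healthy
- Sentrix_ID - 10-digit number (e.g. 5730053055) - second field of the file name
- Sentrix_Position - position on the chip as row and column (e.g. R01C01) - third field of the file name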
- These headers are optional when the initial file is created but required for analysis using the pipeline:
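Steps to get Sample_Name, Sentrix_ID and Sentrix_Position from the file names:
- Select all the IDAT files, copy them, and paste them into a names.txt file. This gives the full path of every file.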
- Find and replace "GSM" with "\nGSM" (a line break followed by GSM) so that each file name starts on its own line.
grep "GSM" names.txt > out.txt
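- In out.txt, find "Grn" and replace it with "Red". The green and red files of each sample then become identical, adjacent lines, so uniq keeps one line per sample.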
uniq out.txt > uniq_names.txt
cut -d_ -f 1 uniq_names.txt > sample_names.txt
cut -d_ -f 2 uniq_names.txt > sentrix_ID.txt
cut -d_ -f 3 uniq_names.txt > sentrix_Pos.txt
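The copy-paste and find-and-replace steps above can also be done in one pass from the shell. This sketch (make_name_columns is a hypothetical name) reads the file names directly instead of pasting paths, strips the _Grn/_Red suffix, and de-duplicates with sort -u, which, unlike uniq, does not need duplicate lines to be adjacent. It writes the same four .txt files as the commands above, into the IDAT folder.

```shell
# Hypothetical helper: build uniq_names.txt and the three column files
# for the IDAT files in folder $1 (default: the current folder).
make_name_columns() {
  dir="${1:-.}"
  for f in "$dir"/*.idat; do
    [ -e "$f" ] || continue      # skip the literal pattern if nothing matched
    basename "$f"
  done \
    | sed -E 's/_(Grn|Red)\.idat$//' \
    | LC_ALL=C sort -u > "$dir/uniq_names.txt"   # one line per sample

  cut -d_ -f 1 "$dir/uniq_names.txt" > "$dir/sample_names.txt"
  cut -d_ -f 2 "$dir/uniq_names.txt" > "$dir/sentrix_ID.txt"
  cut -d_ -f 3 "$dir/uniq_names.txt" > "$dir/sentrix_Pos.txt"
}
```

For example, make_name_columns /path/to/idat_folder. The samples come out sorted by file name, so later files that are combined line by line with these must use the same order.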
- Go to the GEO page of the study.
- Click "Analyze with GEO2R" at the bottom of the page.
- Select everything and copy it (Ctrl+A, then Ctrl+C).
- Paste into a text file named table.txt and delete everything that is not part of the sample table. The columns are tab-separated.
- Find and replace "-" with nothing (i.e. delete every "-"). Save the file.
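- To remove the extra line breaks so that each sample's row sits on a single line, first replace "GSM" with "$GSM", which marks the start of every row with "$".
- Replace "\n" with nothing. The whole table is now on one line.
- Replace "$" with "\n" so that each row starts on a new line, then delete the empty first line. Save.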
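The clean-up steps above (delete every "-", then join each sample's row onto one line) can be scripted with sed and awk. This is a sketch that assumes, as the steps above do, that every row of the pasted table starts with "GSM"; anything before the first GSM line is dropped, and clean_geo2r_table is a hypothetical name.

```shell
# Hypothetical helper: scripted version of the manual GEO2R clean-up.
# Deletes every "-", then joins continuation lines so that each row
# (a line starting with "GSM") ends up on a single line.
clean_geo2r_table() {
  sed 's/-//g' "$1" | awk '
    /^GSM/ { if (seen) print rec; rec = $0; seen = 1; next }  # start of a new row
    seen   { rec = rec $0 }                                    # continuation line
    END    { if (seen) print rec }                             # flush the last row
  '
}
```

For example, paste the GEO2R table into raw_table.txt and run clean_geo2r_table raw_table.txt > table.txt; the cut commands then work on the cleaned table.txt.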
cut -f 7 table.txt > age.txt
cut -f 8 table.txt > sex.txt
cut -f 9 table.txt > smoke.txt
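The one-column files can then be combined into Sample_Sheet.csv with paste. This is a sketch, not the author's procedure: the wiki does not say where Sample_Group comes from, so a hypothetical group.txt holding RA or Healthy for each sample is assumed, and Age, Sex and Smoking_Status are placeholder header names for the optional columns. paste joins line N of every file, so all files must list the samples in the same order; the function (build_sample_sheet, also hypothetical) only checks that the line counts match.

```shell
# Hypothetical helper: assemble Sample_Sheet.csv from the one-column files
# in the current folder. ASSUMPTIONS: group.txt (RA/Healthy per sample) is
# made by hand, and Age/Sex/Smoking_Status are placeholder header names.
build_sample_sheet() {
  files="sample_names.txt group.txt sentrix_ID.txt sentrix_Pos.txt age.txt sex.txt smoke.txt"
  [ -f sample_names.txt ] || { echo "missing sample_names.txt" >&2; return 1; }
  n=$(wc -l < sample_names.txt | tr -d ' ')
  for f in $files; do            # $files is split on spaces on purpose
    [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
    # paste pairs lines by position, so every file needs one line per sample
    [ "$(wc -l < "$f" | tr -d ' ')" -eq "$n" ] || { echo "$f has a different number of lines" >&2; return 1; }
  done
  {
    echo "Sample_Name,Sample_Group,Sentrix_ID,Sentrix_Position,Age,Sex,Smoking_Status"
    paste -d, $files
  } > Sample_Sheet.csv
}
```

Run build_sample_sheet from the folder that holds the .txt files, then move Sample_Sheet.csv next to the IDAT files.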