Sample Sheet - sups-k/methylation GitHub Wiki

Steps to follow before running the pipeline:

  1. Download the raw IDAT files from GEO.
  2. Make sure that the Sample_Sheet.csv file is also downloaded.
  3. If the study does not provide a Sample_Sheet.csv, create it by following the steps further down this page.
  4. Sample_Sheet.csv and the IDAT files must be in the same folder.
  5. Run the pipeline.
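
Steps 2 to 4 can be checked from a shell before the pipeline is started. The following is a minimal sketch and not part of the pipeline itself: the function name check_idat_dir is hypothetical, and it assumes uncompressed files whose names end in _Grn.idat and _Red.idat.

```shell
# Hypothetical pre-flight check (not part of the pipeline).
# Assumes uncompressed IDAT files named *_Grn.idat and *_Red.idat.
check_idat_dir() {
    dir="${1:-.}"
    status=0

    # The sample sheet must sit in the same folder as the IDAT files.
    if [ ! -f "$dir/Sample_Sheet.csv" ]; then
        echo "Sample_Sheet.csv not found in $dir" >&2
        status=1
    fi

    # Every green-channel file should have a red-channel partner.
    for grn in "$dir"/*_Grn.idat; do
        [ -e "$grn" ] || continue   # the glob matched nothing
        red="${grn%_Grn.idat}_Red.idat"
        if [ ! -f "$red" ]; then
            echo "No red-channel partner for $grn" >&2
            status=1
        fi
    done

    return "$status"
}
```

For example, `check_idat_dir /path/to/idat_folder && echo "ready"` prints "ready" only when the sample sheet is present and no red-channel file is missing.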

Example IDAT file name: GSM1052071_5730053055_R01C01_Grn.idat
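
The file name is made of underscore-separated fields, and the sample sheet columns described below are taken from the first three. As a small illustration (the variable names are chosen here only for the example), the fields can be pulled apart with cut:

```shell
# Split the example file name on "_" (variable names are illustrative).
name="GSM1052071_5730053055_R01C01_Grn.idat"

sample_name=$(printf '%s\n' "$name" | cut -d_ -f1)   # first field
sentrix_id=$(printf '%s\n' "$name" | cut -d_ -f2)    # second field
sentrix_pos=$(printf '%s\n' "$name" | cut -d_ -f3)   # third field

printf '%s,%s,%s\n' "$sample_name" "$sentrix_id" "$sentrix_pos"
# prints: GSM1052071,5730053055,R01C01
```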

Steps to create the Sample_Sheet.csv file:

  1. These headers are required and must be spelled exactly as shown:
    • Sample_Name - Sample identifier (the GSM accession) - First underscore-separated field of the file name (GSM1052071 in the example)
    • Sample_Group - RA or Healthy
    • Sentrix_ID - 10-digit number - Second field of the file name (5730053055 in the example)
    • Sentrix_Position - Third field of the file name (R01C01 in the example)
  2. These headers are optional when the initial file is created but required for analysis using the pipeline:
    • age
    • sex
    • smoking status
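  3. Copy all the IDAT files and paste them into a text file named names.txt. This pastes the file paths as text.
  4. In names.txt, find "GSM" and replace it with "\nGSM" (a line break followed by GSM), so that each file name starts on its own line.
  5. Keep only the lines that contain a file name: grep "GSM" names.txt > out.txt
  6. In out.txt, find "Grn" and replace it with "Red", so that the two channel files of each sample become identical lines.
  7. Collapse the duplicates. uniq only merges adjacent identical lines, so sort first: sort out.txt | uniq > uniq_names.txt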
  8. cut -d_ -f 1 uniq_names.txt > sample_names.txt
  9. cut -d_ -f 2 uniq_names.txt > sentrix_ID.txt
  10. cut -d_ -f 3 uniq_names.txt > sentrix_Pos.txt
  11. Go to the GEO page of the study.
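  12. Click "Analyze with GEO2R" at the bottom of the page.
  13. Select everything on the page and copy it (CTRL+A, then CTRL+C).
  14. Paste into a text file named table.txt and delete everything except the sample table. The columns are tab-separated.
  15. Find "-" and replace it with nothing (leave the replacement field empty). Save the file.
  16. To remove the extra line breaks so that each sample occupies a single line, first mark the start of each sample: replace "GSM" with "$GSM".
  17. Replace "\n" with nothing (empty replacement), which joins everything into one line.
  18. Replace "$" with "\n" so that each sample starts on a new line, then delete the empty first line. Save.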
  19. cut -f 7 table.txt > age.txt
  20. cut -f 8 table.txt > sex.txt
  21. cut -f 9 table.txt > smoke.txt
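
Steps 3 to 10 can also be done in one go from a shell instead of a text editor. The sketch below is one possible way to do it, under stated assumptions: the function name make_id_columns and the output file sample_ids.csv are hypothetical, the files are assumed to be uncompressed and named like the example above, and the channel suffix is stripped rather than renamed before deduplicating. The last part, joining the three columns with paste, is an addition that the steps above do not describe.

```shell
# Sketch only: make_id_columns and sample_ids.csv are hypothetical names.
# Assumes files named GSM..._<Sentrix_ID>_<Sentrix_Position>_<Grn|Red>.idat.
make_id_columns() {
    idat_dir="${1:-.}"

    # One line per sample: list the files, drop the channel suffix, deduplicate.
    ls "$idat_dir" \
        | grep '^GSM.*\.idat$' \
        | sed -e 's/_Grn\.idat$//' -e 's/_Red\.idat$//' \
        | sort -u > uniq_names.txt

    # Split each name on "_" into the three identifier columns (steps 8-10).
    cut -d_ -f1 uniq_names.txt > sample_names.txt
    cut -d_ -f2 uniq_names.txt > sentrix_ID.txt
    cut -d_ -f3 uniq_names.txt > sentrix_Pos.txt

    # Assumption: join the columns side by side under the required headers.
    {
        echo "Sample_Name,Sentrix_ID,Sentrix_Position"
        paste -d, sample_names.txt sentrix_ID.txt sentrix_Pos.txt
    } > sample_ids.csv
}
```

Calling `make_id_columns /path/to/idat_folder` writes the intermediate files and sample_ids.csv to the current folder. Sample_Group, age, sex and smoking status still have to be added from table.txt, and the rows of both files should be checked to be in the same sample order before they are combined.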