01 Setting up - GeertsManon/EEG_Metagenomics GitHub Wiki
Step 1: Connect to the supercomputer
Since most of you likely have laptops that are not managed by KU Leuven, a new step was introduced in March to enable SSH login: you first need to open firewall access and request a certificate.
Then, we can connect to the KU Leuven HPC cluster. Open your Terminal (Mac/Linux) or PowerShell (Windows) and SSH into the login node. You will be prompted to open a firewall verification link in your browser — this is normal and required to whitelist your IP address.
Be sure to replace XXXXX with your own HPC username:
ssh XXXXX@login.hpc.kuleuven.be
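Optionally, an entry in `~/.ssh/config` saves retyping the full address each time. A minimal sketch (`hpc` is an arbitrary alias of your choosing; XXXXX is your own HPC username, as above):

```
# ~/.ssh/config
Host hpc
    HostName login.hpc.kuleuven.be
    User XXXXX
```

After saving this, `ssh hpc` connects just like the full command.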
Step 2: Request an interactive session
The login node is a shared environment — it's not meant for heavy computation. We need to request a dedicated compute node to safely run our analyses. The command below asks SLURM (the job scheduler) to allocate resources for us:
cd $VSC_DATA
srun -A lp_edu_eeg_2026 -M genius --cpus-per-task=8 --time=3:00:00 --pty bash -l
- `srun` — SLURM command to submit and run an interactive job on the cluster
- `-A lp_edu_eeg_2026` — account/allocation to charge the compute time to; in this case the course allocation for EEG 2026 (`-A` is the same as `--account`)
- `-M genius` — specifies the cluster to run on; Genius is one of KU Leuven's Tier-2 clusters (`-M` is the same as `--clusters`)
- `--cpus-per-task=8` — request 8 CPU cores for your session
- `--time=3:00:00` — maximum walltime of 3 hours; the job will be killed after this, regardless of what's running
- `--pty` — allocate a pseudo-terminal, needed for interactive sessions
- `bash` — start a bash shell as the interactive session
- `-l` — login shell, meaning it loads your full environment (`.bash_profile`, conda, modules, etc.) just like a normal login would
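For longer analyses, the same resources can also be requested non-interactively with `sbatch`. A minimal sketch of an equivalent batch script (the output filename is illustrative, not from the course materials):

```shell
#!/bin/bash -l
# Equivalent resource request as a batch job; submit with: sbatch job.sh
#SBATCH --account=lp_edu_eeg_2026
#SBATCH --clusters=genius
#SBATCH --cpus-per-task=8
#SBATCH --time=3:00:00
#SBATCH --output=eeg_%j.out   # illustrative name; %j expands to the job ID

# Inside a running job, SLURM exports variables such as SLURM_CPUS_PER_TASK.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK:-?} CPUs"
```

The `#SBATCH` lines mirror the `srun` flags one-to-one; `--pty`, `bash`, and `-l` are not needed because the script itself is the shell that SLURM runs.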
Once SLURM grants your request, you will be automatically moved from the login node to your assigned compute node. You can verify this by checking that your prompt changes (e.g. from tier2-p-login-2 to something like r25i13n03).
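Beyond the prompt change, a few quick checks confirm where you are and what you were given (a sketch; outside a SLURM job these variables are unset, hence the fallbacks):

```shell
# Where am I, and what did SLURM allocate?
hostname                                        # should print a compute node, e.g. r25i13n03
echo "Job ID: ${SLURM_JOB_ID:-<not in a SLURM job>}"
echo "CPUs:   ${SLURM_CPUS_PER_TASK:-<not in a SLURM job>}"
```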
Step 3: Get the data
Now that we have a compute node, we need to get the course data into your personal space before we can start the analysis.
You'll be working with real cave microbiome data from the Democratic Republic of Congo, specifically the metagenomic sequencing of the DRC_cave_A sample. When you begin the practical, you'll have access to several key files, prepared for you to complete today's session.
Create a new directory for today's session and navigate into it:
mkdir EEG_metagenomics
cd EEG_metagenomics
All course data is available on the supercomputer. Simply copy the folders to your current directory:
cp -r /data/leuven/347/vsc34774/EEG_metagenomics/exercises .
cp -r /data/leuven/347/vsc34774/EEG_metagenomics/assignment .
Verify the copy:
tree
This will output:
.
├── assignment
│ ├── contigs.db
│ ├── MERGED_PROFILE
│ │ ├── AUXILIARY-DATA.db
│ │ ├── PROFILE.db
│ │ └── RUNLOG.txt
│ └── MERGED_taxonomy_summary.txt
└── exercises
├── contigs.db
├── contigs.fasta
├── mappedReadsToContigs.bam
├── mappedReadsToContigs.bam.bai
└── readDepthPerContig.txt
3 directories, 10 files
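If `tree` is not available on your node, a plain `find` gives an equivalent check (assuming the copy above succeeded):

```shell
# Count the regular files under the two copied folders; `tree` reported 10.
find assignment exercises -type f | wc -l
```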