How to launch course tutorials - statgenetics/statgen-courses GitHub Wiki
Our course tutorials are available via two options: a pre-configured cloud server, or Docker images with pre-installed software and a utility to download the data. In both cases, you will be able to launch and work with all exercises using JupyterLab.
At the beginning of the course, you will be assigned an active JupyterLab server hosted on the cloud. Unless otherwise announced, the link to your server is
https://statgenetics.github.io/statgen-courses/<your_name>
where you replace `<your_name>` with the identifier the course organizers provide, which is based on the name used in your course registration form, in lower case. Please adjust this URL and enter it in your web browser.
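As a small illustration of the substitution described above, the sketch below lower-cases an identifier and appends it to the base URL. The function name `server_url` and the identifier `Jane_Doe` are made up for this example; always use the exact identifier the course organizers send you.

```shell
# Hypothetical helper: build the server link from the identifier you were given.
# The identifier is lower-cased, as the instructions above require.
server_url() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'https://statgenetics.github.io/statgen-courses/%s\n' "$name"
}

# "Jane_Doe" is a made-up identifier used only for illustration.
server_url "Jane_Doe"
```

Paste the printed link into your web browser's address bar.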
We have already pre-configured all the software and data needed to run the tutorials. The tutorial commands are available in their respective folders, in either of two formats: Jupyter Notebooks or text files. Detailed explanations for each format will be provided in the following sections.
Once you open the URL, you will see the JupyterLab interface.

The left panel of JupyterLab displays all available files. Initially, you will see a list of folders for the exercises that we support for the course. Exercises in the `archive` folder were used in courses from previous years and are no longer actively maintained.
On the right panel, you can see a launcher window where you can choose to launch a `Notebook` or a `Console` with different kernels, or a command `Terminal`.
Currently available tutorials based on IPython Notebooks:
Exercise / folder name | IPython Notebook name |
---|---|
finemapping | finemapping.ipynb, finemapping_answers.ipynb |
ldpred2 | ldpred2_example.ipynb |
multivariate_finemapping | multivariate_finemapping.ipynb |
ngs_qc_annotation | NGS_QC_Annotation.ipynb |
plink | plink_Data_QC.ipynb, plink_Substurcture.ipynb |
regenie | regenie_example.ipynb |
statgen_basic | statgen_equations.ipynb |
twas | mr_mash.ipynb, twas_test.ipynb |
Currently available tutorials with commands provided in text files:
Exercise / folder name | Text file name |
---|---|
plink_r_nothnagel | plink_r_commands.txt |
epistasis | epistasis_commands.txt |
fastlmm_gcta | FASTLMM_GCTA_commands.txt |
mendelian_randomization | MR_exercise_TwoSampleMR.R |
pleiotropy | pleiotropy_commands.txt |
popgen_nothnagel | popgen_drift.R, popgen_selection.R, genepi_popgen.q |
regression_nothnagel | regression.R, multifactorial_script.txt |
Since commands for most of the exercises are written in a mix of bash and R, we recommend launching a `SoS Notebook`. A `SoS Notebook` allows you to change the kernel for each code chunk, which avoids the confusion of opening multiple notebooks or consoles.
To launch a new `SoS Notebook`, simply select `SoS` under the `Notebook` section in the launcher window. You will then be able to select the kernel for each code chunk in the top right corner.

If you are running the exercise in a Jupyter notebook, the plink commands will get stuck partway through the analysis. This is because this command:
more plink.assoc
is meant for reading the results interactively in a console or terminal. You can either run this line separately in a console or terminal, or change it to
head plink.assoc
to read the beginning of the file.
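The sketch below wraps the `head` approach in a small helper so that a notebook chunk never waits on a pager. The function name `preview` and the default of 10 lines are assumptions made for this example, not part of the course material.

```shell
# Hypothetical helper: print the first N lines of a file (default 10)
# without starting an interactive pager.
preview() {
    file=$1
    lines=${2:-10}
    head -n "$lines" "$file"
}

# Example: show the first 20 lines of the association results, if the file exists.
if [ -f plink.assoc ]; then
    preview plink.assoc 20
fi
```

Because `head` exits as soon as it has printed the requested lines, the chunk returns immediately.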
When running the exercise, you may receive the following error:
ERROR: regenie_qc (id=387a6c10b5d599d9) returns an error.
ERROR: [regenie_qc (regenie_qc)]: [0]: Failed to obtain lock /tmp/jovyan/.sos/ae7fddb7f73ac3ee.lock for input regenie_statgen_mwe/1000G.EUR.mwe.pruned.bed and output /home/jovyan/handson-tutorials/contents/regenie/output_vc/cache/1000G.EUR.mwe.pruned.qc_pass.id /home/jovyan/handson-tutorials/contents/regenie/output_vc/cache/1000G.EUR.mwe.pruned.qc_pass.snplist. It is likely that these files are protected by another SoS process or concurrant task that is generating the same set of files. Please manually remove the lockfile if you are certain that no other process is using the lock.
This may happen when you run code chunks containing `sos run pipeline/regenie.ipynb`. It happens occasionally because disk latency on the cloud server leaves behind unnecessary file locks. Simply re-run the chunk where you received this error, or run `rm -rf /tmp/jovyan/.sos/*` in your command line terminal to remove all lock files.
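If you prefer a narrower cleanup than removing everything under `/tmp/jovyan/.sos/`, the sketch below deletes only files ending in `.lock`, which is the suffix shown in the error message above. The function name `clear_sos_locks` is hypothetical, and the sketch assumes no other SoS process is still using the locks.

```shell
# Hypothetical helper: delete only "*.lock" files directly under the SoS
# lock directory, leaving any other files there untouched.
clear_sos_locks() {
    lock_dir=${1:-/tmp/jovyan/.sos}
    if [ ! -d "$lock_dir" ]; then
        echo "No lock directory at $lock_dir; nothing to do."
        return 0
    fi
    # -print lists each lock file before it is removed.
    find "$lock_dir" -maxdepth 1 -type f -name '*.lock' -print -exec rm -f {} +
}
```

Call `clear_sos_locks` with no arguments to use the default directory, then re-run the chunk that failed.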
Please follow the setup here to install all the software needed and launch the exercises. The instructions work on Linux, macOS, and Windows with WSL configured.
Note that some software packages used in the exercises do not support macOS. These include `plink.multivariate` and `fastlmm`. Also, `regenie` does not support macOS on Apple Silicon chips. We therefore recommend using your institute's HPC Linux system to configure these local environments.
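To check quickly whether your machine is affected, the sketch below compares the output of `uname` against the limitations listed above. The function name `unsupported_tools` is hypothetical; it assumes `uname -s` reports `Darwin` on macOS and `uname -m` reports `arm64` on Apple Silicon, which may not hold when the shell runs under Rosetta.

```shell
# Hypothetical helper: list the tools that the note above flags as unsupported.
# Arguments default to this machine's `uname -s` (OS) and `uname -m` (architecture).
unsupported_tools() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    if [ "$os" = "Darwin" ]; then
        tools="plink.multivariate fastlmm"
        # regenie is flagged only for macOS on Apple Silicon.
        if [ "$arch" = "arm64" ]; then
            tools="$tools regenie"
        fi
        echo "$tools"
    fi
}

# Prints nothing on platforms without known limitations.
unsupported_tools
```

An empty result means none of the limitations listed above apply to your platform.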
Data download: all data used for the exercises are available on synapse.org, with Synapse ID `syn18700992`. You can either download the entire dataset to your computer or, depending on which exercise you run, download only that sub-folder. Please follow these instructions for download from synapse.org.
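One possible way to fetch the data is the Synapse command-line client, sketched below. This is an assumption rather than the course's documented procedure: it presumes you have installed the `synapseclient` package and already logged in, and the function name `download_course_data` and the default folder `statgen-data` are made up for this example.

```shell
# Hypothetical helper: recursively download a Synapse folder into a local directory.
# Assumes the `synapse` command-line client is installed and you are logged in.
download_course_data() {
    synapse_id=${1:-syn18700992}   # whole dataset by default
    dest=${2:-statgen-data}        # made-up local folder name
    mkdir -p "$dest"
    synapse get -r "$synapse_id" --downloadLocation "$dest"
}
```

To download a single exercise, pass the Synapse ID of that sub-folder as the first argument instead of `syn18700992`.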
Currently the IT support of the course, provided by MemVerge Inc, includes assistance with setting up the cloud computing environment for participants to launch the exercises directly, as well as support for those who wish to install the course material on their own computers. Support for individual participants will be provided via Slack, which you will be invited to join prior to the course. You will have one week before the course to set up your computing environment if desired. This step is optional, as you can always use the cloud computing environment provided by the course.
For those who have not participated in the course but are still interested in setting it up for self-education, please open a ticket in our issue tracker if you have any difficulty setting up your system: https://github.com/statgenetics/statgen-courses/issues. We will help you troubleshoot.