How to launch course tutorials - statgenetics/statgen-courses GitHub Wiki
Our course tutorials are available via two options: a pre-configured cloud server, or Docker images with pre-installed software and a utility to download data. In both cases, you will be able to launch and work with all exercises using JupyterLab.
Running on a pre-configured cloud server
At the beginning of the course, you will be assigned an active JupyterLab server hosted on the cloud. Unless otherwise announced, the link to your server is
https://statgenetics.github.io/statgen-courses/<firstname_lastname>
where you replace <firstname_lastname> with the name you used in your course registration form, in lower case. For example, for someone named Amanda Liu, the link will be https://statgenetics.github.io/statgen-courses/amanda_liu. Please adjust this URL and enter it in your web browser.
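As a rough illustration of the naming rule above, the following shell sketch builds the link from a registration name. The helper name server_url is hypothetical and not part of the course tooling. The rule it assumes (lower-casing the name and joining its parts with underscores) is inferred from the Amanda Liu example only, so names with middle names or special characters may be treated differently.

```shell
# Hypothetical helper (not part of the course tooling): build the server
# link from a registration name such as "Amanda Liu".
server_url() {
  # Lower-case the name and replace spaces with underscores (assumed rule).
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
  printf 'https://statgenetics.github.io/statgen-courses/%s\n' "$slug"
}

server_url "Amanda Liu"
```

For Amanda Liu this prints https://statgenetics.github.io/statgen-courses/amanda_liu, matching the example above.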
We have already pre-configured all the software and data needed to run the tutorials. The tutorial commands are available in their respective folders, in either of two formats: Jupyter Notebooks or text files. Detailed explanations for each format will be provided in the following sections.
Once you open the URL, you will see the JupyterLab interface.
The left panel of JupyterLab displays all available files. Initially, you will see a list of folders for the exercises that we support for the course. Exercises in the archive folder were used in courses from previous years and are no longer actively maintained.
On the right panel, you can see a launcher window where you can choose to launch a Notebook or a Console with different kernels, or a command Terminal.
Tutorial commands available in IPython notebook
Currently available tutorials based on IPython Notebooks:
Exercise / folder name | IPython Notebook name |
---|---|
finemapping | finemapping.ipynb, finemapping_answers.ipynb |
ldpred2 | ldpred2_example.ipynb |
multivariate_finemapping | multivariate_finemapping.ipynb |
ngs_qc_annotation | NGS_QC_Annotation.ipynb |
plink | plink_Data_QC.ipynb, plink_Substurcture.ipynb |
regenie | regenie_example.ipynb |
statgen_basic | statgen_equations.ipynb |
twas | multivariate_prediction.ipynb |
Tutorial commands available in text files
Currently available tutorials with commands provided in text files:
Exercise / folder name | Text file name |
---|---|
basic_plink_r_nothnagel | plink_r_commands.txt |
epistasis | epistasis_commands.txt |
fastlmm_gcta | FASTLMM_GCTA_commands.txt |
mendelian_randomization | MR_exercise_TwoSampleMR.R |
pleiotropy | pleiotropy_commands.txt |
popgen_nothnagel | popgen_drift.R, popgen_selection.R, genepi_popgen.q |
regression_nothnagel | regression.R, multifactorial_script.txt |
Since commands for most of the exercises are written in a mix of bash and R, we recommend launching a SoS Notebook. A SoS Notebook allows you to change kernels for each code chunk, which avoids the confusion of opening multiple notebooks or consoles.
To launch a new SoS Notebook, simply select SoS under the Notebook section in the launcher window. You will then be able to select a kernel for each code chunk in the top right corner.
Troubleshooting
Epistasis
If you are running the exercise in a Jupyter notebook, the plink commands will hang partway through the analysis. This is because the command
more plink.assoc
is meant for reading the result interactively in a console or terminal, and waits for keyboard input. You can either run this line separately in a console or terminal, or change it to
head plink.assoc
to read the beginning of the file.
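A minimal sketch of the same idea as a reusable shell function is shown below. The name preview is hypothetical, and the default of 10 lines simply mirrors the default of head. Because head does not page, the command returns immediately inside a notebook cell.

```shell
# Hypothetical helper: print the first lines of a result file without paging,
# so the command returns immediately inside a notebook cell.
preview() {
  # $1: file to show; $2: number of lines (defaults to 10)
  if [ -f "$1" ]; then
    head -n "${2:-10}" "$1"
  else
    echo "File not found: $1" >&2
    return 1
  fi
}
```

For example, preview plink.assoc 20 would show the first 20 lines of the association results.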
Regenie
When running the exercise, you may receive the following error:
ERROR: regenie_qc (id=387a6c10b5d599d9) returns an error.
ERROR: [regenie_qc (regenie_qc)]: [0]: Failed to obtain lock /tmp/jovyan/.sos/ae7fddb7f73ac3ee.lock for input regenie_statgen_mwe/1000G.EUR.mwe.pruned.bed and output /home/jovyan/handson-tutorials/contents/regenie/output_vc/cache/1000G.EUR.mwe.pruned.qc_pass.id /home/jovyan/handson-tutorials/contents/regenie/output_vc/cache/1000G.EUR.mwe.pruned.qc_pass.snplist. It is likely that these files are protected by another SoS process or concurrant task that is generating the same set of files. Please manually remove the lockfile if you are certain that no other process is using the lock.
This may happen when you run code chunks containing sos run pipeline/regenie.ipynb. It occurs occasionally because disk latency on the cloud server creates unnecessary file locks. Simply re-run the chunk where you received this error, or run rm -rf /tmp/jovyan/.sos/* in your command line terminal to remove all lock files.
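If you would rather not delete everything under that directory, a narrower alternative can be sketched as follows. It assumes that lock files are exactly the files ending in .lock, as in the path shown in the error message. The helper name clear_sos_locks is hypothetical. As the error message says, only do this when you are certain that no other SoS process is running.

```shell
# Hypothetical helper: delete only *.lock files in a SoS lock directory.
clear_sos_locks() {
  # $1: lock directory (defaults to the one named in the error message)
  lock_dir=${1:-/tmp/jovyan/.sos}
  if [ ! -d "$lock_dir" ]; then
    echo "No such directory: $lock_dir" >&2
    return 1
  fi
  # Print each lock file as it is removed, leaving all other files in place.
  find "$lock_dir" -maxdepth 1 -type f -name '*.lock' -print -delete
}
```

Calling clear_sos_locks with no argument targets /tmp/jovyan/.sos, after which you can re-run the failing chunk.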
Running on a local computer
We use a script called statgen-setup, written in the SoS language, to initialize the Docker containers for running the tutorials. If you use your own computer, e.g. a Mac or Linux laptop, you need to install Docker and SoS, as well as the statgen-setup script.
Please note:
- The instructions below have been tested on Mac and Linux computers. We currently do not offer support for running directly under Windows, although the setup will still work on a Windows machine running the Windows Subsystem for Linux (WSL). Windows users can follow the WSL installation instructions to configure their system before returning to the installation instructions below.
- For savvy users who wish to set up the computing environment without using Docker, we have this internal document to show how it works. Interested users are on their own in exploring this option. Although you may still post issues on GitHub to discuss with us, this approach is not officially supported by our course instructor team, and related requests may therefore be answered at a lower priority.
To open the JupyterLab server on Windows, we recommend using a modern web browser, e.g. Edge instead of Internet Explorer.
Install Docker
Please download the Docker installer for your operating system, and follow the instructions on the download page to install Docker.
A few notes for Mac users:
- After Docker is installed, please click on the Docker icon (found in your Applications) to turn on the Docker Engine. You will need to use your user account at https://hub.docker.com/ to log in to your Docker client in order to access images released on Docker Hub. If you do not have an account, please register one and use it.
- Every time you restart your machine, you may need to restart the Docker Engine manually if you would like to run the tutorials. You should see a whale-like icon on your task bar indicating that the Docker service is running in the background of your computer.
- To test whether Docker is installed properly and ready to use, please open a command terminal (found in your Applications) and type docker run hello-world. You should see greeting messages printed on the screen, indicating a successful installation.
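Before running the hello-world test, a quick check along the following lines can tell you whether the docker command is on your PATH and whether the Docker Engine is reachable. This is only a sketch: the helper name have_cmd is hypothetical, and it assumes that docker info exits with a non-zero status when the engine is not running.

```shell
# Hypothetical helper: report whether a command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if ! have_cmd docker; then
  echo "docker not found: please install Docker first"
elif ! docker info >/dev/null 2>&1; then
  # Assumption: docker info fails when the Docker Engine is not running.
  echo "docker is installed, but the Docker Engine does not appear to be running"
else
  echo "Docker is installed and running"
fi
```

The same have_cmd check can be reused for other prerequisites, for example have_cmd sos after installing SoS.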
Install SoS
SoS is distributed on both PyPI and conda-forge. If you are familiar with PyPI or conda, please install SoS the same way as a conventional Python package distributed through these repositories.
If you have never worked with any Python package management tools or have never used conda (or any of miniconda, micromamba, mamba, pixi) and would like to start from scratch, this document provides a quick way to set up a production conda environment using pixi and micromamba, with SoS installed along with other tools such as R, Python 3, and JupyterLab. The document is written for setting up the software environment on a high performance computing (HPC) cluster, although the exact same setup should also apply to your macOS, Linux PC, or Windows PC with WSL installed. Please be aware that this will make changes to your local computing environment, including adding extra lines of shell environment configuration, such as export PATH commands, to your shell configuration file. For novices whose computer is not yet configured with these tools, our setup is usually harmless. Savvy developers, however, are strongly advised to complete the SoS installation within their existing software environment.
Install statgen-setup script
Please download this script and save it to your computer with the filename statgen-setup. Then open your command terminal, use the cd command to navigate to where the file was saved (on macOS it should be ~/Downloads by default), and run
chmod +x statgen-setup
to make the script executable. You should now be able to run it in the command terminal as ./statgen-setup from the directory it was downloaded to. Please test it by typing ./statgen-setup -h to print the help information for this script.
You can also move this script to a folder on your system's search path (the bash PATH) so that you can run it simply as statgen-setup without having to type a path prefix such as ./. One possibility is to install it where the sos program is installed. To do so, first type which sos to see the path where sos is installed. Then move the statgen-setup script to that same directory (either via the mv command in the terminal, or by cutting and pasting it through the file manager of your operating system).
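The two steps just described (locate sos, then move the script next to it) can be sketched as a small shell function. The name install_next_to is hypothetical, and it uses command -v instead of which to look up the program, which is an equivalent lookup for programs on your PATH.

```shell
# Hypothetical helper: move a script into the directory of a program that is
# already on PATH (for example, next to sos).
install_next_to() {
  # $1: script to move; $2: name of a program already on PATH
  target=$(command -v "$2") || { echo "$2 not found on PATH" >&2; return 1; }
  mv "$1" "$(dirname "$target")/"
}
```

You would call it as install_next_to statgen-setup sos from the directory holding the script. If the target directory is not writable by your user, the move will fail, in which case another directory on your PATH may be a better choice.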
Launching tutorials
To launch the environment to run the tutorials, please run from the command terminal:
./statgen-setup serve
Note that the first time you run this command, it will pull the Docker image containing all software packages used for the course. The image is about 5GB and may take a while to download.
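Given the size of the image, it may be worth checking your free disk space before the first run. The sketch below is one way to do so. The helper name has_free_gb is hypothetical, and checking the filesystem that holds your home directory is an assumption: Docker may store images elsewhere depending on your operating system and configuration.

```shell
# Hypothetical helper: succeed if the filesystem holding a directory has at
# least the requested number of gigabytes available.
has_free_gb() {
  # $1: directory to check; $2: required free space in GB
  # df -Pk prints sizes in 1024-byte blocks; column 4 is the available space.
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

if has_free_gb "$HOME" 5; then
  echo "At least 5GB free"
else
  echo "Less than 5GB free: consider freeing up space before pulling the image"
fi
```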
You will see in the terminal that the script is downloading the latest tutorial files.
When it completes, you should see a line printed on the screen that contains a URL:
Please copy that URL to your web browser. Your JupyterLab server should start like this:
At this point, please open a Terminal under Other in the launcher window, and run
get-data
This downloads all the data required to run the tutorials. It may take a while for the get-data command to load the data. Please wait until this command has completed before using the tutorials.
All data and tutorials will be downloaded to the folder ~/statgen_course_$USER on your computer, where $USER is your username on the system.
The tutorials are in the same structure as described in section Running on a pre-configured cloud server. Please follow the same instructions described in that section to run the tutorials.
A note on ANNOVAR
The ANNOVAR software has a user license that prohibits redistribution of the software. Our team has obtained permission from the authors to redistribute ANNOVAR for educational purposes, limited to the context of this course and to the Docker image that we distribute with the computer tutorials. Users of tutorials involving ANNOVAR should fill out the academic user registration form.
Video illustrations
To facilitate users setting up their computer for the course exercises, we have prepared video instructions available on YouTube.
Please skip the parts on conda and SoS installation, as they are obsolete. The text instructions earlier on this page are simpler than the steps recorded in the videos and should be straightforward to follow. Please use these videos only for instructions on WSL2 and Docker installation.
Install WSL2, Docker, Conda and SoS on Windows
Install Docker, Conda and SoS on MacOS
Install Docker, Conda and SoS on Ubuntu Linux
Get help
Please open a ticket in our issue tracker if you have any difficulty setting up your system, and we will help you troubleshoot: https://github.com/statgenetics/statgen-courses/issues