How to launch course tutorials - statgenetics/statgen-courses GitHub Wiki

Our course tutorials are available in two ways: on a pre-configured cloud server, or as Docker images with pre-installed software and a utility for downloading the data. In both cases, you will be able to launch and work with all exercises using JupyterLab.

Running on a pre-configured cloud server

At the beginning of the course, you will be assigned an active JupyterLab server hosted on the cloud. Unless otherwise announced, the link to your server is

https://statgenetics.github.io/statgen-courses/<firstname_lastname>

where you replace <firstname_lastname> with the name you used in your course registration form, in lower case. For example, for someone named Amanda Liu, the link will be https://statgenetics.github.io/statgen-courses/amanda_liu. Please adjust this URL accordingly and enter it in your web browser.
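
As a small illustration of the naming rule above, the following shell sketch derives the link from a registration name. The function name course_url is hypothetical, and the sketch assumes the name consists of words separated by single spaces; these instructions do not say how middle names, hyphens or accented characters are handled.

```shell
# Hypothetical helper: build the server link from a registration name.
# Assumes words are separated by spaces; other cases are not covered here.
course_url() {
    # Lower-case the name and replace spaces with underscores.
    slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
    printf 'https://statgenetics.github.io/statgen-courses/%s\n' "$slug"
}

course_url "Amanda Liu"
# prints https://statgenetics.github.io/statgen-courses/amanda_liu
```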

We have already pre-configured all the software and data needed to run the tutorials. The tutorial commands are available in their respective folders, in either of two formats: Jupyter Notebooks or text files. Detailed explanations for each format will be provided in the following sections.

Once you open the URL, you will see the JupyterLab interface.

The left panel of JupyterLab displays all available files. Initially, you will see a list of folders for the exercises that we support for the course. Exercises in the archive folder were used in courses from previous years and are no longer actively maintained.

On the right panel, you can see a launcher window where you can choose to launch a Notebook or a Console of different kernels, or a command Terminal.

Tutorial commands available in IPython notebook

Currently available tutorials based on IPython Notebooks:

| Exercise / folder name | IPython Notebook name |
| --- | --- |
| finemapping | finemapping.ipynb, finemapping_answers.ipynb |
| ldpred2 | ldpred2_example.ipynb |
| multivariate_finemapping | multivariate_finemapping.ipynb |
| ngs_qc_annotation | NGS_QC_Annotation.ipynb |
| plink | plink_Data_QC.ipynb, plink_Substurcture.ipynb |
| regenie | regenie_example.ipynb |
| statgen_basic | statgen_equations.ipynb |
| twas | multivariate_prediction.ipynb |

Tutorial commands available in text files

Currently available tutorials with commands provided in text files:

| Exercise / folder name | Text file name |
| --- | --- |
| basic_plink_r_nothnagel | plink_r_commands.txt |
| epistasis | epistasis_commands.txt |
| fastlmm_gcta | FASTLMM_GCTA_commands.txt |
| mendelian_randomization | MR_exercise_TwoSampleMR.R |
| pleiotropy | pleiotropy_commands.txt |
| popgen_nothnagel | popgen_drift.R, popgen_selection.R, genepi_popgen.q |
| regression_nothnagel | regression.R, multifactorial_script.txt |

Since the commands for most of the exercises are written in a mix of bash and R, we recommend launching a SoS Notebook. A SoS Notebook lets you choose a different kernel for each code chunk, which avoids the confusion of opening multiple notebooks or consoles.

To launch a new SoS Notebook, simply select SoS under the Notebook section in the launcher window. You will then be able to select the kernel for each code chunk in its top right corner.

Troubleshooting

Epistasis

If you are running the exercise in a Jupyter notebook, the PLINK commands will get stuck partway through the analysis. This is because the command

more plink.assoc

is meant for reading the result interactively in a console or terminal. You can either run this line separately in a console or terminal, or change it to

head plink.assoc

to read the beginning of the file.
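
The workaround above can be wrapped in a small helper, sketched below. Unlike the interactive pager more, head prints a fixed number of lines and exits, so it does not wait for input. The helper name preview and the choice of how many lines to show are illustrative and not part of the exercise.

```shell
# Hypothetical helper: show the first lines of a result file without a pager.
preview() {
    file=$1
    lines=${2:-10}   # number of lines to show; 10 is also head's default
    if [ ! -r "$file" ]; then
        echo "preview: cannot read $file" >&2
        return 1
    fi
    head -n "$lines" "$file"
}

# Usage in a notebook cell, once plink.assoc exists:
#   preview plink.assoc 20
```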

Regenie

When running the exercise, you may receive the following error:

ERROR: regenie_qc (id=387a6c10b5d599d9) returns an error.
ERROR: [regenie_qc (regenie_qc)]: [0]: Failed to obtain lock /tmp/jovyan/.sos/ae7fddb7f73ac3ee.lock for input regenie_statgen_mwe/1000G.EUR.mwe.pruned.bed and output /home/jovyan/handson-tutorials/contents/regenie/output_vc/cache/1000G.EUR.mwe.pruned.qc_pass.id /home/jovyan/handson-tutorials/contents/regenie/output_vc/cache/1000G.EUR.mwe.pruned.qc_pass.snplist. It is likely that these files are protected by another SoS process or concurrant task that is generating the same set of files. Please manually remove the lockfile if you are certain that no other process is using the lock.

This may happen when you run code chunks containing sos run pipeline/regenie.ipynb. It occasionally happens because disk latency on the cloud server leaves behind unnecessary file locks. Simply re-run the chunk where you received this error, or run rm -rf /tmp/jovyan/.sos/* in your command line terminal to remove all lock files.
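
A more cautious variant of this cleanup is sketched below: it prints the lock files it finds and removes only files ending in .lock, rather than everything under the directory. The function name clear_sos_locks is hypothetical; the default directory and the .lock suffix are taken from the error message above, and the sketch assumes that no other SoS process is running.

```shell
# Hypothetical helper: remove stale SoS lock files from a directory.
# The default directory comes from the error message; run this only
# when no other SoS job is active.
clear_sos_locks() {
    dir=${1:-/tmp/jovyan/.sos}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Print each lock file found, then delete only the *.lock files.
    find "$dir" -maxdepth 1 -type f -name '*.lock' -print -exec rm -f {} +
}

# Usage: clear_sos_locks            (default directory)
#        clear_sos_locks /some/dir  (explicit directory)
```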

Running on local computer

We use a script called statgen-setup, written in the SoS language, to initialize the Docker containers for running the tutorials. If you use your own computer, e.g. a Mac or Linux laptop, you need to install Docker and SoS, as well as the statgen-setup script.

Please note:

  1. The instructions below have been tested on Mac and Linux computers. We currently do not support running directly under Windows, although the setup does work on Windows with the Windows Subsystem for Linux (WSL). Windows users can follow the WSL installation instructions to configure their system before returning to the installation instructions below.
  2. For savvy users who wish to set up the computing environment without using Docker, we have this internal document to show how it works. Interested users are on their own in exploring this option. Although you can still post issues on GitHub to discuss them with us, this approach is not officially supported by our course instructor team, so related requests may receive a lower-priority response.

To open the JupyterLab server on Windows, we recommend using a modern web browser, e.g. Edge instead of Internet Explorer.
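
To see which of the cases above applies to a given terminal, a rough check can be sketched as follows. The function name detect_platform is hypothetical, and looking for the string "microsoft" in /proc/version is a commonly used heuristic for WSL rather than an official interface.

```shell
# Hypothetical helper: report whether the shell runs on macOS, WSL, or plain Linux.
detect_platform() {
    case "$(uname -s)" in
        Darwin) echo "macos" ;;
        Linux)
            # Heuristic: WSL kernels usually mention "microsoft" in /proc/version.
            if grep -qi microsoft /proc/version 2>/dev/null; then
                echo "wsl"
            else
                echo "linux"
            fi ;;
        *) echo "other" ;;
    esac
}

detect_platform
```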

Install Docker

Please download the Docker installer for your operating system, and follow the instructions on the download page to install Docker.

A few notes for Mac users:

  1. After Docker is installed, please click on the Docker icon (found in your Applications) to turn on the Docker Engine. You will need to log in to your Docker client with your user account at https://hub.docker.com/ in order to access images released on Docker Hub. If you do not have an account, please register one and use it.
  2. Every time you restart your machine, you may need to restart the Docker Engine manually if you would like to run the tutorials. You should see a whale-like icon on your menu bar indicating that the Docker service is running in the background of your computer.
  3. To test if Docker is installed properly and is ready to use, please open up a command terminal (found in your Applications) and type docker run hello-world. You should see greeting messages output on the screen, indicating successful installation.
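
The checks in the notes above can be combined into one small shell function, sketched here under the assumption that a docker client on the PATH plus a successful docker info call means the engine is usable. The function name check_docker and its messages are hypothetical.

```shell
# Hypothetical helper: verify that the docker client exists and the engine is reachable.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH"
        return 1
    fi
    # `docker info` fails when the engine (daemon) is not running.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the engine is not running"
        return 2
    fi
    echo "docker is ready"
}

# Usage: check_docker && docker run hello-world
```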

Install SoS

SoS is distributed on both PyPI and conda-forge. If you are familiar with pip or conda, please install SoS the same way as any conventional Python package distributed through these repositories.
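
For readers unsure which route applies to them, the sketch below prints a suggested install command without running it. It assumes the package is named sos on both PyPI and conda-forge, as in the SoS installation documentation; the function name suggest_sos_install is hypothetical.

```shell
# Hypothetical helper: print a suggested install command for SoS without running it.
# Assumes the package is called "sos" on both PyPI and conda-forge.
suggest_sos_install() {
    if command -v sos >/dev/null 2>&1; then
        echo "sos is already installed: $(command -v sos)"
    elif command -v conda >/dev/null 2>&1; then
        echo "conda install -c conda-forge sos"
    elif command -v pip >/dev/null 2>&1; then
        echo "pip install sos"
    else
        echo "neither conda nor pip found; install a Python package manager first"
    fi
}

suggest_sos_install
```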

If you have never worked with any Python package management tools, or have never used conda or its relatives (miniconda, micromamba, mamba, pixi), and would like to start from scratch, this document provides a quick way to set up a production conda environment using pixi and micromamba, with SoS installed along with other tools such as R, Python 3, and JupyterLab. The document was written for setting up the software environment on a high performance computing (HPC) cluster, although the exact same setup should also apply to macOS, Linux PCs, and Windows PCs with WSL installed. Note that this will make changes to your local computing environment, including adding lines of shell configuration, such as export PATH commands, to your shell configuration file. For novices whose computers are not yet configured with these tools, using our setup is usually harmless. Savvy developers, however, are strongly advised to install SoS within their existing software environment.

Install statgen-setup script

Please download this statgen-setup script and save it to your computer with the filename statgen-setup. Then open your command terminal, use the cd command to navigate to the folder where the file was saved (on macOS, ~/Downloads by default), and run

chmod +x statgen-setup

to make this script executable. You should now be able to run this script in the command terminal as ./statgen-setup from the directory it is downloaded to. Please test it by typing ./statgen-setup -h to output the help information for this script.

You can also move this script to a folder listed in your shell's PATH so that you can run it simply as statgen-setup, without typing a path prefix such as ./. One possibility is to put it where the sos program is installed. To do so, first type which sos to see the path where sos is installed. Then move the statgen-setup script to that same folder (either with the mv command in the terminal, or by cutting and pasting it in your operating system's file manager).
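
The steps in this section can be sketched as one shell function. The name install_beside is hypothetical, and the sketch copies the script instead of moving it so that the downloaded file stays in place. It assumes you have write permission in the folder that contains sos.

```shell
# Hypothetical helper: make a script executable and copy it next to an existing program.
install_beside() {
    script=$1      # e.g. ./statgen-setup
    program=$2     # e.g. sos
    # Locate the program on PATH; stop if it is not installed.
    target=$(command -v "$program") || {
        echo "$program not found on PATH" >&2
        return 1
    }
    dest=$(dirname "$target")
    # Print the new location only if both steps succeed.
    chmod +x "$script" && cp "$script" "$dest/" && echo "$dest/$(basename "$script")"
}

# Usage, from the download directory:
#   install_beside ./statgen-setup sos
```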

Launching tutorials

To launch the environment to run the tutorials, please run from the command terminal:

./statgen-setup serve

Note that the first time you run this command, it will pull the Docker image containing all software packages used for the course. The image is about 5 GB and may take a while to download.

You will see in the terminal that the script is downloading the latest tutorial files.

When it completes, you should see a line printed on the screen that contains a URL:

Please copy that URL to your web browser. Your JupyterLab server should start like this:

At this point, please open a Terminal under Other in the launcher window, and run

get-data

This downloads all the data required to run the tutorials. The get-data command may take a while to finish. Please wait until it has completed before starting the tutorials.

All data and tutorials will be downloaded to the folder ~/statgen_course_$USER on your computer, where $USER is your username on the system.
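
To check from a terminal on your own computer where that folder ended up, a minimal sketch is shown below. The function name course_dir is hypothetical, and falling back to id -un when $USER is unset is an added assumption.

```shell
# Hypothetical helper: print the course folder described above.
course_dir() {
    # Fall back to `id -un` if $USER is not set (an assumption, not from the instructions).
    printf '%s/statgen_course_%s\n' "$HOME" "${USER:-$(id -un)}"
}

# List the downloaded tutorials and data, if the folder exists.
if [ -d "$(course_dir)" ]; then
    ls "$(course_dir)"
else
    echo "$(course_dir) does not exist yet"
fi
```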

The tutorials are in the same structure as described in section Running on a pre-configured cloud server. Please follow the same instructions described in that section to run the tutorials.

A note on ANNOVAR

The ANNOVAR software has a user license that prohibits redistribution of the software. Our team has obtained permission from the authors to redistribute ANNOVAR for educational purposes, limited to the context of this course and to the Docker image that we distribute with the computer tutorials. Users of tutorials involving ANNOVAR should fill out the academic user registration form.

Video illustrations

To help users set up their computers for the course exercises, we have prepared video instructions available on YouTube.

Please skip the parts on conda and SoS installation, as they are obsolete; the text instructions earlier on this page are simpler than the steps recorded in the videos. Please use these videos only for instructions on WSL2 and Docker installation.

Install WSL2, Docker, Conda and SoS on Windows

Install Docker, Conda and SoS on MacOS

Install Docker, Conda and SoS on Ubuntu Linux

Get help

Please open a ticket in our issue tracker if you have any difficulty setting up your system: https://github.com/statgenetics/statgen-courses/issues. We will help you troubleshoot.