Guide: setting up workstation and cluster - polizzilab/software-wiki GitHub Wiki
Compute resources
Welcome! Our lab uses the following compute resources:
- Your personal computer, for quick, simple tasks
- The main workstation, `npl1.in.hwlab`, for developing code and for relatively lightweight tasks
- The CPU and GPU servers, `np-cpu-1`, `np-gpu-1`, and `np-gpu-2`, for a similar purpose, although these have much more compute than the workstation
- The O2 compute cluster, for running more intensive jobs
Our workstation and servers are shared by all lab members for interactively developing code. They are not intended for larger, more intensive jobs! To make sure everybody gets a fair share of the resources, please be mindful and monitor your resource use with `htop`. If you are running a compute-intensive or memory-intensive job, please submit it to O2. See the O2 section below.
Once you have access to the workstations and O2, please configure your account to use our lab's shared Python environments as follows:
- On the workstation, copy the lines in our shared .bashrc onto the top of your personal bashrc at `~/.bashrc`. This will set various environment variables.
- On O2, make sure that you are part of the `polizzi` user group so that you have read/write permissions to our lab's shared directory. To check, run the command `groups`, which lists the user groups your account belongs to. If `polizzi` is not in the list, please email O2 IT and ask them to add you to the UNIX user group `polizzi`.
- Once you are in the group, copy the lines in our shared .bashrc onto the top of your personal bashrc on O2 at `~/.bashrc`.
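The group check described above can also be scripted. This is a small sketch, assuming a bash shell on O2; `in_group` is a hypothetical helper name, and `id -nG` prints the same group names as `groups`.

```shell
# Hypothetical helper: succeed if the current user belongs to group "$1".
# `id -nG` prints the names of all your groups, separated by spaces
# (the same list that `groups` shows).
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if in_group polizzi; then
  echo "OK: you are in the polizzi group"
else
  echo "Not in polizzi yet: email O2 IT to be added"
fi

# After editing ~/.bashrc, reload it in your current shell
# (new terminal sessions usually pick it up automatically):
#   source ~/.bashrc
```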
Connecting to the workstation
Most lab members use vscode as a powerful, intuitive, graphical way to connect to the workstation. It is easy and convenient for running Python notebooks, developing Python scripts, and general file management.
Using vscode remote for the workstation
Install vscode on your personal computer, then install the Remote - SSH extension. Use `npl1.in.hwlab` as the remote host. This will let you run commands and notebooks on the workstation.
While vscode is convenient, it is best suited to simple tasks. You will need other tools for more sophisticated tasks, such as uploading/downloading lots of files, viewing .pdb files, submitting longer jobs (that keep running after you exit vscode), or extended text-based terminal work.
Text-based commands
For running text-based commands, it is recommended to use ssh from a dedicated terminal app. This is a better alternative to the small default vscode terminal window!
- On Mac, you can use the built-in Terminal app or download iTerm2; on Windows, download PuTTY. The default colors and font sizes are rather ugly; please adjust the preferences to find something that suits you better.
- Run the command `ssh {USER}@npl1.in.hwlab`, where `{USER}` is your username. This will connect you to the workstation, log you in, and present you with a text-based "shell" for interacting with the workstation. SSH stands for Secure Shell. If you are new to this way of using computers, please google "introduction to the unix shell" for some guides.
- Under the hood, SSH is a protocol that your local computer uses to communicate securely with the workstation. Once the connection is established, there are many things you can do. In fact, many other tools, such as vscode and `scp`, are built on top of an underlying SSH connection.
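To avoid typing the full host name each time, you can give the workstation a short alias in `~/.ssh/config` on your personal computer; both `ssh` and vscode's Remote - SSH extension read this file. The sketch below prints such an entry. The alias `npl1`, the helper name `ssh_host_entry`, and the `ServerAliveInterval` setting (which makes `ssh` ping the server after 60 seconds of silence, so idle connections are less likely to drop) are our own choices, not lab conventions.

```shell
# Hypothetical helper: print an ~/.ssh/config entry for a host alias.
# Arguments: ALIAS HOSTNAME USERNAME
ssh_host_entry() {
  local alias_name="$1" host="$2" user="$3"
  cat <<EOF
Host ${alias_name}
    HostName ${host}
    User ${user}
    ServerAliveInterval 60
EOF
}

# Preview the entry (replace your_username with your own username):
ssh_host_entry npl1 npl1.in.hwlab your_username

# Once it looks right, append it to your SSH config:
#   ssh_host_entry npl1 npl1.in.hwlab your_username >> ~/.ssh/config
# Afterwards `ssh npl1` is equivalent to `ssh your_username@npl1.in.hwlab`.
```

With the entry in place, `npl1` should also appear as a host in vscode's remote explorer.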
For longer-running commands that should keep going after you disconnect, it is recommended to use tmux. See our Guide to using tmux.
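A typical tmux workflow is to start a named session in the background, detach, and reattach later. The sketch below wraps the first step in a hypothetical helper `run_in_tmux`; the session name and command in the usage comments are placeholders. When the command exits, its session closes with it, which is why the example redirects output to a log file.

```shell
# Hypothetical helper: start COMMAND in a new, detached tmux session NAME.
# The job keeps running on the workstation after you log out.
run_in_tmux() {
  local name="$1" cmd="$2"
  tmux new-session -d -s "$name" "$cmd"
}

# Usage (placeholders):
#   run_in_tmux myjob 'python my_script.py > myjob.log 2>&1'
#   tmux ls                # list your sessions
#   tmux attach -t myjob   # reattach; detach again with Ctrl-b, then d
```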
File transfer
Transferring files to and from the workstation can be pretty clunky.
- Some use vscode; others use command-line tools such as `scp` or `rsync` in the terminal (google these if curious).
- To ease the friction of manually using `rsync`, I (jchang) like to use a helper script on my local machine as a wrapper around `rsync`. You can find it here.
- Try this! If you want to drag and drop files on the workstation as if they were local files on your computer, you can use the tool `sshfs`. This uses an `ssh` connection to mount a workstation directory as a separate filesystem (fs) on your own computer (google "unix mount" if curious). Once you install `sshfs` on your computer, run this command on your computer: `sudo sshfs -o allow_other,default_permissions {USER}@transfer.sbgrid.org:/nfs/polizzi/{USER} /PATH/TO/MOUNT/POINT`. Here `{USER}` is your SBGrid username, and `/PATH/TO/MOUNT/POINT` is an empty folder on your computer. Your workstation files will then "magically" appear under the folder you specified. Note that you will have to reconnect each time you lose internet access, because the underlying ssh connection is terminated.
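The following is a minimal sketch of the rsync-wrapper idea, not jchang's actual script. `REMOTE`, `ws_push`, `ws_pull`, and `ws_unmount` are hypothetical names, and `REMOTE` defaults to a placeholder you should change. The flags are standard `rsync` options: `-a` recurses and preserves permissions and timestamps, `-v` lists transferred files, `-z` compresses data in transit, and `--progress` shows per-file progress. `ws_unmount` clears a stale `sshfs` mount so you can remount after losing your connection.

```shell
# Hypothetical rsync wrappers; set REMOTE to your own user@host.
REMOTE="${REMOTE:-your_username@npl1.in.hwlab}"

# ws_push LOCAL_PATH REMOTE_PATH: copy from your computer to the workstation.
ws_push() { rsync -avz --progress "$1" "${REMOTE}:$2"; }

# ws_pull REMOTE_PATH LOCAL_PATH: copy from the workstation to your computer.
ws_pull() { rsync -avz --progress "${REMOTE}:$1" "$2"; }

# ws_unmount MOUNT_POINT: unmount a (possibly stale) sshfs mount.
# If you mounted with sudo, you may need to run the underlying command
# with sudo yourself, e.g. `sudo umount MOUNT_POINT`.
ws_unmount() {
  if [ "$(uname)" = "Darwin" ]; then
    umount "$1"          # macOS
  else
    fusermount -u "$1"   # Linux (FUSE; fusermount3 on some newer systems)
  fi
}

# Usage:
#   ws_pull results/ ./results/     # trailing slash: copy the contents
#   ws_push ./inputs remote_dir/    # no trailing slash: creates remote_dir/inputs
```

The trailing-slash behavior in the usage comments is the most common `rsync` surprise: a source ending in `/` copies the directory's contents, while a source without it copies the directory itself.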
Viewing pdb files
- One way is to transfer them to your local computer with the methods above and then open them in PyMOL.
- Another way is to use the protein viewer extension in vscode. It works, but it's a little clunky.
- Finally, you can host an HTTP server on the workstation and point PyMOL to load files from there. Jody uses the script here. Just run the script and copy-paste the `load` commands into your PyMOL command window.
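The HTTP-server approach can be sketched as follows; this is not Jody's script. It assumes Python 3 is available on the workstation (its built-in `http.server` module serves the current directory) and that your computer can reach the chosen port on the workstation, e.g. from the lab network. The port number and the helper `pymol_load_cmds` are hypothetical. Anyone who can reach that port can read the served files, so stop the server when you are done.

```shell
# Hypothetical helper: print one PyMOL `load` command per .pdb file in DIR,
# pointing at an HTTP server on HOST:PORT that serves DIR.
pymol_load_cmds() {
  local dir="$1" host="$2" port="$3" f
  for f in "$dir"/*.pdb; do
    [ -e "$f" ] || continue   # glob matched nothing: no .pdb files
    echo "load http://${host}:${port}/$(basename "$f")"
  done
}

# Usage on the workstation, from the directory holding your structures:
#   pymol_load_cmds . npl1.in.hwlab 8123   # copy these lines into PyMOL
#   python3 -m http.server 8123            # serve the directory; Ctrl-C to stop
```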
Connecting to O2
TODO
Some helpful links for now:
- How to choose a partition
- Scratch vs machine-local temporary filesystems. We recommend copying the vdM database to the machine-local temporary filesystem for faster I/O when using COMBS on O2. The transfer and unzipping take a few minutes, but the overall time savings can be substantial.
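The copy-to-local-disk advice above might look like this inside an O2 job script. This is a sketch under several assumptions: the vdM database is a single `.zip` or `.tar.gz` archive, `TMPDIR` (falling back to `/tmp`) points at the machine-local temporary filesystem, and the archive path and helper name `stage_vdm_db` are hypothetical. Check the linked O2 page for the actual machine-local location and its size limits.

```shell
# Hypothetical helper: extract the vdM database archive onto local disk and
# print the destination directory.
# Arguments: ARCHIVE [DEST]  (DEST defaults to ${TMPDIR:-/tmp}/vdm_db)
stage_vdm_db() {
  local archive="$1" dest="${2:-${TMPDIR:-/tmp}/vdm_db}"
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.zip)          unzip -q "$archive" -d "$dest" || return 1 ;;
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" || return 1 ;;
    *) echo "unrecognized archive type: $archive" >&2; return 1 ;;
  esac
  echo "$dest"
}

# Usage in a job script (the archive path is a placeholder):
#   VDM_DIR="$(stage_vdm_db /path/to/vdm_database.tar.gz)" || exit 1
#   trap 'rm -rf "$VDM_DIR"' EXIT   # free the local disk when the job ends
#   ... point COMBS at "$VDM_DIR" ...
```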