Using Workstation - Vision-and-Learning-Lab-UAlberta/home GitHub Wiki

This is a draft version of the guidelines for using the workstations in our lab.

In our lab, we have 3 workstations with 11 GPUs in total (3, 4, and 4, respectively). Each of our lab members, including students, researchers, and visiting scholars, should have been assigned to at least one workstation. Since these are shared resources, we would like to provide a few tips to help you better leverage them while not affecting others who work on the same machines.

THE ONLY LAW

DON'T run your program as root, even if you are a sudoer. We will kill such processes without notification.

Python

  • Always use a virtual environment, or Anaconda, to manage your project environment. Don't use the default Python 2/3 installation on the workstation. As you may have noticed, most accounts do not have root access, so you cannot install anything into the Python that comes with the Ubuntu installation.
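As a sketch, a typical per-project setup could look like the following (the environment name `myproj` and the Python version are placeholders; pick whichever of venv or conda you prefer):

```shell
# Option 1: venv (ships with Python 3, no root needed)
python3 -m venv ~/envs/myproj
source ~/envs/myproj/bin/activate
pip install --upgrade pip

# Option 2: Anaconda / Miniconda
# conda create -n myproj python=3.10
# conda activate myproj
```

Everything you `pip install` afterwards lands inside the environment, not in the system Python.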

CPU

  • If you need to run tasks with a significant CPU load, first consider using the desktop provided to you in the lab. It actually has a pretty decent configuration.


  • It is totally fine to run CPU-heavy tasks on the workstation, but you should first check with the members who share the same machine through our WhatsApp group. If no one objects, you can proceed and use it for 1–2 days. Note that we still prioritise GPU workloads on the workstations.

  • [PyTorch Specific] If you are using PyTorch, you may need to do the following at the beginning of your code base to prevent PyTorch's dataloader from hogging every CPU core.

    import os
    import torch

    os.environ['OMP_NUM_THREADS'] = '1'  # note: OMP_NUM_THREADS, with a trailing S
    torch.set_num_threads(1)
    
  • [PyTorch Specific] Always set pin_memory to False when creating a dataloader; otherwise the dataloader can occupy all CPU cores and keep multiple copies of the data in memory.
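As a sketch, a conservative DataLoader configuration under these guidelines (the dummy dataset and batch size are placeholders for your own) could be:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for your real one: 100 samples of shape (1,)
dataset = TensorDataset(torch.arange(100, dtype=torch.float32).unsqueeze(1))

# pin_memory=False per the lab rule; keep num_workers small so the
# loader does not spawn a worker process per CPU core
loader = DataLoader(dataset, batch_size=10, num_workers=0, pin_memory=False)

batch, = next(iter(loader))
print(batch.shape)  # torch.Size([10, 1])
```

If data loading becomes your bottleneck, raise `num_workers` gradually (e.g. 2 or 4) rather than jumping to the core count of the machine.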

  • [PyTorch Specific] Do not use the multiprocessing module shipped inside the PyTorch package; it is just a thin wrapper around Python's already troublesome multiprocessing module.

GPU

  • Try to run your task on a single GPU if one is sufficient. A simple workaround is to inject this snippet at the beginning of your code base.

    import os

    os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
    # config.visible_devices is your own setting, e.g. '0'
    os.environ['CUDA_VISIBLE_DEVICES'] = config.visible_devices
    
  • You can certainly use multiple GPUs if they are free, and you can run several instances of your task on different GPUs.
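One way to do this (a sketch; the script name `train.py` and the GPU ids are placeholders) is to launch one process per GPU, each seeing only its own device via CUDA_VISIBLE_DEVICES:

```python
import os
import subprocess
import sys

def launch_on_gpus(script, gpu_ids):
    """Start one copy of `script` per GPU id; each child sees only one GPU."""
    procs = []
    for gpu in gpu_ids:
        # Copy the parent environment and restrict the visible device
        env = dict(os.environ,
                   CUDA_DEVICE_ORDER='PCI_BUS_ID',
                   CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen([sys.executable, script], env=env))
    return procs

# Example: run two instances on GPUs 0 and 1, then wait for both
# procs = launch_on_gpus('train.py', [0, 1])
# for p in procs:
#     p.wait()
```

Inside each child process, the single visible GPU always appears as device 0, so the training script itself needs no per-GPU logic.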

Storage

Unfortunately, we don't have very large SSDs for storage. Each workstation has roughly 1–3 TB of SSD and 1–3 TB of HDD storage.

  • If you have a very large dataset that needs to be processed or used in your experiments, consider pre-processing it if possible.
    • For example, if you have a ~1 TB video dataset and you know your experiment only needs every 15th frame at a lower resolution, you can likely do this preprocessing offline, store the processed version (which should be much smaller) on the SSD for later use, and move the raw dataset to the HDD (or delete it if you want).
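A minimal sketch of the bookkeeping for such a subsampling pass (the step of 15 and the frame count are placeholders; the actual decoding would use a library such as OpenCV or ffmpeg):

```python
def frames_to_keep(total_frames, step=15):
    """Indices of the frames retained when keeping every `step`-th frame."""
    return list(range(0, total_frames, step))

# Example: a 150-frame clip keeps frames 0, 15, 30, ..., 135
indices = frames_to_keep(150, step=15)
print(len(indices))  # 10
```

Even before resizing, keeping 1 frame in 15 already cuts the stored data by roughly 15x, which is what makes the processed copy fit on the SSD.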

PyCharm

Many members of our lab use the PyCharm Remote Debugger for their daily work.

Here we provide a few instructional screenshots to help you correctly set up everything you need (minimal setup).

  • Set up the SSH interpreter [screenshots]