Tips - calab-ntu/gpu-cluster GitHub Wiki

This page collects tips for using Eureka and Spock.

Checking node status

  • node : List node names, properties, and status
    • -h : Display the help messages.
    • -v : Display the version.
    • -f : Only list the free nodes.
    • -d : Only list the down nodes.
    • -o : Only list the offline nodes.
    • -s <jobId> : Only display the specified job ID.
      The following options work only on the login node:
    • -i : List the idle users.
    • -j : List the job ID and job user of each node.
    • -t : List the start time of each job.
    • -a : Combine options -i, -j, and -t.
    • -q : List the output of "showq".
    • -u <user> : Only display the specified user.
    • -l <label> : Only display nodes with the specified label (e.g., stableq).

Making movies

ffmpeg -framerate 10/1 -i YOUR_FIGURES_%06d.png -c:v libx264 -preset slow -tune animation \
       -pix_fmt yuv420p -s 1920x1200 YOUR_MOVIE.mp4

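Replace YOUR_FIGURES_%06d.png and YOUR_MOVIE.mp4 with the actual input and output filenames; %06d stands for a six-digit, zero-padded frame number (e.g., YOUR_FIGURES_000042.png). Adjust the frame rate (-framerate 10/1, i.e., 10 frames per second) and the output resolution (-s 1920x1200) if necessary.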

ffmpeg reference.
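ffmpeg's %06d pattern expects a gap-free sequence: the first file must be numbered close to the start number (0 by default), and every following number must be consecutive. Frames dumped only every few steps (e.g., Data_000005.png, Data_000010.png, ...) therefore end the input early. One workaround, sketched below, is to link the frames into a consecutive sequence first. This is a minimal sketch that assumes all frames are PNG files in one directory whose names already sort in time order; the function name link_frames and the frame_%06d.png naming are hypothetical.

```shell
# Hypothetical helper: link_frames SRC_DIR DST_DIR
# Symlinks every *.png in SRC_DIR, in sorted glob order, to
# DST_DIR/frame_000000.png, frame_000001.png, ... and prints the frame count.
# Use an empty DST_DIR: links left over from an earlier, longer run are kept.
link_frames() {
  local src=$1 dst=$2 abs f i=0
  abs=$(cd "$src" && pwd) || return 1      # absolute path for the link targets
  mkdir -p "$dst" || return 1
  for f in "$abs"/*.png; do
    [ -e "$f" ] || continue                # no *.png at all: glob stays literal
    ln -sf "$f" "$dst/$(printf 'frame_%06d.png' "$i")"
    i=$((i + 1))
  done
  echo "$i"
}

# Usage (directory names are placeholders):
#   link_frames YOUR_FIGURE_DIR frames
#   ffmpeg -framerate 10/1 -i frames/frame_%06d.png -c:v libx264 -preset slow \
#          -tune animation -pix_fmt yuv420p -s 1920x1200 YOUR_MOVIE.mp4
```

The movie length is the number of frames divided by the frame rate; for example, 300 frames at -framerate 10/1 give a 30-second movie. On non-Windows builds, ffmpeg's `-pattern_type glob` input option is another way to read files that do not form a consecutive sequence.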

Changing password

yppasswd

> Changing NIS account information for testuser on tumaz.
> Please enter old password:
> Changing NIS password for testuser on tumaz.
> Please enter new password:
> Please retype new password:

> The NIS password has been changed on tumaz.

GNU screen

GNU screen is a terminal multiplexer, which is useful for the following tasks.

  • Managing multiple windows/programs with a single command-line interface.
  • Detaching a screen session and reattaching it later.

Programs running in screen are NOT terminated when you are disconnected, which makes screen particularly useful on an unstable network connection: just reattach the session after reconnecting.

Common commands for managing screen sessions:

  • screen -S YOUR_SESSION_NAME: create a new session named YOUR_SESSION_NAME
  • screen -DR YOUR_SESSION_NAME: reattach the session named YOUR_SESSION_NAME (detaching it elsewhere first if needed, or creating it if it does not exist)
  • screen -ls: list the currently running sessions

Common commands within a screen session:

  • Ctrl+a ?: help
  • Ctrl+a c: create a new window
  • Ctrl+a ": list all windows
  • Ctrl+a k: kill the current window
  • Ctrl+a \: terminate the current screen session (which cannot be reattached later)
  • Ctrl+a d: detach the current screen session (which can be reattached later)

See https://wiki.archlinux.org/index.php/GNU_Screen for more details.

Caution: On Spock, we recommend creating a file ~/.screenrc with shell -$SHELL as its first line. This makes screen start each window as a login shell, which sources /etc/profile.d/alias.sh so that all aliases are inherited.
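The recommendation above can be applied with a few shell commands. This is a minimal sketch, assuming a POSIX shell with mktemp available: it prepends the line only if ~/.screenrc does not already contain it, so it is safe to run more than once.

```shell
rc="$HOME/.screenrc"
line='shell -$SHELL'   # single quotes keep "$SHELL" literal; screen expands it

touch "$rc"
if ! grep -qxF "$line" "$rc"; then
  # Write the new line first, then the old contents, then replace the file.
  tmp=$(mktemp) && { printf '%s\n' "$line"; cat "$rc"; } > "$tmp" && mv "$tmp" "$rc"
fi
head -n 1 "$rc"        # show the first line of ~/.screenrc
```

The leading "-" in `shell -$SHELL` is what tells screen to start the shell as a login shell.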

GPU hangs

Sometimes a GPU hangs (i.e., stops responding) after a GPU job is terminated improperly. If this happens, follow the steps below to restart the MPS (Multi-Process Service) server:

nvidia-smi                 # find the PID of nvidia-cuda-mps-server
nvidia-cuda-mps-control    # enter the MPS control interface
shutdown_server PID        # (inside the interface) shut down the server with this PID
Ctrl-C                     # leave the MPS control interface
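Because nvidia-cuda-mps-control also reads commands from standard input, the steps above can be scripted. This is a minimal sketch, assuming nvidia-cuda-mps-control is on PATH and talks to the same control daemon as the interactive steps; it uses the daemon's get_server_list command to find the server PIDs instead of reading them from nvidia-smi. The function name stop_mps_servers is hypothetical.

```shell
# Hypothetical helper (name is illustrative): shut down every MPS server
# reported by the control daemon.
stop_mps_servers() {
  local pids pid
  # get_server_list prints the PID of each running nvidia-cuda-mps-server
  pids=$(echo get_server_list | nvidia-cuda-mps-control) || return 1
  if [ -z "$pids" ]; then
    echo "no MPS server is running" >&2
    return 0
  fi
  for pid in $pids; do
    echo "shutting down MPS server $pid"
    # same as typing "shutdown_server PID" in the interactive interface
    echo "shutdown_server $pid" | nvidia-cuda-mps-control
  done
}
```

After the old server exits, the control daemon starts a new MPS server automatically when the next CUDA client connects, so no explicit restart command is needed.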
