Learning Objectives Intro to Legion - UCL/RCPSTrainingMaterials GitHub Wiki

Lesson 1: Intro

Questions

  • What is a cluster? How does it differ from your own workstation?
  • Why would you want to use one?

Objectives

  • Summarise the basic building blocks of a cluster (servers, storage, network(s))
  • Identify some cases in which work is or is not suitable for running on a cluster.

Lesson 2: Processors and processes

Questions

  • What do we mean by a core and a node?
  • Why would we want to use multiple cores at once?
  • What is a serial program?
  • What is a threaded program and how does it communicate between threads?
  • What is an MPI program and how does it communicate between processes?
  • How does the design of the computer affect the sizes and types of programs you can run on it?

Objectives

  • Explain the differences between serial, threaded and MPI programs and whether they can run across nodes.
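The three program styles from this lesson can be contrasted with a small Python sketch that adds up a list of numbers. It is illustrative only: Python threads always share one address space, so the last function merely models the message-passing style with a queue. Real MPI ranks are separate processes, which is what lets them run across nodes, while threads share memory and so stay within one node.

```python
import queue
import threading


def serial_sum(values):
    """Serial: one process with one thread does all the work on one core."""
    return sum(values)


def threaded_sum(values, n_threads=4):
    """Threaded: workers share one memory space and update a shared total."""
    total = [0]               # shared by every thread
    lock = threading.Lock()   # stops two threads updating total at once

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total[0] += partial

    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(values, n_ranks=4):
    """Message-passing style: each worker owns its data and sends a result.

    A queue stands in for MPI send/receive here; real MPI ranks are
    separate processes, possibly on different nodes.
    """
    mailbox = queue.Queue()

    def rank_main(rank, chunk):
        mailbox.put((rank, sum(chunk)))   # like sending a message to rank 0

    chunks = [values[i::n_ranks] for i in range(n_ranks)]
    ranks = [threading.Thread(target=rank_main, args=(r, c))
             for r, c in enumerate(chunks)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    # "rank 0" receives one message per rank and combines them
    return sum(partial for _rank, partial in (mailbox.get() for _ in ranks))


if __name__ == "__main__":
    data = list(range(1, 101))
    print(serial_sum(data), threaded_sum(data), message_passing_sum(data))
```

All three give the same answer; what differs is how the workers coordinate: the threaded version needs a lock around shared data, while the message-passing version only exchanges explicit results.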

Lesson 3: Legion

Questions

  • How do you access Legion from inside and outside UCL?
  • How many login nodes does Legion have?
  • Why does Legion have multiple login nodes?
  • How would you log in to a specific login node and why would you want to?

Objectives

  • Log in to Legion, either from UCL or from home.
  • Log in to a specific node.

Lesson 4: Data management on Legion

Questions

  • How would you create a file and a directory tree on a Unix system?
  • How do you copy your data on to Legion?
  • If you are copying a large quantity of data, how would you do this more effectively?
  • What are the differences between home and Scratch?
  • Where is $TMPDIR and what is it for?
  • How do you retrieve data from $TMPDIR?

Objectives

  • Create a text file inside a directory tree on Legion (this may be better left to the shell course).
  • Copy compressed data on to Legion from your local machine using login05.
  • Decompress the data into its final location.
  • Explain the different quotas and how flexible they are.
  • Explain the read/write accessibility to the different areas.
  • Understand that ~/Scratch is a shortcut (symbolic link) to /scratch/scratch/$USER.
  • Explain the backup policy of the areas.
  • Explain the performance differences.
  • Explain why you would want to write to $TMPDIR.
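The first few objectives (create a file in a directory tree, copy compressed data, decompress it in its final location) can be sketched in Python with pathlib and tarfile. On the command line the equivalents would be mkdir -p, tar -czf and tar -xzf; the transfer step in between (for example scp or rsync to login05) is not shown. The directory names and file contents below are placeholders, and packing many small files into one archive is assumed to be the "more effective" way to copy a large quantity of data.

```python
import tarfile
from pathlib import Path


def make_tree(root):
    """Create root/project/inputs/notes.txt (like mkdir -p plus an editor)."""
    target = Path(root) / "project" / "inputs"
    target.mkdir(parents=True, exist_ok=True)
    note = target / "notes.txt"
    note.write_text("hello from Legion\n")
    return note


def pack(src_dir, archive_path):
    """Compress a whole directory into a single .tar.gz file."""
    src_dir = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(src_dir), arcname=src_dir.name)
    return Path(archive_path)


def unpack(archive_path, dest_dir):
    """Decompress the archive into its final location, e.g. under Scratch."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        # Use the safe "data" filter where this Python version provides it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(str(dest_dir), filter="data")
        else:
            tar.extractall(str(dest_dir))
```

Usage: call make_tree and pack on your local machine, copy the single archive across, then call unpack with a destination under ~/Scratch.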

Lesson 5: Using software on Legion

Questions

  • How are software packages managed on Legion?
  • What software is available by default?
  • How do you find out all the software which is available?
  • How do you load and run a software package?
  • How do you make a piece of software always be available to you?
  • How do you use a program with a graphical user interface?

Objectives

  • Use module list to see the default modules.
  • Use module avail to see all the modules.
  • Load a module that has prerequisites and requires changes to the default modules.
  • Put a module load command in your .bashrc and start a new shell.
  • Start an X11 server on your local machine and run nedit on Legion.
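Environment-module tools work by editing environment variables such as PATH and LD_LIBRARY_PATH, which is why a loaded package's commands become available and why a module load in .bashrc takes effect in every new shell. The Python sketch below is a simplified, hypothetical model of that mechanism only; the prefix /shared/apps/example/1.0 is made up, and the real module command also handles prerequisites and conflicts between modules.

```python
import os

# Variables a typical modulefile prepends to, and the subdirectory each uses
MODULE_VARS = (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib"))


def module_load(env, prefix):
    """Model of `module load`: prepend the package's bin and lib dirs."""
    new_env = dict(env)  # leave the caller's environment untouched
    for var, sub in MODULE_VARS:
        entry = os.path.join(prefix, sub)
        parts = [p for p in new_env.get(var, "").split(os.pathsep) if p]
        new_env[var] = os.pathsep.join([entry] + parts)
    return new_env


def module_unload(env, prefix):
    """Model of `module unload`: remove the entries module_load added."""
    new_env = dict(env)
    for var, sub in MODULE_VARS:
        entry = os.path.join(prefix, sub)
        parts = [p for p in new_env.get(var, "").split(os.pathsep)
                 if p and p != entry]
        new_env[var] = os.pathsep.join(parts)
    return new_env


if __name__ == "__main__":
    prefix = os.path.join(os.sep, "shared", "apps", "example", "1.0")
    loaded = module_load({"PATH": os.environ.get("PATH", "")}, prefix)
    print(loaded["PATH"].split(os.pathsep)[0])
```

Because the package directory is prepended, its programs are found before any system copies with the same name.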

Lesson 6: Jobs on Legion

Questions

  • Why do we have a scheduler and separate compute nodes?
  • How do you submit a job to the compute nodes?
  • What resources can you request?
  • What happens if you don't request a resource explicitly?
  • What is wallclock time? What happens if your job runs out?
  • Why must your job have a working directory in Scratch?
  • When can your job have two things you could refer to as "working directory" and how would you disambiguate between them? (Hint: cd $TMPDIR)
  • What are two ways to copy your data back from $TMPDIR and how do they differ?
  • How much memory is a job requesting 12 cores and mem=1G actually asking for?
  • How do you request a job that needs an MPI environment?
  • What are some different types of node that Legion contains?
  • How and when might you need to control what type of node your job runs on?
  • How can you check on the status of your job?
  • What do qw, r, Rq, Rr and Eqw mean?
  • How can you tell why a job is in Eqw state?
  • How do you delete a job?
  • How can you check on past jobs?
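The job states asked about above are built from single letters. The following Python sketch decodes them letter by letter; the meanings follow the Grid Engine qstat(1) man page, with "q" read as "queued". It is an illustrative lookup, not part of Legion's tools, and the letter table is not exhaustive.

```python
# Single-letter job states as described in the Grid Engine qstat(1) man page
STATE_LETTERS = {
    "d": "deletion",
    "E": "error",
    "h": "hold",
    "q": "queued",
    "r": "running",
    "R": "restarted",
    "s": "suspended",
    "t": "transferring",
    "w": "waiting",
}


def decode_state(code):
    """Split a qstat state such as 'Eqw' into readable words."""
    return [STATE_LETTERS.get(letter, "unknown") for letter in code]


if __name__ == "__main__":
    for code in ("qw", "r", "Rq", "Rr", "Eqw"):
        print(f"{code:>3}: {' '.join(decode_state(code))}")
```

Read this way, Eqw is a job that is queued and waiting but has hit an error, which is why it needs investigating (for example with qexplain) rather than just more patience.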

Objectives

  • Explain that resources have to be assigned fairly to users.
  • Write and submit a simple jobscript that leaves some resources as default values.
  • Use qstat after submitting a job and qstat -j to see what resources you ended up requesting.
  • Write and submit a jobscript that specifies all resources appropriately.
  • Write and submit a jobscript that writes to $TMPDIR and copies data back. Understand that SGE's working directory and the working directory for the program you are running inside the script can be different.
  • Explain that #Local2Scratch happens outside wallclock time, while other copying methods (with or without tar) happen inside it but can be fully customised.
  • Explain the differences in hardware/intended use for some of Legion's nodes.
  • Explain why you might want to run on a specific node, and why this is usually unnecessary.
  • Explain what common qstat statuses mean.
  • Submit a given faulty jobscript and use qexplain to find out what was wrong.
  • Use qdel to delete a job.
  • Use jobhist after a job has ended.
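The jobscript objectives can be illustrated with a hypothetical Python helper that prints a Legion-style SGE jobscript. It assumes the resource names used in UCL's example scripts (h_rt for wallclock time, mem for memory per core, -pe smp or -pe mpi for multi-core jobs) and a made-up Scratch path, so check the current documentation before relying on them. It shows both working directories (SGE's, set with -wd in Scratch, and the program's after cd $TMPDIR) and a tar copy-back that, unlike #Local2Scratch, counts against wallclock time. Assuming mem is requested per core, as the memory question above implies, a 12-core job with mem=1G asks for 12 GB in total.

```python
def total_memory_gb(cores, mem_per_core_gb):
    """Total memory a job asks for, assuming mem is requested per core."""
    return cores * mem_per_core_gb


def render_jobscript(name, runtime, cores, mem_per_core_gb, workdir,
                     command, mpi=False):
    """Return the text of a hypothetical Legion-style SGE jobscript.

    runtime is the wallclock limit as "hh:mm:ss"; workdir should be in Scratch.
    """
    lines = [
        "#!/bin/bash -l",
        f"#$ -N {name}",                  # job name
        f"#$ -l h_rt={runtime}",          # wallclock time limit
        f"#$ -l mem={mem_per_core_gb}G",  # memory per core
        f"#$ -wd {workdir}",              # SGE's working directory
    ]
    if cores > 1:
        # threaded jobs use the smp environment, MPI jobs the mpi environment
        env = "mpi" if mpi else "smp"
        lines.append(f"#$ -pe {env} {cores}")
    lines += [
        "cd $TMPDIR",  # from here on, the program's working directory is $TMPDIR
        command,
        # copy results back; unlike #Local2Scratch, this runs inside wallclock time
        f"tar -czf {workdir}/files_from_job_$JOB_ID.tar.gz $TMPDIR",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_jobscript("demo", "1:00:00", 12, 1,
                           "/home/username/Scratch/demo", "./my_program"))
    print("Total memory requested:", total_memory_gb(12, 1), "GB")
```

For a real MPI job the program would also have to be started through the MPI launcher Legion provides, which this sketch does not show.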

Lesson 7: Policies and further resources

Questions

  • In what circumstances might we remove a user's access to Legion?
  • What data may and may not be stored on Legion?
  • When can you use Research Data to back up your data?
  • If you need more resources than are available by default, what can you do?
  • If you need to share data on Legion between a group of researchers, how would you go about this?
  • If you have shared data stored in someone else's directory, whose quota does this affect?
  • If you have a problem, who should you contact?
  • Where can you find FAQs?
  • If your job is not working, what information should you ideally be able to give us?
  • What community resources are available for you to discuss programming or other problems?

Objectives

  • Understand Legion's usage policies.
  • Understand that data you have responsibilities for under the Data Protection Act may not be stored on Legion.
  • Read the Research Data terms and conditions and decide whether the service is useful to you.
  • Know that the CRAG will discuss additional resource requests and that ones for resources such as wallclock time or priority access must be well-justified and have defined limits.
  • Understand that data sharing is done by changing its permissions, and a group (often a Research Data group if one exists for the project) can be used.
  • Understand that quotas are based on file ownership, not where the files reside.
  • Know where to find help and how much you need to tell us so we can help you.
  • Know that the Research Programming Hub exists and has a Slack channel which complements but does not replace direct tickets to us.
  • Know that the Technical Socials exist.
  • Know that regular drop in sessions are held.
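Two of the objectives above, quotas counted by file ownership and sharing by changing permissions, can be sketched in Python. The first function tallies disk usage per owning user id, the way an ownership-based quota does, regardless of which directory a file sits in; the second adds group read permission. This is illustrative only: it does not query Legion's real quota system, and changing a file's group (for example to a Research Data group with chgrp) is not shown. Group members also need execute permission on every parent directory to reach shared files.

```python
import os
import stat
from collections import defaultdict


def usage_by_owner(root):
    """Total bytes under root, keyed by the uid that owns each file.

    Location does not matter: a file you own counts against you even if it
    sits in someone else's directory.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            totals[info.st_uid] += info.st_size
    return dict(totals)


def share_with_group(root):
    """Add group read on files and group read+execute on directories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        os.chmod(dirpath, mode | stat.S_IRGRP | stat.S_IXGRP)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would change the link's target instead
            os.chmod(path, os.stat(path).st_mode | stat.S_IRGRP)
```

Note that sharing does not move the quota charge: after share_with_group the files are still owned, and counted, by the same user.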