Learning Objectives Intro to Legion - UCL/RCPSTrainingMaterials GitHub Wiki

Lesson 1: Intro

What do we mean by a core and a node?
Why would we want to use multiple cores at once?
What is a serial program?
What is a threaded program and how does it communicate between threads?
What is an MPI program and how does it communicate between processes?
How does the design of the computer affect the sizes and types of programs you can run on it?

Explain the differences between serial, threaded and MPI programs and whether they can run across nodes.

How would you create a file and a directory tree on a Unix system?
How do you copy your data on to Legion?
If you are copying a large quantity of data, how would you do this more effectively?
What are the differences between home and Scratch?
Where is $TMPDIR and what is it for?
How do you retrieve data from $TMPDIR?

Create a text file inside a directory tree on Legion (or should this be left to the shell course?).
Copy compressed data on to Legion from your local machine using login05.
Decompress the data into its final location.
Explain the different quotas and how flexible they are.
Explain the read/write accessibility to the different areas.
Understand that ~/Scratch is a shortcut to /scratch/scratch/$USER.
Explain the backup policy of the areas.
Explain the performance differences.
Explain why you would want to write to $TMPDIR.
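The file-handling objectives above can be sketched as the following sequence. This is a minimal illustration, not a prescribed workflow: the directory and file names are made up, and the `scp` line is commented out because the full hostname of login05 and your username are placeholders that are not given here.

```shell
# Build a small directory tree containing a text file (names are illustrative).
mkdir -p project_demo/data/raw
echo "sample input" > project_demo/data/raw/notes.txt

# Pack and compress the whole tree into a single archive before copying;
# one compressed file usually transfers better than many small ones.
tar -czf project_demo.tar.gz project_demo

# Copy the archive to Legion via login05 (placeholders, so left commented out):
# scp project_demo.tar.gz <your_UCL_id>@<login05-hostname>:~/Scratch/

# On Legion, decompress into the final location, for example Scratch:
# tar -xzf ~/Scratch/project_demo.tar.gz -C ~/Scratch/

# Local stand-in for the extraction step so the sketch can be tried anywhere.
mkdir -p unpack_demo
tar -xzf project_demo.tar.gz -C unpack_demo
```

Extracting with `-C` places the tree directly in its destination, which avoids unpacking in one area and then moving the files to another.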

Use module list to see the default modules.
Use module avail to see all the modules.
Load a module that has prerequisites and requires changes to the default modules.
Put a module load command in your .bashrc and start a new shell.
Start an X11 server on your local machine and run nedit on Legion.
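A rough sketch of the module steps above. The module name `some/module` is hypothetical, and the sketch appends to a stand-in file called `demo_bashrc` rather than your real `~/.bashrc`, so that it can be tried without changing your login environment. The `module` calls only run where the command is available.

```shell
# Hypothetical module name, used only for illustration.
example_module="some/module"

# Stand-in for ~/.bashrc so this sketch does not alter your real startup file.
rc_file="demo_bashrc"
echo "module load $example_module" >> "$rc_file"

# Only call the module command where it exists (for example, on Legion).
if type module >/dev/null 2>&1; then
    module list    # modules currently loaded, including the defaults
    module avail   # every module that can be loaded
else
    echo "module command not available on this machine"
fi
```

On Legion the same `module load` line would go in `~/.bashrc` itself, and it takes effect when a new shell is started.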

Why do we have a scheduler and separate compute nodes?
How do you submit a job to the compute nodes?
What resources can you request?
What happens if you don't request a resource explicitly?
What is wallclock time? What happens if your job runs out of it?
Why must your job have a working directory in Scratch?
When can your job have two things you could refer to as "working directory" and how would you disambiguate between them? (Hint: cd $TMPDIR)
What are two ways to copy your data back from $TMPDIR and how do they differ?
How much memory is a job requesting 12 cores and mem=1G actually asking for?
How do you request a job that needs an MPI environment?
What are some different types of node that Legion contains?
How and when might you need to control what type of node your job runs on?
How can you check on the status of your job?
What do qw, r, Rq, Rr and Eqw mean?
How can you tell why a job is in Eqw state?
How do you delete a job?
How can you check on past jobs?
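As a worked example for the memory question above, assuming the `mem` request is applied per core (which is how the question is framed), the total can be computed as follows. The variable names are illustrative.

```shell
# Assumption: the mem request is per core, so the total scales with core count.
cores=12
mem_per_core_gb=1

total_gb=$(( cores * mem_per_core_gb ))
echo "Total memory requested: ${total_gb}G"
```

Under that assumption, raising `mem` without thinking about the core count can ask for far more memory than intended.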

Explain that resources have to be assigned fairly to users.
Write and submit a simple jobscript that leaves some resources as default values.
Use qstat after submitting a job and qstat -j to see what resources you ended up requesting.
Write and submit a jobscript that specifies all resources appropriately.
Write and submit a jobscript that writes to $TMPDIR and copies data back. Understand that SGE's working directory and the working directory for the program you are running inside the script can be different.
Explain that #Local2Scratch happens outside wallclock time, while other copying methods (with or without tar) happen inside it but can be fully customised.
Explain the differences in hardware/intended use for some of Legion's nodes.
Explain why you might want to run on a specific node, and why this is usually unnecessary.
Explain what common qstat statuses mean.
Submit a given faulty jobscript and use qexplain to find out what was wrong.
Use qdel to delete a job.
Use jobhist after a job has ended.
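The jobscript objectives above could look something like the sketch below, which writes an example jobscript to a file and checks its syntax. It is an illustration under assumptions: the job name, the output directory `example_output` (assumed to exist already) and the `<your_UCL_id>` placeholder are made up, and `hostname` stands in for a real program. It copies results back with `tar` inside wallclock time; the `#Local2Scratch` alternative mentioned above is not shown.

```shell
# Write a simple jobscript to a file. The quoted 'EOF' stops the shell from
# expanding $TMPDIR, $HOME and $JOB_ID now; they are filled in when the job runs.
cat > example_job.sh <<'EOF'
#!/bin/bash -l
# Wallclock time limit (hours:minutes:seconds).
#$ -l h_rt=0:10:0
# Memory request.
#$ -l mem=1G
# Job name (illustrative).
#$ -N example_job
# SGE's working directory, which must be in Scratch.
# Replace the placeholder with your own username.
#$ -wd /home/<your_UCL_id>/Scratch/example_output

# The program's working directory: node-local temporary space.
cd "$TMPDIR"
hostname > result.txt

# Copy the results back to Scratch before the job ends (inside wallclock time).
tar -czf "$HOME/Scratch/example_output/files_from_job_$JOB_ID.tar.gz" -C "$TMPDIR" .
EOF

# Check the script for shell syntax errors without running it.
bash -n example_job.sh && echo "syntax OK"

# On Legion (left commented out because these need the scheduler):
# qsub example_job.sh     # submit the job
# qstat                   # list your queued and running jobs
# qstat -j <job-id>       # show the resources the job actually requested
# qdel <job-id>           # delete the job
```

Note the two "working directories" here: the `-wd` directive tells SGE where to put the job's output and error files, while `cd "$TMPDIR"` sets where the program itself reads and writes.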

In what circumstances might we remove a user's access to Legion?
What data may and may not be stored on Legion?
When can you use Research Data to back up your data?
If you need more resources than are available by default, what can you do?
If you need to share data on Legion between a group of researchers, how would you go about this?
If you have shared data stored in someone else's directory, whose quota does this affect?
If you have a problem, who should you contact?
Where can you find FAQs?
If your job is not working, what information should you ideally be able to give us?
What community resources are available for you to discuss programming or other problems?

Understand Legion's usage policies.
Understand that data you have responsibilities for under the Data Protection Act may not be stored on Legion.
Read Research Data T&C and see if it is useful for you.
Know that the CRAG will discuss additional resource requests and that ones for resources such as wallclock time or priority access must be well-justified and have defined limits.
Understand that data sharing is done by changing its permissions, and a group (often a Research Data group if one exists for the project) can be used.
Understand that quotas are based on file ownership, not where the files reside.
Know where to find help and how much you need to tell us so we can help you.
Know that the Research Programming Hub exists and has a Slack channel which complements but does not replace direct tickets to us.
Know the Technical Socials exist.
Know that regular drop-in sessions are held.
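The data-sharing objective above (changing permissions, optionally with a group) might be sketched like this. The directory name is illustrative, and the sketch uses your own primary group as a stand-in for a real shared group such as a Research Data group.

```shell
# Create some example data to share (names are illustrative).
mkdir -p shared_demo/results
echo "shared result" > shared_demo/results/summary.txt

# Stand-in group: your primary group. On Legion this would be the shared group.
share_group="$(id -gn)"

# Hand the tree to the group, then let group members read files and
# enter directories (capital X adds execute only where it makes sense).
chgrp -R "$share_group" shared_demo
chmod -R g+rX shared_demo

# Inspect the resulting permissions.
ls -ld shared_demo shared_demo/results/summary.txt
```

The files are still owned by whoever created them, so, as noted above, they continue to count against that owner's quota wherever they are stored.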