Learning Objectives: Intro to Legion - UCL/RCPSTrainingMaterials GitHub Wiki
Lesson 1: Intro
Questions
What is a cluster? How does it differ from your own workstation?
Why would you want to use one?
Objectives
Summarise the basic building blocks of a cluster (servers, storage, network(s)).
Identify some cases where work is or is not suitable for running on a cluster.
Lesson 2: Processors and processes
Questions
What do we mean by a core and a node?
Why would we want to use multiple cores at once?
What is a serial program?
What is a threaded program and how does it communicate between threads?
What is an MPI program and how does it communicate between processes?
How does the design of the computer affect the sizes and types of programs you can run on it?
Objectives
Explain the differences between serial, threaded and MPI programs and whether they can run across nodes.
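To make the threads-versus-processes distinction concrete, here is a small Python sketch (an illustration, not course material). Threads in one program see the same memory, so they can write results straight into a shared list; separate processes cannot, so they must send their results back as messages. Python's `multiprocessing` module stands in for message passing here: it is not MPI, it uses the Linux/Unix-only fork start method, and, unlike an MPI program, it cannot span more than one node. The function names are hypothetical.

```python
import multiprocessing
import threading


def split(numbers, parts):
    """Divide a list into `parts` roughly equal contiguous chunks."""
    size = (len(numbers) + parts - 1) // parts
    return [numbers[i * size:(i + 1) * size] for i in range(parts)]


def threaded_sum(numbers, n_threads=4):
    """Threads share memory, so each one writes its result into a shared list."""
    partials = [0] * n_threads

    def work(index, chunk):
        partials[index] = sum(chunk)

    threads = [threading.Thread(target=work, args=(i, chunk))
               for i, chunk in enumerate(split(numbers, n_threads))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


def _send_partial_sum(chunk, queue):
    # A separate process has its own memory, so its result must be sent as a message.
    queue.put(sum(chunk))


def message_passing_sum(numbers, n_procs=4):
    """Processes do not share writable memory, so partial sums come back through a queue."""
    ctx = multiprocessing.get_context("fork")  # fork start method: Linux/Unix only
    queue = ctx.Queue()
    procs = [ctx.Process(target=_send_partial_sum, args=(chunk, queue))
             for chunk in split(numbers, n_procs)]
    for p in procs:
        p.start()
    total = sum(queue.get() for _ in procs)  # collect every message before joining
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    data = list(range(1000))
    print(sum(data), threaded_sum(data), message_passing_sum(data))  # prints 499500 499500 499500
```

Both versions give the same answer; the difference is how the partial results travel, which is exactly what limits a threaded program to the cores of a single node.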
Lesson 3: Legion
Questions
How do you access Legion from inside and outside UCL?
How many login nodes does Legion have?
Why does Legion have multiple login nodes?
How would you log in to a specific login node and why would you want to?
Objectives
Log in to Legion, either from UCL or from home.
Log in to a specific node.
Lesson 4: Data management on Legion
Questions
How would you create a file and a directory tree on a Unix system?
How do you copy your data on to Legion?
If you are copying a large quantity of data, how would you do this more efficiently?
What are the differences between home and Scratch?
Where is $TMPDIR and what is it for?
How do you retrieve data from $TMPDIR?
Objectives
Create a text file inside a directory tree on Legion (or should this be left to the shell course?).
Copy compressed data on to Legion from your local machine using login05.
Decompress the data into its final location.
Explain the different quotas and how flexible they are.
Explain the read/write accessibility to the different areas.
Understand that ~/Scratch is a shortcut to /scratch/scratch/$USER.
Explain the backup policy of the areas.
Explain the performance differences between the areas.
Explain why you would want to write to $TMPDIR.
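The pack-then-unpack pattern from the objectives above can be sketched with Python's standard `tarfile` module. This is an illustration only and the directory names are hypothetical: in practice you would create the archive on your local machine, transfer it to Legion (for example with scp or rsync via login05), and then extract it in its final location, such as somewhere under ~/Scratch. Sending one compressed archive is usually quicker than copying many small files one by one.

```python
import tarfile
import tempfile
from pathlib import Path


def pack(source_dir: Path, archive: Path) -> None:
    """Compress a whole directory tree into a single .tar.gz file."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory's own name
        tar.add(source_dir, arcname=source_dir.name)


def unpack(archive: Path, destination: Path) -> None:
    """Extract a .tar.gz archive into its final location."""
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and other unsafe members
            tar.extractall(destination, filter="data")
        else:
            tar.extractall(destination)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "my_data"  # hypothetical dataset directory
        (src / "run1").mkdir(parents=True)
        (src / "run1" / "results.txt").write_text("42\n")
        pack(src, tmp / "my_data.tar.gz")
        unpack(tmp / "my_data.tar.gz", tmp / "final")
        print((tmp / "final" / "my_data" / "run1" / "results.txt").read_text())
```

On the command line the same two steps are normally a single `tar` invocation on each side of the transfer.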
Lesson 5: Using software on Legion
Questions
How are software packages managed on Legion?
What software is available by default?
How do you find out all the software which is available?
How do you load and run a software package?
How do you make a piece of software always available to you?
How do you use a program with a graphical user interface?
Objectives
Use module list to see the default modules.
Use module avail to see all the modules.
Load a module that has prerequisites and requires changes to the default modules.
Put a module load command in your .bashrc and start a new shell.
Start an X11 server on your local machine and run nedit on Legion.
Lesson 6: Jobs on Legion
Questions
Why do we have a scheduler and separate compute nodes?
How do you submit a job to the compute nodes?
What resources can you request?
What happens if you don't request a resource explicitly?
What is wallclock time? What happens if your job runs out of it?
Why must your job have a working directory in Scratch?
When might your job have two different things you could call its "working directory", and how would you tell them apart? (Hint: cd $TMPDIR)
What are two ways to copy your data back from $TMPDIR and how do they differ?
How much memory is a job that requests 12 cores and mem=1G actually asking for?
How do you request a job that needs an MPI environment?
What are some different types of node that Legion contains?
How and when might you need to control what type of node your job runs on?
How can you check on the status of your job?
What do qw, r, Rq, Rr and Eqw mean?
How can you tell why a job is in Eqw state?
How do you delete a job?
How can you check on past jobs?
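The memory question above comes down to one multiplication. This sketch assumes, as on Legion's Grid Engine setup, that `mem` is a per-core (per-slot) request rather than a total for the whole job; `total_memory_gb` is an illustrative name, not a scheduler command.

```python
def total_memory_gb(cores: int, mem_per_core_gb: float) -> float:
    """Total memory a job asks for, assuming mem is requested per core."""
    if cores < 1:
        raise ValueError("a job needs at least one core")
    return cores * mem_per_core_gb


# A job requesting 12 cores with mem=1G asks for 12 GB in total, not 1 GB.
print(total_memory_gb(12, 1))  # prints 12
```

Under that assumption, a generous per-core value multiplies quickly on a many-core job, which can make the job harder to fit onto a node.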
Objectives
Explain that resources have to be assigned fairly to users.
Write and submit a simple jobscript that leaves some resources as default values.
Use qstat after submitting a job, and qstat -j to see which resources you actually ended up requesting.
Write and submit a jobscript that specifies all resources appropriately.
Write and submit a jobscript that writes to $TMPDIR and copies data back. Understand that SGE's working directory and the working directory for the program you are running inside the script can be different.
Explain that #Local2Scratch happens outside wallclock time, while other copying methods (with or without tar) happen inside it but can be fully customised.
Explain the differences in hardware/intended use for some of Legion's nodes.
Explain why you might want to run on a specific node, and why this is usually unnecessary.
Explain what common qstat statuses mean.
Submit a given faulty jobscript and use qexplain to find out what was wrong.
Use qdel to delete a job.
Use jobhist after a job has ended.
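For the qstat objective, the compound states from the Lesson 6 questions can be read letter by letter using standard Grid Engine state flags. The helper below is hypothetical, not a Legion command, and its table covers only a few common letters.

```python
# Meaning of individual Grid Engine state letters as shown by qstat.
STATE_LETTERS = {
    "q": "queued",
    "w": "waiting",
    "r": "running",
    "R": "restarted",
    "E": "error",
    "h": "hold",
    "t": "transferring",
    "d": "being deleted",
    "s": "suspended",
}


def describe_state(state: str) -> str:
    """Expand a qstat state such as 'Eqw' into words, letter by letter."""
    return ", ".join(STATE_LETTERS.get(letter, f"unknown ({letter})") for letter in state)


for code in ["qw", "r", "Rq", "Rr", "Eqw"]:
    print(f"{code:>3}: {describe_state(code)}")
```

A job in Eqw stays there until the error is dealt with, so qexplain is the natural next step for finding out what went wrong.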
Lesson 7: Policies and further resources
Questions
In what circumstances might we remove a user's access to Legion?
What data may and may not be stored on Legion?
When can you use Research Data to back up your data?
If you need more resources than are available by default, what can you do?
If you need to share data on Legion between a group of researchers, how would you go about this?
If you have shared data stored in someone else's directory, whose quota does this affect?
If you have a problem, who should you contact?
Where can you find FAQs?
If your job is not working, what information should you ideally be able to give us?
What community resources are available for you to discuss programming or other problems?
Objectives
Understand Legion's usage policies.
Understand that data you have responsibilities for under the Data Protection Act may not be stored on Legion.
Read the Research Data terms and conditions and decide whether the service is useful for you.
Know that the CRAG will discuss additional resource requests, and that requests for resources such as wallclock time or priority access must be well-justified and have defined limits.
Understand that data is shared by changing its permissions, and that a group (often a Research Data group if one exists for the project) can be used to control access.
Understand that quotas are based on file ownership, not where the files reside.
Know where to find help and how much you need to tell us so we can help you.
Know that the Research Programming Hub exists and has a Slack channel which complements but does not replace direct tickets to us.
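To illustrate why quotas follow file ownership rather than location, the sketch below walks a directory tree and totals file sizes per owning user, roughly the accounting a per-user quota performs. It is a simplified, hypothetical illustration for Unix systems (it uses the standard `pwd` module), not how Legion's filesystems actually compute quotas.

```python
import os
import pwd
import tempfile
from collections import defaultdict


def usage_by_owner(root: str) -> dict:
    """Sum file sizes under root, grouped by the user that owns each file."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            try:
                owner = pwd.getpwuid(info.st_uid).pw_name
            except KeyError:  # uid with no matching account: report the number
                owner = str(info.st_uid)
            totals[owner] += info.st_size
    return dict(totals)


if __name__ == "__main__":
    # Stand-in for a colleague's shared directory: the file you write is still yours.
    with tempfile.TemporaryDirectory() as shared:
        with open(os.path.join(shared, "results.dat"), "wb") as f:
            f.write(b"\0" * 1024)
        print(usage_by_owner(shared))  # the 1024 bytes are attributed to your username
```

The directory the file sits in never enters the calculation; only the file's owner does, which is why shared data you create counts against your own quota.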