glossary - raeker/ARC-Wiki-Test GitHub Wiki
Unix-like means that an operating system controls its basic functions with a set of software similar to that of Unix, an operating system developed at Bell Labs in the 1970s. There are many Unix-like operating systems, some commercial and some free and open-source. Some popular Unix-like systems are GNU/Linux, BSD, Darwin, and Solaris. Flux and Great Lakes run GNU/Linux, macOS is built on Darwin, and Ubuntu is a GNU/Linux distribution.
Sometimes people will refer to a GNU/Linux system as Unix, which can be confusing if you were born later than the 1990s. Even though GNU's not Unix, for our purposes we can consider them to be the same. When someone refers to something as Unix or Unix-like, they are referring to things like the users, groups, permissions, and shell which are all very similar across Unix-like systems.
Unix uses systems of classification called users and groups to identify people and give permissions to those who need them. A user is an object possessing a username and a password used to identify a person when they access a Unix-like system, such as Flux. Files and folders have permissions specifying which users can access them and what kind of access they have, like reading, writing, and executing. A group is an object containing a list of users. Files and folders can have permissions for groups as well as users, so adding a user to a group gives that user the group's permissions.
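As a minimal sketch of how these permissions look from a program, the Python snippet below creates a throwaway file, gives the owner read/write access and the group read access, and prints the result. The helper names `describe_permissions` and `group_can_read` are made up for this illustration; the calls they use come from Python's standard `os` and `stat` modules.

```python
import os
import stat
import tempfile


def describe_permissions(path):
    """Return the owner's user ID, the group ID, and an ls-style permission string."""
    info = os.stat(path)
    return info.st_uid, info.st_gid, stat.filemode(info.st_mode)


def group_can_read(path):
    """Return True if members of the file's group are allowed to read it."""
    return bool(os.stat(path).st_mode & stat.S_IRGRP)


# Throwaway file used only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

# 0o640: owner may read and write, group may read, everyone else gets nothing.
os.chmod(demo_path, 0o640)
uid, gid, mode = describe_permissions(demo_path)
group_readable = group_can_read(demo_path)
os.remove(demo_path)

print(uid, gid, mode, group_readable)  # mode is "-rw-r-----"
```

The permission string is the same one `ls -l` shows: the first three letters after the leading dash apply to the owning user, the next three to the group, and the last three to everyone else.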
The Bash Reference Manual at gnu.org is helpful. Also, before getting deep into Bash, consider using Python instead if you need to write lots of scripts.
Some terms that can be helpful when getting started are directory, permissions, users, groups, nano, environment variables, arguments, flags, piping, grep, man pages, wild cards, escape characters, shell scripts, variables, for loops, operators, if statements, redirection, expansions, bashrc, and cron:
directory - a type of UNIX file that serves as a container for other files.
permissions - the constraints a user places on a file to control which other users or groups may read, write, or execute the file.
users - individuals who have access to system resources.
groups
- a) a collection of users who can share access authorities for protected resources,
- b) a list of names that are known together by a single name,
- c) a set of related records that have the same value for a particular field in all records,
- d) a series of records logically joined together.
[nano] - a simple text editor that runs in the terminal and is installed on most Unix-like systems.
[environment variables] - symbols containing information that can be used by shells or commands. Environment variables are available to all processes in a given process group and are propagated to each child process when it is created.
[arguments] - arguments, also known as parameters, allow the user to either control the flow of the command they entered or to specify the input data for the command.
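[flags] - a kind of argument, usually starting with - or --, that turns an option of the command on or off (for example, ls -l).
[piping] - connecting a series of commands with pipes. A pipe is the ' | ' symbol, and it can be placed between commands on the command line so that the output of each command is passed as input to the next command.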
[grep] - a command that searches its input for lines that match a specified pattern, and then writes those matching lines to standard output.
[man pages] - these are reference pages installed on the system. Use the man command followed by the name of the command you want information about: man <command name>.
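[wild cards] - metacharacters that are used to allow wildcard matching in file names or regular expressions. Examples of metacharacters are * and ? in file names, and . and * in regular expressions.
[escape characters] - a character that changes how the character following it is interpreted. In the shell, the escape (\) preceding a special character tells the shell to interpret that character literally, so \* means an asterisk and not a wildcard. In some other contexts, such as printf format strings, the escape gives an ordinary character a special meaning instead, so \n means new line and not "n".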
[shell scripts] - files containing shell commands. Shell scripts are executed via a command line or from within another script.
[variables] - a named symbol that is assigned a value, which is allowed to change.
[for loops] - a for loop operates on a list of items. It repeats a set of commands for every item in a list.
[operators] - a character that is interpreted to mean something other than its literal meaning.
[if statements] - a statement that evaluates one or more variables or conditions and uses the result to choose amongst several possible paths through the code.
[redirection] - specifying the files or devices that the standard input, standard output, and standard error of the command being run are connected to. For example, > sends standard output to a file and < reads standard input from a file.
[expansions] - these are characters used to expand filenames, much like wildcard characters. For example, typing echo ss* would print the names of all of the files and directories within the current directory that begin with ss.
[bashrc] - this is a script file (~/.bashrc) that runs whenever bash is started interactively, which means that you interact directly with the bash shell and enter commands rather than having it run a script.
[cron] - background service that schedules tasks to occur at certain times.
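Following the suggestion above to consider Python for scripting, here is a rough sketch of what a few of these terms do, written as plain Python rather than shell. The functions `expand`, `grep`, and `count_lines` and the list of file names are hypothetical stand-ins; they imitate the behavior of filename expansion, `grep`, and `wc -l` and are not how the shell implements them.

```python
import fnmatch
import os
import re


def expand(pattern, names):
    """Imitate shell filename expansion: keep the names that match a wildcard pattern."""
    return sorted(fnmatch.filter(names, pattern))


def grep(pattern, lines):
    """Imitate grep: yield only the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def count_lines(lines):
    """Imitate wc -l: count the lines arriving on 'standard input'."""
    return sum(1 for _ in lines)


# Hypothetical directory listing, used only for illustration.
names = ["ssh_config", "ssl", "notes.txt", "script.sh"]

# Like `echo ss*`: the wildcard expands to every name beginning with "ss".
print(expand("ss*", names))

# Like `... | grep 'sh$' | wc -l`: the output of one stage is the input of the next.
print(count_lines(grep(r"sh$", names)))

# Environment variables are visible to the running program.
print(os.environ.get("HOME", "(HOME is not set)"))
```

Note that the same character can mean different things in each setting: `*` in `ss*` is a filename wildcard, while `$` in `sh$` is a regular expression anchor meaning "end of line".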
Active Directory (AD) is an authentication server used by Windows systems. It stores Windows users and groups, populating them to Windows systems on login. UMROOT is a umich AD server run by ITS, and it copies objects and changes to objects from MCommunity.
A group identification (GID) is a number used to link groups across MCommunity and multiple Unix and AD instances. Different systems and authentication servers may have groups with the same name but different purposes, but GIDs can be used to share a group identity across multiple systems. You can use the MCommunityGIDNumber API to reserve GIDs. There are also instructions here.
Kerberos is another type of authentication server, commonly used by umich web services. Lots of umich authentication servers and third-party services like Google will pass authentication through to Kerberos.
MCommunity has users and groups similar to Unix, but can be used on many systems across the university. Your uniqname is your username as an MCommunity user, and hpc-support is an MCommunity group used to receive emails and create tickets, rather than passing permissions and emails to users.
Host Channel Adapter. The industry name for an InfiniBand card. An expansion card installed in a computer or appliance which allows that computer to connect to an InfiniBand network. Note that the term HCA is sometimes used to mean other things in non-InfiniBand contexts.
InfiniBand. A high-speed network often used to interconnect servers and storage equipment in a cluster.
Open Fabrics Enterprise Distribution. A stack of drivers, libraries, and utilities which support the use of InfiniBand hardware and networks.
This refers to an account that a user uses to log in to flux-login. That is all it can do.
An account used to purchase resources on the Flux cluster.
An allocation is the paid resources tied to an account. That is, you can have an account, but without an allocation there's nothing you can do with that account.
A computing cluster is a group of computers used to run parallel programs, which are programs split into many tasks that run across many computers. This is not to be confused with a computing grid, in which the computers are located far from each other and cannot efficiently share data amongst themselves.
An export is a mount on a cluster of a volume from some storage service. You can use export as a verb, like "a user wants to export their Turbo volume to their Armis allocation". You can also use it as a noun, like "a user is requesting we remove access to the Locker export /nfs/locker/mcity/ from his Flux allocation mcity_fluxm".
Apache Hadoop is a framework for processing large amounts of data in parallel across a cluster. Its features include a distributed filesystem, fault tolerance, and rack awareness. It's not an operating system; it's a collection of programs, libraries, and a distributed filesystem layered on top of the nodes' own filesystems.
MapReduce is a programming model, and the Hadoop library that implements it, for breaking down data processing tasks into many smaller tasks. This is useful for parallel programming.
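To make the idea of breaking a task into smaller tasks concrete, here is a toy word count written in the map, shuffle, and reduce style. It is a single-machine sketch in plain Python, not Hadoop's API; the function names and the sample sentences are invented for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    # Each map call only looks at one document, so on a real cluster
    # the calls could run on different nodes at the same time.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


counts = word_count(["flux runs jobs", "great lakes runs jobs too"])
print(counts["runs"], counts["flux"])  # prints "2 1"
```

The map step never needs to see more than one document, and the reduce step never needs to see more than one key, which is what lets a framework spread both steps across many machines.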
A web interface for viewing billing information.
A computing node is a single computer in a cluster. We have login nodes which are systems used to write, test, and submit jobs to the queue. When a job in the queue runs, it is distributed across multiple computing nodes. There are also file transfer nodes, which are used to transfer large amounts of data quickly between storage services.
Parallel computing involves writing your program to run not just on multiple threads, but on multiple computers, or nodes. It can also involve communication between nodes while the program is running, but doesn't have to. Unlike threaded computing, in parallel computing each processor uses memory from its own machine. Parallel computing that involves communication between nodes requires the nodes to be located physically close together (a computing cluster) and have high bandwidth local network connections.
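The sketch below imitates this on a single machine: each chunk of work goes to a separate Python process that has its own memory, and the only communication is the partial result each process prints back. This is a stand-in under simplifying assumptions; on a real cluster the processes would run on different nodes and would normally talk through an MPI library instead. The names `WORKER`, `split_range`, and `parallel_sum` are made up for the example.

```python
import subprocess
import sys

# Program run by each worker process. It has its own memory and only sees
# the two numbers passed to it on the command line.
WORKER = "import sys; lo, hi = int(sys.argv[1]), int(sys.argv[2]); print(sum(range(lo, hi)))"


def split_range(total, parts):
    """Divide range(total) into at most `parts` contiguous (start, stop) chunks."""
    step = -(-total // parts)  # ceiling division
    return [(start, min(start + step, total)) for start in range(0, total, step)]


def parallel_sum(total, parts=4):
    """Sum range(total) using one separate Python process per chunk (total must be positive)."""
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(lo), str(hi)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for lo, hi in split_range(total, parts)
    ]
    # Gather: read each worker's printed partial sum and add them up.
    return sum(int(worker.communicate()[0]) for worker in workers)


print(parallel_sum(1000))  # prints 499500
```

Because the workers share nothing, all the data they need has to be sent to them and all the results have to be sent back, which is why communication-heavy parallel programs benefit from fast local networks.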
A job scheduler controls a queue of jobs submitted by users. The scheduler runs jobs when they reach a certain priority rank and enough resources become available. It also notifies users about the status of their jobs. Jobs are typically submitted to a scheduler's queue by writing a bash script with syntax specific to that scheduler. Flux uses a scheduler called Moab, and Great Lakes will use a scheduler called Slurm.
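The core idea, running the highest-priority job once enough resources are free, can be sketched with a priority queue. The `ToyScheduler` class and the job names below are hypothetical and far simpler than Moab or Slurm, which also weigh factors such as fair share, wait time, and backfilling.

```python
import heapq
import itertools


class ToyScheduler:
    """Hypothetical scheduler: highest priority first, start a job only if enough cores are free."""

    def __init__(self, total_cores):
        self.free_cores = total_cores
        self._queue = []                 # heap of (-priority, order, name, cores)
        self._order = itertools.count()  # tie-breaker: earlier submissions go first

    def submit(self, name, priority, cores):
        """Add a job to the queue."""
        heapq.heappush(self._queue, (-priority, next(self._order), name, cores))

    def start_ready_jobs(self):
        """Start queued jobs in priority order until the next one does not fit."""
        started = []
        while self._queue and self._queue[0][3] <= self.free_cores:
            _, _, name, cores = heapq.heappop(self._queue)
            self.free_cores -= cores
            started.append(name)
        return started

    def finish(self, cores):
        """Return a finished job's cores to the pool."""
        self.free_cores += cores


scheduler = ToyScheduler(total_cores=8)
scheduler.submit("align_reads", priority=5, cores=4)
scheduler.submit("render_frames", priority=9, cores=6)
scheduler.submit("small_test", priority=1, cores=2)

first_wave = scheduler.start_ready_jobs()
print(first_wave)  # prints ['render_frames']
```

In this sketch `small_test` would fit in the two cores left over, but it waits behind the higher-priority `align_reads`; letting small jobs jump ahead in that situation is the kind of refinement real schedulers add.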
Threaded computing involves writing your program to run on multiple cores simultaneously. This dramatically speeds up tasks that can be split up into smaller tasks. For instance, inter-frame video encoding cannot be threaded well because each frame depends on information from the previous frame, but intra-frame video encoding can be threaded because each frame is compressed independently from other frames.
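The intra-frame case can be sketched with a thread pool: every frame is compressed on its own, so the frames can be handed to different threads. The example uses `zlib` as a stand-in for a video codec and invented frame data; as far as we know, CPython's `zlib` releases the global interpreter lock while compressing, which is what would let the threads actually run on separate cores.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_frame(frame):
    """Compress one frame on its own, with no reference to any other frame."""
    return zlib.compress(frame)


def compress_all(frames, workers=4):
    """Compress every frame, sharing the work across a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input frames.
        return list(pool.map(compress_frame, frames))


# Hypothetical stand-in for raw video frames: eight blocks of repetitive bytes.
frames = [bytes([i]) * 100_000 for i in range(8)]
compressed = compress_all(frames)

print(len(compressed))  # prints 8
```

An inter-frame encoder could not be split this way, because compressing frame N would have to wait for the result of frame N-1.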
A volume is a reserved amount of space on a storage service. For example, a user might purchase a DataDen volume to archive many terabytes of research data while they work on a long-term project using Flux.
blender - 3D modelling and animation
cmake - makes compiling more convenient
cuda - nvidia GPU libraries
ffmpeg - video/audio/subtitle transcoder
fftw - fast Fourier transform
gcc - GNU Compiler Collection (C, C++, and Fortran compilers)
git - version control
hive - SQL-like query interface for data stored in Hadoop
kibana - data visualization
make - makes compiling more convenient
mathematica - math
matlab - numerical computing language and interpreter
meme - discovers motifs in DNA sequences
moab - scheduler used on Flux
mpi4py-dev - parallel programming library for Python
openmpi - parallel programming (MPI) library for C, C++, and Fortran
p7zip - port of 7-Zip for Unix-like systems
parallel - shell tool for executing parallel jobs
R - statistics
rclone - sync files between computers
slurm - scheduler used on Beta and Great Lakes
stata - statistics
vim - the best text editor
wget - download files
wine32 - Windows compatibility layer
yasm - assembler