2 Working on a Unix System - davidaray/test GitHub Wiki

This will serve as your introduction to working on a Linux computer system. Please go through this carefully or you WILL get lost down the road.

A note on formatting

Throughout this tutorial, you will notice various text formats. If there is a command you need to type, it will typically be formatted as:

type this command

If I'm trying to represent output you should be seeing on your screen, it will be formatted as:

This is stuff you should see
on your screen.

Sometimes I'll combine the two. You'll get used to it.

Logging on to quanah or ivy

There are various ways to log on to TTU's HPCC. You can do this through the terminal (for Mac or *nix systems), a Unix emulator (CYGWIN) or you can use an SSH client. For simplicity we will use the BitVise SSH Client that has a bundled FTP client. For Mac people you can use your terminal and Filezilla. Before you can do any of this, you should have completed

1] Download and install the appropriate SSH/FTP combo.

Windows: https://www.bitvise.com/

Mac: http://sourceforge.net/projects/filezilla/files/FileZilla_Client/3.14.1/FileZilla_3.14.1_macosx-x86.app.tar.bz2/download?nowrap

2] Set up and save your profile on the SSH client Host Name.

There are two choices of clusters to use: quanah is newer, faster, and has more processors but it’s also used more and has a limited wall time, which limits how long you can run any given job. It’s also usually busier, limiting whether or not you can get processors to use. Ivy is older, slower, and has fewer processors but I have my own queues that do not have wall time limits.

If you can, and there are processors available, choose quanah. If not, go with Ivy and use any of my queues, Chewie, Yoda, and R2D2.

Specific instructions are available at: http://www.depts.ttu.edu/hpcc/userguides/index.php#quanah & http://www.depts.ttu.edu/hpcc/userguides/general_guides/login_general.php.

Use these instructions but substitute ‘ivy’ for ‘quanah’ if you’re using Hrothgar. Save your profile and connect. The same information can be used to save a profile on your FTP client (Filezilla).

For Macs, in your terminal:

ssh <eraider user name>@quanah.hpcc.ttu.edu

or

ssh <eraider user name>@ivy.hpcc.ttu.edu

This guide will refer to quanah for the rest of the text but you can generally just substitute ivy for quanah if you need to.

Getting off the head node

Now that we are logged on, you should notice that we are on the head node.

quanah:$

Do not perform any analyses on the head node. It is a crime punishable by death. It slows everyone down on the entire system. ONE of the ways to move off of the head node is to request an interactive session (via qlogin):

qlogin -q omni -P quanah -pe sm 1

Notice the change in your command prompt. This tells you that you are working from a compute node. The change will be something like this:

quanah:$

to

compute-20-10:$

If no nodes are available to use for qlogin, you will get a message saying your qlogin job failed. In this case, you can try doing a qlogin on Hrothgar, or a different Quanah queue.

ssh hrothgar

qlogin -q Chewie -P communitycluster -pe sm 1

You can also check the status of your job request via:

qstat

Filesystem basics

All computer filesystems, whether on Unix systems or desktop PCs, are basically the same. Files are named locations on the computer's storage device. Each filename is a pointer to a discrete object with a beginning and end, whether it's a program that can be executed or simply a set of data that can be read by a program. Directories or folders are containers in which files can be grouped. Computer filesystems are organized hierarchically, with a root directory that branches into subdirectories and subdirectories of subdirectories.

This hierarchical system can help organize and share information, if used properly. Like the taxonomy of species developed by the early biologists, your file hierarchy should organize information from the general level to the specific. Each time the filesystem splits into subdirectories, it should be because there are meaningful divisions to be created within a larger class of files. Why should you organize your computer files in a systematic, orderly way? It seems like an obvious question with an obvious answer. And yet, a common problem faced by researchers and research groups is failure to share information effectively. Problems with information management often become apparent when a research group member leaves, and others are required to take over his project.

Imagine you work with a colleague who keeps all his books and papers piled in random stacks all over his office. Now imagine that your colleague gets a new job and needs to depart in a hurry leaving behind just about everything in his office. Your boss tells you that you can't throw away any of your colleague's papers without looking at them, because there might be something valuable in there. Your colleague has not organized or categorized any of his papers, so you have to pick up every item, look at it, determine if it's useful, and then decide where you want to file it. This might be a week's work, if you're lucky, and it's guaranteed to be a tough job.

This kind of problem is magnified when computer files are involved. First of all, many highly useful files, especially binaries of programs, aren't readable as text files by users. Therefore, it's difficult to determine what these files do if they're not documented. Other kinds of files, such as files of numerical data, may not contain useful header information. Even though they can be read as text, it may be next to impossible to figure out their purpose.

Second, space constraints on computer system usage are much more nebulous than the walls of an office. As disk space has become cheaper, it's become easier for users of a shared system simply never to clean up after themselves. Many programs produce multiple output files and, if there's no space constraint that forces you to clean up while running them, can produce a huge mess in a short time.

How can you avoid becoming this kind of problem for your colleagues? Awareness of the potential problems you can cause is the first step. You need to know what kinds of programs and files you should share with others and which you should keep in your own directories. You should establish conventions for naming datafiles and programs and stick to these conventions as you work. You should structure your filesystem in a sensible hierarchy. You should keep track of how much space you are using on your computer system and create usable archives of your data when you no longer need to access it frequently. You should create informative documentation for your work within the filesystem and within programs and datafiles.

The nature of the filesystem hierarchy means that you already have a powerful indexing system for your work at your fingertips. It's possible to do computer-based research and be just as disorganized as that coworker who piles all his books and papers in random stacks all over his office. But why would you want to do that? Without much more effort, you can use your computer's filesystem to keep your work organized.

Moving around the directory hierarchy

Like all modern operating systems, the file hierarchy on a Unix system is structured as a tree. You may be used to this from PC operating systems. Open one folder, and there can be files and more folders inside it, layered as deep as you want to go. There is a root directory, designated as /. The root directory branches into a finite number of files and subdirectories. On a well-organized system, each of these subdirectories contains files and other subdirectories pertaining to a particular topic or system function.

Of course, there's nothing inside your computer that really looks like a tree. Files are stored on various media, most commonly the hard disk, which is a recordable device that lives in your computer. As its name implies, the hard disk is really a disk. And the tree structure that you perceive in Unix is simply a way of indexing what is on that disk or on other devices such as CDs, floppy disks, and Zip disks, or even on the disks of every machine in a group of networked computers. Unix has extensive networking capabilities that allow devices on networked computers to be mounted on other computers over the network. Using these capabilities, the filesystems of several networked computers can be indexed as if they were one larger, seamless filesystem.

Paths to files and directories

Each file on the filesystem can be uniquely identified by a combination of a filename and a path. You can reference any file on the system by giving its full name, which begins with a / indicating the root directory, continues through a list of subdirectories (the components of the path) and ends with the filename. The full name, or absolute path, of a file in someone's home directory might look like this:

/home/daray/exampledirectory1/file1.txt

The absolute path describes the relationship of the file to the root directory, /. Each name in the path represents a subdirectory of the prior directory, and / characters separate the directory names. Every file or directory on the system can be named by its absolute path, but it can also be named by a relative path that describes its relationship to the current working directory. Files in the directory you are in can be uniquely identified just by giving the filename they have in the current working directory. Files in subdirectories of your current directory can be named in relation to the subdirectory they are part of. From daray's home directory (/home/daray/), he can uniquely identify the file file1.txt as exampledirectory1/file1.txt. The absence of a preceding / means that the path is defined relative to the current directory rather than relative to the root directory.

If you want to name a directory that is on the same level or above the current working directory, there is a shorthand for doing so. Each directory on the system contains two links, ./ and ../, which refer to the current directory and its parent directory (the directory it's a subdirectory of), respectively. If user daray is working in the directory /home/daray/exampledirectory1, he can refer to the directory /home/daray/exampledirectory2 as ../exampledirectory2.

Another shorthand applies to your home directory itself, which can be designated simply by ~. For example, if you wanted to identify the path to file1.txt, you could simply type ~/exampledirectory1/file1.txt.
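The path rules above can be tried safely in a scratch directory. This is a minimal sketch, not part of the original tutorial: the variable fakehome is a hypothetical stand-in for /home/daray, created with mktemp so nothing in your real home directory is touched.

```shell
# Build a throwaway copy of the example layout.
# "fakehome" is a hypothetical stand-in for /home/daray.
start_dir=$(pwd)
fakehome=$(mktemp -d)
mkdir -p "$fakehome/exampledirectory1" "$fakehome/exampledirectory2"
echo "hello" > "$fakehome/exampledirectory1/file1.txt"

# Absolute path: begins with /, so it works from any directory.
abs_read=$(cat "$fakehome/exampledirectory1/file1.txt")

# Relative path: no leading /, so it is resolved from the current directory.
cd "$fakehome"
rel_read=$(cat exampledirectory1/file1.txt)

# ./ is the current directory and ../ is its parent.
cd exampledirectory1
dot_read=$(cat ./file1.txt)
cd ../exampledirectory2
sibling=$(basename "$(pwd)")

echo "$abs_read $rel_read $dot_read $sibling"

# Clean up the scratch directory.
cd "$start_dir"
rm -rf "$fakehome"
```

All three reads return the same file contents, because the three paths name the same file from different starting points.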

Using a process-based file hierarchy

Filesystems can be deep and narrow or broad and shallow. It's best to follow an intuitive scheme for organizing your files. Each level of hierarchy should be related to a step in the process you've used to carry out the project. A filesystem is probably too shallow if the output from numerous processing steps in one large project is all shoved together in one directory. However, a project directory that involves several analyses of just one data object might not need to be broken down into subdirectories. The filesystem is too deep if versions of output of a process are nested beneath each other or if analyses that require the same level of processing are nested in subdirectories. It's much easier for you to remember and for others to understand the paths to your data if they clearly symbolize steps in the process you used to do the work.

As you'll see in the upcoming example, your home directory will probably contain a number of directories, each containing data and documentation for a particular project. Each of these project directories should be organized in a way that reflects the outline of the project. Each directory should contain documentation that relates to the data within it. That documentation typically takes the form of a README file that has text to describe the contents and, possibly, how they were generated.

Establishing file-naming conventions for your work

Unix allows an almost unlimited variability in file naming. Filenames can contain any character other than / or the null character (the character whose binary representation is all zeros). However, it's important to remember that some characters, such as a space, a backslash, or an ampersand, have special meaning on the command line and may cause problems when naming files. Filenames can be up to 255 characters in length on most systems. However, it's wise to aim for uniformity rather than uniqueness in file naming. Most humans are much better at remembering frequently used patterns than they are at remembering unique 255-character strings, after all.

A common convention in file naming is to name the file with a unique name followed by a dot (.) and then an extension that uniquely indicates the file type.

As you begin working with computers in your research and structuring your data environment, you need to develop your own file-naming conventions, or preferably, find out what naming conventions already exist and use them consistently throughout your project. There's nothing so frustrating as looking through old data sets and finding that the same type of file has been named in several different ways. Have you found all the data or results that belong together? Can the file you are looking for be named something else entirely? In the absence of conventions, there's no way to know this except to open every unidentifiable file and check its format by eye. The next section provides a detailed example of how to set up a filesystem that won't have you tearing out your hair looking for a file you know you put there.

Here are some good rules of thumb to follow for file-naming conventions:

  • Files of the same type should have the same extension.
  • Files derived from the same source data should have a common element in their unique names.
  • The unique name should contain as much information as possible about the experiment.
  • Filenames should be as short as is possible without compromising uniqueness.

You'll probably encounter preestablished conventions for file naming in your work. For instance, if you begin working with protein sequence and structure datafiles, you will find that families of files with the same format have common extensions. You may find that others in your group have established local conventions for certain kinds of data files and results. You should attempt to follow any known conventions. Some typical file naming conventions we'll use are:

  • .fa - fasta formatted sequence files
  • .fq - fastq files that have sequence data and quality scores
  • .txt - plain text files
  • .sh - shell scripts
  • .gz - compressed files

Structuring a project: an example

In a typical genome sequencing and assembly project you will encounter several file types. For example, you may want to keep a record of the sample origination information in a spreadsheet (.xlsx). That could be kept in an 'info' directory. Then, after getting the initial sequencing reads, you would want to store those in a 'raw_reads' directory. The reads will eventually be assembled into an assembly but you may use multiple assemblers and/or perform multiple assemblies using any one assembler. Thus, you should have an 'assemblies' directory and any subdirectories might reflect the different assembly methods you used within them. Once you decide on an assembly to use for downstream analyses, you will want to keep those files in a relevant directory called 'data_analysis'. Finally, you will likely be writing several scripts to use for your assemblies and analyses. Thus, you will want to store them in a 'scripts' directory. Overall, it may look something like this:

[Figure: directory_structure]

Assuming you're working on the TTU HPCC and keeping your files on the /lustre/work/ system, your file hierarchy would look something like this:

/lustre/work/your username/species_x_assembly

/lustre/work/your username/species_x_assembly/info

/lustre/work/your username/species_x_assembly/info/readme.txt

/lustre/work/your username/species_x_assembly/raw_reads

/lustre/work/your username/species_x_assembly/raw_reads/illumina

/lustre/work/your username/species_x_assembly/raw_reads/illumina/file1_R1.fastq.gz

/lustre/work/your username/species_x_assembly/raw_reads/illumina/file1_R2.fastq.gz

/lustre/work/your username/species_x_assembly/raw_reads/illumina/file2_R1.fastq.gz

/lustre/work/your username/species_x_assembly/raw_reads/illumina/file2_R2.fastq.gz

and so on....

/lustre/work/your username/species_x_assembly/raw_reads/pacbio

/lustre/work/your username/species_x_assembly/raw_reads/pacbio/file1.fastq.gz

/lustre/work/your username/species_x_assembly/raw_reads/pacbio/file2.fastq.gz

and so on....

/lustre/work/your username/species_x_assembly/raw_reads/nanopore

/lustre/work/your username/species_x_assembly/raw_reads/nanopore/file1.fastq.gz

/lustre/work/your username/species_x_assembly/raw_reads/nanopore/file2.fastq.gz

and so on....

/lustre/work/your username/species_x_assembly/assemblies

/lustre/work/your username/species_x_assembly/assemblies/unicycler_assemblies

/lustre/work/your username/species_x_assembly/assemblies/unicycler_assemblies/k25

/lustre/work/your username/species_x_assembly/assemblies/unicycler_assemblies/k25/file1.fa

/lustre/work/your username/species_x_assembly/assemblies/unicycler_assemblies/k25/file2.txt

and so on...

/lustre/work/your username/species_x_assembly/scripts

/lustre/work/your username/species_x_assembly/scripts/SNP_discovery

/lustre/work/your username/species_x_assembly/scripts/SNP_discovery/

and so on...
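A layout like the one listed above can be created in one step. This is a sketch under stated assumptions: it builds the tree inside a temporary directory (the variable base) so it can be run anywhere, whereas on the HPCC you would point project at your own directory under /lustre/work/.

```shell
# "base" is a scratch location for this demonstration only.
base=$(mktemp -d)
project="$base/species_x_assembly"

# mkdir -p creates each directory along with any missing parents.
mkdir -p "$project/info" \
         "$project/raw_reads/illumina" \
         "$project/raw_reads/pacbio" \
         "$project/raw_reads/nanopore" \
         "$project/assemblies/unicycler_assemblies/k25" \
         "$project/data_analysis" \
         "$project/scripts/SNP_discovery"

# Every project should carry its own documentation.
echo "Sample origin and processing notes go here." > "$project/info/readme.txt"

# Show the directories that now exist, relative to the project root.
(cd "$project" && find . -type d | sort)
dir_count=$(find "$project" -type d | wc -l)

# Clean up the scratch directory.
rm -rf "$base"
```

Keeping these lines in a small script means every new project starts with the same skeleton.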

Commands for Working with Directories and Files

Now that you have the basics of filesystems, let's dig into the specifics of working with files and directories in Unix. In the following sections, we cover the Unix commands for moving around the filesystem, finding files and directories, and manipulating files and directories.

As we introduce commands, we'll show you the format of the command line for each command (for example, "Usage: man name"), and describe the effects of some options we find most useful.

Moving around the filesystem

When you open a window on a Linux system, you see a command prompt:

$

Command prompts can look different depending on the configuration of your system and your shell. For example, when I log in to TTU's HPCC (Quanah), I see:

quanah:$

Whatever the style of the command prompt, it means that your computer is waiting for you to tell it to do something. If you type an instruction at the prompt and press the Enter key, you have given your computer a command. Unix provides a set of simple navigation commands and commands for searching your filesystem for particular files and programs. We'll discuss the format of commands more thoroughly in Chapter 5. In this chapter, we'll introduce you to basic commands for getting around in Unix.

You are here: pwd

pwd stands for "print working directory," and that's exactly what it does. pwd sends the full pathname of the directory you are currently in, the current working directory, to standard output; that is, it prints it to the screen. You can think of being "in" a directory in this way: if the directory tree is a map of the filesystem, the current working directory is the "you are here" pointer on the map.

When you log in to the system, your "you are here" pointer is automatically placed in your home directory. Your home directory is a unique place. It contains the files you use almost every time you log into your system, as well as the directories that you create to store other files. What if you want to find out where your home directory is in relation to the rest of the system? Typing pwd at the command prompt in your home directory should give output something like:

quanah:$ pwd

/home/<your username>

This means that your home directory is a subdirectory of the home directory, which in turn is a subdirectory of the root (/) directory.

Changing directories with cd

Usage: cd pathname

The cd command changes the current working directory. The only argument commonly used with this command is the pathname of a directory. If cd is used without an argument, it changes the current working directory to the user's home directory.

In order for these "you are here" tools to be helpful, you need to have organized your filesystem in a sensible way in the first place, so that the name and location of the directory that you're in gives you information about what kind of material can be found there. Most of the filesystem of your machine will have been set up by default when you installed Linux, but the organization of your own directories, where you store programs and data that you use, is your responsibility.
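Here is a short sketch of pwd and cd working together. The directory names (projects, species_x) are hypothetical and live in a temporary directory created just for the demonstration.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)
mkdir -p "$demo/projects/species_x"

# Move with an absolute pathname, then ask where we are.
cd "$demo/projects/species_x"
pwd

# Move with relative pathnames: up one level, then back down.
cd ..
up_one=$(basename "$(pwd)")
cd species_x
down_again=$(basename "$(pwd)")
echo "$up_one $down_again"

# cd with no argument goes to your home directory (run in a subshell here
# so the rest of the sketch keeps its current directory).
(cd && pwd) || true

# Return to where we started and clean up.
cd "$start_dir"
rm -rf "$demo"
```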

Finding Files and Directories

Unix provides many ways to find files, from simply listing out the contents of a directory to search programs that look for specified filenames and the locations of executable programs.

Listing files with ls

Usage: ls [-options ] pathname

Now that you know where you are, how do you find out what's around you? Simply typing the Unix list command, ls, at the prompt gives you a listing of all the files and subdirectories in the current working directory. You can also give a directory name as an argument to ls. It then prints the names of all files in the named directory.

If you have a directory that contains a lot of files, you can use ls combined with the wildcard character * (asterisk) to produce a partial listing of files. There are several ways to use the *. If you have files in a series (such as ch1 to ch14), or files with common characters (like those ending in .txt), you can use * to specify all of them at once. When given as the argument in a command, * takes the place of any number of characters in a filename. For example, let's say you're looking for files called seq11, seq25, and seq34 in a directory of 400 files. Instead of scrolling through the list of files by eye, you could find them by typing:

ls seq*

What if in that same directory you wanted to find all the text files? You know that text files usually end with .txt, so you can search for them by typing:

ls *.txt
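To see the wildcard in action without hunting through a real directory, the following sketch creates a few empty files in a temporary directory. The filenames are taken from the example above plus a couple of hypothetical extras.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)
cd "$demo"

# Empty placeholder files to match against.
touch seq11 seq25 seq34 notes.txt readme.txt ch1 ch2

# Everything whose name starts with "seq".
ls seq*
seq_matches=$(ls seq* | wc -l)

# Everything whose name ends with ".txt".
ls *.txt
txt_matches=$(ls *.txt | wc -l)

cd "$start_dir"
rm -rf "$demo"
```

The shell expands the pattern before ls runs, so ls simply receives the matching filenames as arguments.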

-a Lists all the files in a directory, even those preceded by a dot. Filenames beginning with a dot (.) aren't listed by ls by default and consequently are referred to as hidden files. Hidden files often contain configuration instructions for programs, and it's sometimes necessary to examine or modify them.

-R Lists subdirectories recursively. The content of the current directory is listed, and whenever a subdirectory is reached, its contents are also explicitly included in the listing. This command can create a catalog of files in your filesystem.

-1 Lists exactly one filename per line, a useful option. A single-column listing of all your source datafiles can quickly be turned into a shell script that executes an identical operation on each file, using just a few regular-expression tricks.

-F Includes a code indicating the file type. A / following the filename indicates that the file is a directory, * indicates that the file is executable, and @ following the filename indicates that the file is a symbolic link.

-s Lists the size of the file in blocks along with the filename.

-t Lists files in chronological order of when they were last modified.

-l Lists files in the long format.

I often make use of ls -lhrt. This will give you a list of all the files in your directory with file sizes in human readable format (h). The r and t options sort the files by reverse time-stamp order (newest files are at the bottom of the list).

The list below was generated using the -lhrt options in my own home directory:

ls -lhrt samtools/

drwxr-xr-x 2 daray bio 4.0K Aug 2 14:04 lz4

-rw-r--r-- 1 daray bio 30K Aug 2 14:04 bam.o

-rw-r--r-- 1 daray bio 18K Aug 2 14:04 bam_import.o

-rw-r--r-- 1 daray bio 14K Aug 2 14:04 bam_aux.o

-rw-r--r-- 1 daray bio 21K Aug 2 14:04 sam_utils.o

-rwxr-xr-x 1 daray bio 5.4M Aug 2 14:04 samtools

-rw-r--r-- 1 daray bio 24K Aug 2 14:04 sam_opts.o

-rw-r--r-- 1 daray bio 76K Aug 2 14:04 sam_header.o

-rw-r--r-- 1 daray bio 45K Aug 2 14:04 libst.a

-rw-r--r-- 1 daray bio 180K Aug 2 14:04 libbam.a

-rw-r--r-- 1 daray bio 11K Aug 2 14:04 bam_plbuf.o

drwxr-xr-x 2 daray bio 4.0K Aug 2 14:05 misc

drwxr-xr-x 18 daray bio 4.0K Aug 2 14:05 test

drwxr-xr-x 3 daray bio 4.0K Aug 2 14:05 share

drwxr-xr-x 2 daray bio 4.0K Aug 2 14:05 bin

The first 10 characters in the line give information about file permissions. The first character describes the file type. You will commonly encounter three types of files: the ordinary file (represented by -), the directory (d), and the symbolic link (l).

The next nine characters are actually three sets of three bits containing file permission information. The first three characters following the file type are the file permissions for the user. The next set are for the user's group, and the final set are for users outside the group. The character string rwxrwxrwx indicates a file is readable (r), writable (w), and executable (x) by any user. We talk about how to change file permissions and file ownership later.

The next column in the long format file listing tells you how many links a file has; that is, how many directory listings for that file exist on the filesystem. The same file can be named in multiple directories. In a future section we talk about how to create links (directory listings) for new and existing files.

The next two columns show the ownership of the file. The owner of the files in the preceding example is me, 'daray', a member of the larger 'bio' group.

The next three columns show the size of the file in human readable format, and the date and time that the file was last modified. The final column shows the name of the file or directory.
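The ls options and the long-format columns described above are easier to absorb with a small directory you control. This is a sketch with hypothetical filenames (reads.fa, run.sh, latest.fa, results) in a temporary directory.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)
cd "$demo"

# One directory, one ordinary file, one hidden file, one executable, one symlink.
mkdir results
touch reads.fa .hidden_config
echo 'echo hi' > run.sh
chmod +x run.sh
ln -s reads.fa latest.fa

ls -a       # hidden files such as .hidden_config are included
ls -F       # results/ is a directory, run.sh* is executable, latest.fa@ is a link
ls -lhrt    # long format, human-readable sizes, newest entries last

# The first character of a long listing gives the file type.
hidden_seen=$(ls -a | grep -c '^\.hidden_config$')
dir_mark=$(ls -F | grep -c '^results/$')
file_type=$(ls -ld reads.fa | cut -c1)
dir_type=$(ls -ld results | cut -c1)
link_type=$(ls -ld latest.fa | cut -c1)
echo "file=$file_type directory=$dir_type link=$link_type"

# -1 prints one name per line, which is convenient for feeding a loop.
ls -1 *.fa | while read -r name; do
    echo "would process $name"
done

cd "$start_dir"
rm -rf "$demo"
```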

Finding files with find

Usage: find pathname_list -[test] criterion

The find command is one of the most powerful, flexible, and complicated commands in the standard set of Unix programs. find searches a path or paths for files based on various tests. There are over 20 different tests that can be used with find; here are a few of the most useful:

-print This test is always true and sends the pathname of the current file to standard output. -print should be the last command specified in a line, because, as it's always true, it causes every file in the pathname being searched to be sent to the list if it comes before other tests in a sequence.

-name This is the test most commonly applied with find and the one that is the most immediately useful. find / -name weasel.txt -print lists to standard output the full pathnames of all files on the filesystem named weasel.txt. The wildcard operator * can be used within the filename criterion to find files that match a given substring, provided the pattern is quoted so that the shell does not expand it first. find / -name 'weas*' -print finds not only weasel.txt, but weasel.c and weasel.

-user This test finds all files owned by the specified user.

-group This test finds all files owned by the specified group.

-ctime n This test is true if the current file was changed n days ago. Changing a file refers to any change, including a change in permissions, whereas modification refers only to changes to the internal text of the file. -atime and -mtime tests, which check the access and modification times of the files, are also available.

Performing two find tests one after another amounts to applying a logical "and" between the tests. A -o between tests indicates a logical "or." An exclamation point (!) before a test negates it, which means find returns only those files that fail the test.

find can be combined with other commands to selectively archive or remove particular files from a filesystem. Let's say you want a list of every file you have modified in your home directory and all subdirectories in the last week:

find ~ -type f -mtime -7 -print

Changing the type to d shows only new directories; changing the -7 to +7 shows all files modified more than a week ago. Now let's go back to the original problem and find executable files. One way to do this with find is to use the following command:

find / -name progname -type f -exec ls -alF '{}' ';'

This example finds every match for progname and executes ls -alF FullPathName for every match. Any Unix command can be used as the object of -exec. Cleanup of the /tmp directory, which is usually done automatically by the operating system, can be done with this command:

find /tmp -type f -mtime +1 -exec rm -rf '{}' ';'

This deletes everything that hasn't been modified within the last day. As always, you need to refer to your manual pages, or man pages, for more details.
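Because find can be pointed at any starting directory, it is worth practicing on a small tree before running it against / or /tmp. The sketch below uses hypothetical files in a temporary directory and relies only on standard find tests (-type, -name, -mtime, -o, !, -exec).

```shell
demo=$(mktemp -d)
mkdir -p "$demo/mustelidae/weasels"
touch "$demo/weasel.txt" "$demo/mustelidae/weasel.c" "$demo/mustelidae/otter.txt"

# Quote the pattern so the shell hands it to find unexpanded.
# -type f keeps the directory named "weasels" out of the results.
find "$demo" -type f -name 'weas*' -print
weas_files=$(find "$demo" -type f -name 'weas*' | wc -l)

# Tests in a row are an implicit "and"; -o means "or"; ! negates a test.
txt_or_c=$(find "$demo" -type f \( -name '*.txt' -o -name '*.c' \) | wc -l)
not_txt=$(find "$demo" -type f ! -name '*.txt' | wc -l)

# Regular files modified within the last 7 days (all of them, here).
recent=$(find "$demo" -type f -mtime -7 | wc -l)

# Run a command on every match; {} is replaced by each pathname in turn.
find "$demo" -type f -name '*.txt' -exec ls -l '{}' ';'

echo "weas*: $weas_files, txt or c: $txt_or_c, not txt: $not_txt, recent: $recent"
rm -rf "$demo"
```

A good habit is to run a find command with -print first and only swap in -exec rm once the list of matches is exactly what you expect.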

Finding an executable file with which

Usage: which progname

The which command searches your current path and reports the full path of the program that executes if you enter progname at the command prompt. This is useful if you want to know where a program is located, if, for instance, you want to be sure you're using the right version of the program. which can't find a program in a directory that isn't in your path.

Finding an executable file with whereis

Usage: whereis -[options] progname

The whereis command searches a standard set of directories for executables, manpages, and source files. Unlike which, whereis isn't dependent on your path, but it looks for programs only in a limited set of directories, so it doesn't give a definitive answer about the existence of a program.
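As a quick sketch, here is how you might check where ls itself lives. It assumes which is installed (it is on most Linux systems) and only calls whereis if that command is available, since not every system ships it.

```shell
# which walks the directories in $PATH, in order, and reports the first match.
ls_path=$(which ls)
echo "ls runs from: $ls_path"

# The search path itself, one directory per line.
echo "$PATH" | tr ':' '\n'

# whereis is not installed everywhere, so only call it when it is present.
if command -v whereis >/dev/null 2>&1; then
    whereis ls
fi
```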

Manipulating Files and Directories

Of course, just as with the stacks of papers on your desk, you periodically need to do some housekeeping on your files and directories to keep everything neat and tidy. Unix provides commands for moving, copying, and deleting files, as well as creating and removing directories.

Copying files and directories with cp

Usage: cp -[options] source destination

The cp command makes a copy of a source file at a destination. If the destination is a directory, the source can be multiple files, copies of which are placed in the destination directory. Frequently used options are -R and -r. Both copy recursively; that is, they copy the source directory and all its subdirectories to the destination. The -R option prevents cp from following symbolic links; only the link itself is copied. The -r option allows cp to follow symbolic links and copy all files it finds. This can cause problems if the symbolic links happen to form a circular path through the filesystem.

Normally, new files created by cp get their file ownership and permissions from your shell settings. However, cp provides a -p option that attempts to preserve the original file attributes, and the GNU version of cp found on Linux adds an -a option that combines recursive copying with attribute preservation.
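The common forms of cp can be sketched as follows. The filenames are hypothetical and everything happens in a temporary directory; -p is the standard option for preserving timestamps and permissions and is used here as an illustration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/raw_reads/illumina" "$demo/backup"
echo "ACGT" > "$demo/raw_reads/illumina/file1.fa"

# Copy one file to a new name.
cp "$demo/raw_reads/illumina/file1.fa" "$demo/file1_copy.fa"

# Copy several sources into an existing directory.
cp "$demo/raw_reads/illumina/file1.fa" "$demo/file1_copy.fa" "$demo/backup/"
backup_count=$(ls "$demo/backup" | wc -l)

# Copy a directory and everything beneath it.
cp -R "$demo/raw_reads" "$demo/raw_reads_copy"
copied=$(cat "$demo/raw_reads_copy/illumina/file1.fa")

# Copy while preserving the original timestamps and permissions.
cp -p "$demo/raw_reads/illumina/file1.fa" "$demo/file1_preserved.fa"

echo "files in backup: $backup_count, recursive copy contains: $copied"
rm -rf "$demo"
```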

Moving and renaming files and directories with mv

Usage: mv source destination

The mv command simply moves or renames source to destination. Files and directories can both be either source or destination. If both source and destination are files or both are directories, the result of mv is essentially that the file or directory is renamed. If the destination is a directory, and the intention is to move already existing files or directories under that directory in the hierarchy, the directory must exist before the mv command is given. Otherwise the destination is created as a regular file, or the operation is treated as a renaming of a directory. One problem that can occur if mv isn't used carefully is when source represents a file list, and destination is a preexisting single file. When this happens, each member of source is renamed to destination and then promptly overwritten, leaving only the last file of the list intact. At this point, it's time to look for your system administrator and hope there is a recent backup.
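The two everyday uses of mv, renaming in place and moving into a directory, can be sketched like this. The names notes.txt and archive are hypothetical, and the check with [ -d ... ] is one simple way to make sure the destination directory really exists before moving anything into it.

```shell
demo=$(mktemp -d)
mkdir "$demo/archive"
echo "draft" > "$demo/notes.txt"

# Source and destination are both files: this is a rename.
mv "$demo/notes.txt" "$demo/notes_v1.txt"

# Destination is an existing directory: the file is moved into it.
# Checking first avoids accidentally renaming the file to "archive".
if [ -d "$demo/archive" ]; then
    mv "$demo/notes_v1.txt" "$demo/archive/"
fi

moved=$(cat "$demo/archive/notes_v1.txt")
if [ -e "$demo/notes.txt" ]; then old_name_exists=yes; else old_name_exists=no; fi
echo "moved file contains: $moved, old name still exists: $old_name_exists"

rm -rf "$demo"
```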

Creating new links to files and directories with ln

Usage: ln -[options] source destination

The ln command establishes a link between files or directories at different locations in the directory tree. While creating a link creates the appearance of a new file in the destination location, no data is actually copied. Instead, what's created is a new pointer in the filesystem index that allows the source file to be found at more than one location "on the map."

The most commonly used option, -s, creates a symbolic link (or symlink) to a file or directory, as in the following example:

ln -s perl5.005_03 perl

This allows you to type in just the word perl rather than remembering the entire version nomenclature for the current version of Perl.

Another common use of the _ln_ command is to create a link to a newly compiled binary executable file in a directory in the system path, e.g., /usr/local/bin. Doing this allows you to run the program without addressing it by its full pathname.
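To see what a symbolic link looks like in practice, you can try something like the following in a scratch directory. The names mytool-1.2.3 and mytool are invented stand-ins for a versioned file and its short alias; the sketch also assumes the readlink command is installed, as it is on most Linux systems.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"

# A stand-in for a file with a long, versioned name (hypothetical).
echo "pretend this is a program" > mytool-1.2.3

# Create a symbolic link with a shorter name that points at it.
ln -s mytool-1.2.3 mytool

# The long listing shows the link and where it points.
ls -l mytool

# readlink prints just the target of the link.
readlink mytool

# Reading through the link gives the contents of the original file.
cat mytool

# Removing the link does not remove the original file.
rm mytool
ls
```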

Creating and removing directories with mkdir and rmdir

Usage: mkdir -[options] dirname

Usage: rmdir -[options] dirname

New directories can be created with the _mkdir_ command, which has only two command-line options.

mkdir -p creates a directory and any intermediate components of the path that are missing. For instance, if you decide to create a directory mustelidae/weasels in your home directory, but the intermediate directory mustelidae doesn't exist, mkdir -p creates the intermediate directory and its subdirectory weasels.

mkdir -m mode creates a directory with the specified file-permission mode.

_rmdir_ removes a directory if it's empty. With the -p option, _rmdir_ removes all the empty directories in a given path. If you decide to remove the directory mustelidae/weasels, and directory _mustelidae_ is empty except for directory weasels, running rmdir -p mustelidae/weasels from your home directory removes both _weasels_ and its parent directory mustelidae.
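The mustelidae/weasels example can be tried out as follows. This sketch runs in a scratch directory rather than your home directory, and private_notes is a hypothetical directory name used to show the -m option.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"

# Create weasels and the missing parent mustelidae in one step.
mkdir -p mustelidae/weasels

# Create a directory that only its owner can read, write, and enter.
mkdir -m 700 private_notes

# -d lists the directories themselves rather than their contents.
ls -ld mustelidae mustelidae/weasels private_notes

# Remove weasels, then mustelidae, since both are now empty.
rmdir -p mustelidae/weasels

# Only private_notes should remain.
ls
```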

Removing files with rm

Usage: rm -[options] files

The rm command removes files and directories. Here are its common options:

-f Forces the removal of files without prompting. You still can't remove files you don't own, but the write permissions on files you do own are ignored. For example, rm -f a* deletes all files starting with the letter a, but doesn't delete any subdirectories.

-i Gets you into interactive mode. Prompts you with rm: remove filename? Files are removed only if you begin your answer with a y or Y.

-r (recursive option) Removes all directories and subdirectories in the list of files. Symbolic links aren't traversed; only the symlink itself is removed.

-v (verbose option) Echoes the names of all files/directories that are removed.

While _rm_ is a fairly simple command, there are a few instances in which it can cause serious problems for the careless user.

The command rm * removes all files in the current directory. Unless you have the files set as read-only or have the interactive flag set, you will delete every file in the directory. Of course this isn't as bad as using the command _rm -r *_ or _rm -rf *_, the last of which overrides any read-only file modes, traverses down through your directories, and deletes everything in your current directory or below.

Occasionally you will find that you create odd files in your directories. For instance, you might have a file named -myfile where the - is part of the filename. Try deleting it with rm -myfile, and you will get an error message complaining that _rm_ doesn't have a -m option. The _rm_ command interprets the leading - as the start of a command flag, not as part of the filename. The solution to this problem is trivial but not always instantly apparent: simply provide a more complete path to the file, such as rm ./-myfile or rm /home/thisis/-myfile, or put -- before the name (rm -- -myfile) to signal that no more options follow. If you accidentally create a file with a space in the name, wrap the name in quotes, as in rm "my file", so the shell treats it as a single name.
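The awkward cases above can be practiced in a scratch directory where a mistake costs nothing. In this sketch all file names are hypothetical. It also uses --, the standard marker for "end of options", as a second way to handle a name that starts with a dash.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"

# Create some awkward files plus one file we want to keep.
touch ./-myfile ./-other "my file.txt" keep.txt

# A name starting with a dash: give a path so it cannot look like an option.
rm ./-myfile

# Alternative: -- tells rm that everything after it is a filename.
rm -- -other

# A name containing a space: quote it so the shell passes one argument.
rm "my file.txt"

# Remove a directory tree, printing each item as it is deleted.
mkdir -p old/run1
touch old/run1/log.txt
rm -rv old

# Only keep.txt should be left.
ls
```

Getting into the habit of running ls with the same pattern first (for example ls a* before rm a*) is a cheap way to check what a wildcard will match.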

Changing file/directory permissions with 'chmod' command (modified from https://www.guru99.com/file-permissions.html)

Say you do not want your colleague to see your personal images. This can be achieved by changing file permissions.

We can use the _chmod_ command, which stands for 'change mode'. Using the command, we can set permissions (read, write, execute) on a file/directory for the owner, group and the world.

Usage: chmod permissions filename

There are 2 ways to use the command: Absolute mode and Symbolic mode

Absolute (Numeric) Mode

In this mode, file permissions are represented not as characters but as a three-digit octal number. The table below gives the number for each permission type.

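| Number | Permission Type | Symbol |
|--------|-----------------|--------|
| 0 | No Permission | --- |
| 1 | Execute | --x |
| 2 | Write | -w- |
| 3 | Execute + Write | -wx |
| 4 | Read | r-- |
| 5 | Read + Execute | r-x |
| 6 | Read + Write | rw- |
| 7 | Read + Write + Execute | rwx |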

Perhaps you have a file, text.txt.

chmod 764 text.txt

The above command will change permissions as follows:

  • Owner can read, write and execute
  • Usergroup can read and write
  • World can only read

This is shown as '-rwxrw-r--'.
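You can check the result of a numeric chmod yourself with ls -l. The sketch below uses a scratch copy of text.txt; the second mode, 600, is just an extra illustration and not part of the example above.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
touch text.txt

# Each digit is a sum: read = 4, write = 2, execute = 1.
#   7 = 4+2+1 (rwx) for the owner
#   6 = 4+2   (rw-) for the group
#   4 = 4     (r--) for the world
chmod 764 text.txt

# The first column of the listing should begin with -rwxrw-r--
ls -l text.txt

# Another common mode: owner may read and write, nobody else may do anything.
chmod 600 text.txt

# Now the first column should begin with -rw-------
ls -l text.txt
```

The leading - in the listing is the file type (a regular file); the nine characters after it are the owner, group, and world permissions.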

Symbolic Mode

In the Absolute mode, you change permissions for all 3 owners. In the symbolic mode, you can modify permissions of a specific owner. It makes use of mathematical symbols to modify the file permissions.

| Operator | Description |
|----------|-------------|
| + | Adds a permission to a file or directory |
| - | Removes the permission |
| = | Sets the permission and overrides the permissions set earlier |

The various owners are represented as

| User | Denotations |
|------|-------------|
| u | user/owner |
| g | group |
| o | other |
| a | all |

In this mode we will not be using permissions as numbers like _755_ but as characters like rwx.

chmod o=rwx text.txt allows the other users to read, write, and execute the file.

chmod u-r text.txt removes read permission from the user (owner).
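A short sketch of how the three operators combine, again on a scratch copy of text.txt. The particular sequence of changes is only an illustration; the comment after each command shows the permissions that should result.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
touch text.txt

# Start from a known state. Clauses can be joined with commas.
chmod u=rw,go=r text.txt   # rw-r--r--

# + adds a permission for the named owner.
chmod u+x text.txt         # rwxr--r--
chmod g+w text.txt         # rwxrw-r--

# - removes a permission.
chmod o-r text.txt         # rwxrw----

# = sets exactly the listed permissions, replacing what was there.
chmod go=r text.txt        # rwxr--r--

# a applies the change to user, group, and other at once.
chmod a+x text.txt         # rwxr-xr-x

ls -l text.txt
```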
