UNIX Shell Command Line Programming - clizarraga-UAD7/Workshops GitHub Wiki

Introduction to Command Line Programming.


(Image: Nautilus Pompilius. Wikimedia Commons, CC)


UNIX, Linux, macOS and other Unix-like Operating Systems include a Command Line Interface (CLI), accessed through a Terminal emulator program.

The user interacts with the Operating System through a Shell (also known as a Command Line Interpreter), a program that interprets the lines of text the user types.


UNIX Shells

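There is a wide family of UNIX shells. The two most commonly used ones come by default with Linux and macOS: the Bourne Again Shell (Bash), the default on most Linux distributions, and the Z Shell (Zsh), the default on macOS since version 10.15 (Catalina).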

The Z Shell is largely backward compatible with the Bash Shell, and the two have similar functionality. To find out which Shell your Terminal starts by default (your login shell), type:

echo $SHELL (ENTER)

Or, to find out whether bash is in one of the directories listed in your PATH environment variable, type which bash; if bash is found, which prints its full path.

You can also use the find command to locate a program:

find / -name bash -print, which reads as: search starting from / (where), for files named bash (what), and then print each match.
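A small sketch combining the commands above. Searching the whole disk from / usually prints many "Permission denied" messages, so here we discard standard error with 2>/dev/null and search only /bin and /usr/bin (an assumed pair of directories that hold most system programs; adjust them to your system). command -v is the POSIX counterpart of which.

```shell
# Login shell recorded for your user (may differ from the shell running right now)
echo "$SHELL"

# Full path of bash, if it is found in one of the PATH directories
command -v bash

# Search only two common program directories (an assumption) and
# send error messages such as "Permission denied" to /dev/null
find /bin /usr/bin -name bash -print 2>/dev/null
```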

One advantage of working with these Shells is that they borrow several line-editing commands from the GNU Emacs editor. These let us move the cursor and edit the command line text in place.

For completeness we summarize these commands, because we will be using them extensively:

| Command | Action |
|---|---|
| **Positioning** | |
| Ctrl+f | Moves the cursor one character forward |
| ESC, f | Moves the cursor one word forward |
| Ctrl+b | Moves the cursor one character backward |
| ESC, b | Moves the cursor one word backward |
| Ctrl+a | Moves the cursor to the beginning of the line |
| Ctrl+e | Moves the cursor to the end of the line |
| Ctrl+p | Recalls the previous command from the command history |
| Ctrl+n | Recalls the next command from the command history |
| **Memory buffer** | |
| Ctrl+k | Kill: cuts the text from the cursor to the end of the line and stores it in memory (memory keeps only the most recent contents if overwritten) |
| Ctrl+y | Yank: pastes the contents of memory at the cursor position |

Keeping these commands in mind makes editing the command line much easier.

Also take advantage of the Shell's completion feature (press TAB), which is especially handy for long file and directory names.


Pipes and redirecting output

| Command | Action |
|---|---|
| `cat file \| less` | Pipes the standard output of cat into the standard input of the next command, less |
| `ls -al \| tee out.txt` | Sends the output to the screen and, at the same time, writes it to out.txt |
| `echo 'Hello World!' > hello.txt` | Redirects standard output to the file hello.txt (creating it, or overwriting it if it already exists) |
| `echo 'Hello back!' >> hello.txt` | Appends the phrase to the end of the file hello.txt |
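The difference between > and >> is easiest to see by running them one after the other. This sketch works in a fresh temporary directory (created with mktemp -d) so that no existing hello.txt is overwritten; it is meant to be saved as a script and run with sh.

```shell
# Work in a new temporary directory so no existing files are touched
cd "$(mktemp -d)" || exit 1

# > creates hello.txt (or truncates it if it already exists)
echo 'Hello World!' > hello.txt
# >> appends a second line to the end of hello.txt
echo 'Hello back!' >> hello.txt
cat hello.txt            # shows both lines

# Using > again overwrites the file, so only one line remains
echo 'Hello again!' > hello.txt
wc -l < hello.txt        # counts the lines read from standard input

# The pipe | feeds the output of ls into tee, which copies it
# both to the screen and to out.txt
ls -al | tee out.txt
```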

Text processing in the Shell

Unix-like systems provide at least the following command-line tools for text processing:

  • cut: A command-line utility that cuts out selected parts of each line of the given files (or of piped data) and prints the result to standard output.
  • grep: A command-line utility that searches plain-text data for lines matching a regular expression.
  • sed: A stream editor that parses and transforms text.
  • awk: A programming language for text processing and data extraction.
  • perl: A general-purpose scripting language, originally designed to make text processing and report writing easier.

In this tutorial we focus only on cut, grep, sed and awk; Perl is outside the scope of this workshop.


Downloading a common text file.

For the examples below we use a free text file from Project Gutenberg: The Raven, a poem by Edgar Allan Poe, saved as TheRaven.txt. (Please download this short text file from GitHub.)


Cut - A command-line utility for extracting sections of a line.

Reading the Man Pages: man cut

Syntax:
cut OPTION... [FILE]...

Options description:
-f (--fields=LIST) - Select by specifying a field, a set of fields, or a range of fields (fields are separated by TAB by default).
-b (--bytes=LIST) - Select by specifying a byte, a set of bytes, or a range of bytes.
-c (--characters=LIST) - Select by specifying a character, a set of characters, or a range of characters.

You must use one, and only one, of the options listed above.

Other options are:
-d (--delimiter) - Specify a delimiter to use instead of the default TAB delimiter.
--complement - Complement the selection: display everything except the selected bytes, characters or fields.
-s (--only-delimited) - Do not print lines that contain no delimiter character (by default, cut prints such lines unchanged).
--output-delimiter - Use the given string as the output delimiter (by default, cut uses the input delimiter).

The cut command can accept zero or more input FILE names. When no FILE is given, or when FILE is -, cut reads the standard input.

Examples
echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f 1,3
echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f 3-9
echo "Lorem ipsum dolor sit amet consectetur" | cut -c 3-9
echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f -3
echo "Lorem, ipsum, dolor, sit, amet, consectetur" | cut -d ',' -f 3-
(Execute next 3 lines)
echo "Lorem:ipsum:dolor:sit:amet:consectetur" > lorem.txt
echo "urna:consequat:felis:vehicula:class:ultricies:mollis:dictumst" >> lorem.txt
cut -d ':' -f 3-5 lorem.txt
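The options -s, --complement and --output-delimiter have no example above, so here is a minimal sketch. It assumes GNU cut (the default on Linux); --complement and --output-delimiter are GNU extensions that the BSD cut shipped with macOS may not support. The two input lines are made up: the first contains : delimiters, the second contains none.

```shell
# Two input lines: the first has ':' delimiters, the second has none
input='Lorem:ipsum:dolor
no delimiter here'

# By default a line without any delimiter is printed unchanged
printf '%s\n' "$input" | cut -d ':' -f 1
# Lorem
# no delimiter here

# -s (--only-delimited) drops the line that has no delimiter
printf '%s\n' "$input" | cut -d ':' -f 1 -s
# Lorem

# --complement prints every field except field 2
printf '%s\n' "$input" | cut -d ':' -f 2 --complement -s
# Lorem:dolor

# --output-delimiter joins the selected fields with ' | ' instead of ':'
printf '%s\n' "$input" | cut -d ':' -f 1,3 -s --output-delimiter=' | '
# Lorem | dolor
```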

Grep - print lines that match patterns

The grep filter searches the contents of files for a particular pattern of characters and displays all lines that contain that pattern. The pattern being searched for is called a regular expression. (The name comes from the ed editor command g/re/p: "globally search for a regular expression and print".)

We can check the Man Pages: man grep

 Syntax:
 `grep` [options] pattern [files]

 Options Description:
 -c : Print only a count of the lines that match a pattern.
 -h : Display the matched lines, but not the filenames.
 -i : Ignore case when matching.
 -l : Display only the names of the files that contain a match.
 -n : Display the matched lines and their line numbers.
 -v : Print all the lines that do not match the pattern.
 -e exp : Use exp as the pattern. Can be used multiple times.
 -f file : Take patterns from file, one per line.
 -E : Treat the pattern as an extended regular expression (ERE).
 -w : Match whole words only.
 -o : Print only the matched parts of a matching line,
      with each such part on a separate output line.

 -A n : Print each matching line and the n lines after it.
 -B n : Print each matching line and the n lines before it.
 -C n : Print each matching line and the n lines before and after it.
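To try several of these options side by side without TheRaven.txt at hand, the sketch below writes four lines taken from the poem's first stanza to a temporary file (sample.txt, a name chosen only for this illustration) and runs grep on it. The results noted in the comments apply to this sample only.

```shell
cd "$(mktemp -d)" || exit 1

# Four lines from the first stanza of The Raven
cat > sample.txt <<'EOF'
Once upon a midnight dreary, while I pondered, weak and weary,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
Only this and nothing more.
EOF

grep -c 'ing' sample.txt                 # 3 (lines 2, 3 and 4)
grep -n 'door' sample.txt                # 3:As of some one gently rapping, ...
grep -v 'ing' sample.txt                 # only the first line
grep -o '[a-z]*apping' sample.txt        # napping, tapping, rapping, rapping (one per line)
grep -cw 'rapping' sample.txt            # 1 (-c counts lines, not matches)
grep -i -A 1 'ONCE' sample.txt           # the first line plus the line after it
grep -e 'dreary' -e 'nothing' sample.txt # the first and the last line
grep -l 'door' sample.txt                # sample.txt
```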

Learning common regex metacharacters

Square brackets can be used to define a list or range of characters to be found. So:

[ABC] matches A or B or C. [A-Z] matches any upper case letter. [A-Za-z] matches any upper or lower case letter. [A-Za-z0-9] matches any upper or lower case letter or any digit.

Then there are:

  • . matches any single character.
  • \d matches any single digit in Perl-style regular expressions; grep does not support it in its basic or extended syntax (GNU grep accepts it only with -P), so use [0-9] there.
  • \w matches any word character: a letter, a digit or the underscore (equivalent to [A-Za-z0-9_]).
  • \s matches any whitespace character, such as a space or a tab.
  • \ escapes the following character when that character is a special character. For example, a regular expression that finds .com would be \.com, because . is a special character that matches any character.
  • ^ is an “anchor” that asserts the position at the start of the line, so what you put after the caret only matches if it is at the very beginning of a line. The caret is also known as a circumflex.
  • $ is an “anchor” that asserts the position at the end of the line, so what you put before it only matches if it is at the very end of a line.

\b asserts that the pattern must match at a word boundary (GNU grep supports \w, \s and \b as extensions). Putting it on either side of a word stops the regular expression from matching longer variants of that word. So:

  • the regular expression mark will match not only mark but also marking, market, unremarkable, and so on.
  • the regular expression \bword will match word, wordless, and wordlessly.
  • the regular expression comb\b will match comb and honeycomb but not combine.
  • the regular expression \brespect\b will match respect but not respectable or disrespectful.
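The word-boundary examples above can be checked directly. This sketch assumes GNU grep (which accepts \b in basic regular expressions) and feeds it the example words, one per line; the expected matches are noted in the comments.

```shell
# The example words from the list above, one per line
words='mark
marking
market
unremarkable
word
wordless
sword
comb
honeycomb
combine
respect
respectable
disrespectful'

printf '%s\n' "$words" | grep 'mark'          # mark, marking, market, unremarkable
printf '%s\n' "$words" | grep '\bword'        # word, wordless (not sword)
printf '%s\n' "$words" | grep 'comb\b'        # comb, honeycomb (not combine)
printf '%s\n' "$words" | grep '\brespect\b'   # respect only
printf '%s\n' "$words" | grep '^re'           # respect, respectable (anchored at line start)
printf '%s\n' "$words" | grep 'ful$'          # disrespectful (anchored at line end)
```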

Other useful special characters are:

  • * matches the preceding element zero or more times. For example, ab*c matches ac, abc, abbbc, etc.
  • + matches the preceding element one or more times. For example, ab+c matches abc and abbbc but not ac.
  • ? matches when the preceding element appears zero or one time.
  • {VALUE} matches the preceding element exactly VALUE times; a range, say one to nine, is written {MIN,MAX}, e.g. [0-9]{1,9} matches any number between one and nine digits long.
  • | means or.
  • Case-insensitive matching is written as an /i flag after the expression in some tools (e.g. Perl); in grep use the -i option instead.

In grep, +, ?, |, { and } have these special meanings only in extended regular expressions (grep -E); in GNU grep's basic syntax write them as \+, \?, \|, \{ and \}.
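A quick way to see each quantifier at work is to combine -E (extended syntax) with -o, which prints every match on its own line. The input strings below are made up for illustration.

```shell
# -E: extended regular expressions; -o: print each match on its own line
echo 'ac abc abbbc' | grep -oE 'ab*c'           # ac, abc, abbbc
echo 'ac abc abbbc' | grep -oE 'ab+c'           # abc, abbbc (ac has no b)
echo 'color colour' | grep -oE 'colou?r'        # color, colour
echo 'zip 85721, code 12' | grep -oE '[0-9]{5}' # 85721 (12 is too short)
echo 'cat dog bird' | grep -oE 'cat|bird'       # cat, bird
echo 'Raven RAVEN raven' | grep -oi 'raven'     # all three: -i ignores case

# Same as the second line, using GNU grep's basic syntax instead of -E
echo 'ac abc abbbc' | grep -o 'ab\+c'           # abc, abbbc
```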

| Example | Description |
|---|---|
| `grep -i Raven TheRaven.txt` | Print lines containing the string Raven, ignoring case |
| `grep -ivc Raven TheRaven.txt` | Count the lines that do not contain the string Raven, ignoring case |
| `grep -n '^The' TheRaven.txt` | Print the lines beginning with The, with their line numbers |
| `grep 'en!$' TheRaven.txt` | Print lines ending with en! |
| `grep -in '\bsmil' TheRaven.txt` | Print, with line numbers, the lines containing a word that starts with smil, ignoring case |
| `grep -c '\bthe\b' TheRaven.txt` | Count the lines that include the word the |
| `grep -c 'ing\b' TheRaven.txt` | Count the lines that include a word ending with ing |
| `grep -E -w -i 'Raven\|night' TheRaven.txt` | Print the lines containing the whole word raven or night, ignoring case |

You can try this online regular expressions tool, or this other one.


Sed (stream editor).

The sed command (short for stream editor) can perform many operations on a file, such as searching, find and replace, insertion and deletion.

We can check the Man Pages: man sed

Syntax:
sed OPTIONS... [SCRIPT] [INPUTFILE...] 

-n, --quiet, --silent. Suppress automatic printing of pattern space
-e script, --expression=script. Add the script to the commands to be executed
-f script-file, --file=script-file. Add the contents of script-file to the commands to be executed
--follow-symlinks. Follow symlinks when processing in place
-i[SUFFIX], --in-place[=SUFFIX]. Edit files in place (makes backup if extension supplied)
-l N, --line-length=N. Specify the desired line-wrap length for the 'l' command
--posix. Disable all GNU extensions.
-r, --regexp-extended. Use extended regular expressions in the script.
-s, --separate. Consider files as separate rather than as a single continuous long stream.
-u, --unbuffered. Load minimal amounts of data from the input files and flush the output buffers more often

If no -e, --expression, -f, or --file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.

| Example | Description |
|---|---|
| `sed -n 's/raven/omen/p' TheRaven.txt` | Substitute the first occurrence of raven on each line with omen. -n suppresses automatic printing, so the p flag prints only the lines where a substitution was made |
| `sed -n 's/floor/ground/gp' TheRaven.txt` | Globally substitute floor with ground (every occurrence on each line) |
| `sed -n 's/[Ff]loor/ground/gp' TheRaven.txt` | Globally substitute Floor or floor with ground |
| `sed -n 's/floor/ground/gip' TheRaven.txt` | Globally substitute floor with ground, ignoring case |
| `sed -n 's/floor/ground/gip;s/raven/omen/gip' TheRaven.txt` | Perform two global, case-insensitive substitutions (separated by ;) |
| `sed -n '5,9p' TheRaven.txt` | Print lines 5 through 9 |
| `sed -n -e '5,10p' -e '19,24p' TheRaven.txt` | Print lines 5-10 and 19-24; each -e adds another command to the script |
| `sed -n '1~3p' TheRaven.txt` | Starting from line 1, print every third line |
| `sed -n '/^And /p' TheRaven.txt` | Print all lines that begin with the word And |
| `sed 's/^\(.*\),\(.*\)$/\2,\1/' poets.txt` | Invert the word order of each line in a "last name,first name" file |
| `gsed '/Di/a --> Inserted!' poets.txt` | Append a line with the text --> Inserted! after each line containing Di (GNU sed; plain sed on Linux) |
| `sed 's/.*/--> Inserted &/' poets.txt` | Prefix every line with --> Inserted (& stands for the whole matched text) |
| `sed 'G' poets.txt` | Insert a blank line after each line of text |
| `sed '3d' poets.txt` | Delete the 3rd line |
| `sed '4,5d' poets.txt` | Delete lines 4 through 5 |
| `sed -i'.bak' '/^.*Di.*$/d' poets.txt` | Delete, in place, all lines containing Di, saving the original file as poets.txt.bak |
| `sed '/^.*Di.*$/d' poets.txt > new_poets.txt` | Similar to the above, but poets.txt is left unchanged and the result is written to new_poets.txt |
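The contents of poets.txt are not shown on this page, so this sketch creates a small hypothetical stand-in (three lines in "last name,first name" form, not the workshop's actual file) in a temporary directory and applies a few of the commands above to it. It also contrasts writing the result to a new file with editing in place using -i.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for poets.txt
printf '%s\n' 'Dickinson,Emily' 'Whitman,Walt' 'Frost,Robert' > poets.txt

# \1 and \2 refer to the two \( \) groups; here they are joined by a space
sed 's/^\(.*\),\(.*\)$/\2 \1/' poets.txt   # Emily Dickinson, Walt Whitman, Robert Frost

# & in the replacement stands for the whole matched text
sed 's/.*/--> &/' poets.txt

# Delete line 2 from the output only; poets.txt is not modified
sed '2d' poets.txt                          # Dickinson,Emily and Frost,Robert

# Redirect the result to a new file: poets.txt keeps all three lines
sed '/Di/d' poets.txt > new_poets.txt

# Edit poets.txt in place, keeping the original as poets.txt.bak
sed -i'.bak' '/Di/d' poets.txt
cat poets.txt poets.txt.bak
```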

Try this sed (stream editor) online tool.

Note on macOS sed: macOS ships the BSD version of sed, which is not 100% the same as the GNU sed found on Linux, so some of the examples above (for instance the 1~3 address) may not work there. Use Homebrew (brew.sh), a package manager for macOS and Linux, to install GNU sed with brew install gnu-sed, then run it as gsed.


AWK (pattern scanning and processing language)

AWK is a full scripting language, as well as a complete text manipulation toolkit for the command line. The awk command was named using the initials of the three people who wrote the original version in 1977: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Check the Man Pages: man awk.

What you can do with AWK:

  1. AWK Operations: (a) Scans a file line by line (b) Splits each input line into fields (c) Compares input line/fields to pattern (d) Performs action(s) on matched lines

  2. Useful For: (a) Transform data files (b) Produce formatted reports

  3. Programming Constructs: (a) Format output lines (b) Arithmetic and string operations (c) Conditionals and loops

Syntax:
awk options 'selection_criteria {action}' input-file > output-file

Options:  
-f program-file : Reads the AWK program source from the file 
                  program-file, instead of from the 
                  first command line argument.
-F fs : Use fs for the input field separator
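Both options can be seen together in a short sketch. The comma-separated file poets.csv and the program file born.awk are hypothetical names created only for this illustration; the program file holds a single selection_criteria {action} pair.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical comma-separated file: last name, first name, birth year
printf '%s\n' 'Dickinson,Emily,1830' 'Whitman,Walt,1819' 'Frost,Robert,1874' > poets.csv

# -F, makes the comma the input field separator
awk -F, '{ print $2, $1 }' poets.csv
# Emily Dickinson
# Walt Whitman
# Robert Frost

# -f reads the AWK program from a file instead of the command line
cat > born.awk <<'EOF'
# selection criteria: 3rd field below 1850; action: print a sentence
$3 < 1850 { print $1 " was born in " $3 }
EOF
awk -F, -f born.awk poets.csv
# Dickinson was born in 1830
# Whitman was born in 1819
```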

| Example | Description |
|---|---|
| `who \| awk '{print $3, $4, $5}'` | Print fields 3, 4 and 5 of the who command output |
| `date \| awk '{print $2,$3,$NF}'` | Extract the month, day and year from the date command output (field positions assume date's default output format, which can vary with locale). Field $0 represents the whole line; NF is the number of fields, so $NF is the last field |
| `date \| awk 'BEGIN {OFS="/"} {print $2,$3,$6}'` | Set the output field separator to / when printing fields of the date output |
| `awk 'BEGIN {print "The Dickinson Family of Poets"} {print $0}' poets.txt` | Print a title line before the contents of the file |
| `date \| awk 'BEGIN {print "Today is:"} {print $2,$3, $NF}'` | Print "Today is:" followed by the date |
| `awk -F, '{print $1,$2}' poets.txt` | Print the first two fields of each line, using , as the input field separator |
| `awk 'BEGIN { print sqrt(625)}'` | AWK can compute mathematical expressions (prints 25) |
| `awk '/^And/ {print $0}' TheRaven.txt` | Print all lines starting with And |
| `awk '/eyes/{print}' TheRaven.txt` | Print all lines containing the string eyes |
| `awk 'NR==5, NR==11 {print NR ".- ",$0}' TheRaven.txt` | Print lines 5 through 11, each preceded by its line number and ".- " |
| `awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is",i*i; }'` | Print the squares of 1 through 6 |
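The "Produce formatted reports" and "Arithmetic and string operations" items listed earlier can be combined in one program: BEGIN prints a header, the main block accumulates a running total and formats each line with printf, and END prints the summary after the last line (NR then holds the number of lines read). The item names and quantities are made up for illustration.

```shell
# Made-up input: item name and quantity on each line
printf '%s\n' 'apples 3' 'pears 5' 'plums 10' |
awk 'BEGIN { print "Item     Qty" }
     { total += $2; printf "%-8s %3d\n", $1, $2 }
     END { printf "Total    %3d\nAverage  %6.2f\n", total, total / NR }'
# Prints the header, one line per item, then Total 18 and Average 6.00
```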

Try this awk online interpreter.




Created: 04/13/2022 (C. Lizárraga); Last update: 04/17/2022 (C. Lizárraga)

CC BY-NC-SA