Processing Text Using Filters - Paiet/Tech-Journal-for-Everything GitHub Wiki

  • File-Combining Commands
    • cat (Concatenate)
      • Combines files together
      • Files combined one after another
      • Can also display the contents of a file
        • Given a single file, cat simply writes that file to STDOUT
      • Using cat:
        1. Combine two files together
          cat first.txt second.txt > combined.txt
        2. Display the contents of first.txt
          cat first.txt
        3. Display the contents of second.txt
          cat second.txt
        4. Display the contents of combined.txt
          cat combined.txt
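The cat steps above can be tried end to end with a short sketch like the following. It assumes a scratch directory created with mktemp (the variable name workdir is only illustrative) and uses the file contents from the Lab section.

```shell
# Work in a scratch directory (an assumption) so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the two lab files with the contents listed in the Lab section.
printf 'Data from first file.\n' > first.txt
printf 'Data from second file.\n' > second.txt

# Concatenate them in argument order and redirect the result to a new file.
cat first.txt second.txt > combined.txt

# With a single argument, cat just writes that file to STDOUT.
cat combined.txt
# Data from first file.
# Data from second file.
```

The redirection (>) is done by the shell, not by cat, so the same pattern works with any command that writes to STDOUT.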
    • join
      • Combines files together
      • Files combined based on fields
      • Useful for building tables
      • Joins on the first field by default; both files must already be sorted on that field
      • Using join:
        1. Display two files together
          join listing1.1.txt listing1.2.txt
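As a rough illustration of the join step, the sketch below recreates the two listing files from the Lab section in a scratch directory (workdir and joined are illustrative names, not part of the original notes) and joins them on the phone number.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Lab data: both files are already sorted on the first field (the phone number).
printf '%s\n' '555-2397 Beckett, Barry' '555-5116 Carter, Gertrude' \
  '555-7929 Jones, Theresa' '555-9871 Orwell, Samuel' > listing1.1.txt
printf '%s\n' '555-2397 unlisted' '555-5116 listed' \
  '555-7929 listed' '555-9871 unlisted' > listing1.2.txt

# Lines whose first field matches are merged; the key is printed once.
joined=$(join listing1.1.txt listing1.2.txt)
printf '%s\n' "$joined"
# 555-2397 Beckett, Barry unlisted
# 555-5116 Carter, Gertrude listed
# 555-7929 Jones, Theresa listed
# 555-9871 Orwell, Samuel unlisted
```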
    • paste
      • Similar to join, except that it merges corresponding lines of each file and separates them with a tab
      • No column is used for comparison
      • Using paste:
        1. Display two files together
          paste listing1.1.txt listing1.2.txt
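To see how paste differs from join, a sketch along these lines may help. It assumes the Lab section's listing files, recreated in a scratch directory; workdir and pasted are illustrative names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '555-2397 Beckett, Barry' '555-5116 Carter, Gertrude' \
  '555-7929 Jones, Theresa' '555-9871 Orwell, Samuel' > listing1.1.txt
printf '%s\n' '555-2397 unlisted' '555-5116 listed' \
  '555-7929 listed' '555-9871 unlisted' > listing1.2.txt

# Line N of the first file is glued to line N of the second with a tab.
# No field is compared, so the phone number appears twice on each line.
pasted=$(paste listing1.1.txt listing1.2.txt)
printf '%s\n' "$pasted"

# -d selects a different delimiter, here a colon.
paste -d ':' listing1.1.txt listing1.2.txt
```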
  • File-Transforming Commands
    • expand
      • Converts tabs into spaces
    • unexpand
      • Converts spaces to tabs (leading spaces only, unless -a is given)
      • The opposite of expand
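The notes give no example for expand and unexpand, so here is a minimal sketch of a round trip. The file name tabs.txt and the scratch directory are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A line that starts with a single tab character.
printf '\tindented\n' > tabs.txt

# expand replaces the tab with spaces up to the next tab stop (every 8 columns by default).
spaces=$(expand tabs.txt)

# -t changes the tab stop interval, here to every 4 columns.
narrow=$(expand -t 4 tabs.txt)

# unexpand turns the leading run of 8 spaces back into a tab.
restored=$(expand tabs.txt | unexpand)
```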
    • od (Octal Dump)
      • Displays a file in Octal (Base 8)
      • Useful for viewing binaries
      • Using od:
        1. Display a file in octal format
          od listing1.2.txt
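Besides the default octal view, od can show bytes in other formats. The sketch below uses a three-byte input instead of a lab file so the result is easy to check by hand; the variable names are illustrative, and exact column spacing can differ between od implementations.

```shell
# Default output: an offset column followed by the data as octal words.
printf 'AB\n' | od

# -c shows each byte as a character or escape; -An drops the offset column.
chars=$(printf 'AB\n' | od -An -c)
printf '%s\n' "$chars"

# -t x1 shows one hexadecimal value per byte: 41 (A), 42 (B), 0a (newline).
hex=$(printf 'AB\n' | od -An -t x1)
printf '%s\n' "$hex"
```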
    • sort
      • Displays data reorganized to suit your needs
      • Using sort:
        1. Display the contents of listing1.1.txt sorted by first name
          sort -k 3 listing1.1.txt
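The -k 3 key works because, with whitespace-separated fields, the phone number is field 1, the last name is field 2, and the first name is field 3. A small sketch using the Lab data (workdir and by_first are illustrative names):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '555-2397 Beckett, Barry' '555-5116 Carter, Gertrude' \
  '555-7929 Jones, Theresa' '555-9871 Orwell, Samuel' > listing1.1.txt

# Sort on the third field (first name): Barry, Gertrude, Samuel, Theresa.
by_first=$(sort -k 3 listing1.1.txt)
printf '%s\n' "$by_first"
# 555-2397 Beckett, Barry
# 555-5116 Carter, Gertrude
# 555-9871 Orwell, Samuel
# 555-7929 Jones, Theresa

# -r reverses the order; with no key, the whole line is compared.
sort -r listing1.1.txt
```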
    • split
      • Divides a file based on criteria
      • Useful for dividing up large files across smaller media
      • Can split by:
        • Bytes
        • Number of lines
      • Output files will have two letters attached to indicate sequence
        • filenameaa
        • filenameab
        • ...
        • filenamezy
        • filenamezz
      • Can use cat to recombine
      • Using split:
        1. Divide a file every 2 lines
          split -l 2 listing1.1.txt numbers
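A quick way to confirm that split pieces recombine cleanly is sketched below. It assumes the Lab section's listing1.1.txt, recreated in a scratch directory; rebuilt.txt is an illustrative file name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '555-2397 Beckett, Barry' '555-5116 Carter, Gertrude' \
  '555-7929 Jones, Theresa' '555-9871 Orwell, Samuel' > listing1.1.txt

# Four lines split every 2 lines gives two pieces: numbersaa and numbersab.
split -l 2 listing1.1.txt numbers
ls numbers*

# Concatenating the pieces in suffix order rebuilds the original.
cat numbersaa numbersab > rebuilt.txt
cmp listing1.1.txt rebuilt.txt && echo 'files match'
```

Splitting by size instead uses -b, for example split -b 1024 to produce 1024-byte pieces.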
    • tr (Translate)
      • Converts or removes characters from a file
      • Using tr:
        1. Replace every instance of B in a file with b. In the same command, replace the characters C and J with the character c (GNU tr pads the shorter second set by repeating its last character, so J also maps to c)
          tr BCJ bc < listing1.1.txt
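The command above depends on how tr treats a second set that is shorter than the first. The sketch below writes both sets at full length so the mapping is explicit, and adds deletion and case conversion for comparison. The sample line comes from the Lab data; the variable names are illustrative.

```shell
line='555-7929 Jones, Theresa'

# Character-by-character mapping: B->b, C->c, J->c.
mapped=$(printf '%s\n' "$line" | tr BCJ bcc)
printf '%s\n' "$mapped"
# 555-7929 cones, Theresa

# -d deletes every character in the set instead of translating it.
no_commas=$(printf '%s\n' "$line" | tr -d ',')
printf '%s\n' "$no_commas"
# 555-7929 Jones Theresa

# Character classes cover whole ranges, here lowercase to uppercase.
upper=$(printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"
# 555-7929 JONES, THERESA
```

tr reads only from STDIN, which is why the original command uses input redirection rather than a file argument.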
    • uniq (Unique)
      • Displays data excluding adjacent duplicate lines, so input is usually sorted first
      • Using uniq:
        1. Display the contents of a file, excluding duplicate entries. Sort the entries alphabetically.
          sort shakespeare.txt | uniq
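The reason for sorting first is that uniq only compares neighbouring lines. The following sketch, using shakespeare.txt from the Lab section in a scratch directory (variable names are illustrative), shows the difference.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'to be' 'or' 'not' 'to be' 'that' 'is' 'the' 'question' > shakespeare.txt

# The two "to be" lines are not adjacent, so uniq alone removes nothing.
unsorted_count=$(uniq shakespeare.txt | wc -l)    # 8 lines

# After sorting, the duplicates are neighbours and one is dropped.
sorted_count=$(sort shakespeare.txt | uniq | wc -l)    # 7 lines

# -c prefixes each line with its number of occurrences.
sort shakespeare.txt | uniq -c

# -d prints only the lines that were repeated.
repeated=$(sort shakespeare.txt | uniq -d)    # to be
```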
  • File-Formatting Commands
    • fmt (Format)
      • Re-wraps paragraphs of text to a fixed line width
      • Defaults to a 75-character width; use -w to change it
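A minimal sketch of fmt with a narrower width follows. The sample sentence and variable names are made up for illustration, and the exact line breaks fmt chooses can vary between implementations, so no expected output is shown.

```shell
text='The quick brown fox jumps over the lazy dog near the quiet river bank'

# Re-wrap the sentence so that no output line is longer than 20 characters.
wrapped=$(printf '%s\n' "$text" | fmt -w 20)
printf '%s\n' "$wrapped"
```

Only the line breaks change; the words and their order stay the same.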
    • nl (Numbered Lines)
      • Adds line numbers to each line
      • Useful for readability
      • Useful for troubleshooting script errors that return a line number
      • Similar to cat -b but with advanced options
      • Using nl:
        1. Number every line of a file, including blank lines (the a style of the -b option)
          nl -b a filename.txt
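The effect of -b a is easiest to see on a file that contains a blank line. This sketch assumes a small sample file, lines.txt, created in a scratch directory; all names are illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three lines, the middle one empty.
printf 'first\n\nthird\n' > lines.txt

# Default: empty lines are skipped, so "third" is numbered 2.
nl lines.txt
default_last=$(nl lines.txt | tail -n 1 | awk '{ print $1 }')

# -b a: every line is numbered, so "third" is numbered 3.
nl -b a lines.txt
all_last=$(nl -b a lines.txt | tail -n 1 | awk '{ print $1 }')
```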
    • pr (Prepare for Printing)
      • Formats a file for output to a line printer
      • Assumes 80 character width and mono-space font
      • Can also set headers, footers, margins, etc.
      • Using pr:
        1. Display the contents of a file double-spaced and with line numbering
          cat -n /etc/profile | pr -d
        2. Repeat step 1, but separate pages with form feeds (-f) and set the page length to 50 lines (-l 50)
          cat -n /etc/profile | pr -dfl 50
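Because /etc/profile differs from system to system, the sketch below uses a two-line sample file instead to show a custom header and header-free double spacing. The file name, header text, and variable names are assumptions for illustration, and the header layout varies between pr implementations.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'line one' 'line two' > sample.txt

# -h replaces the file name in the page header with custom text.
paged=$(pr -h 'Lab listing' sample.txt)
printf '%s\n' "$paged" | head -n 5

# -d double-spaces the text; -t leaves out the page header and trailer.
bare=$(pr -d -t sample.txt)
printf '%s\n' "$bare"
```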
  • File-Viewing Commands
    • head
      • Displays the first 10 lines of a file by default
      • Use -n option to set the number of lines
    • tail
      • Displays the last 10 lines of a file by default
      • Use -n option to set the number of lines
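A short sketch of head and tail on a 20-line file of numbers follows. The file numbers.txt is generated with seq purely for illustration, and the variable names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One number per line, 1 through 20.
seq 1 20 > numbers.txt

# With no options, head prints the first 10 lines.
default_count=$(head numbers.txt | wc -l)

# -n sets how many lines to show from the start or the end.
first_three=$(head -n 3 numbers.txt | tr '\n' ' ')    # 1 2 3
last_two=$(tail -n 2 numbers.txt | tr '\n' ' ')       # 19 20

# Combining both extracts a range, here lines 5 through 7.
middle=$(head -n 7 numbers.txt | tail -n 3 | tr '\n' ' ')    # 5 6 7
```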
    • less
      • Displays the contents of a file
      • Allows for scrolling and searching
      • Replacement for the more command
  • File-Summarizing Commands
    • cut
      • Extracts portions of a file
      • Usually combined with other commands
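      • Using cut:
        1. Display only the MAC address for each network interface on your system (the field number assumes net-tools ifconfig output, where the ether line is indented by eight spaces)
          ifconfig | grep ether | cut -d " " -f 10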
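Field 10 looks surprising until you notice that cut treats every single space as a delimiter, so each leading space produces an empty field. The sketch below shows this on a made-up ether line (the MAC address and layout are assumptions modelled on net-tools output) and on one line of the Lab data.

```shell
# Hypothetical ifconfig line: eight leading spaces, then "ether" and the address.
sample='        ether 00:11:22:33:44:55  txqueuelen 1000  (Ethernet)'

# Fields 1-8 are empty, field 9 is "ether", field 10 is the address.
mac=$(printf '%s\n' "$sample" | cut -d ' ' -f 10)
printf '%s\n' "$mac"
# 00:11:22:33:44:55

entry='555-2397 Beckett, Barry'

# -f 1 keeps the phone number; -f 2- keeps everything from field 2 onward.
phone=$(printf '%s\n' "$entry" | cut -d ' ' -f 1)    # 555-2397
name=$(printf '%s\n' "$entry" | cut -d ' ' -f 2-)    # Beckett, Barry

# -c selects by character position instead of by field.
prefix=$(printf '%s\n' "$entry" | cut -c 1-3)        # 555
```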
    • wc (Word Count)
      • Displays the line, word, and byte counts for a file
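As an illustration, the sketch below runs wc on shakespeare.txt from the Lab section, recreated in a scratch directory (variable names are illustrative). Reading the file through input redirection keeps the file name out of the output.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'to be' 'or' 'not' 'to be' 'that' 'is' 'the' 'question' > shakespeare.txt

# With no options, wc prints lines, words, and bytes, followed by the file name.
wc shakespeare.txt

# Individual counts: -l for lines, -w for words, -c for bytes.
lines=$(wc -l < shakespeare.txt)    # 8
words=$(wc -w < shakespeare.txt)    # 10
bytes=$(wc -c < shakespeare.txt)    # 40
```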

Lab

  • Requirements:
    • Text file named first.txt
      • Contents: Data from first file.
    • Text file named second.txt
      • Contents: Data from second file.
    • Text file named listing1.1.txt
      • Contents:
        555-2397 Beckett, Barry
        555-5116 Carter, Gertrude
        555-7929 Jones, Theresa
        555-9871 Orwell, Samuel
    • Text file named listing1.2.txt
      • Contents:
        555-2397 unlisted
        555-5116 listed
        555-7929 listed
        555-9871 unlisted
    • Text file named shakespeare.txt
      • Contents:
        to be
        or
        not
        to be
        that
        is
        the
        question