bash awk - ghdrako/doc_snipets GitHub Wiki

Check whether any lines have a length other than 1600 bytes

awk '{ print length }' esextrac10a.txt | grep -vx 1600  # -x matches whole lines, so a length such as 16000 is not hidden

AWK

AWK program structure

BEGIN {...}
CONDITION { action }    # do action to line matching condition
CONDITION { action }
END {...}
awk -F, '{print $3}'
awk '{s+=$3} END {print s}' # sum numbers in 3rd column
awk 'length($0) > 80 { print }'  # print lines longer than 80 characters
# running accumulators for multiple types of things - how many processes each user has
ps aux|awk '{count[$1]++} END { for (u in count) { print u, ": ", count[u]}}'
num_lines=$(wc -l "$file" | awk '{print $1}')  # keep only the count, drop the file name
#!/usr/bin/awk -f
# xmin, xmax, ymin and ymax are not set here; the caller has to supply them (for example with -v)
BEGIN { FS = "," }
$1 > ymin && $1 < ymax && $2 > xmin && $2 < xmax {print $2 "," $1 "," $3}
awk -F',' '{print $17,$15,$18,$16}' flightdelays.csv # select and change order of column
awk -F',' '{print $17"___"$15"---"$18","$16}' flightdelays.csv |head
cut -d',' -f15 flightdelays.csv |awk '{sss+=$1} END {print sss}' # sum
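
The bounding-box filter above compares fields against xmin, xmax, ymin and ymax without ever assigning them. A minimal sketch of one way to supply them, assuming the input columns are y,x,value (the sample rows and the limits are made up for illustration):

```shell
# Hypothetical sample data in the assumed layout: y,x,value
data='1,1,a
5,5,b
9,2,c
3,4,d'

# -v assigns an awk variable before any input is read.
inside=$(printf '%s\n' "$data" |
  awk -F, -v xmin=0 -v xmax=6 -v ymin=2 -v ymax=8 \
    '$1 > ymin && $1 < ymax && $2 > xmin && $2 < xmax { print $2 "," $1 "," $3 }')

printf '%s\n' "$inside"
```

Only the rows that fall strictly inside both ranges are printed, with the x and y columns swapped as in the original script.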

Awk's Programming Model

An awk program is built around what we will call a main input loop. A loop is a routine that is executed over and over again until some condition terminates it. The main input loop in awk reads one line of input from a file and makes it available for processing. The actions you write to do the processing assume that a line of input is available. The main input loop is executed as many times as there are lines of input.

Awk allows you to write two special routines that can be executed before any input is read and after all input is read. These are the procedures associated with the BEGIN and END rules, respectively.
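
A minimal sketch of that order of execution, using two made-up input lines: the BEGIN action runs once before any input, the unconditioned action runs once per line, and the END action runs once after the last line.

```shell
order=$(printf 'alpha\nbeta\n' |
  awk 'BEGIN { print "start" }            # before the first line is read
       { print NR ": " $0 }               # once per input line
       END { print "lines read: " NR }')  # after the last line

printf '%s\n' "$order"
```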

Awk assumes that its input is structured. By default, it treats each input line as a record and each word, separated by spaces or tabs, as a field.

awk '{ print $2, $1, $3 }' names
awk -F"\t" '{ print $2 }' names
echo a b c d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'  # print c
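
The default splitting behaves differently from an explicit single-character separator. A small sketch with made-up input: with the default FS, leading blanks are ignored and a run of spaces or tabs counts as one separator, while with -F, every comma separates, so empty fields are counted.

```shell
# Default FS: "  a   b<TAB>c" has three fields.
default_nf=$(printf '  a   b\tc\n' | awk '{ print NF }')

# FS set to a comma: "a,,b," has four fields, two of them empty.
comma_nf=$(printf 'a,,b,\n' | awk -F, '{ print NF }')

echo "$default_nf $comma_nf"
```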

Built-in Variables that Control awk

Name        Default   Description
FS          " "       input field separator (similar to the shell's IFS)
RS          "\n"      input record separator
OFS         " "       output field separator
ORS         "\n"      output record separator
SUBSEP      "\034"    separator that joins the subscripts of a multi-dimensional array index
IGNORECASE  0         gawk extension: when nonzero, regex and string matching ignore case
FILENAME              name of the file that awk is currently reading
FNR                   current record number in the current file
NF                    number of fields in the current input record
NR                    number of input records awk has processed since the beginning of the program's execution
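
The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two throwaway files created in a temporary directory (the file names are hypothetical):

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# Run inside the directory so that FILENAME shows the short names.
report=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)
rm -rf "$tmpdir"

printf '%s\n' "$report"
```

NR keeps counting across files, while FNR restarts at 1 for each new file.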

Print a blank line after each line of a file by changing the ORS, from default of one newline to two newlines

$ cat columns.txt | awk 'BEGIN { ORS = "\n\n" } ; { print $0 }'
$ x="a b c d e"
$ echo $x | awk -F" " '{print $1}'
a
$ echo $x | awk -F" " '{print NF}'
5
$ echo $x | awk -F" " '{print $0}'
a b c d e
$ echo $x | awk -F" " '{print $3, $1}'
c a

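Set FS (the field separator) to an empty string so that every character becomes its own field; NF is then the length of the string. This works in gawk and mawk, but POSIX does not define the behavior of an empty FS.

$ echo abcde | awk 'BEGIN { FS = "" } ; { print NF }'
5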
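
For comparison, a portable sketch of the same calculation using the built-in length function, which does not depend on how an empty FS is handled:

```shell
# length($0) returns the number of characters in the current record.
n=$(echo abcde | awk '{ print length($0) }')
echo "$n"
```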

Equivalent ways to specify test.txt as the input file for an awk command:

awk < test.txt '{ print $1 }'
awk '{ print $1 }' < test.txt
awk '{ print $1 }' test.txt
cat test.txt | awk '{ print $1 }'  # works, but the extra cat process is unnecessary

Format output - align column

$ cat columns2.txt
one two
three four
one two three four
five six
one two three
four five

$ awk '
{
  # left-align $1 on a 10-char column
  # right-align $2 on a 10-char column
  # right-align $3 on a 10-char column
  # right-align $4 on a 10-char column
  printf("%-10s*%10s*%10s*%10s*\n", $1, $2, $3, $4)
}
' columns2.txt

one       *       two*          *          *
three     *      four*          *          *
one       *       two*     three*      four*
five      *       six*          *          *
one       *       two*     three*          *
four      *      five*          *          *
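
The same printf mechanism also handles numbers. A small sketch with made-up data, where %-8s left-aligns a name in 8 characters and %6.2f right-aligns a number in 6 characters with two decimals (LC_ALL=C is set only to pin the decimal point):

```shell
table=$(printf 'apple 1.5\nkiwi 12\n' |
  LC_ALL=C awk '{ printf("%-8s|%6.2f|\n", $1, $2) }')

printf '%s\n' "$table"
```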

CONDITIONAL LOGIC AND CONTROL STATEMENTS

echo "" | awk '
BEGIN { x = 10 }
{
  if (x % 2 == 0) {
      print "x is even"
  }
  else {
      print "x is odd"
  }
}'

The preceding code block initializes the variable x with the value 10 and prints “x is even” if x is divisible by 2, otherwise it prints “x is odd.”
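
Loops follow the same C-like syntax as the if/else above. A minimal sketch, using made-up input, that walks every field of each record with a for loop:

```shell
sums=$(printf '1 2 3\n10 20\n' |
  awk '{
         total = 0
         for (i = 1; i <= NF; i++)   # visit fields 1..NF of this record
             total += $i
         print NF " fields, sum " total
       }')

printf '%s\n' "$sums"
```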

Field Splitting

FS = "[':\t]"
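
When FS is longer than one character it is treated as a regular expression, so the bracket expression above splits on a single quote, a colon, or a tab. A sketch with a made-up input line; the octal escape \047 stands in for the single quote only to keep the shell quoting simple:

```shell
fields=$(printf "a:b'c\td\n" |
  awk 'BEGIN { FS = "[\047:\t]" }   # same set as above: quote, colon, tab
       { print NF; print $3 }')

printf '%s\n' "$fields"
```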

Escape sequence

Sequence   Description
\ddd       Character represented as a 1- to 3-digit octal value
\xhex      Character represented as a hexadecimal value
\c         Any literal character c (e.g., \" for ")
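
A short sketch of these escapes inside awk string constants. It sticks to octal and \" because, as far as I know, hexadecimal escapes are not available in every awk:

```shell
escaped=$(awk 'BEGIN {
    print "\101\102\103"    # octal codes for A, B, C
    print "say \"hi\""      # escaped double quotes
    print "tab\there"       # \t is a tab character
}')

printf '%s\n' "$escaped"
```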

Variable

A variable is an identifier that references a value. Each variable has both a string value and a numeric value, and awk uses whichever is appropriate for the context of the expression. (A string that does not look like a number has the numeric value 0.) Awk automatically initializes every variable to the empty string, which acts like 0 when used as a number.

# Count blank lines in a file
/^$/ {
    print ++x    # running count, printed each time a blank line is seen
}
END {
    print x      # final count
}
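
The string/number duality described above can be checked directly. A minimal sketch; the variable u is deliberately never assigned:

```shell
duality=$(awk 'BEGIN {
    is_zero  = (u == 0)     # an unset variable compares equal to 0 ...
    is_empty = (u == "")    # ... and to the empty string
    print is_zero, is_empty
    print "abc" + 0         # a non-numeric string counts as 0
    print "10" + 5          # a numeric string is converted to a number
    print 10 " apples"      # a number is converted to a string
}')

printf '%s\n' "$duality"
```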

awk -f awkscr test
awk 'BEGIN { print "Hello, world" }' # BEGIN pattern specifies actions that are performed before the first line of input is read.
awk '$1 ~ /pattern/ { ... }' infile      # match lines to regex pattern
awk '{if($1 ~ /pattern/) { ... }}' infile # matching for conditions
awk '{print $(NF - 1)}' # print the second-to-last field
awk '{if (length!=850) { print NR}}' MRCP20160428_2
awk '{if (length!=850) { print }}' MRCP20160428_2 > bledne_linie.txt
awk '{print length}' MRCP20160427_2|sort|uniq -
awk '{action}' your_file_name.txt
awk '/regex pattern/{action}' your_file_name.txt
awk '{print $0}' information.txt  # equivalent cat command - show file
awk '{print NR,$0}' information.txt  # add line-number to each line

# By default, awk treats runs of spaces and tabs as the column separator.
awk '{print $1}' information.txt # print first column
awk '{print $2}' information.txt # print second column
awk '{print $1, $4}' information.txt
awk '{print $NF}' information.txt  # print last column
awk '{print $1}' information.txt | head -1  # first column first line
awk '{print $1}' information.txt | head -2 

awk '/^O/' information.txt
awk '/0$/' information.txt 
awk '! /0$/' information.txt 
awk ' /io/{print $0}' information.txt  # print lines that contain io
awk '/IT/' information.txt 
awk '/IT/{print $1, $2}' information.txt  # first and last names of the people working in IT
awk '/N\/A$/' information.txt  # find lines that end with the pattern N/A
awk '$3 <  40 { print $0 }' information.txt # print the full record of every employee under the age of 40
# Contents of information.txt used in the examples above:
firstName     lastName        age     city       ID

Thomas        Shelby          30      Rio        400
Omega         Night           45      Ontario    600
Wood          Tinker          54      Lisbon     N/A
Giorgos       Georgiou        35      London     300
Timmy         Turner          32      Berlin     N/A
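
A runnable sketch of two of the filters above against a temporary copy of that sample table. The extra NR > 1 && NF > 0 guards are my addition: they skip the header and the blank line, whose missing third field would otherwise be treated as 0 in the numeric comparison.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
firstName     lastName        age     city       ID

Thomas        Shelby          30      Rio        400
Omega         Night           45      Ontario    600
Wood          Tinker          54      Lisbon     N/A
Giorgos       Georgiou        35      London     300
Timmy         Turner          32      Berlin     N/A
EOF

# First names of everyone under 40, ignoring the header and the blank line.
under40=$(awk 'NR > 1 && NF > 0 && $3 < 40 { print $1 }' "$tmp")

# Last names on lines that end with N/A.
no_id=$(awk '/N\/A$/ { print $2 }' "$tmp")
rm -f "$tmp"

printf '%s\n' "$under40" "$no_id"
```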
$ awk -F ',' '{sp=$9 * $10;cp=$9 * $11; {printf "%f,%f,%s,%s \n",
sp,cp,$1,$2 }}' sales_100.csv

# Return all rows whose column 1 starts with A
$ awk -F ',' '$1 ~ /^A/ {print}' sales_100.csv
# Return all rows whose column 1 contains whitespace (\s is a gawk extension)
$ awk -F ',' '$1 ~ /\s/ {print}' sales_100.csv
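
Because \s is not understood by every awk, a bracket expression listing a space and a tab is a more portable way to write the same test. A sketch with made-up rows:

```shell
spaced=$(printf 'New York,1\nParis,2\n' |
  awk -F, '$1 ~ /[ \t]/ { print $1 }')   # column 1 contains a space or a tab

printf '%s\n' "$spaced"
```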

AWK can also change the column and row delimiters used for output:

  • OFS : Output Field Separator
  • ORS : Output Record Separator
$ awk -F ',' 'BEGIN{OFS="|";ORS="\n\n"} $1 ~ /^A/ {print substr($1,1,4),$2,$3,$4,$5}'  sales_100.csv
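
One detail worth knowing: OFS is applied when print receives several comma-separated arguments (as in the command above) or when the record is rebuilt, not when $0 is printed unchanged. A minimal sketch with a made-up line, where assigning a field to itself forces the rebuild:

```shell
# $0 untouched: the original commas survive.
asis=$(echo 'a,b,c' | awk -F, 'BEGIN { OFS = "|" } { print }')

# $1 = $1 rebuilds $0 and joins the fields with OFS.
rebuilt=$(echo 'a,b,c' | awk -F, 'BEGIN { OFS = "|" } { $1 = $1; print }')

echo "$asis $rebuilt"
```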

Built-in functions

 awk -F ',' 'BEGIN{OFS="|";ORS="\n"} $1 ~ /^A/ {print tolower(substr($1,1,4)),tolower($2),$3,$4,$5}'  sales_100.csv
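
A short sketch of a few more of the standard string functions, run on a made-up string in a BEGIN block so that no input file is needed:

```shell
funcs=$(awk 'BEGIN {
    s = "Hello, World"
    print length(s)                  # number of characters
    print toupper(s)                 # upper-case copy
    print substr(s, 8, 5)            # 5 characters starting at position 8
    print index(s, "World")          # position of the first match, 0 if none
    n = split("a:b:c", parts, ":")   # fills parts[1..n] and returns n
    print n, parts[2]
    t = s
    gsub(/o/, "0", t)                # replace every o in t
    print t
}')

printf '%s\n' "$funcs"
```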

Put the business logic in a separate file and run it with awk -f

$ vi businesslogic.awk
{sp=$9 * $10;cp=$9 * $11; {printf "%f,%f,%s,%s \n",sp,cp,$1,$2 }}

$ awk -F ',' -f businesslogic.awk sales_100.csv
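
An end-to-end sketch of the same idea that can be run without sales_100.csv. The file names and the four-column layout (name,units,unit_price,unit_cost) are assumptions made for this example, not the layout of the real file:

```shell
tmpdir=$(mktemp -d)

# Hypothetical data: name,units,unit_price,unit_cost
printf 'widget,3,2.50,1.00\ngadget,2,10.00,7.50\n' > "$tmpdir/sales.csv"

# The awk program lives in its own file and is loaded with -f.
cat > "$tmpdir/logic.awk" <<'EOF'
# revenue and cost per row
{
    revenue = $2 * $3
    cost    = $2 * $4
    printf "%s,%.2f,%.2f\n", $1, revenue, cost
}
EOF

out=$(LC_ALL=C awk -F, -f "$tmpdir/logic.awk" "$tmpdir/sales.csv")
rm -rf "$tmpdir"

printf '%s\n' "$out"
```

Keeping the program in a file avoids shell quoting problems and lets the same logic be reused with different separators or inputs.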