bash awk - ghdrako/doc_snipets GitHub Wiki
- https://en.wikipedia.org/wiki/AWK
- https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
- https://github.com/onetrueawk/awk
- https://www.grymoire.com/Unix/Awk.html
- https://www.grymoire.com/Unix/Awk.html#uh-22
- https://earthly.dev/blog/awk-examples/
- https://ferd.ca/awk-in-20-minutes.html
- https://ketancmaheshwari.github.io/posts/2020/05/24/SMC18-Data-Challenge-4.html
Check whether any lines have a length other than 1600 bytes:
cat esextrac10a.txt|awk '{ print length }'|grep -v 1600
AWK
AWK program structure
BEGIN {...}
CONDITION { action } # do action to line matching condition
CONDITION { action }
END {...}
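A minimal sketch that uses all three parts (the file name and the 10-character threshold are just placeholders):
awk 'BEGIN { print "long lines:" }               # runs once, before any input
     length($0) > 10 { count++; print NR": "$0 } # CONDITION { action }, per matching line
     END { print "total:", count+0 }             # runs once, after the last line
' somefile.txt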
awk -F, '{print $3}'
awk '{s+=$3} END {print s}' # sum numbers in 3rd column
awk 'length($0) > 80 { print }' # print lines longer than 80 characters
# running accumulators for multiple types of things - how many processes each user has
ps aux|awk '{count[$1]++} END { for (u in count) { print u, ": ", count[u]}}'
num_lines=$(wc -l "$file" | awk '{print $1}') # line count without the trailing filename
#!/usr/bin/awk -f
BEGIN { FS = "," }
$1 > ymin && $1 < ymax && $2 > xmin && $2 < xmax {print $2 "," $1 "," $3}
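The bounds above (xmin, xmax, ymin, ymax) are never set inside the script, so they have to be supplied at run time, e.g. with -v. A sketch; the script file name, data file, and coordinate values here are made up:
# -v assignments take effect before the BEGIN block runs
awk -v ymin=40.0 -v ymax=41.0 -v xmin=-74.5 -v xmax=-73.5 -f filter.awk points.csv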
awk -F',' '{print $17,$15,$18,$16}' flightdelays.csv # select and change order of column
awk -F',' '{print $17"___"$15"---"$18","$16}' flightdelays.csv |head
cut -d',' -f15 flightdelays.csv |awk '{sss+=$1} END {print sss}' # sum
Awk's Programming Model
An awk program consists of what we will call a main input loop. A loop is a routine that is executed over and over again until some condition terminates it. The main input loop in awk is a routine that reads one line of input from a file and makes it available for processing; the actions you write to do the processing assume that there is a line of input available. The main input loop is executed as many times as there are lines of input.
Awk also lets you write two special routines that are executed before any input is read and after all input is read. These are the procedures associated with the BEGIN and END rules, respectively.
Awk assumes that its input is structured. By default it takes each input line as a record and each word, separated by spaces or tabs, as a field.
awk '{ print $2, $1, $3 }' names
awk -F"\t" '{ print $2 }' names
echo a b c d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }' # print c
Built-in Variables that Control awk
Name | Default | Description |
---|---|---|
FS | " " | field separator (similar to delimiters/IFS) |
RS | "\n" | record separator |
OFS | " " | output field separator |
ORS | "\n" | output record separator |
SUBSEP | "\034" | separator used to build indices of simulated multidimensional arrays |
IGNORECASE | 0 | (gawk only) if nonzero, string and regexp operations ignore case |
FILENAME | | name of the file that awk is currently reading |
FNR | | current record number in the current file |
NF | | the number of fields in the current input record |
NR | | the number of input records awk has processed since the beginning of the program's execution |
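A quick way to watch several of these at once (the file names are placeholders); FNR resets at the start of each file while NR keeps counting across files:
awk '{ print FILENAME, "FNR=" FNR, "NR=" NR, "NF=" NF }' file1.txt file2.txt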
Print a blank line after each line of a file by changing ORS from the default of one newline to two newlines:
$ cat columns.txt | awk 'BEGIN { ORS = "\n\n" } { print $0 }'
$ x="a b c d e"
$ echo $x | awk -F' ' '{print $1}'
a
$ echo $x | awk -F' ' '{print NF}'
5
$ echo $x | awk -F' ' '{print $0}'
a b c d e
$ echo $x | awk -F' ' '{print $3, $1}'
c a
Change the FS (field separator) to an empty string to calculate the length of a string (each character becomes its own field):
$ echo abcde | awk 'BEGIN { FS = "" } { print NF }'
5
Equivalent ways to specify test.txt as the input file for an awk command:
awk < test.txt '{ print $1 }'
awk '{ print $1 }' < test.txt
awk '{ print $1 }' test.txt
cat test.txt | awk '{ print $1 }' # inefficient, avoid
Format output - align columns
$ cat columns2.txt
one two
three four
one two three four
five six
one two three
four five
$ awk '
{
    # left-align $1 on a 10-char column
    # right-align $2 on a 10-char column
    # right-align $3 on a 10-char column
    # right-align $4 on a 10-char column
    printf("%-10s*%10s*%10s*%10s*\n", $1, $2, $3, $4)
}
' columns2.txt
one       *       two*          *          *
three     *      four*          *          *
one       *       two*     three*      four*
five      *       six*          *          *
one       *       two*     three*          *
four      *      five*          *          *
CONDITIONAL LOGIC AND CONTROL STATEMENTS
echo "" | awk '
BEGIN { x = 10 }
{
    if (x % 2 == 0) {
        print "x is even"
    }
    else {
        print "x is odd"
    }
}'
The preceding code block initializes the variable x with the value 10 and prints “x is even” if x is divisible by 2, otherwise it prints “x is odd.”
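The heading also mentions control statements; here is a minimal sketch of awk's for and while loops (standard awk syntax, run on a single empty input line so the rule fires once):
echo "" | awk '
{
    for (i = 1; i <= 3; i++) print "for:", i    # for loop
    n = 3
    while (n > 0) { print "while:", n; n-- }    # while loop
}'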
Field Splitting
FS = "[':\t]" # fields are separated by a single quote, a colon, or a tab
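A sketch of what that separator does (the input line is made up; note the shell quoting needed for the embedded single quote, and that adjacent separators produce empty fields):
printf "name:'Alice'\t42\n" | awk 'BEGIN { FS = "['\'':\t]" } { print NF, $1, $3, $5 }'
# prints: 5 name Alice 42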
Escape sequence
Sequence | Description |
---|---|
\ddd | Character represented as 1 to 3 digit octal value |
\xhex | Character represented as hexadecimal value (common extension, not in POSIX awk) |
\c | Any literal character c (e.g., \" for ") |
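A quick check of the escapes from the table (\101, \102, \103 are octal for A, B, C):
awk 'BEGIN { print "col1\tcol2"; print "\101\102\103" }'
# prints: col1<TAB>col2, then ABC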
Variables
A variable is an identifier that references a value. Each variable has both a string value and a numeric value, and awk uses whichever one the context of the expression calls for (a string that does not look like a number has the numeric value 0). Awk automatically initializes variables to the empty string, which acts like 0 when used as a number.
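A minimal sketch of that string/number duality (the input is made up; x is never assigned, so it is the empty string, i.e. 0 in numeric context):
echo "abc 5" | awk '{ print $1 + 1, $2 + 1, x + 1, "x=[" x "]" }'
# prints: 1 6 1 x=[]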
# Count blank lines in file
/^$/ {
    print ++x   # print the running count at each blank line
}
END {
    print x     # total number of blank lines
}
awk -f awkscr test # run the program stored in the file awkscr on the input file test
awk 'BEGIN { print "Hello, world" }' # BEGIN pattern specifies actions that are performed before the first line of input is read.
awk '$1 ~ /pattern/ { ... }' infile # match lines to regex pattern
awk '{if($1 ~ /pattern/) { ... }}' infile # matching for conditions
awk '{print $(NF - 1)}' # print the next-to-last field
awk '{if (length!=850) { print NR}}' MRCP20160428_2
awk '{if (length!=850) { print }}' MRCP20160428_2 > bledne_linie.txt
awk '{print length}' MRCP20160427_2 | sort | uniq -c # count how many lines have each length
awk '{action}' your_file_name.txt
awk '/regex pattern/{action}' your_file_name.txt
awk '{print $0}' information.txt # equivalent cat command - show file
awk '{print NR,$0}' information.txt # add line-number to each line
# by default, awk splits each line into columns on whitespace.
awk '{print $1}' information.txt # print first column
awk '{print $2}' information.txt # print second column
awk '{print $1, $4}' information.txt
awk '{print $NF}' information.txt # print last column
awk '{print $1}' information.txt | head -1 # first column first line
awk '{print $1}' information.txt | head -2
awk '/^O/' information.txt # lines starting with O
awk '/0$/' information.txt # lines ending with 0
awk '! /0$/' information.txt # lines not ending with 0
awk '/io/{print $0}' information.txt # lines containing "io"
awk '/IT/' information.txt
awk '/IT/{print $1, $2}' information.txt # first and last names of the people working in IT
awk '/N\/A$/' information.txt # find lines that end with the pattern N/A
awk '$3 < 40 { print $0 }' information.txt # find all the information of employees that were under the age of 40
information.txt (shown with line numbers):
1 firstName lastName age city ID
2
3 Thomas Shelby 30 Rio 400
4 Omega Night 45 Ontario 600
5 Wood Tinker 54 Lisbon N/A
6 Giorgos Georgiou 35 London 300
7 Timmy Turner 32 Berlin N/A
$ awk -F ',' '{sp=$9 * $10; cp=$9 * $11; printf "%f,%f,%s,%s \n", sp, cp, $1, $2}' sales_100.csv
# Return all rows starting with A in Column 1
$ awk -F ',' '$1 ~ /^A/ {print}' sales_100.csv
# Return all rows that have a space in Column 1
$ awk -F ',' '$1 ~ /\s/ {print}' sales_100.csv # \s is a gawk extension; POSIX awk uses [[:space:]]
AWK can also change the output field and record delimiters:
- OFS : Output Field Separator
- ORS : Output Record Separator
$ awk -F ',' 'BEGIN{OFS="|";ORS="\n\n"} $1 ~ /^A/ {print substr($1,1,4),$2,$3,$4,$5}' sales_100.csv
Built-in functions
awk -F ',' 'BEGIN{OFS="|";ORS="\n"} $1 ~ /^A/ {print tolower(substr($1,1,4)),tolower($2),$3,$4,$5}' sales_100.csv
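A few more standard built-ins (length, toupper, split, gsub), shown on a made-up input line:
echo "alpha,beta,gamma" | awk -F ',' '{
    print length($0)            # 16 - length of the whole record
    print toupper($2)           # BETA
    n = split($0, parts, ",")   # split into an array; n is 3
    print n, parts[3]           # 3 gamma
    gsub(/a/, "A")              # replace every "a" in $0
    print $0                    # AlphA,betA,gAmmA
}'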
You can also put the business logic in a separate file and call it with awk -f:
$ vi businesslogic.awk
{sp=$9 * $10;cp=$9 * $11; {printf "%f,%f,%s,%s \n",sp,cp,$1,$2 }}
$ awk -F ',' -f businesslogic.awk sales_100.csv