Task 10.1: The Wild and Weird awk Command
The general form of the command is awk '{ commands }'.
Two flags to awk are used here:
-f file specifies that the instructions should be read from the file file rather than from the command line.
-Fc indicates that the program should treat the character c as the separator between fields of information, rather than the default of white space.
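As a small illustration of the -F flag, the sketch below feeds awk one made-up, colon-separated record (the account name and paths are invented for the example) and prints field one twice: once with the default white-space separator and once with -F:.

```shell
# One made-up passwd-style record; the values are illustrative only.
record='guest:x:500:500:Guest User:/home/guest:/bin/sh'

# Default separator (white space): the only space in the record sits
# inside "Guest User", so field one runs all the way up to "Guest".
default_first=$(printf '%s\n' "$record" | awk '{ print $1 }')

# With -F: the colon is the separator, so field one is the account name.
colon_first=$(printf '%s\n' "$record" | awk -F: '{ print $1 }')

echo "default: $default_first"
echo "with -F: $colon_first"
```

The first echo shows most of the record, and the second shows only the account name.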
1.
$ who | awk '{ print }'
root console Nov 9 07:31
yuenca ttyAo Nov 27 17:39
limyx4 ttyAp Nov 27 16:22
wifey ttyAx Nov 27 17:16
tobster ttyAz Nov 27 17:59
taylor ttyqh Nov 27 17:43 (vax1.umkc.edu)
A line of input is broken into specific fields of information, each field being assigned a unique identifier. Field one is $1, field two $2, and so on:
$ who | awk '{ print $1 }'
root
yuenca
limyx4
wifey
tobster
taylor
The good news is that you also can specify any other information to print by surrounding it with double quotes:
$ who | awk '{ print "User " $1 " is on terminal line " $2 }'
User root is on terminal line console
User yuenca is on terminal line ttyAo
User limyx4 is on terminal line ttyAp
User hawk is on terminal line ttyAw
User wifey is on terminal line ttyAx
User taylor is on terminal line ttyqh
2.
$ grep taylor /etc/passwd | awk -F: '{ print "User " $1 " has " $7 " as their login shell." }'
User taylorj has /bin/csh as their login shell.
User mtaylor has /usr/local/bin/tcsh as their login shell.
User dataylor has /usr/local/lib/msh as their login shell.
User taylorjr has /bin/csh as their login shell.
User taylorrj has /bin/csh as their login shell.
User taylormx has /bin/csh as their login shell.
User taylor has /bin/csh as their login shell.
3.
The next question: how many different login shells are used at my site, and which one is most popular?
$ awk -F: '{print $7}' /etc/passwd | sort | uniq -c
2
3365 /bin/csh
1 /bin/false
84 /bin/ksh
21 /bin/sh
11 /usr/local/bin/ksh
353 /usr/local/bin/tcsh
45 /usr/local/lib/msh
4.
Sticking with the password file, notice that the names therein are all in first-name-then-last-name format. That is, the name field for my account is Dave Taylor,,,,. A common requirement is to generate a report of system users sorted by name, but by last name rather than first name.
$ grep taylor /etc/passwd | awk -F: '{print $5}'
James Taylor,,,,
Mary Taylor,,,,
Dave Taylor,,,,
James Taylor,,,,
Robert Taylor,,,,
Melanie Taylor,,,,
Dave Taylor,,,,
$ grep taylor /etc/passwd | awk -F: '{print $5}' | sed 's/,//g' | awk '{print $2", "$1}' | sort
Taylor, Dave
Taylor, Dave
Taylor, James
Taylor, James
Taylor, Mary
Taylor, Melanie
Taylor, Robert
Note: white space is the default field separator for awk, so the second awk in the pipeline needs no -F flag.
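The same reordering can also be done in a single awk invocation with the built-in split function instead of the sed step. This is only a sketch of an alternative, not the approach used in the text, and the two records are made up for illustration.

```shell
# Made-up passwd-style records for illustration.
sorted=$(printf '%s\n' \
  'taylor:x:101:100:Dave Taylor,,,,:/home/taylor:/bin/csh' \
  'mtaylor:x:102:100:Mary Taylor,,,,:/home/mtaylor:/bin/csh' |
  awk -F: '{
    split($5, gecos, ",")            # gecos[1] holds the full name
    n = split(gecos[1], name, " ")   # split the name on white space
    print name[n] ", " name[1]       # last word, then first word
  }' | sort)

echo "$sorted"
```

Taking name[n] rather than name[2] means a middle name does not end up in the last-name position.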
5.
The script earlier that looked for the login shell isn’t quite correct. It turns out that if the user wants to have /bin/sh—the Bourne shell—as his or her default shell, the final field can be left blank:
joe:?:45:555:Joe-Bob Billiard,,,,:/home/joe:
The NF variable helps here:
Used without a dollar sign, NF indicates how many fields are on a line.
Used with a dollar sign ($NF), it is always the value of the last field on the line itself.
$ who | head -3 | awk '{ print NF }'
5
5
5
$ who | head -3 | awk '{ print $NF }'
07:31
16:22
18:21
$ awk -F: '{print $NF}' /etc/passwd | sort | uniq -c
3365 /bin/csh
1 /bin/false
84 /bin/ksh
21 /bin/sh
11 /usr/local/bin/ksh
353 /usr/local/bin/tcsh
45 /usr/local/lib/msh
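One way to account for the blank final field is to test $NF and substitute /bin/sh when it is empty. The sketch below assumes that an empty field always means the Bourne shell, as described above. It uses the joe record from the text plus one made-up record.

```shell
# The first record is made up; the second is the joe record from the text.
shells=$(printf '%s\n' \
  'ann:x:201:100:Ann Example,,,,:/home/ann:/bin/csh' \
  'joe:?:45:555:Joe-Bob Billiard,,,,:/home/joe:' |
  awk -F: '{
    shell = $NF
    if (shell == "") shell = "/bin/sh"   # blank means the Bourne shell
    print $1 ": " NF " fields, shell " shell
  }')

echo "$shells"
```

Both records report seven fields, because the trailing colon still delimits an empty seventh field.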
6.
NR keeps track of the number of records (or lines) read so far. Here’s a quick way to number a file:
$ ls -l | awk '{ print NR": "$0 }'
1: total 29
2: drwx------ 2 taylor 512 Nov 21 10:39 Archives/
3: drwx------ 3 taylor 512 Nov 16 21:55 InfoWorld/
4: drwx------ 2 taylor 1024 Nov 27 18:02 Mail/
5: drwx------ 2 taylor 512 Oct 6 09:36 News/
6: drwx------ 3 taylor 512 Nov 21 12:39 OWL/
7: drwx------ 2 taylor 512 Oct 13 10:45 bin/
8: -rw-rw---- 1 taylor 12556 Nov 16 09:49 keylime.pie
9: -rw------- 1 taylor 11503 Nov 27 18:05 randy
10: drwx------ 2 taylor 512 Oct 13 10:45 src/
11: drwxrwx--- 2 taylor 512 Nov 8 22:20 temp/
12: -rw-rw---- 1 taylor 0 Nov 27 18:29 testme
Here you can see that field zero of a line, $0, is the entire line.
$ who | awk '{ print $2": "$0 }'
ttyAp: limyx4 ttyAp Nov 27 16:22
ttyAt: ltbei ttyAt Nov 27 18:21
ttyAu: woodson ttyAu Nov 27 18:19
ttyAv: morning ttyAv Nov 27 18:19
ttyAw: hawk ttyAw Nov 27 18:12
ttyAx: wifey ttyAx Nov 27 17:16
ttyAz: wiwatr ttyAz Nov 27 18:22
ttyAA: chong ttyAA Nov 27 13:56
ttyAB: ishidahx ttyAB Nov 27 18:20
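A minimal, self-contained sketch of NR and $0 together, using two made-up input lines rather than real ls or who output:

```shell
# Two made-up input lines; NR counts them as they are read.
numbered=$(printf 'alpha beta\ngamma delta\n' | awk '{ print NR ": " $0 }')

echo "$numbered"
```

Each output line is the record number, a colon, and then the unmodified input line.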
7.
$ ls -lF | awk '{ print $9 " " $5 }'
Archives/ 512
InfoWorld/ 512
Mail/ 1024
News/ 512
OWL/ 512
bin/ 512
keylime.pie 12556
randy 11503
src/ 512
temp/ 512
testme 582
Two special character sequences can be embedded in the quoted arguments to print:
\n Generates a newline
\t Generates a tab character
$ ls -lF | awk '{ print $5 "\t" $9 }'
512 Archives/
512 InfoWorld/
1024 Mail/
512 News/
512 OWL/
512 bin/
12556 keylime.pie
11503 randy
512 src/
512 temp/
582 testme
$ ls -l | awk '{ print $5 "\t" $9 }' | sort -rn | head -5
12556 keylime.pie
11503 randy
1024 Mail/
582 testme
512 temp/
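The size-first, tab-separated layout is what makes sort -rn work, because the number leads each line. Here is a small sketch with made-up name and size pairs standing in for ls output:

```shell
# Made-up "name size" pairs standing in for ls output.
largest=$(printf 'notes 582\nkeylime.pie 12556\nrandy 11503\n' |
  awk '{ print $2 "\t" $1 }' |   # size, a tab, then the name
  sort -rn |                     # largest size first
  head -2)

echo "$largest"
```

Note the backslash in "\t". A forward slash would print the two characters literally rather than a tab.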
8.
The awk program basically looks for a pattern to appear in a line and then, if the pattern is found, executes the instructions that follow the pattern in the awk script. There are two special patterns in awk: BEGIN and END.
The instructions that follow BEGIN are executed before any lines of input are read.
The instructions that follow END are executed only after all the input has been read.
This can be very useful for computing the sum of a series of numbers. For example, I’d like to know the total number of bytes I’m using for all my files:
$ ls -l | awk '{print $5}'
512
512
1024
512
512
512
12556
11503
512
512
582
$ ls -l | awk '{ totalsize += $5; print totalsize }'
512
1024
2048
2560
3072
3584
16140
27643
28155
28667
29249
$ ls -l | awk '{ totalsize += $5; print totalsize }' | tail -1
29249
$ ls -l | awk '{ totalsize += $5 } END { print totalsize }'
29249
$ ls -l | awk '{ totalsize += $5 } END { print "You have a total of " totalsize " bytes used in files." }'
You have a total of 29249 bytes used in files.
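BEGIN can be combined with END in the same program, for instance to print a heading before the first line is read. The sketch below uses three made-up sizes instead of real ls output:

```shell
# Three made-up sizes, one per line.
report=$(printf '512\n1024\n12556\n' |
  awk 'BEGIN { print "Sizes:" }             # runs before any input is read
       { total += $1; print "  " $1 }       # runs once per input line
       END { print "Total: " total }')      # runs after the last line

echo "$report"
```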
9.
$ ls -l | awk '{ totalsize += $5 } END { print "You have a total of " totalsize " bytes used across " NR " files." }'
You have a total of 29249 bytes used across 11 files.
An easier way to see all this is to create an awk program file:
$ cat << 'EOF' > script
>{ totalsize += $5 } END { print "You have a total of " totalsize " bytes used across " NR " files." }
>EOF
$ ls -l | awk -f script
You have a total of 29249 bytes used across 11 files.
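When an awk program is written to a file with a here-document, quoting the delimiter matters: with an unquoted delimiter the shell expands $5 (or any other $ reference) before the text reaches the file. The sketch below assumes mktemp -d is available for a scratch directory. The file name sum.awk is hypothetical.

```shell
workdir=$(mktemp -d)

# Quoting the delimiter ('EOF') keeps the shell from expanding $1
# before the text is written to the file.
cat << 'EOF' > "$workdir/sum.awk"
{ total += $1 }
END { print "total " total " across " NR " lines" }
EOF

# Run the saved program with -f on three made-up numbers.
summary=$(printf '10\n20\n30\n' | awk -f "$workdir/sum.awk")

echo "$summary"
rm -r "$workdir"
```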
10.
Scripts in awk are really programs and have all the usual flow-control capabilities. One thing you can do within an awk script is execute statements conditionally, with an if-then condition. For example, this counts the accounts whose first field (the account name) is exactly two characters long:
$ awk -F: '{ if (length($1) == 2) print $0 }' /etc/passwd | wc -l
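The same test can also be written as a pattern in front of the action, which is equivalent to the if form for a single condition. Here is a sketch with three made-up account records:

```shell
# Made-up account records for illustration.
matches=$(printf '%s\n' \
  'al:x:301:100:Al Example,,,,:/home/al:/bin/sh' \
  'beth:x:302:100:Beth Example,,,,:/home/beth:/bin/sh' \
  'cy:x:303:100:Cy Example,,,,:/home/cy:/bin/sh' |
  awk -F: 'length($1) == 2 { print $1 }')   # pattern { action }

echo "$matches"
```

Only the two-letter names are printed, and piping the result to wc -l would give their count.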
11.
$ cat << 'EOF' > awkscript
>{
>count[length($1)]++
>}
>END{
>for (i=1; i < 9; i++)
>print "There are " count[i] " accounts with " i " letter names."
>}
>EOF
$ awk -F: -f awkscript /etc/passwd
There are 1 accounts with 1 letter names.
There are 26 accounts with 2 letter names.
There are 303 accounts with 3 letter names.
There are 168 accounts with 4 letter names.
There are 368 accounts with 5 letter names.
There are 611 accounts with 6 letter names.
There are 906 accounts with 7 letter names.
There are 1465 accounts with 8 letter names.
$ awk -F: '{ if (length($1) == 1) print $0 }' < /etc/passwd
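If no account has a given name length, count[i] is empty and the report prints a blank where the number should be. Adding zero forces a numeric value. The sketch below uses four made-up account names and an upper limit of six chosen only for the example:

```shell
# Four made-up account records; only the first field matters here.
histogram=$(printf '%s\n' 'al:x' 'cy:x' 'beth:x' 'taylor:x' |
  awk -F: '
    { count[length($1)]++ }          # tally names by length
    END {
      for (i = 1; i <= 6; i++)
        print i ": " (count[i] + 0)  # + 0 turns a missing entry into 0
    }')

echo "$histogram"
```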
Task 10.2: Re-routing the Pipeline with tee
The most useful option to tee is -a, which appends the output to the specified file rather than replacing the contents of the file each time.
$ ls -l | awk '{ print $5 "\t" $9 }' | sort -rn | tee bigfiles | head -5
12556 keylime.pie
8729 owl.c
1024 Mail/
582 testme
512 temp/
$ cat bigfiles
12556 keylime.pie
8729 owl.c
1024 Mail/
582 testme
512 temp/
512 src/
512 bin/
512 OWL/
512 News/
512 InfoWorld/
512 Archives/
207 sample2
199 sample
126 awkscript
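To see the difference that -a makes, the sketch below runs tee twice against the same file: once without the flag and once with it. It assumes mktemp -d is available for a scratch directory. The file name saved is hypothetical.

```shell
workdir=$(mktemp -d)
log="$workdir/saved"

# First run: tee creates (or overwrites) the file and passes the data on.
last=$(printf 'one\ntwo\nthree\n' | tee "$log" | tail -1)

# Second run with -a: the new line is appended instead of replacing the file.
printf 'four\n' | tee -a "$log" > /dev/null

line_count=$(wc -l < "$log" | tr -d ' ')

echo "last line seen downstream: $last"
echo "lines saved in file: $line_count"
rm -r "$workdir"
```

After both runs the file holds four lines. Without -a the second run would have left only the line four.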