Awk
Introduction
This document and the accompanying talk are intended to give an overview
of the capabilities of the awk programming language. Most of the content is
based on the capabilities of gawk, the GNU awk clone, but most of it can be
applied to any POSIX-compliant awk implementation.
What is awk?
- Awk is a specialised language used for processing text files
into alternate formats, and for acting on the content of those text files.
Like many other languages in the common UNIX utility suite, it is an
interpreted scripting language. It is quick to write, and easy to
read. It has some elements of C-like syntax, although it borrows from
other languages as well.
- Awk has two types, although all type conversion is implicit. The
two types are strings and numbers. Each variable can be either a
string, a number, or both. This is not really much of an issue,
however, since any variable can be used as any type without special
conversion.
- Awk in its most basic form is a text translation program, accepting
standard input, and processing it to produce some output.
- Awk is a standardised language (POSIX 1003.2)
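The implicit conversion between the two types mentioned above is easiest to see in a tiny program. This is only a sketch: the variable names s, n and t are made up for the illustration, and it should behave the same in any POSIX awk.

```shell
# Strings and numbers convert implicitly, depending on how a value is used.
out=$(awk 'BEGIN {
    s = "3"             # a string that happens to look like a number
    n = s + 4           # arithmetic forces a numeric value: 7
    t = n " apples"     # concatenation forces a string value
    print n
    print t
    print ("10" < "9")  # two string constants compare as strings
    print (10 < 9)      # two numbers compare numerically
}')
printf '%s\n' "$out"
```

The last two lines show the one place the distinction tends to bite: the same comparison gives different answers depending on whether the operands are treated as strings or as numbers.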
What can awk do?
Awk can interpret files on its standard input, in its parameter
list, or of its own choice internally. It can manipulate the information
in each of these files, and act based on the information gleaned.
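The three input sources just described can be sketched roughly as follows. The file names one.txt and two.txt and the shell variable names are invented for the example, and mktemp -d is assumed to be available for a scratch directory.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n' > "$tmp/two.txt"

# 1. Standard input: NR counts the records read so far.
from_stdin=$(printf 'x\ny\n' | awk '{ print NR ": " $0 }')

# 2. Files named in the parameter list: FNR restarts at 1 for each file.
from_args=$(cd "$tmp" && awk '{ print FILENAME ":" FNR ":" $0 }' one.txt two.txt)

# 3. A file the program chooses to open itself, read with getline.
from_getline=$(awk -v f="$tmp/two.txt" 'BEGIN {
    while ((getline line < f) > 0)
        print "read: " line
    close(f)
}')

printf '%s\n' "$from_stdin" "$from_args" "$from_getline"
rm -rf "$tmp"
```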
Awk can read and write files or pipes internally. It can also
execute programs using shell syntax. The processing awk itself does is
usually quite minimal, and it is usually used in conjunction with other
specialised programs according to standard UNIX philosophy.
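As a rough illustration of pipes and program execution from inside awk, the sketch below hands the sorting to sort, reads one line back from a command, and runs another command for its exit status. The input data and the commands chosen (sort -n, echo, true) are just examples.

```shell
out=$(printf 'pear 3\napple 5\nfig 1\n' | awk '{
    # Write each record to a pipe; sort is started once and gets every line.
    print $2, $1 | "sort -n"
}
END {
    close("sort -n")            # wait for sort to finish and print
    "echo hello" | getline msg  # read one line from a command
    close("echo hello")
    print "command said: " msg
    status = system("true")     # run a command and collect its exit status
    print "exit status: " status
}')
printf '%s\n' "$out"
```

Awk does no sorting of its own here; it reshapes the fields and leaves the rest to a tool that already does the job.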
How is awk used?
Here's a quick example of an awk program:
#!/usr/bin/gawk -f
#Supporting examples for HUMBUG awk talk.
#
#
#Basic syntax example
#
function reverse(list,    i, n, swap) {
    # reverse, in place, a list of strings hashed on numbers starting at 1
    n = length(list)
    # stop at the midpoint; swapping all n positions would undo the reversal
    for (i=0; i<int(n/2); i++) {
        # swap current element with its mirror image
        swap = list[i+1]
        list[i+1] = list[n-i]
        list[n-i] = swap
    }
}
BEGIN {
    print "Beginning file"
}
/dog/ {
    print "dog found"
}
{
    if (/cat/) {
        # Split input line into an array
        split($0, inputline)
        reverse(inputline)
        # Print array components
        for (i=1; i<=length(inputline); i++) {
            print inputline[i]
        }
    } else {
        print
    }
}
END {
    print "All done!"
}
I use it more commonly on the command line for very quick tasks.
lillian:~$ gawk '/su.*:/ && /FAILED/' /var/log/messages
lillian:~$ gawk '/su.*:/ && ! /FAILED/' /var/log/messages
Or as a quick shell utility:
#!/bin/sh
# Kills all running copies of the program named in $1
ps -x | gawk '$5 ~ /'"$1"'/ { system("kill " $1) }'
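The script above splices the shell's $1 into the text of the awk program. Another common way to get a shell value into awk is the -v option, sketched below as a dry run. The ps-style sample lines and the names prog and name are invented for the example. It assumes the command name is in field 5, as in the script above, and it uses an exact match rather than a regex match.

```shell
# Hypothetical stand-in for ps output: PID, TTY, STAT, TIME, COMMAND.
sample='101 pts/0 S 0:00 vim
102 pts/0 S 0:01 sleep
103 pts/1 R 0:00 sleep'

prog=sleep
# -v copies the shell value into the awk variable "name". Printing the
# kill command instead of running it keeps this a dry run.
out=$(printf '%s\n' "$sample" | awk -v name="$prog" '$5 == name { print "kill " $1 }')
printf '%s\n' "$out"
```

Passing the value as a variable keeps it out of the program text, so odd characters in the argument cannot change what the awk program does.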
What are the features of awk?
- Basic pattern matching constructs (
BEGIN
END
/regex/
C-like expression
range (,)
)
- Fields
- Important internal variables (
$0, $1..$NF
ARGC,ARGV
ENVIRON
FS, OFS
NF
FNR, NR
)
- Arrays (
Using Arrays
for
delete
)
- Control constructs (
if
while
do
for (C-like and var in array)
break
continue
exit
)
- Internal functions (
print
printf
getline
close
next & nextfile
system
Note mathematical functions
length
split
sprintf
tolower & toupper
Note other string functions and time functions
)
- User-defined functions
- Pattern matching with regexes
What are awk's limitations?
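Several of the features listed above fit into one short program. This is a sketch only: the input data, the function name shout and the array name total are made up, and everything used should be available in any POSIX awk.

```shell
out=$(printf 'alice:10\nbob:20\n#start\ncarol:99\n#stop\nalice:5\n' | awk '
function shout(s) { return toupper(s) "!" }   # user-defined function

BEGIN { FS = ":" }                # runs before any input; sets the field separator
/^#start/, /^#stop/ { next }      # range pattern: skip everything in this block
NF == 2 { total[$1] += $2 }       # expression pattern; array keyed on field 1
END {
    delete total["bob"]           # remove one element from the array
    for (name in total)           # iterate over the remaining keys
        printf "%s has %d after %d lines\n", shout(name), total[name], NR
}')
printf '%s\n' "$out"
```

Only one key is left in the array by the time the loop runs, which sidesteps the fact that for (var in array) visits keys in no particular order.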
Awk is a specialised general tool. It's very flexible, but not
suited to all jobs. It should be used in conjunction with other specialised
tools for best effect. It is usually interpreted, so it can be
slower than compiled implementations of similar tasks. At least one free
awk-to-C translator does exist, however, which can be used if a problem is
well suited to awk but must run faster than an interpreted language allows.
What alternatives exist?
Awk is quite an old language. It was created long before
good programming practice became practice anywhere in the real world. Some
parts of the language are more hacked on than really there. The following
alternatives are often suggested for a more modern language perspective:
Any Questions?
You can contact me at
fuzzy@humbug.org.au
Where can I find out more?
- man gawk
- man awk
- The AWK Programming Language, by Aho, Kernighan, and Weinberger
(not that I've ever seen it :)