610 lines
14 KiB
Groff
610 lines
14 KiB
Groff
'\"macro stdmacro
|
|
.if n .pH g1.awk @(#)awk 41.7 of 6/12/91
|
|
.\" Copyright 1991 UNIX System Laboratories, Inc.
|
|
.\" Copyright 1989, 1990 AT&T
|
|
.nr X
|
|
.if \nX=0 .ds x} awk 1 "Directory and File Management Utilities" "\&"
|
|
.if \nX=1 .ds x} awk 1 "Directory and File Management Utilities"
|
|
.if \nX=2 .ds x} awk 1 "" "\&"
|
|
.if \nX=3 .ds x} awk "" "" "\&"
|
|
.TH \*(x}
|
|
.tr `
|
|
.SH NAME
|
|
\f3awk\f1, \f3nawk\f1, \f3pawk\f1 \- pattern scanning and processing language
|
|
.SH SYNOPSIS
|
|
\f3awk\f1 [\f3\-F\f2 re\f1] [\f3-v\f1 \f2var=value\f1] [\f3'\f2prog\f3'\f1] [\f2file\f1. . .]
|
|
.sp .1
|
|
\f3awk\f1 [\f3\-F\f2 re\f1] [\f3-v\f1 \f2var=value\f1] [\f3\-f\f2 progfile\f1] [\f2file\f1. . .]
|
|
.SH DESCRIPTION
|
|
\f3NOTE:\f1 This version of awk has some incompatibilities with previous versions.
|
|
See the COMPATIBILITY ISSUES section below for more detail.
|
|
.PP
|
|
\f3awk\fP and \f3nawk\fP use the old \f3regexp()\fP and \f3compile()\fP regular
|
|
expression routines.
|
|
When the environment variable \f3_XPG\fP is equal to \f31\fP (one), \f3pawk\fP is exec'ed
|
|
which uses the newer \f3regcomp()\fP and \f3regexec()\fP routines which
|
|
implement the Extended Regular Expression package.
|
|
.PP
|
|
\f3awk\fP scans each input \f2file\f1
|
|
for lines that match any of a set of patterns specified in
|
|
.IR prog .
|
|
The \f2prog\f1
|
|
string must be enclosed in single quotes
|
|
\f1(\f3\(fm\f1)
|
|
to protect it from the shell.
|
|
Patterns are arbitrary Boolean combinations
|
|
of regular expressions and
|
|
relational expressions.
|
|
For each pattern in \f2prog\f1
|
|
there may be an associated action performed
|
|
when a line of a \f2file\f1
|
|
matches the pattern.
|
|
The set of pattern-action statements may appear literally as
|
|
\f2prog\f1 or in a file
|
|
specified with the \f3\-f\f2 progfile\f1 option.
|
|
Input files are read in order;
|
|
if there are no files, the standard input is read.
|
|
The file name
|
|
\f3\-\f1
|
|
means the standard input.
|
|
.PP
|
|
\f3awk\fP processes supplementary code set characters in
|
|
pattern-action statements and comments, and recognizes
|
|
supplementary code set characters as field separators (see below)
|
|
according to the locale specified in the \f3LC_CTYPE\fP
|
|
environment variable [see \f3LANG\fP on \f3environ\fP(5)].
|
|
In regular expressions, pattern searches are performed
|
|
on characters, not bytes, as described on \f3ed\f1(1).
|
|
.PP
|
|
Each input line is matched against the
|
|
pattern portion of every pattern-action statement;
|
|
the associated action is performed for each matched pattern.
|
|
Any \f2file\f1 of the form \f3var\fP=\f2value\f1 is treated as an assignment,
|
|
not a filename, and is executed at the time it would have been
|
|
opened if it were a filename.
|
|
The option \f3-v\fP followed by \f3var\fP=\f2value\f1 is an assignment
|
|
to be done before \f2prog\f1 is executed; any number of \f3-v\fP
|
|
options may be present.
|
|
.PP
|
|
An input line is normally made up of fields separated by white space.
|
|
(This default can be changed by using the
|
|
\f3FS\f1
|
|
built-in variable or the \f3\-F\f2 re\f1 option.)
|
|
The fields are denoted
|
|
\f3$1\f1,
|
|
\f3$2\f1,
|
|
\&.\|.\|.\|;
|
|
\f3$0\f1
|
|
refers to the entire line.
|
|
.PP
|
|
A pattern-action statement has the form:
|
|
.PP
|
|
.RS
|
|
\f2pattern\f3 { \f2action\f3 } \f1
|
|
.RE
|
|
.PP
|
|
Either pattern or action may be omitted.
|
|
If there is no action with a pattern,
|
|
the matching line is printed.
|
|
If there is no pattern with an action,
|
|
the action is performed on every input line.
|
|
Pattern-action statements are separated by newlines or semicolons.
|
|
.PP
|
|
As noted, patterns are arbitrary Boolean combinations
|
|
(
|
|
\f3!\f1,
|
|
\(bv\^\(bv,
|
|
\f3&&\f1,
|
|
and parentheses) of
|
|
relational expressions and
|
|
regular expressions.
|
|
A relational expression is one of the following:
|
|
.PP
|
|
.ss 18
|
|
.RS
|
|
.I "expression relop expression"
|
|
.br
|
|
.I "expression matchop regular_expression"
|
|
.br
|
|
\f2expression\f3 in \f2array-name\f1
|
|
.br
|
|
\f3(\f2expression\f3,\f2expression\f3,\f1
|
|
\&...
|
|
\f3) in \f2array-name\f1
|
|
.RE
|
|
.ss 12
|
|
.PP
|
|
where a
|
|
.I relop
|
|
is any of the six relational operators in C,
|
|
and a
|
|
.I matchop
|
|
is either
|
|
\f3~\f1
|
|
(contains)
|
|
or
|
|
\f3!~\f1
|
|
(does not contain).
|
|
An
|
|
.I expression
|
|
is an arithmetic expression,
|
|
a relational expression,
|
|
the special expression
|
|
.PP
|
|
.RS
|
|
.ft 2
|
|
var \f3in\fP array
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
or a Boolean combination
|
|
of these.
|
|
.PP
|
|
Regular expressions are as in
|
|
\f3egrep\fP(1).
|
|
In patterns they must be surrounded by slashes.
|
|
Isolated regular expressions
|
|
in a pattern apply to the entire line.
|
|
Regular expressions may also occur in
|
|
relational expressions.
|
|
A pattern may consist of two patterns separated by a comma;
|
|
in this case, the action is performed for all lines
|
|
between an occurrence of the first pattern
|
|
and the next occurrence of the second pattern.
|
|
.PP
|
|
The special patterns
|
|
\f3BEGIN\f1
|
|
and
|
|
\f3END\f1
|
|
may be used to capture control before the first input line has been read
|
|
and after the last input line has been read respectively.
|
|
These keywords do not combine with any other patterns.
|
|
.PP
|
|
A regular expression
|
|
may be used to separate fields by
|
|
using the \f3\-F\f2 re\f1 option
|
|
or by assigning the expression to
|
|
the built-in variable FS.
|
|
The default is to ignore leading blanks and to separate fields
|
|
by blanks and/or tab characters.
|
|
However, if FS is assigned a value,
|
|
leading blanks are no longer ignored.
|
|
.PP
|
|
Other built-in variables include:
|
|
.RS
|
|
.TP 16n
|
|
\f3ARGC\f1
|
|
command line argument count
|
|
.TP
|
|
\f3ARGV\f1
|
|
command line argument array
|
|
.TP
|
|
\f3ENVIRON\fP
|
|
array of environment variables; subscripts are names
|
|
.TP
|
|
\f3FILENAME\f1
|
|
name of the current input file
|
|
.TP
|
|
\f3FNR\f1
|
|
ordinal number of the current record in the current file
|
|
.TP
|
|
\f3FS\f1
|
|
input field separator regular expression (default blank and tab)
|
|
.TP
|
|
\f3NF\f1
|
|
number of fields in the current record
|
|
.TP
|
|
\f3NR\f1
|
|
ordinal number of the current record
|
|
.TP
|
|
\f3OFMT\f1
|
|
output format for numbers (default
|
|
\f3%.6g\f1)
|
|
.TP
|
|
\f3OFS\f1
|
|
output field separator (default blank)
|
|
.TP
|
|
\f3ORS\f1
|
|
output record separator (default new-line)
|
|
.TP
|
|
\f3RS\f1
|
|
input record separator (default new-line)
|
|
.TP
|
|
\f3SUBSEP\f1
|
|
separates multiple subscripts (default is 034)
|
|
.RE
|
|
.PP
|
|
The field separators specified with the \f3\-F\f1 option or with the
|
|
variables \f3OFS\f1, \f3ORS\f1, and \f3FS\fP
|
|
may be supplementary code set characters.
|
|
.PP
|
|
An action is a sequence of statements.
|
|
A statement may be one of the following:
|
|
.PP
|
|
.ss 18
|
|
.RS
|
|
.nf
|
|
\f3if\fP ( \f2expression\fP ) \f2statement\fP [ \f3else\fP \f2statement\fP ]
|
|
\f3while\fP ( \f2expression\fP ) \f2statement\fP
|
|
\f3do\fP \f2statement\fP \f3while\fP ( \f2expression\fP )
|
|
\f3for\fP ( \f2expression\fP ; \f2expression\fP ; \f2expression\fP ) \f2statement\fP
|
|
\f3for\fP ( \f2var\fP \f3in\fP \f2array\fP ) \f2statement\fP
|
|
\f3delete\fP \f2array\fP[\f2subscript\fP] #delete an array element
|
|
\f3break\fP
|
|
\f3continue\fP
|
|
{ [ \f2statement\fP ] .\|.\|. }
|
|
\f2expression\f1 # commonly variable = expression
|
|
\f3print\fP [ \f2expression-list\fP ] [ >\f2expression\fP ]
|
|
\f3printf\fP \f2format\fP [ , \f2expression-list\fP ] [ >\f2expression\fP ]
|
|
\f3next\f1 # skip remaining patterns on this input line
|
|
\f3exit\fP [\f3expr\fP] \f1# skip the rest of the input; exit status is expr
|
|
\f3return\fP [\f3expr\fP]
|
|
.fi
|
|
.RE
|
|
.ss 12
|
|
.PP
|
|
Statements are terminated by
|
|
semicolons, new-lines, or right braces.
|
|
An empty expression-list stands for the whole input line.
|
|
Expressions take on string or numeric values as appropriate,
|
|
and are built using the operators
|
|
\f3+\f1,
|
|
\f3\-\f1,
|
|
\f3\(**\f1,
|
|
\f3/\f1,
|
|
\f3%\f1,
|
|
\f3^\f1
|
|
and concatenation (indicated by a blank).
|
|
The
|
|
operators
|
|
\f3++\f1
|
|
\f3\-\-\f1
|
|
\f3+=\f1
|
|
\f3\-=\f1
|
|
\f3\(**=\f1
|
|
\f3/=\f1
|
|
\f3%=\f1
|
|
\f3^= >\f1
|
|
\f3>= < <= == != ?:\f1
|
|
are also available in expressions.
|
|
Variables may be scalars, array elements
|
|
(denoted
|
|
x[i]),
|
|
or fields.
|
|
Variables are initialized to the null string or zero.
|
|
Array subscripts may be any string,
|
|
not necessarily numeric;
|
|
this allows for a form of associative memory.
|
|
Multiple subscripts such as \f3[i,j,k]\fP are permitted; the constituents
|
|
are concatenated, separated by the value of \f3SUBSEP\fP.
|
|
String constants are quoted (\f3""\fP), with the usual C escapes recognized
|
|
within.
|
|
.PP
|
|
A comment consists of any characters beginning with the number sign
|
|
character and terminated by, but excluding the next occurrence of,
|
|
a newline character. Comments will have no effect, except to delimit
|
|
statements.
|
|
.PP
|
|
The \f3print\f1 statement prints its arguments on the standard output,
|
|
or on a file if
|
|
\f3>\f2expression\^\f1
|
|
is present,
|
|
or on a pipe if | \f2cmd\f1
|
|
is present.
|
|
The arguments are separated by the current output field separator
|
|
and terminated by the output record separator.
|
|
The \f3printf\f1
|
|
statement formats its expression list according to the format
|
|
(see \f3printf\^\f1(3S)).
|
|
The built-in function \f3close\fP(\f2expr\f1) closes the file or pipe
|
|
\f2expr\f1.
|
|
.PP
|
|
The mathematical functions:
|
|
\f3atan2\f1,
|
|
\f3cos\f1,
|
|
\f3exp\f1,
|
|
\f3log\f1,
|
|
\f3sin\f1,
|
|
\f3sqrt\f1, are built-in.
|
|
.PP
|
|
Other built-in functions include:
|
|
.SP
|
|
.TP 1.0i
|
|
\f3gsub\fP(\f2for\fP,`\f2repl\fP,`\f2in\fP)\f1
|
|
behaves like \f3sub\fP (see below),
|
|
except that it replaces successive occurrences
|
|
of the regular expression (like the
|
|
\f3ed\fP
|
|
global substitute command).
|
|
.TP 1.0i
|
|
\f3index(\f2s\f3,\f2`t\f3)\f1
|
|
returns the position in string
|
|
.I s
|
|
where string
|
|
.I t
|
|
first occurs, or 0 if it does not occur at all.
|
|
.TP 1.0i
|
|
\f3int\f1
|
|
truncates to an integer value.
|
|
.TP 1.0i
|
|
\f3length(\f2s\f3)\f1
|
|
returns the length in bytes of its argument
|
|
taken as a string,
|
|
or of the whole line if there is no argument.
|
|
.TP 1.0i
|
|
\f3match(\f2s\f3,\f2`re\f3)\f1
|
|
returns the position in string
|
|
.I s
|
|
where the regular expression
|
|
.I re
|
|
occurs, or 0 if it does not occur at all.
|
|
\f3RSTART\f1
|
|
is set to the starting position (which is
|
|
the same as the returned value), and
|
|
\f3RLENGTH\f1
|
|
is set to the length of the matched string.
|
|
.TP 1.0i
|
|
\f3rand\f1
|
|
random number on (0, 1).
|
|
.TP 1.0i
|
|
\f3split(\f2s\fP,\f2`a\fP,\f2`fs\fP)\f1
|
|
splits the string
|
|
.I s
|
|
into array elements
|
|
.IR a [ 1 ],
|
|
.IR a [ 2 ],
|
|
\...,
|
|
.IR a [ n ],
|
|
and returns
|
|
.IR n .
|
|
The separation is done with the regular expression
|
|
\f2fs\fP
|
|
or with the field separator
|
|
\f3FS\f1
|
|
if
|
|
\f2fs\fP
|
|
is not given.
|
|
.TP 1.0i
|
|
\f3srand\f1
|
|
sets the seed for \f3rand\f1
|
|
.TP 1.0i
|
|
\f3sprintf(\f2fmt\fP,`\f2expr\fP,`\f2expr\fP,\f1\|.\|.\|.\|\f3)\f1
|
|
formats the expressions according to the
|
|
\f3printf\fP(3S)
|
|
format given by
|
|
.I fmt\^
|
|
and returns the resulting string.
|
|
.TP 1.0i
|
|
\f3sub\fP(\f2for\fP,`\f2repl\fP,`\f2in\fP)\f1
|
|
substitutes the string
|
|
.I repl
|
|
in place of the first instance of
|
|
the regular expression
|
|
.I for
|
|
in string
|
|
.I in
|
|
and returns the number of substitutions.
|
|
If
|
|
.I in
|
|
is omitted,
|
|
\f3awk\fP
|
|
substitutes in the current record
|
|
\f1(\f3$0\f1).
|
|
.TP 1.0i
|
|
\f3substr\fP(\f2s\fP,`\f2m\fP,`\f2n\fP)\f1
|
|
returns the
|
|
.IR n -byte
|
|
substring of
|
|
.I s\^
|
|
that begins at position
|
|
.IR m .
|
|
.TP 1.0i
|
|
\f3tolower\fP(\f2s\fP)\f1
|
|
converts all upper-case alphabetic characters in string
|
|
.I s
|
|
to lower-case. Numbers and other characters are not affected.
|
|
.TP 1.0i
|
|
\f3toupper\fP(\f2s\fp)\f1
|
|
converts all lower-case alphabetic characters in string
|
|
.I s
|
|
to upper-case. Numbers and other characters are not affected.
|
|
|
|
.P
|
|
The input/output built-in functions are:
|
|
.TP 1.0i
|
|
\f3close(\f2filename\f3)\f1
|
|
closes the file or pipe named
|
|
.IR filename .
|
|
.TP 1.0i
|
|
\f2cmd \f3| getline\f1
|
|
pipes the output of
|
|
.I cmd
|
|
into
|
|
\f3getline\f1;
|
|
each successive call to
|
|
.I getline
|
|
returns the next line of output from
|
|
.IR cmd .
|
|
.TP 1.0i
|
|
\f3getline\f1
|
|
sets
|
|
\f3$0\f1
|
|
to the next input record from the current input file.
|
|
.TP 1.0i
|
|
\f3getline <\f2file\f1
|
|
sets
|
|
\f3$0\f1
|
|
to the next record from
|
|
.IR file .
|
|
.TP 1.0i
|
|
\f3getline \f2x\f1
|
|
sets variable
|
|
.I x
|
|
instead.
|
|
.TP 1.0i
|
|
\f3getline \f2x\f3 <\f2file\f1
|
|
sets
|
|
.I x
|
|
from the next record of
|
|
.IR file .
|
|
.TP 1.0i
|
|
\f3system(\f2cmd\f3)\f1
|
|
executes
|
|
.I cmd
|
|
and returns its exit status.
|
|
.P
|
|
All forms of
|
|
\f3getline\f1
|
|
return 1 for successful input, 0 for end of file, and \-1 for an error.
|
|
.PP
|
|
\f3awk\fP also provides user-defined functions.
|
|
Such functions may be defined (in the pattern position of a pattern-action
|
|
statement) as
|
|
.PP
|
|
.ss 18
|
|
.RS
|
|
.ft 4
|
|
.nf
|
|
function \f2name\fP(\f2args\fP,\f2...\fP) { \f2stmts\fP }
|
|
.fi
|
|
.ft 1
|
|
.RE
|
|
.ss 12
|
|
.PP
|
|
Function arguments are passed by value if scalar and by reference if array name.
|
|
Argument names are local to the function; all other variable names are global.
|
|
Function calls may be nested and functions may be recursive.
|
|
The \f3return\f1 statement may be used to return a value.
|
|
.SH EXAMPLES
|
|
Print lines longer than 72 characters:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
length > 72
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Print first two fields in opposite order:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
{ print $2, $1 }
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Same, with input fields separated by comma and/or blanks and tabs:
|
|
.sp .5
|
|
.nf
|
|
BEGIN { FS = ",[\ \et]*|[\ \et]+" }
|
|
{ print $2, $1 }
|
|
|
|
.fi
|
|
.PP
|
|
Add up first column, print sum and average:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
.nf
|
|
{ s += $1 }
|
|
END { print "sum is", s, " average is", s/NR }
|
|
.fi
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Print fields in reverse order:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
{ for (i = NF; i > 0; \-\-i) print $i }
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Print all lines between start/stop pairs:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
/start/, /stop/
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Print all lines whose first field is different from previous one:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
$1 != prev { print; prev = $1 }
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Simulate
|
|
\f3echo\fP(1):
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
.nf
|
|
BEGIN {
|
|
for (i = 1; i < ARGC; i++)
|
|
printf "%s", ARGV[i]
|
|
printf "\en"
|
|
exit
|
|
}
|
|
.fi
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Print a file, filling in page numbers starting at 5:
|
|
.PP
|
|
.RS
|
|
.ft 1
|
|
.nf
|
|
/Page/ { $2 = n++; }
|
|
{ print }
|
|
.fi
|
|
.ft 1
|
|
.RE
|
|
.PP
|
|
Assuming this program is in a file named
|
|
\f3prog\f1,
|
|
the following command line prints the file
|
|
\f3input\f1
|
|
numbering its pages starting at 5:
|
|
awk \-f prog n=5 input.
|
|
.SH FILES
|
|
.TP
|
|
/usr/lib/locale/locale/LC_MESSAGES/uxawk
|
|
language-specific message file (see \f3LANG\fP on \f3environ\f1(5))
|
|
.SH SEE ALSO
|
|
\f3oawk\fP(1),
|
|
\f3egrep\fP(1),
|
|
\f3grep\fP(1),
|
|
\f3lex\fP(1),
|
|
\f3perl\fP(1),
|
|
\f3sed\fP(1),
|
|
\f3printf\fP(3S)
|
|
.br
|
|
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
|
|
\f2The awk Programming Language\f1
|
|
Addison-Wesley, 1988
|
|
.SH NOTES and COMPATIBILITY ISSUES
|
|
\f3awk\fP is a newer version
|
|
that provides capabilities unavailable
|
|
in previous versions. See \f3oawk\fP(1) for the
|
|
older version.
|
|
.PP
|
|
Input white space is not preserved on output if fields are involved.
|
|
.PP
|
|
There are no explicit conversions between numbers and strings.
|
|
To force an expression to be treated as a number add 0 to it;
|
|
to force it to be treated as a string concatenate the
|
|
null string
|
|
(\f3"\^"\fP) to it.
|
|
.PP
|
|
The following regular expressions are no longer accepted:
|
|
.br
|
|
.nf
|
|
|
|
\f3 /[]/ /[^]/ /[\\]]/\f1
|
|
.fi
|
|
.tr ``
|
|
.\" @(#)awk.1 6.2 of 9/2/83
|
|
.Ee
|