1
0
Files
irix-657m-src/eoe/man/man1/awk.1
2022-09-29 17:59:04 +03:00

610 lines
14 KiB
Groff

'\"macro stdmacro
.if n .pH g1.awk @(#)awk 41.7 of 6/12/91
.\" Copyright 1991 UNIX System Laboratories, Inc.
.\" Copyright 1989, 1990 AT&T
.nr X
.if \nX=0 .ds x} awk 1 "Directory and File Management Utilities" "\&"
.if \nX=1 .ds x} awk 1 "Directory and File Management Utilities"
.if \nX=2 .ds x} awk 1 "" "\&"
.if \nX=3 .ds x} awk "" "" "\&"
.TH \*(x}
.tr `
.SH NAME
\f3awk\f1, \f3nawk\f1, \f3pawk\f1 \- pattern scanning and processing language
.SH SYNOPSIS
\f3awk\f1 [\f3\-F\f2 re\f1] [\f3-v\f1 \f2var=value\f1] [\f3'\f2prog\f3'\f1] [\f2file\f1. . .]
.sp .1
\f3awk\f1 [\f3\-F\f2 re\f1] [\f3-v\f1 \f2var=value\f1] [\f3\-f\f2 progfile\f1] [\f2file\f1. . .]
.SH DESCRIPTION
\f3NOTE:\f1 This version of awk has some incompatibilities with previous versions.
See the COMPATIBILITY ISSUES section below for more detail.
.PP
\f3awk\fP and \f3nawk\fP use the old \f3regexp()\fP and \f3compile()\fP regular
expression routines.
When the environment variable \f3_XPG\fP is equal to \f31\fP (one), \f3pawk\fP is exec'ed
which uses the newer \f3regcomp()\fP and \f3regexec()\fP routines which
implement the Extended Regular Expression package.
.PP
\f3awk\fP scans each input \f2file\f1
for lines that match any of a set of patterns specified in
.IR prog .
The \f2prog\f1
string must be enclosed in single quotes
\f1(\f3\(fm\f1)
to protect it from the shell.
Patterns are arbitrary Boolean combinations
of regular expressions and
relational expressions.
For each pattern in \f2prog\f1
there may be an associated action performed
when a line of a \f2file\f1
matches the pattern.
The set of pattern-action statements may appear literally as
\f2prog\f1 or in a file
specified with the \f3\-f\f2 progfile\f1 option.
Input files are read in order;
if there are no files, the standard input is read.
The file name
\f3\-\f1
means the standard input.
.PP
\f3awk\fP processes supplementary code set characters in
pattern-action statements and comments, and recognizes
supplementary code set characters as field separators (see below)
according to the locale specified in the \f3LC_CTYPE\fP
environment variable [see \f3LANG\fP on \f3environ\fP(5)].
In regular expressions, pattern searches are performed
on characters, not bytes, as described on \f3ed\f1(1).
.PP
Each input line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
Any \f2file\f1 of the form \f3var\fP=\f2value\f1 is treated as an assignment,
not a filename, and is executed at the time it would have been
opened if it were a filename.
The option \f3-v\fP followed by \f3var\fP=\f2value\f1 is an assignment
to be done before \f2prog\f1 is executed; any number of \f3-v\fP
options may be present.
.PP
An input line is normally made up of fields separated by white space.
(This default can be changed by using the
\f3FS\f1
built-in variable or the \f3\-F\f2 re\f1 option.)
The fields are denoted
\f3$1\f1,
\f3$2\f1,
\&.\|.\|.\|;
\f3$0\f1
refers to the entire line.
.PP
A pattern-action statement has the form:
.PP
.RS
\f2pattern\f3 { \f2action\f3 } \f1
.RE
.PP
Either pattern or action may be omitted.
If there is no action with a pattern,
the matching line is printed.
If there is no pattern with an action,
the action is performed on every input line.
Pattern-action statements are separated by newlines or semicolons.
.PP
As noted, patterns are arbitrary Boolean combinations
(
\f3!\f1,
\(bv\^\(bv,
\f3&&\f1,
and parentheses) of
relational expressions and
regular expressions.
A relational expression is one of the following:
.PP
.ss 18
.RS
.I "expression relop expression"
.br
.I "expression matchop regular_expression"
.br
\f2expression\f3 in \f2array-name\f1
.br
\f3(\f2expression\f3,\f2expression\f3,\f1
\&...
\f3) in \f2array-name\f1
.RE
.ss 12
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
\f3~\f1
(contains)
or
\f3!~\f1
(does not contain).
An
.I expression
is an arithmetic expression,
a relational expression,
the special expression
.PP
.RS
.ft 2
var \f3in\fP array
.ft 1
.RE
.PP
or a Boolean combination
of these.
.PP
Regular expressions are as in
\f3egrep\fP(1).
In patterns they must be surrounded by slashes.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions.
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
between an occurrence of the first pattern
and the next occurrence of the second pattern.
.PP
The special patterns
\f3BEGIN\f1
and
\f3END\f1
may be used to capture control before the first input line has been read
and after the last input line has been read respectively.
These keywords do not combine with any other patterns.
.PP
A regular expression
may be used to separate fields by
using the \f3\-F\f2 re\f1 option
or by assigning the expression to
the built-in variable FS.
The default is to ignore leading blanks and to separate fields
by blanks and/or tab characters.
However, if FS is assigned a value,
leading blanks are no longer ignored.
.PP
Other built-in variables include:
.RS
.TP 16n
\f3ARGC\f1
command line argument count
.TP
\f3ARGV\f1
command line argument array
.TP
\f3ENVIRON\fP
array of environment variables; subscripts are names
.TP
\f3FILENAME\f1
name of the current input file
.TP
\f3FNR\f1
ordinal number of the current record in the current file
.TP
\f3FS\f1
input field separator regular expression (default blank and tab)
.TP
\f3NF\f1
number of fields in the current record
.TP
\f3NR\f1
ordinal number of the current record
.TP
\f3OFMT\f1
output format for numbers (default
\f3%.6g\f1)
.TP
\f3OFS\f1
output field separator (default blank)
.TP
\f3ORS\f1
output record separator (default new-line)
.TP
\f3RS\f1
input record separator (default new-line)
.TP
\f3SUBSEP\f1
separates multiple subscripts (default is 034)
.RE
.PP
The field separators specified with the \f3\-F\f1 option or with the
variables \f3OFS\f1, \f3ORS\f1, and \f3FS\fP
may be supplementary code set characters.
.PP
An action is a sequence of statements.
A statement may be one of the following:
.PP
.ss 18
.RS
.nf
\f3if\fP ( \f2expression\fP ) \f2statement\fP [ \f3else\fP \f2statement\fP ]
\f3while\fP ( \f2expression\fP ) \f2statement\fP
\f3do\fP \f2statement\fP \f3while\fP ( \f2expression\fP )
\f3for\fP ( \f2expression\fP ; \f2expression\fP ; \f2expression\fP ) \f2statement\fP
\f3for\fP ( \f2var\fP \f3in\fP \f2array\fP ) \f2statement\fP
\f3delete\fP \f2array\fP[\f2subscript\fP] #delete an array element
\f3break\fP
\f3continue\fP
{ [ \f2statement\fP ] .\|.\|. }
\f2expression\f1 # commonly variable = expression
\f3print\fP [ \f2expression-list\fP ] [ >\f2expression\fP ]
\f3printf\fP \f2format\fP [ , \f2expression-list\fP ] [ >\f2expression\fP ]
\f3next\f1 # skip remaining patterns on this input line
\f3exit\fP [\f3expr\fP] \f1# skip the rest of the input; exit status is expr
\f3return\fP [\f3expr\fP]
.fi
.RE
.ss 12
.PP
Statements are terminated by
semicolons, new-lines, or right braces.
An empty expression-list stands for the whole input line.
Expressions take on string or numeric values as appropriate,
and are built using the operators
\f3+\f1,
\f3\-\f1,
\f3\(**\f1,
\f3/\f1,
\f3%\f1,
\f3^\f1
and concatenation (indicated by a blank).
The
operators
\f3++\f1
\f3\-\-\f1
\f3+=\f1
\f3\-=\f1
\f3\(**=\f1
\f3/=\f1
\f3%=\f1
\f3^= >\f1
\f3>= < <= == != ?:\f1
are also available in expressions.
Variables may be scalars, array elements
(denoted
x[i]),
or fields.
Variables are initialized to the null string or zero.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as \f3[i,j,k]\fP are permitted; the constituents
are concatenated, separated by the value of \f3SUBSEP\fP.
String constants are quoted (\f3""\fP), with the usual C escapes recognized
within.
.PP
A comment consists of any characters beginning with the number sign
character and terminated by, but excluding the next occurrence of,
a newline character. Comments will have no effect, except to delimit
statements.
.PP
The \f3print\f1 statement prints its arguments on the standard output,
or on a file if
\f3>\f2expression\^\f1
is present,
or on a pipe if | \f2cmd\f1
is present.
The arguments are separated by the current output field separator
and terminated by the output record separator.
The \f3printf\f1
statement formats its expression list according to the format
(see \f3printf\^\f1(3S)).
The built-in function \f3close\fP(\f2expr\f1) closes the file or pipe
\f2expr\f1.
.PP
The mathematical functions:
\f3atan2\f1,
\f3cos\f1,
\f3exp\f1,
\f3log\f1,
\f3sin\f1,
\f3sqrt\f1, are built-in.
.PP
Other built-in functions include:
.SP
.TP 1.0i
\f3gsub\fP(\f2for\fP,`\f2repl\fP,`\f2in\fP)\f1
behaves like \f3sub\fP (see below),
except that it replaces successive occurrences
of the regular expression (like the
\f3ed\fP
global substitute command).
.TP 1.0i
\f3index(\f2s\f3,\f2`t\f3)\f1
returns the position in string
.I s
where string
.I t
first occurs, or 0 if it does not occur at all.
.TP 1.0i
\f3int\f1
truncates to an integer value.
.TP 1.0i
\f3length(\f2s\f3)\f1
returns the length in bytes of its argument
taken as a string,
or of the whole line if there is no argument.
.TP 1.0i
\f3match(\f2s\f3,\f2`re\f3)\f1
returns the position in string
.I s
where the regular expression
.I re
occurs, or 0 if it does not occur at all.
\f3RSTART\f1
is set to the starting position (which is
the same as the returned value), and
\f3RLENGTH\f1
is set to the length of the matched string.
.TP 1.0i
\f3rand\f1
random number on (0, 1).
.TP 1.0i
\f3split(\f2s\fP,\f2`a\fP,\f2`fs\fP)\f1
splits the string
.I s
into array elements
.IR a [ 1 ],
.IR a [ 2 ],
\...,
.IR a [ n ],
and returns
.IR n .
The separation is done with the regular expression
\f2fs\fP
or with the field separator
\f3FS\f1
if
\f2fs\fP
is not given.
.TP 1.0i
\f3srand\f1
sets the seed for \f3rand\f1
.TP 1.0i
\f3sprintf(\f2fmt\fP,`\f2expr\fP,`\f2expr\fP,\f1\|.\|.\|.\|\f3)\f1
formats the expressions according to the
\f3printf\fP(3S)
format given by
.I fmt\^
and returns the resulting string.
.TP 1.0i
\f3sub\fP(\f2for\fP,`\f2repl\fP,`\f2in\fP)\f1
substitutes the string
.I repl
in place of the first instance of
the regular expression
.I for
in string
.I in
and returns the number of substitutions.
If
.I in
is omitted,
\f3awk\fP
substitutes in the current record
\f1(\f3$0\f1).
.TP 1.0i
\f3substr\fP(\f2s\fP,`\f2m\fP,`\f2n\fP)\f1
returns the
.IR n -byte
substring of
.I s\^
that begins at position
.IR m .
.TP 1.0i
\f3tolower\fP(\f2s\fP)\f1
converts all upper-case alphabetic characters in string
.I s
to lower-case. Numbers and other characters are not affected.
.TP 1.0i
\f3toupper\fP(\f2s\fp)\f1
converts all lower-case alphabetic characters in string
.I s
to upper-case. Numbers and other characters are not affected.
.P
The input/output built-in functions are:
.TP 1.0i
\f3close(\f2filename\f3)\f1
closes the file or pipe named
.IR filename .
.TP 1.0i
\f2cmd \f3| getline\f1
pipes the output of
.I cmd
into
\f3getline\f1;
each successive call to
.I getline
returns the next line of output from
.IR cmd .
.TP 1.0i
\f3getline\f1
sets
\f3$0\f1
to the next input record from the current input file.
.TP 1.0i
\f3getline <\f2file\f1
sets
\f3$0\f1
to the next record from
.IR file .
.TP 1.0i
\f3getline \f2x\f1
sets variable
.I x
instead.
.TP 1.0i
\f3getline \f2x\f3 <\f2file\f1
sets
.I x
from the next record of
.IR file .
.TP 1.0i
\f3system(\f2cmd\f3)\f1
executes
.I cmd
and returns its exit status.
.P
All forms of
\f3getline\f1
return 1 for successful input, 0 for end of file, and \-1 for an error.
.PP
\f3awk\fP also provides user-defined functions.
Such functions may be defined (in the pattern position of a pattern-action
statement) as
.PP
.ss 18
.RS
.ft 4
.nf
function \f2name\fP(\f2args\fP,\f2...\fP) { \f2stmts\fP }
.fi
.ft 1
.RE
.ss 12
.PP
Function arguments are passed by value if scalar and by reference if array name.
Argument names are local to the function; all other variable names are global.
Function calls may be nested and functions may be recursive.
The \f3return\f1 statement may be used to return a value.
.SH EXAMPLES
Print lines longer than 72 characters:
.PP
.RS
.ft 1
length > 72
.ft 1
.RE
.PP
Print first two fields in opposite order:
.PP
.RS
.ft 1
{ print $2, $1 }
.ft 1
.RE
.PP
Same, with input fields separated by comma and/or blanks and tabs:
.sp .5
.nf
BEGIN { FS = ",[\ \et]*|[\ \et]+" }
{ print $2, $1 }
.fi
.PP
Add up first column, print sum and average:
.PP
.RS
.ft 1
.nf
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.fi
.ft 1
.RE
.PP
Print fields in reverse order:
.PP
.RS
.ft 1
{ for (i = NF; i > 0; \-\-i) print $i }
.ft 1
.RE
.PP
Print all lines between start/stop pairs:
.PP
.RS
.ft 1
/start/, /stop/
.ft 1
.RE
.PP
Print all lines whose first field is different from previous one:
.PP
.RS
.ft 1
$1 != prev { print; prev = $1 }
.ft 1
.RE
.PP
Simulate
\f3echo\fP(1):
.PP
.RS
.ft 1
.nf
BEGIN {
for (i = 1; i < ARGC; i++)
printf "%s", ARGV[i]
printf "\en"
exit
}
.fi
.ft 1
.RE
.PP
Print a file, filling in page numbers starting at 5:
.PP
.RS
.ft 1
.nf
/Page/ { $2 = n++; }
{ print }
.fi
.ft 1
.RE
.PP
Assuming this program is in a file named
\f3prog\f1,
the following command line prints the file
\f3input\f1
numbering its pages starting at 5:
awk \-f prog n=5 input.
.SH FILES
.TP
/usr/lib/locale/locale/LC_MESSAGES/uxawk
language-specific message file (see \f3LANG\fP on \f3environ\f1(5))
.SH SEE ALSO
\f3oawk\fP(1),
\f3egrep\fP(1),
\f3grep\fP(1),
\f3lex\fP(1),
\f3perl\fP(1),
\f3sed\fP(1),
\f3printf\fP(3S)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
\f2The awk Programming Language\f1
Addison-Wesley, 1988
.SH NOTES and COMPATIBILITY ISSUES
\f3awk\fP is a newer version
that provides capabilities unavailable
in previous versions. See \f3oawk\fP(1) for the
older version.
.PP
Input white space is not preserved on output if fields are involved.
.PP
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate the
null string
(\f3"\^"\fP) to it.
.PP
The following regular expressions are no longer accepted:
.br
.nf
\f3 /[]/ /[^]/ /[\\]]/\f1
.fi
.tr ``
.\" @(#)awk.1 6.2 of 9/2/83
.Ee