'\"macro stdmacro
.if n .pH g1.sort @(#)sort 41.11 of 5/26/91
.\" Copyright 1991 UNIX System Laboratories, Inc.
.\" Copyright 1989, 1990 AT&T
.nr X
.if \nX=0 .ds x} sort 1 "Essential Utilities" "\&"
.if \nX=1 .ds x} sort 1 "Essential Utilities"
.if \nX=2 .ds x} sort 1 "" "\&"
.if \nX=3 .ds x} sort "" "" "\&"
.TH \*(x}
.SH NAME
\f4sort\f1 \- sort and/or merge files
.SH SYNOPSIS
\f4sort\f1
[\f4\-cmu\f1]
[\f4\-o\f2output\f1]
[\f4\-y\f2kmem\f1]
[\f4\-z\f2recsz\f1]
[\f4\-bdfiMnr\f1]
[\f4\-t\f2x\f1]
.br
[\f4\-k\f2keydef\f1]
[\f4+\f2pos1\f1
[\f4\-\f2pos2\f1]]
[\f4\-T\ \f2tdir\f1]
[\f2files\f1]
.SH DESCRIPTION
The \f4sort\fP command
sorts
lines of all the named files together
and writes the result on
the standard output.
The standard input is read if
\f4\-\f1
is used as a filename
or no input files are named.
.PP
Comparisons are based on one or more sort keys extracted
from each line of input.
By default, there is one sort key, the entire input line,
and ordering is lexicographic by bytes in machine
collating sequence.
.PP
\f4sort\f1 processes supplementary code set characters
according to the locale specified in the \f4LC_CTYPE\fP
and \f4LC_COLLATE\f1
environment variables [see \f4LANG\fP on \f4environ\fP(5)],
except as noted below.
Supplementary code set characters are collated in code order.
.PP
The following options alter the default behavior:
.TP 5
\f4\-c\f1
Check that the input file is sorted according to the ordering rules;
give no output unless the file is out of sort.
.TP
\f4\-m\f1
Merge only; the input files are assumed to be already sorted.
.TP
\f4\-u\f1
Unique: suppress all but one in each set of lines having equal keys.
If used with \f4\-c\f1, check that there are no lines with duplicate keys,
in addition to checking that the input file is sorted.
.TP
\f4\-o\f2output\f1
The argument given is the name of an output file
to use instead of the standard output.
This file may be the same as one of the inputs.
.TP
\f4\-y\f2kmem\f1
The amount of main memory used by \f4sort\f1
has a large impact on its performance.
Sorting a small file in a large amount
of memory is a waste.
If this option is omitted,
\f4sort\fP
begins using a system default memory size (64 kilobytes),
and continues to use more space as needed, up to a maximum equal to
one eighth of physical memory.
If this option is given with a value (\f2kmem\fP),
\f4sort\fP will start
using that number of kilobytes of memory,
unless the administrative minimum or maximum is violated,
in which case the corresponding extremum will be used.
\f4sort\fP will continue to allocate memory until a limit is reached.
This limit is capped at 2 gigabytes and computed to be the larger of the value
given to the \f4\-y\fP option and one half of physical memory.
Thus, \f4\-y\f10
is guaranteed to start with minimum memory.
By convention,
\f4\-y\f1 (with no argument) starts with maximum memory;
on SGI systems, \f4\-y\f1 (with no argument) starts with half
of the total physical memory on the system (and the limit on memory
usage is set to one half of physical memory as well).
.TP
\f4\-z\f2recsz\f1
The size of the longest line read is recorded
in the sort phase so buffers can be allocated
during the merge phase.
If the sort phase is omitted via the
\f4\-c\f1
or
\f4\-m\f1
options, a popular system default size will be used.
Lines longer than the buffer size will cause
\f4sort\fP
to terminate abnormally.
Supplying the actual number of bytes in the longest line
to be merged (or some larger value)
will prevent abnormal termination. If the sort phase is not omitted,
then the maximum line size is calculated
and used as the \f2recsz\fP,
overriding the value of \f4\-z\fP.
Thus, the \f4\-z\fP option is significant
only when used with \f4\-c\fP or \f4\-m\fP.
.TP
\f4\-T\ \f2tdir\f1
The argument given is used as a directory where any temporary files
required will be created.
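.PP
The following sketch is an addition to the original page and exercises
\f4\-c\f1, \f4\-u\f1, \f4\-o\f1 and \f4\-m\f1 on a small file.
The scratch directory and the file names are hypothetical, and
\f4mktemp\f1(1) is assumed to be available;
\f4LC_ALL=C\f1 is set only to make the byte ordering predictable.
.PP
.nf

```shell
# Illustrative only: the scratch directory and file names are made up.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)

cat > "$dir/fruit" <<EOF
pear
apple
fig
apple
EOF

# -c: zero exit status if the input is sorted, non-zero if it is not.
if sort -c "$dir/fruit" 2>/dev/null
then check=sorted
else check=unsorted
fi

# -u: keep one line from each set of lines with equal keys.
unique=$(sort -u "$dir/fruit")

# -o: the output file may be the same as one of the inputs.
sort -o "$dir/fruit" "$dir/fruit"
inplace=$(cat "$dir/fruit")

# -m: merge inputs that are already sorted.
cat > "$dir/more" <<EOF
banana
kiwi
EOF
merged=$(sort -m "$dir/fruit" "$dir/more")

echo "$check"
echo "$merged"
rm -r "$dir"
```

.fi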
.PP
Sort keys can be specified using the options:
.TP 5
\f4\-k\f2field_start[type][,field_end[type]]\f1
Defines a key field that begins at
\f2field_start\f1
and ends at
\f2field_end\f1
inclusive (provided that
\f2field_end\f1
does not precede
\f2field_start\f1).
A missing
\f2field_end\f1
means the end of the line.
A field comprises a maximal sequence of non-separating characters
and, in the absence of option
\f4\-t\f1,
any preceding field separator.
.PP
The
\f2field_start\f1
portion of the argument has the form:
.IP
field_number[.first_character]
.PP
Fields and characters within fields are numbered starting with 1.
The field_number and first_character pieces, interpreted as positive
decimal integers, specify the first character to be used as part of a
sort key. If first_character is omitted, it refers to the first
character of the field.
.PP
The
\f2field_end\f1
portion of the argument has the form:
.IP
field_number[.last_character]
.PP
The field_number is as described above for
\f2field_start\f1.
The last_character piece, interpreted as a non-negative decimal
integer, specifies
the last character to be used as part of the sort key. If
last_character evaluates to zero or last_character
is omitted, it refers to the last character of the field specified by
field_number.
.PP
\f2type\f1 is a modifier from the list of characters \f4bdfinr\f1. Each
modifier behaves like the corresponding option (see below), but applies
only to the key field to which it is attached.
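.PP
As a minimal sketch of the \f4\-k\f1 forms described above (not part of the
original page), the commands below select a whole field, a single character
within a field, and a key that runs to the end of the line.
The parts list and its file name are invented for illustration,
\f4mktemp\f1(1) is assumed to be available, and \f4\-t:\f1 is used so that
blanks play no part in field splitting.
.PP
.nf

```shell
# Illustrative only: the data and file names are made up.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)

cat > "$dir/parts" <<EOF
bolt:ab9:40
nut:zz3:12
washer:mm7:7
EOF

# Key is exactly field 2 (field_start 2, field_end 2).
by_field=$(sort -t: -k 2,2 "$dir/parts")

# Key is only the third character of field 2 (2.3 through 2.3).
by_char=$(sort -t: -k 2.3,2.3 "$dir/parts")

# No field_end: the key runs from field 3 to the end of the line;
# the n modifier applies to this key only.
by_num=$(sort -t: -k 3n "$dir/parts")

echo "$by_field"
echo "$by_char"
echo "$by_num"
rm -r "$dir"
```

.fi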
.TP 5
\f4+\f2pos1\f1, \f4\-\f2pos2\f1
This is an obsolescent form of \f4\-k\f1. \f2pos1\f1 corresponds to
\f2field_start\f1, \f2pos2\f1 corresponds to \f2field_end\f1, except
that both fields and characters are numbered from zero instead of
one. The optional type modifiers are the same as in the \f4\-k\f1 option.
.PP
The following options override the default ordering rules.
.TP 5
\f4\-d\f1
``Dictionary'' order: only letters, digits, and blanks (spaces and tabs)
are significant in comparisons.
No comparison is performed for multibyte characters.
.TP
\f4\-f\f1
Fold lowercase
letters into uppercase for the purpose of comparison.
Does not apply to multibyte characters.
.TP
\f4\-i\f1
Ignore non-printable characters.
Multibyte and embedded NUL characters are also ignored.
.TP
\f4\-M\f1
Compare as months.
The first three non-blank characters
of the field are folded to uppercase
and compared.
Month names are processed according to the locale specified
in the \f4LC_TIME\f1 environment variable
[see \f4LANG\f1 on \f4environ\f1(5)].
For example, in an English locale the sorting order
would be ``\s-1JAN\s0'' < ``\s-1FEB\s0''
< . . . < ``\s-1DEC\s0.''
Invalid fields compare low to ``\s-1JAN\s0.''
The \f4\-M\f1 option implies the \f4\-b\f1 option
(see below).
.TP
\f4\-n\f1
An initial numeric string,
consisting of optional blanks, an optional minus sign,
and zero or more digits with an optional decimal point,
is sorted by arithmetic value. An empty digit string is treated
as zero. Leading zeros and signs on zeros do not affect ordering.
The \f4\-n\f1 option implies the \f4\-b\f1 option
(see below).
.TP
\f4\-r\f1
Reverse the sense of comparisons.
.TP
\f4\-b\f1
Ignore leading blanks when determining the starting and ending
positions of a restricted sort key.
.TP
\f4\-t\f2x\f1
Use
.I x
as the field separator character;
.I x
is not considered to be part of a field
(although it may be included in a sort key).
Each occurrence of
.I x
is significant
(for example,
.I xx
delimits an empty field).
\f2x\f1 may be a supplementary code set character.
The default field separators are blank characters.
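.PP
The ordering options above can be compared side by side with a short
sketch, which is an addition to the original page.
The sample data and file names are invented,
\f4mktemp\f1(1) is assumed to be available,
and \f4LC_ALL=C\f1 is set so that the default order is plain byte order
and month names are the English abbreviations.
.PP
.nf

```shell
# Illustrative only: the data and file names are made up.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)

cat > "$dir/nums" <<EOF
10
9
-3
2.5
EOF
lexical=$(sort "$dir/nums")       # byte order
numeric=$(sort -n "$dir/nums")    # arithmetic value
reversed=$(sort -nr "$dir/nums")  # arithmetic value, reversed

cat > "$dir/words" <<EOF
cherry
apple
Banana
EOF
plain=$(sort "$dir/words")        # uppercase sorts before lowercase
folded=$(sort -f "$dir/words")    # case folded for comparison

cat > "$dir/months" <<EOF
Mar
jan
Dec
EOF
by_month=$(sort -M "$dir/months")

# With -t: every colon is significant, so y::2 has an empty second field.
cat > "$dir/recs" <<EOF
x:b:1
y::2
z:a:3
EOF
by_second=$(sort -t: -k 2,2 "$dir/recs")

echo "$numeric"
echo "$by_second"
rm -r "$dir"
```

.fi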
.PP
When ordering options appear before restricted
sort key specifications, the requested ordering rules are
applied globally to all sort keys.
When attached to a specific sort key (described above),
the specified ordering options override all global ordering options
for that key.
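.PP
A small sketch of this rule, added here for illustration:
a global \f4\-r\f1 applies to a key that carries no modifiers of its own,
but not to a key that has its own modifier.
The data and file name are invented and \f4mktemp\f1(1) is assumed to be
available.
.PP
.nf

```shell
# Illustrative only: the data and file name are made up.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)

cat > "$dir/pairs" <<EOF
a:10
a:9
b:2
b:11
EOF

# Key 1 has no modifiers, so the global -r applies to it.
# Key 2 carries its own modifier (n), which replaces the global options
# for that key: it is numeric and NOT reversed.
mixed=$(sort -t: -r -k 1,1 -k 2,2n "$dir/pairs")

# To reverse key 2 as well, r must be attached to it explicitly.
both=$(sort -t: -r -k 1,1 -k 2,2nr "$dir/pairs")

echo "$mixed"
echo "$both"
rm -r "$dir"
```

.fi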
.br
.ne 5
.PP
When there are multiple sort keys, later keys
are compared only after all earlier keys
compare equal. Except when the \f4\-u\f1 option is specified, lines that
otherwise compare equal are ordered as if none of the options \f4\-d\f1,
\f4\-f\f1, \f4\-i\f1, \f4\-n\f1 or \f4\-k\f1 were present (but with \f4\-r\f1 still in effect, if it
was specified) and with all bytes in the lines significant to the
comparison.
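.PP
The tie-breaking behavior can be observed with a sketch such as the
following, which is an addition to the original page.
The data and file name are invented and \f4mktemp\f1(1) is assumed to be
available.
Which of the tied lines \f4\-u\f1 keeps is deliberately not shown,
only how many lines remain.
.PP
.nf

```shell
# Illustrative only: the data and file name are made up.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)

cat > "$dir/ties" <<EOF
b:1
a:1
c:0
EOF

# One key: a:1 and b:1 have equal keys, so the whole lines are
# compared byte by byte as a last resort.
one_key=$(sort -t: -k 2,2n "$dir/ties")

# Two keys: the second key is consulted only when the first compares equal.
two_keys=$(sort -t: -k 2,2n -k 1,1r "$dir/ties")

# -u: lines with equal keys count as duplicates; no last-resort comparison.
count=$(sort -t: -u -k 2,2n "$dir/ties" | wc -l)

echo "$one_key"
echo "$two_keys"
rm -r "$dir"
```

.fi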
.SH EXAMPLES
Sort the contents of
.I infile
with the second field as the sort key:
.IP
\f4sort \-k 2,2 \f2infile\f1
.PP
Sort, in reverse order, the contents of
.I infile1
and
.IR infile2 ,
placing the output in
.I outfile
and using the first character of the second field
as the sort key:
.IP
\f4sort \-r \-o \f2outfile\fP \-k 2.1,2.1 \f2infile1 infile2\f1
.PP
Sort, in reverse order, the contents of
.I infile1
and
.I infile2
using the first non-blank character of the second field
as the sort key:
.IP
\f4sort \-r +1.0b \-1.1b \f2infile1 infile2\f1
.PP
Print the password file
[\f4passwd\f1(4)]
sorted by the numeric user
.SM ID
(the third colon-separated field):
.IP
\f4sort \-t: +2n \-3 /etc/passwd\f1
.PP
Sort the contents of the password file using the group ID (fourth field) as
the primary sort key and the user ID (third field) as the secondary sort
key:
.IP
\f4sort \-t: +3 \-4 +2 \-3 /etc/passwd\f1
.PP
Print the lines of the already sorted file
.IR infile ,
suppressing all but the first occurrence of lines
having the same third field
(the options
\f4\-um\f1
with just one input file make the choice of a unique
representative from a set of equal lines predictable):
.IP
\f4sort \-um +2 \-3 \f2infile\f1
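.PP
The password file examples above use the obsolescent
\f4+\f2pos1\f1 \f4\-\f2pos2\f1 form.
As a sketch, and an addition to the original page, the same keys can be
written with \f4\-k\f1, where fields are numbered from one:
\f4+2 \-3\f1 corresponds to \f4\-k 3,3\f1 and
\f4+3 \-4\f1 corresponds to \f4\-k 4,4\f1.
The sample entries below are invented and stand in for
\f4/etc/passwd\f1; \f4mktemp\f1(1) is assumed to be available, and the
\f4n\f1 modifier on both keys of the second command is an assumption
made here so that the IDs compare numerically.
.PP
.nf

```shell
# Illustrative only: invented entries standing in for /etc/passwd.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)

cat > "$dir/passwd" <<EOF
nuucp:x:10:5:UUCP:/var/spool/uucp:/bin/sh
guest:x:998:998:Guest:/usr/people/guest:/bin/csh
root:x:0:0:Super-User:/:/bin/sh
lp:x:9:9:Print Spooler:/var/spool/lp:/bin/sh
EOF

# Same key as "sort -t: +2n -3": numeric user ID, the third field.
by_uid=$(sort -t: -k 3,3n "$dir/passwd" | cut -d: -f1)

# Same fields as "sort -t: +3 -4 +2 -3": group ID (fourth field),
# then user ID (third field); compared numerically here.
by_gid_uid=$(sort -t: -k 4,4n -k 3,3n "$dir/passwd" | cut -d: -f1)

echo "$by_uid"
echo "$by_gid_uid"
rm -r "$dir"
```

.fi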
.SH FILES
.PD 0
.TP
\f4/var/tmp/stm???\f1
.TP
\f4/usr/lib/locale/\f2locale\f4/LC_MESSAGES/uxcore.abi\f1
language-specific message file [See \f4LANG\fP on \f4environ\f1(5).]
.SH SEE ALSO
\f4comm\fP(1),
\f4join\fP(1),
\f4uniq\fP(1)
.SH NOTES
\f4sort\fP prints a diagnostic message and exits with non-zero status for various trouble
conditions
(for example, when input lines are too long),
and for disorder discovered under the
\f4\-c\f1
option.
.SP
When the last line of an input file
is missing a \f4newline\f1 character,
\f4sort\fP appends one,
prints a warning message, and continues.
\f4sort\fP does not guarantee
preservation of relative line ordering on equal keys.
.PP
If the \f4\-i\f1 and \f4\-f\f1
options are both used, or if the \f4\-i\f1 and \f4\-d\f1
options are both used,
the last one given controls the sort behavior;
it is not currently possible to sort with folded case or dictionary order
and non-printing characters
ignored, because of the method used to implement these options.
.\" @(#)sort.1 6.3 of 9/2/83
.Ee