Overview
--------
This directory contains scripts that automate the process of
collecting and comparing unixperf results and the profiling data
gathered while running unixperf commands.

The basic idea is that you run 'uprun' to collect data and 'upcmp'
to compare the results from two previous executions of 'uprun'.

'upcmp' displays three pieces of information about the two runs being
compared:

1) Comparison of unixperf performance.

2) Summary comparison of Total time, Idle time, and CPU time.

3) Detailed comparison of CPU time for individual functions.

Items 2) and 3) are generated by post-processing the output from the
profiler 'prfld' command. The numbers are the aggregate totals across
all CPUs. (Note that the perl script 'prfcmp', used to do the
post-processing, can be used against any prfld output stream.)

Example output from 'upcmp' is shown at the end of this file.

A further script, 'upsrun', is provided to obtain gprof-style stack
profile analysis for 'uprun' tests. Note that no 'upscmp' script
exists to compare gprof analyses, since the reports are large and the
variations between runs great.

Before running 'uprun' and 'upcmp'
----------------------------------
- For 'uprun' you must be root (to enable profiling).

- Add /usr/stress/perf to your shell path,
  e.g., set path = (/usr/stress/perf $path)

Running 'uprun'
---------------
Syntax: uprun <unix_pathname> <output_dir> [<parallelism>]
e.g.,   uprun /unix outdir1

A file 'test_list' must exist in the current directory, specifying
each of the unixperf tests to run. There should be one test name per
line in the file. The names are those understood on the unixperf
command line (e.g., forkwait, select16r, pipeping1, all, etc.). Each
line may also include unixperf options - in particular, to override
the default rep counts. Lines beginning with "#" are ignored as
comments.

The optional <parallelism> argument can be used to spawn multiple
copies of each unixperf test. Typically, the level of parallelism
chosen will be the number of processors on the test system.

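A minimal sketch (not taken from uprun itself) of how a parallelism
argument can spawn several copies of one test and wait for them all;
the copy count, the log file, and the echoed command line are
illustrative stand-ins for the real unixperf invocation:

```shell
# Spawn 'par' background copies of a test, then wait for all of them.
par=2
log=$(mktemp)
i=0
while [ "$i" -lt "$par" ]; do
    # uprun would invoke unixperf here; this sketch just records the copy.
    ( echo "copy $i: unixperf forkwait" >> "$log" ) &
    i=$((i + 1))
done
wait            # block until every spawned copy has finished
cat "$log"
```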
The <output_dir> is created by the script. If a directory already
exists with this name, it is renamed <output_dir>.old. If a directory
named <output_dir>.old already exists, it and all its contents will
first be deleted.

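The rotation described above can be sketched as follows; this is a
minimal illustration of the stated behaviour, not uprun's actual code,
and the directory name is an example:

```shell
outdir=outdir1
cd "$(mktemp -d)"       # work in a scratch directory for this sketch

# The oldest backup is deleted first, then any existing directory
# with the requested name is preserved as <output_dir>.old.
[ -d "$outdir.old" ] && rm -rf "$outdir.old"
[ -d "$outdir" ] && mv "$outdir" "$outdir.old"
mkdir "$outdir"
```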
|
The default iteration count per unixperf test is 200000, and each test
|
|
is repeated twice: hance, the default unixperf options are "-loop 200000
|
|
-repeat 2". These and other options may be given separately for each test.
|
|
In general, long runs are preferable because they tend to
|
|
smooth out the statistical variation due to the 1 msec sampling rate
|
|
of the kernel profiler. As a test sequence is executed, start and end
|
|
times are printed for each test to aid tuning of loop counts.
|
|
|
|
[Note that if you abort 'uprun' in the middle of its execution, you
should ensure that kernel profiling is disabled by executing
'prfstat off'.]

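One way a wrapper script could guarantee this is to trap the abort
signals; this sketch is illustrative and not taken from uprun (a
variable stands in for the real 'prfstat off' call so the sketch is
self-contained):

```shell
profiling_on=1
disable_profiling() {
    profiling_on=0          # a real wrapper would run: prfstat off
}
# Run the handler if the script is interrupted or terminated.
trap disable_profiling INT TERM

kill -TERM $$               # simulate aborting the run
echo "profiling_on=$profiling_on"
```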
Running 'upcmp'
---------------
Syntax: upcmp <output_dir1> <output_dir2> <results_dir> [cutoff%]
e.g.,   upcmp outdir1 outdir2 1vs2

The comparison data is put into the <results_dir>/results file.

|
The optional cutoff% argument specifies the percentage below which the
|
|
data for individual functions will not be displayed. In other words,
|
|
a cutoff of 1% will cause 'upcmp' to only display those functions
|
|
(from either of the two runs) whose time represents at least 1% of the
|
|
total of its run. The default cutoff% is 0.5.
|
|
|
|
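The cutoff rule can be illustrated with a small awk sketch: a function
survives if its time clears the cutoff in either run. The column
layout, the totals (taken from the example output at the end of this
file), and the 'tiny_fn' row are assumptions for illustration; the
real filtering is done by the prfcmp perl script.

```shell
cutoff=0.5
total1=876.987      # Run1 CPU-time total from the example output
total2=915.842      # Run2 CPU-time total
result=$(awk -v c="$cutoff" -v t1="$total1" -v t2="$total2" '
    {
        p1 = 100 * $2 / t1          # percent of Run1 total
        p2 = 100 * $3 / t2          # percent of Run2 total
        if (p1 >= c || p2 >= c) print $0
    }' <<'EOF'
bcopy 146.961 169.565
spl0 25.197 24.891
tiny_fn 0.900 1.100
EOF
)
echo "$result"
```

With a 0.5% cutoff, bcopy and spl0 are kept while the hypothetical
tiny_fn (about 0.1% of either total) is suppressed.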
Running 'upsrun'
----------------

Syntax: upsrun <unix_pathname> <output_dir> [<parallelism>]

'upsrun' operates in a similar way to 'uprun', driven by the file
'test_list', but instead of pc profiling, stack profiling is used to
perform gprof-style analysis. Since no comparison is performed by a
later compare phase, the analysis is immediately available in the
output directory. For each test the gprof analysis is written to the
file <output_dir>/<test_name>/prof.out.

This script requires an adequate quantity of available disk space on
the test machine to accumulate raw stack profiling data. The analysis
is performed on the test machine following data collection, and the
raw trace data files are then deleted.

'upsrun' relies on additional tools which are expected to be installed
in the /usr/stress/perf directory. These are:

    ssrun       - SpeedShop environment
    kernprof    - SpeedShop kernel profiling data gatherer
    prof        - SpeedShop performance data analyzer
    libss.so    - SpeedShop library
    libssrt.so  - SpeedShop library

    libX11.so.1 - X11 library, also required by SpeedShop

Note that stack profiling is performed for all processors of the
test machine.

Revision History
----------------
April 96     Paul Roy     Created 'uprun', 'upcmp'.
November 96  Chris Peak   Enhancements and addition of 'upsrun'.

Example 'upcmp' output
----------------------
This comes from running only the unixperf 'forkwait' test (i.e., the
test_list file had only "forkwait" in it), such as:

    # test_list
    # ---------
    # forkwait run with default loop/repeat
    #
    forkwait

****************************** forkwait results ******************************
1: 1vs2/outdir1/forkwait/upout
2: 1vs2/outdir2/forkwait/upout

       1                  2           Operation
--------   -----------------  -----------------
   425.0      421.0 ( 0.99)   fork, child immediately exits, parent waits

Profiling Summary            Run1       Run2    R2/R1
Sum of all cpus             (sec)      (sec)
-------------------------------------------------------
Number of cpu's                 2          2
Elapsed time             1745.514   1825.126    1.05
Idle time                 866.599    907.122    1.05
CPU time                  876.987    915.842    1.04

CPU Time Analysis             Run1      Run2     R2-R1   R2-R1/total R1 CPU
Function (cutoff=0.50%)      (sec)     (sec)    (sec)        (%)
-----------------------------------------------------------------------------
bcopy                      146.961   169.565    22.604       2.58
user                        24.763    25.867     1.104       0.13
mutex_spinunlock            23.952    25.037     1.085       0.12
avl_findanyrange            16.879    17.889     1.010       0.12
tossmpages                  12.287    13.133     0.846       0.10
kmem_zone_alloc             10.696    11.522     0.826       0.09
copy_pmap                   18.757    19.536     0.779       0.09
mutex_bitunlock              6.865     7.591     0.726       0.08
pmap_probe                   8.818     9.538     0.720       0.08
mutex_spinlock              11.417    12.053     0.636       0.07
preg_avl_start               4.334     4.950     0.616       0.07
syscall                      7.634     8.187     0.553       0.06
elocore_exl_3                4.151     4.665     0.514       0.06
VEC_tlbmod                   9.752    10.244     0.492       0.06
procfork                    10.823    11.264     0.441       0.05
nonactive_timers_add         5.044     5.468     0.424       0.05
pagealloc                    5.298     5.713     0.415       0.05
segcommon_free               6.667     7.071     0.404       0.05
avl_insert_find_growth       4.926     5.306     0.380       0.04
freepreg                    11.115    11.482     0.366       0.04
pagefreeanon                 8.991     9.354     0.363       0.04
pageuseinc                   8.618     8.965     0.347       0.04
_pagefree                   13.261    13.596     0.336       0.04
collapse_pair                4.750     5.047     0.297       0.03
pfault                      12.854    13.146     0.292       0.03
kmem_zone_free               4.746     5.018     0.272       0.03
exit                         7.879     8.144     0.264       0.03
mutex_lock                   7.301     7.519     0.218       0.02
attachreg                    9.914    10.118     0.204       0.02
freeproc                     5.831     5.995     0.164       0.02
segcommon_reserve            4.462     4.626     0.163       0.02
mrunlock                     7.907     8.062     0.154       0.02
procdup                      7.299     7.448     0.149       0.02
mrlock                      15.901    16.027     0.127       0.01
freereg                      4.743     4.857     0.113       0.01
chunk_find                   4.817     4.914     0.097       0.01
dupreg                       7.907     7.944     0.037       0.00
mr_unlock                    7.415     7.437     0.022       0.00
allocpreg                    5.466     5.434    -0.032       0.00
spl0                        25.197    24.891    -0.306      -0.03
bzero                       56.180    41.355   -14.826      -1.69

Notes: 1 msec profiling sample rate.
       Idle time calculated as sum of "idle" and "checkRunq" functions.