Overview
--------
This directory contains scripts that automate the collection and
comparison of unixperf results and the kernel profiling data gathered
while running unixperf commands.
The basic idea is that you run 'uprun' to collect data and 'upcmp'
to compare the results from two previous executions of 'uprun'.
'upcmp' displays 3 pieces of information about the two runs being
compared:
1) Comparison of unixperf performance
2) Summary comparison of Total time, Idle time, and CPU time.
3) Detailed comparison of CPU time for individual functions.
Items 2) and 3) are generated by post-processing the output of
the profiler 'prfld' command. The numbers are the aggregate totals
across all CPU's. (Note that 'prfcmp', the perl script that does
the post-processing, can be used against any prfld output streams.)
Example output from 'upcmp' is shown at the end of this file.
A further script, 'upsrun', is provided to obtain gprof-style stack
profile analysis for 'uprun' tests. Note that no 'upscmp' script exists
to compare gprof analyses, since the reports are large and their
run-to-run variation is great.
Before running 'uprun' and 'upcmp'
----------------------------------
- For 'uprun' you must be root (to enable profiling).
- Add /usr/stress/perf to your shell path.
e.g., set path = (/usr/stress/perf $path)
Running 'uprun'
---------------
Syntax: uprun <unix_pathname> <output_dir> [<parallelism>]
e.g., uprun /unix outdir1
A file 'test_list' must exist in the current directory, specifying each
of the unixperf tests to run. There should be one test name per line
in the file. The names are those understood on the unixperf command
line (e.g., forkwait, select16r, pipeping1, all, etc.). Each line may
also include unixperf options - in particular, to override the default
rep counts. Lines beginning with "#" are ignored as comments.
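For example, a test_list that runs forkwait with the defaults but
overrides the loop count for another test might look like this (the
override value shown is illustrative only):

```
# forkwait with default options
forkwait
# pipeping1 with a longer run to smooth out sampling noise
pipeping1 -loop 500000
```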
The optional <parallelism> argument can be used to spawn multiple
copies of each unixperf test. Typically, the level of parallelism chosen
will be the number of processors on the test system.
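The fan-out implied by <parallelism> can be sketched as below;
'run_parallel' is a hypothetical helper written for illustration, not a
function in the real script:

```shell
#!/bin/sh
# Spawn n background copies of a command, then wait for all of them,
# as uprun is described doing for each unixperf test.
run_parallel() {
    n="$1"; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &              # one copy of the test in the background
        i=$((i + 1))
    done
    wait                    # the test is done only when every copy is
}
```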
The <output_dir> is created by the script. If a directory already exists
with this name, it is renamed <output_dir>.old. If a directory named
<output_dir>.old already exists, it and all its contents are deleted
first.
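The rotation described above can be sketched as follows; 'rotate_outdir'
is a hypothetical name for illustration, not part of the real script:

```shell
#!/bin/sh
# Keep at most one previous run: any stale <dir>.old is removed, the
# current <dir> is preserved as <dir>.old, and a fresh <dir> is created.
rotate_outdir() {
    dir="$1"
    if [ -d "$dir" ]; then
        rm -rf "$dir.old"       # stale .old copy and contents go first
        mv "$dir" "$dir.old"    # the most recent previous run is kept
    fi
    mkdir -p "$dir"
}
```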
The default iteration count per unixperf test is 200000, and each test
is repeated twice: hence, the default unixperf options are "-loop 200000
-repeat 2". These and other options may be given separately for each test.
In general, long runs are preferable because they tend to
smooth out the statistical variation due to the 1 msec sampling rate
of the kernel profiler. As a test sequence is executed, start and end
times are printed for each test to aid tuning of loop counts.
[Note that if you abort 'uprun' in the middle of its execution, you should
ensure that kernel profiling is disabled by executing 'prfstat off'.]
Running 'upcmp'
---------------
Syntax: upcmp <output_dir1> <output_dir2> <results_dir> [cutoff%]
e.g., upcmp outdir1 outdir2 1vs2
The comparison data is put into the <results_dir>/results file.
The optional cutoff% argument specifies the percentage below which the
data for individual functions will not be displayed. In other words,
a cutoff of 1% will cause 'upcmp' to only display those functions
(from either of the two runs) whose time represents at least 1% of the
total of its run. The default cutoff% is 0.5.
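The cutoff behaviour can be sketched with awk (the real post-processing
is done by the perl script 'prfcmp'; the 'cutoff_filter' name and the
two-column "function seconds" input format below are illustrative
assumptions):

```shell
#!/bin/sh
# Read "function seconds" pairs on stdin and print only those whose
# time is at least cutoff% of the column total.
cutoff_filter() {
    awk -v cutoff="$1" '
        { name[NR] = $1; secs[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                if (total > 0 && secs[i] * 100 / total >= cutoff)
                    print name[i], secs[i]
        }'
}
```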
Running 'upsrun'
----------------
Syntax: upsrun <unix_pathname> <output_dir> [<parallelism>]
'upsrun' operates in a similar way to 'uprun', driven by the file
test_list, but uses stack profiling rather than pc profiling to perform
gprof-style analysis. Since no comparison is performed by a later
compare phase, the analysis is immediately available in the output
directory. For each test, the gprof analysis is written to the file
<output_dir>/<test_name>/prof.out.
This script requires adequate free disk space on the test machine to
accumulate the raw stack profiling data. The analysis is performed on
the test machine after data collection, and the raw trace data files
are then deleted.
'upsrun' relies on additional tools which are expected to be installed
in the /usr/stress/perf directory. These are:
ssrun - SpeedShop environment
kernprof - SpeedShop kernel profiling data gatherer
prof - SpeedShop performance data analyzer
libss.so - SpeedShop lib
libssrt.so - SpeedShop lib
libX11.so.1 - X11 lib also required by SpeedShop
Note that stack profiling is performed for all processors of the
test machine.
Revision History
----------------
April 96 Paul Roy Created 'uprun', 'upcmp'
November 96 Chris Peak Enhancements and addition of 'upsrun'.
Example 'upcmp' output
----------------------
This comes from running only the unixperf 'forkwait' test (i.e., the
test_list file contained only "forkwait"), for example:
# test_list
# ---------
# forkwait run with default loop/repeat
#
forkwait
****************************** forkwait results ******************************
1: 1vs2/outdir1/forkwait/upout
2: 1vs2/outdir2/forkwait/upout
1 2 Operation
-------- ----------------- -----------------
425.0 421.0 ( 0.99) fork, child immediately exits, parent waits
Profiling Summary Run1 Run2 R2/R1
Sum of all cpus (sec) (sec)
-------------------------------------------------------
Number of cpu's 2 2
Elapsed time 1745.514 1825.126 1.05
Idle time 866.599 907.122 1.05
CPU time 876.987 915.842 1.04
CPU Time Analysis Run1 Run2 R2-R1 R2-R1/total R1 CPU
Function (cutoff=0.50%) (sec) (sec) (sec) (%)
-----------------------------------------------------------------------------
bcopy 146.961 169.565 22.604 2.58
user 24.763 25.867 1.104 0.13
mutex_spinunlock 23.952 25.037 1.085 0.12
avl_findanyrange 16.879 17.889 1.010 0.12
tossmpages 12.287 13.133 0.846 0.10
kmem_zone_alloc 10.696 11.522 0.826 0.09
copy_pmap 18.757 19.536 0.779 0.09
mutex_bitunlock 6.865 7.591 0.726 0.08
pmap_probe 8.818 9.538 0.720 0.08
mutex_spinlock 11.417 12.053 0.636 0.07
preg_avl_start 4.334 4.950 0.616 0.07
syscall 7.634 8.187 0.553 0.06
elocore_exl_3 4.151 4.665 0.514 0.06
VEC_tlbmod 9.752 10.244 0.492 0.06
procfork 10.823 11.264 0.441 0.05
nonactive_timers_add 5.044 5.468 0.424 0.05
pagealloc 5.298 5.713 0.415 0.05
segcommon_free 6.667 7.071 0.404 0.05
avl_insert_find_growth 4.926 5.306 0.380 0.04
freepreg 11.115 11.482 0.366 0.04
pagefreeanon 8.991 9.354 0.363 0.04
pageuseinc 8.618 8.965 0.347 0.04
_pagefree 13.261 13.596 0.336 0.04
collapse_pair 4.750 5.047 0.297 0.03
pfault 12.854 13.146 0.292 0.03
kmem_zone_free 4.746 5.018 0.272 0.03
exit 7.879 8.144 0.264 0.03
mutex_lock 7.301 7.519 0.218 0.02
attachreg 9.914 10.118 0.204 0.02
freeproc 5.831 5.995 0.164 0.02
segcommon_reserve 4.462 4.626 0.163 0.02
mrunlock 7.907 8.062 0.154 0.02
procdup 7.299 7.448 0.149 0.02
mrlock 15.901 16.027 0.127 0.01
freereg 4.743 4.857 0.113 0.01
chunk_find 4.817 4.914 0.097 0.01
dupreg 7.907 7.944 0.037 0.00
mr_unlock 7.415 7.437 0.022 0.00
allocpreg 5.466 5.434 -0.032 0.00
spl0 25.197 24.891 -0.306 -0.03
bzero 56.180 41.355 -14.826 -1.69
Notes: 1 msec profiling sample rate.
idle time calculated as sum of "idle" and "checkRunq" functions.