Overview
--------
This directory contains scripts that automate the process of
collecting and comparing unixperf results and the profiling data
gathered while running unixperf commands.

The basic idea is that you run 'uprun' to collect data and 'upcmp'
to compare the results from two previous executions of 'uprun'.

'upcmp' displays three pieces of information about the two runs being
compared:

1) Comparison of unixperf performance.

2) Summary comparison of Total time, Idle time, and CPU time.

3) Detailed comparison of CPU time for individual functions.

Items 2) and 3) are generated by post-processing the output from the
profiler 'prfld' command. The numbers are the aggregate totals across
all CPUs. (Note that the perl script 'prfcmp', used to do the
post-processing, can be used against any prfld output stream.)

Example output from 'upcmp' is shown at the end of this file.

A further script, 'upsrun', is provided to obtain gprof-style stack
profile analysis for 'uprun' tests. Note that no 'upscmp' script
exists to compare gprof analyses, since the reports are large and the
variations between runs great.

Before running 'uprun' and 'upcmp'
----------------------------------
- For 'uprun' you must be root (to enable profiling).

- Add /usr/stress/perf to your shell path,
  e.g., set path = (/usr/stress/perf $path)

Running 'uprun'
---------------
Syntax: uprun <unix_pathname> <output_dir> [<parallelism>]
e.g.,   uprun /unix outdir1

A file 'test_list' must exist in the current directory, specifying
each of the unixperf tests to run. There should be one test name per
line in the file. The names are those understood on the unixperf
command line (e.g., forkwait, select16r, pipeping1, all, etc.). Each
line may also include unixperf options - in particular, to override
the default rep counts. Lines beginning with "#" are ignored as
comments.

The optional <parallelism> argument can be used to spawn multiple
copies of each unixperf test. Typically, the level of parallelism
chosen will be the number of processors on the test system.

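A minimal sketch (not taken from uprun itself) of how a parallelism
argument can spawn several copies of one test and wait for them all;
the copy count, the log file, and the echoed command line are
illustrative stand-ins for the real unixperf invocation:

```shell
# Spawn 'par' background copies of a test, then wait for all of them.
par=2
log=$(mktemp)
i=0
while [ "$i" -lt "$par" ]; do
    # uprun would invoke unixperf here; this sketch just records the copy.
    ( echo "copy $i: unixperf forkwait" >> "$log" ) &
    i=$((i + 1))
done
wait            # block until every spawned copy has finished
cat "$log"
```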
The <output_dir> is created by the script. If a directory already
exists with this name, it is renamed <output_dir>.old. If a directory
named <output_dir>.old already exists, it and all its contents will
first be deleted.

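The rotation described above can be sketched as follows; this is a
minimal illustration of the stated behaviour, not uprun's actual code,
and the directory name is an example:

```shell
outdir=outdir1
cd "$(mktemp -d)"       # work in a scratch directory for this sketch

# The oldest backup is deleted first, then any existing directory
# with the requested name is preserved as <output_dir>.old.
[ -d "$outdir.old" ] && rm -rf "$outdir.old"
[ -d "$outdir" ] && mv "$outdir" "$outdir.old"
mkdir "$outdir"
```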
|
The default iteration count per unixperf test is 200000, and each test
|
|
is repeated twice: hance, the default unixperf options are "-loop 200000
|
|
-repeat 2". These and other options may be given separately for each test.
|
|
In general, long runs are preferable because they tend to
|
|
smooth out the statistical variation due to the 1 msec sampling rate
|
|
of the kernel profiler. As a test sequence is executed, start and end
|
|
times are printed for each test to aid tuning of loop counts.
|
|
|
|
[Note that if you abort 'uprun' in the middle of its execution, you
should ensure that kernel profiling is disabled by executing
'prfstat off'.]

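One way a wrapper script could guarantee this is to trap the abort
signals; this sketch is illustrative and not taken from uprun (a
variable stands in for the real 'prfstat off' call so the sketch is
self-contained):

```shell
profiling_on=1
disable_profiling() {
    profiling_on=0          # a real wrapper would run: prfstat off
}
# Run the handler if the script is interrupted or terminated.
trap disable_profiling INT TERM

kill -TERM $$               # simulate aborting the run
echo "profiling_on=$profiling_on"
```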
Running 'upcmp'
---------------
Syntax: upcmp <output_dir1> <output_dir2> <results_dir> [cutoff%]
e.g.,   upcmp outdir1 outdir2 1vs2

The comparison data is put into the <results_dir>/results file.

|
The optional cutoff% argument specifies the percentage below which the
|
|
data for individual functions will not be displayed. In other words,
|
|
a cutoff of 1% will cause 'upcmp' to only display those functions
|
|
(from either of the two runs) whose time represents at least 1% of the
|
|
total of its run. The default cutoff% is 0.5.
|
|
|
|
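The cutoff rule can be illustrated with a small awk sketch: a function
survives if its time clears the cutoff in either run. The column
layout, the totals (taken from the example output at the end of this
file), and the 'tiny_fn' row are assumptions for illustration; the
real filtering is done by the prfcmp perl script.

```shell
cutoff=0.5
total1=876.987      # Run1 CPU-time total from the example output
total2=915.842      # Run2 CPU-time total
result=$(awk -v c="$cutoff" -v t1="$total1" -v t2="$total2" '
    {
        p1 = 100 * $2 / t1          # percent of Run1 total
        p2 = 100 * $3 / t2          # percent of Run2 total
        if (p1 >= c || p2 >= c) print $0
    }' <<'EOF'
bcopy 146.961 169.565
spl0 25.197 24.891
tiny_fn 0.900 1.100
EOF
)
echo "$result"
```

With a 0.5% cutoff, bcopy and spl0 are kept while the hypothetical
tiny_fn (about 0.1% of either total) is suppressed.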
Running 'upsrun'
----------------

Syntax: upsrun <unix_pathname> <output_dir> [<parallelism>]

'upsrun' operates in a similar way to 'uprun', driven by the file
'test_list', but instead of pc profiling, stack profiling is used to
perform gprof-style analysis. Since no comparison is performed by a
later compare phase, the analysis is immediately available in the
output directory. For each test the gprof analysis is written to the
file <output_dir>/<test_name>/prof.out.

This script requires an adequate quantity of available disk space on
the test machine to accumulate raw stack profiling data. The analysis
is performed on the test machine following data collection, and the
raw trace data files are then deleted.

'upsrun' relies on additional tools which are expected to be installed
in the /usr/stress/perf directory. These are:

    ssrun       - SpeedShop environment
    kernprof    - SpeedShop kernel profiling data gatherer
    prof        - SpeedShop performance data analyzer
    libss.so    - SpeedShop library
    libssrt.so  - SpeedShop library

    libX11.so.1 - X11 library, also required by SpeedShop

Note that stack profiling is performed for all processors of the
test machine.

Revision History
----------------
April 96     Paul Roy     Created 'uprun', 'upcmp'.
November 96  Chris Peak   Enhancements and addition of 'upsrun'.

Example 'upcmp' output
----------------------
This comes from running only the unixperf 'forkwait' test (i.e., the
test_list file had only "forkwait" in it), such as:

    # test_list
    # ---------
    # forkwait run with default loop/repeat
    #
    forkwait

****************************** forkwait results ******************************
1: 1vs2/outdir1/forkwait/upout
2: 1vs2/outdir2/forkwait/upout

       1                  2           Operation
--------   -----------------  -----------------
   425.0      421.0 ( 0.99)   fork, child immediately exits, parent waits

Profiling Summary            Run1       Run2    R2/R1
Sum of all cpus             (sec)      (sec)
-------------------------------------------------------
Number of cpu's                 2          2
Elapsed time             1745.514   1825.126    1.05
Idle time                 866.599    907.122    1.05
CPU time                  876.987    915.842    1.04

CPU Time Analysis             Run1      Run2     R2-R1   R2-R1/total R1 CPU
Function (cutoff=0.50%)      (sec)     (sec)    (sec)        (%)
-----------------------------------------------------------------------------
bcopy                      146.961   169.565    22.604       2.58
user                        24.763    25.867     1.104       0.13
mutex_spinunlock            23.952    25.037     1.085       0.12
avl_findanyrange            16.879    17.889     1.010       0.12
tossmpages                  12.287    13.133     0.846       0.10
kmem_zone_alloc             10.696    11.522     0.826       0.09
copy_pmap                   18.757    19.536     0.779       0.09
mutex_bitunlock              6.865     7.591     0.726       0.08
pmap_probe                   8.818     9.538     0.720       0.08
mutex_spinlock              11.417    12.053     0.636       0.07
preg_avl_start               4.334     4.950     0.616       0.07
syscall                      7.634     8.187     0.553       0.06
elocore_exl_3                4.151     4.665     0.514       0.06
VEC_tlbmod                   9.752    10.244     0.492       0.06
procfork                    10.823    11.264     0.441       0.05
nonactive_timers_add         5.044     5.468     0.424       0.05
pagealloc                    5.298     5.713     0.415       0.05
segcommon_free               6.667     7.071     0.404       0.05
avl_insert_find_growth       4.926     5.306     0.380       0.04
freepreg                    11.115    11.482     0.366       0.04
pagefreeanon                 8.991     9.354     0.363       0.04
pageuseinc                   8.618     8.965     0.347       0.04
_pagefree                   13.261    13.596     0.336       0.04
collapse_pair                4.750     5.047     0.297       0.03
pfault                      12.854    13.146     0.292       0.03
kmem_zone_free               4.746     5.018     0.272       0.03
exit                         7.879     8.144     0.264       0.03
mutex_lock                   7.301     7.519     0.218       0.02
attachreg                    9.914    10.118     0.204       0.02
freeproc                     5.831     5.995     0.164       0.02
segcommon_reserve            4.462     4.626     0.163       0.02
mrunlock                     7.907     8.062     0.154       0.02
procdup                      7.299     7.448     0.149       0.02
mrlock                      15.901    16.027     0.127       0.01
freereg                      4.743     4.857     0.113       0.01
chunk_find                   4.817     4.914     0.097       0.01
dupreg                       7.907     7.944     0.037       0.00
mr_unlock                    7.415     7.437     0.022       0.00
allocpreg                    5.466     5.434    -0.032       0.00
spl0                        25.197    24.891    -0.306      -0.03
bzero                       56.180    41.355   -14.826      -1.69

Notes: 1 msec profiling sample rate.
       Idle time calculated as sum of "idle" and "checkRunq" functions.