(1) INTRODUCTION

To discover where bru spends most of its time, and how inclusion
of the macro-based debugger package (dbug) affects execution, bru
was profiled both with and without the debugger. The results are
presented here.

First, some information about the machine and the version of
bru used:

    Bru version:        4.8
    Machine:            Callan Unistar 200
    Cpu:                68000
    Cpu clock:          8 MHz
    Wait states:        2
    Archive device:     MPI 8304 (640 KB floppy)
    Archive size:       272 KB
    Files in archive:   32

(2) TIMING TESTS

The raw statistics from timing execution with the "time" command
are:

              No dbug                With dbug

Mode       Real   User   Sys       Real   User   Sys
----    --------------------    --------------------

c        1:05.8    3.9   2.4     1:08.7    7.4   2.7
d        1:15.1    5.0   4.6     1:19.7   10.0   4.5
e           6.4    0.7   2.9        7.6    1.2   2.9
i        1:00.9    3.8   2.2     1:04.9    7.4   2.3
t          31.5    2.8   3.9       33.1    4.9   4.2
x        1:15.9    3.9   6.0     1:16.5    8.5   5.8

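The share of elapsed time that is actually CPU time can be worked
out directly from the table above. The following is a minimal
sketch in Python, not part of bru or dbug; the names MODES, NO_DBUG,
WITH_DBUG, cpu_fraction and cpu_fractions are illustrative only, and
real times printed as m:ss.s have been converted to seconds.

```python
# (real, user, system) in seconds, copied from the timing table above.
# Real times printed as m:ss.s are converted to seconds (1:05.8 -> 65.8).
MODES = ("c", "d", "e", "i", "t", "x")

NO_DBUG = {
    "c": (65.8, 3.9, 2.4),
    "d": (75.1, 5.0, 4.6),
    "e": (6.4, 0.7, 2.9),
    "i": (60.9, 3.8, 2.2),
    "t": (31.5, 2.8, 3.9),
    "x": (75.9, 3.9, 6.0),
}

WITH_DBUG = {
    "c": (68.7, 7.4, 2.7),
    "d": (79.7, 10.0, 4.5),
    "e": (7.6, 1.2, 2.9),
    "i": (64.9, 7.4, 2.3),
    "t": (33.1, 4.9, 4.2),
    "x": (76.5, 8.5, 5.8),
}


def cpu_fraction(real, user, system):
    """Fraction of elapsed time accounted for by user + system CPU time."""
    return (user + system) / real


def cpu_fractions(timings):
    """Map each mode to its CPU fraction of elapsed time."""
    return {mode: cpu_fraction(*timings[mode]) for mode in MODES}


if __name__ == "__main__":
    for label, timings in (("no dbug", NO_DBUG), ("with dbug", WITH_DBUG)):
        for mode, frac in cpu_fractions(timings).items():
            print(f"{label:9s} {mode}  cpu = {frac:5.1%} of elapsed")
```

With these figures, CPU time is under 30 percent of elapsed time in
every mode except e, where it is a little over half.
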
(3) PROFILING TESTS

The profiling tests were run using the System V "prof" facility.
The following table lists at least the top 10 functions for every
mode; because the top functions are not always the same from mode
to mode, the table contains more than 10 entries.


            Percent time spent in the function
                   (dbug package used)

Function       c       d       e       i       t       x
--------     -----   -----   -----   -----   -----   -----

__doprnt     22.55    5.73   14.43    7.60    7.09    5.59
_chksum      19.57   37.16    0.00   49.54   25.59   42.57
_diff         0.00   19.27    0.00    0.00    0.00    0.00
_zeroblk     10.07    0.00    0.00    0.00    0.00    0.00
aldiv         6.10    0.69    2.32    0.61    1.18    0.25
lrem          5.11    0.46    3.61    0.00    9.06    0.25
_memcpy       3.26    0.92    1.55    1.82    0.79    0.25
mcount        2.55    0.23    1.29    0.61    1.57    2.02
_sprintf      2.13    0.00    0.00    1.22    0.39    0.00
_fromhex      0.00    2.52    0.00    3.04    4.33    2.02
smul          0.43    0.92    0.52    0.30    0.00    0.76
_tree_wa      0.14    0.00    3.87    0.00    0.00    0.00
_fwrite       1.13    0.46    3.87    0.91    2.36    0.76
__flsbuf      0.00    0.46    3.09    0.00    2.36    0.76
ldiv          0.43    0.46    2.58    0.00    1.57    0.76
_tohex        1.99    0.46    0.00    1.22    0.79    0.50
_memccpy      0.14    0.46    0.26    0.30    2.76    0.50
_ar_seek      0.57    0.69    0.00    0.91    0.39    1.26
_ar_read      0.00    0.46    0.00    0.61    0.00    1.01

                   (debugger functions)

__db_key      5.25   10.09   19.33   11.85   13.78   16.62
__db_pri      1.42    1.83    3.35    2.74    2.36    3.37
__db_ent      1.56    5.05    7.22    3.04    4.72    4.03
__db_ret      3.26    4.59    8.25    2.13    4.72    3.27

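The total share of profiled time spent inside the debugger in each
mode is the column sum of the four debugger rows above. The
following is a small Python sketch of that arithmetic; it is not
part of bru or dbug, and the names DBUG_PERCENT and debugger_share
are illustrative only.

```python
# Percent of profiled time per debugger function, copied from the
# "(debugger functions)" rows of the table above.
# Column order is the table's mode order: c, d, e, i, t, x.
MODES = ("c", "d", "e", "i", "t", "x")

DBUG_PERCENT = {
    "__db_key": (5.25, 10.09, 19.33, 11.85, 13.78, 16.62),
    "__db_pri": (1.42, 1.83, 3.35, 2.74, 2.36, 3.37),
    "__db_ent": (1.56, 5.05, 7.22, 3.04, 4.72, 4.03),
    "__db_ret": (3.26, 4.59, 8.25, 2.13, 4.72, 3.27),
}


def debugger_share(table=DBUG_PERCENT):
    """Sum the debugger rows column by column, giving percent per mode."""
    return {
        mode: sum(row[col] for row in table.values())
        for col, mode in enumerate(MODES)
    }


if __name__ == "__main__":
    for mode, percent in debugger_share().items():
        print(f"mode {mode}: {percent:5.2f} percent in debugger routines")
```

For the table above the sums run from about 11 percent (mode c) to
about 38 percent (mode e).
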

            Percent time spent in the function
                 (dbug package not used)

Function       c       d       e       i       t       x
--------     -----   -----   -----   -----   -----   -----

__doprnt     31.91    5.86   29.18   10.90    9.44    8.02
_chksum      22.27   53.52    0.00   65.88   37.22   61.79
aldiv         8.35    0.39    6.01    0.95    1.11    0.94
_zeroblk      6.85    0.00    0.00    0.00    0.00    0.00
lrem          4.71    0.39    1.72    0.47   15.56    0.00
_memcpy       3.85    1.17    2.58    1.42    1.67    1.89
_sleep        2.14    0.78    0.43    0.00    0.00    0.00
mcount        2.14    0.00    2.58    1.42    2.78    0.94
_tohex        1.71    0.39    0.00    0.00    0.00    0.47
_sprintf      1.71    0.00    0.00    0.47    0.56    1.42
_diff         0.00   23.05    0.00    0.00    0.00    0.00
_s_fprin      0.43    5.86    0.00    0.95    2.22    0.00
_fromhex      0.00    1.56    0.00    5.21    3.33    6.60
_verbosi      0.43    1.17    0.86    0.95    0.00    0.47
__flsbuf      0.46    1.17    4.29    0.00    2.22    0.94
__xflsbu      0.21    0.78    3.86    0.95    0.00    0.00
_ar_seek      0.21    0.78    0.00    0.00    1.67    0.00
ldiv          0.21    0.78    6.44    0.95    1.67    0.94
_tree_wa      0.21    0.00    3.86    0.47    0.00    0.00
_fwrite       1.07    0.39    3.43    0.95    1.11    0.94
smul          1.07    0.78    3.00    0.95    1.11    1.89
_estimat      0.00    0.00    2.58    0.00    0.00    0.00
_endpwen      0.00    0.00    0.00    0.95    1.11    0.00
_gmtime       0.00    0.00    0.00    0.00    2.22    0.00
__cleanu      0.00    0.00    1.72    0.47    1.67    0.00
_s_signa      0.00    0.00    0.00    0.00    1.42    1.42
_write        0.43    0.00    1.72    0.47    1.42    1.42

(4) CONCLUSIONS

1.  Because the elapsed (real) time is so much larger
    than the sum of the user and system times, bru
    appears to be mostly I/O bound.  Thus performance
    will be highly dependent upon the speed of the
    peripherals used as the archive device.

2.  Because bru is I/O bound, inclusion of the debugger
    does not significantly affect total execution time.
    However, note that approximately 10-30 percent of
    the time spent executing user code is spent in the
    debugger routines.  Thus in a multiuser environment,
    inclusion of the debugger adds a significant cpu
    burden to the system.

3.  Depending upon mode, bru spends from 20-60 percent of
    its time computing block checksums in the routine
    "chksum".  It might be worthwhile to recode this
    particular routine in assembly language for a
    specific implementation in a multiuser or
    multiprocessing environment.  If bru is used
    mostly on a system in "single-user" mode and
    is the only significant process running, there is
    probably no performance advantage to be gained by
    recoding chksum, since bru is I/O bound anyway.

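The reasoning behind the third conclusion can be made concrete with
a rough upper bound on how much elapsed time a faster chksum could
save. The sketch below is in Python and is not part of bru. It
assumes that the prof percentages can be applied to the user time
reported by "time" (the two were measured in separate runs, so this
is only an approximation) and that none of the checksum time
overlaps with I/O waits. The function name max_elapsed_saving and
the speedup factor are hypothetical.

```python
def max_elapsed_saving(real, user, chksum_percent, speedup):
    """Upper bound on the fraction of elapsed time saved by a faster chksum.

    real, user      -- seconds, from the timing table
    chksum_percent  -- percent of profiled time spent in chksum
    speedup         -- hypothetical factor by which chksum gets faster

    Assumes the profile percentage applies to user CPU time and that
    checksum time does not overlap with I/O waits.
    """
    chksum_seconds = user * chksum_percent / 100.0
    saved_seconds = chksum_seconds * (1.0 - 1.0 / speedup)
    return saved_seconds / real


if __name__ == "__main__":
    # Mode i without dbug: real 1:00.9 (60.9 s), user 3.8 s, chksum 65.88%.
    for factor in (2.0, 4.0, float("inf")):
        bound = max_elapsed_saving(60.9, 3.8, 65.88, factor)
        print(f"speedup {factor}: at most {bound:.1%} of elapsed time saved")
```

Under these assumptions, for mode i without dbug a twofold speedup
of chksum would save at most about 2 percent of elapsed time, and
even an infinitely fast chksum would save only about 4 percent.
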