(1) INTRODUCTION In order to discover where bru spends most of its time, and how inclusion of the macro based debugger package (dbug) affects execution, a version of bru, both with and without inclusion of the debugger, was profiled and the results are presented here. First some information about the machine and the version of bru used: Bru version: 4.8 Machine: Callan Unistar 200 Cpu: 68000 Cpu clock: 8 Mhz Wait states: 2 Archive device: MPI 8304 (640Kb floppy) Archive size: 272Kb Files in archive: 32 (2) TIMING TESTS The raw statistics from timing execution with the "time" command are: No dbug With dbug Mode Real User Sys Real User Sys ---- ----------------------- ----------------------- c 1:05.8 3.9 2.4 1:08.7 7.4 2.7 d 1:15.1 5.0 4.6 1:19.7 10.0 4.5 e 6.4 0.7 2.9 7.6 1.2 2.9 i 1:00.9 3.8 2.2 1:04.9 7.4 2.3 t 31.5 2.8 3.9 33.1 4.9 4.2 x 1:15.9 3.9 6.0 1:16.5 8.5 5.8 (3) PROFILING TESTS The profiling tests were run using the system V "prof" facility. The following table represents at least the top 10 functions for every mode (since they are not always the same ones the table contains more than 10 entries). Percent time spent in the function (dbug package used) Function c d e i t x -------- ----- ----- ----- ----- ----- ----- __doprnt 22.55 5.73 14.43 7.60 7.09 5.59 _chksum 19.57 37.16 0.00 49.54 25.59 42.57 _diff 0.00 19.27 0.00 0.00 0.00 0.00 _zeroblk 10.07 0.00 0.00 0.00 0.00 0.00 aldiv 6.10 0.69 2.32 0.61 1.18 0.25 lrem 5.11 0.46 3.61 0.00 9.06 0.25 _memcpy 3.26 0.92 1.55 1.82 0.79 0.25 mcount 2.55 0.23 1.29 0.61 1.57 2.02 _sprintf 2.13 0.00 0.00 1.22 0.39 0.00 _fromhex 0.00 2.52 0.00 3.04 4.33 2.02 smul 0.43 0.92 0.52 0.30 0.00 0.76 _tree_wa 0.14 0.00 3.87 0.00 0.00 0.00 _fwrite 1.13 0.46 3.87 0.91 2.36 0.76 __flsbuf 0.00 0.46 3.09 0.00 2.36 0.76 ldiv 0.43 0.46 2.58 0.00 1.57 0.76 _tohex 1.99 0.46 0.00 1.22 0.79 0.50 _memccpy 0.14 0.46 0.26 0.30 2.76 0.50 _ar_seek 0.57 0.69 0.00 0.91 0.39 1.26 _ar_read 0.00 0.46 0.00 0.61 0.00 1.01 (debugger functions) __db_key 5.25 10.09 19.33 11.85 13.78 16.62 __db_pri 1.42 1.83 3.35 2.74 2.36 3.37 __db_ent 1.56 5.05 7.22 3.04 4.72 4.03 __db_ret 3.26 4.59 8.25 2.13 4.72 3.27 Percent time spent in the function (dbug package not used) Function c d e i t x -------- ----- ----- ----- ----- ----- ----- __doprnt 31.91 5.86 29.18 10.90 9.44 8.02 _chksum 22.27 53.52 0.00 65.88 37.22 61.79 aldiv 8.35 0.39 6.01 0.95 1.11 0.94 _zeroblk 6.85 0.00 0.00 0.00 0.00 0.00 lrem 4.71 0.39 1.72 0.47 15.56 0.00 _memcpy 3.85 1.17 2.58 1.42 1.67 1.89 _sleep 2.14 0.78 0.43 0.00 0.00 0.00 mcount 2.14 0.00 2.58 1.42 2.78 0.94 _tohex 1.71 0.39 0.00 0.00 0.00 0.47 _sprintf 1.71 0.00 0.00 0.47 0.56 1.42 _diff 0.00 23.05 0.00 0.00 0.00 0.00 _s_fprin 0.43 5.86 0.00 0.95 2.22 0.00 _fromhex 0.00 1.56 0.00 5.21 3.33 6.60 _verbosi 0.43 1.17 0.86 0.95 0.00 0.47 __flsbuf 0.46 1.17 4.29 0.00 2.22 0.94 __xflsbu 0.21 0.78 3.86 0.95 0.00 0.00 _ar_seek 0.21 0.78 0.00 0.00 1.67 0.00 ldiv 0.21 0.78 6.44 0.95 1.67 0.94 _tree_wa 0.21 0.00 3.86 0.47 0.00 0.00 _fwrite 1.07 0.39 3.43 0.95 1.11 0.94 smul 1.07 0.78 3.00 0.95 1.11 1.89 _estimat 0.00 0.00 2.58 0.00 0.00 0.00 _endpwen 0.00 0.00 0.00 0.95 1.11 0.00 _gmtime 0.00 0.00 0.00 0.00 2.22 0.00 __cleanu 0.00 0.00 1.72 0.47 1.67 0.00 _s_signa 0.00 0.00 0.00 0.00 1.42 1.42 _write 0.43 0.00 1.72 0.47 1.42 1.42 (4) CONCLUSIONS 1. Because the elapsed (real) time is so much larger than the sum of the user and system times, bru appears to be mostly I/O bound. Thus performance will be highly dependent upon the speed of the peripherals used as the archive device. 2. Because bru is I/O bound, inclusion of the debugger does not significantly effect total execution time. However, note that approximately 10-30 percent of the time spent executing user code is spent in the debugger routines. Thus in a multiuser environment, inclusion of the debugger adds significant cpu burden to the system. 3. Depending upon mode, bru spends from 20-60 percent of its time computing block checksums in the routine "chksum". It might be worthwhile to recode this particular routine is assembly language for a specific implementation in a multiuser or multiprocessing environment. If bru is used mostly on a system in "single-user" mode and is the only significant process running, there is probably no performance advantage to be gained by recoding chksum since bru is I/O bound anyway.