.PH ""
.ce
PAWS Revision 3.22 Performance Review
.ce
Scott Bayes
.ce
March 6, 1989
.SP
.H 1 "Summary"
.SP
Revision 3.22 of PAWS does not attempt to optimize performance by
changing the way code is generated, or the way the OS works.  It
does attempt to utilize the increased performance of the new
68030/68882 hardware.
.SP
To take advantage of the added performance provided by the 68030 d-cache
(data cache), code was added to turn on this cache and flush it when
necessary.  This was the only performance enhancement made in revision
3.22.  This minor change makes the 360 and 370 run between 1.1x and 3.7x
as fast as they do under revision 3.2 on the same hardware.
Performance was considered only for compute-bound code, as
PAWS still performs I/O only on DIO I (not DIO II) interfaces, and we believe
that most I/O operations are not CPU-bound.
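.SP
The 3.22 cache code itself is not reproduced in this review.  As a
minimal sketch of what turning on and flushing the d-cache involves, the
following C fragment builds the values that would be loaded into the
68030 cache control register (CACR).  The bit positions follow the
Motorola MC68030 documentation; the macro and function names are
hypothetical, and the privileged MOVEC instruction that actually writes
CACR is omitted because it cannot be expressed in portable C.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* MC68030 CACR bits (Motorola documentation).  The names are
   illustrative; the symbols used inside PAWS are not known. */
#define CACR_EI    0x0001u  /* enable instruction cache */
#define CACR_CI    0x0008u  /* clear (flush) instruction cache */
#define CACR_ED    0x0100u  /* enable data cache */
#define CACR_CD    0x0800u  /* clear (flush) data cache */
#define CACR_VALID 0x3F1Fu  /* all bits defined in the 68030 CACR */

/* CACR value that runs with both on-chip caches enabled. */
uint32_t cacr_caches_on(void)
{
    return CACR_EI | CACR_ED;
}

/* CACR value that flushes the data cache while keeping the
   other settings in `current` unchanged. */
uint32_t cacr_flush_dcache(uint32_t current)
{
    assert((current & ~CACR_VALID) == 0);  /* reject undefined bits */
    return current | CACR_CD;
}
```
.DE
Under these assumptions, start-up code would load the result of
cacr_caches_on() once, and any point that needs a flush would load
cacr_flush_dcache() applied to the current setting.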
.SP
Performance of models 330 and 350 showed no change from revision 3.2 to
revision 3.22. Similar results are expected for other older series 200 and
300 models.
.SP
.H 1 "Other Investigations"
.SP
We investigated improving performance in areas other than those related
directly to the 68030 hardware features.
.SP
We investigated and discarded "alignment" performance enhancements.  Some
newer series 300 models can load or store a 32-bit operand
in a single memory bus operation if it starts on a
longword boundary.  PAWS has never attempted to align
programs or data to more than word boundaries.  In PAWS, "properly"
aligned programs have the potential to execute significantly faster than
"improperly" aligned ones.  Speed differences approaching 3x have been
seen in some test cases on the 370.  Differences this large are not the
general rule; real-world code is unlikely to exhibit such large
variance, due to competing alignment effects among the many data
structures and procedures.
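.SP
The cost of misalignment can be illustrated with a deliberately
simplified model, sketched here in C.  It counts how many 4-byte slots
of a 32-bit data bus an operand touches, and ignores caches, wait states
and every other detail of the real hardware.  The function names are
hypothetical.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if `addr` lies on a longword (4-byte) boundary. */
int is_longword_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Simplified model: number of 4-byte bus slots touched by an
   operand of `size` bytes that starts at `addr`. */
unsigned bus_slots(uint32_t addr, unsigned size)
{
    uint32_t first, last;

    assert(size > 0);
    first = addr / 4u;               /* slot holding the first byte */
    last = (addr + size - 1u) / 4u;  /* slot holding the last byte */
    return (unsigned)(last - first + 1u);
}
```
.DE
In this model a 32-bit operand at a longword-aligned address occupies
one slot, while the same operand at an address that is only word-aligned
straddles two.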
.SP
Our investigation showed that it is almost impossible to have the OS or
COMPILER align programs and data on longword boundaries while retaining
full backwards compatibility with previous 3.x revisions of PAWS. PAWS
Assembler routines make explicit assumptions about how the COMPILER
packs data when they access Pascal variables. Changing the COMPILER
packing rules would require the programmer to reimplement almost
every Assembly routine ever written for PAWS.
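.SP
To make the compatibility problem concrete, the following C sketch
computes where a 32-bit field would land in a hypothetical record whose
first field is 16 bits wide, under a word (2-byte) and a longword
(4-byte) alignment rule.  The record and the function names are
invented for illustration and do not describe any actual PAWS data
structure.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment`,
   which must be a power of two. */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Offset of a 32-bit field that follows one 16-bit field at
   offset 0, given the alignment rule for 32-bit fields. */
size_t long_field_offset(size_t long_alignment)
{
    size_t after_word = 2;  /* the 16-bit field fills bytes 0..1 */

    return align_up(after_word, long_alignment);
}
```
.DE
With word alignment the 32-bit field sits at offset 2; with longword
alignment it moves to offset 4.  An Assembler routine written against
offset 2 would then read the wrong bytes.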
.SP
Other alignment strategies (heap alignment, stack alignment, local
variable alignment, etc.) were considered and discarded.  All these
strategies have "random" performance effects on existing programs,
though they all provide the opportunity for the programmer to
increase the performance of his code over what is currently possible, if
he is willing to redesign his application. All have the potential to
break existing code in various ways. None seem to have the potential to
produce speed gains larger than a few percent.
.SP
.H 1 "The Numbers"
.SP
The benchmarking undertaken is not intended for publication outside HP.
The benchmarks were rather simplistic, as PAWS Lab and Marketing are
not sufficiently staffed to undertake formal benchmarking. The benchmarks
are intended to provide a sanity check on the value of the 3.22
performance enhancements.
.SP
Three benchmark programs were run:
.SP
.AL
.LI
INSTTIME - written by Dave Willis, this is a simple doubly nested for-loop
containing an integer divide and two global variable
assignments. This program is very vulnerable to misalignment of
the integer variables it contains, and generally shows the largest
variability in execution speed due to longword alignment.
.LI
dhry - the Dhrystone benchmark, as implemented by Siemens AG and
loaned to HP.
.LI
wirth - a catchall program that does a little of everything: loop
timing, real math, array operations, procedure calls, and linked
list/memory management tasks. Also from Siemens AG.
.LE
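.SP
The INSTTIME source is not reproduced here.  The following C sketch
only mirrors the shape described above (a doubly nested loop, one
integer divide, two global assignments); the original is a PAWS Pascal
program, and every name below is hypothetical.
.DS
```c
#include <assert.h>

/* Global variables, as in the description; names are invented. */
long g_quotient;
long g_count;

/* Run the doubly nested loop and return the number of inner
   iterations executed. */
long insttime_like(long outer, long inner)
{
    long i, j;

    assert(outer >= 0 && inner >= 0);
    g_count = 0;
    for (i = 1; i <= outer; i++) {
        for (j = 1; j <= inner; j++) {
            g_quotient = i / j;     /* integer divide, global store 1 */
            g_count = g_count + 1;  /* global store 2 */
        }
    }
    return g_count;
}
```
.DE
Because almost all of the work is loads and stores of the two global
integers, a loop of this shape is sensitive to where those integers
happen to fall in memory.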
.H 2 "Existing Hardware"
.SP
330/350
.SP
3.2 and 3.22 showed no significant performance differences for any of the
above programs, compiled either with COMPILER or with COMP20. In other words,
the 3.22 revision did not affect performance on currently supported hardware.
.SP
.H 2 "Speedups due to raw Hardware speed"
.SP
360
.SP
The 360 running 3.22 showed performance improvements over the 330,
ranging from about 1.7x to about 2.1x. The 330 runs at about 2960
Dhrystones, and the 360 at about 6320 Dhrystones.
.SP
370
.SP
The 370 running 3.22 showed performance improvements over the 350,
ranging from about 1.5x to about 1.7x. The 350 runs at about 5350
Dhrystones, and the 370 at about 9300 Dhrystones.
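.SP
The speedup factors in this section are plain ratios of benchmark
rates.  As a small worked check, the C helper below (the name is
hypothetical) computes such a ratio from two Dhrystone figures.
.DS
```c
#include <assert.h>

/* Speedup of a newer machine over a baseline machine, expressed
   as the ratio of their benchmark rates. */
double speedup(double baseline_rate, double new_rate)
{
    assert(baseline_rate > 0.0);
    return new_rate / baseline_rate;
}
```
.DE
Applied to the figures quoted above, speedup(2960, 6320) is about 2.1
and speedup(5350, 9300) is about 1.7, so the Dhrystone result sits at
the top of each quoted range.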
.SP
.H 2 "Speedup due to New OS"
.SP
The 360 is faster in 3.22 than in 3.2 (which is unsupported on the 360).
The speedup factor ranges from about 1.1x to about 1.4x.  This speedup is
mostly due to turning on d-cache.
.SP
The 370 is faster in 3.22 than in 3.2 (which is unsupported on the 370).
The speedup factor ranges from about 1.7x to about 3.7x.  This speedup is
due both to turning on d-cache and to the absence of any caching in a
370 system running 3.2, where d-cache is turned off; the 370 running
3.2 can actually be slower than a 350.
.SP
.H 1 "Conclusion"
.SP
Enhancing performance of PAWS in 3.22 for the 68030 CPU was worthwhile,
and is not detrimental to the performance of other SPUs running the
OS.
