







			A few notes about Catseye ...



			  Updated Aug 13, 1987





				PURPOSE

This document contains a few hints about writing device driver level code
for the Catseye family of display boards.  It assumes that the reader has
familiarity with graphics and has read the ERS's for the Catseye board and
the RUG, BARC and IRIS chips.  A simple graphics library shows example code
to control the display boards.  To see how the Catseye board was used in the
series 300 HP-UX environment, refer to the Starbase device driver and associated
IRS and ERS written by John Waitz.

The following nomenclature will be used: HRC stands for the High Resolution
Catseye (1280 x 1024 x 10 planes), LCC stands for the Low Cost Color board
(1024 x 768 x 6) and Mono stands for the Monochrome board.  When talking
about the overlay capability, primary planes refer to the main set of
drawing planes (8 for HRC, 4 or 6 for LCC and 1 for Mono).  Overlay planes
refer to the planes attached to the overlay inputs of IRIS (2 for HRC, 0
or 2 for LCC and none for Mono).


		WHAT CAN I EXPECT WHEN I START TO USE IT?

The Display ID ROM on each display card allows the BOOT ROM to initialize
all video and timing parameters in IRIS and the following color map entries:
	
	HRC: 0 - black, 1 - white, 255 - white
	LCC: 0 - black, 1 - white, 63 - white
	Mono: 0 - black, 1 - white


BARC's are set with all primary planes enabled to be written to, primary 
WRR and PRR set to SOURCE, all other replacement rule registers set to 0.
BARC is set to the blit mode for TOPCAT compatibility.  Only the least 
significant plane is enabled to be displayed.  The LCC board is set so that
all 6 planes are primary planes.

RUG is set to blit mode for TOPCAT compatibility and page mode operation.


	  JUST HOW COMPATIBLE WITH HEXAGON IS THE LCC ANYWAY?

The Catseye chip set will acknowledge reads from and writes to all registers
defined in the Hexagon ERS, although some registers will have no effect.
The functions the Catseye does not duplicate are:

	-There is no hardware cursor.
	-The secondary ID is different.
	-There is only one block mover so any software using TRIGGER
	 to start block moves on only certain planes will not work.
	 Use FBEN to get the same effect.
	 NOTE: The Catseye block mover will be triggered by any write
	 to TRIGGER while TOPCAT based displays require a 1 in the bit
	 position of the block mover to trigger.
	-There is no hardware CRC check of the frame buffer.
	-There is no second frame buffer (Hexagon is 2048 x 1024, LCC
	 is 1024 x 1024).  Writing to the ALTFRAME register will have
	 no effect.
	-No shadow or hidden registers were duplicated so software using
	 these registers will NOT work.  But of course I know no HP software
	 writer would ever use a non documented feature.
	-Reading from unimplemented planes (6 and 7) returns undefined
	 results.  In Hexagon reading unimplemented planes returns 0.
	-The Hexagon board will always DTACK for register writes, although
	 writing to the board while the TOPCAT is busy may cause unpredictable
	 results and has always been avoided.  The Catseye board will not DTACK
	 if a register in the RUG space is written while RUG is busy.
	-Due to a bug in BARC, reading the TCREN, TCWEN and FBEN registers
	 in compatibility space does not work.  To get the correct result,
	 these registers must be read in new space at 0x4504, 0x4508 and
	 0x4500.
	-The Vertical retrace interrupt request register (0x4048) and 
	 vertical retrace interrupt enable register (0x4049) are not
	 supported.


Both RUG and BARC must be set into Blit mode to be compatible.  Write
0x0090 to COMMAND in RUG and 0x00 to VB in BARC.


				USING IRIS

Assuming that all timing and video parameters have been loaded, IRIS is
easy to use.  The routines DefineColorTable, ReadColorTable, BlinkPlanes,
EnableTransparency and DisplayEnable show how to use the chip.  You must
always be sure to wait on the IRIS status register when writing or reading
color map entries.


				USING BARC

BARC is easy to use if you like indirect action, otherwise it is like any
other custom (read that: for inhouse use only) chip.  The first thing to
realize is that there can be two banks of BARCs (the case for HRC) and they
have separate register sets, except for the VB, TCNTRL, ACNTRL, PNCNTRL and
ID registers which are at the same address for both banks.  A write to any
of these registers will affect all BARCs.  All other registers are unique
to each bank of BARCs.

The following shows what registers control data manipulation:

	Drawing Primitives:
	------- -----------

	if TCNTRL = 0x00
		if VB = 0x00
			Use WRR to control replacement rule
			Contents of BARCs source register file used as source.
		else 
			Use PRR to control replacement rule
			Use COLOR to control what is written into the FB
	else
		Use TRR to control replacement rule
		The Pattern RAM is used as the third operand


	Block Moves:
	----- ------

	Use PNCNTRL to select whether the block move is to be within planes
	(how block moves have worked in the past) or is to be between planes
	with one plane acting as a source and one or more others as the
	destination.  Use FBEN to set the destination plane mask.

	if TCNTRL = 0x00
		if VB = 0x00
			Use WRR to control replacement rule
			Contents of BARCs source register file used as source.
			(register file loaded by source rectangle)
		else 
			Use PRR to control replacement rule
			Use COLOR to control what is written into the FB
	else
		Use TRR to control replacement rule
		The Pattern RAM is used as the third operand


	Block Read/Write to/from Main Memory:
	----- ---------- ------- ---- -------

	Use ACNTRL to select whether the transfer will be byte or bit per
	pixel.  For HRC use ACNTRL to select either the overlay or primary
	frame buffer.

	if TCNTRL = 0x00
		Use PRR to control replacement rule
	else
		Use TRR to control replacement rule


At all times FBEN controls what planes may actually be modified.  This leads 
to an interesting and VERY, VERY IMPORTANT performance note.  All BARCs
always perform ALU operations on data from the frame buffer.  The FBEN then
controls what data is written back out to the frame buffer.  The hardware was
carefully designed so operations always take the least amount of time
possible.  For example, the ZERO replacement rule operates in half the time
of the SOURCE replacement rule and a third of the time of the XOR
replacement rule.  For this reason all BARCs communicate to RUG and the GLAD
bus controller what kind of replacement rule (clear, source only, read modify
write) is in effect.  The board will then issue cycles for the slowest 
replacement rule, REGARDLESS of whether the BARC with the slowest rule is
actually modifying the frame buffer or not.  Thus it is up to you, the hotshot
programmer, to assure that BARCs which are not being used have replacement
rules of 0.
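
To see why an idle BARC with a stale rule costs time, the note above can be
modelled in a few lines of C.  This is a sketch under a loud assumption: the
hardware is assumed to classify a rule by which operands its result depends
on (neither, source only, or destination).  How the chips really classify
rules such as 5 (leave the destination alone) is not stated here.  All names
are hypothetical.

```c
enum cycle_class { CYCLE_CLEAR, CYCLE_SOURCE_ONLY, CYCLE_RMW };

/* Classify a two operand rule (0 is ZERO, 3 is SOURCE, 6 is XOR,
 * 15 is ONE) by the operands its result depends on.
 * ASSUMPTION: the hardware draws the same distinction. */
enum cycle_class rule_cycle_class(unsigned rule)
{
    unsigned sd11 = rule & 1u;          /* result when S=1, D=1 */
    unsigned sd10 = (rule >> 1) & 1u;   /* result when S=1, D=0 */
    unsigned sd01 = (rule >> 2) & 1u;   /* result when S=0, D=1 */
    unsigned sd00 = (rule >> 3) & 1u;   /* result when S=0, D=0 */

    if (sd11 != sd10 || sd01 != sd00)
        return CYCLE_RMW;               /* destination must be read */
    if (sd10 != sd00)
        return CYCLE_SOURCE_ONLY;       /* only the source matters */
    return CYCLE_CLEAR;                 /* constant result */
}

/* The board runs at the pace of the slowest rule in ANY bank,
 * whether or not FBEN lets that bank write. */
enum cycle_class board_cycle_class(const unsigned *bank_rules, int nbanks)
{
    enum cycle_class worst = CYCLE_CLEAR;
    int i;

    for (i = 0; i < nbanks; i++) {
        enum cycle_class c = rule_cycle_class(bank_rules[i]);
        if (c > worst)
            worst = c;
    }
    return worst;
}
```

With rules {SOURCE, 0} the model gives source only cycles; leave XOR behind in
the unused bank and every cycle becomes a read modify write.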

Also remember that for compatibility, the significant bits of WRR are from
0-7 of a word write whereas all other registers in BARC have significant bits
from 8-15 of a word write.
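
A small helper keeps the two byte positions straight.  This is a sketch, not
library code: the function name is hypothetical, and it assumes bit 0 is the
least significant bit of the word, so that WRR takes its value in the low
byte and every other BARC register takes it in the high byte.

```c
/* Build the 16 bit word to write so that an 8 bit register value
 * lands in the byte the BARC register decodes.
 * ASSUMPTION: bit 0 is the least significant bit of the word. */
unsigned short barc_reg_word(int is_wrr, unsigned char value)
{
    if (is_wrr)
        return (unsigned short)value;                 /* bits 0-7  */
    return (unsigned short)((unsigned)value << 8);    /* bits 8-15 */
}
```

For example, a value of 0x01 becomes 0x0100 for any register other than WRR
and stays 0x0001 for WRR.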


				USING RUG

As one of the designers of RUG, I can assure you this chip is a joy to use -
except for a few eccentricities.

Internally RUG represents numbers in 12 bit words giving it a range of -2048 to
2047 pixels.  However, to assist clipping of filled circles, the range was
skewed to -1024 to 3071 pixels.  All coordinate values written to the chip 
must fall within this range.  If not, truncation will occur and the primitive
drawn will be incorrect.

There are limitations as discussed below:

	Vectors: The absolute delta x or delta y between the two coordinates
	of the vector must be less than 2048 pixels in length.

	Circles: The radius must be less than 1024 pixels.  Incorrect clipping
	will occur for filled circles when:
	
			xcenter - radius < -1024

	Polygon fill: The absolute delta x or delta y between any two
	coordinates of a triangle must be less than 2048 pixels in length.

	Blit: The maximum width or height is 2048 pixels.
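
The limits above are easy to check in software before anything is written to
the chip.  The following is a minimal sketch: the function and macro names
are hypothetical, and treating a zero width or height blit as invalid is an
assumption, not something stated above.

```c
#include <stdlib.h>

#define RUG_COORD_MIN (-1024L)
#define RUG_COORD_MAX 3071L

/* Coordinate fits the skewed 12 bit range. */
int rug_coord_ok(long v)
{
    return v >= RUG_COORD_MIN && v <= RUG_COORD_MAX;
}

/* Both endpoints in range and each delta under 2048 pixels. */
int rug_vector_ok(long x0, long y0, long x1, long y1)
{
    return rug_coord_ok(x0) && rug_coord_ok(y0) &&
           rug_coord_ok(x1) && rug_coord_ok(y1) &&
           labs(x1 - x0) < 2048 && labs(y1 - y0) < 2048;
}

/* Radius under 1024 and no clipping trouble on the left edge. */
int rug_filled_circle_ok(long xc, long yc, long r)
{
    return rug_coord_ok(xc) && rug_coord_ok(yc) &&
           r >= 0 && r < 1024 &&
           xc - r >= RUG_COORD_MIN;
}

/* ASSUMPTION: a blit must be at least 1 x 1. */
int rug_blit_ok(long w, long h)
{
    return w >= 1 && w <= 2048 && h >= 1 && h <= 2048;
}
```

A polygon fill can reuse rug_vector_ok on each pair of triangle vertices,
since the same 2048 pixel delta limit applies.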

If clipping is enabled during a blit, RUG will clip writes but not reads.

You must always check the status of RUG BUSY before attempting to change
any register other than XSRC, YSRC, XDST or YDST.  RUG prohibits modification
of registers while operating and you may get a bus timeout. The source and
destination registers may be written when RUG RFD is true.  Remember, both
status bits are available in the Catseye board STATUS register.  Oh yeah, also
check RUG BUSY before mucking with any registers in IRIS or BARC that would
change the state of the board, e.g. changing the COLOR register while RUG
is drawing a line.  I can't tell you how many times this bug bit me when
writing the example library.
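
A bounded polling loop is one way to apply this rule.  The sketch below is
not taken from the example library.  The bit masks are placeholders (the real
positions are in the Catseye ERS), the 16 bit register width is assumed, and
the function name is hypothetical.

```c
/* PLACEHOLDER masks: look up the real bit positions in the ERS. */
#define ST_RUG_BUSY 0x0001u
#define ST_RUG_RFD  0x0002u

/* Poll the Catseye board STATUS register until the masked bits equal
 * `want`.  Returns 0 on success, -1 if max_polls reads went by first. */
int catseye_wait(volatile unsigned short *status, unsigned mask,
                 unsigned want, long max_polls)
{
    while (max_polls-- > 0) {
        if ((*status & mask) == want)
            return 0;
    }
    return -1;
}
```

Wait for ST_RUG_BUSY to read 0 before touching any register that affects
drawing, and for ST_RUG_RFD to read set before writing XSRC, YSRC, XDST or
YDST.  The poll limit keeps a hung board from hanging the driver as well.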

When the Fastcat product is used with Catseye, all device drivers must check
RUG BUSY, RUG RFD and Fastcat status bits to assure that an operation is
not in progress or will be started in the near future.  The reason for checking
so many bits is that an operation can be queued up in the Fastcat Transform
Engine (FATE) while RUG is not busy.  Once queued up, there is no way to stop
FATE from starting an operation in RUG.

Remember to save and restore the LTYPE and LTP registers when changing hw state.
If you (or any other process) are interested in using the picking support
remember to also save and restore the state of the pick bit in the COMMAND
register.


			USING OVERLAY PLANES

The Monochrome board is the easiest - it doesn't support overlay planes.

The HRC board is the next easiest: it has a separate BARC controlling two
dedicated overlay planes.  Note that the scratch plane also has a BARC to 
itself.  These two BARCs occupy the second BARC bank.  As shown in the example
library, these BARCs are treated exactly the same as the BARCs controlling the
primary planes, except they are addressed in the 0x47XX space.  The OVLCNTRL
register in IRIS controls display enable, modification of and transparency
(of overlay color 0) of the overlay planes.

The LCC board is more interesting.  Planes 4 and 5 are connected to both
primary and overlay inputs of IRIS.  Thus if bits 4 and 5 of PMASK are
set then the planes are displayed as part of the primary set.  If bits 0 and 1
of OVLCNTRL are set, planes 4 and 5 are displayed as overlay planes.  If you
write 0x0f to PMASK and 0x03 to OVLCNTRL the board is configured to have 4
primary planes and 2 overlay planes.  Writing 0x3f to PMASK and 0x00 to OVLCNTRL
will configure the board for 6 primary planes.  FBEN is used to select what
planes to modify (0x0f for primary, 0x30 for overlay, 0x3f for primary with
no overlay).  Be sure to left shift the color information by 4 when writing
the COLOR register for the overlay planes.  Data being written from the CPU
into the overlay planes must also be left shifted by 4.  TCWEN is used to
assure that only the appropriate BARCs allow modification of the replacement
rules.
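
The two LCC configurations can be captured as data.  This sketch only
computes the register values quoted above; it does not touch the hardware,
and the structure and function names are hypothetical.

```c
struct lcc_plane_cfg {
    unsigned char pmask;        /* value for PMASK in IRIS */
    unsigned char ovlcntrl;     /* value for OVLCNTRL in IRIS */
    unsigned char fben_primary; /* FBEN when drawing in primary planes */
    unsigned char fben_overlay; /* FBEN when drawing in overlay planes */
};

/* Register values for 4 primary + 2 overlay planes (use_overlay != 0)
 * or for 6 primary planes (use_overlay == 0). */
struct lcc_plane_cfg lcc_plane_config(int use_overlay)
{
    struct lcc_plane_cfg c;

    if (use_overlay) {
        c.pmask = 0x0f;
        c.ovlcntrl = 0x03;
        c.fben_primary = 0x0f;
        c.fben_overlay = 0x30;
    } else {
        c.pmask = 0x3f;
        c.ovlcntrl = 0x00;
        c.fben_primary = 0x3f;
        c.fben_overlay = 0x00;  /* no overlay planes to enable */
    }
    return c;
}

/* Move a 2 bit overlay color into planes 4 and 5 for COLOR or for
 * data written by the CPU. */
unsigned char lcc_overlay_color(unsigned char color)
{
    return (unsigned char)((color & 0x03u) << 4);
}
```

Overlay color 3, for instance, becomes 0x30 in the COLOR register.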


	WHERE IS THAT SCRATCH PLANE ANYWAY?  (AND MAYBE, WHAT IS IT?)

The scratch plane is used for drawing polygons and can also be used for
aligning packed pixel (bit per pixel) reads and writes.  Basically, for those
of you who have forgotten (or haven't read any of the ERS's), polygons are
generated by drawing them, in XOR MODE, three endpoints at a time in the scratch
plane, then edging the polygon and blitting it through the COLOR register or
pattern RAM into the primary or overlay planes.

The scratch plane is located in three separate places of course, depending on
the board.  It is plane 1 for the Monochrome board and is connected to the
plane 1 input of IRIS and may be displayed.  The scratch plane is plane 10
(plane 3 of the second BARC bank) in the HRC board.  It can not be displayed.
As usual, the LCC board is more interesting.  The scratch plane shares a BARC
with plane 5.  By manipulating the Catseye board STATUS register, either the 
scratch plane or plane 5 can be independently read from or written to.  Be 
sure to reset the STATUS register after using the scratch plane in the LCC 
board.  The scratch plane in the LCC board can not be displayed.

NOTE: The scratch plane is to be used for operations performed by the device
driver and should NEVER be displayed (except maybe for debugging).  The
customer should not know that the Monochrome board has an extra displayable
plane.  If they were to find out, it could lead to compatibility problems 
with future display cards.  Anybody letting this cat out of the bag will be
put on software maintenance for the rest of their life (and how you will
emulate in software a frame buffer plane that does not exist, I haven't a clue).


		FIGURING THREE OPERAND REPLACEMENT RULES

Three operand replacement rules allow the inclusion of a 16 x 16 pixel pattern
RAM into source to destination operations.  For a complete description of the
usage and calculation of three operand replacement rules consult the IRS
John Waitz created for his Starbase device driver.  But for those cases when
you are interested in quickly creating a three operand replacement rule for
the TRR register in BARC, use the method described below (Heaven help you if
you try to use the RPN format specified by some naive standards committee).
Basically what you do is to create a truth table of all possible Pattern,
Source and Destination (P, S and D) values.  Then using the operation and
replacement rule you desire calculate the result for each row.  This result
then becomes the three operand replacement rule.

For example, assume you want to generate the rule: 

	If S
	    then P <op> D
	else
	    D

where <op> is one of the 16 primary CGI replacement rules (the ones where 
zero is ZERO, three is SOURCE, six is XOR and fifteen is ONE).

    P S D    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
    -----------------------------------------------------------------------
    0 0 0    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  LSB
    0 0 1    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
    0 1 0    0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1
    0 1 1    0   0   0   0   1   1   1   1   0   0   0   0   1   1   1   1
    1 0 0    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    1 0 1    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
    1 1 0    0   0   1   1   0   0   1   1   0   0   1   1   0   0   1   1
    1 1 1    0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1  MSB
             --------------------------------------------------------------
   TRR (hex) 22  a2  62  e2  2a  aa  6a  ea  26  a6  66  e6  2e  ae  6e  ee

By the way, this is the rule you would use in blitting a finished polygon
from the scratch plane to the visible planes using patterns loaded in BARC's
pattern RAM.
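
The truth table method is mechanical enough to code.  The sketch below
rebuilds the TRR row of the table for the rule "if S then P <op> D else D".
The function names are hypothetical.  The bit order of the two operand rule
(bit 3 is the result for operands 0,0 and bit 0 the result for operands 1,1)
is read off the table above.

```c
/* Result of two operand rule `rule` (0-15) for source bit s and
 * destination bit d: bit 3 of the rule covers (0,0), bit 2 (0,1),
 * bit 1 (1,0) and bit 0 (1,1). */
static unsigned rule2(unsigned rule, unsigned s, unsigned d)
{
    return (rule >> (3u - (2u * s + d))) & 1u;
}

/* TRR value for: if S then P <op> D else D. */
unsigned trr_masked_pattern(unsigned op)
{
    unsigned trr = 0;
    unsigned row;

    for (row = 0; row < 8; row++) {
        unsigned p = (row >> 2) & 1u;
        unsigned s = (row >> 1) & 1u;
        unsigned d = row & 1u;
        unsigned result = s ? rule2(op, p, d) : d;

        trr |= result << row;   /* row 0 0 0 is the LSB, 1 1 1 the MSB */
    }
    return trr;
}
```

Other three operand rules follow by changing the one line that computes
result; the row to bit mapping stays the same.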


			ESTIMATING OPERATION TIMES

Catseye operation times may be calculated in terms of GLAD bus cycle times.
When RUG is drawing a primitive, it can modify 16 pixels in a 16 x 1 tile every
cycle it gets.  VRAM/screen refresh and SPU register/frame buffer accesses
have higher priority in obtaining GLAD cycles than RUG and will affect 
performance.  RUG cycles take different amounts of time depending on the
slowest selected replacement rule in the BARCs.  There are two read cycles
(std read and BP read) and two different write cycles (std write and RMW).
In certain cases Page Mode (PM) cycles are used to increase performance.

	Cycle     |	Time	Condition
	----------+-------------------------------------------------------
	 std read |	320 nS	first, second, last and any non PM cycles
		  |	160 nS	page mode cycles
		  |
	std write |	320 nS	first, last and any non PM cycles
		  |	400 nS	second during PM (first during vertical vectors)
		  |	160 nS	remaining page mode cycles
		  |
	      RMW |	480 nS 	first, second, last and any non PM cycles
		  |	320 nS	page mode cycles
		  |
	  BP read |	400 nS	first, second, last and any non PM cycles
		  |	240 nS	page mode cycles
			


	RMW = Read Modify Write cycle (e.g. XOR replacement rule)
	BP read = Between planes blit read

RUG can do page mode cycles for horizontal scan lines longer than 48 pixels and
for vertical vectors.  When doing horizontal scan lines (whether part of a
vector, a circle or a polygon fill) the first and second cycles are always
full cycles (except for standard writes where the second cycle is 80 nS longer
than a full cycle). The last cycle is always a full cycle also.  Any horizontal
scan line that takes more than three cycles can utilize page mode cycles for
all cycles in between the second and last cycle. 
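
The scan line rules can be turned into a small estimator.  This is a sketch
built only from the cycle table above: the names are hypothetical, and it
assumes that no refresh or SPU access interrupts the scan line.

```c
enum rug_cycle { CYC_STD_READ, CYC_STD_WRITE, CYC_RMW, CYC_BP_READ };

/* Time in nS for one horizontal scan line that needs `cycles` RUG
 * cycles, assuming nothing interrupts it. */
long scanline_ns(enum rug_cycle type, int cycles)
{
    /* indexed by enum rug_cycle, values from the cycle table */
    static const long full_ns[] = { 320, 320, 480, 400 };
    static const long page_ns[] = { 160, 160, 320, 240 };
    long second;

    if (cycles <= 0)
        return 0;
    if (cycles <= 3)                    /* too short for page mode */
        return cycles * full_ns[type];

    /* a standard write stretches the second cycle by 80 nS */
    second = full_ns[type] + (type == CYC_STD_WRITE ? 80 : 0);

    return full_ns[type] + second
         + (cycles - 3) * page_ns[type]
         + full_ns[type];
}
```

A six cycle standard write comes out at 320 + 400 + 3 * 160 + 320 nS.  Add
the RUG overhead for the primitive to get the time from trigger.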

Blits are a little more complicated.  BARC was optimized to cache 256 pixels of
information so as to utilize page mode cycles.  During a Blit, RUG issues a
dummy write cycle (which takes 320 nS), up to 16 read cycles and then
the same (or fewer, if clipped) number of write cycles.  The number of read
and write cycles depends on the width of the blit and whether clipping is being
performed on the data written back to the frame buffer.  The operation time for
the read and write cycles may be calculated exactly like that described above 
for vectors.  If a between planes blit has been enabled, the read cycles are
400 nS for the first, second and last cycle and 240 nS for all cycles 
in between.  Blits in which there is no source (e.g. WRR = 0 or 15) do not
have any read cycles.

The Catseye frame buffer organization allows a 31 % increase in performance
on vertical vectors.  There are four pixels per page (of VRAM) in a vertical
vector.  In a given page, the first cycle is always a full cycle (except for
standard writes where it is a 400 nS cycle).  The next three cycles are page
mode cycles.

Catseye performs screen refresh and RAM refresh concurrently.  This cycle
requires 560 nS every 15.625 uS for the HRC and Mono boards and every 10.851 uS
for the LCC board.  Since these cycles have priority over all other cycles,
this results in a net performance decrease of 3.6 % for the HRC and Mono
boards and 5.2 % for the LCC board.  Note that this is an average performance
decrease.  Drawing operations that occur within one refresh cycle can occur
without any performance degradation.  When a drawing operation is in progress
and is currently doing page mode cycles, any refresh cycles will cause a
restart, i.e. the next RUG cycle will be a full cycle.
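
The percentages above are just the refresh time divided by the refresh
interval.  A sketch of that arithmetic in integer C follows; the function
names are hypothetical and the results are in tenths of a percent.

```c
/* Average refresh overhead in tenths of a percent, rounded to the
 * nearest tenth.  Both arguments are in nS. */
long refresh_overhead_permille(long refresh_ns, long interval_ns)
{
    return (refresh_ns * 1000 + interval_ns / 2) / interval_ns;
}

/* Stretch an operation time (nS) by the average refresh overhead.
 * The fractional nS is dropped. */
long with_refresh_ns(long op_ns, long permille)
{
    return op_ns + op_ns * permille / 1000;
}
```

With 560 nS every 15625 nS the first function returns 36 (3.6 %); with
10851 nS it returns 52 (5.2 %).  Remember that this is an average: it
overstates the cost of short operations that fit between two refreshes.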

Once triggered, RUG requires some overhead to setup internal registers and
to fill its internal pipe.  Typical overhead times (from trigger) are shown
below:

	Operation	Overhead time
	-----------------------------
	Vectors		Horizontal: 320 nS; Other: 480 nS
	Circles		800 nS
	Area Fill	Per triangle: 3.4 uS
	Blits		640 nS

Following are examples showing performance calculations for the HRC board.

	Example 1: Horizontal vector (PRR = SOURCE, Length = 71 pixels)

		Assume the horizontal vector requires 6 cycles (it is non
		aligned).  Also assume that no refresh cycles occur during 
		this operation.

		RUG Overhead: 320 nS
		Cycle time: 320 + 400 + 160 + 160 + 160 + 320 = 1.52 uS

		Total time = 320 nS + 1.52 uS = 1.84 uS

	Example 2: Vertical vector (PRR = XOR, Length = 27 pixels)

		Assume that the vector is aligned so that the first pixel
		is the first pixel in a RAM page.  Also assume that a
		screen refresh occurs after the 21st pixel.

		RUG Overhead: 480 nS
		Pixels 1 - 21: 5 * (480 + 320 + 320 + 320) + 480 = 7.68 uS
		Refresh time: 560 nS
		Pixels 22 - 27: 2 * (480 + 320 + 320) = 2.24 uS

		Total time = 480 nS + 7.68 uS + 560 nS + 2.24 uS = 10.96 uS

		Note that if the replacement rule is SOURCE, the
		drawing time for the four pixels in a page would 
		be 400 + 160 + 160 + 160 nS.

	Example 3: Between planes Blit (WRR = SOURCE, w = 258, h = 16)

		This example will estimate the refresh time by multiplying
		the final time by 1.036 instead of adding it in directly.
		Assume that the read is aligned (takes 16 cycles for 256 
		pixels) but the write is not aligned (takes 17 cycles).

		RUG Overhead: 640 nS

		Time for one scan line: 8.8 uS
		    First 256 pixels: 7.76 uS
			Dummy write cycle: 320 nS
			Read cycles: 400 + 14 * 240 + 400 = 4.16 uS
			Write cycles: 320 + 400 + 14 * 160 + 320 = 3.28 uS
		    Remaining 2 pixels: 1.04 uS
			Dummy write cycle: 320 nS
			Read Cycle: 400 nS
			Write Cycle: 320 nS

		Total = 1.036 * (640 nS + 16 * 8.8 uS) = 146.53 uS

To remain independent of SPU or Fastcat considerations, all register and frame
buffer accesses through the Catseye interface must be measured from the time
the address is decoded and a cycle request generated to the time the Catseye
board issues DTACK.  Since all accesses must be synchronized by a flip flop,
an additional 40 nS may be added to the cycle time.

	Cycle		Time + synchronization
	--------------------------------------
	Register read	320 + 40 = 360 nS
	Register write	320 + 40 = 360 nS
	FB read		440 + 40 = 480 nS
	FB write	440 + 40 = 480 nS
	FB RMW		560 + 40 = 600 nS      (e.g. XOR RR)

To calculate the time of interest for your CPU, you must add any bus and/or
instruction overhead to the above numbers.


			MISCELLANEOUS NOTES

Although the state of RUG BUSY can be obtained by checking the RUG STATUS
register, you should always check the Catseye STATUS register.  Every time
a register is read in one of the chips while a drawing operation is
taking place, the operation is interrupted because register accesses have
a higher priority than RUG cycles.  If the CPU is fast enough, constant 
access of a chip register can almost shut down operation of the board.

Adding an offset of 0x100 to any register write to RUG triggers the chip and
eliminates the need to write to the separate trigger register.



				KNOWN BUGS

IRIS : IRIS may miss read or write triggers if it is re-triggered within
       1.86 uS after the IRIS busy bit is cleared.  This could occur if back
       to back triggers were used (e.g. clearing the color map).  Normal
       operation, where the RED, GREEN, BLUE and RAMADDR (RSEL for overlay)
       registers are written or read between triggers is sufficient to 
       prevent any missed triggers.

BARC : Due to a decode circuitry bug, the BARC at ID location 2 (the third
       BARC chip) will drive all 16 data lines when the TCREN, TCWEN or FBEN
       registers are read in compatibility space.  Writing these registers
       operates correctly and the correct data is latched into the part.
       An example is shown below.

	   write 0x408c with 0x0100
	   read 0x408c (result will be 0x55-- instead of 0x01--)
	   read 0x4504 (result will be 0x01--)

       Because the BARC at ID location 2 will drive data lines that are being
       driven by other BARCs, reading from the compatibility registers may
       eventually damage one or more BARC chips and should be avoided.

       Also due to a decode bug, the ACNTRL, PNCNTRL and TCNTRL registers
       in an upper bank BARC cannot be read in the 0x47xx space.  Since they 
       are equivalent to the 0x45xx space registers, the correct value may
       be read from the 0x45xx space.

RUG  : The special case for handling vertical vectors in page mode does not
       operate correctly when the vector is clipped on a non-page boundary.
       Page boundaries occur every 4 scan lines in the Catseye boards.  When
       a clip limit is set so that a vertical vector (with page mode and
       clipping enabled) is clipped on some scan line other than a page
       boundary, an extra cycle is generated causing pixels to be set in
       the word following the word that held the last pixel drawn in the
       vector.  The most obvious fix is to disable page mode cycles
       entirely.  However, this would slow down all operations that could
       take advantage of horizontal page mode cycles (e.g. blits, fill).  A 
       second, preferable solution is to set the scan lines/page field
       in the CONTROL register in RUG to 1 scan line/page.  This allows all
       horizontal cycles to operate with page mode but disables vertical
       page mode cycles.  Even more performance can be gained if the
       scan lines/page field is changed to 4 scan lines/page when
       clipping is disabled and 1 scan line/page when clipping is 
       enabled.  An example of how to do this is shown below:

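	   /* cast to a pointer before dereferencing; this assumes cntrl */
	   /* is a byte pointer and CONTROL is a byte offset */
	   if (clipping_enabled)
	       *((unsigned short *)(cntrl + CONTROL)) |= 0x0083;
	   else
	       *((unsigned short *)(cntrl + CONTROL)) &= 0x007c;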

       This code example maintains the status of all other bits in the
       CONTROL register.





				LIBRARY STATE

Maintaining the correct state in the Catseye chip registers can often 
increase performance by eliminating unnecessary register writes in
each routine.  I have attempted to demonstrate this concept with the
example graphics library.  The state expected by each routine is shown
below.  

	Catseye Board Registers
	    STATUS: 0 (for LCC only)

	IRIS Registers
	    RAMADDR: undefined
	    RED, GREEN, BLUE: undefined
	    WTRIG, RTRIG: undefined
	    PMASK: current primary planes display mask
	    BLINK: current primary planes blink mask
	    OVLCNTRL: current overlay display, blink and transparency mask
	    RSEL: 0 (so modifiable cmap ram is for primary planes)

	BARC Registers
	    Pattern registers: current pattern
	    FBEN0, FBEN1: currently selected plane set (either primary planes
			  or overlay planes) with write mask
			  other plane set, scratch plane with 0
	    PRR0, PRR1: currently selected plane set with drawing mode
			other plane set, scratch plane with 0
	    WRR0, WRR1: currently selected plane set with drawing mode
			other plane set, scratch plane with 0
	    TRR0, TRR1: currently selected plane set with fill_three_op 
			other plane set, scratch plane with 0
	    TCREN0, TCREN1: set to read plane 0
	    TCWEN0, TCWEN1: set to allow modification of currently selected
			    plane set
	    COLOR0, COLOR1: set to line drawing color
	    VB: set to vector mode
	    TCNTRL: 0
	    ACNTRL: 0
	    PNCNTRL: 0

	RUG Registers
	    COMMAND: undefined, except pick information maintained
	    CONTROL: current clip enable, page mode operation enable
	    XSRC: undefined
	    YSRC: undefined
	    XDST: undefined
	    YDST: undefined
	    FX: undefined
	    FY: undefined
	    LTYPE: current active linetype
	    LTP: current active line type pointer
	    CLIPXMIN: current active xmin clip limit 
	    CLIPYMIN: current active ymin clip limit
	    CLIPXMAX: current active xmax clip limit 
	    CLIPYMAX: current active ymax clip limit
