Advanced Micro Devices

Am29050™ Microprocessor
User’s Manual

© 1991 Advanced Micro Devices, Inc.

Advanced Micro Devices reserves the right to make changes in its products without notice in order to improve design or performance characteristics.

This publication neither states nor implies any warranty of any kind, including but not limited to implied warranties of merchantability or fitness for a particular application. AMD® assumes no responsibility for the use of any circuitry other than the circuitry embodied in an AMD product.

The information in this publication is believed to be accurate in all respects at the time of publication, but is subject to change without notice. AMD assumes no responsibility for any errors or omissions, and disclaims responsibility for any consequences resulting from the use of the information included herein. Additionally, AMD assumes no responsibility for the functioning of undescribed features or parameters.
<table>
<thead>
<tr>
<th>Section</th>
<th>Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Preface</td>
<td>Introduction and Overview</td>
<td>P-1</td>
</tr>
<tr>
<td></td>
<td>Design Philosophy</td>
<td>P-1</td>
</tr>
<tr>
<td></td>
<td>Optimum Performance</td>
<td>P-1</td>
</tr>
<tr>
<td></td>
<td>Performance Leverage</td>
<td>P-2</td>
</tr>
<tr>
<td></td>
<td>Conclusion</td>
<td>P-2</td>
</tr>
<tr>
<td></td>
<td>Am29050 Microprocessor User Manual Overview</td>
<td>P-3</td>
</tr>
<tr>
<td>Chapter 1</td>
<td>Features and Performance</td>
<td>1-1</td>
</tr>
<tr>
<td></td>
<td>1.1 Distinctive Characteristics</td>
<td>1-1</td>
</tr>
<tr>
<td></td>
<td>1.2 Introduction</td>
<td>1-1</td>
</tr>
<tr>
<td></td>
<td>1.3 Performance Overview</td>
<td>1-2</td>
</tr>
<tr>
<td></td>
<td>1.3.1 Cycle Time</td>
<td>1-2</td>
</tr>
<tr>
<td></td>
<td>1.3.2 Four-Stage Pipeline</td>
<td>1-2</td>
</tr>
<tr>
<td></td>
<td>1.3.3 System Interface</td>
<td>1-3</td>
</tr>
<tr>
<td></td>
<td>1.3.4 Register File</td>
<td>1-4</td>
</tr>
<tr>
<td></td>
<td>1.3.5 Instruction Execution</td>
<td>1-5</td>
</tr>
<tr>
<td></td>
<td>1.3.6 Branch Target Cache™ Memory</td>
<td>1-5</td>
</tr>
<tr>
<td></td>
<td>1.3.7 Branching</td>
<td>1-5</td>
</tr>
<tr>
<td></td>
<td>1.3.8 Loads and Stores</td>
<td>1-6</td>
</tr>
<tr>
<td></td>
<td>1.3.9 Memory Management</td>
<td>1-7</td>
</tr>
<tr>
<td></td>
<td>1.3.10 Interrupts and Traps</td>
<td>1-8</td>
</tr>
<tr>
<td></td>
<td>1.4 Optimizing Compilers</td>
<td>1-8</td>
</tr>
<tr>
<td></td>
<td>1.4.1 Optimizing-Compiler Overview</td>
<td>1-8</td>
</tr>
<tr>
<td></td>
<td>1.4.2 Optimizing-Compiler Operation</td>
<td>1-9</td>
</tr>
<tr>
<td></td>
<td>1.4.3 The Am29050 Microprocessor and Optimizing Compilers</td>
<td>1-10</td>
</tr>
<tr>
<td>Chapter 2</td>
<td>Architecture Highlights</td>
<td>2-1</td>
</tr>
<tr>
<td></td>
<td>2.1 Programmer Reference Overview</td>
<td>2-1</td>
</tr>
<tr>
<td></td>
<td>2.1.1 Program Modes (see Section 3.1)</td>
<td>2-1</td>
</tr>
<tr>
<td></td>
<td>2.1.2 Visible Registers (see Section 3.2)</td>
<td>2-1</td>
</tr>
<tr>
<td></td>
<td>2.1.3 Instruction Set Overview (see Section 3.3 and Chapter 8)</td>
<td>2-5</td>
</tr>
<tr>
<td></td>
<td>2.1.4 Data Formats and Handling (see Section 3.4)</td>
<td>2-6</td>
</tr>
<tr>
<td></td>
<td>2.1.5 Interrupts and Traps (see Section 3.5)</td>
<td>2-11</td>
</tr>
<tr>
<td></td>
<td>2.1.6 Memory Management (see Section 3.6)</td>
<td>2-12</td>
</tr>
<tr>
<td></td>
<td>2.1.7 Coprocessor Programming (see Section 6.1)</td>
<td>2-13</td>
</tr>
<tr>
<td></td>
<td>2.1.8 Timer Facility (see Section 7.3.6)</td>
<td>2-13</td>
</tr>
<tr>
<td></td>
<td>2.1.9 Trace Facility (see Section 3.7)</td>
<td>2-13</td>
</tr>
<tr>
<td></td>
<td>2.2 Hardware Overview</td>
<td>2-13</td>
</tr>
<tr>
<td></td>
<td>2.2.1 Four-Stage Pipeline (see Section 4.1)</td>
<td>2-13</td>
</tr>
<tr>
<td></td>
<td>2.2.2 Instruction Fetch Unit (see Section 4.2)</td>
<td>2-14</td>
</tr>
<tr>
<td></td>
<td>2.2.3 Execution Unit (see Section 4.3)</td>
<td>2-15</td>
</tr>
<tr>
<td></td>
<td>2.2.4 Memory Management Unit (see Section 4.4)</td>
<td>2-16</td>
</tr>
<tr>
<td></td>
<td>2.2.5 Processor Modes</td>
<td>2-16</td>
</tr>
</tbody>
</table>

2.3 System Interface Overview .................................. 2-17

2.3.1 Channel (see Section 5.2) .................................. 2-17

2.3.2 Test/Development Interface (see Section 5.3) ............. 2-18

2.3.3 Clocks (see Section 5.7) .................................. 2-19

2.3.4 Master/Slave Operation (see Section 5.8) ................. 2-19

2.3.5 Coprocessor Attachment (see Section 6.2) ............... 2-19

### Chapter 3
#### Programmer Reference

3.1 Program Modes ............................................. 3-1

3.1.1 Supervisor Mode ....................................... 3-1

3.1.2 User Mode ............................................ 3-1

3.1.3 Monitor Mode .......................................... 3-2

3.2 Visible Registers ........................................... 3-2

3.2.1 General-Purpose Registers .............................. 3-3

3.2.2 Floating-Point Accumulator Registers .................... 3-7

3.2.3 Special-Purpose Registers ............................... 3-7

3.2.4 TLB Registers ...................................... 3-32

3.3 Instruction Set ............................................ 3-35

3.3.1 Integer Arithmetic ................................... 3-35

3.3.2 Compare ........................................... 3-35

3.3.3 Logical ........................................... 3-38

3.3.4 Shift ............................................. 3-38

3.3.5 Data Movement .................................... 3-38

3.3.6 Constant .......................................... 3-38

3.3.7 Floating-Point ...................................... 3-40

3.3.8 Branch ........................................... 3-40

3.3.9 Miscellaneous ...................................... 3-40

3.3.10 Reserved Instructions ................................ 3-40

3.4 Data Formats and Handling ................................ 3-40

3.4.1 Integer Data Types ................................ 3-40

3.4.2 Floating-Point Data Types ............................ 3-44

3.4.3 Special Floating-Point Values ......................... 3-45

3.4.4 External Data Accesses .............................. 3-46

3.4.5 Addressing and Alignment ............................ 3-51

3.4.6 Byte and Half-Word Accesses ......................... 3-54

3.5 Interrupts and Traps ....................................... 3-58

3.5.1 Interrupts .......................................... 3-58

3.5.2 Traps .................................................. 3-58

3.5.3 Wait Mode ........................................ 3-59

3.5.4 Vector Area ........................................ 3-59

3.5.5 Interrupt and Trap Handling ......................... 3-60

3.5.6 WARN Trap ........................................ 3-64

3.5.7 Monitor Trap ....................................... 3-65

3.5.8 Sequencing of Interrupts and Traps .................. 3-66

3.5.9 Exception Reporting and Restarting ................. 3-66

3.5.10 Arithmetic Exceptions ............................. 3-69

3.5.11 Exceptions During Interrupt and Trap Handling ........ 3-70

3.6 Memory Management ...................................... 3-70

3.6.1 Translation Look-Aside Buffer ..................... 3-70

3.6.2 Address Translation ................................ 3-72

3.6.3 TLB Reload ........................................ 3-76

3.6.4 TLB Entry Invalidation ............................ 3-76

3.6.5 Protection ......................................... 3-77

3.7 Debugging ............................................... 3-78

3.7.1 Trace Facility ...................................... 3-78
<table>
<thead>
<tr>
<th>Chapter 6</th>
<th>Coprocessor Interface</th>
<th>6-1</th>
</tr>
</thead>
<tbody>
<tr>
<td>6.1</td>
<td>Coprocessor Programming</td>
<td>6-1</td>
</tr>
<tr>
<td>6.1.1</td>
<td>Overview of Coprocessor Operations</td>
<td>6-1</td>
</tr>
<tr>
<td>6.1.2</td>
<td>Coprocessor Transfers</td>
<td>6-2</td>
</tr>
<tr>
<td>6.1.3</td>
<td>Coprocessor Exceptions</td>
<td>6-3</td>
</tr>
<tr>
<td>6.1.4</td>
<td>Coprocessor as a System Option</td>
<td>6-4</td>
</tr>
<tr>
<td>6.1.5</td>
<td>Interrupted Coprocessor Operations</td>
<td>6-4</td>
</tr>
<tr>
<td>6.2</td>
<td>Coprocessor Attachment</td>
<td>6-5</td>
</tr>
<tr>
<td>6.2.1</td>
<td>Signal Description</td>
<td>6-5</td>
</tr>
<tr>
<td>6.2.2</td>
<td>Coprocessor Communication</td>
<td>6-7</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Chapter 7</th>
<th>Programming</th>
<th>7-1</th>
</tr>
</thead>
<tbody>
<tr>
<td>7.1</td>
<td>Run-Time Storage Organization and Calling Convention</td>
<td>7-1</td>
</tr>
<tr>
<td>7.1.1</td>
<td>Run-Time Stack Organization and Use</td>
<td>7-1</td>
</tr>
<tr>
<td>7.1.2</td>
<td>Procedure Linkage Conventions</td>
<td>7-7</td>
</tr>
<tr>
<td>7.1.3</td>
<td>Register Usage Convention</td>
<td>7-13</td>
</tr>
<tr>
<td>7.1.4</td>
<td>Example of a Complex Procedure Call</td>
<td>7-14</td>
</tr>
<tr>
<td>7.1.5</td>
<td>Trace-Back Tags</td>
<td>7-15</td>
</tr>
<tr>
<td>7.2</td>
<td>Applications-Programming Considerations</td>
<td>7-16</td>
</tr>
<tr>
<td>7.2.1</td>
<td>Addressing General-Purpose Registers Indirectly</td>
<td>7-16</td>
</tr>
<tr>
<td>7.2.2</td>
<td>Run-time Checking</td>
<td>7-17</td>
</tr>
<tr>
<td>7.2.3</td>
<td>Operating System Calls</td>
<td>7-17</td>
</tr>
<tr>
<td>7.2.4</td>
<td>Multi-Precision Integer Addition and Subtraction</td>
<td>7-18</td>
</tr>
<tr>
<td>7.2.5</td>
<td>Integer Multiplication</td>
<td>7-18</td>
</tr>
<tr>
<td>7.2.6</td>
<td>Integer Division</td>
<td>7-19</td>
</tr>
<tr>
<td>7.2.7</td>
<td>Rounding</td>
<td>7-21</td>
</tr>
<tr>
<td>7.2.8</td>
<td>Fast-Float Mode</td>
<td>7-21</td>
</tr>
<tr>
<td>7.2.9</td>
<td>Complementing a Boolean</td>
<td>7-22</td>
</tr>
<tr>
<td>7.2.10</td>
<td>Using the Floating-Point Accumulators</td>
<td>7-22</td>
</tr>
<tr>
<td>7.2.11</td>
<td>Using the Condition Code Accumulator</td>
<td>7-24</td>
</tr>
<tr>
<td>7.2.12</td>
<td>Generating Large Constants</td>
<td>7-25</td>
</tr>
<tr>
<td>7.2.13</td>
<td>Large Jump and Call Ranges</td>
<td>7-25</td>
</tr>
<tr>
<td>7.2.14</td>
<td>NO-OPs</td>
<td>7-26</td>
</tr>
<tr>
<td>7.2.15</td>
<td>Character-String Operations</td>
<td>7-26</td>
</tr>
<tr>
<td>7.2.16</td>
<td>Movement of Large Data Blocks</td>
<td>7-27</td>
</tr>
<tr>
<td>7.3</td>
<td>Systems-Programming Considerations</td>
<td>7-27</td>
</tr>
<tr>
<td>7.3.1</td>
<td>System Protection</td>
<td>7-27</td>
</tr>
<tr>
<td>7.3.2</td>
<td>Interrupts and Traps</td>
<td>7-28</td>
</tr>
<tr>
<td>7.3.3</td>
<td>Memory Management</td>
<td>7-30</td>
</tr>
<tr>
<td>7.3.4</td>
<td>Restarting Faulting External Accesses</td>
<td>7-34</td>
</tr>
<tr>
<td>7.3.5</td>
<td>Multiple Processor Systems</td>
<td>7-35</td>
</tr>
<tr>
<td>7.3.6</td>
<td>Timer Facility</td>
<td>7-36</td>
</tr>
<tr>
<td>7.4</td>
<td>Pipeline Features Exposed to Software</td>
<td>7-37</td>
</tr>
<tr>
<td>7.4.1</td>
<td>Delayed Branch</td>
<td>7-37</td>
</tr>
<tr>
<td>7.4.2</td>
<td>Overlapped Operations</td>
<td>7-39</td>
</tr>
<tr>
<td>7.4.3</td>
<td>Delayed Effects of Registers</td>
<td>7-41</td>
</tr>
</tbody>
</table>
Chapter 8  Instruction Set ................................................. 8-1
8.1 Instruction-Description Nomenclature .................................. 8-1
8.1.1 Operand Notation and Symbols ...................................... 8-1
8.1.2 Operator Symbols ..................................................... 8-2
8.1.3 Control-Flow Terminology .......................................... 8-3
8.1.4 Assembler Syntax ..................................................... 8-4
8.2 Arithmetic/Logic Status Results of Instructions ................... 8-4
8.2.1 Arithmetic/Logic Status Bits ........................................ 8-4
8.2.2 Arithmetic Operation Status Results .............................. 8-4
8.2.3 Logical Operation Status Results ................................... 8-5
8.2.4 Floating-Point Status ................................................ 8-6
8.3 Instruction Formats ..................................................... 8-6
8.4 Instruction Description ................................................ 8-9
8.5 Instruction Index by Operation Code ................................. 8-137

Appendix A  Channel Operation Timing .................................. A-1
Appendix B  Register Summary .............................................. B-1
Appendix C  Floating-Point Behavior ...................................... C-1
C.1 Timing ................................................................. C-1
C.2 Exceptions ............................................................. C-2
C.2.1 Addition (FADD, DADD) .............................................. C-5
C.2.2 Subtraction (FSUB, DSUB) .......................................... C-6
C.2.3 Multiplication (FMUL, DMUL, FDMUL) ............................ C-7
C.2.4 Division (FDIV, DDIV) ................................................ C-8
C.2.5 Comparison (FEQ, DEQ, FGE, DGE, FGT, DGT) ................ C-10
C.2.6 Multiply-Accumulate (FMAC, DMAC), Multiply-Sum (FMSM, DMSM) ... C-10
C.2.7 Square Root (SQR) .................................................... C-11
C.2.8 Floating-Point-to-Floating-Point Conversions (CONVERT) ...... C-12
C.2.9 Integer-to-Floating-Point Conversions (CONVERT) .............. C-13
C.2.10 Floating-Point-to-Integer Conversions (CONVERT) ............. C-14
C.2.11 Move From Accumulator (MFACC) ................................ C-15
C.2.12 Move To Accumulator (MTACC) ................................... C-16
C.2.13 Classify (CLASS) ..................................................... C-16
C.2.14 Integer Multiply (MULTIPLY, MULTIPLU, MULTM, MULTMU) ....... C-16
C.2.15 Integer Divide (DIVIDE, DIVIDU) ................................ C-17
C.3 Traps ................................................................. C-17
<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 1-1</td>
<td>Simplified System Diagram</td>
<td>1-3</td>
</tr>
<tr>
<td>Figure 2-1</td>
<td>Data-Unit Numbering Conventions</td>
<td>2-6</td>
</tr>
<tr>
<td>Figure 2-2</td>
<td>Am29050 Microprocessor Data Flow</td>
<td>2-14</td>
</tr>
<tr>
<td>Figure 3-1</td>
<td>General-Purpose Register Organization</td>
<td>3-3</td>
</tr>
<tr>
<td>Figure 3-2</td>
<td>Register Bank Organization</td>
<td>3-6</td>
</tr>
<tr>
<td>Figure 3-3</td>
<td>Special-Purpose Registers</td>
<td>3-8</td>
</tr>
<tr>
<td>Figure 3-4</td>
<td>Vector Area Base Address Register</td>
<td>3-9</td>
</tr>
<tr>
<td>Figure 3-5</td>
<td>Current Processor Status Register</td>
<td>3-10</td>
</tr>
<tr>
<td>Figure 3-6</td>
<td>Configuration Register</td>
<td>3-12</td>
</tr>
<tr>
<td>Figure 3-7</td>
<td>Channel Address Register</td>
<td>3-13</td>
</tr>
<tr>
<td>Figure 3-8</td>
<td>Channel Data Register</td>
<td>3-14</td>
</tr>
<tr>
<td>Figure 3-9</td>
<td>Channel Control Register</td>
<td>3-14</td>
</tr>
<tr>
<td>Figure 3-10</td>
<td>Register Bank Protect Register</td>
<td>3-15</td>
</tr>
<tr>
<td>Figure 3-11</td>
<td>Timer Counter Register</td>
<td>3-16</td>
</tr>
<tr>
<td>Figure 3-12</td>
<td>Timer Reload Register</td>
<td>3-16</td>
</tr>
<tr>
<td>Figure 3-13</td>
<td>Program Counter 0 Register</td>
<td>3-17</td>
</tr>
<tr>
<td>Figure 3-14</td>
<td>Program Counter 1 Register</td>
<td>3-17</td>
</tr>
<tr>
<td>Figure 3-15</td>
<td>Program Counter 2 Register</td>
<td>3-18</td>
</tr>
<tr>
<td>Figure 3-16</td>
<td>MMU Configuration Register</td>
<td>3-18</td>
</tr>
<tr>
<td>Figure 3-17</td>
<td>LRU Recommendation Register</td>
<td>3-19</td>
</tr>
<tr>
<td>Figure 3-18</td>
<td>Reason Vector Register</td>
<td>3-19</td>
</tr>
<tr>
<td>Figure 3-19</td>
<td>Region Mapping Address 0 Register</td>
<td>3-20</td>
</tr>
<tr>
<td>Figure 3-20</td>
<td>Region Mapping Control 0 Register</td>
<td>3-20</td>
</tr>
<tr>
<td>Figure 3-21</td>
<td>Shadow Program Counter 0 Register</td>
<td>3-22</td>
</tr>
<tr>
<td>Figure 3-22</td>
<td>Shadow Program Counter 1 Register</td>
<td>3-22</td>
</tr>
<tr>
<td>Figure 3-23</td>
<td>Shadow Program Counter 2 Register</td>
<td>3-22</td>
</tr>
<tr>
<td>Figure 3-24</td>
<td>Instruction Breakpoint Address 0 Register</td>
<td>3-23</td>
</tr>
<tr>
<td>Figure 3-25</td>
<td>Instruction Breakpoint Control 0 Register</td>
<td>3-23</td>
</tr>
<tr>
<td>Figure 3-26</td>
<td>Indirect Pointer C Register</td>
<td>3-24</td>
</tr>
<tr>
<td>Figure 3-27</td>
<td>Indirect Pointer A Register</td>
<td>3-25</td>
</tr>
<tr>
<td>Figure 3-28</td>
<td>Indirect Pointer B Register</td>
<td>3-25</td>
</tr>
<tr>
<td>Figure 3-29</td>
<td>Q Register</td>
<td>3-26</td>
</tr>
<tr>
<td>Figure 3-30</td>
<td>ALU Status Register</td>
<td>3-26</td>
</tr>
<tr>
<td>Figure 3-31</td>
<td>Byte Pointer Register</td>
<td>3-27</td>
</tr>
<tr>
<td>Figure 3-32</td>
<td>Funnel Shift Count Register</td>
<td>3-27</td>
</tr>
<tr>
<td>Figure 3-33</td>
<td>Load/Store Count Remaining Register</td>
<td>3-28</td>
</tr>
<tr>
<td>Figure 3-34</td>
<td>Floating-Point Environment Register</td>
<td>3-28</td>
</tr>
<tr>
<td>Figure 3-35</td>
<td>Integer Environment Register</td>
<td>3-29</td>
</tr>
<tr>
<td>Figure 3-36</td>
<td>Floating-Point Status</td>
<td>3-30</td>
</tr>
<tr>
<td>Figure 3-37</td>
<td>Exception Opcode Register</td>
<td>3-32</td>
</tr>
<tr>
<td>Figure 3-38</td>
<td>Translation Look-Aside Buffer Registers</td>
<td>3-32</td>
</tr>
<tr>
<td>Figure 3-39</td>
<td>TLB Entry Word 0 Register</td>
<td>3-33</td>
</tr>
<tr>
<td>Figure 3-40</td>
<td>TLB Entry Word 1</td>
<td>3-34</td>
</tr>
<tr>
<td>Figure 3-41</td>
<td>Character Format</td>
<td>3-43</td>
</tr>
<tr>
<td>Figure 3-42</td>
<td>Half-Word Format</td>
<td>3-44</td>
</tr>
<tr>
<td>Figure 3-43</td>
<td>Single-Precision Floating-Point Format</td>
<td>3-45</td>
</tr>
<tr>
<td>Figure 3-44</td>
<td>Double-Precision Floating-Point Format</td>
<td>3-45</td>
</tr>
<tr>
<td>Figure 3-45</td>
<td>Load/Store Instruction Format</td>
<td>3-47</td>
</tr>
<tr>
<td>Figure 3-46</td>
<td>Non-Coprocessor Load/Store Format</td>
<td>3-47</td>
</tr>
<tr>
<td>Figure 3-47</td>
<td>Byte and Half-Word Addressing with BO = 0 (Big Endian)</td>
<td>3-52</td>
</tr>
<tr>
<td>Figure 3-48</td>
<td>Byte and Half-Word Addressing with BO = 1 (Little Endian)</td>
<td>3-53</td>
</tr>
<tr>
<td>Figure 3-49</td>
<td>Vector Table Entry</td>
<td>3-59</td>
</tr>
<tr>
<td>Figure 3-50</td>
<td>Current Processor Status After an Interrupt or Trap</td>
<td>3-62</td>
</tr>
<tr>
<td>Figure 3-51</td>
<td>Current Processor Status Before Interrupt Return</td>
<td>3-62</td>
</tr>
<tr>
<td>Figure 3-52</td>
<td>Translation Look-Aside Buffer Organization</td>
<td>3-71</td>
</tr>
<tr>
<td>Figure 3-53</td>
<td>Virtual Address for 1, 2, 4, and 8 kb Pages</td>
<td>3-73</td>
</tr>
<tr>
<td>Figure 3-54</td>
<td>TLB Address Translation Process</td>
<td>3-74</td>
</tr>
<tr>
<td>Figure 3-55</td>
<td>Current Processor Status Register In Reset Mode</td>
<td>3-81</td>
</tr>
<tr>
<td>Figure 3-56</td>
<td>Floating-Point Environment Register in Reset Mode</td>
<td>3-81</td>
</tr>
<tr>
<td>Figure 4-1</td>
<td>Am29050 Microprocessor Data Flow</td>
<td>4-1</td>
</tr>
<tr>
<td>Figure 4-2</td>
<td>IPB State Transitions</td>
<td>4-4</td>
</tr>
<tr>
<td>Figure 4-3</td>
<td>Branch Target Cache Memory Organization (CO = 0)</td>
<td>4-6</td>
</tr>
<tr>
<td>Figure 4-4</td>
<td>Branch Target Cache Memory Organization (CO = 1)</td>
<td>4-7</td>
</tr>
<tr>
<td>Figure 4-5</td>
<td>Branch Target Cache Memory Lookup Process (CO = 0)</td>
<td>4-9</td>
</tr>
<tr>
<td>Figure 4-6</td>
<td>Program Counter Unit</td>
<td>4-11</td>
</tr>
<tr>
<td>Figure 4-7</td>
<td>Register File and Register Address Generator</td>
<td>4-13</td>
</tr>
<tr>
<td>Figure 4-8</td>
<td>Address Unit</td>
<td>4-15</td>
</tr>
<tr>
<td>Figure 4-9</td>
<td>PAC Entry Word 0</td>
<td>4-17</td>
</tr>
<tr>
<td>Figure 4-10</td>
<td>PAC Entry Word 1</td>
<td>4-17</td>
</tr>
<tr>
<td>Figure 4-11</td>
<td>Floating-Point Unit</td>
<td>4-20</td>
</tr>
<tr>
<td>Figure 5-1</td>
<td>Channel Flowchart</td>
<td>5-9</td>
</tr>
<tr>
<td>Figure 5-2</td>
<td>Processor Burst-Mode Instruction Accesses: Control Flow</td>
<td>5-12</td>
</tr>
<tr>
<td>Figure 5-3</td>
<td>Slave Burst-Mode Instruction Accesses: Control Flow</td>
<td>5-13</td>
</tr>
<tr>
<td>Figure 5-4</td>
<td>Processor Burst-Mode Data Accesses: Control Flow</td>
<td>5-14</td>
</tr>
<tr>
<td>Figure 5-5</td>
<td>Slave Burst-Mode Data Accesses: Control Flow</td>
<td>5-15</td>
</tr>
<tr>
<td>Figure 5-6</td>
<td>Valid Transitions on CNTL(1-0) Inputs</td>
<td>5-23</td>
</tr>
<tr>
<td>Figure 5-7</td>
<td>Processor Status While in Load Test Instruction Mode</td>
<td>5-25</td>
</tr>
<tr>
<td>Figure 6-1</td>
<td>Coprocessor Load/Store Format</td>
<td>6-2</td>
</tr>
<tr>
<td>Figure 6-2</td>
<td>Coprocessor Attachment</td>
<td>6-6</td>
</tr>
<tr>
<td>Figure 7-1</td>
<td>Run-Time Stack Example</td>
<td>7-2</td>
</tr>
<tr>
<td>Figure 7-2</td>
<td>An Activation Record in the Register Stack</td>
<td>7-3</td>
</tr>
<tr>
<td>Figure 7-3</td>
<td>Relationship of Stack Cache and Register Stack</td>
<td>7-5</td>
</tr>
<tr>
<td>Figure 7-4</td>
<td>Stack Overflow</td>
<td>7-6</td>
</tr>
<tr>
<td>Figure 7-5</td>
<td>Stack Underflow</td>
<td>7-6</td>
</tr>
<tr>
<td>Figure 7-6</td>
<td>Definition of size and rsize Values</td>
<td>7-9</td>
</tr>
<tr>
<td>Figure 7-7</td>
<td>Trace-Back Tags</td>
<td>7-15</td>
</tr>
<tr>
<td>Figure 8-1</td>
<td>Instruction Format</td>
<td>8-7</td>
</tr>
<tr>
<td>Figure 8-2</td>
<td>Frequently Occurring Instruction Field Uses</td>
<td>8-8</td>
</tr>
<tr>
<td>Figure 8-3</td>
<td>Instruction-Description Format</td>
<td>8-9</td>
</tr>
<tr>
<td>Figure A-1</td>
<td>Instruction Read—Simple Access</td>
<td>A-3</td>
</tr>
<tr>
<td>Figure A-2</td>
<td>Instruction Read—Simple Access with IRDY Delayed</td>
<td>A-4</td>
</tr>
<tr>
<td>Figure A-3</td>
<td>Instruction Read—Pipelined Access</td>
<td>A-5</td>
</tr>
<tr>
<td>Figure A-4</td>
<td>Instruction Read—Establishing Burst-Mode Access</td>
<td>A-6</td>
</tr>
<tr>
<td>Figure A-5</td>
<td>Instruction Read—Burst-Mode Access Suspended by Slave</td>
<td>A-7</td>
</tr>
<tr>
<td>Figure A-6</td>
<td>Instruction Read—Burst-Mode Access Suspended by Master</td>
<td>A-8</td>
</tr>
<tr>
<td>Figure A-7</td>
<td>Instruction Read—Burst-Mode Access Preempted by Slave</td>
<td>A-9</td>
</tr>
<tr>
<td>Figure A-8</td>
<td>Instruction Read—Burst-Mode Access Suspended by Master and Later Preempted by Slave</td>
<td>A-10</td>
</tr>
<tr>
<td>Figure A-9</td>
<td>Instruction Read—Burst-Mode Access Canceled by Slave</td>
<td>A-11</td>
</tr>
<tr>
<td>Figure A-10</td>
<td>Instruction Read—Burst-Mode Access Ended by Master (Preempted, Terminated or Canceled)</td>
<td>A-12</td>
</tr>
<tr>
<td>Figure A-11</td>
<td>Instruction Read—TLB Miss or Protection Violation</td>
<td>A-13</td>
</tr>
<tr>
<td>Figure A-12</td>
<td>Instruction Read—Pipelined Access with TLB Miss or Protection Violation</td>
<td>A-14</td>
</tr>
<tr>
<td>Figure A-13</td>
<td>Instruction Read—Error Detected by Slave</td>
<td>A-15</td>
</tr>
<tr>
<td>Figure A-14</td>
<td>Data Read—Simple Access</td>
<td>A-16</td>
</tr>
<tr>
<td>Figure A-15</td>
<td>Data Write—Simple Access</td>
<td>A-17</td>
</tr>
<tr>
<td>Figure A-16</td>
<td>Data Read—Simple Access with DRDY Delayed</td>
<td>A-18</td>
</tr>
<tr>
<td>Figure A-17</td>
<td>Data Write—Simple Access with DRDY Delayed</td>
<td>A-19</td>
</tr>
<tr>
<td>Figure A-18</td>
<td>Data Read Followed by Data Write—Simple Access</td>
<td>A-20</td>
</tr>
<tr>
<td>Figure A-19</td>
<td>Load and Set Instruction</td>
<td>A-21</td>
</tr>
<tr>
<td>Figure A-20</td>
<td>Data Read—Pipelined Access</td>
<td>A-22</td>
</tr>
<tr>
<td>Figure A-21</td>
<td>Data Write—Pipelined Access</td>
<td>A-23</td>
</tr>
<tr>
<td>Figure A-22</td>
<td>Data Read Followed by Data Write—Pipelined Access (Not Used by Processor)</td>
<td>A-24</td>
</tr>
<tr>
<td>Figure A-23</td>
<td>Data Write Followed by Data Read—Pipelined Access</td>
<td>A-25</td>
</tr>
<tr>
<td>Figure A-24</td>
<td>Data Read—Establishing Burst-Mode Access</td>
<td>A-26</td>
</tr>
<tr>
<td>Figure A-25</td>
<td>Data Write—Establishing Burst-Mode Access</td>
<td>A-27</td>
</tr>
<tr>
<td>Figure A-26</td>
<td>Data Read—Burst-Mode Access Suspended by Slave</td>
<td>A-28</td>
</tr>
<tr>
<td>Figure A-27</td>
<td>Data Write—Burst-Mode Access Suspended by Slave</td>
<td>A-29</td>
</tr>
<tr>
<td>Figure A-28</td>
<td>Data Read—Burst-Mode Access Suspended by Master (Not Used by Processor)</td>
<td>A-30</td>
</tr>
<tr>
<td>Figure A-29</td>
<td>Data Write—Burst-Mode Access Suspended by Master (Not Used by Processor)</td>
<td>A-31</td>
</tr>
<tr>
<td>Figure A-30</td>
<td>Data Read—Burst-Mode Access Preempted by Slave</td>
<td>A-32</td>
</tr>
<tr>
<td>Figure A-31</td>
<td>Data Write—Burst-Mode Access Preempted by Slave</td>
<td>A-33</td>
</tr>
<tr>
<td>Figure A-32</td>
<td>Data Read—Burst-Mode Access Suspended by Master and Later Preempted by Slave (Not Used by Processor)</td>
<td>A-34</td>
</tr>
<tr>
<td>Figure A-33</td>
<td>Data Write—Burst-Mode Access Suspended by Master and Later Preempted by Slave (Not Used by Processor)</td>
<td>A-35</td>
</tr>
<tr>
<td>Figure A-34</td>
<td>Data Read—Burst-Mode Access Canceled by Slave</td>
<td>A-36</td>
</tr>
<tr>
<td>Figure A-35</td>
<td>Data Write—Burst-Mode Access Canceled by Slave</td>
<td>A-37</td>
</tr>
<tr>
<td>Figure A-36</td>
<td>Data Read—Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)</td>
<td>A-38</td>
</tr>
<tr>
<td>Figure A-37</td>
<td>Data Write—Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)</td>
<td>A-39</td>
</tr>
<tr>
<td>Figure A-38</td>
<td>Data Read—TLB Miss or Protection Violation</td>
<td>A-40</td>
</tr>
<tr>
<td>Figure A-39</td>
<td>Data Write—TLB Miss or Protection Violation</td>
<td>A-41</td>
</tr>
<tr>
<td>Figure A-40</td>
<td>Data Read—Pipelined Access with TLB Miss or Protection Violation</td>
<td>A-42</td>
</tr>
<tr>
<td>Figure A-41</td>
<td>Data Write—Pipelined Access with TLB Miss or Protection Violation</td>
<td>A-43</td>
</tr>
<tr>
<td>Figure A-42</td>
<td>Data Read—Error Detected by Slave</td>
<td>A-44</td>
</tr>
<tr>
<td>Figure A-43</td>
<td>Data Write—Error Detected by Slave</td>
<td>A-45</td>
</tr>
<tr>
<td>Figure A-44</td>
<td>Channel Transfer from Processor to External Master</td>
<td>A-46</td>
</tr>
<tr>
<td>Figure A-45</td>
<td>Channel Transfer from External Master to Processor</td>
<td>A-47</td>
</tr>
<tr>
<td>Figure B-1</td>
<td>General-Purpose Register Organization</td>
<td>B-1</td>
</tr>
<tr>
<td>Figure B-2</td>
<td>Register Bank Organization</td>
<td>B-2</td>
</tr>
<tr>
<td>Figure B-3</td>
<td>Special-Purpose Registers</td>
<td>B-3</td>
</tr>
<tr>
<td>Figure B-4</td>
<td>Special-Purpose Registers</td>
<td>B-8</td>
</tr>
<tr>
<td>Figure B-5</td>
<td>Translation Look-Aside Buffer Entries</td>
<td>B-8</td>
</tr>
</tbody>
</table>
LIST OF TABLES

Table 2-1  Am29050 Microprocessor Instruction Set ................................................. 2-7
Table 3-1  Integer Arithmetic Instructions ............................................................... 3-36
Table 3-2  Compare Instructions .................................................................................. 3-37
Table 3-3  Logical Instructions ..................................................................................... 3-38
Table 3-4  Shift Instructions ......................................................................................... 3-38
Table 3-5  Data Movement Instructions ........................................................................ 3-39
Table 3-6  Constant Instructions .................................................................................. 3-39
Table 3-7  Floating-Point Instructions ......................................................................... 3-41
Table 3-8  Branch Instructions .................................................................................... 3-42
Table 3-9  Miscellaneous Instructions ......................................................................... 3-42
Table 3-10 Vector Number Assignments ...................................................................... 3-61
Table 3-11 Interrupt and Trap Priority Table .............................................................. 3-67
Table 3-12 Access Protection ....................................................................................... 3-77
Table 4-1 Staging of Floating-Point Operations ......................................................... 4-21
Table A-1 Signal Summary ............................................................................................ A-1
Table B-1 Register Field Summary .............................................................................. B-9
Table C-1 Latency of Floating-Point and Integer Multiply Operations ....................... C-1
Table C-2 Issue Rate of Floating-Point Operations ...................................................... C-2
Table C-3 Effect on Latency of Denormalized Source Operands or Results ................. C-3
INTRODUCTION AND OVERVIEW

DESIGN PHILOSOPHY

The Am29050™ Streamlined Instruction Processor is the result of a design philosophy that recognizes that processor performance must be considered in light of the processor's hardware and software environment. The key to maximizing performance lies in the realization that the processor is part of an integrated system, and is itself a collection of components that must be properly integrated.

Processor features must be considered not only on their own merits, but also in relation to other components of the system. A particular feature that—considered alone—increases one aspect of processor performance may actually decrease the performance of the total system, because of the burden that it places elsewhere in the system. As an illustration, consider the factors involved in the execution time of any processor task:

\[
\text{TASK TIME} = \text{INSTRUCTIONS / TASK} \times \text{CYCLES / INSTRUCTION} \times \text{TIME / CYCLE}
\]

To minimize task time, it is necessary to minimize the above product. This is not equivalent to minimizing each term that contributes to the product; in fact, doing so is generally impossible because the terms interact.

As an example of this interaction, consider the number of instructions required for a task. Minimizing this number, the more or less traditional approach to processor architecture design, increases the average number of cycles required to execute an instruction, because each instruction performs more operations. It also tends to lengthen the cycle time, because more complex instructions take longer to decode.
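Evaluating the product for two designs makes the interaction concrete. The figures below are purely hypothetical and serve only to illustrate the argument. The "complex" design needs 30% fewer instructions but has more cycles per instruction and a longer cycle. The "simple" design makes the opposite trade-offs. The `Design` class and its field names are illustrative, not taken from this manual.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    instructions_per_task: float
    cycles_per_instruction: float
    ns_per_cycle: float

    def task_time_ns(self) -> float:
        # TASK TIME = INSTRUCTIONS/TASK x CYCLES/INSTRUCTION x TIME/CYCLE
        return (self.instructions_per_task
                * self.cycles_per_instruction
                * self.ns_per_cycle)


# Hypothetical figures for illustration only; not measurements of any processor.
complex_isa = Design("complex", instructions_per_task=700,
                     cycles_per_instruction=4.0, ns_per_cycle=60.0)
simple_isa = Design("simple", instructions_per_task=1000,
                    cycles_per_instruction=1.5, ns_per_cycle=40.0)

for d in (complex_isa, simple_isa):
    print(f"{d.name:8s} {d.task_time_ns():10.0f} ns")
```

With these assumed numbers, the complex design executes fewer instructions. Its task time is nonetheless 2.8 times longer (168,000 ns versus 60,000 ns), because the product determines task time, not any single term.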

A second example of this interaction arises when the cycle time is reduced by pipelining operations. In theory, the cycle time can be made arbitrarily small by defining an arbitrarily large number of pipeline stages. In practice, at least for general-purpose processors, pipelining rarely yields much of its potential benefit, because the pipeline cannot be kept fully occupied in situations such as storage references and branches. In these situations, additional pipeline stages increase the number of cycles required for an operation, and thus increase the CYCLES / INSTRUCTION term.
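A deliberately crude model shows why deeper pipelining eventually stops paying off. It rests on three hypothetical assumptions, none of which come from this manual:

- An instruction's logic takes 100 ns in total and is split evenly across the stages.
- Each stage adds 2 ns of latch overhead.
- 20% of instructions (for example, branches and storage references) cannot be overlapped, and each one costs n - 1 idle cycles in an n-stage pipeline.

```python
def cycle_time_ns(stages: int, logic_ns: float = 100.0, latch_ns: float = 2.0) -> float:
    # Logic is split evenly across stages; every stage pays a fixed latch overhead.
    return logic_ns / stages + latch_ns


def cycles_per_instruction(stages: int, stall_fraction: float = 0.2) -> float:
    # One cycle per instruction when the pipeline is full, plus (stages - 1)
    # idle cycles for each instruction that cannot be overlapped.
    return 1.0 + stall_fraction * (stages - 1)


def time_per_instruction_ns(stages: int) -> float:
    # CYCLES/INSTRUCTION x TIME/CYCLE for a given pipeline depth.
    return cycles_per_instruction(stages) * cycle_time_ns(stages)


depths = range(1, 65)
best_stages = min(depths, key=time_per_instruction_ns)
for n in (1, 4, best_stages, 32, 64):
    print(f"{n:2d} stages: cycle {cycle_time_ns(n):6.2f} ns, "
          f"CPI {cycles_per_instruction(n):5.2f}, "
          f"{time_per_instruction_ns(n):6.2f} ns/instruction")
```

Under these assumptions, the cycle time keeps falling as stages are added. The time per instruction, however, bottoms out at a moderate depth (14 stages here) and then rises again, because the CYCLES / INSTRUCTION term grows faster than TIME / CYCLE shrinks.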

OPTIMUM PERFORMANCE

Each of the terms in the above equation has some minimum bound for a given implementation technology and task. In general, a term cannot approach its minimum bound without an offsetting increase in the other terms, which makes the overall product less than optimal. The question then arises: what combination of terms does yield an optimum product? Several observations help answer this question.

The first observation is that the number of operations underlying a given task is more or less fixed. Any single processor ultimately limits the time required for a task because it has a single execution unit and a single instruction stream. The operations that must be performed are reflected in the INSTRUCTIONS / TASK and CYCLES / INSTRUCTION terms. These operations may be performed by relatively few instructions, where each instruction takes multiple cycles to execute, or by a larger number of instructions, where each takes a single cycle to execute. In the first case, the instructions are complex; in the second, they are simple.

The point is that the trade-off between simple and complex instructions is not one-to-one. For example, reducing the number of cycles per instruction by a factor of three does not increase the number of instructions per task by the same factor. There are two reasons for this. The first is that, even when an instruction set supports complex operations, a large proportion of the instructions that are executed perform operations that could be performed as well by simple instructions. The second is that simple instructions expose more of the internal processor operation to an optimizing compiler. This allows the compiler to tailor the organization and sequence of operations to the task at hand, thereby reducing the total number of instructions executed.

PERFORMANCE LEVERAGE

Another important observation is that the TIME / CYCLE and CYCLES / INSTRUCTION terms offer tremendous leverage. As these terms are made smaller, a fixed absolute reduction in either one has a proportionately greater effect on performance.

For example, reducing the cycle time by 10 ns in a processor operating with a 200-ns cycle time reduces task time by 5%, which is roughly a 5% increase in performance. The same 10-ns reduction in a processor operating with a 50-ns cycle time reduces task time by 20%, which is a 25% increase in performance.

Correspondingly, reducing the average number of cycles per instruction by 0.2 in a processor that averages 5 cycles per instruction reduces task time by 4%. The same reduction in a processor that averages 1.6 cycles per instruction reduces task time by 12.5%, a performance increase of about 14%.
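In these examples, a fixed reduction in one term shortens task time by the fraction (old - new) / old. The corresponding gain in throughput is old / new - 1, which is somewhat larger. The sketch below computes both figures for the four cases. The helper names are illustrative.

```python
def time_reduction(old, new):
    """Fractional reduction in task time when one term falls from old to new."""
    return (old - new) / old


def performance_gain(old, new):
    """Fractional increase in throughput: old task time / new task time - 1."""
    return old / new - 1


cases = [
    ("cycle time 200 -> 190 ns", 200, 190),
    ("cycle time 50 -> 40 ns", 50, 40),
    ("CPI 5.0 -> 4.8", 5.0, 4.8),
    ("CPI 1.6 -> 1.4", 1.6, 1.4),
]
for label, old, new in cases:
    print(f"{label}: task time -{time_reduction(old, new):.1%}, "
          f"performance +{performance_gain(old, new):.1%}")
```

The same absolute reduction is worth four times as much at 50 ns as at 200 ns. It is worth about three times as much at 1.6 cycles per instruction as at 5.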

CONCLUSION

The conclusion is that it is possible, and desirable, to accept some increase in the number of instructions executed for a given task, and to more than make up for it through reductions in the cycle time and in the number of cycles per instruction. For example, suppose both the cycle time and the number of cycles per instruction are reduced by a factor of three, while the number of instructions for a given task grows by 50%. The resulting task time is then reduced by a factor of six.
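The factor of six follows directly from the product of the three terms. The check below uses exact fractions to avoid floating-point rounding. The function name is illustrative.

```python
from fractions import Fraction


def relative_task_time(instruction_factor, cpi_factor, cycle_time_factor):
    """New task time as a fraction of the old one; each argument is new/old."""
    return instruction_factor * cpi_factor * cycle_time_factor


# Instructions grow by 50%; CPI and cycle time each shrink by a factor of three.
ratio = relative_task_time(Fraction(3, 2), Fraction(1, 3), Fraction(1, 3))
print(ratio)   # 1/6
```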

The Am29050 microprocessor architecture was designed with the above effects in mind. Maximum performance is obtained by optimizing the product of the number of instructions per task, the number of cycles per instruction, and the cycle time. It is not obtained by minimizing one factor at the expense of the others. This requires careful definition of all processor components. In particular:

1. The INSTRUCTIONS / TASK term is optimized by the definition of simple instructions. The processor provides an efficient instruction set and a large number of general-purpose registers to an optimizing, high-level language compiler. Most reductions in this term are accomplished by the compiler. The number of instructions for a given task may be greater than the number of instructions for processors with complex instruction sets. However, this increase is more than offset by other improvements in processor performance.

2. The CYCLES / INSTRUCTION term is optimized by the data-flow structure and performance-enhancing features of the processor. A large amount of processor hardware is dedicated to achieving an average instruction-execution rate that is close to single-cycle execution.

3. The TIME / CYCLE term is optimized by the implementation technology, the processor system interface, and judicious use of pipelining. The simplicity of the instruction set and processor features helps minimize the cycle time.

Am29050 MICROPROCESSOR USER MANUAL OVERVIEW

This manual contains information on the Am29050 processor that is essential for computer hardware and software architects and system design engineers. Additional information is available in the form of data sheets, application notes, and other documentation that is provided with software products and hardware-development tools.

The information in this manual is organized into eight chapters, each viewing the processor from a different perspective, and each with a specific objective.

Chapter 1 introduces the features and performance aspects of the Am29050 microprocessor.

Chapter 2 contains brief technical descriptions of the processor architecture and implementation.

Chapter 3 describes the details of the Am29050 microprocessor architecture.

Chapter 4 details the operation of the processor's internal functional units.

Chapter 5 describes the operation of the external interfaces of the Am29050 microprocessor.

Chapter 6 describes the attachment and use of coprocessors for the Am29050 microprocessor.

Chapter 7 discusses the implementation of software systems for the processor, focusing on programming features that deserve more coverage than is provided by other chapters.

Chapter 8 specifies the instruction set of the Am29050 microprocessor. It describes the instruction formats in detail, and provides a detailed description of every instruction.

This manual is organized around readers' concerns and objectives. Each chapter focuses on a particular aspect of the processor, and is organized so that it may be read independently, insofar as possible.

For those readers who want only a brief overview of the Am29050 microprocessor, Chapters 1 and 2 identify the outstanding features of the processor and briefly describe its architecture and implementation. These chapters address both software and hardware concerns.

For software architects and system programmers interested mainly in software-related issues, Chapters 3, 7, and 8 provide the necessary information.

For hardware architects and systems hardware designers interested mainly in hardware-related issues, Chapters 4 and 5 provide most of the required information; Chapter 8 also provides some related information.

For those readers interested in the coprocessor interface, Chapter 6 describes the interface from both software and hardware points of view.

## CHAPTER 1: FEATURES AND PERFORMANCE

This chapter provides an evaluation of the Am29050 microprocessor as an aid in considering a particular application. A detailed technical description of the Am29050 microprocessor is contained in subsequent chapters. This chapter informally describes the features of the processor, concentrating on features that distinguish the Am29050 microprocessor from other available processors.

### 1.1 DISTINCTIVE CHARACTERISTICS

- Full 32-bit architecture
- Double-precision, Floating-Point Arithmetic Unit on-chip
- CMOS technology/TTL-compatible
- 32 million instructions per second sustained at a 40-MHz operating frequency
- 1.25 clock cycles per instruction average
- 4-Gbyte virtual address space
- 192 general-purpose registers
- Three-address instruction architecture
- Non-multiplexed, pipelined address, instruction and data buses
- Concurrent instruction and data accesses
- Burst-mode access support
- 1024-byte Branch Target Cache™ memory
- 4-entry Physical Address Cache memory
- 64-entry Memory Management Unit on-chip
- Demand paging
- Fully pipelined
- On-chip Timer Facility
- On-chip clock generation
- Enhanced debugging support
- Master/slave chip output checking

### 1.2 INTRODUCTION

The Am29050 Streamlined Instruction Processor is a high-performance, general-purpose, 32-bit microprocessor implemented in complementary metal-oxide semiconductor (CMOS) technology. It supports a variety of applications, using a flexible architecture and rapid execution of simple instructions which are common to a wide range of tasks.

The Am29050 microprocessor extends the 29K™ Family of processors with a high-performance, pipelined, on-chip floating-point unit. The floating-point unit performs IEEE-compatible, single-precision and double-precision arithmetic at a peak rate of 80 million floating-point operations per second (MFLOPS) at 40 MHz. The Am29050 microprocessor also has features to improve the performance of loads and branches, allowing sustained integer performance of 32 million instructions per second (MIPS) at 40 MHz.

The Am29050 microprocessor is fully hardware- and software-compatible with the Am29000™ microprocessor. It can be used in existing Am29000 microprocessor applications without hardware or software modifications. It can bring a dramatic increase in performance to floating-point-intensive applications, particularly graphics and laser-printer applications.

The Am29050 microprocessor is supplied in a 169-pin pin-grid-array (PGA) package, with 141 signal pins, 27 power and ground pins, and one alignment pin. A representative system diagram is shown in Figure 1-1.

### 1.3 PERFORMANCE OVERVIEW

The Am29050 microprocessor provides a significant performance margin over other processors in its class, because most processor features were defined with the maximum achievable performance in mind. This section describes the features of the Am29050 microprocessor from the point of view of system performance.

### 1.3.1 Cycle Time

The Am29050 microprocessor is implemented in CMOS technology with a 0.8-micron effective transistor-channel length. This technology allows the processor to operate at a frequency of 40 MHz, so the processor cycle time is one 25-ns clock period. The processor interface drivers can drive 80-pF loads at this frequency.

### 1.3.2 Four-Stage Pipeline

The Am29050 microprocessor utilizes a four-stage pipeline for integer operations, allowing it to execute one integer instruction every clock cycle. The processor can complete an instruction on every cycle, even though four cycles are required from the beginning of an instruction to its completion.
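The following is an idealized model of a four-stage pipeline. It assumes no stalls, and it assumes the stage names fetch, decode, execute, and write-back, which are illustrative and are not taken from this section. Instruction i occupies stage s during cycle i + s. After a three-cycle fill, one instruction completes every cycle.

```python
STAGES = ("fetch", "decode", "execute", "write-back")  # assumed stage names


def pipeline_schedule(n_instructions):
    """Per-cycle occupancy of an ideal pipeline: a list of {stage: instruction}.

    Instruction i occupies stage s during cycle i + s (no stalls modeled).
    """
    total_cycles = n_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s
            if 0 <= i < n_instructions:
                row[stage] = i
        schedule.append(row)
    return schedule


def completion_cycles(schedule):
    """Cycles in which some instruction leaves the final stage."""
    return [c for c, row in enumerate(schedule) if STAGES[-1] in row]


sched = pipeline_schedule(6)
for cycle, row in enumerate(sched):
    cells = "  ".join(f"{st}:I{row[st]}" if st in row else f"{st}:--"
                      for st in STAGES)
    print(f"cycle {cycle}: {cells}")
print("completions:", completion_cycles(sched))  # completions: [3, 4, 5, 6, 7, 8]
```

In this model, six instructions finish in nine cycles. As the instruction count grows, the three-cycle fill cost becomes negligible and throughput approaches one instruction per cycle, which is 40 MIPS at 40 MHz.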

Floating-point operations are pipelined to a depth determined by the operation latency, and are overlapped with integer operations. A floating-point operation and an integer operation can complete at the same time without stalling the pipeline.

At a 40-MHz operating frequency, the maximum instruction execution rate is 40 million instructions per second (MIPS). For most other processors, the maximum MIPS rate has little meaning, because it can be achieved only under special circumstances. The Am29050 pipeline, however, is designed so that the processor operates at its maximum instruction-execution rate a significant portion of the time.

Pipeline interlocks, including those required for floating-point operations, are implemented by processor hardware. Except for a few special cases, it is not necessary to rearrange programs to avoid pipeline dependencies, although doing so is sometimes desirable for performance.

### 1.3.3 System Interface

One of the most difficult tasks in the definition of a high-speed microprocessor is the definition of an off-chip interface that supports the operating frequency of the processor and does not restrict the ability of the processor to fetch instructions and data. If the external interface of a microprocessor cannot support an instruction fetch rate of one instruction every cycle, there is little prospect that the processor will execute at this rate, even though it supports such a rate internally.

The Am29050 microprocessor accesses external instructions and data using three non-multiplexed buses, referred to collectively as the channel. The channel protocol minimizes the logic chains involved in a transfer and provides a maximum transfer rate of 320 Mbyte/s at 40 MHz.
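The peak channel figure follows from bus width and clock rate. Each 32-bit bus can move 4 bytes per 25-ns cycle, and the instruction and data buses can transfer at the same time. A back-of-the-envelope sketch follows; the helper name is illustrative.

```python
def bus_bandwidth_mbyte_s(width_bits, clock_mhz, transfers_per_cycle=1):
    """Peak bandwidth in Mbyte/s for a bus moving one word per transfer."""
    return width_bits / 8 * clock_mhz * transfers_per_cycle


per_bus = bus_bandwidth_mbyte_s(32, 40)  # instruction bus or data bus alone
channel = 2 * per_bus                    # both buses transferring every cycle
print(per_bus, channel)                  # 160.0 320.0
```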

1.3.3.1  SEPARATE ADDRESS, INSTRUCTION, AND DATA BUSES
The Am29050 microprocessor incorporates two 32-bit buses for instruction and data transfers, and a third address bus which is shared between instruction and data accesses. This bus structure allows simultaneous instruction and data transfers, even though the address bus is shared. The channel achieves the performance of four separate 32-bit buses at a much reduced pin count.

1.3.3.2  PIPELINED ADDRESSES
The Am29050 microprocessor address bus is pipelined, so that it can be released before an instruction or data transfer is completed. This allows a subsequent access to begin before the first has completed, and allows the processor to have two accesses in progress simultaneously.

1.3.3.3  SUPPORT OF BURST DEVICES AND MEMORIES
Burst-mode accesses provide high transfer rates for instructions and data at sequential addresses. For such accesses, the address of the first instruction or datum is sent, and subsequent requests for instructions or data at sequential addresses do not require additional address transfers. These instructions or data are transferred until either party involved in the transfer terminates the access.

Burst-mode accesses can occur at the rate of one access per cycle after the first address has been processed. At 40 MHz, the maximum achievable transfer bandwidth for either instructions or data is 160 Mbyte/s.
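The following simplified cycle-count model compares a burst with the same words fetched by separate accesses that do not overlap. The three-cycle first-access latency is a hypothetical value. The model also ignores the address pipelining described in 1.3.3.2, which lets separate accesses overlap in a real system.

```python
def cycles_separate(n_words, access_cycles):
    """Each word needs its own address transfer and full access time."""
    return n_words * access_cycles


def cycles_burst(n_words, first_access_cycles):
    """Only the first word pays the access latency; the rest arrive one per cycle."""
    if n_words == 0:
        return 0
    return first_access_cycles + (n_words - 1)


for n in (1, 4, 16, 64):
    print(n, cycles_separate(n, 3), cycles_burst(n, 3))
```

As the burst lengthens, the cost per word approaches one cycle. That is the one-word-per-cycle rate underlying the 160-Mbyte/s figure.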

Burst-mode accesses may occur to input/output devices, if the system design permits.

1.3.3.4  INTERFACE TO FAST DEVICES AND MEMORIES
The processor can be interfaced to devices and memories which complete accesses within one cycle. The channel protocol takes maximum advantage of such devices and memories by allowing data to be returned to the processor during the cycle in which the address is transmitted. This allows a full range of memory-speed trade-offs to be made within a particular system.

### 1.3.4 Register File

An on-chip Register File containing 192 general-purpose registers allows most instruction operands to be fetched without the delay of an external access. The Register File incorporates several features which aid the retention of data required by an executing program. Because of the number of general-purpose registers, the frequency of external references for the Am29050 microprocessor is significantly lower than the frequency of references in processors having only 16 or 32 registers.

Four-port access to the Register File allows two 64-bit source operands to be fetched in one cycle while two previously computed results are written. One write port serves integer operations and the other serves floating-point operations. Four 64-bit internal buses prevent contention in the routing of operands. All operand fetches and result write-backs for instruction execution can be performed in a single cycle.

The registers allow efficient procedure linkage by caching a portion of a compiler's run-time stack. On average, procedure calls and returns can be executed 5 to 10 times faster (on a cycle-by-cycle basis) than in processors that must implement the run-time stack in external memory, with the attendant loading and storing of registers on procedure call and return.

### 1.3.5 Instruction Execution

The Am29050 microprocessor uses an Arithmetic/Logic Unit, a Field Shift Unit, and a Prioritizer to execute most instructions. Each of these is organized to operate on 32-bit operands, and provide a 32-bit result. All operations are performed in a single cycle.

Floating-point operations are performed in an on-chip Floating-Point Unit, which performs 32-bit single-precision and 64-bit double-precision computations. Most of the time, floating-point operations are performed in parallel with integer operations and with other floating-point operations.

Instruction operations are overlapped with operand fetch and result write-back to the Register File. Pipeline forwarding logic detects pipeline dependencies and routes data as required, avoiding delays which might arise from these dependencies.

### 1.3.6 Branch Target Cache Memory

In general, the Am29050 microprocessor meets its instruction bandwidth requirements through instruction prefetching. However, instruction prefetching is ineffective when a branch occurs. The Am29050 microprocessor therefore incorporates a Branch Target Cache memory with 64 or 128 entries (configurable at run time). This cache supplies instructions for a branch that has been taken previously while a new prefetch stream is established.

If branch-target instructions are in the Branch Target Cache memory, branches execute in a single cycle. This has a large effect on processor performance, because the processor would otherwise sit idle while it waits for the new instruction stream.

As an example, consider that successful branches are 20% of a dynamic instruction mix, and that five cycles are required to restart the processor pipeline after a branch. For 20% of the instructions, the processor would take one cycle to execute the branch instruction and wait five cycles to refill the instruction pipeline. The overhead of branch instructions would be six cycles. If the remaining 80% of the instructions require a single cycle to execute, the latency involved in branching would reduce the average execution rate from one cycle per instruction to two, thus halving the performance.

The Branch Target Cache memory in the Am29050 microprocessor has an average hit rate of 80%. In other words, on average it eliminates the branch latency for 80% of all successful branches.
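The earlier branch example can be extended with the 80% hit rate. The model keeps the same assumptions: 20% successful branches, five refill cycles, and all other instructions completing in a single cycle. Under this model, a hit costs one cycle and a miss costs six. It is a simplified model, not a measurement.

```python
def average_cpi(branch_fraction, refill_cycles, btc_hit_rate):
    """Average cycles per instruction under the simple model used in the text.

    Non-branch instructions take one cycle. A successful branch takes one cycle
    when its target is in the Branch Target Cache, and 1 + refill_cycles otherwise.
    """
    branch_cycles = 1 + refill_cycles * (1.0 - btc_hit_rate)
    return (1.0 - branch_fraction) * 1 + branch_fraction * branch_cycles


without_btc = average_cpi(0.20, 5, 0.0)   # every successful branch misses
with_btc = average_cpi(0.20, 5, 0.80)     # 80% average hit rate
print(f"{without_btc:.2f} {with_btc:.2f}")  # 2.00 1.20
```

Under these assumptions, the cache lowers the average from two cycles per instruction to 1.2, recovering most of the performance lost to branching.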

### 1.3.7 Branching

Branch conditions in the Am29050 microprocessor can be based on Boolean data contained in general-purpose registers, as well as on arithmetic condition codes. Using a condition-code register for the purpose of branching can inhibit certain compiler optimizations, because the condition-code register can typically be modified by many different instructions. It can be difficult for an optimizing compiler to schedule this shared use. Because branch conditions are treated like any other instruction operands, the Am29050 microprocessor avoids this problem.

The Am29050 microprocessor executes branches in a single cycle, for those cases where the target of the branch is in the Branch Target Cache memory. The single-cycle branch is unusual for a pipelined processor, and is due to processor hardware which allows much of the branch instruction operation to be performed early in the execution of the branch. Single-cycle branching has a dramatic effect on performance, since successful branches typically represent 15% to 25% of a processor's instruction mix.

The techniques used to achieve single-cycle branching also minimize the execution time of branches whose target is not in the Branch Target Cache memory. To keep the pipeline operating at the maximum rate, the instruction following the branch, referred to as the delay instruction, is executed regardless of the outcome of the branch. An optimizing compiler can find a useful delay instruction for approximately 90% of branches, which further improves branch performance.
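The delay-instruction rule can be illustrated with a small interpreter. The interpreter uses a hypothetical toy instruction set (`set`, `add`, `jnz`, `halt`), not Am29050 assembly language. A taken branch records its target, the next sequential instruction still executes, and only then does control transfer.

```python
def run(program, max_steps=1000):
    """Interpret a toy program with one branch delay slot.

    Instructions (hypothetical, not Am29050 assembly):
      ("set", r, imm)     r = imm
      ("add", r, imm)     r = r + imm
      ("jnz", r, target)  if r != 0, branch to target after the delay slot
      ("halt",)
    Returns the final registers and the list of executed addresses.
    """
    regs = {}
    pc = 0
    pending_target = None   # set by a taken branch, consumed by its delay slot
    trace = []
    for _ in range(max_steps):
        op = program[pc]
        trace.append(pc)
        if pending_target is not None:      # this is the delay instruction
            next_pc, pending_target = pending_target, None
        else:
            next_pc = pc + 1
        if op[0] == "halt":
            break
        if op[0] == "set":
            regs[op[1]] = op[2]
        elif op[0] == "add":
            regs[op[1]] = regs.get(op[1], 0) + op[2]
        elif op[0] == "jnz" and regs.get(op[1], 0) != 0:
            pending_target = op[2]
        pc = next_pc
    return regs, trace


loop = [
    ("set", "i", 3),    # 0
    ("set", "n", 0),    # 1
    ("add", "i", -1),   # 2: loop body
    ("jnz", "i", 2),    # 3: branch back while i != 0
    ("add", "n", 1),    # 4: delay instruction, executed on every pass
    ("halt",),          # 5
]
regs, trace = run(loop)
print(regs, trace)
```

The delay instruction at address 4 runs three times. It runs twice after a taken branch and once after the final, untaken one, so it does useful work on every pass.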

### 1.3.8 Loads and Stores

The performance degradation caused by load and store operations is minimized in the Am29050 microprocessor by three means: overlapping these operations with instruction execution, taking advantage of pipelining, and organizing the flow of external data into the processor so that the impact of external accesses is minimized.

1.3.8.1 OVERLAPPED LOADS AND STORES

In the Am29050 microprocessor, a load or store is performed concurrently with execution of instructions which do not have dependencies on the load or store operation. An optimizing compiler can schedule loads and stores in the instruction sequence so that, in most cases, data accesses are overlapped with instruction execution.

Overlapped load and store operations can achieve up to a 30% improvement in performance when data memory has a two-cycle access time. Processor hardware detects dependencies while overlapped loads and stores are being performed, so dependencies have no software implications.
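The following simplified model shows why the compiler's scheduling matters. It assumes that a load's data arrives `access_cycles` cycles after the load, and that each independent instruction issued in between hides one of those cycles. The model ignores early loads and load-data forwarding, so its cycle counts are illustrative only and say nothing about the 30% figure above.

```python
def sequence_cycles(sequence, access_cycles):
    """Cycle count for a straight-line sequence under a simplified overlap model.

    Kinds: "load" starts a load; "op" is independent of the load; "use" needs
    the loaded data. Every instruction issues in one cycle; a "use" stalls for
    access_cycles minus the number of independent ops issued since the load.
    """
    cycles = 0
    ops_since_load = None      # None when no load result is outstanding
    for kind in sequence:
        if kind == "use" and ops_since_load is not None:
            cycles += max(0, access_cycles - ops_since_load)   # stall
            ops_since_load = None
        cycles += 1
        if kind == "load":
            ops_since_load = 0
        elif kind == "op" and ops_since_load is not None:
            ops_since_load += 1
    return cycles


naive = ["load", "use", "op", "op"]       # data used immediately
scheduled = ["load", "op", "op", "use"]   # independent work moved up
print(sequence_cycles(naive, 2), sequence_cycles(scheduled, 2))
```

With a two-cycle access, the naive order takes six cycles and the scheduled order takes four, because both independent instructions overlap the load.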

A classical problem in the implementation of overlapped loads and stores is that of dealing with address-translation exceptions in a demand-paged environment. Overlap is not possible if any load or store that encounters an address-translation exception must be restarted by re-executing the initiating instruction. In that case, the processor would have to hold instruction execution until the success of every load or store was ensured. The Am29050 microprocessor exception restart mechanism automatically saves the information required to restart any load or store until the operation successfully completes. It therefore allows the overlapped execution of loads and stores while properly handling address-translation exceptions.

A second problem in the implementation of overlapped loads concerns the handling of data which is returned to the processor upon completion of the load. This data must be written to the register file, but it contends for register-file write-cycles with other instructions which are being overlapped with the load. This contention may be eliminated by adding a special write port to the register file. However, due to the size of the register file in the Am29050 microprocessor, a fifth port for writing incoming load data is not economical.

The Am29050 microprocessor data-flow organization avoids the one-cycle penalty which would result from the contention between load data and the results of overlapped instruction execution. Load data is buffered in a latch while awaiting an opportunity to be written into the register file. This opportunity is guaranteed to arise before the next load is executed. While the data is buffered in this latch, it may be used as an instruction operand in place of the destination register for the load.

1.3.8.2 EARLY LOADS

The early load feature, incorporating a 4-entry Physical Address Cache memory, speeds up the execution of load operations by making the physical address of the load available at the end of the decode cycle of the load instruction. At the beginning of the next cycle, when the load enters the execute stage, the physical address appears on the channel. In effect, early loads reduce the effective access time of the external memory by one cycle.

1.3.8.3 LOAD MULTIPLE AND STORE MULTIPLE

These instructions transfer the contents of multiple registers to or from external memories or devices, at a rate of one register per cycle.

The advantage of Load Multiple and Store Multiple is best seen in task switching, register-file saving and restoring, and in block data moves. In many systems, such operations require a significant percentage of execution time.

The load-multiple and store-multiple sequences are interruptible, so that they do not affect interrupt latency.

1.3.8.4 FORWARDING OF LOAD DATA

Data which is sent to the processor at the completion of a load is forwarded directly to the appropriate execution unit if the data is required immediately by an instruction. This avoids the common one-cycle delay from bus transfer to use of data, and reduces the access latency of external data by one cycle.

### 1.3.9 Memory Management

A 64-entry Translation Look-Aside Buffer (TLB) and two Region Mapping registers on the Am29050 microprocessor perform virtual-to-physical address translation, avoiding the cycle which would be required to transfer the virtual address to an external TLB. A number of enhancements improve the performance of address translation:

1. Pipelining—The operation of the TLB is pipelined with other processor operations.

2. Early Address Translation—Address translations for load, store, and branch instructions occur during the cycle in which these instructions are executed. This allows the physical address to be transferred externally in the next cycle.

3. Region Mapping—The Region Mapping registers permit efficient mapping of large, contiguous regions of memory. This is useful for code libraries and large data structures; these can appear in a virtual address space without paging overhead.

4. Task Identifiers—Task Identifiers allow TLB entries to be matched to different processes, so that TLB invalidation is not required during task switches.

5. Least-Recently Used Hardware—This hardware allows immediate selection of a TLB set to be replaced.

6. Software Reload—Software reload allows the operating system to use a page-mapping scheme which is best matched to its environment. Paged-segmented, one-level-page mapping, two-level-page mapping, or any other user-defined page-mapping scheme can be supported. Because Am29050 microprocessor instructions execute at an average rate of nearly one instruction per cycle, software reload has a performance approaching that of hardware TLB reload.
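The task-identifier and LRU points can be sketched as a small software model. Several details are assumptions made for illustration and are not taken from this section: the two-way, 32-set organization, the 4-Kbyte page size, the set-index function, and all names. A miss returns None, leaving the fill to a software reload routine.

```python
PAGE_SHIFT = 12   # assumed 4-Kbyte pages for this model
N_SETS = 32       # assumed organization: 64 entries = 32 sets x 2 ways


class TLB:
    """Software model of a task-tagged, two-way set-associative TLB with LRU replacement."""

    def __init__(self):
        # Each way holds (task_id, virtual_page, physical_page) or None.
        self.sets = [[None, None] for _ in range(N_SETS)]
        self.lru = [0] * N_SETS        # least-recently-used way in each set

    def _index(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        return vpn, vpn % N_SETS

    def lookup(self, task_id, vaddr):
        """Return the physical address, or None on a miss."""
        vpn, s = self._index(vaddr)
        for way, entry in enumerate(self.sets[s]):
            if entry is not None and entry[0] == task_id and entry[1] == vpn:
                self.lru[s] = 1 - way          # the other way is now LRU
                offset = vaddr & ((1 << PAGE_SHIFT) - 1)
                return (entry[2] << PAGE_SHIFT) | offset
        return None   # miss: a software reload routine would fill the entry

    def reload(self, task_id, vaddr, physical_page):
        """Software reload: replace the least-recently-used way in the set."""
        vpn, s = self._index(vaddr)
        victim = self.lru[s]
        self.sets[s][victim] = (task_id, vpn, physical_page)
        self.lru[s] = 1 - victim


tlb = TLB()
tlb.reload(1, 0x12345, 0x80)          # task 1 maps virtual page 0x12
tlb.reload(2, 0x12345, 0x90)          # task 2 maps the same virtual page
print(hex(tlb.lookup(1, 0x12345)), hex(tlb.lookup(2, 0x12345)))
```

Both tasks' translations of the same virtual page coexist, so a task switch needs no invalidation. A reload for a third task would evict whichever of the two entries was used least recently.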

### 1.3.10 Interrupts and Traps

When the Am29050 microprocessor takes an interrupt or trap, it does not automatically save its current state information in memory. This greatly improves the performance of temporary interruptions such as TLB reload or other simple operating-system calls which require no saving of state information.

In cases where the processor state must be saved, the saving and restoring of state information is under the control of software. The methods and data structures used to handle interrupts—and the amount of state saved—may be tailored to the needs of a particular system.

Interrupts and traps are dispatched through a 256-entry Vector Area, which directs the processor to a routine to handle a given interrupt or trap. The Vector Area may be relocated in memory by the modification of a processor register. There may be multiple Vector Areas in the system, though only one is active at any given time.

The Vector Area is either a table of pointers to the interrupt and trap handlers, or a segment of instruction memory (possibly read-only memory) containing the handlers themselves. The choice between the two possible Vector Area definitions is determined by the cost/performance trade-offs made for a particular system.

If the Vector Area is a table of vectors in data memory, it requires only 1 Kbyte of memory. However, this structure requires that the processor perform a vector fetch every time an interrupt or trap is taken. The vector fetch requires at least 3 cycles, in addition to the number of cycles required for the basic memory access.

If the Vector Area is a segment of instruction memory, it requires a maximum of 64 Kbytes of memory. The advantage of this structure is that the processor begins the execution of the interrupt or trap handler in the minimum amount of time.
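The following sketch models the two Vector Area forms. The handler names and vector number 8 are hypothetical. Modeling the vector fetch as a Python list lookup says nothing about its cycle cost. The arithmetic shows where the memory figures come from: 256 vectors × 4 bytes gives 1 Kbyte, and 64 Kbytes divided among 256 vectors gives 256 bytes per handler, room for up to 64 32-bit instructions.

```python
VECTOR_COUNT = 256
WORD_BYTES = 4   # 32-bit pointers and 32-bit instructions


def pointer_table_bytes():
    """Vector Area as a table of 32-bit handler addresses."""
    return VECTOR_COUNT * WORD_BYTES


def handler_space_per_vector(segment_bytes=64 * 1024):
    """Bytes available per handler when handlers live in the Vector Area itself."""
    return segment_bytes // VECTOR_COUNT


class Processor:
    """Dispatch model: vector_area stands in for the register that locates
    the active Vector Area; only one area is active at a time."""

    def __init__(self, vector_area):
        self.vector_area = vector_area

    def take_trap(self, vector):
        handler = self.vector_area[vector]   # models the vector fetch
        return handler(vector)


def unhandled(vector):
    return f"unhandled vector {vector}"


def tlb_miss(vector):
    return "reload TLB entry"


area_a = [unhandled] * VECTOR_COUNT
area_b = [unhandled] * VECTOR_COUNT
area_a[8] = tlb_miss                    # hypothetical vector number

cpu = Processor(area_a)
print(cpu.take_trap(8))                 # reload TLB entry
cpu.vector_area = area_b                # relocate: switch to another Vector Area
print(cpu.take_trap(8))                 # unhandled vector 8
print(pointer_table_bytes(), handler_space_per_vector())   # 1024 256
```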

### 1.4 OPTIMIZING COMPILERS

The number of instructions used to perform a given task is minimized by optimizing compilers which are supplied for the Am29050 microprocessor. A full discussion of optimizing-compiler technology is beyond the scope of this manual, but there are a few concepts which should be mentioned here, because the Am29050 microprocessor was designed to be an excellent target for optimizing compilers.

### 1.4.1 Optimizing-Compiler Overview

In addition to performing the same tasks as any other compiler, an optimizing compiler rearranges the generated code to minimize its size and execution time. This optimization occurs after the initial phases of code generation have been completed. The optimizer inspects large portions of the compiled program for frequently occurring cases where the compiled results can be improved.

Many optimization opportunities arise precisely because the code is compiler generated. Code translation is an automated process, so the initial phases of the compiler often generate code that is much less than optimum. However, the optimizer can produce results which are often better than those produced by human assembly-language programmers, because it can deal with large portions of the program and an immense amount of data concerning program behavior.

### 1.4.2 Optimizing-Compiler Operation

Conceptually, the optimizer arranges program flow and the creation, modification, and use of program data to minimize the amount of time required to perform a given task. The reduction in program space is a normal side-benefit of the reduction in execution time. The optimizer is concerned not only with data explicit in the high-level program, but also with data created by other phases of the compiler in order to properly translate the program (for example, temporary values created during the evaluation of expressions). Optimization involves the following sorts of operations:

1. Reusing results rather than repeating computations. The optimizer attempts to eliminate redundant computations by performing a computation once, and saving the result for later use. Often these redundant computations are not apparent in the original program, but are created by the underlying definitions of high-level operations.

2. Reducing the amount of code executed within loops. In many cases, only a few computations change on different loop iterations. The optimizer attempts to reduce the amount of work performed within loops to a minimum, by moving loop-invariant computations outside of loops.

3. Replacing slow operations by faster ones. The optimizer can recognize special cases of multiply and divide, for example, and replace them with faster shift and add instructions. The slow operations, again, often are generated by earlier phases of the compiler because these operations are most general, and the early code-generation phases cannot recognize the special cases which allow the operations to be replaced with faster ones.

4. Allocating processor registers so that they contain frequently used data. This reduces the number of relatively slow memory references, and replaces them by faster register references.

5. Scheduling the execution of instructions. The optimizer attempts to move instructions to a point in the program flow where they create fewer problems for the processor pipeline. For example, a register load or a floating-point operation may be moved to a point in the instruction sequence where its execution can be overlapped with other instructions.
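The first three transformations in this list can be shown at source level. The functions below are hypothetical examples written only to show the rewrites. A real optimizer applies them to its internal representation rather than to source text. Register allocation and instruction scheduling (items 4 and 5) have no source-level equivalent here.

```python
def as_generated(xs, a, b):
    """Straightforward translation: a * b is recomputed twice per iteration
    and the multiply by 8 is left as a multiply."""
    out = []
    for x in xs:
        out.append((a * b + x) * (a * b - x) + x * 8)
    return out


def optimized(xs, a, b):
    """The same computation after items 1-3 (valid for integer inputs)."""
    t = a * b            # 1. computed once and reused; 2. hoisted out of the loop
    out = []
    for x in xs:
        out.append((t + x) * (t - x) + (x << 3))   # 3. x * 8 -> shift left by 3
    return out


values = list(range(-4, 5))
print(as_generated(values, 3, 5) == optimized(values, 3, 5))   # True
```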

Most optimizations performed rely heavily on two types of information collected by the optimizer: the first type deals with program flow, and the second with data dependencies which arise because of the program flow. The optimizer can tailor the code to the high-level task being compiled, not because it understands the task being performed by the high-level program, but because it understands the dependencies which arise in the generated code. As a result, it can adjust the instruction sequence to minimize the performance impact of these dependencies.

It is important to note that the optimizer does not directly optimize a given program, but rather optimizes a special representation of the program which is suitable for analysis and modification by the optimizer, which is, after all, just another program. The key to optimization is that this representation be easy to analyze for program and data-flow information, and that it be easy to rearrange when optimizations are performed.

### 1.4.3 The Am29050 Microprocessor and Optimizing Compilers

1.4.3.1 GENERAL PRINCIPLES

The primary principle behind the Am29050 microprocessor instruction set is that it matches the internal representation used by optimizing compilers to perform optimization. As discussed above, this representation is not arbitrary, but is rather strictly defined by the optimization algorithms.

It is important to realize that optimizations performed for the Am29050 microprocessor would have limited effectiveness if applied to so-called complex-instruction processors. There are several fundamental problems that limit the effectiveness of optimizations for these other processors.

The first problem with complex-instruction sets is that they normally provide a variety of instruction sequences which perform the same function as a sequence of instructions in the compiler's internal representation, but do not match it exactly. The trade-offs made by a compiler to decide among the available choices can be very complex.

In the first place, it is difficult for the compiler to determine the difference in execution time between multiple instruction sequences, because of the amount of information involved. For example, just changing the addressing mode of an instruction can change the execution time. This is further complicated in the cases where the compiled program is to be run on different implementations of the same processor, where execution times can depend on the implementation. If there is only one instruction sequence to choose from, and if all instructions execute in a single cycle, this problem is reduced greatly.

During the generation of code for a complex-instruction processor, it is nearly impossible to guarantee that the choice of a given code sequence will not force a less-than-optimum choice of code at some later point in the translation. Restrictions arise late in translation because of decisions made earlier. Often, these restrictions arise because of interactions between instructions; they are especially severe when instructions operate only on a specific register or group of registers.

An additional problem with complex instruction sets is that optimizations applied to them do not necessarily save execution time. An optimization may not be reflected in the final compiled code, because the instruction set may inhibit the realization of the optimization. However, in the case of the Am29050 microprocessor, an optimization is guaranteed to eliminate one or more execution cycles, because all processor operations are exposed to the compiler.

The greatest benefit of exposing all processor operations to the compiler appears within loops, which is where processors spend a great deal of their execution time. The problem with complex instruction sets here is that, when an instruction set forces multiple operations with one instruction, the processor spends much time performing redundant computations within loops. Many times, the redundant computations are performed by microcode, which cannot detect that a computation is loop-invariant, because it knows nothing of loops. The compiler is in no position to do much about this, because it cannot remove the loop-invariant computations from the micro-sequence; it is forced to accept the definitions of the instructions as they are.

If an instruction set is defined so that all hardware-level operations are available to the compiler, the compiler is free to construct any sequence of these operations. This allows the movement of loop-invariant computations out of loops, which can result in tremendous performance improvements.
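
The loop-invariant transformation described above can be illustrated at the C level. This is a sketch, not output from any particular Am29050 compiler, and the function names are hypothetical. Both functions compute the same result. The second hoists the product `scale * offset` out of the loop, so it is computed once instead of on every iteration.

```c
#include <stddef.h>

/* Before: scale * offset is recomputed on every iteration,
   even though neither operand changes inside the loop. */
long sum_scaled_naive(const int *a, size_t n, int scale, int offset)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] + (long)scale * offset;
    return sum;
}

/* After: the loop-invariant product is hoisted out of the loop
   and computed exactly once. */
long sum_scaled_hoisted(const int *a, size_t n, int scale, int offset)
{
    long bias = (long)scale * offset;   /* loop-invariant */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] + bias;
    return sum;
}
```

A compiler can only make this change when the multiply is a separate, visible operation. It cannot do so when the multiply is buried inside a microcoded instruction.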

SPECIAL Am29050 MICROPROCESSOR FEATURES

In addition to the above considerations, there are several other central principles behind the definition of the Am29050 microprocessor.

The Am29050 microprocessor instruction set reduces the number of instructions required for most general-purpose tasks by providing a complete set of operations. The instruction set is streamlined, but there is no attempt to minimize the number of instructions in the set itself. Rather, the goal is to minimize the number of instructions executed by most high-level-language programs.

With a few minor exceptions, Am29050 microprocessor integer instructions execute in a single cycle. As a result, the performance of an Am29050 microprocessor instruction sequence is very easy to predict, simplifying the task of compiler instruction selection. In addition, single-cycle instruction execution allows the Am29050 microprocessor to take the maximum advantage of a high-performance system design. Instructions are executed at approximately the rate at which they are supplied to the processor. The Am29050 microprocessor does not artificially constrain the instruction-execution rate by forcing instructions to require multiple cycles for execution, except in the unavoidable case of floating-point operations.

The Am29050 microprocessor contains a large number of registers which facilitate compiler optimizations. These registers allow frequently used variables to be accessed quickly, provide a large number of temporary locations for the reuse of computational results, and simplify inter-procedural communication. The compiler is free to allocate these registers as required to improve performance. Register allocation is relatively simple, because there is such a large number of registers.

For other processors which have fixed register-addressing, a compiler has difficulties allocating the usage of registers, because registers must be allocated statically at compile time. Procedure calls present the greatest difficulty. It is impossible for the compiler to determine exactly which procedures will be called during execution, and in what order they will be called. Thus, it is impossible to precisely allocate the usage of registers across procedure-call boundaries.

Since the Am29050 microprocessor local registers are addressed relative to a Stack Pointer, compiler register-allocation is simplified. The local registers are allocated dynamically during execution. Thus, the compiler need not be concerned about the allocation of registers across procedure boundaries; this is handled automatically by the local-register addressing.

Am29050 microprocessor pipelining is exposed to the compiler in the form of delayed branches, overlapped loads and stores, and overlapped floating-point operations. The compiler is free to arrange instructions to reduce the performance impact of the processor pipeline. However, it does so purely for performance: pipeline interlocks in the Am29050 microprocessor guarantee correct operation regardless of how the instructions are ordered.

This chapter gives a brief overview of the Am29050 microprocessor architecture, grouped into programming-related features, hardware features, and system interfaces. The technical information given in this chapter is also contained in subsequent chapters. Much of the detail is omitted here, since the objective is to provide a framework for understanding the information in later chapters.

Where appropriate, section titles in this chapter are followed by references to sections appearing in subsequent chapters. The referenced sections contain related detailed information.

2.1 PROGRAMMER REFERENCE OVERVIEW

This section gives a brief description of the Am29050 microprocessor from a programmer's point of view. It introduces the processor's program modes, registers, and instructions. An overview of the processor's data formats and handling is given. This section also briefly describes interrupts and traps, memory management, and the coprocessor interface. Finally, the Timer Facility and Trace Facility are introduced.

2.1.1 Program Modes (see Section 3.1)

There are three mutually exclusive modes of program execution: the Supervisor mode, the User mode, and the Monitor mode. In the Supervisor mode, executing programs have access to all processor resources. In the User mode, certain processor resources may not be accessed; any attempted access causes a trap. The Monitor mode allows debugging of both User and Supervisor code.

2.1.2 Visible Registers (see Section 3.2)

The Am29050 microprocessor incorporates four classes of registers which are accessed and manipulated by instructions: general-purpose registers, floating-point accumulator registers, special-purpose registers, and Translation Look-Aside Buffer (TLB) registers.

2.1.2.1 GENERAL-PURPOSE REGISTERS (see Section 3.2.1)

The Am29050 microprocessor has 192 general-purpose registers. General-purpose registers are not dedicated to any special use, and are available for any appropriate program use.

Most processor instructions are three-address instructions. An instruction specifies any three of the 192 registers for use in instruction execution. Normally, two of these registers contain source-operands for the instruction, and a third stores the result of the instruction.

The 192 registers are divided into 64 global and 128 local registers. Global registers are addressed with absolute register numbers, while local registers are addressed relative to an internal Stack Pointer.

For fast procedure calling, a portion of a compiler’s run-time stack can be mapped into the local registers. Statically allocated variables, temporary values, and operating-system parameters are kept in the global registers.

The Stack Pointer for local registers is mapped to Global Register 1. The Stack Pointer is a full 32-bit virtual address for the top of the run-time stack.
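
Stack-Pointer-relative addressing of the local registers can be sketched in C. The sketch assumes the Am29000-family convention detailed in Section 3.2.1, which should be checked there. Under that convention, bits 8-2 of the Stack Pointer (Global Register 1) are added to the 7-bit local register number. The sum is taken modulo 128, and the result selects one of the absolute register numbers 128-255 that hold the local registers. The function name is hypothetical.

```c
#include <stdint.h>

/* Map local register number lr (0-127) to an absolute register number
   (128-255), given the Stack Pointer value held in Global Register 1.
   Assumption: bits 8-2 of the Stack Pointer supply the base, and the
   addition wraps modulo 128 (the local registers form a circular file). */
unsigned local_to_absolute(uint32_t stack_pointer, unsigned lr)
{
    unsigned base = (stack_pointer >> 2) & 0x7Fu;  /* bits 8-2 of gr1 */
    return 128u + ((base + lr) & 0x7Fu);           /* wrap within locals */
}
```

Because the sum wraps modulo 128, moving the Stack Pointer on procedure entry and exit slides the window of local registers through the circular local-register file without copying any data.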

The Condition Code Accumulator Register is mapped to both Global Register 2 and Global Register 3. This register can be used to accumulate into a single condition code the Boolean values produced by several operations. The condition code can then be used as an operand in further operations, for example, as a control parameter for conditional branches.

The general-purpose registers may also be accessed indirectly, with the register number specified by the content of a special-purpose register (see below) rather than by an instruction field. Three independent indirect register numbers are held in three separate special-purpose registers: Indirect Pointer A and Indirect Pointer B for the two source operands, and Indirect Pointer C for the result. Indirect addressing is selected by specifying Global Register 0 as an instruction operand or result register. An instruction can use indirect access for any or all of its source operands and its result.
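
A minimal model of indirect access follows. It assumes, for illustration only, that each indirect pointer simply holds an absolute register number. The actual bit layout of IPA, IPB, and IPC is given in Section 3.2. The `regfile` type and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: 256 absolute register numbers, of which the
   implemented global and local registers are a subset. */
typedef struct {
    uint32_t reg[256];
    unsigned ipa, ipb, ipc;   /* indirect pointers, modeled as register numbers */
} regfile;

/* Resolve an instruction's register field: register number 0 (gr0)
   means "use the corresponding indirect pointer instead". */
static unsigned resolve(unsigned field, unsigned pointer)
{
    return field == 0 ? pointer : field;
}

/* Model of a three-address ADD, rc <- ra + rb, with indirection:
   operand A uses IPA, operand B uses IPB, and the result uses IPC. */
void model_add(regfile *rf, unsigned rc, unsigned ra, unsigned rb)
{
    uint32_t a = rf->reg[resolve(ra, rf->ipa)];
    uint32_t b = rf->reg[resolve(rb, rf->ipb)];
    rf->reg[resolve(rc, rf->ipc)] = a + b;
}
```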

General-purpose registers may be partitioned into segments of 16 registers for the purpose of access protection. A register in a protected segment may be accessed only by a program executing in the Supervisor or Monitor modes. An attempted access (either read or write) by a User-mode program causes a trap to occur.
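
The bank-protection check can be sketched as follows. For illustration, the sketch assumes that the Register Bank Protect Register (RBP) is a 16-bit mask in which bit n protects absolute registers 16n through 16n+15. Under that assumption, the 256 absolute register numbers fall into 16 banks. The exact bit assignment is defined with RBP in Section 3.2, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if an access to absolute register `reg` (0-255) should
   trap: the program runs in User mode and the register's bank of 16
   is marked protected in the (assumed 16-bit) RBP mask. */
bool bank_access_traps(uint16_t rbp, unsigned reg, bool user_mode)
{
    unsigned bank = (reg >> 4) & 0xFu;      /* 16 registers per bank */
    return user_mode && ((rbp >> bank) & 1u);
}
```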

2.1.2.2 FLOATING-POINT ACCUMULATOR REGISTERS (see Section 3.2.2)

The Am29050 microprocessor contains four double-precision floating-point accumulator registers for use with the floating-point multiply-accumulate and multiply-sum operations. Instructions are also provided for writing and reading the accumulator registers directly.
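
The multiply-accumulate operation has the same shape as the inner loop of a dot product. The sketch below models it in C, using an array to stand in for the four accumulator registers. It describes only the arithmetic (acc = acc + a * b), not the Am29050 instruction encoding, timing, or rounding behavior. All names are hypothetical.

```c
#include <stddef.h>

#define NUM_ACC 4   /* four double-precision accumulators, per the text */

/* Model of a multiply-accumulate into accumulator k: acc[k] += a * b. */
static void fmac_model(double acc[NUM_ACC], int k, double a, double b)
{
    acc[k] += a * b;
}

/* Dot product that alternates terms between two accumulators,
   keeping two independent partial sums, and adds them at the end. */
double dot_product(const double *x, const double *y, size_t n)
{
    double acc[NUM_ACC] = {0.0, 0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; i++)
        fmac_model(acc, (int)(i & 1), x[i], y[i]);
    return acc[0] + acc[1];
}
```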

2.1.2.3 SPECIAL-PURPOSE REGISTERS (see Section 3.2.3)

The Am29050 microprocessor contains 39 special-purpose registers. These registers provide controls and data for certain processor functions.

Special-purpose registers are accessed by data movement only. Any special-purpose register can be written with the contents of any general-purpose register or a 16-bit immediate field, and any general-purpose register can be written with the contents of any special-purpose register. Operations cannot be performed directly on the contents of special-purpose registers.

Some special-purpose registers are protected, and can be accessed only in the Supervisor or Monitor modes. This restriction applies to both read and write accesses. An attempt by a User-mode program to access a protected register causes a trap to occur.

The protected special-purpose registers are defined as follows:

0. VAB: Vector Area Base Address—Defines the beginning of the interrupt/trap Vector Area.

1. OPS: Old Processor Status—Receives a copy of the Current Processor Status (see below) when an interrupt or trap is taken. It is later used to restore the Current Processor Status on an interrupt return.

2. CPS: Current Processor Status—Contains control information associated with the currently executing process, such as interrupt disables and the Supervisor Mode bit.

2-2 ARCHITECTURE HIGHLIGHTS
3. CFG: Configuration—Contains control information which normally varies only from system to system, and usually is set only during system initialization.

4. CHA: Channel Address—Contains the address associated with an external access, and retains the address if the access does not complete successfully. The Channel Address Register, in conjunction with the Channel Data and Channel Control registers described below, allows unsuccessful external accesses to be restarted. This might be necessary, for example, for an access that encounters a page fault in a demand-paged environment.

5. CHD: Channel Data—Contains data associated with a store operation, and retains the data if the operation does not complete successfully.

6. CHC: Channel Control—Contains control information associated with a channel operation, and retains this information if the operation does not complete successfully.

7. RBP: Register Bank Protect—Restricts access of User-mode programs to specified groups of 16 registers. This protects operating-system parameters kept in the global registers from corruption by User-mode programs.

8. TMC: Timer Counter—Supports real-time control and other timing-related functions.

9. TMR: Timer Reload—Maintains synchronization of the Timer Counter. It includes control bits for the Timer Facility.

10. PC0: Program Counter 0—Contains the address of the instruction being decoded when an interrupt or trap is taken. The processor restarts this instruction upon interrupt return.

11. PC1: Program Counter 1—Contains the address of the instruction being executed when an interrupt or trap is taken. The processor restarts this instruction upon interrupt return.

12. PC2: Program Counter 2—Contains the address of the instruction just completed when an interrupt or trap is taken. This address is provided for information only, and does not participate in an interrupt return.

13. MMU: MMU Configuration—Allows selection of various memory-management options, such as page size.

14. LRU: LRU Recommendation—Simplifies the reload of entries in the Translation Look-Aside Buffer (TLB) by providing information on the least-recently used entry of the TLB when a TLB miss occurs (see Section 2.1.6).

15. RSN: Reason Vector—Contains the vector number of the synchronous trap which caused entry into the Monitor mode.

16. RMA0: Region Mapping Address 0—Specifies a mapping from a region of virtual address space to physical address space; contains the Virtual Base Address (VBA) and the corresponding Physical Base Address (PBA) (see Section 3.6.2).

17. RMC0: Region Mapping Control 0—Contains control information associated with the region mapping specified by the Region Mapping Address Register 0.

18. RMA1: Region Mapping Address 1—Specifies a mapping from a region of virtual address space to physical address space; contains the Virtual Base Address (VBA) and the corresponding Physical Base Address (PBA).

19. RMC1: Region Mapping Control 1—Contains control information associated with the region mapping specified by the Region Mapping Address Register 1.

20. SPC0: Shadow Program Counter 0—Contains the address of the instruction being decoded when the processor enters Monitor mode. The processor restarts this instruction upon return from Monitor mode (see Section 3.7).

21. SPC1: Shadow Program Counter 1—Contains the address of the instruction being executed when the processor enters Monitor mode. The processor restarts this instruction upon return from Monitor mode.

22. SPC2: Shadow Program Counter 2—Contains the address of the instruction just completed when the processor enters Monitor mode. This address is provided for information only, and does not participate in the return from Monitor mode.

23. IBA0: Instruction Breakpoint Address 0—Contains the address of an instruction breakpoint (see Section 3.7).

24. IBC0: Instruction Breakpoint Control 0—Contains control and status information for the breakpoint comparison specified by the Instruction Breakpoint Address Register 0.

25. IBA1: Instruction Breakpoint Address 1—Contains the address of an instruction breakpoint.

26. IBC1: Instruction Breakpoint Control 1—Contains control and status information for the breakpoint comparison specified by the Instruction Breakpoint Address Register 1.

The unprotected special-purpose registers are defined as follows:

128. IPC: Indirect Pointer C—Allows the indirect access of a general-purpose register.

129. IPA: Indirect Pointer A—Allows the indirect access of a general-purpose register.

130. IPB: Indirect Pointer B—Allows the indirect access of a general-purpose register.

131. Q: Q—Provides additional operand bits for multiply step, divide step, and divide operations.

132. ALU: ALU Status—Contains information about the outcome of integer arithmetic and logical operations, and holds residual control for certain instruction operations.

133. BP: Byte Pointer—Contains an index of a byte or half-word within a word. This register is also accessible via the ALU Status Register.

134. FC: Funnel Shift Count—Provides a bit offset for the extraction of word-length fields from double-word operands. This register is also accessible via the ALU Status Register.

135. CR: Load/Store Count Remaining—Maintains a count of the number of loads and stores remaining for load-multiple and store-multiple operations. The count is initialized to the total number of loads or stores to be performed before the operation is initiated. This register is also accessible via the Channel Control Register.

160. FPE: Floating-Point Environment—Controls the operation of floating-point arithmetic, such as rounding modes and exception reporting.

161. INTE: Integer Environment—Enables and disables the reporting of exceptions which occur during integer multiply and divide operations.

162. FPS: Floating-Point Status—Contains information about the outcome of floating-point operations.
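
In the lists above, every protected register has a number below 128, and every unprotected register has a number of 128 or above. This sketch assumes that pattern holds across the whole special-purpose register space, which is an inference from the numbering and should be checked against Section 3.2.3. Under that assumption, the User-mode access check reduces to a single comparison. The function name is hypothetical.

```c
#include <stdbool.h>

/* True if a User-mode move to or from special-purpose register `spr`
   should trap. Assumption: registers 0-127 are protected and
   registers 128-255 are unprotected. */
bool spr_access_traps(unsigned spr, bool user_mode)
{
    return user_mode && spr < 128u;
}
```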

2.1.2.4 TLB REGISTERS (see Section 3.2.4)

Translation Look-Aside Buffer (TLB) entries in the Am29050 microprocessor Memory Management Unit are accessed via 128 TLB registers. A single TLB entry appears as two TLB registers, so TLB registers are paired according to the corresponding TLB entry.

TLB registers are accessed by data movement only. Any TLB register can be written with the contents of any general-purpose register, and any general-purpose register can be written with the contents of any TLB register. Operations cannot be performed directly on the contents of TLB registers.

TLB registers can be accessed only in the Supervisor mode. This restriction applies to both read and write accesses. An attempt by a User-mode program to access a TLB register causes a trap to occur.

2.1.3 Instruction Set Overview (see Section 3.3 and Chapter 8)

The three-address architecture of the Am29050 microprocessor instruction set allows a compiler or assembly-language programmer to prevent the destruction of operands, and aids register allocation and operand reuse. Instruction operands may be contained in any two of the 192 general-purpose registers, and instruction results may be stored in any of the 192 general-purpose registers.

The compiler or assembly-language programmer has complete freedom to allocate register usage. There is no dedication of a particular register or register group to a particular class of operations. The instruction set is designed to minimize the number of side effects and implicit operations of instructions.

Most Am29050 microprocessor instructions can accept an 8-bit constant as one of the source operands. Larger constants are constructed using one or two additional instructions and a general-purpose register. Relative branch instructions specify a 16-bit, signed, word offset. Absolute branches specify a 16-bit word address.
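
The constant-building and branch-target arithmetic can be sketched in C. The sketch assumes the usual Am29000-family behavior, which should be confirmed in Chapter 8:

- CONST writes a zero-extended 16-bit value.
- CONSTH replaces the upper 16 bits of a register and keeps the lower 16.
- A relative branch target is the address of the branch instruction plus the sign-extended word offset scaled by 4.

The function names are hypothetical.

```c
#include <stdint.h>

/* CONST: load a zero-extended 16-bit constant. */
uint32_t model_const(uint16_t lo)
{
    return (uint32_t)lo;
}

/* CONSTH: replace the upper 16 bits, keeping the lower 16. */
uint32_t model_consth(uint32_t reg, uint16_t hi)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)hi << 16);
}

/* Relative branch: 16-bit signed word offset -> byte address
   (assumed relative to the branch instruction's own address). */
uint32_t relative_target(uint32_t branch_addr, uint16_t word_offset)
{
    int32_t words = (int16_t)word_offset;          /* sign-extend */
    return branch_addr + (uint32_t)(words * 4);    /* words -> bytes */
}

/* Absolute branch: 16-bit word address -> byte address. */
uint32_t absolute_target(uint16_t word_addr)
{
    return (uint32_t)word_addr << 2;
}
```

Under these assumptions, building 0x12345678 takes a CONST of 0x5678 followed by a CONSTH of 0x1234.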

The Am29050 microprocessor instruction set contains 125 instructions. These instructions are divided into nine classes:

1. Integer Arithmetic—Perform integer add, subtract, multiply, and divide operations.
2. Compare—Perform arithmetic and logical comparisons. Some instructions in this class allow the generation of a trap if the comparison condition is not met.
3. Logical—Perform a set of bit-wise Boolean operations.
4. Shift—Perform arithmetic and logical shifts, and allow the extraction of 32-bit words from 64-bit double-words.
5. Data Movement—Perform movement of data fields between registers, and the movement of data to and from external devices and memories.
6. Constant—Allow the generation of large constant values in registers.
7. Floating-Point—Perform floating-point arithmetic, comparisons, and format conversions.
8. Branch—Perform program jumps and subroutine calls.
9. Miscellaneous—Perform miscellaneous control functions and operations not provided by other classes.

The Am29050 microprocessor executes all instructions in a single cycle, except for floating-point operations, interrupt returns, Load Multiple, and Store Multiple.

Table 2-1 lists all Am29050 microprocessor instructions alphabetically by instruction mnemonic. Table 2-1 is provided only to give a general overview of the instruction set. Section 3.3 defines the instructions grouped into classes, and Chapter 8 provides a detailed specification of the instruction set.

Figure 2-1 Data-Unit Numbering Conventions

<table>
<thead>
<tr>
<th>Data Units Within a Word</th>
<th>Bits 31-24</th>
<th>Bits 23-16</th>
<th>Bits 15-8</th>
<th>Bits 7-0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bytes, BO bit = 0 (big endian)</td>
<td>Byte 0</td>
<td>Byte 1</td>
<td>Byte 2</td>
<td>Byte 3</td>
</tr>
<tr>
<td>Bytes, BO bit = 1 (little endian)</td>
<td>Byte 3</td>
<td>Byte 2</td>
<td>Byte 1</td>
<td>Byte 0</td>
</tr>
<tr>
<td>Half-words, BO bit = 0 (big endian)</td>
<td colspan="2">Half-Word 0</td>
<td colspan="2">Half-Word 1</td>
</tr>
<tr>
<td>Half-words, BO bit = 1 (little endian)</td>
<td colspan="2">Half-Word 1</td>
<td colspan="2">Half-Word 0</td>
</tr>
</tbody>
</table>

2.1.4 Data Formats And Handling (see Section 3.4)

This section introduces the data formats and data-manipulation mechanisms which are supported by the Am29050 microprocessor.

2.1.4.1 DATA TYPES (see Sections 3.4.1, 3.4.2, and 3.4.3)

A word is defined as 32 bits of data. A half-word consists of 16 bits, and a double-word consists of 64 bits. Bytes are 8 bits in length. The Am29050 microprocessor has direct support for single- and double-precision floating-point, word-integer (signed and unsigned), word-logical, word-Boolean, half-word integer (signed and unsigned), and character (signed and unsigned) data. Other data types, such as character strings, are supported with sequences of basic instructions.

The format for Boolean data used by the processor is such that the Boolean values TRUE and FALSE are represented by 1 and 0, respectively, in the most-significant bit of a word.
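
The Boolean convention can be sketched in C. The text fixes only the most-significant bit. This sketch additionally assumes two things: compare instructions clear the remaining 31 bits, and conditional jumps such as JMPT and JMPF examine only bit 31. Because TRUE and FALSE differ only in that bit, a bit-wise AND of two Boolean words yields their logical AND. This is how several conditions can be accumulated into a single condition code. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOL_TRUE  0x80000000u   /* 1 in the most-significant bit */
#define BOOL_FALSE 0x00000000u   /* 0 in the most-significant bit */

/* Model of a compare such as CPLT: produce a Boolean word. */
uint32_t model_cplt(int32_t a, int32_t b)
{
    return a < b ? BOOL_TRUE : BOOL_FALSE;
}

/* A conditional jump is assumed to test only the most-significant bit. */
bool is_true(uint32_t word)
{
    return (word & 0x80000000u) != 0;
}
```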
Table 2-1 Am29050 Microprocessor Instruction Set

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Instruction Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD</td>
<td>Add</td>
</tr>
<tr>
<td>ADDC</td>
<td>Add with Carry</td>
</tr>
<tr>
<td>ADDCS</td>
<td>Add with Carry, Signed</td>
</tr>
<tr>
<td>ADDCU</td>
<td>Add with Carry, Unsigned</td>
</tr>
<tr>
<td>ADDS</td>
<td>Add, Signed</td>
</tr>
<tr>
<td>ADDU</td>
<td>Add, Unsigned</td>
</tr>
<tr>
<td>AND</td>
<td>AND Logical</td>
</tr>
<tr>
<td>ANDN</td>
<td>AND-NOT Logical</td>
</tr>
<tr>
<td>ASEQ</td>
<td>Assert Equal To</td>
</tr>
<tr>
<td>ASGE</td>
<td>Assert Greater Than or Equal To</td>
</tr>
<tr>
<td>ASGEU</td>
<td>Assert Greater Than or Equal To, Unsigned</td>
</tr>
<tr>
<td>ASGT</td>
<td>Assert Greater Than</td>
</tr>
<tr>
<td>ASGTU</td>
<td>Assert Greater Than, Unsigned</td>
</tr>
<tr>
<td>ASLE</td>
<td>Assert Less Than or Equal To</td>
</tr>
<tr>
<td>ASLEU</td>
<td>Assert Less Than or Equal To, Unsigned</td>
</tr>
<tr>
<td>ASLT</td>
<td>Assert Less Than</td>
</tr>
<tr>
<td>ASLTU</td>
<td>Assert Less Than, Unsigned</td>
</tr>
<tr>
<td>ASNEQ</td>
<td>Assert Not Equal To</td>
</tr>
<tr>
<td>CALL</td>
<td>Call Subroutine</td>
</tr>
<tr>
<td>CALLI</td>
<td>Call Subroutine, Indirect</td>
</tr>
<tr>
<td>CLASS</td>
<td>Classify Floating-Point Operand</td>
</tr>
<tr>
<td>CLZ</td>
<td>Count Leading Zeros</td>
</tr>
<tr>
<td>CONST</td>
<td>Constant</td>
</tr>
<tr>
<td>CONSTH</td>
<td>Constant, High</td>
</tr>
<tr>
<td>CONSTHZ</td>
<td>Constant High, Zero Lower</td>
</tr>
<tr>
<td>CONSTN</td>
<td>Constant, Negative</td>
</tr>
<tr>
<td>CONVERT</td>
<td>Convert Data Format</td>
</tr>
<tr>
<td>CPBYTE</td>
<td>Compare Bytes</td>
</tr>
<tr>
<td>CPEQ</td>
<td>Compare Equal To</td>
</tr>
<tr>
<td>CPGE</td>
<td>Compare Greater Than or Equal To</td>
</tr>
<tr>
<td>CPGEU</td>
<td>Compare Greater Than or Equal To, Unsigned</td>
</tr>
<tr>
<td>CPGT</td>
<td>Compare Greater Than</td>
</tr>
<tr>
<td>CPGTU</td>
<td>Compare Greater Than, Unsigned</td>
</tr>
<tr>
<td>CPLE</td>
<td>Compare Less Than or Equal To</td>
</tr>
<tr>
<td>CPLEU</td>
<td>Compare Less Than or Equal To, Unsigned</td>
</tr>
<tr>
<td>CPLT</td>
<td>Compare Less Than</td>
</tr>
<tr>
<td>CPLTU</td>
<td>Compare Less Than, Unsigned</td>
</tr>
<tr>
<td>CPNEQ</td>
<td>Compare Not Equal To</td>
</tr>
<tr>
<td>DADD</td>
<td>Floating-Point Add, Double-Precision</td>
</tr>
<tr>
<td>DDIV</td>
<td>Floating-Point Divide, Double-Precision</td>
</tr>
<tr>
<td>DEQ</td>
<td>Floating-Point Equal To, Double-Precision</td>
</tr>
<tr>
<td>DGE</td>
<td>Floating-Point Greater Than or Equal To, Double-Precision</td>
</tr>
<tr>
<td>DGT</td>
<td>Floating-Point Greater Than, Double-Precision</td>
</tr>
<tr>
<td>DIV</td>
<td>Divide Step</td>
</tr>
<tr>
<td>DIV0</td>
<td>Divide Initialize</td>
</tr>
<tr>
<td>DIVIDE</td>
<td>Integer Divide, Signed</td>
</tr>
<tr>
<td>DIVIDU</td>
<td>Integer Divide, Unsigned</td>
</tr>
<tr>
<td>DIVL</td>
<td>Divide Last Step</td>
</tr>
<tr>
<td>DIVREM</td>
<td>Divide Remainder</td>
</tr>
<tr>
<td>DMAC</td>
<td>Floating-Point Multiply-Accumulate, Double-Precision</td>
</tr>
<tr>
<td>DMUL</td>
<td>Floating-Point Multiply, Double-Precision</td>
</tr>
<tr>
<td>DMSM</td>
<td>Floating-Point Multiply-Sum, Double-Precision</td>
</tr>
<tr>
<td>DSUB</td>
<td>Floating-Point Subtract, Double-Precision</td>
</tr>
<tr>
<td>EMULATE</td>
<td>Trap to Software Emulation Routine</td>
</tr>
<tr>
<td>EXBYTE</td>
<td>Extract Byte</td>
</tr>
<tr>
<td>EXHW</td>
<td>Extract Half-Word</td>
</tr>
<tr>
<td>EXHWS</td>
<td>Extract Half-Word, Sign-Extended</td>
</tr>
<tr>
<td>EXTRACT</td>
<td>Extract Word, Bit-Aligned</td>
</tr>
<tr>
<td>FADD</td>
<td>Floating-Point Add, Single-Precision</td>
</tr>
<tr>
<td>FDIV</td>
<td>Floating-Point Divide, Single-Precision</td>
</tr>
<tr>
<td>FDMUL</td>
<td>Floating-Point Multiply, Single-to-Double Precision</td>
</tr>
<tr>
<td>FEQ</td>
<td>Floating-Point Equal To, Single-Precision</td>
</tr>
<tr>
<td>FGE</td>
<td>Floating-Point Greater Than or Equal To, Single-Precision</td>
</tr>
<tr>
<td>FGT</td>
<td>Floating-Point Greater Than, Single-Precision</td>
</tr>
<tr>
<td>FMAC</td>
<td>Floating-Point Multiply-Accumulate, Single-Precision</td>
</tr>
<tr>
<td>FMUL</td>
<td>Floating-Point Multiply, Single-Precision</td>
</tr>
<tr>
<td>FMSM</td>
<td>Floating-Point Multiply-Sum, Single-Precision</td>
</tr>
<tr>
<td>FSUB</td>
<td>Floating-Point Subtract, Single-Precision</td>
</tr>
<tr>
<td>HALT</td>
<td>Enter Halt Mode</td>
</tr>
<tr>
<td>INBYTE</td>
<td>Insert Byte</td>
</tr>
<tr>
<td>INHW</td>
<td>Insert Half-Word</td>
</tr>
<tr>
<td>INV</td>
<td>Invalidate</td>
</tr>
<tr>
<td>IRET</td>
<td>Interrupt Return</td>
</tr>
<tr>
<td>IRETINV</td>
<td>Interrupt Return and Invalidate</td>
</tr>
<tr>
<td>JMP</td>
<td>Jump</td>
</tr>
<tr>
<td>JMPF</td>
<td>Jump False</td>
</tr>
<tr>
<td>JMPFDEC</td>
<td>Jump False and Decrement</td>
</tr>
<tr>
<td>JMPFI</td>
<td>Jump False Indirect</td>
</tr>
<tr>
<td>JMPI</td>
<td>Jump Indirect</td>
</tr>
<tr>
<td>JMPT</td>
<td>Jump True</td>
</tr>
<tr>
<td>JMPTI</td>
<td>Jump True Indirect</td>
</tr>
<tr>
<td>LOAD</td>
<td>Load</td>
</tr>
<tr>
<td>LOADL</td>
<td>Load and Lock</td>
</tr>
<tr>
<td>LOADM</td>
<td>Load Multiple</td>
</tr>
<tr>
<td>LOADSET</td>
<td>Load and Set</td>
</tr>
<tr>
<td>MFACC</td>
<td>Move from Accumulator</td>
</tr>
<tr>
<td>MFSR</td>
<td>Move from Special Register</td>
</tr>
<tr>
<td>MFTLB</td>
<td>Move from Translation Look-Aside Buffer Register</td>
</tr>
<tr>
<td>MTACC</td>
<td>Move to Accumulator</td>
</tr>
<tr>
<td>MTSR</td>
<td>Move to Special Register</td>
</tr>
<tr>
<td>MTSRIM</td>
<td>Move to Special Register Immediate</td>
</tr>
<tr>
<td>MTTLB</td>
<td>Move to Translation Look-Aside Buffer Register</td>
</tr>
<tr>
<td>MUL</td>
<td>Multiply Step</td>
</tr>
<tr>
<td>MULL</td>
<td>Multiply Last Step</td>
</tr>
<tr>
<td>MULTIPLU</td>
<td>Integer Multiply, Unsigned</td>
</tr>
<tr>
<td>MULTIPLY</td>
<td>Integer Multiply, Signed</td>
</tr>
<tr>
<td>MULTM</td>
<td>Integer Multiply Most-Significant Bits, Signed</td>
</tr>
<tr>
<td>MULTMU</td>
<td>Integer Multiply Most-Significant Bits, Unsigned</td>
</tr>
<tr>
<td>MULU</td>
<td>Multiply Step, Unsigned</td>
</tr>
<tr>
<td>NAND</td>
<td>NAND Logical</td>
</tr>
<tr>
<td>NOR</td>
<td>NOR Logical</td>
</tr>
<tr>
<td>OR</td>
<td>OR Logical</td>
</tr>
<tr>
<td>ORN</td>
<td>OR-NOT Logical</td>
</tr>
<tr>
<td>SETIP</td>
<td>Set Indirect Pointers</td>
</tr>
<tr>
<td>SLL</td>
<td>Shift Left Logical</td>
</tr>
<tr>
<td>SQRT</td>
<td>Floating-Point Square Root</td>
</tr>
<tr>
<td>SRA</td>
<td>Shift Right Arithmetic</td>
</tr>
<tr>
<td>SRL</td>
<td>Shift Right Logical</td>
</tr>
<tr>
<td>STORE</td>
<td>Store</td>
</tr>
<tr>
<td>STOREL</td>
<td>Store and Lock</td>
</tr>
<tr>
<td>STOREM</td>
<td>Store Multiple</td>
</tr>
</tbody>
</table>
Table 2-1 Am29050 Microprocessor Instruction Set (continued)

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Instruction Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>SUB</td>
<td>Subtract</td>
</tr>
<tr>
<td>SUBC</td>
<td>Subtract with Carry</td>
</tr>
<tr>
<td>SUBCS</td>
<td>Subtract with Carry, Signed</td>
</tr>
<tr>
<td>SUBCU</td>
<td>Subtract with Carry, Unsigned</td>
</tr>
<tr>
<td>SUBR</td>
<td>Subtract Reverse</td>
</tr>
<tr>
<td>SUBRC</td>
<td>Subtract Reverse with Carry</td>
</tr>
<tr>
<td>SUBRCS</td>
<td>Subtract Reverse with Carry, Signed</td>
</tr>
<tr>
<td>SUBRCU</td>
<td>Subtract Reverse with Carry, Unsigned</td>
</tr>
<tr>
<td>SUBRS</td>
<td>Subtract Reverse, Signed</td>
</tr>
<tr>
<td>SUBRU</td>
<td>Subtract Reverse, Unsigned</td>
</tr>
<tr>
<td>SUBS</td>
<td>Subtract, Signed</td>
</tr>
<tr>
<td>SUBU</td>
<td>Subtract, Unsigned</td>
</tr>
<tr>
<td>XNOR</td>
<td>Exclusive-NOR Logical</td>
</tr>
<tr>
<td>XOR</td>
<td>Exclusive-OR Logical</td>
</tr>
</tbody>
</table>

Figure 2-1 illustrates the numbering conventions for data units contained in a word. Within a word, bits are numbered in increasing order from right to left, starting with 0 for the least-significant bit. Bytes and half-words within a word are also numbered in increasing order starting with 0. However, bytes and half-words may be numbered either right to left (sometimes called “little endian”) or left to right (sometimes called “big endian”), as selected by the Byte Order (BO) bit of the Configuration Register.

Note that the numbering of bits within words is strictly for notational convenience. In contrast, the numbering conventions for bytes and half-words within words affect processor operations.
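
The effect of the byte-order convention in Figure 2-1 on data selection can be sketched in C. The byte number comes from the two least-significant address bits, and the half-word number comes from the next-to-least-significant bit (Section 2.1.4.3). The BO setting then determines which bit field of the word that number refers to. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract byte number (addr & 3) from a word.
   Big endian (BO = 0): byte 0 occupies bits 31-24.
   Little endian (BO = 1): byte 0 occupies bits 7-0. */
uint8_t select_byte(uint32_t word, uint32_t addr, bool little_endian)
{
    unsigned n = addr & 3u;
    unsigned shift = little_endian ? 8u * n : 8u * (3u - n);
    return (uint8_t)(word >> shift);
}

/* Extract half-word number ((addr >> 1) & 1) from a word.
   Big endian: half-word 0 occupies bits 31-16.
   Little endian: half-word 0 occupies bits 15-0. */
uint16_t select_halfword(uint32_t word, uint32_t addr, bool little_endian)
{
    unsigned h = (addr >> 1) & 1u;
    unsigned shift = little_endian ? 16u * h : 16u * (1u - h);
    return (uint16_t)(word >> shift);
}
```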

2.1.4.2 EXTERNAL DATA ACCESSES (see Section 3.4.4)

External accesses move data between the processor and external devices and memories. These accesses occur only as a result of load and store instructions.

Load and store instructions move words of data to and from general-purpose registers. Each load and store instruction moves a single word. There are load and store instructions which support interlocking operations necessary for multi-processor exclusion, synchronization, and communication.

For the movement of multiple words, Load Multiple and Store Multiple instructions move the contents of sequentially addressed external locations to or from sequentially numbered general-purpose registers. The Load Multiple and Store Multiple allow the movement of up to 192 words at a maximum rate of one word per processor cycle. The multiple load and store sequences can be interrupted, and restarted at the point of interruption.
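
The interrupt-and-restart behavior can be sketched as a small state machine. This hypothetical model keeps three values: the next address, the next destination register, and a count of transfers remaining. These roughly correspond to the roles the text assigns to the Channel Address and Load/Store Count Remaining registers. The model transfers at most `budget` words before returning, as if interrupted, and a second call resumes where the first stopped. Memory is modeled as a word-addressed C array, and timing, traps, and the real register encodings are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of a Load Multiple in progress. */
typedef struct {
    uint32_t addr;       /* next external byte address */
    unsigned reg;        /* next destination register index */
    unsigned remaining;  /* loads still to be performed */
} loadm_state;

/* Perform at most `budget` transfers, then stop as if interrupted.
   Returns true once every transfer has completed. */
bool loadm_step(loadm_state *s, const uint32_t *mem, uint32_t *regs,
                unsigned budget)
{
    while (s->remaining > 0 && budget > 0) {
        regs[s->reg++] = mem[s->addr / 4];   /* one word per transfer */
        s->addr += 4;
        s->remaining--;
        budget--;
    }
    return s->remaining == 0;
}
```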

Load and store instructions provide no mechanism for computing the address associated with the external data access. All addresses are contained in a general-purpose register at the beginning of the access, or are given by an 8-bit instruction constant. Any address computation must be performed explicitly before the load or store instruction is executed. Since address computations are expressed directly, they are exposed for compiler optimizations as any other computations are. Processor hardware tracks the registers that are being used to contain addresses, and tracks computations that are for external addresses. This information allows the processor to reduce the apparent external access time by one cycle in many cases.
External data accesses are overlapped with instruction execution. Processor performance is improved if instructions that follow loads do not immediately use externally referenced data. In this manner, the time required to perform the external access is overlapped with subsequent instruction execution. Because of hardware interlocks, this concurrency has no effect on the logical behavior of an executing program.

2.1.4.3 ADDRESSING AND ALIGNMENT (see Section 3.4.5)

External instructions and data are contained in one of five 32-bit address spaces:

1. Data Memory
2. Input/Output
3. Coprocessor
4. Instruction Read-Only Memory (Instruction ROM)
5. Instruction Random Access Memory (Instruction RAM)

An address is treated as virtual or physical, as determined by the Current Processor Status Register. Address translation for data accesses is enabled separately from address translation for instruction accesses. A program in the Supervisor mode can temporarily disable address translation for individual loads and stores; this permits load-real and store-real operations.

Bits contained within load and store instructions distinguish between the data memory, input/output, and coprocessor address spaces. Address translation also may determine whether an access is performed in the data memory or the input/output address space. The Current Processor Status Register determines whether instruction accesses are directed to the instruction RAM address space or to the instruction ROM address space.

The Am29050 microprocessor does not support data accesses directly to the instruction RAM or instruction ROM address space. However, this capability is possible as a system option.

All addresses are interpreted as byte addresses, although accesses are word-oriented. The number of a byte within a word is given by the two least-significant address bits. The number of a half-word within a word is given by the next-to-least-significant address bit.

Since only byte addressing is supported, it is possible that an address for the access of a word or half-word is not aligned to the desired word or half-word. For a word access, an unaligned address has a 1 in either or both of the two least-significant address bits. For a half-word access, an unaligned address has a 1 in the least-significant address bit. In many systems, address alignment can be ignored, with addresses truncated to access the word or half-word of interest. However, as a user option, the Am29050 microprocessor can create a trap when a non-aligned access is attempted. The trap allows software emulation of non-aligned accesses.
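The following C sketch is illustrative only, and every name in it is hypothetical. It expresses the alignment rules above: how the byte and half-word numbers come from the low address bits, and the truncating behavior a system might use when the unaligned-access trap is not enabled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sizes of the three data units, in bytes. */
enum access_size { ACCESS_BYTE = 1, ACCESS_HALF = 2, ACCESS_WORD = 4 };

/* A word access needs both low address bits clear; a half-word access
 * needs only the least-significant bit clear; bytes are always aligned. */
static bool is_aligned(uint32_t addr, enum access_size size)
{
    return (addr & (uint32_t)(size - 1)) == 0;
}

/* Byte number within the word: the two least-significant address bits. */
static unsigned byte_number(uint32_t addr) { return addr & 3u; }

/* Half-word number within the word: the next-to-least-significant bit. */
static unsigned half_number(uint32_t addr) { return (addr >> 1) & 1u; }

/* Truncation when alignment is ignored: clear the low bits so the access
 * reaches the word or half-word that contains the address. */
static uint32_t truncate_addr(uint32_t addr, enum access_size size)
{
    return addr & ~(uint32_t)(size - 1);
}
```

A trap handler emulating unaligned accesses would start from the same `is_aligned` test before splitting the access into aligned pieces.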

In the Am29050 microprocessor, all instructions are 32 bits in length, and are aligned on word-address boundaries.

2.1.4.4 BYTE AND HALF-WORD ACCESSES (see Section 3.4.6)

The Am29050 microprocessor supports the direct external access of bytes and half-words as an option. If this option is enabled, the Am29050 microprocessor selects a byte or half-word within a word on a load, and aligns it to the low-order byte or half-word of a register. On a store, the low-order byte or half-word of a register is replicated in all byte or half-word positions, so that the external memory can easily write the required byte or half-word in memory. This option requires that the external memory system be able to write individual bytes and half-words within words.

To avoid the memory-system complexity required for writing individual bytes and half-words, the Am29050 microprocessor can perform byte and half-word accesses using software alone. As an option on load instructions, the processor can set a byte-position indicator in the ALU Status Register from the two least-significant bits of the load address. To load a byte or half-word, a word load is performed first; this load sets the byte-position indicator, and a subsequent instruction extracts the byte or half-word of interest from the accessed word. To store a byte or half-word, a word load is also performed first; the byte or half-word of interest is then inserted into the accessed word, and the resulting word is stored. This software-only technique operates correctly even if the Am29050 microprocessor is configured to perform byte and half-word accesses in hardware, so software is upward-compatible from simpler systems to more complex ones.
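To make the load/extract and load/insert/store sequences concrete, here is a C model. It is a sketch under stated assumptions: the byte-position indicator is a plain variable `bp`, external memory is a small hypothetical word array `mem`, and bytes are numbered left-to-right. The real processor performs the extract and insert steps with instructions, not with these functions.

```c
#include <stdint.h>

/* Byte-position indicator (kept by the processor in the ALU Status
 * Register), modeled here as the two low bits of the last load address. */
static unsigned bp;

/* Hypothetical word-organized external memory for the sketch. */
static uint32_t mem[4];

/* Word load that also sets the byte-position indicator. */
static uint32_t load_word_set_bp(uint32_t addr)
{
    bp = addr & 3u;
    return mem[(addr >> 2) % 4];
}

/* Left-to-right (big-endian) byte numbering assumed. */
static unsigned shift_for_bp(void) { return (3u - bp) * 8u; }

static uint8_t extract_byte(uint32_t word)
{
    return (uint8_t)(word >> shift_for_bp());
}

static uint32_t insert_byte(uint32_t word, uint8_t b)
{
    unsigned s = shift_for_bp();
    return (word & ~(0xFFu << s)) | ((uint32_t)b << s);
}

/* Byte load: word load, then extract. */
static uint8_t load_byte(uint32_t addr)
{
    return extract_byte(load_word_set_bp(addr));
}

/* Byte store: word load, insert, then store the whole word back. */
static void store_byte(uint32_t addr, uint8_t b)
{
    uint32_t w = load_word_set_bp(addr);
    mem[(addr >> 2) % 4] = insert_byte(w, b);
}
```

Because the store path only ever writes whole words, this approach works with a memory system that cannot write individual bytes.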

2.1.5 Interrupts And Traps (see Section 3.5)

Normal program flow may be preempted by an interrupt or trap for which the processor is enabled. The effect on the processor is identical for interrupts and traps; the distinction is in the different mechanisms by which interrupts and traps are enabled. It is intended that interrupts be used for suspending current program execution and causing another program to execute, while traps are used to report errors and exceptional conditions.

The interrupt and trap mechanism supports high-speed, temporary context switching and user-defined interrupt-processing mechanisms.

2.1.5.1 TEMPORARY CONTEXT SWITCHING

The basic interrupt/trap mechanism of the Am29050 microprocessor supports temporary context switching. During the temporary context switch, the interrupted context is held in processor registers. The interrupt or trap handler can return immediately to this context.

Temporary context switching is useful for instruction emulation, TLB reload routines, and similar functions. Many of its characteristics resemble microprogram execution: processor context does not have to be saved, interrupts are disabled for the duration of the routine, and all processor resources are accessible, even if the interrupted context is in the User mode. The associated routine may execute from instruction RAM or instruction ROM.

2.1.5.2 USER-DEFINED INTERRUPT PROCESSING

Since the basic interrupt/trap mechanism for the Am29050 microprocessor keeps the interrupted context in the processor, dynamically nested interrupts are not supported directly. The context in the processor must be saved before another interrupt or trap can be taken.

The interrupt or trap handler executing during a temporary context switch is not required to return to the interrupted context. This routine optionally may save the interrupted context, load a new one, and return to the new context.

The implementation of the saving and restoring of contexts is completely user-defined. Thus, the context save/restore mechanism used (e.g., interrupt stack, program status word area, etc.) and the amount of context saved can be tailored to the needs of the system.

2.1.5.3 VECTOR AREA (see Section 3.5.4)

Interrupt and trap dispatching occurs through a relocatable Vector Area which accommodates as many as 256 interrupt and trap handling routines. Entries into the Vector Area are associated with various sources of interrupts and traps; some are pre-defined, while others are user-defined.

The Vector Area is either a table of vectors in data memory, where each vector points to the beginning of an interrupt or trap handler, or it is a segment of instruction/data memory (or instruction ROM) containing the actual routines. The latter configuration for the Vector Area yields better interrupt performance at the cost of additional memory.

2.1.6 Memory Management (see Section 3.6)

The Am29050 microprocessor incorporates a Memory Management Unit (MMU) that accepts a 32-bit virtual byte-address and translates it to a 32-bit physical byte-address in a single cycle. Address translation in the MMU is performed either by a 64-entry Translation Look-Aside Buffer (TLB) or by one of two Region Mapping Units (RMU). The MMU is not dedicated to any particular address-translation architecture.

2.1.6.1 TRANSLATION LOOK-ASIDE BUFFER

The TLB is an associative table which contains the most-recently used address translations for the processor. If the translation for a given address cannot be performed by the TLB, a TLB miss occurs, and causes a trap which allows the required translation to be placed into the TLB.

Processor hardware maintains information for each TLB line indicating which entry was least recently used; when a TLB miss occurs, this information is used to indicate the TLB entry to be replaced. Software is responsible for searching system page tables and modifying the indicated TLB entry as appropriate. This allows the page tables to be defined according to the system environment.
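The C sketch below illustrates the division of labor described above: hardware tracks the least recently used entry in each line, and a software miss handler refills that entry. Its organization is an assumption and is labeled as such in the comments: 64 entries arranged as 32 two-way lines, a fixed 4-Kbyte page, and line selection by the low bits of the virtual page number. Task Identifiers and protection bits are omitted. All names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_LINES  32  /* assumption: 64 entries as 32 two-way lines */
#define PAGE_SHIFT 12  /* assumption: 4-Kbyte pages for this sketch  */

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };

struct tlb_line {
    struct tlb_entry way[2];
    unsigned lru;          /* index of the least recently used way */
};

static struct tlb_line tlb[TLB_LINES];

static unsigned line_of(uint32_t vpn) { return vpn % TLB_LINES; }

/* "Hardware" side: look up a translation and update the LRU indication. */
static bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_line *ln = &tlb[line_of(vpn)];
    for (unsigned w = 0; w < 2; w++) {
        if (ln->way[w].valid && ln->way[w].vpn == vpn) {
            ln->lru = 1u - w;  /* the other way is now least recently used */
            *paddr = (ln->way[w].pfn << PAGE_SHIFT)
                   | (vaddr & ((1u << PAGE_SHIFT) - 1));
            return true;
        }
    }
    return false;  /* TLB miss: the processor would take a trap here */
}

/* "Software" side: the trap handler searches its own page tables (not
 * modeled) and overwrites the entry that the hardware marked as LRU. */
static void tlb_miss_handler(uint32_t vaddr, uint32_t pfn_from_page_table)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_line *ln = &tlb[line_of(vpn)];
    struct tlb_entry *e = &ln->way[ln->lru];
    e->valid = true;
    e->vpn = vpn;
    e->pfn = pfn_from_page_table;
}
```

Since the page-table search lives entirely in `tlb_miss_handler`, the page-table format can be whatever the operating system chooses.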

TLB entries are modified directly by processor instructions. A TLB entry consists of 64 bits and appears as two word-length TLB registers which may be inspected and modified by instructions.

TLB entries are tagged with a Task Identifier field, which allows the operating system to create a unique 32-bit virtual address space for each of 256 processes. In addition, TLB entries provide support for memory protection and user-defined control information.

2.1.6.2 REGION MAPPING UNITS

In addition to the page-by-page translation provided by the TLB, the Am29050 microprocessor supports translation for variable-sized regions, ranging from 64 Kbytes to 2 Gbytes, by means of two Region Mapping Units.

Each RMU consists of two special-purpose registers. One of the registers in each RMU contains the base address of the virtual region to be mapped and the base address of the corresponding physical region. The other register specifies the region size and contains information which is used to control access, including a Task Identifier.

The RMUs have priority over the TLB translation; in addition, RMU0 has priority over RMU1.
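The priority order can be sketched in C as follows. This is a simplified, hypothetical model. It assumes each region size is a power of two and each base address is aligned to the region size. It leaves out the protection bits and Task Identifier held in the real registers, and it does not model the TLB lookup.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified region mapping unit; field names are hypothetical. */
struct rmu {
    bool valid;
    uint32_t vbase;  /* virtual base of the region (size-aligned)  */
    uint32_t pbase;  /* physical base of the region (size-aligned) */
    uint32_t size;   /* region size in bytes, assumed a power of two */
};

static bool rmu_hit(const struct rmu *r, uint32_t va)
{
    return r->valid && (va & ~(r->size - 1)) == r->vbase;
}

enum xlat_source { XLAT_RMU0, XLAT_RMU1, XLAT_TLB };

/* RMU0 is checked first, then RMU1; only if neither maps the address is
 * the TLB consulted. */
static enum xlat_source translate(const struct rmu rmus[2], uint32_t va,
                                  uint32_t *pa)
{
    for (int i = 0; i < 2; i++) {
        if (rmu_hit(&rmus[i], va)) {
            *pa = rmus[i].pbase | (va & (rmus[i].size - 1));
            return i == 0 ? XLAT_RMU0 : XLAT_RMU1;
        }
    }
    return XLAT_TLB;
}
```

If the two regions overlap, an address inside both is translated by RMU0, which matches the stated priority.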
2.1.7 Coprocessor Programming (see Section 6.1)

The coprocessor interface for the Am29050 microprocessor allows a program to communicate with an off-chip coprocessor for performing operations not supported by processor hardware directly.

The coprocessor interface allows the program to transfer operands and operation codes to the coprocessor, and then perform other operations while the coprocessor operation is in progress. The results of the operation are read from the coprocessor by a separate transfer. The processor may transfer multiple operands to the coprocessor without re-transferring operation codes or reading intermediate results. As many as 64 bits of information can be transferred to the coprocessor in a single cycle.

The Am29050 microprocessor includes features that support the definition of the coprocessor as a system option. In this case, coprocessor operations are emulated by software when the coprocessor is not present in a system.

2.1.8 Timer Facility (see Section 7.3.6)

The Timer Facility provides a counter for implementing a real-time clock or other software timing functions. This facility consists of two special-purpose registers: the Timer Counter Register, which decrements at a rate equal to the processor operating frequency, and the Timer Reload Register, which re-initializes the Timer Counter Register when it decrements to zero. The Timer Facility optionally may create an interrupt when the Timer Counter decrements to zero.
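This is a minimal, hypothetical C model of the counter/reload behavior: one call to `timer_tick` stands for one processor cycle. It ignores register widths and bit layouts, and it assumes both the counter and the reload value start nonzero.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the Timer Facility; not the register layout. */
struct timer_model {
    uint32_t count;       /* Timer Counter Register value (nonzero)  */
    uint32_t reload;      /* Timer Reload Register value (nonzero)   */
    bool int_enable;      /* whether reaching zero raises an interrupt */
    unsigned interrupts;  /* number of timer interrupts raised so far  */
};

/* Advance the model by one processor cycle. */
static void timer_tick(struct timer_model *t)
{
    t->count--;                  /* decrements once per processor cycle */
    if (t->count == 0) {
        t->count = t->reload;    /* re-initialized from the reload value */
        if (t->int_enable)
            t->interrupts++;     /* the optional timer interrupt */
    }
}
```

In this model the interrupt period is `reload` cycles. At an assumed operating frequency of 40 MHz, a 10 ms clock tick would therefore need a reload value of 40,000,000 × 0.01 = 400,000.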

2.1.9 Trace Facility (see Section 3.7)

The Trace Facility allows a debug program to emulate single-instruction stepping in a program under test. This facility allows a trap to be generated after the execution of any instruction in the program being tested.

Using the Trace Facility, the debug program can inspect and modify the state of the program at every instruction boundary. The Trace Facility is designed to work properly in the presence of normal system interrupts and traps.

2.2 HARDWARE OVERVIEW

This section briefly describes the operation of Am29050 microprocessor hardware. It introduces the processor pipeline and the three major internal functional units: the Instruction Fetch Unit, the Execution Unit, and the Memory Management Unit. Finally, the processor's operational modes are described.

Figure 2-2 shows the Am29050 microprocessor internal data-flow organization. The following sections refer to the various components on this data-flow diagram.

2.2.1 Four-Stage Pipeline (see Section 4.1)

The Am29050 microprocessor implements a four-stage pipeline for integer instruction execution. The four stages are: fetch, decode, execute, and write-back. The pipeline is organized so that the effective instruction-execution rate is as high as one instruction per cycle. Data forwarding and pipeline interlocks are handled by processor hardware.
The execute stage of Am29050 microprocessor floating-point operations is further pipelined to a depth determined by the latency of the operation. The Am29050 microprocessor can therefore issue most floating-point operations at a rate of one operation per cycle, though most operations take more than one cycle to complete.

2.2.2 Instruction Fetch Unit (see Section 4.2)

The Instruction Fetch Unit fetches instructions, and supplies instructions to other functional units. It incorporates the Instruction Prefetch Buffer, the Branch Target Cache memory, and the Program Counter Unit. All components of the Instruction Fetch Unit operate during the fetch stage of the processor pipeline.

2.2.2.1 INSTRUCTION PREFETCH BUFFER (see Section 4.2.1)

Most instructions executed by the Am29050 microprocessor are fetched from external instruction memory. The processor prefetches instructions so that they are requested at least four cycles before they are required for execution.

Prefetched instructions are stored in a four-word Instruction Prefetch Buffer while awaiting execution. An instruction-prefetch request occurs whenever there is a free location in this buffer (if the processor is otherwise enabled to fetch instructions). When a non-sequential instruction fetch occurs, prefetching is terminated, and then restarted for the new instruction stream.

Instruction prefetching de-couples the instruction-fetch rate from the instruction-access latency. For example, an instruction may be transferred to the processor two cycles after it is requested. However, as long as instructions are supplied to the processor at an average rate of one instruction per cycle, this latency has no effect on the instruction-execution rate.

2.2.2.2 BRANCH TARGET CACHE MEMORY (see Section 4.2.2)

The Am29050 microprocessor incorporates a Branch Target Cache memory which contains as many as 256 instructions. The Branch Target Cache memory is a two-way, set-associative cache containing the first target instructions of a number of recently taken branches. The Branch Target Cache memory can be configured, under software control, to cache either two instructions for each branch or four instructions. Each of the two sets in the Branch Target Cache memory contains 128 instructions, and the 128 instructions are further divided either into 32 blocks of four instructions each or into 64 blocks of two instructions each.
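The configuration trade-off comes down to simple arithmetic, shown here in a short illustrative C function using only the figures given above. Four-instruction blocks supply more instructions for each cached branch but hold half as many branch targets.

```c
/* Number of branch targets the Branch Target Cache memory can hold for a
 * given block size, using the figures in the text: two sets of 128
 * instructions each. The function name is illustrative only. */
static unsigned btc_targets(unsigned instrs_per_block)
{
    const unsigned sets = 2u, instrs_per_set = 128u;
    return sets * (instrs_per_set / instrs_per_block);
}
```

`btc_targets(4)` gives 2 × 32 = 64 branch targets, and `btc_targets(2)` gives 2 × 64 = 128.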

The purpose of the Branch Target Cache memory is to provide instructions for the beginning of a non-sequential instruction-fetch sequence. This keeps the instruction pipeline full until the processor can establish a new instruction-prefetch stream from the external instruction memory.

The processor is organized so that branch instructions can execute in a single cycle if the target instruction sequence is present in the Branch Target Cache memory.

2.2.2.3 PROGRAM COUNTER UNIT (see Section 4.2.4)
The Program Counter Unit creates and sequences addresses of instructions as they are executed by the processor.

2.2.3 Execution Unit (see Section 4.3)
The Execution Unit executes instructions. It incorporates the Register File, the Address Unit, the Arithmetic/Logic Unit, the Field Shift Unit, the Floating-Point Unit, and the Prioritizer. The Register File and Address Unit operate during the decode stage of the pipeline. The Arithmetic/Logic Unit, Field Shift Unit, Floating-Point Unit, and Prioritizer operate during the execute stage of the pipeline. The Register File also operates during the write-back stage.

2.2.3.1 REGISTER FILE (see Section 4.3.1)
The general-purpose registers are implemented by a 192-location Register File. The Register File can perform two 64-bit read accesses and two write accesses in a single cycle. Normally, two read accesses are performed during the decode-pipeline stage to fetch operands required by the instruction being decoded. One write access during the same cycle completes the write-back stage of a previously executed integer instruction, and a second write access completes the write-back stage of a previously executed floating-point operation. The write port for integer results is 32 bits wide, and the write port for floating-point results is 64 bits wide.

Addressing logic associated with the Register File distinguishes between the global and local general-purpose registers, and it performs the Stack-Pointer addressing for the local registers. Register File addressing functions are performed during the decode stage.

2.2.3.2 ADDRESS UNIT (see Section 4.3.2)
The Address Unit evaluates addresses for branches, loads, and stores. It also assembles instruction-immediate data and computes addresses for load-multiple and store-multiple sequences.

2.2.3.3 ARITHMETIC/LOGIC UNIT (see Section 4.3.4)
The ALU performs all logical, compare, and integer arithmetic operations (including multiply step and divide step).

2.2.3.4 FIELD SHIFT UNIT (see Section 4.3.5)
The Field Shift Unit performs N-bit shifts. The Field Shift Unit also performs byte and half-word extract and insert operations, and it extracts words from double-words.

2.2.3.5 FLOATING-POINT UNIT (see Section 4.3.7)
The Floating-Point Unit performs single- and double-precision floating-point operations in accordance with the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985).

2.2.3.6 PRIORITIZER (see Section 4.3.6)
The Prioritizer provides a count of the number of leading zero bits in a 32-bit word; this is useful for performing prioritization in a multi-level interrupt handler, for example.
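Here is a portable C equivalent of the Prioritizer's leading-zero count, together with one hypothetical way an interrupt handler might use it. The bit assignment, with pending requests as bits of a word and bit 31 as the most urgent, is an assumption made only for this illustration.

```c
#include <stdint.h>

/* Software equivalent of the Prioritizer's leading-zero count;
 * returns 32 for a zero word. */
static unsigned count_leading_zeros(uint32_t w)
{
    unsigned n = 0;
    if (w == 0)
        return 32;
    while (!(w & 0x80000000u)) {
        w <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical use: with bit 31 as the highest-priority request, the
 * leading-zero count is the index of the most urgent pending request
 * (0 = most urgent). Returns -1 when nothing is pending. */
static int highest_priority(uint32_t pending)
{
    return pending ? (int)count_leading_zeros(pending) : -1;
}
```

The hardware produces this count in one operation, whereas the loop above may take up to 31 iterations. That difference is why the Prioritizer is useful in a multi-level interrupt handler.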

2.2.4 Memory Management Unit (see Section 4.4)
The Memory Management Unit (MMU) performs address translation and memory-protection functions for all branches, loads, and stores. The MMU operates during the execute stage of the pipeline, so the physical address that it generates is available at the beginning of the write-back stage.

All addresses for external accesses are physical addresses. MMU operation is pipelined with external accesses, so that an address translation can occur while a previous access completes.

Address translation is not performed for the addresses associated with instruction prefetching. Instead, these addresses are generated by an instruction prefetch pointer which is incremented by the processor. Address translation is performed only at the beginning of the prefetch sequence (as the result of a branch instruction), and when the prefetch pointer crosses a potential virtual-page boundary.
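The page-crossing test that decides when a prefetch address needs a fresh translation can be sketched as below. The function name is hypothetical, and `page_size` is a caller-supplied power of two. The sketch makes no claim about which page size the processor actually treats as a "potential" boundary.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if advancing the prefetch pointer from prev to next enters a
 * different page of size page_size (a power of two), meaning a new
 * address translation would be required. */
static bool needs_translation(uint32_t prev, uint32_t next, uint32_t page_size)
{
    uint32_t mask = ~(page_size - 1);
    return (prev & mask) != (next & mask);
}
```

Within a page, sequential prefetches only increment the pointer, so at most one translation is needed per page of straight-line code.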

2.2.5 Processor Modes
The Am29050 microprocessor operates in several different modes to accomplish various processor and system functions. All modes except for Pipeline Hold (see below) are under direct control of instructions and/or processor control inputs. The Pipeline Hold mode normally is determined by the relative timing between the processor and its external system for certain types of operations. The processor provides an external indication of its operational mode.

2.2.5.1 EXECUTING
When the processor is in the Executing mode, it fetches and executes instructions as described in this manual. External accesses occur as required.

2.2.5.2 WAIT (see Section 3.5.3)
When the processor is in the Wait mode, it does not execute instructions, and performs no external accesses. The Wait mode is controlled by the Current Processor Status Register. The processor leaves this mode when an interrupt or trap for which it is enabled occurs, or when a reset occurs.

2.2.5.3 PIPELINE HOLD (see Section 4.5)
Under certain conditions, processor pipelining might cause non-sequential instruction execution or timing-dependent results of execution. For example, the processor might attempt to execute an instruction that has not been fetched from instruction/data memory.

For such cases, pipeline-interlock hardware detects the anomalous condition and suspends processor execution until execution can proceed properly. While execution is suspended by the interlock hardware, the processor is in the Pipeline Hold mode.
The processor resumes execution when the pipeline-interlock hardware determines that it is correct to do so.

2.2.5.4 HALT (see Section 5.3.3)

The Halt mode is provided so that the processor may be placed under the control of a hardware-development system (see Section 2.3.2) for the purposes of hardware and software debug. The processor enters the Halt mode as the result of instruction execution, or as the result of external controls. In the Halt mode, the processor neither fetches nor executes instructions.

2.2.5.5 STEP (see Section 5.3.3)

The Step mode allows a hardware-development system to step through processor pipeline operation on a stage-by-stage basis. The Step mode is nearly identical to the Halt mode, except that it lets the processor enter the Executing mode while the pipeline advances by one stage.

2.2.5.6 LOAD TEST INSTRUCTION (see Section 5.3.3)

The Load Test Instruction mode permits a hardware-development system to access data contained in the processor or system. This is accomplished by allowing a hardware-development system to supply the processor with instructions, instead of having the processor fetch instructions from instruction memory. The Load Test Instruction mode is defined so that, once the processor has completed the execution of instructions provided by the hardware-development system, it may resume the execution of its normal instruction sequence.

2.2.5.7 TEST (see Section 5.3.4)

The Test mode facilitates testing of hardware associated with the processor by disabling processor outputs so that they may be driven directly by test hardware. The Test mode also allows the addition of a second processor to a system, to monitor the outputs of the first and signal detected errors.

2.2.5.8 RESET (see Section 3.9 and Section 5.5)

The Reset mode provides initialization of certain processor registers and control state. This is used for power-on reset, for eliminating unrecoverable error conditions, and for supporting certain hardware-debug functions.

2.3 SYSTEM INTERFACE OVERVIEW

This section briefly describes the features of the Am29050 microprocessor that allow it to be connected to other system components.

The two major interfaces of the Am29050 microprocessor, introduced in this section, are the channel and the Test/Development interface. The other topics briefly described here are clock generation, master/slave checking, and coprocessor attachment.

Section 5.1 contains a complete pin description of the Am29050 microprocessor. Appendix A contains timing diagrams and related information.

2.3.1 Channel (see Section 5.2)

The Am29050 microprocessor channel consists of the following 32-bit buses and related controls:

1. An Instruction Bus, which transfers instructions into the processor.
2. A Data Bus, which transfers data to and from the processor.
3. An Address Bus, which provides addresses for both instruction and data accesses. The Address Bus also is used to transfer data to a coprocessor.

The channel performs accesses and data transfers to all external devices and memories, including instruction and data memories, instruction caches, data caches, input/output devices, bus converters, and coprocessors.

The channel defines three different access protocols: simple, pipelined, and burst-mode. For simple accesses, the Am29050 microprocessor holds the address valid throughout the entire access. This is appropriate for high-speed devices that can complete an access in one cycle, and for low-cost devices that are accessed infrequently (such as read-only memories containing initialization routines). Pipelined and burst-mode accesses provide high performance with other types of devices and memories.

For pipelined accesses, the address transfer is decoupled from the corresponding data or instruction transfer. After transmitting an address for a request, the processor may transmit one more address before receiving the reply to the first request. This allows address transfer and decoding to be overlapped with another access.

On the other hand, burst-mode accesses eliminate the address-transfer cycle completely. Burst-mode accesses are defined so that once an address is transferred for a given access, subsequent accesses to sequentially increasing addresses may occur without re-transfer of the address. The burst may be terminated at any time by either the processor or the responding device.

The Am29050 microprocessor determines whether an access is simple, pipelined or burst-mode on a transfer-by-transfer (i.e., generally device-by-device) basis. However, an access that begins as a simple access may be converted to a pipelined or burst-mode access at any time during the transfer. This relaxes the timing constraints on the channel-protocol implementation, since addressed devices do not have to respond immediately to a pipelined or burst-mode request.

Except for the shared Address Bus, the channel maintains a strict division between instruction and data accesses. In the most common situation, the system supplies the processor with instructions using burst-mode accesses, with instruction addresses transmitted to the system only when a branch occurs. Data accesses can occur simultaneously without interfering with instruction transfer.

The Am29050 microprocessor contains arbitration logic to support other masters on the channel. A single external master can arbitrate directly for the channel, while multiple masters may arbitrate using a daisy chain or other method that requires no additional arbitration logic. However, to increase arbitration performance in a multiple-master configuration, an external channel arbiter should be used. This arbiter works in conjunction with the processor's arbitration logic.

2.3.2 Test/Development Interface (see Section 5.3)

The Am29050 microprocessor supports the attachment of a hardware-development system such as an in-circuit emulator. This attachment is made directly to the processor in the system under development, without the removal of the processor from the system. The Test/Development Interface makes it possible for the hardware-development system to gain control over the Am29050 microprocessor, and inspect and modify its internal state (e.g., general-purpose register contents, TLB entries, etc.). In addition, the Am29050 microprocessor can be used to access other system devices and memories on behalf of the hardware-development system.

The Test/Development Interface consists of control and status signals provided on the Am29050 microprocessor, as well as the Instruction and Data Buses. The Halt, Step, Reset, and Load Test Instruction modes allow the hardware-development system to control the operation of the Am29050 microprocessor. The hardware-development system may supply the processor with instructions on the Instruction Bus using the Load Test Instruction mode. Internal processor state can be inspected and modified via the Data Bus.

2.3.3 Clocks (see Section 5.7)

The Am29050 microprocessor generates and distributes a system clock at its operating frequency. This clock is specially designed to reduce skews between the system clock and the processor's internal clocks. The internal clock-generation circuitry requires a single-phase oscillator signal at twice the processor operating frequency.

For systems in which processor-generated clocks are not appropriate, the Am29050 microprocessor also can accept a clock from an external clock generator.

The processor decides between these two clocking arrangements based on whether the power supply to the clock-output driver (PWRCLK) is tied to +5 volts or to GROUND.

2.3.4 Master/Slave Operation (see Section 5.8)

Each Am29050 microprocessor output has associated logic that compares the signal on the output with the signal that the processor is providing internally to the output driver. The processor signals situations where the output of any enabled driver does not agree with its input.

For a single processor, the output comparison detects short circuits in output signals but does not detect open circuits. A second processor can be connected in parallel with the first, with its outputs disabled by the Test mode. The second processor detects open-circuit signals and also checks the outputs of the first processor.

2.3.5 Coprocessor Attachment (see Section 6.2)

A coprocessor for the Am29050 microprocessor attaches directly to the processor channel. However, this attachment has features that are different from those of other channel devices. The coprocessor interface is designed to support a high operand-transfer rate and to support the overlap of coprocessor operations with other processor operations, including other external accesses.

The coprocessor is assigned a special address space on the channel. This permits the transfer of operands and other information on the Address Bus without interfering with normal addressing functions. Since both the Address Bus and Data Bus are used for data transfer, the Am29050 microprocessor can transfer 64 bits of information to the coprocessor in one cycle.
This chapter contains a formal description of the Am29050 microprocessor architecture. It concentrates on the features of the Am29050 microprocessor and their logical behavior. Chapter 7 discusses the use of some of these features.

3.1 PROGRAM MODES

At any given time, the Am29050 microprocessor operates in one of three mutually exclusive program modes: the Supervisor mode, the User mode, or the Monitor mode. The Supervisor and User modes are for normal program execution; all system-protection features of the Am29050 microprocessor are based on the difference between these two modes. The Monitor mode is used for debugging.

3.1.1 Supervisor Mode

Unless it has been forced into the Monitor mode (see Section 3.7), the processor operates in the Supervisor mode whenever the Supervisor Mode (SM) bit of the Current Processor Status Register is 1 (see Section 3.2.3). In the Supervisor mode, executing programs have access to all processor resources. Virtual regions or pages mapped by the Memory Management Unit, however, are protected from Supervisor access (read, write, or execute) when the appropriate bit (SR, SW, or SE, respectively) in the corresponding TLB Entry or Region Mapping Control register is 0 (see Section 3.6.2).

During the address cycle of a channel request, the Supervisor mode is indicated by the SUP/US output being High.

3.1.2 User Mode

Unless it has been forced into the Monitor mode (see Section 3.7), the processor operates in the User mode whenever the SM bit in the Current Processor Status Register is 0. In the User mode, any of the following actions by an executing program causes a Protection Violation trap to occur:

1. An attempted access of any TLB entry (see Section 3.2.4).

2. An attempted access of any general-purpose register for which a bit in the Register Bank Protect Register is 1 (see Section 3.2.1).

3. An attempted execution of a load or store instruction for which the PA bit is 1, or for which the UA bit is 1 (see Section 3.4.4). (The attempted execution of a translated load or store for which the AS bit is 1 also causes a Protection Violation trap. However, this trap occurs regardless of whether or not the processor is in the User mode.)

4. An attempted execution of one of the following instructions: Interrupt Return, Interrupt Return and Invalidate, Invalidate, or Halt. However, a hardware-development system can disable protection checking for the Halt instruction, so that this instruction may be used to implement instruction breakpoints in User-mode programs (see Section 5.3.3).

5. An attempted access of one of the following registers: SR0–127, SR165–255 (see Section 3.2.3).

6. An attempted execution of an Assert or Emulate instruction that specifies a vector number between 0 and 63, inclusive (see Section 3.5.4).

7. An attempted access (read, write, or execute) in a virtual region or page mapped by the Memory Management Unit, when the appropriate permission bit (UR, UW, or UE, respectively) in the corresponding TLB Entry or Region Mapping Control register is 0 (see Section 3.6.2).

Devices and memories on the channel also can implement protection and generate traps based on the value of the SM bit. During the address cycle of a channel request, the User mode is indicated by the SUP/US output being Low.

### 3.1.3 Monitor Mode

The Monitor mode allows debugging of both Supervisor and User code (see Section 3.7). The processor enters the Monitor mode whenever the DA bit in the Current Processor Status register is 1, and either a valid breakpoint comparison or a trap occurs (except for a trap caused by TRAP(1–0)).

Upon entry into the Monitor mode, the read-only MM bit in the CPS register is set to 1, and, if entry was caused by a trap, the Reason Vector register is set to the trap vector number. Otherwise, the processor state is not modified. The values in the Shadow Program Counter registers are frozen.

Executing an IRET instruction causes the processor to leave the Monitor mode. The processor resumes operation at the instruction addresses contained in the Shadow Program Counter registers.

The Monitor mode can also be used by an external hardware debugger (see Section 5.3).

### 3.2 VISIBLE REGISTERS

The Am29050 microprocessor has four classes of registers that are accessible by instructions. These are general-purpose registers, floating-point accumulator registers, special-purpose registers, and Translation Look-Aside Buffer (TLB) registers. Any operation available in the Am29050 microprocessor can be performed on the general-purpose registers, while only the floating-point multiply-accumulate and multiply-sum operations use the floating-point accumulator registers. Special-purpose registers and TLB registers are accessed only by explicit data movement to or from general-purpose registers. Various protection mechanisms prevent the access of some of these registers by User-mode programs.

A summary of the information in this section appears in Appendix B.

### 3.2.1 General-Purpose Registers

The Am29050 microprocessor incorporates 192 general-purpose registers. The organization of the general-purpose registers is diagrammed in Figure 3-1.

Figure 3-1 General-Purpose Register Organization

<table>
<thead>
<tr>
<th>Absolute Register Number</th>
<th>General-Purpose Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Indirect Pointer Access</td>
</tr>
<tr>
<td>1</td>
<td>Stack Pointer</td>
</tr>
<tr>
<td>2</td>
<td>Condition Code Accumulator</td>
</tr>
<tr>
<td>3</td>
<td>Condition Code Accumulator, Shifted</td>
</tr>
<tr>
<td>4–63</td>
<td>Not Implemented</td>
</tr>
<tr>
<td>64</td>
<td>Global Register 64</td>
</tr>
<tr>
<td>65</td>
<td>Global Register 65</td>
</tr>
<tr>
<td>66</td>
<td>Global Register 66</td>
</tr>
<tr>
<td>⋮</td>
<td>⋮</td>
</tr>
<tr>
<td>126</td>
<td>Global Register 126</td>
</tr>
<tr>
<td>127</td>
<td>Global Register 127</td>
</tr>
<tr>
<td>128</td>
<td>Local Register 125</td>
</tr>
<tr>
<td>129</td>
<td>Local Register 126</td>
</tr>
<tr>
<td>130</td>
<td>Local Register 127</td>
</tr>
<tr>
<td>131</td>
<td>Local Register 0</td>
</tr>
<tr>
<td>132</td>
<td>Local Register 1</td>
</tr>
<tr>
<td>⋮</td>
<td>⋮</td>
</tr>
<tr>
<td>254</td>
<td>Local Register 123</td>
</tr>
<tr>
<td>255</td>
<td>Local Register 124</td>
</tr>
</tbody>
</table>

Stack Pointer = 131 (example; Local Register 0 is absolute register 131)

General-purpose registers hold the following types of operands for program use:
1. 32-bit data addresses
2. 32-bit signed or unsigned integers
3. 32-bit branch-target addresses
4. 32-bit logical bit strings
5. 8-bit signed or unsigned characters
6. 16-bit signed or unsigned integers
7. Word-length Booleans
8. Single-precision floating-point numbers
9. Double-precision floating-point numbers (in two register locations)

Because a large number of general-purpose registers are provided, a large amount of frequently used data can be kept on-chip, where access time is fastest.

Am29050 microprocessor instructions can specify two general-purpose registers for source operands, and one general-purpose register for storing the instruction result. These registers are specified by three 8-bit instruction fields containing register numbers. A register may be specified directly by the instruction, or indirectly by one of three special-purpose registers.

### 3.2.1.1 REGISTER ADDRESSING

The general-purpose registers are partitioned into 64 global registers and 128 local registers, differentiated by the most-significant bit of the register number. The distinction between global and local registers is the result of register-addressing considerations.

The following terminology is used to describe the addressing of general-purpose registers:
1. Register number—this is a software-level number for a general-purpose register. For example, this is the number contained in an instruction field. Register numbers range from 0 to 255.
2. Global-register number—this is a software-level number for a global register. Global-register numbers range from 0 to 127.
3. Local-register number—this is a software-level number for a local register. Local-register numbers range from 0 to 127.
4. Absolute-register number—this is a hardware-level number used to select a general-purpose register in the Register File. Absolute-register numbers range from 0 to 255.

### 3.2.1.2 GLOBAL REGISTERS

When the most-significant bit of a register number is 0, a global register is selected. The seven least-significant bits of the register number give the global-register number. For global registers, the absolute-register number is equivalent to the register number.

Global registers 4 through 63 are not implemented. An attempt to access these registers yields unpredictable results; however, they may be protected from User-mode access by the Register Bank Protect Register (see below).

The register numbers associated with Global Registers 0, 1, 2, and 3 have special meaning. The number for Global Register 0 specifies that an indirect pointer is to be used as the source of the register number; there is an indirect pointer for each of the instruction operand/result registers. Global Register 1 contains the Stack Pointer, which is used in the addressing of local registers. The Condition Code Accumulator, which is used to concatenate Boolean results from one or more operations into a single condition code, is accessed through Global Registers 2 and 3.

### 3.2.1.3 LOCAL-REGISTER STACK POINTER

The Stack Pointer is a 32-bit register that may be an operand of an instruction like any other general-purpose register. However, the processor hardware maintains a shadow copy of Global Register 1 for use in local-register addressing. This shadow copy is set only by the results of Arithmetic and Logical instructions. If the Stack Pointer is set with the result of any other instruction class, local registers cannot be accessed predictably until the Stack Pointer is set again by an Arithmetic or Logical instruction.

A modification of the Stack Pointer has a delayed effect on the addressing of local registers, as discussed in Section 7.4.3.

### 3.2.1.4 CONDITION CODE ACCUMULATOR REGISTER

The Condition Code Accumulator Register is accessed through Global Registers 2 and 3. If Global Register 2 (CCA) is specified as the destination of an operation, then the 32-bit operation result is written to the Condition Code Accumulator Register. If Global Register 3 (CCA-shift) is the destination, then the Condition Code Accumulator Register is shifted left one bit and the most-significant bit of the operation result is placed in the least-significant bit of the register. The Condition Code Accumulator Register contents can be read by specifying Global Register 2 as a source operand of an operation. In this way, the Condition Code Accumulator Register can concatenate the Boolean results of several operations into a single condition code, which can then be used in subsequent operations.
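To make the two destination behaviors concrete, here is a minimal C sketch that models the Condition Code Accumulator Register as an ordinary 32-bit variable. The names `cca`, `cca_write`, and `cca_shift` are hypothetical stand-ins for an instruction that names Global Register 2 or Global Register 3 as its destination; they are not part of any AMD software interface.

```c
#include <stdint.h>

/* Hypothetical software model of the Condition Code Accumulator Register. */
static uint32_t cca;

/* Destination = Global Register 2 (CCA): the full 32-bit result is written. */
static void cca_write(uint32_t result)
{
    cca = result;
}

/* Destination = Global Register 3 (CCA-shift): shift the accumulator left
 * one bit and place the most-significant bit of the result in bit 0. */
static void cca_shift(uint32_t result)
{
    cca = (cca << 1) | (result >> 31);
}
```

Starting from zero, shifting in three results whose most-significant bits are 1, 0, and 1 leaves binary 101 (decimal 5) in the low-order bits, which later instructions can then examine as a single condition code.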

### 3.2.1.5 LOCAL REGISTERS

When the most-significant bit of a register number is 1, a local register is selected. The seven least-significant bits of the register number give the local-register number. For local registers, the absolute-register number is obtained by adding the local-register number to bits 8–2 of the Stack Pointer and truncating the result to seven bits; the most-significant bit of the original register number is unchanged (i.e., it remains a 1).

The Stack Pointer addition applied to local-register numbers provides a limited form of base-plus-offset addressing within the local registers. The Stack Pointer contains the 32-bit base address. This assists run-time storage management of variables for dynamically nested procedures (see Section 7.1).
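The global/local decode and the Stack Pointer addition described above can be summarized in a short C sketch. The function name `absolute_reg` is hypothetical, and the sketch deliberately ignores the special meaning of register number 0 (indirect access) and the delayed effect of Stack Pointer updates.

```c
#include <stdint.h>

/* Map an 8-bit instruction register number to an absolute-register number.
 * Global registers (MSB = 0): the absolute number equals the register number.
 * Local registers (MSB = 1): add the local-register number to bits 8-2 of
 * the Stack Pointer, truncate to seven bits, and keep the MSB set. */
static uint8_t absolute_reg(uint8_t regnum, uint32_t stack_pointer)
{
    if ((regnum & 0x80u) == 0)
        return regnum;                               /* global register */

    uint32_t local = regnum & 0x7Fu;                 /* local-register number */
    uint32_t base  = (stack_pointer >> 2) & 0x7Fu;   /* Stack Pointer bits 8-2 */
    return (uint8_t)(0x80u | ((local + base) & 0x7Fu));
}
```

With a Stack Pointer whose bits 8–2 equal 3 (for example, 0x20C), local register 0 maps to absolute register 131 and local register 125 wraps around to absolute register 128, which matches the example in Figure 3-1.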

### 3.2.1.6 REGISTER BANKING

For the purpose of access restriction, the general-purpose registers are divided into register banks. Register banks consist of 16 registers (except for Bank 0, which contains registers 4 through 15), and are partitioned according to absolute-register numbers, as shown in Figure 3-2.

The Register Bank Protect Register contains 16 protection bits, where each bit controls User-mode accesses (read or write) to a bank of registers. Bits 0–15 of the Register Bank Protect Register protect register banks 0 through 15, respectively.

When a bit in the Register Bank Protect Register is 1, and a register in the corresponding bank is specified as an operand register or result register by a User-mode instruction, a Protection Violation trap occurs. Note that protection is based on absolute-register numbers; in the case of local registers, Stack-Pointer addition is performed before protection checking.

When the processor is in Supervisor or Monitor mode, the Register Bank Protect Register has no effect on general-purpose register accesses.
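The bank-protection rule can be expressed as a simple C predicate. This is a sketch under stated assumptions: `reg_bank` and `rbp_violation` are hypothetical names, the `rbp` argument holds the 16 protection bits of the Register Bank Protect Register, and the caller is expected to pass an absolute-register number in the range 4–255 (after any Stack-Pointer addition), since registers 0–3 are not assigned to a bank above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bank for an absolute-register number: 16 registers per bank
 * (bank 0 holds the implemented registers 4-15). */
static unsigned reg_bank(uint8_t absolute_reg)
{
    return absolute_reg >> 4;
}

/* True if an access to this absolute register would cause a Protection
 * Violation trap. The check applies only in User mode; in Supervisor or
 * Monitor mode the Register Bank Protect Register has no effect. */
static bool rbp_violation(uint8_t absolute_reg, uint16_t rbp,
                          bool supervisor_or_monitor)
{
    if (supervisor_or_monitor)
        return false;
    return ((rbp >> reg_bank(absolute_reg)) & 1u) != 0;
}
```

For example, setting only bit 8 of the Register Bank Protect Register protects absolute registers 128–143, so a User-mode reference that resolves to absolute register 131 traps while a reference to Global Register 64 (bank 4) does not.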

### 3.2.1.7 INDIRECT ACCESSES

Specification of Global Register 0 as an instruction-operand register or result register causes an indirect access to the general-purpose registers. In this case, the absolute-register number is provided by an indirect pointer contained in a special-purpose register.

Each of the three possible registers for instruction execution has an associated 8-bit indirect pointer. Indirect register numbers can be selected independently for each of the three operands. Since the indirect pointers contain absolute-register numbers, the number in an indirect pointer is not added to the Stack Pointer when local registers are selected.

The indirect pointers are set by the Move To Special Register, DIVIDE, DIVIDU, SETIP, and EMULATE instructions. The indirect pointers are also set by Floating-Point, MULTIPLY, MULTM, MULTIPLU, and MULTMU instructions when these cause exceptions. This allows the exception handler to access the instruction operands.

For a Move-To-Special-Register instruction, an indirect pointer is set with bits 9–2 of the 32-bit source operand. This provides consistency between the addressing of words in general-purpose registers and the addressing of words in external devices or memories. A modification of an indirect pointer using a Move To Special Register has a delayed effect on the addressing of general-purpose registers, as discussed in Section 7.4.3.

For the remaining instructions, all three indirect pointers are set simultaneously with the absolute-register numbers derived from the register numbers specified by the instruction. For any local registers selected by the instruction, the Stack-Pointer addition is applied to the register numbers before the indirect pointers are set.

Except when an indirect pointer is set by a Move-To-Special-Register instruction, the register numbers stored into the indirect pointers are checked for bank-protection violations at the time the indirect pointers are set.
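As a rough illustration of the Move-To-Special-Register case, the C sketch below extracts bits 9–2 of the source operand. `mtsr_indirect_pointer` is a hypothetical helper, not a processor or library function.

```c
#include <stdint.h>

/* Value loaded into an indirect pointer (IPA, IPB, or IPC) by a
 * Move-To-Special-Register instruction: bits 9-2 of the 32-bit source
 * operand, i.e. a word address divided by four and kept to eight bits. */
static uint8_t mtsr_indirect_pointer(uint32_t source)
{
    return (uint8_t)((source >> 2) & 0xFFu);
}
```

Because the pointer comes from bits 9–2, the value acts as a word index: loading 131 × 4 (0x20C) selects absolute register 131 directly, and no Stack-Pointer addition is applied.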

### 3.2.2 Floating-Point Accumulator Registers

Four 64-bit Accumulator Registers ACC(3–0) are provided for use with the floating-point multiply-accumulate (FMAC, DMAC) and multiply-sum (FMSM, DMSM) operations. These registers can contain either single- or double-precision floating-point numbers.

The Accumulator Registers are written with the Move To Accumulator (MTACC) instruction and read with the Move From Accumulator (MFACC) instruction. Any of the four Accumulator Registers can be used as a source or destination for the multiply-accumulate operations. ACC0 can also be used as a source for the multiply-sum operations (see Section 3.3.7).

### 3.2.3 Special-Purpose Registers

The Am29050 microprocessor contains 39 special-purpose registers. The organization of the special-purpose registers is shown in Figure 3-3.

Special-purpose registers provide controls and data for certain processor operations. Some special-purpose registers are updated dynamically by the processor, independent of software controls. Because of this, a read of a special-purpose register following a write does not necessarily get the data that was written.

Some special-purpose registers have fields that are reserved for future processor implementations. When a special-purpose register is read, a bit in a reserved field is read as a 0. An attempt to write a reserved bit with a 1 has no effect; however, this should be avoided because of upward-compatibility considerations.

The special-purpose registers are accessed by explicit data movement only. Instructions that move data to or from a special-purpose register specify the special-purpose register by an 8-bit field containing a special-purpose register number. Register numbers are specified directly by instructions.

The special-purpose registers are partitioned into protected and unprotected registers. Special-purpose registers numbered 0–127 and 165–255 are protected (note that not all of these are implemented). Special-purpose registers numbered 128–164 are unprotected (again, not all are implemented).

An attempted read of an unimplemented special-purpose register yields an unpredictable value. An attempted write of an unimplemented, protected special-purpose register has an unpredictable effect on processor operation. An attempted write of an unimplemented, unprotected special-purpose register has no effect; however, this should be avoided because of upward-compatibility considerations.

Unprotected special-purpose registers are accessible by programs executing in the User, Supervisor, and Monitor modes.
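The protected/unprotected split can be captured in two C predicates. The function names are hypothetical; the sketch only classifies register numbers and does not model which numbers are actually implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Special-purpose registers 0-127 and 165-255 are protected; 128-164 are
 * unprotected. (Not every number in either range is implemented.) */
static bool spr_is_protected(uint8_t spr)
{
    return spr <= 127 || spr >= 165;
}

/* user_mode is true only when the processor is in User mode (not in
 * Supervisor or Monitor mode). A User-mode access to a protected
 * special-purpose register causes a Protection Violation trap. */
static bool spr_user_violation(uint8_t spr, bool user_mode)
{
    return user_mode && spr_is_protected(spr);
}
```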
## Figure 3-3 Special-Purpose Registers

<table>
<thead>
<tr>
<th>Register Number</th>
<th>Protected Registers</th>
<th>Mnemonic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Vector Area Base Address</td>
<td>VAB</td>
</tr>
<tr>
<td>1</td>
<td>Old Processor Status</td>
<td>OPS</td>
</tr>
<tr>
<td>2</td>
<td>Current Processor Status</td>
<td>CPS</td>
</tr>
<tr>
<td>3</td>
<td>Configuration</td>
<td>CFG</td>
</tr>
<tr>
<td>4</td>
<td>Channel Address</td>
<td>CHA</td>
</tr>
<tr>
<td>5</td>
<td>Channel Data</td>
<td>CHD</td>
</tr>
<tr>
<td>6</td>
<td>Channel Control</td>
<td>CHC</td>
</tr>
<tr>
<td>7</td>
<td>Register Bank Protect</td>
<td>RBP</td>
</tr>
<tr>
<td>8</td>
<td>Timer Counter</td>
<td>TMC</td>
</tr>
<tr>
<td>9</td>
<td>Timer Reload</td>
<td>TMR</td>
</tr>
<tr>
<td>10</td>
<td>Program Counter 0</td>
<td>PC0</td>
</tr>
<tr>
<td>11</td>
<td>Program Counter 1</td>
<td>PC1</td>
</tr>
<tr>
<td>12</td>
<td>Program Counter 2</td>
<td>PC2</td>
</tr>
<tr>
<td>13</td>
<td>MMU Configuration</td>
<td>MMU</td>
</tr>
<tr>
<td>14</td>
<td>LRU Recommendation</td>
<td>LRU</td>
</tr>
<tr>
<td>15</td>
<td>Reason Vector</td>
<td>RSN</td>
</tr>
<tr>
<td>16</td>
<td>Region Mapping Address 0</td>
<td>RMA0</td>
</tr>
<tr>
<td>17</td>
<td>Region Mapping Control 0</td>
<td>RMC0</td>
</tr>
<tr>
<td>18</td>
<td>Region Mapping Address 1</td>
<td>RMA1</td>
</tr>
<tr>
<td>19</td>
<td>Region Mapping Control 1</td>
<td>RMC1</td>
</tr>
<tr>
<td>20</td>
<td>Shadow Program Counter 0</td>
<td>SPC0</td>
</tr>
<tr>
<td>21</td>
<td>Shadow Program Counter 1</td>
<td>SPC1</td>
</tr>
<tr>
<td>22</td>
<td>Shadow Program Counter 2</td>
<td>SPC2</td>
</tr>
<tr>
<td>23</td>
<td>Instruction Breakpoint Address 0</td>
<td>IBA0</td>
</tr>
<tr>
<td>24</td>
<td>Instruction Breakpoint Control 0</td>
<td>IBC0</td>
</tr>
<tr>
<td>25</td>
<td>Instruction Breakpoint Address 1</td>
<td>IBA1</td>
</tr>
<tr>
<td>26</td>
<td>Instruction Breakpoint Control 1</td>
<td>IBC1</td>
</tr>
</tbody>
</table>

## Unprotected Registers

<table>
<thead>
<tr>
<th>Register Number</th>
<th>Unprotected Registers</th>
<th>Mnemonic</th>
</tr>
</thead>
<tbody>
<tr>
<td>128</td>
<td>Indirect Pointer C</td>
<td>IPC</td>
</tr>
<tr>
<td>129</td>
<td>Indirect Pointer A</td>
<td>IPA</td>
</tr>
<tr>
<td>130</td>
<td>Indirect Pointer B</td>
<td>IPB</td>
</tr>
<tr>
<td>131</td>
<td>Q</td>
<td>Q</td>
</tr>
<tr>
<td>132</td>
<td>ALU Status</td>
<td>ALU</td>
</tr>
<tr>
<td>133</td>
<td>Byte Pointer</td>
<td>BP</td>
</tr>
<tr>
<td>134</td>
<td>Funnel Shift Count</td>
<td>FC</td>
</tr>
<tr>
<td>135</td>
<td>Load/Store Count Remaining</td>
<td>CR</td>
</tr>
<tr>
<td>160</td>
<td>Floating-Point Environment</td>
<td>FPE</td>
</tr>
<tr>
<td>161</td>
<td>Integer Environment</td>
<td>INTE</td>
</tr>
<tr>
<td>162</td>
<td>Floating-Point Status</td>
<td>FPS</td>
</tr>
<tr>
<td>164</td>
<td>Exception Opcode</td>
<td>EXOP</td>
</tr>
</tbody>
</table>

### 3.2.3.1 VECTOR AREA BASE ADDRESS (VAB, REGISTER 0)

This protected special-purpose register (see Figure 3-4) specifies the beginning address of the interrupt/trap Vector Area. The Vector Area is either a table of 256 vectors that point to the interrupt and trap handling routines, or a segment of 256 64-instruction blocks that directly contain the interrupt and trap handling routines.

Figure 3-4 Vector Area Base Address Register

<table>
<thead>
<tr>
<th>Bits</th>
<th>Field</th>
</tr>
</thead>
<tbody>
<tr>
<td>31–10</td>
<td>VAB</td>
</tr>
<tr>
<td>9–0</td>
<td>0 (all ten bits are zero)</td>
</tr>
</tbody>
</table>

The organization of the Vector Area is determined by the Vector Fetch (VF) bit of the Configuration Register. If the VF bit is 1 when an interrupt or trap is taken, the vector number for the interrupt or trap (see Section 3.5.4) replaces bits 9–2 of the value in the Vector Area Base Address Register to generate the physical address for a vector contained in instruction/data memory.

If the VF bit is 0, the vector number replaces bits 15–8 of the value in the Vector Area Base Address Register to generate the physical address of the first instruction of the interrupt or trap handler. The instruction fetch for this instruction is directed either to instruction memory or instruction read-only memory as determined by the ROM Vector Area (RV) bit of the Configuration Register.

Bits 31–10: Vector Area Base (VAB)—The VAB field gives the beginning physical address of the Vector Area. This address is constrained to begin on a 1-KB address boundary in instruction/data memory or instruction read-only memory. If the Vector Area is an instruction segment (VF = 0), bits 15–10 are ignored, and the alignment is forced to a 64-KB boundary.

Bits 9–0: Zeros—These bits force the alignment of the Vector Area to a 1-KB boundary.
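The two address-generation rules above can be sketched in C as follows. `vector_address` is a hypothetical helper; it assumes `vab` holds the register contents (so bits 9–0 are already zero) and returns only the physical address, without modeling the RV bit's choice between instruction memory and instruction ROM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Physical address generated for interrupt/trap vector number vec.
 * vf = 1: vector table; vec replaces bits 9-2 of VAB (4-byte entries).
 * vf = 0: handler segment; vec replaces bits 15-8 of VAB and bits 15-10
 *         of VAB are ignored, giving 256-byte (64-instruction) blocks
 *         in a 64-KB-aligned area. */
static uint32_t vector_address(uint32_t vab, uint8_t vec, bool vf)
{
    if (vf)
        return (vab & 0xFFFFFC00u) | ((uint32_t)vec << 2);
    return (vab & 0xFFFF0000u) | ((uint32_t)vec << 8);
}
```

For example, with VAB = 0x00010400 and vector number 5, the sketch yields 0x00010414 when VF = 1 (a 4-byte vector entry) and 0x00010500 when VF = 0 (a 64-instruction handler block in the 64-KB segment at 0x00010000).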

### 3.2.3.2 OLD PROCESSOR STATUS (OPS, REGISTER 1)

This protected special-purpose register has the same format as the Current Processor Status described below. The Old Processor Status stores a copy of the Current Processor Status when an interrupt or trap is taken. This is required since the Current Processor Status will be modified to reflect the status of the interrupt/trap handler.

During an interrupt return, the Old Processor Status is copied into the Current Processor Status. This allows the Current Processor Status to be set as required for the routine that is the target of the interrupt return.

### 3.2.3.3 CURRENT PROCESSOR STATUS (CPS, REGISTER 2)

This protected special-purpose register (see Figure 3-5) controls the behavior of the processor and its ability to recognize exceptional events.

Bits 31–17: Reserved.

**Bit 16: Monitor Mode (MM)**—This read-only bit is set by the processor upon entry into the Monitor mode, and reset on exit. The MM bit has no counterpart in the Old Processor Status Register.

**Bit 15: Coprocessor Active (CA)**—The CA bit is set and reset under the control of load and store instructions that transfer information to and from a coprocessor. This bit indicates that the coprocessor is performing an operation at the time that an interrupt or trap is taken. This notifies the interrupt or trap handler that the coprocessor contains state information to be preserved. Note that this notification occurs because the CA bit of the Old Processor Status is 1 in this case, not because of the value of the CA bit of the Current Processor Status.

**Bit 14: Interrupt Pending (IP)**—This bit allows software to detect the presence of external interrupts while they are disabled. The IP bit is set if one or more of the external signals INTR(3–0) is active but the processor is prevented from taking the resulting interrupt by the DA bit, the DI bit, or the IM field. If all external interrupt signals are subsequently de-asserted while interrupts are still disabled, the IP bit is reset.

**Bits 13–12: Trace Enable, Trace Pending (TE, TP)**—The TE and TP bits implement a software-controlled, instruction single-step facility. Single stepping is not implemented directly, but rather is emulated by trap sequences controlled by these bits. The value of the TE bit is copied to the TP bit whenever an instruction completes execution. When the TP bit is 1, a Trace trap occurs. Section 3.7.1 describes the use of these bits in more detail.

**Bit 11: Trap Unaligned Access (TU)**—The TU bit enables checking of address alignment for external data-memory accesses. When this bit is 1, an Unaligned Access trap occurs if the processor either generates an address for an external word that is not aligned on a word address boundary (i.e., either of the two least-significant bits is 1), or generates an address for an external half-word that is not aligned on a half-word address boundary (i.e., the least-significant address bit is 1). When the TU bit is 0, data-memory address alignment is ignored.

Alignment is ignored for input/output accesses and coprocessor transfers. The alignment of instruction addresses is also ignored (unaligned instruction addresses can be generated only by indirect jumps). Interrupt/trap vector addresses are always aligned properly.
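The alignment rules can be summarized in a small C predicate. The enum and function names are hypothetical, and the sketch covers only external data-memory accesses; per the text, input/output accesses, coprocessor transfers, and instruction fetches would skip this check entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access sizes for an external data-memory access (hypothetical names). */
enum access_size { ACCESS_BYTE = 1, ACCESS_HALFWORD = 2, ACCESS_WORD = 4 };

/* True if an Unaligned Access trap would occur for this data-memory access
 * under the given value of the CPS TU bit. */
static bool unaligned_access_trap(uint32_t addr, enum access_size size,
                                  bool tu)
{
    if (!tu)
        return false;                      /* alignment ignored when TU = 0 */
    switch (size) {
    case ACCESS_WORD:     return (addr & 3u) != 0;   /* either low bit set */
    case ACCESS_HALFWORD: return (addr & 1u) != 0;   /* lowest bit set */
    default:              return false;              /* bytes are always aligned */
    }
}
```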

**Bit 10: Freeze (FZ)**—The FZ bit prevents certain registers from being updated during interrupt and trap processing, except by explicit data movement. The affected registers are: Channel Address, Channel Data, Channel Control, Program Counter 0, Program Counter 1, Program Counter 2, and the ALU Status Register.

When the FZ bit is 1, these registers hold their values. An affected register can be changed only by a Move-To-Special-Register instruction. When the FZ bit is 0, there is no effect on these registers, and they are updated by processor instruction execution as described in this manual.

The FZ bit is set whenever an interrupt or trap is taken, holding critical state in the processor so that it is not modified unintentionally by the interrupt or trap handler.

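**Figure 3-5 Current Processor Status Register**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Field</th>
</tr>
</thead>
<tbody>
<tr>
<td>31–17</td>
<td>Reserved</td>
</tr>
<tr>
<td>16</td>
<td>MM</td>
</tr>
<tr>
<td>15</td>
<td>CA</td>
</tr>
<tr>
<td>14</td>
<td>IP</td>
</tr>
<tr>
<td>13</td>
<td>TE</td>
</tr>
<tr>
<td>12</td>
<td>TP</td>
</tr>
<tr>
<td>11</td>
<td>TU</td>
</tr>
<tr>
<td>10</td>
<td>FZ</td>
</tr>
<tr>
<td>9</td>
<td>LK</td>
</tr>
<tr>
<td>8</td>
<td>RE</td>
</tr>
<tr>
<td>7</td>
<td>WM</td>
</tr>
<tr>
<td>6</td>
<td>PD</td>
</tr>
<tr>
<td>5</td>
<td>PI</td>
</tr>
<tr>
<td>4</td>
<td>SM</td>
</tr>
<tr>
<td>3–2</td>
<td>IM</td>
</tr>
<tr>
<td>1</td>
<td>DI</td>
</tr>
<tr>
<td>0</td>
<td>DA</td>
</tr>
</tbody>
</table>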

**Bit 9: Lock (LK)**—The LK bit controls the value of the LOCK external signal. If the LK bit is 1, the LOCK signal is active. If the LK bit is 0, the LOCK signal is controlled by the execution of the instructions Load and Set, Load and Lock, and Store and Lock. This bit is provided for the implementation of multi-processor synchronization protocols.

**Bit 8: ROM Enable (RE)**—The RE bit enables instruction fetching from external instruction read-only memory (ROM). When this bit is 1, the IREQT signal directs all instruction requests to ROM. Instructions that are fetched from ROM are subject to capture and re-use by the Branch Target Cache memory when it is enabled; the Branch Target Cache memory distinguishes between instructions from ROM and those from non-ROM storage. When this bit is 0, off-chip requests for instructions are directed to instruction/data memory.

**Bit 7: Wait Mode (WM)**—The WM bit places the processor in the Wait mode. When this bit is 1, the processor performs no operations. The Wait mode is reset by an interrupt or trap for which the processor is enabled, or by the Reset mode.

**Bit 6: Physical Addressing/Data (PD)**—The PD bit determines whether address translation is performed for load or store operations. Address translation is performed for an access only when this bit is 0, and the Physical Address (PA) bit in the load or store instruction causing the access is also 0.

**Bit 5: Physical Addressing/Instructions (PI)**—The PI bit determines whether address translation is performed for external instruction accesses. Address translation is performed only when this bit is 0.

**Bit 4: Supervisor Mode (SM)**—The SM bit protects certain processor context, such as protected special-purpose registers. When this bit is 1, the processor is in the Supervisor mode, and access to all processor context is allowed. When this bit is 0, the processor is in the User mode, and access to protected processor context is not allowed; an attempt to access (either read or write) protected processor context causes a Protection Violation trap.

Section 3.1 describes the processor state protected from User-mode access.

For an external access, the User Access (UA) bit in the load or store instruction also controls access to protected processor context. When the UA bit is 1, the Memory Management Unit and channel perform the access as if the program causing the access were in User mode.
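Two of the rules above, the PD/PA test for data-address translation and the effect of the SM and UA bits on how an external access is treated, can be sketched as C predicates. The names are hypothetical, and the sketch does not model Monitor mode or the Protection Violation trap that a PA or UA bit of 1 raises in User mode.

```c
#include <stdbool.h>

/* A load or store is address-translated only when both the PD bit of the
 * CPS and the PA bit of the instruction are 0. */
static bool data_access_translated(bool cps_pd, bool insn_pa)
{
    return !cps_pd && !insn_pa;
}

/* The MMU and channel treat an external access as a User-mode access when
 * the processor is in User mode (SM = 0) or the instruction's UA bit is 1. */
static bool access_treated_as_user(bool cps_sm, bool insn_ua)
{
    return !cps_sm || insn_ua;
}
```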

**Bits 3–2: Interrupt Mask (IM)**—The IM field is an encoding of the processor priority with respect to external interrupts. The interpretation of the interrupt mask is specified by the following table:

<table>
<thead>
<tr>
<th>IM Value</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0</td>
<td>INTR0 enabled</td>
</tr>
<tr>
<td>0 1</td>
<td>INTR(1-0) enabled</td>
</tr>
<tr>
<td>1 0</td>
<td>INTR(2-0) enabled</td>
</tr>
<tr>
<td>1 1</td>
<td>INTR(3-0) enabled</td>
</tr>
</tbody>
</table>

**Bit 1: Disable Interrupts (DI)**—The DI bit prevents the processor from being interrupted by external interrupt requests INTR(3-0). When this bit is 1, the processor ignores all external interrupts. However, note that traps (both internal and external), Timer interrupts, and Trace traps will be taken. When this bit is 0, the processor will take any interrupt enabled by the IM field, unless the DA bit is 1.
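The DA, DI, and IM rules can be combined into one C check for whether an external interrupt line may be taken. The macro and function names are hypothetical; the sketch places DA at bit 0, the one CPS bit this part of the section mentions but does not describe, and it ignores priority among simultaneous requests as well as traps and Timer interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Current Processor Status bits used by this check (hypothetical names). */
#define CPS_DA        (1u << 0)
#define CPS_DI        (1u << 1)
#define CPS_IM_SHIFT  2
#define CPS_IM_MASK   (3u << CPS_IM_SHIFT)

/* True if external interrupt INTRn (n = 0..3) may be taken: DA and DI must
 * both be 0, and the IM field must enable INTR(IM..0), i.e. n <= IM. */
static bool intr_enabled(uint32_t cps, unsigned n)
{
    unsigned im = (cps & CPS_IM_MASK) >> CPS_IM_SHIFT;
    if (cps & (CPS_DA | CPS_DI))
        return false;
    return n <= im;
}
```

With IM = 01 (CPS value 0x4) and DA = DI = 0, INTR0 and INTR1 are accepted and INTR2 is not; setting DI blocks all four lines.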

### 3.2.3.4 CONFIGURATION (CFG, REGISTER 3)

This protected special-purpose register (see Figure 3-6) controls certain processor and system options. Most fields normally are modified only during system initialization. The Configuration Register is defined as follows:

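Figure 3-6 Configuration Register

<table>
<thead>
<tr>
<th>Bits</th>
<th>Field</th>
</tr>
</thead>
<tbody>
<tr>
<td>31–24</td>
<td>PRL</td>
</tr>
<tr>
<td>23–8</td>
<td>Reserved</td>
</tr>
<tr>
<td>7</td>
<td>EE</td>
</tr>
<tr>
<td>6</td>
<td>CO</td>
</tr>
<tr>
<td>5</td>
<td>DW</td>
</tr>
<tr>
<td>4</td>
<td>VF</td>
</tr>
<tr>
<td>3</td>
<td>RV</td>
</tr>
<tr>
<td>2</td>
<td>BO</td>
</tr>
<tr>
<td>1</td>
<td>CP</td>
</tr>
<tr>
<td>0</td>
<td>CD</td>
</tr>
</tbody>
</table>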

**Bits 31–24: Processor Release Level (PRL)**—The PRL field is an 8-bit, read-only identification number which specifies the processor version.

**Bits 23–8: Reserved.**

**Bit 7: Early Load Enable (EE)**—The EE bit determines whether the Early Load facility is enabled. When this bit is 1, early loads are permitted to take place; when this bit is 0, the generation of early load addresses by either the Physical Address Cache or the Early Address Generator is disabled.

**Bit 6: Branch Target Cache Memory Organization (CO)**—The CO bit determines the organization of the Branch Target Cache memory (BTC memory). When this bit is 0, the BTC memory is organized into 64 entries of 4 instructions each. When this bit is 1, the BTC memory is organized into 128 entries of 2 instructions each. The CO bit is initialized to 0 on reset.

**Bit 5: Data Width (DW)**—The DW bit enables and disables byte and half-word external accesses. If the DW bit is 0, byte and half-word accesses are not performed in hardware, and these accesses must be emulated by software. If the DW bit is 1, byte and half-word accesses are performed by hardware: this requires that external devices and memories be able to write individual bytes and half-words within a word. The DW bit is initialized to 0 on reset.

**Bit 4: Vector Fetch (VF)**—The VF bit determines the structure of the interrupt/trap Vector Area. If this bit is 1, the Vector Area is defined as a block of 256 vectors which specify the beginning addresses of the interrupt and trap handling routines. If the VF bit is 0, the Vector Area is a segment of 256 64-instruction blocks that contain the actual routines.

**Bit 3: ROM Vector Area (RV)**—If the VF bit is 0, the RV bit specifies whether the Vector Area is contained in instruction memory (RV = 0) or instruction read-only memory (RV = 1). The value of the RV bit is irrelevant if the VF bit is 1.

**Bit 2: Byte Order (BO)**—The BO bit determines the ordering of bytes and half-words within words. If the BO bit is 0, bytes and half-words are numbered left-to-right within a word. If the BO bit is 1, bytes and half-words are numbered right-to-left. Section 3.4.5 describes the interpretation of the BO bit in more detail.
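A minimal C sketch of how the BO bit changes byte numbering, assuming that "left" means the most-significant end of the word. `word_byte` is a hypothetical helper, and Section 3.4.5 remains the authoritative description.

```c
#include <stdint.h>

/* Extract byte n (0-3) of a 32-bit word under the CFG Byte Order bit.
 * BO = 0: bytes numbered left-to-right, so byte 0 is the most-significant.
 * BO = 1: bytes numbered right-to-left, so byte 0 is the least-significant. */
static uint8_t word_byte(uint32_t word, unsigned n, int bo)
{
    unsigned shift = bo ? 8u * n : 8u * (3u - n);
    return (uint8_t)(word >> shift);
}
```

For the word 0x11223344, byte 0 is 0x11 when BO = 0 and 0x44 when BO = 1.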

**Bit 1: Coprocessor Present (CP)**—The CP bit indicates the presence of a coprocessor that may be used by the processor. If this bit is 1, it enables the execution of load and store instructions that have a Coprocessor Enable (CE) bit of 1. If the CP bit is 0 and the processor attempts to execute a load or store instruction with a CE bit of 1, a Coprocessor Not Present trap occurs. This feature may be used to emulate coprocessor operations as well as to protect the state of a coprocessor shared between multiple processes.

**Bit 0: Branch Target Cache Memory Disable (CD)**—The CD bit determines whether or not the Branch Target Cache memory is used for non-sequential instruction references. When this bit is 1, all instruction references are directed to external instruction memory or instruction ROM, and the Branch Target Cache memory is not used. When this bit is 0, the targets of non-sequential instruction fetches are stored in the Branch Target Cache memory and re-used as described in Section 4.2.2. The value of the CD bit does not take effect until the execution of the next branch instruction. The CD bit is initialized to 1 on reset.

### 3.2.3.5 CHANNEL ADDRESS (CHA, REGISTER 4)

This protected special-purpose register (Figure 3-7) is used to report exceptions during external accesses or coprocessor transfers. It also is used to restart interrupted load-multiple and store-multiple operations, and to restart other external accesses when possible (e.g., after TLB misses are serviced). The restarting of external accesses is described in Section 7.3.4.

Figure 3-7 Channel Address Register

The Channel Address Register is updated on the execution of every load or store instruction, and on every load or store in a load-multiple or store-multiple sequence, except when the Freeze (FZ) bit in the Current Processor Status Register is 1.

Bits 31–0: Channel Address (CHA)—This field contains the address of the current channel transaction (if the FZ bit of the Current Processor Status Register is 0). For external data accesses, the address is virtual if address translation was enabled for the access, or physical if translation was disabled. For transfers to the coprocessor, the CHA field contains data transferred to the coprocessor.

### 3.2.3.6 CHANNEL DATA (CHD, REGISTER 5)

This protected special-purpose register (Figure 3-8) is used to report exceptions during external accesses or coprocessor transfers. It is also used to restart the first store of an interrupted store-multiple operation and to restart other external accesses when possible (e.g., after TLB misses are serviced). The restarting of external accesses is described in Section 7.3.4.

The Channel Data Register is updated on the execution of every load or store instruction, and on every load or store in a load-multiple or store-multiple sequence, except when the Freeze (FZ) bit in the Current Processor Status Register is 1. When the Channel Data Register is updated for a load operation, the resulting value is unpredictable.

**Bits 31–0: Channel Data (CHD)**—This field contains the data (if any) associated with the current channel transaction (if the FZ bit of the Current Processor Status Register is 0). If the current channel transaction is not a store or a transfer to the coprocessor, the value of this field is irrelevant.

### 3.2.3.7 CHANNEL CONTROL (CHC, REGISTER 6)

This protected special-purpose register (Figure 3-9) is used to report exceptions during external accesses or coprocessor transfers. It also is used to restart interrupted load-multiple and store-multiple operations, and to restart other external accesses when possible (e.g., after TLB misses are serviced). The restarting of external accesses is described in Section 7.3.4.

The Channel Control Register is updated on the execution of every load or store instruction, and on every load or store in a load-multiple or store-multiple sequence, except when the Freeze (FZ) bit in the Current Processor Status Register is 1.

**Bits 31–24:**—These bits are a direct copy of bits 23–16 from the load or store instruction which started the current channel transaction (see Section 3.4.4 and Section 6.1.2).

**Bits 23–16:** **Load/Store Count Remaining (CR)**—The CR field indicates the remaining number of transfers for a load-multiple or store-multiple operation that encountered an exception or was interrupted before completion. This number is zero-based; for example, a value of 28 in this field indicates that 29 transfers remain to be completed.

**Bit 15:** **Load/Store (LS)**—The LS bit is 0 if the channel transaction is a store operation, and 1 if it is a load operation.

**Bit 14:** **Multiple Operation (ML)**—The ML bit is 1 if the current channel transaction is a partially-complete load-multiple or store-multiple operation; otherwise it is 0.

Bit 13: Set (ST)—The ST bit is 1 if the current channel transaction is for a Load and Set instruction; otherwise it is 0.

Bit 12: Lock Active (LA)—The LA bit is 1 if the current channel transaction is for a Load and Lock or Store and Lock instruction; otherwise it is 0. Note that this bit is not set as the result of the Lock (LK) bit in the Current Processor Status Register.

Bit 11: Reserved.

Bit 10: Transaction Faulted (TF)—The TF bit indicates that the current channel transaction did not complete due to some exceptional circumstance. This bit is set only for exceptions reported via the DErr input, and it causes a Data Access Exception or Coprocessor Exception trap to occur (depending on the value of the CE bit) when it is 1.

The TF bit allows the proper sequencing of externally reported errors that get preempted by higher-priority traps (see Section 3.5.8). It is reset by software that handles the resulting trap.

Bits 9–2: Target Register (TR)—The TR field indicates the absolute-register number of the data operand for the current transaction (either a load target or a store data source). Since the register number in this field is absolute, it reflects the Stack-Pointer addition when the indicated register is a local register.

Bit 1: Not Needed (NN)—The NN bit indicates that, even though the Channel Address, Channel Data, and Channel Control registers contain a valid representation of an incomplete load operation, the data requested is not needed. This situation arises when a load instruction is overlapped with an instruction which writes the load target register.

Bit 0: Contents Valid (CV)—The CV bit indicates that the contents of the Channel Address, Channel Data, and Channel Control registers are valid.
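The Channel Control Register layout above can be unpacked with ordinary shifts and masks. The following C sketch is illustrative only: the struct and function names are hypothetical, bits 31–24 are kept as an opaque byte because they are simply a copy of instruction bits 23–16, and the `+ 1` conversion of CR reflects the zero-based count described for the CR field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the Channel Control Register (CHC). */
typedef struct {
    uint8_t ctl;  /* bits 31-24: copy of instruction bits 23-16 */
    uint8_t cr;   /* bits 23-16: Load/Store Count Remaining (zero-based) */
    bool    ls;   /* bit 15: 1 = load, 0 = store */
    bool    ml;   /* bit 14: partially complete load/store-multiple */
    bool    st;   /* bit 13: Load and Set */
    bool    la;   /* bit 12: Load and Lock or Store and Lock */
    bool    tf;   /* bit 10: Transaction Faulted */
    uint8_t tr;   /* bits 9-2: absolute target register number */
    bool    nn;   /* bit 1: Not Needed */
    bool    cv;   /* bit 0: Contents Valid */
} chc_fields;

static chc_fields chc_decode(uint32_t chc)
{
    chc_fields f;
    f.ctl = (uint8_t)(chc >> 24);
    f.cr  = (uint8_t)(chc >> 16);   /* keeps bits 23-16 */
    f.ls  = (chc >> 15) & 1u;
    f.ml  = (chc >> 14) & 1u;
    f.st  = (chc >> 13) & 1u;
    f.la  = (chc >> 12) & 1u;
    f.tf  = (chc >> 10) & 1u;
    f.tr  = (uint8_t)(chc >> 2);    /* keeps bits 9-2 */
    f.nn  = (chc >> 1) & 1u;
    f.cv  = chc & 1u;
    return f;
}

/* Meaningful only when ML = 1: CR = 28 means 29 transfers remain. */
static unsigned chc_transfers_remaining(const chc_fields *f)
{
    return (unsigned)f->cr + 1u;
}
```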

### 3.2.3.8 REGISTER BANK PROTECT (RBP, REGISTER 7)

This protected special-purpose register (Figure 3-10) protects banks of general-purpose registers from User-mode program accesses.

**Figure 3-10** Register Bank Protect Register

<table>
<thead>
<tr>
<th>31–16</th>
<th>15–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>B15–B0</td>
</tr>
</tbody>
</table>

The general-purpose registers are partitioned into 16 banks of 16 registers each (except that Bank 0 contains 12 registers). The banks are organized as shown in Figure 3-2 of Section 3.2.1.

Bits 31–16: Reserved.

Bits 15–0: Bank 15 through Bank 0 Protection Bits (B15–B0)—In the Register Bank Protect Register, each bit is associated with a particular bank of registers, and the bit number gives the associated bank number (e.g., B11 determines the protection for Bank 11).

When a protection bit is 1, the corresponding bank is protected from access by programs executing in the User mode. A Protection Violation trap occurs when a User-mode program attempts to access (either read or write) a register in a protected bank.

When a bit in this register is 0, the corresponding bank is available to programs executing in the User mode.

Supervisor-mode and Monitor-mode programs are not affected by the Register Bank Protect Register.

Register protection is based on absolute-register numbers. For local registers, the protection checking is performed after the Stack-Pointer addition is performed.
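The protection check reduces to a single bit test once the bank number is known. In this C sketch the bank number (0 to 15) is an input, because the mapping from absolute-register numbers to banks is given in Figure 3-2 and is not reproduced here; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the access would cause a Protection Violation trap.
   'bank' must already be derived from the absolute-register number
   (after Stack-Pointer addition for local registers) per Figure 3-2. */
static bool rbp_access_traps(uint32_t rbp, unsigned bank, bool user_mode)
{
    if (!user_mode)              /* Supervisor and Monitor modes are unaffected */
        return false;
    return ((rbp >> (bank & 15u)) & 1u) != 0u;  /* bit Bn protects bank n */
}
```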

### 3.2.3.9 TIMER COUNTER (TMC, REGISTER 8)

This protected special-purpose register (Figure 3-11) contains the counter for the Timer Facility.

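**Figure 3-11** Timer Counter Register

<table>
<thead>
<tr>
<th>31–24</th>
<th>23–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>TCV</td>
</tr>
</tbody>
</table>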

Bits 31–24: Reserved.

Bits 23–0: Timer Count Value (TCV)—The 24-bit TCV field decrements by one on each processor clock. When the TCV field decrements to zero, it is reloaded with the content of the Timer Reload Value field in the Timer Reload Register. At this time, the Interrupt bit in the Timer Reload Register is set.

### 3.2.3.10 TIMER RELOAD (TMR, REGISTER 9)

This protected special-purpose register (Figure 3-12) maintains synchronization of the Timer Counter Register, enables Timer interrupts, and maintains Timer Facility status information.

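**Figure 3-12** Timer Reload Register

<table>
<thead>
<tr>
<th>31–27</th>
<th>26</th>
<th>25</th>
<th>24</th>
<th>23–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>OV</td>
<td>IN</td>
<td>IE</td>
<td>TRV</td>
</tr>
</tbody>
</table>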

Bits 31–27: Reserved.

Bit 26: Overflow (OV)—The OV bit indicates that a Timer interrupt occurred before a previous Timer interrupt was serviced. It is set if the Interrupt (IN) bit is already 1 (see below) when the Timer Count Value (TCV) field of the Timer Counter Register decrements to zero; that is, the Timer interrupt signaled by the IN bit has not yet been serviced when another Timer interrupt is generated.

Bit 25: Interrupt (IN)—The IN bit is set whenever the TCV field decrements to zero. If this bit is 1 and the IE bit is also 1, a Timer interrupt occurs. Note that the IN bit is set when the TCV field decrements to zero, regardless of the value of the IE bit. The IN bit is reset by software that handles the Timer interrupt.

The TCV field is zero-based with respect to the Timer interrupt interval; for example, a value of 28 in the TCV field causes the IN bit to be set in the 29th subsequent processor cycle. This is because the TCV field remains zero for one complete cycle before the IN bit is set.

**Bit 24: Interrupt Enable (IE)**—When the IE bit is 1, the Timer interrupt is enabled, and the Timer interrupt occurs whenever the IN bit is 1. When this bit is 0, the Timer interrupt is disabled. Note that Timer interrupts may be disabled by the DA bit of the Current Processor Status Register regardless of the value of the IE bit.

**Bits 23–0: Timer Reload Value (TRV)**—The value of this field is written into the Timer Count Value (TCV) field of the Timer Counter Register when the TCV field decrements to zero.
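The interaction of the TCV, TRV, IN, IE, and OV fields can be summarized with a small behavioral model in C. This is a sketch under stated assumptions, not a description of the hardware: each call advances one processor clock, the reload is treated as happening on the cycle after TCV reaches zero (matching the zero-based interval described for the IN bit), and the DA bit of the Current Processor Status Register and all interrupt servicing are ignored. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behavioral model of the Timer Facility (TMC + TMR). */
typedef struct {
    uint32_t tcv;  /* Timer Count Value, TMC bits 23-0 */
    uint32_t trv;  /* Timer Reload Value, TMR bits 23-0 */
    bool     in;   /* Interrupt, TMR bit 25 */
    bool     ie;   /* Interrupt Enable, TMR bit 24 */
    bool     ov;   /* Overflow, TMR bit 26 */
} timer_model;

/* Advance one processor clock. Returns true while a Timer interrupt is
   requested (IN and IE both 1). */
static bool timer_tick(timer_model *t)
{
    if (t->tcv == 0u) {
        /* TCV has been zero for a full cycle: reload and set IN. */
        t->tcv = t->trv & 0xFFFFFFu;
        if (t->in)
            t->ov = true;          /* previous interrupt not yet serviced */
        t->in = true;
    } else {
        t->tcv--;
    }
    return t->in && t->ie;
}
```

With TCV and TRV both set to 28, the model sets IN on the 29th call and, if IN is left uncleared, sets OV 29 calls later, so the interrupt period in this model is TRV + 1 cycles.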

### 3.2.3.11 PROGRAM COUNTER 0 (PC0, REGISTER 10)

This protected special-purpose register (Figure 3-13) is used, on an interrupt return, to restart the instruction which was in the decode stage when the original interrupt or trap was taken.

#### Figure 3-13 Program Counter 0 Register

<table>
<thead>
<tr>
<th>31–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>PC0</td>
<td>0 0</td>
</tr>
</tbody>
</table>

**Bits 31–2: Program Counter 0 (PC0)**—This field captures the word-address of an instruction as it enters the decode stage of the processor pipeline, unless the Freeze (FZ) bit of the Current Processor Status Register is 1. If the FZ bit is 1, PC0 holds its value.

When an interrupt or trap is taken, the PC0 field contains the word-address of the instruction in the decode stage; the interrupt or trap has prevented this instruction from executing. The processor uses the PC0 field to restart this instruction on an interrupt return.

**Bits 1–0: Zeros**—These bits are zero, since instruction addresses are always word aligned.

### 3.2.3.12 PROGRAM COUNTER 1 (PC1, REGISTER 11)

This protected special-purpose register (Figure 3-14) is used, on an interrupt return, to restart the instruction that was in the execute stage when the original interrupt or trap was taken.

#### Figure 3-14 Program Counter 1 Register

<table>
<thead>
<tr>
<th>31–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>PC1</td>
<td>0 0</td>
</tr>
</tbody>
</table>

**Bits 31–2: Program Counter 1 (PC1)**—This field captures the word-address of an instruction as it enters the execute stage of the processor pipeline, unless the Freeze (FZ) bit of the Current Processor Status Register is 1. If the FZ bit is 1, PC1 holds its value.

When an interrupt or trap is taken, the PC1 field contains the word-address of the instruction in the execute stage; the interrupt or trap has prevented this instruction from completing execution. The processor uses the PC1 field to restart this instruction on an interrupt return.

**Bits 1–0:** Zeros—These bits are zero, since instruction addresses are always word aligned.

### 3.2.3.13 PROGRAM COUNTER 2 (PC2, REGISTER 12)

This protected special-purpose register (Figure 3-15) reports the address of certain instructions causing traps.

**Figure 3-15 Program Counter 2 Register**

<table>
<thead>
<tr>
<th>31–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>PC2</td>
<td>0 0</td>
</tr>
</tbody>
</table>

**Bits 31–2:** Program Counter 2 (PC2)—This field captures the word address of an instruction as it enters the write-back stage of the processor pipeline, unless the Freeze (FZ) bit of the Current Processor Status Register is 1. If the FZ bit is 1, PC2 holds its value.

When an interrupt or trap is taken, the PC2 field contains the word address of the instruction in the write-back stage. In certain cases, as described in Section 3.5.9, PC2 contains the address of the instruction causing a trap. The PC2 field is used to report the address of this instruction, and has no other use in the processor.

**Bits 1–0:** Zeros—These bits are zero, since instruction addresses are always word aligned.
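The relationship among PC0, PC1, and PC2 can be pictured as a three-entry shift register that follows an instruction from decode to execute to write-back. The C sketch below is a simplified model under that assumption only: it ignores pipeline stalls and exact timing, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t pc0, pc1, pc2; } pc_regs;

/* One pipeline advance: the decode-stage instruction moves to execute,
   and the execute-stage instruction moves to write-back. */
static void pc_advance(pc_regs *r, uint32_t decode_addr, bool fz)
{
    if (fz)
        return;                   /* FZ = 1: PC0, PC1, and PC2 hold */
    r->pc2 = r->pc1;              /* execute -> write-back */
    r->pc1 = r->pc0;              /* decode -> execute */
    r->pc0 = decode_addr & ~3u;   /* bits 1-0 are always zero */
}
```

In this model, an interrupt return would restart at PC1 (the execute-stage instruction) followed by PC0 (the decode-stage instruction), while PC2 only records the write-back-stage address.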

### 3.2.3.14 MMU CONFIGURATION (MMU, REGISTER 13)

This protected special-purpose register (Figure 3-16) specifies parameters associated with the Memory Management Unit (MMU).

**Figure 3-16 MMU Configuration Register**

<table>
<thead>
<tr>
<th>31–10</th>
<th>9–8</th>
<th>7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>PS</td>
<td>PID</td>
</tr>
</tbody>
</table>

**Bits 31–10:** Reserved.

**Bits 9–8:** Page Size (PS)—The PS field specifies the page size for address translation; the page size affects translation as discussed in Section 3.6.2. A change to the PS field has a delayed effect on address translation: at least one cycle of delay must separate an instruction that sets the PS field from an instruction that performs address translation. The PS field is encoded as follows:

<table>
<thead>
<tr>
<th>PS</th>
<th>Page Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0</td>
<td>1 kb</td>
</tr>
<tr>
<td>0 1</td>
<td>2 kb</td>
</tr>
<tr>
<td>1 0</td>
<td>4 kb</td>
</tr>
<tr>
<td>1 1</td>
<td>8 kb</td>
</tr>
</tbody>
</table>
Bits 7–0: Process Identifier (PID)—For translated User-mode loads and stores, this 8-bit field is compared to Task Identifier (TID) fields in Translation Look-Aside Buffer entries when address translation is performed. For the address translation to be valid, the PID field must match the TID field in an entry. This allows a separate 32-bit virtual-address space to be allocated to each active User-mode process (within the limit of 255 such processes). Translated Supervisor-mode and Monitor-mode loads and stores use a fixed process identifier of zero, and require that the TID field be zero for successful translation.
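The PS encoding and the PID rule can be expressed compactly in C. This sketch assumes the page sizes in the table are in bytes (1024 to 8192) and treats Monitor mode the same as Supervisor mode, as the paragraph above does; the function names are hypothetical and the TLB lookup itself is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* PS (bits 9-8): 00 = 1024, 01 = 2048, 10 = 4096, 11 = 8192 bytes. */
static uint32_t mmu_page_size(uint32_t mmu)
{
    return 1024u << ((mmu >> 8) & 3u);
}

/* Identifier compared with a TLB entry's TID: the PID field (bits 7-0)
   for User mode, a fixed 0 for Supervisor and Monitor modes. */
static uint8_t mmu_effective_pid(uint32_t mmu, bool user_mode)
{
    return user_mode ? (uint8_t)(mmu & 0xFFu) : 0u;
}

static bool tlb_tid_matches(uint32_t mmu, uint8_t tid, bool user_mode)
{
    return tid == mmu_effective_pid(mmu, user_mode);
}
```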

### 3.2.3.15 LRU RECOMMENDATION (LRU, REGISTER 14)

This protected special-purpose register (Figure 3-17) assists Translation Look-Aside Buffer (TLB) re-loading by indicating the least-recently used TLB entry in the required replacement line.

#### Figure 3-17 LRU Recommendation Register

<table>
<thead>
<tr>
<th>31–7</th>
<th>6–1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>LRU</td>
<td>0</td>
</tr>
</tbody>
</table>

- **Bits 31–7**: Reserved.
- **Bits 6–1**: Least-Recently Used Entry (LRU)—The LRU field is updated whenever a TLB miss occurs during an address translation. It gives the TLB register number of the TLB entry selected for replacement. The LRU field also is updated whenever a memory-protection violation occurs; however, it has no interpretation in this case.
- **Bit 0**: Zero—The appended 0 serves to identify Word 0 of the TLB entry.
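Because bit 0 is always 0, the low seven bits of the register read directly as a TLB register number. The C sketch below assumes, for illustration only, that Word 1 of the selected entry is the next TLB register number; the function names are hypothetical.

```c
#include <stdint.h>

/* Bits 6-1 (LRU) plus the appended 0: TLB register number of Word 0. */
static unsigned lru_word0_reg(uint32_t lru) { return lru & 0x7Eu; }

/* Assumption: Word 1 of the same entry is the following register. */
static unsigned lru_word1_reg(uint32_t lru) { return (lru & 0x7Eu) | 1u; }
```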

### 3.2.3.16 REASON VECTOR (RSN, REGISTER 15)

This protected special-purpose register (Figure 3-18) reports the cause of a trap into the Monitor Mode.

#### Figure 3-18 Reason Vector Register

<table>
<thead>
<tr>
<th>31–8</th>
<th>7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>RSN</td>
</tr>
</tbody>
</table>

- **Bits 31–8**: Reserved.
- **Bits 7–0**: Reason Vector (RSN)—The RSN field is set whenever a Monitor trap occurs (see Section 3.5.7). The RSN field is set to the vector number of the trap which would have been taken had the Monitor trap not been taken.

### 3.2.3.17 REGION MAPPING ADDRESS 0 (RMA0, REGISTER 16)

This protected special-purpose register (Figure 3-19) specifies a mapping from a region of virtual address space to physical address space. Together with the Region Mapping Control 0 Register, it controls the Region Mapping Unit 0.

**Bits 31–16: Virtual Base Address (VBA)**—The VBA field defines the base address of the virtual region to be mapped. The most-significant bits of this field are compared to the corresponding bits of the virtual address during address translation. The number of bits compared is determined by the size of the virtual region, as defined by the Region Size field of the Region Mapping Control 0 Register. All unused bits of the VBA field must be 0.

**Bits 15–0: Physical Base Address (PBA)**—The PBA field defines the base address of the physical region. When an address translation is performed, the most-significant bits of this field replace the corresponding bits of the virtual address. The number of bits replaced is determined by the size of the virtual region, as defined by the Region Size field of the Region Mapping Control 0 Register. All unused bits of the PBA field must be 0.

### 3.2.3.18 REGION MAPPING CONTROL 0 (RMC0, REGISTER 17)

This protected special-purpose register (Figure 3-20) contains control information associated with the mapping specified by the Region Mapping Address 0 Register. Together with the Region Mapping Address 0 Register, it controls the Region Mapping Unit 0.

**Figure 3-20** Region Mapping Control 0 Register

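<table>
<thead>
<tr>
<th>31–24</th>
<th>23–22</th>
<th>21</th>
<th>20–17</th>
<th>16</th>
<th>15</th>
<th>14</th>
<th>13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>PGM</td>
<td>Reserved</td>
<td>RGS</td>
<td>IO</td>
<td>Reserved</td>
<td>VE</td>
<td>SR</td>
<td>SW</td>
<td>SE</td>
<td>UR</td>
<td>UW</td>
<td>UE</td>
<td>TID</td>
</tr>
</tbody>
</table>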

- **Bits 31–24:** Reserved.
- **Bits 23–22:** User-Programmable (PGM)—These bits are placed on the MPGM(1–0) outputs when a translated address is transmitted for an access. They have no predefined effect on the access; any effect is defined by logic external to the processor.
- **Bit 21:** Reserved.
- **Bits 20–17:** Region Size (RGS)—The RGS field defines the size of the virtual region. The value in the RGS field is the number of low-order address bits which are ignored in virtual address comparisons and physical address substitutions. Thus, if the RGS value is 0000, the size of the virtual region is 64 kb; if the RGS value is 0001, the size of the virtual region is 128 kb; and so on, up to an RGS value of 1111 and a maximum virtual region size of 2 Gb.
- **Bit 16:** Input/Output Address Space (IO)—When the IO bit is 1, a valid translation results in an access to the input/output address space. When the IO bit is 0, the access is performed in the instruction/data memory address space.

Bit 15: Reserved.

Bit 14: Valid Entry (VE)—If the VE bit is 1, the Region Mapping Address 0 Register specifies a valid translation. If the VE bit is 0, the translation is invalid.

Bit 13: Supervisor Read (SR)—When the SR bit is 1, Supervisor-mode load operations to the virtual region are permitted. When the SR bit is 0, such loads are not permitted, and any attempt is trapped with a Data MMU Protection Violation.

Bit 12: Supervisor Write (SW)—When the SW bit is 1, Supervisor-mode store operations to the virtual region are permitted. When the SW bit is 0, such stores are not permitted, and any attempt is trapped with a Data MMU Protection Violation.

Bit 11: Supervisor Execute (SE)—When the SE bit is 1, Supervisor-mode instruction accesses to the virtual region are permitted. When the SE bit is 0, such accesses are not permitted, and any attempt is trapped with an Instruction MMU Protection Violation.

Bit 10: User Read (UR)—When the UR bit is 1, User-mode load operations to the virtual region are permitted. When the UR bit is 0, such loads are not permitted, and any attempt is trapped with a Data MMU Protection Violation.

Bit 9: User Write (UW)—When the UW bit is 1, User-mode store operations to the virtual region are permitted. When the UW bit is 0, such stores are not permitted, and any attempt is trapped with a Data MMU Protection Violation.

Bit 8: User Execute (UE)—When the UE bit is 1, User-mode instruction accesses to the virtual region are permitted. When the UE bit is 0, such accesses are not permitted, and any attempt is trapped with an Instruction MMU Protection Violation.

Bits 7–0: Task Identifier (TID)—The Task Identifier field allows Region Mapping Unit 0 to be associated with a particular process. For a translation to be valid, the TID field must match the Process Identifier (PID) in the MMU Configuration Register. If the Task Identifier is zero, however, any otherwise-valid Supervisor-mode or Monitor-mode access is allowed, even if the Process Identifier is not zero.

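The Region Mapping Address and Control fields combine into a single match-and-substitute step. The C sketch below follows the field positions and rules given above (VE, RGS, TID, and the six permission bits) and is a hedged illustration, not a model of the hardware: the names are hypothetical, Monitor mode is treated like Supervisor mode, and the IO and PGM outputs, priority relative to the TLB, and trap reporting are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACC_READ = 0, ACC_WRITE = 1, ACC_EXEC = 2 } access_kind;
typedef enum { RMU_NO_MATCH, RMU_HIT, RMU_PROT_VIOLATION } rmu_result;

static rmu_result rmu_translate(uint32_t rma, uint32_t rmc, uint8_t pid,
                                bool user_mode, access_kind kind,
                                uint32_t vaddr, uint32_t *paddr)
{
    unsigned rgs   = (rmc >> 17) & 0xFu;          /* Region Size, bits 20-17 */
    unsigned shift = 16u + rgs;                   /* 64 KB << RGS, up to 2 GB */
    uint32_t low   = (1u << shift) - 1u;          /* offset bits passed through */
    uint32_t vba   = rma & 0xFFFF0000u;           /* VBA, bits 31-16 */
    uint32_t pba   = (rma & 0xFFFFu) << 16;       /* PBA as address bits 31-16 */
    uint8_t  tid   = (uint8_t)(rmc & 0xFFu);

    if (((rmc >> 14) & 1u) == 0u)                 /* VE = 0: invalid */
        return RMU_NO_MATCH;
    if ((vaddr & ~low) != (vba & ~low))           /* compare high-order bits */
        return RMU_NO_MATCH;
    if (tid != pid && !(tid == 0u && !user_mode)) /* TID rule as stated above */
        return RMU_NO_MATCH;

    /* SR/SW/SE are bits 13/12/11; UR/UW/UE are bits 10/9/8. */
    unsigned bit = (user_mode ? 10u : 13u) - (unsigned)kind;
    if (((rmc >> bit) & 1u) == 0u)
        return RMU_PROT_VIOLATION;

    *paddr = (pba & ~low) | (vaddr & low);        /* substitute high-order bits */
    return RMU_HIT;
}
```

For example, with RGS = 0, RMA = 0x12345678, and a matching TID, the User-mode read of 0x1234ABCD would translate to 0x5678ABCD in this sketch.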
### 3.2.3.19 REGION MAPPING ADDRESS 1 (RMA1, REGISTER 18)

This protected special-purpose register specifies a mapping from a region of virtual address space to physical address space. Together with the Region Mapping Control 1 Register, it controls the Region Mapping Unit 1.

The structure of the Region Mapping Address 1 Register is identical to that of the Region Mapping Address 0 Register (Figure 3-19).

### 3.2.3.20 REGION MAPPING CONTROL 1 (RMC1, REGISTER 19)

This protected special-purpose register contains control information associated with the mapping specified by the Region Mapping Address 1 Register. Together with the Region Mapping Address 1 Register, it controls the Region Mapping Unit 1.

The structure of the Region Mapping Control 1 Register is identical to that of the Region Mapping Control 0 Register (Figure 3-20).

### 3.2.3.21 SHADOW PROGRAM COUNTER 0 (SPC0, REGISTER 20)

This protected special-purpose register (Figure 3-21) is analogous to the Program Counter 0 Register, except that it operates even when the FZ bit of the Current Processor Status Register is 1; it freezes only upon entry into the Monitor Mode. The Shadow Program Counter 0 Register is used upon exit from the Monitor Mode to restart the instruction which was in the decode stage at the time of entry.

Bits 31–2: Shadow Program Counter 0 (SPC0)—This field captures the word-address of an instruction as it enters the decode stage of the processor pipeline, unless the processor is in the Monitor Mode. While the processor is in the Monitor Mode, the value of SPC0 is not modified.

Bits 1–0: Zeros—These bits are always zero, since instruction addresses are word-aligned.

### 3.2.3.22 SHADOW PROGRAM COUNTER 1 (SPC1, REGISTER 21)

This protected special-purpose register (Figure 3-22) is analogous to the Program Counter 1 Register, except that it operates even when the FZ bit of the Current Processor Status Register is 1; it freezes only upon entry into the Monitor Mode. The Shadow Program Counter 1 Register is used upon exit from the Monitor Mode to restart the instruction which was in the execute stage at the time of entry.

Bits 31–2: Shadow Program Counter 1 (SPC1)—This field captures the word-address of an instruction as it enters the execute stage of the processor pipeline, unless the processor is in the Monitor Mode. While the processor is in the Monitor Mode, the value of SPC1 is not modified.

Bits 1–0: Zeros—These bits are always zero, since instruction addresses are word-aligned.

### 3.2.3.23 SHADOW PROGRAM COUNTER 2 (SPC2, REGISTER 22)

This protected special-purpose register (Figure 3-23) is analogous to the Program Counter 2 Register, except that it operates even when the FZ bit of the Current Processor Status Register is 1; it freezes only upon entry into the Monitor Mode. The Shadow Program Counter 2 Register provides information only; it is not used by the processor in a return from the Monitor Mode.

Bits 31–2: Shadow Program Counter 2 (SPC2)—This field captures the word-address of an instruction as it enters the write-back stage of the processor pipeline, unless the processor is in the Monitor Mode. While the processor is in the Monitor Mode, the value of SPC2 is not modified.

Bits 1–0: Zeros—These bits are always zero, since instruction addresses are word-aligned.

### 3.2.3.24 INSTRUCTION BREAKPOINT ADDRESS 0 (IBA0, REGISTER 23)

This protected special-purpose register (Figure 3-24) contains the address of an instruction breakpoint.

**Figure 3-24** Instruction Breakpoint Address 0 Register

<table>
<thead>
<tr>
<th>31–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>IBA</td>
<td>0 0</td>
</tr>
</tbody>
</table>

Bits 31–2: Instruction Breakpoint Address (IBA)—The value in the IBA field is compared to the value of the Program Counter to determine whether an instruction breakpoint has been encountered.

Bits 1–0: Zeros—These bits are always zero, since instruction addresses are word-aligned.

### 3.2.3.25 INSTRUCTION BREAKPOINT CONTROL 0 (IBC0, REGISTER 24)

This protected special-purpose register (Figure 3-25) contains control and status information for the instruction breakpoint specified by the Instruction Breakpoint Address 0 Register.

**Figure 3-25** Instruction Breakpoint Control 0 Register

<table>
<thead>
<tr>
<th>31–13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>BHO</td>
<td>BEN</td>
<td>BSY</td>
<td>BRM</td>
<td>BTE</td>
<td>BPID</td>
</tr>
</tbody>
</table>

Bits 31–13: Reserved.

Bit 12: Breakpoint Has Occurred (BHO)—The BHO bit indicates whether a trap for valid breakpoint comparison has occurred. When such a trap occurs, the BHO bit is set to 1. At the next valid breakpoint comparison, the BHO bit is reset to 0, and the breakpoint trap is not taken. The BHO bit acts as a temporary breakpoint disable, ensuring that only one breakpoint comparison trap is taken each time the breakpoint is encountered and allowing the processor to progress past the breakpoint address.

Bit 11: Breakpoint Enable (BEN)—When the BEN bit is 1, the breakpoint comparison is enabled. When the BEN bit is 0, the breakpoint comparison is disabled and neither a breakpoint nor a synchronization pulse is generated when the breakpoint condition is met. The BEN bit is initialized to 0 upon reset.

Bit 10: Break or Synchronize (BSY)—The BSY bit determines the action taken when the breakpoint condition is met. If the BSY bit is 1, a breakpoint occurs; if the BSY bit is 0, a synchronization pulse is generated (see Section 5.3).

Bit 9: Break ROM (BRM)—If the BRM bit is 0, the breakpoint comparison is performed only for addresses in the instruction memory address space. If the BRM bit is 1, the breakpoint comparison is performed only for addresses in the instruction ROM address space.

Bit 8: Break on Translation Enabled (BTE)—If the BTE bit is 1, the breakpoint comparison is performed only when instruction translation is enabled (that is, when the PI bit of the Current Processor Status Register is 0). If the BTE bit is 0, the breakpoint comparison is performed only when instruction translation is disabled (the PI bit is 1). Comparisons for translated addresses are further conditioned by the BPID field and the Process Identifier field of the MMU Configuration Register; these fields are ignored if the BTE bit is 0.

Bits 7–0: Breakpoint Process Identifier (BPID)—The BPID field allows the breakpoint comparison of virtual instruction addresses to be associated with a particular process. The BPID field is ignored for untranslated instruction addresses. For a User-mode virtual instruction address, the value of the BPID field must match the value of the PID field of the MMU Configuration Register for the breakpoint comparison to be valid. For a Supervisor-mode virtual address, the breakpoint condition is met only if the value of the BPID field is 0.
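The conditions above can be collected into one check per instruction fetch. The following C sketch is an interpretation, not the processor's logic: the fetch-context struct and function names are hypothetical, Monitor mode is not distinguished, and because the text describes BHO only for breakpoint traps, the synchronization-pulse path (BSY = 0) is assumed to leave BHO untouched.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BP_NONE, BP_BREAK, BP_SYNC } bp_action;

/* Hypothetical description of one instruction fetch. */
typedef struct {
    uint32_t addr;        /* instruction address */
    bool     rom_space;   /* fetched from the instruction ROM space */
    bool     translated;  /* instruction translation enabled (PI = 0) */
    bool     user_mode;
    uint8_t  pid;         /* PID field of the MMU Configuration Register */
} fetch_ctx;

static bp_action ibreak_check(uint32_t iba, uint32_t *ibc, const fetch_ctx *f)
{
    uint32_t c    = *ibc;
    bool     ben  = (c >> 11) & 1u;
    bool     bsy  = (c >> 10) & 1u;
    bool     brm  = (c >> 9) & 1u;
    bool     bte  = (c >> 8) & 1u;
    uint8_t  bpid = (uint8_t)(c & 0xFFu);

    if (!ben || (f->addr & ~3u) != (iba & ~3u))
        return BP_NONE;
    if (f->rom_space != brm || f->translated != bte)
        return BP_NONE;
    if (bte && bpid != (f->user_mode ? f->pid : 0u))
        return BP_NONE;                      /* BPID rule for virtual addresses */

    if (!bsy)
        return BP_SYNC;                      /* assumption: BHO not involved */
    if ((c >> 12) & 1u) {                    /* BHO = 1: skip this once */
        *ibc = c & ~(1u << 12);
        return BP_NONE;
    }
    *ibc = c | (1u << 12);                   /* trap taken: set BHO */
    return BP_BREAK;
}
```

Successive valid comparisons at the same address therefore alternate between a trap and a pass-through, which is what lets the processor step past the breakpoint.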

### 3.2.3.26 INSTRUCTION BREAKPOINT ADDRESS 1 (IBA1, REGISTER 25)

This protected special-purpose register contains the address of an instruction breakpoint.

The structure of the Instruction Breakpoint Address 1 Register is identical to that of the Instruction Breakpoint Address 0 Register (Figure 3-24).

### 3.2.3.27 INSTRUCTION BREAKPOINT CONTROL 1 (IBC1, REGISTER 26)

This protected special-purpose register contains control and status information for the instruction breakpoint specified by the Instruction Breakpoint Address 1 Register.

The structure of the Instruction Breakpoint Control 1 Register is identical to that of the Instruction Breakpoint Control 0 Register (Figure 3-25).

### 3.2.3.28 REGISTERS 112–127—RESERVED FOR TESTING

Special-purpose registers 112 to 127 are reserved for hardware testing. In the User Mode, an attempt to read or write these registers causes a Protection Violation trap. In the Supervisor and Monitor Modes, attempted writes have unpredictable effects on processor operation.

### 3.2.3.29 INDIRECT POINTER C (IPC, REGISTER 128)

This unprotected special-purpose register (Figure 3-26) provides the RC-operand register number (see Section 8.3) when an instruction RC field has the value zero (i.e., when Global Register 0 is specified).

**Figure 3-26** Indirect Pointer C Register

<table>
<thead>
<tr>
<th>31–10</th>
<th>9–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>IPC</td>
<td>0 0</td>
</tr>
</tbody>
</table>

Bits 31–10: Reserved.

**Bits 9–2: Indirect Pointer C (IPC)**—The 8-bit IPC field contains an absolute-register number for a general-purpose register. This number directly selects a register (Stack-Pointer addition is not performed in the case of local registers).

**Bits 1–0: Zeros**—The IPC field is aligned for compatibility with word addresses.

### 3.2.3.30 INDIRECT POINTER A (IPA, REGISTER 129)

This unprotected special-purpose register (Figure 3-27) provides the RA-operand register number (see Section 8.3) when an instruction RA field has the value zero (i.e., when Global Register 0 is specified).

**Figure 3-27** Indirect Pointer A Register

<table>
<thead>
<tr>
<th>31–10</th>
<th>9–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>IPA</td>
<td>0 0</td>
</tr>
</tbody>
</table>

Bits 31–10: Reserved.

**Bits 9–2: Indirect Pointer A (IPA)**—The 8-bit IPA field contains an absolute-register number for either a general-purpose register or a local register. This number directly selects a register (Stack-Pointer addition is not performed in the case of local registers).

**Bits 1–0: Zeros**—The IPA field is aligned for compatibility with word addresses.

### 3.2.3.31 INDIRECT POINTER B (IPB, REGISTER 130)

This unprotected special-purpose register (Figure 3-28) provides the RB-operand register number (see Section 8.3) when an instruction RB field has the value zero (i.e., when Global Register 0 is specified).

**Figure 3-28** Indirect Pointer B Register

<table>
<thead>
<tr>
<th>31–10</th>
<th>9–2</th>
<th>1–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>IPB</td>
<td>0 0</td>
</tr>
</tbody>
</table>

Bits 31–10: Reserved.

**Bits 9–2: Indirect Pointer B (IPB)**—The 8-bit IPB field contains an absolute-register number for a general-purpose register. This number directly selects a register (Stack-Pointer addition is not performed in the case of local registers).

**Bits 1–0: Zeros**—The IPB field is aligned for compatibility with word addresses.

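Since the IPA, IPB, and IPC fields occupy bits 9–2 with two zero bits below them, the stored value is the absolute-register number multiplied by 4. A minimal C sketch of the conversion, with hypothetical function names:

```c
#include <stdint.h>

/* Place an 8-bit absolute-register number into bits 9-2. */
static uint32_t ip_encode(uint8_t absreg) { return (uint32_t)absreg << 2; }

/* Recover the absolute-register number from bits 9-2. */
static uint8_t ip_decode(uint32_t ip) { return (uint8_t)((ip >> 2) & 0xFFu); }
```

For example, absolute register 130 is held in an indirect pointer as 520 (130 × 4).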
### 3.2.3.32 Q (Q, REGISTER 131)

The Q Register is an unprotected special-purpose register (Figure 3-29).

### 3.2.3.33 ALU STATUS (ALU, REGISTER 132)

This unprotected special-purpose register (Figure 3-30) holds information about the outcome of Arithmetic/Logic Unit (ALU) operations as well as control for certain operations performed by the Execution Unit.

Bits 31–12: Reserved.

Bit 11: Divide Flag (DF)—The DF bit is used by the instructions that implement division. At the end of a division instruction, this bit is set either to 1 or to the complement of the 33rd bit of the ALU. When a Divide Step instruction is executed, the DF bit determines whether the ALU performs an addition or a subtraction.

Bit 10: Overflow (V)—The V bit indicates that the result of a signed, two's-complement ALU operation required more than 32 bits to represent the result correctly. The value of this bit is determined by exclusive-ORing the ALU carry-out with the carry-in to the most-significant bit for signed, two's-complement operations. This bit is not used for any special purpose in the processor, and is provided for information only.

Bit 9: Negative (N)—The N bit is set with the value of the most-significant bit of the result of an arithmetic or logical operation. If two's-complement overflow occurs, the N bit does not reflect the true sign of the result. This bit is used in divide operations.

Bit 8: Zero (Z)—The Z bit indicates that the result of an arithmetic or logical operation is zero. This bit is not used for any special purpose in the processor, and is provided for information only.

Bit 7: Carry (C)—The C bit stores the carry-out of the ALU for arithmetic operations. It is used by the add-with-carry and subtract-with-carry instructions to generate the carry into the Arithmetic/Logic Unit.
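
The following C sketch models how the V, N, Z, and C bits could be derived for a 32-bit add, including the rule above that V is the exclusive-OR of the ALU carry-out and the carry into the most-significant bit. The `alu_flags` type and `alu_add` function are hypothetical illustrations, not processor or toolchain interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the V, N, Z and C bits of the ALU Status Register. */
typedef struct {
    bool v;  /* bit 10: signed two's-complement overflow */
    bool n;  /* bit 9:  most-significant bit of the result */
    bool z;  /* bit 8:  result is zero */
    bool c;  /* bit 7:  carry-out of the ALU */
} alu_flags;

/* 32-bit add with carry-in: carry_in = false models an ordinary add,
 * carry_in = previous C models an add-with-carry. */
static uint32_t alu_add(uint32_t a, uint32_t b, bool carry_in, alu_flags *f)
{
    uint64_t wide = (uint64_t)a + b + (carry_in ? 1u : 0u);
    uint32_t r = (uint32_t)wide;
    bool carry_out = ((wide >> 32) & 1u) != 0;
    /* r31 = a31 ^ b31 ^ carry-into-bit-31, so that carry can be recovered. */
    bool carry_into_msb = (((a ^ b ^ r) >> 31) & 1u) != 0;

    f->c = carry_out;
    f->v = carry_out != carry_into_msb;   /* XOR of the two carries */
    f->n = (r >> 31) != 0;
    f->z = (r == 0);
    return r;
}
```

Chaining two calls, with the second taking the first call's carry as `carry_in`, sketches the add-with-carry pattern used for a 64-bit addition.
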

Bits 6–5: Byte Pointer (BP)—The BP field holds a 2-bit pointer to a byte within a word. It is used by Insert Byte and Extract Byte instructions. The mapping of the pointer value to the byte position depends on the value of the Byte Order (BO) bit in the Configuration Register.

The most-significant bit of the BP field is used to determine the position of a half-word within a word for the Insert Half-Word, Extract Half-Word, and Extract Half-Word, Sign-Extended instructions. The mapping of the most-significant bit to the half-word position depends on the value of the BO bit in the Configuration Register.

The BP field is set by a Move To Special Register instruction with either the ALU Status Register or the Byte Pointer Register as the destination. It is also set by a load or store instruction if the Set Byte Pointer (SB) bit in the instruction is 1. A load or store sets the BP field either with the two least-significant bits of the address (if the DW bit of the Configuration Register is 0) or with the complement of the Byte Order bit of the Configuration Register (if DW is 1).
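
A minimal C sketch of how a BP value might select a byte or half-word, and how a load or store with SB = 1 and DW = 0 derives BP from the address. The byte-numbering rule for each BO value is an assumption here (bytes counted from the most-significant end when BO is 0); the authoritative mapping is the Configuration Register description. All function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: BO = 0 numbers bytes and half-words from the most-significant
 * end of the word (byte 0 = bits 31-24); BO = 1 numbers them from the
 * least-significant end (byte 0 = bits 7-0). */

static uint8_t select_byte(uint32_t word, unsigned bp, unsigned bo)
{
    unsigned pos = bo ? (bp & 3u) : 3u - (bp & 3u);   /* byte index from bit 0 */
    return (uint8_t)(word >> (8u * pos));
}

static uint16_t select_half(uint32_t word, unsigned bp, unsigned bo)
{
    unsigned msb = (bp >> 1) & 1u;                     /* only BP bit 1 is used */
    unsigned pos = bo ? msb : 1u - msb;                /* half-word index from bit 0 */
    return (uint16_t)(word >> (16u * pos));
}

/* Load/store with SB = 1 and DW = 0: BP takes the two low address bits. */
static unsigned bp_from_address(uint32_t addr)
{
    return addr & 3u;
}
```
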

Bits 4–0: Funnel Shift Count (FC)—The FC field contains a 5-bit shift count for the Funnel Shifter. The Funnel Shifter concatenates two source-operands into a single 64-bit operand and extracts a 32-bit result from this 64-bit operand; the FC field specifies the number of bit positions from the most-significant bit of the 64-bit operand to the most-significant bit of the 32-bit result. The FC field is used by the EXTRACT instruction.

The FC field is set by a Move To Special Register instruction with either the ALU Status Register or the Funnel Shift Count Register as the destination.
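
The Funnel Shifter description above can be sketched in C as follows. The assumption that the first argument supplies the most-significant word of the 64-bit operand (SRCA for EXTRACT) is labeled in the code; `funnel_extract` is a hypothetical name.

```c
#include <stdint.h>

/* Sketch of the Funnel Shifter. Assumption: hi is the most-significant word
 * of the 64-bit operand (SRCA for EXTRACT) and lo the least-significant. */
static uint32_t funnel_extract(uint32_t hi, uint32_t lo, unsigned fc)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    /* Skip fc bits below the top of the 64-bit operand, keep the next 32. */
    return (uint32_t)((wide << (fc & 31u)) >> 32);
}
```

Passing the same word as both halves, for example `funnel_extract(x, x, 8)`, rotates `x` left by 8 bits, which matches the rotate equivalence noted for EXTRACT in Section 3.3.4.
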

### 3.2.3.34 BYTE POINTER (BP, REGISTER 133)

This unprotected special-purpose register (Figure 3-31) provides an alternate access to the BP field in the ALU Status Register.

<table>
<thead>
<tr>
<th>Figure 3-31</th>
<th>Byte Pointer Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bits 31–2</td>
<td>Zeros</td>
</tr>
<tr>
<td>Bits 1–0</td>
<td>Byte Pointer (BP)</td>
</tr>
</tbody>
</table>

- Bits 31–2: Zeros.
- Bits 1–0: Byte Pointer (BP)—This field allows a program to change the BP field without affecting other fields in the ALU Status Register.

### 3.2.3.35 FUNNEL SHIFT COUNT (FC, REGISTER 134)

This unprotected special-purpose register (Figure 3-32) provides an alternate access to the FC field in the ALU Status Register.

<table>
<thead>
<tr>
<th>Figure 3-32</th>
<th>Funnel Shift Count Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bits 31–5</td>
<td>Zeros</td>
</tr>
<tr>
<td>Bits 4–0</td>
<td>Funnel Shift Count (FC)</td>
</tr>
</tbody>
</table>

- Bits 31–5: Zeros.
- Bits 4–0: Funnel Shift Count (FC)—This field allows a program to change the FC field without affecting other fields in the ALU Status Register.

### 3.2.3.36 LOAD/STORE COUNT REMAINING (CR, REGISTER 135)

This unprotected special-purpose register (Figure 3-33) provides alternate access to the CR field in the Channel Control Register.

Figure 3-33  Load/Store Count Remaining Register

<table>
<thead>
<tr>
<th>Bits</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31-8</td>
<td>Zeros</td>
</tr>
<tr>
<td>7-0</td>
<td>Load/Store Count Remaining (CR)</td>
</tr>
</tbody>
</table>

- Bits 7-0: Load/Store Count Remaining (CR)—This field allows a program to change the CR field without affecting other fields in the Channel Control Register, and is used to initialize the value before a Load Multiple or Store Multiple instruction is executed.

### 3.2.3.37 FLOATING-POINT ENVIRONMENT (FPE, REGISTER 160)

This unprotected special-purpose register (Figure 3-34) contains control bits that affect the execution of floating-point operations. Writing the Floating-Point Environment Register is a serializing operation; that is, all currently executing floating-point operations are completed before the write is performed.

Figure 3-34  Floating-Point Environment Register

<table>
<thead>
<tr>
<th>Bits</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31-11</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Bits 10–9: Accumulator Format (ACF)—The ACF field specifies the format of the Floating-Point Accumulator Registers, as follows:

<table>
<thead>
<tr>
<th>ACF1-0</th>
<th>Accumulator Format</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>Reserved</td>
</tr>
<tr>
<td>01</td>
<td>Single-Precision</td>
</tr>
<tr>
<td>10</td>
<td>Double-Precision</td>
</tr>
<tr>
<td>11</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Bit 8: Fast Float Select (FF)—When the FF bit is 1, fast floating-point operations are enabled; these do not meet certain requirements of the IEEE floating-point specification. This trades IEEE conformance for better performance in certain operations. The fast floating-point operations are discussed in Section 7.2.8.

Bits 7–6: Floating-Point Round Mode (FRM)—This field specifies the default mode used to round the results of floating-point operations, as follows:

<table>
<thead>
<tr>
<th>FRM1–0</th>
<th>Round Mode</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>Round to nearest</td>
</tr>
<tr>
<td>01</td>
<td>Round to $-\infty$</td>
</tr>
<tr>
<td>10</td>
<td>Round to $+\infty$</td>
</tr>
<tr>
<td>11</td>
<td>Round to zero</td>
</tr>
</tbody>
</table>

Rounding is discussed in Section 7.2.7.
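
For a host-side simulator, the FRM encodings above correspond to the C99 `<fenv.h>` rounding-mode macros. This sketch assumes the host defines all four macros (each one is optional in C99); `frm_to_fenv` is a hypothetical helper whose result could be passed to `fesetround`.

```c
#include <fenv.h>
#include <stdint.h>

/* Map the FPE FRM field (bits 7-6) to the <fenv.h> macro with the same
 * meaning. Assumes the host defines all four FE_* rounding macros. */
static int frm_to_fenv(uint32_t fpe)
{
    switch ((fpe >> 6) & 3u) {
    case 0:  return FE_TONEAREST;   /* 00: round to nearest */
    case 1:  return FE_DOWNWARD;    /* 01: round to -infinity */
    case 2:  return FE_UPWARD;      /* 10: round to +infinity */
    default: return FE_TOWARDZERO;  /* 11: round to zero */
    }
}
```
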

Bit 5: Floating-Point Divide-By-Zero Mask (DM)—If the DM bit is 0, a Floating-Point Exception trap occurs when the divisor of a floating-point division operation is zero and the dividend is a non-zero, finite number. If the DM bit is 1, a Floating-Point Exception trap does not occur for divide-by-zero.

Bit 4: Floating-Point Inexact Result Mask (XM)—If the XM bit is 0, a Floating-Point Exception trap occurs when the result of a floating-point operation is not equal to the infinitely precise result. If the XM bit is 1, a Floating-Point Exception trap does not occur for an inexact result.

Bit 3: Floating-Point Underflow Mask (UM)—If the UM bit is 0, a Floating-Point Exception trap occurs when the result of a floating-point operation is too small to be expressed in the destination format. If the UM bit is 1, a Floating-Point Exception trap does not occur for underflow.

Bit 2: Floating-Point Overflow Mask (VM)—If the VM bit is 0, a Floating-Point Exception trap occurs when the result of a floating-point operation is too large to be expressed in the destination format. If the VM bit is 1, a Floating-Point Exception trap does not occur for overflow.

Bit 1: Floating-Point Reserved Operand Mask (RM)—If the RM bit is 0, a Floating-Point Exception trap occurs when one or more input operands to a floating-point operation are reserved values, or when the result of a floating-point operation is a reserved value. If the RM bit is 1, a Floating-Point Exception trap does not occur for reserved operands.

Bit 0: Floating-Point Invalid Operation Mask (NM)—If the NM bit is 0, a Floating-Point Exception trap occurs when the input operands to a floating-point operation produce an indeterminate result (e.g., $0 \times \infty$). If the NM bit is 1, a Floating-Point Exception trap does not occur for invalid operations.
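
The sketch below packs the FPE fields described above into a register value and reports which raised exceptions would be unmasked, and would therefore trap. The macro and function names are hypothetical, and exceptions are passed as a bit vector using the same positions as the mask bits.

```c
#include <stdint.h>

/* FPE field positions as described above (names are hypothetical). */
#define FPE_ACF_SHIFT 9u          /* bits 10-9: accumulator format */
#define FPE_FF        (1u << 8)   /* bit 8: fast float select */
#define FPE_FRM_SHIFT 6u          /* bits 7-6: round mode */
#define FPE_DM (1u << 5)          /* divide-by-zero mask */
#define FPE_XM (1u << 4)          /* inexact result mask */
#define FPE_UM (1u << 3)          /* underflow mask */
#define FPE_VM (1u << 2)          /* overflow mask */
#define FPE_RM (1u << 1)          /* reserved operand mask */
#define FPE_NM (1u << 0)          /* invalid operation mask */

static uint32_t fpe_make(unsigned acf, int fast, unsigned frm, uint32_t masks)
{
    return ((acf & 3u) << FPE_ACF_SHIFT) | (fast ? FPE_FF : 0u) |
           ((frm & 3u) << FPE_FRM_SHIFT) | (masks & 0x3Fu);
}

/* exc uses the mask-bit positions; a mask bit of 0 means that exception traps. */
static uint32_t fpe_unmasked(uint32_t fpe, uint32_t exc)
{
    return exc & ~fpe & 0x3Fu;
}
```

For example, `fpe_make(2, 0, 0, FPE_XM | FPE_UM)` would select double-precision accumulators and round-to-nearest, and would leave every exception except inexact result and underflow able to trap.
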

### 3.2.3.38 INTEGER ENVIRONMENT (INTE, REGISTER 161)

This unprotected special-purpose register (Figure 3-35) contains control bits which affect the execution of integer multiplication and division operations. Writing the Integer Environment Register is a serializing operation. All currently executing operations are completed before the write is performed.

Bits 31–2: Reserved.

Bit 1: Integer Division Overflow Mask (DO)—If the DO bit is 0, an Out of Range trap occurs when overflow of a signed or unsigned 32-bit result occurs during a DIVIDE or DIVIDU instruction, respectively. If the DO bit is 1, an Out of Range trap does not occur for overflow during integer divide operations.

The DIVIDE and DIVIDU instructions always cause an Out of Range trap upon division by zero, regardless of the value of the DO bit.

Bit 0: Integer Multiplication Overflow Exception Mask (MO)—If the MO bit is 0, an Out of Range trap occurs when overflow of a signed or unsigned 32-bit result occurs during a MULTIPLY or MULTIPLU instruction, respectively. If the MO bit is 1, an Out of Range trap does not occur for overflow during integer multiply operations.
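
The following is a sketch of when the Out of Range trap described for INTE would be taken, written as C predicates over operand values. The function names are hypothetical, and the divide helpers take the 64-bit dividend as a single value rather than modelling how it is assembled from registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTE_DO (1u << 1)   /* Integer Division Overflow Mask */
#define INTE_MO (1u << 0)   /* Integer Multiplication Overflow Mask */

/* MULTIPLY: signed 32 x 32; traps if the product needs more than 32 bits and MO = 0. */
static bool multiply_traps(int32_t a, int32_t b, uint32_t inte)
{
    int64_t p = (int64_t)a * (int64_t)b;            /* exact in 64 bits */
    bool overflow = p < INT32_MIN || p > INT32_MAX;
    return overflow && !(inte & INTE_MO);
}

/* MULTIPLU: unsigned 32 x 32. */
static bool multiplu_traps(uint32_t a, uint32_t b, uint32_t inte)
{
    uint64_t p = (uint64_t)a * (uint64_t)b;
    return p > UINT32_MAX && !(inte & INTE_MO);
}

/* DIVIDU: a zero divisor always traps; otherwise trap on quotient overflow if DO = 0. */
static bool dividu_traps(uint64_t dividend, uint32_t divisor, uint32_t inte)
{
    if (divisor == 0)
        return true;
    return dividend / divisor > UINT32_MAX && !(inte & INTE_DO);
}

/* DIVIDE: signed version of the same rule. */
static bool divide_traps(int64_t dividend, int32_t divisor, uint32_t inte)
{
    if (divisor == 0)
        return true;
    bool overflow;
    if (divisor == -1 && dividend == INT64_MIN) {
        overflow = true;                    /* avoid undefined behaviour in C */
    } else {
        int64_t q = dividend / divisor;
        overflow = q < INT32_MIN || q > INT32_MAX;
    }
    return overflow && !(inte & INTE_DO);
}
```
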

### 3.2.3.39 FLOATING-POINT STATUS (FPS, REGISTER 162)

This unprotected special-purpose register (Figure 3-36) contains status bits indicating the outcome of floating-point operations.

The floating-point status bits are divided into two groups. The first group consists of the sticky status bits (DS, XS, US, VS, RS, and NS), which, once set, remain set until explicitly cleared by a Move-to-Special-Register (MTSR) or Move-to-Special-Register-Immediate (MTSRIM) instruction. Sticky status bits are updated in either of two ways:

1. For floating-point operations that do not cause a Floating-Point Exception trap (FMAC, DMAC, FMSM, DMSM, and MTACC), all sticky status bits are updated at the end of instruction execution.

2. For all other floating-point operations, including CONVERT, only those sticky status bits corresponding to masked exceptions are updated. The update occurs at the end of instruction execution.

The second group consists of the trap status bits (DT, XT, UT, VT, RT, and NT), which report the status of an operation for which a Floating-Point Exception trap is taken. These bits are updated only by an operation which takes a trap as a result of an unmasked Floating-Point Exception; all other operations leave these bits unchanged. A trap status bit is updated regardless of the state of the corresponding exception mask in the Floating-Point Environment Register.

Reading or writing the Floating-Point Status Register is a serializing operation. All currently executing floating-point operations are completed before the read or write is performed.

Figure 3-36  Floating-Point Status Register

<table>
<thead>
<tr>
<th>Bits</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31-14</td>
<td>Reserved</td>
</tr>
<tr>
<td>13-8</td>
<td>Trap status bits: DT, XT, UT, VT, RT, NT</td>
</tr>
<tr>
<td>7-6</td>
<td>Reserved</td>
</tr>
<tr>
<td>5-0</td>
<td>Sticky status bits: DS, XS, US, VS, RS, NS</td>
</tr>
</tbody>
</table>

Bits 31–14: Reserved.

Bit 13: Floating-Point Divide By Zero Trap (DT)—The DT bit is set when a Floating-Point Exception trap occurs, and the associated floating-point operation is a divide with a zero divisor and a non-zero, finite dividend. Otherwise, this bit is reset when a Floating-Point Exception trap occurs.

Bit 12: Floating-Point Inexact Result Trap (XT)—The XT bit is set when a Floating-Point Exception trap occurs, and the result of the associated floating-point operation is not equal to the infinitely precise result. Otherwise, this bit is reset when a Floating-Point Exception trap occurs.

Bit 11: Floating-Point Underflow Trap (UT)—The UT bit is set when a Floating-Point Exception trap occurs, and the result of the associated floating-point operation is too small to be expressed in the destination format. Otherwise, this bit is reset when a Floating-Point Exception trap occurs.

Bit 10: Floating-Point Overflow Trap (VT)—The VT bit is set when a Floating-Point Exception trap occurs, and the result of the associated floating-point operation is too large to be expressed in the destination format. Otherwise, this bit is reset when a Floating-Point Exception trap occurs.

Bit 9: Floating-Point Reserved Operand Trap (RT)—The RT bit is set when a Floating-Point Exception trap occurs, and the result of the associated floating-point operation is a reserved value. Otherwise, this bit is reset when a Floating-Point Exception trap occurs.

Bit 8: Floating-Point Invalid Operation Trap (NT)—The NT bit is set when a Floating-Point Exception trap occurs, and the input operands to the associated floating-point operation produce an indeterminate result. Otherwise, this bit is reset when a Floating-Point Exception trap occurs.

Bits 7–6: Reserved.

Bit 5: Floating-Point Divide By Zero Sticky (DS)—The DS bit is set when the DM bit of the Floating-Point Environment Register is 1, the divisor of a floating-point division operation is zero, and the dividend is a non-zero, finite number.

Bit 4: Floating-Point Inexact Result Sticky (XS)—The XS bit is set when the XM bit of the Floating-Point Environment Register is 1, and the result of a floating-point operation is not equal to the infinitely precise result.

Bit 3: Floating-Point Underflow Sticky (US)—The US bit is set when the UM bit of the Floating-Point Environment Register is 1, and the result of a floating-point operation is too small to be expressed in the destination format.

Bit 2: Floating-Point Overflow Sticky (VS)—The VS bit is set when the VM bit of the Floating-Point Environment Register is 1, and the result of a floating-point operation is too large to be expressed in the destination format.

Bit 1: Floating-Point Reserved Operand Sticky (RS)—The RS bit is set when the RM bit of the Floating-Point Environment Register is 1 and either one or more input operands to a floating-point operation are reserved values or the result of a floating-point operation is a reserved value.

Bit 0: Floating-Point Invalid Operation Sticky (NS)—The NS bit is set when the NM bit of the Floating-Point Environment Register is 1, and the input operands to a floating-point operation produce an indeterminate result.
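
The two FPS update rules can be summarized in a small C model. It is an illustrative sketch, not a description of the hardware's internal logic: exceptions are passed as a 6-bit vector in the same order as the FPE masks, `can_trap` distinguishes FMAC, DMAC, FMSM, DMSM, and MTACC from other operations, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per exception, in the order shared by the FPE masks (DM..NM,
 * bits 5-0) and the FPS sticky bits (DS..NS, bits 5-0). */
#define FPX_DIVZERO    (1u << 5)
#define FPX_INEXACT    (1u << 4)
#define FPX_UNDERFLOW  (1u << 3)
#define FPX_OVERFLOW   (1u << 2)
#define FPX_RESERVED   (1u << 1)
#define FPX_INVALID    (1u << 0)
#define FPX_ALL        0x3Fu
#define FPS_TRAP_SHIFT 8u          /* DT..NT occupy bits 13-8 */

/* Apply the exceptions exc raised by one operation to *fps.
 * Returns true if a Floating-Point Exception trap is taken. */
static bool fps_update(uint32_t *fps, uint32_t fpe, uint32_t exc, bool can_trap)
{
    uint32_t masks = fpe & FPX_ALL;
    exc &= FPX_ALL;

    if (!can_trap) {               /* rule 1: all sticky bits updated */
        *fps |= exc;
        return false;
    }
    *fps |= exc & masks;           /* rule 2: only masked exceptions are sticky */
    if ((exc & ~masks) == 0)
        return false;              /* no trap: trap status bits unchanged */

    /* Trap taken: every trap status bit is rewritten, whatever its mask. */
    *fps = (*fps & ~(FPX_ALL << FPS_TRAP_SHIFT)) | (exc << FPS_TRAP_SHIFT);
    return true;
}
```
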

### 3.2.3.40 EXCEPTION OPCODE (EXOP, REGISTER 164)

This unprotected special-purpose register (Figure 3-37) reports the opcode of an instruction causing an Illegal Opcode, Floating-Point Exception, or Out-of-Range trap. Writing the Exception Opcode Register is a serializing operation. All currently executing floating-point operations are completed before the write is performed.

Bits 31–8: Reserved.

Bits 7–0: Instruction Opcode (IOP)—This field captures the opcode of an instruction causing a trap as a result of instruction execution; the opcode is captured as the instruction enters the write-back stage of the processor pipeline. Instructions that do not trap as a consequence of execution do not modify the IOP field.

The Exception Opcode Register can be written explicitly by using it as the destination of a Move-to-Special-Register (MTSR) instruction.

### 3.2.4 TLB Registers

The Am29050 microprocessor contains 128 Translation Look-Aside Buffer (TLB) registers. The organization of the TLB registers is shown in Figure 3-38.

Figure 3-38 Translation Look-Aside Buffer Registers

<table>
<thead>
<tr>
<th>TLB Reg#</th>
<th>TLB Set 0</th>
<th>TLB Reg#</th>
<th>TLB Set 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>TLB Entry Line 0 Word 0</td>
<td>64</td>
<td>TLB Entry Line 0 Word 0</td>
</tr>
<tr>
<td>1</td>
<td>TLB Entry Line 0 Word 1</td>
<td>65</td>
<td>TLB Entry Line 0 Word 1</td>
</tr>
<tr>
<td>2</td>
<td>TLB Entry Line 1 Word 0</td>
<td>66</td>
<td>TLB Entry Line 1 Word 0</td>
</tr>
<tr>
<td>3</td>
<td>TLB Entry Line 1 Word 1</td>
<td>67</td>
<td>TLB Entry Line 1 Word 1</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>62</td>
<td>TLB Entry Line 31 Word 0</td>
<td>126</td>
<td>TLB Entry Line 31 Word 0</td>
</tr>
<tr>
<td>63</td>
<td>TLB Entry Line 31 Word 1</td>
<td>127</td>
<td>TLB Entry Line 31 Word 1</td>
</tr>
</tbody>
</table>

The TLB registers comprise the TLB entries and are provided so that programs can inspect and alter TLB entries. This allows the loading, invalidation, saving, and restoring of TLB entries.

TLB registers have fields that are reserved for future processor implementations. When a TLB register is read, a bit in a reserved field is read as a 0. An attempt to write a reserved bit with a 1 has no effect; however, this should be avoided because of upward-compatibility considerations.

The Translation Look-aside Buffer (TLB) registers are accessed only by explicit data movement by Supervisor-mode programs. Instructions that move data to or from a TLB register specify a general-purpose register containing a TLB register number. The TLB register number is given by the contents of bits 6–0 of the general-purpose register. TLB register numbers may only be specified indirectly by general-purpose registers.

TLB entries are accessed as registers numbered 0–127. Since two words are required to completely specify a TLB entry, two registers are required for each TLB entry. The words corresponding to an entry are paired as two sequentially numbered registers starting on an even-numbered register. The word with the even register number is called Word 0, and the word with the odd register number is called Word 1. The entries for TLB Set 0 are in registers numbered 0–63, and the entries for TLB Set 1 are in registers numbered 64–127.
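
The register numbering above reduces to simple arithmetic. Here is a minimal C sketch with the hypothetical helper `tlb_reg_number`; the result always fits in the 7 bits (6–0) that a general-purpose register supplies.

```c
/* TLB register number for (set, line, word): Set 0 uses 0-63 and Set 1 uses
 * 64-127; Word 0 is the even register and Word 1 the following odd one. */
static unsigned tlb_reg_number(unsigned set, unsigned line, unsigned word)
{
    return (set & 1u) * 64u + (line & 31u) * 2u + (word & 1u);
}
```
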

### 3.2.4.1 TLB ENTRY WORD 0

The TLB Entry Word 0 register is shown in Figure 3-39.

#### Figure 3-39 TLB Entry Word 0 Register

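<table>
<thead>
<tr>
<th>Bits</th>
<th>Field</th>
</tr>
</thead>
<tbody>
<tr>
<td>31–15</td>
<td>Virtual Tag (VTAG)</td>
</tr>
<tr>
<td>14</td>
<td>Valid Entry (VE)</td>
</tr>
<tr>
<td>13</td>
<td>Supervisor Read (SR)</td>
</tr>
<tr>
<td>12</td>
<td>Supervisor Write (SW)</td>
</tr>
<tr>
<td>11</td>
<td>Supervisor Execute (SE)</td>
</tr>
<tr>
<td>10</td>
<td>User Read (UR)</td>
</tr>
<tr>
<td>9</td>
<td>User Write (UW)</td>
</tr>
<tr>
<td>8</td>
<td>User Execute (UE)</td>
</tr>
<tr>
<td>7–0</td>
<td>Task Identifier (TID)</td>
</tr>
</tbody>
</table>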

Bits 31–15: Virtual Tag (VTAG)—When the TLB is searched for an address translation, the VTAG field of the TLB entry must match the most-significant 17, 16, 15, or 14 bits of the address being translated (for page sizes of 1, 2, 4, and 8 kb, respectively) for the search to be successful.

When software loads a TLB entry with an address translation, the most-significant 14 bits of the Virtual Tag are set with the most-significant 14 bits of the virtual address whose translation is being loaded into the TLB. The remaining three bits of the Virtual Tag must be set either to the corresponding bits of the address, or to zeros, depending on the page size, as follows (A refers to corresponding address bits):

<table>
<thead>
<tr>
<th>Page Size</th>
<th>VTAG 2–0 (TLB Word 0 Bits 17–15)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 kb</td>
<td>AAA</td>
</tr>
<tr>
<td>2 kb</td>
<td>AA0</td>
</tr>
<tr>
<td>4 kb</td>
<td>A00</td>
</tr>
<tr>
<td>8 kb</td>
<td>000</td>
</tr>
</tbody>
</table>

Bit 14: Valid Entry (VE)—If this bit is 1, the associated TLB entry is valid; if it is 0, the entry is invalid.

Bit 13: Supervisor Read (SR)—If the SR bit is 1, Supervisor-mode load operations from the virtual page are allowed; if it is 0, Supervisor-mode loads are not allowed.

Bit 12: Supervisor Write (SW)—If the SW bit is 1, Supervisor-mode store operations to the virtual page are allowed; if it is 0, Supervisor-mode stores are not allowed.

Bit 11: Supervisor Execute (SE)—If the SE bit is 1, Supervisor-mode instruction accesses to the virtual page are allowed; if it is 0, Supervisor-mode instruction accesses are not allowed.

Bit 10: User Read (UR)—If the UR bit is 1, User-mode load operations from the virtual page are allowed; if it is 0, User-mode loads are not allowed.

Bit 9: User Write (UW)—If the UW bit is 1, User-mode store operations to the virtual page are allowed; if it is 0, User-mode stores are not allowed.

Bit 8: User Execute (UE)—If the UE bit is 1, User-mode instruction accesses to the virtual page are allowed; if it is 0, User-mode instruction accesses are not allowed.

Bits 7–0: Task Identifier (TID)—When the TLB is searched for an address translation, the TID must match the Process Identifier (PID) in the MMU Configuration Register for the translation to be successful. This field allows the TLB entry to be associated with a particular process.
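
The following C sketch builds a TLB Entry Word 0 value and performs the tag and TID comparison described above. Page sizes are encoded as `ps` = 0, 1, 2, 3 for 1, 2, 4, and 8 kb; that encoding and all names are hypothetical. Selection of the TLB line is not described in this section and is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_VE (1u << 14)
#define TLB_SR (1u << 13)
#define TLB_SW (1u << 12)
#define TLB_SE (1u << 11)
#define TLB_UR (1u << 10)
#define TLB_UW (1u << 9)
#define TLB_UE (1u << 8)
#define TLB_PERM_MASK 0x3F00u

/* Bits 31-15 of va with the low ps VTAG bits forced to zero
 * (the AAA / AA0 / A00 / 000 table above). */
static uint32_t tlb_vtag(uint32_t va, unsigned ps)
{
    uint32_t zero_bits = ((1u << ps) - 1u) << 15;
    return va & 0xFFFF8000u & ~zero_bits;
}

static uint32_t tlb_word0(uint32_t va, unsigned ps, uint32_t perms, uint8_t tid)
{
    return tlb_vtag(va, ps) | TLB_VE | (perms & TLB_PERM_MASK) | tid;
}

/* Valid bit, tag and TID checks for an entry in the already-selected line. */
static bool tlb_word0_matches(uint32_t word0, uint32_t va, unsigned ps, uint8_t pid)
{
    return (word0 & TLB_VE) != 0 &&
           (word0 & 0xFFFF8000u) == tlb_vtag(va, ps) &&
           (word0 & 0xFFu) == pid;
}
```
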

### 3.2.4.2 TLB ENTRY WORD 1

The TLB Entry Word 1 register is shown in Figure 3-40.

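<table>
<thead>
<tr>
<th>Figure 3-40</th>
<th>TLB Entry Word 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bits 31–10</td>
<td>Real Page Number (RPN)</td>
</tr>
<tr>
<td>Bits 9–8</td>
<td>Reserved</td>
</tr>
<tr>
<td>Bits 7–6</td>
<td>User Programmable (PGM)</td>
</tr>
<tr>
<td>Bits 5–2</td>
<td>Reserved</td>
</tr>
<tr>
<td>Bit 1</td>
<td>Usage (U)</td>
</tr>
<tr>
<td>Bit 0</td>
<td>Input/Output (IO)</td>
</tr>
</tbody>
</table>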

Bits 31–10: Real Page Number (RPN)—The RPN field gives the most-significant 22, 21, 20, or 19 bits of the physical address of the page for page sizes of 1, 2, 4, and 8 kb, respectively. It is concatenated with bits 9–0, 10–0, 11–0, or 12–0 of the address being translated (for 1, 2, 4, and 8 kb page sizes, respectively) to form the physical address for the access.

When software loads a TLB entry with an address translation, the most-significant 19 bits of the Real Page Number are set with the most-significant 19 bits of the physical address associated with the translation. The remaining three bits of the Real Page Number must be set either to the corresponding bits of the physical address, or to zeros, depending on the page size, as follows (A refers to corresponding address bits):

<table>
<thead>
<tr>
<th>Page Size</th>
<th>RPN 2–0 (TLB Word 1 Bits 12–10)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 kb</td>
<td>AAA</td>
</tr>
<tr>
<td>2 kb</td>
<td>AA0</td>
</tr>
<tr>
<td>4 kb</td>
<td>A00</td>
</tr>
<tr>
<td>8 kb</td>
<td>000</td>
</tr>
</tbody>
</table>

Bits 7–6: User Programmable (PGM)—These bits are placed on the MPGM(1–0) outputs when the address is transmitted for an access. They have no predefined effect on the access; any effect is defined by logic external to the processor.

Bit 1: Usage (U)—This bit indicates which entry in a given TLB line was least recently used to perform an address translation. If this bit is 0, the entry in Set 0 of the line is least-recently-used; if it is 1, the entry in Set 1 is least-recently-used. This bit has the same value in both entries of a line. Whenever a TLB entry is used to translate an address, the Usage bits of both entries in that line are set according to the TLB set containing the translation. This update occurs whenever the translation is valid, regardless of the outcome of memory-protection checking.

Bit 0: Input/Output (IO)—The IO bit determines whether the access is directed to the instruction/data memory (IO = 0) or the input/output (IO = 1) address space.
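
A corresponding sketch for Word 1 forms the physical address by concatenation, extracts the PGM and IO bits, and picks the least-recently-used set of a line as a replacement candidate. The `ps` encoding (0 to 3 for 1 to 8 kb) and all names are hypothetical; replacing the least-recently-used entry is a common software policy, not a requirement stated here.

```c
#include <stdint.h>

/* ps = 0, 1, 2, 3 for 1, 2, 4, 8 kb pages. */
static uint32_t tlb_physical_address(uint32_t word1, uint32_t va, unsigned ps)
{
    uint32_t offset_mask = (1024u << ps) - 1u;          /* bits 9-0 ... 12-0 */
    uint32_t rpn = word1 & 0xFFFFFC00u & ~offset_mask;  /* used RPN bits only */
    return rpn | (va & offset_mask);
}

static unsigned tlb_pgm(uint32_t word1) { return (word1 >> 6) & 3u; }  /* MPGM(1-0) */
static unsigned tlb_io(uint32_t word1)  { return word1 & 1u; }         /* 1 = I/O space */

/* Least-recently-used set of the line: U = 0 means Set 0, U = 1 means Set 1. */
static unsigned tlb_victim_set(uint32_t word1) { return (word1 >> 1) & 1u; }
```
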

### 3.3 INSTRUCTION SET

The Am29050 microprocessor implements 125 instructions. All instructions execute in a single cycle, except for IRET, IRETINV, LOADM, STOREM, and certain arithmetic instructions such as floating-point instructions.

Most instructions deal with general-purpose registers for operands and results; however, in most instructions, an 8-bit constant can be used in place of a register-based operand. Some instructions deal with special-purpose registers, TLB registers, external devices and memories, and coprocessors.

This section describes the nine instruction classes in the Am29050 microprocessor, and provides a brief summary of instruction operations. A detailed instruction specification is contained in Chapter 8. Section 8.1 describes the nomenclature used here.

If the processor attempts to execute an instruction which is not implemented, an Illegal Opcode trap occurs, unless the instruction is reserved for emulation (see Section 3.3.10). Reserved instructions are assigned separate traps.

#### 3.3.1 Integer Arithmetic

The Integer Arithmetic instructions perform add, subtract, multiply, and divide operations on word-length integers. Certain instructions in this class cause traps if signed or unsigned overflow occurs during the execution of the instruction. There is support for multi-precision arithmetic on operands whose lengths are multiples of words. All instructions in this class set the ALU Status Register. The integer arithmetic instructions are shown in Table 3-1.

#### 3.3.2 Compare

The Compare instructions test for various relationships between two values. For all Compare instructions except the CPBYTE instruction, the comparisons are performed on word-length signed or unsigned integers. There are two types of Compare instructions. The first type places a Boolean value reflecting the outcome of the compare into a general-purpose register. For the second type (assert instructions), instruction execution continues only if the comparison is true; otherwise a trap occurs. The assert instructions specify a vector for the trap (see Section 3.5.4).

The assert instructions support run-time operand checking and operating-system calls. If the trap occurs in User mode and the instruction specifies a trap number between 0 and 63, a Protection Violation trap occurs. The Compare instructions are shown in Table 3-2.

### Table 3-1  Integer Arithmetic Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD</td>
<td>DEST ← SRCA + SRCB</td>
</tr>
<tr>
<td>ADDS</td>
<td>DEST ← SRCA + SRCB<br>IF signed overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>ADDU</td>
<td>DEST ← SRCA + SRCB<br>IF unsigned overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>ADDC</td>
<td>DEST ← SRCA + SRCB + C</td>
</tr>
<tr>
<td>ADDCS</td>
<td>DEST ← SRCA + SRCB + C<br>IF signed overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>ADDCU</td>
<td>DEST ← SRCA + SRCB + C<br>IF unsigned overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUB</td>
<td>DEST ← SRCA - SRCB</td>
</tr>
<tr>
<td>SUBS</td>
<td>DEST ← SRCA - SRCB<br>IF signed overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBU</td>
<td>DEST ← SRCA - SRCB<br>IF unsigned overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBC</td>
<td>DEST ← SRCA - SRCB - 1 + C</td>
</tr>
<tr>
<td>SUBCS</td>
<td>DEST ← SRCA - SRCB - 1 + C<br>IF signed overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBCU</td>
<td>DEST ← SRCA - SRCB - 1 + C<br>IF unsigned overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBR</td>
<td>DEST ← SRCB - SRCA</td>
</tr>
<tr>
<td>SUBRS</td>
<td>DEST ← SRCB - SRCA<br>IF signed overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBRU</td>
<td>DEST ← SRCB - SRCA<br>IF unsigned overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBRC</td>
<td>DEST ← SRCB - SRCA - 1 + C</td>
</tr>
<tr>
<td>SUBRCS</td>
<td>DEST ← SRCB - SRCA - 1 + C<br>IF signed overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>SUBRCU</td>
<td>DEST ← SRCB - SRCA - 1 + C<br>IF unsigned overflow THEN Trap (Out of Range)</td>
</tr>
<tr>
<td>MULTIPLU</td>
<td>DEST ← SRCA · SRCB (unsigned)</td>
</tr>
<tr>
<td>MULTIPLY</td>
<td>DEST ← SRCA · SRCB (signed)</td>
</tr>
<tr>
<td>MUL</td>
<td>Perform one-bit step of a multiply operation (signed)</td>
</tr>
<tr>
<td>MULL</td>
<td>Complete a sequence of multiply steps</td>
</tr>
<tr>
<td>MULTM</td>
<td>DEST ← SRCA · SRCB (signed), most-significant bits</td>
</tr>
<tr>
<td>MULTMU</td>
<td>DEST ← SRCA · SRCB (unsigned), most-significant bits</td>
</tr>
<tr>
<td>MULU</td>
<td>Perform one-bit step of a multiply operation (unsigned)</td>
</tr>
<tr>
<td>DIVIDE</td>
<td>DEST ← (Q//SRCA)/SRCB (signed)</td>
</tr>
<tr>
<td>DIVIDU</td>
<td>DEST ← (Q//SRCA)/SRCB (unsigned)</td>
</tr>
<tr>
<td>DIV0</td>
<td>Initialize for a sequence of divide steps (unsigned)</td>
</tr>
<tr>
<td>DIV</td>
<td>Perform one-bit step of a divide operation (unsigned)</td>
</tr>
<tr>
<td>DIVL</td>
<td>Complete a sequence of divide steps (unsigned)</td>
</tr>
<tr>
<td>DIVREM</td>
<td>Generate remainder for divide operation (unsigned)</td>
</tr>
</tbody>
</table>
### Table 3-2  Compare Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPEQ</td>
<td>IF SRCA = SRCB THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPNEQ</td>
<td>IF SRCA ≠ SRCB THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPLT</td>
<td>IF SRCA &lt; SRCB THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPLTU</td>
<td>IF SRCA &lt; SRCB (unsigned) THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPLE</td>
<td>IF SRCA ≤ SRCB THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPLEU</td>
<td>IF SRCA ≤ SRCB (unsigned) THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPGT</td>
<td>IF SRCA &gt; SRCB THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPGTU</td>
<td>IF SRCA &gt; SRCB (unsigned) THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPGE</td>
<td>IF SRCA ≥ SRCB THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPGEU</td>
<td>IF SRCA ≥ SRCB (unsigned) THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>CPBYTE</td>
<td>IF (SRCA.BYTE0 = SRCB.BYTE0) OR<br>(SRCA.BYTE1 = SRCB.BYTE1) OR<br>(SRCA.BYTE2 = SRCB.BYTE2) OR<br>(SRCA.BYTE3 = SRCB.BYTE3) THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>ASEQ</td>
<td>IF SRCA = SRCB THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASNEQ</td>
<td>IF SRCA ≠ SRCB THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASLT</td>
<td>IF SRCA &lt; SRCB THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASLTU</td>
<td>IF SRCA &lt; SRCB (unsigned) THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASLE</td>
<td>IF SRCA ≤ SRCB THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASLEU</td>
<td>IF SRCA ≤ SRCB (unsigned) THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASGT</td>
<td>IF SRCA &gt; SRCB THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASGTU</td>
<td>IF SRCA &gt; SRCB (unsigned) THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASGE</td>
<td>IF SRCA ≥ SRCB THEN Continue<br>ELSE Trap (VN)</td>
</tr>
<tr>
<td>ASGEU</td>
<td>IF SRCA ≥ SRCB (unsigned) THEN Continue<br>ELSE Trap (VN)</td>
</tr>
</tbody>
</table>
3.3.3 Logical

The Logical instructions perform a set of bit-by-bit Boolean functions on word-length bit strings. All instructions in this class set the ALU Status Register. These instructions are shown in Table 3-3.

Table 3-3 Logical Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>AND</td>
<td>DEST ← SRCA &amp; SRCB</td>
</tr>
<tr>
<td>ANDN</td>
<td>DEST ← SRCA &amp; ~ SRCB</td>
</tr>
<tr>
<td>NAND</td>
<td>DEST ← ~ (SRCA &amp; SRCB)</td>
</tr>
<tr>
<td>OR</td>
<td>DEST ← SRCA | SRCB</td>
</tr>
<tr>
<td>ORN</td>
<td>DEST ← SRCA | ~ SRCB</td>
</tr>
<tr>
<td>NOR</td>
<td>DEST ← ~ (SRCA | SRCB)</td>
</tr>
<tr>
<td>XOR</td>
<td>DEST ← SRCA ^ SRCB</td>
</tr>
<tr>
<td>XNOR</td>
<td>DEST ← ~ (SRCA ^ SRCB)</td>
</tr>
</tbody>
</table>

3.3.4 Shift

The Shift instructions (Table 3-4) perform arithmetic and logical shifts. All but the EXTRACT instruction operate on word-length data and produce a word-length result. The EXTRACT instruction operates on double-word data and produces a word-length result. If both parts of the double-word for the EXTRACT instruction are from the same source, the EXTRACT operation is equivalent to a rotate operation. For each operation, the shift count is a 5-bit integer, specifying a shift amount in the range of 0 to 31 bits.

Table 3-4 Shift Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SLL</td>
<td>DEST ← SRCA &lt;&lt; SRCB  (zero fill)</td>
</tr>
<tr>
<td>SRL</td>
<td>DEST ← SRCA &gt;&gt; SRCB  (zero fill)</td>
</tr>
<tr>
<td>SRA</td>
<td>DEST ← SRCA &gt;&gt; SRCB  (sign fill)</td>
</tr>
<tr>
<td>EXTRACT</td>
<td>DEST ← high-order word of (SRCA//SRCB &lt;&lt; FC)</td>
</tr>
</tbody>
</table>
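The C helpers below are a minimal sketch of the Table 3-4 semantics on 32-bit words. They assume that only the low-order five bits of the shift count are used, which matches the 0 to 31 range given above. They also assume that the EXTRACT count is passed in as the FC value. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: only the low-order five bits of the count are used (0..31). */
static uint32_t sll(uint32_t srca, uint32_t srcb) {
    return srca << (srcb & 31u);                 /* zero fill */
}

static uint32_t srl(uint32_t srca, uint32_t srcb) {
    return srca >> (srcb & 31u);                 /* zero fill */
}

static uint32_t sra(uint32_t srca, uint32_t srcb) {
    unsigned n = srcb & 31u;
    uint32_t r = srca >> n;
    if (srca & 0x80000000u)                      /* sign fill */
        r |= ~(0xFFFFFFFFu >> n);
    return r;
}

/* EXTRACT: high-order word of the 64-bit value SRCA//SRCB shifted left by FC. */
static uint32_t extract(uint32_t srca, uint32_t srcb, uint32_t fc) {
    uint64_t pair = ((uint64_t)srca << 32) | srcb;
    return (uint32_t)((pair << (fc & 31u)) >> 32);
}
```

For example, `extract(x, x, n)` returns `x` rotated left by `n` bits. This is the rotate equivalence described above. The `sra` helper applies sign fill explicitly because right-shifting a negative signed value is implementation-defined in C.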

3.3.5 Data Movement

The Data Movement instructions (Table 3-5) move bytes, half-words, and words between processor registers. In addition, they move data between general-purpose registers and external devices, memories, and the coprocessor.

3.3.6 Constant

The Constant instructions (Table 3-6) provide the ability to place half-word and word constants into registers. Most instructions in the instruction set allow an 8-bit constant as an operand. The Constant instructions allow the construction of larger constants.
### Table 3-5  Data Movement Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>LOAD</td>
<td>DEST ← EXTERNAL WORD [SRCB]</td>
</tr>
<tr>
<td>LOADL</td>
<td>DEST ← EXTERNAL WORD [SRCB]<br>assert LOCK output during access</td>
</tr>
<tr>
<td>LOADSET</td>
<td>DEST ← EXTERNAL WORD [SRCB]<br>EXTERNAL WORD [SRCB] ← h'FFFFFFFF'<br>assert LOCK output during access</td>
</tr>
<tr>
<td>LOADM</td>
<td>DEST .. DEST + COUNT ← EXTERNAL WORD [SRCB] .. EXTERNAL WORD [SRCB + COUNT · 4]</td>
</tr>
<tr>
<td>STORE</td>
<td>EXTERNAL WORD [SRCB] ← SRCA</td>
</tr>
<tr>
<td>STOREL</td>
<td>EXTERNAL WORD [SRCB] ← SRCA<br>assert LOCK output during access</td>
</tr>
<tr>
<td>STOREM</td>
<td>EXTERNAL WORD [SRCB] .. EXTERNAL WORD [SRCB + COUNT · 4] ← SRCA .. SRCA + COUNT</td>
</tr>
<tr>
<td>EXBYTE</td>
<td>DEST ← SRCB, with low-order byte replaced by byte in SRCA selected by BP</td>
</tr>
<tr>
<td>EXHW</td>
<td>DEST ← SRCB, with low-order half-word replaced by half-word in SRCA selected by BP</td>
</tr>
<tr>
<td>EXHWS</td>
<td>DEST ← half-word in SRCA selected by BP, sign-extended to 32 bits</td>
</tr>
<tr>
<td>INBYTE</td>
<td>DEST ← SRCA, with byte selected by BP replaced by low-order byte of SRCB</td>
</tr>
<tr>
<td>INHW</td>
<td>DEST ← SRCA, with half-word selected by BP replaced by low-order half-word of SRCB</td>
</tr>
<tr>
<td>MFSR</td>
<td>DEST ← SPECIAL</td>
</tr>
<tr>
<td>MFTLB</td>
<td>DEST ← TLB [SRCA]</td>
</tr>
<tr>
<td>MTSR</td>
<td>SPDEST ← SRCB</td>
</tr>
<tr>
<td>MTSRIM</td>
<td>SPDEST ← I16, zero-extended to 32 bits</td>
</tr>
<tr>
<td>MTTLB</td>
<td>TLB [SRCA] ← SRCB</td>
</tr>
</tbody>
</table>

### Table 3-6  Constant Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CONST</td>
<td>DEST ← I16, zero-extended to 32 bits</td>
</tr>
<tr>
<td>CONSTH</td>
<td>Replace high-order half-word of SRCA by I16</td>
</tr>
<tr>
<td>CONSTHZ</td>
<td>Replace high-order half-word of SRCA with I16, and replace low-order half-word of SRCA with zeros.</td>
</tr>
<tr>
<td>CONSTN</td>
<td>DEST ← I16, extended to 32 bits with ones</td>
</tr>
</tbody>
</table>
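The sketch below shows how the Constant instructions combine to form a full 32-bit word. It models each instruction as a C function on the 16-bit I16 field. The helper names are hypothetical. Only the bit manipulation follows Table 3-6.

```c
#include <stdint.h>

/* CONST: DEST <- I16, zero-extended to 32 bits. */
static uint32_t op_const(uint16_t i16) { return (uint32_t)i16; }

/* CONSTN: DEST <- I16, extended to 32 bits with ones (a negative value). */
static uint32_t op_constn(uint16_t i16) { return 0xFFFF0000u | i16; }

/* CONSTH: replace the high-order half-word of the register with I16,
   keeping its low-order half-word. */
static uint32_t op_consth(uint32_t reg, uint16_t i16) {
    return (reg & 0x0000FFFFu) | ((uint32_t)i16 << 16);
}
```

Under this model, the word 0x12345678 takes two instructions: CONST for the low-order half-word, then CONSTH for the high-order half-word. CONSTN covers small negative constants in a single instruction; an I16 of 0xFFFE yields 0xFFFFFFFE (-2).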
3.3.7 **Floating-Point**

The Floating-Point instructions (Table 3-7) provide operations on single-precision (32-bit) or double-precision (64-bit) floating-point data. They also provide conversions between single-precision, double-precision, and integer number representations.

3.3.8 **Branch**

The Branch instructions (Table 3-8) control the execution flow of instructions. Branch target addresses may be absolute, relative to the Program Counter (with the offset given by a signed instruction constant), or contained in a general-purpose register. For conditional jumps, the outcome of the jump is based on a Boolean value in a general-purpose register. Procedure calls are unconditional, and save the return address in a general-purpose register. All branches have a delayed effect: the instruction immediately following the branch (the delay instruction) is executed regardless of the outcome of the branch.

3.3.9 **Miscellaneous**

The Miscellaneous instructions (Table 3-9) perform various operations that cannot be grouped into other instruction classes. In certain cases, these are control functions available only to Supervisor-mode programs.

3.3.10 **Reserved Instructions**

Several Am29050 microprocessor operation codes are reserved for instruction emulation. Each of these instructions causes a trap and sets the indirect pointers IPC, IPA, and IPB. Some of these operation codes cause a trap to a unique trap vector, and others cause traps to shared trap vector 28. The relevant operation codes, and the corresponding trap vectors, are:

<table>
<thead>
<tr>
<th>Operation Codes (Hexadecimal)</th>
<th>Trap Vector Numbers (Decimal)</th>
</tr>
</thead>
<tbody>
<tr>
<td>BF, CF-D6, DC</td>
<td>28</td>
</tr>
<tr>
<td>DD</td>
<td>29</td>
</tr>
<tr>
<td>E7</td>
<td>39</td>
</tr>
<tr>
<td>F8</td>
<td>56</td>
</tr>
<tr>
<td>FA–FF</td>
<td>58–63</td>
</tr>
</tbody>
</table>

The reserved instructions are intended for future processor enhancements, and users desiring compatibility with future processor versions should not use them for any purpose.
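The opcode-to-vector table above amounts to a small lookup. The function below is a hypothetical sketch of it, such as emulation software might use when reasoning about which trap a reserved operation code raises. It returns -1 for operation codes that the table does not list.

```c
/* Trap vector (decimal) for a reserved operation code (hexadecimal),
   or -1 if the code is not in the reserved list. */
static int reserved_trap_vector(unsigned opcode) {
    if (opcode == 0xBF || (opcode >= 0xCF && opcode <= 0xD6) || opcode == 0xDC)
        return 28;                               /* shared trap vector */
    if (opcode == 0xDD) return 29;
    if (opcode == 0xE7) return 39;
    if (opcode == 0xF8) return 56;
    if (opcode >= 0xFA && opcode <= 0xFF)
        return 58 + (int)(opcode - 0xFA);        /* FA..FF -> 58..63 */
    return -1;
}
```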

3.4 **DATA FORMATS AND HANDLING**

This section describes the various data types supported by the Am29050 microprocessor, and the mechanisms for accessing data in external devices and memories. The Am29050 microprocessor includes provisions for the external access of bytes, half-words, unaligned words, and unaligned half-words, as described in this section.

3.4.1 **Integer Data Types**

Most Am29050 microprocessor instructions deal directly with word-length integer data; integers may be either signed or unsigned, depending on the instruction. Some instructions (e.g., AND) treat word-length operands as strings of bits. In addition, there is support for character, half-word, and Boolean data types.
Table 3-7 Floating-Point Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>FADD</td>
<td>DEST (single-precision) ← SRCA (single-precision) + SRCB (single-precision)</td>
</tr>
<tr>
<td>DADD</td>
<td>DEST (double-precision) ← SRCA (double-precision) + SRCB (double-precision)</td>
</tr>
<tr>
<td>FSUB</td>
<td>DEST (single-precision) ← SRCA (single-precision) - SRCB (single-precision)</td>
</tr>
<tr>
<td>DSUB</td>
<td>DEST (double-precision) ← SRCA (double-precision) - SRCB (double-precision)</td>
</tr>
<tr>
<td>FMUL</td>
<td>DEST (single-precision) ← SRCA (single-precision) · SRCB (single-precision)</td>
</tr>
<tr>
<td>FDMUL</td>
<td>DEST (double-precision) ← SRCA (single-precision) · SRCB (single-precision)</td>
</tr>
<tr>
<td>DMUL</td>
<td>DEST (double-precision) ← SRCA (double-precision) · SRCB (double-precision)</td>
</tr>
<tr>
<td>FDIV</td>
<td>DEST (single-precision) ← SRCA (single-precision) / SRCB (single-precision)</td>
</tr>
<tr>
<td>DDIV</td>
<td>DEST (double-precision) ← SRCA (double-precision) / SRCB (double-precision)</td>
</tr>
<tr>
<td>FMAC</td>
<td>ACC(ACN) (variable-precision) ← SRCA (single-precision) · SRCB (single-precision) + ACC(ACN) (variable-precision)</td>
</tr>
<tr>
<td>DMAC</td>
<td>ACC(ACN) (double-precision) ← SRCA (double-precision) · SRCB (double-precision) + ACC(ACN) (double-precision)</td>
</tr>
<tr>
<td>FMSM</td>
<td>DEST (single-precision) ← SRCA (single-precision) · ACC(0) (single-precision) + SRCB (single-precision)</td>
</tr>
<tr>
<td>DMSM</td>
<td>DEST (double-precision) ← SRCA (double-precision) · ACC(0) (double-precision) + SRCB (double-precision)</td>
</tr>
<tr>
<td>MFACC</td>
<td>DEST ← ACC(ACN)</td>
</tr>
<tr>
<td>MTACC</td>
<td>ACC(ACN) ← SRCA</td>
</tr>
<tr>
<td>FEQ</td>
<td>IF SRCA (single-precision) = SRCB (single-precision) THEN DEST ← TRUE ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>DEQ</td>
<td>IF SRCA (double-precision) = SRCB (double-precision) THEN DEST ← TRUE ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>FGE</td>
<td>IF SRCA (single-precision) &gt;= SRCB (single-precision) THEN DEST ← TRUE ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>DGE</td>
<td>IF SRCA (double-precision) &gt;= SRCB (double-precision) THEN DEST ← TRUE ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>FGT</td>
<td>IF SRCA (single-precision) &gt; SRCB (single-precision) THEN DEST ← TRUE ELSE DEST ← FALSE</td>
</tr>
</tbody>
</table>
### Table 3-7  Floating-Point Instructions (continued)

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>DGT</td>
<td>IF SRCA (double-precision) &gt; SRCB (double-precision)<br>THEN DEST ← TRUE<br>ELSE DEST ← FALSE</td>
</tr>
<tr>
<td>SQRT</td>
<td>DEST (single-precision, double-precision)<br>← SQRT (SRCA (single-precision, double-precision))</td>
</tr>
<tr>
<td>CONVERT</td>
<td>DEST (integer, single-precision, double-precision)<br>← SRCA (integer, single-precision, double-precision)</td>
</tr>
<tr>
<td>CLASS</td>
<td>DEST ← CLASS (SRCA (single-precision, double-precision))</td>
</tr>
</tbody>
</table>

### Table 3-8  Branch Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CALL</td>
<td>DEST ← PC//00 + 8<br>PC ← TARGET<br>Execute delay instruction</td>
</tr>
<tr>
<td>CALLI</td>
<td>DEST ← PC//00 + 8<br>PC ← SRCB<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMP</td>
<td>PC ← TARGET<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMPI</td>
<td>PC ← SRCB<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMPT</td>
<td>IF SRCA = TRUE THEN PC ← TARGET<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMPTI</td>
<td>IF SRCA = TRUE THEN PC ← SRCB<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMPF</td>
<td>IF SRCA = FALSE THEN PC ← TARGET<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMPFI</td>
<td>IF SRCA = FALSE THEN PC ← SRCB<br>Execute delay instruction</td>
</tr>
<tr>
<td>JMPFDEC</td>
<td>IF SRCA = FALSE THEN<br>SRCA ← SRCA - 1<br>PC ← TARGET<br>ELSE<br>SRCA ← SRCA - 1<br>Execute delay instruction</td>
</tr>
</tbody>
</table>
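The JMPFDEC entry combines the Boolean test of Section 3.4.1.3 with a decrement. The C sketch below models only that control decision. It does not model the pipeline or the delay instruction, and its function names are hypothetical. SRCA is tested before it is decremented, so a loop closed by JMPFDEC keeps branching while the tested counter is non-negative (FALSE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Boolean convention: a word is TRUE iff bit 31 is 1. */
static bool is_true(uint32_t w) { return (w >> 31) != 0; }

/* One JMPFDEC: returns true if the branch is taken. SRCA is decremented
   in both cases, after it has been tested. */
static bool jmpfdec(uint32_t *srca) {
    bool taken = !is_true(*srca);
    *srca -= 1u;                 /* unsigned arithmetic: wraps, no UB */
    return taken;
}

/* Passes through a loop body closed by "JMPFDEC counter, top",
   starting from the given counter value (delay instruction ignored). */
static unsigned loop_body_runs(uint32_t counter) {
    unsigned runs = 0;
    do {
        runs++;                  /* loop body */
    } while (jmpfdec(&counter));
    return runs;
}
```

Under this model, a counter that starts at n gives n + 2 passes through the loop body; for example, a starting value of 0 gives 2 passes. The delay instruction would also execute on every pass.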

### Table 3-9  Miscellaneous Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLZ</td>
<td>Determine number of leading zeros in a word</td>
</tr>
<tr>
<td>SETIP</td>
<td>Set IPA, IPB, and IPC with operand register numbers</td>
</tr>
<tr>
<td>EMULATE</td>
<td>Load IPA and IPB with operand register numbers, and Trap (VN)</td>
</tr>
<tr>
<td>INV</td>
<td>Reset all Valid bits in Branch Target Cache memory to zeros</td>
</tr>
<tr>
<td>IRET</td>
<td>Perform an interrupt return sequence</td>
</tr>
<tr>
<td>IRETINV</td>
<td>Perform an interrupt return sequence, and reset all Valid bits in Branch Target Cache memory to zeros</td>
</tr>
<tr>
<td>HALT</td>
<td>Enter Halt mode</td>
</tr>
</tbody>
</table>
3.4.1.1 BYTE OPERATIONS

The processor supports character data through load, store, extraction and insertion operations on word-length operands, and by a compare operation on byte-length fields within words. The format for unsigned and signed characters is shown in Figure 3-41; for signed characters, the sign bit is the most-significant bit of the character. For sequences of packed characters within words, bytes are ordered either left-to-right or right-to-left, depending on the BO bit of the Configuration Register (see Section 3.4.5.2).

Figure 3-41 Character Format

<table>
<thead>
<tr>
<th>Format</th>
<th>Bits 31–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Unsigned</td>
<td>0 (zero fill)</td>
<td>Data</td>
</tr>
<tr>
<td>Signed</td>
<td>s (copies of the sign bit, bit 7)</td>
<td>Data</td>
</tr>
</tbody>
</table>

If the Data Width Enable (DW) bit of the Configuration Register is 1, the Am29050 microprocessor is enabled to load and store byte data. On a load, an external packed byte is converted to one of the character formats shown in Figure 3-41. On a store, the low-order byte of a word is packed into every byte of an external word. Section 3.4.6 describes external byte accesses in more detail.

The Extract Byte (EXBYTE) instruction replaces the low-order character of a destination word with an arbitrary byte-aligned character from a source word. For the EXBYTE instruction, the destination word can be a zero word, which effectively zero-extends the character from the source operand.

The Insert Byte (INBYTE) instruction replaces an arbitrary byte-aligned character in a destination word with the low-order character of a source word. For the INBYTE instruction, the source operand can be a character constant specified by the instruction.
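The following C sketch models EXBYTE and INBYTE, with the Byte Pointer (BP) selecting a byte. It assumes that when the Byte Order (BO) bit is 0, bytes are numbered left to right, so BP = 0 selects bits 31–24. It also assumes that BO = 1 reverses this numbering. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bit position of the byte selected by BP; assumption: BO = 0 means
   BP = 0 is the most-significant byte (bits 31-24). */
static unsigned byte_shift(unsigned bp, int bo) {
    bp &= 3u;
    return bo ? bp * 8u : (3u - bp) * 8u;
}

/* EXBYTE: DEST <- SRCB with its low-order byte replaced by the byte of
   SRCA selected by BP. */
static uint32_t exbyte(uint32_t srca, uint32_t srcb, unsigned bp, int bo) {
    uint32_t byte = (srca >> byte_shift(bp, bo)) & 0xFFu;
    return (srcb & ~0xFFu) | byte;
}

/* INBYTE: DEST <- SRCA with the byte selected by BP replaced by the
   low-order byte of SRCB. */
static uint32_t inbyte(uint32_t srca, uint32_t srcb, unsigned bp, int bo) {
    unsigned sh = byte_shift(bp, bo);
    return (srca & ~(0xFFu << sh)) | ((srcb & 0xFFu) << sh);
}
```

Passing 0 as the SRCB operand of `exbyte` gives the zero-extension case described above.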

The Compare Bytes (CPBYTE) instruction compares two word-length operands and gives a result of TRUE if any corresponding bytes within the operands have equivalent values. This allows programs to detect characters within words without first having to extract individual characters, one at a time, from the word of interest.
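CPBYTE's test can be sketched in C as follows. The function name is hypothetical. The TRUE result is taken to be 0x80000000, following the Boolean format of Section 3.4.1.3, where the compare instructions reset all bits other than bit 31.

```c
#include <stdint.h>

/* CPBYTE: TRUE if any corresponding pair of bytes in a and b is equal. */
static uint32_t cpbyte(uint32_t a, uint32_t b) {
    uint32_t x = a ^ b;          /* a byte of x is zero where the bytes match */
    for (int i = 0; i < 4; i++) {
        if (((x >> (8 * i)) & 0xFFu) == 0u)
            return 0x80000000u;  /* TRUE */
    }
    return 0u;                   /* FALSE */
}
```

For instance, `cpbyte(word, 0)` is TRUE when the word contains a zero byte. A string scan can therefore test four characters with one comparison.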

3.4.1.2 HALF-WORD OPERATIONS

The processor supports half-word data through load, store, insertion and extraction operations on word-length operands. The format for unsigned and signed half-words is shown in Figure 3-42; for signed half-words, the sign bit is the most-significant bit of the half-word. For sequences of packed half-words within words, half-words are ordered either left-to-right or right-to-left, depending on the Byte Order (BO) bit of the Configuration Register (see Section 3.4.5.2).

If the Data Width Enable (DW) bit of the Configuration Register is 1, the Am29050 microprocessor is enabled to load and store half-word data. On a load, an external packed half-word is converted to one of the formats shown in Figure 3-42. On a store, the low-order half-word of a word is packed into every half-word of an external word. Section 3.4.5 describes external half-word accesses in more detail.
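The half-word extraction instructions of Table 3-5 can be sketched in the same way. Two assumptions are made. First, only bit 1 of BP selects a half-word. Second, BO = 0 orders half-words left to right, so BP = 0 selects bits 31–16 and BP = 2 selects bits 15–0. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bit position of the selected half-word (assumed ordering, see above). */
static unsigned hw_shift(unsigned bp, int bo) {
    unsigned idx = (bp >> 1) & 1u;               /* half-word 0 or 1 */
    return bo ? idx * 16u : (1u - idx) * 16u;
}

/* EXHW: DEST <- SRCB with its low-order half-word replaced by the
   half-word of SRCA selected by BP. */
static uint32_t exhw(uint32_t srca, uint32_t srcb, unsigned bp, int bo) {
    uint32_t hw = (srca >> hw_shift(bp, bo)) & 0xFFFFu;
    return (srcb & 0xFFFF0000u) | hw;
}

/* EXHWS: selected half-word of SRCA, sign-extended to 32 bits. */
static uint32_t exhws(uint32_t srca, unsigned bp, int bo) {
    uint32_t hw = (srca >> hw_shift(bp, bo)) & 0xFFFFu;
    return (hw ^ 0x8000u) - 0x8000u;             /* portable sign extension */
}
```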
### 3.4.1.3 BOOLEAN DATA

Some instructions in the Compare class generate word-length Boolean results, and conditional branches test Boolean operands. The processor represents the Boolean values TRUE and FALSE by a 1 or 0, respectively, in the most-significant bit of a word. The remaining bits are not significant; the compare instructions reset them to 0. Note that, in this encoding scheme, two’s-complement negative integers have the Boolean value TRUE.
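The C sketch below illustrates this encoding together with the signed and unsigned forms of a compare instruction. The macro and function names are hypothetical. For the signed compare, flipping bit 31 maps two's-complement order onto unsigned order, which avoids an implementation-defined cast.

```c
#include <stdbool.h>
#include <stdint.h>

#define AM_TRUE  0x80000000u   /* bit 31 set, remaining bits reset */
#define AM_FALSE 0x00000000u

/* Any word whose most-significant bit is 1 is TRUE. */
static bool am_is_true(uint32_t w) { return (w >> 31) != 0; }

/* CPLT: signed comparison. */
static uint32_t cplt(uint32_t a, uint32_t b) {
    return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? AM_TRUE : AM_FALSE;
}

/* CPLTU: unsigned comparison. */
static uint32_t cpltu(uint32_t a, uint32_t b) {
    return (a < b) ? AM_TRUE : AM_FALSE;
}
```

With these definitions, 0xFFFFFFFF is less than 1 when compared as a signed value (-1 < 1) but not when compared as an unsigned value. Any negative integer, such as 0xFFFFFFFE, tests as TRUE.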

### 3.4.2 Floating-Point Data Types

The Am29050 microprocessor supports single- and double-precision floating-point formats that comply with the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std. 754–1985).

In this section, the following nomenclature is used to denote fields in a floating-point value:

- s: sign bit
- bexp: biased exponent
- frac: fraction
- sig: significand
3.4.2.1 SINGLE-PRECISION FLOATING-POINT
The format for a single-precision floating-point value is shown in Figure 3-43.

Figure 3-43 Single-Precision Floating-Point Format: s (bit 31), bexp (bits 30–23), frac (bits 22–0)

Typically, the value of a single-precision operand is expressed by:

\[ (-1)^s \times 1.\text{frac} \times 2^{(\text{bexp}-127)}. \]

The encoding of special floating-point values is given in Section 3.4.3.
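As a worked instance of this formula, the C sketch below computes the value of a normalized single-precision word, that is, one with 1 ≤ bexp ≤ 254. The function names are hypothetical. Powers of two are formed by repeated scaling so that no math library is needed. Special encodings are deliberately not handled.

```c
#include <stdint.h>

/* 2^e by repeated scaling (exact for the exponent range used here). */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* (-1)^s * 1.frac * 2^(bexp - 127) for a normalized operand. */
static double sp_value(uint32_t w) {
    unsigned s    = w >> 31;
    int      bexp = (int)((w >> 23) & 0xFFu);
    uint32_t frac = w & 0x7FFFFFu;
    double sig = 1.0 + (double)frac / 8388608.0;  /* 1.frac; 2^23 = 8388608 */
    double v = sig * pow2(bexp - 127);
    return s ? -v : v;
}
```

For example, 0xC0400000 has s = 1, bexp = 128 and frac = 0x400000 (0.5), giving -(1.5 × 2^1) = -3.0.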

3.4.2.2 DOUBLE-PRECISION FLOATING-POINT
The format for a double-precision floating-point value is shown in Figure 3-44.

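Figure 3-44 Double-Precision Floating-Point Format: word 0 holds s (bit 31), bexp (bits 30–20), and the high-order 20 bits of frac (bits 19–0); word 1 holds the low-order 32 bits of frac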

Typically, the value of a double-precision operand is expressed by:

\[ (-1)^s \times 1.\text{frac} \times 2^{(\text{bexp}-1023)}. \]

The encoding of special floating-point values is given in Section 3.4.3.

In order to be properly referenced by a floating-point instruction, a double-precision floating-point value must be double-word aligned. The absolute-register number of the register containing the first word (labeled 0 in Figure 3-44) must be even. The absolute-register number of the register containing the second word (labeled 1 in Figure 3-44) must be odd. If these conditions are not met, the results of the instruction are unpredictable. Note that the appropriate registers for a double-precision value in the local registers depend on the value of the Stack Pointer.
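The C sketch below shows why the Stack Pointer matters for this alignment rule. It relies on a mapping that this section does not state: local register lr k is assumed to map to absolute register 128 + ((SP[8:2] + k) mod 128). Both the mapping and the helper names should be treated as hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mapping: lr k -> 128 + ((SP[8:2] + k) mod 128). */
static unsigned local_abs_reg(uint32_t sp, unsigned k) {
    return 128u + (((sp >> 2) + k) & 0x7Fu);
}

/* Word 0 must be in an even-numbered absolute register and word 1 in
   the following, odd-numbered register. */
static bool dp_pair_ok(unsigned abs_first, unsigned abs_second) {
    return (abs_first % 2u == 0u) && (abs_second == abs_first + 1u);
}
```

Under this assumption, the pair lr2/lr3 is correctly aligned when SP is 0 (absolute registers 130/131). It is misaligned when SP is 4 (131/132).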

3.4.3 Special Floating-Point Values
The Am29050 microprocessor defines floating-point values which are encoded for special interpretation. The values are described in this section.

3.4.3.1 NOT-A-NUMBER
A Not-a-Number (NaN) is a symbolic value used to report certain floating-point exceptions. It also can be used to implement user-defined extensions to floating-point operations. A NaN comprises a floating-point number with maximum biased exponent and non-zero fraction. The sign bit can be either 0 or 1, and has no significance. There are two types of NaN: signaling NaNs and quiet NaNs. A signaling NaN causes an Invalid Operation exception if used as an input operand to a floating-point operation; a quiet NaN does not cause an exception. The Am29050 microprocessor distinguishes signaling and quiet NaNs by the most-significant bit of the fraction: a 1 indicates a quiet NaN, and a 0 indicates a signaling NaN.

An operation never generates a signaling NaN as a result. A quiet NaN result can be generated in one of two ways:

- As the result of an invalid operation that cannot generate a reasonable result, or
- As the result of an operation for which one or more input operands are either signaling or quiet NaNs.

In either case, the Am29050 microprocessor produces a quiet NaN having a fraction of 11000...0; that is, the two most-significant bits of the fraction are 11, and the remaining bits are 0. If desired, the Reserved Operand exception can be enabled to cause a Floating-Point Exception trap. The trap handler in this case can implement a scheme whereby user-defined NaN values appear to pass through operations as results, providing overall status for a series of operations.
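The NaN rules above can be sketched in C for single-precision operands as follows. If either input is a NaN, the result is the quiet NaN with fraction 11000...0, and an Invalid Operation is flagged when an input is a signaling NaN. The sign bit of the produced NaN is assumed to be 0 here, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SP_EXP_MASK  0x7F800000u
#define SP_FRAC_MASK 0x007FFFFFu
#define SP_QUIET_BIT 0x00400000u              /* most-significant fraction bit */
#define SP_DEFAULT_QNAN (SP_EXP_MASK | 0x00600000u)  /* fraction 11000...0 */

static bool sp_is_nan(uint32_t w) {
    return (w & SP_EXP_MASK) == SP_EXP_MASK && (w & SP_FRAC_MASK) != 0u;
}

/* Returns false if neither operand is a NaN (ordinary arithmetic applies).
   Otherwise stores the quiet NaN result and reports whether an input was
   a signaling NaN (Invalid Operation). */
static bool sp_nan_result(uint32_t a, uint32_t b, uint32_t *result, bool *invalid) {
    bool a_nan = sp_is_nan(a), b_nan = sp_is_nan(b);
    if (!a_nan && !b_nan)
        return false;
    *invalid = (a_nan && !(a & SP_QUIET_BIT)) || (b_nan && !(b & SP_QUIET_BIT));
    *result = SP_DEFAULT_QNAN;                /* never a signaling NaN */
    return true;
}
```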

3.4.3.2 INFINITY

Infinity is an encoded value used to represent a value that is too large to be represented as a finite number in a given floating-point format. Infinity comprises a floating-point number with maximum biased exponent and zero fraction. The sign bit of an infinity distinguishes +∞ from −∞.

3.4.3.3 DENORMALIZED NUMBERS

The IEEE Standard specifies that, wherever possible, a result that is too small to be represented as a normalized number be represented as a denormalized number. A denormalized number may be used as an input operand to any operation. For single- and double-precision formats, a denormalized number is a floating-point number with a biased exponent of zero and a non-zero fraction field; the sign bit can be either 1 or 0. The value of a denormalized number is expressed by:

\[(-1)^s \times 0.\text{frac} \times 2^{(-\text{bias}+1)},\]

where bias is the exponent bias for the format in question (127 for single precision, 1023 for double precision). The handling of denormalized numbers is discussed in Appendix C.

3.4.3.4 ZERO

A zero is a floating-point number with a biased exponent of zero and a zero fraction field. The sign bit of a zero can be either 0 or 1; however, positive and negative zero are both exactly zero, and are considered equal by comparison operations.
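Together, Sections 3.4.3.1 through 3.4.3.4 partition the encodings by biased exponent and fraction. The sketch below classifies a single-precision word in that way. The enum names are hypothetical and are not the result encoding of the CLASS instruction.

```c
#include <stdint.h>

typedef enum {
    SP_ZERO, SP_DENORMAL, SP_NORMAL, SP_INFINITY, SP_QNAN, SP_SNAN
} sp_class;

static sp_class sp_classify(uint32_t w) {
    uint32_t bexp = (w >> 23) & 0xFFu;
    uint32_t frac = w & 0x7FFFFFu;
    if (bexp == 0u)                            /* zero or denormalized */
        return frac == 0u ? SP_ZERO : SP_DENORMAL;
    if (bexp == 0xFFu) {                       /* infinity or NaN */
        if (frac == 0u) return SP_INFINITY;
        return (frac & 0x400000u) ? SP_QNAN : SP_SNAN;
    }
    return SP_NORMAL;                          /* sign bit does not matter */
}
```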

3.4.4 External Data Accesses

All processor external accesses occur between general-purpose registers and external devices and memories. Accesses occur as the result of the execution of load and store instructions. The load and store instructions specify which general-purpose register receives the data (for a load) or supplies the data (for a store). The format of the load and store instructions is shown in Figure 3-45.
Addresses for accesses are given either by the content of a general-purpose register or by a constant value specified by the load or store instruction. The load and store instructions do not perform address computation directly. Any required address computations are performed explicitly by other instructions.

In the load or store instruction, the Coprocessor Enable (CE) bit (bit 23) determines whether or not the access is directed to the coprocessor. If the CE bit is 0, the access is directed to an external device or memory. If the CE bit is 1, data is transferred to or from the coprocessor. The CE bit affects the interpretation of the Control (CNTL) field as well as the channel protocol. Coprocessor accesses are discussed in Chapter 6.

This section deals with all other external accesses.

The format of the instructions that do not perform coprocessor data transfers (i.e., in which the CE bit is 0) is shown in Figure 3-46.

In load and store instructions, the RB or I field specifies the address for access. The address is either the content of a general-purpose register, with register number RB, or a constant with a value I (zero-extended to 32 bits). The M bit determines whether the register or the constant is used.

The data for the access is written into the general-purpose register RA for a load, and is supplied by register RA for a store.

The definitions for other fields in the load or store instruction are given below:

**Bit 23: Coprocessor Enable (CE)**—The CE bit is 0 for a non-coprocessor load or store.

**Bit 22: Address Space (AS)**—If the AS bit is 0 for an untranslated load or store, the access is directed to instruction/data memory. If the AS bit is 1 for an untranslated load or store, the access is directed to input/output. The AS bit must be 0 for a translated load or store; if the AS bit is 1 for a translated load or store, a Protection Violation trap occurs. The address space for a translated load or store is determined by the Input/Output (IO) bit of the associated TLB entry.

**Bit 21: Physical Address (PA)**—The PA bit may be used by a Supervisor-mode program to disable address translation for an access. If the PA bit is 1, then address translation is not performed for the access, regardless of the value of the Physical Addressing/Data (PD) bit in the Current Processor Status Register. If the PA bit is 0, address translation depends on the PD bit.

The PA bit may be 1 only for Supervisor-mode instructions. If it is 1 for a User-mode instruction, a Protection Violation trap occurs.

**Bit 20: Set Byte Pointer/Sign Bit (SB)**—If the Data Width Enable (DW) bit of the Configuration Register is 0 and the SB bit is 1, the Byte Pointer Register is written with the two least-significant bits of the address for the access. These address bits can control subsequent character and half-word operations. If the SB bit is 0, the Byte Pointer Register is not affected.

If the Data Width Enable (DW) bit of the Configuration Register is 1 and the SB bit is 1 for a load, the loaded byte or half-word is sign-extended in the destination register; if the SB bit is 0, the byte or half-word is zero-extended. If the DW bit is 1 and the SB bit is 1 for either a load or store, then each bit of the Byte Pointer Register is written with the complement of the Byte Order bit of the Configuration Register. The Byte Pointer Register is set in this case to provide software compatibility across different types of memory systems. If the SB bit is 0, the Byte Pointer Register is not affected.

**Bit 19: User Access (UA)**—The UA bit allows programs executing in the Supervisor mode to emulate User-mode accesses. This allows checking of the authorization of an access requested by a User-mode program. It also causes address translation (if applicable) to be performed using the PID field of the MMU Configuration Register, rather than the fixed Supervisor-mode process identifier zero.

If the UA bit is 1 for a Supervisor-mode load or store, the access associated with the instruction is performed in the User mode. In this case, the User mode affects only TLB protection-checking, the SUP/US output, and the use of the PID field in translation; it has no effect on the registers that can be accessed by the instruction. If the UA bit is 0, the program mode for the access is controlled by the SM bit.

If the UA bit is 1 for a User-mode load or store, a Protection Violation trap occurs.

**Bits 18–16: Option (OPT)**—This field is placed on the OPT(2–0) outputs during the address cycle of the access. There is a one-to-one correspondence between the OPT field and the OPT(2–0) outputs; that is, the most-significant OPT bit is placed on OPT2, and so on.

The OPT field controls system functions as described below.

**Bits 15–8: (RA)**—The data for the access is written into the general-purpose register RA for a load, and is supplied by register RA for a store.

**Bits 7–0: (RB or I)**—In load and store instructions, the RB or I field specifies the address for the access. The address is either the content of a general-purpose register, with register number RB, or a constant value I (zero-extended to 32 bits). The M bit of the operation code (bit 24) determines whether the register or the constant is used.
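The field definitions above can be summarized as a decoder. The C sketch below extracts each field from a non-coprocessor load or store word and checks the Protection Violation conditions stated for the AS, PA and UA bits. Whether translation applies is passed in by the caller, because it also depends on the PD bit. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool m;         /* bit 24: 1 = constant I, 0 = register RB */
    bool ce;        /* bit 23: coprocessor enable (0 here) */
    bool as;        /* bit 22: address space */
    bool pa;        /* bit 21: physical address */
    bool sb;        /* bit 20: set byte pointer / sign bit */
    bool ua;        /* bit 19: user access */
    unsigned opt;   /* bits 18-16: placed on OPT(2-0) */
    unsigned ra;    /* bits 15-8: data register */
    unsigned rb_i;  /* bits 7-0: address register RB or constant I */
} ls_fields;

static ls_fields ls_decode(uint32_t w) {
    ls_fields f;
    f.m    = (w >> 24) & 1u;
    f.ce   = (w >> 23) & 1u;
    f.as   = (w >> 22) & 1u;
    f.pa   = (w >> 21) & 1u;
    f.sb   = (w >> 20) & 1u;
    f.ua   = (w >> 19) & 1u;
    f.opt  = (w >> 16) & 7u;
    f.ra   = (w >> 8) & 0xFFu;
    f.rb_i = w & 0xFFu;
    return f;
}

/* PA or UA set by a User-mode program, or AS set for a translated
   access, causes a Protection Violation trap. */
static bool ls_protection_violation(ls_fields f, bool user_mode, bool translated) {
    if (user_mode && (f.pa || f.ua)) return true;
    if (translated && f.as) return true;
    return false;
}
```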

Load and store operations are overlapped with the execution of instructions that follow the load or store instruction. Only one load or store may be in progress on any given cycle. If a load or store instruction is encountered while another load or store operation is in progress, the processor enters the Pipeline Hold mode until the first operation completes. However, the address for the second operation may appear on the Address Bus if the first operation is to a device or memory that supports pipelined operations (see Section 5.2.8).
3.4.4.1 LOAD OPERATIONS

The processor provides the following instructions for performing load operations: Load (LOAD), Load and Lock (LOADL), Load and Set (LOADSET), and Load Multiple (LOADM). All of these instructions transfer data from an external device or memory into one or more general-purpose registers.

The LOADL instruction supports the implementation of device and memory interlocks in a multi-processor configuration. It activates the LOCK output during the address cycle of the access.

The LOADSET instruction implements a binary semaphore. It loads a general-purpose register and atomically writes the accessed location with a word which has 1 in every bit position (that is, the write is indivisible from the read). The LOCK output is asserted during both the read and write access. Note that, if address translation is enabled for the LOADSET instruction, the TLB memory-protection bits must allow both the read and write access. If either the read or write access is not allowed, neither access is performed.
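The C sketch below illustrates the binary-semaphore use of LOADSET, with C11 `atomic_exchange` standing in for it. Like LOADSET, `atomic_exchange` reads the old word and writes all ones as one indivisible step. This is an analogy, not the processor's mechanism. Two further points are assumptions: a FALSE word (bit 31 clear) is taken to mean the semaphore is free, and the helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef _Atomic uint32_t sem_word;

/* LOADSET analogue: return the old word, leave all ones behind. */
static uint32_t loadset(sem_word *addr) {
    return atomic_exchange(addr, 0xFFFFFFFFu);
}

/* Acquired if the word read back was FALSE (bit 31 clear). */
static bool sem_try_acquire(sem_word *s) {
    return (loadset(s) >> 31) == 0u;
}

/* Release by storing a FALSE word. */
static void sem_release(sem_word *s) {
    atomic_store(s, 0u);
}
```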

The LOADM instruction loads a specified number of registers from sequential addresses, as explained below.

Load operations are overlapped with the execution of instructions that follow the load instruction. The processor detects any dependencies on the loaded data that subsequent instructions may have, and, if such a dependency is detected, enters the Pipeline Hold mode until the data is returned by the external device or memory. If a register that is the target of an incomplete load is written with the result of a subsequent instruction, the processor does not write the returning data into the register when the load completes; the Not Needed (NN) bit in the Channel Control Register is set in this case.

Whenever possible, the Am29050 microprocessor performs an early load, making the physical address available at the end of the decode cycle of the load instruction. At the beginning of the next cycle, when the load enters the execute stage, the physical address appears on the channel. Early loads reduce the effective external access time by one cycle. The hardware that supports early loads is discussed in Section 4.3.

3.4.4.2 STORE OPERATIONS

The processor provides the following instructions for performing store operations: Store (STORE), Store and Lock (STOREL), and Store Multiple (STOREM). All of these instructions transfer data from one or more general-purpose registers to an external device or memory.

The STOREL instruction supports the implementation of device and memory interlocks in a multi-processor configuration. It activates the LOCK output during the address cycle of the access.

The STOREM instruction stores a specified number of registers to sequential addresses, as explained below.

Store operations are overlapped with the execution of instructions that follow the store instruction. However, no data dependencies can exist, since the store prevents any subsequent accesses until it completes.

3.4.4.3 MULTIPLE ACCESSES

Load Multiple (LOADM) and Store Multiple (STOREM) instructions move contiguous words of data between general-purpose registers and external devices and memories. The number of transfers is determined by the Load/Store Count Remaining Register.
The Load/Store Count Remaining (CR) field in the Load/Store Count Remaining Register specifies the number of transfers to be performed by the next LOADM or STOREM executed in the instruction sequence. The CR field is in the range of 0 to 255, and is zero-based: a count value of 0 represents one transfer, and a count value of 255 represents 256 transfers. The CR field also appears in the Channel Control Register.

Before a LOADM or STOREM is executed, the CR field is set by a Move To Special Register instruction. A LOADM or STOREM uses the most-recently written value of the CR field. If an attempt is made to alter the CR field, and the Channel Control Register contains information for an external access that has not yet completed, the processor enters the Pipeline Hold mode until the access completes. Note that, since the CR field is set independently of the LOADM and STOREM instructions, it may represent valid state of an interrupted program even if the Contents Valid (CV) bit of the Channel Control Register is 0.

Because of the pipelined implementation of LOADM and STOREM, at least one instruction (e.g., the instruction that sets the CR field) must separate two successive LOADM and/or STOREM instructions.

After the CR field is set, the execution of a LOADM or STOREM begins the data transfer. As with any other load or store operation, the LOADM or STOREM waits until any pending load or store operation is complete before starting. The LOADM instruction specifies the starting address and starting destination general-purpose register. The STOREM instruction specifies the starting address and the starting source general-purpose register.

During the execution of the LOADM or STOREM instruction, the processor updates the address and register number after every access, incrementing the address by 4 and the register number by 1. This continues until either all accesses are completed or an interrupt or trap is taken.

For a load-multiple or store-multiple address sequence, addresses wrap from the largest possible value (hexadecimal FFFFFFFC) to the smallest possible value (hexadecimal 00000000).

The processor increments absolute register numbers during the load-multiple or store-multiple sequence. Absolute-register numbers wrap from 127 to 128, and from 255 to 128. Thus, a sequence that begins in the global registers may transition to the local registers, but a sequence that begins in the local registers remains in the local registers. Also, note that the local registers are addressed circularly.
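The counting and wrap-around rules above can be checked with a small model. This Python sketch is illustrative only: the function names are hypothetical, absolute register numbers are plain integers, and interrupts, protection checks, and the Channel registers are not modeled. To request n transfers, software would write n - 1 to the CR field.

```python
MASK32 = 0xFFFFFFFF


def transfers_for_cr(cr: int) -> int:
    """CR is zero-based: 0 requests one transfer, 255 requests 256."""
    if not 0 <= cr <= 255:
        raise ValueError("CR is an 8-bit field (0..255)")
    return cr + 1


def next_register(reg: int) -> int:
    """Advance an absolute register number: 127 -> 128 and 255 -> 128."""
    return 128 if reg == 255 else reg + 1


def multiple_sequence(start_addr: int, start_reg: int, cr: int) -> list:
    """(address, absolute register) pairs visited by an uninterrupted LOADM/STOREM."""
    addr, reg = start_addr & MASK32, start_reg
    pairs = []
    for _ in range(transfers_for_cr(cr)):
        pairs.append((addr, reg))
        addr = (addr + 4) & MASK32  # FFFFFFFC wraps to 00000000
        reg = next_register(reg)    # globals run into locals; locals stay local
    return pairs


if __name__ == "__main__":
    # CR = 2 requests three transfers; both wrap rules are exercised.
    for addr, reg in multiple_sequence(0xFFFFFFF8, 126, 2):
        print(f"{addr:08X} -> register {reg}")
```

The demonstration pairs addresses FFFFFFF8, FFFFFFFC, and 00000000 with registers 126, 127, and 128, showing the address wrap and the global-to-local register transition in one sequence.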

The normal restrictions on register accesses apply for the load-multiple and store-multiple sequences. For example, if a protected general-purpose register is encountered in the sequence for a User-mode program, a Protection Violation trap occurs.

Intermediate addresses are stored in the Channel Address Register, and register numbers are stored in the Target Register (TR) field of the Channel Control Register. For the STOREM instruction, the data for every access is stored in the Channel Data Register (this register also is set during the execution of the LOADM instruction, but has no interpretation in this case). The CR field is updated on the completion of every access, so that it indicates the number of accesses remaining in the sequence.

Load-multiple and store-multiple operations are indicated by the Multiple Operation (ML) bit in the Channel Control Register. The ML bit is used to restart a multiple operation on an interrupt return; if it is set independently by a Move To Special Register before a load or store instruction is executed, the results are unpredictable.

While a multiple load or store is executing, the processor is in the Pipeline Hold mode, suspending any subsequent instruction execution until the multiple access completes.

If an interrupt or trap is taken, the Channel Address, Channel Data, and Channel Control registers contain the state of the multiple access at the point of interruption. The multiple access may be resumed at this point, at a later time, by an interrupt return.

The processor attempts to complete multiple accesses using the burst-mode capability of the channel (see Section 5.2.9). For this reason, multiple accesses of individual bytes and half-words are not supported. If the burst-mode access is preempted, the processor retransmits the address at the point of preemption. If the external device or memory cannot support burst-mode accesses, the processor transmits an address for every access. If the address sequence causes a virtual page-boundary crossing, the processor preempts the burst-mode access, translates the address for the new page, and re-establishes the burst-mode access using the new physical address.
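The effect of a virtual page-boundary crossing on the address bus can be sketched as follows. This is a simplified, hypothetical model: `page_size` is a caller-supplied power of two standing in for the configured page size, address translation itself is not performed, and external preemption of the burst is not simulated. It lists the addresses the processor would transmit for one multiple access.

```python
def transmitted_addresses(start_addr: int, count: int, page_size: int,
                          burst: bool = True) -> list:
    """Addresses sent on the bus for `count` word transfers.

    With burst mode, an address is sent for the first access and again each
    time the sequence enters a new virtual page.  Without burst support, an
    address is sent for every access.
    """
    if page_size < 4 or page_size & (page_size - 1):
        raise ValueError("page_size must be a power of two of at least 4")
    sent = []
    addr = start_addr & 0xFFFFFFFF
    for i in range(count):
        if not burst or i == 0 or addr % page_size == 0:
            sent.append(addr)
        addr = (addr + 4) & 0xFFFFFFFF  # word-sized steps with 32-bit wrap
    return sent
```

For example, four transfers starting at 00000FF8 with a 4-Kbyte page need addresses only at 00000FF8 and 00001000, because the burst must be re-established once the sequence enters the next page.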

3.4.4.4 OPTION BITS

The Option field in the load and store instructions supports system functions, such as byte and half-word accesses. The definition of this field for a load or store, depending on the AS bit of the instruction, is as follows:

<table>
<thead>
<tr>
<th>AS</th>
<th>OPT2</th>
<th>OPT1</th>
<th>OPT0</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>Word-length access</td>
</tr>
<tr>
<td>x</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>Byte access</td>
</tr>
<tr>
<td>x</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Half-word access</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>Instruction ROM access (as data)</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>Cache control</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>Hardware-development system accesses</td>
</tr>
<tr>
<td colspan="4">All others</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Note that some of these encodings do not affect processor operation, and could have other interpretations in a particular system. For example, the OPT values 000, 001, and 010 affect processor operation only if the DW bit of the Configuration Register is 1. However, non-standard uses of the OPT field can reduce the portability of software between different systems.
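For reference, the Option-field table can be restated as a lookup function. This Python sketch is hypothetical (it is not part of any AMD toolchain) and encodes only the rows listed in the table; every other combination is reported as reserved.

```python
# OPT encodings valid for either value of the AS bit.
_ANY_AS = {
    0b000: "Word-length access",
    0b001: "Byte access",
    0b010: "Half-word access",
}
# OPT encodings defined only when AS = 0.
_AS_ZERO = {
    0b100: "Instruction ROM access (as data)",
    0b101: "Cache control",
    0b110: "Hardware-development system accesses",
}


def option_meaning(as_bit: int, opt: int) -> str:
    """Meaning of a load/store OPT value, given the instruction's AS bit."""
    if as_bit not in (0, 1) or not 0 <= opt <= 7:
        raise ValueError("AS is 1 bit and OPT is 3 bits")
    if opt in _ANY_AS:
        return _ANY_AS[opt]
    if as_bit == 0 and opt in _AS_ZERO:
        return _AS_ZERO[opt]
    return "Reserved"
```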

3.4.5 Addressing and Alignment

3.4.5.1 ADDRESS SPACES

External instructions and data are contained in one of five 32-bit address spaces:

1. Data Memory
2. Input/Output
3. Coprocessor
4. Instruction Read-Only Memory (Instruction ROM)
5. Instruction Random-Access Memory (Instruction RAM)

An address in the instruction/data memory address space may be treated as virtual or physical, as determined by the Current Processor Status Register. Address translation for data accesses is enabled separately from address translation for instruction accesses. A program in the Supervisor mode may temporarily disable address translation for individual loads and stores; this permits load-real and store-real operations.

It is possible to partition physical instruction and data addresses into two separate physical address spaces. However, virtual instruction and data addresses appear in the same virtual address space (i.e., instruction/data memory).

The coprocessor address space is not an address space in the strictest sense. The coprocessor address space is defined so that transfers of operands and operation codes to the coprocessor do not interfere with other external devices and memories.

The processor does not directly support the access of the instruction ROM or instruction RAM address spaces using loads and stores; this capability is defined as a system option requiring external hardware.

For untranslated data accesses, bits contained in load and store instructions distinguish between the instruction/data memory, input/output, and coprocessor address spaces. For translated data accesses, the Input/Output bit of the associated TLB entry distinguishes between the instruction/data memory and input/output address spaces.

For instruction fetches, the ROM Enable (RE) bit of the Current Processor Status Register distinguishes between the instruction/data and instruction ROM address spaces.

### 3.4.5.2 BYTE AND HALF-WORD ADDRESSING

The Am29050 microprocessor generates word-oriented byte addresses for accesses to external devices and memories. Addresses are word-oriented because loads, stores, and instruction fetches access words. However, addresses are byte addresses because they are sufficient to select bytes packed within accessed words. For load and store operations, the processor provides means for using the least-significant address bits to access bytes and half-words within external words.

The selection of a byte within a word is determined by the two least-significant bits of an address and the Byte Order (BO) bit of the Configuration Register. The selection of a half-word within a word is determined by the next-to-least significant bit of an address and the BO bit. Figure 3-47 illustrates the addressing of bytes and half-words when the BO bit is 0 (big endian), and Figure 3-48 illustrates the addressing of bytes and half-words when the BO bit is 1 (little endian). In Figure 3-47 and Figure 3-48, addresses are represented in hexadecimal notation.

#### Figure 3-47 Byte and Half-Word Addressing with BO = 0 (Big Endian)

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="4">Word 00000000</td>
</tr>
<tr>
<td colspan="2">Half-Word 00000000</td>
<td colspan="2">Half-Word 00000002</td>
</tr>
<tr>
<td>Byte 00000000</td>
<td>Byte 00000001</td>
<td>Byte 00000002</td>
<td>Byte 00000003</td>
</tr>
<tr>
<td colspan="4">Word 00000004</td>
</tr>
<tr>
<td colspan="2">Half-Word 00000004</td>
<td colspan="2">Half-Word 00000006</td>
</tr>
<tr>
<td>Byte 00000004</td>
<td>Byte 00000005</td>
<td>Byte 00000006</td>
<td>Byte 00000007</td>
</tr>
</tbody>
</table>

#### Figure 3-48 Byte and Half-Word Addressing with BO = 1 (Little Endian)

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="4">Word 00000000</td>
</tr>
<tr>
<td colspan="2">Half-Word 00000002</td>
<td colspan="2">Half-Word 00000000</td>
</tr>
<tr>
<td>Byte 00000003</td>
<td>Byte 00000002</td>
<td>Byte 00000001</td>
<td>Byte 00000000</td>
</tr>
<tr>
<td colspan="4">Word 00000004</td>
</tr>
<tr>
<td colspan="2">Half-Word 00000006</td>
<td colspan="2">Half-Word 00000004</td>
</tr>
<tr>
<td>Byte 00000007</td>
<td>Byte 00000006</td>
<td>Byte 00000005</td>
<td>Byte 00000004</td>
</tr>
<tr>
<td colspan="4">⋮</td>
</tr>
<tr>
<td colspan="4">Word FFFFFFF8</td>
</tr>
<tr>
<td colspan="2">Half-Word FFFFFFFA</td>
<td colspan="2">Half-Word FFFFFFF8</td>
</tr>
<tr>
<td>Byte FFFFFFFB</td>
<td>Byte FFFFFFFA</td>
<td>Byte FFFFFFF9</td>
<td>Byte FFFFFFF8</td>
</tr>
<tr>
<td colspan="4">Word FFFFFFFC</td>
</tr>
<tr>
<td colspan="2">Half-Word FFFFFFFE</td>
<td colspan="2">Half-Word FFFFFFFC</td>
</tr>
<tr>
<td>Byte FFFFFFFF</td>
<td>Byte FFFFFFFE</td>
<td>Byte FFFFFFFD</td>
<td>Byte FFFFFFFC</td>
</tr>
</tbody>
</table>
In the processor, the two least-significant bits of an external address can be reflected in the Byte Pointer (BP) field of the ALU Status Register when the DW bit of the Configuration Register is 0. Alternatively, the two least-significant bits of the address can be used to control byte and half-word accesses when the DW bit is 1. The BO bit affects only the interpretation of the BP field and the two least-significant address bits.

If the BO bit is 0, bytes are ordered within words such that a 00 in the BP field or in the two least-significant address bits selects the high-order byte of a word, and a 11 selects the low-order byte. If the BO bit is 1, a 00 in the BP field or in the two least-significant address bits selects the low-order byte of a word, and a 11 selects the high-order byte.

If the BO bit is 0, half-words are ordered within words such that a 0 in the most-significant bit of the BP field or the next-to-least-significant address bit selects the high-order half-word, and a 1 selects the low-order half-word. If the BO bit is 1, a 0 in the most-significant bit of the BP field or the next-to-least-significant address bit selects the low-order half-word of a word, and a 1 selects the high-order half-word. Note that since the least-significant bit of the BP field or an address does not participate in the selection of half-words, the alignment of half-words is forced to half-word boundaries in this case.
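The selection rules above reduce to a bit-offset calculation. In this hypothetical Python sketch, `bp` stands for either the BP field or address bits 1–0, and `bo` for the BO bit; the helpers return the bit position of the selected byte or half-word and extract it from a 32-bit word.

```python
def byte_lane_shift(bp: int, bo: int) -> int:
    """Bit position of the byte selected by bp (00 = high-order byte if BO = 0)."""
    bp &= 3
    return (3 - bp) * 8 if bo == 0 else bp * 8


def halfword_lane_shift(bp: int, bo: int) -> int:
    """Bit position of the selected half-word; the low bit of bp is ignored."""
    hi = (bp >> 1) & 1
    return (1 - hi) * 16 if bo == 0 else hi * 16


def select_byte(word: int, bp: int, bo: int) -> int:
    return (word >> byte_lane_shift(bp, bo)) & 0xFF


def select_halfword(word: int, bp: int, bo: int) -> int:
    return (word >> halfword_lane_shift(bp, bo)) & 0xFFFF
```

With the word 11223344, a pointer of 00 selects byte 11 when BO is 0 and byte 44 when BO is 1, matching the two figures.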

### 3.4.5.3 ALIGNMENT OF WORDS AND HALF-WORDS

Since only byte addressing is supported, it is possible that an address for the access of a word or half-word is not aligned to the desired word or half-word. The Am29050 microprocessor either ignores or forces alignment in most cases. However, some systems may require that unaligned accesses be supported, for compatibility reasons. Because of this, the Am29050 microprocessor provides an option that creates a trap when a non-aligned access is attempted. This trap allows software emulation of the non-aligned accesses, in a manner which is appropriate for the particular system.

The detection of unaligned accesses is activated by a 1 in the Trap Unaligned Access (TU) bit of the Current Processor Status Register. Unaligned-access detection is based on the data length as indicated by the OPT field of a load or store instruction, and on the two least-significant bits of the specified address. Only addresses for instruction/data memory accesses are checked; alignment is ignored for input/output accesses and coprocessor transfers.

An Unaligned Access trap occurs only if the TU bit is 1 and any of the following combinations of OPT field and address bits is detected for a load or store to instruction/data memory (x indicates a bit that is not examined):

<table>
<thead>
<tr>
<th>OPT2</th>
<th>OPT1</th>
<th>OPT0</th>
<th>A1</th>
<th>A0</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>x</td>
<td>Unaligned Word Access</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>x</td>
<td>1</td>
<td>Unaligned Word Access</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>x</td>
<td>1</td>
<td>Unaligned Half-Word Access</td>
</tr>
</tbody>
</table>
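The detection conditions can be expressed as a predicate. This hypothetical Python sketch follows the table: word accesses (OPT = 000) trap when either address bit 1 or bit 0 is set, half-word accesses (OPT = 010) trap when address bit 0 is set, and other options never trap. The `instr_data_space` flag is an assumption standing in for "this is an instruction/data memory access", since input/output and coprocessor accesses are not checked.

```python
def unaligned_access_trap(tu: int, opt: int, addr: int,
                          instr_data_space: bool = True) -> bool:
    """True if a load/store would raise the Unaligned Access trap."""
    if not tu or not instr_data_space:
        return False
    if opt == 0b000:                 # word access: A1 or A0 set
        return (addr & 0b11) != 0
    if opt == 0b010:                 # half-word access: A0 set
        return (addr & 0b01) != 0
    return False                     # bytes and other options are not checked
```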

The trap handler for the Unaligned Access trap is responsible for generating the correct sequence of aligned accesses and performing any necessary shifting, masking and/or merging. Note that a virtual page-boundary crossing also may have to be considered.
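As an illustration of the shifting and merging such a handler performs, the following hypothetical sketch reassembles an unaligned word from two aligned words. It assumes BO = 0 (big endian), models memory as a dictionary of aligned 32-bit words, and leaves out page-boundary, protection, and store handling.

```python
def emulate_unaligned_word_load(memory: dict, addr: int) -> int:
    """Read a big-endian 32-bit word at an arbitrary byte address."""
    offset = addr & 3
    base = addr & ~3 & 0xFFFFFFFF
    first = memory[base]                       # first aligned load
    if offset == 0:
        return first                           # already aligned
    second = memory[(base + 4) & 0xFFFFFFFF]   # second aligned load
    shift = offset * 8
    # Keep the tail of the first word and the head of the second word.
    return ((first << shift) | (second >> (32 - shift))) & 0xFFFFFFFF
```

With words 11223344 and 55667788 at addresses 100 and 104 (hexadecimal), a load from address 101 yields 22334455.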

### 3.4.5.4 ALIGNMENT OF INSTRUCTIONS

In the Am29050 microprocessor, all instructions are 32 bits in length, and are aligned on word-address boundaries. The processor's Program Counter is 30 bits in length, and the least-significant two bits of processor-generated instruction addresses are always 00. An unaligned address can be generated by indirect jumps and calls. However, alignment is ignored by the processor in this case, and it expects the system to force alignment (i.e., by interpreting the two least-significant address bits as 00, regardless of their values).

### 3.4.5.5 ACCESSING INSTRUCTIONS AS DATA

To aid the external access of instructions and data on separate buses, the processor distinguishes between instruction and data accesses. However, it does not support a logical distinction between instruction and data address spaces (except in the case of instruction read-only memory). In particular, address translation in the Memory Management Unit is in no way affected by this distinction (although memory protection is).

In systems where it is necessary to access instructions as data, this function should be performed via the shared address space. The OPT field provides a means for loads to access instructions in the instruction read-only memory (ROM) address space. The Am29050 microprocessor does not take any action to prevent a store to the instruction ROM address space.

### 3.4.6 Byte and Half-Word Accesses

The Am29050 microprocessor can perform byte and half-word accesses in either software or hardware, under control of the Data Width Enable (DW) bit of the Configuration Register. Software byte and half-word accesses are selected by a DW bit of 0, and hardware byte and half-word accesses are selected by a DW bit of 1. Software byte and half-word accesses are less efficient than hardware byte and half-word accesses, but hardware accesses require that the system be able to selectively write individual byte and half-word positions within external devices and memories. The software-only technique is compatible with systems designed to provide hardware support for byte and half-word accesses.

This section describes the operation of both software and hardware byte and half-word accesses. Byte and half-word accesses operate as described here for memory and input/output accesses, but not for coprocessor transfers. Coprocessor transfers are unaffected by the DW bit.

The DW bit is cleared by a processor reset. It must explicitly be set to 1 by software before hardware byte and half-word accesses can be performed.

### 3.4.6.1 SOFTWARE BYTE AND HALF-WORD ACCESSES

If the DW bit is 0, the Am29050 microprocessor allows the Byte Pointer Register to be set with the least-significant bits of an address specified by any load or store instruction, except those that transfer information to and from the coprocessor. Insert and extract instructions can then be used to access the byte or half-word of interest, after the external word has been accessed. This provides a general-purpose mechanism for manipulating external byte and half-word data, without the need for external hardware support.

To load a byte or half-word, a word load is first performed. This load sets the BP field with the two least-significant bits of the address. A subsequent EXBYTE, EXHW or EXHWS instruction extracts the byte or half-word of interest from the accessed word.

To store a byte or half-word, a load is first performed, setting the BP field with the two least-significant bits of the address. A subsequent INBYTE or INHW instruction inserts the byte or half-word of interest into the accessed word, and the resulting word then is stored.
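The software technique amounts to a word access followed by an extract, or a read-modify-write with an insert. This Python sketch is hypothetical: the dictionary stands in for word-only memory, and the extract and insert steps mimic how the instruction sequences in Section 3.4.6.3 use EXBYTE and INBYTE rather than reproducing those instructions' full definitions.

```python
def _shift(bp: int, bo: int) -> int:
    # Bit offset of the byte that BP selects under the BO rules.
    return (3 - (bp & 3)) * 8 if bo == 0 else (bp & 3) * 8


def software_byte_load(memory: dict, addr: int, bo: int = 0) -> int:
    """Unsigned byte read: word load (BP <- addr bits 1-0), then extract."""
    word = memory[addr & ~3]
    bp = addr & 3
    return (word >> _shift(bp, bo)) & 0xFF      # zero-extended result


def software_byte_store(memory: dict, addr: int, data: int, bo: int = 0) -> None:
    """Byte write: word load (sets BP), insert the byte, store the whole word."""
    base = addr & ~3
    word = memory[base]
    s = _shift(addr & 3, bo)
    word = (word & ~(0xFF << s) & 0xFFFFFFFF) | ((data & 0xFF) << s)
    memory[base] = word
```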

Software which relies on loads and stores setting the BP field cannot operate correctly when the Freeze (FZ) bit of the Current Processor Status Register is 1, because the ALU Status Register is frozen.

### 3.4.6.2 HARDWARE BYTE AND HALF-WORD ACCESSES

If the DW bit is 1 on a load, the Am29050 microprocessor selects a byte or half-word from the loaded word depending on: the Option (OPT) bits of the load instruction, the Byte Order (BO) bit of the Configuration Register, and the two least-significant bits of the address (for bytes) or the next-to-least-significant bit of the address (for half-words). The selected byte or half-word is right-justified within the destination register.

If the SB bit of the load instruction is 0, the remainder of the destination register is zero-extended. If the SB bit is 1, the remainder of the destination register is sign-extended with the sign bit of the selected byte or half-word.
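A DW = 1 load can be summarized as select, right-justify, then extend. The hypothetical Python sketch below returns the 32-bit register image; `opt`, `sb`, and `bo` correspond to the OPT field, the SB bit, and the BO bit, and only the word, byte, and half-word options are modeled.

```python
def hardware_load(word: int, addr: int, opt: int, sb: int, bo: int = 0) -> int:
    """Register value produced by a DW = 1 load of `word` from `addr`."""
    if opt == 0b000:
        return word & 0xFFFFFFFF                 # word access: no selection
    if opt == 0b001:                             # byte: address bits 1-0
        bp = addr & 3
        shift = (3 - bp) * 8 if bo == 0 else bp * 8
        width = 8
    elif opt == 0b010:                           # half-word: address bit 1
        hi = (addr >> 1) & 1
        shift = (1 - hi) * 16 if bo == 0 else hi * 16
        width = 16
    else:
        raise ValueError("only word, byte and half-word options are modeled")
    value = (word >> shift) & ((1 << width) - 1)  # right-justify
    if sb and value & (1 << (width - 1)):
        value |= (0xFFFFFFFF << width) & 0xFFFFFFFF  # sign-extend
    return value                                     # otherwise zero-extended
```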

If the DW bit is 1 on a store, the Am29050 microprocessor replicates the low-order byte or half-word in the source register into every byte and half-word position of the stored word. The system is responsible for generating the appropriate byte and/or half-word strobes, based on the OPT(2–0) signals and the two least-significant bits of the address, to write the appropriate byte or half-word in the selected device or memory (the system byte order must also be considered). The SB bit does not affect the operation of a store, except for setting the BP field as described below.

If the SB bit is 1 for either a load or store, and the DW bit is also 1, both bits of the BP field are set to the complement of the BO bit when the load or store is executed. This does not directly affect the load or store access, but supports compatibility for software developed for word-write-only systems. Hardware byte and half-word accesses (in contrast to software byte and half-word accesses) can be performed when the FZ bit is 1, because these accesses do not rely on the BP field.
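The store-side rules can be sketched the same way. `hardware_store_word` shows the replicated data word and `bp_after_sb` the BP value set when SB and DW are both 1. `byte_strobes` is purely hypothetical system logic, since strobe generation is the system's responsibility; strobe bit i is assumed to enable data bits 8i+7 through 8i.

```python
def hardware_store_word(data: int, opt: int) -> int:
    """Word driven for a DW = 1 store: low byte or half-word replicated."""
    if opt == 0b001:
        return (data & 0xFF) * 0x01010101
    if opt == 0b010:
        return (data & 0xFFFF) * 0x00010001
    return data & 0xFFFFFFFF


def bp_after_sb(bo: int) -> int:
    """Both BP bits become the complement of BO (points at the low-order byte)."""
    return 0b00 if bo else 0b11


def byte_strobes(addr: int, opt: int, bo: int = 0) -> int:
    """Hypothetical 4-bit strobe mask a system might derive from OPT and addr."""
    if opt == 0b001:
        bp = addr & 3
        lane = (3 - bp) if bo == 0 else bp
        return 1 << lane
    if opt == 0b010:
        hi = (addr >> 1) & 1
        lane = (1 - hi) * 2 if bo == 0 else hi * 2
        return 0b11 << lane
    return 0b1111
```

Because every lane carries a copy of the data, the memory only needs to enable the strobed lanes; no shifting is required on the system side.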

### 3.4.6.3 SYSTEM ALTERNATIVES AND COMPATIBILITY

The two mechanisms for performing byte and half-word accesses create the possibility of two types of systems. These are named for convenience:

- Type 1: simple, word-only accesses in external devices and memories; software byte and half-word accesses.
- Type 2: byte/half-word strobes in external devices and memories; hardware byte and half-word accesses by the Am29050 microprocessor.

The provision for hardware byte and half-word accesses encourages Type 2 systems. Software for Type 1 systems can execute on Type 2 systems, but the reverse is not true. Software compatibility is possible primarily because of the DW bit and because the Am29050 microprocessor sets the BP field with an appropriate byte pointer even when it performs byte and half-word accesses with internal hardware. Also, the system must return a full word in either type of system, regardless of the access data-width. The DW bit must be 0 in Type 1 systems and must be 1 in Type 2 systems.

To illustrate compatibility between systems, consider the following steps of an unsigned byte load compiled for a Type 1 system, but executing on a Type 2 system:

Perform a load with OPT = 001 and SB = 1.

- Type 1 system: The addressed word is accessed and placed into the destination register. The BP field is set with the two least-significant bits of the address.
- Type 2 system: The addressed byte is accessed, aligned, padded, and placed into the destination register. The BP field is set to point to the low-order byte, reflecting the alignment that has been performed (the pointer depends on the value of the BO bit).

Perform a byte extract on the loaded word.

- Type 1 system: The byte selected by the BP field is aligned to the low-order byte of the destination register and the remainder of the word is zero-extended. The selected byte may be in any byte position.
- Type 2 system: The byte selected by the BP field (set to point to the low-order byte) is aligned to the low-order byte of the destination register and the remainder of the word is zero-extended. (Note that the selected byte was already in the low-order byte position. This operation does not change program state but merely allows software compatibility.)
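The two-step example can be checked by modeling both systems. In this hypothetical Python sketch, the Type 2 load pads by sign extension because SB = 1 (per the hardware load rules above), and the final extract step zero-extends, so both systems end with the same unsigned byte even though they leave different values in BP.

```python
def load_unsigned_byte(memory: dict, addr: int, system_type: int, bo: int = 0):
    """Model 'load 0,17,temp,addr' then 'exbyte temp,temp,0'; returns (value, BP)."""
    def shift(bp: int) -> int:
        # Bit offset of the byte selected by bp under the BO rules.
        return (3 - bp) * 8 if bo == 0 else bp * 8

    word = memory[addr & ~3]
    if system_type == 1:
        # DW = 0: the whole word is loaded and BP <- address bits 1-0.
        temp, bp = word, addr & 3
    else:
        # DW = 1, SB = 1: byte selected, right-justified, sign-extended;
        # BP is set to the complement of BO, pointing at the low-order byte.
        byte = (word >> shift(addr & 3)) & 0xFF
        temp = byte | (0xFFFFFF00 if byte & 0x80 else 0)
        bp = 0b00 if bo else 0b11
    # exbyte temp,temp,0: the byte selected by BP moves to bits 7-0, rest zero.
    return (temp >> shift(bp)) & 0xFF, bp
```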

The recommended instruction sequences for all types of byte and half-word accesses and for both types of systems are enumerated below. Compatibility between these systems follows the above example, but for brevity, compatibility is not described in detail here.

**Byte Read, Unsigned:**

<table>
<thead>
<tr>
<th>Type 1</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,17,temp,addr</td>
<td>OPT=001, SB=1</td>
</tr>
<tr>
<td>exbyte temp,temp,0</td>
<td>get byte</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Type 2</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,1,temp,addr</td>
<td>OPT=001, SB=0</td>
</tr>
</tbody>
</table>
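
**Byte Read, Signed:**

<table>
<thead>
<tr>
<th>Type 1</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,17,temp,addr</td>
<td>OPT=001, SB=1</td>
</tr>
<tr>
<td>exbyte temp,temp,0</td>
<td>get byte</td>
</tr>
<tr>
<td>sll temp,temp,24</td>
<td rowspan="2">sign extend</td>
</tr>
<tr>
<td>sra temp,temp,24</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Type 2</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,17,temp,addr</td>
<td>OPT=001, SB=1 (sign-extend)</td>
</tr>
</tbody>
</table>

**Byte Write:**

<table>
<thead>
<tr>
<th>Type 1</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,17,temp,addr</td>
<td>OPT=001, SB=1</td>
</tr>
<tr>
<td>inbyte temp,temp,data</td>
<td>insert byte</td>
</tr>
<tr>
<td>store 0,1,temp,addr</td>
<td>store</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Type 2</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>store 0,1,data,addr</td>
<td>OPT=001, SB=0</td>
</tr>
</tbody>
</table>

**Half-Word Read, Unsigned:**

<table>
<thead>
<tr>
<th>Type 1</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,18,temp,addr</td>
<td>OPT=010, SB=1</td>
</tr>
<tr>
<td>exhw temp,temp,0</td>
<td>get half-word unsigned</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Type 2</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,2,temp,addr</td>
<td>OPT=010, SB=0</td>
</tr>
</tbody>
</table>

**Half-Word Read, Signed:**

<table>
<thead>
<tr>
<th>Type 1</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,18,temp,addr</td>
<td>OPT=010, SB=1</td>
</tr>
<tr>
<td>exhws temp,temp</td>
<td>get half-word sign-extend</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Type 2</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,18,temp,addr</td>
<td>OPT=010, SB=1 (sign-extend)</td>
</tr>
</tbody>
</table>

**Half-Word Write:**

<table>
<thead>
<tr>
<th>Type 1</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>load 0,18,temp,addr</td>
<td>OPT=010, SB=1</td>
</tr>
<tr>
<td>inhw temp,temp,data</td>
<td>insert half-word</td>
</tr>
<tr>
<td>store 0,2,temp,addr</td>
<td>store</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Type 2</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>store 0,2,data,addr</td>
<td>OPT=010, SB=0</td>
</tr>
</tbody>
</table>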
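The numeric second operand in these sequences encodes the option bits. Reading the Comments column, the value 16 contributes SB = 1 and the low three bits give OPT (17 is SB = 1 with OPT = 001; 2 is SB = 0 with OPT = 010). The hypothetical helpers below decode only those bits, as inferred from the tables, and format them the way the comments do; any other control bits are ignored.

```python
def decode_option_operand(value: int) -> tuple:
    """Return (SB, OPT) from the numeric load/store operand, e.g. 17 -> (1, 1)."""
    return (value >> 4) & 1, value & 0b111


def comment_for(value: int) -> str:
    """Format the decoded bits like the Comments column, e.g. 'OPT=001, SB=1'."""
    sb, opt = decode_option_operand(value)
    return f"OPT={opt:03b}, SB={sb}"
```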

3.5 INTERRUPTS AND TRAPS

Interrupts and traps cause the Am29050 microprocessor to suspend the execution of an instruction sequence and to begin the execution of a new sequence. The processor may or may not later resume the execution of the original instruction sequence.

The distinction between interrupts and traps is largely one of causation and enabling. Interrupts allow external devices and the Timer Facility to control processor execution, and are always asynchronous to program execution. Traps are intended to be used for certain exceptional events that occur during instruction execution, and are generally synchronous to program execution.

Throughout this manual, a distinction is made between the point at which an interrupt or trap occurs and the point at which it is taken. An interrupt or trap is said to occur when all conditions that define the interrupt or trap are met. However, an interrupt or trap that occurs is not necessarily recognized by the processor, either because of various enables, or because of the processor's operational mode (e.g., Halt mode). An interrupt or trap is taken when the processor recognizes the interrupt or trap and alters its behavior accordingly.

3.5.1 Interrupts

Interrupts are caused by signals applied to any of the external inputs INTR(3–0), or by the Timer Facility (see Section 7.3.6). The processor may be disabled from taking certain interrupts by the masking capability provided by the Disable All Interrupts and Traps (DA) bit, Disable Interrupts (DI) bit, and Interrupt Mask (IM) field in the Current Processor Status Register.

The DA bit disables all interrupts. The DI bit disables external interrupts without affecting the recognition of traps and Timer interrupts. The 2-bit IM field selectively enables external interrupts as follows:

<table>
<thead>
<tr>
<th>IM Value</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>INTR0 enabled</td>
</tr>
<tr>
<td>01</td>
<td>INTR(1–0) enabled</td>
</tr>
<tr>
<td>10</td>
<td>INTR(2–0) enabled</td>
</tr>
<tr>
<td>11</td>
<td>INTR(3–0) enabled</td>
</tr>
</tbody>
</table>

Note that the INTR0 interrupt cannot be disabled by the IM field. Also, note that no external interrupt is taken if either the DA or DI bit is 1. The Interrupt Pending bit in the Current Processor Status indicates that one or more of the signals INTR(3–0) is active, but that the corresponding interrupt is disabled due to the value of either DA, DI, or IM.
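The masking rules combine as follows. In this hypothetical Python sketch, `da`, `di`, and `im` are the Current Processor Status field values, and `interrupt_pending` approximates the Interrupt Pending bit for a set of active INTR inputs. Timer interrupts and traps are not modeled.

```python
def external_interrupt_enabled(n: int, da: int, di: int, im: int) -> bool:
    """True if INTRn (n = 0..3) can be taken under DA, DI and the IM field."""
    if not 0 <= n <= 3:
        raise ValueError("INTR inputs are numbered 0..3")
    if da or di:
        return False                 # DA or DI blocks every external interrupt
    return n <= (im & 0b11)          # IM = 00 enables INTR0 only; 11 enables all


def interrupt_pending(active: set, da: int, di: int, im: int) -> bool:
    """Some active INTR input is currently disabled by DA, DI or IM."""
    return any(not external_interrupt_enabled(n, da, di, im) for n in active)
```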

3.5.2 Traps

Traps are caused by signals applied to one of the inputs TRAP(1–0), or by exceptional conditions such as protection violations. Except for the Instruction Access Exception, Data Access Exception, and Coprocessor Exception traps, traps are disabled by the DA bit in the Current Processor Status; a 1 in the DA bit disables traps, and a 0 enables traps. It is not possible to selectively disable individual traps. If a trap occurs (except a trap caused by TRAP(1–0)) and the DA bit is 1, the processor enters the Monitor mode via a Monitor trap (see Section 3.5.7).

3.5.3 Wait Mode

A wait-for-interrupt capability is provided by the Wait mode. The processor is in the Wait mode whenever the Wait Mode (WM) bit of the Current Processor Status is 1. While in Wait mode, the processor neither fetches nor executes instructions, and performs no external accesses. The Wait mode is exited when an interrupt or trap is taken.

Note that the processor can take only those interrupts or traps for which it is enabled, even in the Wait mode. For example, if the processor is in the Wait mode with a DA bit of 1, it can leave the Wait mode only via the Reset mode (see Section 3.9) or a WARN trap (see Section 3.5.6).

3.5.4 Vector Area

Interrupt and trap processing relies on the existence of a user-managed Vector Area in external instruction/data memory or instruction read-only memory (instruction ROM). The Vector Area begins at an address specified by the Vector Area Base Address Register, and provides for as many as 256 different interrupt and trap handling routines. The processor reserves 64 routines for system operation and instruction emulation. The number and definition of the remaining 192 possible routines are system-dependent.

The Vector Area has one of two possible structures as determined by the Vector Fetch (VF) bit in the Configuration Register. The first structure, as described below, requires less external memory than the second, but imposes the performance penalty of the vector-table lookup.

If the VF bit is 1, the structure of the Vector Area is a table of vectors in instruction/data memory. The layout of a single vector is shown in Figure 3-49. Each vector gives the beginning word-address of the associated interrupt or trap handling routine, and specifies, by the R bit, whether the routine is contained in instruction/data memory ($R = 0$) or instruction ROM ($R = 1$).

#### Figure 3-49 Vector Table Entry

If the VF bit is 0, the structure of the Vector Area is a segment of contiguous blocks of instructions in instruction/data memory or instruction ROM. The ROM Vector Area (RV) bit of the Configuration Register determines whether the Vector Area is in instruction/data memory ($RV = 0$) or instruction ROM ($RV = 1$). A 64-instruction block contains exactly one interrupt or trap handling routine, and blocks are aligned on 64-instruction address boundaries.

3.5.4.1 VECTOR NUMBERS

When an interrupt or trap is taken, the processor determines an 8-bit vector number associated with the interrupt or trap. The vector number gives either the number of a vector table entry or the number of an instruction block, depending on the value of the VF bit. If the VF bit is 1, the physical address of the vector table entry is generated by replacing bits 9–2 of the value in the Vector Area Base Address Register with the vector number. If the VF bit is 0, the physical address of the first instruction of the handling routine is generated by replacing bits 15–8 of the value in the Vector Area Base Address Register with the vector number.
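The two address calculations differ only in which bits are replaced. The hypothetical Python helper below takes the Vector Area Base Address Register value, the 8-bit vector number, and the VF bit. With VF = 1 the result addresses a 4-byte vector; with VF = 0 it addresses the first instruction of a 256-byte (64-instruction) block.

```python
def vector_address(vab: int, vector: int, vf: int) -> int:
    """Physical address formed from the Vector Area Base Address Register."""
    if not 0 <= vector <= 255:
        raise ValueError("vector numbers are 8 bits")
    vab &= 0xFFFFFFFF
    if vf:
        # VF = 1: vector number replaces bits 9-2 (one word per vector).
        return (vab & ~(0xFF << 2) & 0xFFFFFFFF) | (vector << 2)
    # VF = 0: vector number replaces bits 15-8 (one 64-instruction block each).
    return (vab & ~(0xFF << 8) & 0xFFFFFFFF) | (vector << 8)
```

For a base of 00010000, vector 14 (the Timer) maps to 00010038 when VF is 1 and to 00010E00 when VF is 0.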

Vector numbers are either pre-defined, or specified by an instruction causing the trap. The assignment of vector numbers is shown in Table 3-10 (vector numbers are in decimal notation). Vector numbers 64 to 255 are for use by trapping instructions; the definition of the routines associated with these numbers is system-dependent.

3.5.5 Interrupt and Trap Handling

Interrupt and trap handling consists of two distinct operations: taking the interrupt or trap, and returning from the interrupt or trap handler. If the interrupt or trap handler returns directly to the interrupted routine, the interrupt or trap handler need not save and restore processor state.

3.5.5.1 TAKING AN INTERRUPT OR TRAP

The following operations are performed in sequence by the processor when an interrupt or trap is taken.

1. Instruction execution is suspended.
2. Instruction fetching is suspended.
3. Any in-progress load or store operation is completed. Any additional operations are canceled in the case of load multiple and store multiple.
4. The contents of the Current Processor Status Register are copied into the Old Processor Status Register.
5. The Current Processor Status register is modified as shown in Figure 3-50 (the value $u$ means unaffected, and the MM bit is set only if the trap causes the processor to enter the Monitor mode). Note that setting the Freeze (FZ) bit freezes the Channel Address, Channel Data, Channel Control, Program Counter 0, Program Counter 1, Program Counter 2, and ALU Status Registers.
6. The address of the first instruction of the interrupt or trap handler is determined. If the VF bit of the Configuration Register is 1, the address is obtained by accessing a vector from instruction/data memory, using the physical address obtained from the Vector Area Base Address Register and the vector number. This access appears on the channel as a data access, and the OPT(2–0) signals indicate a word-length access. If the VF bit is 0, the instruction address is given directly by the Vector Area Base Address Register and the vector number.
7. If the VF bit is 1, the R bit in the vector fetched in step 6 is copied into the RE bit of the Current Processor Status Register. If the VF bit is 0, the RV bit of the Configuration Register is copied into the RE bit. This step determines whether or not the first instruction of the interrupt handler is in instruction ROM.
8. An instruction fetch is initiated using the instruction address determined in step 6. At this point, normal instruction execution resumes.

Note that the processor does not explicitly save the contents of any registers when an interrupt is taken. If register saving is required, it is the responsibility of the interrupt- or trap-handling routine. For proper operation, registers must be saved before any further interrupts or traps may be taken. The FZ bit must be reset at least two instructions before interrupts or traps are re-enabled, to allow program state to be reflected properly in processor registers if an interrupt or trap is taken.

#### Table 3-10 Vector Number Assignments
<table>
<thead>
<tr>
<th>Number</th>
<th>Type of Trap or Interrupt</th>
<th>Cause</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Illegal Opcode</td>
<td>Executing undefined instruction*</td>
</tr>
<tr>
<td>1</td>
<td>Unaligned Access</td>
<td>Access on unnatural boundary, TU = 1</td>
</tr>
<tr>
<td>2</td>
<td>Out of Range</td>
<td>Overflow or underflow</td>
</tr>
<tr>
<td>3</td>
<td>Coprocessor Not Present</td>
<td>Coprocessor access, CP = 0</td>
</tr>
<tr>
<td>4</td>
<td>Coprocessor Exception</td>
<td>Coprocessor DERR response</td>
</tr>
<tr>
<td>5</td>
<td>Protection Violation</td>
<td>Invalid User-mode operation</td>
</tr>
<tr>
<td>6</td>
<td>Instruction Access Exception</td>
<td>IERR response</td>
</tr>
<tr>
<td>7</td>
<td>Data Access Exception</td>
<td>DERR response, not coprocessor</td>
</tr>
<tr>
<td>8</td>
<td>User-Mode Instruction TLB Miss</td>
<td>No TLB entry for translation</td>
</tr>
<tr>
<td>9</td>
<td>User-Mode Data TLB Miss</td>
<td>No TLB entry for translation</td>
</tr>
<tr>
<td>10</td>
<td>Supervisor-Mode Instruction TLB Miss</td>
<td>No TLB entry for translation</td>
</tr>
<tr>
<td>11</td>
<td>Supervisor-Mode Data TLB Miss</td>
<td>No TLB entry for translation</td>
</tr>
<tr>
<td>12</td>
<td>Instruction MMU Protection Violation</td>
<td>TLB or RMU UE/SE = 0</td>
</tr>
<tr>
<td>13</td>
<td>Data MMU Protection Violation</td>
<td>TLB or RMU UR/SR = 0, UW/SW = 0 on write</td>
</tr>
<tr>
<td>14</td>
<td>Timer</td>
<td>Timer Facility</td>
</tr>
<tr>
<td>15</td>
<td>Trace</td>
<td>Trace Facility, breakpoint comparisons</td>
</tr>
<tr>
<td>16</td>
<td>INTR0</td>
<td>INTR0 input</td>
</tr>
<tr>
<td>17</td>
<td>INTR1</td>
<td>INTR1 input</td>
</tr>
<tr>
<td>18</td>
<td>INTR2</td>
<td>INTR2 input</td>
</tr>
<tr>
<td>19</td>
<td>INTR3</td>
<td>INTR3 input</td>
</tr>
<tr>
<td>20</td>
<td>TRAP0</td>
<td>TRAP0 input</td>
</tr>
<tr>
<td>21</td>
<td>TRAP1</td>
<td>TRAP1 input</td>
</tr>
<tr>
<td>22</td>
<td>Floating-Point Exception</td>
<td>Unmasked floating-point exception</td>
</tr>
<tr>
<td>23</td>
<td>Reserved</td>
<td></td>
</tr>
<tr>
<td>24</td>
<td>FMAC exception</td>
<td>ACF in FPE Register = 00 or 11</td>
</tr>
<tr>
<td>25</td>
<td>DMAC exception</td>
<td>ACF in FPE Register = 00 or 11</td>
</tr>
<tr>
<td>26–27</td>
<td>Reserved</td>
<td></td>
</tr>
<tr>
<td>28</td>
<td>Reserved for instruction emulation (opcodes BF, CF–D6, DC)</td>
<td></td>
</tr>
<tr>
<td>29</td>
<td>Reserved for instruction emulation (opcode DD)</td>
<td></td>
</tr>
<tr>
<td>30–32</td>
<td>Reserved</td>
<td></td>
</tr>
<tr>
<td>33</td>
<td>DIVIDE</td>
<td>DIVIDE instruction</td>
</tr>
<tr>
<td>34</td>
<td>Reserved</td>
<td></td>
</tr>
<tr>
<td>35</td>
<td>DIVIDU</td>
<td>DIVIDU instruction</td>
</tr>
<tr>
<td>36</td>
<td>CONVERT exception</td>
<td>FS = 00 or 11 or FD = 00 or 11</td>
</tr>
<tr>
<td>37</td>
<td>SQRT exception</td>
<td>FS = 00 or 11</td>
</tr>
<tr>
<td>38</td>
<td>CLASS exception</td>
<td>FS = 00 or 11</td>
</tr>
<tr>
<td>39</td>
<td>Reserved for instruction emulation (opcode E7)</td>
<td></td>
</tr>
<tr>
<td>40</td>
<td>MTACC exception</td>
<td>FMT = 11 or FMT = 00 and ACF = 00 or 11</td>
</tr>
<tr>
<td>41</td>
<td>MFACC exception</td>
<td>FMT = 11 or FMT = 00 and ACF = 00 or 11</td>
</tr>
<tr>
<td>42–55</td>
<td>Reserved</td>
<td></td>
</tr>
<tr>
<td>56</td>
<td>Reserved for instruction emulation (opcode F8)</td>
<td></td>
</tr>
<tr>
<td>57</td>
<td>Reserved</td>
<td></td>
</tr>
<tr>
<td>58–63</td>
<td>Reserved for instruction emulation (opcodes FA–FF)</td>
<td></td>
</tr>
<tr>
<td>64–255</td>
<td>ASSERT and EMULATE instruction traps (vector number specified by instruction)</td>
<td></td>
</tr>
</tbody>
</table>

* This vector number also results if an external device removes INTR(3–0) or TRAP(1–0) before the corresponding interrupt or trap is taken by the processor.
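
A monitor or debugger that reports trap causes can capture the vector assignments above in a lookup function. The sketch below is hypothetical: the function names are illustrative and not part of any AMD software. It follows the table row for row and groups the instruction-emulation vectors into one helper.

```c
#include <stdbool.h>

/* Vectors reserved for instruction emulation
   (opcodes BF, CF-D6, DC, DD, E7, F8, FA-FF). */
static bool is_emulation_vector(unsigned v)
{
    return v == 28 || v == 29 || v == 39 || v == 56 || (v >= 58 && v <= 63);
}

/* Name of a vector number, following the vector table above. */
static const char *vector_name(unsigned v)
{
    static const char *const low[] = {
        "Illegal Opcode", "Unaligned Access", "Out of Range",
        "Coprocessor Not Present", "Coprocessor Exception",
        "Protection Violation", "Instruction Access Exception",
        "Data Access Exception", "User-Mode Instruction TLB Miss",
        "User-Mode Data TLB Miss", "Supervisor-Mode Instruction TLB Miss",
        "Supervisor-Mode Data TLB Miss", "Instruction MMU Protection Violation",
        "Data MMU Protection Violation", "Timer", "Trace",
        "INTR0", "INTR1", "INTR2", "INTR3", "TRAP0", "TRAP1",
        "Floating-Point Exception"                 /* vector 22 */
    };
    if (v < sizeof low / sizeof low[0])
        return low[v];
    if (is_emulation_vector(v))
        return "Reserved for instruction emulation";
    switch (v) {
    case 24: return "FMAC exception";
    case 25: return "DMAC exception";
    case 33: return "DIVIDE";
    case 35: return "DIVIDU";
    case 36: return "CONVERT exception";
    case 37: return "SQRT exception";
    case 38: return "CLASS exception";
    case 40: return "MTACC exception";
    case 41: return "MFACC exception";
    default: break;
    }
    if (v >= 64 && v <= 255)
        return "ASSERT/EMULATE instruction trap";
    return "Reserved";                             /* 23, 26-27, 30-32, 34, 42-55, 57 */
}
```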

### 3.5.5.2 RETURNING FROM AN INTERRUPT OR TRAP

Two instructions are used to resume the execution of an interrupted program: Interrupt Return (IRET), and Interrupt Return and Invalidate (IRETINV). These instructions are identical except in one respect: the IRETINV instruction resets all Valid bits in the Branch Target Cache memory, whereas the IRET instruction does not affect the Valid bits.

In some situations, the processor state must be set properly by software before the interrupt return is executed. The following is a list of operations normally performed in such cases:

1. The Current Processor Status is configured as shown in Figure 3-51 (the value $x$ means don't care, and the value $u$ means unaffected). Note that setting the FZ bit freezes the registers listed below so that they may be set for the interrupt return.

2. The Old Processor Status is set to the value of the Current Processor Status for the target routine.

3. The Channel Address, Channel Data, and Channel Control registers are set to restart or resume uncompleted channel operations of the target routine.

4. The Program Counter 1 and Program Counter 0 registers are set to the addresses of the first and second instructions, respectively, to be executed in the target routine.

5. Other registers are set as required. These may include registers such as the ALU Status, Q, and so forth, depending on the particular situation. Some of these registers are unaffected by the FZ bit, so they must be set in such a manner that they are not modified unintentionally before the interrupt return.

---

**Figure 3-50**  
Current Processor Status After an Interrupt or Trap

[Figure: bit-field diagram of the Current Processor Status Register, bits 31–0, with fields Reserved, MM, CA, IP, TE, TP, TU, FZ, LK, RE, WM, PD, PI, SM, IM, DI, DA]

---

**Figure 3-51**  
Current Processor Status Before Interrupt Return

[Figure: bit-field diagram of the Current Processor Status Register, bits 31–0, with the same fields as Figure 3-50]

Once the processor registers are configured properly, as described above, an interrupt return instruction (IRET or IRETINV) performs the remaining steps necessary to return to the target routine. The following operations are performed by the interrupt return instruction:

1. Any in-progress load or store operation is completed. If a load-multiple or store-multiple sequence is in progress, the interrupt return is not executed until the sequence completes.

2. Interrupts and traps are disabled, regardless of the settings of the DA, DI, and IM fields of the Current Processor Status, for steps 3 through 10.

3. If the interrupt return instruction is an IRETINV, all Valid bits in the Branch Target Cache memory are reset.

4. The contents of the Old Processor Status Register are copied into the Current Processor Status Register. This normally resets the FZ bit, allowing the Program Counter 0, 1, and 2, Channel Address, Channel Data, Channel Control, and ALU Status registers to update normally. Since certain bits of the Current Processor Status Register are always updated by the processor, this copy operation may be irrelevant for those bits (e.g., the Interrupt Pending bit).

5. If the Contents Valid (CV) bit of the Channel Control Register is 1, and the Not Needed (NN) and Multiple Operation (ML) bits are both 0, an external access is started. This operation is based on the contents of the Channel Address, Channel Data, and Channel Control registers. The Current Processor Status Register conditions the access—as is normally the case. Note that load-multiple and store-multiple operations are not restarted at this point.

6. The address in Program Counter 1 is used to fetch an instruction. The Current Processor Status Register conditions the fetch. This step is treated as a branch in the sense that the processor searches the Branch Target Cache memory for the target of the fetch.

7. The instruction fetched in step 6 enters the decode stage of the pipeline.

8. The address in Program Counter 0 is used to fetch an instruction. The Current Processor Status Register conditions the fetch. This step is treated as a branch in the sense that the processor searches the Branch Target Cache memory for the target of the fetch.

9. The instruction fetched in step 6 enters the execute stage of the pipeline, and the instruction fetched in step 8 enters the decode stage.

10. If the CV bit in the Channel Control Register is a 1, the NN bit is 0, and the ML bit is 1, a load-multiple or store-multiple sequence is started, based on the contents of the Channel Address, Channel Data, and Channel Control registers.

11. Interrupts and traps are enabled per the appropriate bits in the Current Processor Status Register.

12. The processor resumes normal operation.
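
Steps 5 and 10 restart a channel operation based only on three Channel Control bits. The following C sketch models that decision. The `chc_bits` structure is a hypothetical stand-in and does not reflect the actual bit positions in the Channel Control Register.

```c
#include <stdbool.h>

/* Hypothetical view of the three Channel Control bits named in the text. */
typedef struct {
    bool cv; /* Contents Valid */
    bool nn; /* Not Needed */
    bool ml; /* Multiple Operation */
} chc_bits;

typedef enum {
    RESTART_NONE,     /* nothing is restarted */
    RESTART_SINGLE,   /* external access started at step 5, before the two fetches */
    RESTART_MULTIPLE  /* load/store-multiple started at step 10, after both fetches */
} restart_kind;

/* Which channel operation, if any, an IRET or IRETINV restarts. */
static restart_kind iret_restart(chc_bits c)
{
    if (!c.cv || c.nn)
        return RESTART_NONE;
    return c.ml ? RESTART_MULTIPLE : RESTART_SINGLE;
}
```

The split between the two restart points mirrors the step ordering above. A single access is restarted before PC1 and PC0 are fetched, while a multiple operation waits until both instructions are in the pipeline.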

### 3.5.5.3 FAST INTERRUPT PROCESSING

The registers affected by the FZ bit of the Current Processor Status Register are those that are modified by almost any ordinary sequence of instructions. Since the FZ bit is set by an interrupt or trap, the interrupt or trap handler can execute without disturbing the state of the interrupted routine, although its execution is somewhat restricted. Thus, in many cases the interrupt or trap handler need not save the registers affected by the FZ bit.
The processor provides an additional benefit if the Program Counter 0 and Program Counter 1 Registers are not modified by the interrupt or trap handler. If Program Counters 0 and 1 contain the addresses of sequential instructions when an interrupt or trap is taken, and if they are not modified before an interrupt return is executed, step 8 of the interrupt return sequence above occurs as a sequential fetch—instead of a branch—for the interrupt return. The performance impact of a sequential fetch is normally less than that of a non-sequential fetch.
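
A handler that wants to keep this benefit can check whether the saved pair still names consecutive instructions. The sketch below relies on the 32-bit (4-byte) instruction size. The function name is hypothetical, and it tests only the address relationship, not whether the registers were written by the handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if PC0 holds the address of the instruction immediately following the
   one at PC1, so the second interrupt-return fetch can be sequential.
   Does not check whether PC0 or PC1 were modified by the handler. */
static bool pcs_sequential(uint32_t pc1, uint32_t pc0)
{
    return pc0 == pc1 + 4u;
}
```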

Because the registers affected by the FZ bit are sometimes required for instruction execution, it is not possible for the interrupt or trap handler to execute all instructions, unless the required registers are first saved elsewhere (e.g., in one or more global registers). Most of the restrictions due to register dependencies are obvious (e.g., the Byte Pointer for byte extracts), and will not be discussed here. Other less obvious restrictions are listed below:

1. Load Multiple and Store Multiple. The Channel Address, Channel Data, and Channel Control registers are used to sequence load-multiple and store-multiple operations, so these instructions cannot be executed while the registers are frozen. However, note that other external accesses may occur; the Channel Address, Channel Data, and Channel Control registers are required only to restart an access after an exception, and the interrupt or trap handler is not expected to encounter any exceptions.

2. Loads and stores that set the Byte Pointer. If the Set Byte Pointer (SB) bit of a load or store instruction is 1, and the FZ bit is also 1, there is no effect on the Byte Pointer. Thus, external byte and half-word accesses that rely on this mechanism cannot be performed.

3. Extended arithmetic. The Carry bit of the ALU Status Register is not updated while the FZ bit is 1.

4. Divide step instructions. The Divide Flag of the ALU Status Register is not updated when the FZ bit is 1.

If the interrupt or trap handler does not save the state of the interrupted routine, it cannot allow additional interrupts and traps. Also, the operation of the interrupt or trap handler cannot depend on any trapping instructions (e.g., DIVIDE and DIVIDU instructions, illegal operation codes, arithmetic overflow, etc.), since these would cause a Monitor trap (see Section 3.5.7). There are certain cases, however, where traps are unavoidable; these are discussed in Section 3.5.11.

### 3.5.6 WARN Trap

The processor recognizes a special trap, caused by the activation of the **WARN** input, which cannot be masked. The **WARN** trap is intended for severe system-error or deadlock conditions. It allows the processor to be placed in a known, operable state while preserving much of its original state for error reporting and possible recovery. It therefore shares some features with the Reset mode and others with the traps described in this section.

The major differences between the **WARN** trap and other traps are:

1. The processor does not wait for an in-progress external access to complete before taking the trap, since this access might not complete. However, the information related to any outstanding access is retained by the Channel Address, Channel Data, and Channel Control registers when the trap is taken.
2. The vector-fetch operation is not performed, regardless of the VF bit of the Configuration Register, when the WARN trap is taken. Instead, the ROM Enable (RE) bit in the Current Processor Status is set, and instruction fetching begins immediately at address 16 in the instruction ROM. The trap handler executes directly from the instruction ROM without the need to access external (and possibly non-functional or invalid) instruction/data memory.

Note that a WARN trap may disrupt the state of the routine that is executing when the trap is taken, preventing that routine from being restarted.

### 3.5.7 Monitor Trap

The processor takes a special trap, called the Monitor trap, to enter the Monitor mode. A Monitor trap is taken when the DA bit of the Current Processor Status is 1 and a trap occurs, except for a trap caused by the TRAP(1–0) inputs. Interrupts caused by the INTR(3–0) inputs and the Timer facility cannot cause a Monitor trap.

The major difference between a Monitor trap and other traps is that the processor does not perform the vector-fetch operation. Instead, the processor immediately begins fetching instructions at location 16 in the instruction ROM, as for a WARN trap. The Monitor trap can be distinguished from a WARN trap because the Monitor Mode (MM) bit in the Current Processor Status is 1. The processor also behaves as if the Freeze (FZ), ROM Enable, Physical Addressing/Data, Physical Addressing/Instruction, and Supervisor Mode bits of the Current Processor Status Register were 1. However, the Current Processor Status Register is not affected.

When the Monitor trap is taken, the Shadow Program Counters 0, 1, and 2 contain instruction addresses for the suspended program. The values in the shadow program counters are held while the processor is in the Monitor mode, unless they are explicitly modified by a move-to-special-register instruction. This allows the suspended program to be restarted even if the FZ bit was 1 when the trap was taken; if the FZ bit was 1, the Program Counter 0, 1, and 2 registers do not contain the appropriate addresses.

Also, when the Monitor trap is taken, the Reason Vector Register is set to indicate the cause of the trap. The Reason Vector Register is set with the vector number of the trap condition that caused the Monitor trap. If the Monitor trap is caused by a WARN trap, the value 16 (decimal) is placed into the Reason Vector Register. This is the vector number for the INTR0 interrupt; since interrupts cannot cause a Monitor trap, there is no conflict.

In the Monitor mode, the processor ignores interrupts and traps, except for the following traps: Data Access Exception, Coprocessor Exception, Instruction Access Exception, Instruction TLB Miss, and Instruction MMU Protection Violation. An occurrence of one of these traps causes another Monitor trap; however, the shadow program counters and the Reason Vector Register are not set.

An IRET or IRETINV instruction executed in the Monitor mode causes a return from Monitor mode. The processor performs all actions that normally apply for an interrupt return, with two exceptions. It simply clears the MM bit in the Current Processor Status Register rather than loading this register from the Old Processor Status Register. It also resumes execution using the addresses in the shadow program counters rather than the program counters.
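
The Monitor-trap rules above can be summarized as two predicates. The vector numbers come from the vector table earlier in this section. The function names are hypothetical. As the text states, the sketch treats every trap other than TRAP1–0, INTR3–0, and Timer as eligible. WARN is not vectored and is not modeled here.

```c
#include <stdbool.h>

/* Vector numbers from the vector table earlier in this section. */
enum {
    V_COPROC_EXCEPTION = 4, V_INSTR_ACCESS = 6, V_DATA_ACCESS = 7,
    V_UM_ITLB_MISS = 8, V_SM_ITLB_MISS = 10, V_IMMU_PROT = 12,
    V_TIMER = 14, V_INTR0 = 16, V_TRAP1 = 21
};

/* Reason Vector value recorded when a WARN trap causes the Monitor trap. */
#define REASON_WARN V_INTR0

/* Outside Monitor mode: with CPS.DA = 1, is this source taken as a Monitor
   trap? INTR3-0 (16-19), TRAP0-1 (20-21), and Timer (14) never are. */
static bool becomes_monitor_trap(bool da, unsigned vector)
{
    if (!da) return false;
    if (vector == V_TIMER) return false;
    if (vector >= V_INTR0 && vector <= V_TRAP1) return false;
    return true;
}

/* Inside Monitor mode: traps that are still recognized. Each causes another
   Monitor trap without updating the shadow PCs or the Reason Vector Register. */
static bool recognized_in_monitor_mode(unsigned vector)
{
    switch (vector) {
    case V_DATA_ACCESS: case V_COPROC_EXCEPTION: case V_INSTR_ACCESS:
    case V_UM_ITLB_MISS: case V_SM_ITLB_MISS: case V_IMMU_PROT:
        return true;
    default:
        return false;
    }
}
```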
### 3.5.8 Sequencing of Interrupts and Traps

On every cycle, the processor decides either to execute instructions or to take an interrupt or trap. Since there are multiple sources of interrupts and traps, more than one interrupt or trap may be pending on a given cycle.

To resolve conflicts, interrupts and traps are taken according to the priority shown in Table 3-11. In this table, interrupts and traps are listed in order of decreasing priority. This section discusses the first three columns of Table 3-11. The last two columns are discussed in Section 3.5.9.

In Table 3-11, interrupts and traps fall into one of two categories depending on the timing of their occurrence relative to instruction execution. These categories are indicated in the third column of Table 3-11 by the labels Inst and Async. These labels have the following meaning:

1. Inst—Generated by the execution or attempted execution of an instruction.
2. Async—Generated asynchronously to, and independently of, the instruction being executed, although it may be a result of an instruction executed previously.

The principle for interrupt and trap sequencing is that the highest priority interrupt or trap is taken first. Other interrupts and traps remain active until they can be taken, or are regenerated when they can be taken. This is accomplished, depending on the type of interrupt or trap, as follows:

1. All traps in Table 3-11 with priority 13 through 15 are regenerated by the re-execution of the causing instruction.
2. Most of the interrupts and traps of priority 4 through 12 must be held by external hardware until they are taken. The exceptions to this are listed in 3) below.
3. The exceptions to 2 above are the Data Access Exception trap, the Coprocessor Exception trap, the Timer interrupt, and the Trace trap. These are caused by bits in various registers in the processor and are held by these registers until taken or cleared. The relevant bits are: the Transaction Faulted (TF) bit of the Channel Control Register for Data Access Exception and Coprocessor Exception traps, the Interrupt (IN) bit of the Timer Reload Register for Timer interrupts, and the Trace Pending (TP) bit of the Current Processor Status Register for Trace traps.
4. All traps of priority 2 and 3 in Table 3-11, except for the Unaligned Access trap, are not regenerated. These traps are mutually exclusive, and are given high priority because they cannot be regenerated; they must be taken if they occur. If one of these traps occurs at the same time as a reset or WARN trap, it is not taken, and its occurrence is lost.
5. The Unaligned Access trap is regenerated internally when an external access is restarted by the Channel Address, Channel Data, and Channel Control registers. Note that this trap is not necessarily exclusive to the traps discussed in 4) above.

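
The sequencing rules can be modeled in two parts: a priority selector, and a classification of how a pending source survives until it is taken. The priority ranges come from the list above. The pending bit mask is a hypothetical software representation, not a processor register.

```c
#include <stdbool.h>
#include <stdint.h>

/* How a pending source survives until it is taken. */
typedef enum {
    PEND_HIGHEST,          /* priority 1 (WARN) */
    PEND_NOT_REGENERATED,  /* priorities 2-3: lost if not taken */
    PEND_REGEN_ON_RESTART, /* Unaligned Access: regenerated when the channel access restarts */
    PEND_HELD,             /* priorities 4-12: held externally or by the TF, IN, TP bits */
    PEND_REEXECUTE         /* priorities 13-15: regenerated by re-executing the instruction */
} pend_policy;

static pend_policy pending_policy(int priority, bool is_unaligned_access)
{
    if (is_unaligned_access) return PEND_REGEN_ON_RESTART;
    if (priority <= 1)  return PEND_HIGHEST;
    if (priority <= 3)  return PEND_NOT_REGENERATED;
    if (priority <= 12) return PEND_HELD;
    return PEND_REEXECUTE;
}

/* Hypothetical encoding: bit (p - 1) of `pending` set means a source of
   priority p is pending. Returns the priority taken this cycle, or 0 if none. */
static int priority_to_take(uint32_t pending)
{
    for (int p = 1; p <= 15; p++)
        if (pending & (1u << (p - 1)))
            return p;
    return 0;
}
```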
Note that the Channel Address, Channel Data, and Channel Control registers are set for a WARN trap only if an external access is in progress when the trap is taken.

### 3.5.9 Exception Reporting and Restarting

When an instruction encounters an exceptional condition, the Program Counter 0, Program Counter 1, and Program Counter 2 registers report the relevant instruction address(es) and allow the instruction sequence to be restarted once the exceptional condition has been remedied (if possible). Similarly, when an external access or coprocessor transfer encounters an exceptional condition, the Channel Address, Channel Data, and Channel Control registers report information on the access or transfer and allow it to be restarted. This section describes the interpretation and use of these registers.

### Table 3-11: Interrupt and Trap Priority Table

<table>
<thead>
<tr>
<th>Priority</th>
<th>Type of Interrupt or Trap</th>
<th>Inst/Async</th>
<th>PC1</th>
<th>Channel Regs</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>WARN (Highest)</td>
<td>Async</td>
<td>Next</td>
<td>See Note 1</td>
</tr>
<tr>
<td>2</td>
<td>User-Mode Data TLB Miss</td>
<td>Inst</td>
<td>Next</td>
<td>All</td>
</tr>
<tr>
<td>2</td>
<td>Supervisor-Mode Data TLB Miss</td>
<td>Inst</td>
<td>Next</td>
<td>All</td>
</tr>
<tr>
<td>2</td>
<td>Data MMU Protection Violation</td>
<td>Inst</td>
<td>Next</td>
<td>All</td>
</tr>
<tr>
<td>3</td>
<td>Unaligned Access</td>
<td>Inst</td>
<td>Next</td>
<td>All</td>
</tr>
<tr>
<td>3</td>
<td>Coprocessor Not Present</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>3</td>
<td>Out of Range</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>3</td>
<td>Floating-Point Exceptions</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>3</td>
<td>Assert Instructions</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>3</td>
<td>Instruction Emulation</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>3</td>
<td>DIVIDE</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>3</td>
<td>DIVIDU</td>
<td>Inst</td>
<td>Next</td>
<td>N/A</td>
</tr>
<tr>
<td>4</td>
<td>Data Access Exception</td>
<td>Async</td>
<td>Next</td>
<td>All</td>
</tr>
<tr>
<td>4</td>
<td>Coprocessor Exception</td>
<td>Async</td>
<td>Next</td>
<td>All</td>
</tr>
<tr>
<td>5</td>
<td>TRAP0</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>6</td>
<td>TRAP1</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>7</td>
<td>INTR0</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>8</td>
<td>INTR1</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>9</td>
<td>INTR2</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>10</td>
<td>INTR3</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>11</td>
<td>Timer</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>12</td>
<td>Trace (caused by TE, TP bits)</td>
<td>Async</td>
<td>Next</td>
<td>Multiple</td>
</tr>
<tr>
<td>13</td>
<td>User-Mode Instruction TLB Miss</td>
<td>Inst</td>
<td>Curr</td>
<td>N/A</td>
</tr>
<tr>
<td>13</td>
<td>Supervisor-Mode Instr. TLB Miss</td>
<td>Inst</td>
<td>Curr</td>
<td>N/A</td>
</tr>
<tr>
<td>13</td>
<td>Instruction MMU Protection Violation</td>
<td>Inst</td>
<td>Curr</td>
<td>N/A</td>
</tr>
<tr>
<td>13</td>
<td>Instruction Access Exception</td>
<td>Inst</td>
<td>Curr</td>
<td>N/A</td>
</tr>
<tr>
<td>14</td>
<td>Trace (caused by breakpoint comparison)</td>
<td>Inst</td>
<td>Curr</td>
<td>N/A</td>
</tr>
<tr>
<td>15</td>
<td>Illegal Opcode (Lowest)</td>
<td>Inst</td>
<td>Curr</td>
<td>N/A</td>
</tr>
</tbody>
</table>

Note 1: The Channel Address, Channel Data, and Channel Control registers are set for a **WARN** trap only if an external access is in progress when the trap is taken.

The PC1 column in Table 3-11 describes the value held in the Program Counter 1 Register (PC1) when the interrupt or trap is taken. For traps in the Inst category, PC1 contains either the address of the instruction causing the trap, indicated by Curr, or the address of the instruction following the instruction causing the trap, indicated by Next.

For interrupts and traps in the Async category, PC1 contains the address of the first instruction which was not executed due to the taking of the interrupt or trap. This is the next instruction to be executed upon interrupt return, as indicated by Next in the PC1 column.

### 3.5.9.1 INSTRUCTION EXCEPTIONS

For traps caused by the execution of an instruction (e.g., the Out of Range trap), the Program Counter 2 Register contains the address of the instruction causing the trap. In all of these cases, PC1 is in the Next category. The Exception Opcode Register contains the operation code of the instruction causing the trap.
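
A handler that needs the address of the offending instruction can reduce the Curr/Next distinction to a choice between PC1 and PC2. This is a minimal sketch with hypothetical names, valid only for Inst-category traps. For a Monitor trap, the same selection would apply to the shadow program counters.

```c
#include <stdint.h>

typedef enum { PC1_CURR, PC1_NEXT } pc1_kind; /* PC1 column of Table 3-11 */

/* Address of the instruction that caused an Inst-category trap:
   PC1 for Curr traps, PC2 for Next traps. */
static uint32_t faulting_instruction(pc1_kind kind, uint32_t pc1, uint32_t pc2)
{
    return kind == PC1_CURR ? pc1 : pc2;
}
```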

The traps associated with instruction fetches (i.e., those of priority 13) occur only if the processor attempts the execution of the associated instruction. An exception may be detected during an instruction prefetch, but the associated trap does not occur if a non-sequential fetch occurs before the processor attempts the execution of the invalid instruction. This prevents the spurious indication of instruction exceptions.

In the case of a Monitor trap, the relevant instruction addresses are contained in the Shadow Program Counter 0, 1, and 2 registers rather than the Program Counter 0, 1, and 2 registers.

### 3.5.9.2 DATA EXCEPTIONS

The Channel Regs column of Table 3-11 indicates the cases for which the Channel Address, Channel Data, and Channel Control registers contain information related to an external access or coprocessor transfer (these registers collectively are termed “channel registers” in the following discussion). For the cases indicated, the access or transfer did not complete because of some exceptional condition. Note that the Channel Data Register contains relevant information only in the case of a store.

For the WARN trap, the channel registers are valid only if a load or store was in progress when the trap was taken. Recall that the WARN trap does not wait for any in-progress access to complete.

For the traps with an All in the Channel Regs column of Table 3-11, the channel registers contain information relevant to the trap in all cases. These traps are associated with exceptional events during external accesses or coprocessor transfers.

For the traps with a Multiple in the Channel Regs column, the channel registers might contain information for restarting an interrupted load-multiple or store-multiple operation. In these cases, the operation did not encounter an exception, but was simply canceled for latency considerations.

The information contained in the channel registers allows the processor to restart the related operation during an interrupt return sequence, without any special assistance by software. Software need only ensure that the relevant information is retained in, or restored to, the channel registers before an interrupt return is executed.

### 3.5.10 Arithmetic Exceptions

Integer and floating-point instructions can cause Out of Range or Floating-Point Exception traps, respectively, if an exception is detected during the arithmetic operation. This section describes the conditions under which these traps occur and the additional operations performed beyond those described in Section 3.5.5.

### 3.5.10.1 INTEGER EXCEPTIONS

Some integer add and subtract instructions—ADDS, ADDU, ADDCS, ADDCU, SUBS, SUBU, SUBCS, SUBCU, SUBRS, SUBRU, SUBRCS, and SUBRCU—cause an Out of Range trap upon overflow or underflow of a 32-bit signed or unsigned result, depending on the instruction.

Two integer multiply instructions—MULTIPLY and MULTIPLU—cause an Out of Range trap upon overflow of a 32-bit signed or unsigned result, respectively, if the MO bit of the Integer Environment Register is 0. If the MO bit is 1, these multiply instructions cannot cause an Out of Range trap.

Two integer divide instructions—DIVIDE and DIVIDU—take the Out of Range trap upon overflow of a 32-bit signed or unsigned result, respectively, if the DO bit of the Integer Environment Register is 0. If the DO bit is 1, the divide instructions cannot cause an Out of Range trap unless the divisor is zero. If the divisor is zero, an Out of Range trap always occurs, regardless of the DO bit.
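
The overflow conditions for ADDS, ADDU, and SUBU, and the divide trap rule, can be written as C predicates. These are software models for checking or emulating the behavior, not a description of the ALU. The `quotient_overflows` argument is assumed to be computed separately, since the dividend format of DIVIDE and DIVIDU is not covered in this section. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADDS: Out of Range if the true sum does not fit in a signed 32-bit word. */
static bool adds_out_of_range(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    return s > INT32_MAX || s < INT32_MIN;
}

/* ADDU: Out of Range if the unsigned sum carries out of bit 31. */
static bool addu_out_of_range(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}

/* SUBU (a - b): Out of Range on unsigned underflow, i.e. a borrow. */
static bool subu_out_of_range(uint32_t a, uint32_t b)
{
    return b > a;
}

/* DIVIDE/DIVIDU: a zero divisor always traps; overflow traps only when
   the DO bit of the Integer Environment Register is 0. */
static bool divide_traps(uint32_t divisor, bool quotient_overflows, bool do_bit)
{
    if (divisor == 0)
        return true;
    return quotient_overflows && !do_bit;
}
```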

In addition to the operations described in Section 3.5.5, the following operations are performed when an Out of Range trap is taken:

1. The operation code of the instruction causing the exception is placed in the IOP field of the Exception Opcode Register.
2. For the MULTIPLY, MULTIPLU, DIVIDE, and DIVIDU instructions, the absolute register numbers of the excepting instruction's source and destination registers are placed into the Indirect Pointer A, Indirect Pointer B, and Indirect Pointer C registers.
3. For the MULTIPLY, MULTIPLU, DIVIDE, and DIVIDU instructions, the destination register or registers are unchanged.

### 3.5.10.2 FLOATING-POINT EXCEPTIONS

A Floating-Point Exception trap occurs when an exception is detected during a floating-point operation, and the exception is not masked by the corresponding mask bit of the Floating-Point Environment Register. In this context, a floating-point operation is any operation that accepts a floating-point number as a source operand, produces a floating-point result, or both. Thus, for example, the CONVERT instruction may create an exception while converting a floating-point value to an integer value or vice versa. The occurrence of floating-point exceptions is discussed in detail in Appendix C.

In addition to the operations described in Section 3.5.5, the following operations are performed when a Floating-Point Exception trap is taken:

1. The operation code of the instruction causing the exception is placed in the IOP field of the Exception Opcode Register.
2. The status of the trapping operation is written into the trap status bits of the Floating-Point Status Register. The written status bits do not depend on the values of the corresponding mask bits in the Floating-Point Environment Register.
3. The absolute-register numbers of the excepting instruction's source and destination registers are placed into the Indirect Pointer A, Indirect Pointer B, and Indirect Pointer C registers. If the RB or RC field specifies a function code, that code is transferred to the corresponding indirect pointer. Note that if the most-significant bit of this function code is 1, the value of the Stack Pointer has been added to the RB field and must be subtracted to recover the original field.

4. The destination register or registers are left unchanged.
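
Software can undo the Stack Pointer adjustment in step 3. The sketch below assumes the Am29000-family local-register scheme. In that scheme, a field with its most-significant bit set is offset by a 7-bit value taken from bits 8–2 of the Stack Pointer, wrapping modulo 128, with bit 7 preserved. This section does not specify that encoding, so treat it as an assumption. The function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED: bits 8-2 of the Stack Pointer give a 7-bit local-register offset. */
static uint32_t sp_offset(uint32_t stack_pointer)
{
    return (stack_pointer >> 2) & 0x7Fu;
}

/* ASSUMED forward mapping for fields with bit 7 set: the offset is added
   modulo 128 and bit 7 is kept. */
static uint8_t add_stack_pointer(uint8_t field, uint32_t stack_pointer)
{
    if ((field & 0x80u) == 0) return field;
    return (uint8_t)(0x80u | ((field + sp_offset(stack_pointer)) & 0x7Fu));
}

/* Inverse: recover the original RB or RC field from an indirect pointer value. */
static uint8_t recover_field(uint8_t ip_value, uint32_t stack_pointer)
{
    if ((ip_value & 0x80u) == 0) return ip_value; /* MSB clear: nothing was added */
    return (uint8_t)(0x80u | ((ip_value - sp_offset(stack_pointer)) & 0x7Fu));
}
```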

### 3.5.11 Exceptions During Interrupt and Trap Handling

In most cases, interrupt and trap handling routines are executed with the DA bit in the Current Processor Status having a value of 1. It is normally assumed that these routines do not create many of the exceptions possible in most other processor routines, or that whatever exceptions do occur can be handled in the Monitor mode.

If these assumptions are not valid for a particular interrupt or trap handler, it is important that the handler save the state of the processor and reset the FZ bit of the Current Processor Status, so that the handler itself may be restarted properly. This must be accomplished before any interrupts or traps can be taken. In this case, the state (or the state of some other process) must be restored before an interrupt return is executed.

If the processor does take a trap while handling another interrupt or trap, it enters the Monitor mode, and the state of the interrupt or trap handler is reflected in the Shadow Program Counter 0, 1, and 2 registers and the Reason Vector Register. Other processor state is preserved, including the Current Processor Status Register. This allows the Monitor trap routine to handle the trap.

### 3.6 MEMORY MANAGEMENT

The Am29050 microprocessor incorporates a Memory Management Unit (MMU) for performing virtual-to-physical address translation and memory access protection. This section describes the logical operation of the Memory Management Unit. Related issues are discussed in Sections 7.3.3 and 7.3.4.

Address translation is performed either by one of the two Region Mapping Units (RMUs) or by the Translation Look-Aside Buffer (TLB). The RMUs map virtual regions of variable size, ranging from 64 Kbytes to 2 Gbytes, into regions of physical memory. Each RMU consists of two protected special-purpose registers, which are described in Section 3.2.3. Any virtual address not mapped by the RMUs is translated by the TLB. The TLB maps virtual regions of fixed size, called pages, into physical regions of the same size, called page frames. The structure of the TLB is described below.
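
An RMU lookup can be sketched as a base-and-mask comparison over a power-of-two region. The `rmu_region` structure is hypothetical and does not reproduce the RMU register layout from Section 3.2.3. The sketch also assumes that RMU 0 is checked before RMU 1, which this section does not specify, and it omits protection checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Region Mapping Unit (not the real register layout). */
typedef struct {
    uint32_t vbase;  /* virtual base, aligned to size */
    uint32_t pbase;  /* physical base, aligned to size */
    uint32_t size;   /* power of two, 64 Kbytes .. 2 Gbytes */
    bool     valid;
} rmu_region;

static bool rmu_translate(const rmu_region *r, uint32_t va, uint32_t *pa)
{
    uint32_t mask = r->size - 1u;
    if (!r->valid || (va & ~mask) != r->vbase)
        return false;
    *pa = r->pbase | (va & mask);
    return true;
}

/* Returns the index of the RMU that mapped `va`, or -1 if the TLB must be used. */
static int rmu_lookup(const rmu_region rmu[2], uint32_t va, uint32_t *pa)
{
    for (int i = 0; i < 2; i++)
        if (rmu_translate(&rmu[i], va, pa))
            return i;
    return -1;
}
```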

Address translation can be performed only for instruction/data memory accesses. No address translation is performed for instruction ROM, input/output, coprocessor or interrupt/trap vector accesses. However, an instruction/data memory access can be re-directed to input/output by the address-translation process.

### 3.6.1 Translation Look-Aside Buffer

The MMU stores the most-recently performed address translations in a special cache, the Translation Look-Aside Buffer (TLB). The TLB reflects information in the processor system page tables, except that it specifies the translation for many fewer pages; this restriction allows the TLB to be incorporated on the processor chip where the performance of address translation is maximized.

A diagram of the TLB is shown in Figure 3-52. The TLB is a table of 64 entries, divided into two equal sets, called Set 0 and Set 1. Within each set, entries are numbered 0 to 31. Entries in different sets that have the same entry number are grouped into a unit called a line; there are thus 32 lines in the TLB, numbered 0 to 31.

**Figure 3-52  Translation Look-Aside Buffer Organization**

<table>
<thead>
<tr>
<th>Line</th>
<th>TLB Set 0 Entry #</th>
<th>TLB Set 1 Entry #</th>
</tr>
</thead>
<tbody>
<tr>
<td>Line 0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Line 1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Line 2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>Line 3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>Line 4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>Line 31</td>
<td>31</td>
<td>31</td>
</tr>
</tbody>
</table>

Each TLB entry is 64 bits long and contains mapping and protection information for a single virtual page. TLB entries may be inspected and modified by processor instructions executed in the Supervisor mode. The layout of TLB entries is described in Section 3.2.4.

The TLB stores information about the ownership of the TLB entries in an 8-bit Task Identifier (TID) field in each entry. This makes it possible for the TLB to be shared by several independent processes without the need to invalidate the entire TLB as processes are activated. It also increases system performance by permitting processes to warm-start (i.e., to start execution on the processor with a certain number of TLB entries remaining in the TLB from a previous execution).

Each TLB entry contains a Usage bit to assist management of the TLB entries. The Usage bit indicates which of the two entries in a given line (the Set 0 entry or the Set 1 entry) was least recently used to perform an address translation. The Usage bits of the two entries in the same line are equivalent.

The TLB contains other fields, which are described in the following sections.

3.6.2 Address Translation

For the purpose of address translation, the virtual instruction/data address-space of a process is typically partitioned into regions of fixed size, called pages. Pages are mapped into equivalent-sized regions of physical memory, called page frames. All accesses to instructions or data contained within a given page use the same virtual-to-physical address translation.

In addition to the page-by-page translation provided by the TLB, the Am29050 microprocessor supports translation for variable-sized regions, ranging from 64 kb to 2 Gb, by means of two Region Mapping Units. Each RMU consists of two special-purpose registers. In each RMU, a Region Mapping Address Register contains the base address of the virtual region to be mapped and the base address of the corresponding physical region. A Region Mapping Control Register specifies the region size and contains information which is used to control access, including a Task Identifier. The RMUs have priority over the TLB translation; in addition, RMU0 has priority over RMU1.

3.6.2.1 ADDRESS TRANSLATION CONTROLS

The processor attempts to perform address translation for the following external accesses:

1. Instruction accesses, if the Physical Addressing/Instructions (PI) and ROM Enable (RE) bits of the Current Processor Status are both 0.
2. User-mode accesses to instruction/data memory if the Physical Addressing/Data (PD) bit of the Current Processor Status is 0.
3. Supervisor-mode accesses to instruction/data memory if the Physical Address (PA) bit of the load or store instruction performing the access is 0, and the PD bit of the Current Processor Status is 0.
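As an illustration only, the three conditions can be written as a C predicate. The structure `cps_bits`, its `supervisor` field, and the function `translation_attempted` are hypothetical names, and the sketch assumes that conditions 2 and 3 govern load and store accesses while condition 1 governs instruction fetches.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the Current Processor Status bits that gate translation. */
struct cps_bits {
    bool pi;          /* Physical Addressing/Instructions */
    bool pd;          /* Physical Addressing/Data */
    bool re;          /* ROM Enable */
    bool supervisor;  /* true when the access is made in Supervisor mode */
};

enum access_kind { ACCESS_INSTRUCTION, ACCESS_LOAD_STORE };

/* Returns true if the processor attempts address translation for the access.
 * pa_bit is the PA bit of the load or store; it is ignored for instruction fetches. */
static bool translation_attempted(struct cps_bits cps, enum access_kind kind, bool pa_bit)
{
    if (kind == ACCESS_INSTRUCTION)
        return !cps.pi && !cps.re;   /* condition 1 */
    if (!cps.supervisor)
        return !cps.pd;              /* condition 2: User mode */
    return !pa_bit && !cps.pd;       /* condition 3: Supervisor mode */
}
```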

Address translation is controlled by the MMU Configuration Register. This register specifies the virtual page size, and contains an 8-bit Process Identifier (PID) field. The PID field specifies the process number associated with the currently running program, if this is a User-mode program. Supervisor-mode programs are assigned a fixed process number of zero. The process number is compared with the Task Identifier (TID) field of the Region Mapping Control Register or the TLB entry, as appropriate, during address translation. The TID field must match the process number for the translation to be valid.

3.6.2.2 RMU ADDRESS TRANSLATION PROCESS

In a successful RMU address translation, the most-significant bits of the virtual address match the corresponding bits of the Virtual Base Address (VBA) field of the Region Mapping Address Register, and are replaced with the contents of the Physical Base Address (PBA) field. The number of bits compared and subject to replacement is determined by the Region Size (RGS) field of the Region Mapping Control Register. For example, if the region size is 64 kb, 16 bits are compared; if the region size is 128 kb, 15 bits are compared, and so on.

For an address translation to be valid, the following conditions must be met:

1. The most-significant bits of the virtual address, determined by the RGS field, match the corresponding bits of the VBA field of the Region Mapping Address Register.
2. For a User-mode access, the TID field in the Region Mapping Control Register matches the PID field in the MMU Configuration Register. For a Supervisor-mode access, the TID field is zero.
3. The Valid Entry (VE) bit of the Region Mapping Control Register is 1.

The address space of the physical address is determined by the Input/Output (IO) bit of the Region Mapping Control Register. If the IO bit is 0, the address is in the instruction/data memory address space. If the IO bit is 1, the address is in the input/output address space.

If the address translation is valid, then certain bits of the Region Mapping Control Register are used to perform protection checking (see Section 3.6.5). If there is no protection violation, the translation is performed and the resulting physical address is placed on the processor's Address Bus. If there is a protection violation, a Data or Instruction MMU Protection Violation trap occurs, depending on the access.

If the address translation is valid and there is no protection violation, the PGM bits from the Region Mapping Control Register are placed on the MPGM(1–0) outputs during the address cycle for the access.

If the address translation is not valid in RMU0, it is attempted by RMU1. If the translation is not valid in RMU1, it is attempted by the TLB.
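The RMU rules above (a bit comparison sized by RGS, the three validity conditions, and the order RMU0, then RMU1, then the TLB) can be sketched in C as follows. This is a model under stated assumptions, not the hardware: `struct rmu`, `rmu_translate`, and `rmu_lookup` are hypothetical, the region size is held as log2 of its byte count because the RGS encoding is not given in this section, and protection checking and the IO bit are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one Region Mapping Unit. The RGS encoding is
 * not reproduced here, so the region size is held as log2(bytes): 16 (64 kb)
 * through 31 (2 Gb). */
struct rmu {
    uint32_t vba;        /* Virtual Base Address (VBA) */
    uint32_t pba;        /* Physical Base Address (PBA) */
    unsigned size_log2;  /* 16..31 */
    uint8_t  tid;        /* Task Identifier (TID) */
    bool     ve;         /* Valid Entry (VE) */
};

/* Conditions 1-3. process_number is the PID in User mode and 0 in Supervisor mode. */
static bool rmu_translate(const struct rmu *r, uint32_t va,
                          uint8_t process_number, uint32_t *pa)
{
    uint32_t offset_mask = (1u << r->size_log2) - 1u; /* low bits passed through */
    uint32_t base_mask = ~offset_mask;                /* 32 - size_log2 high bits */

    if ((va & base_mask) != (r->vba & base_mask)) return false; /* condition 1 */
    if (r->tid != process_number) return false;                 /* condition 2 */
    if (!r->ve) return false;                                   /* condition 3 */

    *pa = (r->pba & base_mask) | (va & offset_mask);            /* replace high bits */
    return true;
}

/* RMU0 has priority over RMU1. Returns the index of the RMU that translated,
 * or -1 when the address must be passed to the TLB. */
static int rmu_lookup(const struct rmu rmus[2], uint32_t va,
                      uint8_t process_number, uint32_t *pa)
{
    for (int i = 0; i < 2; i++)
        if (rmu_translate(&rmus[i], va, process_number, pa))
            return i;
    return -1;
}
```

For a 64-kb region (`size_log2` = 16), the upper 16 bits are compared and replaced, matching the example given earlier in this section.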

### 3.6.2.3 TLB ADDRESS TRANSLATION PROCESS

Virtual addresses are partitioned into three fields for TLB address translation, as shown in Figure 3-53. The partitioning of the virtual address is based on the page size. Pages may be of size 1, 2, 4, or 8 kb, as specified by the MMU Configuration Register.

#### Figure 3-53 Virtual Address for 1, 2, 4, and 8 kb Pages

<table>
<thead>
<tr>
<th>Page Size</th>
<th>Virtual Tag Comparison (bits)</th>
<th>TLB Line Select (bits)</th>
<th>Page Offset (bits)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 kb</td>
<td>31–15</td>
<td>14–10</td>
<td>9–0</td>
</tr>
<tr>
<td>2 kb</td>
<td>31–16</td>
<td>15–11</td>
<td>10–0</td>
</tr>
<tr>
<td>4 kb</td>
<td>31–17</td>
<td>16–12</td>
<td>11–0</td>
</tr>
<tr>
<td>8 kb</td>
<td>31–18</td>
<td>17–13</td>
<td>12–0</td>
</tr>
</tbody>
</table>

The TLB address-translation process is diagrammed in Figure 3-54. Address translation uses the following fields in the TLB entry: the Virtual Tag (VTAG), the Task Identifier (TID), the Valid Entry (VE) bit, the Real Page Number (RPN) field, and the Input/Output (IO) bit. To perform an address translation, the processor accesses the TLB line whose number is given by certain bits in the virtual address. The bits used depend on the page size as follows:

<table>
<thead>
<tr>
<th>Page Size</th>
<th>Virtual Address Bits (for Line Access)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 kb</td>
<td>14–10</td>
</tr>
<tr>
<td>2 kb</td>
<td>15–11</td>
</tr>
<tr>
<td>4 kb</td>
<td>16–12</td>
</tr>
<tr>
<td>8 kb</td>
<td>17–13</td>
</tr>
</tbody>
</table>
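The line-select rule in the table amounts to taking the five address bits immediately above the page offset. A minimal C sketch, where `tlb_line` and `page_shift` (log2 of the page size, 10 through 13) are hypothetical names:

```c
#include <stdint.h>

/* page_shift is log2 of the page size: 10 (1 kb), 11 (2 kb), 12 (4 kb), 13 (8 kb).
 * The line number is the five virtual-address bits just above the page offset. */
static unsigned tlb_line(uint32_t va, unsigned page_shift)
{
    return (unsigned)((va >> page_shift) & 0x1Fu);  /* 32 lines: 0..31 */
}
```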

The accessed line contains two TLB entries, each with its own VTAG field. Both VTAG fields are compared to bits in the virtual address. This comparison depends on the page size as follows (note that VTAG bit numbers are relative to the VTAG field, not the TLB entry):

<table>
<thead>
<tr>
<th>Page Size</th>
<th>Virtual Address Bits</th>
<th>VTAG Bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 kb</td>
<td>31–15</td>
<td>16–0</td>
</tr>
<tr>
<td>2 kb</td>
<td>31–16</td>
<td>16–1</td>
</tr>
<tr>
<td>4 kb</td>
<td>31–17</td>
<td>16–2</td>
</tr>
<tr>
<td>8 kb</td>
<td>31–18</td>
<td>16–3</td>
</tr>
</tbody>
</table>

For page sizes larger than 1 kb, the low-order bits of the VTAG field (bit 0 for 2 kb; bits 1–0 for 4 kb; bits 2–0 for 8 kb) do not participate in the comparison. These bits of the VTAG field are required to be zero.
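Because VTAG bit i lines up with virtual-address bit i + 15 in every row of the table, the comparison can be sketched as a shift followed by discarding the low VTAG bits that do not participate. The helpers `vtag_for` and `vtag_matches` below are hypothetical and model only the tag comparison, not the TID or VE checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* VTAG is a 17-bit field; VTAG bit i corresponds to virtual-address bit i + 15. */
#define VTAG_MASK 0x1FFFFu

/* Builds the VTAG value an entry would hold for va, leaving the
 * non-compared low VTAG bits zero as required. */
static uint32_t vtag_for(uint32_t va, unsigned page_shift)
{
    unsigned ignored = page_shift - 10u;            /* 0, 1, 2 or 3 bits */
    return ((va >> 15) & VTAG_MASK) & ~((1u << ignored) - 1u);
}

/* Compares only the VTAG bits that participate for this page size. */
static bool vtag_matches(uint32_t vtag, uint32_t va, unsigned page_shift)
{
    unsigned ignored = page_shift - 10u;
    return (((va >> 15) & VTAG_MASK) >> ignored) == ((vtag & VTAG_MASK) >> ignored);
}
```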

**Figure 3-54 TLB Address Translation Process**

For an address translation to be valid, the following conditions must be met:

1. The virtual address bits match corresponding bits of the VTAG field as specified above.

2. For a User-mode access, the TID field in the TLB entry matches the PID field in the MMU Configuration Register. For a Supervisor-mode access, the TID field is zero.

3. The VE bit in the TLB entry is 1.

4. Only one entry in the line meets conditions 1, 2, and 3 above. If this condition is not met, the results of the translation may be treated as valid by the processor, but the results are unpredictable.
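The four conditions can be combined into a lookup over the two entries of the selected line. In this hypothetical sketch, `struct tlb_entry` holds only the decoded VTAG, TID, and VE fields (not the bit layout of Section 3.2.4), and a double match is reported as a distinct result because the hardware outcome is unpredictable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded TLB entry (only the fields used for matching). */
struct tlb_entry {
    uint32_t vtag;  /* 17-bit Virtual Tag */
    uint8_t  tid;   /* Task Identifier */
    bool     ve;    /* Valid Entry */
};

#define TLB_MISS          (-1)
#define TLB_UNPREDICTABLE (-2)  /* condition 4 violated: both entries match */

/* Conditions 1-3 for one entry. process_number is the PID for User-mode
 * accesses and 0 for Supervisor-mode accesses. */
static bool entry_matches(const struct tlb_entry *e, uint32_t va,
                          unsigned page_shift, uint8_t process_number)
{
    unsigned ignored = page_shift - 10u;
    bool tag_ok = (((va >> 15) & 0x1FFFFu) >> ignored) == ((e->vtag & 0x1FFFFu) >> ignored);
    return tag_ok && e->tid == process_number && e->ve;
}

/* line[0] is the Set 0 entry and line[1] the Set 1 entry of the selected line.
 * Returns the matching set, TLB_MISS, or TLB_UNPREDICTABLE. */
static int tlb_line_lookup(const struct tlb_entry line[2], uint32_t va,
                           unsigned page_shift, uint8_t process_number)
{
    bool m0 = entry_matches(&line[0], va, page_shift, process_number);
    bool m1 = entry_matches(&line[1], va, page_shift, process_number);
    if (m0 && m1) return TLB_UNPREDICTABLE;
    if (m0) return 0;
    if (m1) return 1;
    return TLB_MISS;
}
```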

If the address translation is valid for one TLB entry in the selected line, the RPN field in this entry is used to form the physical address of the access. The RPN field gives the portion of the physical address that depends on the translation; the remaining portion of the virtual address—called the Page Offset—is invariant with address translation.

The Page Offset comprises the low-order bits of the virtual address, and gives the location of a byte (because of byte addressing) within the virtual page. This byte is located at the same position in the physical page frame, so the Page Offset also comprises the low-order bits of the physical address.

The 32-bit physical address is the concatenation of certain bits of the RPN field and Page Offset, where the bits from each depend on the page size as follows (note that RPN bit numbers are relative to the RPN field, not the TLB entry):

<table>
<thead>
<tr>
<th>Page Size</th>
<th>RPN Bits</th>
<th>Virtual Address Bits for Page Offset</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 kb</td>
<td>21-0</td>
<td>9-0</td>
</tr>
<tr>
<td>2 kb</td>
<td>21-1</td>
<td>10-0</td>
</tr>
<tr>
<td>4 kb</td>
<td>21-2</td>
<td>11-0</td>
</tr>
<tr>
<td>8 kb</td>
<td>21-3</td>
<td>12-0</td>
</tr>
</tbody>
</table>

Note: For page sizes greater than 1 kb, the low-order bits of the RPN field (bit 0 for 2 kb; bits 1–0 for 4 kb; bits 2–0 for 8 kb) are not used in forming the physical address. These bits of the RPN are required to be zero. In addition, for certain instruction accesses, the Page Offset is incremented by 8 or 16 as described in Section 4.2.3.1.
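Forming the physical address is a concatenation of the RPN (positioned so that RPN bit j supplies address bit j + 10) with the Page Offset. A sketch with a hypothetical function name; it does not model the instruction-access increment mentioned in the note:

```c
#include <stdint.h>

/* rpn: 22-bit Real Page Number; RPN bit j supplies physical-address bit j + 10.
 * page_shift: 10 (1 kb) .. 13 (8 kb). The 8/16 increment applied to certain
 * instruction accesses is not modeled. */
static uint32_t tlb_physical_address(uint32_t rpn, uint32_t va, unsigned page_shift)
{
    uint32_t offset_mask = (1u << page_shift) - 1u;       /* Page Offset bits */
    uint32_t frame = (rpn & 0x3FFFFFu) << 10;             /* RPN 21-0 -> PA 31-10 */
    return (frame & ~offset_mask) | (va & offset_mask);   /* unused RPN bits dropped */
}
```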

The address space of the physical address is determined by the Input/Output (IO) bit of the TLB entry. If the IO bit is 0, the address is in the instruction/data memory address space. If the IO bit is 1, the address is in the input/output address space.

If an address translation is successful, the TLB entry is further used to perform protection checking for the access. Bits in the TLB entry make it possible to restrict accesses, independently for Supervisor-mode and User-mode accesses, to any combination of load, store, and instruction accesses, or to allow no access. Section 3.6.5 describes protection in more detail.

If the address translation is valid, and no protection violation is detected, the physical address from the translation is placed on the processor's Address Bus, and the access is initiated. If the translation is not valid, or a protection violation is detected, a trap occurs. Depending on the state of the channel interface, the access request may be placed on the Address Bus with the signal BINV asserted, even though the trap occurs.

Also, if the address translation is successful, and there is no protection violation, the PGM bits from the TLB entry used for translation are placed on the MPGM(1–0) outputs during the address cycle for the access. If address translation is not performed, these pins are both Low for the address cycle.

If the TLB cannot translate an address, a TLB miss occurs. The MMU causes a trap if either a TLB miss occurs, or the translation is successful and a protection violation is detected. The processor distinguishes between traps caused by instruction and data accesses, and between TLB-miss traps caused by User- and Supervisor-mode accesses, as follows:

<table>
<thead>
<tr>
<th>Trap Vector Number</th>
<th>Type of Trap</th>
</tr>
</thead>
<tbody>
<tr>
<td>8</td>
<td>User-Mode Instruction TLB Miss</td>
</tr>
<tr>
<td>9</td>
<td>User-Mode Data TLB Miss</td>
</tr>
<tr>
<td>10</td>
<td>Supervisor-Mode Instruction TLB Miss</td>
</tr>
<tr>
<td>11</td>
<td>Supervisor-Mode Data TLB Miss</td>
</tr>
<tr>
<td>12</td>
<td>Instruction TLB Protection Violation</td>
</tr>
<tr>
<td>13</td>
<td>Data TLB Protection Violation</td>
</tr>
</tbody>
</table>
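The vector numbers in the table depend on three facts about the failed access. The following hypothetical helper shows one way a simulator might select them; per the table, the protection-violation vectors do not encode the program mode.

```c
#include <stdbool.h>

/* Selects the trap vector number for an MMU trap, following the table above. */
static unsigned mmu_trap_vector(bool protection_violation, bool instruction_access,
                                bool supervisor_mode)
{
    if (protection_violation)                 /* vectors 12 and 13: mode not encoded */
        return instruction_access ? 12u : 13u;
    if (instruction_access)                   /* TLB miss on an instruction fetch */
        return supervisor_mode ? 10u : 8u;
    return supervisor_mode ? 11u : 9u;        /* TLB miss on a load or store */
}
```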

The distinction between the above traps is made to assist trap handling, particularly
the routines that load TLB entries.

### 3.6.3 TLB Reload

So that the MMU may support a large variety of memory-management architectures, it does not directly load TLB entries that are required for address translation. It simply causes a TLB miss trap when address translation is unsuccessful. The trap causes a program, called the TLB reload routine, to execute. The TLB reload routine is defined according to the structure and access method of the page table contained in an external device or memory.

When a TLB miss trap occurs, the LRU Recommendation Register is written with the TLB register number for Word 0 of the TLB entry to be used by the TLB reload routine. For instruction accesses, the Program Counter 1 Register contains the instruction address that was not successfully translated. For data accesses, the Channel Address Register contains the data address that was not successfully translated.

The TLB reload routine determines the translation for the address given by the Program Counter 1 Register or Channel Address Register, as appropriate. The TLB reload routine uses an external page table to determine the required translation, and loads the TLB entry indicated by the LRU Recommendation Register so that the entry may perform this translation. In a demand-paged environment, the TLB reload routine may additionally invoke a page-fault handler when the translation cannot be performed.

TLB entries are written by the Move To TLB (MTTLB) instruction, which copies the contents of a general-purpose register into a TLB register. The TLB register number is specified by bits 6–0 of a general-purpose register. TLB entries are read by the Move From TLB (MFTLB) instruction, which copies the contents of a TLB register into a general-purpose register. Again, the TLB register number is specified by a general-purpose register.
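The reload flow described above can be sketched in C against a software model of the TLB registers. Everything here is an assumption for illustration: the one-level page table `struct pte` with prebuilt TLB words, the function names, and in particular the choice of `lru + 1` as the register number of Word 1, which this section does not state. In real code the writes would be MTTLB instructions executed by a Supervisor-mode trap handler.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_REG_COUNT 128u  /* 64 entries x 2 words, selected by bits 6-0 */

/* Software stand-in for the TLB registers; only mttlb() writes them. */
static uint32_t tlb_regs[TLB_REG_COUNT];

/* Models MTTLB: only bits 6-0 of regnum select the TLB register. */
static void mttlb(uint32_t regnum, uint32_t value)
{
    tlb_regs[regnum & 0x7Fu] = value;
}

/* Hypothetical one-level page table holding prebuilt TLB words. */
struct pte {
    bool     present;
    uint32_t word0;
    uint32_t word1;
};

enum reload_result { TLB_RELOADED, TLB_PAGE_FAULT };

/* fault_va: PC1 contents for an instruction miss, CHA contents for a data miss.
 * lru: LRU Recommendation Register, the register number of Word 0.
 * ASSUMPTION: Word 1 of the same entry is register lru + 1. */
static enum reload_result tlb_reload(uint32_t fault_va, uint32_t lru,
                                     const struct pte *table, uint32_t table_len,
                                     unsigned page_shift)
{
    uint32_t vpn = fault_va >> page_shift;          /* virtual page number */
    if (vpn >= table_len || !table[vpn].present)
        return TLB_PAGE_FAULT;                      /* demand paging: call a page-fault handler */
    mttlb(lru, table[vpn].word0);
    mttlb(lru + 1u, table[vpn].word1);
    return TLB_RELOADED;
}
```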

### 3.6.4 TLB Entry Invalidation

There are two methods for invalidating TLB entries that are no longer required at a given point in program execution. The first involves resetting the Valid Entry bit of a single entry (this is done by a Move To TLB instruction). The second involves changing the value of the Process Identifier (PID) field of the MMU Configuration Register; this invalidates all entries whose Task Identifier (TID) fields do not match the new value.

An entry invalidated by changing the PID field still remains valid in one sense: its VE bit is unchanged, so if the PID field is changed again to match the entry's TID field, the entry can once again participate in address translation. This ability can be used to reduce the number of TLB misses in a system during process switching. However, TLB entries must be managed so that the PID field cannot unintentionally match the TID field of an old TLB entry.
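The warm-start effect of PID switching can be illustrated by counting the entries usable by a given process. The function `usable_entries` is hypothetical and ignores VTAG matching; it shows that entries tagged with TID 3 become unusable when the PID changes to 5 and usable again, without any reload, when it changes back to 3.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the entries that can take part in User-mode translation for pid.
 * VTAG matching is ignored in this sketch. */
static size_t usable_entries(const uint8_t *tids, const bool *ve, size_t n, uint8_t pid)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (ve[i] && tids[i] == pid)   /* VE still 1; only the TID/PID check changes */
            count++;
    return count;
}
```

If process number 3 were later reassigned to a different program while those entries remained in the TLB, they would match incorrectly, which is the hazard noted above.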

### 3.6.5 Protection

If an address translation is performed successfully as described in Section 3.6.2, the Region Mapping Control Register or TLB entry used in address translation is used to perform protection checking for the access. Six bits are used for this purpose; their names and functions are the same in the Region Mapping Control Registers and the TLB entries: Supervisor Read (SR), Supervisor Write (SW), Supervisor Execute (SE), User Read (UR), User Write (UW), and User Execute (UE). These bits restrict accesses, depending on the program mode of the access, as shown in Table 3-12 (the value x is a don't care).

<table>
<thead>
<tr>
<th>SR</th>
<th>SW</th>
<th>SE</th>
<th>UR</th>
<th>UW</th>
<th>UE</th>
<th>Type of Access Allowed</th>
</tr>
</thead>
<tbody>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>No User access</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>User instruction</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>User store</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>User store or instruction</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>User load</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>User load or instruction</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>User load or store</td>
</tr>
<tr>
<td>x</td>
<td>x</td>
<td>x</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>Any User access</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>No Supervisor access</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Supervisor instruction</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Supervisor store</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Supervisor store or instruction</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Supervisor load</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Supervisor load or instruction</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Supervisor load or store</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>Any Supervisor access</td>
</tr>
</tbody>
</table>

Note that for the Load and Set (LOADSET) instruction, the protection bits must be set to allow both the load and store access. If this condition does not hold, neither access is performed.
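Table 3-12 reduces to selecting the read, write, and execute bits for the program mode and testing the one that corresponds to the access. The sketch below uses hypothetical names and also encodes the LOADSET rule that both the load and the store must be permitted.

```c
#include <stdbool.h>

/* The six protection bits shared by Region Mapping Control Registers and TLB entries. */
struct prot_bits { bool sr, sw, se, ur, uw, ue; };

enum mem_access { ACC_LOAD, ACC_STORE, ACC_INSTRUCTION, ACC_LOADSET };

/* Returns true if the access is allowed for the given program mode. */
static bool access_allowed(struct prot_bits p, bool supervisor, enum mem_access a)
{
    bool can_read = supervisor ? p.sr : p.ur;
    bool can_write = supervisor ? p.sw : p.uw;
    bool can_execute = supervisor ? p.se : p.ue;

    switch (a) {
    case ACC_LOAD:        return can_read;
    case ACC_STORE:       return can_write;
    case ACC_INSTRUCTION: return can_execute;
    case ACC_LOADSET:     return can_read && can_write;  /* both halves must be allowed */
    }
    return false;
}
```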

If protection checking indicates that a given access is not allowed, a Data MMU Protection Violation or Instruction MMU Protection Violation trap occurs. The cause of the trap can be determined by inspecting the Program Counter 1 Register for an Instruction MMU Protection Violation, or by inspecting the contents of the Channel Address and Channel Control registers for a Data MMU Protection Violation.

3.7 DEBUGGING

Software debugging is supported by the Trace Facility, hardware breakpoints, and the Monitor mode. The Trace Facility guarantees exactly one trap after the execution of any instruction in a program being tested. The Trace trap allows a debug routine to follow the execution of instructions, and to determine the state of the processor and system at the end of each instruction. Hardware breakpoints return control to a debugger at specified program addresses. The Monitor mode allows the debugging of operating-system routines and interrupt and trap handlers.

3.7.1 Trace Facility

Tracing is controlled by the Trace Enable (TE) and Trace Pending (TP) bits of the Current Processor Status Register. The value of the TE bit is always copied into the TP bit when an instruction enters the write-back stage. A Trace trap occurs whenever the TP bit is 1. As with most traps, the Trace trap can be disabled only by the DA bit of the Current Processor Status Register.

In order to trace the execution of a program, the debug routine performs an interrupt return to cause the program to begin or resume execution. However, before the interrupt return is executed, the TE and TP bits of the Old Processor Status are set to the values 1 and 0, respectively. The interrupt return causes these bits to be copied into the TE and TP bits of the Current Processor Status.

When the target of the interrupt return (whose address is contained in the Program Counter 1 Register when the interrupt return is executed) enters the write-back stage, the processor copies the value of the TE bit into the TP bit. Since the TP bit is a 1, a Trace trap occurs. This trap prevents any further instruction execution in the target routine until the trap is taken and the routine is resumed with an interrupt return. When the Trace trap is taken, the TE and TP bits are both reset automatically, preventing any further Trace traps.
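The single-step sequence can be checked with a small model of the TE and TP bits. This is a sketch under simplifying assumptions (DA is 0, no unrelated interrupts occur, and pipeline timing is ignored); `struct trace_model`, `write_back`, and `single_step` are hypothetical names. The model shows that each debugger step lets exactly one instruction complete before the Trace trap.

```c
#include <stdbool.h>

/* TE and TP bits of a processor status register. */
struct trace_bits { bool te, tp; };

/* Minimal model: CPS, OPS, and counters. */
struct trace_model {
    struct trace_bits cps, ops;
    unsigned instructions_completed;
    unsigned trace_traps;
};

/* One instruction enters write-back. Returns true if a Trace trap is taken. */
static bool write_back(struct trace_model *m)
{
    m->cps.tp = m->cps.te;            /* TE is always copied into TP */
    m->instructions_completed++;
    if (!m->cps.tp)
        return false;
    m->ops = m->cps;                  /* taking the trap saves CPS in OPS */
    m->cps.te = m->cps.tp = false;    /* and resets TE and TP */
    m->trace_traps++;
    return true;
}

/* Debugger single step: OPS.TE = 1, OPS.TP = 0, then an interrupt return (OPS -> CPS). */
static unsigned single_step(struct trace_model *m)
{
    unsigned before = m->instructions_completed;
    m->ops.te = true;
    m->ops.tp = false;
    m->cps = m->ops;                  /* interrupt return */
    while (!write_back(m))
        ;                             /* run until the Trace trap */
    return m->instructions_completed - before;
}
```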

Since the Trace Facility is managed by the Old and Current Processor Status registers, it operates properly in the event that the processor takes an interrupt or trap—that is unrelated to the Trace Facility—before the above trace sequence completes. When the unrelated interrupt or trap is taken, the state of the Trace Facility (i.e., the values of the TE and TP bits) is copied into the Old Processor Status from the Current Processor Status. The Trace Facility then resumes operation when the interrupted routine is restarted by an interrupt return.

Note that it is possible to cause a Trace trap by directly setting the TP and/or TE bits in the Current Processor Status Register. This may be accomplished only by a Supervisor-mode program.

3.7.2 Instruction Breakpoints

The Am29050 microprocessor provides two hardware breakpoints for causing Trace traps at specified instruction addresses. These hardware breakpoints are specified by the following registers: Instruction Breakpoint Address 0, Instruction Breakpoint Control 0, Instruction Breakpoint Address 1, and Instruction Breakpoint Control 1. The two hardware breakpoints are identical in definition and capability.

Breakpoint comparisons are performed by both hardware breakpoints on instructions as the instructions enter the execute stage of the processor pipeline. If one (or both) of the breakpoint comparisons is valid, a Trace trap occurs and the instruction is not completed. The Trace trap caused by a hardware breakpoint has lower priority than a Trace trap caused by the Trace facility. Also, if the DA bit in the Current Processor Status Register is 1 when the Trace trap occurs, a Monitor trap is taken.

A breakpoint comparison is valid when the instruction address matches the address in the Instruction Breakpoint Address 0 Register or Instruction Breakpoint Address 1 Register, and the following conditions are met by the corresponding Instruction Breakpoint Control register.

1. The Breakpoint Has Occurred (BHO) bit is 0. The BHO bit allows the processor to progress beyond the breakpoint once it has been encountered.

2. The Breakpoint Enable (BEN) bit is 1.

3. The Break or Synchronize (BSY) bit is 1. If the BSY bit is 0 and all other conditions are valid, a synchronization pulse is generated externally by placing the value 010 on the STAT(2–0) outputs for one cycle (see Section 5.3). This permits the hardware breakpoint to generate a trigger for external logic, without causing a Trace trap that disturbs system timing.

4. The value of the Break ROM (BRM) bit is equal to the value of the ROM Enable bit in the Current Processor Status Register. This differentiates between a breakpoint in the instruction/data memory and one in the Instruction ROM.

5. The value of the Break on Translation Enabled (BTE) bit is equal to the complement of the Physical Addressing/Instructions (PI) bit in the Current Processor Status Register. This differentiates between a physical breakpoint address and a virtual breakpoint address.

6. If address translation is enabled for instructions, the Breakpoint Process Identifier (BPID) field matches the PID field of the MMU Configuration Register, for a User-mode program. For a Supervisor-mode program, the BPID field must be zero. The BPID field allows the breakpoint to be associated with a particular process in a multi-tasking system.

When a hardware breakpoint trap is taken, the processor sets the BHO bit. If the Trace trap handler returns to the routine with the breakpoint enabled, the BHO bit being 1 prevents the breakpoint comparison from causing another Trace trap. The processor resets the BHO bit when it encounters the breakpoint upon return, so that the Trace trap is once again enabled.
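The six conditions and the BHO handshake can be sketched as one check made as an instruction enters the execute stage. The structure and function names are hypothetical; the sketch treats translation as enabled for instructions when both PI and RE are 0 (following Section 3.6.2.1), and it does not model the Monitor trap taken when DA is 1, nor breakpoint priority relative to the Trace facility.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded Instruction Breakpoint Control register. */
struct bp_control {
    bool    bho;   /* Breakpoint Has Occurred */
    bool    ben;   /* Breakpoint Enable */
    bool    bsy;   /* Break (1) or Synchronize (0) */
    bool    brm;   /* Break ROM */
    bool    bte;   /* Break on Translation Enabled */
    uint8_t bpid;  /* Breakpoint Process Identifier */
};

/* State seen by an instruction entering the execute stage. */
struct exec_ctx {
    uint32_t addr;
    bool     re, pi, supervisor;
    uint8_t  pid;  /* PID field of the MMU Configuration Register */
};

enum bp_action { BP_NONE, BP_TRACE_TRAP, BP_SYNC_PULSE };

static enum bp_action bp_check(struct bp_control *c, uint32_t bp_addr,
                               const struct exec_ctx *x)
{
    if (x->addr != bp_addr || !c->ben) return BP_NONE;   /* address match, condition 2 */
    if (c->brm != x->re) return BP_NONE;                  /* condition 4 */
    if (c->bte == x->pi) return BP_NONE;                  /* condition 5: BTE must equal NOT PI */
    if (!x->pi && !x->re) {                               /* condition 6, translation enabled */
        uint8_t process_number = x->supervisor ? 0 : x->pid;
        if (c->bpid != process_number) return BP_NONE;
    }
    if (c->bho) {                                         /* condition 1 */
        c->bho = false;   /* passed once; re-armed for the next encounter */
        return BP_NONE;
    }
    if (!c->bsy) return BP_SYNC_PULSE;                    /* STAT(2-0) = 010, no trap */
    c->bho = true;                                        /* set when the trap is taken */
    return BP_TRACE_TRAP;
}
```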

A hardware-development system (see Section 5.4) can use the hardware breakpoints to cause the processor to enter the Halt mode rather than take a Trace trap.

### 3.7.3 Debugging System-Level Routines

The Monitor Mode provides a mechanism for debugging interrupt handlers and other system-level routines using a software debugger. The processor can enter the Monitor mode without affecting the state of any running program, much as it can take an interrupt without disturbing the state of an application program.

When a Trace trap occurs, and the DA bit in the Current Processor Status Register is 1, the processor takes a Monitor trap. The instruction addresses of the trapped program are contained in the Shadow Program Counters, the cause of the trap is encoded in the Reason Vector Register, and the Current Processor Status Register is unmodified (except that the Monitor Mode bit is set). This provides the information required to debug the trapped routine, regardless of whether the trapped routine was enabled to take interrupts. The Monitor trap handler can resume the execution of the trapped routine (e.g. for tracing) by executing an IRET or IRETI 0 instruction.

3.8 SERIALIZATION

The Am29050 microprocessor overlaps external data references with other operations, and typically performs floating-point operations in parallel with each other and with integer operations. When an external data reference must be restarted, however, or a floating-point operation causes a Floating-Point Exception trap to be taken, the processor context must be the same as when the operation was first attempted. To ensure this, certain operations are serialized.

The processor serializes by entering the Pipeline Hold mode in any of the following circumstances:

1. An external access is not yet completed, and one of the following instructions is encountered:
   - Move to Special Register
   - Move to Special Register Immediate
   - Move to TLB
   - Interrupt Return
   - Interrupt Return and Invalidate
   - Halt

2. An external access is not yet completed, and an interrupt or trap, other than a WARN trap, is taken.

3. The processor detects that a floating-point instruction may cause an unmasked floating-point exception. In this case, the instruction is issued for execution, but the pipeline holds until execution of the instruction is completed.
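The three serialization circumstances can be summarized as a predicate. The enumerator names are assumed assembler mnemonics for the listed instructions, and the flags are hypothetical inputs; this is an illustrative model, not a description of the hold logic itself.

```c
#include <stdbool.h>

/* Instructions listed in circumstance 1; everything else is INSN_OTHER. */
enum insn_class {
    INSN_OTHER, INSN_MTSR, INSN_MTSRIM, INSN_MTTLB, INSN_IRET, INSN_IRETINV, INSN_HALT
};

/* Returns true if the processor enters Pipeline Hold to serialize. */
static bool serialize_hold(bool external_access_pending, enum insn_class next,
                           bool taking_trap, bool trap_is_warn,
                           bool fp_may_raise_unmasked_exception)
{
    if (external_access_pending) {
        if (next != INSN_OTHER)
            return true;                      /* circumstance 1 */
        if (taking_trap && !trap_is_warn)
            return true;                      /* circumstance 2 */
    }
    return fp_may_raise_unmasked_exception;   /* circumstance 3 */
}
```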

Writes to certain registers—the Floating-Point Environment Register, the Integer Environment Register, the Floating-Point Status Register, and the Exception Opcode Register—could, if overlapped with arithmetic operations, change the context required or expected by the arithmetic operations, or could conflict with register updates caused by the operations. Writes to these registers are therefore serialized; that is, they are not performed until the completion of all operations performed by the floating-point unit.

Similarly, reading the Floating-Point Status Register concurrently with the execution of a floating-point instruction might not obtain the status of previously issued instructions. Therefore, reads of the Floating-Point Status Register are also serialized with floating-point operations.

If the processor is in the Pipeline Hold mode due to serialization, it enters the Executing mode once the external access or floating-point operation is completed. Note that the processor may then immediately take a Data Access Exception, Coprocessor Exception, or Floating-Point Exception trap.

3.9 INITIALIZATION

When power is first applied to the processor, it is in an unknown state, and must be placed in a known state. Also, under certain circumstances, it may be necessary to place the processor in a defined state. This is accomplished by the Reset mode, which is invoked by activating the RESET pin for at least four cycles. The Reset mode configures the processor state as follows:

1. Instruction execution is suspended.
2. Instruction fetching is suspended.
3. Any interrupt or trap conditions are ignored.
4. The Current Processor Status Register is set as shown in Figure 3-55.
5. The Cache Disable bit of the Configuration Register is set.
6. The Data Width Enable bit of the Configuration Register is reset.
7. The Early Load Enable bit of the Configuration Register is reset.
8. The Floating-Point Environment Register is set as shown in Figure 3-56.
9. The Integer Division Overflow Exception Mask and the Integer Multiplication Overflow Exception Mask bits of the Integer Environment Register are both set.
10. The Contents Valid bit of the Channel Control Register is reset.

Except as previously noted, the contents of all general-purpose registers, special-purpose registers, floating-point accumulator registers, and TLB registers are undefined. The contents of the Branch Target Cache memory are also undefined.

The Reset mode also configures the processor to initiate an instruction fetch using an address of zero. Since the ROM Enable (RE) bit of the Current Processor Status is 1, this fetch is directed to external instruction read-only memory. This fetch occurs when the Reset mode is exited (i.e., when the RESET input is de-asserted). Section 5.5 contains more information on this instruction fetch.
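For a simulator, the Reset-mode list can be captured as a function that sets only the items named above. The field names are hypothetical, the values given in Figures 3-55 and 3-56 are not reproduced (other than RE = 1, which the text states), and all other state is left untouched to reflect that it is undefined after reset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the state items named in the Reset-mode list. */
struct reset_state {
    bool     executing, fetching, traps_recognized;
    bool     cps_rom_enable;          /* RE = 1 after reset */
    bool     cfg_cache_disable;       /* set */
    bool     cfg_data_width_enable;   /* reset */
    bool     cfg_early_load_enable;   /* reset */
    bool     ienv_div_overflow_mask;  /* set */
    bool     ienv_mul_overflow_mask;  /* set */
    bool     chc_contents_valid;      /* reset */
    uint32_t first_fetch_address;     /* zero, directed to instruction ROM */
};

/* Applies reset; returns false if RESET was held for fewer than four cycles,
 * in which case reset is not guaranteed and the state is left unchanged. */
static bool apply_reset(struct reset_state *s, unsigned reset_cycles)
{
    if (reset_cycles < 4u)
        return false;
    s->executing = s->fetching = s->traps_recognized = false;
    s->cps_rom_enable = true;
    s->cfg_cache_disable = true;
    s->cfg_data_width_enable = false;
    s->cfg_early_load_enable = false;
    s->ienv_div_overflow_mask = true;
    s->ienv_mul_overflow_mask = true;
    s->chc_contents_valid = false;
    s->first_fetch_address = 0u;
    return true;
}
```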

This chapter describes the operation of the Am29050 microprocessor pipeline, and the processor's three major functional units. The functional units are the Instruction Fetch Unit, the Execution Unit, and the Memory Management Unit. These units, which were shown in abstract form in Figure 2-2, are shown in detail in Figure 4-1.

**Figure 4-1 Am29050 Microprocessor Data Flow**

The operation of the functional units is coordinated by the Pipeline Hold mode, which ensures that operations are performed in the proper order. This chapter also describes the Pipeline Hold mode.

Since this chapter describes the internal operation of the Am29050 microprocessor, it provides information that may not be required by some users. However, it aids in understanding the behavior of the Am29050 microprocessor under certain conditions, especially the behavior of the system interfaces described in Chapter 5.

4.1 FOUR-STAGE PIPELINE

The Am29050 microprocessor implements a four-stage pipeline for integer instruction execution. The four stages are fetch, decode, execute, and write-back. The execute stage of floating-point operations is pipelined to a depth determined by the latency of the operation. For either integer or floating-point operations, the pipeline is organized so that the effective instruction-execution rate may be as high as one instruction per cycle.

During the fetch stage, the Instruction Fetch Unit (Section 4.2) determines the location of the next processor instruction, and issues the instruction to the decode stage. The instruction is fetched from the Instruction Prefetch Buffer, the Branch Target Cache memory, or an external instruction memory.

During the decode stage, the Execution Unit (Section 4.3) decodes the instruction selected during the fetch stage, and fetches and/or assembles the required operands. It also evaluates addresses for branches, loads, and stores.

During the execute stage, the Execution Unit performs the operation specified by the instruction. In the case of branches, loads, and stores, the Memory Management Unit (Section 4.4) performs address translation if required. In the case of an early load, the physical address is transmitted to an external device or memory. The Execution Unit pipelines floating-point operations to a depth greater than one cycle, as described in Section 4.3.7.

During the write-back stage, the results of the operation performed during the execute stage are stored. In the case of branches, loads, and stores, the physical address resulting from translation during the execute stage is transmitted to an external device or memory, unless an early load occurs.

Most pipeline dependencies that are internal to the processor are handled by forwarding logic in the processor. For those dependencies that result from the external system, the Pipeline Hold mode ensures proper operation.

In a few special cases, the processor pipeline is exposed to software executing on the Am29050 microprocessor (see Section 7.4).

4.2 INSTRUCTION FETCH UNIT

The Instruction Fetch Unit performs the functions required to keep the processor pipeline supplied with instructions. Since the processor can execute one instruction per cycle, instructions must be supplied at this rate if the execution stage is to perform at the maximum rate. To accomplish this, the Instruction Fetch Unit contains mechanisms for requesting instructions from instruction memory before they are required for execution, and for caching the most-recently executed branch target instructions.

The Instruction Fetch Unit also incorporates the logic necessary to calculate and sequence instruction addresses. The processor is word-oriented, but generates byte addresses for all external accesses. Since all processor instructions are word-length and aligned on word-address boundaries, the Instruction Fetch Unit deals only with 30-bit addresses. For external instruction accesses, these addresses are appended with 00 in the two least-significant bits to form the required 32-bit address (note that the two least-significant bits of an external instruction address are not necessarily 00 for indirect jumps).
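As a minimal illustration, the conversion between the 30-bit word addresses handled by the Instruction Fetch Unit and the 32-bit byte addresses used externally can be sketched in C. The function names are hypothetical, and the sketch covers only the normal case in which 00 is appended, not the indirect-jump case noted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a 30-bit word address (byte-address bits 31-2)
   becomes a 32-bit byte address by appending 00 as bits 1-0. */
static uint32_t word_to_byte_addr(uint32_t word_addr)
{
    assert(word_addr < (UINT32_C(1) << 30));   /* must fit in 30 bits */
    return word_addr << 2;                     /* bits 1-0 become 00 */
}

/* Hypothetical inverse: drop the two byte-offset bits. */
static uint32_t byte_to_word_addr(uint32_t byte_addr)
{
    return byte_addr >> 2;
}
```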

4.2.1 Instruction Prefetch Buffer

All instructions executed by the processor are fetched either from the Branch Target Cache memory or from external instruction memory (i.e., instruction/data memory or instruction read-only memory). When instructions are fetched from the external memory, they are requested in advance to assist the timing of instruction accesses. The processor attempts to initiate the fetch for any given instruction at least four cycles before it is required for execution.

Since instructions are requested in advance, based on a predicted need, it is possible that a prefetched instruction is not required immediately for execution when the prefetch completes. To accommodate this possibility, the Instruction Fetch Unit contains a four-word Instruction Prefetch Buffer (IPB), as shown in Figure 4-1. The IPB is a circularly addressed buffer which acts as a first-in/first-out (FIFO) queue for instructions.

If instruction fetching is enabled, the processor requests an external instruction fetch on any cycle for which the IPB contains an available location. Instructions are stored in the IPB as they are returned from the external instruction memory. An instruction is stored into the IPB location whose number is given by bits 3–2 of the instruction address.

The instruction is held in the IPB until it is required for execution. When required, the instruction is sent to the decode stage, and the IPB location is freed to receive a subsequent instruction.
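The placement rule above (bits 3–2 of the instruction address select the IPB location) is what makes the four-word buffer behave as a circular FIFO: consecutive instructions land in consecutive locations, modulo four. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

#define IPB_ENTRIES 4u

/* IPB location for an instruction: bits 3-2 of its byte address. */
static unsigned ipb_slot(uint32_t byte_addr)
{
    return (byte_addr >> 2) & (IPB_ENTRIES - 1u);
}
```

For a stream starting at address 0x108, successive instructions use locations 2, 3, 0, 1, and then wrap back to 2.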

4.2.1.1 Instruction Prefetch Stream

An instruction prefetch stream is established whenever the processor performs a non-sequential instruction reference. Non-sequential references normally occur as the result of successful branches, but may also result either from the taking of an interrupt or trap (including the WARN trap) or from an interrupt return.

A non-sequential instruction fetch is initiated by placing an instruction-fetch request on the Address Bus. Once the external instruction fetch has been initiated, the processor generates prefetches for subsequent instructions based on the availability of IPB locations, either by transmitting subsequent addresses, or by issuing burst-mode instruction requests.

The addresses for prefetched instructions are computed by a word-length register called the Instruction Fetch Pointer (IFP), which is maintained by the Instruction Fetch Unit. The IFP latches the physical instruction address obtained from the Memory Management Unit whenever a non-sequential instruction reference occurs. Then, for instruction prefetches, an 8-bit incrementer associated with the IFP updates bits 9–2 of the IFP to point to sequential instructions in the prefetch stream. The incrementer is limited to eight bits because it increments physical addresses, and thus must not increment across any possible virtual-page boundary (recall that the minimum virtual page size is 1 Kbyte). If the incrementer overflows, as indicated by a carry-out, prefetching is preempted. The prefetch stream is later re-established as described below.
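The behavior of the 8-bit IFP incrementer can be sketched as follows. Only bits 9–2 change, so a carry out of bit 9 marks the point where the next word would lie in a different 1-Kbyte region. The function name is hypothetical, and leaving the IFP unchanged on overflow is a modeling choice of this sketch, not a statement about the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Advance the IFP to the next sequential word. Returns false (and, in
   this sketch, leaves the IFP unchanged) when the 8-bit field in bits
   9-2 overflows, i.e., when prefetching must be preempted. */
static bool ifp_increment(uint32_t *ifp)
{
    uint32_t field = (*ifp >> 2) & 0xFFu;      /* bits 9-2 */
    uint32_t next  = field + 1u;

    if (next > 0xFFu)                          /* carry-out */
        return false;

    *ifp = (*ifp & ~(UINT32_C(0xFF) << 2)) | (next << 2);
    return true;
}
```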

The physical address in the IFP is always the address of the most-recently prefetched instruction, even though this address may not appear on the Address Bus for burst-mode fetches. If the burst is externally preempted, the IFP is used to re-establish the burst at the point of preemption.

4.2.1.2 INSTRUCTION PREFETCH BUFFER STATES

Four states are associated with each Instruction Prefetch Buffer location. The state-transition diagram for these states is shown in Figure 4-2.

**Figure 4-2 IPB State Transitions**

- **Available**—The IPB location is free for a new fetch. It contains no valid instruction, and is not due to receive any requested instruction.

- **Allocated**—The IPB location has been scheduled to receive a requested instruction, which has not yet been returned from the external instruction memory.

- **Valid**—The IPB location contains a valid instruction.

- **Error**—The IPB location contains an instruction which was returned from the external memory with an IERR indication.

If all internal conditions are such that an instruction fetch can occur, the IPB location given by bits 3–2 of the instruction address is set to the Allocated state, and the instruction is requested externally. Once this instruction is returned to the processor, it is stored in the IPB location. The location is set to the Valid or Error state (based on the IERR input), unless the instruction is sent immediately to the decode stage, in which case the location is set to the Available state.

The instruction remains in the buffer until it is required for execution. When the instruction is required, it is issued to the decode stage, and the IPB location is set to the Available state. If the location was in the Error state, it is still set to the Available state, but an Instruction Access Exception trap occurs.

Any number of IPB locations may be in the Available or Valid states, but at most one may be in the Allocated state at any given time. This restricts the number of unsatisfied instruction prefetches to one, reducing the amount of logic required to keep track of external fetches. It additionally restricts the number of apparent pipeline stages in the external prefetch mechanism to one stage (the other stages involved in the four-stage prefetch pipeline are the request stage and the processor’s fetch and decode stages). Larger external prefetch pipelines may be implemented, but they must appear as single-stage pipelines: at most one instruction can be returned to the processor from the old instruction prefetch stream after a non-sequential fetch occurs.

When a non-sequential fetch occurs, all buffer locations are set to the Available state during the execute stage of the non-sequential fetch. All instruction requesting for the previous prefetch stream is terminated at this time. There is at most one instruction that will be returned to the processor after instruction fetches are terminated; this instruction is returned before any instruction associated with the new instruction stream is requested externally.

The Error state is provided only to handle errors reported via the IERR input. However, there are many other situations in which the IPB does not contain a valid instruction. These situations arise because of errors, such as memory-management protection violations, and because instruction fetching is sometimes preempted, as is the case when the IFP incrementer overflows. All of these cases are indicated by the IPB location being in the Available state when the instruction is required for execution (normally, the location should be at least in the Allocated state when the instruction is required).

If the processor requires an instruction from an IPB location that is in the Available state, it initiates the fetch for the instruction using the current value of the Program Counter. This fetch resolves the exceptional condition. It either performs an address translation with the proper address, eliminating page-boundary-crossing problems, or re-creates an error condition, in which case a trap occurs.
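The IPB state transitions described in this section can be summarized as a small state machine. The sketch below is illustrative only: all type and function names are hypothetical, it omits the case in which a returned instruction goes straight to the decode stage, and where the real processor would wait (Pipeline Hold) for an Allocated location, this sketch simply reports `ISSUE_WAIT`.

```c
#include <assert.h>
#include <stdbool.h>

#define IPB_ENTRIES 4

enum ipb_state { IPB_AVAILABLE, IPB_ALLOCATED, IPB_VALID, IPB_ERROR };
enum ipb_issue { ISSUE_OK, ISSUE_TRAP, ISSUE_WAIT, ISSUE_DEMAND_FETCH };

struct ipb { enum ipb_state slot[IPB_ENTRIES]; };

/* A new external request needs a free location and no other outstanding
   request: at most one location may be Allocated at a time. */
static bool ipb_can_request(const struct ipb *b, unsigned slot)
{
    for (unsigned i = 0; i < IPB_ENTRIES; i++)
        if (b->slot[i] == IPB_ALLOCATED)
            return false;
    return b->slot[slot] == IPB_AVAILABLE;
}

static void ipb_request(struct ipb *b, unsigned slot)
{
    assert(ipb_can_request(b, slot));
    b->slot[slot] = IPB_ALLOCATED;
}

/* The instruction returns from memory; ierr models the IERR input. */
static void ipb_return(struct ipb *b, unsigned slot, bool ierr)
{
    assert(b->slot[slot] == IPB_ALLOCATED);
    b->slot[slot] = ierr ? IPB_ERROR : IPB_VALID;
}

/* The decode stage requires the instruction held in this location. */
static enum ipb_issue ipb_issue_to_decode(struct ipb *b, unsigned slot)
{
    switch (b->slot[slot]) {
    case IPB_AVAILABLE: return ISSUE_DEMAND_FETCH;  /* refetch via the PC */
    case IPB_ALLOCATED: return ISSUE_WAIT;          /* not yet returned */
    case IPB_ERROR:     b->slot[slot] = IPB_AVAILABLE; return ISSUE_TRAP;
    case IPB_VALID:     b->slot[slot] = IPB_AVAILABLE; return ISSUE_OK;
    }
    return ISSUE_WAIT;  /* not reached */
}

/* A non-sequential fetch sets every location to Available. */
static void ipb_flush(struct ipb *b)
{
    for (unsigned i = 0; i < IPB_ENTRIES; i++)
        b->slot[i] = IPB_AVAILABLE;
}
```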

4.2.2 Branch Target Cache Memory

The Branch Target Cache memory on the Am29050 microprocessor allows fast access to instructions fetched non-sequentially. A branch instruction may execute in a single cycle if the branch target is in the Branch Target Cache memory.

The target of a non-sequential fetch is in the Branch Target Cache memory if a similar fetch to the same target has occurred recently enough that it has neither been replaced by the target of another non-sequential fetch, nor invalidated by an INV or IRETINV instruction.

4.2.2.1 BRANCH TARGET CACHE MEMORY ORGANIZATION

The Branch Target Cache memory (BTC) is a 1-Kbyte storage array which contains blocks of instructions from recently taken branches. To improve the proportion of successful searches in the BTC memory, it is organized as a two-way set-associative memory. Each set contains 128 32-bit words (each instruction occupies one word). The sets are divided into blocks of either four instructions each or two instructions each, depending on the value of the Branch Target Cache memory organization (CO) bit of the Configuration Register. Blocks which lie in different sets but have the same block number constitute a unit called a line. Figure 4-3 shows the organization of the BTC memory when the CO bit is 0. Figure 4-4 shows the organization of the BTC memory when the CO bit is 1.

**Figure 4-3 Branch Target Cache Memory Organization (CO = 0)**

A 29-bit cache tag is associated with each block. Of the 29 bits, 26 are derived from the address (possibly virtual) of the instructions in the block and are called the Address Tag.

Note that the Address Tag is 26 bits in length, rather than 24 bits as might be implied by the organization of the Branch Target Cache memory. The reason for this is that branch target instruction sequences are aligned on cache-block boundaries, and cache blocks are not aligned with respect to memory addresses. Thus, more bits are required in the Address Tag than would be required if cache locations were mapped one-to-one to memory locations.

Three additional bits in the cache tag, called the Space Identification field (Space ID), indicate the instruction memory from which the instructions were fetched (instruction/data or read-only memory), whether the instructions were fetched from a virtual or physical address space, and the program mode under which the instructions were fetched (Supervisor or User). When instructions are placed into the Branch Target Cache memory, the Space ID bits are written with the values of the following bits of the Current Processor Status Register: ROM Enable (RE), Physical Addressing/Instructions (PI), and Supervisor Mode (SM).

There are four Valid bits per block, corresponding to the four words available per block when the CO bit of the Configuration Register is 0. The cache-invalidation instructions (INV and IRETINV) make it possible to reset all Valid bits in a single processor cycle. However, for the Invalidate (INV) instruction, the Valid bits are not reset until the next branch is executed.

4.2.2.2 BRANCH TARGET CACHE MEMORY OPERATION

It is possible to disable the operation of the Branch Target Cache memory via the Branch Target Cache Memory Disable (CD) bit of the Configuration Register. If the CD bit is 1, all Branch Target Cache entries are made to appear invalid. If the CD bit is 0, there is no effect on Branch Target Cache memory entries. However, note that a change in the CD bit does not take effect until after the next non-sequential instruction fetch occurs.

When the Branch Target Cache memory is disabled, it continues to operate as described in this section. However, entries are made to appear invalid, even though they may be valid. If the Branch Target Cache memory is enabled after a period of being disabled, its contents reflect the most recent instruction execution, and it operates accordingly.

The Branch Target Cache memory lookup process is diagrammed in Figure 4-5. A given branch target sequence may be contained in either of two cache blocks, both of which are in the same line. The sequence is contained in the line whose number is given by bits 5–2 of the address of the first instruction of the sequence. A given branch target sequence is in a given cache block only if the following conditions are met:

1. Bits 31–6 of the address for the first instruction in the sequence match the corresponding bits in the Address Tag associated with the block.

2. The address of the first instruction in the block has a valid translation in the Memory Management Unit, if it is a virtual address.

3. The instruction address space indicated by the Current Processor Status Register (RE, PI, and SM bits) matches the address space indicated by the Space ID field.

4. The CD bit of the Configuration Register was 0 for the previous non-sequential instruction fetch.

Note that it is not required that all instructions in the sequence be present in the cache for the block to be considered valid.

In addition to the above requirements, the Valid bit must be 1 for any instruction retrieved from the cache.
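The lookup conditions above can be collected into a single predicate applied to both blocks of the selected line. This is a minimal sketch for the CO = 0 case, using the line-select bits (5–2) and tag bits (31–6) given in this section; the struct layout and function names are hypothetical, and the MMU check (condition 2) is passed in as a flag, which a caller would set to true for physical addresses. A valid first word is required, consistent with the fetch-ahead table in Section 4.2.3.1, where an invalid first word counts as a miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTC_LINES 16  /* line number taken from address bits 5-2 */

struct btc_block {
    uint32_t addr_tag;  /* bits 31-6 of the first instruction's address */
    unsigned space_id;  /* RE, PI, SM captured when the block was filled */
    uint8_t  valid;     /* one Valid bit per word; bit 0 = first word */
};

struct btc {
    struct btc_block blk[BTC_LINES][2];  /* two sets per line */
};

static unsigned btc_line(uint32_t addr) { return (addr >> 2) & 0xFu; }

static bool btc_block_hits(const struct btc_block *b, uint32_t target,
                           unsigned cps_space_id, bool translation_ok,
                           bool cd_was_zero)
{
    return (target >> 6) == b->addr_tag   /* condition 1: Address Tag */
        && translation_ok                 /* condition 2: MMU translation */
        && cps_space_id == b->space_id    /* condition 3: Space ID */
        && cd_was_zero                    /* condition 4: CD bit */
        && (b->valid & 1u);               /* first word must be valid */
}

/* Returns the set (0 or 1) holding the target, or -1 on a miss. */
static int btc_lookup(const struct btc *c, uint32_t target,
                      unsigned cps_space_id, bool translation_ok,
                      bool cd_was_zero)
{
    const struct btc_block *line = c->blk[btc_line(target)];
    for (int set = 0; set < 2; set++)
        if (btc_block_hits(&line[set], target, cps_space_id,
                           translation_ok, cd_was_zero))
            return set;
    return -1;
}
```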

Whenever a non-sequential fetch occurs (either for a branch instruction, an interrupt or a trap), the address for the fetch is presented to the Branch Target Cache memory at the same time that the address is translated by the Memory Management Unit. If the target instruction for the non-sequential fetch is in the cache, it is presented for decoding in the next cycle. This instruction is always the first instruction of the cache block, and its address matches the cache tag. Subsequent instructions in the cache are presented for decoding as required in subsequent cycles. However, their addresses do not necessarily match the Address Tag.

4.2.2.3 BRANCH TARGET CACHE MEMORY REPLACEMENT

On a non-sequential fetch, if the target instruction is not found in the Branch Target Cache memory, the address of the fetch selects a line to be used to store the instruction sequence of the new branch target. The replacement block within the line is selected at random, based on the processor clock. Random replacement has slightly better performance than least-recently used replacement, and has a simpler implementation.

To replace the selected entry, all Valid bits associated with the entry are reset, the Address Tag is set with the appropriate address bits of the first instruction in the new sequence, and the Space ID bits are set according to the Current Processor Status Register. Instructions from the new fetch stream are stored into the selected cache block as they are issued to the decode stage. The first instruction is stored into the first word of the block, the second instruction is stored into the second word, and so on up to a maximum of four instructions (two when the CO bit is 1). The Valid bit for each word is set as the instruction is stored.
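The replacement and fill steps can be sketched as below. All names are hypothetical. The text states only that the victim is chosen "at random, based on the processor clock"; using the low bit of a cycle counter is a stand-in assumed for this sketch, not the actual selection logic.

```c
#include <assert.h>
#include <stdint.h>

struct btc_block {
    uint32_t addr_tag;   /* bits 31-6 of the first instruction's address */
    unsigned space_id;   /* RE, PI, SM from the Current Processor Status */
    uint8_t  valid;      /* one Valid bit per word; bit 0 = first word */
    uint32_t word[4];
};

/* Assumed stand-in for clock-based random selection between two sets. */
static unsigned btc_pick_victim(uint64_t cycle)
{
    return (unsigned)(cycle & 1u);
}

/* Reset all Valid bits and load the new Address Tag and Space ID. */
static void btc_allocate(struct btc_block *blk, uint32_t first_addr,
                         unsigned space_id)
{
    blk->valid    = 0;
    blk->addr_tag = first_addr >> 6;
    blk->space_id = space_id;
}

/* Store the n-th instruction of the new stream as it issues to decode;
   block_words is 4 when CO = 0 and 2 when CO = 1. */
static void btc_fill(struct btc_block *blk, unsigned n, uint32_t insn,
                     unsigned block_words)
{
    assert(block_words == 4 || block_words == 2);
    if (n >= block_words)
        return;                          /* block already full */
    blk->word[n] = insn;
    blk->valid  |= (uint8_t)(1u << n);   /* set Valid as the word is stored */
}
```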

4.2.2.4 SPECIAL CASES OF BRANCH TARGET CACHE MEMORY ENTRIES

If a branch instruction appears as one of the first two instructions in a branch target sequence, the branch is executed before the Branch Target Cache memory block is filled. In this case, the cache block contains fewer than four valid instructions. The final valid instruction is the delay instruction of the branch.

When a block is only partially filled due to a branch within the block, the behavior of the cache during subsequent executions of the instructions in the block depends on the outcome of this branch.
If the branch is subsequently successful, then the instructions following the delay instruction of the branch are not needed, and the fact that they are not contained in the cache is irrelevant.

If the branch is subsequently unsuccessful, then the instructions following the delay instruction are required, and must be fetched externally. In this case, a required entry has a Valid bit of 0. When the invalid entry is encountered, the Program Counter is used to create an external instruction fetch for the missing instruction; this fetch is called a demand fetch. When the fetch completes, the instruction is stored in the cache location that was previously invalid, and the Valid bit for this entry is set.

Since an instruction sequence in a four-word (or two-word) cache block is not necessarily aligned on a four-word (respectively, two-word) address boundary, a virtual-page address boundary may be crossed for the sequence in the cache. The processor does not prefetch instructions beyond this boundary, so the cache block is only partially filled in this case. If the processor requires instructions beyond the boundary, it creates a fetch for them as described above for the case of a branch instruction in the cache block.

When a fetch is created for a page-boundary crossing, this fetch is treated as a non-sequential fetch; a new cache block is allocated, and the first instructions at the boundary are placed into the new cache block as they are returned by the instruction memory. Subsequent references to the original cache block also encounter an invalid instruction at the page boundary, and also create a special fetch for this instruction. However, since the instructions beyond this boundary are in the Branch Target Cache memory, subsequent boundary crossings do not incur the instruction-fetch latency.

**Figure 4-5 Branch Target Cache Memory Lookup Process (CO = 0)**

4.2.3 Non-Sequential Instruction Fetches

When a non-sequential instruction fetch occurs, the Memory Management Unit performs an address translation for the target instruction, if address translation is enabled. If the address translation is valid, and the target of the fetch is not in the Branch Target Cache memory, an external instruction fetch is initiated. If there is a Translation Look-Aside Buffer (TLB) miss or memory-protection violation on this address, fetching is not initiated.

4.2.3.1 INSTRUCTION FETCH-AHEAD

When a non-sequential fetch occurs, if the target of the fetch is found in the Branch Target Cache memory, the processor normally begins fetching instructions beyond the valid instructions in the target block. This behavior is termed fetch-ahead. The valid bits of the target block and the CO bit of the Configuration Register determine the address of the request (A is the address of the target instruction):

<table>
<thead>
<tr>
<th>CO Bit</th>
<th>Valid Bits (first word leftmost)</th>
<th>Fetch-Ahead Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0XXX</td>
<td>A + 0 (Miss)</td>
</tr>
<tr>
<td>0</td>
<td>10XX</td>
<td>None (demand fetch)</td>
</tr>
<tr>
<td>0</td>
<td>110X</td>
<td>A + 8</td>
</tr>
<tr>
<td>0</td>
<td>1110</td>
<td>None (demand fetch)</td>
</tr>
<tr>
<td>0</td>
<td>1111</td>
<td>A + 16</td>
</tr>
<tr>
<td>1</td>
<td>0XXX</td>
<td>A + 0 (Miss)</td>
</tr>
<tr>
<td>1</td>
<td>10XX</td>
<td>None (demand fetch)</td>
</tr>
<tr>
<td>1</td>
<td>11XX</td>
<td>A + 8</td>
</tr>
</tbody>
</table>

The computation required to obtain the address for the fetch-ahead is performed in parallel with address translation, by a 6-bit adder called the Fetch-Ahead Adder (see Figure 4-1).

The Fetch-Ahead Adder can overflow during the address computation for the fetch-ahead, as indicated by a carry out of the adder. A carry out means that a page boundary may have been crossed, which would make the concurrently performed address translation invalid for the fetch-ahead address. In this case, fetch-ahead is not initiated.

If fetch-ahead is not initiated for an instruction that the processor eventually requires, this fetch is restarted on the cycle in which the missing instruction is required, using a demand fetch. The Program Counter is used, guaranteeing that the proper instruction address is used.
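The fetch-ahead table can be expressed as a small decision function. In this sketch the valid bits are passed as a mask with bit 0 for the first word of the block (the leftmost bit in the table); the enum and function names are hypothetical. The Fetch-Ahead Adder overflow check is deliberately not modeled, because the bit positions it operates on are not given here.

```c
#include <assert.h>
#include <stdint.h>

enum fa_kind { FA_MISS, FA_DEMAND, FA_AHEAD };

/* co: CO bit of the Configuration Register; valid: Valid bits of the
   target block, bit 0 = first word. For FA_MISS and FA_AHEAD, *offset
   receives the byte offset from the target address A. */
static enum fa_kind fetch_ahead(unsigned co, unsigned valid, uint32_t *offset)
{
    if (!(valid & 1u)) { *offset = 0;  return FA_MISS; }   /* 0XXX */
    if (!(valid & 2u))                 return FA_DEMAND;   /* 10XX */
    if (co == 1u)      { *offset = 8;  return FA_AHEAD; }  /* 11XX, CO = 1 */
    if (!(valid & 4u)) { *offset = 8;  return FA_AHEAD; }  /* 110X */
    if (!(valid & 8u))                 return FA_DEMAND;   /* 1110 */
    *offset = 16;                      return FA_AHEAD;    /* 1111 */
}
```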

4.2.4 Program Counter Unit

The Program Counter Unit, shown in Figure 4-6, forms and sequences instruction addresses for the Instruction Fetch Unit. It contains the Program Counter (PC), the Program-Counter Multiplexer (PC MUX), the Return Address Latch, and the Program-Counter Buffer (PC Buffer).

The PC forms addresses for sequential instructions executed by the processor. The master of the PC Register, PC L1, contains the address of the instruction being fetched in the Instruction Fetch Unit. The slave of the PC Register, PC L2, contains the next sequential address, which may be fetched by the Instruction Fetch Unit in the next cycle.

The Return Address Latch passes the address of the instruction following the delay instruction of a call to the register file. This address is the return address of the call.

The PC Buffer stores the addresses of instructions in various stages of execution when an interrupt or trap is taken. The registers in this buffer—Program Counters 0, 1, and 2 (PC0, PC1, and PC2) and Shadow Program Counters 0, 1, and 2 (SPC0, SPC1, and SPC2)—are normally updated from the PC as instructions flow through the processor pipeline.

When an interrupt or trap is taken, the Freeze (FZ) bit in the Current Processor Status is set, holding the quantities in the PC Buffer. When the FZ bit is set, PC0, PC1, and PC2 contain the addresses of the instructions in the decode, execute, and write-back stages of the pipeline, respectively. The Shadow Program Counters continue to operate and continue to update from the PC unless a Monitor trap occurs.
Upon the execution of an interrupt return, the target instruction stream is restarted using the instruction addresses in PC0 and PC1 (or SPC0 and SPC1, upon return from a Monitor trap). Two registers are required here because the processor implements delayed branches. An interrupt or trap may be taken when the processor is executing the delay instruction of a branch and decoding the target of the branch. This discontinuous instruction sequence must be restarted properly upon an interrupt return. Restarting the instruction pipeline using two separate registers correctly handles this special case; in this case PC1 (or SPC1) points to the delay instruction of the branch, and PC0 (or SPC0) points to its target. PC2 (SPC2) does not participate in the interrupt return, but is included to report the addresses of instructions causing certain exceptions.
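The role of PC0 through PC2 can be illustrated with a small model in which addresses shift one pipeline stage per cycle unless the FZ bit holds them. This is a hypothetical sketch (names are illustrative, and the Shadow Program Counters are not modeled); it shows why restarting from PC1 and then PC0 correctly resumes a delay instruction followed by its branch target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PC0, PC1, PC2: decode, execute, and write-back stage addresses. */
struct pc_buffer { uint32_t pc0, pc1, pc2; };

/* One pipeline advance; nothing moves while the FZ bit is set. */
static void pc_buffer_advance(struct pc_buffer *b, uint32_t decode_addr,
                              bool fz)
{
    if (fz)
        return;
    b->pc2 = b->pc1;
    b->pc1 = b->pc0;
    b->pc0 = decode_addr;
}

/* Interrupt return: the stream restarts at PC1, then PC0. */
static void pc_restart_order(const struct pc_buffer *b, uint32_t out[2])
{
    out[0] = b->pc1;   /* e.g., the delay instruction of a branch */
    out[1] = b->pc0;   /* e.g., the branch target */
}
```

For example, with a branch at 0x100, its delay instruction at 0x104, and a target at 0x200, a trap taken while 0x104 executes and 0x200 decodes freezes PC1 = 0x104 and PC0 = 0x200, so the restart sequence is 0x104 followed by 0x200.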

The PC is not defined as a special-purpose register. It cannot be modified or inspected by instructions. Instead, the interrupting and restarting of the pipeline is done by the PC Buffer registers PC0 and PC1 or SPC0 and SPC1.

4.3 EXECUTION UNIT

The Execution Unit performs most of the operations required for instruction execution. It incorporates the Register File, the Address Unit, the Arithmetic/Logic Unit, the Field Shift Unit, the Prioritizer and the Floating-Point Unit.

4.3.1 Register File

The general-purpose registers are implemented by a four-port, 192-location Register File. The Register File performs two read accesses and two write accesses in a single cycle. If a location is written and read in the same cycle, the data read is that written during the cycle.

The Register Address Generator, shown in Figure 4-7, computes register numbers for operands, detects pipeline data dependencies, and calculates register-number sequences for load-multiple and store-multiple operations.

4.3.1.1 REGISTER ADDRESSING

Register numbers for instruction operands are computed during the decode stage. This computation is performed during the first half of a cycle, and the operands are read in the second half of a cycle. Three multiplexers select two source-operand register numbers and a single destination register number for any given instruction.

If the most-significant bit of a register number is 0, the global registers are selected, and the register number is used directly as a register address. If the most-significant bit of the register number is 1, the local registers are selected, and the lower seven bits of the register number are added to the Stack Pointer to form the desired local register address.

The Stack Pointer is a hardware shadow-copy of bits 8-2 of Global Register 1, and is updated whenever Global Register 1 is written with the result of an Arithmetic or Logical instruction. Global Register 1 is implemented as a full 32-bit register in the Register File; this register is distinct from the 192 locations that implement general-purpose registers.

If a register number is zero (i.e., if Global Register 0 is specified as an operand), the Register Address Generator selects the content of an indirect pointer as the register number. There are three indirect pointers, and each appears as a special-purpose register.
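The register-number rules above can be sketched as a single decode function. The return encoding (0–127 for global registers, 128 plus the local address for local registers) is a hypothetical convention for this illustration, not the physical layout of the Register File. Which of the three indirect pointers applies depends on the operand position, so the caller passes the relevant pointer in. The local address is formed modulo 128 because it is a 7-bit sum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode of an 8-bit register number. */
static unsigned decode_reg(uint8_t regnum, uint32_t gr1, uint8_t indirect_ptr)
{
    if (regnum == 0)
        regnum = indirect_ptr;              /* gr0: use an indirect pointer */

    if ((regnum & 0x80u) == 0)
        return regnum;                      /* global register, used directly */

    unsigned sp = (gr1 >> 2) & 0x7Fu;       /* Stack Pointer = gr1 bits 8-2 */
    return 128u + ((sp + (regnum & 0x7Fu)) & 0x7Fu);  /* 7-bit local address */
}
```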

4.3.1.2 PIPELINE DATA DEPENDENCIES

For the Register File, the pipeline delay in result write-back, compared to operand access, creates situations where a result from a previous operation may be required as an operand before it has been written into the register file. When one of these situations arises, a pipeline data dependency is said to exist.

The register numbers for the write-back of instruction results require two buffering registers, so that they are presented to the Register File during the write-back stage. In addition, the register numbers for uncompleted load operations are held until the load completes (these register numbers are held in the ETR shown in Figure 4-7).

Register read-address comparators detect pipeline data dependencies, and activate multiplexers to forward data directly to the required functional unit, without waiting for the data to be written to the register file. The comparators activate the forwarding multiplexers if they detect one of the following situations:

1. One of the source register numbers matches the destination register number of the immediately previous instruction.

2. One of the source register numbers matches the target register number (in the ETR) of an outstanding load.

In the first case listed above, the result of the execute stage is selected as an operand, instead of the output of the Register File port for which the forwarding condition is detected. In the second case, data from the channel is selected. The comparison may cause the processor to enter the Pipeline Hold mode if the load has not completed. However, data forwarding allows data from the Data Bus to be used immediately, in the cycle after it is returned on the Data Bus.
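The per-port forwarding decision can be sketched as follows. The names are hypothetical. Giving the previous instruction's result priority when both comparisons match is an assumption of this sketch, motivated by the NN-bit behavior described next (a load whose target is overwritten by an overlapped instruction does not write back). Pipeline Hold for an incomplete load is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum operand_src { SRC_REGFILE, SRC_EXEC_RESULT, SRC_CHANNEL };

/* Choose the source for one Register File read port. */
static enum operand_src select_operand(unsigned src_reg,
                                       bool prev_writes, unsigned prev_dest,
                                       bool load_pending, unsigned etr)
{
    if (prev_writes && src_reg == prev_dest)
        return SRC_EXEC_RESULT;   /* case 1: immediately previous result */
    if (load_pending && src_reg == etr)
        return SRC_CHANNEL;       /* case 2: outstanding load (ETR match) */
    return SRC_REGFILE;           /* no dependency: normal read */
}
```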

The content of the ETR is further compared to the register numbers supplied to the write-back stage. If the target register for a load is written with the result of an overlapped instruction, the Not Needed (NN) bit in the Channel Control Register is set. If the comparators determine that the NN bit should be set, they also inhibit the write-back of load data on the completion of the load. The NN bit inhibits the restarting of the load operation if an exception occurs.

The Am29050 microprocessor Floating-Point Unit contains hardware comparable to that described above for detecting dependencies on floating-point operations to forward data, cause a pipeline hold, or prevent the write-back of a floating-point operation, as required. The Floating-Point Unit also manages write-back register numbers, and presents the register number of a result to the register file at the appropriate time.

4.3.1.3 LOAD-MULTIPLE AND STORE-MULTIPLE SEQUENCES

During load-multiple and store-multiple operations, sequential register numbers are computed by an incrementer associated with the ETR/DTR pair shown in Figure 4-7. In the case of store multiple, the register numbers are supplied as read addresses to the Register File by the incrementer. The read addresses are latched by the DTR so that they may be incremented further. In the case of load multiple, target register numbers are held by the ETR as for any other load. However, the ETR is set with a sequence of incremented addresses in this case.

4.3.2 Address Unit

The Address Unit, shown in Figure 4-8, computes addresses for branch target instructions, and load-multiple and store-multiple sequences. It also assembles instruction-immediate data and creates addresses for restarting terminated instruction prefetch streams.

The Address Unit consists of a 30-bit adder, the Decode PC Register, the ADRF Latch, and logic for formatting instruction-immediate data and generating the constants zero and one. The Decode PC Register holds the address of the instruction in the decode stage of the pipeline.

4.3.2.1 BRANCH TARGET ADDRESSES

Branch target addresses are either fetched from the Register File or calculated by the Address Unit. The Address Unit calculates target addresses during the decode stage of branch instructions. These addresses are of two possible types:

1. PC Relative: the current PC value is added to a sign-extended, 16-bit offset field from the branch instruction.

2. Absolute: a zero-extended, 16-bit field of the branch instruction is used directly as an instruction address.

For each of the above types of addresses, the 16-bit instruction field is aligned on a word address-boundary (i.e., it is shifted left by two bits).

To calculate the branch target address, the Address Unit formats the 16-bit instruction field as required and presents it to the 30-bit adder. This adder adds the formatted field either to the contents of the Decode PC Register or to zero, as required for PC-relative or absolute addresses, respectively.
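The two target-address forms can be sketched in byte-address terms as follows. The function names are hypothetical. Working with byte addresses whose two low bits are 00 and wrapping modulo 2^32 is arithmetically equivalent to the 30-bit word-address adder described above.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit instruction field to 32 bits. */
static uint32_t sign_extend16(uint16_t field)
{
    uint32_t v = field;
    if (field & 0x8000u)
        v |= 0xFFFF0000u;
    return v;
}

/* PC-relative: Decode PC + (sign-extended field << 2). */
static uint32_t branch_target_rel(uint32_t decode_pc, uint16_t field)
{
    assert((decode_pc & 3u) == 0);   /* instructions are word-aligned */
    return decode_pc + (sign_extend16(field) << 2);   /* wraps mod 2^32 */
}

/* Absolute: zero + (zero-extended field << 2). */
static uint32_t branch_target_abs(uint16_t field)
{
    return (uint32_t)field << 2;
}
```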

4.3.2.2 LOAD-MULTIPLE AND STORE-MULTIPLE ADDRESSES

During the execution of Load Multiple and Store Multiple instructions, addresses for the access sequence are held in the ADRF Latch. An address in the ADRF Latch is updated, as required for an access in the sequence, by the 30-bit adder in the Address Unit. The formatting logic creates a constant offset of one for the update. The updated address is presented to the Memory Management Unit for translation and protection checking, and is placed into the ADRF Latch for further address computations.
For load-multiple and store-multiple operations performed using burst-mode accesses, the physical address for each access does not appear on the Address Bus, but the addresses are maintained in the processor so that they may be used to restart the burst-mode access upon preemption.
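Taken together with the ETR/DTR sequencing in Section 4.3.1.3, a load-multiple produces a sequence of word addresses (held in the ADRF Latch and advanced by one) paired with incrementing target register numbers. The sketch below traces that sequence; the struct and function names are hypothetical, and register-number wraparound is not modeled.

```c
#include <assert.h>
#include <stdint.h>

struct lm_step { uint32_t byte_addr; unsigned target_reg; };

/* Trace `count` accesses of a load-multiple starting at start_byte_addr,
   loading registers first_reg, first_reg + 1, ... */
static void load_multiple_sequence(uint32_t start_byte_addr, unsigned first_reg,
                                   unsigned count, struct lm_step *out)
{
    uint32_t adrf = (start_byte_addr >> 2) & 0x3FFFFFFFu;  /* 30-bit word addr */
    unsigned etr  = first_reg;

    for (unsigned i = 0; i < count; i++) {
        out[i].byte_addr  = adrf << 2;
        out[i].target_reg = etr;
        adrf = (adrf + 1u) & 0x3FFFFFFFu;   /* 30-bit adder, offset of one */
        etr += 1u;
    }
}
```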

4.3.2.3 SPECIAL INSTRUCTION FETCHES

As discussed in Section 4.2, the processor must create demand fetches when it encounters an invalid instruction in the middle of a Branch Target Cache memory block, or when it attempts to fetch an instruction from an Instruction Prefetch Buffer location which is in the Available state. The Address Unit routes the address for this fetch in a manner similar to the routing of a branch target address. It passes the contents of the Decode PC (containing the required instruction address) through the 30-bit adder, adding it to zero. This address is presented to the Memory Management Unit for translation, and is used in the Instruction Fetch Unit to complete the fetch.

### 4.3.3 Early Loads

The early load feature speeds up the execution of load operations by making the physical address of the load available at the end of the decode cycle of the load instruction. At the beginning of the next cycle, when the load enters the execute stage, the physical address appears on the channel. In effect, early loads reduce the memory access time by one cycle.

Early loads can occur in two different ways. Either the physical address of the load is available in the Physical Address Cache memory (PAC), or, when an address computation immediately precedes the load instruction, the computed physical address can be forwarded directly to the channel. The latter method is performed by an Early Address Generator (EAG).

For either type of early load to occur, all of the following conditions must be met:

1. The operation must be a LOAD, with a general-purpose register, rather than a constant, specified as an address source operand.
2. The operation must load the external word addressed by the source register, rather than transfer a word from a coprocessor.
3. The source register can be neither the IPB specifier nor the Stack Pointer.
4. The load instruction must not disable address translation for the access (PA = 0). In other words, address translation must remain under the control of the PD bit of the Current Processor Status Register.
5. The load instruction must not force the access to be made in the User mode (UA = 0). The program mode must remain under the control of the SM bit of the Current Processor Status Register.
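The five conditions can be read as a single predicate. The sketch below is illustrative only, and `LoadOp` and its field names are hypothetical. It assumes that register number 0 in the RB field is the indirect-pointer (IPB) specifier and that register 1 is the Stack Pointer (GR1). Passing this check is necessary but not sufficient: the load also needs either a PAC hit or an EAG translation.

```python
from dataclasses import dataclass

INDIRECT_POINTER_SPECIFIER = 0  # assumption: RB = 0 selects the indirect pointer IPB
STACK_POINTER = 1               # GR1 is the Stack Pointer


@dataclass
class LoadOp:
    address_is_constant: bool  # address operand is a constant, not a register
    coprocessor: bool          # transfer from a coprocessor instead of memory
    rb: int                    # address register number from the RB field
    pa: bool                   # PA option: address translation disabled
    ua: bool                   # UA option: access forced to User mode


def early_load_possible(op: LoadOp) -> bool:
    """Check the five preconditions shared by PAC-hit and EAG early loads."""
    return (
        not op.address_is_constant                                   # condition 1
        and not op.coprocessor                                       # condition 2
        and op.rb not in (INDIRECT_POINTER_SPECIFIER, STACK_POINTER)  # condition 3
        and not op.pa                                                # condition 4
        and not op.ua                                                # condition 5
    )
```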

**4.3.3.1 PHYSICAL ADDRESS CACHE MEMORY**

The PAC is a four-entry, direct-mapped cache. Each PAC entry consists of two words. PAC entries cannot be accessed by software. The first word (Word 0) is the Translated Physical Address, while the second word (Word 1) contains a Register Tag and various control bits. The PAC entry registers are illustrated in Figure 4-9 and Figure 4-10.

PAC Entry Word 0 contains the 32-bit physical address of the load. The Valid (V) bit of PAC Entry Word 1 is 1 if the physical address is a valid translation. The IO bit is set equal to the IO bit of the TLB or RMU translation of the address, if address translation is in effect. Otherwise, the IO bit in the PAC entry is ignored. The Register Tag field contains the number of the register that holds the memory address of the load; its value is taken directly from the RB field of the load instruction.

The value of the PGM field is taken from the PGM field of the TLB or RMU translation of the address, if address translation is in effect. Otherwise, the PGM field in the PAC entry contains zeros.

The PAC supports the following operations:

- Searching for a valid translation for the load in the decode stage.
- Invalidating by clearing all Valid bits.
- Invalidating a single entry by clearing its Valid bit.
- Updating an existing entry by modifying its Translated Physical Address field and setting its Valid bit.
- Replacing an existing entry with a new entry and setting its Valid bit.

When a load is in decode, the PAC is searched for a valid entry corresponding to the memory address register of the load (specified by the RB field). The PAC entry is selected by the two least significant bits of the RB field of the load instruction. If a valid translation for the load is found in the PAC, a PAC hit results and the physical address from the PAC is used for the access. This address is available one cycle earlier than if the address translation were to wait until the execute stage, and an early load occurs.

In the case of a PAC miss, the newly translated physical address is written to the PAC, replacing an existing PAC entry. Only load instructions can replace PAC entries. This address is then available for subsequent instructions that use the same address register.

An individual PAC entry is invalidated if any instruction modifies the register whose translated address is cached in the PAC entry.
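A small behavioral model summarizes the lookup, replacement, and invalidation rules above. The Python class below is a hypothetical sketch, not a description of the actual circuitry. It tracks only the Valid bit, Register Tag, and Translated Physical Address of each entry, omitting the IO and PGM fields. The `translate` callback stands in for normal MMU translation in the execute stage.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PacEntry:
    valid: bool = False
    tag: int = 0        # Register Tag: register number from the RB field
    phys_addr: int = 0  # Translated Physical Address (Word 0)


class PhysicalAddressCache:
    """Four-entry, direct-mapped cache indexed by the two LSBs of RB."""

    def __init__(self) -> None:
        self.entries = [PacEntry() for _ in range(4)]

    def lookup(self, rb: int) -> Optional[int]:
        entry = self.entries[rb & 3]
        if entry.valid and entry.tag == rb:
            return entry.phys_addr  # PAC hit
        return None                 # PAC miss

    def load(self, rb: int, translate: Callable[[int], int]) -> tuple[int, bool]:
        """Return (physical address, early) for a load through register rb."""
        hit = self.lookup(rb)
        if hit is not None:
            return hit, True
        phys = translate(rb)  # normal translation in the execute stage
        self.entries[rb & 3] = PacEntry(True, rb, phys)  # only loads replace entries
        return phys, False

    def register_written(self, reg: int) -> None:
        """Invalidate the entry whose cached translation came from a modified register."""
        entry = self.entries[reg & 3]
        if entry.valid and entry.tag == reg:
            entry.valid = False

    def invalidate_all(self) -> None:
        """Clear every Valid bit (called for RESET/WARN, traps, IRET, MTTLB, and so on)."""
        for entry in self.entries:
            entry.valid = False
```

For example, the registers 100 and 104 share slot 0 (both have 00 in their two low-order bits), so a load through register 104 evicts the entry for register 100.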

The entire PAC is invalidated if any of the following occurs:

- The RESET or WARN input is asserted.
- The Stack Pointer (GR1) is modified.
- The processor executes an MTSR instruction whose destination is the MMU Configuration Register, the Current Processor Status Register, or a Region Mapping Unit register.
- The processor takes a trap.
- The processor executes an IRET, IRETINV, MTTLB, or LOADM instruction.
- The processor executes an instruction that updates the Register File using an indirect pointer.

**4.3.3.2 EARLY ADDRESS GENERATOR**

When a load is being decoded, its address can be translated early if the instruction in the execute stage is computing that address. The EAG continuously translates the results of certain instructions as they execute. If a load refers to an address being computed and translated by the EAG, the translated address is available at the beginning of the execute stage of the load, and an early load occurs.

Because the EAG must compute and translate an address in a single cycle, it operates only on a simple subset of instructions: CONST, CONSTH, ADD, ADDS, and ADDU. These instructions, though simple, are frequently used to compute load addresses. For the add instructions, the EAG cannot translate the address if the addition crosses a page boundary (for example, if there is a carry out of bit 11 with 4-Kbyte pages). The page boundary depends on the page size; for addresses translated by a Region Mapping Unit, the page size is treated as 64 Kbytes (the minimum region size).
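A hypothetical check for the page-crossing restriction is sketched below. It does not test only the carry out of bit 11. Instead, it compares the page number of the sum with that of the base operand, which also catches downward crossings caused by negative addends. Whether the hardware uses exactly this test is an assumption, as is the name `eag_can_translate_add`.

```python
RMU_EFFECTIVE_PAGE = 64 * 1024  # RMU translations are treated as 64-Kbyte pages


def eag_can_translate_add(base: int, addend: int, page_size: int) -> bool:
    """True if base + addend stays on the same page as base (sketch)."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    result = (base + addend) & 0xFFFFFFFF       # 32-bit ADD result
    page_shift = page_size.bit_length() - 1    # e.g. 12 for 4-Kbyte pages
    return (result >> page_shift) == ((base & 0xFFFFFFFF) >> page_shift)
```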

If the EAG computes and translates an address for an instruction whose destination register is mapped by the PAC, the PAC entry is updated with the new translation whether or not there is a load in decode and whether or not the Valid bit is set for the PAC entry. This allows the PAC to be updated for common addressing patterns, such as incremented addresses, and increases the effectiveness of the PAC. However, this update can occur only if there is not a load using another PAC entry. If there is such a load, the entry associated with the EAG destination is invalidated.

**4.3.3.3 INHIBITION OF EARLY LOADS**

Early loads cause contention for the Address Bus between instruction and data addresses when a jump or call appears immediately before a load instruction. In this case, the jump or call uses the Address Bus for its target-instruction address during the execute stage of the load instruction, and the early load is inhibited.

Early loads are also inhibited if a trap or interrupt is taken during the decode stage of the load.

### 4.3.4 Arithmetic/Logic Unit

The Arithmetic/Logic Unit (ALU) performs 32-bit arithmetic and logical operations. The arithmetic operations consist of addition, subtraction, addition with carry-in, subtraction with carry-in, and primitives for multiplication and division. Instructions specify whether or not a trap is generated on signed or unsigned arithmetic overflow.

The A and B operands may be complemented independently in the ALU, under control of the instruction. This allows subtraction and reverse subtraction to be formed from addition, and allows certain logical operations (e.g., XNOR) to be formed from other basic operations (e.g., XOR). The carry-in to the ALU can be 0, 1, or the value of the Carry bit in the ALU Status Register. The carry-out of the ALU is used in overflow detection, unsigned comparisons, multiplication, and division, and is stored in the ALU Status Register for multiprecision arithmetic.

The ALU also evaluates the relational expressions equal-to, not-equal-to, less-than, less-than-or-equal-to, greater-than, and greater-than-or-equal-to. Each comparison either computes a Boolean result for the relation between two integer operands or, in the trapping form of the instruction, causes a trap depending on that relation. The Boolean constants FALSE and TRUE are represented by a 0 and a 1, respectively, in the most-significant bit of a word.

The relational operators may be applied to either signed or unsigned operands. For unsigned operands, these operators are implemented by recognizing that the ALU carry-out is the Boolean result of an unsigned comparison if the two numbers are subtracted and the carry-in is appropriately controlled. For comparison of signed numbers, the true sign of the result (i.e., the resulting sign exclusive-ORed with the overflow indication) gives the result of the compare. The relational operators equal-to and not-equal-to are independent of the data type. These operators are implemented by a 32-bit equal-to-zero comparator.
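The following sketch derives every relation from a single subtraction formed as A + NOT B + 1, as the text describes. The carry-out gives the unsigned results, and the result sign exclusive-ORed with overflow gives the signed results. A zero test gives equality. The function names and dictionary keys are hypothetical, and the model is behavioral, not a gate-level description.

```python
MASK32 = 0xFFFFFFFF
TRUE_WORD = 0x80000000   # TRUE is a 1 in the most-significant bit
FALSE_WORD = 0x00000000


def compare(a: int, b: int) -> dict[str, bool]:
    """Derive the relational results from one subtraction, a - b."""
    a &= MASK32
    b &= MASK32
    not_b = ~b & MASK32              # B operand complemented in the ALU
    total = a + not_b + 1            # carry-in forced to 1 forms a - b
    result = total & MASK32
    carry = total >> 32              # carry-out: 1 iff a >= b (unsigned)
    sign_a, sign_nb, sign_r = a >> 31, not_b >> 31, result >> 31
    overflow = int(sign_a == sign_nb and sign_r != sign_a)
    true_sign = sign_r ^ overflow    # 1 iff a < b (signed)
    return {
        "eq": result == 0, "neq": result != 0,
        "lt": true_sign == 1, "le": true_sign == 1 or result == 0,
        "gt": true_sign == 0 and result != 0, "ge": true_sign == 0,
        "ltu": carry == 0, "leu": carry == 0 or result == 0,
        "gtu": carry == 1 and result != 0, "geu": carry == 1,
    }


def as_boolean_word(flag: bool) -> int:
    """Encode a relation result as the processor's Boolean word."""
    return TRUE_WORD if flag else FALSE_WORD
```

For example, 0x80000000 compared with 1 is less-than as a signed value but greater-than as an unsigned value; the two views differ only in which status signal is examined.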

The ALU also supports the 32-bit logical operations AND, OR, NAND, NOR, AND-NOT, OR-NOT, XOR, and XNOR.

### 4.3.5 Field Shift Unit

The Field Shift Unit contains a Funnel Shifter, logic for performing word extracts, and logic for performing byte and half-word extracts and inserts.

The Funnel Shifter performs N-bit shifts, where N is an integer between 0 and 31, inclusive, given by a 5-bit shift count. The source of the shift count is specified by the shift instruction; the shift count is given either by a constant field in the shift instruction, bits 4–0 of a general-purpose register specified by the shift instruction, or by the 5-bit Funnel Shift Count field in the ALU Status Register.

Both arithmetic and logical shifts are supported, with the difference being the values stored into vacated bits: arithmetic shifts fill these bits with the sign bit of the operand, while logical shifts fill them with zero-bits. Arithmetic shifts are possible only for right shifts.
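A behavioral sketch of the shift operations follows. Only the low five bits of the count are used, which matches the 0 to 31 range above. The `funnel_extract` helper is an assumption about how a funnel shifter is typically exposed: it returns the upper word of a 64-bit concatenation shifted left. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_left(x: int, count: int) -> int:
    """Logical left shift; vacated low-order bits are zero-filled."""
    return (x << (count & 31)) & MASK32  # only 5 bits of the count are used


def shift_right_logical(x: int, count: int) -> int:
    """Right shift with vacated high-order bits zero-filled."""
    return (x & MASK32) >> (count & 31)


def shift_right_arithmetic(x: int, count: int) -> int:
    """Right shift with vacated high-order bits filled with the sign bit."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x  # reinterpret as signed
    return (signed >> (count & 31)) & MASK32          # Python >> preserves the sign


def funnel_extract(high: int, low: int, fc: int) -> int:
    """Upper word of the 64-bit value high:low shifted left by fc (assumption)."""
    wide = ((high & MASK32) << 32) | (low & MASK32)
    return ((wide << (fc & 31)) >> 32) & MASK32
```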

The Field Shift Unit operates on 32-bit words, 16-bit half-words, and 8-bit bytes. For byte operations, the position of a byte operand within a word is supplied by the 2-bit Byte Pointer (BP) field of the ALU Status Register. For half-word operations, the position of a half-word operand is given by the most-significant bit of the BP field; the least-significant bit is ignored. The processor supports either left-to-right or right-to-left byte and half-word ordering within a word.
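The byte and half-word addressing rules can be sketched as shown below. The helper names are hypothetical. The sketch assumes that left-to-right ordering places byte 0 in bits 31–24 and half-word 0 in bits 31–16, while right-to-left ordering places them in the low-order bits.

```python
MASK32 = 0xFFFFFFFF


def _byte_shift(bp: int, left_to_right: bool) -> int:
    index = bp & 3  # 2-bit Byte Pointer
    # Assumed: with left-to-right ordering, byte 0 is bits 31-24.
    return 8 * (3 - index) if left_to_right else 8 * index


def _half_shift(bp: int, left_to_right: bool) -> int:
    half = (bp >> 1) & 1  # only the most-significant bit of BP is used
    return 16 * (1 - half) if left_to_right else 16 * half


def extract_byte(word: int, bp: int, left_to_right: bool = True) -> int:
    return (word >> _byte_shift(bp, left_to_right)) & 0xFF


def extract_half(word: int, bp: int, left_to_right: bool = True) -> int:
    return (word >> _half_shift(bp, left_to_right)) & 0xFFFF


def insert_byte(word: int, byte: int, bp: int, left_to_right: bool = True) -> int:
    shift = _byte_shift(bp, left_to_right)
    cleared = word & ~(0xFF << shift) & MASK32  # clear the target byte lane
    return cleared | ((byte & 0xFF) << shift)
```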

### 4.3.6 Prioritizer

The Prioritizer counts the number of leading zero-bits in an operand, that is, the number of zero-bits before the leading 1, and stores this count in the specified destination register. If the operand does not contain a 1, the value stored is 32.
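In software, the Prioritizer's result matches a count-leading-zeros operation on a 32-bit value. The following is a minimal sketch; the name `prioritize` is hypothetical.

```python
def prioritize(operand: int) -> int:
    """Number of zero-bits before the leading 1 of a 32-bit operand; 32 if none."""
    operand &= 0xFFFFFFFF
    return 32 - operand.bit_length()
```

A typical use is normalization. Shifting a nonzero value left by this count, as in `(x << prioritize(x)) & 0xFFFFFFFF`, moves its leading 1 into bit 31.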

### 4.3.7 Floating-Point Unit

The Am29050 microprocessor Floating-Point Unit (FPU) has separate addition/subtraction, multiplication, and division/square root pipelines, all of which share a common rounding circuit. A block diagram of the Floating-Point Unit is shown in Figure 4-11.

The FPU contains eight functional units:

- **Classifier (CL)**: Determines operand type for the CLASS instruction.
- **Denormalizer (DN)**: Equalizes the exponent values of two floating-point operands by right-shifting the significand of the operand with the smaller exponent.
- **Adder (AD)**: Adds and subtracts the significands of floating-point operands.
- **Renormalizer (RN)**: Normalizes the result of a floating-point operation by left-shifting the result's significand until the most-significant bit is 1, or until the exponent is 0.
- **Multiplier (MT)**: Performs a 32-bit by 32-bit multiplication, producing a 64-bit result in redundant (sum/carry) form. The multiplier performs both floating-point and integer multiplications.
- **Partial Product Summer (PS)**: Converts the redundant multiplier output to binary form. Also sums four successive multiplier outputs to form the intermediate result of a double-precision multiplication.
- **Divide/Square Root Unit (DS)**: Iteratively computes floating-point divisions and square roots.
- **Round Unit (RU)**: Rounds an intermediate result to fit the destination format. The Round Unit is also responsible for processing exceptions that occur at the end of an operation.

Table 4-1 shows the functional units used by each operation, and the order in which they are used. As indicated in the table, some operations may require an additional cycle for certain types of data inputs:

**Table 4-1 Staging of Floating-Point Operations**

<table>
<thead>
<tr>
<th>Operation</th>
<th>CL</th>
<th>DS</th>
<th>MT</th>
<th>PS</th>
<th>DN</th>
<th>AD</th>
<th>RN</th>
<th>RU</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLASS (s.p., d.p.)</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>CONVERT (int → s.p.)</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>CONVERT (int → d.p.)</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>CONVERT (f.p. → int)</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>CONVERT (f.p. → f.p.)</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td>(3)</td>
<td>3</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>DADD</td>
<td>1</td>
<td>2</td>
<td>(3)</td>
<td>3</td>
<td>4</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>DDIV</td>
<td>1-17</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>DEQ</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>DGE</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>DGT</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>DMAC</td>
<td>1-4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td></td>
<td></td>
</tr>
<tr>
<td>DMSM</td>
<td>1-4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td></td>
<td></td>
</tr>
<tr>
<td>DMUL</td>
<td>1-4</td>
<td>5</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>DSUB</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td>(3)</td>
<td>3</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>FADD</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td>(3)</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>FDIV</td>
<td>1-10</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FDMUL</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FEQ</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FGE</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FGT</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FMAC</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td></td>
<td></td>
</tr>
<tr>
<td>FSMM</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td></td>
<td></td>
</tr>
<tr>
<td>FMUL</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FSUB</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td>(3)</td>
<td>3</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>MFACC</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>MTACC</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td>(3)</td>
<td>3</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>MULTIPLU</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>MULTIPLY</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>MULTM</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>MULTMU</td>
<td>1</td>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>SQRT (s.p.)</td>
<td>1-27</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>28</td>
</tr>
<tr>
<td>(d.p.)</td>
<td>1-56</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>57</td>
</tr>
</tbody>
</table>

Notes: * = Denormalized source operands or results that need to be denormalized will require additional cycles for operand wrapping or unwrapping (see Table C-3).

() = Optional sequencing (see text).

- When the CONVERT or MTACC instruction is used to convert a denormalized single-precision floating-point number to double-precision, an additional cycle is used to normalize the operand.
- When a DADD or FADD instruction receives operands having the potential for massive cancellation (i.e., operands whose exponent values differ by 0 or 1 and whose signs are different), an additional cycle is used to renormalize the intermediate operand.
- When a DSUB or FSUB instruction receives operands having the potential for massive cancellation (i.e., operands whose exponent values differ by 0 or 1 and whose signs are the same), an additional cycle is used to renormalize the intermediate operand.

The sequencing shown in the table does not apply to the DDIV, DMUL, FDIV, FDMUL, FMUL, and SQRT operations if one or more of the input operands is denormalized, or if the result is denormalized; additional cycles are required for wrapping and/or unwrapping operands (see Table C-3).
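The massive-cancellation test in the notes above can be sketched for single-precision FADD and FSUB as follows. This illustration rests on several assumptions. It reads the biased exponent and sign from the IEEE 754 encoding and ignores denormalized operands, which follow the separate rules above. The function names are hypothetical.

```python
import struct


def sign_and_exponent(x: float) -> tuple[int, int]:
    """Sign bit and biased exponent of x in IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF


def needs_renormalize_cycle(op: str, a: float, b: float) -> bool:
    """Potential massive cancellation for FADD/FSUB (single precision only)."""
    sign_a, exp_a = sign_and_exponent(a)
    sign_b, exp_b = sign_and_exponent(b)
    if abs(exp_a - exp_b) > 1:
        return False              # exponents too far apart to cancel massively
    if op == "FADD":
        return sign_a != sign_b   # addition of operands with opposite signs
    if op == "FSUB":
        return sign_a == sign_b   # subtraction of operands with like signs
    raise ValueError(f"unsupported operation: {op}")
```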

The Floating-Point Unit can support multiple operations concurrently; the principal limitation is resource contention. The resources capable of causing contention are the 32-by-32 Multiplier (MT), the Divide/Square Root Unit (DS), and the Round Unit (RU).

The 32-by-32 Multiplier and the Divide/Square Root Unit can cause contention because either may be allocated to a single operation for multiple cycles. If one of these resources is busy when a subsequent instruction requires it, a pipeline hold results until the needed resource is free.

While contention for the Round Unit is not common, there are situations in which two or more functional units have data for the Round Unit at the same time. In general, data from the earliest-issued contending operation has the highest priority for access to RU. Priorities, listed from highest to lowest, are:

1. The Renormalizer (RN) result, if the RU contains a result that needs to be unwrapped (e.g., a denormalized number in normalized form), and the Denormalizer (DN) and Adder (AD) Units are also busy. Allowing the RN result to go to RU allows the wrapped result in RU to be forwarded to DN.
2. The Divide/Square Root Unit (DS).
3. The Partial Product Summer (PS), if it has a result that has been waiting for at least one cycle.
4. The Renormalizer (RN), if it has a result that has been waiting for at least one cycle.
5. The Partial Product Summer (PS).
6. The Renormalizer (RN).
7. The Adder (AD).
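The priority list behaves like a fixed-priority encoder. In the sketch below, the arguments use hypothetical names. They indicate whether a unit has a result ready, whether that result has waited at least one cycle, and whether the rule 1 condition holds. The first matching rule wins.

```python
from typing import Optional


def round_unit_grant(ds: bool = False, ps: bool = False, rn: bool = False,
                     ad: bool = False, ps_waited: bool = False,
                     rn_waited: bool = False,
                     ru_unwrap_pending: bool = False) -> Optional[str]:
    """Return which unit ("RN", "DS", "PS", "AD") gets the Round Unit, or None.

    ru_unwrap_pending models rule 1: RU holds a result that must be unwrapped
    while DN and AD are both busy.
    """
    if rn and ru_unwrap_pending:
        return "RN"   # rule 1: lets the wrapped RU result move on to DN
    if ds:
        return "DS"   # rule 2
    if ps and ps_waited:
        return "PS"   # rule 3: PS result has waited at least one cycle
    if rn and rn_waited:
        return "RN"   # rule 4: RN result has waited at least one cycle
    if ps:
        return "PS"   # rule 5
    if rn:
        return "RN"   # rule 6
    if ad:
        return "AD"   # rule 7
    return None
```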

As with loads, floating-point operations are fully interlocked; an operation requiring the result of a previous operation is prevented from proceeding until that result is available.

In a single cycle, the Register File can transfer to the Floating-Point Unit:
- One or two double-precision floating-point operands, each of which originates in a double-word-aligned register pair.
- One or two integer or single-precision floating-point operands.

In a single cycle, the Floating-Point Unit can write one of the following to the Register File:
- A double-precision floating-point result, written to a double-word-aligned register pair.
- One or two integer or single-precision floating-point results.

There is a 64-bit Register File port dedicated to the writing of floating-point results. These results can be written without interfering with integer operations.

### 4.4 MEMORY MANAGEMENT UNIT

The Memory Management Unit (MMU) performs all memory-management functions described in Section 3.6. Address translation is performed during the execute stage of any load, store or branch instruction that requires address translation. Address translation also is performed whenever the processor requires an instruction that has not been prefetched; as discussed in Section 4.2, address translation is performed in this case to resolve certain exceptional events that occur during instruction prefetching.

Though the MMU is shared for instruction and data accesses, the processor pipeline is arranged so that there is no contention for the MMU. In general, this is the result of the instruction-set definition and the fact that instruction prefetch addresses are generated by the Instruction Fetch Pointer (see Section 4.2.1).

An instruction address is normally translated only when a branch is executed. Since neither a load nor a store is executed at the same time, there is no contention for the MMU. If the Instruction Fetch Pointer overflows, the address translation is deferred until the Instruction Fetch Unit determines that the processor requires the associated instruction. Since instruction execution cannot occur at this time, the MMU cannot be required for the translation of a load or store address, and again there is no contention.

When the processor performs load-multiple and store-multiple operations, the MMU translates the address associated with every access. Load-multiple and store-multiple address sequencing is performed in the virtual address space, rather than both the virtual and physical address spaces, so that only a single address incrementer is required. Since the execution of Load Multiple and Store Multiple instructions is not overlapped with the execution of other instructions, there is no penalty associated with using the MMU for every access.

The MMU performs address translation in a single cycle. If an address translation is valid, the results of the translation are placed on the Address Bus along with the instruction-access or data-access request. In many cases, the address appears on the Address Bus during the cycle immediately following address translation (it does not appear if the Address Bus is occupied with another access). This address appears regardless of the outcome of memory protection checking; this relaxes the timing constraints on protection checking, which can be performed only after address translation is complete. If a protection violation is detected, the processor activates the BINV signal late in the first address cycle for the request.

### 4.5 PIPELINE HOLD MODE

The Pipeline Hold mode is activated whenever sequential processor operation cannot be guaranteed. When this mode is active, the pipeline stages do not advance, and most internal processor state is not modified. The processor places itself in the Pipeline Hold mode in the following situations:

1. The processor requires an instruction that has either not been fetched or not been returned by the external instruction memory.

2. The processor requires data from an in-progress load or floating-point operation, and the operation has not completed.

3. The processor attempts to execute a load or store instruction while another load or store is in progress.

4. The processor attempts to execute a floating-point operation and the required functional unit is busy.
5. The processor must perform a serialization operation as described in Section 3.8.

6. The processor is performing a sequence of load-multiple or store-multiple accesses. The Pipeline Hold mode in this case prevents further instruction execution until the completion of the load-multiple or store-multiple sequence.

7. The processor has taken an interrupt or trap, and the first instruction of the interrupt or trap handler has not entered the execute stage. The Pipeline Hold mode in this case prevents the processor pipeline from advancing until the interrupt or trap handler can begin execution.

8. The processor has executed an interrupt return, and the target instruction of the interrupt return has not entered the execute stage. The Pipeline Hold mode in this case prevents the processor pipeline from advancing until the interrupt return sequence is complete.

The Pipeline Hold mode is exited whenever the causing conditions no longer exist, or when the WARN or RESET input is asserted.

The Am29050 microprocessor is pin-compatible with the Am29000 processor. This chapter describes the attachment of the Am29050 microprocessor to its hardware environment. It describes the channel, which allows the processor to communicate with external devices and memories. The Test/Development interface, provided for hardware development and testing, is also described. In addition, this chapter includes sections on external interrupts, traps, processor reset, clock generation, and master/slave checking.

In the signal descriptions of Section 5.1, certain outputs are described as being 3-state or bi-directional outputs. However, all outputs (except MSERR) may be placed in a high-impedance state by the Test mode. The 3-state and bi-directional terminology in this section is for those outputs (except SYSCLK) that are disabled when the processor grants the channel to another master.

### 5.1 SIGNAL DESCRIPTION

**A(31–0)**
Address Bus (3-State Outputs, Synchronous)
The Address Bus transfers the byte address for all accesses except burst-mode accesses. For burst-mode accesses, it transfers the address for the first access in the sequence.

**BREQ**
Bus Request (Input, Synchronous)
This input allows other masters to arbitrate for control of the processor channel.

**BGRT**
Bus Grant (Output, Synchronous)
This output signals to an external master that the processor is relinquishing control of the channel in response to BREQ.

**BINV**
Bus Invalid (Output, Synchronous)
This output indicates that the Address Bus and related controls are invalid. It defines an idle cycle for the channel.

**R/W**
Read/Write (3-State Output, Synchronous)
This signal indicates whether data is being transferred from the processor to the system, or from the system to the processor.

**SUP/US**
Supervisor/User Mode (3-State Output, Synchronous)
This output indicates the program mode for an access.

**LOCK**
Lock (3-State Output, Synchronous)
This output allows the implementation of various channel and device interlocks. It may be active only for the duration of an access, or active for an extended period of time under control of the Lock bit in the Current Processor Status Register. The processor does not relinquish the channel (in response to BREQ) when LOCK is active.

**MPGM(1–0)**
MMU Programmable (3-State Outputs, Synchronous)
These outputs reflect the value of two PGM bits in the Translation Look-Aside Buffer entry associated with the access. If no address translation is performed, these signals are both Low.

**PEN**
Pipeline Enable (Input, Synchronous)
This signal allows devices that can support pipelined accesses (i.e., that have input latches for the address and required controls) to signal that a second access may begin while the first completes.

**I(31–0)**
Instruction Bus (Inputs, Synchronous)
The Instruction Bus transfers instructions to the processor.

**IREQ**
Instruction Request (3-State Output, Synchronous)
This signal requests an instruction access. When it is active, the address for the access appears on the Address Bus.

**IREQT**
Instruction Request Type (3-State Output, Synchronous)
This signal specifies the address space of an instruction request when IREQ is active.

<table>
<thead>
<tr>
<th>IREQT</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Instruction/data memory access</td>
</tr>
<tr>
<td>1</td>
<td>Instruction read-only memory access</td>
</tr>
</tbody>
</table>

**IRDY**
Instruction Ready (Input, Synchronous)
This input indicates that a valid instruction is on the Instruction Bus. The processor ignores this signal if there is no pending instruction access.

**IERR**
Instruction Error (Input, Synchronous)
This input indicates that an error occurred during the current instruction access. The processor ignores the content of the Instruction Bus, and an Instruction Access Exception trap occurs if the processor attempts to execute the invalid instruction. The processor ignores this signal if there is no pending instruction access.

**IBREQ**
Instruction Burst Request (3-State Output, Synchronous)
This signal is used to establish a burst-mode instruction access and to request instruction transfers during a burst-mode instruction access. IBREQ may be active even though the Address Bus is being used for a data access. This signal becomes valid late in the cycle, with respect to IREQ.

**IBACK**
Instruction Burst Acknowledge (Input, Synchronous)
This input is active whenever a burst-mode instruction access has been established. It may be active even though no instructions are currently being accessed.

**PIA**
Pipelined Instruction Access (3-State Output, Synchronous)
If IREQ is not active, this output indicates that an instruction access is pipelined with another in-progress instruction access. The indicated access cannot complete until the first access is complete. The completion of the first access is signaled by the assertion of IREQ.

**D(31–0)**
Data Bus (Bi-directional, Synchronous)
The Data Bus transfers data to and from the processor for load and store operations.

**DREQ**
Data Request (3-State Output, Synchronous)
This signal requests a data access. When it is active, the address for the access appears on the Address Bus.

**DREQT(1–0)**
Data Request Type (3-State Outputs, Synchronous)
These signals specify the address space of a data access as follows (the value x is a don't care).

<table>
<thead>
<tr>
<th>DREQT1</th>
<th>DREQT0</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>Instruction/data memory access</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>Input/output access</td>
</tr>
<tr>
<td>1</td>
<td>x</td>
<td>Coprocessor transfer</td>
</tr>
</tbody>
</table>

An interrupt/trap vector request is indicated as a data-memory read. If required, the system can identify the vector fetch by the STAT(2–0) outputs.

**DRDY**
Data Ready (Input, Synchronous)
For loads, this input indicates that valid data is on the Data Bus. For stores, it indicates that the access is complete, and that data need no longer be driven on the Data Bus. The processor ignores this signal if there is no pending data access.

**DERR**
Data Error (Input, Synchronous)
This input indicates that an error occurred during the current data access. For a load, the processor ignores the content of the Data Bus. For a store, the access is terminated. In either case, a Data Access Exception trap occurs. The processor ignores this signal if there is no pending data access.

**DBREQ**
Data Burst Request (3-State Output, Synchronous)
This signal is used to establish a burst-mode data access and to request data transfers during a burst-mode data access. DBREQ may be active even though the Address Bus is being used for an instruction access. This signal becomes valid late in the cycle, with respect to DREQ.

**DBACK**
Data Burst Acknowledge (Input, Synchronous)
This input is active whenever a burst-mode data access has been established. It may be active even though no data are currently being accessed.

**PDA**
Pipelined Data Access (3-State Output, Synchronous)
If DREQ is not active, this output indicates that a data access is pipelined with another in-progress data access. The indicated access cannot complete until the first access is complete. The completion of the first access is signaled by the assertion of DREQ.

**OPT(2–0)**
Option Control (3-State Outputs, Synchronous)
These outputs reflect the value of bits 18–16 of the load or store instruction which begins an access. Bit 18 of the instruction is reflected on OPT2, bit 17 on OPT1, and bit 16 on OPT0.
The standard definitions of these signals (based on DREQT) are as follows (the value x is a don't care).

<table>
<thead>
<tr>
<th>DREQT1</th>
<th>DREQT0</th>
<th>OPT2</th>
<th>OPT1</th>
<th>OPT0</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>Word-length access</td>
</tr>
<tr>
<td>0</td>
<td>x</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>Byte access</td>
</tr>
<tr>
<td>0</td>
<td>x</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Half-word access</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>Instruction ROM access</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>Cache control</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>Hardware-development system accesses</td>
</tr>
<tr>
<td colspan="5">All others</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

During an interrupt/trap vector fetch, the OPT(2–0) signals indicate a word-length access (000). Also, the system should return an entire, aligned word for a read, regardless of the indicated data length.

The Am29050 microprocessor does not explicitly prevent a store to the instruction ROM.
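As an illustration of the table above, here is a minimal C sketch of how a memory or I/O controller might classify an access from the DREQT(1–0) and OPT(2–0) pins. The function and enumerator names are hypothetical. Any combination that has no standard definition, including every combination with DREQT1 = 1, is collapsed into a single value because the table assigns it no meaning.

```c
/* Standard meanings from the OPT(2-0) table. Enumerator names are hypothetical. */
typedef enum {
    OPT_WORD,          /* word-length access                      */
    OPT_BYTE,          /* byte access                             */
    OPT_HALFWORD,      /* half-word access                        */
    OPT_INST_ROM,      /* instruction ROM access                  */
    OPT_CACHE_CONTROL, /* cache control                           */
    OPT_HW_DEV,        /* hardware-development system accesses    */
    OPT_OTHER          /* any combination not listed in the table */
} opt_meaning;

/* dreqt holds DREQT(1-0) and opt holds OPT(2-0) as small integers. */
opt_meaning decode_opt(unsigned dreqt, unsigned opt)
{
    unsigned dreqt1 = (dreqt >> 1) & 1u;
    unsigned dreqt0 = dreqt & 1u;

    if (dreqt1 != 0)
        return OPT_OTHER;              /* the table defines only DREQT1 = 0 */

    switch (opt & 7u) {
    case 0: return OPT_WORD;           /* DREQT0 is a don't care for 000-010 */
    case 1: return OPT_BYTE;
    case 2: return OPT_HALFWORD;
    case 4: return dreqt0 == 0 ? OPT_INST_ROM : OPT_OTHER;
    case 5: return dreqt0 == 0 ? OPT_CACHE_CONTROL : OPT_OTHER;
    case 6: return dreqt0 == 0 ? OPT_HW_DEV : OPT_OTHER;
    default: return OPT_OTHER;         /* 011 and 111 */
    }
}
```

A controller would normally use the byte and half-word codes only to select byte lanes for stores. For loads, the text above still calls for the entire aligned word to be returned.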

**CDA**

**Coprocessor Data Accept (Input, Synchronous)**

This signal allows the coprocessor to indicate the acceptance of operands or operation codes. For transfers to the coprocessor, the processor does not expect a DRDY response; an active level on CDA performs the function normally performed by DRDY. CDA may be active whenever the coprocessor is able to accept transfers.

**WARN**

**Warn (Input, Asynchronous, Edge-Sensitive)**

A high-to-low transition on this input causes a non-maskable WARN trap to occur. This trap bypasses the normal trap vector fetch sequence, and is useful in situations where the vector fetch may not work (e.g., when data memory is faulty).

**INTR(3–0)**

**Interrupt Request (Inputs, Asynchronous)**

These inputs generate prioritized interrupt requests. The interrupt caused by INTR0 has the highest priority, and the interrupt caused by INTR3 has the lowest priority. The interrupt requests are masked in prioritized order by the Interrupt Mask field in the Current Processor Status Register.
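The prioritized masking can be sketched in C as follows. The encoding of the Interrupt Mask field is an assumption in this sketch: a mask value of n is taken to enable INTR0 through INTRn. The Current Processor Status Register description is the authority on the actual encoding. Inputs are modeled as logically asserted or not asserted, without regard to pin polarity.

```c
#include <stdbool.h>

/* Returns the number (0-3) of the highest-priority INTR input that is both
   asserted and enabled, or -1 if there is none.
   ASSUMPTION: an Interrupt Mask value of im (a 2-bit field) enables
   INTR0 through INTRim. */
int highest_enabled_intr(const bool intr[4], unsigned im)
{
    unsigned limit = im & 3u;
    for (unsigned n = 0; n <= limit; n++) {  /* INTR0 is checked first: highest priority */
        if (intr[n])
            return (int)n;
    }
    return -1;   /* nothing pending, or every pending input is masked */
}
```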

**TRAP(1–0)**

**Trap Request (Inputs, Asynchronous)**

These inputs generate prioritized trap requests. The trap caused by TRAP0 has the highest priority. These trap requests are disabled by the DA bit of the Current Processor Status Register.

**STAT(2–0)**
CPU Status (Outputs, Synchronous)
These outputs indicate the state of the processor's execution stage on the previous cycle. They are encoded as follows:

<table>
<thead>
<tr>
<th>STAT2</th>
<th>STAT1</th>
<th>STAT0</th>
<th>Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>Halt or Step Modes</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>Pipeline Hold Mode</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Load Test Instruction Mode, Synchronize</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Wait Mode</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>Interrupt Return</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>Taking Interrupt or Trap</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>Non-Sequential Instruction Fetch</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>Executing Mode</td>
</tr>
</tbody>
</table>
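A logic analyzer or debug monitor that watches these pins could label cycles with a lookup like the following C sketch, in which the names are hypothetical. STAT(2–0) reports the execution stage on the previous cycle, so a decoded value belongs to the cycle before the one in which it was sampled.

```c
/* Hypothetical lookup table: index = STAT2:STAT1:STAT0 read as a 3-bit number.
   Each entry describes the execution stage on the PREVIOUS cycle. */
static const char *const stat_condition[8] = {
    "Halt or Step Modes",                      /* 000 */
    "Pipeline Hold Mode",                      /* 001 */
    "Load Test Instruction Mode, Synchronize", /* 010 */
    "Wait Mode",                               /* 011 */
    "Interrupt Return",                        /* 100 */
    "Taking Interrupt or Trap",                /* 101 */
    "Non-Sequential Instruction Fetch",        /* 110 */
    "Executing Mode",                          /* 111 */
};

/* Each argument is the sampled level (0 or 1) of one STAT pin. */
const char *decode_stat(unsigned stat2, unsigned stat1, unsigned stat0)
{
    unsigned v = ((stat2 & 1u) << 2) | ((stat1 & 1u) << 1) | (stat0 & 1u);
    return stat_condition[v];
}
```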

**CNTL(1–0)**
CPU Control (Inputs, Asynchronous)
These inputs control the processor mode:

<table>
<thead>
<tr>
<th>CNTL1</th>
<th>CNTL0</th>
<th>Mode</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>Load Test Instruction</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>Step</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>Halt</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>Normal</td>
</tr>
</tbody>
</table>

**RESET**
Reset (Input, Asynchronous)
This input places the processor in the Reset mode.

**TEST**
Test Mode (Input, Asynchronous)
When this input is active, the processor is in Test mode. All outputs and bi-directional lines, except MSERR, are forced to the high-impedance state.

**MSERR**
Master/Slave Error (Output, Synchronous)
This output shows the result of the comparison of processor outputs with the signals provided internally to the off-chip drivers. If there is a difference for any enabled driver, this line is asserted.

**SYSCLK**
System Clock (Bi-directional)
This is either a clock output with a frequency that is half that of INCLK, or an input from an external clock generator at the processor's operating frequency.

**INCLK**
Input Clock (Input)
When the processor generates the clock for the system, this is an oscillator input to the processor, at twice the processor's operating frequency. In systems where the clock is not generated by the processor, this signal must be tied High or Low, except in certain master/slave configurations as discussed in Section 5.8.

The following pins are not signal pins, but are named in Am29050 microprocessor documentation because of their special role in the processor and system.

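**PWRCLK**
Power Supply for SYSCLK Driver
This pin is a power supply for the SYSCLK output driver. It isolates the SYSCLK driver, and is used to determine whether or not the Am29050 microprocessor generates the clock for the system. If power (+5 volts) is applied to this pin, the Am29050 microprocessor generates the clock for the system; if this pin is grounded, the processor does not generate the clock, and SYSCLK is an input from an external clock generator.
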
5.2 CHANNEL DESCRIPTION

The processor channel provides the bandwidth required for performance, while permitting the connection of many different types of devices. This section describes the channel, and methods of connecting devices and memories to the processor.

The channel also is used for transfers to and from the coprocessor. Coprocessor transfers are described in Section 6.2.

Timing diagrams for operations described in this chapter appear in Appendix A.

5.2.1 Channel Overview

The channel consists of three 32-bit synchronous buses with associated control and status signals: the Address Bus, Data Bus, and Instruction Bus. The Address Bus transfers addresses and control information to devices and memories. The Data Bus transfers data to and from devices and memories. The Instruction Bus transfers instructions to the processor from instruction memories. In addition, a set of signals allows control of the channel to be relinquished to an external master.

There are five logical groups of signals performing five distinct functions, as follows (since some signals perform more than one function, a signal may appear in more than one group):

1. Instruction Address Transfer and Instruction Access Requests: A(31–0), SUP/US, MPGM(1–0), PEN, IREQ, IREQT, PIA, BINV.
2. Instruction Transfer: I(31–0), IBREQ, IRDY, IERR, IBACK.
3. Data Address Transfer and Data Access Requests: A(31–0), R/W, SUP/US, LOCK, MPGM(1–0), PEN, DREQ, DREQT(1–0), OPT(2–0), PDA, BINV.
4. Data Transfer: D(31–0), DBREQ, DRDY, DERR, DBACK, CDA.
5. Arbitration: BREQ, BGRT, BINV.

5.2.2 User-Defined Signals

There are two types of user-defined outputs on the processor to control devices and memories directly in a system-dependent manner. Each of these outputs is valid simultaneously with—and for the same duration as—the address for an access.

The first set of user-defined signals, MPGM(1–0), is determined by the PGM bits in the Translation Look-Aside Buffer entry used in address translation. If address translation is not performed, these outputs are both Low.

The second set of signals, OPT(2–0), is determined by bits 18–16 of the load or store instruction that initiates an access. These signals are valid only for data accesses, and have a pre-defined interpretation for coprocessor data transfers.
Standard interpretations of OPT(2–0) are given in Section 5.1. Since the OPT(2–0) signals are determined by instructions, they have an impact on application-software compatibility, and system hardware should use the given definitions of OPT(2–0). The OPT(2–0) signals are used to encode byte and half-word accesses. However, for a load, the system should return an entire, aligned word, regardless of the indicated data width.

Note that the standard interpretations of OPT(2–0) apply only to accesses to instruction/data memory and input/output. Other interpretations may be used for coprocessor transfers.

For interrupt and trap vector fetches, the MPGM(1–0) and OPT(2–0) outputs are all Low.

### 5.2.3 Instruction Accesses

Instruction accesses occur to one of two address spaces: instruction/data memory and instruction read-only memory (Instruction ROM). The distinction between these address spaces is made by the IREQT signal, which is in turn derived from the ROM Enable (RE) bit of the Current Processor Status Register. These are truly distinct address spaces; each may be populated independently based on the needs of a particular system.

Instruction/data memory contains both instructions and data. Although the channel supports separate instruction and data memories, the Memory Management Unit does not. In certain systems, it may be required to access instructions via loads and stores, even though instructions may be contained in physically separate memories. For example, this requirement might be imposed because of the need to load instructions into memory. Note also that the OPT(2–0) signals may be used to allow the access of instructions in instruction ROM, using loads; the Am29050 microprocessor does not prevent a store to the instruction ROM, and protection against stores to the instruction ROM must be provided externally, if required.

All processor instruction fetches are read accesses, and the R/W signal is High for all instruction fetches.

### 5.2.4 Data Accesses

Data accesses occur to one of three address spaces: instruction/data memory, input/output (I/O), and the coprocessor. The distinction between these spaces is made by the DREQT(1–0) signals, which are in turn determined by the load or store instruction which initiates a data access. Each of these address spaces is distinct from the others.

The protocol for data transfers to and from the coprocessor is slightly different from the protocol for instruction/data memory and I/O accesses. These transfers are described in Section 6.2.

Data accesses may occur either from a slave device or memory to the processor (for a load), or from the processor to a slave device or memory (for a store). The direction of transfer is determined by the R/W signal. In the case of a load, the processor requires that data on the Data Bus be held valid only for a short time before the end of a cycle. In the case of a store, the processor drives the Data Bus as soon as the bus is available and holds the data valid until the slave device or memory signals that the access is complete.

5.2.5 Reporting Errors

The successful completion of an instruction access is indicated by an active level on the IRDY input, and the successful completion of a data access is indicated by an active level on the DRDY input. If there are exceptional conditions for which an instruction or data access cannot complete successfully, the unsuccessful completion is indicated by an active level on the IERR or DERR input, as appropriate.

If the processor receives an IERR or DERR in response to an instruction or data access, it ignores the content of the Instruction or Data Bus and the value of IRDY or DRDY. An IERR response causes an Instruction Access Exception trap or a Monitor trap, unless it is associated with an instruction that the processor does not ultimately execute (because of a non-sequential instruction fetch). A DERR response always causes either a Data Access Exception trap, a Coprocessor Exception trap, or a Monitor trap.

The processor supports the restarting of unsuccessful accesses upon an interrupt return. In the case of an unsuccessful instruction access, the restart is performed by the Program Counter 0 and Program Counter 1 registers. In the case of an unsuccessful data access, the restart is performed by the Channel Address, Channel Data, and Channel Control registers. In any event, the control program must determine whether or not an access can and/or should be restarted.

The Instruction Access Exception and Data Access Exception traps cannot be masked. If one of these traps occurs within an interrupt or trap handler, a Monitor trap occurs.
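The trap selection described in this section can be condensed into the C sketch below. The `executed` and `in_handler` parameters are hypothetical stand-ins for processor state that the text refers to. Mapping DERR on a coprocessor transfer to the Coprocessor Exception trap is an assumption, drawn from the list of possible DERR traps given above.

```c
#include <stdbool.h>

typedef enum { ACC_INSTRUCTION, ACC_DATA, ACC_COPROCESSOR } access_kind;

typedef enum {
    TRAP_NONE,
    TRAP_INSTRUCTION_ACCESS_EXCEPTION,
    TRAP_DATA_ACCESS_EXCEPTION,
    TRAP_COPROCESSOR_EXCEPTION,
    TRAP_MONITOR
} trap_kind;

/* err:        IERR or DERR was received instead of IRDY or DRDY.
   executed:   (instruction accesses only) the fetched instruction would
               actually be executed, i.e. it was not discarded because of a
               non-sequential fetch.
   in_handler: hypothetical flag for "already inside an interrupt or trap
               handler". */
trap_kind trap_for_error(access_kind kind, bool err, bool executed, bool in_handler)
{
    if (!err)
        return TRAP_NONE;                   /* access completed successfully */

    switch (kind) {
    case ACC_INSTRUCTION:
        if (!executed)
            return TRAP_NONE;               /* IERR on a discarded fetch is ignored */
        return in_handler ? TRAP_MONITOR : TRAP_INSTRUCTION_ACCESS_EXCEPTION;
    case ACC_DATA:
        return in_handler ? TRAP_MONITOR : TRAP_DATA_ACCESS_EXCEPTION;
    default:
        return TRAP_COPROCESSOR_EXCEPTION;  /* assumed: DERR on a coprocessor transfer */
    }
}
```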

5.2.6 Access Protocols

Figure 5-1 shows a control flowchart for accesses performed by the Am29050 microprocessor. This control flow applies independently to both instruction and data accesses. Since the processor performs concurrent instruction and data accesses, these accesses may be at different points in the control flow at any given point in time.

Note that the items on the flowchart of Figure 5-1 do not represent actual states, and have no particular relationship to processor cycles. The flowchart provides only a high-level understanding of the control flow. Also, exceptions and error conditions are not shown.

The channel supports three protocols for accesses: simple, pipelined, and burst-mode. These are described in the following sections. The various protocols are defined to accommodate minimum-latency accesses as well as maximum-transfer-rate accesses. The protocols allow an access to complete in a single cycle, although they support accesses requiring arbitrary numbers of cycles. Address transfers for accesses may be independent of instruction or data transfers.

5.2.7 Simple Accesses

For a simple access, the processor holds the address valid throughout the entire access. This protocol is used for single-cycle accesses, and for accesses to simple devices and memories.

On any cycle before the completion of the access, a simple access may be converted to a pipelined access (by the assertion of PEN) or to a burst-mode access (by the assertion of IBACK or DBACK, if the processor is asserting IBREQ or DBREQ). Thus, the protocol for simple accesses also may be used during the initial cycles of pipelined and/or burst-mode accesses. This is advantageous, for example, in cases where the slave device or memory either requires the address to be held for multiple cycles at the beginning of the pipelined or burst-mode access, or cannot respond to the pipelined or burst-mode request within one cycle.

Figure 5-1  Channel Flowchart (processor and slave-device control flow for simple, pipelined, and burst-mode accesses; burst-mode details appear in Figures 5-2 through 5-5)

5.2.8 Pipelined Accesses

A pipelined access is one that starts before an earlier in-progress access completes. The in-progress access is called a primary access, and the second access is called a pipelined access. A pipelined access is of the same type as the primary access. For example, an instruction access that begins before the completion of a data access is not considered to be a pipelined access, whereas a second data access is.

The Am29050 microprocessor allows only one pipelined access at any given time, and does not perform pipelined accesses for the Load Multiple and Store Multiple instructions.

5.2.8.1 TRADEOFFS

For accesses that require more than one cycle to complete, pipelined accesses perform better than simple accesses, because they allow the overlap of portions of two accesses. In addition, the ability to latch addresses in support of pipelined accesses reduces utilization of the Address Bus, thereby reducing contention between instruction and data accesses. However, devices and memories that support pipelined accesses are somewhat more complex than devices and memories that support only simple accesses.

Support for pipelined operations is required for both the primary access and the pipelined access. The slave performing the primary access must contain some means for storing the address and other information about the access. The slave performing the pipelined access must be able to restrict its use of the Instruction Bus or Data Bus, and must be prepared to cancel the access (as explained below).

5.2.8.2 PIPELINED OPERATION

Pipelined accesses are controlled by the signals PEN, PIA, and PDA. Because of internal data-flow constraints, the Am29050 microprocessor does not perform a pipelined store operation while a load is in progress. This is a restriction of the processor, not of the channel protocol: other channel masters may perform a pipelined store during a load.

Except as noted above, the processor attempts to perform pipelining for every access; the input PEN indicates whether or not pipelining is supported for a given access. The PEN input can be driven by individual devices, or can be tied active or inactive to enable or disable system-wide pipelined accesses. The processor ignores the value of PEN unless it is performing an access.

The processor samples PEN on every cycle during a primary access. If PEN is active on any cycle, the processor may cease to drive the address and associated controls for the primary access in the next cycle. Following this, PEN must remain active. If the processor requires another access before the primary access completes, it drives the address and controls for the second access, asserting PIA or PDA to indicate that the second access is a pipelined access.

The output IREQ or DREQ, as appropriate, is not asserted for a pipelined access. Devices and memories that cannot support pipelined accesses should therefore ignore PIA and/or PDA, and base their operation upon IREQ and/or DREQ.

A device or memory that receives a request for a pipelined access may treat it as any other access, with one exception: the pipelined access cannot use the Instruction Bus or Data Bus, or the associated controls (e.g., IRDY or DRDY). In the case of a data read or instruction access, the results of the pipelined access cannot be driven on the appropriate bus. In the case of a data write, the data does not yet appear on the Data Bus. Any other operations for the access, such as address decoding, can occur.

When the primary access completes (as indicated by IRDY or DRDY), the pipelined access becomes a primary access. The processor indicates this by asserting IREQ or DREQ, depending on the type of access. The device or memory performing the pipelined access may complete the access as soon as IREQ or DREQ is asserted (possibly in the same cycle). When the access becomes a primary access, it controls the channel as any other primary access. For example, it may determine whether or not another pipelined access can be performed.

When the pipelined access becomes a primary access, the output PIA or PDA remains asserted for one cycle, to ensure continuity of control within the slave device or memory. In the cycle after IREQ or DREQ is asserted, PIA or PDA is de-asserted, unless the processor initiates another pipelined access, in which case PIA or PDA remains asserted for the new access.

**5.2.8.3 CANCELLATION OF PIPELINED ACCESSES**

If the processor takes an interrupt or trap before a pipelined access becomes a primary access, the request for the pipelined access is removed from the channel. This may occur, for example, when IERR or DERR is signaled for the primary access.

If the pipelined access is removed from the channel, the slave device or memory does not receive an IREQ or DREQ for the pipelined access. Hence, the pipelined access does not become a primary access, and cannot complete. A pipelined access may be canceled in this manner at any time before it becomes a primary access. Because of this, a pipelined access should not change the state of a slave device or memory until the pipelined access becomes a primary access.
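From a slave's point of view, the pipelined protocol and its cancellation rule can be modeled as a small per-cycle state machine. The C sketch below is hypothetical. It ignores PEN, which the slave would drive to permit pipelining, and it models completion as a single `finish` input. It does capture the central rule: a pipelined access performs only address decoding and commits nothing until DREQ promotes it to a primary access.

```c
#include <stdbool.h>

typedef enum { SLAVE_IDLE, SLAVE_PIPELINED, SLAVE_PRIMARY } slave_state;

typedef struct {
    slave_state state;
    unsigned latched_addr;   /* address captured when the request was seen */
    unsigned writes_done;    /* state changes actually committed            */
} pipe_slave;

/* One clock cycle of a hypothetical data slave.
   dreq, pda: channel signals in this cycle (true = asserted).
   selected:  this slave decodes the current Address Bus value.
   finish:    the slave completes its primary access now (drives DRDY). */
void slave_cycle(pipe_slave *s, bool dreq, bool pda, bool selected,
                 unsigned addr, bool finish)
{
    switch (s->state) {
    case SLAVE_IDLE:
        if (selected && dreq) {           /* DREQ takes precedence over PDA */
            s->latched_addr = addr;
            s->state = SLAVE_PRIMARY;
        } else if (selected && pda) {
            s->latched_addr = addr;       /* decode only: no side effects yet */
            s->state = SLAVE_PIPELINED;
        }
        break;
    case SLAVE_PIPELINED:
        if (dreq)
            s->state = SLAVE_PRIMARY;     /* promoted: may now use the Data Bus */
        else if (!pda)
            s->state = SLAVE_IDLE;        /* request removed: access canceled */
        break;
    case SLAVE_PRIMARY:
        break;
    }
    if (s->state == SLAVE_PRIMARY && finish) {
        s->writes_done++;                 /* commit only as a primary access */
        s->state = SLAVE_IDLE;
    }
}
```

Allowing the promoted access to finish in the same call that sees DREQ reflects the statement that the access may complete as soon as IREQ or DREQ is asserted.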

**5.2.9 Burst-Mode Accesses**

A burst-mode access allows multiple instructions or data words at sequential addresses to be accessed with a single address transfer. The number of accesses performed, and the timing of each access within the sequence, is controlled dynamically by the burst-mode protocol. Burst-mode accesses take advantage of sequential addressing patterns, and provide several benefits over simple and pipelined accesses:

1. Simultaneous instruction and data accesses. Burst-mode accesses reduce the utilization of the Address Bus. This is especially important for instruction accesses, which are normally sequential. Burst-mode instruction accesses eliminate most of the address transfers for instructions, allowing the Address Bus to be used for simultaneous data accesses.

2. Faster access times. By eliminating the address-transfer cycle, burst-mode accesses allow addresses to be generated in a manner which improves access times.

3. Faster memory access modes. Many memories have special high-bandwidth access modes (e.g., static-column page mode and nibble mode). These modes generally require a sequential addressing pattern, even though addresses may not be presented explicitly to the memory for all accesses. Burst-mode accesses allow the use of these access modes, without hardware to detect sequential addressing patterns.

5.2.9.1 BURST-MODE OVERVIEW

The control-flow diagrams in Figure 5-2 and Figure 5-3 illustrate the operation of the processor and an instruction memory during a burst-mode instruction access. The control-flow diagrams in Figure 5-4 and Figure 5-5 illustrate the operation of the processor and a data memory or device during a burst-mode data access. These diagrams are for illustration only; nodes on these diagrams do not necessarily correspond to processor or slave states, and transitions on these diagrams do not necessarily correspond to processor cycles.

Figure 5-2  Processor Burst-Mode Instruction Accesses: Control Flow (IPB = Instruction Prefetch Buffer)

Figure 5-4  Processor Burst-Mode Data Accesses: Control Flow

A burst-mode access is in one of the following operational conditions at any given time.

**Established:** The processor and slave device have successfully initiated the burst-mode access. A burst-mode access that has been established is either active or suspended. An established burst-mode access may become preempted, terminated, or canceled.

**Active:** Instruction or data accesses and transfers are being performed as the result of the burst-mode access. An active burst-mode access may become suspended.

**Suspended:** No accesses or transfers are being performed as the result of the burst-mode access, but the burst-mode access remains established. Additional accesses and transfers may occur at some later time (i.e., the burst-mode access may become active) without the re-transmission of the address for the access.

**Preempted:** The burst-mode access can no longer continue because of some condition, but the burst-mode access can be re-established within a short amount of time.

**Terminated:** All required accesses have been performed.

**Canceled:** The burst-mode access can no longer continue because of some exceptional condition. The access may be re-established only after the exceptional condition has been corrected, if possible.

Each of the preceding conditions, except for the terminated condition, is under the control of both the processor and slave device or memory. The terminated condition is determined by the processor, since only the processor can determine that all required accesses have been performed. The following sections discuss each of the above conditions with respect to the burst-mode protocol.

5.2.9.2 ESTABLISHING BURST-MODE ACCESSES

The Am29050 microprocessor attempts to perform all instruction prefetches using burst-mode accesses, except for instruction fetches at the last word before a 1-kb address boundary. For data accesses, the processor attempts to perform load-multiple and store-multiple operations using burst-mode accesses. The processor indicates that it desires a burst-mode access by asserting IBREQ or DBREQ during the cycle in which the initial address is placed on the Address Bus (however, note that these signals become valid later in the cycle than the address).

The inputs IBACK and DBACK indicate that a requested burst-mode access is supported. The processor ignores the value of IBACK unless IBREQ is asserted, and it ignores the value of DBACK unless DBREQ is asserted.

When it desires a burst-mode access, the processor continues to drive IBREQ or DBREQ on every cycle for which the address is valid on the Address Bus. During this time, the device or memory involved in the access may assert IBACK or DBACK to indicate that it can perform the burst-mode access. If IBACK or DBACK (as appropriate) is asserted while the initial address appears on the Address Bus, the burst-mode access is established. In the following cycle, the processor removes the request address and de-asserts IREQ or DREQ. However, it continues to assert IBREQ or DBREQ.

If the burst-mode access is not established on the first access, the processor attempts to establish a burst-mode access on each subsequent address transfer, as long as there are more accesses yet to be performed. During any subsequent access, the addressed device or memory may establish a burst-mode access by asserting IBACK or DBACK. If the burst-mode access is never established, the default behavior is to have the processor transmit an address for every access.
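The fallback behavior can be made concrete by counting address transfers. This C sketch is an illustration for sequential instruction fetches only. It ignores preemption and 1-kb boundaries, and it assumes a hypothetical slave that first acknowledges the burst on its `ack_from`-th address transfer, or never acknowledges it when `ack_from` is 0.

```c
/* Address transfers needed for n sequential instruction fetches when the
   slave first asserts IBACK on its ack_from-th address transfer (1-based),
   or never when ack_from == 0. Preemption and page boundaries are ignored. */
unsigned address_transfers(unsigned n, unsigned ack_from)
{
    unsigned transfers = 0;
    while (transfers < n) {
        transfers++;                   /* address driven with IBREQ asserted */
        if (ack_from != 0 && transfers == ack_from)
            break;                     /* burst established: remaining fetches need no address */
    }
    return transfers;                  /* equals n if the burst is never established */
}
```

For eight sequential fetches, a slave that never asserts IBACK costs eight address transfers, while one that acknowledges on the first transfer costs only one.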

5.2.9.3 ACTIVE AND SUSPENDED BURST-MODE ACCESSES

After the burst-mode access is established, IBREQ and DBREQ are used during subsequent accesses to indicate that the processor requires at least one more access. If IBREQ or DBREQ is active at the end of the cycle in which an access successfully completes (i.e., when IRDY or DRDY is active), the processor requires another access. If the slave device or memory previously has not preempted the burst-mode access, and does not preempt (by de-asserting IBACK or DBACK) or cancel (by asserting IERR or DERR) the burst-mode access in the cycle that the access completes, the additional access must be performed.

The execution rate of instructions is known only dynamically, so that in certain situations, a burst-mode instruction access must be suspended. If IBREQ is inactive during the cycle in which an instruction access completes, the burst-mode access is suspended (if it is neither preempted nor canceled at the same time). The burst-mode access remains suspended unless the processor requests a new instruction access (in which case IREQ is asserted), or unless the instruction memory preempts the burst-mode access.

A suspended burst-mode instruction access becomes active whenever the processor can accept more instructions. The processor activates the burst-mode access by asserting IBREQ. If the instruction memory does not preempt the burst-mode access during this cycle, an instruction access must be performed.

When a suspended burst-mode instruction access is activated, the resulting instruction access is not permitted to complete in the cycle in which IBREQ is asserted, but may complete in the next cycle. The reason for this restriction is that the burst-mode protocol is defined such that the combination of an active level on IBREQ and IRDY causes an instruction access (as previously discussed). If the instruction access completes immediately in the cycle that a suspended burst-mode access is activated, there is an ambiguity in the protocol: it is possible to interpret a single-cycle assertion of IBREQ as a request for two instructions.

The above ambiguity is resolved by delaying the instruction access resulting from a re-activated burst-mode access for a cycle. Since this restriction applies only when the Instruction Prefetch Buffer is full and the instruction memory is capable of a very fast access, the delayed instruction response has no performance impact.
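The one-cycle delay rule for a reactivated burst can be expressed as a per-cycle model of the instruction memory. This C sketch is hypothetical: `data_ready` stands for whatever internal condition lets the memory supply the next instruction, and the memory is assumed never to preempt or cancel the burst.

```c
#include <stdbool.h>

typedef enum { BURST_ACTIVE, BURST_SUSPENDED, BURST_RESUMING } burst_cond;

/* One cycle of a hypothetical burst-mode instruction memory.
   ibreq:      level of IBREQ in this cycle.
   data_ready: the memory can supply the next instruction this cycle.
   Returns true if the memory asserts IRDY in this cycle. */
bool imem_cycle(burst_cond *c, bool ibreq, bool data_ready)
{
    switch (*c) {
    case BURST_SUSPENDED:
        if (ibreq)
            *c = BURST_RESUMING;   /* reactivated: the access may not complete yet */
        return false;
    case BURST_RESUMING:
        *c = BURST_ACTIVE;         /* one cycle has passed; completion is now allowed */
        break;
    case BURST_ACTIVE:
        break;
    }
    if (!data_ready)
        return false;
    if (!ibreq)
        *c = BURST_SUSPENDED;      /* access completes with IBREQ inactive */
    return true;                   /* IRDY asserted; an active IBREQ requests another access */
}
```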

The Am29050 microprocessor does not suspend burst-mode data accesses, because the data transfers occur to and from general-purpose registers, which are always available. However, other channel masters may suspend burst-mode data accesses (during direct memory accesses, for example). The principles for suspending burst-mode accesses are the same as those for instruction accesses discussed above.

5.2.9.4 PROCESSOR PREEMPTION, TERMINATION, AND CANCELLATION

The processor may preempt, terminate or cancel a burst-mode access by de-asserting IBREQ or DBREQ, and asserting IREQ or DREQ at some later point. During the period after IBREQ or DBREQ is de-asserted and before IREQ or DREQ is asserted, the burst-mode access is in a suspended condition. Normally, the processor receives one more instruction or data word after IBREQ or DBREQ is de-asserted. However, this access may complete in the same cycle that IBREQ or DBREQ is de-asserted. Please note that the processor may de-assert IBREQ or DBREQ without receiving an IBACK or DBACK to acknowledge the burst-mode access.

The slave device or memory cannot distinguish between preempted, terminated, and canceled burst-mode accesses, when these are caused by the processor, until the processor asserts IREQ or DREQ. If the slave continues to assert IBACK or DBACK after IBREQ or DBREQ is de-asserted, the slave should be prepared to accept any new request during the cycle that IREQ or DREQ is asserted to begin the new access.

The reason for this is that the processor may attempt to establish a burst-mode access for the new access: if the slave is asserting IBACK or DBACK because of a previously preempted, terminated, or canceled burst-mode access, the processor interprets the active IBACK or DBACK as establishing the new burst-mode access and removes the request in the following cycle.

The processor preempts a burst-mode access when an external channel master arbitrates for the channel, or when a burst-mode fetch crosses a potential virtual-page boundary. Since the minimum page size is 1 kb, burst-mode instruction and data accesses are preempted whenever the address sequence crosses a 1-kb address boundary. The burst is re-established as soon as a new address translation is performed (if required). A new physical address is transmitted when the burst-mode access is re-established.

Note that the preemption resulting from page boundaries is advantageous for devices or memories that require counters to follow the burst-mode address sequence. Since all burst-mode accesses are word accesses, and the processor re-transmits an address at every 1-kb address boundary, an 8-bit counter in the slave device or memory is sufficient to follow the burst-mode address sequence. Additional address bits are simply latched.
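The arithmetic is 1024 bytes / 4 bytes per word = 256 words, which an 8-bit counter covers exactly. A minimal C sketch of such a slave-side address generator follows, using hypothetical structure and function names. It also includes the boundary test that corresponds to the processor's rule of not starting an instruction burst at the last word before a 1-kb boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slave-side burst address generator. Address bits 31-10 are
   latched from each transmitted address; bits 9-2 come from an 8-bit word
   counter (256 words x 4 bytes = 1024 bytes). Bits 1-0 are always zero
   because burst-mode accesses are word accesses. */
typedef struct {
    uint32_t upper;   /* latched address bits 31-10 */
    uint8_t  word;    /* word index within the 1-kb block */
} burst_addr;

void burst_load(burst_addr *b, uint32_t addr)   /* on each address transfer */
{
    b->upper = addr & ~(uint32_t)0x3FFu;
    b->word  = (uint8_t)((addr >> 2) & 0xFFu);
}

uint32_t burst_current(const burst_addr *b)
{
    return b->upper | ((uint32_t)b->word << 2);
}

void burst_advance(burst_addr *b)               /* after each completed access */
{
    b->word++;   /* never needs to carry: the processor re-sends an address at each 1-kb boundary */
}

/* True if addr is the last word before a 1-kb boundary. */
bool last_word_in_block(uint32_t addr)
{
    return (addr & 0x3FCu) == 0x3FCu;
}
```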

The processor terminates a burst-mode access whenever all required instructions or data have been accessed. In the case of instruction accesses, the burst-mode access is terminated when a non-sequential fetch occurs. In the case of data accesses, the burst-mode access is terminated when the count indicates a single load or store remains. The last load or store is executed as a simple access.

The processor cancels a burst-mode access when an interrupt or trap is taken. Note that a trap may be caused by the burst-mode access, for example when a Translation Look-Aside Buffer miss occurs on an address in the burst-mode sequence. If the processor cancels a burst-mode access while an access in the sequence is still outstanding, that access must be completed in spite of the cancellation.

Canceled burst-mode data accesses may be restarted at some (possibly much later) point in execution via the Channel Address, Channel Data, and Channel Control registers. In this case, the burst-mode access is restarted at the point at which it was canceled, rather than at the beginning of the original address sequence.

5.2.9.5 SLAVE PREEMPTION AND CANCELLATION

The slave device or memory involved in a burst-mode access may preempt the access by de-asserting IBACK or DBACK. The processor samples IBACK when IRDY is active, and DBACK when DRDY is active, so IBACK or DBACK may be de-asserted as the last supported access is completed. However, IBACK or DBACK also may be de-asserted in any cycle before the access completes; to preempt the access, IBACK or DBACK must remain inactive until IRDY or DRDY is asserted. If IBACK or DBACK is de-asserted when the processor is in a state where it expects an access, the access must be completed.

In general, the slave device or memory preempts the burst-mode access whenever it cannot support any further accesses in the burst-mode sequence. This normally occurs whenever an implementation-dependent address boundary is encountered (e.g., a cache-block boundary), but may occur for any reason. By preempting the burst-mode access, the slave causes the processor to issue a new request, with the address of the next instruction or data word required by the processor.

The slave device or memory may cancel a burst-mode access by asserting IERR or DERR in response to a requested access. The signals IBACK or DBACK need not be de-asserted at this time, but should be de-asserted in the next cycle.

Note that the IERR and DERR signals cause non-maskable traps, except in the case where IERR is asserted for an instruction which the processor does not execute.

5.2.10 Arbitration

External masters can gain access to the Address, Data, and Instruction buses by asserting the BREQ input. The processor completes any pending access, preempts any burst-mode access, and asserts the BGRT output. At this time, the processor places all channel outputs associated with the Address, Data, and Instruction buses in the high-impedance state.

For the first cycle that BGRT is asserted, the output BINV is also asserted. If the external master cannot take control of the Address Bus and associated controls in the cycle that BGRT is asserted, the active level on BINV may be used to define an idle cycle for the channel (i.e., any spurious access requests are ignored). The BINV signal is asserted only for a single cycle, so the external master must take control of the channel in the cycle after BGRT is asserted.

While the BREQ input remains asserted, the processor continues to assert BGRT. The external master has control over the channel during this time.

To release the channel to the processor, the external master de-asserts BREQ, but must continue to control the channel for the first cycle in which BREQ is de-asserted. In the cycle after BREQ is de-asserted, the processor asserts BINV and de-asserts BGRT; the external master should release control of the channel at this time. On the following cycle, the processor de-asserts BINV, and is able to use the channel. The processor re-establishes any burst-mode access preempted by arbitration.

The processor does not relinquish the channel when the LOCK signal is active. This prevents external masters from interfering with exclusive accesses.
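
The handshake above can be summarized as a small per-cycle state machine. The following C sketch is one possible model of the processor side, under stated assumptions: inputs are treated as already synchronized, completion of a pending access and preemption of a burst-mode access are folded into a single hypothetical access_pending flag, and a BREQ that is de-asserted during the very first grant cycle is assumed to be handled like any other release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cycle-level model of the processor side of arbitration. */
typedef enum {
    ARB_PROCESSOR,    /* processor owns the channel: BGRT and BINV inactive */
    ARB_GRANT_FIRST,  /* first grant cycle: BGRT and BINV both asserted     */
    ARB_GRANTED,      /* external master owns the channel: BGRT only        */
    ARB_RETURN        /* cycle after BREQ drops: BINV asserted, BGRT off    */
} arb_state;

typedef struct { bool bgrt, binv; } arb_outputs;

static arb_outputs arb_outputs_for(arb_state s)
{
    arb_outputs o = { false, false };
    if (s == ARB_GRANT_FIRST || s == ARB_GRANTED) o.bgrt = true;
    if (s == ARB_GRANT_FIRST || s == ARB_RETURN)  o.binv = true;
    return o;
}

/* Next state, given the inputs sampled in the current cycle. */
static arb_state arb_next(arb_state s, bool breq, bool lock, bool access_pending)
{
    switch (s) {
    case ARB_PROCESSOR:
        /* Grant only when requested, LOCK is inactive, and no access is pending. */
        return (breq && !lock && !access_pending) ? ARB_GRANT_FIRST : ARB_PROCESSOR;
    case ARB_GRANT_FIRST:             /* BINV lasts only this one cycle */
    case ARB_GRANTED:
        return breq ? ARB_GRANTED : ARB_RETURN;
    case ARB_RETURN:                  /* BINV drops; processor uses the channel */
        return ARB_PROCESSOR;
    default:
        assert(!"invalid arbitration state");
        return ARB_PROCESSOR;
    }
}
```

Keeping the outputs a pure function of the state makes it easy to compare the model against a logic-analyzer trace of BGRT and BINV.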

5.2.11 Use of BINV to Cancel an Access

Besides using the BINV signal to transfer control of the channel from one master to another, the Am29050 microprocessor uses the BINV signal to cancel accesses after they have been initiated. To cancel an access, BINV is asserted during a cycle in which IREQ or DREQ also is asserted. If an access is canceled, the accompanying response (using IRDY, IERR, DRDY or DERR) is ignored during the cycle that BINV is asserted; thereafter, the system should not respond to the canceled access.

The BINV signal is used to cancel an instruction access in the following situations:

- When an interrupt or trap is taken;
- When an instruction fetch-ahead is canceled because a target block is only partially present in the Branch Target Cache memory;
- When an instruction TLB miss or protection violation occurs on an instruction access;
- When a branch instruction is the delay instruction of another branch, and the targets of both branches are in the Branch Target Cache memory (in this case, the external fetch for the target of the first branch is not required); and
- When the processor enters the Load Test Instruction Mode, and there is an active instruction request on the channel.

The BINV signal is used to cancel a data access in the following situations:

- When a data TLB miss or protection violation occurs on the data access; and
- When an interrupt or trap is taken in the cycle that a data access appears on the channel.

When a LOADSET instruction encounters a protection violation because store access is not permitted, the processor cancels the load access with BINV.
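
From a slave's point of view, the rule is simply to forget an access once BINV and the corresponding request are seen together. The C sketch below assumes a hypothetical slave that always inserts a fixed number of wait states; the names and the wait-state scheme are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slave with a fixed number of wait states. */
typedef struct {
    bool     pending;    /* an access has been accepted but not answered */
    unsigned wait_left;  /* wait states remaining before responding */
} slave_state;

/* Evaluate one cycle. req is IREQ or DREQ, binv is BINV; the return value
   is true when the slave asserts IRDY or DRDY in this cycle. */
static bool slave_cycle(slave_state *s, bool req, bool binv, unsigned wait_states)
{
    assert(s);
    if (req && binv) {
        /* Canceled: a response now would be ignored, and the slave must not
           respond to this access later, so simply forget it. */
        s->pending = false;
        return false;
    }
    if (req && !s->pending) {        /* start a new access */
        s->pending = true;
        s->wait_left = wait_states;
    }
    if (!s->pending)
        return false;
    if (s->wait_left > 0) {          /* still inserting wait states */
        s->wait_left--;
        return false;
    }
    s->pending = false;              /* respond in this cycle */
    return true;
}
```

This slave chooses not to respond at all in the BINV cycle; responding there would also be acceptable, since the processor ignores that response.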

**5.2.12 Bus Sharing—Electrical Considerations**

When buses are shared among multiple masters and slaves, it is important to avoid situations where these devices are driving a bus at the same time. This can happen if bus arbitration is incomplete or incorrect, allowing more than one master or slave to drive a bus in the same cycle. It can also happen when a master or slave releases a bus in the same cycle that another master or slave gains control, and the first device is slow to disable its bus drivers relative to the point at which the second device begins to drive the bus. The latter situation is called a bus collision in the following discussion.

In addition to the logical errors that can occur when multiple devices drive a bus simultaneously, such situations may cause bus drivers to carry large amounts of electrical current. This can have a significant impact on driver reliability and power dissipation. Because bus collisions usually last only a short time, they are of less concern, but they may contribute to high-frequency electromagnetic emissions.

The Am29050 microprocessor channel is defined to prevent all situations where multiple drivers are driving a bus simultaneously. However, bus collisions may be allowed to occur, depending on the system design.

In the case of the Am29050 microprocessor channel, arbitration for the channel prevents the processor from driving the Address and Data buses at the same time as another channel master. If there is more than one external master, the system design must include some means of ensuring that only one external master gains control of the channel, and that no external master gains control of the channel at the same time as the processor.

When the processor relinquishes control of the channel to an external master, bus collisions may be prevented by not allowing the external master to drive any bus while BINV is active. This ensures that all processor outputs are disabled by the time the external master takes control of the channel. However, there is nothing in the channel protocol to prevent the external master from taking control as soon as BGRT is asserted.

Slave devices and memories are prevented from simultaneously driving the Instruction Bus or Data Bus by allowing only the device or memory performing a primary access to drive the appropriate bus. When a pipelined access becomes a primary access, it may drive the Instruction or Data Bus immediately, so that there is a potential bus collision if the pipelined access is performed by a slave other than the slave performing the original primary access. This bus collision may be prevented by restricting all slaves to driving the Instruction and Data buses in the second half-cycle (using SYSCLK, for example). Since the processor samples data only at the end of a cycle, this restriction does not affect performance.

When the processor performs a store immediately following a load, it drives the Data Bus and asserts DREQ for the store in the second cycle following the cycle in which the data for the load appears on the Data Bus. This provides a complete cycle for the slave involved in the load to disable its data drivers. The processor continues to drive the Data Bus until it receives a DRDY or DERR in response to the store; it ceases to drive the Data Bus in the cycle following the response.

5.2.13 Channel Behavior for Interrupts and Traps

If an interrupt or trap is taken, any burst-mode accesses are canceled. If a request for a pipelined access is on the Address Bus, this request is removed. Any other accesses are completed, and no new accesses are started, other than those required for the interrupt or trap. Note that any accesses that the processor expects to complete must be completed, even though burst-mode and pipelined accesses are canceled.

When interrupt or trap processing is complete, any canceled burst-mode accesses are re-established, using the address of the access that was to be performed next when the interrupt or trap was taken. Uncompleted pipelined accesses are restarted, either by the interrupt return sequence in the case of an instruction access, or by restarting the initiating instruction in the case of a data access.

Note that the restarting of a pipelined access is not performed by the Channel Address, Channel Data, and Channel Control registers, since these registers may be required to restart the primary access. The instruction initiating the pipelined access is not allowed to complete until the primary access completes, so that the Program Counter 1 (PC1) Register contains the address of the initiating instruction when a pipelined access is canceled. On interrupt return, the address in PC1 is used to restart this instruction.

5.2.14 Effect of the LOCK Output

The LOCK output provides synchronization and exclusion of accesses in a multi-processor environment. LOCK has no pre-defined effect for a system, other than the fact that the Am29050 microprocessor does not grant the channel to an external master while LOCK is active.

The LOCK output is asserted for the address cycle of the Load-and-Lock and Store-and-Lock instructions, and is asserted for both the read and write accesses of a Load and Set instruction. LOCK may also be active for an extended period of time, under control of the Lock bit in the Current Processor Status Register (this capability is available only to Supervisor-mode programs).

LOCK may be defined to provide any level of resource locking for a particular system. For example, it may lock the channel, an individual device or memory, or a location within a device or memory.

When a resource is locked, it is available for access only by the processor with the appropriate access privilege. The mechanisms for restricting accesses, and the methods for reporting attempted violations of the restrictions, are system-dependent.

5.3 TEST/DEVELOPMENT INTERFACE

The Test/Development Interface consists of the inputs CNTL(1–0) and TEST, and the outputs STAT(2–0). The CNTL(1–0) inputs provide control of processor operation, and the STAT(2–0) outputs provide information about processor operation for external monitoring.

A hardware-development system uses CNTL(1–0) and STAT(2–0) to control the processor for the purposes of processor and system debug.

A hardware tester uses the TEST input to place all processor outputs in the high-impedance state. This allows the tester to check other system logic by driving processor outputs directly, without requiring that the processor be removed from the system.

5.3.1 Processor Status Outputs

The STAT(2–0) outputs indicate certain information about processor modes, along with other information about processor operation. STAT(2–0) may be used to provide feedback of processor behavior during normal processor operation and when the processor is under the control of a hardware-development system.

The encoding of STAT(2–0) is as follows:

<table>
<thead>
<tr>
<th>STAT2</th>
<th>STAT1</th>
<th>STAT0</th>
<th>Mode or Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>Halt or Step Modes</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>Pipeline Hold Mode</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Load Test Instruction Mode, Synchronize</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Wait Mode</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>Interrupt Return</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>Taking Interrupt or Trap</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>Non-Sequential Instruction Fetch</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>Executing Mode</td>
</tr>
</tbody>
</table>

On any given cycle, the STAT(2–0) signals reflect the state of the processor's execute stage on the previous cycle. Where the conditions listed above are not mutually exclusive, the condition listed first is the one reflected on STAT(2–0).
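
A monitor or logic analyzer that records STAT(2–0) can decode each sample with a simple lookup, as in the following C sketch (all names are hypothetical). Because the processor itself applies the priority rule above, the decoder needs no priority logic of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Names of the STAT(2-0) encodings, indexed by the 3-bit value (STAT2 = bit 2). */
static const char *const stat_names[8] = {
    "Halt or Step Modes",                      /* 000 */
    "Pipeline Hold Mode",                      /* 001 */
    "Load Test Instruction Mode, Synchronize", /* 010 */
    "Wait Mode",                               /* 011 */
    "Interrupt Return",                        /* 100 */
    "Taking Interrupt or Trap",                /* 101 */
    "Non-Sequential Instruction Fetch",        /* 110 */
    "Executing Mode"                           /* 111 */
};

/* Decode one sample; it describes the execute stage on the previous cycle. */
static const char *stat_condition(unsigned stat)
{
    assert(stat < 8u);
    return stat_names[stat];
}

/* STAT2 Low (values 000-011) means the processor is idle. */
static bool stat_is_idle(unsigned stat)
{
    assert(stat < 8u);
    return (stat & 4u) == 0;
}
```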

The first cycle of a multi-cycle instruction (Load Multiple, Store Multiple, Interrupt Return, or Interrupt Return and Invalidate) is indicated as an "Executing Mode" cycle. When an interrupt or trap is taken, the first cycle is indicated as a "Taking Interrupt or Trap" cycle. Additional cycles of these multi-cycle operations are indicated as "Pipeline Hold" cycles.

A Low level on STAT2 indicates that the processor is idle, and may be used as an indication of processor performance. Since most processor instructions execute in a single cycle, and since extra cycles spent executing multiple-cycle operations are counted as Pipeline Hold cycles, a count of the number of cycles within a given time interval that the processor is not idle (i.e., a count of the number of cycles for which STAT2 is High) is a close approximation to the number of instructions executed within that interval, and thus approximates the instruction-execution rate. The only error in this approximation comes from the cycles in which the processor takes an interrupt or trap; if desired, this error can be eliminated by fully decoding the STAT(2–0) outputs.
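
The following C sketch computes both forms of the estimate over a hypothetical trace of STAT(2–0) samples, one per cycle: busy counts the cycles with STAT2 High, and subtracting traps removes the 101 (Taking Interrupt or Trap) cycles, which is the full-decoding refinement. The trace format and the names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally over a trace of STAT(2-0) samples, one per cycle. */
typedef struct {
    unsigned long busy;   /* cycles with STAT2 High (not idle) */
    unsigned long traps;  /* busy cycles showing 101, Taking Interrupt or Trap */
} stat_tally;

static stat_tally tally_stat(const unsigned char *stat, size_t ncycles)
{
    stat_tally t = { 0, 0 };
    for (size_t i = 0; i < ncycles; i++) {
        assert(stat[i] < 8u);
        if (stat[i] & 4u) {
            t.busy++;                 /* STAT2 alone: approximate count */
            if (stat[i] == 5u)
                t.traps++;            /* full decode: trap-entry cycle */
        }
    }
    return t;
}
```

Dividing busy (or busy minus traps) by the length of the sampled interval, that is, ncycles multiplied by the cycle time, gives the approximate instruction-execution rate.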

The STAT2 output also may be used to implement processor timeouts for reliability. For example, a Low level on STAT2 may be used to start a hardware timeout counter, with a High level resetting and stopping the counter. If the counter exceeds a maximum expected count of idle cycles for a system, it is likely that an error has occurred. This error can be reported by the WARN trap (see Section 3.5.6 and Section 5.6).
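
A minimal C model of such a timeout counter follows. The limit is a system-dependent assumption, the names are hypothetical, and how the error is routed to the WARN input is left to the system design.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical idle-cycle watchdog driven by STAT2. */
typedef struct {
    unsigned long idle;   /* current run of consecutive idle cycles */
    unsigned long limit;  /* maximum expected idle run for this system */
} idle_watchdog;

/* Advance one cycle; returns true once the idle run exceeds the limit,
   i.e., when external logic would report an error (e.g., via WARN). */
static bool watchdog_cycle(idle_watchdog *w, bool stat2_high)
{
    assert(w && w->limit > 0);
    if (stat2_high) {        /* High level resets and stops the counter */
        w->idle = 0;
        return false;
    }
    w->idle++;               /* Low level: count one more idle cycle */
    return w->idle > w->limit;
}
```

Since the Halt and Step modes also drive STAT2 Low (STAT(2–0) = 000), a counter of this kind would presumably need to be disabled while a hardware-development system has the processor halted.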

The value 010 on the STAT(2–0) outputs is also used by the hardware breakpoints to synchronize external hardware. If this value appears for one cycle during normal processor operation, a valid breakpoint comparison has been detected while the BSY bit is 0. The processor takes no other action related to the breakpoint. The synchronization pulse can be used to trigger or synchronize external logic.

5.3.2 CPU Control Inputs

Certain processor operational modes are under the control of the CNTL(1-0) inputs. These inputs have an effect on the processor mode as follows:

<table>
<thead>
<tr>
<th>CNTL1</th>
<th>CNTL0</th>
<th>Mode</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>Load Test Instruction</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>Step</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>Halt</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>Normal</td>
</tr>
</tbody>
</table>

These inputs are asynchronous to the processor clock. In addition, changes on the CNTL(1–0) inputs are restricted so that only CNTL1 or CNTL0, but not both, may change in any given processor cycle. The allowed transitions are shown in Figure 5-6. The restriction on CNTL(1–0) transitions allows these inputs to be driven directly by an external hardware-development system or tester, without any intervening logic. Proper operation is ensured by making only single-input changes on CNTL(1–0), and by spacing successive changes more than one processor cycle apart. If these restrictions are violated, processor operation is unpredictable, and a processor reset is required to resume predictable operation.
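
A development system or tester could check a planned CNTL(1–0) schedule against both restrictions before applying it. The following C sketch does so; the cntl_change record, the nanosecond time base, and the cycle_ns parameter are hypothetical. It checks only the single-input and spacing rules, not which mode each value selects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One change applied to CNTL(1-0) (hypothetical record). */
typedef struct {
    double   time_ns;  /* when the new value is driven */
    unsigned cntl;     /* new CNTL(1-0) value, 0..3 */
} cntl_change;

/* True if exactly one of CNTL1/CNTL0 differs between a and b. */
static bool single_bit_change(unsigned a, unsigned b)
{
    unsigned d = (a ^ b) & 3u;
    return d == 1u || d == 2u;
}

/* Check single-input changes only, and more than one processor cycle
   between successive changes. */
static bool cntl_schedule_ok(unsigned initial, const cntl_change *ch, size_t n,
                             double cycle_ns)
{
    unsigned prev = initial;
    assert(initial < 4u && cycle_ns > 0.0);
    for (size_t i = 0; i < n; i++) {
        if (ch[i].cntl > 3u || !single_bit_change(prev, ch[i].cntl))
            return false;
        if (i > 0 && ch[i].time_ns - ch[i - 1].time_ns <= cycle_ns)
            return false;
        prev = ch[i].cntl;
    }
    return true;
}
```

For example, with an arbitrary 40-ns cycle, the schedule 11, 10, 00, 01 with changes 100 ns apart passes, while a direct change from 11 to 00 fails.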

Note that, because of the restriction described above, it is not possible to transition directly between all possible modes that are controlled by these inputs. For example, the processor cannot go from the Load Test Instruction mode to Normal operation without first entering the Halt or Step modes.

5.3.3 Hardware Development

The Halt, Step, and Load Test Instruction modes of operation are defined to support the debug of the processor system (both hardware and software) by a hardware-development system. This section describes the use of these modes during debug, and describes the corresponding activity on the CNTL(1-0) and STAT(2-0) lines.

5.3.3.1 Halt Mode

The Halt mode allows the hardware-development system to stop processor operation while preserving its internal state. The Halt mode is defined so that normal operation may resume from the point at which the processor enters the Halt mode. All external accesses are completed before the Halt mode is entered, so only a minimal amount of system logic is required to support the Halt mode.

The Halt mode can be invoked by applying a value of 10 to the CNTL(1–0) inputs. The processor enters the Halt mode within two or three cycles after the CNTL(1–0) inputs are changed (depending on synchronization time), except that it first completes any external data access in progress.

The Halt mode can also be entered as the result of executing a HALT instruction or encountering a hardware breakpoint with a hardware-development system attached (see below). When a HALT instruction is executed or a breakpoint is encountered, the processor enters the Halt mode on the next cycle, except that it completes any external data accesses in progress. In this case, the processor remains in the Halt mode even though the CNTL(1–0) inputs are 11. However, the processor cannot exit the Halt mode except as the result of the CNTL(1–0) or RESET inputs. If the instruction following a Halt instruction has an exception (e.g., instruction TLB Miss), the trap associated with the exception is taken before the processor enters the Halt mode.

The Halt instruction is designed to be used as an instruction breakpoint by the hardware-development system, augmenting the hardware breakpoints provided by the Am29050 microprocessor. However, the Halt instruction normally is a privileged instruction, causing a Protection Violation trap upon attempted execution by a User-mode program. The hardware-development system can disable this Protection Violation by holding the CNTL(1–0) inputs at 10 during a reset; this signals the presence of an external debugger and disables protection checking for Halt instructions until the next processor reset.

If an external hardware debugger has signaled its presence, any condition that would otherwise cause the processor to take a Monitor trap instead causes the processor to enter the Halt Mode at location 16 in Instruction ROM address space (the WARN Trap handler). This permits the hardware-development system to debug system-level routines. If the processor enters the Halt Mode due to a synchronous trap, the Reason Vector Register is updated, and the MM bit of the Current Processor Status Register is set.

**Figure 5-6 Valid Transitions on CNTL(1–0) Inputs**

- **Normal 11**
- **Halt 10**
- **Step 01**
- **Load Test Instruction 00**

If an external debugger has signaled its presence and a valid breakpoint comparison is encountered, the processor enters the Halt Mode at the beginning of the Trace Trap handler. The Shadow Program Counter registers point to the location where the breakpoint was encountered.

If a burst-mode instruction access is established before the processor enters the Halt mode, it remains established when the processor enters the Halt mode, but is suspended.

While in the Halt mode, the processor does not execute instructions, and performs no external accesses. The Timer Facility does not operate (i.e., the Timer Counter Register does not change).

The Halt mode is exited whenever the Reset mode is entered, or the CNTL(1–0) lines place the processor into another mode. The only valid transitions on the CNTL(1–0) lines from the value 10 are to the value 00, which places the processor into the Load Test Instruction mode, and to the value 11, which causes the processor to resume normal execution.

5.3.3.2 STEP MODE

The Step mode causes the Am29050 microprocessor to execute at a rate determined by the hardware-development system, allowing the hardware-development system to easily control and monitor processor operation. The Step mode is defined so that normal operation may resume after stepping is complete. Since all external accesses are completed during any step, a minimum amount of system logic is required to support the slower rate of execution.

The Step mode is invoked by the application of a value of 01 to the CNTL(1-0) inputs. The processor enters the Step mode within two or three cycles after the CNTL(1-0) inputs are changed (depending on synchronization time), except that it first completes any external data access in progress.

If a burst-mode instruction access is established before the processor enters the Step mode, it remains established when the processor enters the Step mode, but is suspended.

While in the Step mode, the processor does not execute instructions, and performs no external accesses. The Timer Facility does not operate (i.e., the Timer Counter Register does not change) while the processor is in the Step mode.

The Step mode is identical to the Halt mode in every respect except one. This difference is apparent on the transition of the CNTL(1-0) lines from the value 01 (Step mode) to the value 11 (Normal). On this transition, the processor steps. That is, the processor state advances by one pipeline stage, and it completes any external access which is initiated by this state change.

If the processor immediately enters the Pipeline Hold mode on a step, the step may require multiple cycles to execute, since the processor pipeline cannot advance while the processor is in the Pipeline Hold mode. The STAT(2-0) lines reflect the state of the processor for every cycle of the step; STAT2 is High for one cycle, and only one cycle, before the step completes.

The Timer Counter decrements by one for every cycle of the step; if the Timer Counter decrements to zero, the usual Timer-Facility actions are performed, and a Timer interrupt may occur.

After the step is performed, the processor re-enters the Step mode, and remains in the Step mode even though the CNTL(1–0) inputs have the value 11 (this removes the need for a time-critical transition on the CNTL(1–0) inputs). The processor remains in this condition until the CNTL(1–0) inputs transition to 10 or 01 (or RESET is asserted). The transition to 10 causes the processor to enter the Halt mode, and is used to clear the Step mode. The transition to 01 causes the processor to remain in the Step mode, so that it may perform additional steps.
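
The mode changes described so far for the Halt, Step, and Load Test Instruction modes can be summarized in a small C model. This is a simplified sketch under stated assumptions: synchronization delays, pending accesses, and LOADM/STOREM behavior are ignored, the caller is assumed to make only valid single-input changes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_NORMAL, MODE_HALT, MODE_STEP, MODE_LOAD_TEST } cpu_mode;

/* Apply one change of CNTL(1-0) from old_cntl to new_cntl.
   Sets *stepped when the change makes the processor perform a step. */
static cpu_mode apply_cntl(cpu_mode mode, unsigned old_cntl, unsigned new_cntl,
                           bool *stepped)
{
    assert(old_cntl < 4u && new_cntl < 4u && stepped);
    *stepped = false;
    switch (new_cntl) {
    case 2u:                        /* 10: Halt (also clears Step mode) */
        return MODE_HALT;
    case 0u:                        /* 00: Load Test Instruction, from Halt or Step only */
        return (mode == MODE_HALT || mode == MODE_STEP) ? MODE_LOAD_TEST : mode;
    case 1u:                        /* 01: Step */
        return MODE_STEP;
    default:                        /* 11 */
        if (mode == MODE_STEP && old_cntl == 1u) {
            *stepped = true;        /* 01 -> 11 in Step mode: perform one step */
            return MODE_STEP;       /* then stay in Step mode although CNTL is 11 */
        }
        return (mode == MODE_HALT) ? MODE_NORMAL : mode;
    }
}
```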

5.3.3.3 HALT/STEP MODE AND LOADM/STOREM

If the Am29050 microprocessor is placed in the Halt or Step mode while either a LOADM or STOREM instruction is being executed, the STAT(2–0) outputs indicate the Halt or Step mode for one cycle (STAT(2–0) = 000), and then indicate the Pipeline Hold mode (STAT(2–0) = 001) until the final access of the LOADM or STOREM is complete, at which time they return to indicating the Halt or Step mode. A hardware-development system therefore must not treat a single-cycle Halt/Step indication on the STAT(2–0) outputs as evidence that the processor is halted.
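
One way for a hardware-development system to follow this rule is to treat the processor as halted only after STAT(2–0) has read 000 on two consecutive cycles. The C sketch below implements that qualification. The two-cycle threshold is an assumption derived from the sequence described above (a single 000, then 001 until the final access completes), and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical halt qualifier for a development system. */
typedef struct {
    unsigned run;  /* consecutive cycles with STAT(2-0) == 000, capped at 2 */
} halt_detector;

/* Feed one STAT(2-0) sample per cycle; returns true once halted is confirmed. */
static bool halt_detector_cycle(halt_detector *h, unsigned stat)
{
    assert(h && stat < 8u);
    if (stat == 0u) {
        if (h->run < 2u)
            h->run++;
    } else {
        h->run = 0;    /* e.g., 001 (Pipeline Hold) during LOADM/STOREM */
    }
    return h->run >= 2u;
}
```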

5.3.3.4 LOAD TEST INSTRUCTION MODE

The processor incorporates an Instruction Register (IR) that holds instructions while they are decoded. In the Load Test Instruction mode, the IR is enabled to receive the content of the Instruction Bus, regardless of the state of the processor’s Instruction Fetch Unit. This allows the hardware-development system to provide instructions for execution directly, thereby providing means for the hardware-development system to examine and modify the internal state of the processor without altering the processor’s instruction stream.

The hardware-development system can place an instruction in the IR by first placing 00 on CNTL(1–0). The processor enters the Load Test Instruction mode within two or three cycles after the CNTL(1–0) inputs are changed (depending on synchronization time), except that it first preempts any established burst-mode instruction access. The Load Test Instruction mode can be entered only from the Halt or Step modes. Note that the burst-mode instruction access that is preempted here was previously suspended for the Halt or Step modes.

When the processor enters the Load Test Instruction Mode, the processor behaves as though the Current Processor Status Register were forced to the value shown in Figure 5-7, even though the register is not changed.

**Figure 5-7 Processor Status While in Load Test Instruction Mode**

[Register diagram: Current Processor Status Register, bits 31–0, as forced during the Load Test Instruction Mode]

The visible processor state, including the Shadow Program Counter Registers, remains unchanged while the processor is in the Load Test Instruction Mode. The processor status shown in Figure 5-7 remains in effect until the next transition to the Normal Mode via the Halt Mode.

While the processor is in the Load Test Instruction mode, it ignores all interrupts and traps, except for the Data Access Exception and Coprocessor Exception. These latter exceptions are also ignored if the load or store that causes the exception has the value 110 in the OPT bits (indicating a load or store to the hardware-development system).

The STAT(2–0) lines have a value of 010 while the processor is in the Load Test Instruction mode; this may be used as a verification that the processor is loading the IR.

While the processor is in the Load Test Instruction mode, the IR continually stores the value on the Instruction Bus; any change in the value on this bus is reflected in the IR on the next cycle. The hardware-development system can place a desired instruction into the IR by driving this instruction on the Instruction Bus. The values of IRDY and IERR are irrelevant.

The processor exits the Load Test Instruction mode in the second cycle following a change on the CNTL(1–0) inputs. The only valid change here is either to the Halt mode (CNTL(1–0) = 10) or the Step mode (CNTL(1–0) = 01).

When the Load Test Instruction mode is exited, the most recent value stored into the IR is held. If the processor is placed in the Step mode, the IR is marked as having valid content, enabling the processor to decode and execute the instruction. If the processor is placed in the Halt mode, it ignores any instruction placed in the IR by the Load Test Instruction mode, and reverts to its normal instruction-fetch mechanism.

Once the IR has been set by the Load Test Instruction mode, the instruction in the IR may be executed via the Step mode as discussed in the previous subsection. A single step is sufficient to cause the execution of this instruction. However, because of pipelining, multiple steps may be required before the instruction completes execution. If more than one step is performed, the processor executes the instruction in the IR on every step. If it is desired to step an instruction to completion without repeated execution, a NO-OP may be set into the IR (using the Load Test Instruction mode) after the first step.

The Load Test Instruction mode may be used to cause the execution of most processor instructions (restrictions are discussed below). This allows inspection and modification of processor state.

The hardware-development system uses load and store instructions, executed via the Load Test Instruction mode, to alter and inspect the contents of general-purpose registers. The OPT field for these loads and stores has the value 110; this causes the system to ignore the resulting access. Furthermore, it causes the Am29050 microprocessor to ignore the DRDY and DERR responses for the access; the processor completes the access at the end of the next stepped instruction, rather than upon the assertion of DRDY. This eliminates the need for the hardware-development system to generate a synchronous DRDY in response to the load or store.

Because of sequencing constraints, the Load Test Instruction mode cannot be used to cause the execution of the following instructions: conditional jumps, Load Multiple, Store Multiple, Interrupt Return, and Interrupt Return and Invalidate. Unconditional jumps and calls are permitted, but affect only the Program Counter (instruction sequencing is not affected).

It is not possible to execute a load directly following a store, or a store directly following a load, using the Load Test Instruction mode. At least one NO-OP (or other operation) must be executed between adjacent loads and stores, because of control conflicts that arise when these instructions are stepped in a system that performs the resulting accesses at normal speed. However, a sequence of only loads or only stores is permitted without restriction.

The contents of the Program Counter 0, Program Counter 1, Program Counter 2, Channel Address, Channel Data, Channel Control, and ALU Status registers are not updated while instructions are executed via the Load Test Instruction mode, except explicitly by Move To Special Register instructions. Instructions executed using the Load Test Instruction mode may access protected processor state even though the processor is in the User mode.

Instructions executed via the Load Test Instruction mode may be used to access an external device or memory. Recall that the processor completes any normal data access before completing a step. This allows the processor to access devices and memories on behalf of the hardware-development system, and simplifies the timing constraints on the hardware-development system.

During processor execution via the Load Test Instruction mode, the processor retains the information required to resume normal operation. If the hardware-development system modifies any processor state, that state must be restored before normal operation can resume correctly.

In order to leave the Load Test Instruction mode and resume normal execution, an IRET instruction is placed into the IR and stepped through the processor pipeline. When the IRET instruction is executed, the processor re-fetches the instructions at the addresses in the Shadow Program Counter 0 and Shadow Program Counter 1 registers. Following this, a transition on CNTL(1-0) to the Halt mode (CNTL(1-0) = 10) and then to the Normal mode (CNTL(1-0) = 11) causes the processor to leave the Load Test Instruction mode and resume normal operation. Alternatively, the hardware-development system can continue to use the Step mode to maintain control of the processor and step through its normal execution sequence.

**SUMMARY OF DEVELOPMENT SYSTEM OPERATION**

When the capabilities provided by the Halt, Step, and Load Test Instruction modes are combined, an extremely flexible test and development interface results. The following is an example sequence performed by the hardware-development system during debug:

1. Halt the processor either by a HALT instruction, by the hardware breakpoints, or by a 10 on the CNTL(1-0) inputs. The HALT instruction may be used as a primitive operation in the implementation of a general instruction-breakpoint capability.

2. Load the IR with an instruction to inspect or alter the processor state. The hardware-development system should wait for the value 010 on STAT(2-0) (Load Test Instruction mode) before driving the Instruction Bus. After the IR is loaded, the hardware-development system sets CNTL(1-0) to 01 (Step mode).

3. Step the processor by a transition of CNTL(1-0) from 01 to 11 and back to 01. Data may be supplied on the Data Bus during one of the steps to satisfy a load operation; the data must be held valid until the stepped instruction completes.

4. Repeat steps 2 and 3 as desired. Finally, perform steps 2 and 3 using an IRET instruction.

5. After the final step, enter the Halt mode by placing 10, instead of 01, on CNTL(1-0).

6. Resume normal execution by placing 11 on CNTL(1-0).
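
The six steps above can be summarized as the sequence of values a hardware-development system drives on CNTL(1-0). The following C fragment is a minimal sketch, not a working debugger. It records each CNTL(1-0) value in an array instead of driving real pins, and the names `drive_cntl`, `run_debug_session`, and `cntl_trace` are hypothetical. It assumes that 00 on CNTL(1-0) requests the Load Test Instruction mode, and it enters the Halt mode through CNTL(1-0) rather than through a HALT instruction or a breakpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CNTL(1-0) encodings; 00 = Load Test Instruction is an assumption. */
enum cntl_mode {
    CNTL_LOAD_TEST = 0x0,
    CNTL_STEP      = 0x1,
    CNTL_HALT      = 0x2,
    CNTL_NORMAL    = 0x3
};

/* Hypothetical trace of CNTL(1-0) values; real hardware would write a port. */
#define TRACE_MAX 64
static uint8_t cntl_trace[TRACE_MAX];
static size_t cntl_len;

static void drive_cntl(enum cntl_mode m) {
    assert(cntl_len < TRACE_MAX);
    cntl_trace[cntl_len++] = (uint8_t)m;
}

/* Steps 1-6 with n_inspect inspection instructions, then a final IRET. */
static void run_debug_session(size_t n_inspect) {
    drive_cntl(CNTL_HALT);                  /* step 1: halt via CNTL(1-0) = 10 */
    for (size_t i = 0; i <= n_inspect; i++) {
        int is_iret = (i == n_inspect);     /* the last pass loads IRET (step 4) */
        drive_cntl(CNTL_LOAD_TEST);         /* step 2: request Load Test Instruction mode */
        /* ...wait for STAT(2-0) = 010, then drive the instruction on the Instruction Bus... */
        drive_cntl(CNTL_STEP);              /* step 2: IR loaded, select Step mode (01) */
        drive_cntl(CNTL_NORMAL);            /* step 3: 01 -> 11 ... */
        /* ...supply load data on the Data Bus here if the instruction needs it... */
        drive_cntl(is_iret ? CNTL_HALT      /* step 5: after the IRET step, 10 instead of 01 */
                           : CNTL_STEP);    /* step 3: ... -> 01 */
    }
    drive_cntl(CNTL_NORMAL);                /* step 6: resume normal execution with 11 */
}
```

For one inspection instruction followed by IRET, the recorded trace is 10, 00, 01, 11, 01, 00, 01, 11, 10, 11.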

#### 5.3.4 Hardware Testing

The Test mode in the Am29050 microprocessor allows processor outputs to be driven directly for testing or diagnostic purposes. The Test mode places all processor outputs (except MSERR) into the high-impedance state, so that they do not interfere electrically with externally supplied signals. In all other respects, processor operation is unchanged.

The Test mode is invoked by an active level on the $\overline{\text{TEST}}$ input, regardless of the processor's operational mode (for example, the Test mode is not affected by the Halt mode). The disabling of processor outputs is performed combinatorially, and is asynchronous to SYSCLK.

For some outputs, the transition to the high-impedance state that results from the Test mode may occur at a much slower rate than applies during normal system operation (for example, when the processor relinquishes the channel to another master). For this reason, the Test mode may not be appropriate for special user-defined purposes.

Note that SYSCLK is also placed in the high-impedance state by the Test mode. This allows the testing of external clock-distribution circuits, but care must be taken to ensure that a high-impedance SYSCLK output does not have an adverse effect on the system. Furthermore, if SYSCLK is disabled, and a signal is not externally supplied, processor state may be lost.

### 5.4 EXTERNAL INTERRUPTS AND TRAPS

An external device causes an interrupt by asserting one of the $\overline{\text{INTR}(3-0)}$ inputs, and causes a trap by asserting one of the $\overline{\text{TRAP}(1-0)}$ inputs. Transitions on each of these inputs may be asynchronous to the processor clock; they are protected against metastable states. For this reason, an assertion of one of these inputs that meets the proper set-up-time criteria does not cause the corresponding interrupt or trap until the second following cycle.

The $\overline{\text{INTR}(3-0)}$ inputs are prioritized with respect to each other and with respect to the processor. To resolve conflicts between these inputs, the inputs are prioritized in order, so that the interrupt caused by $\overline{\text{INTR}0}$ has the highest priority, and the interrupt caused by $\overline{\text{INTR}3}$ has the lowest priority.

The interrupts caused by $\overline{\text{INTR}(3-0)}$ may be masked by the Disable Interrupts (DI) or Disable All Interrupts and Traps (DA) bits of the Current Processor Status Register. In addition, the Interrupt Mask (IM) field of the Current Processor Status Register sets the priority of the processor with respect to these inputs. The IM field enables the $\overline{\text{INTR}(3-0)}$ inputs as follows:

<table>
<thead>
<tr>
<th>IM Value</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>$\overline{\text{INTR}0}$ enabled</td>
</tr>
<tr>
<td>01</td>
<td>$\overline{\text{INTR}(1-0)}$ enabled</td>
</tr>
<tr>
<td>10</td>
<td>$\overline{\text{INTR}(2-0)}$ enabled</td>
</tr>
<tr>
<td>11</td>
<td>$\overline{\text{INTR}(3-0)}$ enabled</td>
</tr>
</tbody>
</table>

Note that the interrupt caused by the $\overline{\text{INTR}0}$ input cannot be disabled by the IM field.

If one of the $\overline{\text{INTR}(3-0)}$ inputs is active, and the resulting interrupt is disabled by the DA bit, DI bit or IM field, the Interrupt Pending (IP) bit of the Current Processor Status Register is set. The IP bit is reset if the interrupt is enabled, or if all disabled external interrupts are de-asserted.
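
The masking rules for INTR(3-0) can be expressed compactly in C. This is a behavioral sketch rather than a description of the processor's circuitry. `struct cps_intr`, `intr_enabled`, `enabled_intr_mask`, and `ip_bit` are hypothetical names, and `ip_bit` adopts one reading of the rule above: IP is set whenever at least one asserted input is held off by DA, DI, or IM.

```c
#include <assert.h>
#include <stdbool.h>

/* Interrupt-related fields of the Current Processor Status Register. */
struct cps_intr {
    bool da;        /* Disable All Interrupts and Traps */
    bool di;        /* Disable Interrupts */
    unsigned im;    /* Interrupt Mask, 0..3 */
};

/* INTRn is enabled when DA and DI are clear and n <= IM.
 * n = 0 always satisfies n <= IM, so IM alone never disables INTR0. */
static bool intr_enabled(const struct cps_intr *cps, unsigned n) {
    assert(n <= 3 && cps->im <= 3);
    return !cps->da && !cps->di && n <= cps->im;
}

/* active: bit n set when INTRn is asserted (bit 0 = INTR0).
 * Returns the asserted inputs that can actually cause an interrupt. */
static unsigned enabled_intr_mask(const struct cps_intr *cps, unsigned active) {
    unsigned m = 0;
    for (unsigned n = 0; n < 4; n++)
        if ((active & (1u << n)) && intr_enabled(cps, n))
            m |= 1u << n;
    return m;
}

/* IP: some asserted input is currently held off by DA, DI, or IM. */
static bool ip_bit(const struct cps_intr *cps, unsigned active) {
    return (active & 0xFu & ~enabled_intr_mask(cps, active)) != 0;
}
```

With IM = 00 and all four inputs asserted, only INTR0 is enabled and IP is set because INTR(3-1) are pending.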
The TRAP(1–0) inputs are prioritized with respect to each other, so that the trap caused by TRAP0 has priority over the trap caused by TRAP1 when a conflict occurs. Both TRAP0 and TRAP1 have priority over the INTR(3–0) inputs. The TRAP(1–0) inputs cannot be disabled selectively. Both traps, however, can be disabled by the DA bit in the Current Processor Status Register.
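
The relative priorities of the trap and interrupt inputs can be sketched as a simple arbiter. The function below is hypothetical. It assumes its caller passes a mask of INTR(3-0) inputs that are both asserted and already enabled by DA, DI, and IM, and it models only the priority order described above, not the input-synchronization delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Sources in decreasing priority: TRAP0, TRAP1, then INTR0..INTR3. */
enum ext_source {
    SRC_NONE = -1,
    SRC_TRAP0, SRC_TRAP1,
    SRC_INTR0, SRC_INTR1, SRC_INTR2, SRC_INTR3
};

/* trap_lines: bit n set if TRAPn is asserted (n = 0..1).
 * enabled_intr: bit n set if INTRn is asserted and enabled.
 * da: Disable All Interrupts and Traps bit. */
static enum ext_source arbitrate(unsigned trap_lines, unsigned enabled_intr, bool da) {
    assert(trap_lines <= 0x3u && enabled_intr <= 0xFu);
    if (da)
        return SRC_NONE;                     /* DA disables both traps and interrupts */
    if (trap_lines & 0x1u)
        return SRC_TRAP0;                    /* TRAP0 has priority over TRAP1 */
    if (trap_lines & 0x2u)
        return SRC_TRAP1;
    for (int n = 0; n < 4; n++)              /* traps first, then lowest-numbered INTR */
        if (enabled_intr & (1u << n))
            return (enum ext_source)(SRC_INTR0 + n);
    return SRC_NONE;
}
```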

The INTR(3–0) and TRAP(1–0) inputs are level-sensitive. Once asserted, they must be held active until the corresponding interrupt or trap is acknowledged by the interrupt or trap handler (this acknowledgment is system-dependent, since there is no interrupt-acknowledge mechanism defined for the processor).

If any of these inputs is asserted, then de-asserted before it is acknowledged, it is not possible to predict (unless the interrupt or trap is masked) whether or not the processor has taken the corresponding interrupt or trap. During interrupt and trap processing, the vector number is determined in part by which of the INTR(3–0) and TRAP(1–0) inputs is active. If the input causing an interrupt or trap is de-asserted before the vector number is determined, the vector number is unpredictable, with the result that processor operation is also unpredictable.

There is a three-cycle latency between the de-assertion of an INTR(3–0) or TRAP(1–0) input and the point at which the processor stops recognizing the corresponding interrupt or trap. The de-assertion must therefore be timed so that the processor no longer recognizes the interrupt or trap by the time the corresponding mask is reset (for example, when the handler re-enables interrupts). Otherwise, a spurious interrupt or trap may occur.

### 5.5 PROCESSOR RESET

When power is first applied to the processor, it is in an indeterminate state, and must be placed in a known state. Also, under certain circumstances, it may be necessary to place the processor in a defined state. This is accomplished by the Reset mode, which places the processor into a pre-defined state (see Section 3.9).

The Reset mode is invoked by asserting the RESET input, and can be entered only if the SYSCLK pin is operating normally, whether or not the SYSCLK pin is being driven by the processor (see Section 5.7). The Reset mode is entered within four processor cycles after RESET is asserted. The RESET input must be asserted for at least four processor cycles to accomplish a processor reset.

The Reset mode can be entered from any other processor mode (e.g., the Reset mode can be entered from the Halt mode). If the RESET input is asserted at the time that power is first applied to the processor, the processor enters the Reset mode only after four cycles have occurred on the SYSCLK pin.

The Reset mode is exited when the RESET input is de-asserted. Either three or four cycles after RESET is de-asserted (depending on internal synchronization time), the processor performs an initial instruction access on the channel. The initial instruction access is directed to address 0 in the instruction read-only memory (Instruction ROM). If Instruction ROM is not implemented in a particular system, another device or memory must respond to this instruction fetch.

If the CNTL(1–0) inputs are 10 or 01 when RESET is de-asserted, the processor enters the Halt or Step mode, respectively. If the processor enters the Halt mode immediately after reset, the protection checking that normally applies to the Halt instruction is disabled, so that the Halt instruction can be used as an instruction breakpoint in a User-mode program. The Load Test Instruction mode cannot be directly entered from the Reset mode. If the CNTL(1–0) inputs are 00 immediately after RESET is de-asserted, the effect on processor operation is unpredictable. If the CNTL(1–0) inputs are 11, the processor enters the Executing mode.
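
The mapping from the CNTL(1-0) value at the de-assertion of RESET to the next operating mode can be written as a small lookup. The enum and function names below are hypothetical.

```c
#include <assert.h>

enum post_reset_mode {
    POST_RESET_UNPREDICTABLE,  /* CNTL(1-0) = 00 */
    POST_RESET_STEP,           /* CNTL(1-0) = 01 */
    POST_RESET_HALT,           /* CNTL(1-0) = 10 */
    POST_RESET_EXECUTING       /* CNTL(1-0) = 11 */
};

/* Mode entered according to CNTL(1-0) when RESET is de-asserted. */
static enum post_reset_mode mode_after_reset(unsigned cntl) {
    assert(cntl <= 0x3u);
    switch (cntl) {
    case 0x1u: return POST_RESET_STEP;
    case 0x2u: return POST_RESET_HALT;       /* Halt-instruction protection check disabled */
    case 0x3u: return POST_RESET_EXECUTING;
    default:   return POST_RESET_UNPREDICTABLE; /* Load Test Instruction cannot follow Reset */
    }
}
```

Section 5.6 gives the same table for the point at which the WARN trap-handler instruction fetch completes, except that 11 leaves the processor in the Executing mode.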

The processor samples the STAT0 output internally when RESET is asserted. A High level on STAT0 at this time enables a special test configuration that renders the processor inoperable. To disable this test configuration, the processor drives STAT0 Low when RESET is asserted. However, if processor outputs are disabled by the Test mode, the processor cannot drive STAT0. Thus, if RESET is asserted while the processor is in the Test mode, the STAT0 pin must be driven Low externally. (In a master/slave configuration, as described in Section 5.8, STAT0 is driven Low by the master processor when RESET is asserted.)

### 5.6 WARN INPUT

An inactive-to-active transition on the WARN input causes a WARN trap to be taken by the processor. The WARN trap cannot be disabled; the processor responds to the WARN input regardless of its internal condition, unless the RESET input also is asserted. This input is provided so that the system can gain control of the processor in extreme situations, such as when system power is about to be removed or when a severe non-recoverable error occurs.

The WARN input is edge-sensitive, so that an active level on the WARN input for long intervals does not cause the processor to take multiple WARN traps. However, WARN must be held active for at least four cycles to be properly recognized by the processor. The processor still takes the WARN trap if WARN is de-asserted after four cycles. Another WARN trap occurs if WARN makes another inactive-to-active transition.
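
The edge-sensitive behavior of WARN can be illustrated with a per-cycle model. This sketch assumes WARN is sampled once per processor cycle as a logical asserted/not-asserted value, that the line was inactive before the first sample, and that pulses shorter than four cycles are not counted because their recognition is not guaranteed. `count_guaranteed_warn_traps` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* active[i] is true if WARN is asserted in cycle i. Returns the number of
 * WARN traps guaranteed by the four-cycle minimum: one per assertion,
 * however long the line then stays active. */
static unsigned count_guaranteed_warn_traps(const bool *active, size_t n) {
    assert(active != NULL || n == 0);
    unsigned traps = 0;
    size_t run = 0;                       /* consecutive asserted cycles so far */
    for (size_t i = 0; i < n; i++) {
        run = active[i] ? run + 1 : 0;    /* de-assertion re-arms edge detection */
        if (run == 4)                     /* counted once, on the fourth cycle */
            traps++;
    }
    return traps;
}
```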

The processor enters the Executing mode when the WARN input is asserted, regardless of its previous operational mode. Either seven or eight cycles after WARN is asserted (depending on internal synchronization time), the processor performs a trap-handler instruction access on the channel. This instruction access is directed to address 16 in the instruction read-only memory (Instruction ROM). If Instruction ROM is not implemented in a particular system, another device or memory must respond to this instruction fetch.

If the CNTL(1–0) inputs are 10 or 01 when the trap-handler instruction fetch completes, the processor enters the Halt or Step mode, respectively. Before the completion of this instruction fetch, the CNTL(1–0) inputs are irrelevant, except that the Load Test Instruction mode cannot be entered directly after a WARN trap is taken. If the CNTL(1–0) inputs are 00 immediately after WARN is de-asserted, the effect on processor operation is unpredictable. If the CNTL(1–0) inputs are 11, the processor remains in the Executing mode.

### 5.7 CLOCKS

The Am29050 microprocessor supports two methods of system-clock generation and distribution. In one arrangement, the processor generates a clock for the system at its operating frequency; this clock appears on the SYSCLK pin, and may be distributed externally to other system components. In the second arrangement, the system provides its own clock generation and distribution; in this case, the processor receives the externally generated clock on the SYSCLK pin.
In both arrangements, the circuits that generate and buffer SYSCLK are designed to minimize the apparent skew between internal processor clocks and external system clocks.

The processor provides a power-supply pin named PWRCLK for the SYSCLK driver that is independent of all other chip power distribution. The separate PWRCLK supply electrically isolates other processor circuits from noise that the SYSCLK driver might induce on the power supply. The PWRCLK pin also selects between the two possible clocking arrangements.

#### 5.7.1 Processor-Generated Clock

If power (i.e., +5 volts) is applied to the PWRCLK pin, the processor is configured to generate clocks for the system. In this case, the SYSCLK pin is an output, and the signal on INCLK is used to generate the system clock. The processor divides the INCLK signal by two in the generation of SYSCLK, so INCLK should be driven at twice the processor's operating frequency.

#### 5.7.2 System-Generated Clock

If the PWRCLK pin is grounded, the processor is configured to receive an externally generated clock. In this case, the SYSCLK pin is an input used directly as the processor clock. SYSCLK should be driven at the processor's operating frequency. In this configuration, the INCLK input should be tied High or Low, except in certain master/slave configurations as discussed in Section 5.8.
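
The two clocking arrangements reduce to a few rules that a board-design helper could capture. `struct clock_plan` and `plan_clocks` are hypothetical. A value of 0 for `inclk_hz` stands for tying INCLK High or Low, which does not cover the master/slave exception noted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct clock_plan {
    bool     sysclk_is_output;  /* true: processor drives SYSCLK */
    uint32_t inclk_hz;          /* required INCLK frequency; 0 = tie High or Low */
    uint32_t sysclk_hz;         /* SYSCLK runs at the operating frequency */
};

/* pwrclk_powered: +5 V on PWRCLK (true) or PWRCLK grounded (false). */
static struct clock_plan plan_clocks(bool pwrclk_powered, uint32_t f_cpu_hz) {
    assert(f_cpu_hz <= UINT32_MAX / 2u);  /* keep 2 * f representable */
    struct clock_plan p;
    p.sysclk_is_output = pwrclk_powered;
    p.sysclk_hz = f_cpu_hz;
    /* Processor-generated clock: SYSCLK = INCLK / 2, so INCLK runs at twice f. */
    p.inclk_hz = pwrclk_powered ? 2u * f_cpu_hz : 0u;
    return p;
}
```

For a processor operated at 25 MHz (an arbitrary example frequency) with PWRCLK powered, the helper reports an INCLK frequency of 50 MHz.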

#### 5.7.3 Clock Synchronization

The SYSCLK pin is at a High level during the first half of the processor cycle, and at a Low level during the second half of the processor cycle. Thus, a processor cycle begins on a Low-to-High transition of SYSCLK. The definition of the beginning of the processor cycle is independent of the clocking arrangement chosen for a particular system.

In some systems, it might be desirable to have two or more processors operate in lock-step synchronization, with each processor driven by a common INCLK signal. In this case, synchronization of the processors is achieved by the RESET input. If the de-assertion of RESET meets a specified set-up time with respect to the High-to-Low transition of INCLK, the SYSCLK output is guaranteed to be Low after the second following rising edge of INCLK. Thus, all processors may be synchronized as required.

#### 5.7.4 Electrical Specifications

The electrical specifications for SYSCLK are different from the specifications for most other processor inputs and outputs. In order to reduce clock-skew effects, the SYSCLK pin is electrically compatible with the processor's CMOS circuits, rather than being compatible with transistor-transistor-logic (TTL) circuits.

Note that the SYSCLK pin is placed in the high-impedance state by the Test mode. If an externally generated clock is not supplied in this case, processor state may be lost.

### 5.8 MASTER/SLAVE CHECKING

Each Am29050 microprocessor output has associated logic which compares the signal on the output with the signal that the processor is providing internally to the output driver. The comparison between the two signals is made any time a given driver is enabled, and any time the driver is disabled only because of the Test mode. If, when the comparison is made, the output of a driver does not agree with its input, the processor asserts the MSERR output on the second following cycle.

When the processor asserts MSERR, it takes no other actions with respect to the detected miscomparison. In particular, no traps occur. However, MSERR may be used externally to perform any system function, including the generation of a trap.
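
The checking rule above can be modeled for a single output over a run of cycles. In this sketch, `struct out_sample` and `compute_mserr` are hypothetical, MSERR is represented as a logical "asserted" flag rather than a pin level, and miscompares in the last two cycles of the trace are dropped because their MSERR cycle falls outside the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One output observed during one processor cycle. */
struct out_sample {
    bool driver_enabled;  /* output driver enabled */
    bool test_disabled;   /* driver disabled only because of the Test mode */
    bool internal;        /* value the processor presents to the driver */
    bool pin;             /* value actually present on the pin */
};

/* Fills mserr[0..n-1]; mserr[i] is true if MSERR is asserted in cycle i. */
static void compute_mserr(const struct out_sample *s, size_t n, bool *mserr) {
    assert(n == 0 || (s != NULL && mserr != NULL));
    for (size_t i = 0; i < n; i++)
        mserr[i] = false;
    for (size_t i = 0; i < n; i++) {
        bool compared = s[i].driver_enabled || s[i].test_disabled;
        if (compared && s[i].internal != s[i].pin && i + 2 < n)
            mserr[i + 2] = true;   /* asserted on the second following cycle */
    }
}
```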

#### 5.8.1 Master/Slave Operation

If there is a single processor in the system, the MSERR output indicates that a processor driver is faulty, or that there is a short-circuit in a processor output. However, a much higher level of fault detection is possible if a second processor (called a slave) is connected in parallel with the first (called a master), where the slave processor has outputs disabled by the Test mode.

The slave processor, by comparing its outputs to the outputs of the master processor, performs a comprehensive check of the operation of the master processor. In addition, if the slave processor is connected at the proper position on the channel, it may detect open circuits and other faults in the electrical path between the master processor and its local devices and memories. Note that the master processor still performs the comparison on its outputs in this configuration.

#### 5.8.2 Preventing Spurious Errors

When two processors are connected in a master/slave configuration, it is necessary to prevent spurious assertions of MSERR. These result from situations where the outputs of the slave processor do not agree with the outputs of the master processor, but both processors are operating correctly.

There are several potential sources of spurious errors in a master/slave configuration that are avoided by the Am29050 microprocessor design:

1. Unimplemented bits in processor registers that are reflected on processor outputs. This is avoided in the Am29050 microprocessor design by having all unimplemented bits be read as 0.

2. Unpredictable values for channel signals. If a DERR or IERR response is asserted in response to an access, the Data Bus or Instruction Bus may be at an indeterminate level (e.g., high-impedance), causing the master and slave processors to detect different values. If these values are later reflected on processor outputs, a spurious MSERR assertion may occur. The Am29050 microprocessor avoids this problem by ignoring the instruction or data word returned with DERR or IERR.

3. Unpredictable power-up state that is reflected on processor outputs. The Am29050 microprocessor avoids this problem upon reset by forcing to a known value any state that might be reflected on outputs before the completion of initialization.

Another source of spurious errors is a lack of synchronization between the master and slave processors. To maintain synchronization between the master and slave processors, it is first necessary that they operate with identical clocks. This is accomplished by having the master processor drive SYSCLK, with the slave processor receiving SYSCLK as an input, or by driving both processors' SYSCLK inputs with the same externally generated clock.

However, the fact that both processors operate with the same clock is not sufficient to guarantee synchronization. Asynchronous processor inputs, if they are truly asynchronous to the operation of the master and slave processors, may affect the master processor a cycle sooner or later than they affect the slave processor. For this reason, the relevant asynchronous inputs (i.e., WARN, INTR(3-0), TRAP(1-0), CNTL(1-0) and RESET) must be externally synchronized to both the master and slave processors. Note that in the case of RESET, only the active-to-inactive transition must be synchronized.

#### 5.8.3 Switching Master and Slave Processors

In some master/slave configurations, it might be desirable to give the slave processor control over the system when an error is isolated to the master processor. It is possible to grant control of the system to the slave processor by taking it out of the Test mode, and placing the master processor into the Test mode. Note that synchronization must be maintained when this is accomplished (e.g., using the Halt mode).

If the original master processor is configured to generate SYSCLK in this case, the slave processor also must generate SYSCLK when it becomes a master. Because of this, the INCLK signal must be supplied to both the master and slave processors, with both processors being configured to generate clocks.

In this master/slave configuration, the slave processor still receives SYSCLK from the master processor as described previously. The slave processor does not drive SYSCLK because of the Test mode. However, when the slave processor is taken out of the Test mode, it is able to drive SYSCLK as required.

Note that this processor-switching scheme may be generalized to more than two processors.
A coprocessor for the Am29050 microprocessor is an off-chip extension of the processor's execution unit. The Am29050 microprocessor communicates with the coprocessor using a mechanism that is very similar to the mechanism used to communicate with other external devices and memories. However, because the coprocessor extends the instruction-execution capabilities of the processor, transfers to and from the coprocessor are in terms of operands, operation codes, results, and status information. This is in contrast to address and data transfers that occur for other types of external accesses. This chapter describes the coprocessor interface, both from a software and a hardware point of view.

### 6.1 COPROCESSOR PROGRAMMING

#### 6.1.1 Overview of Coprocessor Operations

A program executes the following steps to perform a coprocessor operation. This sequence is intended only as a guide, since there are many possible variations:

1. Send operands to the coprocessor. The number of transfers to the coprocessor depends on the number of operands, and the length of each operand. As many as 64 bits of information can be transferred in a single cycle.

2. Send an operation code and other operation information to the coprocessor. The operation can be specified by as many as 64 bits of information.

3. Start the coprocessor operation. This can occur simultaneously with the operation-code transfer of step 2.

4. Read the coprocessor results. The number of transfers from the coprocessor depends on the number of results, and the length of each result.

The above sequence is defined so that coprocessor operations may be concurrent with other processor operations, including external accesses. This is possible because coprocessor operations are decoupled from the transfer of information to and from the coprocessor. Once the operation is started, in step 3, the processor may continue further execution, overlapped with coprocessor execution, until the coprocessor results are read.
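
The four-step sequence can be illustrated with a toy software coprocessor standing in for the external device. Everything here is hypothetical: the OPT encodings, the operation codes, and `cp_store`/`cp_load`, which mimic coprocessor store and load instructions. The toy completes each operation immediately, so it does not model overlap or the Pipeline Hold mode. The comment in `cp_compute` only marks where overlapped instructions would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { OPT_OPERANDS = 0, OPT_OPCODE = 1, OPT_RESULT = 2 };  /* hypothetical OPT values */
enum { OP_ADD = 1, OP_MUL = 2 };                             /* hypothetical opcodes */

/* Toy coprocessor state. */
struct toy_cp { uint32_t a, b, opcode, result; };

/* Models a coprocessor store: up to 64 bits (ra and rb) in one transfer. */
static void cp_store(struct toy_cp *cp, unsigned opt, bool tc, uint32_t ra, uint32_t rb) {
    assert(opt <= 7u);
    if (opt == OPT_OPERANDS) { cp->a = ra; cp->b = rb; }
    else if (opt == OPT_OPCODE) { cp->opcode = ra; }
    if (tc)   /* TC = 1 on a store starts the operation, by convention */
        cp->result = (cp->opcode == OP_MUL) ? cp->a * cp->b : cp->a + cp->b;
}

/* Models a coprocessor load: 32 bits into the RA register. */
static uint32_t cp_load(const struct toy_cp *cp, unsigned opt) {
    assert(opt == OPT_RESULT);
    return cp->result;
}

static uint32_t cp_compute(struct toy_cp *cp, uint32_t op, uint32_t x, uint32_t y) {
    cp_store(cp, OPT_OPERANDS, false, x, y);  /* step 1: both operands in one store */
    cp_store(cp, OPT_OPCODE, true, op, 0);    /* steps 2 and 3: opcode and start */
    /* independent instructions could execute here, overlapped with the coprocessor */
    return cp_load(cp, OPT_RESULT);           /* step 4: read the result */
}
```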

Because the Am29050 microprocessor implements overlapped loads, it can continue execution after attempting to read a coprocessor result. However, if the processor attempts to use the result before the operation is complete, the processor enters the Pipeline Hold mode until the operation is complete.

In certain circumstances, it may be desired to perform multiple coprocessor operations before any results are read. For example, certain array computations form a single result from more than one operation. In this case, steps 1 through 3 above may be repeated—in any combination desired and as many times as desired—before results are read. The coprocessor interface allows the coprocessor to prevent the transfer of operands and/or operation codes if it is not prepared to receive them.

#### 6.1.2 Coprocessor Transfers

All coprocessor transfers occur between general-purpose registers and the coprocessor. The transfers occur as the result of the execution of load and store instructions for which the Coprocessor Enable (CE) bit has a value 1. For a store, the information transferred to the coprocessor is given either by the contents of two general-purpose registers, or by the contents of a general-purpose register and an 8-bit constant. For a load, information is transferred into a single general-purpose register in the Am29050 microprocessor.

The coprocessor model includes no provision for addressing. Although it is possible to extend the coprocessor interface to include addressing, addressing is more appropriately handled by normal external accesses defined for the processor (such as input/output).

The format of the instructions that transfer information to and from a coprocessor is shown in Figure 6-1.

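**Figure 6-1 Coprocessor Load/Store Format**

<table>
<thead>
<tr>
<th>Bits</th>
<th>Field</th>
</tr>
</thead>
<tbody>
<tr>
<td>31–25</td>
<td>Operation code (varies by instruction)</td>
</tr>
<tr>
<td>24</td>
<td>M</td>
</tr>
<tr>
<td>23</td>
<td>CE (1 for coprocessor transfers)</td>
</tr>
<tr>
<td>22</td>
<td>TC</td>
</tr>
<tr>
<td>21</td>
<td>SA</td>
</tr>
<tr>
<td>20</td>
<td>reserved</td>
</tr>
<tr>
<td>19</td>
<td>UA</td>
</tr>
<tr>
<td>18–16</td>
<td>OPT</td>
</tr>
<tr>
<td>15–8</td>
<td>RA</td>
</tr>
<tr>
<td>7–0</td>
<td>RB or I</td>
</tr>
</tbody>
</table>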

For coprocessor stores, the RA and RB or I fields specify the source of data to be transferred to the coprocessor. The RA field specifies a general-purpose register whose contents are transferred to the coprocessor. The RB or I field specifies either a general-purpose register whose contents are transferred to the coprocessor, or a zero-extended constant that is transferred to the coprocessor. As with most instructions, the M bit of the operation code (bit 24) selects whether the register or the constant is used. Note that as many as 64 bits of information may be transferred to the coprocessor by a single store instruction.

For coprocessor loads, the data transferred from the coprocessor is written to the general-purpose register given by RA; the RB or I field is unused in this case (however, the contents of the specified register, or the zero-extended constant, appears on the Address Bus). In contrast to the coprocessor store, a load transfers only 32 bits of information from the coprocessor.

Other bits in the coprocessor load and store instructions are defined as follows:

**Bit 22: Transfer Control (TC)**—This bit affects the behavior of the coprocessor for the transfer, depending on whether the transfer is for a load or store. The definition of this bit is by convention only, and is not enforced by the processor.

For transfers to the coprocessor (i.e., stores), a value of 1 for the TC bit causes a coprocessor operation to start. For transfers from the coprocessor (i.e., loads) a value of 1 for the TC bit causes the coprocessor to suppress exception reporting. In either case, a value of 0 for the TC bit has no special effect on the coprocessor.

**Bit 21: Set Coprocessor Active (SA)**—This bit is provided to signal the beginning and end of a coprocessor operation, so that the proper action may be taken by software if the operation is interrupted.

An SA bit of 1 affects the Coprocessor Active (CA) bit in the Current Processor Status. If the SA bit is 1 for a store, the CA bit is set. If the SA bit is 1 for a load, the CA bit is reset. If the SA bit is 0, there is no effect on the CA bit.

**Bit 20: reserved**

**Bit 19: User Access (UA)**—The UA bit allows programs executing in the Supervisor mode to emulate User-mode coprocessor transfers. This allows checking of the authorization of a transfer requested by a User-mode program. Note that this checking is performed externally, since the processor imposes no restriction on User-mode coprocessor transfers.

If the UA bit is 1, the coprocessor transfer is performed in the User mode, regardless of the value of the Supervisor Mode (SM) bit in the Current Processor Status. In this case, the User mode affects only the SUP/US output; it has no effect on the registers that can be accessed by the instruction. If the UA bit is 0, the program mode for the transfer is controlled by the SM bit.

**Bits 18–16: Option (OPT)**—The OPT field is placed on the OPT(2–0) outputs during the coprocessor transfer. There is a one-to-one correspondence between the OPT field and the OPT(2–0) outputs; that is, the most-significant OPT bit is placed on OPT2, and so on.

The OPT bits define the quantities being transferred to or from the coprocessor. For example, they can specify whether operands or operation codes are being transferred. The interpretation of the OPT field depends on the definition of a given coprocessor.
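
The field layout described above can be checked with a small encoder. The bit positions for TC, SA, UA, OPT, and M come from the text. Placing CE at bit 23, RA at bits 15-8, and RB or I at bits 7-0 follows the conventional load/store layout and is an assumption here. `struct cp_fields` and `encode_cp_transfer` are hypothetical, and the opcode byte used below is an arbitrary value, not a real Am29050 operation code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a coprocessor load/store instruction word. */
struct cp_fields {
    uint8_t  opcode;   /* bits 31-24; bit 24 is the M bit */
    unsigned tc;       /* bit 22: Transfer Control */
    unsigned sa;       /* bit 21: Set Coprocessor Active */
    unsigned ua;       /* bit 19: User Access */
    unsigned opt;      /* bits 18-16: placed on OPT(2-0) */
    uint8_t  ra;       /* bits 15-8 */
    uint8_t  rb_or_i;  /* bits 7-0: register number or 8-bit constant */
};

static uint32_t encode_cp_transfer(const struct cp_fields *f) {
    assert(f->tc <= 1u && f->sa <= 1u && f->ua <= 1u && f->opt <= 7u);
    return ((uint32_t)f->opcode << 24)
         | (1u << 23)                  /* CE = 1: coprocessor transfer */
         | ((uint32_t)f->tc << 22)
         | ((uint32_t)f->sa << 21)     /* bit 20 is reserved and left 0 */
         | ((uint32_t)f->ua << 19)
         | ((uint32_t)f->opt << 16)
         | ((uint32_t)f->ra << 8)
         | f->rb_or_i;
}
```

With an opcode byte of 0x01, TC = 1, SA = 1, UA = 0, OPT = 5, RA = 0x80, and RB = 0x42, the encoder yields 0x01E58042; setting UA changes this to 0x01ED8042.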

The transfer of data to or from the coprocessor may be caused by any load or store instruction defined for the processor; the operation of coprocessor transfers is very similar to the operation of external accesses.

Coprocessor transfers are overlapped with the execution of instructions that sequentially follow the coprocessor load or store instruction. However, only one load or store may be in progress in any given cycle, whether or not the load or store is directed to a coprocessor. The pipeline interlocks that apply to external accesses also apply to coprocessor transfers, except that coprocessor-transfer interlocks are determined by the time taken by the coprocessor to perform an operation, rather than the time taken to perform an access.

Note that coprocessor transfers may be performed by Load Multiple and Store Multiple instructions. However, register RB has no defined interpretation for a Store Multiple to the coprocessor. For this reason, Store Multiple is defined to transfer multiple, 32-bit quantities to the coprocessor. Similarly, a Load Multiple transfers multiple, 32-bit quantities from the coprocessor. Note, however, that the incrementing address sequence defined for Load Multiple and Store Multiple still appears on the Address Bus for coprocessor transfers.

#### 6.1.3 Coprocessor Exceptions

A Coprocessor Exception trap occurs if the coprocessor reports an exception (using the DERR signal) during a coprocessor transfer. The Coprocessor Exception may occur either for a coprocessor load or store.
In the case of a load that reads a coprocessor result, the Coprocessor Exception can be used to indicate that the result is incorrect because of some exceptional condition. In some cases, the Am29050 microprocessor might be able to correct the results of the operation.

In the case of a store to the coprocessor, the Coprocessor Exception can be used to indicate that the coprocessor cannot accept the transfer because of some exceptional condition. For example, it may indicate an error in a stream of calculations, where intermediate results are not being read. As with a load, the Am29050 microprocessor may be able to correct the exceptional condition.

As noted above, the trap handler that executes as the result of the Coprocessor Exception trap may attempt to correct the exceptional condition. In many cases, the trap handler must be able to read the intermediate results of the operation from the coprocessor, along with other information about the operation. When this information is read, it may be necessary to suppress further exception reporting, so that the trap handler does not create additional Coprocessor Exception traps. For this reason, the TC bit in the coprocessor load or store instruction allows the processor to read coprocessor results while suppressing exception reporting.

Additionally, the TC bit allows a program to read the result of a coprocessor operation regardless of any errors that may have occurred. This provides an optional trapping capability analogous to that provided for certain Am29050 microprocessor arithmetic operations (e.g., Am29050 microprocessor instructions allow an optional trap on arithmetic overflow).
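
The two meanings of the TC bit, and its conventional effect on exception reporting, can be summarized as follows. Because the text defines TC by convention only, this sketch describes a coprocessor that follows that convention. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum xfer { XFER_STORE, XFER_LOAD };
enum tc_action { TC_NONE, TC_START_OPERATION, TC_SUPPRESS_EXCEPTIONS };

/* Conventional meaning of TC for a given transfer direction. */
static enum tc_action tc_meaning(enum xfer kind, bool tc) {
    assert(kind == XFER_STORE || kind == XFER_LOAD);
    if (!tc)
        return TC_NONE;                 /* TC = 0 has no special effect */
    return kind == XFER_STORE ? TC_START_OPERATION : TC_SUPPRESS_EXCEPTIONS;
}

/* Does the coprocessor assert DERR (causing a Coprocessor Exception trap)? */
static bool cp_asserts_derr(bool exception_pending, enum xfer kind, bool tc) {
    return exception_pending && tc_meaning(kind, tc) != TC_SUPPRESS_EXCEPTIONS;
}
```

A trap handler reading intermediate results would therefore use loads with TC = 1, so that a still-pending exception does not trap again.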

#### 6.1.4 Coprocessor as a System Option

When the coprocessor is a system option, coprocessor operations are emulated by processor software whenever the coprocessor is not present.

The coprocessor may be designed as a system option by use of the Coprocessor Present (CP) bit of the Configuration Register. The CP bit is set during system initialization, based on the presence (CP = 1) or absence (CP = 0) of the coprocessor. If the CP bit is 0 when the processor attempts to execute a coprocessor load or store instruction, a Coprocessor Not Present trap occurs.

When a Coprocessor Not Present trap is taken, the Channel Address, Channel Data, and Channel Control registers contain information related to the coprocessor transfer. This information may be used by the trap handler to emulate the operation of the coprocessor.
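
The Coprocessor Not Present path can be sketched as a dispatch on the CP bit. This is a simplified model with hypothetical names: a real trap handler would decode the saved channel information and emulate the coprocessor operation, whereas `cp_not_present_handler` here only records the saved values and counts the emulated transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Channel Address, Channel Data, and Channel Control contents saved on the trap. */
struct channel_regs { uint32_t address, data, control; };

struct cp_system {
    bool cp_present;               /* CP bit of the Configuration Register */
    unsigned hw_transfers;         /* transfers performed on the channel */
    unsigned emulated;             /* transfers handled by the trap handler */
    struct channel_regs last_trap; /* channel information from the last trap */
};

/* Hypothetical handler: a real one would decode the saved information and emulate. */
static void cp_not_present_handler(struct cp_system *s, const struct channel_regs *r) {
    s->last_trap = *r;
    s->emulated++;
}

/* Models one coprocessor load or store instruction. */
static void cp_transfer(struct cp_system *s, uint32_t address, uint32_t data, uint32_t control) {
    assert(s != NULL);
    if (s->cp_present) {
        s->hw_transfers++;                     /* normal coprocessor transfer */
    } else {
        struct channel_regs r = { address, data, control };
        cp_not_present_handler(s, &r);         /* Coprocessor Not Present trap */
    }
}
```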

#### 6.1.5 Interrupted Coprocessor Operations

The Coprocessor Active (CA) bit of the Current Processor Status Register may be used to indicate the duration of a coprocessor operation. The value 1 in the CA bit indicates that the coprocessor has begun an operation that has not completed (i.e., the final results have not been read).

The CA bit is affected by the Set Coprocessor Active (SA) bit in the coprocessor load and store instructions. If the SA bit is 1 for a store, the CA bit is set; if the SA bit is 1 for a load, the CA bit is reset. The routine that accesses the coprocessor is responsible for setting and resetting the CA bit appropriately.

If an interrupt or trap is taken during a coprocessor operation, and the CA bit has been properly managed, the CA bit of the Old Processor Status signals to an interrupt or trap handler that the interrupted routine had begun a coprocessor operation, but had not completed the operation before the interrupt or trap was taken. In this case, the coprocessor contains state information that must be preserved. This information may be saved and restored across the interrupt or trap, or, alternatively, kept in the coprocessor.

Upon an interrupt or trap, the state information contained in the coprocessor depends on both the operation being performed and the definition of the coprocessor. The methods used to determine what state information must be saved, and the methods used to transfer this information, are also dependent on the definition of the coprocessor.

Due to interrupt-latency considerations, it may be desirable to leave state information in the coprocessor upon interrupt, rather than require that it always be saved. A problem arises, however, when a routine other than the one that was originally interrupted attempts to use the coprocessor. The coprocessor may be protected from such use by resetting the CP bit in the Configuration Register. If another routine attempts to use the coprocessor in this case, a Coprocessor Not Present trap occurs. The trap handler for this trap may either save the coprocessor state and make the coprocessor available to the trapping routine, or return control to the routine that was originally using the coprocessor.
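
One possible policy for leaving state in the coprocessor, sketched below, revokes access with the CP bit on each context switch and moves ownership lazily in the Coprocessor Not Present trap handler. The names are hypothetical, the actual save and restore steps are coprocessor-specific and appear only as comments, and the alternative of returning control to the original owner is not shown.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical operating-system bookkeeping for the coprocessor. */
struct cp_owner {
    int owner;        /* task whose state is currently in the coprocessor */
    bool cp_bit;      /* CP bit of the Configuration Register */
    unsigned saves;   /* coprocessor state saves performed */
};

/* Leave the state in place, but set CP = 0 unless the next task owns it. */
static void on_context_switch(struct cp_owner *c, int next_task) {
    assert(next_task != NO_OWNER);
    c->cp_bit = (c->owner == next_task);
}

/* Coprocessor Not Present trap: hand the coprocessor to the trapping task. */
static void on_cp_not_present(struct cp_owner *c, int task) {
    if (c->owner != task) {
        if (c->owner != NO_OWNER)
            c->saves++;   /* save the previous owner's state (coprocessor-specific) */
        /* restore this task's saved state, if it has any */
        c->owner = task;
    }
    c->cp_bit = true;     /* the trapping transfer can now be retried */
}
```

Switching away from the owning task and back again requires no save, which is the latency benefit the paragraph above describes.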

Certain coprocessor operations may not be interruptible. For these operations, interrupts may be disabled by the Disable Interrupts (DI) and/or Disable All Interrupts and Traps (DA) bits in the Current Processor Status Register. However, this disabling can be performed only by a program in the Supervisor mode. Any User-mode programs that perform non-interruptible coprocessor operations incur the overhead of a call to a Supervisor-mode program.

### 6.2 COPROCESSOR ATTACHMENT

Communication with the coprocessor occurs via the Am29050 microprocessor channel. Figure 6-2 illustrates a typical coprocessor connection. For transfers to the coprocessor, 64 bits of data are transferred in a single cycle, using the Address Bus and Data Bus simultaneously. For transfers from the coprocessor, 32 bits of data are transferred in a cycle, using the Data Bus.

The width of transfers to the coprocessor is greater than the width of transfers from the coprocessor because the Am29050 microprocessor is optimized for computations performed on two word-length operands, with a single word-length result. The operand/result data flow of the processor is reflected in the interface to the coprocessor.

The protocol for coprocessor transfers is nearly identical to the protocol for other external accesses on the channel. Minor differences result from the fact that there are no addresses for coprocessor transfers, and from the fact that the coprocessor is operation-oriented, rather than access-oriented.

#### 6.2.1 Signal Description

Coprocessor transfers are indicated on the channel by the DREQT1 output being High during a request. The DREQT0 output also affects the transfer, based on the R/W signal, as follows:

<table>
<thead>
<tr>
<th>R/W</th>
<th>DREQT1</th>
<th>DREQT0</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Transfer to coprocessor</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Transfer to coprocessor, start operation</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>Transfer from coprocessor</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>Transfer from coprocessor, suppress errors</td>
</tr>
</tbody>
</table>

Note that the interpretation of DREQT0 during a coprocessor transfer is by convention only.
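The table can be read as a decoding function, shown here as a C sketch. The enumerator and function names are hypothetical. Signals are passed as the logic levels shown in the table (1 = High), and, as noted above, the meaning of DREQT0 is a convention rather than a hardware requirement.

```c
#include <assert.h>

typedef enum {
    NOT_COPROC,        /* DREQT1 Low: not a coprocessor transfer        */
    TO_COPROC,         /* R/W = 0, DREQT0 = 0                           */
    TO_COPROC_START,   /* R/W = 0, DREQT0 = 1: also start the operation */
    FROM_COPROC,       /* R/W = 1, DREQT0 = 0                           */
    FROM_COPROC_NOERR  /* R/W = 1, DREQT0 = 1: suppress errors          */
} coproc_xfer;

/* Classify a request from the signal levels listed in the table (1 = High). */
coproc_xfer decode_coproc_xfer(unsigned rw, unsigned dreqt1, unsigned dreqt0)
{
    assert(rw <= 1 && dreqt1 <= 1 && dreqt0 <= 1);
    if (!dreqt1)
        return NOT_COPROC;
    if (rw == 0)
        return dreqt0 ? TO_COPROC_START : TO_COPROC;
    return dreqt0 ? FROM_COPROC_NOERR : FROM_COPROC;
}
```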

The only signal unique to coprocessor transfers is the CDA input. The coprocessor de-asserts this signal whenever it cannot accept transfers from the processor (normally because it is performing an operation).

#### 6.2.2 Coprocessor Communication

The completion of a transfer to the coprocessor is indicated when the coprocessor asserts CDA; the DRDY input is not used in this case. Using CDA improves the performance of transfers to the coprocessor: the coprocessor does not have to decode a transfer request and respond with DRDY, so the associated logic delay is eliminated. Note that the coprocessor normally de-asserts CDA when it starts an operation, so that CDA can be independent of transfer requests.

The Address Bus is used to transfer information to the coprocessor. Therefore, the addressing function of other devices and memories on the channel must be disabled during coprocessor transfers. Since DREQT1 is High for all coprocessor transfers, it should be used to inhibit the address-decoding function of channel devices and memories, as well as to indicate to the coprocessor that a transfer is occurring.

The OPT(2–0) outputs are used during coprocessor transfers to indicate the type of transfer, or to provide other controls for the coprocessor. The interpretation of the OPT(2–0) signals depends on the implementation of the coprocessor, and may also depend on the R/W signal.

##### 6.2.2.1 COPROCESSOR TRANSFER PROTOCOLS

The protocols available for coprocessor transfers are based on the protocols for simple, pipelined, and burst-mode data accesses discussed in Section 5.2.6. The protocols for write accesses are used for transfers to the coprocessor, and the protocols for read accesses are used for transfers from the coprocessor.

The protocol for coprocessor transfers differs in several respects from the protocol for external data accesses:

1. The CDA signal consistently replaces DRDY for transfers to the coprocessor. For these transfers, an active level on CDA has the same effect as an active level on DRDY has for normal store operations. Note that DRDY is still used for transfers from the coprocessor.

2. The Address Bus does not contain an address during a coprocessor transfer, but it may contain data in the case of a transfer to the coprocessor. However, the Address Bus is still sequenced as described in Section 5.2, and the sequencing is determined by the same controls, except that CDA replaces DRDY for transfers to the coprocessor. For a transfer from the coprocessor, the contents of the Address Bus are determined by the coprocessor load instruction, as for other load instructions.

3. For any coprocessor transfer, an active level on DERR causes a Coprocessor Exception trap, rather than a Data Access Exception trap.

4. For burst-mode coprocessor transfers, the interpretation of sequential addressing is undefined. For this reason, burst-mode transfers are normally restricted to 32 bits of information for every transfer, regardless of whether the transfer is to or from the coprocessor. Note, however, that the incrementing address sequence is still present in the definition of a burst-mode coprocessor transfer, and may be useful in some cases.
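The first and third differences can be expressed as a pair of C functions. In this sketch the names are hypothetical. Each argument is a flag meaning "active level present" rather than a raw pin level, so signal polarity is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TRAP_NONE,
    TRAP_DATA_ACCESS_EXCEPTION,
    TRAP_COPROCESSOR_EXCEPTION
} trap_kind;

/*
 * Does this cycle complete the access? to_coproc is ignored for
 * ordinary (non-coprocessor) data accesses.
 */
bool access_complete(bool coproc, bool to_coproc, bool cda_active, bool drdy_active)
{
    if (coproc && to_coproc)
        return cda_active;   /* CDA replaces DRDY for transfers to the coprocessor */
    return drdy_active;      /* DRDY: data accesses and transfers from the coprocessor */
}

/* Trap caused by an active level on DERR. */
trap_kind derr_trap(bool coproc, bool derr_active)
{
    if (!derr_active)
        return TRAP_NONE;
    return coproc ? TRAP_COPROCESSOR_EXCEPTION : TRAP_DATA_ACCESS_EXCEPTION;
}
```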

##### 6.2.2.2 SEQUENCING OF CDA

The coprocessor de-asserts CDA whenever it cannot accept a transfer from the Am29050 microprocessor. An inactive level on CDA prevents the Am29050 microprocessor from transferring operands or operation codes to the coprocessor when these transfers might interfere with coprocessor operation.

Normally, the coprocessor de-asserts \( \overline{CDA} \) when it begins an operation. \( \overline{CDA} \) remains inactive until the coprocessor has completed the operation and can accept further transfers from the processor. For some operations, a result may have to be read before the coprocessor can assert \( \overline{CDA} \).

Independent of the presence of the coprocessor, a pull-down resistor in the range of 33 kΩ to 68 kΩ on \( \overline{CDA} \) is necessary for the standard coprocessor detector to function properly.

The coprocessor can acknowledge a transfer by asserting \( \overline{CDA} \). However, it is generally more efficient for the coprocessor to hold \( \overline{CDA} \) active as long as it can accept transfers. In the latter case, multiple data transfers can occur at a high rate, without involving long logic delays. \( \overline{CDA} \) is related to the operation of the coprocessor in this case, rather than to the transfer of data.

**EXCEPTION REPORTING**

The coprocessor reports exceptions by the activation of \( \overline{DERR} \) during any coprocessor transfer. This causes a Coprocessor Exception trap to occur. However, if the \( \overline{DREQT}(1-0) \) signals have the value 11 for a transfer from the coprocessor, exception reporting should be suppressed, and \( \overline{DERR} \) should not be asserted. Note, however, that the Am29050 microprocessor does not enforce the suppression of exception reporting.
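
Because the processor does not enforce suppression, the coprocessor must apply this rule itself. The decision is sketched in C below. The function name and the `pending` flag (an exception waiting to be reported) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Coprocessor-side rule: assert DERR for this transfer?
 *   from_coproc: R/W = 1 (transfer from the coprocessor)
 *   dreqt0:      DREQT0 level; DREQT1 is High for every coprocessor transfer
 *   pending:     the coprocessor has an exception to report
 */
bool coproc_drives_derr(bool from_coproc, bool dreqt0, bool pending)
{
    if (from_coproc && dreqt0)
        return false;   /* DREQT(1-0) = 11 on a read: reporting suppressed */
    return pending;
}
```
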

This chapter discusses programming topics as they relate to the Am29050 microprocessor. It focuses on the use of the processor resources described more formally in Chapter 3. The presentation in this chapter is intended as a guide for implementing software systems for the processor, not necessarily as a strict definition of how these systems should be implemented.

This chapter is organized into four sections. The first section describes the run-time storage organization recommended for the Am29050 microprocessor and the use of the local registers to improve the performance of procedure calls. The two subsequent sections discuss applications and systems programming for the processor. The final section discusses certain features of the Am29050 microprocessor pipeline that are exposed to—and must be properly handled by—software which executes on the processor.

### 7.1 RUN-TIME STORAGE ORGANIZATION AND CALLING CONVENTION

Programming languages that use recursive procedures, such as C and Pascal, generally use a stack to store data objects that are dynamically allocated at run-time. The organization of the run-time storage, including the run-time stack, determines how data objects are stored and how procedures are called at the machine level. The Am29050 microprocessor is designed to minimize the overhead of calling a procedure, and allows efficient passing of parameters to a procedure and returning of results from a procedure. This section describes the Am29050 microprocessor run-time storage organization and procedure-calling conventions.

#### 7.1.1 Run-Time Stack Organization and Use

A run-time stack consists of consecutive overlapping structures called activation records. An activation record contains dynamically allocated information specific to a particular activation (or call) of a procedure (such as local data objects). Because of recursion, multiple copies of a procedure may be active at any given time. Each active procedure has its own unique activation record, allocated somewhere on the run-time stack. The local variables required by a particular procedure activation are contained in the activation record associated with that activation. Thus, the local variables for different activations do not interfere with one another. A compiler generates the instructions that create and manage the run-time stack, and the code it generates relies on the existence of this stack.

As an example, Figure 7-1 shows three activation records on a run-time stack. This stack configuration was generated by procedure A calling procedure B, which in turn called procedure C. The fact that procedure C is the currently active procedure is reflected by its activation record being on the top of the run-time stack. The Stack Pointer points to the top of procedure C's activation record.

In Figure 7-1, the storage areas labeled Out args and In args are the outgoing arguments area (as seen by the caller) and the incoming arguments area (as seen by the callee). Each such area is shared between a caller and its callee for the communication of parameters and results. The areas labeled locals contain storage for local variables, temporary variables (for example, for expression evaluation), and any other items required for the proper execution of the procedure.

##### 7.1.1.1 MANAGEMENT OF THE RUN-TIME STACK

A run-time stack starts at a high address in memory and grows toward lower memory addresses as procedures are called. The bottom of the stack is the location, with a high address, at which the stack starts; the top of the stack is the location, with a lower address, at which the most recent activation record has been allocated.

When a procedure is called, a new activation record may need to be allocated on the run-time stack. An activation record is allocated by subtracting from the stack pointer the number of locations needed by the new activation record. The stack pointer is decremented so that variables referenced during procedure execution are referenced in terms of positive offsets from the stack pointer.

When storage for an activation record is allocated, the number of storage locations allocated is the sum of the number of locations needed for:

1. Local variables;
2. Restarting the caller, such as locations for return addresses; and
3. Arguments of procedures that may be called in turn by the called procedure (the outgoing arguments area).

Note that, in some cases, no storage is required for one or more of the above items. Also, the incoming arguments area, though it is part of the activation record of the callee, is not allocated storage at this time, because this storage was allocated as the outgoing arguments area of the calling procedure.

An activation record is de-allocated, just prior to returning to the caller, by adding to the stack pointer the value that was subtracted during allocation.
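
The allocation arithmetic can be sketched in C. The model is hypothetical. It uses byte addresses with 4-byte words, matching the rsize*4 scaling in the prologue later in this chapter, and a single stack pointer, so it ignores the split into two stacks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: byte addresses, 4-byte words, stack grows downward. */
typedef struct {
    uint32_t sp;
} rt_stack;

/* Words needed by a new activation record. The incoming arguments are not
   counted: the caller already allocated them as its outgoing arguments area. */
uint32_t frame_words(uint32_t locals, uint32_t restart, uint32_t out_args)
{
    return locals + restart + out_args;
}

/* Allocate by subtracting; the frame is then addressed at sp + 4*k, k >= 0. */
uint32_t allocate(rt_stack *s, uint32_t words)
{
    assert(s->sp >= words * 4);   /* the model does not wrap below address 0 */
    s->sp -= words * 4;
    return s->sp;
}

/* De-allocate by adding back exactly what allocate() subtracted. */
void deallocate(rt_stack *s, uint32_t words)
{
    s->sp += words * 4;
}
```
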

The Am29050 microprocessor run-time storage actually is implemented as two stacks: the Register Stack and the Memory Stack. Storage is allocated and de-allocated on these stacks at the same time. The Register Stack stores activation records associated with all active procedures (except leaf routines, as described later). The Memory Stack stores activation-record information that does not fit into the Register Stack or that must be kept in memory for other reasons (e.g., because of pointer de-references). Both the Register Stack and the Memory Stack are stored in the external data memory. However, a portion of the Register Stack is kept in the Am29050 microprocessor local registers for performance. The term stack cache in this section refers to the use of the local registers to contain a portion of the Register Stack.

##### 7.1.1.2 THE REGISTER STACK

The Register Stack contains activation records for active procedures (Figure 7-2). An activation record in the Register Stack stores the following information.

- Input arguments to the called procedure. This portion of the activation record is shared between a caller and the callee. It is allocated by the caller as part of the caller's activation record.

- The caller's frame pointer. This is the address of the lowest-addressed byte above the highest-addressed word of the caller's activation record, and is used to manage the Register Stack. This portion of the activation record is shared between a caller and the callee. It is allocated by the caller as part of the caller's activation record.

- The caller's return address. This is used to resume the execution of the caller after the called procedure terminates. This is also part of the caller's activation record.

- The memory frame pointer. This is the address of the top of the caller's Memory Stack (see below). This address is stored by the callee (if required), and used to restore the memory stack upon return.

- The local variables of the called procedure, if any.

- Outgoing parameters of the called procedure, if any.

- The frame pointer of the called procedure, if the procedure calls another procedure.

- The return address for the called procedure, if the procedure calls another procedure. This location is allocated in the Register Stack, and used when the called procedure calls another procedure.

Figure 7-2  An Activation Record in the Register Stack [labels: incoming arguments, frame pointer, return address, memory frame pointer, local variables, outgoing arguments; LR1 and LR0 of the caller; caller's stack pointer before and after the call; callee's stack pointer during the call]

##### 7.1.1.3 Am29050 MICROPROCESSOR LOCAL REGISTERS AS A STACK CACHE

The Am29050 microprocessor was designed for efficient implementation of the Register Stack. Specifically, the Am29050 microprocessor can use the large number of relatively addressed local registers to cache portions of the Register Stack, yielding a significant gain in performance. Allocation and de-allocation of activation records occur largely within the confines of the high-speed local registers, and most procedure calls occur without external references. Furthermore, during procedure execution most data accesses occur without external references, because activation-record data are referenced most frequently. The principle of locality of reference, which allows any cache to be effective, also applies to the stack cache. The entries in the stack cache are likely to remain there for re-use, because the size of the Register Stack does not change very much over long intervals of program execution. Activation records are typically small, so the 128 locations in the local register file can hold many activation records.

Allocating Register-Stack activation records in the local registers is facilitated by the Stack Pointer in Global Register 1. During the execution of a procedure, the Stack Pointer points simultaneously to the top of the Register Stack in memory and to the local register at the top of the stack cache. In other words, Global Register 1, a word-length register, contains the 32-bit address of the top of the Register Stack, while bits 8–2 of Global Register 1 (with a 1 appended to the most-significant bit) indicate the absolute register number of Local Register 0. Allocation and de-allocation of the Register Stack is accomplished by subtracting from or adding to, respectively, the value of the Stack Pointer.
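The mapping from Global Register 1 to absolute register numbers can be written as a small C function. The function name is hypothetical. This section states the mapping only for Local Register 0. The sketch assumes that lrN adds N to bits 8–2 modulo 128, which is how the circular 128-word local register file wraps around.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Absolute register number (128-255) that local register lr<n> refers to,
 * given the Stack Pointer value held in Global Register 1.
 */
unsigned lr_to_abs(uint32_t gr1, unsigned n)
{
    assert(n < 128);
    unsigned lr0 = (gr1 >> 2) & 0x7Fu;   /* bits 8-2 of gr1            */
    return 0x80u | ((lr0 + n) & 0x7Fu);   /* prepend a 1; wrap mod 128  */
}
```

For example, if gr1 is 0x1000, lr0 is absolute register 128. After a callee subtracts 8 bytes (two words) from gr1, the callee addresses that same absolute register as lr2. This is why caller and callee use different local-register numbers for the same location.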

Using this register-addressing scheme, locations from the Register Stack are automatically mapped into the local register file. Figure 7-3 shows the relationship between the Register Stack and the stack cache in the local registers. As shown, pointers are required to define the boundaries between the Register Stack and the stack cache.

- The register free bound (rfb, gr127) pointer defines the boundary between the portion of the Register Stack that is cached in the local registers and the portion that is stored in the external data memory. The rfb pointer contains the address of the first word in the Register Stack that is not contained in the local registers, but which is in memory.

- The frame pointer (fp, lr1) contains the memory address of the lowest-addressed word not in the current activation record. The current activation record is not necessarily in the data memory: the fp is used to determine whether or not an activation record is contained in the local registers when a procedure returns from a call, as described later.

- The register stack pointer (rsp, gr1) points to the top of the Register Stack, either in the local registers or in the data memory; the rsp is contained in the local-register Stack Pointer (Global Register 1). The top of the Register Stack may or may not be contained in the data memory; the rsp simply defines the location of the top of the Register Stack.

- The register allocate bound (rab, gr126) pointer defines the lowest-addressed stack location that can be cached within the local registers. This defines the limit to which local registers can be allocated in the Register Stack.

Figure 7-3  Relationship of Stack Cache and Register Stack

Several activation records may exist in the Register Stack at any given time, but each local register can hold only one Register Stack location at a time. When the Register Stack grows beyond the 128-word capacity of the local registers, some data must be moved between the stack cache and the Register Stack in data memory.

*Stack overflow* occurs when a procedure is called, but the activation record of the callee requires more registers than can be allocated in the stack cache (this is detected by comparing rsp with rab); Figure 7-4 illustrates stack overflow. In this case, the contents of a number of registers must be moved to data memory. The number of registers involved must be sufficient to allow the entire activation record of the callee to reside in the local registers. A block of the registers is copied, or *spilled*, into an area of external data memory, freeing space in the local register file for the most recent procedure call.

*Stack underflow* occurs when a procedure returns to the caller, but the entire activation record of the caller is not resident in the stack cache (this is detected by comparing fp with rfb); Figure 7-5 illustrates stack underflow. In this case, the non-resident portion of the caller's activation record must be moved from data memory to the local registers. Underflow occurs because overflow occurred at some previous point during program execution, causing part of the Register Stack to be moved to data memory.
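
The two detection conditions can be written as unsigned address comparisons. In this sketch the function names are hypothetical. The text gives only the operands being compared. The direction of each comparison follows from the pointer definitions above: rab is the lowest address that may be cached, and rfb is the first address above the cached portion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Overflow: the new register stack pointer lies below the register
   allocate bound, so the callee's frame does not fit in the stack cache. */
bool stack_overflow(uint32_t rsp, uint32_t rab)
{
    return rsp < rab;
}

/* Underflow: the caller's activation record ends just below fp. If fp is
   above the register free bound, part of that record is still in data
   memory and must be filled back into the local registers. */
bool stack_underflow(uint32_t fp, uint32_t rfb)
{
    return fp > rfb;
}
```
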

Figure 7-4  Stack Overflow

Figure 7-5  Stack Underflow

The processor performs no hardware management of the stack cache, and cannot detect a reference to a quantity that is not in the stack cache. Consequently, software must keep the size of an activation record less than or equal to the size of the local register file (128 words). Any additional storage requirements are satisfied by the Memory Stack.

##### 7.1.1.4 THE MEMORY STACK

In general, the Memory Stack is used to augment the Register Stack, holding additional information associated with activation records. For example, the Memory Stack holds large data structures that cannot fit into the Register Stack. Similar to the Register Stack, the Memory Stack contains a series of (possibly overlapping) activation records, each corresponding to a procedure activation. However, a Memory Stack activation record need not exist for a procedure that does not need a Memory Stack area. The Memory Stack contains the following information:

- Overflow incoming arguments. These are incoming arguments that do not fit in the allowed incoming arguments area of the Register Stack activation record.
- Spilled incoming arguments. These are incoming arguments that cannot be kept in the Register Stack. For example, if the address of an argument is used in a called procedure, the associated value must be in the Memory Stack.
- Any procedure-local variable not allocated to a register.
- Local block space. This storage is allocated dynamically on the Memory Stack. It is used to implement functions such as the `alloca()` function in the C programming language.
- Overflow outgoing arguments. These are outgoing arguments that do not fit in the allowed outgoing arguments area of the Register Stack activation record.

In contrast to the Register Stack, the Memory Stack is not cached and has no fixed size limit. The top of the Memory Stack is defined by the memory stack pointer (`msp`), which is stored in Global Register 125 by convention.

#### 7.1.2 Procedure Linkage Conventions

The procedure linkage conventions define the standard sequences of instructions used to call and return from procedures. These instruction sequences perform the following operations (other, more-general operations may also be required, as described later):

- Place procedure arguments in the outgoing arguments area of the activation record. This may or may not involve copying the arguments; copying is not necessary if the arguments are placed into the appropriate registers as the result of computation.
- Branch to the procedure using a call instruction, which also places the return address in a register.
- Allocate a frame on the Register Stack. A frame is the storage that contains the procedure's activation record.
- If overflow occurs during frame allocation, spill the least-recently used locations of the Register Stack. The number of spilled locations must be sufficient to allow the new frame to reside entirely within the local registers.
- Determine the frame-pointer value of the called procedure, if this procedure may call another procedure.
- Execute the procedure.
- Place return values into the appropriate registers.
- De-allocate the activation-record frame.
- Fill locations of the local registers from the Register Stack in external memory, if underflow occurs.
- Branch to the procedure's return address.

This section describes the routines that implement the Am29050 microprocessor procedure linkage conventions. The operations described here are not required on every procedure call. In some cases, operations can be omitted or simpler routines used; these cases and the accompanying simplifications are also described here.

##### 7.1.2.1 ARGUMENT PASSING

The linkage convention allows up to 16 words of arguments to be passed from the caller to the callee in local registers. These arguments are passed in Local Register 2 through Local Register 17 of the caller (note that the local-register numbers are different for the caller and the callee, because of Stack-Pointer addressing).

When more than 16 words are required to pass arguments, the additional words are passed on the Memory Stack. In this case, the memory stack pointer (in Global Register 125) points to the 17th word of the arguments, and the remaining argument words have higher memory addresses. Multi-word arguments may be split across the Register Stack and the Memory Stack. For example, if a multi-word argument starts on the 16th word of the outgoing arguments, the first word of the argument is passed in the Register Stack, and the remainder of the argument is passed in the Memory Stack.

All arguments occupy at least one word; arguments that are a byte or half-word in length (for example, a character) are padded to 32 bits and passed as a full word. However, an array or structure composed of multiple byte or half-word components is passed as a single, packed array or structure of bytes or half-words, rather than as an array or structure of padded bytes or half-words.

Every argument, including a multi-word argument, is aligned only to a word address boundary. Because some multi-word arguments are referenced as a single object (for example, double-precision floating-point values), it may be necessary to copy such an argument to an aligned memory or register area before use.

##### 7.1.2.2 PROCEDURE PROLOGUE

When a procedure is called, and the procedure may call another procedure, the callee must allocate a frame for itself on the Register Stack (this is not required for leaf procedures that do not call other procedures, as described later). A frame is allocated by decrementing the register stack pointer to accommodate the size of the required activation record. The procedure prologue is the instruction sequence that allocates the callee's Register Stack frame.

To allocate the stack frame, the prologue routine decrements the register stack pointer by the amount rsize (see Figure 7-6). The value of rsize must be an even number given by the following formula:

\[ \text{rsize} \geq (\text{size of local variable area}) + (\text{size of outgoing arguments area}) + 2 \]

The value 2 in this formula accounts for the space required by the return address (in Local Register 0) and the frame pointer (in Local Register 1). The size of the local variable area includes the space for the memory frame pointer, if required. If the formula total is odd, it must be increased by 1 so that the resulting rsize value is even. This aligns the top of the Register Stack on a double-word boundary, which is necessary because double-precision floating-point values must be aligned to registers with even absolute-register numbers. Alignment of double-precision values is accomplished by placing these values into even-numbered local registers and making rsize even (it is also assumed that the register stack pointer is initialized on an even-word boundary).

Note that rsize is not the size of the entire activation record of the callee, because the callee's activation record includes storage that was allocated as part of the caller's activation record frame (e.g., the caller's outgoing arguments area, which is the callee's incoming arguments area). The size of the callee's entire activation record is denoted size, and is given by the following formula:

\[ \text{size} = \text{rsize} + (\text{size of the incoming arguments area}) + 2 \]
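The following C sketch works through the two formulas, in words. The helper names are hypothetical. The even rounding and the two extra words in each formula come directly from the text.

```c
#include <assert.h>

/* rsize: the register stack pointer decrement, in words. locals includes the
   memory frame pointer slot if one is needed; the +2 covers lr0 (return
   address) and lr1 (frame pointer). Rounded up to an even count. */
unsigned rsize_words(unsigned locals, unsigned out_args)
{
    unsigned n = locals + out_args + 2;
    return (n + 1) & ~1u;   /* round up to an even number */
}

/* size: the callee's entire activation record, adding the incoming arguments
   area and the two words (frame pointer and return address) that the caller
   allocated as part of its own activation record. */
unsigned size_words(unsigned rsize, unsigned in_args)
{
    assert(rsize % 2 == 0);
    return rsize + in_args + 2;
}
```

For example, a procedure with 3 words of locals and 4 words of outgoing arguments needs 3 + 4 + 2 = 9 words, rounded up to rsize = 10. With 4 words of incoming arguments, size = 10 + 4 + 2 = 16.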

In the prologue routine, the following instruction is used to allocate the stack frame (rsp = gr1):

```
prologue:
    sub    rsp, rsp, rsize*4
```

However, this instruction does not account for the fact that there may not be enough room in the local registers to contain the activation record. There must be additional instructions to detect stack overflow and to cause spilling if overflow occurs. This is accomplished by comparing the new value of the register stack pointer with the value of the register allocate bound and invoking a trap handler (with vector number V_SPILL) if overflow is detected.

Furthermore, if the procedure calls another procedure, the prologue must compute a frame pointer. Procedures called in turn by the callee use this frame pointer to ensure that the callee's activation record is in the local registers upon return (i.e., that it has not been spilled onto the Register Stack in data memory). The frame pointer is computed in the prologue because it need only be computed once, regardless of how many procedures are called by a given procedure.
The complete procedure prologue is then (fp = lr1):

```
prologue:
    sub    rsp, rsp, rsize*4     ; allocate frame
    asgeu  V_SPILL, rsp, rab     ; call spill handler if needed
    add    fp, rsp, size*4       ; compute frame pointer
```
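The three prologue instructions can be traced in C. This is a minimal sketch, assuming that rsp and rab are plain byte addresses compared as unsigned values (as asgeu does); the `RegStack` struct and the `prologue` function are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the register-stack state (byte addresses). */
typedef struct {
    uint32_t rsp; /* gr1: register stack pointer */
    uint32_t rab; /* gr126: register allocate bound */
} RegStack;

/* Mirrors the prologue: allocate rsize words, check against rab, and
   compute the frame pointer from size. Returns true when the asgeu
   would fail, i.e. when the V_SPILL trap would be taken. */
static bool prologue(RegStack *s, uint32_t rsize, uint32_t size, uint32_t *fp)
{
    s->rsp -= rsize * 4u;             /* sub   rsp, rsp, rsize*4 */
    bool spill = !(s->rsp >= s->rab); /* asgeu V_SPILL, rsp, rab */
    *fp = s->rsp + size * 4u;         /* add   fp, rsp, size*4   */
    return spill;
}
```

With rsp = 0x1000, rab = 0x0E00, rsize = 8 and size = 12 (arbitrary example values), the new rsp is 0x0FE0, no spill is needed, and fp is 0x1010. Starting from rsp = 0x0E10 instead, the same frame would drop below rab and the spill trap would be taken.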

7.1.2.3 SPILL HANDLER

If overflow occurs, the assert instruction in the prologue fails, causing a trap. The trap handler invokes a User-mode routine in the trapping process to spill Register Stack locations from the local registers to external memory. Having most of the spill handling in a User-mode routine minimizes the amount of time that interrupts are disabled, and ensures that spilling is performed using the correct virtual-memory configuration.

The spill handler uses two registers. The first register, Global Register 121, normally contains a trap-handler argument (tav), but is used by the spill handler as a temporary register. The second register, Global Register 122, stores a trap handler return address (tpc). This register is used by the User-mode spill handler to return to the trapping procedure. It is assumed that the address of the User-mode spill handler is contained in a global register, denoted user_spill_reg in the following instruction sequence.

The complete spill handler is:

```
Spill:                              ; operating-system routine
    mfsr   tpc, PC1                 ; save return address
    mtsr   PC1, user_spill_reg      ; branch to User spill via interrupt return
    add    tav, user_spill_reg, 4
    mtsr   PC0, tav
    iret

user_spill:                         ; User-mode spill handler
```

7.1.2.4 RETURN VALUES

If the called procedure returns one or more results, the first 16 words of the result(s) are returned in Global Register 96 through Global Register 111, starting with Global Register 96.

If more than 16 words are required for the results, the additional words are returned in memory locations allocated by the caller. In this case, a large return pointer (lrp) provided by the caller in Global Register 123 at the time of the call points to the 17th word of the results, and subsequent words are stored at higher memory addresses.
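The split between registers and the caller's buffer can be sketched in C. This is an illustrative model only: the array `gr` stands in for Global Registers 96 through 111, the pointer `lrp` stands in for the memory addressed by the large return pointer, and `place_results` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define RET_REGS 16 /* gr96..gr111 */

/* Places the first 16 result words in gr[0..15] (gr96 + i) and any
   remaining words at lrp, starting with the 17th word and continuing at
   higher addresses. Returns the number of words written through lrp. */
static size_t place_results(const uint32_t *result, size_t nwords,
                            uint32_t gr[RET_REGS], uint32_t *lrp)
{
    size_t i;
    for (i = 0; i < nwords && i < RET_REGS; i++)
        gr[i] = result[i];         /* register part of the result */
    size_t extra = 0;
    for (; i < nwords; i++)
        lrp[extra++] = result[i];  /* word 17 lands at lrp[0] */
    return extra;
}
```

A 20-word result therefore fills all 16 return registers and writes four words through lrp; a result of 16 words or fewer never touches lrp.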

7.1.2.5 PROCEDURE EPILOGUE

The procedure epilogue de-allocates the stack frame that was allocated by the procedure prologue, and returns to the calling procedure. Stack de-allocation is accomplished by adding the rsize value back to the register stack pointer, after which the de-allocated registers are no longer used and are considered invalid. The epilogue also detects stack underflow and causes register filling if underflow occurs. This is accomplished by comparing the value of the caller's frame pointer with the register free bound and invoking a trap handler (with vector number V_FILL) if underflow is detected. Finally, the epilogue returns to the caller using the caller's return address.

The complete procedure epilogue is:

```
epilogue:
    add    rsp, rsp, rsize*4     ; add back rsize count
    nop                          ; cannot reference a local register here
    asleu  V_FILL, fp, rfb       ; call fill handler if needed
    jmpi   lr0                   ; jump to return address
```

7.1.2.6 FILL HANDLERS

If underflow occurs, the assert instruction in the epilogue fails, causing a trap. The trap handler invokes a User-mode routine in the trapping process to fill Register Stack locations from the external memory to local registers. The fill handler is similar in organization to the spill handler discussed above.

The complete fill handler is:

```
Fill:                               ; operating-system routine
    mfsr   tpc, PC1                 ; save return address
    mtsr   PC1, user_fill_reg       ; branch to User fill via interrupt return
    add    tav, user_fill_reg, 4
    mtsr   PC0, tav
    iret

user_fill:                          ; User-mode fill handler
    const  tav, 0x80 << 2           ; put starting register number
    or     tav, tav, rfb            ; into Indirect Pointer A
    mtsr   IPA, tav
    sub    tav, lr1, rfb            ; compute number of bytes to fill
    add    rab, rab, tav            ; adjust the allocate bound
    srl    tav, tav, 2              ; change byte count to word count
    sub    tav, tav, 1              ; make count zero-based
    mtsr   CR, tav                  ; set Count Remaining register
    loadm  0, 0, gr0, rfb           ; fill
    jmpi   tpc                      ; return to trapping procedure
    add    rfb, lr1, 0              ; adjust the free bound (delay slot)
```
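The bookkeeping in the fill handler can be followed in C. This sketch assumes byte-address bounds and uses hypothetical names (`Bounds`, `needs_fill`, `first_fill_reg`, `user_fill`); `fp` plays the role of lr1 after the epilogue has restored rsp. Following the listing's comments, the byte count is the distance from rfb up to the frame pointer, both bounds move up by that count (so the distance between them is unchanged), and the count is converted to the zero-based word count that CR expects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two bounds (byte addresses). */
typedef struct { uint32_t rab, rfb; } Bounds;

/* asleu V_FILL, fp, rfb traps unless fp <= rfb (unsigned compare). */
static bool needs_fill(uint32_t fp, uint32_t rfb)
{
    return !(fp <= rfb);
}

/* Register number loaded into IPA: bits 9-2 of (0x80 << 2) | rfb. */
static unsigned first_fill_reg(uint32_t rfb)
{
    return (((0x80u << 2) | rfb) >> 2) & 0xFFu;
}

/* Returns the zero-based word count written to CR before the loadm. */
static uint32_t user_fill(Bounds *b, uint32_t fp)
{
    uint32_t bytes = fp - b->rfb;  /* bytes between rfb and the frame pointer */
    b->rab += bytes;               /* allocate bound moves up by the same amount */
    uint32_t cr = bytes / 4u - 1u; /* word count, made zero-based */
    b->rfb = fp;                   /* free bound becomes the frame pointer */
    return cr;
}
```

For example, with rfb = 0x1000 and a caller frame pointer of 0x1020, eight words are loaded (CR = 7) starting at register 128, and afterwards the epilogue's assert is satisfied.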

7.1.2.7 THE REGISTER STACK LEAF FRAME

A leaf procedure is one that does not call any other procedure. The incoming arguments of a leaf procedure are already allocated in the calling procedure's activation-record frame, and the leaf routine is not required to allocate locations for any outgoing arguments, frame pointer or return address (since it performs no call). Hence, a leaf procedure need not allocate a stack frame in the local registers, and can avoid the overhead of the procedure prologue and epilogue routines. Instead, a leaf routine can use a set of global registers for local variables; Global Register 96 through Global Register 124 are reserved for this purpose (among other purposes). If there is an insufficient number of global registers, the leaf procedure may allocate a frame on the Register Stack.

7.1.2.8 LOCAL VARIABLES AND MEMORY-STACK FRAMES

A called procedure can store its local variables and temporaries in space allocated in the Register Stack frame by the procedure prologue. The values are referenced as an offset from the rsp base address, using the Stack-Pointer addressing of the Am29050 microprocessor local registers. No object in a register is aligned on anything smaller than a register boundary, and all objects take at least one register.

Because there are 128 local registers, the total Register Stack activation-record size may not be greater than 128 words. If the callee needs more space for local variables and temporaries, it must allocate a frame on the Memory Stack to hold these objects. To allocate a Memory-Stack frame, the procedure prologue decrements the memory stack pointer (msp, in gr125). The procedure epilogue de-allocates the Memory-Stack frame by incrementing the msp.

A procedure that extends the Memory Stack dynamically (e.g., using alloca()) must make a copy of the msp at procedure entry, before allocating the Memory-Stack frame. The msp is stored in the memory frame pointer (mfp) entry of the activation record in the Register Stack. The procedure then can change the msp during execution, according to the needs of dynamic allocation. On procedure return, the Memory-Stack frame is de-allocated using the mfp to restore the msp. A procedure that does not extend the Memory Stack dynamically need not have an mfp entry in its activation record.

The following prologue and epilogue routines are used if there is no dynamic allocation of the Memory Stack during procedure execution, but a Memory Stack frame is otherwise required:

```
prologue:
    sub    rsp, rsp, <rsize>*4      ; allocate register frame
    asgeu  V_SPILL, rsp, rab        ; call spill handler if needed
    add    fp, rsp, <size>*4        ; compute register frame pointer
    sub    msp, msp, <msize>*4      ; allocate memory frame,
                                    ; msize = size of memory frame in words

epilogue:
    add    rsp, rsp, <rsize>*4      ; de-allocate register frame
    add    msp, msp, <msize>*4      ; de-allocate memory frame
    jmpi   lr0                      ; return
    asleu  V_FILL, fp, rfb          ; call fill handler if needed
```

The following prologue and epilogue routines are used if there is dynamic allocation of the Memory Stack during procedure execution:

```
prologue:
    sub    rsp, rsp, <rsize>*4      ; allocate register frame
    asgeu  V_SPILL, rsp, rab        ; call spill handler if needed
    add    fp, rsp, <size>*4        ; compute register frame pointer
    add    lr(<rsize>-1), msp, 0    ; save memory frame pointer;
                                    ; lr(rsize-1) is last reg in new frame
    sub    msp, msp, <msize>*4      ; allocate memory frame,
                                    ; msize = size of memory frame in words

epilogue:
    add    msp, lr(<rsize>-1), 0    ; restore memory stack pointer,
                                    ; de-allocating the memory frame
    add    rsp, rsp, <rsize>*4      ; de-allocate register frame
    nop                             ; cannot reference a local register here
    jmpi   lr0                      ; return
    asleu  V_FILL, fp, rfb          ; call fill handler if needed
```

7.1.2.9 STATIC LINK POINTER

Some programming languages (notably Pascal) permit nested procedure declarations, introducing the possibility that a procedure may reference variables and arguments which are defined and managed by another procedure. This other procedure is a static parent of the callee. A static parent is determined by the declarations of procedures in the program source, and is not necessarily the calling procedure; the calling procedure is the dynamic parent. Since procedures can be nested at a number of levels, a given procedure may have a number of hierarchically organized static parents.

A called procedure can locate its dynamic parent, and the variables of the dynamic parent, by using the return address and frame pointer in the Register Stack. However, these are not adequate to locate variables of the static parent which may be referenced in the procedure. If such references appear in a procedure, the procedure must be provided with a static link pointer (slp). In the Am29050 microprocessor run-time organization, the slp is stored in Global Register 124. Since there can be a hierarchy of static parents, the slp points to the slp of the immediate parent, which in turn points to the slp of its immediate parent, and so on. Note that the contents of Global Register 124 may be destroyed by a procedure call, so a procedure needing to reference the variables of a static parent may need to preserve the slp until these references are no longer necessary.
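Walking the chain of static links is a simple pointer-following loop. The C sketch below is illustrative only: the `Frame` layout is hypothetical, with the saved static link placed first so that a pointer to a frame is also a pointer to that frame's slp, matching the "slp points to the slp of the immediate parent" description.

```c
#include <stddef.h>

/* Hypothetical activation-record layout: the saved slp comes first. */
typedef struct Frame {
    struct Frame *static_link; /* slp of this frame: its own static parent */
    int local;                 /* stand-in for the frame's variables */
} Frame;

/* Starting from the slp passed to a procedure (its immediate static
   parent), follow `levels` further links outward. */
static Frame *static_ancestor(Frame *slp, unsigned levels)
{
    Frame *f = slp;
    while (levels-- > 0 && f != NULL)
        f = f->static_link;
    return f;
}
```

A procedure nested two levels inside `outer` would receive an slp pointing at its parent's frame; following one more link reaches `outer`, whose variables it can then address.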

7.1.2.10 FLOATING-POINT ACCUMULATORS

A called procedure that needs to save and restore the floating-point accumulators may do so by treating them as double-precision values, even though they may contain single-precision values. This is accomplished by saving the Floating-Point Environment Register, then forcing the Accumulator Format Field to 10 (double-precision). The accumulators and the Floating-Point Environment Register must be restored before returning to the calling procedure. Floating-point accumulators are not preserved across procedure calls.

7.1.2.11 TRANSPARENT PROCEDURES

A transparent procedure is one that requires very little overhead for managing run-time storage. Transparent procedures are used in the Am29050 microprocessor run-time organization primarily to implement compiler-specific support functions, such as integer divide.

A transparent routine does not allocate any activation-record frames. Parameters are passed to a transparent procedure using tav and the Indirect Pointer A, B, and C registers. The return address is stored in tpc. This convention allows a leaf procedure to call a transparent procedure without changing its status as a leaf procedure. There is a tight relationship between a compiler and the transparent procedures it calls: some transparent procedures may need more temporary registers, and the compiler must account for this.

7.1.3 Register Usage Convention

The Am29050 microprocessor run-time organization standardizes the uses of the local and global registers. This section summarizes register use and the nomenclature for register values:

- GR1: Register stack pointer (rsp).
- GR2–GR3: Condition Code Accumulator.
- GR4–GR63: Unimplemented.
- GR64–GR95: Reserved for operating-system use.
- GR96–GR111: Procedure return values. Lower-numbered registers are used before higher-numbered registers. If more than 16 words are needed, the additional words are stored in memory (see GR123, large return pointer). These registers are also used for temporary values that are destroyed upon a procedure call.
- GR112–GR115: Reserved for programmer. These registers are not used by the compiler, except as directed by the programmer.
- GR116–GR120: Compiler temporaries.
- GR121: Trap handler argument/temporary (tav). This register is used to communicate arguments to a software-invoked trap routine. It can be destroyed by the trap, but not by other traps and interrupts not explicitly generated by the program (for example, a Timer trap).
- GR122: Trap handler return address/temporary (tpc). This register is also used by software-invoked traps. It can be destroyed by the trap, but not by other traps and interrupts not explicitly generated by the program (for example, a Timer trap).
- GR123: Large return pointer/temporary (lrp).
- GR124: Static link pointer/temporary (slp).
- GR125: Memory stack pointer (msp).
- GR126: Register allocate bound (rab).
- GR127: Register free bound (rfb).
- LR0: Return address.
- LR1: Frame pointer.

In this convention, registers must be handled by software according to system requirements. The following practices are recommended:

- GR64–GR95 should be protected from User-mode access by the Register Bank Protect Register.
- The contents of GR96–GR124 should be assumed destroyed by a procedure call, unless the procedure is a transparent procedure.
- The contents of GR121 and GR122 should be assumed destroyed by any procedure call or any program-generated trap.
- The contents of GR125 are always preserved by a procedure call.
- The contents of GR126 and GR127 are managed by the spill and fill handlers and should not be modified except by these handlers.

7.1.4 Example of a Complex Procedure Call

The following code sequence demonstrates a complex procedure call, illustrating how registers are used in the run-time organization:

```
caller:
    add    lrp, msp, 32          ; pass lrp
    add    slp, msp, 120         ; pass a static link
    call   lr0, callee
    const  lr2, 1                ; 1 as first argument
```
7.1.5 Trace-Back Tags

A trace-back tag is either one or two words of information included at the beginning of every procedure. This information permits a debug routine to determine the sequence of procedure calls and the values of program variables at a given point in execution. The trace-back tag describes the memory frame size and the number of local registers used by the associated procedure. A one-word tag is used if the memory frame size is less than 2K words; otherwise, the two-word tag is used. Regardless of tag length, the tag directly precedes the first instruction of the procedure. Figure 7-7 shows the format of the trace-back tags.

The first word of a trace-back tag starts with the invalid operation code 00 (hexadecimal). This unique, invalid instruction operation code allows the debugger to locate the beginning of the procedure in the absence of other information related to the beginning of the procedure, such as from a symbol table. This is particularly useful after a program crash, in which case the debug routine may have only an arbitrary instruction address within a procedure. The call sequence up to the current point in execution can be determined from the rsize and msize values in the trace-back tag. However, for procedures that perform dynamic stack allocation (e.g., using alloca()), the memory frame pointer must be used.

Figure 7-7 Trace-Back Tags

One-word tag:

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 0 0 0 0 0 0 0</td>
<td>M</td>
<td>T</td>
<td>argcount</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Two-word tag:

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>msize</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0 0 0 0 0 0 0 0 1</td>
<td>M</td>
<td>T</td>
<td>argcount</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

The tag word immediately preceding a procedure contains the following fields. Reserved fields must be zero.

<table>
<thead>
<tr>
<th>Bits</th>
<th>Item</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>31–24</td>
<td>opcode</td>
<td>Hexadecimal 00 (an invalid opcode)</td>
</tr>
<tr>
<td>23</td>
<td>tag type</td>
<td>0/one-word tag; 1/two-word tag</td>
</tr>
<tr>
<td>22</td>
<td>m</td>
<td>0/no mfp; 1/mfp used</td>
</tr>
<tr>
<td>21</td>
<td>t</td>
<td>0/normal; 1/transparent procedure</td>
</tr>
<tr>
<td>20–16</td>
<td>argcount</td>
<td>Number of arguments in registers (includes lr0 and lr1)</td>
</tr>
<tr>
<td>15–11</td>
<td>Reserved</td>
<td>Reserved, must be zero</td>
</tr>
<tr>
<td>10–3</td>
<td>msize</td>
<td>Memory frame size in doublewords (if bit 23 is 0), or reserved (if bit 23 is 1)</td>
</tr>
<tr>
<td>2–0</td>
<td>Reserved</td>
<td>Reserved, must be zero</td>
</tr>
</tbody>
</table>

If the procedure uses a Memory-Stack frame of 2K words or more, the msize field is contained in a second tag word immediately preceding the first tag word.
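The field table translates directly into shifts and masks. The C sketch below packs the tag word that immediately precedes a procedure; the function names are hypothetical, and the sketch covers only the tag word itself, not the separate msize word of a two-word tag. Under this layout, the `.word 0x00200000` debugger tag in the SDiv32 listing later in this chapter decodes as a one-word tag with the t (transparent) bit set and argcount 0.

```c
#include <stdint.h>

/* Packs the tag word per the field table: opcode 0x00 in bits 31-24,
   tag type (23), m (22), t (21), argcount (20-16), and, for a one-word
   tag only, msize in doublewords (10-3). Reserved bits stay zero. */
static uint32_t make_tag(unsigned two_word, unsigned m, unsigned t,
                         unsigned argcount, unsigned msize_dw)
{
    uint32_t tag = 0;                              /* opcode 0x00 */
    tag |= (uint32_t)(two_word & 1u) << 23;        /* tag type */
    tag |= (uint32_t)(m & 1u) << 22;               /* mfp used */
    tag |= (uint32_t)(t & 1u) << 21;               /* transparent procedure */
    tag |= (uint32_t)(argcount & 0x1Fu) << 16;     /* includes lr0 and lr1 */
    if (!two_word)
        tag |= (uint32_t)(msize_dw & 0xFFu) << 3;  /* reserved in two-word form */
    return tag;
}

static unsigned tag_argcount(uint32_t tag) { return (tag >> 16) & 0x1Fu; }
static unsigned tag_msize(uint32_t tag)    { return (tag >> 3) & 0xFFu; }
```

A debugger scanning backward from an arbitrary address could look for a word whose top byte is 00 and then use these accessors to recover the fields.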

### 7.2 APPLICATIONS-PROGRAMMING CONSIDERATIONS

This section discusses topics of general concern in the implementation of applications programs.

#### 7.2.1 Addressing General-Purpose Registers Indirectly

Registers in the processor usually are addressed directly by fields within instructions. However, indirect addressing of registers may be required in some situations, such as when a program pointer is known to point to a variable that is resident in the register file.

Three special registers—Indirect Pointers A, B, and C—are provided so that separate indirect register numbers can be set for each of the source and destination operands within an instruction. Indirect Pointer C corresponds to the destination register RC, Indirect Pointer A corresponds to the RA operand register, and Indirect Pointer B corresponds to the RB operand register.

A given indirect pointer (the value in the corresponding register) is used to address the register file whenever Global Register 0 is specified as a source or destination register. For example, a value of 0 in the RA field of an instruction causes the content of the Indirect Pointer A Register to be used to access the RA operand.

The indirect pointers can be set by the four multiply instructions, the floating-point instructions, Move To Special Register instructions, and by the instructions EMULATE, DIVIDE, DIVIDU, and Set Indirect Pointers (SETIP). The Move To Special Register instructions set the indirect pointers individually as special-purpose registers. Of the remaining instructions, all but the EMULATE instruction set all three indirect pointers simultaneously, deriving the values that are written into the pointers from the instruction fields RC, RA, and RB. The EMULATE instruction sets all three indirect pointers, but only the Indirect Pointer A and Indirect Pointer B registers are written with meaningful values. The indirect pointers may be destroyed by DIVIDE, DIVIDU, MULTIPLY, MULTIPLU, MULTM, MULTMU, and the floating-point instructions.
When an indirect pointer is set by a Move To Special Register, bits 9–2 of the source operand are copied to corresponding bits in the indirect pointer. This allows the addressing of general-purpose registers, via the indirect pointers, to be consistent with the addressing of words in external memories and devices.

When the indirect pointers are set from instruction fields, the resulting values reflect the Stack-Pointer addition that is performed on local registers. In addition, register bank-protection checking is performed on the values that are loaded. A Protection Violation trap occurs if the values represent registers that cannot be accessed. The indirect pointers may thus be used to access exactly those operands that would be accessed by the instruction fields setting the indirect pointers. Consequently, a routine that emulates an instruction operation can access, with no overhead, the source and destination registers for the instruction being emulated. No copying of arguments and results needs to be done.
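The two ways of forming an indirect-pointer value can be sketched in C. The first helper follows the text directly (bits 9–2 of the source operand). The second assumes the usual 29K Stack-Pointer addition for local registers: bits 8–2 of gr1 are added to the local register number modulo 128, and the result selects one of the 128 local registers, numbered 128–255 in the register file. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Value an indirect pointer receives from mtsr: bits 9-2 of the source,
   i.e. the register number whose byte-style address is `src`. */
static unsigned ip_from_mtsr(uint32_t src)
{
    return (src >> 2) & 0xFFu;
}

/* Absolute register number for LRn, assuming Stack-Pointer addition:
   (gr1 bits 8-2 + n) mod 128, offset into the local-register half. */
static unsigned local_reg_number(uint32_t gr1, unsigned n)
{
    return 128u + ((((gr1 >> 2) & 0x7Fu) + n) & 0x7Fu);
}
```

The first helper also explains the `const tav, 0x80 << 2` / `or tav, tav, rfb` pair in the fill handler: the result's bits 9–2 name local register 128 plus rfb's offset into the stack.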

The indirect pointers are also set by the floating-point, MULTIPLY, MULTM, MULTIPLU, and MULTMU instructions when these cause exceptions, to allow the exception handler to access the instruction operands.

When using indirect register addressing, at least one cycle of delay must separate any instruction that sets an indirect pointer from any instruction that de-references that pointer. This restriction is the result of processor pipelining (see Section 7.4.3).

### 7.2.2 Run-Time Checking

The assert instructions provide programs with an efficient means of comparing two values and causing a trap when a specified relation between the two values is not satisfied. The instructions assert that some specified relation is true, and trap if the relation is not true. This allows run-time checking—such as checking that a computed array index is within the boundaries of the storage for an array—to be performed with a minimum performance penalty.

Assert instructions are available for comparing two signed or unsigned operands. The following relations are supported: equal-to, not-equal-to, less-than, less-than-or-equal-to, greater-than, and greater-than-or-equal-to.

The assert instructions specify a vector number for the trap. However, only vector numbers 64 through 255 (inclusive) may be specified by User-mode programs. If a User-mode assert instruction causes a trap, and the vector number is between 0 and 63 inclusive, a Protection Violation trap occurs, instead of the specified trap.

Since the assert instructions allow the specification of the vector number, several traps may be defined in the system, for different situations detected by the assert instructions.
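The decision an assert instruction makes can be summarized in a few lines of C. This is a behavioral sketch only: `assert_outcome` and the `NO_TRAP` / `PROTECTION_VIOLATION` codes are hypothetical names, and the caller is expected to evaluate the asserted relation itself.

```c
#include <stdbool.h>

enum { NO_TRAP = -1, PROTECTION_VIOLATION = -2 }; /* hypothetical result codes */

/* No trap if the asserted relation holds; otherwise trap to `vector`,
   except that a User-mode program naming vector 0-63 gets a
   Protection Violation instead. */
static int assert_outcome(bool relation_holds, unsigned vector, bool user_mode)
{
    if (relation_holds)
        return NO_TRAP;
    if (user_mode && vector < 64u)
        return PROTECTION_VIOLATION;
    return (int)vector;
}
```

An array-bounds check such as "assert index < length" passes silently for valid indices and traps to the chosen vector otherwise. The operating-system call idiom `asneq System_Routine, gr1, gr1` corresponds to a relation (`x != x`) that can never hold, so it always traps.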

### 7.2.3 Operating System Calls

An applications program can request a service from the operating system by using the following instruction:

```
asneq System_Routine, gr1, gr1
```

This instruction always creates a trap, since it attempts to assert that the content of a register is not equal to itself (the register number used here is irrelevant, as long as the register is otherwise accessible).

The System_Routine vector number specified by the instruction invokes the execution of the operating system routine that provides the requested service. This vector number may have any value between 64 and 255, inclusive (vector numbers 0 through 63 are pre-defined or reserved). Thus, as many as 192 different operating-system routines may be invoked from the applications program.

In cases where the indirect pointers may be used, the EMULATE instruction allows two operand/result registers to be specified to the operating-system routine. The instruction is:

```
emulate System_Routine, lr3, lr6
```

In this case, the System_Routine vector number performs the same function as in the previous example. Here, however, LR3 and LR6 are specified as operand registers and/or result registers (these particular registers are used only for illustration). The operating-system routine has access to these registers via the indirect pointers, which allows flexible communication.

### 7.2.4 Multi-Precision Integer Addition And Subtraction

The processor allows the Carry (C) bit of the ALU Status Register to be used as an operand for add and subtract instructions. This provides for the addition and subtraction of operands which are greater than 32 bits in length. For example, the following code implements a 96-bit addition with signed overflow detection.

```
add     lr7,  gr96, lr2
addc    lr8,  gr97, lr3
addcs   lr9,  gr98, lr4
```

Global registers GR96–GR98 contain the first operand, local registers LR2–LR4 contain the second operand, and local registers LR7–LR9 contain the result. The first two add instructions set the C bit, which is used by the second two instructions. If the addition causes a signed overflow, then an Out of Range trap occurs; overflow is detected by the final instruction.
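The same carry chain can be written out in C to show what the three instructions compute. In this sketch the arrays hold the least-significant word first (a[0] for GR96, b[0] for LR2, r[0] for LR7), `add96` is a hypothetical name, and the return value marks the case in which the final addcs would raise the Out of Range trap.

```c
#include <stdbool.h>
#include <stdint.h>

/* 96-bit add with carry propagation and signed overflow detection. */
static bool add96(const uint32_t a[3], const uint32_t b[3], uint32_t r[3])
{
    uint32_t carry = 0;                                /* C bit */
    for (int i = 0; i < 3; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;  /* add, addc, addcs */
        r[i] = (uint32_t)sum;
        carry = (uint32_t)(sum >> 32);
    }
    /* Signed overflow: operands share a sign that the result lacks. */
    uint32_t sa = a[2] >> 31, sb = b[2] >> 31, sr = r[2] >> 31;
    return sa == sb && sr != sa;
}
```

Only the most-significant word carries the sign, which is why overflow checking belongs on the last instruction alone.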

### 7.2.5 Integer Multiplication

The Am29050 microprocessor directly executes the integer-multiplication instructions MULTIPLY, MULTIPLU, MULTM, and MULTMU (these are implemented using traps in the Am29000 microprocessor). The Am29050 microprocessor implements the multiply-step instructions MUL, MULL, and MULU for compatibility, but new code generated for the Am29050 microprocessor should take advantage of the faster integer multiply instructions.

The MULTIPLY and MULTIPLU instructions multiply two 32-bit integers, giving a 32-bit result. MULTIPLY is used for signed integers, and MULTIPLU is used for unsigned integers. Overflow of the 32-bit result is detected when the Integer Multiplication Overflow Exception Mask bit (MO) of the Integer Environment Register is 0. When the MO bit is 0, the MULTIPLY and MULTIPLU operations cause an Out of Range trap upon overflow of a 32-bit signed or unsigned result, respectively.

In general, multiplying 32-bit integers produces a 64-bit result. The most-significant 32 bits of a signed or unsigned result are generated by the MULTM and MULTMU instructions, respectively. To obtain a full 64-bit result, a MULTIPLY or MULTIPLU instruction is followed by a MULTM or MULTMU instruction:

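```
; 32 bit * 32 bit → 64 bit signed multiply
; Input: multiplicand in lr2, multiplier in lr3
; Output: result most-significant word in gr96, result
; least-significant word in gr97
    multiply  gr97, lr2, lr3    ; least-significant 32 bits of the product
    multm     gr96, lr2, lr3    ; most-significant 32 bits of the product
```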
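What the MULTIPLY/MULTM pair delivers can be checked against a 64-bit product in C. This sketch (hypothetical names `smul64` and `multiply_overflows`) splits the signed product into the two words the listing places in gr96 and gr97, and expresses the overflow condition under which MULTIPLY alone would trap when MO is 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full signed product split into most- and least-significant words. */
static void smul64(int32_t a, int32_t b, uint32_t *msw, uint32_t *lsw)
{
    int64_t p = (int64_t)a * (int64_t)b;
    *lsw = (uint32_t)(uint64_t)p;          /* what MULTIPLY returns */
    *msw = (uint32_t)((uint64_t)p >> 32);  /* what MULTM returns */
}

/* MULTIPLY overflows when the signed product needs more than 32 bits. */
static bool multiply_overflows(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return p < INT32_MIN || p > INT32_MAX;
}
```

For -2 × 3 the words are 0xFFFFFFFF and 0xFFFFFFFA (the two's-complement form of -6), and 0x10000 × 0x10000 yields 1 and 0, a product that does not fit in 32 bits.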
### 7.2.6 Integer Division

The processor performs integer division by a series of divide step instructions, rather than by a single instruction. Floating-point division is performed by hardware. When the divisor is a power of 2, and the dividend is unsigned, the divide should be accomplished by a right shift.

If a program requires the division of two integers, the required sequence of divide steps may be executed in-line, or executed in a divide routine called as a procedure. It may be beneficial to precede a full divide procedure with a routine to discover whether or not the number of divide steps may be reduced. This reduction is possible when the operands do not use all of the available 32 bits of precision.

The following routine divides a 64-bit, unsigned dividend by a 32-bit unsigned divisor:

```
; 64 bit / 32 bit → 32 bit unsigned divide
; Input:  most-significant dividend word in lr2, least-significant dividend word in lr3,
; divisor in lr4
; Output: quotient in gr96, remainder in gr97
UDiv64:
  mtsr   Q, lr3            ; put least-significant word of the dividend in the Q register
  div0   gr97, lr2         ; perform initial divide step
  .rep   31                ; expand out 31 copies of the next
  div    gr97, gr97, lr4   ; instruction in-line (31 divide steps)
  .endr
  divl   gr97, gr97, lr4   ; perform last step
  divrem gr97, gr97, lr4   ; compute remainder
  mfsr   gr96, Q           ; get the quotient
```
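The arithmetic behind one-bit-per-step division can be sketched in C. This is plain restoring shift-and-subtract division, chosen for clarity; it produces the same quotient and remainder as the UDiv64 routine but does not reproduce the intermediate register contents of the 29K divide-step instructions. The function name and the `hi < d` precondition (the quotient must fit in 32 bits) are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit / 32-bit unsigned divide, one quotient bit per iteration. */
static bool udiv64_32(uint32_t hi, uint32_t lo, uint32_t d,
                      uint32_t *quot, uint32_t *rem)
{
    if (d == 0 || hi >= d)
        return false;              /* quotient would not fit in 32 bits */
    uint64_t r = hi;               /* partial remainder */
    uint32_t q = lo;               /* like Q: dividend bits out, quotient bits in */
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (q >> 31);  /* shift the next dividend bit in */
        q <<= 1;
        if (r >= d) {              /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= 1u;
        }
    }
    *quot = q;
    *rem = (uint32_t)r;
    return true;
}
```

The 32 iterations correspond to the 32 quotient bits. When the divisor is a power of two and the dividend is unsigned, the loop collapses to a right shift, as the text recommends: dividing 1000 by 8 gives 125, the same as `1000 >> 3`.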

The following routine divides a 32-bit unsigned dividend by a 32-bit unsigned divisor:

```
; 32 bit / 32 bit → 32 bit unsigned divide
; Input:  dividend word in lr2, divisor in lr3
; Output: quotient in gr96, remainder in gr97
UDiv32:
  mtsr   Q, lr2            ; put the dividend in the Q register
  div0   gr97, 0           ; perform initial divide step, zeroing out
                           ; the upper bits of the dividend
  .rep   31                ; expand out 31 copies of the next
  div    gr97, gr97, lr3   ; instruction in-line (31 divide steps)
  .endr
  divl   gr97, gr97, lr3   ; perform last step
  divrem gr97, gr97, lr3   ; compute remainder
  mfsr   gr96, Q           ; get the quotient
```

The following routine divides a 32-bit signed dividend by a 32-bit signed divisor. It also traps division by zero. Because the divide-step instructions operate only on unsigned operands, extra code is required to perform sign checking and conversion.
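The sign bookkeeping in SDiv32 can be stated compactly in C. In the listing, Temp1 carries two flags: a negative dividend sets both (value 3), and a negative divisor toggles only the negative-result flag. This sketch assumes bit 0 is the quotient flag and bit 1 the remainder flag, and it assumes how the flags are applied after the unsigned divide, since that part of the routine is not reproduced here. `sdiv32` is a hypothetical name; the unsigned `/` and `%` stand in for the divide steps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed divide built from an unsigned divide of the magnitudes.
   Returns false for a zero divisor (V_DIVBYZERO in the listing). */
static bool sdiv32(int32_t dividend, int32_t divisor,
                   int32_t *quot, int32_t *rem)
{
    if (divisor == 0)
        return false;
    unsigned flags = 0;                        /* bit 0: negate quotient, bit 1: remainder */
    uint32_t n = (uint32_t)dividend, d = (uint32_t)divisor;
    if (dividend < 0) { flags = 3u; n = 0u - n; }  /* make dividend positive */
    if (divisor < 0)  { flags ^= 1u; d = 0u - d; } /* toggle result flag */
    uint32_t q = n / d, r = n % d;             /* unsigned divide */
    if (flags & 1u) q = 0u - q;
    if (flags & 2u) r = 0u - r;
    *quot = (int32_t)q;
    *rem = (int32_t)r;
    return true;
}
```

With these rules the quotient truncates toward zero and the remainder takes the dividend's sign: -7 / 2 gives -3 remainder -1, and 7 / -2 gives -3 remainder 1.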

```
; 32 bit / 32 bit signed divide, called by:
;     call   tpc, SDiv32                  ; call the divide routine
;     setip  dst_reg, src1_reg, src2_reg  ; passing pointers to the operand
;                                         ; registers in the delay slot
;
; Input: dividend and divisor in the registers pointed to by the indirect-pointer
; registers IPA and IPB
; Output: result quotient in the register pointed to by IPC, remainder left in Temp0
; Used: return address in tpc, special register Q
; Destroyed: previous contents of registers tav, Temp0–Temp2
; Symbolic register names:
  .reg   Temp0, gr116
  .reg   Temp1, gr119
  .reg   Temp2, gr120
  .reg   tpc, gr122
  .word  0x00200000                ; Debugger tag word
SDiv32:
  const  Temp1, 0
  asneq  V_DIVBYZERO, Temp1, gr0   ; check for divide by zero with an assert
  add    Temp0, gr0, 0             ; get dividend from indirect pointer
  jmpf   Temp0, pdividend          ; is it negative? (jmpf is also "jmppos")
  add    Temp2, Temp1, gr0         ; get divisor from indirect pointer
  const  Temp1, 3                  ; set negative result and remainder flags
  subr   Temp0, Temp0, 0           ; make dividend positive
pdividend:
  jmpf   Temp2, pdivisor           ; is divisor negative?
  mtsr   Q, Temp0                  ; copy dividend to Q register in delay slot
                                   ; of the jump
  xor    Temp1, Temp1, 1           ; toggle negative result flag
  subr   Temp2, Temp2, 0           ; make divisor positive
pdivisor:
  div0   Temp0, 0                  ; initialize
  .rep   31                        ; expand out 31 copies of the next
  div    Temp0, Temp0, Temp2       ; instruction in-line (31 divide steps)
  .endr
  divl   Temp0, Temp0, Temp2       ; perform last divide step
  divrem Temp0, Temp0, Temp2       ; get positive remainder
```
### 7.2.7 Rounding

Floating-point operations can be performed in one of four rounding modes defined in the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std. 754-1985). These modes are:

Round to Nearest: The result produced is the representable value nearest to the infinitely precise result. It can happen that the infinitely precise result falls exactly halfway between two representable values; in this case, the result produced will be whichever of those two representable values has a fractional part whose least-significant bit is 0.

Round Toward +∞: The result produced is the representable value closest to but no less than the infinitely precise result.

Round Toward −∞: The result produced is the representable value closest to but no greater than the infinitely precise result.

Round Toward 0: The result produced is the representable value closest to but no greater in magnitude than the infinitely precise result.
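The four modes can be compared side by side with a small integer model. In this C sketch, integers stand in for the representable values and the exact quotient n/d (with d > 0) stands in for the infinitely precise result; ties under Round to Nearest go to the even integer, the integer analogue of "least-significant bit is 0". The `RoundMode` names and `round_div` are hypothetical; this models the rounding rules only, not the FRM field encoding.

```c
#include <stdint.h>

typedef enum { RND_NEAREST, RND_POS_INF, RND_NEG_INF, RND_ZERO } RoundMode;

/* Rounds the exact value n/d (d > 0) to an integer under each mode. */
static int64_t round_div(int64_t n, int64_t d, RoundMode mode)
{
    int64_t q = n / d, r = n % d;
    if (r < 0) { q -= 1; r += d; }  /* now q = floor(n/d), 0 <= r < d */
    if (r == 0) return q;           /* exact: every mode agrees */
    switch (mode) {
    case RND_NEG_INF: return q;                   /* no greater than exact */
    case RND_POS_INF: return q + 1;               /* no less than exact */
    case RND_ZERO:    return n < 0 ? q + 1 : q;   /* no greater in magnitude */
    case RND_NEAREST:
    default:
        if (2 * r < d) return q;
        if (2 * r > d) return q + 1;
        return (q % 2 == 0) ? q : q + 1;          /* halfway: pick the even one */
    }
}
```

For 2.5 (n = 5, d = 2) the modes give 2, 3, 2 and 2; for -2.5 they give -2, -2, -3 and -2, which shows how Round Toward 0 and Round Toward −∞ differ only for negative values.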

The floating-point rounding mode is determined by the FRM field of the Floating-Point Environment Register. The following operations are affected by the value in the FRM field:

- FADD, DADD, FSUB, DSUB, FMUL, DMUL, FDIV, DDIV, FMAC, DMAC, FMSM, DMSM, and SQRT
- MFACC and MTACC
- CONVERT, when the instruction field RND is 100.

The value in the FRM field has no effect on the floating-point comparison operations, the CLASS operation, or the FDMUL operation.

### 7.2.8 Fast-Float Mode

The 29K Family fully supports the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std. 754-1985). For some floating-point implementations, however, a significant speed advantage can be realized by disabling certain supported features.

For the Am29050 microprocessor, a fast-float mode has been provided to disable the processing of denormalized numbers. Although the handling of denormalized numbers in the Am29050 microprocessor is always transparent to the user, the processor will sometimes require extra cycles to process denormalized operands. This adds both to processing time and to the statistical variability of the processing time required for a given number of computations.
The Fast-Float mode is enabled by setting the Fast-Float Select (FF) bit of the Floating-Point Environment Register. In the fast-float mode, denormalized numbers are handled as follows:

1. A denormalized source operand is converted to a zero of the same sign before the operation is performed; this conversion does not affect the value of the operand in the source register. This conversion does not signal an inexact result exception, because the Fast-Float mode considers a denormalized number to be nothing more than a representation of zero. This occurs without adding extra cycles.

2. An operation producing an infinitely precise result smaller than the smallest normal number in the destination format will produce a zero result of the same sign as the infinitely precise result; the underflow and inexact exceptions will be reported.

The instructions MTACC, FMAC, DMAC, FMSM, and DMSM use the Fast-Float mode, regardless of the FF bit.
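Rule 1 above (denormalized operands treated as zero of the same sign) can be illustrated in software by inspecting the IEEE 754 double-precision bit fields. This C sketch, with the hypothetical name `ff_operand`, only models the operand treatment; it does not model rule 2 or the exception reporting, and on the processor the conversion happens in hardware without touching the source register.

```c
#include <stdint.h>
#include <string.h>

/* A denormalized double (exponent field 0, fraction nonzero) becomes a
   zero of the same sign; every other value passes through unchanged. */
static double ff_operand(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint64_t exp  = (bits >> 52) & 0x7FFu;
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    if (exp == 0 && frac != 0)
        bits &= UINT64_C(1) << 63;  /* keep only the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The smallest normal double, 2^-1022, is left alone, while 2^-1030 (denormalized) becomes +0 and -2^-1030 becomes -0.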

### 7.2.9 Complementing a Boolean

To complement a Boolean in the processor's format, only the most-significant bit of the Boolean word should be considered, since the least-significant 31 bits may or may not be zeros. This is accomplished by the following instruction:

```
cpge gr96, gr96, 0
```

The Boolean is in GR96 in this example. This instruction is based on the observation that a Boolean TRUE is a negative integer, since the Boolean bit coincides with the integer sign bit. If the operand of this instruction is a negative integer (i.e., TRUE), the result is the Boolean FALSE. If the operand is non-negative (i.e., the Boolean FALSE), the result is TRUE.
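
The same reasoning can be checked with a small C model. The sketch assumes that a compare such as CPGE produces 0x80000000 for TRUE and 0 for FALSE. The names `cpge` and `bool_not` are hypothetical and model only the instruction's effect.

```c
#include <stdint.h>

/* Boolean values as produced by the compare instructions (assumed):
   only bit 31 is significant, and TRUE has bit 31 set. */
#define BOOL_TRUE  UINT32_C(0x80000000)
#define BOOL_FALSE UINT32_C(0x00000000)

/* Model of "cpge rc, ra, b": signed compare of ra against b. */
static uint32_t cpge(uint32_t ra, int32_t b)
{
    return ((int32_t)ra >= b) ? BOOL_TRUE : BOOL_FALSE;
}

/* Complement a Boolean whose low 31 bits may hold anything:
   a TRUE operand is negative, so "ra >= 0" yields FALSE, and vice versa. */
static uint32_t bool_not(uint32_t x)
{
    return cpge(x, 0);
}
```

For example, `bool_not(0x80001234)` returns `BOOL_FALSE`: the operand is negative even though its low bits are not zero.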

### 7.2.10 Using the Floating-Point Accumulators

The Floating-Point Accumulators (ACC0 to ACC3) provide an extra source or destination register for the multiply-accumulate (FMAC, DMAC) and multiply-sum (FMSM, DMSM) instructions. The FMAC and DMAC instructions can be used to evaluate sum-of-products calculations, such as those found in vector or matrix multiplication. The FMSM and DMSM instructions are used when the multiplier is a fixed value, such as in polynomial evaluation using Horner's Rule, or the SAXPY or DAXPY (Single/Double precision A times X Plus Y) vector routines used in Gaussian Elimination.
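
Horner's Rule shows where a fixed multiplier comes from: each step multiplies the running value by the same x. The following plain C sketch illustrates only the recurrence. It does not model the operand encoding of FMSM or DMSM, and the function name `horner` is hypothetical.

```c
#include <stddef.h>

/* Evaluate c[0] + c[1]*x + ... + c[n-1]*x^(n-1) as
   (...((c[n-1]*x + c[n-2])*x + c[n-3])...)*x + c[0].
   Every step multiplies by the same x. */
static double horner(const double *c, size_t n, double x)
{
    double p = 0.0;
    for (size_t i = n; i-- > 0; )
        p = p * x + c[i];
    return p;
}
```

With coefficients {1, 2, 3} and x = 2, the loop computes 3, then 3*2 + 2 = 8, then 8*2 + 1 = 17.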

#### 7.2.10.1 MATRIX MULTIPLICATION USING THE FMAC INSTRUCTION

One of the operations performed frequently in 3-dimensional (3-D) graphics systems is the rotation and translation of a 3-D vector. This is accomplished by multiplying a 4-by-1 vector and a 4-by-4 matrix. In this case, the four accumulators are used to interleave four independent sum-of-products calculations. This eliminates pipeline stalls caused by dependencies on the accumulator values.

For the FMAC and DMAC instructions, accumulated values can overflow, especially when accumulating many terms. The FMAC and DMAC instructions can specify the accumulator format independent of the other operands, allowing the accumulated values to be maintained in the double-precision format even though the operations are performed in the single-precision format. This is accomplished with no performance penalty.

```
; Multiply a 4 x 1 vector times a 4 x 4 matrix. Four accumulators are used
; to interleave four independent sum-of-product evaluations. This code takes
; 22 cycles to complete 28 floating-point operations.
;
; Input:  4 x 4 matrix (a) in registers lr2 - lr17
;         4 x 1 vector (b) in registers lr18 - lr21
; Output: 4 x 1 result vector (c) in registers lr22 - lr25
;
; The first four instructions initialize the accumulators with the first
; four independent products. The FMAC function field is set to 4, which
; specifies the operation a * b + 0.0.

        fmac    4, 0, lr2, lr18     ; acc0 <- a11 * b1
        fmac    4, 1, lr6, lr18     ; acc1 <- a21 * b1
        fmac    4, 2, lr10, lr18    ; acc2 <- a31 * b1
        fmac    4, 3, lr14, lr18    ; acc3 <- a41 * b1

; The remaining FMAC operations continue the four independent evaluations:

        fmac    0, 0, lr3, lr19     ; acc0 <- a12 * b2 + acc0
        fmac    0, 1, lr7, lr19     ; acc1 <- a22 * b2 + acc1
        fmac    0, 2, lr11, lr19    ; acc2 <- a32 * b2 + acc2
        fmac    0, 3, lr15, lr19    ; acc3 <- a42 * b2 + acc3

        fmac    0, 0, lr4, lr20     ; acc0 <- a13 * b3 + acc0
        fmac    0, 1, lr8, lr20     ; acc1 <- a23 * b3 + acc1
        fmac    0, 2, lr12, lr20    ; acc2 <- a33 * b3 + acc2
        fmac    0, 3, lr16, lr20    ; acc3 <- a43 * b3 + acc3

        fmac    0, 0, lr5, lr21     ; acc0 <- a14 * b4 + acc0
        fmac    0, 1, lr9, lr21     ; acc1 <- a24 * b4 + acc1
        fmac    0, 2, lr13, lr21    ; acc2 <- a34 * b4 + acc2
        fmac    0, 3, lr17, lr21    ; acc3 <- a44 * b4 + acc3

; The final four instructions move the accumulated sums into the
; destination registers:

        mfacc   lr22, 1, 0          ; c1 <- acc0
        mfacc   lr23, 1, 1          ; c2 <- acc1
        mfacc   lr24, 1, 2          ; c3 <- acc2
        mfacc   lr25, 1, 3          ; c4 <- acc3
```
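
The listing's dataflow can be reproduced in C to check results. In this sketch the function name `matvec4` is hypothetical. `a` is stored row by row, as in lr2 to lr17, and the loops follow the listing's order: all four accumulators advance on one column before moving to the next. Each product is rounded to single precision and the sums are kept in `double`, which loosely mirrors a double-precision accumulator format. The hardware's exact rounding of intermediate values is not modeled.

```c
/* c = A * b for a 4 x 4 matrix A stored row-major in a[0..15]
   (a[row*4 + col] holds a(row+1)(col+1)) and b[0..3]. */
static void matvec4(const float a[16], const float b[4], float c[4])
{
    double acc[4];

    /* Like function field 4: acc <- a * b + 0.0 for column 1. */
    for (int row = 0; row < 4; row++)
        acc[row] = (double)(a[row * 4] * b[0]);

    /* Like function field 0: acc <- a * b + acc for columns 2 to 4.
       Consecutive updates touch different accumulators. */
    for (int col = 1; col < 4; col++)
        for (int row = 0; row < 4; row++)
            acc[row] += (double)(a[row * 4 + col] * b[col]);

    /* Like MFACC: move each sum out, rounded to single precision. */
    for (int row = 0; row < 4; row++)
        c[row] = (float)acc[row];
}
```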

#### 7.2.10.2 SAXPY USING THE FMSM INSTRUCTION

The SAXPY (Single Precision A Times X Plus Y) routine is used heavily to solve systems of linear equations via Gaussian Elimination. The following example SAXPY routine operates on vectors of 16 elements:

```
; SAXPY of size 16, using the FMSM instruction.
;
; Inputs:  constant multiplier A in lr2
;          address of X vector in lr3
;          address of Y vector in lr4
;          address of result vector in lr5
; Assumes ACF is 01.
;
; First, load in the X vector using the LOADM instruction. This operation
; works with burst-access memory at 1 word per cycle:

        mtsrim  cr, 15              ; load 16 words
        loadm   0, 0, gr96, lr3     ; read in the X vector

; Load in the Y vector the same way...

        mtsrim  cr, 15              ; load 16 words
        loadm   0, 0, lr6, lr4      ; read in the Y vector
        mtacc   lr2, 0, 0           ; initialize with multiplier A
```
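
For reference, the full computation that the routine performs can be written in C. This sketch covers the arithmetic only, and the `saxpy` function name is hypothetical. The multiplier `a` stays fixed for the whole vector, which is the role the accumulator plays once the MTACC instruction has initialized it with A.

```c
#include <stddef.h>

/* result[i] = a * x[i] + y[i] for i = 0 .. n-1, with a fixed multiplier a. */
static void saxpy(size_t n, float a, const float *x, const float *y,
                  float *result)
{
    for (size_t i = 0; i < n; i++)
        result[i] = a * x[i] + y[i];
}
```
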

### 7.2.11 Using the Condition Code Accumulator

The Condition Code Accumulator can be used to concatenate the Boolean results of several operations into a single condition code. The condition code can then be used as an operand in further operations, for example, as a control parameter for conditional branches.

The Condition Code Accumulator Register is accessed via Global Registers 2 and 3. If Global Register 2 (CCA) is specified as the destination of an operation, then the 32-bit operation result is written to the Condition Code Accumulator Register. If Global Register 3 (CCA-shift) is the destination, then the Condition Code Accumulator Register is shifted left one bit and the most-significant bit of the operation result is placed in the least-significant bit of the register. The contents of the Condition Code Accumulator Register are read by specifying Global Register 2 as a source operand of an instruction.
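
The behavior of the two register addresses can be modeled in C. This is a hypothetical software model: the type `cca_t` and the function names are not part of any AMD interface. Writing GR2 stores the full result. Writing GR3 shifts the accumulator left by one bit and inserts bit 31 of the result.

```c
#include <stdint.h>

/* Hypothetical model of the Condition Code Accumulator Register. */
typedef struct {
    uint32_t value;
} cca_t;

/* Destination GR2 (CCA): the whole 32-bit result is written. */
static void cca_write_gr2(cca_t *cca, uint32_t result)
{
    cca->value = result;
}

/* Destination GR3 (CCA-shift): shift left one bit, then place the
   most-significant bit of the result in the least-significant bit. */
static void cca_write_gr3(cca_t *cca, uint32_t result)
{
    cca->value = (cca->value << 1) | (result >> 31);
}

/* Source GR2: read the accumulated condition code. */
static uint32_t cca_read_gr2(const cca_t *cca)
{
    return cca->value;
}
```

For example, after three comparison results are written to CCA-shift, the low three bits of the accumulator hold their Boolean values, with the most recent one in bit 0. A single register can then be tested.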

The following restrictions apply to the use of the Condition Code Accumulator:

CCA as Source: The CCA register can be specified as a source for any instruction except those performed in the Floating-Point Unit. (The instructions performed in the FPU are all floating-point instructions, CLASS, CONVERT, MULTIPLY, MULTIPLU, MULTM, and MULTMU.)

CCA as Destination: The CCA register can be specified as the destination of the following instructions only: ADD, SUB, and the constant instructions (CONST, CONSTH, CONSTHZ, and CONSTN).

CCA-shift as Source: The CCA-shift register cannot be specified as a source. Specifying CCA-shift as a source produces an unpredictable result.

CCA-shift as Destination: The CCA-shift register can be specified as the destination of any instruction except LOAD, LOADL, LOADM, and LOADSET.

There are two additional restrictions on the use of the Condition Code Accumulator:

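1. The Condition Code Accumulator cannot be used as both source and destination in the same instruction. For example, neither of the following instructions is permitted:

   ```
   add gr3, gr2, lr0    ; CCA as source, CCA-shift as destination
   add gr2, gr3, lr0    ; CCA-shift as source, CCA as destination
   ```
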
2. Write-write dependency checking is disabled for any instruction having CCA or CCA-shift as the destination. For example, if the instructions

   ```
   fdiv gr3, lr0, lr2
   fadd gr3, lr4, lr6
   ```

   are issued in sequence, hardware interlocks do not guarantee that the instructions will complete in sequence. Therefore, only code sequences that guarantee a fixed order of completion give predictable results. Problematic sequences are those that contain:

   - Instructions with unequal latencies (as in the example above).
   - Instructions whose latency may change in the presence of denormalized input operands or results. These instructions, which include FMUL, DMUL, DDIV, and SQRT, can be used if the Fast-Float mode is enabled.

### 7.2.12 Generating Large Constants

Eight-bit constants are directly available to most instructions. Larger constants must be generated explicitly by instructions and placed into registers before they can be used as operands. The processor has four instructions for the generation of large data constants: Constant (CONST); Constant, High (CONSTH); Constant, Negative (CONSTN); and Constant High, Zero (CONSTHZ).

The CONST instruction sets the least-significant 16 bits of a register with a field in the instruction; the most-significant 16 bits are set to zero. This instruction allows a 32-bit positive constant to be generated with one instruction, when the constant lies in the range of 0 to 65535.

Any 32-bit constant may be generated with a combination of the CONST and CONSTH instructions. The CONSTH instruction sets the most-significant 16 bits of a register with a field in the instruction; the least-significant 16 bits are not modified. Thus, to create a 32-bit constant in a register, the CONST instruction sets the least-significant 16 bits, and the CONSTH instruction then sets the most-significant 16 bits. The order matters, because CONST clears the most-significant 16 bits.

The CONSTN instruction sets the least-significant 16 bits of a register with a field in the instruction; the most-significant 16 bits are set to one. This instruction allows a 32-bit, negative constant to be generated with one instruction, when the constant lies in the range of -65536 to -1.

The CONSTHZ instruction sets the most-significant 16 bits of a register with a field in the instruction; the least-significant 16 bits are set to zero. This facilitates the generation of floating-point constants.
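
The four instructions can be modeled as operations on a 32-bit register value. The helper names below are hypothetical. `make_const32` shows the CONST-then-CONSTH ordering. `bits_to_float` is used only to show that a value built by CONSTHZ can serve as a single-precision floating-point constant, such as 1.0 (bit pattern 0x3F800000).

```c
#include <stdint.h>
#include <string.h>

/* CONST: low 16 bits from the instruction, high 16 bits cleared. */
static uint32_t op_const(uint16_t imm) { return imm; }

/* CONSTH: high 16 bits from the instruction, low 16 bits unchanged. */
static uint32_t op_consth(uint32_t reg, uint16_t imm)
{
    return ((uint32_t)imm << 16) | (reg & 0xFFFFu);
}

/* CONSTN: low 16 bits from the instruction, high 16 bits set to one. */
static uint32_t op_constn(uint16_t imm) { return 0xFFFF0000u | imm; }

/* CONSTHZ: high 16 bits from the instruction, low 16 bits cleared. */
static uint32_t op_consthz(uint16_t imm) { return (uint32_t)imm << 16; }

/* Any 32-bit constant: CONST first (clears the high half), then CONSTH. */
static uint32_t make_const32(uint32_t k)
{
    uint32_t r = op_const((uint16_t)(k & 0xFFFFu));
    return op_consth(r, (uint16_t)(k >> 16));
}

/* Reinterpret a register's bits as an IEEE single-precision value. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For instance, `bits_to_float(op_consthz(0x4049))` yields 3.140625. This is an approximation of pi whose low 16 bits are all zero, so a single CONSTHZ can generate it.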

### 7.2.13 Large Jump and Call Ranges

The 16-bit relative branch displacement provided by processor instructions is sufficient in the majority of cases. However, addresses with a greater range occasionally are needed. In these cases, the CONST and CONSTH instructions generate the large branch-target address in a register. An indirect jump or call then uses this address to branch to the appropriate location.

When program modules are compiled separately, the compiler cannot determine whether the 16-bit displacement of a CALL instruction is sufficient to reach an external procedure, even though it is sufficient in most cases. Instead of generating instructions for the worst case (i.e., the CONST, CONSTH, and CALLI sequence described above), it is more efficient to generate a CALL as if its displacement were sufficient. The worst-case sequence (in this case, CONST, CONSTH, and JMPI) also appears somewhere in the generated code (e.g., at the end of a compiled procedure). An indirect jump suffices here because the CALL has already saved the return address.

When the above scheme is used, the linker is able to determine whether the CALL is sufficient. If it is not, the CALL can be re-targeted to the worst-case sequence in the code. In other words, when the CALL is not sufficient, the linker causes the execution sequence to be:

```
call      ; re-targeted by the linker to the sequence below
const     ; low 16 bits of the far address
consth    ; high 16 bits of the far address
jmpi      ; jump through the register to the external procedure
```

In this manner, the longer execution time for the call occurs only when necessary.
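
A linker's range check can be sketched in C. The sketch assumes that the 16-bit field of a CALL is a signed displacement counted in 32-bit words from the address of the CALL itself, which gives a reach of -131072 to +131068 bytes. That assumption should be checked against the instruction description. The function name `call_in_range` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a direct CALL at call_addr reach target?
   Assumption: signed 16-bit displacement in units of 4-byte words. */
static bool call_in_range(uint32_t call_addr, uint32_t target)
{
    int64_t delta = (int64_t)target - (int64_t)call_addr;
    if (delta % 4 != 0)
        return false;                 /* targets must be word aligned */
    int64_t words = delta / 4;
    return words >= INT16_MIN && words <= INT16_MAX;
}
```

A linker following the scheme above would leave a CALL alone when `call_in_range` is true, and re-target it to the long-range sequence otherwise.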

### 7.2.14 NO-OPs

When a NO-OP is required for proper operation (e.g., as described in Section 7.4.3), it is important that the selected instruction not perform any operation, regardless of program operating conditions. For example, the NO-OP cannot access general-purpose registers, because a register may be protected from access in some situations. The suggested NO-OP is:

```
aseq 0x40, gr1, gr1
```

This instruction asserts that the Stack Pointer (GR1) is equal to itself. Since the assertion is always true, there is no trap. Note also that the Stack Pointer cannot be protected, and that the assert instruction cannot affect any processor state.

### 7.2.15 Character-String Operations

The need to perform operations on character strings arises frequently in many systems. The processor provides operations for manipulating character data, but these are frequently inefficient for dealing with character strings, since the processor is optimized for 32-bit data quantities.

It is much more efficient, in general, to perform character-string operations by operating on units of four bytes each. These four-byte units are more suited to the processor's data-flow organization. However, there are several things to be considered when dealing with four-byte units, as outlined in this section.

#### 7.2.15.1 ALIGNMENT OF BYTES WITHIN WORDS

Character strings normally are not aligned with respect to 32-bit words. Thus, when word operations are used to perform character-string operations, alignment of the character strings must be taken into account.

For example, consider a character string aligned on the third byte of a word that is moved to a destination string aligned on the first byte of a word. If the movement is performed word-at-a-time, rather than byte-at-a-time, the move must involve shift and merge operations, since words in the destination character-string are split across word boundaries in the source character string.

The processor's Funnel Shifter can be used to perform the alignment operations required when character operations are performed in four-byte units. Though the Funnel Shifter supports general bit-aligned shift and merge operations, it is easily adapted to byte-aligned operations.

For byte-aligned shift and merge operations, it is only necessary to ensure that the two most-significant bits of the Funnel Shift Count (FC) field of the ALU Status Register select a byte within a word, and that the three least-significant bits of the FC field are 000; that is, the shift count is a multiple of 8.
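
The byte-aligned use of the Funnel Shifter can be modeled in C. This sketch makes two assumptions. First, it assumes EXTRACT-style semantics: two words are concatenated (first operand most significant), the 64-bit pair is shifted left by FC, and the upper word is kept. Second, it assumes big-endian byte numbering within a word (byte 0 most significant). The function names are hypothetical.

```c
#include <stdint.h>

/* Concatenate hi:lo, shift left by fc (0..31), keep the upper 32 bits. */
static uint32_t funnel_extract(uint32_t hi, uint32_t lo, unsigned fc)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((pair << (fc & 31u)) >> 32);
}

/* Byte-aligned shift count: byte number in the two high bits of the
   5-bit count, 000 in the three low bits (i.e., 8 * byte offset). */
static unsigned fc_for_byte(unsigned byte_offset)
{
    return (byte_offset & 3u) << 3;
}
```

For a source string that starts at byte 2 of a word, `funnel_extract(w0, w1, fc_for_byte(2))` assembles bytes 2 and 3 of `w0`, followed by bytes 0 and 1 of `w1`, into one aligned destination word.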

#### 7.2.15.2 DETECTION OF CHARACTERS WITHIN WORDS

Most character-string operations require the detection of a particular character within the string. For example, the end of a character string is identified by a special character in some character-string representations. In addition, character strings often are searched for a specific pattern. During such searches, the most-frequently executed operation is the search within the character string for the first character of the pattern.

The processor provides a Compare Bytes (CPBYTE) instruction, which directly supports the search for a character within a word. This instruction can provide a factor-of-four performance increase in character-search operations, since it allows a character string to be searched in four-byte units.

During the search, the words containing the character string are compared, a word at a time, to a search key. The search key has the character of interest in every byte position. The CPBYTE instruction then gives a result of TRUE if any character within the character-string word matches the corresponding byte in the search key.
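
The search can be sketched in C. The sketch models CPBYTE as returning 0x80000000 (TRUE) or 0 (FALSE). `make_key` replicates the character into all four byte positions, and `find_word_with_char` returns the index of the first word that contains a match. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOL_TRUE  UINT32_C(0x80000000)
#define BOOL_FALSE UINT32_C(0)

/* Model of CPBYTE: TRUE if any byte of a equals the byte in the same
   position of b. */
static uint32_t cpbyte(uint32_t a, uint32_t b)
{
    for (int shift = 0; shift < 32; shift += 8)
        if (((a >> shift) & 0xFFu) == ((b >> shift) & 0xFFu))
            return BOOL_TRUE;
    return BOOL_FALSE;
}

/* Search key: the character of interest in every byte position. */
static uint32_t make_key(unsigned char ch)
{
    return (uint32_t)ch * UINT32_C(0x01010101);
}

/* Index of the first word containing ch, or n if no word matches. */
static size_t find_word_with_char(const uint32_t *words, size_t n,
                                  unsigned char ch)
{
    uint32_t key = make_key(ch);
    for (size_t i = 0; i < n; i++)
        if (cpbyte(words[i], key) == BOOL_TRUE)
            return i;
    return n;
}
```

Once a matching word is found, a short byte-at-a-time check within that word locates the exact character position.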

### 7.2.16 Movement of Large Data Blocks

The movement of large blocks of data—for example, to perform a memory-to-memory move—can be performed by an alternating series of loads and stores. However, it is normally much more efficient to move large blocks of data by using an alternating series of Load Multiple and Store Multiple instructions. These instructions take better advantage of the data-movement capabilities of the processor, though they require the use of a large number of registers.

During data movement, it is possible to perform alignment operations by a series of EXTRACT instructions between the Load Multiple and Store Multiple. Also, since the Load Multiple and Store Multiple are interruptible, these instructions may be used to move large amounts of data without affecting interrupt latency.

## 7.3 SYSTEMS-PROGRAMMING CONSIDERATIONS

This section discusses topics of general concern in the implementation of control programs and operating systems.

### 7.3.1 System Protection

The Am29050 microprocessor provides protection of several different system resources. In general, this protection is based on the value of the Supervisor Mode (SM) bit in the Current Processor Status Register.

#### 7.3.1.1 MEMORY PROTECTION

Memory and input/output access protection is provided by the Memory Management Unit. Each Translation Look-Aside Buffer entry in the MMU contains protection bits which determine whether or not an access to the page associated with the entry will be permitted. Each Region Mapping Control Register also contains protection bits to control access to the virtual region it maps.
There is a set of protection bits for Supervisor-mode programs, and a separate set for User-mode programs. Thus, for the same virtual page or region, the access authority of programs executing in the Supervisor mode can be different than the authority of programs executing in User mode.

A Data MMU Protection Violation or Instruction MMU Protection Violation trap occurs if a data or instruction access, respectively, is attempted, but is not allowed because of the value of the protection bits.

#### 7.3.1.2 REGISTER PROTECTION

General-purpose registers are protected by the Register Bank Protection Register. The Register Bank Protection Register allows parameters for the operating system to be kept in general-purpose registers, protected from corruption by User-mode programs.

If a User-mode program attempts to access a protected general-purpose register, a Protection Violation trap occurs. Supervisor-mode programs may access any general-purpose register, regardless of protection.

The special-purpose registers 0 to 127 and all Translation Look-Aside Buffer registers are protected from User-mode access. Any attempted access of these registers by a User-mode program causes a Protection Violation trap. The special-purpose registers 163 and 165 to 255 (though not implemented) are protected from any access. Any attempted access of special-purpose registers 163 and 165 to 255, even in the Supervisor mode, causes a Protection Violation trap. This permits virtualization of these special registers.
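
The register-protection rules can be expressed as C predicates. The special-purpose register numbers come from the text. The general-purpose register check assumes that each bit n of the Register Bank Protection Register guards absolute registers 16n to 16n+15; that assumption should be verified against the register description. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does an access to special-purpose register spr cause a
   Protection Violation trap? */
static bool spr_access_traps(unsigned spr, bool supervisor)
{
    if (spr == 163 || (spr >= 165 && spr <= 255))
        return true;        /* not implemented: traps even in Supervisor mode */
    if (spr <= 127 && !supervisor)
        return true;        /* protected from User-mode access */
    return false;
}

/* Does a User-mode access to absolute general-purpose register absreg
   (0..255) trap? Assumption: RBP bit n protects registers 16n..16n+15. */
static bool gpr_access_traps(uint16_t rbp, unsigned absreg, bool supervisor)
{
    if (supervisor)
        return false;       /* Supervisor mode ignores register protection */
    return ((rbp >> (absreg / 16u)) & 1u) != 0;
}
```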

#### 7.3.1.3 EXTERNAL ACCESS PROTECTION

Other than the protection offered by the Memory Management Unit, the processor provides no specific protection for external devices and memories. However, the SUP/US output reflects the value of the SM bit during the address cycle of an external access. This can signal external devices and memories to provide protection. Any protection violations can be reported via the DERR input.

### 7.3.2 Interrupts and Traps

The Am29050 microprocessor automatically saves only the Current Processor Status Register in the Old Processor Status Register when an interrupt or trap is taken. The processor does not automatically save any other state when an interrupt or trap is taken, but rather freezes the contents of the following registers:

1. Program Counters 0, 1, and 2.
2. Channel Address, Channel Data, and Channel Control.
3. ALU Status.

When these registers are frozen, they are allowed to be updated only by Move To Special Register instructions. The frozen condition is controlled directly by the Freeze (FZ) bit in the Current Processor Status Register.

Since the Channel Address, Channel Data, and Channel Control registers are frozen when an interrupt or trap is taken, the interrupt handler may perform single-access loads and stores without interfering with the restart state of a channel operation in the interrupted routine. However, load-multiple and store-multiple operations have unpredictable results if performed while the FZ bit is 1, since these operations are sequenced by the Channel Control Register.

#### 7.3.2.1 VECTOR AREA

As discussed in Section 3.5.4, interrupts and traps are dispatched through a 256-entry Vector Area, which directs the processor to a routine to handle a given interrupt or trap. Only 64 entries of this area are required for basic processor operation (or 22, if instruction emulation is not used).

The required number of Vector Area entries is system-dependent, as determined by the vector numbers that are specified in the assert and EMULATE instructions. The number of entries can be restricted to reduce the memory requirements for the Vector Area, which is especially important when the Vector Area is organized as a sequence of 64-instruction blocks. However, there is nothing to prevent an instruction from specifying a vector number in the range 64 to 255. For this reason, it may not be possible to reduce the size of the Vector Area, since erroneous instruction vector numbers might cause unpredictable results.

The Vector Area may be relocated by the Vector Area Base Address Register, and there may be multiple Vector Areas in the system, with the Vector Area Base Address Register pointing to the one that is currently active.

#### 7.3.2.2 INTERRUPT HANDLING

For temporary program interruptions, such as for Translation Look-Aside Buffer reload, the basic processor interrupt mechanism is sufficient to eliminate the need for the interrupt or trap handler to save any state for the interrupted routine. This state may be left in the appropriate registers while the handler executes. An interrupt return returns immediately to the interrupted program.

Besides the direct performance advantage that results from not saving state for temporary program interruptions, there is an additional advantage provided by the processor. When the state of the interrupted routine remains in the appropriate registers, the processor can detect that the Program Counter 0 and Program Counter 1 registers contain sequential addresses. Instead of performing two non-sequential instruction fetches for the interrupt return in this case, the processor initiates only a single non-sequential fetch (the second fetch is performed as a sequential fetch). This reduces the overhead of the interrupt return for these routines.

Note that when the state of an interrupted program remains in the processor, the processor cannot be enabled to take any further interrupts until an interrupt return is executed. Therefore, this capability should be restricted to time-critical routines, where the execution time of the routine does not interfere with interrupt-latency considerations. (Note that the Interrupt Pending bit of the Current Processor Status Register may be used to detect the presence of external interrupts while these interrupts are disabled).

To support dynamically nested interrupts and traps, the interrupt or trap handler must save state as necessary for the application, using an appropriate data structure (such as an interrupt stack or program status area). Once the state has been saved (or, alternately, while it is being saved), the handler can load the state for a new program to be executed. An interrupt return then initiates the execution of the new program.

When the interrupt or trap handler saves the floating-point accumulators, the Accumulator Format (ACF) field of the Floating-Point Environment Register may not indicate the actual format of the accumulators, because the ACF field may have been modified after the accumulators were written and before the interrupt or trap was taken. The interrupt or trap handler should therefore treat the accumulators as containing double-precision values. This requires forcing the ACF field to 10 (double-precision) after saving the Floating-Point Environment Register and before executing an MFACC instruction to save the accumulators.

#### 7.3.2.3 INTERRUPT RETURN

An interrupt return resumes the execution of a program whose processor state is contained in the following registers:

1. Old Processor Status.
2. Program Counters 0 and 1.
3. Channel Address, Channel Data, and Channel Control.

This state is most likely different from the state of the program executing the interrupt return. These registers must be set appropriately before an interrupt return is executed.

Note that the instruction sequence that sets these registers must have a Current Processor Status that is equivalent to that of an interrupt or trap handler; the FZ bit must be 1, and interrupts and traps must be disabled.

#### 7.3.2.4 SIMULATION OF INTERRUPTS AND TRAPS

Assert instructions may be used by a Supervisor-mode program to simulate the occurrence of various interrupts and traps defined for the processor. Only an assert instruction executed in Supervisor mode can specify a vector number between 0 and 63. If this instruction causes a trap, the effect is to create an interrupt or trap which is similar to that associated with the specified vector number.

Thus, the interrupt and trap routines defined for basic processor operation can be invoked without creating any particular hardware condition. For example, an INTR1 interrupt may be simulated by an assert instruction that specifies a vector number of 17, without the activation of the INTR1 signal.

#### 7.3.2.5 TRAPS IN SYSTEM-LEVEL ROUTINES

The Monitor trap and Monitor mode provide a mechanism for handling traps in system-level routines in a manner that allows these routines to be restarted. This permits error recovery and debugging of system-level routines.

### 7.3.3 Memory Management

This section discusses various issues involved in memory management as they relate to an operating system. The focus is on virtual-addressing issues.

#### 7.3.3.1 VIRTUAL PAGE SIZE

The MMU Configuration Register determines the size of a virtual page mapped by the Memory Management Unit. The choices for page size are 1, 2, 4, and 8 kb. The selection of page size is based on several considerations:

1. For a given page size, any allocation of pages to a process will, on average, waste half of one page. With smaller page sizes, the waste is smaller. In systems with a large number of processes, each with a small amount of memory, small page sizes can reduce waste significantly.
2. Smaller page sizes allow finer memory-protection granularity.
3. The maximum amount of memory that can be referenced by Translation Look-Aside Buffer (TLB) entries is set by the number of TLB entries and the page size. Larger page sizes allow the fixed number of TLB entries to address more memory, and generally reduce the number of TLB misses. For example, with 1-kb pages, a process requiring 8 kb of contiguous memory would create eight TLB misses; with 8-kb pages, the process would create only one TLB miss.
4. The page is usually the unit of memory moved between memory and backing storage. The design of the backing storage sub-system also may influence the choice of page size, because of transfer-efficiency considerations. For example, if the backing storage is a disk, the disk seek time is large compared to transfer time. Thus, it is more efficient to transfer large amounts of data with a single seek. Efficiency may also depend on disk organization (i.e., the number of seeks possibly required to transfer a page).
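
The arithmetic behind consideration 3 can be sketched in C. The sketch assumes the TLB starts with no relevant entries and that nothing is replaced, so each distinct page touched causes one miss. The function names are hypothetical, and the entry count passed to `tlb_reach` is a parameter rather than an Am29050 value.

```c
#include <stdint.h>

/* Distinct pages (and, under the assumptions above, TLB misses) touched
   by the byte range [start, start + len). */
static uint32_t pages_touched(uint32_t start, uint32_t len, uint32_t page_size)
{
    if (len == 0)
        return 0;
    uint32_t first = start / page_size;
    uint32_t last = (start + len - 1) / page_size;
    return last - first + 1;
}

/* Memory that a fixed number of TLB entries can map at once. */
static uint32_t tlb_reach(uint32_t entries, uint32_t page_size)
{
    return entries * page_size;
}
```

A range that straddles a page boundary costs an extra miss even with large pages: an 8-kb range that starts at offset 4 kb touches two 8-kb pages.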

#### 7.3.3.2 PAGE REFERENCE AND CHANGE INFORMATION

In a demand-paged environment, it is important to be able to collect information on the use and modification of pages. The processor does not collect this information directly, but the information may be collected by the operating system, without requiring hardware support.

Each TLB entry contains six bits which specify the type of accesses that are permitted for the corresponding page. When a TLB entry is loaded, the TLB reload routine can set the protection bits so that an access to the corresponding page is not allowed. If an access is attempted, a TLB protection violation trap occurs. This trap may be used to signal that the page is being referenced. After noting this fact, the trap handler may set the protection bits to allow the access, and return to the trapping routine.

A technique similar to the one just described can be used to collect information on the modification of a page. However, in this case, the TLB protection bits initially are set so that a store is not allowed.
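
The bookkeeping for the protection-bit technique can be sketched in C. The structure and function names are hypothetical, and the two `allow_` flags stand in for the TLB protection bits. The reload routine denies all access, so the first use of the page traps and records a reference. Read access is then granted while stores stay disallowed, so the first store traps again and records a change. The sketch ignores pages whose real permissions forbid an access.

```c
#include <stdbool.h>

enum access_kind { ACCESS_READ, ACCESS_WRITE };

/* Hypothetical per-page record kept by the operating system. */
struct page_info {
    bool referenced;   /* accessed since the information was reset */
    bool changed;      /* written since the information was reset  */
    bool allow_read;   /* stand-in for the TLB read permission     */
    bool allow_write;  /* stand-in for the TLB write permission    */
};

/* TLB reload: deny all access so that the first use of the page traps. */
static void on_tlb_reload(struct page_info *p)
{
    p->allow_read = false;
    p->allow_write = false;
}

/* Protection-violation trap: record the access and grant just enough
   permission for it, so the trapping routine can be restarted. */
static void on_protection_trap(struct page_info *p, enum access_kind kind)
{
    p->referenced = true;
    p->allow_read = true;
    if (kind == ACCESS_WRITE) {
        p->changed = true;
        p->allow_write = true;
    }
}
```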

It is also possible to create reference information by noting references during TLB reload. For example, reference bits normally are reset periodically, so that they reflect current references. When reference bits are reset, the entire TLB may be invalidated. Reference bits then are set as TLB entries are loaded. Note that this scheme relies on the fact that a TLB miss implies a reference to the corresponding page. Also, this scheme does not account for page change information.

Both of the above schemes may cost performance, because of the additional traps required to monitor page references and changes. If the performance impact is unacceptable, references and changes can easily be monitored by hardware that detects reads and writes to page frames in instruction or data memory.

#### 7.3.3.3 MONITORING CRITICAL AREAS OF MEMORY

In certain fault-tolerant systems, it is necessary to detect changes to critical areas of memory, so that these changes may be reflected immediately on a non-volatile storage device. To monitor critical memory areas, the TLB protection bits can be set so that any change to the area causes a Data TLB Protection Violation trap. This trap signals that the area is being modified.

In this use of the protection bits, the trap handler does not set the bits to allow the access. Rather, the trap handler must emulate the access, using the Channel Address, Channel Data, and Channel Control registers. The Contents Valid (CV) bit of the Channel Control Register is reset before the trapping routine is restarted, so that the trap does not recur.

#### 7.3.3.4 TLB MISS HANDLING

The address translation performed by the MMU is ultimately determined by routines that place entries into the Translation Look-Aside Buffer (TLB). TLB entries normally are based on system page tables, which give the translation for a large number of pages. The TLB simply caches the currently-needed translations, so that system page tables do not have to be accessed for every translation.
If a required address translation cannot be performed by any entry in the TLB, a TLB miss trap occurs. The trap handling routine—called the TLB reload routine—accesses the system page tables to determine the required translation, and sets the appropriate TLB entry. Note that the access requiring this translation can be restarted by the interrupt return at the end of the TLB reload routine (see Section 7.3.4).

A large number of different page-table organizations are possible. Since the TLB reload routine is a sequence of processor instructions, the page tables may have a structure and access method that satisfies trade-offs of page table size, translation lookup time, and memory-allocation strategies.

Another possibility supported by the TLB reload mechanism is that of a second-level TLB. The TLB reload routine is not required to access the system page tables immediately upon a TLB miss, but may access an external TLB, which can be much larger than the processor's TLB. The amount of time required to access the external TLB normally is much smaller than the amount of time required to access the page tables, leading to an overall improvement in performance. Of course, if a translation is not in the external TLB, a page table lookup still must be performed.

Because the TLB reload routine may depend on the type of access causing the TLB miss, the processor differentiates between misses on instruction and data accesses by Supervisor-mode and User-mode programs. This eliminates any time which might be spent by the TLB reload routine in making the same determination. Performance is also enhanced by the LRU Recommendation Register, which gives the TLB register-number for Word 0 of the TLB entry to be replaced by the TLB reload routine (the least-recently-used entry).

#### 7.3.3.5 WARM START

When a process switch occurs, there is a high probability that most of the TLB entries of the old process will not be used by the new process. Thus, the new process most likely creates many TLB miss traps early in its execution. This is unavoidable on the first initiation of a process, but may be prevented on subsequent initiations.

When a given process is suspended, the operating system can save a copy of its TLB contents. When the task is restarted, the copy can be loaded back into the TLB. This warm start prevents many of the process' initial TLB misses, at the expense of the time required to save and restore the copy of the TLB entries. However, this time may be much shorter than the time required to perform all TLB re-loads individually.

Note that if this warm-start strategy is adopted, any change in address translation must be reflected in all copies of TLB entries for all affected processes. If address translation is changed often so that it affects more than one process, warm start may not be advantageous.
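The sketch below models the warm-start bookkeeping described above, including the requirement to keep every saved copy consistent. The TLB is modeled as a plain dictionary that maps virtual page to physical page, and the class and method names are hypothetical. Real TLB entries also carry protection and process-identifier fields, which are omitted here. The sketch handles a changed translation by invalidating the stale entry wherever it appears, so the next access simply reloads it.

```python
class WarmStartTLB:
    """Toy model of saving and restoring per-process TLB contents."""

    def __init__(self):
        self.tlb = {}    # live (modeled) processor TLB: virtual page -> physical page
        self.saved = {}  # pid -> saved copy of that process's TLB entries

    def switch(self, old_pid, new_pid):
        # Save the outgoing process's entries, then reload the incoming one's.
        self.saved[old_pid] = dict(self.tlb)
        # A process with no saved copy starts cold, with an empty TLB.
        self.tlb = dict(self.saved.get(new_pid, {}))

    def invalidate_mapping(self, vpage, affected_pids, current_pid):
        # A translation change must reach every copy, not only the live TLB;
        # otherwise a later warm start would reload a stale entry.
        for pid in affected_pids:
            copy = self.tlb if pid == current_pid else self.saved.get(pid, {})
            copy.pop(vpage, None)
```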

### 7.3.3.6 MINIMUM NUMBER OF RESIDENT PAGES

In any processor that supports demand paging, there is a minimum number of pages that must be resident for any active process. This minimum is determined by the maximum number of pages that might be referenced by an atomic operation in the processor's architecture (normally, an instruction). If this number of pages is not guaranteed to be resident in memory, some operations might never complete, since they may never have all of the required pages resident in memory at one time.

For the Am29050 microprocessor, two pages are required for a process to make progress through the system. The reason for this requirement is that the Am29050 microprocessor, on interrupt return, restarts an interrupted Load Multiple or Store Multiple only after fetching two instructions (see Section 3.5.5). The first of these instructions must be resident in memory (and mapped by the TLB), and the page required to complete the Load Multiple or Store Multiple must also be resident (and mapped by the TLB), for the interrupt return to complete successfully.

### 7.3.3.7 REGION MAPPING UNIT OPERATION

The Region Mapping Units (RMUs) also translate virtual addresses to physical addresses. Each of the two RMUs can map a region of contiguous virtual addresses to an equal-sized region of contiguous physical addresses. The region size can range from 64 Kbytes to 2 Gbytes in power-of-two increments. The RMUs allow large blocks of contiguous physical memory to be mapped into the virtual address space without the overhead of TLB miss handling or the possibility of replacing required TLB entries. For example, operating-system kernels exhibit much less locality of reference than application programs: a TLB entry loaded for an operating-system reference is typically reused less often than one loaded for an application program. Using the TLB to map operating-system references can therefore degrade performance and replace valid TLB entries of the calling application. Mapping the operating-system references with the RMUs eliminates this overhead.

Like the TLB entries, each RMU entry has six bits which can be used to implement protection as well as collect reference and change information. When both RMUs map a given virtual address, RMU0 has priority over RMU1, and both have priority over the TLB entries. Upon an MMU Protection Violation trap, the trap handler (either data or instruction) should first check RMU0 to see if that unit caused the exception. Following this, it should check RMU1 and finally the TLBs.

If a valid translation does not exist in either RMU0 or RMU1, then the processor uses the TLB for translation. If no valid TLB translation exists, then a TLB miss trap occurs. The TLB miss handler may decide whether to handle the miss with an RMU instead of a TLB entry.
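The following minimal sketch shows the translation priority (RMU0, then RMU1, then the TLB) and how an address is matched against a power-of-two region. The `RegionMapping` fields, the `tlb_lookup` callback, and the `TLBMiss` exception are hypothetical stand-ins. The sketch does not model the actual RMU register layouts or the protection bits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionMapping:
    """Hypothetical model of one RMU entry mapping a power-of-two region."""
    valid: bool
    virtual_base: int
    physical_base: int
    size: int  # bytes; assumed to be a power of two (64 Kbytes to 2 Gbytes)

    def translate(self, vaddr: int) -> Optional[int]:
        if not self.valid:
            return None
        mask = self.size - 1
        # The address is in the region when all bits above the region size match.
        if (vaddr & ~mask) != (self.virtual_base & ~mask):
            return None
        # Keep the offset within the region; take the high bits from the physical base.
        return (self.physical_base & ~mask) | (vaddr & mask)


class TLBMiss(Exception):
    """Stands in for the TLB miss trap."""


def translate(vaddr, rmu0, rmu1, tlb_lookup):
    """Apply the priority order: RMU0, then RMU1, then the TLB."""
    for rmu in (rmu0, rmu1):
        paddr = rmu.translate(vaddr)
        if paddr is not None:
            return paddr
    paddr = tlb_lookup(vaddr)  # hypothetical callback; None means no valid entry
    if paddr is None:
        raise TLBMiss(hex(vaddr))
    return paddr
```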

### 7.3.3.8 BRANCH TARGET CACHE MEMORY CONSIDERATIONS

The Branch Target Cache memory is accessed with either virtual or physical addresses, depending on whether address translation is enabled for instruction accesses. Because of this, the Branch Target Cache memory may contain entries that appear valid even though they are not.

For example, address translation may be changed by modifying the Process Identifier field of the MMU Configuration Register. This change is not reflected in the Branch Target Cache memory tags, so a tag comparison may report a match for an entry that no longer corresponds to the current translation.

If a TLB miss occurs during the address translation for a branch target instruction, the processor considers the contents of the Branch Target Cache memory to be invalid. This is required to properly sequence the LRU Recommendation Register, and does not solve the problem just described. If the TLB is changed at some point, so that the TLB miss does not occur, the Branch Target Cache memory still may perform an invalid comparison.

To avoid the above problem, the contents of the Branch Target Cache memory must be invalidated explicitly whenever address translation is changed. This can be accomplished by executing an Invalidate (INV) instruction whenever an address translation is changed. The INV instruction causes all entries of the Branch Target Cache memory to become invalid (after the next successful branch). However, since the change in address translation rarely affects the program performing the change, the INV may unnecessarily affect the performance of this program.

The IRETINV instruction has the same effect on the Branch Target Cache memory as the INV instruction, but can reduce the performance impact. The IRETINV delays invalidation until an interrupt return is executed, eliminating the need to disrupt an operating-system routine when it changes address translation. At the point of interrupt return, the contents of the Branch Target Cache memory are most likely not of much use anyway.

Note that the Branch Target Cache memory is not invalidated when the Cache Disable (CD) bit of the Configuration Register is set. When the CD bit is 1, the Branch Target Cache memory continues to operate, but the processor considers its contents to be invalid. Thus, the CD bit cannot be used to invalidate the cache, and, furthermore, the Branch Target Cache memory may have to be invalidated whenever the CD bit is to be reset (i.e., when the cache is to be enabled).

The Branch Target Cache memory distinguishes between virtual and physical addresses, between the instruction RAM and instruction read-only memory (ROM) address spaces, and between User-mode and Supervisor-mode addresses. Thus, the Branch Target Cache memory does not have to be invalidated on transitions between these address spaces. This improves the performance of applications that make heavy use of ROM-based and/or operating-system routines in either physical or virtual address space.

### 7.3.4 Restarting Faulting External Accesses

In a demand-paged system environment, virtual pages and their associated virtual-to-physical mappings are made available to programs on demand. In other words, the memory-management routines generally execute only when a given page or mapping is needed by a program. This need is signaled by a page fault trap caused by a program access (normally, the page fault occurs during a TLB reload).

Since the page fault trap is part of normal system operation, and does not represent an error, the access that caused the trap must be restarted, once the trapping condition is remedied, in a manner that the program causing the trap cannot detect.

Additionally, in the Am29050 microprocessor, the TLB reload mechanism relies on the ability to restart an access that causes a TLB miss trap. This restart also must be accomplished in a manner that cannot be detected by the trapping program.

The Am29050 microprocessor overlaps external accesses with the execution of instructions. Thus, traps caused by accesses are imprecise: the address of the instruction that initiated the access cannot be determined by the trap handler. Since the address of the initiating instruction is unknown, the access cannot be restarted by re-executing this instruction. Even if the address could be determined, the instruction might not be restartable, since an instruction executed before the trap occurred, but after the access began, may have altered the conditions of the access, such as by altering the address source register.

In order to provide for the restarting of loads and stores that cause exceptions, the processor saves all information required to restart these accesses in the Channel Address, Channel Data, and Channel Control registers. The Contents Valid (CV) and Not Needed (NN) bits in the Channel Control Register indicate that the information contained in these registers represents an access that must be restarted. The CV bit indicates that the access did not complete, and the NN bit indicates whether or not the data from the access is required by the processor.

Note that, because instruction execution is overlapped with external accesses, an instruction that executes after a load may alter the destination register of the load. If a trap occurs in this situation, the access information in the Channel Address, Data, and Control registers is correct, but the load must not be restarted, since doing so would overwrite the newer register value. The NN bit provides correct operation in this case by indicating that the data from the access is no longer required.

When an interrupt or trap is taken, the handling routine has access to the Channel Address, Data, and Control registers; the contents of these registers may contain information relevant to an incomplete access, and can be preserved for restarting this access. Since these registers are frozen (due to the FZ bit of the Current Processor Status), they are not available to monitor any external accesses in the interrupt or trap handler until their contents are saved, and the FZ bit is reset.

Please note that the exception handler for the Data Access Exception trap must clear the Transaction Faulted (TF) bit in the Channel Control Register. Failure to clear the TF bit will result in the Am29050 microprocessor taking the trap again, once the exception handler returns, causing an infinite series of traps.

The processor restarts an access, using the Channel Address, Channel Data, and Channel Control registers, upon an interrupt return (IRET or IRETINV). The access is initiated if the CV bit of the Channel Control Register is 1 and the NN bit is 0. The restart cannot be detected in the logical operation of the restarted routine, although the timing of its execution is altered.
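The restart rule and the TF requirement fit in a few lines of Python. This sketch represents only the three Channel Control bits discussed here, using a hypothetical dataclass. The real register holds further fields, such as Count Remaining, which are omitted.

```python
from dataclasses import dataclass


@dataclass
class ChannelControl:
    """Only the Channel Control Register bits discussed in this section."""
    cv: bool  # Contents Valid: the saved access did not complete
    nn: bool  # Not Needed: the processor no longer needs the access's data
    tf: bool  # Transaction Faulted


def data_access_exception_handler(chc: ChannelControl) -> None:
    # ... remedy the fault here (for example, make the page resident) ...
    # TF must be cleared, or the trap is taken again after the handler returns.
    chc.tf = False


def restarts_on_iret(chc: ChannelControl) -> bool:
    """IRET or IRETINV re-initiates the saved access only if CV = 1 and NN = 0."""
    return chc.cv and not chc.nn
```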

The mechanism used to restart faulting accesses has the additional benefit of allowing a fast interrupt-response time when the processor is performing a load-multiple or store-multiple operation. Interrupted load-multiple and store-multiple operations are restarted as if they had faulted. In this case, the operation resumes from the point of interruption, not the beginning of the sequence.

### 7.3.5 Multiple-Processor Systems

The Am29050 microprocessor provides several facilities for the implementation of multi-programming and multi-processing systems. These facilities help provide mutual exclusion, synchronization, and communication between multiple processes, whether these processes execute on a single processor or multiple processors.

Binary semaphores are supported by the Load and Set (LOADSET) instruction. This instruction loads the contents of an external location into a register and atomically sets the contents of the location to the integer -1. This instruction requires no special hardware support in the system, since all sequencing is performed by the processor. Also, LOADSET is available to User-mode programs, which eliminates the overhead of an operating-system call in the use of binary semaphores.
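The following sketch shows how LOADSET semantics provide a binary semaphore. It assumes that 0 means free and -1 means taken; the text states only that LOADSET writes -1. A Python `threading.Lock` emulates the indivisible load-and-set, and `Memory`, `acquire`, and `release` are hypothetical names. Releasing the semaphore is an ordinary store of the free value.

```python
import threading
import time

FREE = 0  # assumed convention: 0 = semaphore free, -1 = taken


class Memory:
    """Hypothetical external memory offering an indivisible LOADSET."""

    def __init__(self):
        self._words = {}
        self._bus_lock = threading.Lock()  # stands in for the atomic access

    def loadset(self, addr):
        # Return the old value and store -1, with no other access in between.
        with self._bus_lock:
            old = self._words.get(addr, FREE)
            self._words[addr] = -1
            return old

    def store(self, addr, value):
        with self._bus_lock:
            self._words[addr] = value


def acquire(mem, addr):
    # Every attempt leaves -1 behind, so only one contender ever sees FREE.
    while mem.loadset(addr) != FREE:
        time.sleep(0)  # yield to other threads while spinning


def release(mem, addr):
    mem.store(addr, FREE)
```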

The instructions Load and Lock (LOADL) and Store and Lock (STOREL) support the locking of external devices and memories, or the locking of particular locations within an external device or memory. This prevents access by any process or processor other than the one that performed the lock, and provides the flexibility of locking in a manner appropriate to the system and application. The LOADL and STOREL instructions are available to User-mode programs.

To indicate that a LOADL or STOREL is being executed, the processor asserts the LOCK output during the external access. Since the processor cannot control the behavior of external devices and memories directly, system hardware must support locking, if required.

Note that the protocol for the locking and unlocking of devices and memories must be defined by the system. For example, the protocol may be defined such that a LOADL locks the device or memory, and a STOREL unlocks the device or memory. Between the execution of the LOADL and the STOREL, the device can be accessed by the locking process with any combination of normal loads and stores.

For the implementation of a general-purpose exclusion, synchronization, and/or communication scheme, the processor allows Supervisor-mode programs to set the Lock (LK) bit in the Current Processor Status. This bit activates the LOCK pin, and prevents the processor from relinquishing the channel to another channel master. (If another master already has control of the channel when the LK bit is set, the LK bit does not take effect until control of the channel is returned to the processor.)

The LK bit allows a Supervisor-mode program to execute with mutual exclusion for any sequence of instructions. However, because interrupts also must be disabled for true exclusion, this may have a negative impact on system performance if used improperly.

### 7.3.6 Timer Facility

The processor has a built-in Timer Facility that can be configured to cause periodic interrupts. The Timer Facility consists of two special-purpose registers—the Timer Counter and the Timer Reload registers—that are accessible only to Supervisor-mode programs. These registers implement timing functions independent of program execution.

### 7.3.6.1 TIMER FACILITY OPERATION

The Timer Counter Register has a 24-bit Timer Count Value (TCV) field that decrements by one on every processor cycle. When the TCV field decrements to zero, it is written with the Timer Reload Value (TRV) field of the Timer Reload Register on the next cycle, and the Interrupt (IN) bit of the Timer Reload Register is set at the same time. Because the processor performs this reload without software intervention, no cycles are lost between intervals, which maintains the accuracy of the Timer Facility.

The Timer Reload Register contains the 24-bit TRV field and the control bits Overflow (OV), Interrupt (IN), and Interrupt Enable (IE). The TRV field and IN bit were described above. If the IN bit is 1 and the IE bit is also 1, a Timer interrupt occurs. If the IN bit is already 1 when the TCV field decrements to zero, the OV bit is also set. The OV bit indicates that a Timer interrupt may have occurred before a previous interrupt was serviced.
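A small cycle-level model can check this counting behavior. The model is a sketch based on one reading of the text:

- TCV is reloaded from TRV on the cycle after it reaches zero, so the interval repeats every TRV + 1 cycles.
- IN is set at the reload.
- OV is set if IN is still 1 when TCV reaches zero again.

`TimerModel` and its attribute names are hypothetical. The model uses `in_` because `in` is a Python keyword.

```python
class TimerModel:
    """Cycle-level sketch of the Timer Counter and Timer Reload behavior."""

    MASK24 = (1 << 24) - 1  # TCV and TRV are 24-bit fields

    def __init__(self, tcv, trv, ie=True):
        self.tcv = tcv & self.MASK24
        self.trv = trv & self.MASK24
        self.in_ = False
        self.ie = ie
        self.ov = False

    def tick(self):
        """Advance one processor cycle; return True if TCV was reloaded."""
        if self.tcv == 0:
            # The cycle after reaching zero: reload TCV from TRV and set IN.
            self.tcv = self.trv
            self.in_ = True
            return True
        self.tcv -= 1
        if self.tcv == 0 and self.in_:
            # Reached zero again while the previous interrupt is still pending.
            self.ov = True
        return False

    @property
    def interrupt_requested(self):
        return self.in_ and self.ie
```

For example, if `TimerModel(tcv=2, trv=3)` is ticked eleven times, it reloads on cycles 3, 7, and 11. Nothing clears IN in between, so OV becomes set the second time TCV reaches zero.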

### 7.3.6.2 TIMER FACILITY INITIALIZATION

To initialize the Timer Facility, the following steps should be taken in the specified order (it is assumed that Timer interrupts are disabled by the DA bit of the Current Processor Status Register during the following steps):

1. Set the TCV field with the desired interval count for the first timing interval. Note that this interval must be sufficiently large to allow the execution of the next step before the TCV field decrements to zero (this is normally the case).

2. Set the TRV field with the desired interval count for the second timing interval. At the same time, reset the OV and IN bits, and set the IE bit as desired. Note that the second timing interval may be equal to the first timing interval.

### 7.3.6.3 HANDLING TIMER INTERRUPTS

The following is a suggested list of actions to be taken to handle a Timer interrupt:

1. Read the Timer Reload register into a general-purpose register.

2. Reset the IN bit in the general-purpose register.

3. Set the TRV field in the general-purpose register to the desired value for the next timing interval. Note that, at this time, the Timer Counter is timing the current interval. Also, this step may be omitted if all intervals are equal.

4. Write the contents of the general-purpose register back into the Timer Reload register.

5. Test the general-purpose-register copy of the OV bit, and if it is set, report the error as appropriate.

6. Perform any system operations required for the Timer interrupt.

7. Execute an interrupt return.
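The initialization and handler steps above reduce to a short sequence of read-modify-write operations. In this sketch, `read_tmr`, `write_tmr`, and `write_tcv` are hypothetical callbacks that stand in for the move-from and move-to special register instructions. `TimerReload` names the register fields but does not model their bit positions. As in the listed steps, only IN is reset; the saved copy of OV is tested afterward.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TimerReload:
    """Fields of the Timer Reload Register as named in the text (layout not modeled)."""
    trv: int
    ie: bool
    in_: bool = False
    ov: bool = False


def initialize(write_tcv, write_tmr, first_interval, second_interval, ie=True):
    """Initialization steps 1 and 2 (Timer interrupts assumed disabled meanwhile)."""
    write_tcv(first_interval)                           # step 1: first interval
    write_tmr(TimerReload(trv=second_interval, ie=ie))  # step 2: OV and IN reset


def handle_timer_interrupt(read_tmr, write_tmr, next_interval=None, on_overflow=None):
    """Handler steps 1 through 5; steps 6 and 7 are left to the caller."""
    reg = read_tmr()                          # 1. copy into a "general-purpose register"
    reg = replace(reg, in_=False)             # 2. reset IN in the copy
    if next_interval is not None:             # 3. optionally set the next interval
        reg = replace(reg, trv=next_interval)
    write_tmr(reg)                            # 4. write the copy back
    if reg.ov and on_overflow is not None:    # 5. test the copy of OV
        on_overflow()
    return reg
```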

### 7.3.6.4 TIMER FACILITY USES

Since the Timer Facility has a resolution of a single processor cycle, it may be used to perform precise timing of system events. For example, it may be used to determine an exact measurement of the number of cycles between two events in the system, or to perform precise time-critical control functions. Note that the Timer interrupt is enabled and disabled separately from other processor interrupts, so that its priority can be separately specified.
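Because the counter reloads automatically, the time between two reads of TCV can be computed modulo the counting period. This sketch relies on three assumptions:

- TCV is reloaded from TRV one cycle after reaching zero, giving a period of TRV + 1 cycles.
- TRV does not change between the two reads.
- The two reads are less than one period apart.

The function name is hypothetical, and the sketch does not subtract the fixed cost of the reading instructions themselves.

```python
def elapsed_cycles(tcv_start, tcv_end, trv):
    """Cycles between two reads of a down-counting, auto-reloading Timer Counter."""
    period = trv + 1  # TRV, TRV-1, ..., 1, 0, then reload to TRV
    # Python's % always returns a non-negative result, which handles a reload
    # occurring between the two reads.
    return (tcv_start - tcv_end) % period
```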

The Timer Facility can be used to generate time intervals for collecting virtual page usage information (see Section 7.3.3). For example, if memory management relies on a working-set page-replacement algorithm, the Timer Facility can establish the working-set window.

The Timer Facility can be shared among multiple processes. This sharing is accomplished by the implementation of a queue for timer events, which are sorted in order of increasing event time. On each occurrence of a Timer interrupt, the TRV field is set for the interval between the next two events in the queue, while the Timer Counter Register is counting the current interval (because of a previous setting of the TRV field). The event at the beginning of the queue identifies other system actions to be taken for the Timer interrupt. This event is removed from the queue after the appropriate actions are taken.
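The shared-timer scheme can be sketched as a sorted queue. On each Timer interrupt, the handler takes the event at the head of the queue. It also computes the interval between the next two events, which it would write to the TRV field while the Timer Counter times the interval that is already loaded. The class and its methods are hypothetical. The sketch does not convert intervals into exact TRV encodings, and it does not handle an event inserted earlier than those already programmed into the timer.

```python
import bisect


class TimerEventQueue:
    """Sketch of one timer shared by several processes via a sorted event queue."""

    def __init__(self):
        self._events = []  # sorted (time, sequence, action) tuples
        self._seq = 0      # tie-breaker so equal times keep insertion order

    def add(self, time, action):
        bisect.insort(self._events, (time, self._seq, action))
        self._seq += 1

    def on_timer_interrupt(self):
        """Return (action, next_trv) for the event that just expired.

        next_trv is the interval between the two events that follow the head,
        or None if fewer than two events remain after it.
        """
        next_trv = None
        if len(self._events) >= 3:
            next_trv = self._events[2][0] - self._events[1][0]
        _, _, action = self._events.pop(0)  # the head event is removed last
        return action, next_trv
```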

## 7.4 PIPELINE FEATURES EXPOSED TO SOFTWARE

In certain cases, the Am29050 microprocessor pipeline is exposed during instruction execution, in that the execution of certain instructions depends on the execution of previous instructions. This section discusses the cases where the pipeline is exposed to software, and the resulting effect on instruction execution.

### 7.4.1 Delayed Branch

The effect of jump and call instructions is delayed by one cycle to allow the processor pipeline to achieve maximum throughput. When one of these branches is successful, the instruction immediately following the jump or call is executed before the target instruction of the jump or call is executed. Jump and call instructions collectively are referred to as delayed branches, and the immediately following instruction is called the delay instruction.

For example, in the following code fragment:

```
        cpeq    gr96, lr6, lr7          ; (1)
        jmpf    gr96, label             ; (2)
        sub     lr6, lr6, 1             ; (3)
        const   lr6, 0                  ; (4)

label:  call    lr0, sort               ; (5)
        add     lr2, lr5, 0             ; (6)
        cpneq   lr3, gr96, 0            ; (7)
```

The SUB instruction (3) is executed regardless of the outcome of the JMPF instruction (2). Of course, if the JMPF is not successful, the CONST instruction (4) is also executed. If the JMPF is successful, then the instruction sequence is: (3), (5), (6), and then the first instruction of the SORT procedure. Note that the CALL instruction (5) is also a delayed branch, so the instruction immediately following it, (6), is always executed. After the SORT procedure executes the return sequence, the CPNEQ instruction (7) is the next instruction executed.
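The execution order in this example follows from a simple two-register model. The processor keeps the address of the current instruction and of the next one, and a taken jump or call replaces only the next address, so the delay instruction still executes. The Python sketch below applies that model to the fragment above. The list encoding, the `taken` callback, and the placement of SORT's first instruction at index 7 are assumptions made only for illustration.

```python
def trace(program, taken, start=0, limit=20):
    """Return the names of executed instructions with one-cycle delayed branches.

    program: list of (name, target) pairs, where target is the index of the
             jump or call target, or None for a non-branch instruction.
    taken:   function(name) -> bool telling whether that branch is taken.
    """
    pc, npc = start, start + 1   # current and next instruction indices
    executed = []
    while 0 <= pc < len(program) and len(executed) < limit:
        name, target = program[pc]
        executed.append(name)
        if target is not None and taken(name):
            new_npc = target     # a taken branch redirects only the next-PC...
        else:
            new_npc = npc + 1
        pc, npc = npc, new_npc   # ...so the delay instruction still executes
    return executed


# The fragment above; SORT's first instruction is placed at index 7 for the model.
PROGRAM = [
    ("cpeq", None),   # (1)
    ("jmpf", 4),      # (2) branches to label
    ("sub", None),    # (3) delay instruction of the JMPF
    ("const", None),  # (4)
    ("call", 7),      # (5) label: branches to sort
    ("add", None),    # (6) delay instruction of the CALL
    ("cpneq", None),  # (7) executed after SORT returns (return not modeled)
    ("sort", None),   # first instruction of SORT
]
```

With both branches taken, `trace(PROGRAM, lambda name: True)` returns `['cpeq', 'jmpf', 'sub', 'call', 'add', 'sort']`. If only the CALL is taken, `const` appears after `sub`.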

The benefit of delayed branches is improved performance and a simpler processor implementation. Performance is improved because the processor pipeline executes useful instructions in a larger fraction of cycles than an implementation without delayed branches.

For example, ignoring all other effects on performance, and assuming that 15% of all instructions are branches, then a processor without delayed branches would take at least two cycles for 15% of its instructions, leading to 0.85(1) + 0.15(2) = 1.15 cycles per instruction, on average. This represents a 15% performance degradation compared to a processor with delayed branches (assuming, for this simple example, that the delay instruction is always useful).

The cost of having delayed branches is either the extra effort required when the compiler takes advantage of delayed branches (by re-organizing code), or the extra NO-OP instruction which the compiler inserts after every branch to guarantee correct program operation. Since the compiler expends only a small amount of effort to avoid wasting time and space with NO-OPs, and since the performance improvement resulting from this effort is significant, delayed branches are beneficial overall.
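The arithmetic above extends to the NO-OP cost discussed here. This simplified model assumes every instruction takes one cycle and ignores other effects. Under those assumptions, a processor without delayed branches spends one extra cycle per branch. A processor with delayed branches wastes a cycle only for each delay slot the compiler fills with a NO-OP. The `fill_rate` parameter is a hypothetical knob not taken from the text: it is the fraction of delay slots filled with useful instructions.

```python
def cycles_per_useful_instruction(branch_fraction, delayed=True, fill_rate=1.0):
    """Average cycles per useful instruction under the simple model in the text.

    branch_fraction: fraction of instructions that are branches (0.15 in the text).
    fill_rate:       hypothetical fraction of delay slots holding useful work;
                     the remaining slots hold NO-OPs that each waste one cycle.
    """
    if not delayed:
        # Each branch takes two cycles: 0.85(1) + 0.15(2) = 1.15 for the example.
        return (1 - branch_fraction) * 1 + branch_fraction * 2
    return 1 + branch_fraction * (1 - fill_rate)
```

With a branch fraction of 0.15, a fill rate of 1.0 gives 1.0 cycles per useful instruction. A fill rate of 0.0 gives 1.15, the same as having no delayed branches at all.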

When two immediately adjacent branches are both taken, the target instruction of the first branch is executed in place of the delay instruction of the second branch, and the target of the second branch then follows the target of the first branch. For example, consider the following code fragment:

```
        jmp     L1                      ; (1)
        jmp     L2                      ; (2)
        add     lr4, lr4, lr5           ; (3)
        ...
L1:     sub     gr96, gr96, 1           ; (4)
        subc    gr97, gr97, 0           ; (5)
        ...
L2:     const   gr100, 0xff0f           ; (6)
        subr    gr101, gr101, 1         ; (7)
        or      gr100, gr100, gr101     ; (8)
```

An unconditional JMP instruction (1) is followed immediately by another unconditional JMP instruction (2). (Unconditional JMPs are used in this example, but any two immediately adjacent taken branches behave the same way.) The sequence of executed instructions in this case is: JMP instruction (1), JMP instruction (2), SUB instruction (4), CONST instruction (6), SUBR instruction (7), OR instruction (8), and so on. Note that the ADD instruction (3) is not executed. Also, the target of the first JMP instruction (1) is only visited: a single instruction at L1 is executed, and control then continues from L2 rather than sequentially from L1.

### 7.4.2 Overlapped Operations

The Am29050 microprocessor overlaps external data references with other operations, and typically performs floating-point operations in parallel with integer operations and with other floating-point operations. Certain programming practices are necessary to exploit this parallelism to improve program performance.

### 7.4.2.1 EXTERNAL ACCESS

In order to make full use of overlapped storage accesses, some instruction reorganization may be necessary. For example, consider the following loop:

```
loop:   sll     gr121, gr119, 2         ; (1)
        add     gr121, gr120, gr121     ; (2)
        load    0, 0, gr121, gr121      ; (3)
        add     gr96, gr96, gr121       ; (4)
        sub     gr96, gr96, 3           ; (5)
        add     gr119, gr119, 1         ; (6)
        cplt    gr122, gr119, lr2       ; (7)
        jmpt    gr122, loop             ; (8)
        nop                             ; (9)
```

The ADD instruction (4) uses the result of the LOAD instruction (3), but the following four instructions do not depend on the result of the LOAD. Therefore, the ADD instruction (4) can be moved past the JMPT (8), since it will always be executed even if the JMPT is taken, and can replace the NO-OP instruction (9). The resulting sequence is:

```
loop:   sll     gr121, gr119, 2         ; (1)
        add     gr121, gr120, gr121     ; (2)
        load    0, 0, gr121, gr121      ; (3)
        sub     gr96, gr96, 3           ; (4)
        add     gr119, gr119, 1         ; (5)
        cplt    gr122, gr119, lr2       ; (6)
        jmpt    gr122, loop             ; (7)
        add     gr96, gr96, gr121       ; (8)
```

The instructions (4) through (7) are likely to be executed while external memory satisfies the load request, resulting in improved throughput. The processor thus allows parallelism to be exploited by instruction reordering.

The overlapped load feature can improve processor performance, but, unlike delayed branches, it imposes no constraints on instruction sequences. The processor implements the pipeline interlocks needed to make this parallelism transparent to a running program.

### FLOATING-POINT UNIT OPERATION

Programs that use floating-point instructions can also benefit from instruction scheduling. Each of the individual floating-point pipelines (Adder, Multiplier, Divider/Square-Root Unit) can operate in parallel with integer instructions and external accesses, and with each other. Parallel execution is possible as long as subsequent instructions do not need the results of parallel floating-point operations. For example, consider the following code sequence:

```
; a = b + c * d - e / f;
; g = *p + (i << 2);
```

<table>
<thead>
<tr>
<th>INST</th>
<th>OPERANDS</th>
<th>START ON CYCLE</th>
</tr>
</thead>
<tbody>
<tr>
<td>fmul</td>
<td>t1, c, d</td>
<td>1</td>
</tr>
<tr>
<td>fadd</td>
<td>t1, b, t1</td>
<td>4</td>
</tr>
<tr>
<td>fdiv</td>
<td>t2, e, f</td>
<td>5</td>
</tr>
<tr>
<td>fsub</td>
<td>a, t1, t2</td>
<td>16</td>
</tr>
<tr>
<td>load</td>
<td>0, 0, t1, p</td>
<td>17</td>
</tr>
<tr>
<td>sll</td>
<td>t2, i, 2</td>
<td>18</td>
</tr>
<tr>
<td>add</td>
<td>g, t1, t2</td>
<td>19</td>
</tr>
</tbody>
</table>

The two program statements are independent, so they can be rearranged to take better advantage of the parallelism in the Floating-Point Unit:

<table>
<thead>
<tr>
<th>INST</th>
<th>OPERANDS</th>
<th>START ON CYCLE</th>
</tr>
</thead>
<tbody>
<tr>
<td>fdiv</td>
<td>t1, e, f</td>
<td>1</td>
</tr>
<tr>
<td>fmul</td>
<td>t2, c, d</td>
<td>2</td>
</tr>
<tr>
<td>load</td>
<td>0, 0, t3, p</td>
<td>3</td>
</tr>
<tr>
<td>sll</td>
<td>t4, i, 2</td>
<td>4</td>
</tr>
<tr>
<td>fadd</td>
<td>t2, b, t2</td>
<td>5</td>
</tr>
<tr>
<td>add</td>
<td>g, t3, t4</td>
<td>6</td>
</tr>
<tr>
<td>fsub</td>
<td>a, t2, t1</td>
<td>11</td>
</tr>
</tbody>
</table>
Note that the scheduled version of the code fragment uses more temporary registers (tn) to hold the results of parallel computations. The large register file of the Am29050 microprocessor facilitates this kind of code scheduling.

### 7.4.3 Delayed Effects of Registers

The modification of some registers has a delayed effect on processor behavior, because of the processor pipeline. The affected registers are the Stack Pointer (Global Register 1), Indirect Pointers A, B, and C, the MMU Configuration Register, and the Current Processor Status Register.

An instruction that writes to the Stack Pointer can be followed immediately by an instruction that reads the Stack Pointer. However, any instruction that references a local register also uses the value of the Stack Pointer to calculate an absolute-register number. At least one cycle of delay must separate an instruction that updates the Stack Pointer and an instruction that references a local register. In most systems, this affects procedure call and return only (see Section 7.1.2). In general, though, an instruction that immediately follows a change to the Stack Pointer should not reference a local register (however, note that this restriction does not apply to a reference of a local register via an indirect pointer).

The indirect pointers have an implementation similar to the Stack Pointer, and exhibit similar behavior. At least one cycle of delay must separate an instruction that modifies an indirect pointer and an instruction that uses that indirect pointer to access a register.
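An assembler or compiler pass could check the two adjacency rules mechanically. This Python sketch uses a hypothetical instruction encoding: a dict of the registers named directly, the registers written, and the indirect pointers used. It flags two cases:

- An instruction that names a local register directly, immediately after an instruction that writes the Stack Pointer.
- An instruction that uses an indirect pointer written by the immediately preceding instruction.

Following the exception noted above, the sketch does not flag local-register accesses through an indirect pointer after a Stack Pointer change.

```python
INDIRECT_POINTERS = {"ipa", "ipb", "ipc"}


def find_pointer_hazards(program):
    """Return indices i where program[i] must not immediately follow program[i-1].

    Each element is a hypothetical dict with these keys:
      "regs":   registers named directly by the instruction (dest and sources),
      "writes": registers the instruction modifies (gr1, ipa, ipb, ipc, ...),
      "via":    indirect pointers the instruction uses to access a register.
    """
    hazards = []
    for i in range(1, len(program)):
        prev, cur = program[i - 1], program[i]
        # Rule 1: a Stack Pointer (gr1) write, then a direct local-register reference.
        if "gr1" in prev["writes"] and any(r.startswith("lr") for r in cur["regs"]):
            hazards.append(i)
        # Rule 2: an indirect-pointer write, then an access through that pointer.
        elif prev["writes"] & cur["via"] & INDIRECT_POINTERS:
            hazards.append(i)
    return hazards
```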

Note that it normally is not possible to guarantee that the delayed effect of the Stack Pointer and indirect pointers is visible to a program. If an interrupt or trap is taken immediately after one of these registers is set, then the interrupted routine sees the effect of the setting in the following instruction, because many cycles elapse between the two instructions. For this reason, a program should not be written in a manner that relies on the delayed effect; the results of this practice may be unpredictable.

At least one cycle of delay must separate a Move To Special Register that modifies the Page Size (PS) field of the MMU Configuration Register and an instruction that performs address translation. The latter instruction includes successful branches, loads, and stores.

If the Freeze (FZ) bit of the Current Processor Status Register is reset from 1 to 0, two cycles are required before all program state is reflected properly in the registers affected by the FZ bit. This implies that interrupts and traps cannot be enabled until two cycles after the FZ bit is reset, for proper sequencing of program state.

This chapter provides a specification of the Am29050 microprocessor instruction set. Sections 8.1 through 8.3 describe the terminology used, the setting of the ALU Status Register by instructions, and the instruction formats. Section 8.4 describes each instruction in detail; instructions are presented alphabetically by assembler mnemonic. Finally, Section 8.5 gives an index of instructions by operation code.

## 8.1 INSTRUCTION-DESCRIPTION NOMENCLATURE

To simplify the specification of the instruction set, special terminology is used throughout this chapter. This section defines the terminology and symbols used to describe instruction operands, operations, and the assembly-language syntax.

This section does not describe all terminology used. It excludes certain descriptive terms that have an obvious meaning.

### 8.1.1 Operand Notation and Symbols

Throughout this chapter, instruction operands are signed, two’s-complement, word integers, unless otherwise noted. The term register is used consistently to denote a general-purpose register; other types of registers are described explicitly.

The following notation is used in the description of instruction operands:

- **0I16**: 16-bit immediate data, zero-extended to 32 bits.
- **1I16**: 16-bit immediate data, one-extended to 32 bits.
- **BP**: The Byte Pointer (BP) field of the ALU Status Register. The BP field selects a byte or half-word within a word, and is interpreted according to the Byte Order bit of the Configuration Register.
- **C**: The Carry (C) bit of the ALU Status Register. The C bit is logically zero-extended to 32 bits when it is involved in a word operation.
- **COUNT**: The value of the Count Remaining field of the Channel Control Register. Note that COUNT does not refer to this field directly, but rather to the value of the field at the beginning of a LOADM or STOREM instruction.
- **DEST**: The general-purpose register that is the destination of an instruction (i.e., the register used to store the result).
- **EXTERNAL WORD[n]**: The word in an external device or memory with address n. This terminology also is used for coprocessor words, except that the address n either has no pre-defined interpretation or is a data item transferred to the coprocessor.
- **FALSE**: The Boolean constant FALSE.
- **FC**: The Funnel Shift Count (FC) field of the ALU Status Register.
- **h’n’**: The hexadecimal constant n.
- **I16**: 16-bit immediate data.
- **IPA**: Indirect Pointer A Register.
- **IPB**: Indirect Pointer B Register.
- **IPC**: Indirect Pointer C Register.
- **PC**: The Program Counter Register. This register is not explicitly accessible by instructions, but does appear as an operand for certain instructions. The Program Counter always contains the word address of the instruction being executed, and is 30 bits in length.
- **Q**: The Q Register.
- **Register RA, Register RB, Register RC**: These designate the general-purpose registers specified by the instruction fields RA, RB, and RC (see Section 8.3).
- **SPDEST**: The special-purpose register that is the destination of an instruction.
- **SPECIAL**: The content of a special-purpose register, used as an instruction operand.
- **Special-purpose Register SA**: Designates the special-purpose register specified by the instruction field SA (see Section 8.3).
- **SRCA, SRCB**: The contents of general-purpose registers, used as instruction operands.
- **SRCA.BYTEn, SRCB.BYTEn**: Designate the byte numbered n within the SRCA or SRCB operand.
- **TARGET**: The target-instruction address specified by a jump or call instruction. This address is either absolute or Program-Counter relative.
- **TLB[n]**: The Translation Look-Aside Buffer Register with register number n.
- **TRUE**: The Boolean constant TRUE.
- **TWIN**: General-purpose registers are paired by absolute-register number, such that even-numbered registers are paired with odd-numbered registers having the next-highest register number. The twin of a given register is the other register in the pair to which the given register belongs. For example, Local Register 5 is the twin of Local Register 4, and vice versa.
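The zero- and one-extension notations and the TWIN pairing each reduce to a simple bit operation. The helper names below are hypothetical. Because Python integers are unbounded, the helpers mask results to the intended widths explicitly.

```python
def zero_extend_16(i16):
    """0I16: 16-bit immediate data zero-extended to 32 bits."""
    return i16 & 0xFFFF


def one_extend_16(i16):
    """1I16: 16-bit immediate data one-extended (upper 16 bits set) to 32 bits."""
    return 0xFFFF_0000 | (i16 & 0xFFFF)


def twin(absolute_register_number):
    """TWIN: an even register pairs with the next odd one, so flip bit 0."""
    return absolute_register_number ^ 1
```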

### 8.1.2 Operator Symbols
The following symbols are used to describe instruction operations:
A « B Left shift of the A operand by the shift amount given by the B operand.
A >> B Right shift of the A operand by the shift amount given by the B operand.
A // B Concatenation. The B operand is appended to the A operand. In the resulting quantity, the A operand makes up the high-order part, and the B operand makes up the low-order part.
<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>A &amp; B</td>
<td>Bitwise AND.</td>
</tr>
<tr>
<td>A | B</td>
<td>Bitwise OR.</td>
</tr>
<tr>
<td>A ^ B</td>
<td>Bitwise exclusive-OR.</td>
</tr>
<tr>
<td>~ A</td>
<td>One's-complement.</td>
</tr>
<tr>
<td>A ← exp</td>
<td>Assignment of the result of the expression on the right side to the A location.</td>
</tr>
<tr>
<td>A = B</td>
<td>Equal to.</td>
</tr>
<tr>
<td>A &lt;&gt; B</td>
<td>Not equal to.</td>
</tr>
<tr>
<td>A &gt; B</td>
<td>Greater than.</td>
</tr>
<tr>
<td>A ≥ B</td>
<td>Greater than or equal to.</td>
</tr>
<tr>
<td>A &lt; B</td>
<td>Less than.</td>
</tr>
<tr>
<td>A ≤ B</td>
<td>Less than or equal to.</td>
</tr>
<tr>
<td>A + B</td>
<td>Addition.</td>
</tr>
<tr>
<td>A - B</td>
<td>Subtraction.</td>
</tr>
<tr>
<td>A * B</td>
<td>Multiplication.</td>
</tr>
<tr>
<td>A / B</td>
<td>Division.</td>
</tr>
<tr>
<td>A .. B</td>
<td>A subrange which includes the A operand and the B operand. This symbol is used for subranges of bits as well as subranges of words.</td>
</tr>
<tr>
<td>A OR B</td>
<td>Logical OR of two Boolean conditions.</td>
</tr>
</tbody>
</table>

### 8.1.3 Control-Flow Terminology

The following terminology is used to describe the control functions performed during the execution of various instructions:

- **Continue**: Continue execution of the current instruction sequence.
- **IF condition THEN operations ELSE operations**: The condition following the IF is tested. If the condition holds, the operations following the THEN are performed. If the condition does not hold, the operations following the ELSE are performed. If the ELSE is not present and the condition does not hold, no operation is performed.
- **Signed overflow**: This condition is present when the result of an add or subtract of two's-complement operands cannot be represented by a signed word integer.
- **Trap(n)**: Specifies a trap with vector number n. The vector number n may be specified indirectly (e.g., Trap (VN)) or explicitly by symbolic name (e.g., Trap (Out of Range)).
- **Unsigned overflow**: This condition is present when the result of an add of unsigned operands cannot be represented by an unsigned word integer.
- **Unsigned underflow**: This condition is present when the result of a subtract of unsigned operands cannot be represented by an unsigned integer (i.e., when the result is less than zero).
- **VN**: Designates the trap vector number specified by the instruction field VN (see Section 8.3).
### 8.1.4 Assembler Syntax

This chapter does not contain a full description of the instruction assembler, but provides a rudimentary description of the assembler syntax. The following notation is used to describe assembler tokens:

- **ce** Determines the Coprocessor Enable (CE) bit of a load or store instruction.
- **cntl** Determines the 7-bit control field in a load or store instruction.
- **const8** Specifies a constant that can be expressed by 8 bits.
- **const16** Specifies a constant that can be expressed by 16 bits.
- **ra** These tokens name general-purpose registers. In a formal sense, these represent the same token, since the name of a register does not depend on its instruction use. However, three distinct tokens are used to clarify the relationship between the assembler syntax, instruction operands, and instruction fields.
- **rb**
- **rc**
- **spid** A symbolic identifier for a special-purpose register.
- **target** A symbolic label for the target of a jump or call instruction.
- **vn** Specifies a trap vector number.

### 8.2 **ARITHMETIC/LOGIC STATUS RESULTS OF INSTRUCTIONS**

#### 8.2.1 Arithmetic/Logic Status Bits

The arithmetic/logic status bits of the ALU Status Register are:

- **V** Overflow
- **N** Negative
- **Z** Zero
- **C** Carry

The C bit is used in extended arithmetic operations (i.e., on operands greater than 32 bits in length), and the N bit is used in divide step operations. Other than these uses, the status bits are not involved in instruction operations. In particular, they are not used to determine the outcome of conditional jump instructions; Boolean values in registers are used instead for this purpose. The status bits are primarily informational.

Except for instructions that explicitly modify the ALU Status Register, the status bits are modified only by the execution of instructions in the Arithmetic and Logical classes. The Arithmetic and Logical instructions affect the status bits differently. The following two sections describe the setting of the status bits by Arithmetic and Logical instructions.

When the Freeze (FZ) bit of the Current Processor Status Register is 1, the ALU Status Register is not modified except by the Move To Special Register instruction.

#### 8.2.2 Arithmetic Operation Status Results

The Arithmetic instructions modify the V, N, Z, and C bits. These bits are set according to the result of the operation performed by the instruction.

All instructions in the Arithmetic class—except for MULTIPLY, MULTM, DIVIDE, MULTIPLU, MULTMU, and DIVIDU—perform an add. In the case of subtraction, the subtract is performed by adding the two's-complement or one's-complement of an operand to the other operand. The multiply step and divide step operations also perform adds, again possibly complementing one of the operands before the operation is performed. In general, the status bits are based on the results of the add.

If two's-complement overflow occurs during the add, the V bit of the ALU Status Register is set; otherwise it is reset. Two's-complement overflow occurs when the carry-in to the most-significant bit of the intermediate result differs from the carry-out. When this occurs, the result cannot be represented by a signed word integer. Note that the V bit always is set in this manner, even when the result is unsigned.

The N bit of the ALU Status Register is set to the value of the most-significant bit of the result of the add. Note that the divide step and multiply step operations may shift the result after the operation is performed. In the cases where shifting occurs, the N bit may not agree with the result that is written into a general-purpose register, since the N bit is based only on the result of the add, not on the shift.

If the result of the add causes a zero word to be written to a general-purpose register, the Z bit of the ALU Status Register is set; otherwise, it is reset. The Z bit always reflects the result written into a general-purpose register; if shifting is performed by a multiply or divide step, the Z bit reflects the shifted value.

If there is a carry out of the add operation, the C bit is set; otherwise it is reset.
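The four rules above can be sketched in C for a plain 32-bit add (no multiply-step or divide-step shifting, so N and Z both come from the add result). The `alu_status` struct and `add_with_status` function are hypothetical illustrations, not part of any AMD tool; a subtract is modelled, as the text describes, by adding the one's-complement of the second operand with a carry-in of 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ALU status bits produced by a 32-bit add. */
typedef struct {
    bool v; /* two's-complement overflow                */
    bool n; /* most-significant bit of the add result   */
    bool z; /* result word is zero                      */
    bool c; /* carry out of bit 31                      */
} alu_status;

/* Compute a + b + carry_in and report V, N, Z, C as described above.
 * A subtract a - b can be modelled as add_with_status(a, ~b, 1, &st). */
static uint32_t add_with_status(uint32_t a, uint32_t b, uint32_t carry_in,
                                alu_status *st)
{
    uint64_t wide = (uint64_t)a + b + (carry_in & 1u);
    uint32_t result = (uint32_t)wide;

    st->c = (wide >> 32) != 0;
    st->n = (result >> 31) != 0;
    st->z = result == 0;
    /* Overflow: both operands share a sign that the result does not have.
     * This is equivalent to the carry into bit 31 differing from the
     * carry out of bit 31. */
    st->v = ((~(a ^ b) & (a ^ result)) >> 31) != 0;
    return result;
}
```

With this model, `0x7FFFFFFF + 1` sets V and N but not C, while `0xFFFFFFFF + 1` sets Z and C but not V.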

##### 8.2.2.1 CORRECTING OUT-OF-RANGE RESULTS

Some Arithmetic instructions cause an Out of Range trap if the arithmetic operation causes an overflow or underflow. When an Out of Range trap occurs, the result of the operation—though incorrect—is written into the destination register. Furthermore, the Program Counter 2 Register contains the address of the trapping instruction, and the ALU Status Register contains an indication of the cause of the trap. It is possible, if required, for the trap handler to use this information to form the correct result.

The ALU Status indicates the cause of the Out of Range trap, based on the operation performed, as follows:

1. Signed overflow. If the Out of Range trap is caused by signed, two's-complement overflow (this can occur for both signed adds and subtracts), the V bit is 1.

2. Unsigned overflow. If the Out of Range trap is caused by unsigned overflow (this can occur only for unsigned adds), the C bit is 1.

3. Unsigned underflow. If the Out of Range trap is caused by unsigned underflow (this can occur only for unsigned subtracts), the C bit is 0.

The multiply instructions MULTIPLY and MULTIPLU can cause an Out of Range trap if the MO bit of the Integer Environment Register is 0 and the operation overflows. However, these instructions do not set the ALU Status Register. This exception is detected using the Exception Opcode Register.
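Following the three rules above, a trap handler could rebuild the exact result of a trapping add or subtract. A minimal C sketch, assuming the handler has already decoded the instruction at the address in the Program Counter 2 Register into one of three hypothetical categories (`range_op`); the names are illustrative, not an AMD-supplied API. Multiply overflows are not covered, since MULTIPLY and MULTIPLU do not set the ALU Status Register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the trapping instruction. */
typedef enum { SIGNED_ADD_OR_SUB, UNSIGNED_ADD, UNSIGNED_SUB } range_op;

/* Reconstruct the exact (64-bit) result from the incorrect 32-bit value
 * left in DEST and the V and C bits. Returns false if the status bits do
 * not match the cause expected for this kind of instruction. */
static bool correct_out_of_range(range_op op, uint32_t dest, bool v, bool c,
                                 int64_t *exact)
{
    const int64_t two32 = INT64_C(1) << 32;

    switch (op) {
    case SIGNED_ADD_OR_SUB:      /* rule 1: V = 1 */
        if (!v)
            return false;
        /* The wrapped result has the wrong sign: a negative-looking value
         * came from positive overflow, and a non-negative one from
         * negative overflow. */
        *exact = (int64_t)(int32_t)dest + ((dest >> 31) ? two32 : -two32);
        return true;
    case UNSIGNED_ADD:           /* rule 2: C = 1 */
        if (!c)
            return false;
        *exact = (int64_t)dest + two32;
        return true;
    case UNSIGNED_SUB:           /* rule 3: C = 0 */
        if (c)
            return false;
        *exact = (int64_t)dest - two32;
        return true;
    }
    return false;
}
```

For instance, an ADDU that wrapped to DEST = 1 with C = 1 corresponds to an exact sum of 4,294,967,297.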

#### 8.2.3 Logical Operation Status Results

The Logical instructions modify the N and Z bits. These bits are set according to the result of the instruction. The V and C bits are meaningless in regard to the logical instructions, so they are not modified.

The N bit of the ALU Status Register is set to the value of the most-significant bit of the result of the logical operation.

If the result of the logical operation is a zero word, the Z bit of the ALU Status Register is set; otherwise, it is reset.
#### 8.2.4 Floating-Point Status

The floating-point instructions check for a number of exceptional conditions, and report these exceptions by setting bits of the Floating-Point Status Register (see Section 3.2.3). The exceptional conditions also may cause traps, depending on the state of mask bits in the Floating-Point Environment Register. There are two groups of status bits in the Floating-Point Status Register: trap status bits and sticky status bits. When an exception is detected, the Am29050 microprocessor sets the trap status bit and/or the sticky status bit associated with the exception, depending on the corresponding exception mask bit and on whether or not a trap occurs. The sticky status bit is set whenever the corresponding exception is masked, regardless of whether or not a trap occurs. A trap status bit is set whenever a trap occurs, regardless of the state of the corresponding mask bit.

A trap status bit is reset when a trap occurs and the indicated status does not apply to the trapping operation. A sticky status bit is reset only by software.
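One reading of the trap and sticky rules above, sketched in C. It assumes a 1 in the mask means the exception is masked (does not trap) and that trap status bits are left unchanged when no trap occurs. It also uses a hypothetical bit assignment rather than the register layouts of Section 3.2.3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignment for the six exceptions (not the real
 * Floating-Point Status Register layout). */
enum {
    FP_R = 1u << 0, /* reserved operand  */
    FP_N = 1u << 1, /* invalid operation */
    FP_D = 1u << 2, /* divide by zero    */
    FP_V = 1u << 3, /* overflow          */
    FP_U = 1u << 4, /* underflow         */
    FP_X = 1u << 5  /* inexact result    */
};

typedef struct {
    uint32_t trap_status;   /* RT, NT, DT, VT, UT, XT */
    uint32_t sticky_status; /* RS, NS, DS, VS, US, XS */
} fp_status;

/* Apply the status rules for one operation. `detected` holds the exceptions
 * found by the operation; a 1 in `masked` means that exception is masked.
 * Returns true if a trap occurs. */
static bool fp_update_status(fp_status *st, uint32_t detected, uint32_t masked)
{
    bool trap = (detected & ~masked) != 0;

    /* Sticky bits accumulate masked exceptions whether or not a trap
     * occurs; only software clears them. */
    st->sticky_status |= detected & masked;

    /* On a trap, each trap status bit is set if its exception applies to
     * the trapping operation and reset otherwise. */
    if (trap)
        st->trap_status = detected;

    return trap;
}
```

Under this model, a masked inexact result alone only sets the sticky bit, while an unmasked overflow accompanied by a masked inexact result traps and sets both VT and XT.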

Since a floating-point exception may affect either a trap status bit, a sticky status bit, or both, the description of status results for floating-point instructions in this section indicates the exceptions that may be detected, rather than which status bits are set. The following terminology is used:

- **fpD** Divide By Zero. The processor determines whether a divide operation has a zero divisor and a non-zero, finite dividend. If so, the DT and/or DS bits of the Floating-Point Status Register are set.
- **fpX** Inexact Result. If the result of the associated floating-point operation is not equal to the infinitely-precise result, the XT and/or XS bits of the Floating-Point Status Register are set.
- **fpU** Underflow. If the result of the associated floating-point operation is too small to be expressed in the destination format, the UT and/or US bits of the Floating-Point Status Register are set.
- **fpV** Overflow. If the result of the associated floating-point operation is too large to be expressed in the destination format, the VT and/or VS bits of the Floating-Point Status Register are set.
- **fpR** Reserved Operand. If one or more input operands to the associated floating-point operation are reserved values, or if the result of this floating-point operation is a reserved value, the RT and/or RS bits of the Floating-Point Status Register are set.
- **fpN** Invalid Operation. If the input operands to the associated floating-point operation produce an indeterminate result, the NT and/or NS bits of the Floating-Point Status Register are set.

### 8.3 INSTRUCTION FORMATS

All instructions for the Am29050 microprocessor are 32 bits in length, and are divided into four fields, as shown in Figure 8-1. These fields have several alternative definitions, as discussed below. In certain instructions, one or more fields are not used, and are reserved for future use. Even though they have no effect on processor operation, bits in reserved fields should be 0 to ensure compatibility with future processor versions.
The instruction fields are defined as follows:

**Bits 31–24**

- **OP**: This field contains an operation code, defining the operation to be performed. In some instructions, the least-significant bit of the operation code selects between two possible operands. For this reason, the least-significant bit is sometimes labeled A or M with the following interpretations:
  - **A** (Absolute): The A bit is used to differentiate between Program-Counter relative (A = 0) and absolute (A = 1) instruction addresses, when these addresses appear within instructions.
  - **M** (Immediate): The M bit selects between a register operand (M = 0) and an immediate operand (M = 1), when the alternative is allowed by an instruction.

**Bits 23–16**

- **RC**: The RC field contains a global or local register number.
- **I17 ... I10**: This field contains the most-significant eight bits of a 16-bit instruction address. This is a word address, and may be program-counter relative or absolute, depending on the A bit of the operation code.
- **I15 ... I8**: This field contains the most-significant eight bits of a 16-bit instruction constant.
- **VN**: This field contains an 8-bit trap vector number.
- **CE//CNTL**: This field controls a load or store access, as described in Sections 3.4.4 and 6.1.2.

**Bits 15–8**

- **RA**: The RA field contains a global or local register number.
- **SA**: The SA field contains a special-purpose register number.

**Bits 7–0**

- **RB**: The RB field contains a global or local register number.
- **RB or I**: This field contains either a global or local register number, or an 8-bit instruction constant, depending on the value of the M bit of the operation code.
- **I9 ... I2**: This field contains the least-significant eight bits of a 16-bit instruction address. This is a word address, and may be program-counter relative or absolute, depending on the A bit of the operation code.
- **I7 ... I0**: This field contains the least-significant eight bits of a 16-bit instruction constant.
- **CONVERT control field**: This field controls the operation of the CONVERT instruction.
- **FS**: This field is the FS portion of the CONVERT control field and specifies the operand format for the CLASS and SQRT instructions.
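A small C sketch of extracting the four 8-bit fields and the A/M bit from an instruction word, and of joining I15 ... I8 with I7 ... I0 into a 16-bit constant. The type and function names are hypothetical; which fields are meaningful depends on the opcode, and this sketch does not interpret register numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a 32-bit instruction word. */
typedef struct {
    uint8_t op;      /* bits 31-24: operation code                      */
    uint8_t f23_16;  /* RC, VN, CE//CNTL, I17..I10 or I15..I8           */
    uint8_t f15_8;   /* RA or SA                                        */
    uint8_t f7_0;    /* RB, I, I9..I2, I7..I0, or instruction control   */
    bool    low_bit; /* A or M: least-significant bit of OP             */
} insn_fields;

static insn_fields decode_fields(uint32_t word)
{
    insn_fields f;
    f.op      = (uint8_t)(word >> 24);
    f.f23_16  = (uint8_t)(word >> 16);
    f.f15_8   = (uint8_t)(word >> 8);
    f.f7_0    = (uint8_t)word;
    f.low_bit = (f.op & 1u) != 0;
    return f;
}

/* 16-bit instruction constant: I15..I8 (bits 23-16) // I7..I0 (bits 7-0). */
static uint16_t const16(insn_fields f)
{
    return (uint16_t)((f.f23_16 << 8) | f.f7_0);
}
```

For example, `decode_fields(0x15606110)` gives OP 0x15 (ADD with M = 1), RC 0x60, RA 0x61, and an 8-bit constant of 0x10.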

The fields described above may appear in many combinations. However, certain combinations that appear frequently are shown in Figure 8-2.

### Figure 8-2
Frequently Occurring Instruction Field Uses

#### Three operands, with possible 8-bit constant:

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XXXXXXX M</td>
<td>RC</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

#### Three operands, without constant:

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XXXXXXX 0</td>
<td>RC</td>
<td>RA</td>
<td>RB</td>
</tr>
</tbody>
</table>

#### One register operand, with 16-bit constant:

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XXXXXXX 1</td>
<td>I15 .. I8</td>
<td>RA</td>
<td>I7 .. I0</td>
</tr>
</tbody>
</table>

#### Jumps and calls with 16-bit instruction address:

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XXXXXXX A</td>
<td>I17 .. I10</td>
<td>RA</td>
<td>I9 .. I2</td>
</tr>
</tbody>
</table>

#### Two operands with trap vector number:

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XXXXXXX M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

#### Loads and stores:

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XXXXXXX M</td>
<td>CE//CNTL</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

### 8.4 INSTRUCTION DESCRIPTION

This section describes each Am29050 microprocessor instruction in detail. Figure 8-3 illustrates the layout of the information given for each description.

Figure 8-3 Instruction-Description Format

<table>
<thead>
<tr>
<th>Instruction Mnemonic</th>
<th>ADD</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction Name</td>
<td>Add</td>
</tr>
<tr>
<td>Brief Operation Description</td>
<td>DEST ← SRCA + SRCB</td>
</tr>
<tr>
<td>Assembler Syntax</td>
<td>ADD rc, ra, rb or ADD rc, ra, const8</td>
</tr>
<tr>
<td>Arithmetic/Logic Status Result</td>
<td>V, N, Z, C</td>
</tr>
<tr>
<td>Operands</td>
<td>SRCA, SRCB, DEST</td>
</tr>
<tr>
<td>Instruction Format (bits 31–24, 23–16, 15–8, 7–0)</td>
<td>0001010 M | RC | RA | RB or I</td>
</tr>
<tr>
<td>Operation Code (hex)</td>
<td>OP = 14, 15</td>
</tr>
<tr>
<td>Detailed Description of Instruction Operation</td>
<td>The SRCA operand is added to the SRCB operand, and the result is placed into the DEST location.</td>
</tr>
</tbody>
</table>
ADD

Add

Operation: DEST ← SRCA + SRCB

Assembler Syntax: ADD rc, ra, rb
or
ADD rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Description: The SRCA operand is added to the SRCB operand, and the result is placed into the DEST location.
**ADDC**

**Add with Carry**

**Operation:** \( \text{DEST} \leftarrow \text{SRCA} + \text{SRCB} + C \)

**Assembler Syntax:**
- ADDC rc, ra, rb
- ADDC rc, ra, const8

**Status:** V, N, Z, C

**Operands:**
- **SRCA** Content of register RA
- **SRCB**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST** Register RC

**Description:** The SRCA operand is added to the SRCB operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location.
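Because ADDC consumes the C bit produced by the previous add, a 64-bit sum can be formed from an ADD on the low-order words followed by an ADDC on the high-order words. A C model of that pairing, with a hypothetical `u64_pair` type holding the two halves; it assumes no other status-modifying instruction executes between the two adds:

```c
#include <stdint.h>

/* Hypothetical 64-bit value held as two 32-bit words. */
typedef struct { uint32_t hi, lo; } u64_pair;

/* Model of the ADD/ADDC sequence for extended-precision addition. */
static u64_pair add64(u64_pair x, u64_pair y)
{
    u64_pair r;
    uint64_t low = (uint64_t)x.lo + y.lo;   /* ADD: low words, C <- carry out */
    uint32_t carry = (uint32_t)(low >> 32); /* the ALU Status Carry bit       */

    r.lo = (uint32_t)low;
    r.hi = x.hi + y.hi + carry;             /* ADDC: high words plus C        */
    return r;
}
```

Adding 0x00000001_FFFFFFFF and 1 this way produces a carry out of the low word, so the result is 0x00000002_00000000.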
ADDCS

Add with Carry, Signed

Operation: DEST ← SRCA + SRCB + C,
IF signed overflow THEN Trap (Out of Range)

Assembler Syntax:
- ADDCS rc, ra, rb
- ADDCS rc, ra, const8

Status: V, N, Z, C

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0001100 M</td>
<td>RC</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

OP = 18, 19

Description: The SRCA operand is added to the SRCB operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location. If the add operation causes a two's-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow occurs.
ADDCU

Add with Carry, Unsigned

Operation:  DEST ← SRCA + SRCB + C,
            IF unsigned overflow THEN Trap (Out of Range)

Assembler Syntax: ADDCU rc, ra, rb
                 or
                 ADDCU rc, ra, const8

Status:  V, N, Z, C

Operands:  SRCA  Content of register RA
            SRCB  M = 0: Content of register RB
                  M = 1: I (Zero-extended to 32 bits)
            DEST  Register RC

Description:  The SRCA operand is added to the SRCB operand and the value of
              the ALU Status Carry bit, and the result is placed into the DEST
              location. If the add operation causes an unsigned overflow, an Out of
              Range trap occurs.

              Note that the DEST location is altered whether or not an overflow
              occurs.
**ADDS**

**Add, Signed**

**Operation:**
DEST ← SRCA + SRCB  
IF signed overflow THEN Trap (Out of Range)

**Assembler Syntax:**
ADDS rc, ra, rb  
or  
ADDS rc, ra, const8

**Status:**  
V, N, Z, C

**Operands:**
SRCA  
Content of register RA
SRCB  
M = 0: Content of register RB  
M = 1: I (Zero-extended to 32 bits)
DEST  
Register RC

**Description:**
The SRCA operand is added to the SRCB operand, and the result is placed into the DEST location. If the add operation causes a two's-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow occurs.
ADDU

Add, Unsigned

Operation: \( \text{DEST} \leftarrow \text{SRCA} + \text{SRCB} \)
IF unsigned overflow THEN Trap (Out of Range)

Assembler Syntax: ADDU rc, ra, rb
or
ADDU rc, ra, const8

Status: V, N, Z, C

Operands: SRCA Content of register RA
 SRCB \( M = 0 \): Content of register RB
 \( M = 1 \): I (Zero-extended to 32 bits)
 DEST Register RC

Description: The SRCA operand is added to the SRCB operand, and the result is placed into the DEST location. If the add operation causes an unsigned overflow, an Out of Range trap occurs.
Note that the DEST location is altered whether or not an overflow occurs.
AND

AND Logical

Operation: DEST ← SRCA & SRCB

Assembler Syntax: AND rc, ra, rb
or
AND rc, ra, const8

Status: N, Z

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: The SRCA operand is logically ANDed, bit-by-bit, with the SRCB operand, and the result is placed into the DEST location.
ANDN

Operation: DEST ← SRCA & ~SRCB
Assembler Syntax: ANDN rc, ra, rb
or
ANDN rc, ra, const8
Status: N, Z
Operands:

<table>
<thead>
<tr>
<th>Operands</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SRCA</td>
<td>Content of register RA</td>
</tr>
<tr>
<td>SRCB</td>
<td>M = 0: Content of register RB<br>M = 1: I (Zero-extended to 32 bits)</td>
</tr>
<tr>
<td>DEST</td>
<td>Register RC</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1001110 M</td>
<td>RC</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

OP = 9C, 9D

Description: The SRCA operand is logically ANDed, bit-by-bit, with the one's-complement of the SRCB operand, and the result is placed into the DEST location.
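Because a const8 operand is zero-extended before it is complemented, `ANDN rc, ra, const8` can clear bits only in the low byte of SRCA; clearing higher bits needs a register operand. A C sketch of that behavior (function names hypothetical):

```c
#include <stdint.h>

/* Model of ANDN: DEST <- SRCA & ~SRCB. */
static uint32_t andn(uint32_t srca, uint32_t srcb)
{
    return srca & ~srcb;
}

/* M = 1 form: I is zero-extended to 32 bits, so ~SRCB has bits 31-8 set
 * and only bits 7-0 of SRCA can be cleared. */
static uint32_t andn_const8(uint32_t srca, uint8_t i)
{
    return andn(srca, (uint32_t)i);
}
```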
ASEQ

Assert Equal To

Operation: IF SRCA = SRCB THEN Continue
            ELSE Trap (VN)

Assembler Syntax: ASEQ vn, ra, rb
                 or
                 ASEQ vn, ra, const8

Status: Not affected

Operands: SRCA    Content of register RA
          SRCB    M = 0: Content of register RB
          M = 1: I (Zero-extended to 32 bits)
          VN     Trap vector number

Description: If the SRCA operand is equal to the SRCB operand, instruction
             execution continues; otherwise, a trap with the specified vector
             number occurs.

For programs in the User mode, a Protection Violation trap
occurs—instead of the assert trap—if a vector number between 0 and
63 is specified.
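The ASEQ rule, including the User-mode restriction on vectors 0 through 63, can be sketched in C. This reads the restriction as applying only when the assertion fails, since the text describes the Protection Violation as replacing the assert trap; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of an assert instruction. */
typedef enum {
    ASSERT_CONTINUE,
    ASSERT_TRAP_VN,
    ASSERT_PROTECTION_VIOLATION
} assert_outcome;

/* Model of ASEQ: continue when SRCA = SRCB, otherwise trap with vector VN.
 * In User mode, a vector number from 0 to 63 produces a Protection
 * Violation trap instead of the assert trap. */
static assert_outcome aseq(uint32_t srca, uint32_t srcb, uint8_t vn,
                           bool user_mode)
{
    if (srca == srcb)
        return ASSERT_CONTINUE;
    if (user_mode && vn <= 63)
        return ASSERT_PROTECTION_VIOLATION;
    return ASSERT_TRAP_VN;
}
```

The other assert instructions differ only in the comparison that replaces `srca == srcb`.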
**ASGE**

**Assert Greater Than or Equal To**

**Operation:**
IF SRCA ≥ SRCB THEN Continue
ELSE Trap (VN)

**Assembler Syntax:**
ASGE vn, ra, rb
or
ASGE vn, ra, const8

**Status:**
Not affected

**Operands:**
- **SRCA**: Content of register RA
- **SRCB**: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- **VN**: Trap vector number

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101110 M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

**OP = 5C, 5D**

**Description:**
If the value of the SRCA operand is greater than or equal to the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
ASGEU

Assert Greater Than or Equal To, Unsigned

**Operation:** IF SRCA ≥ SRCB (unsigned) THEN Continue ELSE Trap (VN)

**Assembler Syntax:**
- ASGEU vn, ra, rb
- ASGEU vn, ra, const8

**Status:** Not affected

**Operands:**
- **SRCA:** Content of register RA
- **SRCB:**
  - M = 0: Content of register RB
  - M = 1: I (Zero-extended to 32 bits)
- **VN:** Trap vector number

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101111 M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

**OP = 5E, 5F**

**Description:** If the value of the SRCA operand is greater than or equal to the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs. For the comparison, both operands are treated as unsigned integers.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
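ASGE and ASGEU test the same relation but interpret the 32-bit operands differently, so one register pair can pass one assert and fail the other. A C sketch of the two conditions (hypothetical function names). With M = 1 the constant is zero-extended, so a const8 operand lies in the range 0 to 255 for both forms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition tested by ASGE: two's-complement comparison. */
static bool asge_holds(uint32_t srca, uint32_t srcb)
{
    return (int32_t)srca >= (int32_t)srcb;
}

/* Condition tested by ASGEU: unsigned comparison. */
static bool asgeu_holds(uint32_t srca, uint32_t srcb)
{
    return srca >= srcb;
}
```

For example, SRCA = 0xFFFFFFFF and SRCB = 1 fails ASGE (since -1 < 1) but passes ASGEU.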
ASGT

Assert Greater Than

Operation: IF SRCA > SRCB THEN Continue
ELSE Trap (VN)

Assembler Syntax: ASGT vn, ra, rb
or
ASGT vn, ra, const8

Status: Not affected

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
VN Trap vector number

Description: If the value of the SRCA operand is greater than the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
### ASGTU

**Assert Greater Than, Unsigned**

**Operation:** IF SRCA > SRCB (unsigned) THEN Continue ELSE Trap (VN)

**Assembler Syntax:**
- ASGTU vn, ra, rb
- ASGTU vn, ra, const8

**Status:** Not affected

**Operands:**
- **SRCA**
  - Content of register RA
- **SRCB**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **VN**
  - Trap vector number


**Description:**
If the value of the SRCA operand is greater than the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs. For the comparison, both operands are treated as unsigned integers.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
ASLE

Assert Less Than or Equal To

Operation: IF SRCA ≤ SRCB THEN Continue
ELSE Trap (VN)

Assembler Syntax: ASLE vn, ra, rb
or
ASLE vn, ra, const8

Status: Not affected

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
VN Trap vector number

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101010 M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

OP = 54, 55

Description:
If the value of the SRCA operand is less than or equal to the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
ASLEU

**Assert Less Than or Equal To, Unsigned**

**Operation:** IF SRCA ≤ SRCB (unsigned) THEN Continue
ELSE Trap (VN)

**Assembler Syntax:**
- ASLEU vn, ra, rb
- ASLEU vn, ra, const8

**Status:** Not affected

**Operands:**
- **SRCA** Content of register RA
- **SRCB** M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- **VN** Trap vector number

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101011 M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

**OP = 56, 57**

**Description:** If the value of the SRCA operand is less than or equal to the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs. For the comparison, both operands are treated as unsigned integers.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
## ASLT

### Assert Less Than

**Operation:**
IF SRCA < SRCB THEN Continue
ELSE Trap(VN)

**Assembler Syntax:**
- ASLT vn, ra, rb
- ASLT vn, ra, const8

**Status:**
Not affected

**Operands:**
- **SRCA:** Content of register RA
- **SRCB:**
  - M = 0: Content of register RB
  - M = 1: I (Zero-extended to 32 bits)
- **VN:** Trap vector number

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0101000 M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

**OP = 50, 51**

**Description:**
If the value of the SRCA operand is less than the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
ASLTU

Assert Less Than, Unsigned

Operation: IF SRCA < SRCB (unsigned) THEN Continue ELSE Trap (VN)

Assembler Syntax: ASLTU vn, ra, rb
or ASLTU vn, ra, const8

Status: Not affected

Operands:

SRCA  Content of register RA
SRCB  M = 0: Content of register RB
       M = 1: I (Zero-extended to 32 bits)
VN    Trap vector number

Description: If the value of the SRCA operand is less than the value of the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs. For the comparison, both operands are treated as unsigned integers.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
ASNEQ

Assert Not Equal To

Operation: IF SRCA <> SRCB THEN Continue
ELSE Trap (VN)

Assembler Syntax:
ASNEQ vn, ra, rb
or
ASNEQ vn, ra, const8

Status: Not affected

Operands:
SRCA          Content of register RA
SRCB          M = 0: Content of register RB
              M = 1: I (Zero-extended to 32 bits)
VN             Trap vector number

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0111001 M</td>
<td>VN</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

OP = 72, 73

Description: If the SRCA operand is not equal to the SRCB operand, instruction execution continues; otherwise, a trap with the specified vector number occurs.

For programs in the User mode, a Protection Violation trap occurs—instead of the assert trap—if a vector number between 0 and 63 is specified.
CALL

Call Subroutine

Operation:
DEST ← PC // 00 + 8
PC ← TARGET
Execute delay instruction

Assembler Syntax:
CALL ra, target

Status:
Not affected

Operands:
TARGET
A = 0: I17 ... I10 // I9 ... I2 (sign-extended to 30 bits) + PC
A = 1: I17 ... I10 // I9 ... I2 (zero-extended to 30 bits)

DEST
Register RA

Description:
The address of the second following instruction is placed into the
DEST location, and a non-sequential instruction fetch occurs to the
instruction address given by the TARGET operand. The instruction
following the CALL is executed before the non-sequential fetch
occurs.
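The CALL operand rules can be sketched in C. The sketch assumes `pc_word` is the 30-bit word address of the CALL and `i16` is the concatenated field I17 ... I10 // I9 ... I2; the function names are hypothetical. The return address skips the delay instruction, which is why it is PC // 00 + 8 rather than + 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte address of the CALL target. A = 0: sign-extended field added to PC;
 * A = 1: zero-extended field used directly. Appending // 00 to a word
 * address gives a byte address. */
static uint32_t call_target_byte_address(uint32_t pc_word, uint16_t i16,
                                         bool absolute)
{
    uint32_t target_word;

    if (absolute)
        target_word = i16;                                   /* zero-extend */
    else
        target_word = pc_word + (uint32_t)(int32_t)(int16_t)i16; /* sign-extend */
    target_word &= 0x3FFFFFFFu;                              /* 30-bit word address */
    return target_word << 2;
}

/* Value placed in DEST: PC // 00 + 8, the byte address of the second
 * instruction after the CALL (the first is the delay instruction). */
static uint32_t call_return_address(uint32_t pc_word)
{
    return ((pc_word & 0x3FFFFFFFu) << 2) + 8u;
}
```

For a CALL at word address 0x100 (byte address 0x400), the return address is 0x408, and a PC-relative field of 0xFFFF targets byte address 0x3FC.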
**CALLI**

**Call Subroutine, Indirect**

**Operation:**
- DEST ← PC // 00 + 8
- PC ← SRCB
- Execute delay instruction

**Assembler Syntax:** CALLI ra, rb

**Status:** Not affected

**Operands:**
- SRCB: Content of register RB
- DEST: Register RA

<table>
<thead>
<tr>
<th>Bits 31–24</th>
<th>Bits 23–16</th>
<th>Bits 15–8</th>
<th>Bits 7–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>11001000</td>
<td>Reserved</td>
<td>RA</td>
<td>RB</td>
</tr>
</tbody>
</table>

**Description:** The address of the second following instruction is placed into the DEST location, and a non-sequential instruction fetch occurs to the instruction address given by the SRCB operand. The instruction following the CALLI is executed before the non-sequential fetch occurs.
**CLASS**

Classify Floating-Point Operand

**Operation:** DEST ← CLASS(SRCA)

**Assembler Syntax:** CLASS rc, ra, FS

**Status:** None

**Operands:**

- **SRCA:** Content of register RA (single-precision f.p.)
  or content of register RA and the twin of register RA (double-precision f.p.)

- **DEST:** Register RC

**Control:**

- **FS:** Format of source operand SRCA
  - 00: Reserved for future use
  - 01: Single-precision floating-point
  - 10: Double-precision floating-point
  - 11: Reserved for future use

**Description:** A 32-bit classification code for operand SRCA is placed into the DEST location. Operand SRCA is a single- or double-precision operand, as specified by FS. The classification code has the following format:

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>RC</td>
<td>RA</td>
<td>Reserved</td>
<td>FS</td>
<td></td>
</tr>
</tbody>
</table>

**OP = E6**

The fields of the classification code are:

- **Bits 31–6:** Reserved (forced to 0).
- **Bit 5:** Operand Sign (OS). The OS bit is 1 for a negative operand (including negative zero) and 0 for a non-negative operand.
- **Bits 4–0:** Exponent-Fraction Class (EFC). This field classifies the biased exponent and fraction fields of the source operand as follows (Max is the largest biased exponent that can be used to represent a finite number. This exponent is 254 for the single-precision format and 2,046 for the double-precision format).
<table>
<thead>
<tr>
<th>EFC</th>
<th>Biased Exp (bexp)</th>
<th>Fraction (frac)</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>00000</td>
<td>0</td>
<td>0</td>
<td>zero</td>
</tr>
<tr>
<td>00001</td>
<td></td>
<td></td>
<td>unused</td>
</tr>
<tr>
<td>00010</td>
<td>0</td>
<td>0 &lt; frac &lt; .111 ... 1</td>
<td>denormalized</td>
</tr>
<tr>
<td>00011</td>
<td>0</td>
<td>.111 ... 1</td>
<td>denormalized</td>
</tr>
<tr>
<td>00100</td>
<td>1</td>
<td>0</td>
<td>normalized</td>
</tr>
<tr>
<td>00101</td>
<td></td>
<td></td>
<td>unused</td>
</tr>
<tr>
<td>00110</td>
<td>1</td>
<td>0 &lt; frac &lt; .111 ... 1</td>
<td>normalized</td>
</tr>
<tr>
<td>00111</td>
<td>1</td>
<td>.111 ... 1</td>
<td>normalized</td>
</tr>
<tr>
<td>01000</td>
<td>1 &lt; bexp &lt; Max</td>
<td>0</td>
<td>normalized</td>
</tr>
<tr>
<td>01001</td>
<td></td>
<td></td>
<td>unused</td>
</tr>
<tr>
<td>01010</td>
<td>1 &lt; bexp &lt; Max</td>
<td>0 &lt; frac &lt; .111 ... 1</td>
<td>normalized</td>
</tr>
<tr>
<td>01011</td>
<td>1 &lt; bexp &lt; Max</td>
<td>.111 ... 1</td>
<td>normalized</td>
</tr>
<tr>
<td>01100</td>
<td>Max</td>
<td>0</td>
<td>normalized</td>
</tr>
<tr>
<td>01101</td>
<td></td>
<td></td>
<td>unused</td>
</tr>
<tr>
<td>01110</td>
<td>Max</td>
<td>0 &lt; frac &lt; .111 ... 1</td>
<td>normalized</td>
</tr>
<tr>
<td>01111</td>
<td>Max</td>
<td>.111 ... 1</td>
<td>normalized</td>
</tr>
<tr>
<td>10000</td>
<td>Max + 1</td>
<td>0</td>
<td>infinity</td>
</tr>
<tr>
<td>10001</td>
<td></td>
<td></td>
<td>unused</td>
</tr>
<tr>
<td>10010</td>
<td>Max + 1, frac MSB = 0</td>
<td>&lt;&gt; 0</td>
<td>SNaN</td>
</tr>
<tr>
<td>10011</td>
<td>Max + 1, frac MSB = 1</td>
<td>&lt;&gt; 0</td>
<td>QNaN</td>
</tr>
</tbody>
</table>


Executing the CLASS instruction causes a pipeline hold of one cycle, until the intermediate result enters the denormalizer of the Floating-Point Unit.
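The EFC encoding follows a regular pattern: bits 4-2 classify the biased exponent and bits 1-0 classify the fraction, with the infinity and NaN rows handled separately. The Python sketch below computes the classification code from a raw bit pattern. `class_code` is a hypothetical name, and the sketch assumes the IEEE 754 field layouts (8-bit exponent and 23-bit fraction for single precision, 11 and 52 for double), so that Max = 2^k - 2 gives 254 and 2,046.

```python
def class_code(bits, exp_bits=8, frac_bits=23):
    """Return OS (bit 5) and EFC (bits 4-0) for a raw floating-point pattern."""
    width = 1 + exp_bits + frac_bits
    bits &= (1 << width) - 1
    sign = bits >> (width - 1)
    bexp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << frac_bits) - 1)
    max_exp = (1 << exp_bits) - 2          # 254 single, 2046 double
    all_ones = (1 << frac_bits) - 1

    if bexp == max_exp + 1:                # infinities and NaNs
        if frac == 0:
            efc = 0b10000                  # infinity
        elif frac >> (frac_bits - 1):
            efc = 0b10011                  # QNaN (fraction MSB = 1)
        else:
            efc = 0b10010                  # SNaN (fraction MSB = 0)
    else:
        if bexp == 0:
            exp_class = 0b000
        elif bexp == 1:
            exp_class = 0b001
        elif bexp < max_exp:
            exp_class = 0b010
        else:                              # bexp == Max
            exp_class = 0b011
        if frac == 0:
            frac_class = 0b00
        elif frac == all_ones:
            frac_class = 0b11
        else:
            frac_class = 0b10
        efc = (exp_class << 2) | frac_class
    return (sign << 5) | efc
```

For example, single-precision 1.0 (0x3F800000) has bexp = 127, so its code is 0b01000, and negative zero (0x80000000) yields 0b100000.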
CLZ

Count Leading Zeros

Operation: Determine number of leading zeros in a word

Assembler Syntax:
- CLZ rc, rb
- CLZ rc, const8

Status: Not affected

Operands:
- SRCB
  - M = 0: Content of register RB
  - M = 1: I (Zero-extended to 32 bits)
- DEST
  - Register RC

Description: A count of the number of zero-bits to the first one-bit in the SRCB operand is placed into the DEST location. If the most-significant bit of the SRCB operand is 1, the resulting count is zero. If the SRCB operand is zero, the resulting count is 32.
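The count can be modeled with Python's `int.bit_length`; `clz` is a hypothetical name used only for illustration.

```python
def clz(value):
    """Zero bits above the most-significant one bit of a 32-bit word (32 if zero)."""
    return 32 - (value & 0xFFFFFFFF).bit_length()
```

Because const8 is zero-extended, the `CLZ rc, const8` form always yields a count from 24 to 32.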
CONST

Constant

Operation: DEST ← 0I16
Assembler Syntax: CONST ra, const16
Status: Not affected
Operands: 0I16 I15...I8 // I7...I0 (Zero-extended to 32 bits)
DEST Register RA

Description: The 0I16 operand is placed into the DEST location.
CONSTH

Constant, High

Operation: Replace high-order half-word of SRCA by I16

Assembler Syntax: CONSTH ra, const16

Status: Not affected

Operands:
- SRCA: Content of register RA
- I16: I15...I8 // I7...I0
- DEST: Register RA

Description: The low-order half-word of the SRCA operand is appended to the I16 operand, and the result is placed into the DEST operand. Note that the destination register for this instruction is the same as the source register.
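One way to build an arbitrary 32-bit constant is CONST with the low half-word followed by CONSTH with the high half-word. The sequence is modeled below; `const`, `consth`, and `load_const32` are hypothetical names, and register values are plain Python integers.

```python
MASK16 = 0xFFFF


def const(i16):
    """CONST: zero-extend a 16-bit immediate to 32 bits."""
    return i16 & MASK16


def consth(ra, i16):
    """CONSTH: replace the high half-word of RA, keeping its low half-word."""
    return ((i16 & MASK16) << 16) | (ra & MASK16)


def load_const32(value):
    """Model of: CONST ra, low16 then CONSTH ra, high16."""
    ra = const(value & MASK16)
    return consth(ra, value >> 16)
```

Since CONSTH keeps the low half-word of its source, the order matters: CONST must come first.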
**CONSTHZ**

**Constant High, Zero Lower**

**Operation:** DEST ← I16 << 16

**Assembler Syntax:** CONSTHZ ra, const16

**Status:** Not affected

**Operands:**
- I16: I15...I8 // I7...I0
- DEST: Register RA

**Description:** The I16 operand is placed into the upper 16 bits of the DEST location; the lower 16 bits of the DEST location are replaced with zeros.
CONSTN

Constant, Negative

Operation: DEST ← 1I16

Assembler Syntax: CONSTN ra, const16
Status: Not affected
Operands: 1I16 I15...I8 // I7...I0 (ones-extended to 32 bits)
DEST Register RA

31 23 15 7 0
0 0 0 0 0 0 0 1 I15...I8 RA I7...I0

OP = 01

Description: The 1I16 operand is placed into the DEST location.
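Ones-extension means CONSTN loads a negative value in a single instruction. This sketch models it with hypothetical helpers `constn` and `to_signed32`.

```python
def constn(i16):
    """CONSTN: ones-extend a 16-bit immediate to 32 bits."""
    return 0xFFFF0000 | (i16 & 0xFFFF)


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value
```

Every CONSTN result therefore lies between -65536 (I16 = 0) and -1 (I16 = 0xFFFF).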
CONVERT

Convert Data Format

Operation: DEST ← SRCA, with format modified per UI, RND, FD, FS

Assembler Syntax: CONVERT rc, ra, UI, RND, FD, FS

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA (single-precision f.p.)
or Content of register RA and the twin of register RA (Double-precision f.p.)

DEST Content of register RC (single-precision f.p.)
or Content of register RC and the twin of register RC (Double-precision f.p.)

Control: UI 0 = signed integer
1 = unsigned integer

RND Round mode
000 Round to nearest
001 Round to minus infinity
010 Round to plus infinity
011 Round to zero
100 Round using f.p. round mode (FRM)
101–111 Reserved

FS,FD Format of source operand, format of destination operand
00 Integer
01 Single-precision floating-point
10 Double-precision floating-point
11 Reserved

Description: The SRCA operand with format FS is converted to format FD and rounded according to RND, then placed into the DEST location. If the source or destination operand is an integer, it is a signed or unsigned value according to the value of UI.

Note: Converting from format to like format is not supported, and will produce unpredictable results.
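The four explicit RND modes can be illustrated for a floating-point-to-integer conversion. The Python sketch below is a minimal model, assuming round-to-nearest resolves halfway cases to the even integer as in IEEE 754 (Python's built-in `round` does the same); `round_to_integer` is a hypothetical name. RND = 100 defers to the FRM field, whose encoding is not given in this section, so the sketch rejects it along with the reserved codes. It does not model integer overflow or the exception flags.

```python
import math


def round_to_integer(x, rnd):
    """Round a float to an integer using one of the CONVERT RND codes 000-011."""
    if rnd == 0b000:
        return round(x)        # round to nearest, ties to even
    if rnd == 0b001:
        return math.floor(x)   # round to minus infinity
    if rnd == 0b010:
        return math.ceil(x)    # round to plus infinity
    if rnd == 0b011:
        return math.trunc(x)   # round to zero
    raise ValueError("RND 100 (use FRM) and 101-111 (reserved) are not modeled")
```

For example, converting -2.5 gives -2, -3, -2, and -2 under RND = 000, 001, 010, and 011 respectively.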
CPBYTE

**Compare Bytes**

**Operation:**

IF (SRCA.BYTE0 = SRCB.BYTE0) OR
    (SRCA.BYTE1 = SRCB.BYTE1) OR
    (SRCA.BYTE2 = SRCB.BYTE2) OR
    (SRCA.BYTE3 = SRCB.BYTE3) THEN
    DEST ← TRUE ELSE DEST ← FALSE

**Assembler Syntax:**

CPBYTE rc, ra, rb

or

CPBYTE rc, ra, const8

**Status:** Not affected

**Operands:**

SRCA    Content of register RA

SRCB    M = 0: Content of register RB

M = 1: I (Zero-extended to 32 bits)

DEST    Register RC

**Description:** Each byte of the SRCA operand is compared to the corresponding byte of the SRCB operand. If any corresponding bytes are equal, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location.
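One use of CPBYTE is scanning a string a word at a time: comparing against zero detects a terminating null byte in any of the four positions. The sketch below assumes the Boolean format in which TRUE has only bit 31 set and FALSE is all zeros; that encoding is defined elsewhere in the manual, so treat it as an assumption here. `cpbyte` is a hypothetical name.

```python
TRUE = 0x80000000   # assumption: Boolean TRUE has only bit 31 set
FALSE = 0x00000000


def cpbyte(srca, srcb):
    """TRUE if any byte of SRCA equals the byte of SRCB in the same position."""
    for shift in (0, 8, 16, 24):
        if (srca >> shift) & 0xFF == (srcb >> shift) & 0xFF:
            return TRUE
    return FALSE
```

With the const8 form and an immediate of 0, the comparison is against a zero word, so any null byte in SRCA yields TRUE.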
CPEQ

Compare Equal To

Operation: IF SRCA = SRCB THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax:
- CPEQ rc, ra, rb
- CPEQ rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: If the SRCA operand is equal to the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location.
CPGE

Compare Greater Than or Equal To

Operation: IF SRCA ≥ SRCB THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax: CPGE rc, ra, rb
or
CPGE rc, ra, const8

Status: Not affected

Operands:

SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Description: If the value of the SRCA operand is greater than or equal to the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location.
**CPGEU**

**Compare Greater Than or Equal To, Unsigned**

**Operation:** IF SRCA ≥ SRCB (unsigned) THEN DEST ← TRUE ELSE DEST ← FALSE

**Assembler Syntax:**
- CPGEU rc, ra, rb
- CPGEU rc, ra, const8

**Status:** Not affected

**Operands:**
- **SRCA** Content of register RA
- **SRCB** M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- **DEST** Register RC

**Description:** If the value of the SRCA operand is greater than or equal to the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. For the comparison, both operands are treated as unsigned integers.
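The signed and unsigned compares differ only in how bit 31 is interpreted. The sketch below shows a case where CPGE and CPGEU disagree; the Boolean encoding (TRUE = bit 31 set) is an assumption, and the function names are hypothetical.

```python
TRUE = 0x80000000   # assumption: Boolean TRUE has only bit 31 set
FALSE = 0x00000000


def as_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cpge(srca, srcb):
    return TRUE if as_signed(srca) >= as_signed(srcb) else FALSE


def cpgeu(srca, srcb):
    return TRUE if (srca & 0xFFFFFFFF) >= (srcb & 0xFFFFFFFF) else FALSE
```

The pattern 0xFFFFFFFF is -1 to CPGE but 4,294,967,295 to CPGEU, so comparing it with 1 gives FALSE and TRUE respectively.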
CPGT

Compare Greater Than

Operation: IF SRCA > SRCB THEN DEST ← TRUE ELSE DEST ← FALSE

Assembler Syntax: CPGT rc, ra, rb
or CPGT rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: If the value of the SRCA operand is greater than the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location.
**CPGTU**

**Compare Greater Than, Unsigned**

**Operation:** IF SRCA > SRCB (unsigned) THEN DEST ← TRUE
ELSE DEST ← FALSE

**Assembler Syntax:**
- CPGTU rc, ra, rb
- CPGTU rc, ra, const8

**Status:** Not affected

**Operands:**
- **SRCA:** Content of register RA
- **SRCB:**
  - M = 0: Content of register RB
  - M = 1: I (Zero-extended to 32 bits)
- **DEST:** Register RC

**Description:** If the value of the SRCA operand is greater than the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. For the comparison, both operands are treated as unsigned integers.
**CPLE**

**Compare Less Than or Equal To**

**Operation:**

IF SRCA ≤ SRCB THEN DEST ← TRUE
ELSE DEST ← FALSE

**Assembler Syntax:**

CPLE rc, ra, rb
or
CPLE rc, ra, const8

**Status:**

Not affected

**Operands:**

<table>
<thead>
<tr>
<th>Register</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SRCA</td>
<td>Content of register RA</td>
</tr>
<tr>
<td>SRCB</td>
<td>M = 0: Content of register RB</td>
</tr>
<tr>
<td></td>
<td>M = 1: I (Zero-extended to 32 bits)</td>
</tr>
<tr>
<td>DEST</td>
<td>Register RC</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>bit</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>M</td>
<td></td>
<td>RC</td>
<td></td>
<td>RA</td>
<td>RB</td>
</tr>
</tbody>
</table>
CPLEU

Compare Less Than or Equal To, Unsigned

Operation: IF SRCA ≤ SRCB (unsigned) THEN DEST ← TRUE ELSE DEST ← FALSE

Assembler Syntax:
- CPLEU rc, ra, rb
- CPLEU rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: If the value of the SRCA operand is less than or equal to the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. For the comparison, both operands are treated as unsigned integers.
CPLT

Compare Less Than

Operation: IF SRCA < SRCB THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax:
CPLT rc, ra, rb
or
CPLT rc, ra, const8

Status: Not affected

Operands:
SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Description: If the value of the SRCA operand is less than the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location.
CPLTU

Compare Less Than, Unsigned

Operation: IF SRCA < SRCB (unsigned) THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax:
CPLTU rc, ra, rb
or
CPLTU rc, ra, const8

Status: Not affected

Operands:
SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Description: If the value of the SRCA operand is less than the value of the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. For the comparison, both operands are treated as unsigned integers.
CPNEQ

Compare Not Equal To

Operation: IF SRCA <> SRCB THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax:
- CPNEQ rc, ra, rb
- CPNEQ rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

31 23 15 7 0
0 1 1 0 0 0 1 M RC RA RB or I

Description: If the SRCA operand is not equal to the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location.
**DADD**

**Floating-Point Add, Double-Precision**

**Operation:**
DEST (double-precision) ← SRCA (double-precision) + SRCB (double-precision)

**Assembler Syntax:**
DADD rc, ra, rb

**Status:**
fpX, fpU, fpV, fpR, fpN

**Operands:**
- **SRCA:** Content of register RA and the twin of register RA
- **SRCB:** Content of register RB and the twin of register RB
- **DEST:** Register RC and the twin of register RC


**Description:** The SRCA operand is added to the SRCB operand; the result is rounded according to FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the addition are double-precision floating-point numbers.
DDIV

Floating-Point Divide, Double-Precision

Operation: DEST (double-precision) ← SRCA (double-precision) / SRCB (double-precision)

Assembler Syntax: DDIV rc, ra, rb

Status: fpD, fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA
SRCB Content of register RB and the twin of register RB
DEST Register RC and the twin of register RC

Description: The SRCA operand is divided by the SRCB operand; the result is rounded according to FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the division are double-precision floating-point numbers.
DEQ

Floating-Point Equal To, Double-Precision

Operation: IF SRCA (double-precision) = SRCB (double-precision) THEN DEST ← TRUE ELSE DEST ← FALSE

Assembler Syntax: DEQ rc, ra, rb
Status: fpl

Operands: SRCA Content of register RA and the twin of register RA
SRCB Content of register RB and the twin of register RB
DEST Register RC

Description: If the SRCA operand is equal to the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. SRCA and SRCB are double-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating-Point Environment Register has no effect on this operation.
DGE

Floating-Point Greater Than Or Equal To, Double-Precision

Operation: IF SRCA (double-precision) ≥ SRCB (double-precision) THEN DEST ← TRUE ELSE DEST ← FALSE

Assembler Syntax: DGE rc, ra, rb

Status: fpl

Operands: SRCA Content of register RA and the twin of register RA
SRCB Content of register RB and the twin of register RB
DEST Register RC

Description: If the SRCA operand is greater than or equal to the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. SRCA and SRCB are double-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating-Point Environment Register has no effect on this operation.
DGT

Floating-Point Greater Than, Double-Precision

Operation: IF SRCA (double-precision) > SRCB (double-precision)
THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax: DGT rc, ra, rb
Status: fpl
Operands: SRCA Content of register RA and the twin of register RA
SRCB Content of register RB and the twin of register RB
DEST Register RC

Description: If the SRCA operand is greater than the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. SRCA and SRCB are double-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating-Point Environment Register has no effect on this operation.
DIV

Divide Step

Operation: Perform one-bit step of a divide operation (unsigned)

Assembler Syntax: 
DIV rc, ra, rb
or
DIV rc, ra, const8

Status: V, N, Z, C

Operands:
SRCA Content of register RA
SRCB M = 0: Content of register RB
       M = 1: I (Zero-extended to 32 bits)
DEST Register RC

31 23 15 7 0
0 1 1 0 1 0 1 M RC RA RB or I

OP = 6A, 6B

Description: If the Divide Flag (DF) bit of the ALU Status Register is 1, the SRCB operand is subtracted from the SRCA operand. If the DF bit is 0, the SRCB operand is added to the SRCA operand.

The carry-out of the add or subtract operation is exclusive-ORed with the value of the DF bit and the value of the Negative (N) bit of the ALU Status Register; the resulting value is complemented and placed into the DF bit. The sign of the result of the add or subtract is placed into the N bit.

The content of the Q Register is appended to the result of the add or subtract, and the resulting 64-bit value is shifted left by one bit position; the value computed for the DF bit above fills the vacated bit position. The high-order 32 bits of the 64-bit shifted value are placed into the DEST location. The low-order 32 bits of the shifted value are placed into the Q Register.

Examples of integer divide operations appear in Section 7.2.6.
DIV0

Divide Initialize

**Operation:** Initialize for a sequence of divide steps (unsigned)

**Assembler Syntax:**
- DIV0 rc, rb
- DIV0 rc, const8

**Status:** V, N, Z, C

**Operands:**
- **SRCB**
  - M = 0: Content of register RB
  - M = 1: I (Zero-extended to 32 bits)
- **DEST**
  - Register RC

**Description:**
The Divide Flag (DF) bit of the ALU Status Register is set. The sign of the SRCB operand is placed into the Negative bit of the ALU Status Register.

The content of the Q register is appended to the SRCB operand, and the resulting 64-bit value is shifted left by one bit position; a 0 fills the vacated bit position. The high-order 32 bits of the 64-bit shifted value are placed into the DEST location. The low-order 32 bits of the shifted value are placed into the Q Register.

Examples of integer divide operations appear in Section 7.2.6.
DIVIDE

Integer Divide, Signed

Operation:  DEST ← (Q // SRCA) / SRCB (signed)
            Q ← Remainder

Assembler Syntax:   DIVIDE rc, ra, rb

Status:   Not affected

Operands:  Q  Content of the Q Register
           SRCA  Content of register RA
          SRCB  Content of register RB
          DEST  Register RC

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>11100001</td>
<td>RC</td>
<td>RA</td>
<td>RB</td>
<td></td>
</tr>
</tbody>
</table>

Description:  The SRCA operand is appended to the content of the Q register. The resulting 64-bit value is divided by the SRCB operand, and the result is placed into the DEST location. This operation treats the operands as signed two's-complement integers and produces a signed two's-complement result.

The remainder is placed into the Q register. A non-zero remainder always has the same sign as the dividend.

This instruction does not check for a divide overflow condition. Checking for divide overflow must occur before the instruction is executed.

Note: This instruction is not supported directly in processor hardware. In the Am29050 microprocessor, this instruction causes a DIVIDE trap. When the trap occurs, the IPA, IPB, and IPC registers are set to reference SRCA, SRCB, and DEST.
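The results a DIVIDE trap handler has to produce can be sketched in Python. This models the arithmetic only, not the trap mechanism; `divide` is a hypothetical name. Quotients truncate toward zero, which is what gives a non-zero remainder the sign of the dividend, and the sketch raises an error for the overflow case that the instruction itself does not check.

```python
MASK32 = 0xFFFFFFFF


def _signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def divide(q, srca, srcb):
    """Return (DEST, new Q) for DIVIDE: (Q // SRCA) / SRCB, signed."""
    dividend = _signed(((q & MASK32) << 32) | (srca & MASK32), 64)
    divisor = _signed(srcb, 32)
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    quotient = abs(dividend) // abs(divisor)      # magnitude, truncated
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient                      # truncate toward zero
    remainder = dividend - quotient * divisor     # same sign as dividend
    if not -(1 << 31) <= quotient < (1 << 31):
        # The instruction does not check this; software must check beforehand.
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient & MASK32, remainder & MASK32
```

For example, -7 divided by 2 gives a quotient of -3 and a remainder of -1, while 7 divided by -2 gives -3 and +1.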
DIVIDU

Integer Divide, Unsigned

Operation: DEST ← (Q // SRCA) / SRCB (unsigned)
Q ← Remainder

Assembler Syntax: DIVIDU rc, ra, rb

Status: Not affected

Operands:
- Q: Content of the Q Register
- SRCA: Content of register RA
- SRCB: Content of register RB
- DEST: Register RC

Description: The SRCA operand is appended to the content of the Q Register. The resulting 64-bit value is divided by the SRCB operand, and the result is placed into the DEST location. This operation treats the operands as unsigned integers, and produces an unsigned result.

The remainder is placed into the Q Register. The remainder is also unsigned.

Note: This instruction is not supported directly in processor hardware. In the Am29050 microprocessor, this instruction causes a DIVIDU trap. When the trap occurs, the IPA, IPB, and IPC registers are set to reference SRCA, SRCB, and DEST.
DIVL

**Divide Last Step**

**Operation:** Complete a sequence of divide steps (unsigned)

**Assembler Syntax:** DIVL rc, ra, rb or DIVL rc, ra, const8

**Status:** V, N, Z, C

**Operands:**
- **SRCA** Content of register RA
- **SRCB** M = 0: Content of register RB  
  M = 1: I (Zero-extended to 32 bits)
- **DEST** Register RC

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>M</td>
</tr>
</tbody>
</table>

**Description:** If the Divide Flag (DF) bit of the ALU Status Register is 1, the SRCB operand is subtracted from the SRCA operand. If the DF bit is 0, the SRCB operand is added to the SRCA operand. The result is placed into the DEST location.

The carry-out of the add or subtract operation is exclusive-ORed with the value of the DF bit and the value of the Negative (N) bit of the ALU Status Register; the resulting value is complemented and placed into the DF bit. The sign of the result of the add or subtract is placed into the N bit.

The content of the Q register is shifted left by one bit position; the value computed for the DF bit above fills the vacated bit position. The shifted value is placed into the Q Register.

Examples of integer divide operations appear in Section 7.2.6.
DIVREM

Divide Remainder

Operation: Generate remainder for divide operation (unsigned)

Assembler Syntax:
DIVREM rc, ra, rb
or
DIVREM rc, ra, const8

Status: V, N, Z, C

Operands:
SRCA Content of register RA
SRCB M = 0: Content of register RB
      M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Description: If the Divide Flag (DF) bit of the ALU Status Register is 1, the SRCA operand is placed into the DEST location.
If the DF bit is 0, the SRCB operand is added to the SRCA operand, and the result is placed into the DEST location.
Examples of integer divide operations appear in Section 7.2.6.
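The DIV0, DIV, DIVL, and DIVREM descriptions combine into a 32-step non-restoring division. The Python sketch below models only the Q Register and the DF and N bits (not V, Z, or C). It assumes the subtract carry is 1 when no borrow occurs, and `udiv32` and the other names are hypothetical; Section 7.2.6 has the manual's own instruction sequences. The dividend goes into Q, DIV0 supplies a zero high word, 31 DIV steps and one DIVL leave the quotient in Q, and DIVREM corrects the remainder.

```python
MASK32 = 0xFFFFFFFF


class DivideState:
    """The Q Register plus the DF and N bits of the ALU Status Register."""

    def __init__(self, q):
        self.q = q & MASK32
        self.df = 0
        self.n = 0


def _shift_left(high, state, fill):
    """Shift the 64-bit value high // Q left one bit; `fill` enters bit 0."""
    wide = ((((high & MASK32) << 32) | state.q) << 1) | fill
    state.q = wide & MASK32
    return (wide >> 32) & MASK32


def _add_or_subtract(state, srca, srcb):
    """Arithmetic shared by DIV and DIVL; updates DF and N."""
    if state.df:                           # DF = 1: subtract
        result = (srca - srcb) & MASK32
        carry = 1 if srca >= srcb else 0   # assumed: carry = no borrow
    else:                                  # DF = 0: add
        total = srca + srcb
        result = total & MASK32
        carry = total >> 32
    state.df = 1 - (carry ^ state.df ^ state.n)   # uses the old DF and N
    state.n = result >> 31
    return result


def div0(state, srcb):
    state.df = 1
    state.n = (srcb >> 31) & 1
    return _shift_left(srcb, state, 0)


def div(state, srca, srcb):
    result = _add_or_subtract(state, srca, srcb)
    return _shift_left(result, state, state.df)


def divl(state, srca, srcb):
    result = _add_or_subtract(state, srca, srcb)
    state.q = ((state.q << 1) | state.df) & MASK32
    return result


def divrem(state, srca, srcb):
    return srca if state.df else (srca + srcb) & MASK32


def udiv32(dividend, divisor):
    """Unsigned 32-bit divide: returns (quotient, remainder)."""
    state = DivideState(dividend)      # dividend loaded into Q beforehand
    rem = div0(state, 0)               # high-order dividend word is zero
    for _ in range(31):
        rem = div(state, rem, divisor)
    rem = divl(state, rem, divisor)
    rem = divrem(state, rem, divisor)
    return state.q, rem
```

DF ends each step as 1 exactly when the partial remainder is non-negative, which is why it doubles as the quotient bit and why DIVREM adds the divisor back only when DF is 0.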
**DMAC**

Floating-Point Multiply-Accumulate, Double-Precision

**Operation:**

\[
\text{ACC(ACN)}\ \text{(double-precision)} \leftarrow \text{SRCA}\ \text{(double-precision)} \times \text{SRCB}\ \text{(double-precision)} + \text{ACC(ACN)}\ \text{(double-precision)}
\]

**Assembler Syntax:**

DMAC FUNC, ACN, ra, rb

**Status:**

fpU, fpV, fpR, fpN

**Operands:**

SRCA: Content of register RA and the twin of register RA
SRCB: Content of register RB and the twin of register RB
ACC(ACN): (Content of) Accumulator register ACN

**Control:**

FUNC: Modifies operation as shown in the table below
ACN: Accumulator register number (0, 1, 2, 3)


<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>Res</td>
<td>FUNC</td>
<td>A</td>
<td>C</td>
<td>N</td>
</tr>
</tbody>
</table>

**OP = D9**

**Description:**

A compound operation of the form \((OP1 \times OP2) + OP3\) is performed, where \(OP1\), \(OP2\), and \(OP3\) are double-precision operands. Operand sources and optional sign changes are specified by FUNC, as described in the table below. The result is rounded and stored in ACC(ACN), in double-precision format. The Accumulator Format (ACF) field of the Floating-Point Environment Register must specify double-precision.

Note that the DMAC instruction uses the fast float mode of operation, regardless of the state of the Fast Float Select bit in the Floating-Point Environment Register. The DMAC instruction never causes a Floating-Point Exception trap—it updates the sticky status bits instead. Furthermore, the DMAC instruction never sets the Inexact Sticky bit, regardless of the result.
<table>
<thead>
<tr>
<th>FUNC</th>
<th>Operation Performed</th>
<th>ACC (ACN)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000</td>
<td>(SRCA * SRCB) +</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>0001</td>
<td>(SRCA * SRCB) +</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>0010</td>
<td>(SRCA * SRCB) -</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>0011</td>
<td>(SRCA * SRCB) -</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>0100</td>
<td>(SRCA * SRCB) +</td>
<td>0.0</td>
</tr>
<tr>
<td>0101</td>
<td>(SRCA * SRCB) +</td>
<td>0.0</td>
</tr>
<tr>
<td>0110</td>
<td>(SRCA * SRCB) -</td>
<td>0.0</td>
</tr>
<tr>
<td>0111</td>
<td>(SRCA * SRCB) -</td>
<td>0.0</td>
</tr>
<tr>
<td>1000</td>
<td>(SRCA * 1.0) +</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>1001</td>
<td>(SRCA * 1.0) +</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>1010</td>
<td>(SRCA * 1.0) -</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>1011</td>
<td>(SRCA * 1.0) -</td>
<td>ACC (ACN)</td>
</tr>
<tr>
<td>1100</td>
<td>(SRCA * 1.0) +</td>
<td>0.0</td>
</tr>
<tr>
<td>1101</td>
<td>(SRCA * 1.0) +</td>
<td>0.0</td>
</tr>
<tr>
<td>1110</td>
<td>(SRCA * 1.0) -</td>
<td>0.0</td>
</tr>
<tr>
<td>1111</td>
<td>(SRCA * 1.0) -</td>
<td>0.0</td>
</tr>
</tbody>
</table>
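As an illustration of choosing FUNC, a dot product can start with FUNC = 0100, which adds 0.0 instead of the accumulator so no separate clearing step is needed, and continue with FUNC = 0000. The Python sketch below models only those two table rows; `dmac` and `dot` are hypothetical names. Python rounds the product and the sum separately, which is not claimed to match the hardware's rounding.

```python
def dmac(func, srca, srcb, acc):
    """Model only FUNC rows 0000 and 0100 of the DMAC table."""
    if func == 0b0000:
        return srca * srcb + acc       # (SRCA * SRCB) + ACC(ACN)
    if func == 0b0100:
        return srca * srcb + 0.0       # (SRCA * SRCB) + 0.0
    raise NotImplementedError("only FUNC 0000 and 0100 are modeled")


def dot(xs, ys):
    """Accumulate a dot product in one (modeled) accumulator."""
    acc = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc = dmac(0b0100 if i == 0 else 0b0000, x, y, acc)
    return acc
```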
DMSM

Floating-Point Multiply-Sum, Double-Precision

Operation: DEST (double-precision) ← SRCA (double-precision) * ACC(0)
(double-precision) + SRCB (double-precision)

Assembler Syntax: DMSM rc, ra, rb

Status: fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA and the twin of register RA
SRCB Content of register RB and the twin of register RB
ACC(0) (Content of) Accumulator register 0
DEST Register RC and the twin of register RC

Description: The SRCA operand is multiplied by the ACC(0) operand, and the product added to the SRCB operand; the result is rounded to double-precision format according to Floating-Point Environment Register field FRM, and placed into the DEST location. Operands SRCA, SRCB, and ACC(0) are double-precision floating-point numbers. The Accumulator Format field of the Floating-Point Environment Register must specify double-precision.

Note that the DMSM instruction uses the fast float mode of operation, regardless of the state of the Fast Float Select bit in the Floating-Point Environment Register. The DMSM instruction never causes a Floating-Point Exception trap—it updates the sticky status bits instead. Furthermore, the DMSM instruction never sets the Inexact Sticky bit, regardless of the result.
**DMUL**

Floating-Point Multiply, Double-Precision

**Operation:**
DEST (double-precision) ← SRCA (double-precision) * SRCB (double-precision)

**Assembler Syntax:**
DMUL rc, ra, rb

**Status:**
fpX, fpU, fpV, fpR, fpN

**Operands:**
- **SRCA:** Content of register RA and the twin of register RA
- **SRCB:** Content of register RB and the twin of register RB
- **DEST:** Register RC and the twin of register RC

```plaintext
31  23  15  7  0
1 1 1 1 0 1 0 1  RC  RA  RB
```

**Description:** The SRCB operand is multiplied by the SRCA operand; the result is rounded according to FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the multiplication are double-precision floating-point numbers.
**DSUB**

**Floating-Point Subtract, Double-Precision**

**Operation:**
DEST (double-precision) ← SRCA (double-precision) − SRCB (double-precision)

**Assembler Syntax:**
DSUB rc, ra, rb

**Status:**
fpX, fpU, fpV, fpR, fpN

**Operands:**
- SRCA: Content of register RA and the twin of register RA
- SRCB: Content of register RB and the twin of register RB
- DEST: Register RC and the twin of register RC

<table>
<thead>
<tr>
<th>31</th>
<th>29</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

**Description:**
The SRCB operand is subtracted from the SRCA operand; the result is rounded according to FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the subtraction are double-precision floating-point numbers.
EMULATE

Trap to Software Emulation Routine

Operation: Load IPA and IPB registers with operand register-numbers and Trap (VN)

Assembler Syntax: EMULATE vn, ra, rb

Status: Not affected

Operands: Absolute-register numbers for registers RA and RB

VN Trap vector number

Description: The IPA and IPB registers are set to the register numbers of registers RA and RB, respectively. A trap with the specified vector number occurs.

Note that the IPC register is also affected by this instruction, but its value has no defined interpretation.

For programs in the User mode, a Protection Violation trap occurs—instead of the EMULATE trap—if a vector number between 0 and 63 is specified.
EXBYTE

Extract Byte

Operation: DEST ← SRCB, with low-order byte replaced by byte in SRCA selected by BP

Assembler Syntax:
- EXBYTE rc, ra, rb
- EXBYTE rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: A byte in the SRCA operand is selected by the Byte Position field of the ALU Status Register and the Byte Order bit of the Configuration Register. The selected byte replaces the low-order byte of the SRCB operand and the resulting word is placed into the DEST location.

Note: The selection of bytes within words is specified in Section 3.4.5.
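The byte selection can be sketched as follows. The mapping from BP to bit positions is an assumption here (big-endian: BP = 0 selects bits 31-24; little-endian: BP = 0 selects bits 7-0), and Section 3.4.5 is the authority. A `big_endian` flag stands in for the Byte Order bit, and `exbyte` is a hypothetical name.

```python
def exbyte(srca, srcb, bp, big_endian=True):
    """Replace the low byte of SRCB with the byte of SRCA selected by BP."""
    index = bp & 3
    # Assumed mapping: big-endian counts bytes from bit 31 down,
    # little-endian counts them from bit 0 up.
    shift = (3 - index) * 8 if big_endian else index * 8
    byte = (srca >> shift) & 0xFF
    return (srcb & 0xFFFFFF00) | byte
```

With the const8 form and an immediate of 0, the result is the selected byte zero-extended to 32 bits.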
EXHW

Extract Half-Word

Operation: DEST ← SRCB, with low-order half-word replaced by half-word in SRCA selected by BP

Assembler Syntax:
EXHW rc, ra, rb
or
EXHW rc, ra, const8

Status: Not affected

Operands:
SRCA Content of register RA
SRCB M = 0: Content of register RB
 M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Description: A half-word in the SRCA operand is selected by the Byte Position field of the ALU Status Register and the Byte Order bit of the Configuration Register. The selected half-word replaces the low-order half-word of the SRCB operand, and the resulting word is placed into the DEST location.

Note: The selection of half-words within words is specified in Section 3.4.5.
**EXHWS**

**Extract Half-Word, Sign-Extended**

**Operation:** DEST ← half-word in SRCA selected by BP, sign-extended to 32 bits

**Assembler Syntax:** EXHWS rc, ra

**Status:** Not affected

**Operands:**
- SRCA: Content of register RA
- DEST: Register RC

**Description:** A half-word in the SRCA operand is selected by the Byte Position field of the ALU Status Register and the Byte Order bit of the Configuration Register. The selected half-word is sign-extended to 32 bits, and the resulting word is placed into the DEST location.

Note: The selection of half-words within words is specified in Section 3.4.5.
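The sign-extension step can be sketched as below. To avoid guessing how BP and the Byte Order bit pick the half-word (see Section 3.4.5), this hypothetical `exhws` function takes a `high` flag that names the selected half-word directly.

```python
def exhws(srca, high):
    """Sign-extend the selected half-word of SRCA to 32 bits."""
    half = (srca >> 16) & 0xFFFF if high else srca & 0xFFFF
    if half & 0x8000:
        half |= 0xFFFF0000   # copy the sign bit into bits 31-16
    return half
```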
EXTRACT

Extract Word, Bit-Aligned

Operation: DEST ← high-order word of (SRCA // SRCB << FC)

Assembler Syntax:
- EXTRACT rc, ra, rb
- EXTRACT rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: The SRCB operand is appended to the SRCA operand, and the resulting 64-bit value is shifted left by the number of bit-positions specified by the Funnel Shift Count (FC) field of the ALU Status register. The high-order 32 bits of the 64-bit shifted value are placed in the DEST location.

If the SRCB operand is the same as the SRCA operand, the EXTRACT instruction performs a rotate operation.
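A C model of the funnel shift makes the rotate case concrete. It assumes that FC is a 5-bit count (0 to 31); `extract` is a hypothetical name.

```c
#include <stdint.h>

/* EXTRACT model: form SRCA // SRCB (SRCA in the high half), shift the 64-bit
 * value left by FC, and keep the high-order 32 bits. */
static uint32_t extract(uint32_t srca, uint32_t srcb, unsigned fc)
{
    uint64_t pair = ((uint64_t)srca << 32) | srcb;
    return (uint32_t)((pair << (fc & 31u)) >> 32);
}
```

With the same value in both operands, the result is a left rotate. For example, `extract(0x12345678, 0x12345678, 8)` gives 0x34567812.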
FADD

Floating-Point Add, Single-Precision

Operation: DEST (single-precision) ← SRCA (single-precision) + SRCB (single-precision)

Assembler Syntax: 
FADD rc, ra, rb

Status: fpX, fpU, fpV, fpR, fpN

Operands:
- SRCA: Content of register RA
- SRCB: Content of register RB
- DEST: Register RC

Description: The SRCA operand is added to the SRCB operand; the result is rounded according to the FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the addition are single-precision floating-point numbers.
FDIV

Floating-Point Divide, Single-Precision

Operation: DEST (single-precision) ← SRCA (single-precision) / SRCB (single-precision)

Assembler Syntax: FDIV rc, ra, rb
Status: fpD, fpX, fpU, fpV, fpR, fpN
Operands: SRCA Content of register RA
          SRCB Content of register RB
          DEST Register RC

Description: The SRCA operand is divided by the SRCB operand; the result is rounded according to the FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the division are single-precision floating-point numbers.
FDMUL

Floating-Point Multiply, Single-to-Double Precision

Operation:  
DEST (double-precision) ← SRCA (single-precision) × 
            SRCB (single-precision)

Assembler Syntax:  
FDMUL rc, ra, rb

Status:  
fpR, fpN

Operands:  
SRCA  Content of register RA
SRCB  Content of register RB
DEST  Register RC

Description:  
The SRCB operand is multiplied by the SRCA operand; the result is 
placed into the DEST location. SRCA and SRCB are single-precision 
floating-point numbers; the result is produced in double-precision 
format. Because the product of two single-precision operands can 
always be represented exactly as a double-precision number, the 
FDMUL result does not depend on the FRM field of the Floating-Point 
Environment Register.
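The exactness argument can be checked on a host whose `float` and `double` are IEEE 754 binary32 and binary64. That is an assumption about the host, not about the Am29050. The reasoning runs as follows:

- A single-precision significand has 24 bits, so a product has at most 48 significant bits.
- That fits in the 53-bit double-precision significand, so no rounding is needed.

The sketch contrasts a widened product with one rounded back to single precision. Function names are hypothetical.

```c
/* FDMUL model: widen both single-precision operands and multiply in double.
 * 24-bit significands give a product of at most 48 bits, which a 53-bit
 * double significand holds exactly, so no rounding occurs. */
static double fdmul(float a, float b)
{
    return (double)a * (double)b;
}

/* FMUL-style comparison: the product rounded to single precision. */
static float fmul_single(float a, float b)
{
    float p = a * b;   /* assignment rounds the product to float */
    return p;
}
```

For a = 1 + 2^-23, the exact square is 1 + 2^-22 + 2^-46. The widened product keeps the 2^-46 term, while the single-precision product drops it.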
FEQ

Floating-Point Equal To, Single-Precision

**Operation:**
IF SRCA (single-precision) = SRCB (single-precision)
THEN DEST ← TRUE
ELSE DEST ← FALSE

**Assembler Syntax:**
FEQ rc, ra, rb

**Status:**
fpN

**Operands:**
SRCA  Content of register RA
SRCB  Content of register RB
DEST  Register RC

**Description:**
If the SRCA operand is equal to the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. SRCA and SRCB are single-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating-Point Environment Register has no effect on this operation.
FGE

Floating-Point Greater Than Or Equal To, Single-Precision

Operation: IF SRCA (single-precision) ≥ SRCB (single-precision)
THEN DEST ← TRUE
ELSE DEST ← FALSE

Assembler Syntax: FGE rc, ra, rb

Status: fpN

Operands:
SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 1 1 0 1 0 1 0</td>
<td>RC</td>
<td>RA</td>
<td>RB</td>
<td></td>
</tr>
</tbody>
</table>

OP = EE

Description: If the SRCA operand is greater than or equal to the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. SRCA and SRCB are single-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating-Point Environment Register has no effect on this operation.
FGT

Floating-Point Greater Than, Single-Precision

Operation: IF SRCA (single-precision) > SRCB (single-precision) THEN DEST ← TRUE ELSE DEST ← FALSE

Assembler Syntax: FGT rc, ra, rb
Status: fpN
Operands:
- SRCA: Content of register RA
- SRCB: Content of register RB
- DEST: Register RC

Description: If the SRCA operand is greater than the SRCB operand, a Boolean TRUE is placed into the DEST location; otherwise, a Boolean FALSE is placed into the DEST location. SRCA and SRCB are single-precision floating-point numbers.

Note: The rounding mode specified by the FRM field of the Floating-Point Environment Register has no effect on this operation.
FMAC

Floating-Point Multiply-Accumulate, Single-Precision

Operation:  ACC(ACN) (variable-precision) ← SRCA (single-precision) * SRCB (single-precision) + ACC(ACN) (variable-precision)

Assembler Syntax:  FMAC FUNC, ACN, ra, rb

Status:  fpU, fpV, fpR, fpN

Operands:  SRCA  Content of register RA
           SRCB  Content of register RB
           ACC(ACN)  (Content of) Accumulator register ACN

Control:  FUNC  Modifies operation as shown in the table below
           ACN  Accumulator register number (0, 1, 2, 3)

Description:  A compound operation of the form (OP1 * OP2) + OP3 is performed, where OP1 and OP2 are single-precision operands, and OP3 is an operand having the format specified by the Accumulator Format (ACF) field of the Floating-Point Environment Register. Operand sources and optional sign changes are specified by FUNC, as described in the table below. The result is rounded and stored in ACC(ACN), in the format specified by ACF.

Note that the FMAC instruction uses the fast float mode of operation, regardless of the state of the Fast Float Select bit in the Floating-Point Environment Register. The FMAC instruction never causes a Floating-Point Exception trap—it updates the sticky status bits instead. Furthermore, the FMAC instruction never sets the Inexact Sticky bit, regardless of the result.

<table>
<thead>
<tr>
<th>FUNC</th>
<th>Product term</th>
<th>Operator</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000</td>
<td>(SRCA * SRCB)</td>
<td>+</td>
</tr>
<tr>
<td>0001</td>
<td>(SRCA * -SRCB)</td>
<td>+</td>
</tr>
<tr>
<td>0010</td>
<td>(SRCA * SRCB)</td>
<td>-</td>
</tr>
<tr>
<td>0011</td>
<td>(SRCA * -SRCB)</td>
<td>-</td>
</tr>
<tr>
<td>0100</td>
<td>(SRCA * SRCB)</td>
<td>+</td>
</tr>
<tr>
<td>0101</td>
<td>(SRCA * -SRCB)</td>
<td>+</td>
</tr>
<tr>
<td>0110</td>
<td>(SRCA * SRCB)</td>
<td>-</td>
</tr>
<tr>
<td>0111</td>
<td>(SRCA * -SRCB)</td>
<td>-</td>
</tr>
<tr>
<td>1000</td>
<td>(SRCA * 1.0)</td>
<td>+</td>
</tr>
<tr>
<td>1001</td>
<td>(SRCA * -1.0)</td>
<td>+</td>
</tr>
<tr>
<td>1010</td>
<td>(SRCA * 1.0)</td>
<td>-</td>
</tr>
<tr>
<td>1011</td>
<td>(SRCA * -1.0)</td>
<td>-</td>
</tr>
<tr>
<td>1100</td>
<td>(SRCA * 1.0)</td>
<td>+</td>
</tr>
<tr>
<td>1101</td>
<td>(SRCA * -1.0)</td>
<td>+</td>
</tr>
<tr>
<td>1110</td>
<td>(SRCA * 1.0)</td>
<td>-</td>
</tr>
</tbody>
</table>
FMSM

Floating-Point Multiply-Sum, Single-Precision

Operation:  DEST (single-precision) ← SRCA (single-precision) * ACC(0) (single-precision) + SRCB (single-precision)

Assembler Syntax:  FMSM rc, ra, rb

Status:  fpU, fpV, fpR, fpN

Operands:  SRCA  Content of register RA
           SRCB  Content of register RB
           ACC(0)  Content of accumulator register 0
           DEST  Register RC

Description:  The SRCA operand is multiplied by the ACC(0) operand, and the product is added to the SRCB operand; the result is rounded to single-precision format according to Floating-Point Environment Register field FRM, and placed into the DEST location. Operands SRCA, SRCB, and ACC(0) are single-precision floating-point numbers. The Accumulator Format field of the Floating-Point Environment Register must specify single-precision.

Note that the FMSM instruction uses the fast-float mode of operation, regardless of the state of the Fast-Float Select bit in the Floating-Point Environment Register. The FMSM instruction never causes a Floating-Point Exception trap—it updates the sticky status bits instead. Furthermore, the FMSM instruction never sets the Inexact Sticky bit, regardless of the result.
FMUL

Floating-Point Multiply, Single-Precision

Operation: DEST (single-precision) ← SRCA (single-precision) * SRCB (single-precision)

Assembler Syntax: FMUL rc, ra, rb
Status: fpX, fpU, fpV, fpR, fpN
Operands: SRCA Content of register RA
SRCB Content of register RB
DEST Register RC

OP = F4

Description: The SRCA operand is multiplied by the SRCB operand; the result is rounded according to the FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the multiplication are single-precision floating-point numbers.
FSUB

Floating-Point Subtract, Single-Precision

Operation: DEST (single-precision) ← SRCA (single-precision) - SRCB (single-precision)

Assembler Syntax: FSUB rc, ra, rb
Status:    fpX, fpU, fpV, fpR, fpN
Operands:  SRCA  Content of register RA
            SRCB  Content of register RB
            DEST  Register RC

Description: The SRCB operand is subtracted from the SRCA operand; the result is rounded according to the FRM field of the Floating-Point Environment Register and placed into the DEST location. The operands and result of the subtraction are single-precision floating-point numbers.
HALT

Enter Halt Mode

Operation: Enter Halt mode on next cycle
Assembler Syntax: HALT
Status: Not affected
Operands: Not applicable

Description: The processor is placed into the Halt mode on the next cycle, except that any external data accesses are completed.

This instruction may be executed only by Supervisor-mode programs. An attempted execution by a User-mode program causes a Protection Violation trap to occur.

If the instruction following a Halt instruction has an exception (e.g., TLB Miss), the trap associated with this exception is taken before the processor enters the Halt mode.
INBYTE

Insert Byte

Operation: DEST ← SRCA, with byte selected by BP replaced by low-order byte of SRCB

Assembler Syntax: INBYTE rc, ra, rb
or
INBYTE rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: A byte in the SRCA operand is selected by the Byte Position field of the ALU Status Register and the Byte Order bit of the Configuration Register. The selected byte is replaced by the low-order byte of the SRCB operand, and the resulting word is placed into the DEST location.

Note: The selection of bytes within words is specified in Section 3.4.5.
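A C sketch of the byte insertion follows. It assumes that BO = 0 selects big-endian order (BP = 0 names bits 31 ... 24 and BP = 3 names bits 7 ... 0) and that BO = 1 reverses that order. Section 3.4.5 remains the authority, and `byte_shift` and `inbyte` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: bit offset of the byte selected by BP and BO.
 * Assumption: BO = 0 is big-endian (BP = 0 -> bits 31..24); BO = 1 reverses. */
static unsigned byte_shift(unsigned bp, unsigned bo)
{
    bp &= 3u;
    return bo ? 8u * bp : 8u * (3u - bp);
}

/* INBYTE model: SRCA with the selected byte replaced by SRCB bits 7..0. */
static uint32_t inbyte(uint32_t srca, uint32_t srcb, unsigned bp, unsigned bo)
{
    unsigned shift = byte_shift(bp, bo);
    uint32_t mask = 0xFFu << shift;
    return (srca & ~mask) | ((srcb & 0xFFu) << shift);
}
```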
INHW

Insert Half-Word

Operation: DEST ← SRCA, with half-word selected by BP replaced by low-order half-word of SRCB

Assembler Syntax: INHW rc, ra, rb
or INHW rc, ra, const8

Status: Not affected

Operands:
- SRCA Content of register RA
- SRCB M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST Register RC

Description: A half-word in the SRCA operand is selected by the Byte Position field of the ALU Status Register and the Byte Order bit of the Configuration Register. The selected half-word is replaced by the low-order half-word of the SRCB operand, and the resulting word is placed into the DEST location.

Note: The selection of half-words within words is specified in Section 3.4.5.
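The half-word insertion can be modeled the same way in C. This sketch assumes the following, with Section 3.4.5 being authoritative:

- Only bit 1 of BP is significant.
- With BO = 0, BP = 0 selects bits 31 ... 16 and BP = 2 selects bits 15 ... 0.
- BO = 1 reverses the order.

The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed selection rule: only BP bit 1 matters; BO = 0 puts the BP = 0
 * half-word in bits 31..16, BO = 1 puts it in bits 15..0. */
static unsigned halfword_shift(unsigned bp, unsigned bo)
{
    unsigned index = (bp >> 1) & 1u;
    return bo ? 16u * index : 16u * (1u - index);
}

/* INHW model: SRCA with the selected half-word replaced by SRCB bits 15..0. */
static uint32_t inhw(uint32_t srca, uint32_t srcb, unsigned bp, unsigned bo)
{
    unsigned shift = halfword_shift(bp, bo);
    uint32_t mask = 0xFFFFu << shift;
    return (srca & ~mask) | ((srcb & 0xFFFFu) << shift);
}
```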
INV

Invalidate

Operation:  Reset all valid bits in Branch Target Cache memory
Assembler Syntax:  INV
Status:  Not affected
Operands:  Not applicable

Description:  This instruction causes all Branch Target Cache memory valid bits to be reset, on the execution of the next successful branch. This causes all Branch Target Cache memory locations to become invalid.

This instruction may be executed only by Supervisor-mode programs. An attempted execution by a User-mode program causes a Protection Violation trap to occur.
IRET

Interrupt Return

Operation: Perform an interrupt return sequence
Assembler Syntax: IRET
Status: Not affected
Operands: Not applicable

```
   31  23  15  7  0
  +-----------+-----------+-----------+
  | OP = 88   | Reserved  | Reserved  |
  +-----------+-----------+-----------+
```

Description: This instruction performs the interrupt return sequence described in Section 3.5.5.
This instruction may be executed only by Supervisor-mode programs. An attempted execution by a User-mode program causes a Protection Violation trap to occur.
IRETINV

Interrupt Return and Invalidate

Operation: Perform an interrupt return sequence, and reset all valid bits in Branch Target Cache memory

Assembler Syntax: IRETINV

Status: Not affected

Operands: Not applicable

Description: This instruction performs the interrupt return sequence described in Section 3.5.5. When the sequence begins, all Branch Target Cache memory valid bits are reset to zeros. This causes all Branch Target Cache memory locations to become invalid.

This instruction may be executed only by Supervisor-mode programs. An attempted execution by a User-mode program causes a Protection Violation trap to occur.
JMP

Jump

Operation: PC ← TARGET
Execute delay instruction

Assembler Syntax: JMP target

Status: Not affected

Operands: TARGET
A = 0: I17 ... I10 // I9 ... I2 (sign-extended to 30 bits) + PC
A = 1: I17 ... I10 // I9 ... I2 (zero-extended to 30 bits)

Description: A non-sequential instruction fetch occurs to the instruction address given by the TARGET operand. The instruction following the JMP is executed before the non-sequential fetch occurs.
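The TARGET computation can be sketched in C. The sketch makes these assumptions:

- Field positions come from the JMPT format diagram in this chapter and are assumed to apply to JMP as well: A in bit 24, I17 ... I10 in bits 23 ... 16, and I9 ... I2 in bits 7 ... 0.
- PC is the byte address of the jump instruction.
- The two implied low-order address bits are zero, so the 16-bit field is a word offset.

`jump_target` is a hypothetical name.

```c
#include <stdint.h>

/* Compute TARGET (as a byte address) for a JMP-style instruction word. */
static uint32_t jump_target(uint32_t pc, uint32_t insn)
{
    uint32_t field = (((insn >> 16) & 0xFFu) << 8)   /* I17 ... I10 */
                   | (insn & 0xFFu);                 /* I9 ... I2   */
    if (insn & (1u << 24)) {
        /* A = 1: zero-extended, absolute word address */
        return field << 2;
    }
    /* A = 0: sign-extend the 16-bit word offset and add it to PC */
    int32_t offset = (field & 0x8000u) ? (int32_t)field - 0x10000 : (int32_t)field;
    return pc + (uint32_t)(offset * 4);
}
```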
JMPF

Jump False

Operation: IF SRCA = FALSE THEN PC ← TARGET
Execute delay instruction

Assembler Syntax: JMPF ra, target

Status: Not affected

Operands: SRCA Content of register RA
TARGET
A = 0: I17 ... I10 // I9 ... I2 (sign-extended to 30 bits) + PC
A = 1: I17 ... I10 // I9 ... I2 (zero-extended to 30 bits)

Description: If SRCA is a Boolean FALSE, a non-sequential instruction fetch occurs to the instruction address given by the TARGET operand.
If SRCA is a Boolean TRUE, this instruction has no effect.
The instruction following the JMPF is executed regardless of the value of SRCA.
JMPFDEC: Jump False and Decrement

**Operation:**
IF SRCA = FALSE THEN
   SRCA ← SRCA - 1
   PC ← TARGET
ELSE
   SRCA ← SRCA - 1
Execute delay instruction

**Assembler Syntax:**
JMPFDEC ra, target

**Status:** Not affected

**Operands:**
- **SRCA:** Content of register RA
- **TARGET:**
  - A = 0: l17 ... l10 // l9 ... l2 (sign-extended to 30 bits) + PC
  - A = 1: l17 ... l10 // l9 ... l2 (zero-extended to 30 bits)

**Description:**
If SRCA is a Boolean FALSE, a non-sequential instruction fetch occurs to the instruction address given by the TARGET operand.

If SRCA is a Boolean TRUE, this instruction has no effect on the instruction-execution sequence.

The SRCA operand is decremented by one, regardless of whether or not the non-sequential instruction fetch occurs. Note that a negative number for the SRCA operand is a Boolean TRUE.

The instruction following the JMPFDEC is executed regardless of the value of SRCA.
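The counter is tested before it is decremented, and the branch is taken while the counter is still non-negative (FALSE). As a result, a loop closed by JMPFDEC makes more passes than its initial count. The C sketch below models a loop entered at its first instruction and closed by `JMPFDEC ra, top`. Consistent with the note above, it assumes that a word is TRUE exactly when bit 31 is set. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed Boolean convention: TRUE exactly when bit 31 is set (negative). */
static int is_true(uint32_t word)
{
    return (word & 0x80000000u) != 0;
}

/* Hypothetical model of a loop entered at its first instruction and closed
 * by "JMPFDEC ra, top". Returns the number of passes through the body and
 * stores the final counter value. */
static unsigned jmpfdec_passes(int32_t initial, int32_t *final_value)
{
    uint32_t ra = (uint32_t)initial;
    unsigned passes = 0;
    for (;;) {
        passes++;                   /* one pass through the loop body */
        int taken = !is_true(ra);   /* branch while the counter is FALSE */
        ra -= 1u;                   /* decremented whether or not taken */
        if (!taken)
            break;
    }
    *final_value = (int32_t)ra;
    return passes;
}
```

Under this model, an initial count of n - 2 gives n passes for n ≥ 1, and the counter always finishes at -2.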
JMPFI

Jump False Indirect

Operation: IF SRCA = FALSE THEN PC ← SRCB
Execute delay instruction

Assembler Syntax: JMPFI ra, rb

Status: Not affected

Operands: SRCA  Content of register RA
SRCB  Content of register RB

Format: 1 1 0 0 0 1 0 0 | Reserved | RA | RB

OP = C4

Description: If SRCA is a Boolean FALSE, a non-sequential instruction fetch occurs to the instruction address given by the SRCB operand.
If SRCA is a Boolean TRUE, this instruction has no effect.
The instruction following the JMPFI is executed regardless of the value of SRCA.
JMPI

Jump Indirect

Operation: PC ← SRCB
Execute delay instruction

Assembler Syntax: JMPI rb

Status: Not affected

Operands: SRCB Content of register RB

Description: A non-sequential instruction fetch occurs to the instruction address given by the SRCB operand. The instruction following the JMPI is executed before the non-sequential fetch occurs.
**JMPT**

**Jump True**

**Operation:** IF SRCA = TRUE THEN PC ← TARGET  
Execute delay instruction

**Assembler Syntax:** JMPT ra, target  
**Status:** Not affected  
**Operands:**  
- SRCA: Content of register RA  
- TARGET:  
  - A = 0: I17 ... I10 // I9 ... I2 (sign-extended to 30 bits) + PC  
  - A = 1: I17 ... I10 // I9 ... I2 (zero-extended to 30 bits)

```
       31  23  15  7  0
  1 0 1 0 1 1 0 A | I17 ... I10 | RA | I9 ... I2
```

**OP = AC, AD**  

**Description:** If SRCA is a Boolean TRUE, a non-sequential instruction fetch occurs to the instruction address given by the TARGET operand.  
If SRCA is a Boolean FALSE, this instruction has no effect.  
The instruction following the JMPT is executed regardless of the value of SRCA.
JMPTI

Jump True Indirect

Operation: IF SRCA = TRUE THEN PC ← SRCB
Execute delay instruction

Assembler Syntax: JMPTI ra, rb
Status: Not affected
Operands: SRCA Content of register RA
SRCB Content of register RB

Description: If SRCA is a Boolean TRUE, a non-sequential instruction fetch occurs to the instruction address given by the SRCB operand.
If SRCA is a Boolean FALSE, this instruction has no effect.
The instruction following the JMPTI is executed regardless of the value of SRCA.
LOAD

Load

Operation: DEST ← EXTERNAL WORD [SRCB]

Assembler Syntax:
LOAD ce, cntl, ra, rb
or
LOAD ce, cntl, ra, const8

Status: Not affected

Operands:
SRCB
M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RA

Format: 0 0 0 1 0 1 1 M | CNTL | RA | RB or I

Description:
If the CE bit is 0, the external word addressed by the SRCB operand is placed into the DEST location.

If the CE bit is 1, a word is transferred from the coprocessor into the DEST location. The SRCB operand has no pre-defined interpretation in this case, though it appears on the address bus.

The CNTL field of the LOAD instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.
LOADL

Load and Lock

Operation:  DEST ← EXTERNAL WORD [SRCB],
assert LOCK output during access

Assembler Syntax:  LOADL ce, cntl, ra, rb
or
LOADL ce, cntl, ra, const8

Status:  Not affected

Operands:  SRCB   M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST   Register RA

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>M</td>
<td>CNTL</td>
<td>RA</td>
<td>RB or I</td>
<td></td>
</tr>
</tbody>
</table>

Description:  If the CE bit is 0, the external word addressed by the SRCB operand
is placed into the DEST location.

If the CE bit is 1, a word is transferred from the coprocessor into the DEST location. The SRCB operand has no pre-defined interpretation
in this case, though it appears on the address bus.

The CNTL field of the LOADL instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.

The LOCK output is asserted during the access or transfer.
LOADM

Load Multiple

Operation: DEST ... DEST + COUNT ← EXTERNAL WORD [SRCB] ...
EXTERNAL WORD [SRCB + (COUNT * 4)]

Assembler Syntax: LOADM ce, cntl, ra, rb
or
LOADM ce, cntl, ra, const8

Status: Not affected

Operands: SRCB
M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)

DEST Register RA

Description: If the CE bit is 0, external words at consecutive word addresses, beginning with the word addressed by the SRCB operand, are placed into consecutive registers, beginning with the DEST location.

If the CE bit is 1, multiple words are transferred from the coprocessor into consecutive registers, beginning with the DEST location. The SRCB operand has no pre-defined interpretation in this case.

The total number of words accessed or transferred in the sequence is specified by the Count Remaining (CR) field of the Channel Control Register (which also appears in the Load/Store Count Remaining Register) at the beginning of the access. The total number of words is the value of the CR field plus one. The CNTL field of the LOADM instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.

Note: The address and register-number sequences for the LOADM instruction are specified in Section 3.4.4.
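The word count and address progression can be sketched in C. This is a simplified model with these assumptions:

- Memory is a hypothetical word array indexed by byte address divided by 4.
- Registers are a flat array.
- The exact register-number sequence defined in Section 3.4.4 is not modeled.

`loadm` is a hypothetical name.

```c
#include <stdint.h>

/* Simplified LOADM model (CE = 0): CR + 1 words, from consecutive word
 * addresses starting at SRCB, land in consecutive registers from DEST. */
static void loadm(const uint32_t *memory, uint32_t srcb, unsigned cr,
                  uint32_t *regs, unsigned dest)
{
    for (unsigned i = 0; i <= cr; i++) {
        uint32_t address = srcb + 4u * i;         /* SRCB + (i * 4) */
        regs[dest + i] = memory[address / 4u];    /* word-indexed memory */
    }
}
```

With CR = 2, three words are transferred, matching the rule that the total is the CR value plus one.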
LOADSET

Load and Set

Operation: DEST ← EXTERNAL WORD [SRCB]
            EXTERNAL WORD [SRCB] ← h'FFFFFFFF',
            assert LOCK output during access

Assembler Syntax: LOADSET ce, cntl, ra, rb
                 or
                 LOADSET ce, cntl, ra, const8

Status: Not affected

Operands: SRCB  M = 0: Content of register RB
           M = 1: I (Zero-extended to 32 bits)

DEST  Register RA

Format: 0 0 1 0 0 1 1 M | CNTL | RA | RB or I

OP = 26, 27

Description: If the CE bit is 0, the external word addressed by the SRCB operand is placed into the DEST location. After the DEST location is altered, the external word addressed by the SRCB operand is written, atomically, with a word consisting of a 1 in every bit position.

If the CE bit is 1, a word is transferred from the coprocessor into the DEST location. The SRCB operand has no pre-defined interpretation in this case, though it appears on the Address Bus. After the DEST location is altered, a word consisting of a 1 in every bit position is transferred, atomically, to the coprocessor.

The CNTL field of the LOADSET instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.

The LOCK output is asserted throughout the LOADSET operation.
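LOADSET acts as a test-and-set primitive. The C11 sketch below models its atomic read-then-write-all-ones behavior with `atomic_exchange` and uses it in a hypothetical spinlock where 0 means free. It models only the CE = 0 case, and it illustrates the technique rather than reproducing AMD-supplied code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* LOADSET model (CE = 0): return the old word and write all ones, as one
 * indivisible step. */
static uint32_t loadset(_Atomic uint32_t *word)
{
    return atomic_exchange(word, 0xFFFFFFFFu);
}

/* Hypothetical spinlock: 0 = free, all ones = held. */
static void spin_acquire(_Atomic uint32_t *lock)
{
    while (loadset(lock) != 0u) {
        /* Another holder set the word first; try again. */
    }
}

static void spin_release(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0u);
}
```

An all-ones value returned by `loadset` means the word was already set. On the Am29050 that value is negative and is therefore a Boolean TRUE.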
MFACC

Move From Accumulator

Operation: DEST ← ACC(ACN)

Assembler Syntax: MFACC rc, FMT, ACN

Status: fpX, fpU, fpV, fpR

Operands: DEST Register RC (single-precision f.p.)
or Register RC and twin of Register RC (Double-precision f.p.)

ACC(ACN) Content of ACC(ACN)

Control: FMT Format of destination operand
00 Format specified by ACF
01 Single-precision floating-point
10 Double-precision floating-point
11 Reserved

ACN Accumulator number (0, 1, 2, or 3)

Description: The operand in accumulator register ACN is converted to format FMT and rounded according to Floating-Point Environment Register field FRM, then placed into the DEST location. The format of the operand read from accumulator register ACN is specified by Floating-Point Environment Register field ACF.
MTACC

Move To Accumulator

Operation: ACC(ACN) ← SRCA

Assembler Syntax: MTACC ra, FMT, ACN

Status: fpX, fpU, fpV, fpR, fpN

Operands: SRCA Content of register RA (single-precision f.p.)
or
Content of register RA and the twin of Register RA (double-precision f.p.)

ACC(ACN) Content of ACC(ACN)

Control: FMT Format of source operand
00 Format specified by ACF
01 Single-precision floating-point
10 Double-precision floating-point
11 Reserved

ACN Accumulator number (0, 1, 2, or 3)

Fields (high-order to low-order): OP = 1 1 1 0 1 0 0 0, RA, Reserved, FMT, ACN

OP = E8

Description: The SRCA operand is converted from format FMT and rounded according to Floating-Point Environment Register field FRM, then transferred to accumulator register ACC(ACN); the format of the destination operand is specified by Floating-Point Environment Register field ACF.

Note that the MTACC instruction uses the fast float mode of operation, regardless of the Fast Float Select bit in the Floating-Point Environment Register. A denormalized number is flushed to zero before being written into the accumulator.
MFSR

Move from Special Register

Operation: DEST ← SPECIAL
Assembler Syntax: MFSR rc, spid
Status: Not affected
Operands: SPECIAL Content of special-purpose register SA
         DEST Register RC

Description: The SPECIAL operand is placed into the DEST location.
For programs in the User mode, a Protection Violation trap occurs if
SA specifies a protected special-purpose register. If a trap occurs, the
DEST location is not altered.
**MFTLB**

**Move from Translation Look-Aside Buffer Register**

**Operation:** DEST ← TLB [SRCA]

**Assembler Syntax:** MFTLB rc, ra

**Status:** Not affected

**Operands:**
- **SRCA**: Content of register RA, bits 6 .. 0
- **DEST**: Register RC

Format: OP | RC | RA | Reserved

**Description:** The Translation Look-Aside Buffer (TLB) register whose register number is specified by the SRCA operand is placed into the DEST location.

This instruction may be executed only by Supervisor-mode programs. An attempted execution by a User-mode program causes a Protection Violation trap to occur. If a trap occurs, the DEST location is not altered.
MTSR

Move to Special Register

Operation: SPDEST ← SRCB

Assembler Syntax: MTSR spid, rb

Status: Not affected, unless the destination is the ALU Status Register

Operands:
- SRCB: Content of register RB
- SPDEST: Special-purpose register SA

Description: The SRCB operand is placed into the SPDEST location.

For programs in the User mode, a Protection Violation trap occurs if SA specifies a protected special-purpose register. If a trap occurs, the SPDEST location is not altered.
MTSRIM

Move to Special Register Immediate

Operation: SPDEST ← I16
Assembler Syntax: MTSRIM spid, const16
Status: Not affected, unless the destination is the ALU Status Register
Operands: I16 I15 ... I8 // I7 ... I0 (zero-extended to 32 bits)
           SPDEST Special-purpose register SA

Format: 0 0 0 0 0 1 0 0 | I15 ... I8 | SA | I7 ... I0

OP = 04

Description: The I16 operand is placed into the SPDEST location.
For programs in the User mode, a Protection Violation trap occurs if SA specifies a protected special-purpose register. If a trap occurs, the SPDEST location is not altered.
MTTLB
Move to Translation Look-Aside Buffer Register

Operation: TLB [SRCA] ← SRCB
Assembler Syntax: MTTLB ra, rb
Status: Not affected
Operands: SRCA Content of register RA, bits 6...0
SRCB Content of register RB

Format: OP | Reserved | RA | RB

Description: The SRCB operand is placed into the Translation Look-Aside Buffer (TLB) register whose register-number is specified by the SRCA operand.

This instruction may be executed only by Supervisor-mode programs. An attempted execution by a User-mode program causes a Protection Violation trap to occur. If a trap occurs, the TLB register is not altered.
MUL

Multiply Step

Operation: Perform one-bit step of a multiply operation

Assembler Syntax:
- MUL rc, ra, rb
- MUL rc, ra, const8

Status: V, N, Z, C

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
        M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description:
If the least-significant bit of the Q Register is 1, the SRCA operand is added to the SRCB operand. If the least-significant bit of the Q register is 0, a zero word is added to the SRCB operand.

The content of the Q Register is appended to the result of the add, and the resulting 64-bit value is shifted right by one bit position; the true sign of the result of the add fills the vacated bit position (i.e., the sign of the result is complemented if an overflow occurred during the add operation). The high-order 32 bits of the 64-bit shifted value are placed into the DEST location. The low-order 32 bits of the shifted value are placed into the Q Register.

This instruction is provided for compatibility with the Am29000 microprocessor.
MULL

Multiply Last Step

Operation: Complete a sequence of multiply steps (for signed multiply)
Assembler Syntax: MULL rc, ra, rb
or
MULL rc, ra, const8
Status: V, N, Z, C
Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
M = 1: I (Zero-extended to 32 bits)
DEST Register RC

Format: OP and M | RC | RA | RB or I

Description: If the least-significant bit of the Q Register is 1, the SRCA operand is subtracted from the SRCB operand. If the least-significant bit of the Q register is 0, a zero word is subtracted from the SRCB operand.

The content of the Q Register is appended to the result of the subtract, and the resulting 64-bit value is shifted right by one bit position; the true sign of the result of the subtract fills the vacated bit position (i.e., the sign of the result is complemented if an overflow occurred during the subtract operation). The high-order 32 bits of the 64-bit shifted value are placed into the DEST location. The low-order 32 bits of the shifted value are placed into the Q Register.

This instruction is provided for compatibility with the Am29000 microprocessor.
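The MUL and MULL steps combine into a signed 32 × 32 → 64-bit multiply. The C sketch below is a model built from the descriptions above, not AMD library code. It works as follows:

- The Q Register starts with the multiplier, and the first step uses a zero SRCB.
- Thirty-one MUL steps handle multiplier bits 0 ... 30.
- A final MULL subtracts the multiplicand for sign bit 31.
- The high-order word ends in DEST and the low-order word in Q.

An `int64_t` holds each 33-bit sum, so its sign is the true sign. Function names are hypothetical.

```c
#include <stdint.h>

/* One MUL step: if Q bit 0 is 1, add SRCA to SRCB (else add zero), then
 * shift {true sign, sum, Q} right one bit. int64_t holds the 33-bit sum,
 * so its sign is the true sign even when a 32-bit add would overflow. */
static void mul_step(uint32_t *dest, uint32_t *q, uint32_t srca, uint32_t srcb)
{
    int64_t sum = (int32_t)srcb;
    if (*q & 1u)
        sum += (int32_t)srca;
    *q = (*q >> 1) | ((uint32_t)(sum & 1) << 31);  /* sum bit 0 enters Q bit 31 */
    *dest = (uint32_t)(sum >> 1);  /* assumes arithmetic >> on negatives (GCC/Clang) */
}

/* One MULL step: as MUL, but SRCA is subtracted from SRCB. */
static void mull_step(uint32_t *dest, uint32_t *q, uint32_t srca, uint32_t srcb)
{
    int64_t sum = (int32_t)srcb;
    if (*q & 1u)
        sum -= (int32_t)srca;
    *q = (*q >> 1) | ((uint32_t)(sum & 1) << 31);
    *dest = (uint32_t)(sum >> 1);
}

/* Hypothetical driver: 31 MUL steps, then one MULL. */
static int64_t signed_multiply(int32_t multiplicand, int32_t multiplier)
{
    uint32_t q = (uint32_t)multiplier;   /* Q Register holds the multiplier */
    uint32_t p = 0;                      /* first step uses a zero SRCB */
    for (int i = 0; i < 31; i++)
        mul_step(&p, &q, (uint32_t)multiplicand, p);
    mull_step(&p, &q, (uint32_t)multiplicand, p);
    /* DEST holds the high-order word, Q the low-order word. */
    return (int64_t)(((uint64_t)p << 32) | q);
}
```

Passing `p` as both SRCB and DEST models a step whose RB and RC fields name the same register.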
MULTIPLU

Integer Multiply, Unsigned

Operation: DEST ← SRCA * SRCB
Assembler Syntax: MULTIPLU rc, ra, rb
Status: None
Operands:
- SRCA: Content of register RA
- SRCB: Content of register RB
- DEST: Register RC

Description: The SRCA operand is multiplied by the SRCB operand. The low-order 32 bits of the 64-bit result are placed into the DEST location. This operation treats the SRCA and SRCB operands as unsigned integers and produces an unsigned result.

The contents of the Q register are undefined after a MULTIPLU operation.
MULTIPLY

Integer Multiply, Signed

Operation: DEST ← SRCA * SRCB

Assembler Syntax: MULTIPLY rc, ra, rb

Status: None

Operands:
SRCA: Content of register RA
SRCB: Content of register RB
DEST: Register RC

Description: The SRCA operand is multiplied by the SRCB operand. The low-order 32 bits of the 64-bit result are placed into the DEST location. This operation treats the SRCA and SRCB operands as two's-complement integers and produces a two's-complement result.

The contents of the Q register are undefined after a MULTIPLY operation.
MULTM

Integer Multiply Most-Significant Bits, Signed

Operation: DEST ← SRCA * SRCB

Assembler Syntax: MULTM rc, ra, rb

Status: None

Operands:
- SRCA: Content of register RA
- SRCB: Content of register RB
- DEST: Register RC

Description: The SRCA operand is multiplied by the SRCB operand. The high-order 32 bits of the 64-bit result are placed into the DEST location. This operation treats the SRCA and SRCB operands as two's-complement integers and produces a two's-complement result.

The contents of the Q register are undefined after a MULTM operation.
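The relation between MULTIPLY and MULTM can be sketched in C. Both take the same 64-bit two's-complement product: MULTIPLY keeps the low-order word and MULTM keeps the high-order word. Function names are hypothetical.

```c
#include <stdint.h>

/* MULTIPLY model: low-order 32 bits of the two's-complement product. */
static uint32_t multiply_low(uint32_t srca, uint32_t srcb)
{
    int64_t product = (int64_t)(int32_t)srca * (int32_t)srcb;
    return (uint32_t)product;
}

/* MULTM model: high-order 32 bits of the same product. */
static uint32_t multm_high(uint32_t srca, uint32_t srcb)
{
    int64_t product = (int64_t)(int32_t)srca * (int32_t)srcb;
    return (uint32_t)((uint64_t)product >> 32);
}
```

For SRCA = h'FFFFFFFD' (that is, -3) and SRCB = 5, the product is -15. MULTIPLY returns h'FFFFFFF1' and MULTM returns h'FFFFFFFF'.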
MULTMU

Integer Multiply Most-Significant Bits, Unsigned

Operation: DEST ← SRCA * SRCB

Assembler Syntax: MULTMU rc, ra, rb

Status: None

Operands:
SRCA       Content of register RA
SRCB       Content of register RB
DEST       Register RC

Format: 1 1 0 1 1 1 1 1 | RC | RA | RB

Description: The SRCA operand is multiplied by the SRCB operand. The
high-order 32 bits of the 64-bit result are placed into the DEST
location. This operation treats the SRCA and SRCB operands as
unsigned integers and produces an unsigned result.

The contents of the Q register are undefined after a MULTMU
operation.
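The four integer-multiply instructions differ only in which half of the 64-bit product they keep and whether the operands are treated as signed. The C sketch below models the value written to DEST using 64-bit host arithmetic. The function names are hypothetical, and the Q register (undefined after these operations) is not modeled. Converting a `uint32_t` above `INT32_MAX` to `int32_t` is implementation-defined before C23 but yields the two's-complement value on mainstream compilers.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical models of the DEST value for each integer multiply. */

/* MULTIPLU: low 32 bits, operands unsigned. */
static uint32_t multiplu(uint32_t srca, uint32_t srcb)
{
    return (uint32_t)((uint64_t)srca * srcb);
}

/* MULTIPLY: low 32 bits, operands two's-complement. */
static uint32_t multiply(uint32_t srca, uint32_t srcb)
{
    int64_t p = (int64_t)(int32_t)srca * (int32_t)srcb;
    return (uint32_t)(uint64_t)p;
}

/* MULTMU: high 32 bits, operands unsigned. */
static uint32_t multmu(uint32_t srca, uint32_t srcb)
{
    return (uint32_t)(((uint64_t)srca * srcb) >> 32);
}

/* MULTM: high 32 bits, operands two's-complement. */
static uint32_t multm(uint32_t srca, uint32_t srcb)
{
    int64_t p = (int64_t)(int32_t)srca * (int32_t)srcb;
    return (uint32_t)((uint64_t)p >> 32);
}
```

For 0xFFFFFFFF times 2, the low halves agree (0xFFFFFFFE from both MULTIPLU and MULTIPLY), because the low 32 bits of a product do not depend on signedness. The high halves differ: MULTMU gives 1 and MULTM gives 0xFFFFFFFF (the upper word of -2).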
MULU

Multiply Step, Unsigned

Operation: Perform one-bit step of a multiply operation (unsigned)

Assembler Syntax:
- MULU rc, ra, rb
- MULU rc, ra, const8

Status: V, N, Z, C

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description:
If the least-significant bit of the Q Register is 1, the SRCA operand is added to the SRCB operand. If the least-significant bit of the Q register is 0, a zero word is added to the SRCB operand.

The content of the Q register is appended to the result of the add, and the resulting 64-bit value is shifted right by one bit position; the carry-out of the add fills the vacated bit position. The high-order 32 bits of the 64-bit shifted value are placed into the DEST location. The low-order 32 bits of the shifted value are placed into the Q Register.

This instruction is provided for compatibility with the Am29000 microprocessor.
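Below is a minimal C sketch of one MULU step, following the description above. It performs the conditional add into SRCB, then a one-bit right shift of the 64-bit sum:Q pair, with the carry-out entering at the top. The names `mulu_step` and `mulu_multiply` are hypothetical. The 32-step driver assumes the conventional shift-and-add setup: partial product starting at zero in SRCB, multiplier in Q, and multiplicand in SRCA. The initialization sequence used on real hardware is not described in this section, and the V, N, Z, and C status bits are not modeled.

```c
#include <stdint.h>
#include <assert.h>

typedef struct { uint32_t dest; uint32_t q; } mulu_result;

/* One hypothetical MULU step: returns the new DEST and Q values. */
static mulu_result mulu_step(uint32_t srca, uint32_t srcb, uint32_t q)
{
    /* Add SRCA if Q bit 0 is 1, else add zero; bit 32 holds the carry-out. */
    uint64_t sum = (uint64_t)srcb + ((q & 1u) ? srca : 0u);
    uint32_t carry = (uint32_t)(sum >> 32);
    uint32_t hi = (uint32_t)sum;
    mulu_result r;
    /* Shift hi:q right by one; carry fills bit 63, hi bit 0 moves into Q bit 31. */
    r.dest = (carry << 31) | (hi >> 1);
    r.q = (hi << 31) | (q >> 1);
    return r;
}

/* Hypothetical driver: 32 steps produce the full 64-bit unsigned product. */
static uint64_t mulu_multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint32_t acc = 0, q = multiplier;
    for (int i = 0; i < 32; i++) {
        mulu_result r = mulu_step(multiplicand, acc, q);
        acc = r.dest;
        q = r.q;
    }
    return ((uint64_t)acc << 32) | q;
}
```

After the final step, the high word of the product is in the DEST register and the low word is in Q.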
**NAND**

**NAND Logical**

**Operation:** \( \text{DEST} \leftarrow \neg (\text{SRCA} \& \text{SRCB}) \)

**Assembler Syntax:**
- NAND rc, ra, rb
- NAND rc, ra, const8

**Status:** N, Z

**Operands:**
- **SRCA**
  - Content of register RA
- **SRCB**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST**
  - Register RC

**Description:** The SRCA operand is logically ANDed, bit-by-bit, with the SRCB operand. The one's-complement of the result is placed into the DEST location.
NOR

NOR Logical

Operation: \( \text{DEST} \leftarrow \neg (\text{SRCA} \mid \text{SRCB}) \)

Assembler Syntax:
- NOR rc, ra, rb
- NOR rc, ra, const8

Status: N, Z

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: The SRCA operand is logically ORed, bit-by-bit, with the SRCB operand. The one's-complement of the result is placed into the DEST location.
**OR**

**OR Logical**

**Operation:** \( \text{DEST} \leftarrow \text{SRCA} \mid \text{SRCB} \)

**Assembler Syntax:**
- OR rc, ra, rb
- OR rc, ra, const8

**Status:** N, Z

**Operands:**
- **SRCA**
  - Content of register RA
- **SRCB**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST**
  - Register RC

31              23    15    7        0
1 0 0 1 0 0 1 M   RC    RA    RB or I

**Description:** The SRCA operand is logically ORed, bit-by-bit, with the SRCB operand, and the result is placed into the DEST location.
**ORN**

**OR-NOT Logical**

**Operation:** DEST ← SRCA | ~ SRCB

**Assembler Syntax:**
- ORN rc, ra, rb
- ORN rc, ra, const8

**Status:** N, Z

**Operands:**
- **SRCA**: Content of register RA
- **SRCB**: M = 0: Content of register RB  
  M = 1: I (Zero-extended to 32 bits)
- **DEST**: Register RC

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 0 1 0 1 0 1</td>
<td>M</td>
<td>RC</td>
<td>RA</td>
<td>RB or I</td>
</tr>
</tbody>
</table>

**Description:** The SRCA operand is logically ORed, bit-by-bit, with the one's-complement of the SRCB operand, and the result is placed into the DEST location.
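NAND, NOR, OR, and ORN share the same operand rules. SRCB is either RB or the 8-bit immediate I zero-extended to 32 bits. The C sketch below models the DEST value only; the N and Z status bits are not modeled. All names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical SRCB selection: RB when M = 0, zero-extended I when M = 1. */
static uint32_t srcb_operand(int m, uint32_t rb, uint8_t i)
{
    return m ? (uint32_t)i : rb;
}

static uint32_t op_nand(uint32_t a, uint32_t b) { return ~(a & b); } /* NAND */
static uint32_t op_nor (uint32_t a, uint32_t b) { return ~(a | b); } /* NOR  */
static uint32_t op_or  (uint32_t a, uint32_t b) { return a | b; }    /* OR   */
static uint32_t op_orn (uint32_t a, uint32_t b) { return a | ~b; }   /* ORN  */
```

Zero-extension has a visible consequence for the complementing forms. For example, `ORN rc, ra, const8` always sets bits 31 through 8 of the result, because the one's-complement of a zero-extended 8-bit immediate has all of those bits set.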
**SETIP**

*Set Indirect Pointers*

**Operation:** Load IPA, IPB, and IPC registers with operand-register numbers

**Assembler Syntax:** SETIP rc, ra, rb

**Status:** Not affected

**Operands:** Absolute-register numbers for registers RA, RB, and RC


**Description:** The IPA, IPB, and IPC registers are set to the register numbers of registers RA, RB, and RC, respectively.

For programs in the User mode, a Protection Violation trap occurs if RA, RB, or RC specifies a register that is protected by the Register Bank Protect Register.
SLL

Shift Left Logical

Operation:  DEST ← SRCA << SRCB (zero fill)

Assembler Syntax:  
   SLL rc, ra, rb
   or
   SLL rc, ra, const8

Status:  Not affected

Operands:  
   SRCA  Content of register RA
   SRCB  M = 0: Content of register RB, bits 4 ... 0
          M = 1: I, bits 4...0
   DEST  Register RC

Description:  The SRCA operand is shifted left by the number of bit positions specified by the SRCB operand; zeros fill vacated bit positions. The result is placed into the DEST location.
**SQRT**

*Floating-Point Square Root*

**Operation:**  DEST ← SQRT(SRCA)

**Assembler Syntax:**  SQRT rc, ra, FS

**Status:**  fpX, fpR, fpN

**Operands:**  
- SRCA: Content of register RA (single-precision f.p.), or content of register RA and the twin of register RA (double-precision f.p.)
- DEST: Register RC (single-precision f.p.), or register RC and the twin of register RC (double-precision f.p.)

**Control:**  
- FS: Format of source operand SRCA
  - 00: Reserved for future use
  - 01: Single-precision floating-point
  - 10: Double-precision floating-point
  - 11: Reserved for future use

```
  31              23     15     7          0
1 1 1 0 0 1 0 1   RC     RA     Reserved   FS
```

**Description:**  This operation computes the square root of floating-point operand SRCA. The result is rounded according to the FRM field of the Floating-Point Environment Register and placed into the DEST location. The operand and result are single- or double-precision floating-point numbers, as specified by FS.
SRA

Shift Right Arithmetic

Operation: DEST ← SRCA >> SRCB (sign fill)

Assembler Syntax:
- SRA rc, ra, rb
- SRA rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB, bits 4 ... 0
  M = 1: I, bits 4 ... 0
- DEST: Register RC

Description: The SRCA operand is shifted right by the number of bit positions specified by the SRCB operand; the sign of the SRCA operand fills vacated bit positions. The result is placed into the DEST location.
SRL

Shift Right Logical

Operation: DEST ← SRCA >> SRCB (zero fill)

Assembler Syntax:
- SRL rc, ra, rb
- SRL rc, ra, const8

Status: Not affected

Operands:
- SRCA: Content of register RA
- SRCB:
  - M = 0: Content of register RB, bits 4 ... 0
  - M = 1: I, bits 4 ... 0
- DEST: Register RC

Description: The SRCA operand is shifted right by the number of bit positions specified by the SRCB operand; zeros fill vacated bit positions. The result is placed into the DEST location.
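SLL, SRL, and SRA use only bits 4 ... 0 of SRCB, so the shift count is always in the range 0 to 31. The hypothetical C models below make that masking explicit. SRA is written without right-shifting a signed value, because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical models: only the low five bits of SRCB give the count. */
static uint32_t sll(uint32_t a, uint32_t b) { return a << (b & 31u); } /* zero fill */
static uint32_t srl(uint32_t a, uint32_t b) { return a >> (b & 31u); } /* zero fill */

static uint32_t sra(uint32_t a, uint32_t b)
{
    unsigned n = b & 31u;
    uint32_t r = a >> n;                 /* logical shift first */
    if ((a & 0x80000000u) && n != 0)
        r |= ~(0xFFFFFFFFu >> n);        /* copy the sign into the vacated bits */
    return r;
}
```

Because of the masking, a register shift count of 33 behaves like a count of 1.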
**STORE**

*Store*

**Operation:** EXTERNAL WORD [SRCB] \(\leftarrow\) SRCA

**Assembler Syntax:**
- STORE ce, cntl, ra, rb
- STORE ce, cntl, ra, const8

**Status:** Not affected

**Operands:**
- SRCA: Content of register RA
- SRCB:
  - \(M = 0\): Content of register RB
  - \(M = 1\): I (Zero-extended to 32 bits)

**Description:**

If the CE bit is 0, the SRCA operand is placed into the external word addressed by the SRCB operand.

If the CE bit is 1, the SRCA and SRCB operands are transferred to the coprocessor.

The CNTL field of the STORE instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.
STOREL

Store and Lock

Operation: EXTERNAL WORD [SRCB] ← SRCA, assert LOCK output during access

Assembler Syntax: STOREL ce, cntl, ra, rb
or
STOREL ce, cntl, ra, const8

Status: Not affected

Operands:
- SRCA
  Content of register RA
- SRCB
  M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)

Description:
If the CE bit is 0, the SRCA operand is placed into the external word addressed by the SRCB operand.

If the CE bit is 1, the SRCA and SRCB operands are transferred to the coprocessor.

The CNTL field of the STOREL instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.

The LOCK output is asserted during the access or transfer.
STOREM

Store Multiple

Operation: EXTERNAL WORD [SRCB] ... EXTERNAL WORD [SRCB + (COUNT * 4)] ← SRCA ... SRCA + COUNT

Assembler Syntax: STOREM ce, cntl, ra, rb
or
STOREM ce, cntl, ra, const8

Status: Not affected

Operands: SRCA Content of register RA
SRCB M = 0: Content of register RB
          M = 1: I (Zero-extended to 32 bits)

Description: If the CE bit is 0, the contents of consecutive registers, beginning with the SRCA operand, are placed into external words at consecutive word addresses, beginning with the word addressed by the SRCB operand.

If the CE bit is 1, the contents of consecutive registers, beginning with the SRCA operand, are transferred to the coprocessor. The SRCB operand has no pre-defined interpretation in this case.

The total number of words accessed or transferred in the sequence is specified by the Count Remaining (CR) field of the Channel Control Register (which also appears in the Load/Store Count Remaining Register) at the beginning of the access. The total number of words is the value of the CR field plus one. The CNTL field of the STOREM instruction affects the access or transfer as described in Sections 3.4.4 and 6.1.2.

Note: The address and register-number sequences for the STOREM instruction are specified in Section 3.4.4.
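For the CE = 0 case, the word count and the word addresses follow directly from the text above. The count is the CR field plus one, and the addresses are consecutive words starting at SRCB. The helper names below are hypothetical, and the register-number sequence defined in Section 3.4.4 is deliberately not modeled.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical: number of words a STOREM transfers, given the CR field
   sampled at the beginning of the access. */
static uint32_t storem_word_count(uint32_t cr)
{
    return cr + 1u;
}

/* Hypothetical: byte address of the i-th word written (i = 0 .. CR).
   Words are 4 bytes apart, starting at the address in SRCB. */
static uint32_t storem_word_address(uint32_t srcb, uint32_t i)
{
    return srcb + 4u * i;
}
```

With CR = 3 and SRCB = 0x1000, four words are written, at 0x1000, 0x1004, 0x1008, and 0x100C.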
**SUB**

Subtract

**Operation:** \( \text{DEST} \leftarrow \text{SRCA} - \text{SRCB} \)

**Assembler Syntax:**
- SUB rc, ra, rb
- SUB rc, ra, const8

**Status:** V, N, Z, C

**Operands:**
- **SRCA:** Content of register RA
- **SRCB:**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST:** Register RC

**Description:**
The SRCA operand is added to the two's-complement of the SRCB operand, and the result is placed into the DEST location.
**SUBC**

**Subtract with Carry**

**Operation:** \( \text{DEST} \leftarrow \text{SRCA} - \text{SRCB} - 1 + C \)

**Assembler Syntax:**
- `SUBC rc, ra, rb`
- `SUBC rc, ra, const8`

**Status:** \( V, N, Z, C \)

**Operands:**
- **SRCA** Content of register RA
- **SRCB**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST** Register RC

**Description:** The SRCA operand is added to the one's-complement of the SRCB operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location.
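SUB computes SRCA + ~SRCB + 1, and SUBC replaces the constant 1 with the Carry bit. This lets the two instructions be chained for multiword subtraction. The C sketch below assumes that C receives the carry-out of the internal addition, so C = 1 means "no borrow". The function names are hypothetical, and only the carry (not V, N, or Z) is modeled.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical datapath model: srca + ~srcb + cin, reporting the carry-out. */
static uint32_t sub_with_carry(uint32_t srca, uint32_t srcb,
                               uint32_t cin, uint32_t *cout)
{
    uint64_t sum = (uint64_t)srca + (uint32_t)~srcb + (cin & 1u);
    *cout = (uint32_t)(sum >> 32);   /* 1 when srca - srcb - 1 + cin >= 0 */
    return (uint32_t)sum;
}

/* 64-bit subtraction: SUB on the low words, then SUBC on the high words. */
static uint64_t sub64(uint64_t a, uint64_t b)
{
    uint32_t c;
    uint32_t lo = sub_with_carry((uint32_t)a, (uint32_t)b, 1u, &c);              /* SUB  */
    uint32_t hi = sub_with_carry((uint32_t)(a >> 32), (uint32_t)(b >> 32), c, &c); /* SUBC */
    return ((uint64_t)hi << 32) | lo;
}
```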
SUBCS

Subtract with Carry, Signed

Operation:

\[ \text{DEST} \leftarrow \text{SRCA} - \text{SRCB} - 1 + C \]

IF signed overflow THEN Trap (Out of Range)

Assembler Syntax:

SUBCS rc, ra, rb
or
SUBCS rc, ra, const8

Status: V, N, Z, C

Operands:

SRCA Content of register RA
SRCB \( M = 0 \): Content of register RB
\( M = 1 \): I (Zero-extended to 32 bits)

DEST Register RC

Description:
The SRCA operand is added to the one's-complement of the SRCB operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location. If the add operation causes a two’s-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow occurs.
SUBCU

Subtract with Carry, Unsigned

Operation: \[ \text{DEST} \leftarrow \text{SRCA} - \text{SRCB} - 1 + C \]
IF unsigned underflow THEN Trap (Out of Range)

Assembler Syntax:
- SUBCU rc, ra, rb
- SUBCU rc, ra, const8

Status: V, N, Z, C

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: The SRCA operand is added to the one's-complement of the SRCB operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location. If the add operation causes an unsigned underflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an underflow occurs.
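SUBCS and SUBCU perform the same addition and differ only in the condition that raises the Out of Range trap. The C sketch below reports both conditions. Signed overflow is the usual two's-complement test: both addends have the same sign, and the sign of the sum differs from it. For the unsigned case, SRCA + ~SRCB + C equals 2^32 + (SRCA - SRCB - 1 + C). The addition therefore carries out exactly when the true difference is nonnegative, so an underflow corresponds to no carry-out. As in the notes above, DEST is produced either way. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef struct {
    uint32_t dest;           /* always written */
    bool signed_overflow;    /* SUBCS would trap */
    bool unsigned_underflow; /* SUBCU would trap */
} subc_result;

/* Hypothetical model of SUBCS/SUBCU: DEST = SRCA + ~SRCB + C. */
static subc_result subc_checked(uint32_t srca, uint32_t srcb, uint32_t c)
{
    uint32_t nb = ~srcb;
    uint64_t sum = (uint64_t)srca + nb + (c & 1u);
    uint32_t r = (uint32_t)sum;
    subc_result out;
    out.dest = r;
    /* Sign of r differs from both addends, which have the same sign. */
    out.signed_overflow = (((srca ^ r) & (nb ^ r)) >> 31) != 0;
    /* No carry-out means SRCA - SRCB - 1 + C is negative. */
    out.unsigned_underflow = (sum >> 32) == 0;
    return out;
}
```

For example, 0x80000000 minus 1 (with C = 1) overflows as a signed operation but not as an unsigned one. DEST still receives 0x7FFFFFFF.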
**SUBR**

**Subtract Reverse**

**Operation:**
\[ \text{DEST} \leftarrow \text{SRCB} - \text{SRCA} \]

**Assembler Syntax:**
- SUBR rc, ra, rb
- SUBR rc, ra, const8

**Status:**
V, N, Z, C

**Operands:**
- **SRCA:** Content of register RA
- **SRCB:**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): \( I \) (Zero-extended to 32 bits)
- **DEST:** Register RC

**Description:**
The SRCB operand is added to the two's-complement of the SRCA operand and the result is placed into the DEST location.
**SUBRC**

**Subtract Reverse with Carry**

**Operation:** \( \text{DEST} \leftarrow \text{SRCB} - \text{SRCA} - 1 + \text{C} \)

**Assembler Syntax:**
- SUBRC rc, ra, rb
- SUBRC rc, ra, const8

**Status:** \( V, N, Z, C \)

**Operands:**
- **SRCA:** Content of register RA
- **SRCB**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): \( I \) (Zero-extended to 32 bits)
- **DEST:** Register RC

**Description:** The SRCB operand is added to the one's-complement of the SRCA operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location.
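Because the immediate form supplies SRCB, the reverse subtracts can compute "constant minus register". In particular, `SUBR rc, ra, 0` yields 0 - RA. The C sketch below negates a 64-bit value held in two words with a SUBR on the low word followed by a SUBRC on the high word. It assumes that C receives the carry-out of the preceding addition. The function names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical datapath model: srcb + ~srca + cin, reporting the carry-out
   (cin is 1 for SUBR and the C bit for SUBRC). */
static uint32_t subr_with_carry(uint32_t srca, uint32_t srcb,
                                uint32_t cin, uint32_t *cout)
{
    uint64_t sum = (uint64_t)srcb + (uint32_t)~srca + (cin & 1u);
    *cout = (uint32_t)(sum >> 32);
    return (uint32_t)sum;
}

/* 0 - x for a 64-bit x, using the immediate 0 as SRCB in both steps. */
static uint64_t neg64(uint64_t x)
{
    uint32_t c;
    uint32_t lo = subr_with_carry((uint32_t)x, 0u, 1u, &c);        /* like SUBR  on the low word  */
    uint32_t hi = subr_with_carry((uint32_t)(x >> 32), 0u, c, &c); /* like SUBRC on the high word */
    return ((uint64_t)hi << 32) | lo;
}
```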
**SUBRCS**

Subtract Reverse with Carry, Signed

**Operation:**

\[ \text{DEST} \leftarrow \text{SRCB} - \text{SRCA} - 1 + C \]

IF signed overflow THEN Trap (Out of Range)

**Assembler Syntax:**

- SUBRCS rc, ra, rb
- SUBRCS rc, ra, const8

**Status:**

V, N, Z, C

**Operands:**

- **SRCA:** Content of register RA
- **SRCB:**
  - \( M = 0 \): Content of register RB
  - \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST:** Register RC

**Description:**

The SRCB operand is added to the one's-complement of the SRCA operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location. If the add operation causes a two's-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow occurs.
**SUBRCU**

**Subtract Reverse with Carry, Unsigned**

**Operation:**

\[ \text{DEST} \leftarrow \text{SRCB} - \text{SRCA} - 1 + C \]

IF unsigned underflow THEN Trap (Out of Range)

**Assembler Syntax:**

SUBRCU rc, ra, rb  
or  
SUBRCU rc, ra, const8

**Status:**  
V, N, Z, C

**Operands:**

- **SRCA**  
  Content of register RA
- **SRCB**  
  \( M = 0 \): Content of register RB  
  \( M = 1 \): I (Zero-extended to 32 bits)
- **DEST**  
  Register RC

**Description:**

The SRCB operand is added to the one's-complement of the SRCA operand and the value of the ALU Status Carry bit, and the result is placed into the DEST location. If the add operation causes an unsigned underflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an underflow occurs.
SUBRS

Subtract Reverse, Signed

Operation: \( \text{DEST} \leftarrow \text{SRCB} - \text{SRCA} \)
IF signed overflow THEN Trap (Out of Range)

Assembler Syntax:
- SUBRS rc, ra, rb
- SUBRS rc, ra, const8

Status: V, N, Z, C

Operands:
- SRCA: Content of register RA
- SRCB: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- DEST: Register RC

Description: The SRCB operand is added to the two's-complement of the SRCA operand, and the result is placed into the DEST location. If the add operation causes a two's-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow occurs.
SUBRU

Subtract Reverse, Unsigned

Operation:  
DEST ← SRCB − SRCA  
IF unsigned underflow THEN Trap (Out of Range)

Assembler Syntax:  SUBRU rc, ra, rb  
or  
SUBRU rc, ra, const8

Status:  V, N, Z, C

Operands:  
SRCA  Content of register RA
SRCB  M = 0: Content of register RB  
M = 1: I (Zero-extended to 32 bits)
DEST  Register RC

Description:  The SRCB operand is added to the two's-complement of the SRCA operand, and the result is placed into the DEST location. If the add operation causes an unsigned underflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an underflow occurs.
**SUBS**

**Subtract, Signed**

**Operation:**

\[ \text{DEST} \leftarrow \text{SRCA} - \text{SRCB} \]

IF signed overflow THEN Trap (Out of Range)

**Assembler Syntax:**

- SUBS rc, ra, rb
- SUBS rc, ra, const8

**Status:**

V, N, Z, C

**Operands:**

- **SRCA**: Content of register RA
- **SRCB**: M = 0: Content of register RB  
  M = 1: I (Zero-extended to 32 bits)
- **DEST**: Register RC

```
  31  23  15  7  0
0 0 1 0 0 0 0 M  RC  RA  RB or I
```

**Description:**

The SRCA operand is added to the two's-complement of the SRCB operand, and the result is placed into the DEST location. If the add operation causes a two's-complement signed overflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an overflow occurs.
SUBU

Subtract, Unsigned

Operation: \[ \text{DEST} \leftarrow \text{SRCA} - \text{SRCB} \]
IF unsigned underflow THEN Trap (Out of Range)

Assembler Syntax:
SUBU rc, ra, rb
or
SUBU rc, ra, const8

Status: V, N, Z, C

Operands:
SRCA  Content of register RA
SRCB  M = 0: Content of register RB
       M = 1: I (Zero-extended to 32 bits)
DEST  Register RC

Description: The SRCA operand is added to the two's-complement of the SRCB operand, and the result is placed into the DEST location. If the add operation causes an unsigned underflow, an Out of Range trap occurs.

Note that the DEST location is altered whether or not an underflow occurs.
**XNOR**

**Exclusive-NOR Logical**

**Operation:** \( \text{DEST} \leftarrow \sim (\text{SRCA} \oplus \text{SRCB}) \)

**Assembler Syntax:**
- XNOR rc, ra, rb
- XNOR rc, ra, const8

**Status:** N, Z

**Operands:**
- **SRCA**: Content of register RA
- **SRCB**: M = 0: Content of register RB  
  M = 1: I (Zero-extended to 32 bits)
- **DEST**: Register RC

```plaintext
  31              23     15     7          0
1 0 0 1 0 1 1 M   RC     RA     RB or I
```

**OP = 96, 97**

**Description:** The SRCA operand is logically exclusive-ORed, bit-by-bit, with the SRCB operand. The one's-complement of the result is placed into the DEST location.
**XOR**

**Exclusive-OR Logical**

**Operation:** DEST ← SRCA ^ SRCB

**Assembler Syntax:**
- XOR rc, ra, rb
- XOR rc, ra, const8

**Status:** N, Z

**Operands:**
- **SRCA**: Content of register RA
- **SRCB**: M = 0: Content of register RB
  M = 1: I (Zero-extended to 32 bits)
- **DEST**: Register RC

**Description:** The SRCA operand is logically exclusive-ORed, bit-by-bit, with the SRCB operand, and the result is placed into the DEST location.
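XOR and XNOR complete the logical group. As a small illustration, XNOR sets each result bit where the two operands agree, and XOR with a zero-extended immediate can flip bits only in the low byte. The function names are hypothetical, and SRCB is assumed to be already resolved to RB or the zero-extended I.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical models of XOR and XNOR. */
static uint32_t op_xor (uint32_t a, uint32_t b) { return a ^ b; }
static uint32_t op_xnor(uint32_t a, uint32_t b) { return ~(a ^ b); }
```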
8.5  INSTRUCTION INDEX BY OPERATION CODE

01  CONSTN  Constant, Negative
02  CONSTH  Constant, High
03  CONST    Constant
04  MTSRIM  Move to Special Register Immediate
05  CONSTHZ  Constant High, Zero Lower
06,07  LOADL  Load and Lock
08,09  CLZ    Count Leading Zeros
0A,0B  EXBYTE  Extract Byte
0C,0D  INBYTE  Insert Byte
0E,0F  STOREL  Store and Lock
10,11  ADDS   Add, Signed
12,13  ADDU   Add, Unsigned
14,15  ADD    Add
16,17  LOAD   Load
18,19  ADDCS  Add with Carry, Signed
1A,1B  ADDCU  Add with Carry, Unsigned
1C,1D  ADDC   Add with Carry
1E,1F  STORE  Store
20,21  SUBS   Subtract, Signed
22,23  SUBU   Subtract, Unsigned
24,25  SUB    Subtract
26,27  LOADSET  Load and Set
28,29  SUBCS  Subtract with Carry, Signed
2A,2B  SUBCU  Subtract with Carry, Unsigned
2C,2D  SUBC   Subtract with Carry
2E,2F  CPBYTE  Compare Bytes
30,31  SUBRS  Subtract Reverse, Signed
32,33  SUBRU  Subtract Reverse, Unsigned
34,35  SUBR   Subtract Reverse
36,37  LOADM  Load Multiple
38,39  SUBRCS  Subtract Reverse with Carry, Signed
3A,3B  SUBRCU  Subtract Reverse with Carry, Unsigned
3C,3D  SUBRC  Subtract Reverse with Carry
3E,3F  STOREM  Store Multiple
40,41  CPLT   Compare Less Than
42,43  CPLTU  Compare Less Than, Unsigned
44,45  CPLE   Compare Less Than or Equal To
46,47  CPLEU  Compare Less Than or Equal To, Unsigned
48,49  CPGT   Compare Greater Than
4A,4B  CPGTU  Compare Greater Than, Unsigned
4C,4D  CPGE  Compare Greater Than or Equal To
4E,4F  CPGEU  Compare Greater Than or Equal To, Unsigned
50,51  ASLT   Assert Less Than
52,53  ASLTU  Assert Less Than, Unsigned
54,55  ASLE   Assert Less Than or Equal To
56,57  ASLEU  Assert Less Than or Equal To, Unsigned
<table>
<thead>
<tr>
<th>Code</th>
<th>Instruction</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>58,59</td>
<td>ASGT</td>
<td>Assert Greater Than</td>
</tr>
<tr>
<td>5A,5B</td>
<td>ASGTU</td>
<td>Assert Greater Than, Unsigned</td>
</tr>
<tr>
<td>5C,5D</td>
<td>ASGE</td>
<td>Assert Greater Than or Equal To</td>
</tr>
<tr>
<td>5E,5F</td>
<td>ASGEU</td>
<td>Assert Greater Than or Equal To, Unsigned</td>
</tr>
<tr>
<td>60,61</td>
<td>CPEQ</td>
<td>Compare Equal To</td>
</tr>
<tr>
<td>62,63</td>
<td>CPNEQ</td>
<td>Compare Not Equal To</td>
</tr>
<tr>
<td>64,65</td>
<td>MUL</td>
<td>Multiply Step</td>
</tr>
<tr>
<td>66,67</td>
<td>MULL</td>
<td>Multiply Last Step</td>
</tr>
<tr>
<td>68,69</td>
<td>DIV0</td>
<td>Divide Initialize</td>
</tr>
<tr>
<td>6A,6B</td>
<td>DIV</td>
<td>Divide Step</td>
</tr>
<tr>
<td>6C,6D</td>
<td>DIVL</td>
<td>Divide Last Step</td>
</tr>
<tr>
<td>6E,6F</td>
<td>DIVREM</td>
<td>Divide Remainder</td>
</tr>
<tr>
<td>70,71</td>
<td>ASEQ</td>
<td>Assert Equal To</td>
</tr>
<tr>
<td>72,73</td>
<td>ASNEQ</td>
<td>Assert Not Equal To</td>
</tr>
<tr>
<td>74,75</td>
<td>MULU</td>
<td>Multiply Step, Unsigned</td>
</tr>
<tr>
<td>78,79</td>
<td>INHW</td>
<td>Insert Half-Word</td>
</tr>
<tr>
<td>7A,7B</td>
<td>EXTRACT</td>
<td>Extract Word, Bit-Aligned</td>
</tr>
<tr>
<td>7C,7D</td>
<td>EXHW</td>
<td>Extract Half-Word</td>
</tr>
<tr>
<td>7E</td>
<td>EXHWS</td>
<td>Extract Half-Word, Sign-Extended</td>
</tr>
<tr>
<td>80,81</td>
<td>SLL</td>
<td>Shift Left Logical</td>
</tr>
<tr>
<td>82,83</td>
<td>SRL</td>
<td>Shift Right Logical</td>
</tr>
<tr>
<td>86,87</td>
<td>SRA</td>
<td>Shift Right Arithmetic</td>
</tr>
<tr>
<td>88</td>
<td>IRET</td>
<td>Interrupt Return</td>
</tr>
<tr>
<td>89</td>
<td>HALT</td>
<td>Enter HALT Mode</td>
</tr>
<tr>
<td>8C</td>
<td>IRETINV</td>
<td>Interrupt Return and Invalidate</td>
</tr>
<tr>
<td>90,91</td>
<td>AND</td>
<td>AND Logical</td>
</tr>
<tr>
<td>92,93</td>
<td>OR</td>
<td>OR Logical</td>
</tr>
<tr>
<td>94,95</td>
<td>XOR</td>
<td>Exclusive-OR Logical</td>
</tr>
<tr>
<td>96,97</td>
<td>XNOR</td>
<td>Exclusive-NOR Logical</td>
</tr>
<tr>
<td>98,99</td>
<td>NOR</td>
<td>NOR Logical</td>
</tr>
<tr>
<td>9A,9B</td>
<td>NAND</td>
<td>NAND Logical</td>
</tr>
<tr>
<td>9C,9D</td>
<td>ANDN</td>
<td>AND-NOT Logical</td>
</tr>
<tr>
<td>9E</td>
<td>SETIP</td>
<td>Set Indirect Pointers</td>
</tr>
<tr>
<td>9F</td>
<td>INV</td>
<td>Invalidate</td>
</tr>
<tr>
<td>A0,A1</td>
<td>JMP</td>
<td>Jump</td>
</tr>
<tr>
<td>A4,A5</td>
<td>JMPF</td>
<td>Jump False</td>
</tr>
<tr>
<td>A8,A9</td>
<td>CALL</td>
<td>Call Subroutine</td>
</tr>
<tr>
<td>AA,AB</td>
<td>ORN</td>
<td>OR-NOT Logical</td>
</tr>
<tr>
<td>AC,AD</td>
<td>JMPT</td>
<td>Jump True</td>
</tr>
<tr>
<td>B4,B5</td>
<td>JMPFDEC</td>
<td>Jump False and Decrement</td>
</tr>
<tr>
<td>B6</td>
<td>MFTLB</td>
<td>Move from Translation Look-Aside Buffer Register</td>
</tr>
<tr>
<td>BE</td>
<td>MTTLB</td>
<td>Move to Translation Look-Aside Buffer Register</td>
</tr>
<tr>
<td>BF</td>
<td>Reserved for emulation (trap vector number 28)</td>
<td></td>
</tr>
<tr>
<td>C0</td>
<td>JMPI</td>
<td>Jump Indirect</td>
</tr>
<tr>
<td>C4</td>
<td>JMPFI</td>
<td>Jump False Indirect</td>
</tr>
<tr>
<td>C6</td>
<td>MFSR</td>
<td>Move from Special Register</td>
</tr>
<tr>
<td>C8</td>
<td>CALLI</td>
<td>Call Subroutine, Indirect</td>
</tr>
<tr>
<td>CC</td>
<td>JMPTI</td>
<td>Jump True Indirect</td>
</tr>
<tr>
<td>CE</td>
<td>MTSR</td>
<td>Move to Special Register</td>
</tr>
<tr>
<td>CF–D6</td>
<td>Reserved for emulation (trap vector number 29)</td>
<td></td>
</tr>
<tr>
<td>D7</td>
<td>EMULATE</td>
<td>Trap to Software Emulation Routine</td>
</tr>
<tr>
<td>D8</td>
<td>FMAC</td>
<td>Floating-Point Multiply-Accumulate, Single-Precision</td>
</tr>
<tr>
<td>D9</td>
<td>DMAC</td>
<td>Floating-Point Multiply-Accumulate, Double-Precision</td>
</tr>
<tr>
<td>DA</td>
<td>FMSM</td>
<td>Floating-Point Multiply-Sum, Single-Precision</td>
</tr>
<tr>
<td>DB</td>
<td>DMSM</td>
<td>Floating-Point Multiply-Sum, Double-Precision</td>
</tr>
<tr>
<td>DC–DD</td>
<td>Reserved for emulation (trap vector numbers 28–29)</td>
<td></td>
</tr>
<tr>
<td>DE</td>
<td>MULTM</td>
<td>Integer Multiply Most-Significant Bits, Signed</td>
</tr>
<tr>
<td>DF</td>
<td>MULTMU</td>
<td>Integer Multiply Most-Significant Bits, Unsigned</td>
</tr>
<tr>
<td>E0</td>
<td>MULTIPLY</td>
<td>Integer Multiply, Signed</td>
</tr>
<tr>
<td>E1</td>
<td>DIVIDE</td>
<td>Integer Divide, Signed</td>
</tr>
<tr>
<td>E2</td>
<td>MULTIPLU</td>
<td>Integer Multiply, Unsigned</td>
</tr>
<tr>
<td>E3</td>
<td>DIVIDU</td>
<td>Integer Divide, Unsigned</td>
</tr>
<tr>
<td>E4</td>
<td>CONVERT</td>
<td>Convert Data Format</td>
</tr>
<tr>
<td>E5</td>
<td>SQRT</td>
<td>Square Root</td>
</tr>
<tr>
<td>E6</td>
<td>CLASS</td>
<td>Classify Floating-Point Operand</td>
</tr>
<tr>
<td>E7</td>
<td>Reserved for emulation (trap vector number 39)</td>
<td></td>
</tr>
<tr>
<td>E8</td>
<td>MTACC</td>
<td>Move to Accumulator</td>
</tr>
<tr>
<td>E9</td>
<td>MFACC</td>
<td>Move from Accumulator</td>
</tr>
<tr>
<td>EA</td>
<td>FEQ</td>
<td>Floating-Point Equal To, Single-Precision</td>
</tr>
<tr>
<td>EB</td>
<td>DEQ</td>
<td>Floating-Point Equal To, Double-Precision</td>
</tr>
<tr>
<td>EC</td>
<td>FGT</td>
<td>Floating-Point Greater Than, Single-Precision</td>
</tr>
<tr>
<td>ED</td>
<td>DGT</td>
<td>Floating-Point Greater Than, Double-Precision</td>
</tr>
<tr>
<td>EE</td>
<td>FGE</td>
<td>Floating-Point Greater Than or Equal To, Single-Precision</td>
</tr>
<tr>
<td>EF</td>
<td>DGE</td>
<td>Floating-Point Greater Than or Equal To, Double-Precision</td>
</tr>
<tr>
<td>F0</td>
<td>FADD</td>
<td>Floating-Point Add, Single-Precision</td>
</tr>
<tr>
<td>F1</td>
<td>DADD</td>
<td>Floating-Point Add, Double-Precision</td>
</tr>
<tr>
<td>F2</td>
<td>FSUB</td>
<td>Floating-Point Subtract, Single-Precision</td>
</tr>
<tr>
<td>F3</td>
<td>DSUB</td>
<td>Floating-Point Subtract, Double-Precision</td>
</tr>
<tr>
<td>F4</td>
<td>FMUL</td>
<td>Floating-Point Multiply, Single-Precision</td>
</tr>
<tr>
<td>F5</td>
<td>DMUL</td>
<td>Floating-Point Multiply, Double-Precision</td>
</tr>
<tr>
<td>F6</td>
<td>FDIV</td>
<td>Floating-Point Divide, Single-Precision</td>
</tr>
<tr>
<td>F7</td>
<td>DDIV</td>
<td>Floating-Point Divide, Double-Precision</td>
</tr>
<tr>
<td>F8</td>
<td>Reserved for emulation (trap vector number 56)</td>
<td></td>
</tr>
<tr>
<td>F9</td>
<td>FDMUL</td>
<td>Floating-Point Multiply, Single-to-Double-Precision</td>
</tr>
<tr>
<td>FA–FF</td>
<td>Reserved for emulation (trap vector numbers 58–63)</td>
<td></td>
</tr>
</tbody>
</table>
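The format diagrams in this chapter place the opcode in bits 31–24, RC in bits 23–16, RA in bits 15–8, and RB or the immediate I in bits 7–0. For opcodes listed as a pair (for example, 92,93 for OR), the low opcode bit is the M bit. The decoder below is a hypothetical C sketch of that layout. For single-opcode instructions such as SQRT (E5), bit 24 is simply part of the opcode, and the `m` field has no meaning.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical field split for the three-operand instruction format. */
typedef struct {
    uint8_t op;       /* full 8-bit opcode, e.g. 0x92 or 0x93 for OR */
    uint8_t m;        /* bit 24: 0 selects RB, 1 selects immediate I */
    uint8_t rc;       /* bits 23..16 */
    uint8_t ra;       /* bits 15..8  */
    uint8_t rb_or_i;  /* bits 7..0   */
} insn_fields;

static insn_fields decode(uint32_t word)
{
    insn_fields f;
    f.op      = (uint8_t)(word >> 24);
    f.m       = (uint8_t)((word >> 24) & 1u);
    f.rc      = (uint8_t)(word >> 16);
    f.ra      = (uint8_t)(word >> 8);
    f.rb_or_i = (uint8_t)word;
    return f;
}
```

For example, the word 0x93606105 decodes as opcode 93 (OR with M = 1), RC = 0x60, RA = 0x61, and I = 5.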
## CHANNEL OPERATION TIMING

### Table A-1  Signal Summary

<table>
<thead>
<tr>
<th>Signal Name</th>
<th>Signal Function</th>
<th>Type (1)</th>
<th>Synch/Async</th>
</tr>
</thead>
<tbody>
<tr>
<td>A(31–0)</td>
<td>Address Bus</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>BGRT</td>
<td>Bus Grant</td>
<td>Output</td>
<td>Synch</td>
</tr>
<tr>
<td>BINV</td>
<td>Bus Invalid</td>
<td>Output</td>
<td>Synch</td>
</tr>
<tr>
<td>BREQ</td>
<td>Bus Request</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>CDA</td>
<td>Coprocessor Data Accept</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>CNTL(1–0)</td>
<td>CPU Control</td>
<td>Input</td>
<td>Async</td>
</tr>
<tr>
<td>D(31–0)</td>
<td>Data Bus</td>
<td>Bi-directional</td>
<td>Synch</td>
</tr>
<tr>
<td>DBACK</td>
<td>Data Burst Acknowledge</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>DBREQ</td>
<td>Data Burst Request</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>DERR</td>
<td>Data Error</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>DRDY</td>
<td>Data Ready</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>DREQ</td>
<td>Data Request</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>DREQT(1–0)</td>
<td>Data Request Type</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>I(31–0)</td>
<td>Instruction Bus</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>IBACK</td>
<td>Instruction Burst Acknowledge</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>IBREQ</td>
<td>Instruction Burst Request</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>IERR</td>
<td>Instruction Error</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>INCLK</td>
<td>Input Clock</td>
<td>Input</td>
<td>N/A</td>
</tr>
<tr>
<td>INTR(3–0)</td>
<td>Interrupt Request</td>
<td>Input</td>
<td>Async</td>
</tr>
<tr>
<td>IRDY</td>
<td>Instruction Ready</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>IREQ</td>
<td>Instruction Request</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>IREQT</td>
<td>Instruction Request Type</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>LOCK</td>
<td>Lock</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>MPGM(1–0)</td>
<td>MMU Programmable</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>MSERR</td>
<td>Master/Slave Error</td>
<td>Output</td>
<td>Synch</td>
</tr>
<tr>
<td>OPT(2–0)</td>
<td>Option Control</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>PDA</td>
<td>Pipelined Data Access</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>PEN</td>
<td>Pipeline Enable</td>
<td>Input</td>
<td>Synch</td>
</tr>
<tr>
<td>PIN169</td>
<td>Hardware-Development System Alignment</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>PIA</td>
<td>Pipelined Instruction Access</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>PWRCLK</td>
<td>SYSCLK Power</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>R/W</td>
<td>Read/Write</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>RESET</td>
<td>Reset</td>
<td>Input</td>
<td>Async</td>
</tr>
<tr>
<td>STAT(2–0)</td>
<td>CPU Status</td>
<td>Output</td>
<td>Synch</td>
</tr>
<tr>
<td>SUP/US</td>
<td>Supervisor/User Mode</td>
<td>3-State Output</td>
<td>Synch</td>
</tr>
<tr>
<td>SYSCLK</td>
<td>System Clock</td>
<td>Bi-directional</td>
<td>N/A</td>
</tr>
<tr>
<td>TEST</td>
<td>Test Mode</td>
<td>Input</td>
<td>Async</td>
</tr>
<tr>
<td>TRAP(1–0)</td>
<td>Trap Request</td>
<td>Input</td>
<td>Async</td>
</tr>
<tr>
<td>WARN</td>
<td>Warn</td>
<td>Edge-Sensitive Input</td>
<td>Async</td>
</tr>
</tbody>
</table>

(1) The signals labeled "3-state output" and "bi-directional" (except SYSCLK) are disabled when the channel is granted to an external master. All outputs (except MSERR) may be disabled by asserting the TEST input.
Figure A-1  Instruction Read—Simple Access

[Timing diagram showing SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, and IBACK.]
Figure A-2 Instruction Read—Simple Access with IRDY Delayed

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. A(31–0) carries Address N; I(31–0) returns Instr N.]

Figure A-3  Instruction Read—Pipelined Access

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK.]

Figure A-4  Instruction Read—Establishing Burst-Mode Access

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. A(31–0) carries Address N; I(31–0) returns Instr N, N+1, N+2, N+3.]

Figure A-5 Instruction Read—Burst-Mode Access Suspended by Slave


[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. I(31–0) returns Instr N; one interval is marked "1 or More Cycles".]

Figure A-6 Instruction Read—Burst-Mode Access Preempted by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. I(31–0) returns Instr N, N+1.]

Figure A-7  Instruction Read—Burst-Mode Access Suspended by Master

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. I(31–0) returns Instr N, N+1, N+2; one interval is marked "1 or More Cycles".]

Figure A-8  Instruction Read—Burst-Mode Access Suspended by Master and Later Preempted by Slave
Figure A-9  Instruction Read—Burst-Mode Access Canceled by Slave*

[Timing diagram, partially recovered. Signals: A(31–0), SUP/US, MPGM(1–0), IREQT.]

*Note: This may result in a trap.

Figure A-10  Instruction Read—Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. I(31–0) returns Instr N; A(31–0) then carries Address M or N + 2; one interval is marked "1 or More Cycles".]

Figure A-11  Instruction Read—TLB Miss or Protection Violation

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. A(31–0) carries Address N.]

Figure A-12  Instruction Read—Pipelined Access with TLB Miss or Protection Violation

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, MPGM(1–0), IREQT, IREQ, PIA, IBREQ, BINV, I(31–0), IRDY, IERR, PEN, IBACK. A(31–0) carries Address N, then Address M; I(31–0) returns Instr N.]

Figure A-13  Instruction Read—Error Detected by Slave*

*Note: This may result in a trap.
Figure A-14 Data Read—Simple Access

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N; D(31–0) returns Data N.]

Figure A-15  Data Write—Simple Access

Figure A-16  Data Read—Simple Access with DRDY Delayed

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N; D(31–0) returns Data N; one interval is marked "0 or More Cycles".]

Figure A-17  Data Write—Simple Access with DRDY Delayed

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N; D(31–0) carries Data N; one interval is marked "0 or More Cycles".]

Figure A-18  Data Read Followed by Data Write—Simple Access

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N, then Address M; D(31–0) carries Data N, then Data M.]

Figure A-19  Load and Set Instruction

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N for both accesses; D(31–0) carries Data N, then all 1's.]

Figure A-20  Data Read—Pipelined Access

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N, then Address M; D(31–0) returns Data N, then Data M.]

Figure A-21  Data Write—Pipelined Access
Figure A-22  Data Read Followed by Data Write—Pipelined Access (Not Used by Processor)
Figure A-23  Data Write Followed by Data Read—Pipelined Access
Figure A-24  Data Read—Establishing Burst-Mode Access
Figure A-25  Data Write—Establishing Burst-Mode Access
Figure A-26  Data Read—Burst-Mode Access Suspended by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) returns Data N, N + 1, N + 2.]

Figure A-27  Data Write—Burst-Mode Access Suspended by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) carries Data N, N + 1, N + 2; one interval is marked "1 or More Cycles".]

Figure A-28  Data Read—Burst-Mode Access Suspended by Master (Not Used by Processor)

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) returns Data N, N + 1, N + 2.]

Figure A-29  Data Write—Burst-Mode Access Suspended by Master (Not Used by Processor)

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, OPT(2–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) carries Data N, N + 1, N + 2.]

Figure A-30  Data Read—Burst-Mode Access Preempted by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) returns Data N, N + 1.]

Figure A-31  Data Write—Burst-Mode Access Preempted by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N + 2; D(31–0) carries Data N, N + 1, N + 2.]

Figure A-32  Data Read—Burst-Mode Access Suspended by Master and Later Preempted by Slave (Not Used by Processor)
Figure A-33  Data Write—Burst-Mode Access Suspended by Master and Later Preempted by Slave (Not Used by Processor)
Figure A-34  Data Read—Burst-Mode Access Canceled by Slave*

*Note: This results in a trap.
Figure A-35  Data Write—Burst-Mode Access Canceled by Slave*

*Note: This results in a trap.
Figure A-36  Data Read—Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) returns Data N, N + 1; A(31–0) then carries Address M or N + 2.]

Figure A-37  Data Write—Burst-Mode Access Ended by Master (Preempted, Terminated, or Canceled)

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. D(31–0) carries Data N, then Data M; A(31–0) then carries Address M or N + 2; one interval is marked "1 or More Cycles".]

Figure A-38  Data Read—TLB Miss or Protection Violation

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK.]

Figure A-39  Data Write—TLB Miss or Protection Violation

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DBACK. A(31–0) carries Address N; D(31–0) carries Data N.]

Figure A-40  Data Read—Pipelined Access with TLB Miss or Protection Violation

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N, then Address M.]

Figure A-41  Data Write—Pipelined Access with TLB Miss or Protection Violation

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N, then Address M; D(31–0) carries Data N.]

Figure A-42  Data Read—Error Detected by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N.]

Figure A-43  Data Write—Error Detected by Slave

[Timing diagram. Signals: SYSCLK, A(31–0), SUP/US, LOCK, MPGM(1–0), OPT(2–0), DREQT(1–0), R/W, DREQ, PDA, DBREQ, BINV, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N; D(31–0) carries Data N.]

Figure A-44  Channel Transfer from Processor to External Master

[Timing diagram. Signals: SYSCLK, BREQ, BGRT, BINV, A(31–0), R/W, DREQ, PDA, DBREQ, D(31–0), DRDY, DERR, PEN, DBACK. A(31–0) carries Address N, then Address M; D(31–0) carries Data N, N + 1, N + 2.]

Figure A-45  Channel Transfer from External Master to Processor

[Timing diagram. Signals: SYSCLK, BREQ, BGRT, BINV, A(31–0), R/W, DREQ, PDA, DBREQ, D(31–0), DRDY, DERR, PEN, DBACK.]

# APPENDIX B

## REGISTER SUMMARY

### Figure B-1  General-Purpose Register Organization

<table>
<thead>
<tr>
<th>Absolute REG #</th>
<th>General-Purpose Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Indirect Pointer Access</td>
</tr>
<tr>
<td>1</td>
<td>Stack Pointer</td>
</tr>
<tr>
<td>2</td>
<td>Condition Code Accumulator</td>
</tr>
<tr>
<td>3</td>
<td>Condition Code Accumulator, Shifted</td>
</tr>
<tr>
<td>4 THRU 63</td>
<td>Not Implemented</td>
</tr>
<tr>
<td>64</td>
<td>Global Register 64</td>
</tr>
<tr>
<td>65</td>
<td>Global Register 65</td>
</tr>
<tr>
<td>66</td>
<td>Global Register 66</td>
</tr>
<tr>
<td>126</td>
<td>Global Register 126</td>
</tr>
<tr>
<td>127</td>
<td>Global Register 127</td>
</tr>
<tr>
<td>128</td>
<td>Local Register 125</td>
</tr>
<tr>
<td>129</td>
<td>Local Register 126</td>
</tr>
<tr>
<td>130</td>
<td>Local Register 127</td>
</tr>
<tr>
<td>131</td>
<td>Local Register 0</td>
</tr>
<tr>
<td>132</td>
<td>Local Register 1</td>
</tr>
<tr>
<td>254</td>
<td>Local Register 123</td>
</tr>
<tr>
<td>255</td>
<td>Local Register 124</td>
</tr>
</tbody>
</table>

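Absolute registers 64 through 127 are the Global Registers; absolute registers 128 through 255 are the Local Registers. The local-register numbering shown assumes Stack Pointer = 131 (example).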
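The mapping in Figure B-1 can be expressed arithmetically. In this minimal C sketch the absolute register that Local Register 0 occupies (131 in the example) is passed in directly, and local register numbers are assumed to wrap modulo 128 within absolute registers 128 through 255, which reproduces the example rows (128 is Local Register 125, 255 is Local Register 124). How that offset is derived from the Stack Pointer value is not modeled here, and all names are hypothetical.

```c
/* Absolute register-number ranges from Figure B-1 (hypothetical enum). */
typedef enum { REG_INDIRECT, REG_STACK_POINTER, REG_CC_ACCUM,
               REG_UNIMPLEMENTED, REG_GLOBAL, REG_LOCAL } reg_class;

/* absreg is assumed to be in 0..255. */
static reg_class classify_absolute(unsigned absreg)
{
    if (absreg == 0)   return REG_INDIRECT;        /* indirect pointer access */
    if (absreg == 1)   return REG_STACK_POINTER;
    if (absreg <= 3)   return REG_CC_ACCUM;        /* 2 and 3 */
    if (absreg <= 63)  return REG_UNIMPLEMENTED;   /* 4 through 63 */
    if (absreg <= 127) return REG_GLOBAL;          /* 64 through 127 */
    return REG_LOCAL;                              /* 128 through 255 */
}

/* Local register n (0..127) -> absolute register (128..255).
 * lr0_abs: absolute register holding Local Register 0 (131 in the example). */
static unsigned local_to_absolute(unsigned n, unsigned lr0_abs)
{
    unsigned offset = (lr0_abs - 128u) & 0x7Fu;
    return 128u + ((n + offset) & 0x7Fu);          /* wrap modulo 128 */
}

/* Absolute register (128..255) -> local register number (0..127). */
static unsigned absolute_to_local(unsigned absreg, unsigned lr0_abs)
{
    unsigned offset = (lr0_abs - 128u) & 0x7Fu;
    return ((absreg - 128u) - offset) & 0x7Fu;     /* unsigned wrap, then mod 128 */
}
```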
### Figure B-2  Register Bank Organization

<table>
<thead>
<tr>
<th>Register Bank Protect Register Bit</th>
<th>Absolute-Register Numbers</th>
<th>General-Purpose Registers</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>4 through 15</td>
<td>Bank 0 (unimplemented)</td>
</tr>
<tr>
<td>1</td>
<td>16 through 31</td>
<td>Bank 1 (unimplemented)</td>
</tr>
<tr>
<td>2</td>
<td>32 through 47</td>
<td>Bank 2 (unimplemented)</td>
</tr>
<tr>
<td>3</td>
<td>48 through 63</td>
<td>Bank 3 (unimplemented)</td>
</tr>
<tr>
<td>4</td>
<td>64 through 79</td>
<td>Bank 4</td>
</tr>
<tr>
<td>5</td>
<td>80 through 95</td>
<td>Bank 5</td>
</tr>
<tr>
<td>6</td>
<td>96 through 111</td>
<td>Bank 6</td>
</tr>
<tr>
<td>7</td>
<td>112 through 127</td>
<td>Bank 7</td>
</tr>
<tr>
<td>8</td>
<td>128 through 143</td>
<td>Bank 8</td>
</tr>
<tr>
<td>9</td>
<td>144 through 159</td>
<td>Bank 9</td>
</tr>
<tr>
<td>10</td>
<td>160 through 175</td>
<td>Bank 10</td>
</tr>
<tr>
<td>11</td>
<td>176 through 191</td>
<td>Bank 11</td>
</tr>
<tr>
<td>12</td>
<td>192 through 207</td>
<td>Bank 12</td>
</tr>
<tr>
<td>13</td>
<td>208 through 223</td>
<td>Bank 13</td>
</tr>
<tr>
<td>14</td>
<td>224 through 239</td>
<td>Bank 14</td>
</tr>
<tr>
<td>15</td>
<td>240 through 255</td>
<td>Bank 15</td>
</tr>
</tbody>
</table>
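Because each bank in Figure B-2 spans 16 absolute registers, the bank number, and therefore the Register Bank Protect bit that covers it, is the register number divided by 16. This is a minimal sketch with hypothetical function names; the reading of a set bit as "bank protected" is an assumption, since this summary only lists the bit-to-bank correspondence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bank for absolute registers 4..255 (registers 0..3 belong to no bank). */
static unsigned bank_of(unsigned absreg)
{
    return (absreg >> 4) & 0xFu;      /* 16 registers per bank */
}

/* True if RBP bit Bk is set for the bank containing absreg (assumed: set = protected). */
static bool bank_protected(uint32_t rbp, unsigned absreg)
{
    return ((rbp >> bank_of(absreg)) & 1u) != 0;
}
```

For example, absolute register 131 falls in Bank 8, so it is governed by RBP bit B8.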
Figure B-3  Special Purpose Registers

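<table>
<thead>
<tr>
<th>REG #</th>
<th>Register</th>
<th>Fields (most significant first)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Vector Area Base Address (VAB)</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>Old Processor Status (OPS)</td>
<td>Reserved, CA, IP, TE, TP, TU, FZ, LK, RE, WM, PD, PI, SM, IM, DI, DA</td>
</tr>
<tr>
<td>2</td>
<td>Current Processor Status (CPS)</td>
<td>Reserved, MM, CA, IP, TE, TP, TU, FZ, LK, RE, WM, PD, PI, SM, IM, DI, DA</td>
</tr>
<tr>
<td>3</td>
<td>Configuration (CFG)</td>
<td>PRL, Reserved, EE, CO, DW, VF, RV, BO, CP, CD</td>
</tr>
<tr>
<td>4</td>
<td>Channel Address (CHA)</td>
<td>CHA</td>
</tr>
<tr>
<td>5</td>
<td>Channel Data (CHD)</td>
<td>CHD</td>
</tr>
<tr>
<td>6</td>
<td>Channel Control (CHC)</td>
<td>CE, CNTL, CR, LS, ML, ST, LA, Res, TF, TR, NN, CV</td>
</tr>
</tbody>
</table>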
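The bit positions listed in the field summary for the Current Processor Status allow a simple decoder. This sketch uses only positions given there (TU and WM are omitted because their bit numbers are not listed in this section); `field`, `cps_fields`, and `decode_cps` are hypothetical names. The same `field` helper applies to other registers, for example PRL in bits 31–24 of CFG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of a 32-bit register value. */
static uint32_t field(uint32_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

/* CPS fields whose bit numbers appear in the field summary. */
typedef struct {
    bool da, di, sm, pi, pd, re, lk, fz, tp, te, ip, ca, mm;
    unsigned im;                       /* Interrupt Mask, bits 3-2 */
} cps_fields;

static cps_fields decode_cps(uint32_t cps)
{
    cps_fields f;
    f.da = field(cps, 0, 0);           /* Disable All Interrupts and Traps */
    f.di = field(cps, 1, 1);           /* Disable Interrupts */
    f.im = field(cps, 3, 2);           /* Interrupt Mask */
    f.sm = field(cps, 4, 4);           /* Supervisor Mode */
    f.pi = field(cps, 5, 5);           /* Physical Addressing Instructions */
    f.pd = field(cps, 6, 6);           /* Physical Addressing Data */
    f.re = field(cps, 8, 8);           /* ROM Enable */
    f.lk = field(cps, 9, 9);           /* Lock */
    f.fz = field(cps, 10, 10);         /* Freeze */
    f.tp = field(cps, 12, 12);         /* Trace Pending */
    f.te = field(cps, 13, 13);         /* Trace Enable */
    f.ip = field(cps, 14, 14);         /* Interrupt Pending */
    f.ca = field(cps, 15, 15);         /* Coprocessor Active */
    f.mm = field(cps, 16, 16);         /* Monitor Mode */
    return f;
}
```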
Figure B-3  Special Purpose Registers (continued)

<table>
<thead>
<tr>
<th>REG #</th>
<th>Register</th>
<th>Fields (most significant first)</th>
</tr>
</thead>
<tbody>
<tr>
<td>7</td>
<td>Register Bank Protect (RBP)</td>
<td>Reserved, B15 through B0</td>
</tr>
<tr>
<td>8</td>
<td>Timer Counter (TMC)</td>
<td>Reserved, TCV</td>
</tr>
<tr>
<td>9</td>
<td>Timer Reload (TMR)</td>
<td>Reserved, OV, IN, IE, TRV</td>
</tr>
<tr>
<td>10</td>
<td>Program Counter 0 (PC0)</td>
<td>PC0</td>
</tr>
<tr>
<td>11</td>
<td>Program Counter 1 (PC1)</td>
<td>PC1</td>
</tr>
<tr>
<td>12</td>
<td>Program Counter 2 (PC2)</td>
<td>PC2</td>
</tr>
<tr>
<td>13</td>
<td>MMU Configuration (MMU)</td>
<td>Reserved, PS, PID</td>
</tr>
<tr>
<td>14</td>
<td>LRU Recommendation (LRU)</td>
<td>Reserved, LRU</td>
</tr>
</tbody>
</table>
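The Timer Reload and Timer Counter layouts can be packed and unpacked with masks built from the field summary (OV bit 26, IN bit 25, IE bit 24, TRV and TCV bits 23–0). This sketch uses hypothetical helper names and models only the bit layout, not the timer's run-time behavior; leaving OV and IN clear when building a value is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define TMR_OV   (1u << 26)     /* Overflow */
#define TMR_IN   (1u << 25)     /* Interrupt */
#define TMR_IE   (1u << 24)     /* Interrupt Enable */
#define TV_MASK  0x00FFFFFFu    /* TRV (TMR) and TCV (TMC), bits 23-0 */

/* Build a TMR value from a 24-bit reload count; OV and IN are left clear. */
static uint32_t tmr_make(uint32_t reload, bool interrupt_enable)
{
    return (reload & TV_MASK) | (interrupt_enable ? TMR_IE : 0u);
}

static uint32_t tmr_reload(uint32_t tmr) { return tmr & TV_MASK; }
static bool     tmr_ie(uint32_t tmr)     { return (tmr & TMR_IE) != 0; }
static bool     tmr_in(uint32_t tmr)     { return (tmr & TMR_IN) != 0; }
static bool     tmr_ov(uint32_t tmr)     { return (tmr & TMR_OV) != 0; }

/* Timer Count Value from a TMC register value. */
static uint32_t tmc_count(uint32_t tmc)  { return tmc & TV_MASK; }
```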
### Figure B-3  Special Purpose Registers (continued)

<table>
<thead>
<tr>
<th>REG #</th>
</tr>
</thead>
<tbody>
<tr>
<td>15</td>
</tr>
<tr>
<td>16</td>
</tr>
<tr>
<td>17</td>
</tr>
<tr>
<td>18</td>
</tr>
<tr>
<td>19</td>
</tr>
<tr>
<td>20</td>
</tr>
<tr>
<td>21</td>
</tr>
<tr>
<td>22</td>
</tr>
<tr>
<td>23</td>
</tr>
</tbody>
</table>

#### Reason Vector (RSN)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>RSN</td>
</tr>
</tbody>
</table>

#### Region Mapping Address 0 (RMA0)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>VBA</td>
<td></td>
<td></td>
<td></td>
<td>PBA</td>
</tr>
</tbody>
</table>

#### Region Mapping Address 1 (RMA1)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>VBA</td>
<td></td>
<td></td>
<td></td>
<td>PBA</td>
</tr>
</tbody>
</table>

#### Region Mapping Control 0 (RMC0)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>PGM</td>
<td></td>
<td></td>
<td>TID</td>
</tr>
</tbody>
</table>

#### Region Mapping Control 1 (RMC1)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>PGM</td>
<td></td>
<td></td>
<td>TID</td>
</tr>
</tbody>
</table>

#### Shadow Program Counter 0 (SPC0)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPC0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Shadow Program Counter 1 (SPC1)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPC1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Instruction Breakpoint Address 0 (IBA0)

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>IBA</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
Figure B-3  Special Purpose Registers (continued)

REG #

24

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>BPID</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Instruction Breakpoint Control 0 (IBC0)

25

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

Instruction Breakpoint Address 1 (IBA1)

26

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>BPID</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Instruction Breakpoint Control 1 (IBC1)

128

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>IPC</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Indirect Pointer C (IPC)

129

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>IPA</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Indirect Pointer A (IPA)

130

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>IPB</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Indirect Pointer B (IPB)
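The pointer field occupies bits 9–2 of IPA, IPB, and IPC, so a register number is stored shifted left by two. A minimal sketch, assuming the field holds an absolute register number (0 through 255); `ip_encode` and `ip_decode` are hypothetical names.

```c
#include <stdint.h>

/* Place an absolute register number (0..255) into bits 9-2. */
static uint32_t ip_encode(unsigned absreg)
{
    return ((uint32_t)absreg & 0xFFu) << 2;
}

/* Recover the register number from bits 9-2, ignoring all other bits. */
static unsigned ip_decode(uint32_t ip)
{
    return (unsigned)((ip >> 2) & 0xFFu);
}
```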

131

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Q</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Q (Q)

132

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reserved</td>
<td>FC</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

ALU Status (ALU)

133

<table>
<thead>
<tr>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

Byte Pointer (BP)
### Figure B-3  Special Purpose Registers (continued)

#### Funnel Shift Count (FC)

<table>
<thead>
<tr>
<th>REG #</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>134</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>FC</td>
</tr>
</tbody>
</table>

#### Load/Store Count Remaining (CR)

<table>
<thead>
<tr>
<th>REG #</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>135</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>CR</td>
</tr>
</tbody>
</table>

#### Floating-Point Environment (FPE)

<table>
<thead>
<tr>
<th>REG #</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>160</td>
<td>Reserved</td>
<td>ACF</td>
<td>FRM</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Integer Environment (INTE)

<table>
<thead>
<tr>
<th>REG #</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>161</td>
<td>Reserved</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Floating-Point Status (FPS)

<table>
<thead>
<tr>
<th>REG #</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>162</td>
<td>Reserved</td>
<td>Res</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Exception Opcode (EXOP)

<table>
<thead>
<tr>
<th>REG #</th>
<th>31</th>
<th>23</th>
<th>15</th>
<th>7</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>164</td>
<td>Reserved</td>
<td>IOP</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
Figure B-4  TLB Registers

<table>
<thead>
<tr>
<th>REG #</th>
<th>TLB Register (Set 0: REG # 0–63; Set 1: REG # 64–127)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>TLB Entry Line 0 Word 0</td>
</tr>
<tr>
<td>1</td>
<td>TLB Entry Line 0 Word 1</td>
</tr>
<tr>
<td>2</td>
<td>TLB Entry Line 1 Word 0</td>
</tr>
<tr>
<td>3</td>
<td>TLB Entry Line 1 Word 1</td>
</tr>
<tr>
<td>60</td>
<td>TLB Entry Line 30 Word 0</td>
</tr>
<tr>
<td>61</td>
<td>TLB Entry Line 30 Word 1</td>
</tr>
<tr>
<td>62</td>
<td>TLB Entry Line 31 Word 0</td>
</tr>
<tr>
<td>63</td>
<td>TLB Entry Line 31 Word 1</td>
</tr>
<tr>
<td>64</td>
<td>TLB Entry Line 0 Word 0</td>
</tr>
<tr>
<td>65</td>
<td>TLB Entry Line 0 Word 1</td>
</tr>
<tr>
<td>66</td>
<td>TLB Entry Line 1 Word 0</td>
</tr>
<tr>
<td>67</td>
<td>TLB Entry Line 1 Word 1</td>
</tr>
<tr>
<td>124</td>
<td>TLB Entry Line 30 Word 0</td>
</tr>
<tr>
<td>125</td>
<td>TLB Entry Line 30 Word 1</td>
</tr>
<tr>
<td>126</td>
<td>TLB Entry Line 31 Word 0</td>
</tr>
<tr>
<td>127</td>
<td>TLB Entry Line 31 Word 1</td>
</tr>
</tbody>
</table>
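The numbering in Figure B-4 follows a regular pattern: two words per line and 32 lines per set, so each set occupies 64 register numbers. A minimal sketch with hypothetical function names.

```c
/* set 0..1, line 0..31, word 0..1 -> TLB register number 0..127 */
static unsigned tlb_reg(unsigned set, unsigned line, unsigned word)
{
    return set * 64u + line * 2u + word;
}

/* Inverse mapping, one component at a time. */
static unsigned tlb_set(unsigned reg)  { return reg / 64u; }
static unsigned tlb_line(unsigned reg) { return (reg % 64u) / 2u; }
static unsigned tlb_word(unsigned reg) { return reg % 2u; }
```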

Figure B-5  Translation Look-Aside Buffer Entries

Word 0

31 23 15 7 0

<table>
<thead>
<tr>
<th>VTAG</th>
<th>VE</th>
<th>SR</th>
<th>SW</th>
<th>SE</th>
<th>UR</th>
<th>UW</th>
<th>UE</th>
<th>TID</th>
</tr>
</thead>
</table>

Word 1

31 23 15 7 0

<table>
<thead>
<tr>
<th>RPN</th>
<th>Res</th>
<th>Res</th>
<th>PGM</th>
<th>U</th>
</tr>
</thead>
<tbody>
<tr>
<td>Res</td>
<td></td>
<td></td>
<td></td>
<td>IO</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Label</th>
<th>Field Name</th>
<th>Register</th>
<th>Bit</th>
</tr>
</thead>
<tbody>
<tr>
<td>ACF</td>
<td>Accumulator Format</td>
<td>Floating-Point Environment</td>
<td>10-9</td>
<td></td>
</tr>
<tr>
<td>B0</td>
<td>Bank 0 Protection Bit</td>
<td>Register Bank Protect</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>B1</td>
<td>Bank 1 Protection Bit</td>
<td>Register Bank Protect</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>B2</td>
<td>Bank 2 Protection Bit</td>
<td>Register Bank Protect</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td>B3</td>
<td>Bank 3 Protection Bit</td>
<td>Register Bank Protect</td>
<td>3</td>
<td></td>
</tr>
<tr>
<td>B4</td>
<td>Bank 4 Protection Bit</td>
<td>Register Bank Protect</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>B5</td>
<td>Bank 5 Protection Bit</td>
<td>Register Bank Protect</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td>B6</td>
<td>Bank 6 Protection Bit</td>
<td>Register Bank Protect</td>
<td>6</td>
<td></td>
</tr>
<tr>
<td>B7</td>
<td>Bank 7 Protection Bit</td>
<td>Register Bank Protect</td>
<td>7</td>
<td></td>
</tr>
<tr>
<td>B8</td>
<td>Bank 8 Protection Bit</td>
<td>Register Bank Protect</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td>B9</td>
<td>Bank 9 Protection Bit</td>
<td>Register Bank Protect</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td>B10</td>
<td>Bank 10 Protection Bit</td>
<td>Register Bank Protect</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>B11</td>
<td>Bank 11 Protection Bit</td>
<td>Register Bank Protect</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td>B12</td>
<td>Bank 12 Protection Bit</td>
<td>Register Bank Protect</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>B13</td>
<td>Bank 13 Protection Bit</td>
<td>Register Bank Protect</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td>B14</td>
<td>Bank 14 Protection Bit</td>
<td>Register Bank Protect</td>
<td>14</td>
<td></td>
</tr>
<tr>
<td>B15</td>
<td>Bank 15 Protection Bit</td>
<td>Register Bank Protect</td>
<td>15</td>
<td></td>
</tr>
<tr>
<td>BEN</td>
<td>Breakpoint Enable</td>
<td>Instruction Breakpoint Control</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td>BHO</td>
<td>Breakpoint Has Occurred</td>
<td>Instruction Breakpoint Control</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>BO</td>
<td>Byte Order</td>
<td>Configuration</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td>BP</td>
<td>Byte Pointer</td>
<td>ALU Status</td>
<td>6–5</td>
<td></td>
</tr>
<tr>
<td>BPID</td>
<td>Breakpoint Process Identifier</td>
<td>Instruction Breakpoint Control</td>
<td>7–0</td>
<td></td>
</tr>
<tr>
<td>BRM</td>
<td>Break ROM</td>
<td>Instruction Breakpoint Control</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td>BSY</td>
<td>Break or Synchronize</td>
<td>Instruction Breakpoint Control</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>BTE</td>
<td>Break on Translate Enabled</td>
<td>Instruction Breakpoint Control</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td>C</td>
<td>Carry</td>
<td>ALU Status</td>
<td>7</td>
<td></td>
</tr>
<tr>
<td>CA</td>
<td>Coprocessor Active</td>
<td>Current Processor Status</td>
<td>15</td>
<td></td>
</tr>
<tr>
<td>CD</td>
<td>Branch Target Cache Memory Disable</td>
<td>Configuration</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>CE</td>
<td>Coprocessor Enable</td>
<td>Channel Control</td>
<td>31</td>
<td></td>
</tr>
<tr>
<td>CHA</td>
<td>Channel Address</td>
<td>Channel Address</td>
<td>31–0</td>
<td></td>
</tr>
<tr>
<td>CHD</td>
<td>Channel Data</td>
<td>Channel Data</td>
<td>31–0</td>
<td></td>
</tr>
<tr>
<td>CNTL</td>
<td>Control</td>
<td>Channel Control</td>
<td>30–24</td>
<td></td>
</tr>
<tr>
<td>CO</td>
<td>Branch Target Cache Memory ...</td>
<td>Configuration</td>
<td>6</td>
<td></td>
</tr>
<tr>
<td>CP</td>
<td>Coprocessor Present</td>
<td>Configuration</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>CR</td>
<td>Load/Store Count Remaining</td>
<td>Channel Control</td>
<td>23–16</td>
<td></td>
</tr>
<tr>
<td>CV</td>
<td>Contents Valid</td>
<td>Channel Control</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>DA</td>
<td>Disable All Interrupts and Traps</td>
<td>Current Processor Status</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>DF</td>
<td>Divide Flag</td>
<td>ALU Status</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td>DI</td>
<td>Disable Interrupts</td>
<td>Current Processor Status</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>DM</td>
<td>Floating-Point Divide By Zero Mask</td>
<td>Floating-Point Environment</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td>DO</td>
<td>Integer Division Overflow Mask</td>
<td>Integer Environment</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>DS</td>
<td>Floating-Point Divide By Zero Sticky</td>
<td>Floating-Point Status</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td>DT</td>
<td>Floating-Point Divide By Zero Trap</td>
<td>Floating-Point Status</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td>DW</td>
<td>Data Width Enable</td>
<td>Configuration</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td>EE</td>
<td>Early Load Enable</td>
<td>Configuration</td>
<td>7</td>
<td></td>
</tr>
<tr>
<td>FF</td>
<td>Fast Floating-Point Select</td>
<td>Floating-Point Environment</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td>FRM</td>
<td>Floating-Point Round Mode</td>
<td>Floating-Point Environment</td>
<td>7-6</td>
<td></td>
</tr>
<tr>
<td>FC</td>
<td>Funnel Shift Count</td>
<td>ALU Status</td>
<td>4-0</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Funnel Shift Count</td>
<td>4-0</td>
<td></td>
</tr>
<tr>
<td>FZ</td>
<td>Freeze</td>
<td>Current Processor Status</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>IBA</td>
<td>Instruction Breakpoint Address</td>
<td>Instruction Breakpoint Address 0, 1</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>IE</td>
<td>Interrupt Enable</td>
<td>Timer Reload</td>
<td>24</td>
<td></td>
</tr>
<tr>
<td>IM</td>
<td>Interrupt Mask</td>
<td>Old Processor Status</td>
<td>3-2</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Current Processor Status</td>
<td>3-2</td>
<td></td>
</tr>
<tr>
<td>IN</td>
<td>Interrupt</td>
<td>Timer Reload</td>
<td>25</td>
<td></td>
</tr>
<tr>
<td>IO</td>
<td>Input/Output</td>
<td>Region Mapping Control 0, 1</td>
<td>16</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>IOP</td>
<td>Instruction Opcode</td>
<td>Exception Opcode</td>
<td>7-0</td>
<td></td>
</tr>
<tr>
<td>IP</td>
<td>Interrupt Pending</td>
<td>Current Processor Status</td>
<td>14</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>14</td>
<td></td>
</tr>
<tr>
<td>IPA</td>
<td>Indirect Pointer A</td>
<td>Indirect Pointer A</td>
<td>9-2</td>
<td></td>
</tr>
<tr>
<td>IPB</td>
<td>Indirect Pointer B</td>
<td>Indirect Pointer B</td>
<td>9-2</td>
<td></td>
</tr>
<tr>
<td>IPC</td>
<td>Indirect Pointer C</td>
<td>Indirect Pointer C</td>
<td>9-2</td>
<td></td>
</tr>
<tr>
<td>LA</td>
<td>Lock Active</td>
<td>Channel Control</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>LK</td>
<td>Lock</td>
<td>Current Processor Status</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td>LRU</td>
<td>Least-Recently Used Entry</td>
<td>LRU Recommendation</td>
<td>6-1</td>
<td></td>
</tr>
<tr>
<td>LS</td>
<td>Load/Store</td>
<td>Channel Control</td>
<td>15</td>
<td></td>
</tr>
<tr>
<td>ML</td>
<td>Multiple Operation</td>
<td>Channel Control</td>
<td>14</td>
<td></td>
</tr>
<tr>
<td>MM</td>
<td>Monitor Mode</td>
<td>Current Processor Status</td>
<td>16</td>
<td></td>
</tr>
<tr>
<td>MO</td>
<td>Integer Multiplication Overflow Mask</td>
<td>Integer Environment</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>N</td>
<td>Negative</td>
<td>ALU Status</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td>NM</td>
<td>Floating-Point Invalid Operation Mask</td>
<td>Floating-Point Environment</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>NN</td>
<td>Not Needed</td>
<td>Channel Control</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>NS</td>
<td>Floating-Point Invalid Sticky</td>
<td>Floating-Point Status</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>NT</td>
<td>Floating-Point Invalid Operation Trap</td>
<td>Floating-Point Status</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td>OV</td>
<td>Overflow</td>
<td>Timer Reload</td>
<td>26</td>
<td></td>
</tr>
<tr>
<td>PBA</td>
<td>Physical Base Address</td>
<td>Region Mapping Address 0, 1</td>
<td>15-0</td>
<td></td>
</tr>
<tr>
<td>PC0</td>
<td>Program Counter 0</td>
<td>Program Counter 0</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>PC1</td>
<td>Program Counter 1</td>
<td>Program Counter 1</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>PC2</td>
<td>Program Counter 2</td>
<td>Program Counter 2</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>PD</td>
<td>Physical Addressing Data</td>
<td>Current Processor Status</td>
<td>6</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>6</td>
<td></td>
</tr>
<tr>
<td>PGM</td>
<td>User Programmable</td>
<td>Region Mapping Control 0, 1</td>
<td>23-22</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 1</td>
<td>7-6</td>
<td></td>
</tr>
<tr>
<td>PI</td>
<td>Physical Addressing Instructions</td>
<td>Current Processor Status</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td>PID</td>
<td>Process Identifier</td>
<td>MMU Configuration</td>
<td>7-0</td>
<td></td>
</tr>
<tr>
<td>PRL</td>
<td>Processor Release Level</td>
<td>Configuration</td>
<td>31-24</td>
<td></td>
</tr>
<tr>
<td>PS</td>
<td>Page Size</td>
<td>MMU Configuration</td>
<td>9-8</td>
<td></td>
</tr>
<tr>
<td>Q</td>
<td>Quotient/Multiplier</td>
<td>Q Register</td>
<td>31-0</td>
<td></td>
</tr>
<tr>
<td>RE</td>
<td>ROM Enable</td>
<td>Current Processor Status</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td>RGS</td>
<td>Region Size</td>
<td>Region Mapping Control 0, 1</td>
<td>20-17</td>
<td></td>
</tr>
<tr>
<td>RM</td>
<td>Floating-Point Reserved Operand Mask</td>
<td>Floating-Point Environment</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>RPN</td>
<td>Real Page Number</td>
<td>TLB Entry Word 1</td>
<td>31-10</td>
<td></td>
</tr>
<tr>
<td>RS</td>
<td>Floating-Point Reserved Operand Sticky</td>
<td>Floating-Point Status</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>RSN</td>
<td>Reason Vector</td>
<td>Reason Vector</td>
<td>7-0</td>
<td></td>
</tr>
<tr>
<td>RT</td>
<td>Floating-Point Reserved Operand Trap</td>
<td>Floating-Point Status</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td>RV</td>
<td>ROM Vector Area</td>
<td>Configuration</td>
<td>3</td>
<td></td>
</tr>
<tr>
<td>SE</td>
<td>Supervisor Execute</td>
<td>Region Mapping Control 0, 1</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td>SM</td>
<td>Supervisor Mode</td>
<td>Current Processor Status</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>SPC0</td>
<td>Shadow Program Counter 0</td>
<td>Shadow Program Counter 0</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>SPC1</td>
<td>Shadow Program Counter 1</td>
<td>Shadow Program Counter 1</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>SPC2</td>
<td>Shadow Program Counter 2</td>
<td>Shadow Program Counter 2</td>
<td>31-2</td>
<td></td>
</tr>
<tr>
<td>SR</td>
<td>Supervisor Read</td>
<td>Region Mapping Control 0, 1</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td>ST</td>
<td>Set</td>
<td>Channel Control</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td>SW</td>
<td>Supervisor Write</td>
<td>Region Mapping Control 0, 1</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>TCV</td>
<td>Timer Count Value</td>
<td>Timer Counter</td>
<td>23-0</td>
<td></td>
</tr>
<tr>
<td>TE</td>
<td>Trace Enable</td>
<td>Current Processor Status</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>13</td>
<td></td>
</tr>
<tr>
<td>TF</td>
<td>Transaction Faulted</td>
<td>Channel Control</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>TID</td>
<td>Task Identifier</td>
<td>Region Mapping Control 0, 1</td>
<td>7-0</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>7-0</td>
<td></td>
</tr>
<tr>
<td>TP</td>
<td>Trace Pending</td>
<td>Current Processor Status</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>TR</td>
<td>Target Register</td>
<td>Channel Control</td>
<td>9-2</td>
<td></td>
</tr>
<tr>
<td>TRV</td>
<td>Timer Reload Value</td>
<td>Timer Reload</td>
<td>23-0</td>
<td></td>
</tr>
<tr>
<td>TU</td>
<td>Trap Unaligned Access</td>
<td>Current Processor Status</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td>U</td>
<td>Usage</td>
<td>TLB Entry Word 1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>UE</td>
<td>User Execute</td>
<td>Region Mapping Control 0, 1</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>8</td>
<td></td>
</tr>
<tr>
<td>UM</td>
<td>Floating-Point Underflow Mask</td>
<td>Floating-Point Environment</td>
<td>3</td>
<td></td>
</tr>
<tr>
<td>UR</td>
<td>User Read</td>
<td>Region Mapping Control 0, 1</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>US</td>
<td>Floating-Point Underflow Sticky</td>
<td>Floating-Point Status</td>
<td>3</td>
<td></td>
</tr>
<tr>
<td>UT</td>
<td>Floating-Point Underflow Trap</td>
<td>Floating-Point Status</td>
<td>11</td>
<td></td>
</tr>
<tr>
<td>UW</td>
<td>User Write</td>
<td>Region Mapping Control 0, 1</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>9</td>
<td></td>
</tr>
<tr>
<td>V</td>
<td>Overflow</td>
<td>ALU Status</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>VAB</td>
<td>Vector Area Base</td>
<td>Vector Area Base Address</td>
<td>31-10</td>
<td></td>
</tr>
<tr>
<td>VBA</td>
<td>Virtual Base Address</td>
<td>Region Mapping Address 0, 1</td>
<td>31-16</td>
<td></td>
</tr>
<tr>
<td>VE</td>
<td>Valid Entry</td>
<td>Region Mapping Control 0, 1</td>
<td>14</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>TLB Entry Word 0</td>
<td>14</td>
<td></td>
</tr>
<tr>
<td>VF</td>
<td>Vector Fetch</td>
<td>Configuration</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>VM</td>
<td>Floating-Point Overflow Mask</td>
<td>Floating-Point Environment</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td>VS</td>
<td>Floating-Point Overflow Sticky</td>
<td>Floating-Point Status</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td>VT</td>
<td>Floating-Point Overflow Trap</td>
<td>Floating-Point Status</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>VTAG</td>
<td>Virtual Tag</td>
<td>TLB Entry Word 0</td>
<td>31-15</td>
<td></td>
</tr>
<tr>
<td>WM</td>
<td>Wait Mode</td>
<td>Current Processor Status</td>
<td>7</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Old Processor Status</td>
<td>7</td>
<td></td>
</tr>
<tr>
<td>XM</td>
<td>Floating-Point Inexact Result Mask</td>
<td>Floating-Point Environment</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>XS</td>
<td>Floating-Point Inexact Result Sticky</td>
<td>Floating-Point Status</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>XT</td>
<td>Floating-Point Inexact Result Trap</td>
<td>Floating-Point Status</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>Z</td>
<td>Zero</td>
<td>ALU Status</td>
<td>8</td>
<td></td>
</tr>
</tbody>
</table>
Table C-1 lists the latency of each single- and double-precision floating-point operation and each integer multiplication operation. Latency is the minimum time that must elapse after an instruction is issued before its result can be used as an input operand of a subsequent operation.

**Table C-1**

<table>
<thead>
<tr>
<th>Operation (2)</th>
<th>Latency (Cycles)</th>
<th>Latency (ns @ 40 MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLASS (s.p., d.p.)</td>
<td>4</td>
<td>100 (1)</td>
</tr>
<tr>
<td>CONVERT (int → s.p.)</td>
<td>4</td>
<td>100</td>
</tr>
<tr>
<td>CONVERT (int → d.p.)</td>
<td>4</td>
<td>100</td>
</tr>
<tr>
<td>CONVERT (f.p. → int)</td>
<td>3</td>
<td>75</td>
</tr>
<tr>
<td>CONVERT (f.p. → f.p.)</td>
<td>3/4</td>
<td>75/100 (3)</td>
</tr>
<tr>
<td>DADD</td>
<td>3/4</td>
<td>75/100 (3)</td>
</tr>
<tr>
<td>DDIV</td>
<td>18</td>
<td>450 (1)</td>
</tr>
<tr>
<td>DEQ</td>
<td>3</td>
<td>50</td>
</tr>
<tr>
<td>DGE</td>
<td>3</td>
<td>50</td>
</tr>
<tr>
<td>DGT</td>
<td>3</td>
<td>50</td>
</tr>
<tr>
<td>DMAC</td>
<td>9</td>
<td>225</td>
</tr>
<tr>
<td>DMSM</td>
<td>9</td>
<td>225</td>
</tr>
<tr>
<td>DMUL</td>
<td>6</td>
<td>150 (1)</td>
</tr>
<tr>
<td>DSUB</td>
<td>3/4</td>
<td>75/100 (3)</td>
</tr>
<tr>
<td>FADD</td>
<td>3/4</td>
<td>75/100 (3)</td>
</tr>
<tr>
<td>FDIV</td>
<td>11</td>
<td>275 (1)</td>
</tr>
<tr>
<td>FDMUL</td>
<td>3</td>
<td>75 (1)</td>
</tr>
<tr>
<td>FEQ</td>
<td>3</td>
<td>50</td>
</tr>
<tr>
<td>FGE</td>
<td>3</td>
<td>50</td>
</tr>
<tr>
<td>FGT</td>
<td>3</td>
<td>50</td>
</tr>
<tr>
<td>FMAC</td>
<td>6</td>
<td>150</td>
</tr>
<tr>
<td>FMSM</td>
<td>6</td>
<td>150</td>
</tr>
<tr>
<td>FMUL</td>
<td>3</td>
<td>75 (1)</td>
</tr>
<tr>
<td>FSUB</td>
<td>3/4</td>
<td>75/100 (3)</td>
</tr>
<tr>
<td>MFACC</td>
<td>3</td>
<td>75</td>
</tr>
<tr>
<td>MTACC</td>
<td>3/4</td>
<td>75/100 (3)</td>
</tr>
<tr>
<td>SQRT s.p.</td>
<td>28</td>
<td>700 (1)</td>
</tr>
<tr>
<td>SQRT d.p.</td>
<td>57</td>
<td>1,425</td>
</tr>
<tr>
<td>MULTIPLU</td>
<td>3</td>
<td>75</td>
</tr>
<tr>
<td>MULTIPLY</td>
<td>3</td>
<td>75</td>
</tr>
<tr>
<td>MULTM</td>
<td>3</td>
<td>75</td>
</tr>
<tr>
<td>MULTMU</td>
<td>3</td>
<td>75</td>
</tr>
</tbody>
</table>

Notes:

(1) Requires additional cycles for wrapping/unwrapping of denormalized input/output operands (see Table C-3).

(3) An extra cycle is required for renormalization when the input operands cause massive cancellation.

Table C-2 lists the repeat time of floating-point operations and integer multiplication operations. An instruction with a repeat time of \( N \) can be issued every \( N \) cycles.

Table C-3 shows the effect of denormalized source operands and results on instruction latency and issue rate.

If no dependencies or functional unit conflicts exist, then an instruction can be issued.
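To see how latency and repeat time interact, the following Python sketch estimates the issue cycle of each instruction in a short sequence. It is a simplified model resting on assumptions that this appendix does not state: instructions issue in order, at most one per cycle; a dependent instruction waits for its producer's full latency from Table C-1; a later instruction of the same operation waits for that operation's repeat time from Table C-2; the 3/4-cycle entries are taken at 3 cycles (per note (3) of Table C-1, the fourth cycle occurs only with massive cancellation); and denormalized operands and any functional-unit sharing between different operations are ignored. The function and register names are hypothetical.

```python
# Latency and repeat times in cycles, copied from Tables C-1 and C-2
# for a few operations. FADD uses the 3-cycle (no cancellation) case.
LATENCY = {"FADD": 3, "FMUL": 3, "DMUL": 6, "FDIV": 11, "DDIV": 18}
REPEAT = {"FADD": 1, "FMUL": 1, "DMUL": 4, "FDIV": 10, "DDIV": 17}


def issue_cycles(program):
    """Return the issue cycle of each instruction in `program`.

    `program` is a list of (op, dest, sources) tuples. Simplified model
    (an assumption, not the documented pipeline): in-order issue, at most
    one instruction per cycle; each source must be available (producer's
    issue cycle + latency), and the same operation must respect its
    repeat time since its previous issue.
    """
    ready = {}        # register name -> first cycle its value is usable
    last_issue = {}   # operation name -> cycle it last issued
    cycles = []
    t = 0             # earliest cycle for the next in-order issue
    for op, dest, sources in program:
        start = t
        for src in sources:
            start = max(start, ready.get(src, 0))
        if op in last_issue:
            start = max(start, last_issue[op] + REPEAT[op])
        cycles.append(start)
        last_issue[op] = start
        ready[dest] = start + LATENCY[op]
        t = start + 1
    return cycles


example = [
    ("FMUL", "f1", ["a", "b"]),
    ("FADD", "f2", ["f1", "c"]),   # depends on the FMUL result
    ("FDIV", "f3", ["x", "y"]),
    ("FDIV", "f4", ["x", "z"]),    # independent, but limited by repeat time
]
print(issue_cycles(example))  # [0, 3, 4, 14]
```

In this example the FADD waits three cycles for the FMUL result, while the second FDIV, although independent, must wait out the 10-cycle FDIV repeat time.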

**Table C-2**

<table>
<thead>
<tr>
<th>Operation (see note)</th>
<th>Repeat Time: Start New Operation Every N Cycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLASS (s.p., d.p.)</td>
<td>2</td>
</tr>
<tr>
<td>CONVERT (int → s.p.)</td>
<td>1</td>
</tr>
<tr>
<td>CONVERT (int → d.p.)</td>
<td>1</td>
</tr>
<tr>
<td>CONVERT (f.p. → int)</td>
<td>1</td>
</tr>
<tr>
<td>CONVERT (f.p. → f.p.)</td>
<td>1</td>
</tr>
<tr>
<td>DADD</td>
<td>1</td>
</tr>
<tr>
<td>DDIV</td>
<td>17</td>
</tr>
<tr>
<td>DEQ</td>
<td>1</td>
</tr>
<tr>
<td>DGE</td>
<td>1</td>
</tr>
<tr>
<td>DGT</td>
<td>1</td>
</tr>
<tr>
<td>DMAC</td>
<td>4</td>
</tr>
<tr>
<td>DMSM</td>
<td>4</td>
</tr>
<tr>
<td>DMUL</td>
<td>4</td>
</tr>
<tr>
<td>DSUB</td>
<td>1</td>
</tr>
<tr>
<td>FADD</td>
<td>1</td>
</tr>
<tr>
<td>FDIV</td>
<td>10</td>
</tr>
<tr>
<td>FDMUL</td>
<td>1</td>
</tr>
<tr>
<td>FEQ</td>
<td>1</td>
</tr>
<tr>
<td>FGE</td>
<td>1</td>
</tr>
<tr>
<td>FGT</td>
<td>1</td>
</tr>
<tr>
<td>FMAC</td>
<td>1</td>
</tr>
<tr>
<td>FMSM</td>
<td>1</td>
</tr>
<tr>
<td>FMUL</td>
<td>1</td>
</tr>
<tr>
<td>FSUB</td>
<td>1</td>
</tr>
<tr>
<td>MFACC</td>
<td>1</td>
</tr>
<tr>
<td>MTACC</td>
<td>1</td>
</tr>
<tr>
<td>SQRT s.p.</td>
<td>27</td>
</tr>
<tr>
<td>SQRT d.p.</td>
<td>56</td>
</tr>
<tr>
<td>MULTIPLU</td>
<td>1</td>
</tr>
<tr>
<td>MULTIPLY</td>
<td>1</td>
</tr>
<tr>
<td>MULTM</td>
<td>1</td>
</tr>
<tr>
<td>MULTMU</td>
<td>1</td>
</tr>
</tbody>
</table>

Notes: int = integer; s.p. = single-precision floating-point; d.p. = double-precision floating-point.

### Table C-3 Effect on Latency of Denormalized Source Operands or Results

<table>
<thead>
<tr>
<th>Denormalized Operand Status</th>
<th>Latency Increase (1)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Cycles (ns @ 40 MHz)</td>
</tr>
<tr>
<td>One denormalized source operand</td>
<td>+4</td>
</tr>
<tr>
<td>Two denormalized source operands</td>
<td>+5</td>
</tr>
<tr>
<td>Denormalized result</td>
<td>+4</td>
</tr>
<tr>
<td>Two denormalized source operands following an operation with a denormalized result</td>
<td>+6 +240</td>
</tr>
</tbody>
</table>

Notes: 1. Only the instructions CLASS, FMUL, DMUL, FDMUL, FDIV, DDIV, and SQRT require extra cycles to process denormalized numbers. Because denormalized-number processing uses the adder, it increases by one cycle per denorm the latency of any instruction issued to the adder at the same time.

2. Unwrapping of denormalized results is pipelined with other operations.

### C.2 EXCEPTIONS

In most cases, operations produce non-exceptional results, i.e., results that are equal to the infinitely precise result, rounded to the destination format. This section describes results produced in exceptional cases, as well as other details pertaining to the floating-point implementation.

The following terms are used in the classification of exceptions:

- \( \infty \) Infinity, a floating-point number comprising a maximum biased exponent, a zero fraction, and a sign bit of 1 or 0. \( +\infty \) indicates a positive infinity, \( -\infty \) a negative infinity.

- 0 Zero, a floating-point number comprising a biased exponent of zero, a zero fraction, and a sign bit of 1 or 0. \( +0 \) indicates a zero with a sign bit of 0; \( -0 \) indicates a zero with a sign bit of 1.

- AQNaN An AMD Quiet Not-a-Number comprising a maximum biased exponent, a fraction of 11000...0, and a sign bit of 0. AQNaN is the only NaN reported as an operation result.

- Denorm A denormalized floating-point number; a non-zero number that is too small to be represented as a normalized floating-point number.

- FNum A finite, non-zero floating-point number. \( +\text{FNum} \) indicates a positive FNum; \( -\text{FNum} \) indicates a negative FNum.

- I0 Integer zero, an integer word consisting entirely of zeros.

- IMaxNeg The largest negative number representable in 32-bit, 2's-complement integer format. IMaxNeg has a value of 80000000, hexadecimal.

- IMaxPos The largest positive number representable in 32-bit, 2's-complement integer format. IMaxPos has a value of 7fffffff, hexadecimal.

- Inexact Result An exception indicating one of the following:
  - A rounded result of an operation not equal to the infinitely-precise result;
  - An overflowed operation with the overflow exception trap disabled (VM = 1); or
  - In fast-float mode, a non-zero intermediate result converted to a final result of zero.

**Infinitely Precise Result** The result of an operation, computed as if the exponent range and precision were unbounded.

**Intermediate Result** The result of an operation before rounding. For the purpose of describing exception handling, the intermediate result can be thought of as being equal to the infinitely-precise result.

**Invalid Operation** An exception indicating that the source operand or operands are invalid for the operation to be performed, e.g., the operation $\infty \times 0$.

**Max** The largest representable finite floating-point number. $+\text{Max}$ indicates the largest positive finite number, $-\text{Max}$ the largest negative finite number.

**NonZ** A non-zero floating-point number. A NonZ can be either an FNum or an infinity. $+\text{NonZ}$ indicates a positive NonZ, $-\text{NonZ}$ a negative NonZ.

**Overflow** An exception indicating that the rounded result of an operation is too large to be expressed in the destination format.

**Reserved Operand** An exception indicating that an operation producing a numeric result has a reserved operand (NaN) as either a source operand or result.

**RResult** A result produced by rounding the infinitely-precise result.

**Sign (x)** The sign of operand $x$.

**UIMax** The largest representable, 32-bit, unsigned integer quantity. UIMax has a value of $\text{ffffffff}$, hexadecimal.

**Underflow** An exception indicating that the rounded result of an operation is too small to be represented in the destination format. There are two different sets of underflow criteria, depending on whether or not the underflow trap or fast-float mode is enabled:

**Underflow trap masked and fast-float mode disabled** ($\text{UM} = 1$ and $\text{FF} = 0$): An operation result underflows if a non-zero intermediate result is too small to be represented as a normalized number and the rounded result is inexact.

**Underflow trap unmasked or fast-float mode enabled** ($\text{UM} = 0$ or $\text{FF} = 1$): An operation result underflows if a non-zero intermediate result is too small to be represented as a normalized number.
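The operand classes and underflow criteria defined above can be checked mechanically. The Python sketch below is a minimal illustration for single-precision bit patterns only, assuming the IEEE 754 layout (1 sign bit, 8-bit biased exponent, 23-bit fraction). It classifies a pattern as ±inf, ±0, Denorm, FNum, or NaN, builds the AQNaN pattern from its definition, and encodes the two underflow criteria. It does not distinguish SNaN from QNaN, because that encoding is not given in this section; all names are hypothetical.

```python
import struct

EXP_MAX = 0xFF                   # maximum biased exponent (single precision)
FRAC_BITS = 23                   # fraction width (single precision)
FRAC_MASK = (1 << FRAC_BITS) - 1

# AQNaN per the definition above: maximum biased exponent,
# fraction 11000...0, sign bit 0.
AQNAN_SP = (EXP_MAX << FRAC_BITS) | (0b11 << (FRAC_BITS - 2))


def sp_bits(x):
    """Return the 32-bit pattern of `x` rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify(bits):
    """Classify a single-precision pattern using the terms defined above.

    Denorms are reported separately, although they are also FNums
    (finite and non-zero).
    """
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> FRAC_BITS) & EXP_MAX
    fraction = bits & FRAC_MASK
    if exponent == EXP_MAX:
        return sign + "inf" if fraction == 0 else "NaN"
    if exponent == 0:
        return sign + "0" if fraction == 0 else "Denorm"
    return sign + "FNum"


def underflows(tiny, inexact, um, ff):
    """Apply the underflow criteria above.

    `tiny` is True when a non-zero intermediate result is too small to be
    represented as a normalized number; `inexact` is True when the rounded
    result differs from the infinitely precise result.
    """
    if um == 1 and ff == 0:   # trap masked, fast-float mode disabled
        return tiny and inexact
    return tiny               # UM = 0 or FF = 1
```

With these definitions, AQNAN_SP evaluates to 7fe00000 hexadecimal, and `underflows(True, False, 1, 0)` is False while `underflows(True, False, 0, 0)` is True, reflecting the extra inexactness requirement when the trap is masked and fast-float mode is off.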

The tables in Sections C.2.1 through C.2.12 list the exception classes relevant to each floating-point operation, and the results and exception status reported for a variety of conditions. The following shorthand is used to describe the status bits set in the floating-point status register:

<table>
<thead>
<tr>
<th>Notation</th>
<th>Status bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>N</td>
<td>NS, NT</td>
</tr>
<tr>
<td>R</td>
<td>RS, RT</td>
</tr>
<tr>
<td>V</td>
<td>VS, VT</td>
</tr>
<tr>
<td>U</td>
<td>US, UT</td>
</tr>
<tr>
<td>X</td>
<td>XS, XT</td>
</tr>
<tr>
<td>D</td>
<td>DS, DT</td>
</tr>
</tbody>
</table>
Note that a sticky status bit (NS, RS, VS, US, XS, or DS) is set only if the corresponding exception mask bit in the Floating-Point Environment Register is set, except when the sticky status bit is set by a DMAC, DMSM, FMAC, FMSM, or MTACC instruction. Note also that the state of the trap status bits (NT, RT, VT, UT, XT, or DT) is valid only if the operation in question takes a Floating-Point Exception trap.
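The masking rule can be modeled in a few lines. The Python sketch below is a simplified reading, not a description of the hardware. It uses only the bit positions given in the field listing earlier in this appendix (RM, VM, UM, XM at bits 1 through 4 of the Floating-Point Environment Register; RS, VS, US, XS at bits 1 through 4 and RT, VT, UT, XT at bits 9 through 12 of the Floating-Point Status Register). It leaves out N and D because their positions are not listed in this section, and it assumes that an unmasked exception always takes the trap. Treating the multiply-accumulate-type instructions as setting the sticky bit regardless of the mask is this sketch's interpretation of the exception clause above; the function name is hypothetical.

```python
# Bit positions from the field listing earlier in this appendix.
MASK_BIT = {"R": 1, "V": 2, "U": 3, "X": 4}     # RM, VM, UM, XM (FPE)
STICKY_BIT = {"R": 1, "V": 2, "U": 3, "X": 4}   # RS, VS, US, XS (FPS)
TRAP_BIT = {"R": 9, "V": 10, "U": 11, "X": 12}  # RT, VT, UT, XT (FPS)

# Instructions named in the exception clause of the sticky-bit rule.
MAC_OPS = {"DMAC", "DMSM", "FMAC", "FMSM", "MTACC"}


def update_fps(fps, fpe, exc, op):
    """Return (new_fps, trap_taken) for one exception `exc` raised by `op`.

    Simplified model: a masked exception (mask bit = 1) sets its sticky
    bit and does not trap; an unmasked one is assumed to take a
    Floating-Point Exception trap and sets its trap status bit.
    MAC-type instructions are assumed to set the sticky bit regardless
    of the mask; their trap behavior is modeled like any other op.
    """
    masked = bool((fpe >> MASK_BIT[exc]) & 1)
    if masked or op in MAC_OPS:
        fps |= 1 << STICKY_BIT[exc]
    trap_taken = not masked
    if trap_taken:
        fps |= 1 << TRAP_BIT[exc]
    return fps, trap_taken
```

For example, an overflow raised by FADD with VM = 1 only sets VS (bit 2), whereas the same overflow with VM = 0 sets VT (bit 10) and reports a trap.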

In most cases, exceptional conditions have been divided into two groups: input exceptions, for which the exception is due to inappropriate operands, and output exceptions, for which the exception can be detected only at the conclusion of an operation.

In the tables that follow, exceptions are prioritized in the following order, from the
highest to lowest priority:

1. Invalid operation, reserved operand ← highest priority
2. Divide by Zero
3. Overflow, Underflow
4. Inexact Result ← lowest priority

The result and status for a given exceptional operation are determined by the highest-priority exception. If, for example, an operation produces both overflow and inexact result exceptions, the overflow exception, having higher priority, determines the behavior of the operation. The behavior of this operation is therefore described by the Overflow entry of the Output Exceptions table for the operation in question.
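The priority rule amounts to picking the first non-empty group in the ordered list above. A minimal Python sketch, using hypothetical exception names:

```python
# Exception groups in priority order, highest first, as listed above.
# Invalid operation and reserved operand share the top level, as do
# overflow and underflow.
PRIORITY = [
    {"invalid", "reserved"},
    {"divide_by_zero"},
    {"overflow", "underflow"},
    {"inexact"},
]


def governing_exceptions(raised):
    """Return the raised exceptions that determine result and status.

    `raised` is a set of exception names. The highest-priority group
    containing at least one raised exception wins; None means the
    operation raised no exception.
    """
    for group in PRIORITY:
        hit = group & raised
        if hit:
            return hit
    return None
```

Calling `governing_exceptions({"overflow", "inexact"})` returns `{"overflow"}`, matching the example in the text: the Overflow entry, not the Inexact Result entry, describes the operation.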

The tables that follow list some cases that do not result in a status bit being set. These cases are not considered exceptional by the IEEE Binary Floating-Point Standard, and are listed here merely for the sake of completeness.

### C.2.1 Addition (FADD, DADD)

<table>
<thead>
<tr>
<th>Input Exceptions: FADD, DADD</th>
<th>SRCB</th>
<th>FNum, 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SRCA</td>
<td>SNaN</td>
<td>QNaN</td>
</tr>
<tr>
<td>SNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td>N,R</td>
<td>R</td>
<td>R</td>
</tr>
<tr>
<td>+∞</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td>N,R</td>
<td>none</td>
<td>R</td>
</tr>
<tr>
<td>−∞</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td>N,R</td>
<td>none</td>
<td>N,R</td>
</tr>
<tr>
<td>FNum, 0</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td>N,R</td>
<td>none</td>
<td>N,R</td>
</tr>
</tbody>
</table>
### Output Exceptions: FADD, DADD

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Overflow</strong></td>
<td>VM = 1</td>
<td>sign +</td>
<td>FRM =, +∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>FRM0, −∞</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
<td>V,X</td>
</tr>
<tr>
<td><strong>Underflow</strong></td>
<td>UM = 1</td>
<td>FF = 1</td>
<td>±0(1)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FF = 0</td>
<td>N/A</td>
</tr>
<tr>
<td></td>
<td>UM = 0</td>
<td>FF = 1</td>
<td>Exact Result</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>Inexact Result</td>
</tr>
<tr>
<td>Inexact Result</td>
<td>XM = 1</td>
<td></td>
<td>RResult</td>
</tr>
<tr>
<td>XM = 0</td>
<td>(NW)</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Notes: N/A = Not applicable; addition cannot underflow for these conditions. (NW) = Result not written; contents of destination register unchanged. (1) = Zero has sign of intermediate result.

### C.2.2 Subtraction (FSUB, DSUB)

<table>
<thead>
<tr>
<th>Input Exceptions: FSUB, DSUB</th>
</tr>
</thead>
</table>

<table>
<thead>
<tr>
<th>SRCA</th>
<th>SNaN</th>
<th>QNaN</th>
<th>SRCB</th>
<th>FNum, 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN, N,R</td>
<td>AQNaN, N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN, N,R</td>
<td>AQNaN, N,R</td>
</tr>
<tr>
<td>±∞</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN, R</td>
<td>±∞, ±∞</td>
</tr>
<tr>
<td>−∞</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>−∞</td>
<td>AQNaN, −∞</td>
</tr>
<tr>
<td>FNum, 0</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>−∞</td>
<td>AQNaN, −∞</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SRCB</th>
<th>SRCB</th>
<th>FNum, 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>±∞</td>
<td>±∞</td>
<td>±∞, ±∞</td>
</tr>
<tr>
<td>−∞</td>
<td>−∞</td>
<td>−∞, −∞</td>
</tr>
<tr>
<td>none</td>
<td>none</td>
<td>none</td>
</tr>
</tbody>
</table>

C-6 FLOATING-POINT BEHAVIOR
### Output Exceptions: FSUB, DSUB

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Overflow</strong></td>
<td>VM = 1</td>
<td>sign +</td>
<td>FRM = , +$\infty$</td>
</tr>
<tr>
<td></td>
<td></td>
<td>sign -</td>
<td>FRM = , $-\infty$</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
<td>V, X</td>
</tr>
<tr>
<td>UM = 1</td>
<td>FF = 1</td>
<td>±0(1)</td>
<td>U, X</td>
</tr>
<tr>
<td></td>
<td>FF = 0</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>UM = 0</td>
<td>FF = 1</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>FF = 0</td>
<td>Inexact Result</td>
<td>N/A</td>
</tr>
<tr>
<td><strong>Inexact Result</strong></td>
<td>XM = 1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM = 0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Notes: N/A = Not applicable; subtraction cannot underflow for these conditions. (NW) = Result not written; contents of destination register unchanged. (1) = Zero has sign of intermediate result.

### C.2.3 Multiplication (FMUL, DMUL, FDMUL)

<table>
<thead>
<tr>
<th>SRCA</th>
<th>SNaN</th>
<th>QNaN</th>
<th>SRCB</th>
<th>=&gt;</th>
<th>0</th>
<th>FNum</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td></td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
</tr>
<tr>
<td></td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
<td>N,R</td>
</tr>
<tr>
<td>$+\infty$</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>$+\infty$</td>
<td>$-\infty$</td>
<td>AQNaN</td>
<td>$\pm\infty$</td>
</tr>
<tr>
<td></td>
<td>N,R</td>
<td>R</td>
<td>none</td>
<td>none</td>
<td>N,R</td>
<td>none</td>
</tr>
<tr>
<td>$-\infty$</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>$-\infty$</td>
<td>$+\infty$</td>
<td>AQNaN</td>
<td>$-\infty$</td>
</tr>
<tr>
<td></td>
<td>N,R</td>
<td>R</td>
<td>none</td>
<td>none</td>
<td>N,R</td>
<td>none</td>
</tr>
<tr>
<td>0</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>$\pm0 (1)$</td>
<td>$\pm0 (1)$</td>
</tr>
<tr>
<td></td>
<td>N,R</td>
<td>R</td>
<td>N,R</td>
<td>N,R</td>
<td>none</td>
<td>none</td>
</tr>
<tr>
<td>FNum</td>
<td>AQNaN</td>
<td>AQNaN</td>
<td>$\pm\infty (1)$</td>
<td>$\pm\infty (1)$</td>
<td>$\pm0 (1)$</td>
<td>none</td>
</tr>
<tr>
<td></td>
<td>N,R</td>
<td>R</td>
<td>none</td>
<td>none</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Note: (1) Result sign is the XOR of sign(SRCA) and sign(SRCB).

### Output Exceptions: FMUL, DMUL

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>VM = 1</td>
<td>sign +</td>
<td>FRM =, +∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>FRM , -∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>FRM =, -∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>FRM , -∞</td>
</tr>
<tr>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
<td>V</td>
</tr>
<tr>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
<td>X</td>
</tr>
<tr>
<td>Underflow</td>
<td>UM = 1</td>
<td>FF = 1</td>
<td>±0 (1)</td>
</tr>
<tr>
<td></td>
<td>FF = 0</td>
<td>RResult</td>
<td>U,X</td>
</tr>
<tr>
<td>UM = 0</td>
<td>FF = 1</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>FF = 0</td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td>Inexact Result</td>
<td>XM = 1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM = 0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Notes: (NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.

The operation FDMUL produces no output exceptions.

### C.2.4 Division (FDIV, DDIV)

<table>
<thead>
<tr>
<th>SRCA (dividend)</th>
<th>SNaN</th>
<th>QNaN</th>
<th>SRCB (divisor)</th>
<th>0</th>
<th>FNum</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
<td>AQNaN R</td>
<td>AQNaN R</td>
<td>AQNaN R</td>
</tr>
<tr>
<td></td>
<td>AQNaN N,R</td>
<td>AQNaN R</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
<td>AQNaN N,R</td>
</tr>
<tr>
<td></td>
<td>AQNaN N,R</td>
<td>AQNaN R</td>
<td>±0 (1) none</td>
<td>±0 (1) none</td>
<td></td>
</tr>
<tr>
<td></td>
<td>AQNaN N,R</td>
<td>AQNaN R</td>
<td>±0 (1) none</td>
<td>±0 (1) none</td>
<td></td>
</tr>
<tr>
<td></td>
<td>AQNaN N,R</td>
<td>AQNaN R</td>
<td>±0 (1) none</td>
<td>±0 (1) none</td>
<td></td>
</tr>
</tbody>
</table>

Note: (1) Result sign is the XOR of sign(SRCA) and sign(SRCB).

### Output Exceptions: FDIV, DDIV

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Overflow</strong></td>
<td>VM = 1</td>
<td>sign +</td>
<td>FRM = +∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td>sign -</td>
<td>FRM = -∞</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td><strong>Underflow</strong></td>
<td>UM = 1</td>
<td>FF = 1</td>
<td>±0 (1)</td>
</tr>
<tr>
<td></td>
<td>UM = 1</td>
<td>FF = 0</td>
<td>RResult</td>
</tr>
<tr>
<td></td>
<td>UM = 0</td>
<td>FF = 1</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>UM = 0</td>
<td>FF = 0</td>
<td>Exact Result</td>
</tr>
<tr>
<td></td>
<td>UM = 0</td>
<td>FF = 0</td>
<td>Inexact Result</td>
</tr>
<tr>
<td><strong>Inexact Result</strong></td>
<td>XM = 1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM = 0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Notes: (NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.
### C.2.5 Comparison (FEQ, DEQ, FGE, DGE, FGT, DGT)

Input Exceptions: FEQ, DEQ

<table>
<thead>
<tr>
<th>SRCA</th>
<th>SRCB</th>
<th>SNaN</th>
<th>QNaN</th>
<th>∞, FNum, 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>FALSE</td>
<td>FALSE</td>
<td>FALSE</td>
<td></td>
</tr>
<tr>
<td></td>
<td>N</td>
<td>N</td>
<td>N</td>
<td></td>
</tr>
<tr>
<td>QNaN</td>
<td>FALSE</td>
<td>FALSE</td>
<td>FALSE</td>
<td></td>
</tr>
<tr>
<td></td>
<td>N</td>
<td>none</td>
<td>none</td>
<td></td>
</tr>
<tr>
<td>∞, FNum, 0</td>
<td>FALSE</td>
<td>FALSE</td>
<td>none</td>
<td></td>
</tr>
<tr>
<td></td>
<td>N</td>
<td>N</td>
<td>N</td>
<td></td>
</tr>
</tbody>
</table>

Input Exceptions: FGE, DGE, FGT, DGT

<table>
<thead>
<tr>
<th>SRCA</th>
<th>SRCB</th>
<th>SNaN</th>
<th>QNaN</th>
<th>∞, FNum, 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>FALSE</td>
<td>FALSE</td>
<td>FALSE</td>
<td></td>
</tr>
<tr>
<td></td>
<td>N</td>
<td>N</td>
<td>N</td>
<td></td>
</tr>
<tr>
<td>QNaN</td>
<td>FALSE</td>
<td>FALSE</td>
<td>FALSE</td>
<td></td>
</tr>
<tr>
<td></td>
<td>N</td>
<td>none</td>
<td>none</td>
<td></td>
</tr>
<tr>
<td>∞, FNum, 0</td>
<td>FALSE</td>
<td>FALSE</td>
<td>none</td>
<td></td>
</tr>
<tr>
<td></td>
<td>N</td>
<td>N</td>
<td>N</td>
<td></td>
</tr>
</tbody>
</table>

Floating-point comparison operations produce no output exceptions.

### C.2.6 Multiply-Accumulate (FMAC, DMAC), Multiply-Sum (FMSM, DMSM)

Input Exceptions: FMAC, DMAC, FMSM, DMSM

<table>
<thead>
<tr>
<th>OP1</th>
<th>OP2</th>
<th>OP3</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>one or more operands is an SNaN</td>
<td>AQNaN N,R</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>no SNaNs, one or more operands is a QNaN</td>
<td>AQNaN N,R</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(∞ * 0)</td>
<td>AQNaN N,R</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(+NonZ * +∞) or (−NonZ * −∞)</td>
<td>−∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(−NonZ * +∞) or (+NonZ * −∞)</td>
<td>+∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(+NonZ * +∞) or (−NonZ * −∞)</td>
<td>+∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(−NonZ * +∞) or (+NonZ * −∞)</td>
<td>−∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(+NonZ * +∞) or (−NonZ * −∞)</td>
<td>FNum</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>(−NonZ * +∞) or (+NonZ * −∞)</td>
<td>FNum</td>
</tr>
<tr>
<td></td>
<td>FNum, 0</td>
<td>FNum, 0</td>
<td>+∞</td>
<td>none</td>
</tr>
<tr>
<td>FNum, 0</td>
<td>FNum, 0</td>
<td>+∞</td>
<td>none</td>
<td></td>
</tr>
</tbody>
</table>

Notes: X = don’t care.

OP1 and OP2 can be interchanged, i.e., (A * B) produces results identical to (B * A).

### Output Exceptions: FMAC, DMAC, FMSM, DMSM

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>sign +</td>
<td>FRM = +∞, ∞</td>
<td>V,X</td>
</tr>
<tr>
<td></td>
<td>sign -</td>
<td>FRM = −∞, −∞</td>
<td>V,X</td>
</tr>
<tr>
<td>Underflow (2)</td>
<td></td>
<td>FRM 0, −∞</td>
<td>±0 (1)</td>
</tr>
</tbody>
</table>

Notes: 1. Zero has sign of intermediate result.
2. The underflow criterion for these operations is the same as that for fast-float mode; an operation result underflows if a non-zero intermediate result is too small to be represented as a normalized number.

Multiply-accumulate-based operations (FMAC/DMAC and FMSM/DMSM) do not support gradual underflow. Denormalized input operands are converted to a zero of the same sign, and underflowed results are converted to a zero having the sign of the intermediate result.
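The input-side rule above, under which a denormalized operand becomes a zero of the same sign, can be sketched on single-precision bit patterns as follows. This illustrates the rule only, assuming the IEEE 754 single-precision layout; it does not model the double-precision path or the conversion of underflowed results, and the function names are hypothetical.

```python
import struct


def sp_bits(x):
    """Return the 32-bit pattern of `x` rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def flush_denorm_sp(bits):
    """Replace a single-precision denorm with a zero of the same sign.

    A biased exponent of 0 with a non-zero fraction marks a denorm; only
    its sign bit is kept. All other patterns are returned unchanged.
    """
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        return bits & 0x80000000
    return bits
```

A negative denorm such as 80400000 hexadecimal becomes 80000000 (-0), while normalized values and zeros pass through untouched.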

For multiply-accumulate-based operations, the contents of special registers IPA, IPB, IPC, and the Exception Opcode Register may not reflect the operands and opcode of the faulting instruction after a Floating-Point Exception trap is taken.

### C.2.7 Square Root (SQRT)

<table>
<thead>
<tr>
<th>Input Exceptions: SQRT</th>
</tr>
</thead>
<tbody>
<tr>
<td>SRCA:</td>
</tr>
<tr>
<td>SNaN</td>
</tr>
<tr>
<td>QNaN</td>
</tr>
<tr>
<td>+∞</td>
</tr>
<tr>
<td>−FNum, −∞</td>
</tr>
<tr>
<td>+0</td>
</tr>
<tr>
<td>-0</td>
</tr>
</tbody>
</table>

### Output Exceptions: SQRT

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inexact Result</td>
<td>XM=1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM=0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Note: (NW) = Result not written; contents of destination register unchanged.

### C.2.8 Floating-Point-to-Floating-Point Conversions (CONVERT)

### Input Exceptions: CONVERT, f.p. → f.p.

<table>
<thead>
<tr>
<th>SRCA</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>AQNaN</td>
<td>N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>AQNaN</td>
<td>R</td>
</tr>
<tr>
<td>∞</td>
<td>±∞ (1)</td>
<td>none</td>
</tr>
<tr>
<td>0</td>
<td>±0 (1)</td>
<td>none</td>
</tr>
</tbody>
</table>

Note: (1) = Result has sign of operand.

### Output Exceptions: CONVERT, f.p. → f.p.

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>VM = 1</td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>sign +</td>
<td>FRM = +∞</td>
<td>+∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FRM 0, -∞</td>
<td>+Max</td>
</tr>
<tr>
<td></td>
<td>sign -</td>
<td>FRM = -∞</td>
<td>-∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FRM 0, +∞</td>
<td>-Max</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td>Underflow</td>
<td>UM = 1</td>
<td>FF = 1</td>
<td>±0 (1)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FF = 0</td>
<td>RResult</td>
</tr>
<tr>
<td></td>
<td>UM = 0</td>
<td>FF = 1</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FF = 0</td>
<td>Exact Result</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td>Inexact Result</td>
<td>XM = 1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM = 0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Notes: (NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.

### C.2.9 Integer-to-Floating-Point Conversions (CONVERT)

Input Exceptions: Integer-to-floating-point conversions produce no input exceptions.

### Output Exceptions: CONVERT, Integer → f.p.

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inexact Result</td>
<td>XM=1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM=0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Note: (NW) = Result not written; contents of designation register unchanged.

### C.2.10 Floating-Point-to-Integer Conversions (CONVERT)

### Input Exceptions: CONVERT, f.p. → signed integer

<table>
<thead>
<tr>
<th>SRCA</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>0</td>
<td>N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>0</td>
<td>N,R</td>
</tr>
<tr>
<td>+∞</td>
<td>IMaxPos</td>
<td>N</td>
</tr>
<tr>
<td>−∞</td>
<td>IMaxNeg</td>
<td>N</td>
</tr>
<tr>
<td>+0</td>
<td>0</td>
<td>none</td>
</tr>
<tr>
<td>−0</td>
<td>0</td>
<td>none</td>
</tr>
</tbody>
</table>

### Input Exceptions: CONVERT, f.p. → unsigned integer

<table>
<thead>
<tr>
<th>SRCA</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
<td>0</td>
<td>N,R</td>
</tr>
<tr>
<td>QNaN</td>
<td>0</td>
<td>N,R</td>
</tr>
<tr>
<td>+∞</td>
<td>UIMax</td>
<td>N</td>
</tr>
<tr>
<td>−∞</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>+0</td>
<td>0</td>
<td>none</td>
</tr>
<tr>
<td>−0</td>
<td>0</td>
<td>none</td>
</tr>
<tr>
<td>−FNum</td>
<td>0</td>
<td>N</td>
</tr>
</tbody>
</table>

### Output Exceptions: CONVERT, f.p. → signed integer

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>VM = 1</td>
<td>sign +</td>
<td>IMaxPos</td>
</tr>
<tr>
<td></td>
<td></td>
<td>sign -</td>
<td>IMaxNeg</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td>Inexact Result</td>
<td>XM = 1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM = 0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Note: (NW) = Result not written; contents of destination register unchanged.
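The input and output tables for the signed conversion can be read as one decision procedure. The Python sketch below follows that reading with the exceptions masked (VM = 1, XM = 1), so it always returns a result instead of trapping. The exception names, the rounding-mode labels, and the values used for IMaxPos and IMaxNeg (the 32-bit signed limits) are assumptions for illustration.

```python
import math

IMAX_POS = 0x7FFFFFFF   # assumed value of IMaxPos: largest 32-bit signed integer
IMAX_NEG = -0x80000000  # assumed value of IMaxNeg: most negative 32-bit signed integer

# Hypothetical rounding-mode labels mapped to Python rounding functions.
ROUNDERS = {
    "nearest": round,   # round() on a float rounds ties to even
    "zero": math.trunc,
    "+inf": math.ceil,
    "-inf": math.floor,
}


def f_to_i32(x: float, mode: str = "nearest") -> tuple[int, set[str]]:
    """Masked f.p.-to-signed-integer conversion: return (result, exceptions)."""
    if math.isnan(x):
        return 0, {"invalid"}                          # SNaN/QNaN rows: integer zero
    if math.isinf(x):
        return (IMAX_POS if x > 0 else IMAX_NEG), {"invalid"}
    r = ROUNDERS[mode](x)
    if r > IMAX_POS:
        return IMAX_POS, {"overflow"}                  # positive overflow saturates
    if r < IMAX_NEG:
        return IMAX_NEG, {"overflow"}                  # negative overflow saturates
    return r, ({"inexact"} if r != x else set())       # +0 and -0 give 0, no exception
```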

### Output Exceptions: CONVERT, f.p. → unsigned integer

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>VM = 1</td>
<td>UIMax</td>
<td>N</td>
</tr>
<tr>
<td></td>
<td>VM = 0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td>Inexact Result</td>
<td>XM = 1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM = 0</td>
<td>(NW)</td>
<td>X</td>
</tr>
</tbody>
</table>

Note: (NW) = Result not written; contents of destination register unchanged.
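A similar sketch for the unsigned conversion, again assuming masked exceptions and round-to-nearest. It treats every negative non-zero finite operand as −FNum, which is an assumption about how that row applies to small negative values that would otherwise round to zero; UIMax is assumed to be ffffffff, hexadecimal, and the exception names are hypothetical.

```python
import math

UIMAX = 0xFFFFFFFF  # assumed value of UIMax: largest 32-bit unsigned integer


def f_to_u32(x: float) -> tuple[int, set[str]]:
    """Masked f.p.-to-unsigned-integer conversion: return (result, exceptions)."""
    if math.isnan(x):
        return 0, {"invalid"}                  # SNaN/QNaN rows
    if x == 0.0:
        return 0, set()                        # +0 and -0 convert without exception
    if x < 0.0:
        return 0, {"invalid"}                  # -infinity and -FNum rows
    if math.isinf(x):
        return UIMAX, {"invalid"}              # +infinity row
    r = round(x)                               # round to nearest, ties to even
    if r > UIMAX:
        return UIMAX, {"overflow"}             # Overflow row with VM = 1
    return r, ({"inexact"} if r != x else set())
```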

### C.2.11 Move From Accumulator (MFACC)

### Input Exceptions: MFACC

<table>
<thead>
<tr>
<th>Source Register (SRCA)</th>
<th>NaN</th>
<th>A NaN</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inf</td>
<td>Inf</td>
<td>±Inf</td>
</tr>
<tr>
<td>0</td>
<td>±0</td>
<td>±0</td>
</tr>
</tbody>
</table>

Note: (1) = Result has sign of operand.

### Output Exceptions: MFACC

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>VM=1</td>
<td>FRM =+∞, +∞</td>
<td>V,X</td>
</tr>
<tr>
<td></td>
<td>VM=0</td>
<td>FRM 0, +∞</td>
<td>V,X</td>
</tr>
<tr>
<td></td>
<td>VM=0</td>
<td>FRM 0, -∞</td>
<td>±0 (1)</td>
</tr>
<tr>
<td>Underflow</td>
<td>UM=1</td>
<td>FF=1</td>
<td>U,X</td>
</tr>
<tr>
<td></td>
<td>UM=0</td>
<td>Exact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td></td>
<td>UM=0</td>
<td>Inexact Result</td>
<td>(NW)</td>
</tr>
<tr>
<td>Inexact Result</td>
<td>XM=1</td>
<td>RResult</td>
<td>X</td>
</tr>
<tr>
<td></td>
<td>XM=0</td>
<td>RResult</td>
<td>(NW)</td>
</tr>
</tbody>
</table>

Notes: (NW) = Result not written; contents of destination register unchanged.
(1) = Zero has sign of intermediate result.
### C.2.12 Move To Accumulator (MTACC)

### Input Exceptions: MTACC

<table>
<thead>
<tr>
<th>SRCA:</th>
</tr>
</thead>
<tbody>
<tr>
<td>SNaN</td>
</tr>
<tr>
<td>QNaN</td>
</tr>
<tr>
<td>∞</td>
</tr>
<tr>
<td>0</td>
</tr>
</tbody>
</table>

### Output Exceptions: MTACC

<table>
<thead>
<tr>
<th>Exception</th>
<th>Conditions</th>
<th>Result</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overflow</td>
<td>sign +</td>
<td>FRM =, +∞</td>
<td>+∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FRM 0, −∞</td>
<td>+Max</td>
</tr>
<tr>
<td></td>
<td>sign −</td>
<td>FRM =, −∞</td>
<td>−∞</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FRM 0, +∞</td>
<td>−Max</td>
</tr>
<tr>
<td>Underflow (1)</td>
<td></td>
<td>±0 (2)</td>
<td>U,X</td>
</tr>
<tr>
<td>Inexact Result</td>
<td></td>
<td>RResult</td>
<td>X</td>
</tr>
</tbody>
</table>

**Notes:**
1. Underflow is detected only at the output of the operation; denormalized inputs are not flushed to zero. The output underflow detection criterion is the same as for fast float mode: an operation result underflows if a non-zero intermediate result is too small to be represented as a normalized number.
2. Result has sign of operand.
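The output underflow criterion above (a non-zero intermediate result too small to be represented as a normalized number), together with the flush to a signed zero shown in the tables, can be expressed as follows. This Python sketch assumes the intermediate result is already available as a Python float and uses the IEEE 754 binary32 and binary64 minimum normal magnitudes; the function names are hypothetical, and rounding of non-underflowing results is not modeled.

```python
import math

# Smallest positive normalized magnitudes of IEEE 754 binary32 and binary64.
MIN_NORMAL = {"single": 2.0 ** -126, "double": 2.0 ** -1022}


def underflows(intermediate: float, fmt: str = "single") -> bool:
    """True if a non-zero intermediate result is too small to be a normalized number."""
    return intermediate != 0.0 and abs(intermediate) < MIN_NORMAL[fmt]


def flush_to_zero(intermediate: float, fmt: str = "single") -> float:
    """Replace an underflowing result with a zero that keeps the intermediate's sign."""
    if underflows(intermediate, fmt):
        return math.copysign(0.0, intermediate)
    return intermediate
```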

### C.2.13 Classify (CLASS)

The CLASS operation does not produce exceptions.

### C.2.14 Integer Multiply (MULTIPLY, MULTIPLU, MULTM, MULTMU)

Integer multiplication operations MULTIPLY, MULTIPLU, MULTM, and MULTMU do not affect the ALU Status Register or the Floating-Point Status Register.

For the MULTIPLY and MULTIPLU instructions, overflow of the 32-bit result can be detected by trapping on overflow. When the Integer Multiply Overflow Mask bit is 0, the MULTIPLY instruction causes an Out of Range trap when it produces a signed result that exceeds 32 bits (a positive number larger than 7fffffff, hexadecimal, or a negative number smaller than the two's-complement value 80000000, hexadecimal). Similarly, the MULTIPLU instruction causes an Out of Range trap when it produces an unsigned result that exceeds 32 bits (a positive number greater than ffffffff, hexadecimal). The MULTM and MULTMU instructions cannot overflow, and are unaffected by the MO bit.
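The overflow conditions for MULTIPLY and MULTIPLU compare the full product against the 32-bit signed and unsigned ranges. A Python sketch, assuming the operands are passed as 32-bit register patterns, that the returned value is the low-order 32 bits of the product, and that the caller decides whether to trap from the Integer Multiply Overflow Mask bit; the function names mirror the mnemonics but are hypothetical:

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
UINT32_MAX = (1 << 32) - 1


def to_signed32(v: int) -> int:
    """Interpret the low 32 bits of v as a two's-complement integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def multiply(a: int, b: int) -> tuple[int, bool]:
    """Signed 32 x 32 multiply: (low 32 bits as signed, overflow?)."""
    product = to_signed32(a) * to_signed32(b)
    return to_signed32(product), not (INT32_MIN <= product <= INT32_MAX)


def multiplu(a: int, b: int) -> tuple[int, bool]:
    """Unsigned 32 x 32 multiply: (low 32 bits, overflow?)."""
    product = (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF)
    return product & 0xFFFFFFFF, product > UINT32_MAX
```

For example, 10000 × 8000 (hexadecimal) is 80000000, hexadecimal: this fits as an unsigned result but overflows the signed range, while −10000 × 8000 (hexadecimal) lands exactly on the most negative 32-bit value and does not overflow.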

### C.2.15 Integer Divide (DIVIDE, DIVIDU)

Integer division operations DIVIDE and DIVIDU do not affect the ALU Status Register or the Floating-Point Status Register. Each produces a quotient QUOT and remainder REM such that Euclid's Equation is always satisfied for non-exceptional cases, that is:

\[ \text{Dividend} = (\text{Divisor} \cdot \text{QUOT}) + \text{REM} \]

If QUOT is non-zero, its sign is the exclusive-OR of the signs of the dividend and divisor. If the infinitely-precise quotient cannot be expressed as an integer, it is truncated toward zero. That is, QUOT is the integer closest to and no greater in magnitude than the infinitely-precise result.

If REM is non-zero, it has the sign of the dividend.

DIVIDE and DIVIDU always take the Out of Range trap when the divisor is 0; QUOT and REM are undefined.

Overflow of the 32-bit quotient can be detected by trapping on overflow. When the Integer Divide Overflow Mask bit is 0, the DIVIDE instruction causes an Out of Range trap when it produces a signed quotient that exceeds 32 bits (a positive number larger than 7fffffff, hexadecimal, or a negative number smaller than the two's-complement value 80000000, hexadecimal). Similarly, the DIVIDU instruction causes an Out of Range trap when it produces an unsigned quotient that exceeds 32 bits (a positive number greater than ffffffff, hexadecimal). QUOT and REM are undefined for an overflowing integer divide, regardless of whether overflow trapping is enabled.

Note that this behavior is generated by the DIVIDE and DIVIDU instruction emulation software.
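The quotient and remainder rules above (truncation toward zero, a remainder with the sign of the dividend, and Euclid's Equation) can be checked with a short Python model. Python's own `//` operator rounds toward negative infinity, so the sketch divides the magnitudes first and then applies the sign. It assumes, as the overflow paragraph implies, that the dividend may be wider than 32 bits; `None` stands in for an undefined QUOT or REM, and the returned exception names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
UINT32_MAX = (1 << 32) - 1


def trunc_divmod(dividend: int, divisor: int) -> tuple[int, int]:
    """Quotient truncated toward zero; remainder takes the sign of the dividend."""
    quot = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quot = -quot                          # sign is the XOR of the operand signs
    return quot, dividend - divisor * quot    # Euclid's Equation determines REM


def divide(dividend: int, divisor: int, signed: bool = True):
    """Return (QUOT, REM, exception); exception is None when the divide succeeds."""
    if divisor == 0:
        return None, None, "divide-by-zero"   # Out of Range trap is always taken
    quot, rem = trunc_divmod(dividend, divisor)
    lo, hi = (INT32_MIN, INT32_MAX) if signed else (0, UINT32_MAX)
    if not lo <= quot <= hi:
        return None, None, "overflow"         # trap only if the mask bit is 0
    return quot, rem, None
```

For example, dividing −7 by 2 gives QUOT = −3 and REM = −1, and 2 × (−3) + (−1) = −7 as Euclid's Equation requires.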

## C.3 Traps

The following floating-point instructions take the Floating-Point Exception trap (vector number 0x16) upon producing an unmasked exception:

- CONVERT
- DADD
- DDIV
- DEQ
- DGE
- DGT
- DMUL
- DSUB
- FADD
- FDIV
- FDMUL
- FEQ
- FGE
- FGT
- FMUL
- FSUB
- MFACC
- SQRT

The instructions FMAC, DMAC, FMSM, DMSM, and MTACC do not take the Floating-Point Exception trap upon producing an unmasked exception.

The time at which a floating-point exception trap is taken depends on the type of the exception causing the trap. The Invalid Operation, Reserved Operand, and Divide by Zero exceptions cause a trap to be taken after the first cycle of the execute stage, since they can be determined at the beginning of an operation. The Overflow, Underflow, and Inexact Result exceptions cause a trap to be taken after the last cycle of the execute stage. This timing is characteristic of the Am29050 microprocessor hardware implementation; other 29K Family processors may exhibit different trap timing.

A Floating-Point Exception trap cannot be caused by writing to the Floating-Point Environment Register or the Floating-Point Status Register. For example, it is not possible to cause a floating-point exception trap by unmasking a currently set exception.

When the DA bit of the Current Processor Status Register is 1, any arithmetic exception that would otherwise produce a Floating-Point Exception trap or Out of Range trap will instead cause a Monitor trap. In all other respects, however, the processor behaves as described in Section 3.5.10.
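The trap rules in this section reduce to a small decision function. The Python sketch below assumes `instr` is one of the floating-point instructions named in this section, and that when DA is 1 the Monitor trap is taken at the same point in the execute stage as the Floating-Point Exception trap would have been; the exception labels and return values are hypothetical.

```python
# Instructions that never take the Floating-Point Exception trap (from the text above).
NO_FP_TRAP = frozenset({"FMAC", "DMAC", "FMSM", "DMSM", "MTACC"})

# Exceptions that can be determined at the beginning of an operation.
EARLY = frozenset({"invalid operation", "reserved operand", "divide by zero"})


def fp_trap(instr: str, exception: str, unmasked: bool, da: bool):
    """Return (trap, timing) for a floating-point exception, or None if no trap."""
    if not unmasked or instr in NO_FP_TRAP:
        return None
    trap = "Monitor" if da else "Floating-Point Exception"
    if exception in EARLY:
        timing = "after first execute cycle"
    else:  # overflow, underflow, inexact result
        timing = "after last execute cycle"
    return trap, timing
```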

INDEX

A (Absolute), 8-7
A(31–0) (Address Bus), 1-4, 5-1
Access privilege, 5-20
Access protocol, 2-17, 5-8
Access, burst-mode, 1-4
Access, simple, 2-18
Access, simultaneous, 5-19
Activation record, 7-1
Activation record mapping, 7-3, 7-7
ADD, 7-39
Addition, integer, 7-18
Address Bus (A(31–0)), 1-4, 5-1
Address Bus, coprocessor operations, 6-8
Address Bus, shared, 2-18
Address Space, Coprocessor, 2-10
Address Space, Input/Output, 2-10
Address Space, Instruction ROM, 2-10
Address Space, Instruction/Data, 2-10
Address Tag, 4-8, 4-9
Address transfer, 2-18
Address translation, 2-12, 4-22–4-23, 7-32
Address translation exceptions, 1-6
Address Unit, 2-15, 4-11, 4-14
Address, physical, 2-10
Address, virtual, 2-10
Addresses, pipelined, 1-4
Addressing, 2-10, 4-12
Addressing, indirect, 7-16
Addressing, register, 4-12
ADRF Latch, 4-14, 4-15
Alignment, 2-10
Alignment, Branch Target Cache memory, 4-9
Alignment, bytes, 7-27
ALU (Arithmetic/Logic Unit), 2-15, 4-12, 4-18, 8-4
ALU Status Register, 2-4, 8-4
Am29050 microprocessor, 1-2
Am29050 microprocessor features, 1-1
Am29050 microprocessor special features, 1-11
Applications, 7-16
Arbitration, 2-18, 5-6, 5-18
Arguments, incoming, 7-3, 7-9
Arguments, outgoing, 7-3, 7-9
Arithmetic operation, 8-4
Arithmetic/Logic Unit (ALU), 2-15, 4-18, 8-4
ASEQ, 7-26
ASNE, 7-17
Assembler syntax, 8-4
Assert compare, 7-17
Boolean, 7-22
Boolean FALSE, 7-22
Boolean TRUE, 7-22
Boundary crossings, 4-9
Branch displacement, relative, 7-26
Branch Target, 4-14
Branch Target Cache memory, 1-5, 2-14, 4-3, 4-5, 4-16, 7-33, 7-34
Branch Target Cache memory disable (CD), Configuration Reg., 4-7, 7-34
Branch Target Cache memory lookup process, 4-7–4-8
Branch, relative, 1-5, 2-5, 4-9, 7-38, 7-39
Branches, immediately adjacent, 7-39
BREQ (Bus Request), 5-1, 5-18
Burst, 5-11
Burst mode, 4-14, 5-11, 5-13, 5-14, 5-24
Burst mode access, 4-14, 5-8, 5-11, 5-14
Burst mode access protocol, 2-17
Burst mode cancellation, 5-16
Burst mode preemption, 5-16
Burst mode termination, 5-16
Bus Grant (BGRT), 5-1, 5-18
Bus Invalid (BINV), 5-1, 5-18
Bus Request (BREQ), 5-1, 5-18
Bus sharing, 5-19
Byte alignment, 7-27
C (Carry) ALU Status Reg., 8-1, 8-4
CA (Coprocessor Active), 6-4
Cache Block, 4-9
Cache Disable (CD), 4-7, 7-34
Cache replacement, random, 4-8
Cache tag, 4-6
CALL, 7-38
Call, large range, 7-25
Calls, operating system, 7-17
Carry (C), ALU Status Reg., 7-18, 8-4
CD (Cache Disable), 4-7, 7-34
CDA (Coprocessor Data Accept), 5-4, 6-6
CDA sequencing, 6-7
CE (Coprocessor Enable) Channel Control Reg., 6-2
CE/CNTL, 8-7
Channel, 2-17, 5-6
Channel Address (CHA), Channel Addr. Reg., 3-13, 7-34
Channel Control, 3-14, 7-34
Channel Data (CHD), Channel Data Reg., 3-13, 7-34
Character detection, 7-27
Character-string, 7-26, 7-27
Clock synchronization, 5-31
Clock, processor-generated, 5-31
Clock, system-generated, 5-32
Clocks, 2-19
CNTL(1-0) (CPU Control), 5-5, 5-22, 5-23, 5-24–5-25, 5-26, 5-30,
Compare Bytes (CPBYTE), 7-27
Compiler, optimizing, 1-8
Compiler's run-time stack, 1-4–1-5
Compilers, 1-8
Complementing a Boolean, 7-22
Configuration Register, 2-3, 4-7, 7-34
CONST, 7-25, 7-38, 7-39
Constant, 32-bit, 7-25
Constant, 8-bit, 2-5
CONSTH, 7-25
CONSTN, 7-25
Contents Valid (CV), Channel Control Reg., 7-35
Context switching, 2-11
Context switching, temporary, 2-11
Contexts, saving and restoring, 2-11
Coprocessor, 6-1
Coprocessor Active (CA), 6-4
Coprocessor attachment, 2-19
Coprocessor communication, 6-7
Coprocessor Data Accept (CDA), 5-4, 6-6
Coprocessor Enable (CE), Channel Control Reg., 6-2
Coprocessor exception, 6-3, 6-8
Coprocessor exception trap, 6-8
Coprocessor interrupts, 6-4
Coprocessor Load/Store, 6-2
Coprocessor operations, 6-1
Coprocessor Present (CP), Configuration Reg., 6-4
Coprocessor transfer, 5-3, 6-1, 6-2, 6-3, 6-5, 6-7
COUNT, 8-1
CP (Coprocessor Present) Configuration Reg., 6-4
CPBYTE (Compare Bytes), 7-27
CPNEQ, 7-38
CPU Control (CNTL(1-0)), 5-5, 5-22, 5-24, 5-26, 5-30,
CPU Status (STAT(2-0)), 5-4, 5-21, 5-22, 5-24, 5-25, 5-30
Current Processor Status, 2-2
Current Processor Status Register, 3-78
CV (Contents Valid) Channel Control Reg., 7-35
Cycle time, 1-2
D(31-0) (Data Bus), 1-4, 5-3
DA (Disable All Interrupts), 5-29
Daisy chain, 2-18
Data access, 5-7
Data access exception trap, 5-7–5-8
Data Access request, 5-6
Data accesses, external, 2-9
Data Address Transfer, 5-6
Data blocks, movement of large, 7-27
Data Burst Acknowledge (DBACK), 5-3, 5-8, 5-14, 5-15, 5-16
Data Burst Request (DBREQ), 5-3, 5-8, 5-14, 5-15
Data Bus (D(31-0)), 1-4, 5-2
Data dependencies, pipeline, 4-12
Data Error (DERR), 5-3, 5-7, 5-14
Data formats, 2-8
Data forwarding, 4-13
Data Ready (DRDY), 5-3, 5-10–5-11, 5-14
Data Request (DREQ), 5-3, 5-10, 5-16–5-17
Data Request Type (DREQT(1-0)), 5-3, 5-7, 6-5
Data transfer, 5-6
Data types, 2-9
Data-flow organization, 1-6–1-7
Data-unit numbering conventions, 2-9
DBACK (Data Burst Acknowledge), 5-3, 5-8, 5-15, 5-16
DBREQ (Data Burst Request), 5-3, 5-8, 5-14, 5-15
Decode PC Register, 4-14, 4-15
Decode stage, 4-2
Delay cycle, indirect addressing, 7-41
Delayed branch, 7-38–7-39
Delayed effects, registers, 7-41
Demand paging, 7-31–7-32
DERR (Data Error), 5-3, 5-7–5-8, 5-14
DEST, 8-1
DI (Disable Interrupts), 5-29, 6-5
DIVIDE, 7-16
Divide instructions, 7-19
DIVIDU, 7-16
Double-precision floating-point, 3-45
DRDY (Data Ready), 5-3, 5-10–5-11, 5-14
DREQ (Data Request), 5-10, 5-16–5-17
DREQT(1–0) (Data Request Type), 5-3, 5-7, 6-5
DTR, 4-13

EMULATE, 7-16
ETR, 4-13, 4-14
Exceptions, address translation, 1-6
Execute stage, 4-2
Executing mode, 2-16
Execution Unit, 2-15, 4-1, 4-12
External access, 7-34
External access protection, 7-28
External interrupts, 5-29
External traps, 5-29
EXTERNAL WORD[n], 8-1

FALSE, 8-1
FC (Funnel Shift Count) Funnel Shift Count Reg., 7-27, 8-1
Fetch-Ahead Adder, 4-10
Fetch-Ahead Adder overflow, 4-10
Fetch special instruction, 4-16
Fetch stage, 4-2
Field Shift Unit, 2-15, 4-19
FIFO, 4-3
Freeze (FZ), 4-11, 7-29, 8-4
Funnel-Shift Unit, 4-19
FZ (Freeze bit), 4-11, 7-30, 8-4

General-purpose registers, 1-4, 2-1, 2-2
Generator, register address, 4-13
Global registers, 4-12

Halt, 5-4
Halt mode, 2-17, 5-21, 5-22, 5-25
Hardware development system, 2-18, 5-22
Hardware testing, 5-28

0|16 (16-bit immediate data zero-extended to 32 bits), 8-1
1|16 (16-bit immediate data, ones-extended to 32 bits), 8-1
I(31–0) (Instruction Bus), 5-2, 8-7

I16 (16-bit immediate data), 8-2
IBACK (Instruction Burst Acknowledge), 5-2, 5-8, 5-12, 5-13, 5-15
IBREQ (Instruction Burst Request), 5-2, 5-8, 5-12, 5-13, 5-15
I-Bus, 4-15
IE (Interrupt Enable) Timer Reload Reg., 7-36
IERR (Instruction Error), 4-4, 5-2, 5-7, 5-12, 5-26
IFP (Instruction Fetch Pointer), 4-3
IFU, 4-16
IN (Interrupt) Timer Reload Reg., 7-36
In args, 7-2
\texttt{INCLK (Input Clock), 5-5, 5-31–5-33}
Indirect addressing, 7-16
Indirect addressing delay cycle, 7-17
Indirect pointers, 7-16, 7-17, 7-41
Initialization, timer facility, 7-37
Input Clock (INCLK), 5-5
Input/Output access, 5-3
Instruction access, 5-7
Instruction Access Exception, 4-4
Instruction Address Transfer, 5-6
Instruction boundary, 2-13
Instruction Burst Acknowledge (IBACK), 5-2, 5-8, 5-12, 5-13, 5-15
Instruction Burst Request (IBREQ), 5-2, 5-8, 5-12, 5-13, 5-15
Instruction Bus (I(31–0)), 1-4, 5-2
Instruction/Data memory, 5-7
Instruction/Data memory access, 5-7
Instruction description format, 8-9
Instruction Error (IERR), 4-4, 5-2, 5-7–5-8, 5-12, 5-26
Instruction Fetch Pointer (IFP), 4-3
Instruction Fetch Unit, 2-13, 4-2–4-3
Instruction fetch, external, 4-10
Instruction fetch-ahead, 4-10
Instruction-field uses, 8-8
Instruction formats, 8-6
Instruction overview, 2-5
Instruction Prefetch Buffer (IPB), 4-3, 4-16
Instruction prefetch stream, 4-3
Instruction Ready (IRDY), 4-4, 5-2, 5-7, 5-10, 5-12, 5-13, 5-16, 5-26
Instruction Register (IR), 5-25
Instruction Request (IREQ), 5-2, 5-10, 5-13, 5-16–5-17
Instruction Request Type (IREQT), 5-2
Instruction ROM, 5-7
Instruction set, 2-6
<table>
<tbody>
<tr>
<td>Instruction Transfer, 5-6</td>
<td>LOAD (Load), 7-40</td>
</tr>
<tr>
<td>Instruction, listing by operation code, 8-137</td>
<td>Load and Lock (LOADL), 5-20, 7-36</td>
</tr>
<tr>
<td>Instruction, special, 4-16</td>
<td>Load and Set (LOADSET), 7-35-7-36</td>
</tr>
<tr>
<td>Instructions, three address, 2-1</td>
<td>Load data, forwarding, 1-7</td>
</tr>
<tr>
<td>Integer addition, 7-18</td>
<td>Load Multiple (LOADM), 1-7, 4-14, 4-15, 5-21, 7-27</td>
</tr>
<tr>
<td>Integer division, 7-19</td>
<td>Load Test Instruction, 5-4, 5-21, 5-25</td>
</tr>
<tr>
<td>Integer multiplication, 7-18</td>
<td>Load Test Instruction mode, 2-17</td>
</tr>
<tr>
<td>Integer subtraction, 7-18</td>
<td>LOADL (Load and Lock), 5-20, 7-36</td>
</tr>
<tr>
<td>Interrupt (IN), Timer Reload Reg., 7-36</td>
<td>LOADM, 1-7, 4-14, 4-15, 5-21, 7-27</td>
</tr>
<tr>
<td>Interrupt handling, 7-29</td>
<td>Loads and Stores, 1-6</td>
</tr>
<tr>
<td>Interrupt or Trap, 3-62</td>
<td>Loads and Stores, overlapped, 7-39</td>
</tr>
<tr>
<td>Interrupt processing, user-defined, 2-11</td>
<td>LOADSET (Load and Set), 7-35-7-36</td>
</tr>
<tr>
<td>Interrupt Request (INTR(3–0)), 5-4, 5-29-5-30</td>
<td>Load/Store Instruction Format, 3-47</td>
</tr>
<tr>
<td>Interrupt return, 7-30, 8-84, 8-85</td>
<td>Local registers, 4-12, 7-4</td>
</tr>
<tr>
<td>Interrupt simulation, 7-30</td>
<td>Local registers, stack pointer, 2-1</td>
</tr>
<tr>
<td>Interrupts, 1-8, 2-11, 2-13, 5-20, 7-35</td>
<td>Lock (LK), 7-36</td>
</tr>
<tr>
<td>Interrupts, coprocessor, 6-4</td>
<td>Lock (LOCK), 5-1, 5-20</td>
</tr>
<tr>
<td>Interrupts, dynamically nested, 2-11, 7-30</td>
<td>Lock output, 5-20</td>
</tr>
<tr>
<td>Interrupts, external, 5-29</td>
<td>Logical operation, 8-5</td>
</tr>
<tr>
<td>INTR(3–0) (Interrupt Request), 5-4, 5-29</td>
<td>LRU (Least Recently Used Entry) LRU Rec. Reg., 7-32</td>
</tr>
<tr>
<td>INV, 4-5, 7-34</td>
<td>MULTIPLU, 7-17</td>
</tr>
<tr>
<td>IPA (Indirect Pointer A) Indirect Pointer A Reg., 4-13, 8-2</td>
<td>MULTIPLY, 7-17</td>
</tr>
<tr>
<td>IPB, 4-13, 5-12</td>
<td>MULTMU, 7-17</td>
</tr>
<tr>
<td>IPB (Indirect Pointer B) Indirect Pointer B Reg., 8-2</td>
<td>M (Immediate), 8-7</td>
</tr>
<tr>
<td>IPB (Instruction Prefetch Buffer), 4-3</td>
<td>Mapping activation record, 7-3, 7-7</td>
</tr>
<tr>
<td>IPB allocated state, 4-4</td>
<td>Master and slave switching, 5-33</td>
</tr>
<tr>
<td>IPB available state, 4-4</td>
<td>Master/slave operation, 2-19, 5-32</td>
</tr>
<tr>
<td>IPB error state, 4-4</td>
<td>Master/slave checking, 5-32</td>
</tr>
<tr>
<td>IPB state transitions, 4-4</td>
<td>Master/Slave Error (MSERR), 5-5, 5-32</td>
</tr>
<tr>
<td>IPB valid state, 4-4</td>
<td>Memory management, 1-7, 2-12, 7-30</td>
</tr>
<tr>
<td>IPC (Indirect Pointer C) Indirect Pointer C Reg., 4-13, 8-2</td>
<td>Memory Management Unit, 2-12, 2-15</td>
</tr>
<tr>
<td>IR (Instruction Register), 5-25, 5-26</td>
<td>Memory protection, 7-28, 7-31</td>
</tr>
<tr>
<td>IRDY (Instruction Ready), 4-4, 5-2, 5-7, 5-11, 5-12, 5-13, 5-16, 5-26</td>
<td>Memory, critical areas, 7-31</td>
</tr>
<tr>
<td>IREQ (Instruction Request), 5-2, 5-10-5-11, 5-16-5-17</td>
<td>Merge, byte-aligned, 7-27</td>
</tr>
<tr>
<td>IREQT (Instruction Request Type), 5-2</td>
<td>MIPs, 1-2</td>
</tr>
<tr>
<td>IRET, 7-35</td>
<td>MMU, 4-8-4-9, 4-15, 4-16, 4-22-4-23, 7-28, 7-31</td>
</tr>
<tr>
<td>IRETINV, 7-34, 7-35</td>
<td>MMU Configuration Register, 7-41</td>
</tr>
<tr>
<td>JMP, 7-39</td>
<td>MMU Programmable (MPGM(1-0)), 5-2, 5-6</td>
</tr>
<tr>
<td>Jump, large range, 7-25</td>
<td>Mode, Executing, 2-16</td>
</tr>
<tr>
<td>Large call range, 7-25</td>
<td>Mode, Halt, 2-17, 5-22</td>
</tr>
<tr>
<td>Large constants, 7-25</td>
<td>Mode, Pipeline Hold, 2-16, 4-23</td>
</tr>
<tr>
<td>Large data blocks, movement, 7-27</td>
<td>Mode, Step, 2-17, 5-24</td>
</tr>
<tr>
<td>Large jump range, 7-25</td>
<td>Mode, Wait, 2-16, 3-59</td>
</tr>
<tr>
<td>Least Recently Used Entry (LRU), LRU Rec. Reg., 7-32</td>
<td>Monitoring critical areas, 7-32</td>
</tr>
<tr>
<td>LK (Lock), 7-35</td>
<td>Move To Special Register (MTSR), 5-26, 7-16, 8-4</td>
</tr>
<tr>
<td></td>
<td>MPGM(1-0) (MMU Programmable), 5-2, 5-6</td>
</tr>
<tr>
<td></td>
<td>MSERR (Master/Slave Error), 5-5, 5-32</td>
</tr>
</tbody>
</table>
MTSR (Move To Special Register), 5-26, 7-16, 8-4
Multi-precision, 7-18
Multi-processing, 7-35
Multiple masters, 2-18, 5-19
Multiple slaves, 5-19
Multiplication, integer, 7-18
N (Negative) ALU Status Reg., 8-4
NN (Not Needed) Channel Control Reg., 4-14, 7-35
NO-OP, 7-26, 7-38–7-39
Nomenclature, 8-1
Non-Coprocessor Load/Store Format, 3-47
Non-sequential fetch, 4-10
Non-sequential instruction fetch, 4-10, 5-4
Normal, 5-5
Not Needed (NN), Channel Control Reg., 4-14, 7-35
Notation, 8-1
Numbering conventions, data-unit, 2-9
Old Processor Status Register, 2-2, 3-78
OP (operation code), 8-7
Operating system calls, 7-17
Operation code (OP), 8-7
Operator symbols, 8-2
OPT(2–0) (Option Control), 5-3, 5-6
OPT (Option), 6-3
Option (OPT), 6-3
Option Control (OPT(2–0)), 5-3, 5-6
OR, 7-39
Organization, Branch Target Cache memory, 4-5
Organization, data flow, 1-5
Out args, 7-2
Out of range, 8-5
Out of Range trap, 8-5
OV (Overflow), 7-36
Overflow (OV), 7-36
Overflow (V), ALU Status Reg., 8-5
Overflow, signed, 8-5
Overflow, unsigned, 8-5
Overlapped loads, 1-6
Overlapped store, 1-6
Page change information, 7-31
Page fault, 7-34
Page reference, 7-31
Page size, virtual, 7-31
Paging, 7-31, 7-33
PC (Program Counter), 8-10, 8-2
PC Buffer, 4-10
PC Bus, 4-11
PC MUX, 4-11
PC2–PC0, 4-11, 4-12, 7-29
PC1 (Program Counter 1) Program Counter 1 Reg., 5-20
PDA (Pipelined Data Access), 5-3, 5-9
PEN (Pipeline Enable), 5-2, 5-9, 5-10
PIA (Pipelined Instruction Access), 5-2, 5-9, 5-10
PID (Process Identifier) MMU Configuration Reg., 3-18, 3-72, 3-77
Pipeline, 1-7, 2-13, 4-2
Pipeline data dependencies, 4-13
Pipeline dependency, 4-13
Pipeline Enable (PEN), 5-2
Pipeline features exposed, 7-1, 7-37
Pipeline Hold, 4-2, 4-14
Pipeline Hold mode, 4-16, 4-23, 5-4
Pipeline interlocks, 1-11
Pipelined access, 5-8, 5-9, 5-11
Pipelined access protocol, 2-18
Pipelined addresses, 1-4
Pipelined Data Access (PDA), 5-3, 5-9, 5-10
Pipelined Instruction Access (PIA), 5-2, 5-9, 5-10
Port A, 4-13
Port B, 4-13
Port C, 4-13
Prefetching, 1-5
Primary access, 5-9, 5-10
Prioritizer, 2-16, 4-19
Priority, 5-29
Process Identifier (PID), MMU Configuration Reg., 3-18, 3-72, 3-77
Processor, 5-9
Processor cancellation, 5-16
Processor-generated clock, 5-31
Processor modes, 2-16
Processor preemption, 5-16
Processor reset, 5-30
Processor termination, 5-16
Program Counter (PC), 4-10
Program Counter Unit, 2-15, 4-10
Programming, Coprocessor, 2-13
Protected segment, 2-2
Protection bits, supervisor mode, 7-28
Protection bits, TLB, 7-31
Protection bits, user mode, 7-28
Protection checking, 4-15
Protection Violation Trap, 7-17
Protection violation, TLB, 7-28
Protection, external access, 7-28
Protection, memory, 7-28

<table>
<thead>
<tr>
<th>Term</th>
<th>Page(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Syntax, assembler</td>
<td>8-4</td>
</tr>
<tr>
<td>SYSCLK (System Clock)</td>
<td>5-5, 5-28, 5-30-5-33</td>
</tr>
<tr>
<td>System diagram</td>
<td>1-3</td>
</tr>
<tr>
<td>System interface</td>
<td>2-17</td>
</tr>
<tr>
<td>System programming</td>
<td>7-28</td>
</tr>
<tr>
<td>System protection</td>
<td>7-28</td>
</tr>
<tr>
<td>System-generated clock</td>
<td>5-31</td>
</tr>
<tr>
<td>Taking Interrupt or Trap</td>
<td>5-4</td>
</tr>
<tr>
<td>TARGET</td>
<td>8-2</td>
</tr>
<tr>
<td>Target</td>
<td>4-8</td>
</tr>
<tr>
<td>Target instruction</td>
<td>4-6, 4-7</td>
</tr>
<tr>
<td>Task Identifier (TID), TLB Entry Word 0</td>
<td>2-12</td>
</tr>
<tr>
<td>Task identifiers</td>
<td>1-7</td>
</tr>
<tr>
<td>TC (Transfer Control)</td>
<td>6-2</td>
</tr>
<tr>
<td>TCV (Timer Count Value) Timer Counter Reg.</td>
<td>7-36</td>
</tr>
<tr>
<td>TE (Trace Enable)</td>
<td>3-78</td>
</tr>
<tr>
<td>Terminology</td>
<td>8-3</td>
</tr>
<tr>
<td>TEST (Test mode)</td>
<td>2-17, 5-5, 5-28</td>
</tr>
<tr>
<td>Test/Development interface</td>
<td>2-18, 5-21</td>
</tr>
<tr>
<td>Timer Count Register</td>
<td>7-36</td>
</tr>
<tr>
<td>Timer Count Value (TCV) Timer Counter Reg.</td>
<td>7-36</td>
</tr>
<tr>
<td>Timer Counter Register</td>
<td>5-24</td>
</tr>
<tr>
<td>Timer Facility</td>
<td>2-13, 5-24, 7-36</td>
</tr>
<tr>
<td>Timer interrupts</td>
<td>7-36</td>
</tr>
<tr>
<td>Timer Reload Register</td>
<td>7-36</td>
</tr>
<tr>
<td>Timer Reload Value (TRV)</td>
<td>7-36</td>
</tr>
<tr>
<td>TLB (Translation Look-Aside Buffer)</td>
<td>1-7</td>
</tr>
<tr>
<td>TLB misses</td>
<td>5-12</td>
</tr>
<tr>
<td>TLB Miss handling</td>
<td>7-31</td>
</tr>
<tr>
<td>TLB[n]</td>
<td>8-2</td>
</tr>
<tr>
<td>TLB registers</td>
<td>2-5, 3-32</td>
</tr>
<tr>
<td>TLB reload</td>
<td>7-29, 7-31</td>
</tr>
<tr>
<td>TLB, second-level</td>
<td>7-32</td>
</tr>
<tr>
<td>TP (Trace Pending)</td>
<td>3-78</td>
</tr>
<tr>
<td>Trace Enable (TE)</td>
<td>3-78</td>
</tr>
<tr>
<td>Trace Facility</td>
<td>2-13</td>
</tr>
<tr>
<td>Trace Pending (TP)</td>
<td>3-78</td>
</tr>
<tr>
<td>Trace Trap</td>
<td>3-78</td>
</tr>
<tr>
<td>Transfer Control (TC)</td>
<td>6-2</td>
</tr>
<tr>
<td>Transfer, coprocessor</td>
<td>6-2, 6-5</td>
</tr>
<tr>
<td>Translation Look-Aside Buffer (TLB)</td>
<td>1-7</td>
</tr>
<tr>
<td>Translation, early address</td>
<td>1-7</td>
</tr>
<tr>
<td>Translation, instruction address</td>
<td>4-22-4-23</td>
</tr>
<tr>
<td>Translation, Load Multiple address</td>
<td>4-23</td>
</tr>
<tr>
<td>Translation, Store Multiple address</td>
<td>4-23</td>
</tr>
<tr>
<td>Translation, virtual to physical</td>
<td>1-7</td>
</tr>
<tr>
<td>Trap Request (TRAP(1-0))</td>
<td>5-4, 5-29</td>
</tr>
<tr>
<td>TRAP(1-0) (Trap Request)</td>
<td>5-4, 5-29</td>
</tr>
<tr>
<td>Traps</td>
<td>1-8, 2-11, 2-13, 5-20, 7-28, 7-35</td>
</tr>
<tr>
<td>Traps, external</td>
<td>5-29</td>
</tr>
<tr>
<td>TRUE</td>
<td>8-2</td>
</tr>
<tr>
<td>TRV (Timer Reload Value) Timer Reload Reg.</td>
<td>7-36</td>
</tr>
<tr>
<td>TWIN</td>
<td>8-2</td>
</tr>
<tr>
<td>UA (User Access)</td>
<td>6-3</td>
</tr>
<tr>
<td>Underflow, signed</td>
<td>8-3</td>
</tr>
<tr>
<td>Underflow, unsigned</td>
<td>8-3</td>
</tr>
<tr>
<td>User Access (UA)</td>
<td>6-3</td>
</tr>
<tr>
<td>User-defined</td>
<td>5-6</td>
</tr>
<tr>
<td>V (Overflow) ALU Status Reg.</td>
<td>8-4</td>
</tr>
<tr>
<td>Valid bits, Branch Target Cache memory</td>
<td>4-5, 4-8</td>
</tr>
<tr>
<td>Valid instructions in Cache</td>
<td>4-9</td>
</tr>
<tr>
<td>Valid transitions</td>
<td>5-23</td>
</tr>
<tr>
<td>VE (Valid Entry) TLB Entry Word 0</td>
<td>4-5</td>
</tr>
<tr>
<td>Vector Area</td>
<td>1-8, 2-11, 7-29</td>
</tr>
<tr>
<td>Vector Area Base address</td>
<td>2-2</td>
</tr>
<tr>
<td>Vector number</td>
<td>7-17</td>
</tr>
<tr>
<td>Vectors, table of</td>
<td>2-12, 3-59</td>
</tr>
<tr>
<td>Virtual-page boundary</td>
<td>5-17</td>
</tr>
<tr>
<td>Virtual-page size</td>
<td>7-30</td>
</tr>
<tr>
<td>Virtual to physical address translation</td>
<td>1-7, 3-72</td>
</tr>
<tr>
<td>VN</td>
<td>8-3, 8-7</td>
</tr>
<tr>
<td>Wait mode</td>
<td>2-16</td>
</tr>
<tr>
<td>Warm start</td>
<td>7-32</td>
</tr>
<tr>
<td>Warn (WARN)</td>
<td>4-24, 5-4, 5-30, 5-31</td>
</tr>
<tr>
<td>Z (Zero) ALU Status Reg.</td>
<td>8-4</td>
</tr>
<tr>
<td>Zero (Z)</td>
<td>8-4</td>
</tr>
</tbody>
</table>
Advanced Micro Devices reserves the right to make changes in its products without notice in order to improve design or performance characteristics. The performance characteristics listed in this document are guaranteed by specific tests, guard banding, design and other practices common to the industry. For specific testing details, contact your local AMD sales representative. The company assumes no responsibility for the use of any circuits described herein.

North American

ALABAMA ........................................... (205) 882-9122
ARIZONA ........................................... (602) 244-4400
CALIFORNIA ...................................... (213) 645-1524

Newport Beach ................................ (714) 752-6282
Sacramento (Roseville) ........................ (916) 786-6700
San Diego ........................................ (619) 560-7030
San Jose ......................................... (408) 452-4500
Woodland Hills ................................ (818) 992-4155

CANADA, Ontario, Kanata .......... (613) 592-0060
Willowdale ..................................... (416) 224-5193

COLORADO ..................................... (303) 741-2900
CONNECTICUT ............................... (203) 264-7600

FLORIDA, Clearwater ....................... (813) 530-9971

Ft. Lauderdale ................................ (305) 776-2001

Orlando (Longwood) ......................... (407) 862-9203

GEORGIA ........................................ (404) 449-7920

ILLINOIS, Chicago (Itasca) ................ (708) 773-4422

Naperville ..................................... (708) 505-5117

KANSAS ......................................... (913) 451-3115

MARYLAND ..................................... (301) 381-3790

MASSACHUSETTS ............................... (617) 273-3970

MINNESOTA ..................................... (612) 938-0001

NEW JERSEY .....................................

Cherry Hill .................................... (609) 682-2900

 Parsippany .................................... (201) 299-0002

 NEW YORK, Liverpool ....................... (315) 457-5400

 Brewster ....................................... (914) 279-8323

 Rochester ..................................... (716) 272-9020

 NORTH CAROLINA ...............................

 Harrisburg .................................... (704) 455-1010

 Raleigh ........................................ (919) 878-8111

 OHIO, Columbus (Westerville) ............. (614) 891-6455

 Dayton ......................................... (513) 439-0268

PENNSYLVANIA ................................ (215) 396-8006

 SOUTH CAROLINA ............................. (803) 772-6760

 TEXAS .......................................... (512) 346-7830

 Austin .......................................... (512) 934-9099

 Dallas .......................................... (214) 934-9099

 Houston ........................................ (713) 765-9001

 UTAH ........................................... (801) 264-2900

 International

 BELGIUM, Bruxelles ......................... (02) 771-91-42

 FAX ............................................. (02) 762-37-12

 TLX ............................................. 846-61028

 FRANCE, Paris ................................. (1) 49-75-10-10

 FAX ............................................. (1) 49-75-10-13

 WEST GERMANY, Hannover area .......... (0511) 736085

 FAX ............................................. (0511) 721254

 München ........................................ (089) 1411-0

 México .......................................... (01) 606490

 TLX ............................................. 523883

 Stuttgart ..................................... (0711) 62 33 77

 FAX ............................................. (0711) 621517

 TLX ............................................. 721882

 HONG KONG, Wanchai ....................... 852-8654525

 FAX ............................................. 852-8654335

 ITALY, Milan .................................. (02) 3390541

 FAX ............................................. (02) 3533241

 TLX ............................................. 3498000

 JAPAN, Tokyo ................................ (03) 346-7550

 FAX ............................................. 462-29-8458

 KANAGAWA, Tokyo ......................... (03) 462-29-8458

 FAX ............................................. 462-29-8458

 KOREA, Seoul .................... (822) 784-0030

 LATIN AMERICA, Ft. Lauderdale ....... (305) 484-8600

 FAX ............................................. 485-9736

 NORWAY, Hovik ......................... (47) 509-05-51

 TWX: 623377

 SINGAPORE, TEL 65-3481188

 FAX ............................................. 65-3480161

 SWEDEN, Stockholm .................. (08) 733 03 50

 FAX ............................................. 733 02 85

 TAIWAN, TEL 866-2-723393

 FAX ............................................. 866-2-723342

 UNITED KINGDOM, Manchester area .. (0925) 826008

 (Warrington) .................................. (0925) 827693

 TWX: 5109554261 AMDFTL

 U.S.A: (305) 484-8600

 North American Representatives

 CANADA, Burnaby, B.C. - DAVETEK MARKETING (604) 430-3680

CANADA, Ottawa - DAVETEK MARKETING (613) 249-1884

 Kanata, Ontario - VITEL ELECTRONICS (613) 592-0060

 Mississauga, Ontario - VITEL ELECTRONICS (416) 676-9720

 Lachine, Quebec - VITEL ELECTRONICS (514) 636-9591

IDAHO, INTERMOUNTAIN TECH MKTG, INC (208) 888-6071

 ILLINOIS, HEARTLAND TECH MKTG, INC (312) 577-9222

 INDIANA, Huntington - ELECTRONIC MARKETING CONSULTANTS, INC (317) 921-3450

 Indianapolis - ELECTRONIC MARKETING CONSULTANTS, INC (317) 921-3450

 IOWA, LORENZ SALES ........................ (319) 377-4666

 KANSAS, Merriam - LORENZ SALES (913) 469-1312

 KENTUCKY, ELECTRONIC MARKETING CONSULTANTS, INC (317) 921-3452

MICHIGAN, Brighton - MIKE RAICK ASSOCIATES (313) 644-5040

 Holland - COM-TEK SALES, INC (616) 392-7100

 Novi - COM-TEK SALES, INC (313) 344-1409

MINNESOTA, MEL FOSTER TECH. SALES, INC (612) 914-9790

 MISSOURI, LORENZ SALES .................. (314) 966-6587

 NEBRASKA, LORENZ SALES .................. (402) 745-4660

 NEW MEXICO, THORSON DESERT STATES (505) 293-8555

 NEW YORK, East Syracuse - NYCOM, INC (315) 437-8343

 Woodbury - COMPONENT CONSULTANTS, INC (516) 364-8020

OHIO, Centerville - DOLFUSS ROOT & CO (513) 433-6776

 Columbus - DOLFUSS ROOT & CO (614) 885-4844

 Strongsville - DOLFUSS ROOT & CO (216) 899-9370

 OREGON, ELECTRA TECHNICAL SALES, INC (503) 643-5074

 PENNSYLVANIA, RUSSELL F. CLARK CO., INC (412) 242-9500

 PUERTO RICO, COMP REP ASSOC, INC (809) 746-6550

 UTAH, RP MARKETING ......................... (801) 589-0631

 WASHINGTON, ELECTRA TECHNICAL SALES, INC (206) 821-7442

 WISCONSIN, HEARTLAND TECH MKTG, INC (414) 792-0920

 "sales offices as of 12/88"