Slide 94 of 97
Notes:
We can see that this is a fully optimized piece of code. The tools have eliminated all of the NOP instructions, and we have a single-cycle loop that executes six instructions.
We also notice a .M1x. What this shows is that we can multiply data from the B register file by data from the A register file. The x indicates a cross path. So here I'm multiplying a B register by an A register and putting the result in an A register. In fact, we can have up to two cross paths: one from B to A and one from A to B.
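To make the cross path concrete, here is a toy model in C. It is only a sketch and not anything the tools generate: the `RegFiles` struct, the 16-registers-per-file size, and the function names `mpy_m1x` and `mpy_m2x` are all hypothetical, and the multiply is assumed to be a signed multiply of the low 16 bits of each operand.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 16  /* assumed registers per file, for illustration only */

/* Hypothetical model of the two register files, A and B. */
typedef struct {
    int32_t a[NUM_REGS];
    int32_t b[NUM_REGS];
} RegFiles;

/* Sign-extended low 16 bits of a register value. */
static int16_t low16(int32_t v) {
    return (int16_t)(uint16_t)((uint32_t)v & 0xFFFFu);
}

/* Models a multiply on .M1 with the cross path (.M1x):
   one operand comes from the A file, the other crosses over from
   the B file, and the result is written back to the A file. */
static void mpy_m1x(RegFiles *rf, int src_a, int src_b, int dst_a) {
    assert(src_a >= 0 && src_a < NUM_REGS);
    assert(src_b >= 0 && src_b < NUM_REGS);
    assert(dst_a >= 0 && dst_a < NUM_REGS);
    rf->a[dst_a] = (int32_t)low16(rf->a[src_a]) * (int32_t)low16(rf->b[src_b]);
}

/* Models the mirror case on .M2 (.M2x): one operand crosses over
   from the A file and the result is written to the B file. */
static void mpy_m2x(RegFiles *rf, int src_b, int src_a, int dst_b) {
    assert(src_a >= 0 && src_a < NUM_REGS);
    assert(src_b >= 0 && src_b < NUM_REGS);
    assert(dst_b >= 0 && dst_b < NUM_REGS);
    rf->b[dst_b] = (int32_t)low16(rf->b[src_b]) * (int32_t)low16(rf->a[src_a]);
}
```

In this model, calling `mpy_m1x(&rf, 1, 2, 3)` reads A1 and B2 and writes the product to A3, which is the B-to-A cross path described above; `mpy_m2x` uses the other cross path, from A to B.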