TMS320C4x Features
The TMS320C4x devices are 32-bit floating-point digital signal processors (DSPs) optimized for parallel processing. The 'C4x family combines a high-performance CPU and DMA controller with up to six communication ports to meet the needs of multiprocessor and I/O-intensive applications. All 'C4x devices are compatible with TI's multi-chip development environment. Each device contains an on-chip analysis module that supports hardware breakpoints for parallel-processing development and debugging. The 'C4x family is source-code compatible with the TMS320C3x family of floating-point DSPs.
'C4x Key Specifications
- Performance up to 80 MFLOPS
- Increased computational power
- Increased communications power
- Multi-processing capability
- Scalability
- Fault tolerance
'C4x Key Applications
- High-speed communications
- Virtual reality, simulators
- Radar/sonar/image processing
- 3-D graphics
- Robotics/numeric control
- Speech recognition
- Telecom infrastructure
TMS320C4x CPU
The TMS320C4x has an independent multiplier and accumulator, and
achieves up to 80 MFLOPS. Results are stored in any one of 12
extended-precision registers. These are 40-bit registers that
store values with a 32-bit mantissa and an 8-bit exponent. These
registers can serve as both the source and the destination for
any arithmetic operation. The extended-precision registers are
a valuable resource when programming in assembly or
C. They let you hold intermediate results without storing
them in memory, which yields higher-performance assembly code
and lets the C compiler generate more efficient code.
To sustain 80 MFLOPS, the CPU has two independent auxiliary register
arithmetic units (ARAUs), which can generate two addresses in
a single cycle. The two ARAUs operate in parallel with the multiplier
and ALU. They support addressing with displacements, addressing
with index registers (IR0 and IR1), circular addressing, and bit-reversed
addressing.
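The two less familiar addressing modes above can be modeled in portable C. This is a minimal sketch of what the ARAUs compute, not 'C4x code: the helper names circ_next and bit_reverse are hypothetical, and the modulo and loop stand in for work the hardware does without extra cycles.

```c
#include <assert.h>

/* Hypothetical helper: advance an index by `step` inside a buffer of
   `len` elements, wrapping past the end as circular addressing does. */
static unsigned circ_next(unsigned idx, unsigned step, unsigned len)
{
    assert(len > 0);
    return (idx + step) % len;
}

/* Hypothetical helper: reverse the low `bits` bits of `idx`. Reading a
   buffer in this order gives the data ordering radix-2 FFTs need. */
static unsigned bit_reverse(unsigned idx, unsigned bits)
{
    unsigned out = 0;
    for (unsigned i = 0; i < bits; i++) {
        out = (out << 1) | (idx & 1u);  /* shift in the lowest bit */
        idx >>= 1;
    }
    return out;
}
```

Stepping by 2 through a 5-element buffer from index 0 visits 0, 2, 4, 1, 3 and then returns to 0. With 3 bits, bit_reverse maps 0 through 7 to 0, 4, 2, 6, 1, 5, 3, 7.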
Features of the TMS320C4x CPU are:
- High-speed internal parallelism: eight operations/cycle for
maximum sustained performance
- Floating-point/integer multiply
- Floating-point/integer addition
- Two data accesses
- Zero-overhead branch and loop counter update
- IEEE floating-point conversion
- Divide and square root support for improved performance
- Single-cycle byte and halfword manipulation capabilities
- Register-based CPU
TMS320C4x Memory and Bus Structure
To realize the full performance of the 'C4x CPU, it is important
to have a bus and memory architecture that can keep pace. The
'C4x fetches up to four 32-bit words each cycle: a program opcode,
two CPU data operands, and a DMA data transfer. The internal buses
can transfer all four words in parallel, relying on seven memory
sources for data.
The 'C4x uses seven internal buses to access on-chip resources:
- Program Address/Data: The CPU uses these buses to maintain
instruction fetches every cycle.
- Data Address/Data: In any cycle, the CPU can fetch two
data operands because it has two data address buses and one data
bus that can be accessed twice in a single cycle.
- DMA Address/Data: The DMA uses these buses to perform
DMA transfers in parallel with CPU operation.
With the internal buses in place to feed the DMA and CPU, the
'C4x-generation devices can use both internal and external data
and program memory. Internally, the 'C4x has two 1K ×
32-bit word blocks of dual-access RAM, providing up to four words
of program or data in a single cycle. For external memory, the
'C40 has two identical 32-bit buses, which address up to 2G words
of memory each. The 'C44 has two 24-bit external address buses,
which address up to 16M words each. Each device has an on-chip
instruction cache to boost performance when using slower external
memory.
TMS320C4x Communication Ports
The communication ports on the 'C4x generation transfer up to
28 Mbytes/s each for asynchronous interprocessor communications
or for servicing intensive I/O needs. The 'C40 has six ports,
and the 'C44 has four. Each port has four control pins and eight
data pins. These 12 pins provide a glueless interface to another
'C4x. The control pins combined with the control logic arbitrate
with another device to determine data transfer timing and direction.
Because the communication ports have built-in arbitration and
control circuitry, you simply need to read data from and write
data to the memory-mapped input and output FIFOs.
In a typical transfer, the DMA coprocessor or the CPU first writes
to the output FIFO. Next, the communication port sends a request
signal to the destination processor, which responds with an acknowledge.
The communication port then transfers the word as four successive
bytes. The destination processor receives the word in its input
FIFO, where the destination DMA or CPU can read the contents.
Note: together, the 8-word output FIFO of the sending port and the
8-word input FIFO of the receiving port provide a 16-word ×
32-bit buffer between communication ports.
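The transfer sequence described above can be sketched as a small host-side model in C. It is an illustration under stated assumptions rather than 'C4x code: the type fifo_t and all function names are hypothetical, the byte order on the data pins (least-significant byte first) is an assumption, and the request/acknowledge handshake is reduced to a check that the receiving FIFO has room.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* 8-word FIFO depth, as stated in the text */

/* Hypothetical model of one 8-word x 32-bit port FIFO. */
typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;   /* index of the oldest word */
    unsigned count;  /* number of words currently stored */
} fifo_t;

static int fifo_push(fifo_t *f, uint32_t w)
{
    if (f->count == FIFO_DEPTH)
        return 0;  /* full: the writer has to wait */
    f->data[(f->head + f->count) % FIFO_DEPTH] = w;
    f->count++;
    return 1;
}

static int fifo_pop(fifo_t *f, uint32_t *w)
{
    if (f->count == 0)
        return 0;  /* empty: nothing to read */
    *w = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Split a word into four bytes for the eight data pins.
   ASSUMPTION: least-significant byte goes first. */
static void word_to_bytes(uint32_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}

/* Reassemble the four bytes on the receiving side (same assumed order). */
static uint32_t bytes_to_word(const uint8_t in[4])
{
    uint32_t word = 0;
    for (int i = 0; i < 4; i++)
        word |= (uint32_t)in[i] << (8 * i);
    return word;
}

/* Move one word from the sender's output FIFO to the receiver's input
   FIFO as four successive bytes. Returns 1 on success, 0 if the sender
   has nothing queued or the receiver has no room. */
static int port_transfer(fifo_t *out, fifo_t *in)
{
    uint32_t w;
    uint8_t bytes[4];
    if (in->count == FIFO_DEPTH)
        return 0;
    if (!fifo_pop(out, &w))
        return 0;
    word_to_bytes(w, bytes);
    return fifo_push(in, bytes_to_word(bytes));
}
```

In this model the sending side only calls fifo_push and the receiving side only calls fifo_pop, which mirrors the point that software just reads and writes the memory-mapped FIFOs while the port logic handles the rest.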
Features of the 'C4x communication ports include:
- Up to 28 Mbytes/s bidirectional interface on each communication
port for high-speed and low-cost parallel-processor interface
- 8-word deep input FIFO and 8-word deep output FIFO buffer
- Automatic arbitration and handshaking for direct processor-to-processor
connection
TMS320C4x DMA Coprocessor
With as many as six communication ports and two external buses,
the TMS320C4x has an I/O capability of as much as 488 Mbytes/s.
To service this tremendous speed, the 'C4x has a 6- or 12-channel
DMA. The DMA operates independently of the CPU and has dedicated
address and data buses to avoid bus conflicts.
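As a rough cross-check of the 488 Mbytes/s figure, the arithmetic can be written out in C. This is only a sketch using the numbers quoted in this document; the per-bus share is an inference made by attributing the remainder evenly to the two external buses, not a figure stated in the text, and the names are hypothetical.

```c
#include <assert.h>

enum {
    PORTS           = 6,    /* communication ports on the 'C40 */
    PORT_MBYTES_S   = 28,   /* peak rate per port, from the text */
    TOTAL_MBYTES_S  = 488,  /* total I/O capability, from the text */
    EXTERNAL_BUSES  = 2     /* external buses, from the text */
};

/* Aggregate peak rate of all six communication ports. */
static int ports_total(void)
{
    return PORTS * PORT_MBYTES_S;
}

/* ASSUMPTION: whatever the ports do not account for is split evenly
   across the two external buses. */
static int per_bus_share(void)
{
    return (TOTAL_MBYTES_S - ports_total()) / EXTERNAL_BUSES;
}
```

Under that assumption the ports contribute 168 Mbytes/s and each external bus accounts for 160 Mbytes/s.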
The DMA is programmed to transfer data from any memory location
to any other memory location (communication ports are memory-mapped).
The DMA can begin a task based on CPU or external interrupts and
can interrupt the CPU at the completion of a task. The DMA also
includes a link pointer register that allows the DMA to program
its next task without CPU intervention.
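The link pointer idea can be illustrated with a chain of task descriptors. The sketch below is a host-side model in C, assuming a hypothetical descriptor layout (dma_task_t) and a hypothetical dma_run function; the actual 'C4x DMA register set is not described in this text and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL task descriptor, for illustration only. */
typedef struct dma_task {
    const uint32_t *src;          /* source address */
    uint32_t *dst;                /* destination address */
    size_t count;                 /* transfer counter, in 32-bit words */
    const struct dma_task *link;  /* next task, or NULL to stop */
} dma_task_t;

/* Run a chain of tasks without any "CPU" involvement between them.
   Returns the number of tasks completed. */
static unsigned dma_run(const dma_task_t *task)
{
    unsigned done = 0;
    while (task != NULL) {
        const uint32_t *src = task->src;
        uint32_t *dst = task->dst;
        /* Each step moves one word, updates both addresses, and
           updates the transfer counter. */
        for (size_t n = task->count; n > 0; n--)
            *dst++ = *src++;
        done++;
        task = task->link;  /* load the next task via the link pointer */
    }
    return done;
}
```

A chain is built by pointing one descriptor's link field at the next; the model stops when it reaches a NULL link, which is where a real system might instead interrupt the CPU.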
Since each communication port has transmit and receive capability,
12 DMA channels are needed if all six communication ports are
being used in a bidirectional mode. The DMA has a split-mode operation
dedicated to this function, allowing the DMA to service the 12
input and output FIFOs in the communication ports.
In the event that both the CPU and DMA are accessing the same
resource, priorities can be assigned to resolve the conflict.
Priority can be assigned to the CPU, the DMA, or mixed, where
the CPU gets the first access followed by the DMA.
The 'C4x DMA coprocessor features:
- Concurrent I/O to maximize sustained CPU performance
- Autoinitialization
- 6 DMA channels, or 12 in split mode, for parallel data transfers
- Data transfers to and from anywhere in memory
- Three operations/cycle: 32-bit data transfer, address register
update, and transfer counter update
- Performance up to 120 MOPS