Texas Instruments

TMS320C4x Features

The TMS320C4x devices are 32-bit floating-point digital signal processors (DSPs) optimized for parallel processing. The 'C4x family combines a high-performance CPU and DMA controller with up to six communication ports to meet the needs of multiprocessor and I/O-intensive applications. All 'C4x devices are compatible with TI's multichip development environment. Each device contains an on-chip analysis module that supports hardware breakpoints for parallel-processing development and debugging. The 'C4x family is source-code compatible with the TMS320C3x family of floating-point DSPs.

'C4x Key Specifications

  • Performance up to 80 MFLOPS
  • Increased computational power
  • Increased communications power
  • Multi-processing capability
  • Scalability
  • Fault tolerance

'C4x Key Applications

  • High-speed communications
  • Virtual reality, simulators
  • Radar/sonar/image processing
  • 3-D graphics
  • Robotics/numeric control
  • Speech recognition
  • Telecom infrastructure


TMS320C4x CPU

The TMS320C4x has an independent multiplier and arithmetic logic unit (ALU) and achieves up to 80 MFLOPS. Results are stored in any one of 12 extended-precision registers. These 40-bit registers hold values with a 32-bit mantissa and an 8-bit exponent, and each can serve as both the source and the destination for any arithmetic operation. The extended-precision registers are especially valuable when programming in assembly or C: they let you keep intermediate results in registers instead of storing them in memory, which yields higher-performance assembly code and a more efficient C compiler.
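
The sketch below models the 40-bit register format as two bit fields packed into a C uint64_t. It assumes, for illustration only, that the exponent occupies bits 39 to 32 and the mantissa bits 31 to 0; the macro and function names are hypothetical, and the sketch does not attempt the numeric decoding the hardware performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 40-bit extended-precision register.
 * Assumed layout (illustration only): bits 39..32 = exponent,
 * bits 31..0 = mantissa. */
#define EXT_MANTISSA_BITS 32
#define EXT_REG_MASK ((UINT64_C(1) << 40) - 1)

/* Combine an 8-bit exponent and a 32-bit mantissa into one 40-bit value. */
uint64_t ext_pack(uint8_t exponent, uint32_t mantissa)
{
    return ((uint64_t)exponent << EXT_MANTISSA_BITS) | mantissa;
}

/* Extract the 8-bit exponent field (bits 39..32). */
uint8_t ext_exponent(uint64_t reg)
{
    assert((reg & ~EXT_REG_MASK) == 0); /* value must fit in 40 bits */
    return (uint8_t)(reg >> EXT_MANTISSA_BITS);
}

/* Extract the 32-bit mantissa field (bits 31..0). */
uint32_t ext_mantissa(uint64_t reg)
{
    assert((reg & ~EXT_REG_MASK) == 0);
    return (uint32_t)(reg & UINT32_MAX);
}
```

Keeping both fields in one 40-bit value is what lets a register hold more precision than a 32-bit memory word, which is why intermediate results benefit from staying in these registers.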

To sustain 80 MFLOPS, the CPU has two independent auxiliary register arithmetic units (ARAUs), which can generate two addresses in a single cycle. The two ARAUs operate in parallel with the multiplier and ALU. They support addressing with displacements, addressing with index registers (IR0 and IR1), circular addressing, and bit-reversed addressing.
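
To illustrate what the ARAUs compute, the hypothetical C helpers below emulate two of these addressing modes in software. circ_next wraps an address inside a buffer of a given length, and bitrev_next performs reverse-carry addition, in which the carry propagates from the most significant bit toward the least significant bit, so an index walks through bit-reversed order as used in FFTs. This is a behavioral sketch only: it ignores the buffer alignment and register conventions of the real hardware, and the step argument merely stands in for an index-register value.

```c
#include <assert.h>
#include <stdint.h>

/* Circular addressing: advance addr by step inside [base, base + len). */
uint32_t circ_next(uint32_t base, uint32_t len, uint32_t addr, int32_t step)
{
    assert(len > 0 && addr >= base && addr - base < len);
    int64_t off = (int64_t)(addr - base) + step;
    off %= (int64_t)len;
    if (off < 0)
        off += len;                 /* wrap negative steps back into range */
    return base + (uint32_t)off;
}

/* Reverse the low `bits` bits of x. */
uint32_t reverse_bits(uint32_t x, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Bit-reversed addressing: add step to addr with the carry running from
 * the MSB toward the LSB. Reversing both operands, adding normally, and
 * reversing the result gives exactly that reverse-carry sum. */
uint32_t bitrev_next(uint32_t addr, uint32_t step, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = (bits == 32) ? UINT32_MAX : ((1u << bits) - 1u);
    uint32_t sum = reverse_bits(addr & mask, bits) + reverse_bits(step & mask, bits);
    return reverse_bits(sum & mask, bits);
}
```

Starting from 0 with a 3-bit width and a step of 4 (N/2 for N = 8), repeated calls to bitrev_next visit 0, 4, 2, 6, 1, 5, 3, 7 before returning to 0.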

Features of the TMS320C4x CPU are:

  • High-speed internal parallelism: eight operations/cycle for maximum sustained performance
    • Floating-point/integer multiply
    • Floating-point/integer addition
    • Two data accesses
    • Zero-overhead branch and loop counter update
  • IEEE floating-point conversion
  • Divide and square root support for improved performance
  • Single-cycle byte and halfword manipulation capabilities
  • Register-based CPU
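
As an example of the kind of loop this parallelism targets, consider a dot product in C. Each iteration needs two data reads, one multiply, one add, and a loop-counter update, which line up with the parallel operations listed above. Whether a particular 'C4x compiler schedules each iteration into a single cycle is an assumption this page does not state, and the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two float arrays.
 * Per iteration: two data reads (x[i], y[i]), one multiply, one add,
 * and a loop-counter update. The running sum lives in a local variable,
 * so it can stay in a register instead of going back to memory. */
float dot_product(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i] * y[i];
    }
    return acc;
}
```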


TMS320C4x Memory and Bus Structure

To realize the full performance of the 'C4x CPU, it is important to have a bus and memory architecture that can keep pace. The 'C4x fetches up to four 32-bit words each cycle: a program opcode, two CPU data operands, and a DMA data transfer. The internal buses can transfer all four words in parallel, relying on seven memory sources for data.

The 'C4x uses seven internal buses to access on-chip resources:

Program Address/Data: The CPU uses these buses to sustain an instruction fetch every cycle.

Data Address/Data: In any cycle, the CPU can fetch two data operands because it has two data address buses and one data bus that can be accessed twice in a single cycle.

DMA Address/Data: The DMA uses these buses to perform DMA transfers in parallel with CPU operation.

With the internal buses in place to feed the DMA and CPU, the 'C4x-generation devices can use both internal and external data and program memory. Internally, the 'C4x has two 1K × 32-bit blocks of dual-access RAM, providing up to four words of program or data in a single cycle. For external memory, the 'C40 has two identical 32-bit buses, which address up to 2G words of memory each. The 'C44 has two 24-bit external address buses, which address up to 16M words each. Each device has an on-chip instruction cache to boost performance when using slower external memory.
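
The memory figures follow from powers of two, and the short C sketch below (with hypothetical helper names) checks them. 16M words is 2^24, matching the 'C44's 24-bit address buses; 2G words is 2^31, so the 'C40 figure corresponds to 31 bits of word address, a count inferred from the 2G number rather than stated here. The two 1K × 32-bit RAM blocks amount to 8 Kbytes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct word addresses reachable with an n-bit address bus. */
uint64_t addressable_words(unsigned address_bits)
{
    assert(address_bits < 64);
    return UINT64_C(1) << address_bits;
}

/* Total on-chip RAM in bytes for `blocks` blocks of `words_per_block`
 * 32-bit (4-byte) words. */
uint32_t onchip_ram_bytes(uint32_t blocks, uint32_t words_per_block)
{
    return blocks * words_per_block * 4u;
}
```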


TMS320C4x Communication Ports

The communication ports on the 'C4x generation transfer up to 28 Mbytes/s each for asynchronous interprocessor communications or for servicing intensive I/O needs. The 'C40 has six ports, and the 'C44 has four. Each port has four control pins and eight data pins. These 12 pins provide a glueless interface to another 'C4x. The control pins combined with the control logic arbitrate with another device to determine data transfer timing and direction. Because the communication ports have built-in arbitration and control circuitry, you simply need to read data from and write data to the memory-mapped input and output FIFOs.

In a typical transfer, the DMA coprocessor or the CPU first writes a word to the output FIFO. Next, the communication port sends a request signal to the destination processor, which responds with an acknowledge. The communication port then transfers the word as four successive bytes over its eight data pins. The destination processor receives the word in its input FIFO, from which the destination DMA or CPU can read it. Note: together, the input and output FIFOs provide a 16-word × 32-bit buffer between communication ports.

Features of the 'C4x communication ports include:

  • Up to 28 Mbytes/s bidirectional transfer on each communication port for a high-speed, low-cost parallel-processor interface
  • 8-word-deep input FIFO and 8-word-deep output FIFO buffers
  • Automatic arbitration and handshaking for direct processor-to-processor connection
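
For a feel of the per-port figure, the arithmetic below converts 28 Mbytes/s into 32-bit words: 7 million words per second, or about 143 ns per word. Summing six ports ('C40) or four ports ('C44) at that peak rate gives 168 or 112 Mbytes/s. These are peak numbers derived from the quoted rate, not measured throughput, and the helper names are hypothetical.

```c
#include <assert.h>

#define PORT_MBYTES_PER_S 28.0   /* peak per-port rate quoted above */
#define BYTES_PER_WORD     4.0   /* one 32-bit word */

/* Peak 32-bit words per second, in millions, through one port. */
double port_mwords_per_s(void)
{
    return PORT_MBYTES_PER_S / BYTES_PER_WORD;
}

/* Nanoseconds per 32-bit word at the peak per-port rate:
 * 1e9 ns / (Mwords * 1e6) = 1000 / Mwords. */
double port_ns_per_word(void)
{
    return 1000.0 / port_mwords_per_s();
}

/* Aggregate peak rate in Mbytes/s if `ports` ports run at full speed. */
double aggregate_mbytes_per_s(unsigned ports)
{
    assert(ports <= 6);
    return ports * PORT_MBYTES_PER_S;
}
```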


TMS320C4x DMA Coprocessor

With as many as six communication ports and two external buses, the TMS320C4x has an I/O capability of as much as 488 Mbytes/s. To keep pace with this bandwidth, the 'C4x has a DMA coprocessor with 6 channels, or 12 in split mode. The DMA operates independently of the CPU and has dedicated address and data buses to avoid bus conflicts.

The DMA can be programmed to transfer data from any memory location to any other memory location (the communication ports are memory-mapped). The DMA can begin a task based on CPU or external interrupts and can interrupt the CPU at the completion of a task. The DMA also includes a link pointer register that allows the DMA to program its next task without CPU intervention.
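
The following C sketch shows how a link pointer lets the DMA chain tasks without CPU intervention. Each hypothetical descriptor holds a source, a destination, per-word address steps, a word count, and a link to the next task; the field layout is illustrative and does not reproduce the 'C4x DMA register map. For each word, the loop performs the three operations named in the DMA feature list: a 32-bit transfer, an address update, and a transfer-counter update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA task descriptor; addresses are word indices into a
 * flat memory array. */
typedef struct dma_desc {
    long src;                    /* source word address */
    long src_step;               /* added to src after each word */
    long dst;                    /* destination word address */
    long dst_step;               /* added to dst after each word */
    long count;                  /* words to transfer */
    const struct dma_desc *link; /* next task, or NULL to stop */
} dma_desc_t;

/* Run a chain of tasks back to back. Returns the number of completed
 * tasks, i.e. the points where the DMA could interrupt the CPU. */
unsigned dma_run_chain(uint32_t *mem, long mem_words, const dma_desc_t *task)
{
    unsigned completed = 0;
    while (task != NULL) {
        /* Load working registers from the current task (autoinitialization). */
        long src = task->src, dst = task->dst, count = task->count;
        while (count > 0) {
            assert(src >= 0 && src < mem_words && dst >= 0 && dst < mem_words);
            mem[dst] = mem[src];     /* 32-bit data transfer */
            src += task->src_step;   /* address register update */
            dst += task->dst_step;
            count--;                 /* transfer counter update */
        }
        completed++;
        task = task->link;           /* link pointer selects the next task */
    }
    return completed;
}
```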

Since each communication port has transmit and receive capability, 12 DMA channels are needed if all six communication ports are being used in a bidirectional mode. The DMA has a split-mode operation dedicated to this function, allowing the DMA to service the 12 input and output FIFOs in the communication ports.

If the CPU and DMA try to access the same resource at the same time, you can assign priorities to resolve the conflict: priority can go to the CPU, to the DMA, or be mixed, in which case the CPU gets the first access, followed by the DMA.

The 'C4x DMA coprocessor features:

  • Concurrent I/O to maximize sustained CPU performance
  • Autoinitialization
  • Up to 6 or 12 DMA channels for parallel data transfers
    • Data transfers to and from anywhere in memory
    • Three operations/cycle
      • 32-bit data transfer
      • Address register update
      • Transfer counter update
    • Performance up to 120 MOPS

© Copyright 1997 Texas Instruments Incorporated. All rights reserved.