TMS320C4x
The TMS320C4x is the second generation of 32-bit floating-point digital signal processors (DSPs) from Texas Instruments. The first family member, the TMS320C40, was introduced in 1990. TMS320C4x family members target multiprocessor floating-point DSP systems for scientific, industrial, and military applications. The TMS320C4x is similar to (and source-code compatible with) its predecessor, the TMS320C3x.
Key features of the TMS320C4x
The TMS320C4x has several key features:
- IEEE floating-point conversion for ease of use
- Register-based CPU
- Single-cycle byte and half-word manipulation capabilities
- Divide and square root support for improved performance
- On-chip memory comprising 2K words of SRAM, a 128-word program cache, and a boot loader
- Two external buses providing an address reach of up to 4 gigawords
- Two memory-mapped 32-bit timers
- 6- and 12-channel DMA
- Up to six communication ports for multiprocessor communication
- Idle mode for reduced power consumption
Architecture
Central Processing Unit (CPU)
The ’C4x’s CPU has a register-based architecture. The CPU consists of several components:
Floating-point/integer multiplier- The multiplier performs single-cycle multiplications on 32-bit integer and 40-bit floating-point values. The ’C4x implementation of floating-point arithmetic allows floating-point operations at fixed-point speeds via a 25-ns instruction cycle and a high degree of parallelism.
Arithmetic Logic Unit (ALU)- The ALU performs single-cycle operations on 32-bit integer, 32-bit logical, and 40-bit floating-point data, including single-cycle integer and floating-point conversions. Results of the ALU are always maintained in 32-bit integer or 40-bit floating-point formats.
32-bit barrel shifter- The barrel shifter is coupled to the ALU and can perform shifts of up to 32 bits left or right. The shifter supports arithmetic shifts, logical shifts, and rotate-through-carry operations.
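The three shift types named above can be illustrated with a small Python model of a 32-bit word. This is only a sketch of the arithmetic: the function names are hypothetical, the signed-count convention (positive shifts left, negative shifts right) is an assumption of the sketch, and the rotate is shown for a single bit position.

```python
MASK32 = 0xFFFFFFFF  # keep every result inside a 32-bit word

def logical_shift(value, count):
    """Positive count shifts left, negative count shifts right; zeros are shifted in."""
    value &= MASK32
    if count >= 0:
        return (value << count) & MASK32
    return value >> -count

def arithmetic_shift(value, count):
    """Same as logical_shift, except that right shifts replicate the sign bit (bit 31)."""
    value &= MASK32
    if count >= 0:
        return (value << count) & MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> -count) & MASK32

def rotate_left_through_carry(value, carry):
    """Rotate the 33-bit quantity carry:value left by one bit.

    Returns (new_value, new_carry): the old carry enters bit 0 and
    the old bit 31 becomes the new carry.
    """
    value &= MASK32
    new_carry = (value >> 31) & 1
    new_value = ((value << 1) | (carry & 1)) & MASK32
    return new_value, new_carry
```

Shifting 0x80000000 right by four positions shows the difference between the two shift types: the logical shift gives 0x08000000, while the arithmetic shift gives 0xF8000000 because the sign bit is copied into the vacated positions.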
Internal buses (CPU1/CPU2 and REG1/REG2)- Four internal buses, CPU1, CPU2, REG1, and REG2, carry two operands from memory and two operands from the register file, thus allowing parallel multiplies and adds/subtracts on four integer or floating-point operands in a single cycle.
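As a rough illustration of what the single-cycle parallel multiply and add/subtract means for throughput, the following Python arithmetic combines it with the 25-ns instruction cycle quoted for the multiplier. It is a back-of-the-envelope peak figure under the assumption that every cycle issues one multiply and one add; the constant and function names are hypothetical, and sustained rates depend on the program.

```python
CYCLE_TIME_NS = 25   # instruction cycle time quoted in the text
OPS_PER_CYCLE = 2    # assumption: one multiply plus one add/subtract every cycle

def instructions_per_second(cycle_time_ns):
    """Instruction rate for a given cycle time, using integer arithmetic."""
    return 1_000_000_000 // cycle_time_ns

# Millions of instructions per second at one instruction per cycle.
mips = instructions_per_second(CYCLE_TIME_NS) // 1_000_000

# Peak millions of floating-point operations per second.
peak_mflops = mips * OPS_PER_CYCLE
```

With a 25-ns cycle this works out to 40 million instructions per second and a peak of 80 million floating-point operations per second.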
Auxiliary register arithmetic units (ARAU)- The two auxiliary register arithmetic units (ARAU0 and ARAU1) can generate two addresses in a single cycle. The ARAUs operate in parallel with the multiplier and ALU. They support addressing with displacements, index registers (IR0 and IR1), and circular and bit-reversed addressing.
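Circular and bit-reversed addressing are easiest to see as address sequences. The Python sketch below models only the resulting access order, not the hardware mechanism inside the ARAUs; the function names and the example buffer at 0x100 are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index (the reordering used by radix-2 FFTs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def circular_next(addr, step, base, length):
    """Advance addr by step, wrapping so it stays inside [base, base + length)."""
    return base + (addr - base + step) % length

# Access order for an 8-point buffer in bit-reversed order.
order = [bit_reverse(i, 3) for i in range(8)]

# Stepping by 2 through a 6-word circular buffer that starts at 0x100.
wrapped = circular_next(0x105, 2, base=0x100, length=6)
```

Here `order` is `[0, 4, 2, 6, 1, 5, 3, 7]`, and `wrapped` is 0x101 because the pointer runs off the end of the buffer and re-enters at its start. Circular addressing of this kind is typically used for delay lines in filters, and bit-reversed addressing for reordering FFT data.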
CPU Primary register file- The ’C4x primary register file provides 32 registers in a multiport register file that is tightly coupled to the CPU. All of the primary register file registers can be operated upon by the multiplier and ALU and can be used as general-purpose registers.
CPU Expansion Register File- In addition to the primary register file, the CPU has an expansion register file that contains two special registers that act as pointers:
- The IVTP register points to the interrupt-vector table (IVT), which defines vectors for all interrupts.
- The TVTP register points to the trap vector table (TVT), which defines vectors for 512 traps.
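The role of a table pointer such as TVTP can be sketched as a base-plus-index lookup. The Python below assumes one 32-bit word per vector and a simple layout in which vector N sits at the table base plus N; both are assumptions made for illustration, and the function name and example base address are hypothetical.

```python
NUM_TRAPS = 512          # number of traps stated in the text
ADDRESS_MASK = 0xFFFFFFFF

def trap_vector_address(tvtp, trap_number):
    """Address of the vector for trap_number, assuming one word per vector."""
    if not 0 <= trap_number < NUM_TRAPS:
        raise ValueError("trap number must be in the range 0..511")
    return (tvtp + trap_number) & ADDRESS_MASK

# Hypothetical table base, chosen only for the example.
example = trap_vector_address(0x00300000, 5)
```

Because the table is reached through a register rather than a fixed address, software can relocate it simply by loading a new base value.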
Memory organization
The total memory reach of the ’C4x is 4G 32-bit words. Program memory (on-chip RAM or ROM and external memory) and the registers that control the timers, communication ports, and DMA channels all reside within this space. Tables, coefficients, program code, and data can therefore be stored in either RAM or ROM, so memory can be allocated as the application requires.
Memory Map- The memory map for each processor is shown in Figure. The level on the external ROMEN pin determines whether the first megaword of the address space maps to the internal ROM or to external memory. The maps illustrate the entire address space of the ’C40 and ’C44. The value of ROMEN affects only the first megaword of memory.
Memory Addressing Modes- The ’C4x supports a base set of general-purpose instructions as well as arithmetic-intensive instructions that are particularly suited for digital signal processing and other numeric-intensive applications.
The following list shows the addressing modes with their addressing types:
- General addressing modes:
  - Register. The operand is a CPU register.
  - Immediate. The operand is a 16-bit immediate value.
  - Direct. The operand is the contents of a 32-bit address.
  - Indirect. A 32-bit auxiliary register indicates the address of the operand.
- Three-operand addressing modes:
  - Register. (same as for general addressing mode).
  - Indirect. (same as for general addressing mode).
  - Immediate. The operand is an 8-bit immediate value.
- Parallel addressing modes:
  - Register. The operand is an extended-precision register.
  - Indirect. (same as for general addressing mode).
- Branch addressing modes:
  - Register. (same as for general addressing mode).
  - PC-relative. A signed 16-bit displacement or a 24-bit displacement is added to the PC.
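The PC-relative mode above depends on sign extension: a short displacement field must be widened to 32 bits before it is added to the PC. The Python sketch below shows that arithmetic only. It treats both the 16-bit and the 24-bit displacement as signed and ignores any pipeline-dependent offset the hardware may add; both are assumptions of the sketch, and the function names are hypothetical.

```python
ADDRESS_MASK = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def pc_relative_target(pc, displacement, bits):
    """Branch target: PC plus the sign-extended displacement, wrapped to 32 bits."""
    return (pc + sign_extend(displacement, bits)) & ADDRESS_MASK

backward = pc_relative_target(0x1000, 0xFFFE, 16)    # displacement field encodes -2
forward = pc_relative_target(0x1000, 0x000010, 24)   # displacement field encodes +16
```

The same `sign_extend` step applies to the 16-bit and 8-bit immediate operands when they are used as signed values.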
Internal buses
A large portion of the ’C4x’s high performance is due to internal busing and parallelism. Separate buses allow for parallel program fetching, data accessing, and DMA accessing:
- Program buses- PADDR and PDATA
- Data buses- DADDR1, DADDR2, and DDATA
- DMA buses- DMAADDR and DMADATA
External bus operation
The ’C4x provides two identical external interfaces: the global memory interface and the local memory interface. Each consists of a 32-bit data bus, a 31-bit (’C40) or 24-bit (’C44) address bus, and two sets of control signals. Both buses can be used to address external program/data memory or I/O space.
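The address-bus widths above account for the 4-gigaword reach listed among the key features. The following Python arithmetic is an upper-bound sketch: it simply adds the two interfaces together and ignores any part of the map occupied by on-chip memory or peripherals. The function and constant names are hypothetical.

```python
GIGA = 1 << 30
MEGA = 1 << 20

def words_addressable(address_bits, interfaces=2):
    """Words reachable through `interfaces` buses that each have `address_bits` address lines."""
    return interfaces * (1 << address_bits)

c40_words = words_addressable(31)   # two 31-bit address buses
c44_words = words_addressable(24)   # two 24-bit address buses
```

Two 31-bit buses give 2 x 2^31 = 2^32 words, that is, 4 gigawords for the ’C40, while two 24-bit buses give 32 megawords of external reach for the ’C44.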
Interrupts
The ’C4x supports four external interrupts (IIOF3–0), a number of internal interrupts, a non-maskable external NMI interrupt, and a non-maskable external RESET signal, which sets the processor to a known state. The DMA and communication ports have their own internal interrupts. When the CPU responds to the interrupt, the IACK pin can be used to signal an external interrupt acknowledge.
Peripherals
All ’C4x on-chip peripherals are controlled through memory-mapped registers on a dedicated peripheral bus. This peripheral bus is composed of a 32-bit data bus and a 32-bit address bus. The ’C4x peripherals include two timers and six (’C40) or four (’C44) communication ports.
Pipeline operation
Two characteristics of the ’C4x that contribute to its high performance are pipelining and concurrent I/O and CPU operation. Four functional units control ’C4x pipeline operation: fetch, decode, read, and execute. Pipelining overlaps the fetch, decode, read, and execute stages of successive instructions so that, in the ideal case, each unit works on a different instruction in the same cycle.
The four major units of the ’C4x pipeline structure and their functions are as follows:
- Fetch Unit (F)- Fetches the instruction words from memory and updates the program counter.
- Decode Unit (D)- Decodes the instruction word and performs address generation. It also controls modification of the ARn registers in indirect addressing mode and of the stack pointer when a PUSH to or POP from the stack occurs.
- Read Unit (R)- If required, reads the operands from memory.
- Execute Unit (E)- If required, reads the operands from the register file, performs the necessary operation, and writes results to the register file. If required, results of previous operations are written to memory.
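The overlap between the four units can be made concrete with a small Python model of an ideal pipeline, in which a new instruction enters the fetch unit every cycle and nothing stalls. This is a sketch under that assumption only; it does not model pipeline conflicts, and the function name is hypothetical.

```python
STAGES = ["F", "D", "R", "E"]  # fetch, decode, read, execute

def pipeline_schedule(num_instructions):
    """Return one dict per cycle mapping each busy stage to an instruction index.

    Instruction i is fetched in cycle i and then moves one stage per cycle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth
            if 0 <= instr < num_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule

# Print the occupancy table for five instructions.
for cycle, row in enumerate(pipeline_schedule(5)):
    cells = " ".join(f"{stage}:{row.get(stage, '-')}" for stage in STAGES)
    print(f"cycle {cycle}: {cells}")
```

In this ideal model, five instructions finish in 5 + 3 = 8 cycles instead of the 20 that would be needed without overlap, and from cycle 3 onward all four units are busy with different instructions.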