R4000

A Toshiba R4000 microprocessor.

MIPS R4000 die shot.

The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected as the microprocessor of the Advanced Computing Environment (ACE), an industry standard intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.

Models

There were three configurations of the R4000: the R4000PC, an entry-level model with no support for a secondary cache; the R4000SC, a model with secondary cache but no multiprocessor capability; and the R4000MC, a model with secondary cache and support for the cache coherency protocols required by multiprocessor systems.

Description

The R4000 was a scalar superpipelined microprocessor with an eight-stage integer pipeline. During the first stage (IF), a virtual address for an instruction is generated and the instruction translation lookaside buffer (TLB) begins the translation of the address to a physical address. In the second stage (IS), translation is completed and the instruction is fetched from an internal 8 KB instruction cache. The instruction cache is direct-mapped and virtually indexed, physically tagged. It has a 16- or 32-byte line size. Architecturally, it could be expanded to 32 KB.
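As a rough illustration of direct mapping, the following sketch computes which line of the 8 KB instruction cache a virtual address would select. The helper name `icache_slot` and the sample addresses are hypothetical, and the model ignores the physical tag and the TLB entirely; it only shows the index arithmetic under those assumptions.

```python
CACHE_BYTES = 8 * 1024        # 8 KB primary instruction cache (from the text)
VALID_LINE_BYTES = (16, 32)   # the two line sizes the text mentions


def icache_slot(vaddr, line_bytes=16):
    """Return (line index, byte offset) selected by a virtual address.

    Hypothetical helper: in a direct-mapped cache every address maps to
    exactly one line, modelled here with plain modular arithmetic.
    """
    if line_bytes not in VALID_LINE_BYTES:
        raise ValueError("line size must be 16 or 32 bytes")
    num_lines = CACHE_BYTES // line_bytes
    index = (vaddr // line_bytes) % num_lines
    offset = vaddr % line_bytes
    return index, offset


# Two addresses exactly 8 KB apart select the same line, so in a
# direct-mapped cache they would evict each other.
print(icache_slot(0x0010))
print(icache_slot(0x2010))
```

With the 32-byte line size the cache holds half as many lines, so the same address can land on a different index.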

During the third stage (RF), the instruction is decoded and the register file is read. MIPS III defines two register files, one for the integer unit and the other for the floating-point unit. Each register file is 64 bits wide and contains 32 entries. The integer register file has two read ports and one write port, while the floating-point register file has two read ports and two write ports. Execution begins at stage four (EX) for both integer and floating-point instructions, and results are written back to the register files on completion, in stage eight (WB). Results may be bypassed where possible.

Integer execution

The R4000 has an arithmetic logic unit (ALU), a shifter, a multiplier and divider, and a load aligner for executing integer instructions. The ALU is pipelined and consists of a 64-bit carry-select adder and a logic unit. The shifter is a 32-bit barrel shifter; it performs 64-bit shifts in two cycles, stalling the pipeline as a result. This design was chosen to save die area. The multiplier and divider are not pipelined and have significant latencies: multiplies have a 10- or 20-cycle latency for 32-bit or 64-bit integers, respectively, whereas divides have a 69- or 133-cycle latency for 32-bit or 64-bit integers, respectively. Most other instructions have a single-cycle latency. The ALU adder is also used to calculate virtual addresses for loads, stores and branches.
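To give a feel for how much the multiplier and divider latencies dominate, the sketch below adds up the cycle counts quoted above for a chain of operations. The dictionary keys are hypothetical labels rather than MIPS mnemonics, and the model assumes each operation waits for the previous one to finish, which ignores any overlap the real pipeline allows.

```python
# Latencies in cycles, as quoted in the text. Keys are illustrative
# labels chosen for this sketch, not real instruction names.
INT_LATENCY = {
    "alu": 1,       # typical single-cycle instruction
    "shift64": 2,   # 64-bit shift on the 32-bit barrel shifter
    "mul32": 10,
    "mul64": 20,
    "div32": 69,
    "div64": 133,
}


def serial_cycles(ops):
    """Total cycles for a fully dependent chain of operations.

    Assumption: no overlap between operations, so latencies simply add.
    """
    return sum(INT_LATENCY[op] for op in ops)


# One add, one 32-bit multiply and one 64-bit divide, back to back.
print(serial_cycles(["alu", "mul32", "div64"]))
```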

Load and store instructions are executed by the integer pipeline, and access the on-chip 8 KB data cache.

Floating-point execution

The R4000 has an on-die IEEE 754-1985-compliant floating-point unit (FPU), referred to as the R4010. The FPU is a coprocessor designated CP1^[1] (the MIPS ISA defines four coprocessors, designated CP0 to CP3). The FPU can operate in two modes, 32-bit or 64-bit, selected by setting the FR bit in the CPU status register. In 32-bit mode, the 32 floating-point registers are 32 bits wide when used to hold single-precision floating-point numbers; when used to hold double-precision numbers, the registers are paired, giving 16 floating-point registers.
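The register arrangement described above can be summarised as a small lookup. This is only a sketch: the function name `fp_register_view` is hypothetical, the mode is passed as a plain number rather than as an FR bit value (the text does not say which bit value selects which mode), and the 64-bit case relies on the earlier statement that the register file has 32 entries of 64 bits.

```python
def fp_register_view(mode_bits, precision):
    """Return (usable register count, register width in bits).

    Hypothetical helper that restates the text's description of the
    two FPU modes; it does not model the status register itself.
    """
    if precision not in ("single", "double"):
        raise ValueError("precision must be 'single' or 'double'")
    if mode_bits == 64:
        return 32, 64            # full register file: 32 entries, 64 bits each
    if mode_bits == 32:
        if precision == "single":
            return 32, 32        # 32 registers, each 32 bits wide
        return 16, 64            # registers paired to hold doubles
    raise ValueError("mode_bits must be 32 or 64")


print(fp_register_view(32, "double"))
```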

The FPU can operate in parallel with the ALU unless there is a data or resource dependency, which causes it to stall. It contains three sub-units: an adder, a multiplier and a divider. The multiplier and divider can each execute an instruction in parallel with the adder, but they use the adder in their final stages of execution, which limits how far execution can overlap. Under certain conditions, the FPU can therefore execute up to three instructions at a time, one in each unit. The FPU is capable of retiring one instruction per cycle.

The adder and multiplier are pipelined. The multiplier has a four-stage pipeline; it is clocked at twice the clock frequency of the microprocessor for adequate performance and uses dynamic logic to achieve the high clock frequency. Division has a 23- or 36-cycle latency for single- or double-precision operations, respectively, and square root has a 54- or 112-cycle latency. Both division and square root use the SRT algorithm.
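Cycle counts are easier to compare once converted to time. The sketch below does that conversion for the division and square-root latencies quoted above. The clock frequency is a parameter, and the 100 MHz used in the example call is an illustrative value chosen for this sketch, not a figure taken from this section.

```python
# Latencies in cycles, as quoted in the text.
FP_LATENCY_CYCLES = {
    ("div", "single"): 23,
    ("div", "double"): 36,
    ("sqrt", "single"): 54,
    ("sqrt", "double"): 112,
}


def latency_ns(op, precision, clock_mhz):
    """Convert a quoted latency to nanoseconds at a given clock rate.

    One cycle lasts 1000 / clock_mhz nanoseconds.
    """
    cycles = FP_LATENCY_CYCLES[(op, precision)]
    return cycles * 1000.0 / clock_mhz


# Assumed 100 MHz clock, for illustration only.
print(latency_ns("div", "double", 100))
print(latency_ns("sqrt", "double", 100))
```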

Memory management

The memory management unit (MMU) uses a 48-entry translation lookaside buffer to translate virtual addresses. The R4000 uses a 64-bit virtual address, but implements only 40 of the 64 bits, giving 1 TB of virtual address space. The remaining bits are checked to ensure that they contain zero. The R4000 uses a 36-bit physical address and can thus address 64 GB of physical memory.
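The sizes quoted above follow directly from the address widths, as this short sketch shows. The helper names are hypothetical, the units are binary (1 TB taken as 2^40 bytes), and the zero check follows only the simplified description given in the text.

```python
VIRTUAL_BITS = 40    # implemented virtual address bits (from the text)
PHYSICAL_BITS = 36   # physical address bits (from the text)


def space_bytes(bits):
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


def upper_bits_zero(vaddr):
    """True if the bits above the implemented 40 are all zero.

    Hypothetical helper mirroring the check described in the text.
    """
    vaddr &= (1 << 64) - 1          # treat the input as a 64-bit value
    return (vaddr >> VIRTUAL_BITS) == 0


print(space_bytes(VIRTUAL_BITS) // 2**40, "TB virtual")
print(space_bytes(PHYSICAL_BITS) // 2**30, "GB physical")
print(upper_bits_zero(0xFF_FFFF_FFFF), upper_bits_zero(1 << 40))
```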

Secondary cache

The R4000 (SC and MC configurations only) supported an external secondary cache with a capacity of 128 KB to 4 MB. The cache was accessed via a dedicated 128-bit data bus. The secondary cache could be configured either as a unified cache or as a split instruction and data cache. In the latter configuration, each cache can have a capacity of 128 KB to 2 MB.^[2] The secondary cache is physically indexed, physically tagged and has a programmable line size of 128, 256, 512 or 1,024 bytes. The cache controller is on-die. The cache is built from standard static random access memory (SRAM). The data and tag buses are ECC-protected.

System bus

The R4000 uses a 64-bit system bus called the SysAD bus. SysAD is a multiplexed address and data bus; that is, it uses the same set of wires to transfer both addresses and data. While this reduces bandwidth, it is also less expensive than providing a separate address bus, which would require more pins and increase the complexity of the system. The SysAD bus can be configured to operate at half, a third or a quarter of the internal clock frequency; it generates its clock signal by dividing the operating frequency.
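The bandwidth cost of multiplexing can be illustrated with an idealised model in which every block transfer spends some bus cycles on the address before the data follows on the same wires. The single address cycle and the function names below are assumptions made for this sketch; the actual SysAD protocol is not described here and may differ.

```python
BUS_BYTES = 8   # a 64-bit bus moves 8 bytes per data cycle


def data_cycles(block_bytes):
    """Data cycles needed for a block, rounding up to whole bus words."""
    return -(-block_bytes // BUS_BYTES)   # ceiling division


def multiplexed_cycles(block_bytes, address_cycles=1):
    """Total cycles when address and data share the same wires.

    Assumption: a fixed number of address cycles precedes the data.
    """
    return address_cycles + data_cycles(block_bytes)


def data_fraction(block_bytes, address_cycles=1):
    """Fraction of bus cycles that actually carry data."""
    return data_cycles(block_bytes) / multiplexed_cycles(block_bytes, address_cycles)


# A 32-byte block: 4 data cycles plus 1 assumed address cycle.
print(multiplexed_cycles(32), data_fraction(32))
```

Under this model, larger blocks amortise the address cycle better, so the relative penalty of sharing the wires shrinks as transfers grow.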

Transistor count, die dimensions and process details

The R4000 contained 1.2 million transistors.^[3] It was designed for a 1.0 µm two-layer metal complementary metal–oxide–semiconductor (CMOS) process. As MIPS was a fabless company, the R4000 was fabricated by partners in their own processes, which had a 0.8 µm minimum feature size.^[4]

Clocking

The R4000 derives its various clock signals from an externally generated master clock signal. To produce the operating frequency, an on-die phase-locked loop (PLL) multiplies the master clock signal by two.
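The relationship between the master clock, the internal operating frequency and the possible SysAD bus frequencies (half, a third or a quarter of the internal clock, as described in the system bus section) can be sketched as follows. The function name and the 50 MHz input are illustrative assumptions; exact fractions are used so that the divide-by-three case is not rounded.

```python
from fractions import Fraction

PLL_MULTIPLIER = 2            # master clock is doubled on-die (from the text)
SYSAD_DIVISORS = (2, 3, 4)    # half, a third or a quarter of the internal clock


def derived_clocks_mhz(master_mhz):
    """Return (internal frequency, {divisor: SysAD frequency}) in MHz.

    Hypothetical helper; frequencies are exact Fractions.
    """
    internal = Fraction(master_mhz) * PLL_MULTIPLIER
    sysad = {d: internal / d for d in SYSAD_DIVISORS}
    return internal, sysad


# Assumed 50 MHz master clock, for illustration only.
internal, sysad = derived_clocks_mhz(50)
print(internal, {d: float(f) for d, f in sysad.items()})
```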

Packaging

The R4000PC was packaged in a 179-pin ceramic pin grid array (CPGA). The R4000SC and R4000MC were packaged in a 447-pin ceramic staggered pin grid array (SPGA). The pin-out of the R4000MC differs from that of the R4000SC: some pins that are unused on the R4000SC carry the signals that implement cache coherency on the R4000MC. The pin-out of the R4000PC was similar to that of the PGA-packaged R4200 and R4600 microprocessors, which enabled a properly designed system to use any of the three microprocessors.

R4400

An R4400MC microprocessor fabricated by Toshiba.

NEC VR4400MC die shot

The R4400 was a further development of the R4000. It was announced in early November 1992. Samples of the microprocessor had been shipped to selected customers before then, with general availability in January 1993. The R4400 operated at higher clock frequencies of 100, 133, 150, 200, and 250 MHz. The only major improvement over the R4000 was its larger primary caches, which were doubled in capacity from 8 KB each to 16 KB each. It contained 2.3 million transistors.

The R4400 was licensed by Integrated Device Technology (IDT), LSI Logic, NEC, Performance Semiconductor, Siemens AG and Toshiba. IDT, NEC, Siemens and Toshiba fabricated and marketed the microprocessor. LSI Logic used the R4400 in custom products. Performance Semiconductor sold its logic division to Cypress Semiconductor, where the MIPS microprocessor products were discontinued.

NEC marketed their version as the VR4400. The first version, a 150 MHz part, was announced in November 1992. Early versions were fabricated in a 0.6 µm process.^[5] In mid-1995, a 250 MHz part began sampling. It was fabricated in a 0.35 µm four-layer-metal process.^[6] NEC also produced the MR4401, a ceramic multi-chip module (MCM) that contained a VR4400SC with ten 1 Mbit SRAM chips that implemented a 1 MB secondary cache. The MCM was pin-compatible with the R4x00PC. The first version, a 150 MHz part, was announced in 1994. In 1995, a 200 MHz part was announced.

Toshiba marketed their version as the TC86R4400. A 200 MHz part, containing 2.3 million transistors, measuring 134 mm² and fabricated in a 0.3 µm process, was introduced in mid-1994. In quantities of 10,000, the R4400PC was priced at $1,600, the R4400SC at $1,950, and the R4400MC at $2,150.^[7]

Users

The R4400 was used by:

Carrera Computers in their Windows NT personal computers and workstations^[8]
Concurrent Computer Corporation in their real-time multiprocessor Maxion systems^[9]
DeskStation Technology in their Windows NT personal computers and DeskStation Tyne workstation^[10]
Digital Equipment Corporation in their DECstation 5000/260 workstation and server
NEC Corporation in their RISCstation workstations, RISCserver servers, and Cenju-3 supercomputer^[11]
NeTPower in their Windows NT workstations and servers
Pyramid Technology in their Nile Series servers, which used the R4400MC^[12]
Siemens Nixdorf Informationssysteme (SNI) in their RM-series UNIX servers and SR2000 mainframe
Silicon Graphics, Inc. in their Onyx, Indigo, Indigo2, and Indy workstations; and in their Challenge server
Tandem Computers in their NonStop Himalaya fault-tolerant servers

Core logic chipsets

The R4000 and R4400 required external core logic to interface to the system. Both commercially available and proprietary core logic were developed for these microprocessors. Proprietary designs were developed by system vendors such as SGI for use in their own systems. Commercial chipsets were developed by Acer and by the MIPS microprocessor fabricators NEC and Toshiba. Acer developed the PICA chipset. Toshiba developed the Tiger Shark chipset, which adapted the SysAD bus to an i486-compatible system bus.^[13]

Notes

[1] Heinrich, "MIPS R4000 Microprocessor User's Manual, Second Edition", p. 152
[2] Heinrich, "MIPS R4000 Microprocessor User's Manual", p. 248
[3] Mirapuri, "The Mips R4000 Processor", p. 10
[4] Mirapuri, "The Mips R4000 Processor", p. 21
[5] "NEC VR4400 Clock-Doubling RISC Has 2m Transistors"
[6] "NEC Ready With 250MHz Version Of The 64-Bit MIPS R4400 RISC"
[7] "Toshiba Has 200MHz MIPS R4400"
[8] "...And From Carrera Computers"
[9] "Concurrent Multiprocessors Feature New Bus Architecture"
[10] "MIPS R-Based Windows NT Personal Computers From Deskstation..."
[11] "NEC Goes After The Business Market With Its Latest Line Of RISC UNIX Workstations"
[12] "Pyramid Technology Aims To Crash The Mainframe With Nile Series"
[13] "Toshiba Samples 80486-Bus Chip Set For R-Series"

References

Heinrich, Joe. MIPS R4000 Microprocessor User's Manual, Second Edition.
Mirapuri, Sunil; Woodacre, Michael; Vasseghi, Nader. "The Mips R4000 Processor". IEEE Micro, vol. 12, no. 2, pp. 10–22, March/April 1992.

This article is issued from Wikipedia - version of Wednesday, April 06, 2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.