# A Low-Cost, High-Performance PA-RISC Workstation with Built-In Graphics, Multimedia, and Networking Capabilities

The three VLSI components that provide the core CPU, I/O, and graphics functions of the HP 9000 Model 712 workstation were designed as a set. This balanced performance and cost and simplified the interfaces between components, allowing designers to create a system with high performance at a low cost.

by Roger A. Pearson

Designing a workstation entails defining various functional blocks to work together to provide a set of features at a desired level of performance at the lowest possible cost. Often, many parts of the design are leveraged from previous designs, and only new functionality is designed from scratch. This approach may save development costs, but could result in a product that is more costly to build.

When one component of the system design has performance that can't be taken advantage of, whether because of architecture limitations or other components' performance limitations, then the system design suffers by having to carry the cost of that unused performance. By designing with the total system in mind, so that all components of the design are optimized to work together with no wasted performance, cost can be minimized. The designers of the HP 9000 Series 700 Models 712/60 and 712/80 took this approach to offer a high-performance combination of graphics, multimedia, and networking capabilities at new low prices. The objectives of the new design included:

- Providing the high performance of a PA-RISC workstation at the lowest possible cost
- Improving the performance and capabilities of multimedia functions through simple extensions to the instruction set
- Enabling an extensive set of communication features through low-cost option cards
- Designing for high-volume manufacturing.

Instrumental in meeting these objectives was the decision to design three new custom VLSI chips together, as a set, to achieve new levels of price/performance for the core functions of CPU, I/O, and graphics.

## Overview

Three new VLSI chips provide most of the functionality of the Model 712 workstation. The PA 7100LC CPU chip interfaces directly to the cache and main memory. The LASI (LAN/SCSI) chip does most of the core I/O needed for entry-level workstations. The graphics subsystem consists of the graphics chip and the frame buffer VRAMs. All three chips communicate through the GSC (general system connect) bus. Fig. 1 shows a block diagram of the Model 712 system.

The Models 712/60 and 712/80 are very similar and differ only in their cache sizes and cache speeds and in the main system clock speeds.

## The Processor

The compute power of the Model 712 system is provided by the PA-RISC PA 7100LC processor,<sup>1,2</sup> which is packaged in a 432-pin ceramic PGA. The CPU design was optimized for the Model 712 and includes the following features:

- Superscalar CPU
- 1K-byte instruction buffer
- Multimedia support
- Cache control for up to 2M bytes of external cache
- ECC (error correction coding) memory controller

The clock frequencies of the Model 712/60 and the Model 712/80 are 60 MHz and 80 MHz respectively. The PA 7100LC is described in more detail in the article on page 12.

## Cache

The PA 7100LC CPU uses an external cache. An external cache allows system designers to change the size of the cache easily to meet their performance and cost goals. Furthermore, off-chip cache provides all the performance necessary, without limiting the CPU frequency.

The external cache is 64K bytes on the Model 712/60 and 256K bytes on the Model 712/80 and is logically split into equal halves for the instruction and data caches. Combining the caches saved pins on the CPU. To further reduce costs, industry-standard SRAMs (static RAMs) are used. Table I shows the SRAMs used in the Model 712 systems.

April 1995 Hewlett-Packard Journal © Hewlett-Packard Company 1995



**Fig. 1.** Block Diagram of the HP 9000 Model 712 hardware.

**Table I.** Static RAMs Used in the Model 712 Systems

| Model  | Function | Size                | Speed | Quantity |
|--------|----------|---------------------|-------|----------|
| 712/60 | Tag      | 8K bytes            | 12 ns | 4        |
| 712/60 | Data     | 8K bytes            | 12 ns | 6        |
| 712/60 | Data     | $8K \times 9$ bits  | 12 ns | 2        |
| 712/80 | Tag      | 32K bytes           | 10 ns | 4        |
| 712/80 | Data     | 32K bytes           | 10 ns | 6        |
| 712/80 | Data     | $32K \times 9$ bits | 10 ns | 2        |
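As a quick cross-check, the data SRAM quantities in Table I can be summed to confirm the cache sizes quoted in the text. The sketch below is illustrative only. It counts each x9 part as 8K or 32K bytes of data plus one extra bit per byte, which is an assumption, since the article does not say what the ninth bit stores.

```python
# SRAM population taken from Table I: (function, data bytes per chip, quantity).
# Assumption: the x9 parts contribute the same number of data bytes as the
# byte-wide parts; the purpose of the ninth bit is not stated in the article.
CACHE_SRAMS = {
    "712/60": [("Tag", 8 * 1024, 4), ("Data", 8 * 1024, 6), ("Data x9", 8 * 1024, 2)],
    "712/80": [("Tag", 32 * 1024, 4), ("Data", 32 * 1024, 6), ("Data x9", 32 * 1024, 2)],
}


def data_capacity_bytes(model):
    """Total data capacity of the external cache for one model."""
    return sum(
        size * quantity
        for function, size, quantity in CACHE_SRAMS[model]
        if function.startswith("Data")
    )


if __name__ == "__main__":
    for model in CACHE_SRAMS:
        print(model, data_capacity_bytes(model) // 1024, "K bytes of cache data")
```

Under that assumption, the eight data SRAMs per model add up to the 64K-byte and 256K-byte cache sizes given above.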

## Main Memory

The main memory for the Model 712 systems has been engineered to provide high performance with industry-standard 70-ns SIMMs (single inline memory modules). Currently supported SIMMs are available in 4M-, 8M-, 16M-, and 32M-byte sizes. Four slots are available and must be filled in pairs for a maximum of 128M bytes.
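The pairing rule can be illustrated by enumerating the total memory sizes it allows. This is a sketch under stated assumptions: both SIMMs in a pair are the same size, the two pairs may differ from each other, and at least one pair is installed. The article states only that slots must be filled in pairs.

```python
from itertools import combinations_with_replacement

SIMM_SIZES_MB = (4, 8, 16, 32)  # supported SIMM sizes from the article
SLOT_PAIRS = 2                  # four slots, filled two at a time


def memory_configurations():
    """Return the sorted set of total memory sizes (in M bytes).

    Assumption: each pair holds two identical SIMMs, and one or two
    pairs may be populated.
    """
    totals = set()
    for pairs_installed in range(1, SLOT_PAIRS + 1):
        for pair_sizes in combinations_with_replacement(SIMM_SIZES_MB, pairs_installed):
            totals.add(sum(2 * size for size in pair_sizes))
    return sorted(totals)


if __name__ == "__main__":
    print(memory_configurations())
```

The largest total, two pairs of 32M-byte SIMMs, matches the 128M-byte maximum stated above.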

The Model 712's main memory design minimizes the average cache miss penalty. The main memory controller returns double words (eight bytes, since a word is four bytes) back to the CPU. Each cache line is made up of four double words. When there is a cache miss, the one double word of the four in the cache line that was missed is referred to as the critical word. To minimize the miss penalty, the double word containing the critical word is sent back to the CPU first, followed by the remaining three double words.
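The critical-word-first ordering can be sketched as follows. The article says only that the critical double word is returned first and the other three follow. The wraparound order used here for the remaining three is an assumption, and `return_order` is a hypothetical helper name.

```python
DOUBLE_WORD_BYTES = 8    # a double word is eight bytes
LINE_DOUBLE_WORDS = 4    # each cache line holds four double words
LINE_BYTES = DOUBLE_WORD_BYTES * LINE_DOUBLE_WORDS


def return_order(miss_address):
    """Order in which double words of a line are returned after a miss.

    The double word containing the missed address comes first. The
    wraparound order of the other three is an assumption for illustration.
    """
    critical = (miss_address % LINE_BYTES) // DOUBLE_WORD_BYTES
    return [(critical + i) % LINE_DOUBLE_WORDS for i in range(LINE_DOUBLE_WORDS)]


if __name__ == "__main__":
    print(return_order(0x1018))  # miss falls in the last double word of its line
```

A miss at byte offset 0x18 within a line falls in double word 3, so under this assumption the line is returned in the order 3, 0, 1, 2.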

Bandwidth is maximized by using fast page mode when consecutive accesses reside on the same page. This is often the case when large blocks of memory are accessed and is very common in windowed graphics systems.

## The General System Connect Bus

The general system connect, or GSC, is the local bus that connects the three VLSI devices and the optional I/O card. The GSC bus is designed to provide maximum bandwidth for memory-to-graphics transfers. The bus has 32-bit multiplexed address and data lines to minimize the number of signals. Other features of the bus include:

- Operation at half the CPU frequency (30 or 40 MHz)
- Support for 1-, 2-, 4-, 8-, 16-, or 32-byte transactions
- Central arbitration
- Parity generation and checking.

Normally, bus transactions are terminated by a turnaround state that allows drivers to be turned off before the drivers for the next transaction are turned on. To improve graphics performance, the bus supports back-to-back writes to the same device without the turnaround state. This improves throughput on transfers of large blocks of data from main memory to graphics.
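A rough model shows why dropping the turnaround state helps. The bus width and clock rates below come from the article. The cycle counts (one address cycle and one turnaround cycle per transaction) are hypothetical values chosen for illustration, not GSC specifications.

```python
BUS_WIDTH_BYTES = 4  # 32-bit multiplexed address/data lines


def peak_bandwidth_mbytes_per_s(bus_mhz):
    """Raw ceiling: one 4-byte transfer on every bus clock."""
    return BUS_WIDTH_BYTES * bus_mhz


def write_efficiency(transaction_bytes, turnaround,
                     address_cycles=1, turnaround_cycles=1):
    """Fraction of bus cycles that carry data in a stream of writes.

    address_cycles and turnaround_cycles are assumed values, not taken
    from the GSC bus definition.
    """
    data_cycles = max(1, -(-transaction_bytes // BUS_WIDTH_BYTES))  # ceiling
    overhead = address_cycles + (turnaround_cycles if turnaround else 0)
    return data_cycles / (data_cycles + overhead)


if __name__ == "__main__":
    for mhz in (30, 40):
        print(mhz, "MHz bus:", peak_bandwidth_mbytes_per_s(mhz), "Mbytes/s ceiling")
    print("32-byte writes with turnaround:", write_efficiency(32, True))
    print("32-byte writes back to back:   ", write_efficiency(32, False))
```

Under these assumed cycle counts, removing the turnaround state raises the share of data-carrying cycles for 32-byte writes from 8 in 10 to 8 in 9. The real gain depends on the actual bus protocol timing.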

During transfers from memory to I/O, it is sometimes necessary to lock the CPU out of memory (e.g., when semaphores are used). To facilitate this, the GSC bus provides a locking mechanism, which prevents the CPU from accessing memory (to service a cache miss, for example).

## Graphics

The graphics subsystem consists of a graphics chip and four on-board VRAMs (video RAMs), which provide a 1024-by-768-pixel frame buffer with a depth of eight planes at a refresh rate of 72 Hz. An optional high-resolution VRAM board increases resolution to 1280 by 1024 pixels.
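The frame buffer storage implied by these figures is simple arithmetic, sketched below. Eight planes correspond to one byte per pixel. Applying the same depth to the 1280-by-1024 option is an assumption, since the article does not state the depth of the high-resolution mode, and the pixel rate ignores blanking intervals.

```python
def frame_buffer_bytes(width, height, planes=8):
    """Storage needed for one frame at the given depth in bit planes."""
    return width * height * planes // 8


def visible_pixel_rate(width, height, refresh_hz):
    """Visible pixels scanned out per second (blanking time ignored)."""
    return width * height * refresh_hz


if __name__ == "__main__":
    print(frame_buffer_bytes(1024, 768) // 1024, "K bytes for 1024 by 768")
    # Assumption: the high-resolution option also uses eight planes.
    print(frame_buffer_bytes(1280, 1024) // 1024, "K bytes for 1280 by 1024")
    print(visible_pixel_rate(1024, 768, 72), "visible pixels per second at 72 Hz")
```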

The graphics chip was designed with the other system components to provide high performance at a minimal cost. For more information on the graphics chip, see reference 3 and the article on page 43.

## Built-in I/O

The Model 712 features a number of built-in I/O devices that are intended to address the needs of the majority of users.

Support for these functions is provided largely by the LASI I/O VLSI chip. LASI is a highly integrated chip that provides a significant reduction in system cost and increased reliability. The chip is packaged in a 240-pin MQUAD package. The LASI chip is described in more detail in the article on page 36 and in reference 4.

The following sections briefly describe the LASI chip's built-in capabilities.

**IEEE 802.3 LAN.** LASI contains an Intel 82C596 megacell which was ported to work with HP's IC process. The LAN transceiver, which was not practical to include on LASI, is loaded on the printed circuit board. The transceiver interfaces to both the AUI (attachment unit interface) and Ethertwist media.

**SCSI.** The Model 712 uses an 8-bit single-ended SCSI interface for the optional internal hard drive and external peripherals. The SCSI-2 interface is implemented entirely within LASI through a megacell that was designed by HP and NCR. A netlist for the NCR 53C710 was imported into HP's design environment. The design was then tuned to work in HP's IC process.

By keeping the SCSI bus stub length to a minimum on the printed circuit board and on the connection to the optional internal drive, SCSI termination on the internal side is greatly simplified. Short stub lengths allow the bus to be terminated on the printed circuit board, whether the optional internal drive is present or not. This saves cost by obviating the need for special terminators which would otherwise have to be enabled or disabled (manually or electrically), depending on the presence or absence of the optional internal drive.

**Audio.** 16-bit CD-quality audio playback and record capability is provided by the audio circuitry, which consists of a Crystal Semiconductor CS4216 CODEC and supporting circuitry. The LASI chip also includes the serial interface to the CS4216. Headphone, microphone, and line-in connectors are located on the rear panel. Standard sampling rates include 8, 44.1, and 48 kHz.
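To give a sense of the data rates involved, the sketch below computes the uncompressed PCM byte rate at each standard sampling rate. The two-channel default is an assumption for illustration; the article gives the sample width and sampling rates but not the channel count.

```python
STANDARD_RATES_HZ = (8_000, 44_100, 48_000)  # sampling rates from the article


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample=16, channels=2):
    """Uncompressed PCM data rate.

    channels=2 (stereo) is an assumed default, not stated in the article.
    """
    return sample_rate_hz * (bits_per_sample // 8) * channels


if __name__ == "__main__":
    for rate in STANDARD_RATES_HZ:
        print(rate, "Hz:", pcm_bytes_per_second(rate), "bytes/s")
```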

**Real-Time Clock**. A real-time clock is designed into the LASI chip. Battery backup keeps time while the workstation is powered down.

**PS/2**. There are two PS/2 connectors on the rear panel that allow connection to a low-cost industry-standard keyboard and mouse. The PS/2 interface circuitry is integrated into the LASI chip.

**RS-232**. An RS-232 interface has also been designed into the LASI chip. The Model 712 buffers the signals with a MAXIM 211 to provide an RS-232 serial port. LASI buffers inbound and outbound data with 16-byte FIFOs, at baud rates from 50 to 454 kbits/s.
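The FIFO depth determines how long the receiver can go unserviced before data is lost. The sketch below estimates the time to fill a 16-byte FIFO at a given line rate. It assumes 10 bits on the wire per character (one start bit, eight data bits, one stop bit), which is a common framing but is not specified in the article.

```python
FIFO_BYTES = 16  # FIFO depth from the article


def fifo_fill_time_us(bits_per_second, bits_per_char=10):
    """Microseconds for a continuous stream to fill an empty FIFO.

    bits_per_char=10 assumes 8N1 framing, which is an illustrative choice.
    """
    return FIFO_BYTES * bits_per_char / bits_per_second * 1e6


if __name__ == "__main__":
    for rate in (9_600, 454_000):
        print(rate, "bits/s:", round(fifo_fill_time_us(rate), 1), "microseconds")
```

Under this assumption, at the top rate of 454 kbits/s the FIFO fills in roughly 350 microseconds.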

**Parallel.** The LASI chip also provides a parallel port conforming to the Centronics industry standard.

**Flexible Disk Support.** A Western Digital WD37C65C flexible disk controller interfaces LASI to an optional internal personal-computer-style flexible disk drive.

**Flash EPROM.** An 8-bit bus on the LASI chip is demultiplexed by two 74CHT374 latches to provide the address and data lines necessary to address the two 128K-byte flash EPROMs that contain the boot firmware. The flash EPROMs are also used to store configuration parameters, eliminating the need for an EEPROM and its associated cost.

**I/O System Support.** LASI provides a number of miscellaneous I/O system support functions, including:

- Clock generation. LASI derives all the necessary clocks required by the I/O circuitry from the main system clock. It does so by using simple divide-by-n counters and two digital phase-locked loops.
- System arbitration support. LASI arbitrates GSC bus requests from the I/O devices within LASI, as well as from the CPU and optional expansion card.
- Interrupt support. LASI also provides and manages external interrupt capability for the various I/O devices.

## Optional I/O

For those users who need functionality beyond that provided by the built-in I/O, the Model 712 includes two personality slots that can be configured with a variety of other I/O functions. The first of these slots is referred to as the expansion slot and includes a connection to the GSC bus. The second slot provides a connection to the serial audio stream, and is intended for telephone functions. This slot is called the telephony slot.

**Expansion Cards.** Expansion cards are optional cards that connect directly to the GSC bus to provide a variety of other I/O functions.

Since LASI has a configurable address space and can be configured as an arbitration slave, many of the expansion cards rely on a second LASI chip to implement much of their functionality.

The following optional expansion cards are provided for the Model 712:

- Second serial port. The second serial port card uses its own LASI chip and support circuitry identical to that on the system board to provide an additional RS-232 port.
- Second LAN AUI and second serial interface. This card also uses a LASI chip and circuitry similar to that on the system board to add an additional IEEE 802.3 LAN with an attachment unit interface (AUI) and a second RS-232 interface.
- X.25 and second serial interface. A Motorola 68302 multiprotocol processor interfaced to the 8-bit bus of a slave LASI provides X.25 networking to a 25-pin X.21bis port for speeds of 1.2 kbits/s to 19.2 kbits/s. The second RS-232 serial interface is implemented in the same fashion as the other cards.
- Second display. A second display can be added to the system with the second display card. This card duplicates the graphics functionality that is already built into the system board by replicating the graphics chip and its supporting circuitry.
- Token Ring/9000. The Token Ring/9000 card provides IEEE 802.5 LAN functionality through the use of a Texas Instruments token ring controller chip and a custom ASIC that provides the GSC interface. Unshielded and shielded twisted pair connections are provided at data rates from 4 Mbits/s to 16 Mbits/s.
- Second display and second LAN AUI/RS-232. This option combines the features of the second graphics display and the second LAN AUI/RS-232 options. Since the circuitry for this option would not fit on a single expansion slot card, some of the circuitry resides on a daughter card that is connected to the expansion slot card. The daughter card gets power and mechanical support through the telephony connector, so when this option is installed, the telephony option is not available.

**Fig. 2.** Block diagram of the Model 712 audio and telephony circuits.

**Telephony.** The telephony card installs in the telephony slot and provides two lines of telephone access. Each of the lines can be configured to support voice, data modem, or fax modem.

The system board's headset and microphone serve as the human interface for voice telephony, and an interface chip on the telephony card called XBAR links the system board's audio circuitry to the telephony functions (see Fig. 2).

This arrangement allows recording and playback during telephone conversations. It also supports digital mixing of microphone, line-in, telephone, and prerecorded audio. Caller-ID decoding is supported, as are DTMF (dual-tone multifrequency) encoding and decoding, and dual-line conferencing.

The XBAR chip serves to route information between the LASI I/O chip, the audio CODEC, and the DSP blocks in a variety of programmable ways. Data is transferred to and from the system board through two serial data paths. Two additional serial paths send and receive data to and from the DSPs. Two 8-bit parallel ports are used by the DSPs during the DSP boot process. XBAR has a few other functions, including receiving incoming phone rings and controlling phone line hook status.

Each DSP subsystem consists of an Analog Devices ADSP2101 processor and 32K by 24 bits of external 20-ns SRAM for DSP programs and data. Each processor has two serial ports, one for XBAR and the other for the Analog Devices AD28mps01 analog front end (phone CODEC). Each phone CODEC connects to a standard two-wire telephone line through a Silicon Systems Incorporated 73M9002 data access arrangement, which provides the isolation circuitry required by communications regulatory agencies.

The telephony card is described in more detail in the article on page 69.

**Fig. 3.** The Model 712 system board.

**Fig. 4.** The Model 712 system board construction.

## Printed Circuit Board Design

The Model 712 system contains a single printed circuit board called the system board. Fig. 3 shows a photograph of the system board. The system board supports all the functionality of the Model 712 system except for the optional boards and peripherals.

The system board is 10 layers deep, and has 0.005-inch traces and spaces. It measures 11.4 inches by 5.6 inches and uses double-sided surface mount technology.

The board construction shown in Fig. 4 was designed with the printed circuit board vendor to ensure that the least costly materials were chosen to obtain the necessary electrical parameters. Although it is designed to exhibit specific trace impedances, the blank printed circuit board is not a controlled-impedance design, which saves cost. The finished board size is optimized to make the best use of standard subpanel sizes used by the printed circuit board vendor. Although the board does use 0.005-inch traces and spaces, these minimum geometries are used only when necessary. Whenever possible, less aggressive routing is used to help with board yield and to keep down the cost of the board.

The design of the blank printed circuit board presented a number of technical challenges and some cost-saving opportunities.

**Performance Challenges.** The clock and cache layouts presented some very special challenges in designing the printed circuit board.

Fig. 5 shows a simplified block diagram of the clock circuit used in the Model 712. All ECL circuitry is powered from the V<sub>cc</sub> supply, and all clock receivers in the VLSI chips are designed to operate at these shifted ECL voltage levels. This saves the cost of additional supply voltages and level translators. The master clock is first buffered, and multiple copies are routed to the receiving VLSI chips. This way, the delay to each device can be independently controlled to minimize clock skew and maximize system performance. Clocks are all routed on inner layers, where propagation delay is better controlled because of the trace's stripline nature. The clocks are driven as differential pairs, and the two traces of each pair are routed close to each other to minimize differential noise generation and susceptibility. The clock circuitry also features an interesting termination scheme. This pi-termination network is designed to approximate the same load as other more traditional termination schemes. However, it has the advantage of using zero supply current and fewer parts.

Fig. 6 shows a conceptual representation of how the cache is routed. The cache line is routed to minimize cache address drive delay. This arrangement also cuts down on the number of vias and maintains an unbroken ground plane. Address lines are routed from the CPU to the first via split on inner layers, where the impedance is close to half that of the outer layers. This is to better match the impedance of the traces on the two outer layers, which are essentially in parallel.
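The impedance-matching argument can be written out briefly. The symbols $Z_{\text{outer}}$ and $Z_{\text{inner}}$ are introduced here for illustration and do not appear in the original article. At the via split, the inner-layer trace drives two outer-layer traces in parallel, so the load it sees is

$$Z_{\text{load}} = \frac{Z_{\text{outer}} \cdot Z_{\text{outer}}}{Z_{\text{outer}} + Z_{\text{outer}}} = \frac{Z_{\text{outer}}}{2}.$$

The reflection coefficient at the split is

$$\Gamma = \frac{Z_{\text{load}} - Z_{\text{inner}}}{Z_{\text{load}} + Z_{\text{inner}}},$$

which goes to zero when $Z_{\text{inner}} = Z_{\text{outer}}/2$. An inner-layer impedance close to half that of the outer layers therefore keeps reflections at the split small, assuming the two outer-layer branches have equal impedance.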

**Fig. 5.** The clock circuit used in the Model 712 system.

**Fig. 6.** A conceptual representation of the cache layout in the Model 712.

**EMC and EMI Control.** In addition to more traditional methods of EMC and EMI control, the Model 712 system board uses features built into the blank printed circuit board to mimic the functionality of equivalent discrete designs. However, since they are built into the printed circuit board, their benefits are essentially free.

Small spark gaps are placed near many of the connectors to help control ESD. These spark gaps are simply very small trace segments separated at minimum geometries to provide a shunt path for ESD energy from signal to ground.

To control RFI, the printed circuit board makes use of a number of buried capacitors. Buried capacitors are essentially small capacitors whose plates are all or part of the printed circuit board's signal or ground layers. The dielectric material of the printed circuit board serves to separate the plates of the capacitors. Each power plane is effectively bypassed to ground by placing a ground plane in close proximity to it. Furthermore, some signals are also bypassed to ground with small buried capacitors to shunt unwanted RFI energy to ground.
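The size of such a buried capacitor can be estimated with the parallel-plate formula, C = ε0 · εr · A / d. The sketch below applies it to a power and ground plane pair spanning the full 11.4-by-5.6-inch board. The plane separation of 0.005 inch and the relative permittivity of 4.5 are hypothetical values chosen for illustration, not taken from the article.

```python
EPSILON_0 = 8.854e-12   # permittivity of free space, F/m
METERS_PER_INCH = 0.0254


def plane_capacitance_farads(width_in, length_in, separation_in,
                             relative_permittivity):
    """Parallel-plate estimate for two overlapping board planes.

    Ignores fringing fields and any cutouts in the planes.
    """
    area_m2 = (width_in * METERS_PER_INCH) * (length_in * METERS_PER_INCH)
    separation_m = separation_in * METERS_PER_INCH
    return EPSILON_0 * relative_permittivity * area_m2 / separation_m


if __name__ == "__main__":
    # Board dimensions are from the article; separation and permittivity
    # are assumed values.
    c = plane_capacitance_farads(11.4, 5.6, 0.005, 4.5)
    print(round(c * 1e9, 1), "nF")
```

With these assumed values the estimate comes to roughly 13 nF. The real figure depends on the actual layer stackup shown in Fig. 4.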

## Conclusion

By taking the approach of designing from the ground up, the Model 712 hardware designers have optimized each part of the design to work together to provide outstanding performance at very low cost. Designing the VLSI components as a set balanced performance and cost and also simplified the interfaces between the devices. By building in the features wanted by most customers and making less common features available only on low-cost option boards, the system cost is minimized for most customers.

The Model 712 system performance is summarized in Table II.

**Table II.** Model 712 Performance

| Specification | 712/60 | 712/80 |
|---------------|--------|--------|
| SPECint92     | 58.1   | 84.3   |
| SPECfp92      | 85.5   | 122.3  |
| MFLOPS(DP)    | 12.8   | 30.6   |
| AIM APR II    | 44.5   | 73.8   |
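The relative performance of the two models can be read off Table II by dividing each 712/80 figure by the corresponding 712/60 figure and comparing the result with the 80/60 clock ratio. The sketch below does only that arithmetic. The article does not break down how much of the gain comes from the clock and how much from the larger, faster cache.

```python
# Figures copied from Table II: (712/60, 712/80).
TABLE_II = {
    "SPECint92": (58.1, 84.3),
    "SPECfp92": (85.5, 122.3),
    "MFLOPS(DP)": (12.8, 30.6),
    "AIM APR II": (44.5, 73.8),
}
CLOCK_RATIO = 80 / 60  # CPU clock frequencies from the article


def speedups():
    """Ratio of 712/80 to 712/60 for each benchmark in Table II."""
    return {name: fast / slow for name, (slow, fast) in TABLE_II.items()}


if __name__ == "__main__":
    print("clock ratio:", round(CLOCK_RATIO, 2))
    for name, ratio in speedups().items():
        print(name, round(ratio, 2))
```

Every ratio exceeds the clock ratio of about 1.33, which is consistent with the cache differences between the two models also contributing to performance.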

## Acknowledgments

This paper would not have been possible without the help of Rob Horning, Mike Diehl, Jeff Hargis, Paul Tucker, Steve Scheid, and Howell Felsenthal.

The design of the Model 712 hardware was a team effort, and many people are to be thanked for its success. Special thanks to the firmware team including Jeff Kehoe and Doug Feller, whose innovations helped keep the hardware simple. Thanks to the R&D team in Fort Collins, Colorado including Jim McLucas and James Murphy. Thanks to the R&D team in Cupertino, California including Alan Wiemann, Wayne Ashby, Sharon Ebner, Maria Lines, Danny Lu, Rob Snyder, Steve La Mar, Robert Lin, Daniel Li, Rayka Mohebbi, Pat McGuire, Jean Lundeen, Jeff Swanson, and Paul Rogers. Thanks also to the teams who designed the VLSI chips that were crucial to the success of the project including the PA 7100LC design team, the LASI design team, and the graphics chip design team, led by Paul Martin. Thanks to Spence Ure's manufacturing team for their insights. Thanks to the marketing team of Barry Crume, Steve Johnson, and Evan James for providing focus. Finally, thanks to the guidance of Cliff Loeb and Joe Fucetola, and to the vision of Denny Georg.

## References

1. Patrick Knebel, et al., "HP's PA7100LC: A Low-Cost Superscalar PA-RISC Processor," *Compcon Digest of Papers*, February 1993, pp. 441-447.
2. Steve Undy, et al., "A Low-Cost Graphics and Multimedia Workstation Chip Set," *IEEE Micro*, April 1994, pp. 10-22.
3. C. Dowdell and L. Thayer, "Scalable Graphics Enhancements for PA-RISC Workstations," *Compcon Digest of Papers*, February 1992, pp. 122-128.
4. Tom Spencer, et al., "A Workstation I/O System on a Chip," *Compcon Digest of Papers*, February 1994.