Starfire - A High Performance Enterprise Data Center Server

[Dec 97]

John Chan

The last issue mentioned that an extremely powerful Sun server had been installed to replace our Academic Unix environment and enhance its processing capacity. You may also have heard about, or even attended, the press conference held jointly by Sun Microsystems, ASL, and CityU to celebrate the launch of this machine, an UltraSparc Enterprise 10000 server. It was the first installation of its kind in Hong Kong, and indeed in Asia. You may be asking how good the server is, what its basic architecture is, how it helps from the users’ and system administrators’ points of view, and how it is incorporated into our existing Unix environment. Let us address each of these issues in turn.

Architecture

The UltraSparc Enterprise 10000 server, popularly known as the Starfire, is at the head of Sun Microsystems’ Ultra Enterprise X000 family of servers. The Starfire is built around an uncompromised, scalable symmetric multiprocessing (SMP) architecture. Its main components include a set of system boards, a centerplane, centerplane support boards, control boards, peripherals, and power and cooling subsystems.

The Starfire houses a group of system boards interconnected by a centerplane. A single system cabinet holds up to 16 of these system boards, each of which can be independently configured with processors, memory and I/O channels.

All of these system boards are managed by a control board that contains the system-level logic central to all system boards. This includes the system clock generator, temperature and airflow monitoring, and an interface to the external system console, which handles all diagnostics, boot, shutdown, and environmental monitoring.

At the heart of the Starfire is a newly designed centerplane system bus called the Gigaplane-XB Interconnect. Physically, it is a circuit board with two symmetrical sides, each mounting up to eight system boards, a centerplane support board, and a control board. Its design capitalises on the inherent advantages of the UltraSparc Port Architecture (UPA), Sun Microsystems’ standard I/O definition for the UltraSparc processors. A combination of improvements increases interconnect bandwidth tenfold over high-end bus-based systems, which raises system throughput and reduces memory latency. These improvements include separate address and data paths, a 16-byte datapath width, 16 separate datapaths giving each system board its own connection, and point-to-point wires instead of multi-drop buses.

Domain Concept

The Starfire is unique in bringing mainframe-style partitioning to the Unix world through its Dynamic System Domains, which allow a single Starfire to be logically divided into multiple systems, or domains. Each domain appears as a standalone, self-contained Sparc system with its own operating system and network connection. Domains are implemented through special capabilities of the Starfire hardware and the Solaris software, and are configured and controlled from a single system console, the System Service Processor (SSP). Domains can be created and deleted without interrupting the operation of other domains, and each domain can be administered and managed as though no other domains were present. Any number of system boards can be placed in any desired domain, and system boards can be added to or removed from a domain while its operating system is running, as long as sufficient resources remain in the domain. The domains share the Gigaplane-XB Interconnect but are isolated from each other: a software error or a CPU, memory, or I/O error in one domain will not affect any other domain. An extension of the domain feature is the ability to set up domain groups, in which separate domains share some or all of the memory within each domain.
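
The bookkeeping behind domains can be pictured with a small model. The following is only an illustrative sketch in Python, not the SSP software or any Sun interface: the class name `Starfire`, its methods, and the domain names are hypothetical, and the rule "sufficient resources still remain" is simplified here to "a domain must keep at least one system board".

```python
MAX_BOARDS = 16  # a single cabinet holds up to 16 system boards


class Starfire:
    """Toy model in which each system board belongs to at most one domain."""

    def __init__(self):
        self.domains = {}  # domain name -> set of board numbers
        self.owner = {}    # board number -> domain name

    def create_domain(self, name, boards):
        boards = set(boards)
        if name in self.domains:
            raise ValueError(f"domain {name!r} already exists")
        if not boards:
            raise ValueError("a domain needs at least one system board")
        for board in boards:
            self._check_free(board)
        self.domains[name] = boards
        for board in boards:
            self.owner[board] = name

    def delete_domain(self, name):
        # Other domains are untouched; the boards simply become free.
        for board in self.domains.pop(name):
            del self.owner[board]

    def attach(self, name, board):
        self._check_free(board)
        self.domains[name].add(board)
        self.owner[board] = name

    def detach(self, name, board):
        boards = self.domains[name]
        if board not in boards:
            raise ValueError(f"board {board} is not in domain {name!r}")
        # Simplified stand-in (an assumption) for "sufficient resources remain".
        if len(boards) == 1:
            raise ValueError(f"board {board} is the last board in {name!r}")
        boards.remove(board)
        del self.owner[board]

    def _check_free(self, board):
        if not 0 <= board < MAX_BOARDS:
            raise ValueError(f"no such board slot: {board}")
        if board in self.owner:
            raise ValueError(f"board {board} already belongs to {self.owner[board]!r}")


machine = Starfire()
machine.create_domain("one", {0, 1})
machine.create_domain("two", {2})
machine.attach("two", 3)   # grow a domain
machine.detach("one", 1)   # shrink another; board 1 becomes free
machine.attach("two", 1)   # and can be given to a different domain
print(machine.domains)
```

The point of the sketch is the isolation rule: a board is owned by exactly one domain at a time, so moving capacity between domains is a detach followed by an attach, and neither step touches the boards of any other domain.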

The primary benefits and use of dynamic system domains include:

RAS Features

Even the most powerful system is useless if it is not highly reliable, available, and serviceable. The Starfire addresses this issue by offering industry-leading RAS (reliability, availability, serviceability) features. The reliability features prevent failures from happening. The availability features keep the system operating in spite of failures. The serviceability features help correct failures quickly after they have occurred. Some of the key features are as follows: the system detects data integrity problems and corrects them when it can; all components in the system may be configured redundantly; most hardware maintenance can be performed via hot swap without powering down the system; and temperature-controlled, variable-speed internal fans ensure that airflow adapts to the actual thermal conditions within the system. These are just a few. Furthermore, redundancy options are available for disk drives and disk controllers.

The console functions for the Starfire are carried out by the System Service Processor (SSP). This is a Solaris workstation specifically configured for the Starfire and connected to it via a dedicated network. Detaching the console from the Starfire itself and using a dedicated network allows multiple console sessions on any client connected to that network. This provides several important capabilities, including remote administration and remote power control of the Starfire. Besides Solaris, the SSP runs several other software packages that come with the Starfire, among them Hostview, POST, and Netcon. Hostview is an enhanced graphical interface that helps the system administrator manage the Starfire’s hardware, domains, and dynamic reconfiguration functions. The Power On Self Test (POST) runs a comprehensive diagnostic check on every system component and makes sure each is functioning properly before allowing it to be configured into the system. It can also be used to test a replacement component before reattaching it to the running system. Netcon provides the remote console sessions mentioned above. All of these add to the RAS features by allowing errors to be detected beforehand and by minimising the downtime incurred while failed components are detached and later reattached.

Scalability

Widespread use of more powerful workstations and networked personal computers is placing an increased burden on servers. An effective solution is symmetric multiprocessing (SMP), which commits additional system resources as needed. The Starfire Gigaplane-XB Interconnect, a true point-to-point crossbar interconnect, provides both the overall system performance and the throughput necessary to support a large number of processors and to achieve near-linear SMP scalability. Using this high-performance interconnect (12.8 Gbytes/sec memory bandwidth, 6.4 Gbytes/sec peak I/O bandwidth), the Starfire is able to maintain a delivered bandwidth (10 Gbytes/sec) close to its peak, and a nearly constant, very low latency (500 nanoseconds) throughout its range of scalability.
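
To put the quoted figures side by side, here is a small arithmetic sketch in Python. The bandwidth numbers, the 16-byte datapath width and the 16 boards come from this article. The 100 MHz interconnect clock and the idea that each transfer ties up two boards (so that at most eight transfers run at once) are assumptions introduced here only to show one way the 12.8 Gbytes/sec peak could arise; the article does not state either.

```python
# Figures quoted in the article (Gbytes/sec).
PEAK_MEMORY_BW = 12.8
DELIVERED_BW = 10.0

# Taken from the architecture description.
DATAPATH_BYTES = 16
SYSTEM_BOARDS = 16

# Assumptions, NOT stated in the article.
ASSUMED_CLOCK_HZ = 100e6                 # hypothetical interconnect clock
ASSUMED_TRANSFERS = SYSTEM_BOARDS // 2   # each transfer occupies two boards


def crossbar_peak_gbytes(transfers, width_bytes, clock_hz):
    """Peak rate if every transfer moves one datapath width per clock."""
    return transfers * width_bytes * clock_hz / 1e9


estimated_peak = crossbar_peak_gbytes(
    ASSUMED_TRANSFERS, DATAPATH_BYTES, ASSUMED_CLOCK_HZ
)
efficiency = DELIVERED_BW / PEAK_MEMORY_BW

print(f"estimated peak under the assumptions: {estimated_peak:.1f} Gbytes/sec")
print(f"delivered / peak: {efficiency:.0%}")
```

Under these assumptions the estimate matches the quoted peak, and the delivered bandwidth works out to roughly 78% of it, which is the sense in which the delivered figure is "close to its peak".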

Service Redistribution

With the installation of the Starfire, most of the Academic Unix environment has been shifted to this system. Existing applications on other platforms and newly developed services will gradually be migrated to this server as well. General users will benefit most, as they can enjoy more reliable services, faster response times, and minimal downtime. Academic users will find that they can now run their programs, or conduct their research, on configurations that the older machines could not offer. In particular, the large disk capacity configured on the Starfire will be of great value to academic users, who normally require large amounts of disk space for their teaching and research projects.

The current configuration of the Starfire includes four system boards, sixteen 250MHz UltraSparc processors, 2Gbytes of memory, and around 130Gbytes of disk storage. It has been configured into three domains.

Domain One has eight processors, 1Gbyte of memory, and around 90Gbytes of disk storage. It mainly replaces the Academic Unix teaching and research environment formerly offered by the Sparc Datacenter 2000 and Sparcserver 1000. A range of academic applications is provided, covering compute-intensive work, statistical analysis, scientific analysis, numerical computation, database manipulation, and language compilers. This domain also hosts the Web server for the personal homepages of all staff and students. Every staff member and student is given a login account on this domain with at least 5Mbytes of disk quota.

Domain Two has four processors, 512Mbytes of memory, and about 25Gbytes of disk storage. It acts as the Intranet and network server, supporting the Mail service, the DNS name service, the NFS file service and the remote access or CityLink Plus service. These services were formerly offered by an obsolete Sun Sparcserver 690MP and a much slower Sun Sparcstation 4.

In addition, the infrastructure of the existing information services is being rebuilt using Intranet technology. The Intranet environment will enable the incorporation of new applications such as multimedia teaching support and will provide a uniform and integrated environment for teaching, learning and administration. A server with reasonable capacity is essential for the support of a campus-wide Intranet. Domain Two, together with Domain Three (which has the same configuration as Domain Two), has been set up for this purpose. Services that will be available very soon include the Help Desk System, the Staff Leave Balance System, the Computer Shop Buyer’s Guide and the Student Record System (Banner). Because of the nature of Domains Two and Three, users will not be able to log into these systems.
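
The per-domain figures above can be tallied against the stated totals. The sketch below is illustrative Python, not a configuration file used on the machine; the `DomainConfig` type and its field names are hypothetical. It assumes 1 Gbyte = 1024 Mbytes and infers four processors per system board from the maximum configuration (64 processors on 16 boards). The disk figures are approximate in the text, so their sum need not match the stated total exactly.

```python
from dataclasses import dataclass

PROCESSORS_PER_BOARD = 4  # inferred from the 64-processor, 16-board maximum


@dataclass(frozen=True)
class DomainConfig:
    name: str
    processors: int
    memory_mbytes: int
    disk_gbytes: int  # approximate in the article
    role: str

    @property
    def boards(self):
        return self.processors // PROCESSORS_PER_BOARD


DOMAINS = [
    DomainConfig("Domain One", 8, 1024, 90,
                 "Academic Unix teaching and research, personal homepages"),
    DomainConfig("Domain Two", 4, 512, 25,
                 "Intranet and network services"),
    DomainConfig("Domain Three", 4, 512, 25,
                 "Intranet and network services"),
]

totals = {
    "boards": sum(d.boards for d in DOMAINS),
    "processors": sum(d.processors for d in DOMAINS),
    "memory_mbytes": sum(d.memory_mbytes for d in DOMAINS),
    "disk_gbytes": sum(d.disk_gbytes for d in DOMAINS),
}

for d in DOMAINS:
    print(f"{d.name}: {d.boards} board(s), {d.processors} CPUs, "
          f"{d.memory_mbytes} Mbytes, ~{d.disk_gbytes} Gbytes disk")
print(totals)
```

The board, processor and memory tallies agree with the stated totals of four boards, sixteen processors and 2Gbytes; the disk tally comes to about 140Gbytes against the stated "around 130Gbytes", which is within the rounding of the per-domain figures.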

Future expansion

Given the current Starfire configuration, two expansion paths can be taken to increase its resources: adding more capacity to the existing domains, and adding two more domains (the maximum is five at the time of writing). The current configuration is still a long way from the upper limits of 64 processors and 64 Gbytes of memory. To cater for the growth of Intranet services, research work and personal web pages, both approaches will need to be taken so that the Starfire can provide the best service to all users.
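
The remaining headroom follows directly from the limits quoted here and the current configuration given earlier (sixteen processors, 2Gbytes of memory, three domains). A minimal sketch in Python, with hypothetical names:

```python
# Upper limits and current figures as quoted in the article.
LIMITS = {"processors": 64, "memory_gbytes": 64, "domains": 5}
CURRENT = {"processors": 16, "memory_gbytes": 2, "domains": 3}


def headroom(current, limits):
    """Return how much of each resource can still be added."""
    return {key: limits[key] - current[key] for key in limits}


def fraction_used(current, limits):
    """Return the share of each limit already in use."""
    return {key: current[key] / limits[key] for key in limits}


remaining = headroom(CURRENT, LIMITS)
used = fraction_used(CURRENT, LIMITS)

for key in LIMITS:
    print(f"{key}: {CURRENT[key]} of {LIMITS[key]} "
          f"({used[key]:.0%} used, {remaining[key]} remaining)")
```

On these figures the domain count is the constraint closest to its limit, while processors and memory have far more room to grow.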


[Issue No. 13]


Computing Services Centre
City University of Hong Kong
ccnetcom@cityu.edu.hk
