Upgrade Strategy for CityU’s SAN Infrastructure

by John Chan

The need for the SAN Storage Upgrade

Following the established policy of storage consolidation, a Network Storage System (NSS) based on an EMC storage subsystem was first installed in 2001 as the standard platform for providing central, shared storage for the entire University. Since then, this large SAN system has become an indispensable part of the University’s IT infrastructure, especially for the major mission-critical services. As data storage requirements grew over the years, the CSC decided in June 2004 to adopt a more cost-effective approach by introducing a two-tier enterprise storage subsystem in place of the traditional single-tier design that was installed initially. With that in place, storage could grow at a more flexible rate. The Tier-1 storage is used mainly for mission-critical services that require very high throughput and extreme reliability. The Tier-2 storage, set up for sufficient performance and high availability, is used for other important yet slightly less critical central IT services. With this approach, more and more services can adopt the central storage while the cost is kept at an acceptable level.

Since then, storage usage in both tiers has increased tremendously: Tier-1 by 20% and Tier-2 by as much as 400%. At the same time, the model used in each tier has reached its End of Life stage, meaning that no additional storage can be purchased for it.
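To put the two growth figures side by side, the short sketch below converts a percentage increase into a growth factor. It assumes that both percentages are increases relative to the usage when the tiers were introduced. The baseline of 10 TB is a purely hypothetical figure chosen for illustration, since the article does not state the actual capacities.

```python
def growth_factor(increase_percent):
    """Convert a percentage increase into a multiplier on the original usage."""
    return 1 + increase_percent / 100


def usage_after_increase(baseline_tb, increase_percent):
    """Return the usage after growing a baseline by the given percentage."""
    return baseline_tb * growth_factor(increase_percent)


# Hypothetical baseline, not a figure from the article.
BASELINE_TB = 10

for tier, increase in (("Tier-1", 20), ("Tier-2", 400)):
    factor = growth_factor(increase)
    usage = usage_after_increase(BASELINE_TB, increase)
    print(f"{tier}: x{factor:.1f} -> {usage:.1f} TB from a {BASELINE_TB} TB baseline")
```

Under this reading, a 20% increase multiplies usage by 1.2, whereas a 400% increase multiplies it by 5, which is why Tier-2 was the more pressing capacity concern.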

The upgrade approach

Besides increasing the storage capacity, the upgrade must balance interoperability, technology, and performance so that all services can move to the new storage seamlessly. In addition to the two-tier storage subsystem, the SAN infrastructure consists of the Celerra NAS gateway, which serves as the File Sharing Engine for the whole staff LAN, and the Disk Library and Legato Backup software, which act as the Central Backup Engine for all services regardless of the storage they use. All these components must be well integrated. To keep the upgrade from becoming overly complicated and time-consuming, and to avoid affecting any of these components, the tier approach had to be preserved and similar models had to be used in each tier. It was therefore decided that the Symmetrix family and the CLARiiON family would continue to be used for the Tier-1 and Tier-2 storage subsystems, respectively, in the upgrade.

The Tier-1 storage consists of the Symmetrix DMX3-950 subsystem. With a much better design, more advanced features, and much faster processors, this system provides a nearly threefold performance gain over the obsolete DMX-1000, with double the Fibre Channel throughput, faster and larger disk drives, faster and larger cache memory, and triple the maximum number of disk slots.

The Tier-2 storage consists of the CLARiiON CX3-80 subsystem. Again, with a better design and faster processors, this system provides a 50% performance gain over the obsolete CX700, with double the Fibre Channel throughput, faster and larger disk drives, double the cache memory, and double the maximum number of disk slots.

The migration

The real challenge of the migration is to minimize the interruption to each service. Because a large number of services and servers are involved, this is not as easy as it seems. To achieve it, all components must first be brought to mutually “compatible” levels, which involves the firmware code level of the storage subsystems, the HBA driver version of each server, and the OS version of each server. Fortunately, bringing all of these to acceptable levels does not involve much interruption to each service. In addition, because most of the servers have been kept up to date with newer hardware and software, reaching the required levels in this preliminary process was much easier. Once this is done, it is just a matter of scheduling the storage migration for each service, one at a time. Choosing this host-based migration approach over storage-based migration minimizes the impact on each service and makes troubleshooting much simpler, since the storage subsystems need not be converted all at once. Furthermore, any interruption to a service, if needed, can be arranged around the peak period and usage requirements of that service.

At this point, one of several storage migration methods can be chosen for each server. Three were used: Volume Manager Mirroring, EMC MirrorView Data Synchronization, and File System Copy. Depending on the requirements of the server, most of these methods cause little or no interruption to the service. The last method, File System Copy, was used for a couple of services that required conversion between different storage subsystems. It requires a much longer interruption, since the migration needs to be done through the host. The whole migration process was completed within a two-month time span.
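The preparation and per-server planning described above can be sketched as a simple compatibility gate followed by a method choice. This is only an illustrative sketch, not the procedure the CSC actually followed: the `Server` fields, the version numbers in `REQUIRED`, and the rule for choosing between Volume Manager Mirroring and MirrorView are all assumptions made for the example. The only rule taken from the article is that File System Copy is used when a service must be converted between different storage subsystems.

```python
from dataclasses import dataclass

# Hypothetical minimum levels; real values would come from the vendor's support matrix.
REQUIRED = {"hba_driver": (2, 0), "os": (5, 3)}


@dataclass
class Server:
    name: str
    hba_driver: tuple          # HBA driver version, e.g. (2, 1)
    os: tuple                  # OS version, e.g. (5, 3)
    crosses_subsystems: bool   # data must be converted between different subsystems
    has_volume_manager: bool   # a host volume manager is available for mirroring


def is_compatible(server):
    """A server is ready only when both its HBA driver and OS meet the minimum level."""
    return (server.hba_driver >= REQUIRED["hba_driver"]
            and server.os >= REQUIRED["os"])


def choose_method(server):
    """Pick a migration method (the mirroring vs. MirrorView rule is an assumption)."""
    if server.crosses_subsystems:
        return "file system copy"          # longer outage, copy goes through the host
    if server.has_volume_manager:
        return "volume manager mirroring"
    return "MirrorView data synchronization"


def plan(servers):
    """Return (ready, blocked): servers to schedule one at a time, and those to upgrade first."""
    ready, blocked = [], []
    for server in servers:
        if is_compatible(server):
            ready.append((server.name, choose_method(server)))
        else:
            blocked.append(server.name)
    return ready, blocked


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = [
        Server("mail", (2, 1), (5, 3), False, True),
        Server("legacy", (1, 9), (5, 3), False, True),
    ]
    print(plan(inventory))
```

Keeping the compatibility check separate from the method choice mirrors the two phases in the text: servers that fail the check are upgraded first, and the rest are migrated one service at a time.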

It is anticipated that a more advanced class of storage subsystem will be introduced in the next three to four years, which would be a natural replacement for the subsystem in each tier. Given these advantages, this replacement strategy will continue in the future, making use of new technology to reduce the total cost of ownership, keep management effort to a minimum, and greatly boost performance for each critical service.