Upgrade Strategy for CityU’s SAN Infrastructure
By
John Chan
The need for the SAN Storage Upgrade
Following the established policy of storage consolidation, a
Network Storage System (NSS) based on an EMC storage subsystem was
first installed in 2001 as the standard platform for providing
central, shared storage for the entire University. Since then,
this large SAN system has become an indispensable part of the
University’s IT infrastructure, especially for the major
mission-critical services. As data storage requirements grew over
the years, the CSC decided in June 2004 to adopt a more
cost-effective approach, replacing the single-tier design that was
installed initially with a two-tier enterprise storage subsystem.
With that in place, storage growth can be accommodated more
flexibly. The Tier-1 storage is mainly used for mission-critical
services that require very high throughput and extreme reliability.
The Tier-2 storage, set up with sufficient performance and high
availability, is used for the other important yet slightly less
critical central IT services. Using this approach, more and more
services can adopt the central storage while the cost is kept at
an acceptable level.
Since then, storage usage in both tiers has increased tremendously:
Tier-1 by 20% and Tier-2 by as much as 400%. At the same time, the
model used in each tier has reached its End of Life stage, meaning
that no additional storage can be purchased for it.
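To make the growth figures concrete, the short sketch below converts a percentage increase into a resulting capacity. The starting capacity of 10 TB is purely an illustrative assumption and is not a figure from the University's installation; only the 20% and 400% increases come from the text.

```python
def grown_capacity(initial_tb: float, increase_percent: float) -> float:
    """Return the capacity in use after growing by the given percentage."""
    return initial_tb * (1 + increase_percent / 100)


# Hypothetical starting point of 10 TB in each tier (assumption, not source data).
tier1_after = grown_capacity(10, 20)    # a 20% increase is 1.2 times the original
tier2_after = grown_capacity(10, 400)   # a 400% increase is 5 times the original
```

Under that assumption, an increase of 400% means the Tier-2 usage is five times what it was, not four times.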
The upgrade approach
Besides increasing the storage capacity, the upgrade must balance
interoperability, technology, and performance in order to provide
a seamless move to the new storage for all services. In addition
to the two-tier storage subsystem, the SAN infrastructure consists
of the Celerra NAS gateway, which serves as the File Sharing Engine
for the whole staff LAN, and the Disk Library and Legato Backup
software, which act as the Central Backup Engine for all services,
regardless of the storage they use. All these components must be
well integrated. To keep the upgrade from becoming overly
complicated and time-consuming, and to avoid affecting any of these
components, the tier approach must first be preserved, and second,
similar models must be used in each tier. It was therefore decided
that the Symmetrix family and the CLARiiON family would continue
to be used for the Tier-1 and Tier-2 storage subsystems,
respectively, in the upgrade.
The Tier-1 storage consists of the Symmetrix DMX3-950 subsystem.
With a much better design, advanced features, and much faster
processors, this system provides a nearly threefold performance
gain over the obsolete DMX-1000, with double the Fibre Channel
throughput, faster and bigger disk drives, faster and larger cache
memory, and triple the maximum number of disk slots.
The Tier-2 storage consists of the CLARiiON CX3-80 subsystem.
Again, with a better design and faster processors, this system
provides a 50% performance gain over the obsolete CX700, with
double the Fibre Channel throughput, faster and bigger disk
drives, double the cache memory, and double the maximum number of
disk slots.
The migration
The real challenge of the migration is to minimize the interruption
to each service. Since many services and servers are involved, this
is not as easy as it seems. To achieve it, all components must
first be brought to mutually compatible levels: the firmware code
level of the storage subsystems, and the HBA driver version and OS
version of each server. Fortunately, bringing all these to the
acceptable levels does not involve much interruption to each
service. Moreover, because most of the servers have been kept up
to date with newer hardware and software, reaching the required
levels in this preliminary process is much easier. Once this is
done, it is just a matter of scheduling the storage migration for
each service, one at a time. Choosing this host-based migration
approach over storage-based migration minimizes the impact on each
service and makes troubleshooting much simpler, since the storage
subsystems need not be converted all at once. Furthermore, any
interruption to a service, if needed, can be arranged according to
the peak period and usage requirements of that service. Three
storage migration methods were used, chosen per server: Volume
Manager Mirroring, EMC MirrorView Data Synchronization, and File
System Copy. Depending on the requirements of the server, most of
these methods cause little or no interruption to the service. The
last method was used on a couple of services that required
conversion between different storage subsystems. This method
requires a much longer interruption since the data must be copied
through the host. The whole migration process was completed within
two months.
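The preparation and per-server method choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the version numbers in REQUIRED, the Server fields, and the rule for choosing between Volume Manager Mirroring and MirrorView are all hypothetical. The one rule taken from the text is that File System Copy is used when a service moves between different storage subsystems.

```python
from dataclasses import dataclass

# Hypothetical minimum levels; real values would come from the vendor's
# interoperability documentation, not from this article.
REQUIRED = {"firmware": (5, 2), "hba_driver": (7, 1), "os": (10, 0)}


@dataclass
class Server:
    name: str
    hba_driver: tuple              # installed HBA driver version, e.g. (7, 2)
    os: tuple                      # installed OS version, e.g. (10, 1)
    changes_subsystem_type: bool   # service moves between different subsystems
    uses_volume_manager: bool      # host runs a volume manager (assumed field)


def is_ready(server: Server, array_firmware: tuple, required=REQUIRED) -> bool:
    """True when firmware, HBA driver, and OS all meet the required levels."""
    return (
        array_firmware >= required["firmware"]
        and server.hba_driver >= required["hba_driver"]
        and server.os >= required["os"]
    )


def choose_method(server: Server) -> str:
    """Pick one of the three migration methods named in the article."""
    if server.changes_subsystem_type:
        # From the article: conversion between subsystems goes through the host.
        return "File System Copy"
    if server.uses_volume_manager:
        # Assumed rule: prefer host mirroring when a volume manager is present.
        return "Volume Manager Mirroring"
    return "EMC MirrorView Data Synchronization"


def plan(servers, array_firmware):
    """Return (name, method) for each ready server, in the order given."""
    return [(s.name, choose_method(s)) for s in servers if is_ready(s, array_firmware)]
```

Because each server is evaluated on its own, a server that is not yet at the required levels simply drops out of the plan and can be scheduled later, which mirrors the one-service-at-a-time approach.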
It is anticipated that a more advanced class of storage subsystem
will be introduced in the next three to four years, which would
be a natural replacement for the subsystem in each tier. Given the
advantages described above, this replacement strategy will continue
in the future, making use of new technology to reduce the total
cost of ownership, keep management effort to a minimum, and greatly
boost the performance of each critical service.