Enhancing Central Storage Infrastructure

by John Chan
 
 
Why Upgrade?
 
Since the establishment of City University, the Computing Services Centre (CSC) has been charged with the full responsibility of providing all central IT services to the University in support of its administrative, academic and research work. The CSC has closely followed IT industry standards and trends so as to provide the most robust and cost-effective solutions for these services. Many methodologies have been studied and adopted, including full-function, mature server and network virtualization techniques and Central SAN Storage Provisioning (which can also be viewed as a disk virtualization technique). These are two of the most important technologies in enterprise-class IT implementation today. Without them, it would be impossible to provide effective and efficient resource management to an enterprise organization such as our University. To date, almost all of the University’s critical services have been built on top of these two technologies.
 
When these two technologies are integrated, full-blown virtualization features can be utilized. It is therefore well justified to say that an enterprise-class Central SAN Storage Provisioning Architecture (CSA) must be present, since it is the most basic building block upon which the various services are built.
 
The first CSA was built in 2001 based on EMC storage subsystems. Since then, these subsystems have gone through two major generations. Storage Tiering technology was adopted along the way, and a two-tier storage infrastructure was used. The Tier-1 storage, built from much faster and more reliable disks, was used to serve mission-critical services. All other services were served by the Tier-2 storage, built from slower and less expensive disks.
 
Owing to the restructuring of the University, especially with the 3-3-4 curriculum implementation, more and more IT services are demanding large amounts of storage. These include, to name a few, the e-Learning service, the Enterprise DMS service, the Central SharePoint service, the Departmental Shared Drives and Servers services, the Central Virtualization Infrastructure service, and the Private Cloud service.
 
However, the existing storage subsystems, upgraded and expanded over time, have been in use for over 7 years, so it would not be cost-effective to buy and add more storage to these aging subsystems. New-generation subsystems must be used to take the CSA to the next level of high-quality provisioning.
 
 
Major Components Bought
 
The enhanced CSA consists of several major components, namely, the SAN Fiber Fabrics products, the Storage subsystems products, the SAN Protection products, and the Backup & Recovery subsystems products.
 
The new SAN Fiber Fabrics are an extension of the existing fabrics, using enhanced enterprise-class SAN switches and advanced networking protocols. OM4 cables are used to connect the various major components as well as all the servers and devices that will be using the CSA.
 
The Storage subsystems consist of three major components serving different functions. The advanced SAN storage subsystem, with an initial capacity of 140TB, will be a general-purpose storage subsystem for all services, while the advanced NAS subsystem, with an initial capacity of 48TB, will serve those services that specifically require NAS technology. Finally, the advanced high availability (HA) subsystem allows Storage Virtualization across various Data Center sites.
 
The SAN Protection products consist of the Replication Manager and the RecoverPoint series products, which provide high-performance data replication and data protection capabilities.
 
The Backup & Recovery subsystems consist mainly of the Networker Backup software, the Networker Backup servers, the Data Domain Deduplication Backup devices with an initial usable capacity of 22TB, and the advanced backup device with an initial usable capacity of 7TB. These subsystems allow faster data backup and recovery and reduce the storage space needed to hold multiple copies of the same data.
 
Highlighted Features of Each Component
 
The SAN Fabrics
 
The new-generation DS series SAN switches can support speeds of up to 16Gbps, and they are backward compatible with lower-speed devices down to 2Gbps. They can also deliver very high aggregate bandwidth, up to 768Gbps of end-to-end full-duplex data traffic, and have very low energy consumption.
 
Advanced SAN Storage Subsystem
 
The subsystem is a new state-of-the-art storage subsystem targeted at all types of services, especially mission-critical services that require very high throughput and extremely reliable storage media. By incorporating FAST VP and Flash technology, it can provide multi-tiered storage and improve response time several-fold. For instance, with 4.2% fewer disk drives, these new technologies can improve the response time by 39% with a 40% saving in operating costs.
 
Using this built-in multi-tiered storage technology, three tiers can be set up using SSD, FC and SATA-II disks. Tiering is based on the speed of the disks. For instance, the three tiers can comprise an Extreme Performance Tier using SSD disks, a Performance Tier using FC disks, and a Capacity Tier using SATA-II disks. The subsystem has the intelligence to allocate the appropriate tier of storage based on the service needs at a particular point in time. This is a remarkable advance in Storage Tiering technology.
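The idea of matching data to a tier according to how busy it is can be illustrated with a minimal sketch. This is not the subsystem's actual placement algorithm, which the article does not describe; the function name assign_tiers, the notion of a "slice" of data, the I/O counts and the slot counts per tier are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Tier names follow the article, fastest first. The slot counts are made-up
# illustration values, not figures from the actual subsystem.
FAST_TIERS: List[Tuple[str, int]] = [
    ("Extreme Performance (SSD)", 2),
    ("Performance (FC)", 3),
]
CAPACITY_TIER = "Capacity (SATA-II)"


def assign_tiers(io_counts: Dict[str, int]) -> Dict[str, str]:
    """Place the busiest data slices on the fastest tier that still has room."""
    # Rank slices from most to least active; ties are broken by name so the
    # result is deterministic.
    ranked = sorted(io_counts, key=lambda s: (-io_counts[s], s))
    placement: Dict[str, str] = {}
    position = 0
    for tier_name, slots in FAST_TIERS:
        for slice_id in ranked[position:position + slots]:
            placement[slice_id] = tier_name
        position += slots
    # Whatever did not fit in the faster tiers lands on the capacity tier.
    for slice_id in ranked[position:]:
        placement[slice_id] = CAPACITY_TIER
    return placement
```

Re-running such a function with fresh activity counts would move a slice up or down as its workload changes, which is the behaviour the paragraph above describes in general terms.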
 
The subsystem is designed for a 100% virtually provisioned environment. Virtual Provisioning presents a host, application, or file system with more storage than is actually physically allocated. The subsystem also provides “always-on” capability for uncompromising information availability. This is especially essential for mission-critical applications.
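Virtual Provisioning can be pictured with a small model in which a volume advertises a large size but draws physical blocks from a shared pool only when a block is first written. The sketch below is a simplified illustration under that assumption; the class names StoragePool and ThinVolume and the block counts are hypothetical and do not reflect the subsystem's internal design.

```python
from typing import Dict


class StoragePool:
    """Physical capacity shared by all thin volumes (hypothetical model)."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def allocate_block(self) -> None:
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("physical pool exhausted")
        self.used_blocks += 1


class ThinVolume:
    """A volume that advertises more space than it physically holds."""

    def __init__(self, pool: StoragePool, advertised_blocks: int) -> None:
        self.pool = pool
        self.advertised_blocks = advertised_blocks
        self.blocks: Dict[int, bytes] = {}

    def _check(self, block_no: int) -> None:
        if not 0 <= block_no < self.advertised_blocks:
            raise IndexError("block outside advertised size")

    def write(self, block_no: int, data: bytes) -> None:
        self._check(block_no)
        # Physical space is consumed only the first time a block is written.
        if block_no not in self.blocks:
            self.pool.allocate_block()
        self.blocks[block_no] = data

    def read(self, block_no: int) -> bytes:
        self._check(block_no)
        # Blocks that were never written occupy no space and read as empty.
        return self.blocks.get(block_no, b"")
```

In this model, two volumes of 1000 advertised blocks each can share a pool of only 100 physical blocks, as long as the blocks actually written stay within the pool.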
 
Advanced HA Subsystem
 
Basically, the subsystem delivers storage federation and virtualization. It creates a single point of control for a consolidated pool of federated resources and turns a diverse heterogeneous storage infrastructure into a centrally managed storage pool. Besides simplifying the management, it allows non-disruptive and transparent data mobility and high availability across heterogeneous storage arrays.
 
The subsystem enables the sharing of resources at two different data centers by allowing the creation of an Active/Active environment between them. It allows data volumes to be configured for simultaneous access by applications in two different locations, thereby enabling very efficient and balanced resource sharing and relocation. By providing high availability between the two sites during planned and unplanned events, it reduces interruption to applications to a minimum or eliminates it altogether.
 
Advanced NAS Subsystem
 
The subsystem is a new state-of-the-art unified storage subsystem targeted at all types of services, especially those that require File Services (NAS). Its modular architecture integrates hardware components for block, file, and object storage, with concurrent support for native NAS, iSCSI, Fibre Channel, and FCoE protocols. Like the new SAN storage subsystem, the new NAS subsystem incorporates FAST VP and Flash technology, so it can provide multi-tiered storage and improve response time several-fold. Using this technology, it can support Auto-Tiering of NAS volumes.
 
The subsystem is designed to deliver scalable performance using the MCx (multi-core optimization) technology. It can also balance the workload across all disks even if there is only one storage tier. This is especially beneficial to a NAS volume, whose workload will be balanced across all disks within the same NAS Pool.
 
The subsystem provides advanced built-in capacity optimization features, such as block-level deduplication and compression, as well as file system deduplication. It also supports virtual provisioning, which enables substantial savings in storage space.
 
 
How Will Central IT Services Benefit?
 
The new products with the above enhanced major features are most beneficial to the University in the following areas:
 
Faster and more reliable services
 
General services using the new SAN storage subsystem will benefit most from the 3-Tiering technology, since three tiers are more granular than the two tiers provided by the two older storage subsystems before the upgrade, and storage can be dynamically allocated and switched between tiers. For instance, storage allocated to the AIMS server can either be fixed at a certain tier or be set as dynamic based on current performance needs. When the latter is chosen, during student registration, all storage will be automatically switched to the Extreme Performance Tier to boost the data throughput of the AIMS service.
 
Besides the above, the new SAN storage subsystem has many other features that are crucial to the CSC in providing the Central Storage. In terms of performance, the current configuration comes with one processing engine and 96GB of cache, while the maximum configuration can go up to 4 engines and 1TB of cache. This large scalability range is crucial, especially for critical services. Furthermore, the new SAN storage subsystem offers a performance boost of 2 to 2.3 times over the two old storage systems we are currently using.
 
Active/Active Multiple Data Center sites enabled
 
The new HA subsystem basically extends the Central Virtualization Infrastructure to more than one data center site. With this in place, it is now possible to transparently move and relocate active Virtual Machines (VM) across different sites as deemed necessary.
 
The University as well as the External Auditors require the CSC to implement a Business Continuity Plan so that the CSC can respond quickly to a crisis or disaster that may destroy or severely cripple the University’s IT systems, especially the mission-critical services defined by the University, whose disruption would have a severe impact on the livelihood of the University’s stakeholders, service providers or suppliers. These services include the AIMS, Blackboard e-Learning, and Central DMS services. With the installation of the new HA subsystem, together with server clustering, it is now possible to provision a service across two different data center sites. Whenever a server failure or disaster occurs at one site, the service will still be provided by the server at the mirror site, thanks to the data mirroring or real-time data synchronization between the two sites that is fully provided by the new HA subsystem. There is no service interruption, and no data recovery is necessary.
 
Cater for more storage-demanding services
 
New services such as the Private Cloud and existing services such as the Central Enterprise DMS, the Central Virtualization Infrastructure, and e-Learning rely heavily on the availability of storage. In particular, the new SAN storage subsystem configuration comes with 204 disks for an initial capacity of 140TB, while the maximum configuration can hold as many as 1560 disks of different sizes and types, providing up to 1.5PB of usable storage capacity in one storage array. This large capacity for growth is essential in providing the Private Cloud, as the storage requirement for this service can be very large and unpredictable, and storage needs to be allocated and de-allocated dynamically and on demand.
 
Departmental servers as well as many other Central Web services rely heavily on the use of File Service. The initial configuration of the new NAS subsystem comes with 58 disks for an initial capacity of 48TB, while the maximum configuration can go up to 750 disks of different sizes and types, providing up to 3.0PB of raw capacity. This large capacity for growth will greatly ease the contention for storage among departments when they establish their departmental shared drives. On top of that, more generous quotas can be provided to departments, which will greatly benefit their daily administrative and academic activities.
 
Environmental boost
 
The new SAN storage subsystem was designed with as much Green Technology as possible. With its multi-core processors, it delivers more IOPS in a smaller footprint, for lower energy costs. Using FAST VP, the performance and cost requirements of different workloads can all be matched in the same storage array, saving energy and resulting in capital and operational cost savings. Comparing the new SAN storage subsystem with the two older models combined, the power consumption, heat dissipation, GHG emissions, and rack space of the former are about one-third of those of the latter. The annual energy cost for the former is around HKD80,598, while that for the latter is around HKD224,583, which is a significant annual cost saving.
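As a quick check on the two annual energy cost figures quoted above, the saving and the cost ratio can be worked out directly. The variable names below are chosen for illustration only; the two cost figures are the ones given in the article.

```python
# Annual energy costs quoted in the article, in Hong Kong dollars.
new_cost_hkd = 80_598    # new SAN storage subsystem
old_cost_hkd = 224_583   # two older models combined

# Absolute saving per year: 224,583 - 80,598 = 143,985.
annual_saving_hkd = old_cost_hkd - new_cost_hkd

# The new subsystem's cost as a fraction of the old cost.
cost_ratio = new_cost_hkd / old_cost_hkd

# The share of the old cost that is saved.
saving_fraction = annual_saving_hkd / old_cost_hkd

print(f"Annual saving: HKD{annual_saving_hkd:,}")
print(f"New cost as share of old: {cost_ratio:.1%}")
print(f"Share of old cost saved: {saving_fraction:.1%}")
```

The ratio comes to roughly 36%, which is consistent with the "about one-third" comparison stated for power consumption.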
 
Similarly, comparing the new NAS subsystem with older models, the power consumption and rack space of the former are around one-seventh and one-fourth, respectively, of those of the latter.
 
Faster backup and recovery
 
Coupled with the Central SAN Storage for the entire University was the setup of the Central Backup facility that provides backup and restore solutions for the SAN. Several generations of backup devices have been implemented over the years, with the latest being the use of Data Deduplication (DD) technology.
 
Basically, DD works by examining the files or volumes to be backed up. Without this technology, a traditional backup stores a significant amount of duplicate data, even when it is an incremental backup that covers only the modified files, because a modified file is copied in full. With the new technology, the deduplication algorithm analyzes the data and stores only the unique, changed elements of a particular file. This process provides, on average, a 10 to 30 times reduction in the storage capacity required. This means that, for example, a normal backup of 10GB to 30GB of data will be reduced to about 1GB kept in the DD backup device.
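The principle of storing only unique elements can be sketched as follows. This is a simplified illustration, not the algorithm used by the DD device: it assumes fixed-size chunks identified by SHA-256 hashes, whereas real products may use variable-length segments and other refinements. The class name DedupStore and the chunk size are hypothetical.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # bytes; an arbitrary illustration value


class DedupStore:
    """Keeps one copy of each unique chunk, however often it is backed up."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def backup(self, data: bytes) -> List[str]:
        """Store the data and return the list of chunk hashes that describes it."""
        recipe: List[str] = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk already seen in any earlier backup is not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its chunk hashes."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held, counting each unique chunk once."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

If a three-chunk file is backed up twice with only its middle chunk changed in between, the store holds four chunks rather than six, and both versions can still be restored in full. The reduction achieved in practice depends on how much of the data repeats across backups.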
 
The DD device acquired in this upgrade boosted the backup speed by 42% with 3 times more backup capacity.
 
Difficulties Encountered
 
Since this is a major revamp of the whole CSA, there are several major obstacles that need to be overcome.
 
Even though all the major new components occupy less data center rack space, the existing data center was so congested that it would not have been possible to buy and install these components without first adding more data center space. Fortunately, a new data center expansion was approved and configured just in time for installing these new devices.
 
In order to use the new storage, it is necessary to migrate the existing data from the old storage subsystems to the new ones, and this has to be done for each server that has been using the CSA. This is not an easy or straightforward task, as there are many compatibility issues involving server software, driver versions, firmware updates for the old storage subsystems, and so on.
 
To implement Active/Active Data Center sites, the multiple sites must be linked using fiber optic cables. Several networking changes have to be implemented first, including re-designing the backup network, the management network, and the service network.
 
 
Time Frame
 
We expect that the whole upgrade exercise will span 2 years, starting from 2013. Fortunately, the above difficulties can be resolved at various stages during the process. Furthermore, due to the complexity of the process, the upgrade will be implemented in phases, with the provision of new and additional storage being the highest priority. This storage was released in the second quarter of this year. Other new components will be ready in due course.