
The Definitive Guide To Building Highly Scalable Enterprise File Serving Solutions

Chris Wolf

Introduction to Realtimepublishers
by Sean Daily, Series Editor

The book you are about to enjoy represents an entirely new modality of publishing and a major
first in the industry. The founding concept behind Realtimepublishers.com is the idea of
providing readers with high-quality books about today’s most critical technology topics—at no
cost to the reader. Although this feat may sound difficult to achieve, it is made possible through
the vision and generosity of a corporate sponsor who agrees to bear the book’s production
expenses and host the book on its Web site for the benefit of its Web site visitors.
It should be pointed out that the free nature of these publications does not in any way diminish
their quality. Without reservation, I can tell you that the book that you’re now reading is the
equivalent of any similar printed book you might find at your local bookstore—with the notable
exception that it won’t cost you $30 to $80. The Realtimepublishers publishing model also
provides other significant benefits. For example, the electronic nature of this book makes
activities such as chapter updates and additions or the release of a new edition possible in a far
shorter timeframe than is the case with conventional printed books. Because we publish our titles
in “real-time”—that is, as chapters are written or revised by the author—you benefit from
receiving the information immediately rather than having to wait months or years to receive a
complete product.
Finally, I’d like to note that our books are by no means paid advertisements for the sponsor.
Realtimepublishers is an independent publishing company and maintains, by written agreement
with the sponsor, 100 percent editorial control over the content of our titles. It is my opinion that
this system of content delivery not only is of immeasurable value to readers but also will hold a
significant place in the future of publishing.
As the founder of Realtimepublishers, my raison d’être is to create “dream team” projects—that
is, to locate and work only with the industry’s leading authors and sponsors, and publish books
that help readers do their everyday jobs. To that end, I encourage and welcome your feedback on
this or any other book in the Realtimepublishers.com series. If you would like to submit a
comment, question, or suggestion, please send an email to feedback@realtimepublishers.com,
leave feedback on our Web site at http://www.realtimepublishers.com, or call us at 800-509-0532 ext. 110.
Thanks for reading, and enjoy!

Sean Daily
Founder & Series Editor
Realtimepublishers.com, Inc.

Table of Contents

Introduction to Realtimepublishers.................................................................................................. i
Chapter 1: Moving Beyond Current File Serving Philosophies ......................................................1
State of the World ............................................................................................................................1
Performance Challenges ......................................................................................................1
Management Challenges......................................................................................................1
Availability Challenges........................................................................................................2
Growth of Managed Data.....................................................................................................2
Today’s File Serving Landscape..........................................................................................2
Standalone Servers...................................................................................................3
DFS ..........................................................................................................................5
NAS Appliances.......................................................................................................7
Failover Clusters ......................................................................................................8
Cluster Architecture .................................................................................................9
Shared Data Clusters..............................................................................................10
Current Storage Architectures........................................................................................................14
SCSI ...................................................................................................................................14
SATA .................................................................................................................................15
FC and SANs .....................................................................................................................16
Switches and Hubs.................................................................................................17
Router.....................................................................................................................17
FCIP and iFCP ...................................................................................................................18
iSCSI ..................................................................................................................................18
Clustered File Serving Gaining Momentum ..................................................................................19
High Availability ...............................................................................................................19
Consolidation Advantages .................................................................................................20
Drive Toward Standardization...........................................................................................20
Summary ........................................................................................................................................21
Chapter 2: Taming Storage Growth—A Modern Perspective.......................................................22
Current Storage Problems ..............................................................................................................22
Availability ........................................................................................................................22
Growth ...............................................................................................................................23
Management.......................................................................................................................23
Expanding Backup Windows.............................................................................................23


Existing Storage Solutions.............................................................................................................24


SAN....................................................................................................................................24
NAS Filers .........................................................................................................................24
DFS ....................................................................................................................................25
Virtualization .....................................................................................................................25
Storage Virtualization ............................................................................................25
Server Virtualization..........................................................................................................32
Virtual Machines....................................................................................................32
Shared Data Clusters..............................................................................................33
Comparing Virtual Machines and Shared Data Clusters .......................................34
Examining Unappliance vs. Appliance Solutions..........................................................................35
Proprietary vs. Open Solutions ..........................................................................................36
Volume Economics............................................................................................................37
Integration with Existing Infrastructure and Investments..................................................37
The Scalability Dilemma ...................................................................................................37
Backup Challenges.............................................................................................................38
Taming Server and Storage Growth—the Non-Proprietary Approach..........................................40
Storage Consolidation via SAN .........................................................................................40
Server Consolidation via Clustering ..................................................................................41
Planning for Growth While Maintaining Freedom........................................................................41
Summary ........................................................................................................................................42
Chapter 3: Data Path Optimization for Enterprise File Serving ....................................................43
The Big Picture of File Access ......................................................................................................43
Availability and Accessibility........................................................................................................44
Redundant Storage .........................................................................................................................45
RAID Levels ......................................................................................................................45
RAID 0...................................................................................................................45
RAID 1...................................................................................................................46
RAID 5...................................................................................................................47
RAID 0+1 ..............................................................................................................47
RAID 1+0 ..............................................................................................................49
RAID 5+0 ..............................................................................................................49
Hardware vs. Software RAID ............................................................................................51


Hardware RAID .....................................................................................................51


Software RAID ......................................................................................................52
Redundant SAN Fabrics ................................................................................................................53
Elements of the Redundant SAN .......................................................................................53
Managing the Redundant SAN ..........................................................................................54
Redundant LANs ...........................................................................................................................55
Redundant Power ...........................................................................................................................56
Redundant Servers .........................................................................................................................57
Shared Data Clusters..........................................................................................................57
Failover Clusters ................................................................................................................57
Proprietary Redundant Servers ..........................................................................................58
Eliminating Bottlenecks.................................................................................................................58
Architectural Bottlenecks...................................................................................................60
Single NAS Head...................................................................................................60
Single File Server...................................................................................................60
Load Balancing ..............................................................................................................................61
Managing the Resilient Data Path..................................................................................................61
Summary ........................................................................................................................................63
Chapter 4: Building High-Performance, Scalable, and Resilient Windows File Serving Solutions ..........64
Managing High-Performance and Availability Across a Windows Infrastructure........................64
VDS....................................................................................................................................64
VSS ....................................................................................................................................66
Shadow Copies for Shared Folders....................................................................................67
Shadow Copies for Shared Folders Basics ............................................................68
Enabling Shadow Copies for Shared Folders Support...........................................68
Recovering Previous Versions of a File.................................................................72
Enhanced Storage and File Serving Support .....................................................................74
Multipath I/O Support............................................................................................74
STORport Driver Support......................................................................................74
iSCSI Support ........................................................................................................74
Improved Offline Files Support .............................................................................75
The Microsoft Approach to High-Availability File Serving..........................................................79
MSCS.................................................................................................................................79


DFS ....................................................................................................................................80
AD Integration ...................................................................................................................81
Commercial File Serving Solutions ...............................................................................................81
PolyServe NAS Cluster......................................................................................................81
Symantec Cluster ...............................................................................................................82
Current Trends in Windows File Serving ......................................................................................82
Benefits of Consolidation ..................................................................................................82
Benefits of Shared Storage.................................................................................................83
Deploying Enterprise-Class Windows File-Serving Solutions......................................................83
Pre-Deployment Considerations ........................................................................................83
Validating Server and Storage Requirements ....................................................................84
Summary ........................................................................................................................................85
Chapter 5: Building High-Performance, Scalable, and Resilient Linux File-Serving Solutions...86
Challenges Facing the Linux File-Serving Landscape ..................................................................86
Performance .......................................................................................................................87
Scalability ..........................................................................................................................87
Availability ........................................................................................................................88
Integration ..........................................................................................................................88
Existing Linux File-Serving Solutions...........................................................................................89
Standalone..........................................................................................................................89
NAS....................................................................................................................................89
DFS ....................................................................................................................................90
Clustered ............................................................................................................................90
Failover Clustering.................................................................................................91
Load-Balanced Clustering .....................................................................................92
LVS Architecture ...............................................................................................................93
LVS via NAT.........................................................................................................94
LVS via IP Tunneling ............................................................................................94
LVS via Direct Routing .........................................................................................94
Commercial File-Serving Solutions...............................................................................................95
PolyServe NAS Cluster......................................................................................................95
VERITAS Cluster ..............................................................................................................97
Red Hat Cluster Suite and Global File System..................................................................97


Deploying Performance-Based Scalable Linux File-Serving Solutions........................................98


Pre-Deployment Considerations ........................................................................................98
Server Sizing......................................................................................................................98
Storage Sizing ..................................................................................................................100
Managing Enterprise-Class Linux File Serving...........................................................................101
NFS ..................................................................................................................................101
What Is New in NFS v4? .....................................................................................102
NFS Setup Checklist ............................................................................................102
Samba...............................................................................................................................104
What Is Coming in Samba 4.0? ...........................................................................105
Samba Deployment..............................................................................................105
Current Trends in Linux File Serving..........................................................................................106
Migration from UNIX to Linux .......................................................................................106
Benefits of Consolidation ................................................................................................107
Storage Consolidation......................................................................................................107
Summary ......................................................................................................................................108
Chapter 6: Managing High-Performance, Scalable, and Resilient Data Across the Enterprise ..109
Challenges Facing Heterogeneous Networks ..............................................................................109
Inhibited Agility...............................................................................................................110
Complexity.......................................................................................................................110
Integration Concerns........................................................................................................110
IT Risk and Compliance Considerations .........................................................................110
Integrating Windows and Linux File-Serving Solutions .............................................................111
CIFS and NFS Integration ...............................................................................................111
Managing ACLs...............................................................................................................112
Integration with Existing Services ...................................................................................113
Backup and Recovery ..................................................................................................................113
Disaster Planning Essentials ............................................................................................114
Development ........................................................................................................115
Disaster Planning Roles .......................................................................................115
Traditional Backup Methodologies..................................................................................116
Snapshots .........................................................................................................................116
Server-Free Backups........................................................................................................117


Server-Less Backups........................................................................................................118
Archiving and Migration..................................................................................................119
Successful Backup Architectures.....................................................................................120
D2T ......................................................................................................................121
D2D......................................................................................................................121
D2D2T .................................................................................................................121
Benefits of Share-Data Approaches.................................................................................123
Comparison: Consolidated vs. Distributed Backup Architectures ..................................123
Distributed Approach...........................................................................................124
Consolidated Approach........................................................................................125
Data Recovery..................................................................................................................126
The Advantages of Freedom........................................................................................................127
Benefits of Avoiding Proprietary Solutions.....................................................................127
Uncapped Scalability and Performance ...........................................................................128
Architecture Flexibility....................................................................................................128
Freedom of Choice...........................................................................................................128
Summary ......................................................................................................................................129


Copyright Statement
© 2006 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that
have been created, developed, or commissioned by, and published with the permission
of, Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are
protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web
site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be
held liable for technical or editorial errors or omissions contained in the Materials,
including without limitation, for any direct, indirect, incidental, special, exemplary or
consequential damages whatsoever resulting from the use of any information contained
in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of
third parties. You are not permitted to use these trademarks, services marks or logos
without prior written consent of such third parties.
Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent
& Trademark Office. All other product or service names are the property of their
respective owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtimepublishers.com, please contact us via e-mail at
info@realtimepublishers.com.


Chapter 1: Moving Beyond Current File Serving Philosophies


The challenges that face file serving have evolved over the past few years, and the methods used
to meet those challenges have advanced as well. Today, many organizations view data
availability as critical, allowing for very small windows of system downtime. Compounding the
problems of maintaining data availability is the sheer volume of data that many organizations
must manage. The industry has moved from needing gigabytes of storage a few years ago to
eclipsing the terabyte or even petabyte range of managed storage.
This chapter will begin an exploration of how to build highly scalable enterprise file serving
solutions by looking at the current state of the world of file serving. Along the way, you will see
the many disk, server, performance, and availability choices at your disposal. After exploring the
countless available options, the chapter will examine how Information Technology (IT) as a
whole is modernizing its approach to file serving and data management. This chapter will
provide the foundation on which to build the rest of the guide.

State of the World


Today, file serving solutions come in many shapes and sizes, and there are several architectural methods for designing and deploying them. Many organizations don’t employ just one idea or methodology but are often faced with managing a collection of disparate technologies.

Performance Challenges
Performance problems often follow the pattern of a pendulum—they swing from one extreme to the other. On most networks, some servers are not working up to capacity, with physical resources under-utilized, while other servers are over-utilized, with users continually complaining about slow performance. In many cases, the resources needed to solve the problems of high-volume file serving are present on the network, but their distribution doesn’t allow all file servers to cohesively meet demand.

Management Challenges
In addition to performance challenges, managing a high volume of servers is a difficult task.
With each independent file server on your network, you are faced with the need to maintain
system hardware, software updates, and antivirus software in addition to a host of other
management tasks. To deal with the increased management requirements that are often the result
of network sprawl, many organizations are looking to achieve the following:
• Consolidate for the purpose of managing and maintaining fewer servers
• Consolidate and manage storage centrally
• Scale on-demand
• Centrally manage a collection of servers as a single computing resource
• Reduce software costs such as operating system (OS) and application licensing costs
Besides the management challenges faced, file serving continues to be challenged by availability
trials.


Availability Challenges
In 2004, the Gartner Group determined that the average cost of downtime worldwide was
$42,000 per hour. They also found that the average network experiences 175 hours of downtime
each year. Thus, based on Gartner’s determinations, it should not take your organization long to
recognize the importance of data availability. Even if an organization is far below the average
downtime and is down for 100 hours in a year, that time would equate to potentially $4,200,000
in lost revenue.
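To put those figures in perspective, the following minimal Python sketch multiplies an hourly downtime cost by a number of downtime hours. The $42,000 rate is simply the industry average cited above, and the downtime scenarios are illustrative rather than figures from any particular organization.

```python
# Back-of-the-envelope downtime cost estimate using the industry-average
# hourly figure cited above. The downtime scenarios are illustrative only.

def downtime_cost(hours_down, cost_per_hour=42_000):
    """Estimated revenue impact of the given amount of downtime."""
    return hours_down * cost_per_hour

for hours in (10, 100, 175):
    print(f"{hours:>3} hours of downtime -> ${downtime_cost(hours):,.0f}")
# 175 hours (the reported average) works out to $7,350,000 per year.
```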
Although the cost of downtime may be obvious and is certainly backed by some pretty
significant statistics from the Gartner Group, there are still countless organizations that simply
deal with downtime as if it’s an expected part of life in IT. In addition, many organizations
believe that the cost of downtime is eliminated once systems are backed up. When a company’s data is unavailable, its reputation may be damaged and customer confidence weakened as a result of the downtime. This is especially true with e-commerce: potential customers will likely not return to an unavailable Web site and will look to other options to buy their needed solution.
In many cases, if an organization’s data availability is unreliable, potential customers will
believe that the organization is also unreliable.
Although downtime for individual systems is inevitable, data does not have to be unavailable
during that period. System patches, hardware, and software upgrades are a required factor for all
networks, but the sole purpose of the network is to provide access to data. If one system must go
down for maintenance, why must the data be unavailable? With clustered file serving, server
maintenance or even failure will not significantly interrupt data access.

Growth of Managed Data


Over the past decade, storage growth has repeatedly exceeded the projections of most network
planners. Storage has continued to grow at an exponential rate, while the reliance each company
has on electronic data has increased as well. The result has been a need to manage an abundance
of storage while providing for fast access and high availability.

Today’s File Serving Landscape


Years ago, file serving was pretty simple. Today, file serving is much more complex, and there
are many approaches from which to choose. Today’s approaches to file serving include:
• Standalone servers
• Distributed file systems (DFSs)
• Network Attached Storage (NAS) appliances
• Failover Clusters
• Shared data clusters
This section will look at the current role of each of these architectures as well as the advantages
and disadvantages of each of these approaches.


Standalone Servers
Standalone servers represent the root origin of file serving, and today maintain a very large
presence in the file serving landscape. Figure 1.1 shows a typical standalone file server
implementation.

Figure 1.1: Standalone file server implementation.

Notice in the figure that storage scalability is addressed by attaching an external disk array to the server. Although the initial deployment and management of this type of architecture are usually simple, management generally becomes more difficult as the network scales. The file server implementation that this figure shows is generally referred to as a data island because access to the data is through a single path—the Local Area Network (LAN). Whether access is required for clients or for backup and restore operations, the data must travel over the LAN. For backup operations, this requirement might mean that backup and restore throughput is throttled by the speed of the LAN. A 100Mbps LAN, for example, would provide a maximum throughput of 12.5MBps (100Mbps divided by 8 bits per byte).
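The same arithmetic can be used to estimate how long a LAN-bound backup will take. The short Python sketch below assumes the LAN link is the only bottleneck and ignores protocol overhead, so real-world times will be somewhat longer; the 500GB data set is a hypothetical example.

```python
# Rough backup-window estimate for a LAN-bound (data island) file server.
# Assumes the network link is the only bottleneck and ignores protocol
# overhead; the 500GB data set size is a hypothetical example.

def backup_hours(data_gb, link_mbps=100):
    """Hours needed to move data_gb gigabytes over a link_mbps network link."""
    throughput_mb_per_sec = link_mbps / 8.0     # e.g., 100Mbps -> 12.5MBps
    seconds = (data_gb * 1024) / throughput_mb_per_sec
    return seconds / 3600

print(f"500GB over 100Mbps:  {backup_hours(500):.1f} hours")        # ~11.4 hours
print(f"500GB over 1000Mbps: {backup_hours(500, 1000):.1f} hours")  # ~1.1 hours
```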


Many organizations have combated the storage management shortcomings of standalone file
servers by implementing either a dedicated LAN or a storage area network (SAN) for backup
and recovery operations. Although this approach might solve immediate storage needs, it does
little for scalability and availability. With a single file server acting as the lone access point, any of several individual problems with that server can result in the complete loss of data access. For example, any of the following failures would result in data unavailability:
• Hardware failure, such as CPU, RAM, or motherboard
• Network failure
• Power failure
• Disk failure
• Malware
Aside from any element of system hardware representing a possible single point of failure,
having just one or even two access points to data can result in performance bottlenecks.

Chapter 3 will look at ways to combat the availability and performance bottleneck issues associated
with standalone file servers.

How do most organizations overcome file serving performance issues as their networks grow?
Most simply add file servers. If one server is becoming overtaxed, an organization will order
another server and move some of the shares on the overburdened server to the new server. This
approach to growth is simple and has certainly been tested over time. However, adding servers
also means that administrators have more systems to manage. This load will ultimately include
additional work in hardware, software, and patch management. In addition, administrators will
be faced with the task of updating login scripts to direct clients to the new servers. Thus, in
addition to the cost of the new servers, there will ultimately be increased software and
administrative costs associated with the addition of the new server.
Although adding servers to the network is an inevitable part of growth, there are other
technologies that can assist in the scalability issues that surround file serving today. The next few
sections will look at alternative methods that can be either substituted for or complement the
addition of file servers to the LAN.


DFS
The use of a DFS to manage file serving has been a growing trend in recent years. In short, a
DFS enables the logical organization of file shares and presents them to users and applications as
a single view. Thus, an organization’s 200 file shares scattered across 12 servers may logically
appear as if they’re attached to a single server. Figure 1.2 illustrates the core concept of a DFS.

Figure 1.2: A simple DFS implementation.

With DFS, users can access network shares via a DFS root server. On the DFS root server,
administrators can configure a logical folder hierarchy, then map each folder to a share located
on another server on the network. Each physical location that is mapped in the DFS hierarchy is
referred to as a DFS link. The link will contain the Universal Naming Convention (UNC) path to
the actual location of the shared folder. When a user accesses a shared folder on the DFS server,
the user will be transparently linked to another physical server on the network.
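Conceptually, the DFS root acts as a lookup table that translates a logical folder in the namespace into the UNC path of the physical share. The following Python sketch models that resolution step; the namespace, server, and share names are invented for illustration and do not correspond to any real DFS configuration.

```python
# Conceptual model of DFS name resolution: the root maps logical folders
# (DFS links) to UNC paths of the physical shares. All names below are
# hypothetical examples.

DFS_ROOT = r"\\corp\public"
DFS_LINKS = {
    "Sales":       r"\\fileserver01\Sales$",
    "Engineering": r"\\fileserver07\EngData",
    "HR":          r"\\fileserver03\HR",
}

def resolve(logical_path):
    """Translate a path under the DFS root into the physical UNC path."""
    relative = logical_path[len(DFS_ROOT):].lstrip("\\")
    link, _, remainder = relative.partition("\\")
    target = DFS_LINKS[link]                  # the referral handed back to the client
    return target + ("\\" + remainder if remainder else "")

print(resolve(r"\\corp\public\Sales\Q3\forecast.xlsx"))
# -> \\fileserver01\Sales$\Q3\forecast.xlsx
```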


To illustrate this concept, compare DFS with traditional file serving: rather than presenting users with a separate mapped network drive for each share, administrators can simply map a single drive letter to the DFS root. Having a logical access layer in front of physical network resources offers several advantages:
• Administrators can change the physical location of shared data to support data
consolidation or relocation without interrupting user access
• Replicas can be created for folders at the DFS root, allowing files to be replicated
between multiple file servers
• With domain-based DFS, the DFS root can exist on multiple domain controllers, thus
adding fault tolerance to the DFS root itself
• Windows DFS is closely intertwined with Active Directory (AD), enabling users to
automatically be directed to shares that exist in their local site when multiple replicas of
the same shared folders exist

For more information about DFS, refer to Microsoft TechNet http://www.microsoft.com/technet and
search using the key word “DFS.”

By creating replicas of DFS links, administrators can add a level of fault tolerance to the file serving infrastructure. Because DFS integrates with AD sites, users accessing a link that contains multiple replicas will be directed to the replica location in their computers’ local site. The DFS root itself can also be replicated using domain-based DFS, making the root fault tolerant as well.
DFS solves a few of the scalability issues with file serving. With DFS in place, file servers can
be added without having any impact on users and drive mappings. Availability can be increased
by creating replica links for critical shares. If the replica links traverse two or more sites, an
organization will also have simple disaster protection in place.

DFS should not be considered a replacement for normal backups. Although DFS can transparently
maintain multiple copies of files across two sites, it does not prevent file corruption, erroneous data
entry, or accidental or intentional deletion. Thus, you should still back up your file server data to
removable media and store it at an offsite facility.

Although DFS can solve some of the data access and availability concerns of standalone file
servers, it does not help combat the server sprawl that administrators will have to contend with as
additional servers are added to the network. Each server will still need to be maintained as a
separate entity. DFS hides the complexity of the network infrastructure from end users and applications, but administrators aren’t so fortunate. As the network grows, administrators will be faced with managing and maintaining each server on the LAN.


NAS Appliances
NAS appliances began gaining momentum as a method to consolidate and simplify file serving in the late 1990s. They quickly gained popularity because they can be deployed quickly (often within minutes) and support terabytes of storage, which often allows several file servers to be consolidated into a single NAS. Figure 1.3 shows a typical NAS deployment.

Figure 1.3: A simple NAS deployment.

NAS devices are labeled appliances because an administrator can literally buy a NAS and plug it in. However, NAS devices restrict software choices. By restricting the software that can be installed, if any, NAS vendors are able to guarantee the reliability of their systems. Because the sole purpose of most NAS appliances is file serving, there isn’t much need to install applications.
Major vendors in the NAS space include Network Appliance, EMC, and Microsoft, which offers
a Windows Storage Server 2003 OS. Network Appliance and EMC provide both hardware and
their own proprietary NAS OS with each appliance. Microsoft does not ship NAS appliances.
Instead, it provides a NAS OS to vendors such as Dell and Hewlett-Packard, who ship NAS
appliances with the Windows Storage Server 2003 OS.
Because they are built for file serving, nearly all NAS appliances (including those from Network Appliance, EMC, and Microsoft) support the two most common network file sharing protocols: Common Internet File System (CIFS) and Network File System (NFS). Also, most NAS appliances include built-in redundant hardware as well as data management utilities.
The popularity of NAS has been attributed primarily to its quick deployment and the relative simplicity of administering the appliance; nearly all NAS appliances come with a simple-to-use Web-based administration tool.


As with other file serving approaches, NAS has a few drawbacks. Most NAS appliances come
with proprietary hardware and a proprietary OS. This shortcoming limits the flexibility of the
device in the long run. For example, an older and slower NAS appliance cannot later be used as a
database server. Also, the nature of proprietary solutions requires the purchaser to return to the
same NAS vendor to purchase hardware upgrades. Another challenge that has recently plagued
NAS is sprawl. For many network administrators that bought into the NAS philosophy of file
serving, adding capacity means adding another NAS. In time, many organizations have
accumulated several NAS appliances that are all independently managed.

Failover Clusters
Another approach to file serving involves the use of clusters. The simple definition of a cluster is two or more physical computers collectively hosting one or more applications. A major advantage of clusters is the ability for an application to move from one node to another in the cluster, a process known as failover. A storage device shared by all nodes in the cluster is needed so that an application sees a consistent view of its data regardless of the physical node that is hosting it. These capabilities are why the term cluster is so closely associated with high availability.
The two primary architectures available for file serving clusters are failover clusters and shared
data clusters. The difference between these architectures lies in how the cluster’s shared storage
is accessed. With failover clustering, one node in the cluster exclusively owns a portion of the
shared storage resource. If an application in the cluster needs to fail over to another node, the
failover node will need to mount the storage before bringing the application online. Figure 1.4
illustrates a failover cluster.

Figure 1.4: Failover cluster with SCSI-attached shared storage.

Notice that a heartbeat connection is also shown in the illustration. The heartbeat represents a
dedicated network over which the cluster nodes can monitor each other. In this way, a node can
determine whether another node is offline. If no dedicated heartbeat network is present, the
cluster nodes will monitor each other over the LAN.
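The heartbeat logic itself is conceptually simple, as the following simplified Python sketch shows: a node tracks when it last heard from its peer and declares the peer failed after several missed intervals. This is an illustration of the concept only; real cluster products add quorum, fencing, and redundant heartbeat paths.

```python
# Simplified illustration of heartbeat-based failure detection. Real cluster
# products add quorum, fencing, and network redundancy; none of that is
# modeled here. Node names are hypothetical.
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between expected heartbeats
MISSED_LIMIT = 3           # consecutive misses before declaring a node dead

class PeerMonitor:
    def __init__(self, peer_name):
        self.peer_name = peer_name
        self.last_seen = time.monotonic()

    def record_heartbeat(self):
        """Called whenever a heartbeat arrives from the peer node."""
        self.last_seen = time.monotonic()

    def peer_is_down(self):
        """True once the peer has missed MISSED_LIMIT heartbeat intervals."""
        return time.monotonic() - self.last_seen > HEARTBEAT_INTERVAL * MISSED_LIMIT

monitor = PeerMonitor("node1")
monitor.last_seen -= 10            # simulate 10 seconds with no heartbeats
if monitor.peer_is_down():
    print("node1 presumed failed: start failover of its virtual server")
```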


In a simple failover cluster, one node hosts an application, such as a file server inside a virtual
server. The virtual server acts as an addressable host on the network and has a unique host name
and IP address. The second node, the passive node, monitors the first node for failure. If the first
node becomes non-responsive, the second node will assume control of the virtual server. Many
popular OS vendors offer failover clustering support with their OSs. For example, Microsoft
Windows Server 2003 (WS2K3) Enterprise Edition and Red Hat Enterprise Advanced Server 4.0
with the add-on Cluster Suite both support as many as 8-node failover clusters. The open source
High-Availability Linux Project offers support for failover clusters of 8 nodes or more.
There are plenty of available failover clustering solutions on the market today. However, vendors
are also starting to embrace shared data clusters, which offer the same level of fault tolerance as
failover clusters, but several additional benefits as well.

Cluster Architecture
Clusters are typically described as either N-to-1 or N-Plus-1. In an N-to-1 architecture, one node
in the cluster is designated as the passive node, leaving it available to handle failover if an active
node in the cluster fails. Figure 1.5 shows a 3-node N-to-1 cluster.

Figure 1.5: A 3-node N-to-1 cluster.

Notice that Node1 is active for the virtual server FS-Sales and Node2 is active for the virtual
server FS-Acct. If either active node fails, Node3 will assume its role. In this architecture, Node3
is always designated as the passive node, meaning that when the primary active node returns
online following a failure, the service will fail back to the primary node. Although this approach
offers simplicity, having automatic fail back means that the failed service will be offline twice—
once during the initial failover and again during the fail back.


N-Plus-1 clustering offers a different approach. With N-Plus-1, a standby (passive) node can
assume control of a primary node’s service when the primary active node fails. However, when
the active node returns to service, it then assumes the role of the passive node. Thus, in time, the
active node for each service managed by the cluster may be completely different than at the time
the cluster was originally set up. However, automatic failback is not an issue with this approach,
thus providing for better overall availability.
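The practical difference between the two models is what happens after a failed node returns to service. The following Python sketch contrasts the two failback policies; the node and virtual server names are taken from the example above, and the functions are purely illustrative.

```python
# Contrast of N-to-1 vs. N-Plus-1 behavior when a failed node returns to
# service. Node and virtual server names are illustrative only.

def n_to_1_recovery(service, primary, standby):
    """N-to-1: the service always fails back, so it is offline twice in total."""
    print(f"{service}: {primary} failed -> running on {standby}")
    print(f"{service}: {primary} restored -> failing back (second outage)")
    return primary

def n_plus_1_recovery(service, primary, standby):
    """N-Plus-1: the recovered node becomes the new passive node; no failback."""
    print(f"{service}: {primary} failed -> running on {standby}")
    print(f"{service}: {primary} restored -> becomes the passive node")
    return standby

n_to_1_recovery("FS-Sales", "Node1", "Node3")    # service ends back on Node1
n_plus_1_recovery("FS-Sales", "Node1", "Node3")  # service stays on Node3
```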

Shared Data Clusters


Shared data clustering can also provide the benefit of high performance as well as load
balancing. Shared data clusters differ from failover clusters in how they work with shared
storage. In a shared data cluster, each node in the cluster simultaneously mounts the shared
storage resources. This approach provides far superior performance compared with failover clusters because no mount delays are encountered when an application fails over to another physical node in the cluster. With shared data clusters, multiple nodes in the cluster can access the shared data
concurrently; with failover clusters, only one node can access a shared storage resource at a time.
Figure 1.6 shows a shared data cluster. Notice that one of the key differences with the shared
data cluster is that a SAN is used to interconnect the shared storage resources.

Figure 1.6: Shared data cluster with SAN-attached storage.

The elements of the SAN cloud are discussed later in this chapter.


Shared data clusters have steadily grown in popularity as a result of their ability to address many of the problems facing today’s file serving environments. In particular, shared data clusters can
offer the following benefits:
• Provide more effective utilization of hardware resources
• Provide for simple scalability to accommodate growth
• Provide for high availability
Depending on whom you ask, industry analysts have found that average server CPU
consumption runs from 8 percent to 30 percent. Most organizations have several servers that
exhibit similar performance statistics. For example, consider an organization that has two servers
that average 10 percent CPU utilization. Consolidating the servers to a single system will not
only allow hardware to be more effectively utilized but also reduce the total number of managed
systems on the network.

Several organizations have turned to virtual machines as a means to further consolidate server
resources. Companies such as VMware and Microsoft provide excellent virtualization tools in this
arena. Although virtualization might make sense in many circumstances, a virtual machine is still a
managed system and will need to endure patch and security updates as with any other system on the
network. Virtual machines provide an excellent benefit in consolidation, especially when consolidating
legacy OSs running needed proprietary database applications, but they are not always the best fit for
file serving. Consolidating to virtual servers running on top of clusters not only allows you to maximize
your hardware investment but also reduces the number of managed systems on your network.

Like traditional failover clustering, shared data or cluster file system architectures involve the
use of virtual servers that are not bound to a single physical server. Virtual servers that exist in
the cluster can move to another host if their original host becomes unavailable.
Where cluster file systems differ is in their fundamentally unique approach to clustering. In
traditional clustering, each virtual server has its own data that is not shared with any other virtual
server. In shared data cluster computing, multiple virtual servers can export the same data.
To summarize the key components of a shared data cluster, consider the following common
characteristics:
• Modular—Several dense servers are grouped to support mission-critical file serving and
application needs.
• Adaptive—Physical resources in the cluster can be dynamically allocated to meet
performance requirements.
• High availability—Virtual servers are enabled to fail over to available physical resources
if a failure occurs.
• Shared data—Servers in the cluster concurrently access shared data via a SAN.
Concurrent access provides for near instantaneous failover.
• Platform independence—Hardware of each node in the cluster does not need to be
identical or even from the same vendor.
• Management layer—Intelligence exists that oversees and ensures cohesion of physical
and logical elements in the cluster.


Modular
A modular cluster supports the logical grouping of physical resources to match the demand and quantity of virtual servers that are needed. Because both physical server resources and storage resources can be grouped, management remains relatively simple. From the outside, shared data clusters can look intimidating; for this architecture to succeed, resource management must be simple, and modularization provides that simplicity.

Adaptive
Shared data clusters have the ability to take advantage of both high-performance clustering and
failover clustering. To meet the needs of applications, additional servers can be redeployed to
virtual server groups to accommodate demand. Additional virtual servers and applications can
usually be added with minimal to no investment.

High Availability
To support high availability, virtual servers in the cluster can fail over to other nodes. If data access via one physical server in the cluster is interrupted, another physical server can take control of a virtual server in the cluster. Also, shared data clusters provide a unique data sharing architecture that typically allows failovers to complete within seconds.
With file servers running as virtual servers hosted by a shared data cluster, data access does not
need to be unavailable for several hours due to scheduled or unscheduled downtime. Instead, if a
node in the cluster needs to go offline (or is taken offline by system failure), the application
hosted by the node can simply be moved to another node in the cluster. With failover generally
taking seconds to complete, user access would be minimally disrupted.

Not all clustering products support application failover during upgrades. Some products will require all
servers be taken down simultaneously during an upgrade. Administrators should consult their cluster
product vendor prior to performing any cluster maintenance to verify that clustered applications will
remain available during any system upgrades.

Shared Data
In contrast to the shared data model, many traditional failover cluster architectures employ a shared-nothing architecture. With shared-nothing clustering, one or more servers share storage, but in reality only one server can use a shared physical disk at a time. The argument for this approach has long been that concurrent I/O operations from multiple sources could corrupt the shared hard disk, so it is best that the disk be mounted on only one physical server at a time. Ultimately, this means that in traditional architectures, the software running on the servers in the cluster simply will not run properly if multiple physical servers concurrently access the same disk space. This architecture also results in slow failovers because, in the event of a failure, one node must release the storage resource and then the failover node must mount it.
With shared data clusters, each node in the cluster mounts the shared storage on the SAN. Thus,
during a failover, no delay is incurred for mounting storage resources. To ensure data integrity,
the cluster’s management layer uses a distributed lock manager (DLM). The DLM allows
multiple servers to read and write to the same files simultaneously. The DLM also provides for
cache coherence across the cluster. True cache coherence is what allows multiple servers to work
on the same application data at the same time. This feature is what allows shared data clustering
to offer both high performance and high availability.
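The following Python sketch gives a greatly simplified picture of what a DLM does: a node must hold the write lock on a file before modifying it, and releasing the lock invalidates cached copies on other nodes so that the cluster remains cache coherent. This is a conceptual illustration only, not the behavior of any specific DLM implementation; the file paths and node names are invented.

```python
# Greatly simplified picture of a cluster's distributed lock manager (DLM):
# a node must hold a write lock on a file before writing, and releasing the
# lock invalidates other nodes' cached copies to keep the cluster cache
# coherent. A real DLM is distributed across nodes and handles recovery
# when a lock holder fails; none of that is modeled here.

class SimpleLockManager:
    def __init__(self):
        self.locks = {}        # file path -> node currently holding the write lock
        self.cached_by = {}    # file path -> set of nodes caching that file

    def acquire_write(self, node, path):
        """Grant the write lock if no other node holds it."""
        holder = self.locks.get(path)
        if holder is None or holder == node:
            self.locks[path] = node
            return True
        return False           # caller must wait and retry

    def release_write(self, node, path):
        """Release the lock and invalidate stale caches on other nodes."""
        if self.locks.get(path) == node:
            del self.locks[path]
            for other in self.cached_by.get(path, set()) - {node}:
                print(f"invalidate cached copy of {path} on {other}")

dlm = SimpleLockManager()
dlm.cached_by["/shares/sales/report.xls"] = {"node1", "node2"}
assert dlm.acquire_write("node1", "/shares/sales/report.xls")
assert not dlm.acquire_write("node2", "/shares/sales/report.xls")  # must wait
dlm.release_write("node1", "/shares/sales/report.xls")             # node2's cache invalidated
```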


Platform Independence
In being platform independent, cluster computing allows the use of preferred hardware for the
assembly of the cluster’s inner infrastructure. Platform independence makes it much easier for
organizations to get started with cluster computing, and as servers in the cluster age, those
servers can potentially be used for other purposes within the organization.

Management Layer
The role of the management layer within cluster computing is to not only modularize physical
resources such as servers and storage but also provide failover and dynamic allocation of
additional resources to meet performance demands.
As shared data clusters are a new and different approach to clustering, there are currently few
choices available that can provide the complex management functionality of cluster computing-
driven server infrastructure. The lone vendor that can fully deliver shared data clusters today is
PolyServe; however, there are other storage vendors that offer consolidation and availability
solutions such as Network Appliance and EMC (but each of these solutions is hardware centric).

Built to Scale
Another aspect of shared data clustering that has led to its popularity is its simple growth
model. As load increases, nodes can simply be added to the cluster. Although many failover
cluster architectures experience trouble scaling, shared data clusters that can run on both
Windows and Linux OSs support scaling to 16 nodes or beyond. This type of flexibility
eliminates much of the guesswork of growth and capacity planning. With shared data clusters
supporting a high number of maximum nodes, administrators can add nodes as needed rather
than purchase based on capacity that may be planned 18 months out.

The Cost Factor


Shared data clusters offer several advantages, but those advantages come with a price. Shared
data clusters typically share a common storage source. The shared storage is usually
interconnected to the cluster nodes via a fibre channel SAN. Although shared storage contributes
to some of the benefits mentioned earlier (and several more discussed in Chapter 6), it comes at a
higher cost than traditional direct attached storage (DAS). However, although the cost can lead to initial sticker shock, the surprise often passes quickly once the cost of the shared storage infrastructure is weighed against the cost of downtime, the need to scale performance, and the ability to host data on industry-standard Intel-based architecture. To understand the savings, look past the cost of DAS on a single server. With shared storage, after the initial infrastructure investment, there is little difference in the cost of the actual storage. And because industry-standard Intel servers cost far less than proprietary UNIX systems or NAS appliances, the proprietary equipment is often estimated to cost 8 to 10 times as much. Thus, the shared data approach provides not only better utilization of storage resources, better availability, and better performance, but also substantial cost savings.
In terms of complexity, storage architectures are less intimidating once the technologies
available have been explored. The following sections highlight these technologies.


Current Storage Architectures


Today, there are several ways to deploy storage on a LAN. Among the most popular choices are:
• SCSI
• Serial ATA (SATA)
• Fibre Channel (FC)
• Internet SCSI (iSCSI)
This section will take a brief look at each of these technologies as they relate to building a better
file serving infrastructure.

SCSI
SCSI has long been the core storage architecture for high-performance file serving. Although this
disk architecture has lost significant ground to FC, most organizations still employ several SCSI
storage devices on their networks. The first generation of SCSI offered throughput of up to
5MBps; today, Ultra320 SCSI can push data at rates of up to 320MBps. The width of the SCSI
bus ultimately determines the number of devices that can be connected to it. For example, narrow
SCSI has an 8-bit bus, which allows it to support as many as 8 devices, including the SCSI host
bus adapter (HBA).
Wide SCSI has a 16-bit bus, which allows it to support as many as 16 devices. By using
logical unit numbers (LUNs), SCSI buses can address more devices than these limits. SCSI IDs are
used to identify each device on the bus. By default, each SCSI HBA uses an ID of 7. For narrow
SCSI, IDs of 0 to 7 are valid; whereas 0 to 15 are valid IDs for wide SCSI. Table 1.1 shows the
different SCSI bus types available today.
Bus Type          Bus Width (Bits)   Bandwidth (MBps)   Maximum Cable Length (m)
                                                        SE     LVD    HVD
SCSI-1            8                  5                  6      ---    25
SCSI-2            8                  5                  3      ---    25
Wide SCSI         16                 10                 3      ---    25
Fast SCSI         8                  10                 3      ---    25
Fast Wide SCSI    16                 20                 3      ---    25
Ultra SCSI        8                  20                 1.5    ---    25
Ultra SCSI-2      16                 40                 3      ---    25
Ultra2 SCSI       16                 80                 ---    12     25
Ultra160 SCSI     16                 160                ---    12     ---
Ultra320 SCSI     16                 320                ---    12     ---

Table 1.1: SCSI bus type comparison.


Note that the table lists cable lengths only for the signaling standards supported by a particular SCSI
bus type. LVD cable lengths are not listed until Ultra2 SCSI, which was the first SCSI standard to
support the LVD bus type.

For more information about SCSI, visit Gary Field’s SCSI Info Central at http://www.scsifaq.org.

SCSI runs into major scalability problems with shared storage architectures. In nearly all failover
cluster implementations, shared storage connected via SCSI supports a maximum of 2 nodes.
This scalability limitation has led many organizations to move away from failover clustering.
Although failover clusters can run on SANs, the products of many vendors still behave as if
they’re SCSI attached, thus diminishing their attractiveness.
For greater scalability, many organizations are moving toward shared data clusters that
interconnect shared storage to cluster nodes via an FC SAN. Although FC provides the data
transport in the SAN, FC disk arrays attached to the SAN may contain internal FC, SCSI, or
SATA disks. With the ability to offer scalability and support for all major disk storage
architectures, it’s easy to see why FC has become the leading storage interconnect in the
industry.

SATA
SATA drives have become increasingly popular due to their lower cost (compared with SCSI)
and comparable speeds. The first SATA standard provided for 150MBps data transfer rates; SCSI vendors quickly answered the challenge, and SATA, in turn, moved to 300MBps with its SATA II standard. At 300MBps, SATA II is still slightly slower than Ultra320 SCSI, but it is now a viable, cost-effective option for high-performance file serving.
Also, many storage vendors have jumped on the SATA bandwagon, with several vendors such as
Hitachi and Sun Microsystems offering SATA disk arrays. The rise of SATA has been pushed by
several storage vendors that have built SATA storage devices that can be interconnected to FC
SANs.

For more information about SATA, refer to the SATA International Organization homepage at
http://www.sata-io.org.


FC and SANs
Today, FC is the predominant architecture for interconnecting shared storage devices. The high
adoption rate of FC has been fueled by its several advantages over SCSI:
• Speed—4Gbps FC media offer data transfer rates as fast as 512MBps
• FC SANs support as many as 16 million devices
• FC supports cable lengths as long as 10km
One of FC’s greatest benefits is that this architecture allows for interconnecting storage devices
via a dedicated SAN. SANs provide the following benefits:
• Storage resources can be pooled and shared by all servers
• Backup performance will likely increase dramatically
• Scalability issues can be more easily managed
• Shared data clusters can scale as high as 16 nodes or beyond, depending on the clustering
application
Each server connected to the SAN can potentially access any storage resource on the SAN.
SANs enable the maximized use of storage resources by creating better opportunities to allocate
unused resources to other servers. This setup has significantly aided data backups. Now, a server
no longer has to send its data over the LAN to access a tape library for backup, for example.
Instead, the server can directly access the library via the SAN. Backup vendors such as Symantec
(formerly VERITAS) and CommVault have architectures that support sharing of backup targets
in a SAN. Now servers are no longer faced with network bottlenecks while backing up their data.
The term LAN-free is often used to describe this backup approach. Other backup methods such as
server-free and server-less are also available by using enterprise-class backup products and
interconnecting storage resources via a SAN.

Chapter 6 will provide examples of all SAN-based backup configurations, including LAN-free, server-
free, and server-less as well as several examples of how organizations are consolidating storage
resources by connecting their servers to SANs.

Disk arrays as well as backup devices can be shared on a SAN. In the past, many in IT addressed
storage by guessing how much storage a server would need when it was initially requisitioned,
and if the server needed more disk resources, more would be ordered at a later date. For servers
for which the estimate was too high, disk resources would go unused. The ability to collectively
pool physical disks in a SAN enables the allocation of disk space to servers as needed. The
bottom line with SANs is that their implementation is a natural part of the progression toward
consolidation. Figure 1.7 shows a basic SAN.


Figure 1.7: A SAN that consists of a switch, router, disk array, and tape library.

Notice that three servers are sharing a disk array and tape library. The switch and router are used
to interconnect the storage devices on the SAN. FC SAN hardware devices carry the same names
as the devices you have already come to know and love on LANs. The primary devices that
drive a SAN include:
• Switches and hubs
• Routers (also known as bridges)

Switches and Hubs


Switches and hubs are used to interconnect devices on the SAN. Their role on the SAN is similar
to a switch or hub on a LAN. Hubs are older FC devices that support a topology known as FC-
Arbitrated Loop (FC-AL), which is the SAN equivalent to a token ring network. Switches
dominate today’s SAN landscape and work much like Ethernet switches. SANs connected via a
switch are said to be part of a switched fabric topology. With a switch, dedicated point-to-point
connections are made between devices on the SAN, allowing the devices to use the full
bandwidth of the SAN. With FC-AL hubs, bandwidth
is shared and only one device can send data at a time. Among the popular switch vendors today
are Brocade, McData, and Cisco Systems. Another very popular device on the SAN is the router.

Router
Routers are devices that are used to connect an FC SAN to a SCSI device. The job of the device
is to route between a SCSI bus and an FC bus. The router is a very important consideration when
planning to implement a SAN, as it allows an organization to connect existing SCSI storage
devices (disk arrays and libraries) to the SAN. This connection prevents the loss of the initial
SCSI storage investment. The two most popular router vendors today are ADIC and Crossroads.

For more information about FC and SANs, refer to the excellent online resources: Storage
Networking Industry Association at http://www.snia.org, Fibre Channel Industry Association at
http://www.fibrechannel.org, and Legato System’s SAN Academy at http://www.sanacademy.com.


FCIP and iFCP


The least expensive long-distance transmission medium is the Internet, which requires IP. With this in mind, wouldn’t it be useful to bridge the SANs at two sites over the Internet? Doing so requires a device capable of performing the FC-to-FCIP translation, and some FC switches include integrated FCIP ports for exactly this purpose. However, FCIP doesn’t provide any means to directly interface with an FC device; instead, it’s simply a method of bridging two FC SANs over an IP network.
Internet FC Protocol (iFCP) is much more robust than FCIP. Like FCIP, iFCP can also be used
to bridge FC switches over an IP network. However, this protocol also provides the ability to
network native IP storage devices and FC devices together on the same IP-based storage
network. With the rise of gigabit Ethernet networks, consider iFCP as a way to provide full
integration between an FC and IP network. Another rising protocol that provides the same level
of hardware integration over gigabit Ethernet is iSCSI.

iSCSI
iSCSI works very similarly to iFCP, except that instead of encapsulating Fibre Channel Protocol
(FCP) data in IP packets, SCSI data is encapsulated. Because iSCSI is designed to run over Ethernet,
it lets an organization leverage existing Ethernet devices on a storage network. For example,
consider an organization that purchases new gigabit Ethernet switches for an iSCSI SAN. As
technology improves and the organization decides to upgrade to faster gigabit switches, the older
switches can be used to connect hosts on the LAN. FC switches don’t offer this level of
flexibility.
iSCSI architecture involves a host configured as an iSCSI target. The iSCSI target can be a
server with locally connected storage or a storage device that natively supports iSCSI. Clients
that access the storage over the network using the iSCSI protocol are known as initiators.
Initiators need to have iSCSI client software installed in order to access the iSCSI target. Figure
1.8 shows a typical iSCSI environment, showing two initiator hosts and one iSCSI target.

Figure 1.8: A small iSCSI SAN.

As iSCSI is a newer and maturing protocol, there are not as many storage devices that support
iSCSI as those that support FC. As more devices become available, expect competition to cause
the price of both iSCSI and FC SANs to drop even further.
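As an illustration of the initiator side, the following sketch drives the Linux open-iscsi command-line tool, iscsiadm, from Python; the portal address and target IQN are hypothetical, and a Windows host would use the Microsoft iSCSI Initiator instead.

```python
import subprocess

PORTAL = "192.168.0.50:3260"                      # assumed iSCSI target portal
TARGET = "iqn.2005-01.com.example:storage.disk1"  # assumed target IQN

# 1. SendTargets discovery: ask the portal which target IQNs it exposes.
subprocess.run(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
    check=True,
)

# 2. Log in to the target; its LUNs then appear to the initiator as ordinary
#    local SCSI disks (for example, /dev/sdb).
subprocess.run(
    ["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"],
    check=True,
)
```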


Clustered File Serving Gaining Momentum


To remove data’s dependence on an individual system, clustered file serving has emerged as the
primary means of maintaining data availability. In short, clustering allows a virtual server to run
on top of any physical server participating in the cluster. Virtual servers have the same
characteristics as physical file servers—a name, IP address, and the ability to provide access to
data. However, they differ in the fact that they are not dependent on a single piece of hardware to
remain online. Instead, if a virtual server host’s hardware fails, the virtual server can simply
move to another host. The result is that the virtual server is only offline for a few seconds while
moving to another physical host, compared with several minutes or hours of unavailability in the
event of a server failure.
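The following Python sketch illustrates the virtual server concept; the class, node names, and health check are hypothetical placeholders, and real clustering software performs the resource and IP address moves internally.

```python
class VirtualServer:
    """A virtual file server: a name and IP address that can run on any cluster node."""
    def __init__(self, name, ip, nodes):
        self.name, self.ip = name, ip
        self.nodes = list(nodes)
        self.active = self.nodes[0]     # node currently hosting the virtual server

    def check_and_failover(self, is_healthy):
        """If the hosting node is down, relocate the virtual server to a healthy node."""
        if not is_healthy(self.active):
            standby = next(n for n in self.nodes
                           if n != self.active and is_healthy(n))
            print(f"{self.active} failed; {self.name} ({self.ip}) now runs on {standby}")
            self.active = standby

# node1 fails, the virtual server moves, and clients keep using the same name and IP.
vs = VirtualServer("FILESRV1", "10.0.0.25", ["node1", "node2"])
vs.check_and_failover(lambda node: node != "node1")
```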

High Availability
Keeping data available means keeping everything in the data path available. This goal is most
often secured through redundancy. Storage itself can achieve redundancy through Redundant
Array of Inexpensive Disks (RAID). Redundant switches can be added to the data path on the
network, preventing against a switch failure. Redundant switches can be added to a SAN.
Finally, physical servers themselves can be made redundant through clustering. Figure 1.9
illustrates an example of a highly available file serving architecture.

Figure 1.9: An example of a high-availability clustering architecture.

Considerable time is spent exploring adding redundancy to the complete data path in Chapter 3.
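A rough back-of-the-envelope calculation shows why redundancy pays off; the 99 percent availability figure used below is an assumed example, not a vendor specification.

```python
# With two independent components in parallel, both must fail before the data path is lost.
single = 0.99                       # assumed availability of one switch
redundant = 1 - (1 - single) ** 2   # two switches in parallel

hours_per_year = 24 * 365
print(f"one switch:    {(1 - single) * hours_per_year:.1f} hours down per year")     # 87.6
print(f"two switches:  {(1 - redundant) * hours_per_year:.2f} hours down per year")  # 0.88
```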


Consolidation Advantages
Consider an analogy: newer cars have far more parts inside than older ones. Although the additional parts may equate to more
features, such as power windows, these additions also mean that there are more parts that can
break. On a network that employs 200 servers, each part on each server represents a potential
failure. Reducing the number of servers on the network ultimately reduces the number of
potential failures.
PolyServe recently studied the benefits of consolidating file servers to a clustered file system
running on standard hardware and found the following:
• Procurement costs are reduced by as much as 70 percent
• Physical and logical file server use and storage consumption are reduced by as much as
80 percent
• Operational costs are reduced by at least 50 percent
• File server downtime is reduced by almost 100 percent
Thus, consolidating to clustered file system (CFS)-based file serving easily equates to
quantifiable savings. An administrator who wants to lower the number of system management
headaches needs a way to quantify proposals for new technologies in order to get them approved.
If data unavailability is reduced from 175 hours per year to 1 hour per year, for example, an
organization may see a production savings of more than 4 million dollars, according to the
Gartner survey cited earlier.
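The arithmetic behind that estimate is straightforward; the cost-per-hour figure below is an assumption chosen to reproduce the roughly $4 million result and will vary from one organization to the next.

```python
hours_before = 175        # hours of data unavailability per year before consolidation
hours_after = 1           # hours of data unavailability per year after consolidation
cost_per_hour = 23_000    # assumed average cost of one hour of downtime

savings = (hours_before - hours_after) * cost_per_hour
print(f"estimated annual savings: ${savings:,}")   # roughly $4 million
```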

Drive Toward Standardization


Movement toward standardized hardware on Intel-based platforms has steadily gained ground
over the past decade. Moving away from proprietary hardware solutions gives organizations true
independence with their hardware investments. As hardware ages, it can be used in other roles,
such as hosting a less critical database application. When mission-critical
servers are upgraded, the original server systems can be used for other roles within the
organization.
Having standard non-proprietary hardware also offers complete flexibility with OS and
application choices. A Windows box could easily become a Linux box or vice-versa as the need
arises. As needs on the network change, systems can be moved to where they’re most needed.
With proprietary solutions, this level of flexibility is typically not possible.
The push toward standard platforms has gone past the major OS vendors and extended to
application and service vendors. Running servers on standardized hardware ultimately means far
more applications are available to select from.
The bottom line with the movement toward standardization is that administrators and end users
benefit the most. Organizations have better and less expensive products and much more to
choose from when making purchasing decisions. The competition that has been steadily
expanding in the non-proprietary market will only continue to benefit the industry with
innovation fueling further competition.


Summary
With increased need for performance and availability of files, shared data clusters have steadily
emerged as the architecture of choice to meet many organizations’ file serving needs. Shared
data clusters offer superior scalability and a significantly lower cost than point-level proprietary
solutions such as the offerings of many NAS vendors. With this type of momentum, it appears
that shared data clusters will continue to experience rapid growth in the years to come.
Deploying a shared data cluster architecture as part of a consolidated and highly available server
infrastructure can provide a resilient and flexible architecture that can scale as an organization
grows.
The next chapter digs deeper into the problems plaguing modern architectures and looks further
into how these problems are being solved. The rest of the guide will explore specific examples of
how to optimize the data path for performance and availability and provide examples of
increasing performance, availability, and scalability of both Windows and Linux file serving
solutions.


Chapter 2: Taming Storage Growth—A Modern Perspective


Chapter 1 introduced the vast array of storage and file serving solutions available today. It is
important to understand how each file serving and storage technology works, and equally crucial
to know which technology is right for your organization’s needs. This chapter provides a detailed
examination of the current storage and growth problems facing IT and explores what server,
storage, and application vendors are doing to address these problems.

Current Storage Problems


Most organizations face several problems with their storage infrastructures, most notably:
• Availability
• Growth
• Management
• Backup window expansion
This section will look at the root causes for each of these problems, setting the foundation for a
later discussion of how vendors are using technology to address these issues.

Availability
Availability refers to data being obtainable when a user or application needs it. Because need is a
relative term, the definition of availability can vary from one organization to the next. For a
small medical office, availability probably means access to resources from 8:00 AM to 6:00 PM
Monday through Friday. For an ecommerce Web site, availability means 24 × 7 access to data.
For most organizations, availability likely falls in the middle of the two previous examples.
Regardless of an organization’s definition of availability, the performance of IT staff is often
measured by the availability of data.
Several problems can derail data availability:
• System hardware failure
• System software failure
• Power failure
• Network failure
• Disk failure


Fortunately, many of the single points of failure that prevent data availability can be overcome
with redundancy. An organization can overcome power failure through the use of UPSs and
backup generators. Reliance on additional switches, routers, and redundant links can help
prevent network failure; RAID can help overcome data availability problems that result from
disk failure.

Chapter 3 will focus on how to overcome these issues so as to ensure data availability.

System hardware or software failure is often more difficult to rebound from. To overcome this
challenge, several solutions are now available, including NAS and shared data clusters. These
technologies will be examined later in this chapter.

Growth
Another problem facing administrators today is growth. As storage requirements and client
demand increase, how do you accommodate them? To put growth into perspective, according to
IDC’s Worldwide Disk Storage Systems Tracker, disk factory revenue grew 6.7 percent year over
year, as reported in the first quarter of 2005. The report also noted that the 6.7 percent figure
represented 8 consecutive quarters of growth. The increase in revenue has occurred despite the
fact that the cost per gigabyte of disk storage continues to fall. For example, the 2005 Worldwide
Disk Storage Systems Tracker also reported that capacity continues to grow at an exponential
rate, with 2005 year-on-year capacity growing 58.6 percent.
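To put a 58.6 percent year-on-year growth rate in perspective, a quick calculation shows that managed capacity doubles roughly every year and a half.

```python
import math

growth = 0.586                                     # year-on-year capacity growth
doubling_time = math.log(2) / math.log(1 + growth)
print(f"capacity doubles every {doubling_time:.1f} years")   # about 1.5 years
```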
Market trends have shown that nearly all administrators face growth. One of the major problems
is how to effectively manage it. Is the ideal solution to continue to add capacity or is a better
solution to rebuild the network infrastructure so that it can effectively scale to meet the needs of
future growth? Many administrators are deciding that now is the time to look at new ways of
managing data, as earlier architectures are not well-suited to meet the continued year-on-year
demands of growth.

Management
With growth comes additional headaches—the first of which is management. As online storage
increases, what is the best way to effectively manage the increase? If each server on the LAN is
using local DAS storage, this situation creates several potential bottlenecks, data islands, and
independently managed systems. If client demand is also increasing, is the best option to add
servers to the LAN to deal with the heavier load? Ultimately, the problem that is hurting data
management today is that many administrators are trying to use traditional architectures to deal
with modern problems.

Expanding Backup Windows


As the amount of data grows, so does the time needed to back it up. With traditional servers using
DAS and LAN-based backups, it has become almost impossible to back up servers over a LAN
within the allotted backup window. This challenge has resulted
in many organizations altering their backup schedules and doing fewer full backups in order to
have backups complete before business starts each morning. To deal with the issue of expanding
backup windows, many IT shops have considered or have already deployed solutions such as
NAS, SAN, DFS, and virtualization. The following sections explore the part that each of these
technologies plays within the network.


Existing Storage Solutions


Many vendors offer products designed to solve today’s storage problems. Although these
solutions can ease an administrator’s management burden, their one-size-fits-all approach doesn’t
guarantee success. It’s the responsibility of the organization and its IT staff to understand each of
the available storage and consolidation solutions, then select the one that best fits the company’s
mission.

SAN
For many organizations, a SAN is often the answer for consolidating, pooling, and centrally
managing storage resources. The SAN can provide a single and possibly redundant network for
access to all storage devices. Depending on the supporting products purchased by an
organization, a SAN might also provide the ability to run backups independent of the LAN. The
advantage to LAN-free backups is almost always increased throughput, thus making it easier to
complete backups on time.
As SANs have become an industry standard for consolidating storage resources, hundreds of
application vendors now offer products that help support storage management and availability in
a SAN. In addition, as SANs are assembled using industry standard parts and protocols such as
FCP and iSCSI, an administrator can design a SAN using off-the-shelf parts.

NAS Filers
NAS filers have been at the heart of the file server consolidation boom. Organizations that face
the challenge of scaling out file servers can simply purchase a single NAS filer with more than a
terabyte of available storage.
One of the greatest selling points to NAS filers has been that they’re plug-and-play in nature,
allowing them to be deployed within minutes. At the same time, however, most NAS solutions
are vendor-centric, meaning that they don’t always easily integrate with other network
management products. NAS vendors such as EMC and NetApp offer support for a common
protocol known as Network Data Management Protocol (NDMP), which allows third-party
backup products to back up data on EMC and NetApp appliances. The benefit of NDMP is that it
is intended to be vendor neutral, meaning that if a backup product supports NDMP, it can back
up any NDMP-enabled NAS appliance.
Microsoft NAS appliances that run the Windows Storage Server OS, however, do not support
NDMP. Backing up a Windows Storage Server NAS will require the installation of backup agent
software on the NAS itself.
Traditional NAS vendor offerings do not allow administrators to install backup software on their
filers. The reason for this restriction is to guarantee the availability of the NAS; however, it
significantly ties the hands of administrators when they’re looking for flexibility.

Later, this chapter will spend more time contrasting the role of NAS in server and storage
consolidation with that of other products.


DFS
DFS has been seen as an easy way to combat server sprawl, at least from a user or application
perspective. As servers are added to a network to accommodate growth and demand, the new
servers can be referenced under a single domain-based DFS root. This feature allows the addition
of the new servers to be transparent to users.
DFS can equally support server consolidation. For organizations that are consolidating and
removing servers from the LAN, DFS can add a layer of transparency to the consolidation
process. If user workstations and applications are set up to access file shares via the DFS root,
administrators are free to move and relocate shares in the background and simply update the
links that exist at the DFS root once the migration is complete. Thus, the way users access file
systems will be the same both before and after the migration.
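The following sketch illustrates why the client view stays stable; the DFS root, link names, and server paths are hypothetical, and in practice the mapping is maintained by the DFS root servers rather than by client-side code.

```python
dfs_root = r"\\corp.example.com\files"   # hypothetical domain-based DFS root

# Before consolidation: links point at the original file servers.
links = {
    "engineering": r"\\oldsrv01\engineering",
    "finance":     r"\\oldsrv02\finance",
}

# After migrating both shares to a consolidated cluster, only the link targets
# change; users still open \\corp.example.com\files\engineering as before.
links["engineering"] = r"\\cluster-vs1\engineering"
links["finance"]     = r"\\cluster-vs1\finance"

for link, target in links.items():
    print(f"{dfs_root}\\{link}  ->  {target}")
```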

Virtualization
Virtualization technologies have recently jumped to the forefront of organizations’ efforts to
consolidate and simplify data access and management. This section will look at how storage
virtualization has aided in storage consolidation efforts of SANs and how server virtualization
has enabled companies to reduce the number of physical servers on their LANs by as much as 75
percent.

Storage Virtualization
As the number of managed storage resources on a LAN grows, so does the time and cost of
managing those resources. Implementing a SAN provides an excellent first step toward
consolidation and easing an administrator’s storage management burden; however, the SAN
alone may not be enough. This is where storage virtualization comes into the picture. There are
plenty of ways to define storage virtualization, but to keep it simple, consider storage
virtualization to be the logical abstraction of physical storage resources. In other words, a logical
access layer is placed in front of physical storage resources.

Storage virtualization is often a confusing topic due to the fact that several storage vendors have their
own definition of the term. Competing vendors—most of which claim to have invented storage
virtualization—may offer differing definitions. A common voice for storage virtualization can be found
at the Storage Networking Industry Association (http://www.snia.org).

Figure 2.1 provides a simple illustration of storage virtualization. The primary point of storage
virtualization is to logically present physical storage resources to servers. This setup often results
in better storage utilization and simplified management. As Figure 2.1 illustrates, storage
virtualization starts by adding a data access layer between systems and their storage devices.


Figure 2.1: Virtualization access layer for physical storage resources.

The actual virtualization layer can be comprised of several different technologies. Among the
virtualization technologies that may exist between servers and storage are:
• In-band virtualization
• Out-of-band virtualization
• Hierarchical Storage Management (HSM)
• Policy-based storage virtualization
The next four sections explore how each of these virtualization architectures aids in storage
consolidation.

In-Band Virtualization
With storage virtualization, the term in-band implies that the virtualization device lies in the data
path. The purpose of the device is to control the SAN storage resources seen by each server
attached to the SAN. This level of virtualization goes far beyond traditional SAN segmentation
practices such as zoning by allowing an administrator to allocate storage resources at the
partition level, instead of at the volume level. Figure 2.2 shows an example of in-band
virtualization.


Figure 2.2: In-band storage virtualization.

Notice that there is a virtualization appliance in the data path. The role of the appliance is to
logically present storage resources to physical servers connected to the SAN. Also, as it resides
directly in the data path, the appliance will provide more control over physical separation of
SAN resources. For simplicity, the virtualization appliance is shown as an independent device,
but that doesn’t have to be the case. Established NAS vendors such as Network Appliance and
EMC now offer virtualization appliances that fall in line with their general NAS philosophy,
while fabric switch vendors such as Cisco Systems and Brocade are integrating
virtualization intelligence into their fabric switches. Thus, the virtualization appliance does not
have to be a standalone box and instead can seamlessly integrate into a SAN fabric.

Oftentimes multi-function SAN devices appear to initially have a higher cost than their single function
counterparts. However, every device introduced to the data path in a SAN can result in a single point
of failure. This shortcoming is often overcome by adding redundant components. Thus, for fault
tolerance, an organization will need two of each single function device. If the devices across the SAN
can’t be cohesively managed, separate management utilities will be required for each. Comparing this
option with solutions such as running VERITAS Storage Foundation on top of a Cisco MDS 9000
switched fabric will reveal a significantly lower cost of ownership.

To get the most out of in-band virtualization, many organizations deploy a software storage
virtualization controller such as IBM SAN Volume Controller or VERITAS Storage Foundation.
With the virtualization component residing in fabric switches as opposed to living on standalone
appliances, an organization will have fewer potential single points of failure in the SAN.


Many organizations have been wary about deploying in-band virtualization because of the
overhead of the virtualization appliance. Because it sits inside the data path and must make
logical decisions as data passes through it, the virtualization appliance introduces at least
marginal latency into the SAN. Vendors such as Cisco have worked to overcome this issue by
adding a data cache to the appliance or switch. Although adding a cache to the appliance can
improve latency, it will still likely be noticeable in performance-intensive deployments.

Out-of-Band Virtualization
Out-of-band storage virtualization differs from in-band virtualization in the location of the
virtualization device or software controlling the virtualization. With out-of-band virtualization,
the virtualization device resides outside of the data path (see Figure 2.3). Thus, two separate
paths exist for the data path and control path. (With in-band virtualization, both data and control
signals use the same data path.)

Figure 2.3: Out-of-band storage virtualization.

With control separated from the data path, out-of-band virtualization deployments don’t share
the same latency problems as in-band virtualization. Also, as out-of-band deployments don’t
reside directly in the data path, they can be deployed without major changes to the SAN
topology.
Out-of-band solutions can be hardware or software based. For example, a DFS root server can
provide out-of-band virtualization. DFS clients locate data on the DFS root server and are then
redirected to the location of a DFS link. Data transfer occurs directly between the server hosting
the DFS link and the client accessing the data. This setup causes out-of-band deployments to
have less data path overhead than in-band virtualization.


Another advantage of out-of-band virtualization is that it’s not vendor or storage centric. For
example, IBM and Cisco sell a bundled in-band virtualization solution that requires specific
hardware from Cisco and software from IBM. Although both vendors’ solutions are effective,
some administrators don’t like feeling that an investment in technology will equal a marriage to a
particular vendor. However, purchasing a fabric switch such as the Cisco MDS 9000 series that
offers in-band virtualization as opposed to a dedicated in-band appliance will still offer some
degree of flexibility. If an organization decides to move to an out-of-band solution later, the
company will still be able to use the switch on the SAN.
There are several options available for pooling and sharing disk resources on a SAN. Although
both in-band and out-of-band virtualization differ in their approach to storage virtualization, both
options offer the ability to make the most out of a storage investment. Consolidating storage
resources on a SAN is often the first step in moving toward a more scalable storage growth
model. Adding virtualization to complement the shared SAN storage will provide greater control
of shared resources and likely allow even more savings in terms of storage utilization and
management with a SAN investment.

HSM
HSM is a management concept that has been around for several years and is finally starting to
gain traction as a method for controlling storage growth. Think of HSM as an automated archival
tool. As files exceed a predetermined last access age, they are moved to slower, less expensive
media such as tape. When the HSM tool archives the file, it leaves behind a stub file, which is
usually a few kilobytes in size and contains a pointer to the actual physical location of the file’s
data. The use of stub files is significant because it provides a layer of transparency to users and
applications accessing the file. If a file has been migrated off of a file server, leaving a stub file
allows users to access the file as they usually do without being aware of the file’s new location.
The only noticeable difference for users will be in the time it takes for the file to be retrieved by
the HSM application.

Some HSM tools have moved away from the use of stub files and work at the directory level instead,
thus allowing the contents of an entire folder to be archived. NuView’s StorageX performs HSM with
this approach. StorageX is able to leverage the existing features of DFS and NFS to archive folders,
while adding a layer of transparency to the end user.

Figure 2.4 shows a simple HSM deployment. In this example, files are migrated from a disk
array on the SAN to a tape library. The migration job would be facilitated by a file server
attached to the SAN that is running an HSM application.


Figure 2.4: Migrating files older than 6 months to tape.

HSM tools typically set file migration criteria based on:


• Last access time
• Minimum size
• Type
When a migration job is run, files that meet the migration criteria are moved to a storage device
such as tape, and a stub file is left in each file’s place. The advantage of HSM for storage
consolidation is that it lets an organization control the growth of online storage resources. By
incorporating near-line storage into the storage infrastructure, an organization can continue to
meet the demand for online data while minimizing the load on online storage devices as well as
the size of backups.
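The following is a minimal sketch of the migrate-and-stub approach, assuming a simple last-access-time rule and hypothetical directory paths; a real HSM product archives to tape or near-line storage and intercepts reads of the stub so the file can be recalled transparently.

```python
import os, shutil, time

ONLINE = "/mnt/shared/contracts"     # primary online storage (hypothetical path)
ARCHIVE = "/mnt/nearline/contracts"  # slower, cheaper tier (hypothetical path)
MAX_AGE = 180 * 24 * 3600            # migrate files not accessed for roughly 6 months

def migrate_old_files():
    os.makedirs(ARCHIVE, exist_ok=True)
    now = time.time()
    for name in os.listdir(ONLINE):
        path = os.path.join(ONLINE, name)
        # Skip directories and anything small enough to already be a stub.
        if not os.path.isfile(path) or os.path.getsize(path) < 1024:
            continue
        if now - os.stat(path).st_atime > MAX_AGE:
            target = os.path.join(ARCHIVE, name)
            shutil.move(path, target)       # move the file's data to the archive tier
            with open(path, "w") as stub:   # leave a tiny stub at the original path
                stub.write(target)          # ...containing a pointer to the new location

migrate_old_files()
```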
To further understand HSM, consider the example of a law office. Many legal organizations
maintain electronic copies of contracts; however, it may be months or even years before a
contract document may need to be viewed. In circumstances such as this, HSM is ideal. HSM
allows all contract documents to remain available while controlling the amount of online disk
consumption and needed full backup space.


Policy-Based Storage Virtualization


Several backup software vendors, including VERITAS and CommVault, currently offer policy-
based storage virtualization. This approach to virtualization has simplified how backup and
restore operations are run. By using a logical container known as a storage policy, backup
administrators no longer need to know the physical location of data on backup media. Instead,
the physical location of data is managed by the policy. Consider a storage policy to be a logical
placeholder that defines the following:
• Backup target device (library, disk array, and so on)
• Backup medium (tape, disk)
• Backup data retention days
When a server is defined to be backed up using enterprise backup software, an administrator
does not need to select a backup target. This selection is automated through the use of the storage
policy. Although backups are simplified through the use of storage policies, restores are where
an organization will see the most benefit. When an administrator attempts to restore data, he or
she does not need to know which tapes the necessary backups were stored on. Instead, the
administrator simply selects the system and the files to restore. If any exported tapes are needed
for the restore job, the administrator will be prompted. If the tapes are already available in the
library, the restore will simply run and complete.
The advantage to this approach is that administrators don’t need to scan through a series of
reports to find a particular backup media in order to run a restore. Instead, they can simply select
a server, file, and date from which to restore. This level of storage virtualization is especially
useful in large and enterprise-class environments with terabytes to petabytes of managed storage.
As storage continues to grow, it is becoming increasingly difficult to manage. Adding a storage
policy management layer to data access alleviates many of these problems.

For more information about storage virtualization, download a copy of SNIA’s “Shared Storage Model:
A Framework for Describing Storage Architectures” at
http://www.snia.org/tech_activities/shared_storage_model. This document describes the SNIA
standards for a layered storage architecture.

Storage virtualization is often seen as a first step in consolidating network resources. With
storage resources centrally pooled and managed, the constraints of running tens to hundreds (or
even thousands) of independent data islands on the LAN are eliminated. By consolidating to a
SAN, the storage investment more closely matches actual storage needs, and organizations have
an easier time backing up servers within the allocated backup window. With storage firmly under
control, the next logical step in consolidating network resources is server virtualization.


Server Virtualization
Server virtualization involves freeing servers on the network from their normal hardware
dependencies. The result of server virtualization is often additional space in the server room. This
benefit results from the fact that most likely several servers on the network are not using nearly
all of their physical resources. For example, an organization might have one file server that
averages 10 percent CPU utilization per day. Suppose that peak utilization hits 30 percent.
Ultimately, the majority of the server’s running CPU resources are doing nothing.
One way to solve the problem of under-utilized server hardware resources is to run multiple
logical servers on one physical box. Today, there are two fundamental approaches to achieving
this:
• Virtual Machines
• Shared data clusters
The next two sections show how each of these approaches allows for a reduction in the amount
of physical resources on the LAN.

Virtual Machines
The use of virtual machines enables the running of multiple independent logical servers on a
single physical server. With virtual machines, the needed hardware for the virtual machine to run
is emulated by the virtual machine-hosting application. The use of virtual machines offers
several advantages:
• Allows for the reduction in the number of physical systems on the network
• Provides hardware independence—a virtual machine can be migrated from one physical
host to another without significant driver updates
• Enables an organization to run legacy application servers on newer, more reliable
hardware

Simplified Recovery
When restoring a Windows server backup on a system containing different hardware, it can be
difficult to get the restored backup to boot on the hardware. This difficulty is usually attributed to
the characteristics of the Windows System State. When a System State backup is run on a
system, the system’s registry, device drivers, boot files, and all other system files are collectively
backed up. When the System State is restored, all of these files come back as well. This behavior
can cause problems if an administrator is restoring to different hardware, which is usually the
case when a restore is needed to recover from complete system failure or a disaster. Once the
restore operation writes all of the old registry, boot, and driver settings to the new system, odds
are that the system will blue screen the first time it boots. Sometimes recovery from these
problems lasts only a few minutes. However, for some administrators, this process may take
several hours.
In virtualizing system hardware, restoring a virtual machine to another virtual machine running
on a separate host system should be relatively problem free because the hardware seen by the OS
on each system will be nearly identical. Thus, portability is another benefit provided by virtual
machines in production environments.


Major Drawbacks
As a result of the advantages of virtual machines, some administrators rushed to fully virtualize
their complete production environments, only to later return some virtualized servers back to
physical systems. The reason lies in two major drawbacks of running virtual machines in
production:
• Additional latency
• No reduction in the number of managed systems
Virtual machine host applications emulate system resources for their hosted OSs, so an
additional access layer exists between virtual machines and the resources they use. Some of the
latency encountered by virtual machines can be reduced by having them connect directly to
physical disks instead of using virtual disk files. However, on CPU-intensive applications, the
latency is still noticeable.
The other hidden drawback to consolidation through the use of virtual machines is that it does
not reduce the number of managed systems on the network—instead it can increase that number.
For example, if an administrator plans to consolidate 24 servers to virtual machines running on
three hosts, there will be 27 servers to manage—the 24 original systems as well as each of the
three virtual machine host servers. Therefore, although virtual machines enable fewer physical
resources in the server room, there will still be the same number or even more servers that will
require software, OS, and security updates.

Shared Data Clusters


Shared data clusters have emerged as a way to escape from the boundaries of server
consolidation through virtual machines. Unlike with virtual machines, deploying shared data
clusters can provide the ability to reduce both the number of physical systems and managed OSs
on the network.
A major difference between shared data clusters and virtual machine applications is in their
approach to consolidation. Shared data clusters are application centric, meaning that the clustered
applications drive the access to resources. Each application, whether a supported database
application or file server, can directly address physical resources. Also, because the approach is
application centric, the consolidation ratio doesn’t need to be 1-to-1. Thus, 24 production file servers could
perhaps run on four cluster nodes. As the virtualized servers exist as part of the clustering
application, they are not true managed systems. Instead, there would be only four true servers to
update and maintain. This option offers the benefit of physical server consolidation at a highly
reduced cost of ownership.


Shared data clusters also offer the following advantages:


• Failover support—If a cluster node fails, an application can move to another node in the
cluster, thus maintaining data availability after a system failure
• Shared data support—Shared data clustering provides the ability for data sharing between
hosted file serving and database applications
• Load-balancing support—Client connections can be distributed among multiple cluster
nodes; with traditional failover clustering, each virtual server entity is hosted by a single
system
• Simplified backup—With shared data support, backups can be driven through a single
cluster node, thus simplifying backups and restores as well as reducing the number of
required licenses for backup software

Comparing Virtual Machines and Shared Data Clusters


To make sense of these two approaches, Table 2.1 lists the major differences between server
consolidation via virtual machines and by deploying shared data clusters.
Administrative Task | Virtual Machines | Shared Data Clusters
Reduce the number of managed systems | Each virtual machine must still be independently updated and patched; the number of managed systems will likely increase as virtual machine hosts are added | The number of managed systems is significantly reduced, with the cluster nodes representing the total number of managed systems
Consolidate legacy file servers to a single physical system | Supported | Supported
Failover | Yes, via installed OS or third-party application | Yes
Virtualization software overhead | Up to 25 percent | None
Single point of failure | Potentially each virtual machine | No
Ability to share data | No | Yes
Backup and recovery | Each virtual machine must be backed up independently | Shared storage resources in the SAN can be backed up through a single node attached to the SAN
Support for legacy applications | Yes | No

Table 2.1: Virtual machine vs. shared data clusters.


As this table illustrates, server consolidation via shared data clusters offers several advantages
over consolidation with virtual machines. With shared data clusters, you wind up with far fewer
managed systems, no additional CPU overhead, the ability to share data between applications, and
no single point of failure. However, application support is limited
to what is offered by the shared data cluster vendor. Virtual machines can host nearly any x86
OS and thus are well-suited for consolidating legacy application servers, such as older NetWare,
Windows NT, or even DOS servers that have had to remain in production in order to support a
single application.
When looking to use virtual machines or shared data clusters to support server consolidation, an
organization might not need to choose between one and the other. Instead, an organization
should look to use each application where it’s best suited—virtual machines for consolidating
legacy application servers, and shared data clusters for supporting file and database server
consolidation for mission-critical applications.

In consolidating to a shared data cluster, organizations not only see the benefit of fewer managed
systems but also realize the benefits of high availability and improved performance.

Examining Unappliance vs. Appliance Solutions


At the heart of server consolidation are two fundamentally different points of view—appliance
and unappliance. Each differs in its approach to both the hardware and software used in the
consolidation effort:
• Appliance—Vendor-proprietary hardware and/or software solution that provides for
server and/or storage consolidation
• Unappliance—Vendor-neutral x86-based hardware and software solution
These approaches to consolidation are significantly different, with both short-term and long-term
consequences. This section will look at the specific differentiators between unappliance and
appliance solutions:
• Proprietary vs. open solutions
• Volume economics
• Integration with existing infrastructure and investments
• Scalability
• Backup challenges
The section will begin with a look at the key differences between proprietary and open solutions.


Proprietary vs. Open Solutions


When retooling a network, there are convincing arguments for both proprietary and open
solutions. Proprietary solutions are usually packaged from a single vendor or group of vendors
and are often deployed with a set of well-defined guidelines or by a team of engineers that work
for the company offering the solution.
A long-term benefit of a proprietary solution is that an organization ends up with a tested system
that has predictable performance and results. A drawback to this approach, however, lies in cost.
The bundled solution often costs more than comparable open solutions.
Another difference with proprietary solutions is that they may be managed by a proprietary OS.
For example, many NAS filers run a proprietary OS, so management of the filer after it is
deployed will require user training or at least a few calls to the Help desk.
Aside from the initial outlay, buying a proprietary solution can also lead to higher costs in
the long run. Upgrading a vendor-specific NAS, for example, will likely require the purchase of
hardware through the same vendor. Software updates will also need to come through the vendor.
Finally, if the network outgrows the proprietary solution, an organization might find that it must
start all over again with either another proprietary solution or an open-standard solution.
The most obvious difference between open solutions and proprietary solutions is usually in cost.
However, the differences extend much further. Open solutions today are based on industry-
standard Intel platforms and can run any x86-class OS, such as Windows, Linux,
or NetWare. For hardware support, there are several vendors selling comparable products, thus
lowering the overall cost. Also, with open-standard hardware being able to support a variety of
applications and OSs, as servers are replaced, they can be moved to other roles within the
organization.
A drawback to open systems is often support. Proprietary solution vendors will frequently argue
that they provide end-to-end support for their entire solution. In many cases, several vendors
involved in an open-architecture network may point fingers at one another when a
problem occurs. For example, a storage application vendor may say a problem is the result of a
defective SAN switch. The SAN vendor may go back and say that the problem is with a driver
on the application server, or that the application is untested and thus not supported.

Although finger pointing often occurs in the deployment and troubleshooting of hardware and
software on open architecture, this is often the result of a lack of knowledge of most of the parties
involved. Consider Ethernet as an example. Today, just about anyone can assist with Ethernet
network troubleshooting, because this open standard has been around for
several years. The same can be said about SANs. As SANs have matured, the number of skilled IT
professionals that understand SANs has grown too. This growth leads to more effective
troubleshooting, less finger pointing, and often less fear of migrating to a SAN.

Thus, although open systems don’t always offer the same peace of mind as proprietary solutions,
their price is often enough to sway IT decision makers in their direction.


Volume Economics
In IT circles, proprietary is almost always equated to expense. This association is perhaps the
simplest argument for going with a non-proprietary solution. With non-proprietary hardware, an
organization can choose preferred servers and storage infrastructure. Also, use of industry-
standard equipment allows any of the organization’s existing management applications to be
used freely on the consolidated server solution.
In March of 2004, PolyServe studied the price and performance differences between a
proprietary NAS filer and a shared data cluster. In the study, the company found that going with
a shared data cluster over NAS resulted in 83 percent savings. To arrive at the savings,
PolyServe priced the hardware and software necessary to build a 2-node PolyServe shared data
cluster. The cost of 12.6TB storage in a SAN and two industry-standard servers with two CPUs,
2GB RAM and running Windows Server 2003 (WS2K3) and PolyServe Matrix Server was
$79,242. In comparison, two NAS filers with 12.6TB of storage, the CIFS file serving option,
and cluster failover option cost $476,000. If you look at the solution in terms of cost per terabyte,
the proprietary NAS appliance-based solution cost $38,000 per terabyte. The unappliance-based
solution cost $6300 per terabyte.
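The per-terabyte figures follow directly from the study’s totals, as the short calculation below shows.

```python
capacity_tb = 12.6

unappliance_total = 79_242     # 2-node shared data cluster plus SAN storage
appliance_total = 476_000      # two NAS filers with CIFS and cluster failover options

print(f"unappliance: ${unappliance_total / capacity_tb:,.0f} per TB")   # about $6,300
print(f"appliance:   ${appliance_total / capacity_tb:,.0f} per TB")     # about $38,000
print(f"savings:     {1 - unappliance_total / appliance_total:.0%}")    # about 83%
```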

Integration with Existing Infrastructure and Investments


Integration is another key differentiator between appliance and unappliance philosophies.
Proprietary appliances are often limited to management tools provided by the appliance vendor.
Installation of management software on an appliance is often taboo. Many NAS appliances can
connect to and integrate with SANs to some degree, but the level of interoperability is not that of
an open unappliance-based system.

The Scalability Dilemma


With an initial investment in a NAS appliance, an organization’s needs will likely be satisfied for
the next 12 to 18 months. Many proprietary NAS appliances offer some scalability in terms of
storage growth by allowing for the attachment of additional external SCSI arrays or connectivity
to fibre channel storage via a SAN.
Scaling to meet performance demand is much more difficult for proprietary NAS. Unlike shared
data clusters on industry-standard architecture, proprietary NAS solutions may offer failover but
do not offer load balancing of file serving data. As two or more NAS appliances cannot
simultaneously share the same data in a SAN, they cannot offer true load balancing for access to
a common data store. Instead, as client demand grows, the NAS head often becomes a bottleneck
for data access. To address scalability, organizations often have to deploy multiple NAS heads
and divide data equally among them. Tools such as DFS can allow the addition of the NAS heads
to be transparent to the end users.
Unappliance-based shared data clusters do not run into the same scalability issues as proprietary
NAS appliances. With open hardware and architecture, additional nodes can be added to the
cluster and attached to the SAN as client load increases. Furthermore, with the ability to load
balance client access across multiple nodes simultaneously with a common data store, shared
data clusters can seamlessly scale to meet client demand as well.


Backup Challenges
Many proprietary NAS appliances have not been able to work well with current open backup
practices. Instead, each vendor typically offers its own method of advanced backup functionality.
For example, both EMC and NetApp provide proprietary snapshot solutions for performing
block-level data backups. To perform a traditional backup, backup products that support NDMP
can issue NDMP backup commands to the NAS appliance. Figure 2.5 illustrates an example of
an NDMP-enabled backup.

Figure 2.5: NDMP-enabled backup.

As additional NAS appliances are added, they too could be independently backed up via NDMP.
To keep up with newer industry standard backup methods such as server-free and server-less
backups (discussed in Chapter 6), NAS appliance vendors have worked to develop their own
methods of backing up their appliances without the need of CPU cycles on the NAS head.
Not all NAS appliances can directly manage the robotic arm of a tape library, so a media server
must often be involved in the backup process to load a tape for the NAS appliance. Once the tape is
loaded, the NAS can then back up its data.
Figure 2.6 shows how backups can be configured on a shared data cluster. Notice in this scenario
that one of the cluster nodes is handling the role of the backup server. With two other nodes in
the cluster actively serving client requests, the dedicated failover node is free to run backup and
restore jobs behind the scenes.


Figure 2.6: Unappliance-based shared data cluster backup.

The key to making this all work lies in the fact that all nodes in the shared data cluster can access
the shared storage simultaneously. This functionality allows the passive node to access shared
data for the purpose of backup and restore. Also, as the passive node is doing the backup “work,”
the two active nodes do not incur any CPU overhead while the backup is running.

Notice that the backup data path appears to reach the failover node before heading to the library. The
behavior of the backup data will be ultimately determined by the features of the backup software and
SAN hardware. For example, if the backup software and SAN hardware support SCSI-3 X-Copy,
backup data will be able to go directly from the storage array to the tape library without having to
touch a server.

Another advantage to backing up data in an unappliance shared data cluster is the cost of
licenses. As all nodes in the cluster could potentially back up or restore the shared data, only one
node needs to have backup software installed on it, and consequently a backup license. As a
cluster continues to scale, an even higher cost savings for backup licensing will become
apparent.


Taming Server and Storage Growth—the Non-Proprietary Approach


With most of the industry leaning in the direction of managing growth and scalability through
non-proprietary hardware, the remainder of this chapter will focus on building a scalable file
serving infrastructure on non-proprietary hardware. Taming server and storage growth requires
consolidation with an eye on scalability and management of both file servers and storage
resources.

Storage Consolidation via SAN


For several years, SANs have been a logical choice to support storage consolidation. As SANs
share data using industry-standard protocols such as FCP and iSCSI, there are products from
several hardware vendors available to build a SAN. Because the products adhere to industry
standards, they allow for the flexibility to mix and match products from different vendors.
One of the primary goals of consolidating via a SAN is to remove the numerous independent
data islands on the network. The SAN can potentially give any server with access to the SAN the
ability to reach shared resources on the SAN. This feature can provide additional flexibility for
both data management and backups. In addition to sharing disks, an administrator can share
storage resources such as tape libraries, making it easier to back up data within the constraints of
a backup window. With SAN-based LAN-free backups, backup data does not have to traverse a
LAN in order for it to reach its backup target. To ensure that an administrator is able to perform
LAN-free backups on the SAN, the administrator will need to ensure that the backup vendor
supports the planned SAN deployment.
The use of SAN routers enables the continued use of SCSI-based storage resources on the SAN.
This may mean that to get started with a SAN, an organization will need to purchase only the
following:
• 1 to 2 HBAs for each server—Two HBAs are required for multipath support and allow
for fault tolerance in the complete SAN data path
• 1 to 2 fabric switches—Two switches are required for redundant fabrics
• 1 router—The quantity of routers will vary depending on the number of SCSI resources
being moved to the SAN

The number of routers required for the initial SAN deployment may be increased to address concerns
such as the creation of a bottleneck at the fibre channel port of the router. Chapter 3 covers these
issues in detail.

With several servers potentially accessing the same data on the SAN, care will need to be taken
to ensure that corruption does not occur. The choice of virtualization engine or configuration of
zoning or LUN masking can protect shared resources from corruption.
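
To make the LUN masking idea concrete, the following Python sketch models a masking table that maps each LUN to the initiator WWPNs allowed to see it. The WWPNs, LUN numbers, and helper function are illustrative assumptions rather than the interface of any particular array or virtualization product.

# Conceptual sketch of LUN masking: a storage array or virtualization engine only
# exposes a LUN to the initiators (HBA WWPNs) listed in its masking table.
MASKING_TABLE = {
    0: {"21:00:00:e0:8b:05:05:04", "21:00:00:e0:8b:05:05:05"},  # shared cluster LUN
    1: {"21:00:00:e0:8b:05:05:06"},                             # private LUN
}

def lun_visible(lun_id, initiator_wwpn):
    # A host whose HBA is not listed never sees the disk, so it cannot mount
    # (and potentially corrupt) storage that belongs to another server.
    return initiator_wwpn in MASKING_TABLE.get(lun_id, set())

print(lun_visible(1, "21:00:00:e0:8b:05:05:04"))  # False: LUN 1 is hidden from this HBA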


Server Consolidation via Clustering


Server consolidation via clustering offers a true benefit of reducing the total number of managed
systems on the network. In addition, consolidating to a shared data cluster provides greater
flexibility with backups, the ability to load balance client requests, and failover support.
Shared data clusters offer the benefit of scaling to meet both changes in client load and storage.
Because they run on industry standard hardware, scaling is an inexpensive option. Furthermore,
consolidating to shared data clusters offers significant savings in cost of ownership and software
licensing. Consolidating a large number of servers to a shared data cluster results in the need to
maintain fewer OS, backup, and antivirus licenses. With fewer managed systems on the network,
administrators will have fewer hardware and software resources to maintain on a daily basis.
With substantial cost savings over proprietary NAS appliance solutions, shared data clustering
has emerged as an easy fit in many organizations.

Planning for Growth While Maintaining Freedom


When planning for growth while consolidating, there are several best practices to keep in mind.
When planning to consolidate the network, consider the following guidelines:
• Design for scalability
• Design for availability
• Use mature products
• Avoid proprietary hardware solutions
One of the most over-used words in the IT vocabulary is scalability. However, it is often one of
the most important. Scalability means that all elements of the network infrastructure
should support company growth, both expected and unexpected. For example, if current
requirements warrant the purchase of an 8-port fibre channel switch, consider purchasing a 16-
port switch. Ensure that the planned SAN switches offer expansion ports so that the option is
available to scale out further in the event that growth surpasses projections.
Scalability can also be greatly aided by the use of centralized management software. Many of the
solutions previously mentioned in this chapter can help with centrally managing storage
resources and backups across the enterprise. If performance is an issue, another major scalability
concern should be in technologies that support load balancing of client access. Such technologies
alleviate single server bottlenecks that result from unexpected client growth.


In designing for availability, redundancy is always crucial. Redundancy often starts with shared
storage on the SAN via a RAID implementation. However, adding redundancy to the physical
disks is most valuable when the disks are behind a fully redundant data path. Ensuring true
redundancy means not only protecting disks but also
• Using cluster architecture to prevent data access loss due to a system failure
• Using redundant switches in the SAN fabric and redundant HBAs
• Using multipath-compliant HBA drivers on servers to ensure that they can realize the
benefits of redundancy in the SAN
• Planning for redundant power sources to prevent data loss or corruption from a power
failure
• Adding redundancy to the LAN to prevent switch failure from interrupting data access
Using mature products refers to products that have established reputations in IT. Regardless of
how great the sales pitch, bringing an unknown product into the network is always a risk.
Products with the backing of OS vendors such as Microsoft are more likely to do what they
promise, and their vendors are more likely to be in business at both the beginning and end of the
project.
With server and storage consolidation, it’s often tempting to look to purchase a proprietary
solution for a single hardware vendor. With a SAN, there will be little problem integrating
solutions from vendors such as Brocade, McData, and Cisco Systems. However, several vendors
offer end-to-end solutions. If the proposed solution involves proprietary hardware, it’s unlikely
that the solution will work as well with the rest of the systems on the network. For example,
managing and reporting can't be done with the tools currently in use and instead will need to be
achieved through an add-on utility from the product vendor or by using custom scripts. Most
vendors have no problem offering professional services and can write a script to handle most
tasks. However, each time support is required for the script, an organization might end up
spending additional dollars on more professional services.
As an organization begins to piece together a planned network, it must pay close attention to the
product support matrixes listed on each vendor Web site. Doing so ensures that the proposed
pieces have been tested and will work well together. For planned products not on a vendor’s
support matrix, a good practice is to negotiate a pilot period to ensure the product works with
new hardware and software.

Summary
Chapter 1 painted a picture of the current file serving landscape. This chapter looked deeper into
the picture and examined the available server and storage consolidation alternatives. Building on
this framework for the technologies that may be involved in consolidation, the next chapter will
look at how to piece these technologies together while emphasizing how to maintain high
availability and high performance.


Chapter 3: Data Path Optimization for Enterprise File Serving


With enterprise file serving, much of the attention concerning availability and high performance
is focused on the servers themselves. However, the clients that access data on file servers face
many other obstacles and potential bottlenecks along the way. For a client’s request to reach a
file, it must traverse the network, reach the server, and, ultimately, the server must request the
file from its attached storage. This chapter will view the entire landscape of the file serving
picture. Although attention will be paid to the servers themselves, you will also see the common
problems and pitfalls involved with getting data from storage to a client.

The Big Picture of File Access


Optimizing and protecting file access involves more than just setting up fault-tolerant disks on
each file server. Figure 3.1 illustrates a simple example of the objects involved in a client
accessing a file from a server.

Figure 3.1: Objects in the data path.

For the client to open the file residing on the SAN, the client’s request will need to traverse
Switch1, Router1, and Switch2 to reach the file server. For the file server to answer the request,
the server will need to pull the file from the disk array through the fabric switch on the SAN. In
this example, each device between the client computer and the file represents both a potential
bottleneck and a single point of failure. A single point of failure is any single device whose
failure would prevent data access.


Availability and Accessibility


Many administrators tout the fact that some of their servers have been available 99.999 percent
of the time during the past year. To make better sense of uptime percentages, Table 3.1
quantifies uptime on an annual basis.
Uptime Percentage Total Annual Downtime
99% 3.65 days
99.9% 8.75 hours
99.99% 52.5 minutes
99.999% 5.25 minutes

Table 3.1: Quantifying downtime by uptime percentage.

With 99 percent uptime, you have 1 percent downtime. One percent of 365 days yields 3.65
days. If you divided this number by 52 (number of weeks in a year), you would average being
down 0.07 days a week. This statistic equates to 1.68 hours (0.07 days × 24 hours) or 1 hour and
41 minutes of downtime per week. If you advance to the 5 nines (99.999) of availability, a server
would need to be offline no longer than 5.25 minutes a year. This statistic equates to only 6
seconds of allowable weekly downtime!
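
To make the arithmetic explicit, the short Python sketch below converts an uptime percentage into allowable downtime per year and per week; it simply reproduces the figures quoted above.

HOURS_PER_YEAR = 365 * 24
WEEKS_PER_YEAR = 52

def annual_downtime_hours(uptime_pct):
    # Downtime is whatever fraction of the year is not covered by the uptime percentage.
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    yearly_hours = annual_downtime_hours(pct)
    weekly_minutes = yearly_hours / WEEKS_PER_YEAR * 60
    print(f"{pct}% uptime: {yearly_hours:.2f} hours/year, {weekly_minutes:.1f} minutes/week")

# 99%     -> 87.60 hours/year (3.65 days), about 101 minutes (1 hour 41 minutes) per week
# 99.999% -> 0.09 hours/year (about 5.26 minutes), roughly 6 seconds per week
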
Keeping your file servers available is always important. The amount of uptime that is required
often varies by organization. For example, if no one will be accessing a file server on a Sunday,
it might not be a big deal if it is down for 8 hours. For other shops that require 24×7 access,
almost any downtime is detrimental.
When measuring uptime, most organizations simply report on server availability. In other words,
if the server is online, it’s available. Although this method sounds logical, it is often not
completely accurate. If the switch that interconnects clients to the server fails, the server is not
accessible. It might be online and available, but if it is not accessible, it might as well be down.
Thus, uptime percentages can be misleading. Having a server online is only valuable when the
data path associated with the server is online as well. To truly deploy a highly available file
server, the data path must be highly available as well. For high-performance file serving, the
same logic holds true. To meet the performance expectations of the server, the data path must be
able to support getting data to and from the server at a rate that—at a minimum—meets client
demands.
The next sections in this chapter provide examples of how to ensure availability and performance
in the following data path elements:
• Redundant Storage
• SANs
• LAN switches and routers
• Servers
• Power
Let’s start by looking at how to add redundancy and performance to storage resources.


Redundant Storage
Redundant storage can offer two elements that are crucial to high-performance and high-
availability file serving. In configuring redundant storage, or Redundant Array of Independent
Disks (RAID), you can configure two or more physical disks to collectively act as a single
logical disk. This combination can result in both better performance and fault tolerance. In this
section, you will see the most common types of RAID configurations as well as their relation to
improving enterprise file serving.

RAID Levels
RAID levels are described by number, such as 0, 1, or 5. The most common RAID
implementations today are:
• RAID 0
• RAID 1
• RAID 5
• RAID 0+1
• RAID 1+0
• RAID 5+0
Let’s start with a look at RAID 0.

RAID 0
RAID 0 is not considered to be true RAID because it does not offer redundancy. Because of this,
RAID 0 is often combined with other RAID levels in order to achieve fault tolerance.
Although not fault tolerant, RAID 0 does offer the fastest performance of all RAID levels. RAID
0 achieves this level of performance by striping data across two or more physical disks. Striping
means that data is being written to multiple disks simultaneously. All the disks in what is known
as the stripe set are seen by the operating system (OS) as a single physical disk. RAID 0 disk
striping is depicted in Figure 3.2.

Figure 3.2: RAID 0 operation.


To understand how RAID works, consider the following example. Suppose you wanted to store
the word “get” in a RAID 0 array containing three disks. Now picture each disk as being a cup.
Since “get” has three letters, a single letter would be stored in each cup. With the letters evenly
spaced out, you could theoretically drop all the letters into the three cups simultaneously. This
scenario illustrates the advantage of RAID 0—it’s fast. The problem, however, is if one of the
cups (disks) is lost or damaged, all data is lost.
Because of its lack of fault tolerance, RAID 0 is not considered to be a good fit for file serving.
Its raw performance makes it ideal for high-speed caching but makes it risky for storing critical
files. As you will see later in this chapter, RAID 0 can be combined with other RAID levels to
achieve fault tolerance. This setup can give you the best of both worlds—speed and resiliency.
The remaining RAID levels discussed in this section offer fault tolerance at the expense of some
of the performance found in RAID 0.

RAID 1
RAID 1 is the first fault-tolerant RAID level, and is available in two forms:
• Disk mirroring
• Disk duplexing
With both disk mirroring and disk duplexing, two or more physical disks provide redundancy by
having one or more disks mirror another. Data written or deleted from one disk in the mirror set
is automatically written or deleted on all other disks in the set. With this approach, fault tolerance
is ensured by having redundant copies of the same data on several disks. The failure of a single
disk will not cause any data loss.
The disk mirroring and disk duplexing implementations of RAID 1 differ in how the physical
drives are connected. With disk mirroring, the physical disks in the mirror set are connected to
the same disk controller. With disk duplexing, the physical disks in the mirror set are connected
using at least two disk controllers. Disk duplexing is the more fault tolerant RAID 1
implementation because it eliminates a disk controller as a single point of failure.
When a file is saved to a two-disk RAID 1 array, it is written to both disks simultaneously. Thus,
the actual write operation does not complete until the file is finished being written to the slowest
disk. The result of this architecture is that the actual performance of the RAID 1 array will be
equal to the speed of the slowest disk.
RAID 1 is ideal when you are looking for an easy means to ensure data redundancy. RAID 1
automatically creates a mirror of disks, so you are getting a continuous online backup of data.
This setup allows for little to no data loss in the event of a disk failure.

RAID should not be considered a substitute for backing up data. Although fault-tolerant RAID protects
against the failure of a single disk, it does not protect against data corruption or disasters. Because of
this shortcoming, regular backups to media stored offsite should still be performed.

The one disadvantage to RAID 1 is that you have to purchase at least twice the amount of disk
space for the data you want to store, depending on the number of disks in the RAID 1 mirror. If
you are planning to configure two disks to mirror each other, remember that one disk will work
exclusively as a backup. For example, a 100GB RAID 1 volume consisting of two physical disks
would need a total of 200GB of storage (two disks × 100GB).


RAID 5
RAID 5 operates similar to RAID 0 by striping data across multiple disks. However, it differs in
the following ways:
• RAID 5 uses parity to achieve fault tolerance
• RAID 5 requires three or more physical disks (RAID 0 only requires two disks)
With each write, RAID 5 writes parity information that is distributed across the disks in the array. This parity allows a
RAID 5 array to lose a single disk and still operate. However, if a second disk in the array fails,
all data would be lost. This loss would result in having to rebuild the array and restore data from
backup. In terms of performance, RAID 5 is slower than RAID 0, but outperforms RAID 1.
As RAID 5 uses parity to provide fault tolerance, you must consider the storage of the parity data
when sizing a RAID 5 array. Parity effectively consumes one disk's worth of capacity. Thus,
with a three-disk array, the equivalent of two disks stores actual data and one disk's worth of capacity holds the
parity bits. If you built a RAID 5 array with three 100GB disks, you would have 200GB of
available storage, enabling you to store actual data on 67 percent of your purchased storage. If
you add disks to the array, the efficiency of the array improves. For example, with four disks in
the array, the equivalent of three disks stores data and one disk's worth of capacity goes to parity, giving you 75 percent
utilization of your disks.
RAID 5 has been very popular for enterprise file serving because it offers better speed than
RAID 1 and is more efficient. Although it is slower than RAID 0, the fact that it provides fault
tolerance makes it desirable.

RAID 0+1
RAID 0+1 arrays provide the performance of RAID 0 as well as the fault tolerance of RAID 1.
This configuration is commonly known as a mirrored stripe. With RAID 0+1, data is first striped
to a RAID 0 array and then mirrored to a redundant RAID 0 array. Figure 3.3 shows this process.


Figure 3.3: RAID 0+1 operation.

RAID 0+1 is configured by first creating two RAID 0 arrays and then creating a mirror from the
two arrays. This approach improves performance, but the inclusion of RAID 1 means that your
storage investment will need to be double the amount of your storage requirement. Assuming
that the illustration that Figure 3.3 shows uses 100GB disks, each RAID 0 array would be able to
store 300GB of data (100GB × three disks). As the second RAID 0 array is used for redundancy,
it cannot store new data. This setup results in being able to store 300GB of data on 600GB of
purchased disk storage.
An advantage to RAID 0+1 is that it offers the performance of RAID 0 and provides fault
tolerance. You can lose a single disk in the array and not lose any data. However, you can only
lose one disk without experiencing data loss. If you’re looking for better fault tolerance, RAID
1+0 is the better choice.


RAID 1+0
RAID 1+0 (also known as RAID 10) combines RAID 1 and RAID 0 to create a striped set of
mirrored volumes. To configure this type of RAID array, you first create mirrored pairs of disks
and then stripe them together. Figure 3.4 shows an example of this implementation.

Figure 3.4: RAID 1+0 operation.

Note that the RAID 1+0 configuration is exactly the opposite of RAID 0+1. With RAID 1+0, the
RAID 1 arrays are first configured. Then each mirror is striped together to form a RAID 0 array.
A major advantage of RAID 1+0 over RAID 0+1 is that RAID 1+0 is more fault tolerant. If there
are a total of six disks in the array, you could lose up to three disks without losing any data. The
number of disks that can fail is determined by where the failures occur. With RAID 1+0, as long
as one physical disk in a mirror set in each stripe remains online, the array will remain online.
For example, the array that Figure 3.4 shows could lose disks four, two, and six and remain
operational. As long as one disk in each RAID 1 mirror remains available, the array will remain
available as well.
RAID 1+0 is similar to RAID 0+1 in terms of storage efficiency. If each disk in the array shown
in Figure 3.4 is 100GB in size, you would have 600GB of total purchased storage but only
300GB of writable storage (due to the RAID 1 mirroring). If you’re looking for better storage
efficiency at the expense of a little speed, RAID 5+0 might be a better option.

RAID 5+0
RAID 5+0 is configured by combining RAID 5 and RAID 0. Two or more RAID 5 arrays are created
first, and data is then striped across them. This setup is more efficient than RAID 1+0 because
only a fraction of your storage investment is lost, instead of half the investment. Figure 3.5
shows this RAID type.


Figure 3.5: RAID 5+0 operation.

Compared with RAID 5, RAID 5+0 provides faster read and write access. However, a problem
with RAID 5+0 is that if a drive fails, disk I/O to the array is significantly slowed. Unlike RAID
5, RAID 5+0 is more fault tolerant because it can withstand the loss of a disk in each RAID 5
subarray. With the array that Figure 3.5 shows, both disks one and five could fail and the array
would still remain online. If disks four and five failed, the array would go down.
RAID 5+0 sizing is similar to sizing a RAID 5 array, except that a disk in each RAID 5 subarray
is used for parity. If the array in the figure had 100GB disks, you would have 600GB of storage
space in the array. Of the 600GB, you could write data to 400GB because you give up one
100GB disk in each of the subarrays to parity. As with RAID 5, adding disks to each subarray
would provide better storage efficiency.
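
The capacity arithmetic used in the preceding RAID sections can be captured in a short sketch. The following Python function is illustrative only; it assumes equally sized disks and ignores formatting overhead, but it reproduces the usable-capacity figures quoted for the example arrays above.

def usable_gb(level, disks, disk_gb, groups=1):
    # Rough usable capacity per RAID level; 'groups' is the number of subarrays
    # for nested levels such as RAID 5+0.
    if level == "0":              # striping only, no redundancy
        return disks * disk_gb
    if level == "1":              # mirroring: one disk's worth of data is usable
        return disk_gb
    if level == "5":              # parity consumes one disk's worth of capacity
        return (disks - 1) * disk_gb
    if level in ("0+1", "1+0"):   # half of the disks hold mirror copies
        return disks * disk_gb // 2
    if level == "5+0":            # one disk's worth of parity per RAID 5 subarray
        return (disks - groups) * disk_gb
    raise ValueError(level)

print(usable_gb("1", 2, 100))              # 100 of 200GB purchased
print(usable_gb("5", 3, 100))              # 200 of 300GB purchased (about 67 percent)
print(usable_gb("1+0", 6, 100))            # 300 of 600GB purchased
print(usable_gb("5+0", 6, 100, groups=2))  # 400 of 600GB purchased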


Hardware vs. Software RAID


As you can see, there are several methods for improving storage performance and fault tolerance
for your file servers. When looking to configure each of these RAID levels, you have two
general choices—hardware RAID and software RAID. Hardware RAID volumes are set up using
a hardware RAID controller card. This setup requires the disks in the array to be connected to the
controller card. With software RAID, disks are managed through either the OS or a third-party
application. Let’s look at each RAID implementation in more detail.

Hardware RAID
Hardware RAID controllers are available to support all of the most common disk storage buses,
including SCSI, Fibre Channel (FC), and Serial ATA (SATA). Hardware RAID is advantageous
in that it’s transparent to the OS. The OS only sees the disk presented to it by the RAID
controller card. Hardware RAID also significantly outperforms software RAID because no CPU
cycles are needed to manage the RAID array. This management is instead performed by the
RAID controller card.
Another advantage of hardware RAID is that it is supported by most clustering implementations,
whereas most software RAID configurations are not supported by cluster products. To be sure
that your configuration is compatible, you should verify that the hardware RAID controller and
disk products have been certified by your clustering product vendor.

Sometimes hardware alone is not the only compatibility issue. Be sure to verify that your installed
hardware is using firmware and drivers that have been certified by your clustering product vendor.

Most of the major RAID controller vendors post technical manuals for their controllers on their
Web sites. This accessibility makes it easy to configure RAID controllers in your file serving
storage infrastructure.

SCSI RAID
Most of the hardware RAID implementations in production today are in the form of SCSI RAID.
Although FC and SATA have gained significant ground in recent years, SCSI still has
maintained a large portion of market share. Among the most popular SCSI RAID controller
vendors are Adaptec, QLogic, and LSI Logic. Each of these vendors offers products with long-
standing reputations and excellent support.

FC RAID
With FC RAID, disk arrays can be configured as RAID arrays and attached directly to a SAN.
As SANs have continued to become an integral part of enterprise file serving, FC RAID has
risen in popularity. For example, Adaptec’s SANbloc 2Gb RAID solution can allow you to
connect FC RAID to a cluster via a SAN and can scale to 112 drives with as much as 16.4TB of
storage. Other vendors that offer FC RAID solutions include Hewlett-Packard, Quantum, Dot
Hill, and XioTech.


SATA RAID
SATA has been steadily growing in recent years as a cost-effective alternative to SCSI. SATA
offers data transfer rates as fast as 450MBps, depending on the SATA RAID controller and disk
vendor. Besides the lower cost, SATA also differs from SCSI in that each disk has a dedicated
serial connection to the SATA controller card. This connection allows each disk to utilize the full
bandwidth of its serial bus. With SCSI, all disks on the bus share the bandwidth of the bus.
Unlike SCSI, SATA disks are not chained together. Thus, the number of disks in the array will
be restricted to the physical limitations of the controller card. For example, the Broadcom
RAIDCore 4852 card supports eight ports and all of the popular RAID levels, including RAID
1+0 and RAID 5+0. This controller provided RAID 0 writes at 450MBps and RAID 5 writes at
280MBps during vendor tests.
Many vendors are also developing technologies that allow you to connect SATA disk arrays to
an FC SAN. For example, the HP StorageWorks Modular Smart Array (MSA) controller shelf
can allow you to connect as many as 96 SATA disks to an FC SAN. This feature gives you the
ability to add disk storage to support your file servers on the SAN at a significant cost savings
over SCSI.

Software RAID
Software RAID is advantageous in that you do not need a RAID controller card in order to
configure it. However, with many organizations deploying clustering technology to support the
demands of file serving, software RAID has not been a possibility for shared storage resources in
the cluster. There are some exceptions to this rule. For example, Symantec (formerly VERITAS)
Volume Manager can set up software RAID that is compatible with some clusters such as
Microsoft server clusters. However, most organizations that spend the money to deploy a cluster
in the first place don’t try to save a few bucks by cutting corners with RAID.
Having an OS control disks via software RAID can also result in significant CPU overhead. The
CPU loading of software RAID often makes it impractical on high-volume enterprise-class file
servers. However, some organizations that connect shared cluster storage via hardware RAID
will use software RAID to provide redundancy for the OS itself. Having the OS files mirrored
across a RAID 1 array can prevent a disk failure from taking a server down. You can also do so
with a hardware RAID controller, but if you’re at the end of your budget, you might find using
software RAID to protect the OS to be an alternative. As with hardware RAID, you still must use
multiple physical disks to configure the RAID array, so breaking a disk into partitions to build a
software RAID array is not an option.
Windows OSs natively support software RAID, which can be configured using the Disk
Management utility, which is a part of the Computer Management Microsoft Management
Console (MMC). Using Disk Management, you can configure RAID 0, 1, and 5 arrays on
Windows Server OSs. On Windows client OSs, you can only configure RAID 0 using Disk
Management.
With Linux OSs, software RAID 0, 1, and 5 can be configured using the Disk Druid tool during
a GUI installation of the OS. If the OS is already installed, you can use the Raidtools package to
configure and manage software RAID. Although increasing performance and availability of
disks through RAID is an important part of enterprise file serving, there are more elements of the
data path that must be considered as well.


Redundant SAN Fabrics


Redundant disks are not of much value if there is only a single path to the disks through a SAN.
With this in mind, if you have protected your disk storage through RAID, you should also
strongly consider adding redundant data paths between servers attached to the SAN and the
storage resources.

Elements of the Redundant SAN


A fully redundant SAN has no single point of failure. An example of a redundant SAN fabric is
shown in Figure 3.6.

Figure 3.6: Redundant SAN fabric.

In this example, three servers that are part of a shared data cluster all share common storage in a
SAN. Fault tolerance begins with redundant FC HBAs in the servers. This setup eliminates an
HBA as a single point of failure. Each HBA connects to a separate FC switch. This way, all three
servers can withstand the failure of a switch or switch port in the SAN. Finally, the disk array
and library in the SAN are also connected to each switch.


Although redundancy adds to the cost, many organizations with SANs have seen redundancy as
a necessity. For example, a large hospital recently deployed a non-redundant SAN to connect
their file servers to disk resources. The goal was to get better use of their existing storage
resources while streamlining backups. However, the administrator’s opinion of SANs faded
when a switch failed and as a result took down seven servers. With a FC switch serving as the
access point to SAN storage, the switch’s failure could have devastating consequences.
With a fully redundant SAN fabric, a switch failure will not equate to the failure of several
servers. Instead, it will simply be a minor hiccup. However, getting all of this to work is not as
simple as just connecting all the devices. Each host OS must be aware of the duplicate paths
through the SAN to each storage resource. For this setup to work, you will need to install the
multipath drivers for the SAN HBA. You will also need to ensure that the purchased SAN HBAs
support multipath.

Managing the Redundant SAN


All of the major SAN switch vendors offer tools to help you manage their products. For example,
Brocade’s Fabric Manager allows you to manage as many as 200 switches in a SAN. With this
product, you can make changes to all switches simultaneously or can make changes to individual
switches or even small groups. You can also configure alerting features to alert you if a failure
occurs. Other storage vendors have also jumped into the SAN management ring by offering
products that collectively manage a variety of SAN hardware devices. Symantec’s
CommandCentral is an example of software that can manage a diverse collection of storage
resources across an enterprise.
There are several products that can assist you in spotting failures on a SAN. How you deal with
failures may depend on your IT budget. Some organizations maintain spare parts on hand to
quickly resolve failures. This practice could mean having a spare HBA, switch, and FC hard
disks. This way, if a failure occurs, you can quickly swap in the spare component. Once the
failed component has been replaced, you can order a new spare to take its place.

Most SAN products have built-in backup utilities. To quickly replace a failed switch and update its
configuration, you should perform frequent backups of your SAN switches. Many organizations
perform configuration backups before and after each configuration change to a SAN switch. This
practice ensures that you will always have the most recent configuration available if you need to
replace a failed switch.


Redundant LANs
At this point, you have seen how to improve performance and fault tolerance in the data path
from a server to storage. Another element of the data path that is also crucial is the path from the
clients to the servers. This path often encompasses the LAN. A simple example of adding fault-
tolerant LAN connections to servers is shown in Figure 3.7.

Figure 3.7: Redundant server LAN connections.

The idea of redundant LAN connections is relatively straightforward. As with redundant SAN
connections, the redundant LAN illustrated uses a meshed fabric to connect each node to two
switches. This approach will make each node resilient to NIC, cable, or switch failure. With this
approach, a teamed NIC driver should be installed on each server. This installation will allow the
two NICs to be collectively seen as a single NIC and share a virtual IP address.
Aside from meshing server connections to switches, some organizations mesh connections
between switches, thus providing for additional resiliency. Figure 3.8 shows this architecture.

Figure 3.8: Redundant switched LAN connections.


Meshing core and access layer switches can provide fault tolerance for the network backbone,
but this also requires additional management. To prevent network loops, Spanning Tree Protocol
(STP) will need to be configured on the switches. Loops occur when multiple paths exist
between hosts on a LAN. With multiple open paths, it is possible for frames leaving one host to
loop between the switches offering the redundant paths while never actually reaching the
destination host. Loops can not only disrupt communication between hosts but also flood the LAN
with traffic. When configured, STP will dynamically build a logical tree that spans the switch
fabric. In building the tree, STP discovers all paths through the LAN. Once the tree is
established, STP will ensure that only one path exists between two hosts on the LAN. Ports that
would provide a second path are forced into a standby or blocked state. If an active port goes
down, the redundant port is brought back online. This setup allows for fault tolerance while
preventing network loops from disrupting communication.
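
The following Python sketch illustrates the spanning-tree idea on a small meshed topology: starting from a chosen root switch, it keeps one loop-free set of links and treats everything else as blocked. This is a simplified illustration rather than the real 802.1D election, which uses bridge IDs, path costs, and BPDUs, and the switch names are hypothetical.

from collections import deque

# A meshed topology: five links among four switches, so loops are possible.
links = {("SW1", "SW2"), ("SW1", "SW3"), ("SW2", "SW3"), ("SW2", "SW4"), ("SW3", "SW4")}

def spanning_tree(root, links):
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    visited, tree, queue = {root}, set(), deque([root])
    while queue:                      # breadth-first walk from the root switch
        switch = queue.popleft()
        for peer in sorted(neighbors[switch]):
            if peer not in visited:
                visited.add(peer)
                tree.add(tuple(sorted((switch, peer))))
                queue.append(peer)
    return tree

forwarding = spanning_tree("SW1", links)
blocked = {tuple(sorted(link)) for link in links} - forwarding
print("forwarding:", sorted(forwarding))
print("blocked:", sorted(blocked))    # redundant links held in standby until a failure
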
One other element of the LAN that can benefit from redundancy is routers. As most hosts on a
network route through a single default gateway, failure of a router can shut down LAN
communications. This shutdown can be overcome by using routers that support Hot Standby
Routing Protocol (HSRP) or Virtual Router Redundancy Protocol (VRRP). Both HSRP and
VRRP allow you to configure multiple routers to share a virtual IP address. This functionality
provides failover between routers. If one router fails, a second router can automatically assume
the routing duties of the first router. Although HSRP and VRRP offer similar functionality, they
differ in the fact that HSRP is Cisco-proprietary, while VRRP is an open standard.
Regardless of the protocol used, both HSRP and VRRP can allow you to eliminate a router as a
single point of failure. Of course, eliminating the router as a single point of failure comes at the
cost of having to purchase and power additional routers for redundancy.

For more information about HSRP, read the Cisco internetworking case study “Using HSRP for Fault-
Tolerant IP Routing.” This document is available at
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ics/cs009.htm.
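
Conceptually, first-hop redundancy comes down to a priority election among the routers that are still alive for ownership of a shared virtual gateway address. The following simplified Python sketch shows that idea only; hello timers, preemption, and the protocols' multicast messaging are omitted, and the addresses and priorities are illustrative assumptions.

routers = [
    {"name": "rtr-a", "priority": 110, "alive": True},
    {"name": "rtr-b", "priority": 100, "alive": True},
]
VIRTUAL_GATEWAY = "192.168.10.1"   # the default gateway address clients are configured with

def active_router(routers):
    # The live router with the highest priority answers for the virtual address.
    live = [r for r in routers if r["alive"]]
    return max(live, key=lambda r: r["priority"]) if live else None

print(active_router(routers)["name"], "owns", VIRTUAL_GATEWAY)   # rtr-a
routers[0]["alive"] = False                                      # rtr-a fails
print(active_router(routers)["name"], "owns", VIRTUAL_GATEWAY)   # rtr-b takes over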

Redundant Power
With single points of failure eliminated from the LAN, you can focus on power. Power loss,
sags, or surges can also wreak havoc on the availability of your file servers. To eliminate these
potential problems, both Uninterruptible Power Supplies (UPS) and backup generators can be
deployed. Redundant power is no secret in IT circles and has been used for quite some time. If
you’re managing enterprise-class file servers, odds are that you already have redundant power in
place.
In protecting against power failure, the UPS can sustain servers, storage, and network devices for
a short period of time. During this period, all devices could be powered down gracefully so as
not to corrupt any stored files. In organizations in which availability is crucial, the role of the
UPS is usually to keep servers online long enough for backup generators to start. The backup
generator can sustain the critical elements of the network for hours or even days, depending on
the number of systems on the network as well as the amount of fuel available to power the
generator.


Redundant Servers
Thus far, we have looked at how to add redundancy to power, the LAN, the SAN, and disks. The
only aspect of the information system that has been ignored to this point has been the servers
themselves. If you have gone this far to protect your file serving infrastructure, you don’t want a
motherboard failure, for example, to disrupt data access.
Adding redundancy to servers can be accomplished in a few different ways:
• Deploy shared data clusters
• Deploy failover clusters
• Deploy proprietary servers that are fully redundant
Let’s start with a look at shared data clusters.

Shared Data Clusters


Shared data clusters have already been fully described in Chapters 1 and 2. They provide full
failover support for file servers, allowing a virtual file server entity to move from one physical
host to another if a failure occurs. In having this ability, all hardware and software elements of a
physical server are eliminated as single points of failure.
In addition to failover support, shared data clusters offer load balancing by allowing multiple
nodes in the shared data cluster to simultaneously access files in the shared storage on the SAN.
This functionality prevents the performance bottlenecks that are common in other redundant
server and clustering solutions. Finally, with shared data clusters running on industry standard
x86-class hardware, organizations do not have to fear deploying a proprietary solution when
deciding to go with a shared data cluster.

Failover Clusters
Failover clusters, like shared data clusters, offer failover support. If one server’s hardware fails, a
virtual file server running on that server can simply move to another node in the cluster. All
major OS vendors, including Microsoft, Red Hat, and SUSE, offer clustering support with some
of their OS products, making them convenient for administrators already familiar with a certain
OS.
If performance was not an issue, failover clusters would be an ideal fault-tolerant file serving
solution. However, failover clusters lack the ability to effectively load balance data between
hosts. Instead, failover clusters use a shared nothing architecture that allows only one node in a
cluster access to a file resource. This setup prevents failover clusters from being able to load
balance access to a virtual file server. Instead, access to the virtual file server would have to be
provided by one server at a time.


Proprietary Redundant Servers


One final alternative to eliminating servers as a single point of failure is to deploy fully
redundant server solutions. These solutions can range in price from tens to hundreds of
thousands of dollars. On the low end, companies such as Stratus Technologies offer a server that
has fully redundant power, motherboards, CPUs, and storage. At the high end, companies such
as Network Appliance and EMC offer fault-tolerant NAS appliances. Although both EMC and
Network Appliance share a significant portion of the file serving market, their popularity has
been heavily challenged in recent years by companies such as PolyServe that offer fully
redundant high-performance file serving solutions that can run on industry-standard hardware.

Eliminating Bottlenecks
In addition to adding availability to the data path by eliminating single points of failure,
performance bottlenecks should be a key concern. As with failure points, each element in the
data path can represent a potential bottleneck (see Figure 3.9).

Figure 3.9: Potential data path bottlenecks.


Figure 3.9 points out eight potential bottlenecks in a data path:


1. Client access switch—10Mbps uplink
2. Router—Single router connects all clients to server network segment
3. Server access switch—100Mbps uplink
4. Server NIC—100Mbps NIC
5. Server internal hardware—CPU, RAM, motherboard, and so on
6. Server FC HBA—1Gbps
7. Fabric switch—1Gbps
8. Disk array—Just a Bunch of Disks (JBOD)
Connecting clients to the server LAN through a single “router on a stick” via 10Mbps switches
can quickly slow file access performance. The term router on a stick refers to a single router that
services several LAN segments that are multinetted together. To communicate with each logical
network that resides on the multinet, the router will have multiple IP addresses on its network
interface that faces the clients.
If a 10Mbps switch connects to the router, you are already faced with all clients having to share a
single 10Mbps pipe to access server resources. These bottlenecks could be reduced or eliminated
by upgrading the client access switch to 100Mbps and with at least one 1Gbps port to uplink to
the router. If several network segments are bottlenecked at the router, you could consider
replacing the router with a Layer-3 switch. If needed, a switch with 1Gbps ports could be used. If
the server NIC is the bottleneck, it could also be upgraded to 1Gbps or teamed with a second
NIC to improve throughput.
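
A rough capacity check shows why these uplinks become bottlenecks. The sketch below assumes an illustrative per-client demand of 2Mbps and a fixed efficiency factor for protocol and contention overhead; both figures are assumptions for the example, not measured values.

def max_clients(link_mbps, per_client_mbps, efficiency=0.7):
    # Number of concurrent clients a shared link can sustain, allowing for overhead.
    return int(link_mbps * efficiency / per_client_mbps)

for link in (10, 100, 1000):
    print(f"{link}Mbps uplink supports roughly {max_clients(link, 2.0)} clients at 2Mbps each")

# A 10Mbps uplink saturates with only a handful of active clients, which is why the
# client access switch and its uplink are usually the first components to upgrade.
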
At the server, several elements could hurt performance. If there is not enough RAM, too slow of
a CPU, or slow hard disks, performance will suffer. The expensive answer to solving server
resource bottlenecks is to replace or upgrade hardware. A more scalable solution in the file
serving arena is to configure the file servers in a shared data cluster. This solution offers the
benefit of load balancing, fault tolerance, and often results in server consolidation.
If you’re looking to document a server bottleneck, Windows and Linux offer tools to help
pinpoint a problem. On Windows OSs, System Monitor can be used to collect system
performance statistics in real time. On Linux, tools such as the Gnome System Monitor or vmstat
can allow you to query system performance.
Resource        Threshold                                            Required Action
Memory          Committed bytes > physical RAM; Pages/sec > 20       Add or upgrade RAM
Physical Disk   Disk Queue Length > 2; % Disk time > 90%             Upgrade to a faster disk, deploy RAID
Processor       % Processor time > 80%; Processor queue length > 2   Upgrade CPU or add an additional CPU
Network         Remains near 100% utilization                        Upgrade to a faster NIC or team NICs

Table 3.2: Common performance thresholds.
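
As a simple illustration of how the Table 3.2 thresholds might be applied, the following Python sketch evaluates a set of sampled counters against them. The sample values and counter names are simplified for the example; in practice, the data would come from System Monitor logs on Windows or vmstat output on Linux.

sample = {
    "committed_bytes": 9.5e9, "physical_ram_bytes": 8.0e9, "pages_per_sec": 35,
    "disk_queue_length": 3, "disk_time_pct": 95,
    "processor_time_pct": 65, "processor_queue_length": 1,
    "network_utilization_pct": 40,
}

def evaluate(s):
    # Apply the thresholds from Table 3.2 and return the suggested actions.
    findings = []
    if s["committed_bytes"] > s["physical_ram_bytes"] or s["pages_per_sec"] > 20:
        findings.append("Memory: add or upgrade RAM")
    if s["disk_queue_length"] > 2 or s["disk_time_pct"] > 90:
        findings.append("Physical disk: upgrade to a faster disk or deploy RAID")
    if s["processor_time_pct"] > 80 or s["processor_queue_length"] > 2:
        findings.append("Processor: upgrade the CPU or add an additional CPU")
    if s["network_utilization_pct"] > 95:
        findings.append("Network: upgrade to a faster NIC or team NICs")
    return findings

for finding in evaluate(sample):
    print(finding)   # this sample flags memory and disk, but not CPU or network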


The SAN also introduces a possible bottleneck site. Arbitrated loop SANs behave like Token
Ring LANs and thus provide shared bandwidth for all resources attached to the SAN. Thus, 10
servers attached to an arbitrated loop SAN would have to share its bandwidth. If SAN
performance is slow and you are using an arbitrated loop topology, the most effective way to
improve performance will be to upgrade to a switched fabric SAN. The same can be said for a
1Gbps SAN. If you have SAN switches and HBAs that support a maximum throughput of
1Gbps, upgrading to a newer 2Gbps or 4Gbps SAN fabric will greatly improve performance.
Finally, the disks themselves could also represent a bottleneck. Upgrading to faster disks is an
option; another alternative is to configure the disks as RAID. For example, moving from RAID 1
to RAID 5 could improve performance. Another alternative is to go to RAID 1+0.

Architectural Bottlenecks
Sometimes it is not the individual pieces that represent the bottleneck; if the file serving
infrastructure is architected poorly, the bottleneck might be the architecture itself. Two typical
examples of architectural bottlenecks are single NAS heads
single file servers.

Single NAS Head


As NAS grew in popularity, a big selling point of NAS was that you could consolidate all your
file serving resources into a single NAS. With terabytes of available storage, this option seemed
like a good idea to many at the time. However, the single NAS head presents severe scalability
and performance limitations. The NAS itself represents a single path for LAN clients to access
files. Even with teamed NICs, performance scaling is limited. When stuck with a NAS
bottleneck, many organizations find themselves adding NAS heads. At first, servers are
consolidated, but performance demands must be met by adding NAS boxes. However, adding
NAS boxes to handle file serving will likely induce additional management overhead. Thus, you
will likely need to revise user drive mappings as well as restructure backups to accommodate
the additional server. If scaling continues, you will need to add another NAS, further
compounding the problem.

Single File Server


The single file server presents the same problems as a single NAS head. The one difference is
that the file server is not a proprietary solution. However, having a single
access point is still an issue. You can add NICs and certainly max out CPU and RAM resources
but could still be faced with network throughput bottlenecks in enterprise-class environments.
The answer to the performance bottleneck dilemma of both single NAS heads and single file
servers can be found in load balancing.


Load Balancing
Traditional load balancing involves balancing a client load between multiple servers. In the
traditional load-balancing architecture, each server that participates in a load-balanced cluster
maintains its own local storage. Without shared storage, the traditional load-balanced cluster is
not a solution for file serving scalability issues. Instead, it is best suited for read-only, user-
intensive workloads such as front-end Web serving or FTP download servers.
For file serving, the only true way to provide load-balanced read/write access to file system data
is through shared data clusters. In the shared data cluster, multiple servers can present a single
logical file server application to clients. This setup allows the client load to be distributed across
multiple physical servers. This approach can eliminate many of the traditional file serving
bottlenecks, including:
• Single network access point
• CPU
• RAM
• Motherboard
• HBA
With a file serving load being distributed across four server nodes, for example, you have four
times the amount of server resources to handle client demand. This flexibility can allow you to
get more out of your hardware investment and likely extend the life of your servers. In many
shops, some servers are over-utilized while others are underutilized. Consolidating file servers to
a single shared data cluster will allow you to equally use all file server resources in your
organization.
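
The following Python sketch illustrates the distribution idea using a simple least-connections policy across four hypothetical cluster nodes. In a real shared data cluster the clustering software performs this distribution; the sketch only shows why adding nodes spreads the client load.

nodes = {"node1": 0, "node2": 0, "node3": 0, "node4": 0}

def assign(client):
    # Send each new client session to the node currently holding the fewest sessions.
    target = min(nodes, key=nodes.get)
    nodes[target] += 1
    return target

for i in range(10):
    assign(f"client{i}")

print(nodes)   # sessions end up spread roughly evenly across the four nodes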

Managing the Resilient Data Path


Building out a resilient data path requires knowledge that crosses several technical boundaries.
It’s easy for a storage or server administrator to fail to take LAN performance issues into
account. Likewise, it’s easy for a switch and router administrator to disregard SAN issues. To
assist administrators in managing performance and fault-tolerance issues, many vendors are
developing products that locate and report on server performance. One such product is the HP
Performance Management Pack (see Figure 3.10).


Figure 3.10: Discovering a system memory problem with the HP Performance Management Pack.

Tools such as this have grown in popularity because they not only alert you of performance
problems but also offer recommendations for how to solve them. The Brocade and Symantec
tools mentioned earlier in this chapter are ideal for monitoring and managing SAN resources.
Other SAN hardware vendors, such as QLogic, offer similar solutions. It’s an easy trap for
administrators to invest countless dollars in hardware and then decide to forgo management
software in order to save money. The time saved by having software monitor and report on
problems within your data path will undoubtedly be worth the cost of the management software.


Many vendors like to tout the fact that they provide an end-to-end solution. However, end-to-end
solutions often promise to solve all problems along a data path but rarely deliver. When
designing a high-performance fault-tolerant data path, be sure to ask yourself or any vendors
involved in the project the following questions:
• Is the network path to my file servers fault tolerant?
• Is the SAN path to the file resources fault tolerant?
• Can the planned technologies effectively load balance file access requests?
• How can I monitor and report on LAN bottlenecks and failures?
• How can I monitor and report on SAN bottlenecks and failures?
• How can I monitor and report on server bottlenecks and failures?
With acceptable answers to these questions, you should be ready to enjoy a resilient and fault-
tolerant file serving infrastructure.

Summary
Far too often with file serving, administrators focus solely on performance issues from the
servers back to storage and all but ignore the remainder of the data path. Hopefully, this chapter
has made you aware of all of the aspects of getting a file from a server to a client, and back. With
the right architecture in front of and behind your file servers, they should be able to grow and
respond to client performance demands as the needs of your organization evolve. Basing your file serving
infrastructure around shared data cluster architecture is the only true way to add fault tolerance
and load balancing to file server resources.
As you have seen, all the resources that you need to build a fault-tolerant and resilient file
serving data path exist today. Having knowledge of what is available as well as how to use new
resources should allow you to build a reliable file serving infrastructure within your organization.
The next chapter will look at building high-performance file serving solutions in both Windows
and Linux environments. Chapter 4 will take you through the process of building a high-
performance Windows file server and Chapter 5 will explore the process of building a highly
available and high-performance Linux file serving solution.


Chapter 4: Building High-Performance, Scalable, and


Resilient Windows File Serving Solutions
The last chapter took more of an external look at file serving. In taking on file serving from a
data path perspective, you can see which elements of the network must be made highly available
as well as meet performance and scalability demands. This chapter takes a deep look at the role
of the OS itself—in particular, how to manage high-performance and highly available Windows-
based file servers. As technology has continued to improve, Microsoft and countless vendors
have developed new tools and methods for managing file systems. In many cases, the available
file serving tools and OS enhancements are complementary rather than competitive.
As you navigate through this chapter, you will first see what Microsoft has done on its own to
improve file serving. You will then see the role that some of the major file serving vendors have
provided. Although awareness of Windows file serving technologies is important, it is equally
significant that you understand how these technologies can—or cannot—coexist with your
existing or planned file serving infrastructure. Let’s begin with a look at WS2K3’s file serving
enhancements.

Managing High-Performance and Availability Across a Windows


Infrastructure
With each new release of its Windows server OSs, Microsoft has continually added more file serving
features to the OS. Among the new file serving-related features of WS2K3 are:
• Virtual Disk Service (VDS)
• Volume Shadow Copy Service (VSS)
• Shadow Copies for Shared Folders
• Enhanced storage and file serving support
This section will look at the numerous technical considerations for deploying WS2K3-based file
servers. Along the way, you will see the new file-serving and storage features available in
WS2K3 as well as the factors that must be considered when integrating third-party applications
such as antivirus with Windows-based file servers.

VDS
VDS provides a method to standardize the way application vendors access disk storage resources
connected to WS2K3 hosts. In short, VDS is an application programming interface (API) that
allows third-party storage application vendors to connect to all attached disk storage resources
through a single API. The API that the third-party vendors connect to is VDS. Figure 4.1 shows
the VDS architecture.


Figure 4.1: The VDS architecture.

Basically, VDS sits between applications and storage resources. Storage applications that support
VDS can send instructions to VDS, which in turn passes the instructions to the storage resource.
This architecture prevents storage application vendors from having to write their own drivers or
write code that provides instructions to each specific type of hardware that the vendor supports.
Instead, a single set of instructions can be passed to VDS, and VDS will take care of the rest.
Providing a common storage interface is nothing new to Microsoft. The company actually tried
providing a common interface in Windows 2000 (Win2K) for removable media resources such
as tape libraries. This service was known as Removable Storage Manager (RSM) and its
intention was similar to that of VDS in WS2K3. The primary difference with VDS is that it
provides access to disk resources instead of removable media. With RSM, many storage vendors
found the service to be moderately reliable at best and thus wound up writing their own drivers
for removable storage resources. For storage resources that were normally unsupported, the
storage vendors could communicate with those devices through RSM. By narrowing its scope,
VDS has proven to be much more reliable in production than RSM.
Notice that the objects underneath VDS in Figure 4.1 are listed as providers. The term provider
is used by Microsoft to describe code written by disk vendors to interface with VDS. For VDS to
send the correct instructions to a disk resource, it must communicate with the disk’s provider.
Only disk storage vendors that have written VDS providers are supported by VDS.

To take advantage of the flexibility provided by VDS, before purchasing disk storage resources, be
sure to ask the storage vendor if their products offer a VDS provider. For storage applications, be
sure to verify that the software application is written to support VDS.


VSS
VSS is a WS2K3 service that is often confused with the Shadow Copies for Shared Folders
feature. Although WS2K3’s Shadow Copies for Shared Folders feature is a part of VSS, it is
only a piece of VSS. VSS is a service that can be utilized by backup and storage management
applications to effectively back up files that are normally open during backup. For example, to
back up a third-party database, pre-VSS, you had two choices:
• Stop the database before the backup runs and restart it after the backup completes
• Purchase a specialized backup agent that provides for online backups of the database
If your backup product supports VSS, you can back up the database as part of a normal file
system backup without taking the database offline or purchasing a backup agent. Many
enterprise shops already have backup agents for major database applications such as Oracle or
SQL, so for these applications, the news of VSS is not that significant. However, for smaller
third-party databases that do not have available backup agents, VSS provides a viable alternative
to cold backups.
VSS also has merit in file-serving applications. For example, if a user has an open Word
file at the time of backup, the file would probably be skipped during the backup. Pre-VSS, if you
wanted to back up open files such as these, you would need to purchase a third-party product
such as St. Bernard’s Open File Manager. If your backup product supports VSS, you can now
back up open document files without needing a third-party open file manager product.
VSS works by creating a point-in-time snapshot of an open file. Prior to the snapshot being
made, VSS first freezes write operations to the open file. All writes to an open file are stored in a
temporary cache while VSS backs up the open file. Once the backup of the file is finished, any
suspended writes to the file are then committed. This process allows the backup software to
achieve a consistent backup of an open file. Maintaining the consistency of the file through the
backup process is imperative. Changes written to a file while it is being backed up could result in
file corruption.
To take advantage of VSS, your file server needs to run WS2K3. Also, you need to ensure that
VSS is supported by your backup product vendor. All the major backup vendors such as
Symantec (formerly VERITAS) and CommVault support VSS, so it is likely that VSS is a
supported feature of your backup vendor. With Windows Backup, this feature is enabled by
default. If you want to disable VSS backups, which would cause open files to be skipped during
the backup, you can do so from the Windows Backup GUI.
When a backup is initiated from the Windows Backup GUI, you first select the data to back up,
then click Start Backup. Once you click this button, you are presented with the Backup Job
Information dialog box. If you click Advanced, you are presented with the Advanced Backup
Options dialog box (see Figure 4.2). From this point, you can select to disable VSS by selecting
the Disable volume shadow copy check box.


Figure 4.2: Windows Backup advanced backup options.

As a general practice, VSS should be enabled unless particular files defined in the backup will be
backed up by an application-specific agent. Each backup vendor may have its own recommendations for
working with VSS, so your backup product's documentation will provide the best guidance.

If you are executing backups using the ntbackup.exe command-line tool, the switch
/SNAP:{on | off}
determines whether VSS is used.
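For example, the following command (the folder path, job name, and backup file name are purely illustrative) backs up a shared data folder to a .bkf file with VSS snapshots enabled:
ntbackup backup D:\Shares /J "Nightly file data" /F "E:\Backups\nightly.bkf" /SNAP:on
Specifying /SNAP:off in the same command would instead cause open files to be skipped during the backup.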
VSS can provide greater reliability of file server backups and allow you to back up open files
that are normally skipped. This service also provides a feature that can offload some of the day-
to-day file recovery work to the end user—Shadow Copies for Shared Folders.

Shadow Copies for Shared Folders


Shadow Copies for Shared Folders is the best-known aspect of VSS. During the initial release of
WS2K3, the Shadow Copies for Shared Folders feature was touted as one example of how
WS2K3 could lower TCO. Simply put, enabling Shadow Copies for Shared Folders causes a server
to run periodic snapshots of a volume on a file server. When the shadow copy is executed, point-
in-time snapshots of each file on the shadow copy-enabled volume are created. If a user
accidentally deletes a file or wants to work with an earlier version of a file, the user can simply
restore the earlier version of the file—thus, this feature helps avert extra Help desk calls.
Although the Shadow Copies for Shared Folders feature is powerful, it will require you to train
the end user on how to recover files. This task can be difficult—many administrators still do not
know how to correctly recover shadow copied files, so expecting end users to be able to recover
their own files without any training is a bit of a stretch.


Shadow Copies for Shared Folders Basics


Many organizations run backups of their file servers only at night, which means that users
needing an earlier version of a file must revert to the previous day's file. With Shadow
Copies for Shared Folders configured, you could run two to four snapshots during the day, for
example, which would provide for more options when wanting to return a file to an earlier state.
Shadow Copies for Shared Folders is not a replacement for backup but rather another tool that
can assist the productivity of users.
In every organization, there are bound to be some users who are too embarrassed to call the Help
desk and request a restore when they accidentally delete a file or want to revert to an earlier
version of the file. In these situations, the users often recreate earlier work in an effort to save
themselves what they perceive as embarrassment. By empowering the users to recover their own
files, the users would be less likely to spend time recreating earlier work.

The backup APIs provided by Microsoft to the backup vendors do not provide the functionality to back
up any previous versions of a file secured by a shadow copy snapshot. Instead, only the most
recently saved version of a file will be copied when a Windows file server is backed up.

On WS2K3-based file servers, Shadow Copies for Shared Folders is enabled at the volume level.
Selectively enabling Shadow Copies for Shared Folders at the individual folder level is not
supported. This shortcoming is worth noting because it may play into your decision making
when deploying a new file server. Also, shadow copy recovery is only possible via shared
folders. Thus, you cannot view shadow copy snapshots of files locally using Windows Explorer.
When Shadow Copies for Shared Folders is enabled on a volume, a best practice is to reserve
another free volume on the file server for storage of shadow copy snapshots. This setup allows
for dedicated and controllable disk space for the shadow copy snapshots. Each time a snapshot is
executed, the snapshot will update a base block-level image of the volume. The image updates
are incremental in nature, so only file changes are captured in the snapshot. This method allows
the OS to save numerous snapshots of a volume on another volume of equal size.
For example, a 100GB volume with 60GB of stored data could have its shadow copy snapshots
stored on another 100GB volume. On the 100GB second disk, you may easily be able to fit ten
versions of previous volume snapshots. If the Shadow Copies for Shared Folders default settings
were used, snapshots would run twice daily (7:00AM and 12:00PM). With ten saved snapshots,
users would be able to roll back to a file as old as 5 days. If a user needed to go further back in
time, the user can request that an earlier version of the file (for example, from 2 weeks ago) be
restored.
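If you prefer the command line, the association between a source volume and a dedicated snapshot storage volume described above can also be made with vssadmin (the drive letters and size limit here are examples only):
vssadmin add shadowstorage /For=E: /On=F: /MaxSize=100GB
The /MaxSize value caps how much of the second volume the snapshots may consume and serves the same purpose as the Use Limit setting in the GUI steps described in the next section.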

Enabling Shadow Copies for Shared Folders Support


Enabling Shadow Copies for Shared Folders support is a relatively simple process that can be
completed in Windows Explorer. Prior to enabling Shadow Copies for Shared Folders for a
volume, you should install an additional volume on the server that can be used exclusively for
storage of shadow copy snapshots.

If enabled with the default settings, shadow copy snapshots will be stored on the same volume on
which Shadow Copies for Shared Folders is enabled. This setup is not recommended for any
performance-intensive file serving environments.


To enable Shadow Copies for Shared Folders:


• In Windows Explorer, right-click the volume on which you want to enable Shadow
Copies for Shared Folders support, and select Properties.
• In the drive properties dialog box, select the Shadow Copies tab.
• On the Shadow Copies tab, ensure that the correct volume is highlighted, then click
Settings.
• In the Settings dialog box (see Figure 4.3), select the volume on which to store the
shadow copy snapshots from the Located on this volume drop-down menu.

Figure 4.3: Shadow copy volume settings dialog box.

• Select the No Limit radio button from the Maximum Size portion of the window to use
the entire volume for shadow copy backups, or click the Use Limit radio button and
specify the limit for shadow copy backups in megabytes.
• With the snapshot volume defined, you need to set the shadow copy backup schedule. To
do so, click Schedule.
• By default, shadow copy snapshots will run at 7:00AM and 12:00PM Monday through
Friday. As Figure 4.4 shows, in the Schedule dialog box, you can either edit the default
times and days of the week for the shadow copy snapshot schedule or add new shadow
copy snapshot times to the schedule. Keep in mind that the more snapshots you schedule, the
more storage space is needed, and thus fewer days of history can ultimately be retained.


• Once the snapshot schedule is set, click OK.


• In the Settings dialog box, click OK.
• With Shadow Copies for Shared Folders now enabled for the volume, you can create the
first shadow copy snapshot of the volume by clicking Create Now. Note that this step is
optional.
• Repeat the earlier steps to enable Shadow Copies for Shared Folders support for
additional volumes.
• After you are finished enabling Shadow Copies for Shared Folders support on the desired
volume(s), click OK to close the volume properties dialog box.

Figure 4.4: Shadow Copies for Shared Folders default schedule.

For more flexibility with shadow copy snapshots, you might want to execute snapshots using the
vssadmin.exe command-line utility. As this tool can be integrated into a script, you’ll have more
freedom in controlling when shadow copy snapshots are performed. The general syntax for using
vssadmin.exe to create a shadow copy snapshot is:
vssadmin create shadow /for=<volume name>


The volume name parameter must be in the form of


\\?\volume{GUID}\
The GUID is the globally unique identifier for the volume. You can determine the GUID of each
volume on the server by accessing the command prompt and running vssadmin list
volumes. Listing 4.1 shows an example of using vssadmin to display volume information.
C:\>vssadmin list volumes
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001 Microsoft Corp.

Volume path: C:\
Volume name: \\?\Volume{a0bfd740-6772-11d9-8604-806e6f6e6963}\
Volume path: E:\
Volume name: \\?\Volume{9063d3b7-0f32-11da-bf02-505054503030}\
Volume path: F:\
Volume name: \\?\Volume{9063d3b8-0f32-11da-bf02-505054503030}\

C:\>

Listing 4.1: Querying volume GUID information using vssadmin list volumes.

Once you have the volume information, you can then use vssadmin create shadow to
create a snapshot. For example, to create a snapshot of the E drive and associated GUID shown
in Listing 4.1, you would run
vssadmin create shadow /For=\\?\Volume{9063d3b7-0f32-11da-bf02-505054503030}\
By default, Shadow Copies for Shared Folders snapshots run as a scheduled task on the server on which the feature
was enabled; this setting can cause problems for clustered file servers. In a cluster, if a volume
configured to support Shadow Copies for Shared Folders fails over to another server, by default,
the task to run the shadow copy snapshot will not fail over as well. If you are running Microsoft
Cluster Service (MSCS) clusters, you can accommodate periodic snapshots by adding a Volume
Shadow Copy Service Task resource to each file server cluster group. This resource will provide
failover support for scheduled shadow copy snapshots. For other third-party cluster applications,
such as the PolyServe Matrix Server, you could run the
vssadmin create shadow
command in a script to provide failover support for scheduled snapshots.
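As a rough sketch of such a script (the volume GUID is the E volume from Listing 4.1, and the log file path is purely illustrative), a simple batch file invoked by the cluster framework, or by a scheduled task on whichever node currently owns the volume, might look like the following:
rem shadow_e.cmd -- create a shadow copy snapshot of the E: volume
vssadmin create shadow /For=\\?\Volume{9063d3b7-0f32-11da-bf02-505054503030}\
if errorlevel 1 echo %date% %time% shadow copy of E: failed >> C:\Logs\shadowcopy.log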


Recovering Previous Versions of a File


Both Win2K and Windows XP client OSs can recover shadow copies. However, neither OS
provides Shadow Copies for Shared Folders support out-of-the-box. As Shadow Copies for
Shared Folders are a feature initially introduced in WS2K3, only WS2K3 natively supports
Shadow Copies for Shared Folders file recovery. To support Shadow Copies for Shared Folders
recovery on the current Windows client OSs, you can install one of the following:
• Previous Versions Client software (only supported on Windows XP)
• Shadow Copy Client Software (supported on Windows XP and Win2K with SP3 or later)
The Previous Versions Client software (Twcli32.msi) is copied to WS2K3 systems during setup
and is located in the %windir%\system32\clients\twclient folder on the server. Most
organizations prefer to deploy the Shadow Copy Client Software (ShadowCopyClient.msi)
because it supports both Windows XP and Win2K. This software is available at
http://www.microsoft.com/windowsserver2003/downloads/shadowcopyclient.mspx. Both
shadow copy client programs are packaged as .msi files, so they can be deployed throughout
your enterprise via Group Policy.
Once the Shadow Copy Client is installed on the user workstations, users will be able to recover
previous versions of their files on their own. As Shadow Copies for Shared Folders support relies
on the Common Internet File System (CIFS) protocol, users must access files using CIFS in
order to recover them. CIFS access to shared folders is accomplished by using a Universal
Naming Convention (UNC) path for access. For example, to access the shared folder named
Public on the server named Eagle, the UNC path would be \\Eagle\Public.
Most organizations provide users with mapped network drives for accessing and saving files
over the network, so users with existing mapped drives that connect to network shares via UNC
can recover previous versions of a file by simply navigating to their mapped drive.
Suppose that you accidentally delete a file located on your mapped Z drive. To recover the file,
you would need to follow these steps:
• Click Start, My Computer.
• In the My Computer window, double-click the Z drive.
• In the File and Folder pane located on the left side of the window, click the View
Previous Versions link. If you don’t see the View Previous Versions link, right-click any
open space in the right pane of the window, and select Properties. Then select the
Previous Versions tab in the folder properties dialog box.
• As Figure 4.5 shows, you will then see a list of the previous versions of the folder that
were included in a VSS snapshot.
• If you remember the last time that you had the file, you can double-click the snapshot that
occurred right before you deleted the file. If you do not know, you will need to start with
the most recent folder and work backwards until you find the file you need.
• The point-in-time snapshot of the folder will now open in a new window.
• Once the deleted file is located, you would then need to double-click it to open it.
• The file will open as read only. To permanently save the file, you will need to select the
Save As option from the document’s File menu, then select the location in which to save
the file.


Figure 4.5: Viewing previous file versions.

At this point, the file has been successfully recovered.

You could have also copied the file and pasted it to its original location instead of opening the file and
then saving it to the original location.

Although user training is an inevitable aspect of Shadow Copies for Shared Folders support,
using this feature can eliminate a significant number of Help desk calls for requests to restore
accidentally deleted files. Also, as snapshots are incremental in nature and complete in a
relatively short period of time, running shadow copy snapshots during business hours will
provide for more alternatives (than the previous night’s backup) for users needing to revert to an
earlier file version or recover a deleted file.


Enhanced Storage and File Serving Support


Several storage features have also been added to WS2K3 that help to better support file serving.
According to Microsoft, several architectural improvements to WS2K3 result in CIFS
performance improvements of 250 percent over systems running Windows NT 4.0. NFS
improvements made to Microsoft Services for UNIX with WS2K3 resulted in performance
improvements 1500 times faster than with NT. There are also several additions to WS2K3 that
result in improved quality of life for both users and administrators:
• Improved multipath I/O support
• STORport driver model support
• iSCSI support
• Improved offline files support

Multipath I/O Support


The importance of multipath I/O to the reliability of a SAN was stressed in Chapter 3. With
WS2K3, Microsoft also realized this importance and worked with the major SAN HBA vendors
to help them certify multipath drivers for WS2K3. Again, with multipath drivers, WS2K3-based
file servers can access storage resources in a SAN with a resilient data path between the servers
and storage. Thus, if one fibre channel link goes down, having the multipath driver installed will
ensure that access to the resources will seamlessly fail over to the next available data path.

STORport Driver Support


With first generation SANs, miniport drivers were used by HBAs to interface with SAN
resources. The primary drawback to miniport drivers is that they were designed for SCSI and
ATA storage devices. Thus, an OS connecting to storage devices in a fibre channel SAN using
an HBA with miniport drivers could not take advantage of any of the new features that separated
fibre channel from SCSI. Instead, fibre channel devices for the most part were treated as SCSI
devices. With STORport driver support, the OS can take full advantage of fibre channel storage
devices and is not constrained by the limits of SCSI.

iSCSI Support
Although iSCSI storage networks still significantly lag behind fibre channel in terms of industry
acceptance and market share, this technology has still been growing steadily in popularity. iSCSI
was first supported with the WS2K3 release; however, Microsoft requires that WS2K3 SP1 be
installed on WS2K3-based failover clusters for full 8-node iSCSI cluster support.

For more information about iSCSI support for Windows, see the Microsoft Storage Technologies
iSCSI page at http://www.microsoft.com/windowsserver2003/technologies/storage/iscsi/default.mspx.


Improved Offline Files Support


The use of offline files can be beneficial to both users and file servers. With offline files, you can
configure a user’s system to store a local copy of select files that reside on a file server. For
mobile users, the benefit is obvious—the user will have access to the same files the user sees
when he or she is in the office. When the user returns, his or her locally stored offline files will
synchronize with the copies of the files that reside on the file server.
However, offline files are not just beneficial for mobile users. For example, suppose a local user
edits numerous documents and presentations on a file server each day. While the user is editing
the documents, they remain open on the file server, and each save is written directly to the file
server. Eventually, this standard approach to file serving could limit the file
server’s scalability. With offline files, you could configure offline file caching for permanent
local users in addition to mobile users. This way, when a local user is editing a document, the
user is doing so locally on his or her workstation. This practice would offload a substantial
amount of work from the file server.
By default, Windows users can set their offline files settings. This ability allows offline files (on
the client side) to run transparently to the file server. To optimize performance of offline files, you
can edit the Offline Settings of any shared folder on the file server. To do so:
• Open Windows Explorer, and locate the shared folder that you want to modify.
• Right-click the shared folder, and select Properties.
• In the Properties dialog box, select the Sharing tab.
• On this tab, click Offline Settings.
• For optimal file server performance, select the All files and programs that users open
from the share will be automatically available offline radio button. Make sure the
Optimized for performance check box is selected (see Figure 4.6), and click OK.
• Click OK to close the folder properties dialog box.

Figure 4.6: Optimizing offline files for best server performance.


With the Optimized for performance check box selected, both files and programs will be locally
cached on each workstation. With program files, clients will only need to download updates or
changes to the server-based copy of the file. Otherwise, the client system will run the program
using its locally cached copy.
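The same caching behavior can also be set from the command line when a share is created or modified. For example (the share name and path are hypothetical), the following command creates a share with automatic caching of files and programs, which corresponds to the options shown in Figure 4.6:
net share Public=D:\Public /CACHE:Programs
For an existing share, running net share Public /CACHE:Programs changes only the caching setting.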
With offline files enabled on a shared folder, server performance can be substantially improved.
Keep in mind, however, that offline files is not a one-size-fits-all solution. Your choice of shared
folders to configure for offline support should be driven by the needs of the network as a whole.
For example, offline files are not well suited for organizations in which users are not consistently
logging on from the same workstation. In these situations, waiting for offline files to synchronize
on each different workstation from which a user accesses the network would be extremely
frustrating.
For shared folders that are accessed by users who always log on from the same computer,
optimizing those folders for performance using offline files can be beneficial. Keep in mind,
however, that users will need adequate disk space on their workstations to support the offline
files caching.

By default, cached offline files are stored on the client’s system drive. This location can be changed
by running the WS2K3 resource kit Cachemov.exe tool. For example, to locate offline files on the
client’s E drive, you would run
cachemov –unattend e:\

At the client level, users have the ability to select which folders they want to make available
offline:
• Right-click the shared folder or mapped network drive, and select Make Available
Offline.
• When the Offline Files Wizard opens, click Next.
• Click Finish.
After you click Finish, the wizard will then synchronize the Windows XP workstation with the
file server. This process creates a local cache of the files stored on the file server. You can
change the way Windows XP works with offline files by following these steps:
• Open Windows Explorer, select the Tools menu, and select Folder Options.
• In the Folder Options dialog box, select the Offline Files tab.
• Aside from enabling or disabling offline files support, you can also set the amount of disk
space that is reserved for offline file caching. By default, 10 percent of the drive is
reserved. Figure 4.7 shows the available offline files options.
• Once you have set the appropriate options, click OK.


Figure 4.7: Windows XP client offline files options.

User-based offline files settings can be configured using a Group Policy Object (GPO). Offline files
settings can be found in both the User Configuration, Administrative Templates, Network, Offline
Files and the Computer Configuration, Administrative Templates, Network, Offline Files portions of a GPO.
By default, only the folder that a user selects to be available offline is cached (and not its subfolders).
This behavior can be changed so that subfolders are included by enabling the Computer Configuration,
Administrative Templates, Network, Offline Files, Subfolders Always Available Offline GPO setting.


Another issue that often gets in the way of successful offline files deployments is that whenever
a client workstation connects to a network, offline files will try to synchronize. As many mobile
users are often connecting to WiFi hotspots, they will probably find this feature to be annoying.
This behavior can be changed by performing the following steps on the client:
• Click Start, All Programs, Accessories, Synchronize.
• In the Items to Synchronize dialog box, click Setup.
• You can limit offline file synchronization by selecting the appropriate network
connection in the When I am using this network connection drop-down menu in the
Synchronization Settings dialog box (see Figure 4.8).
• You can also force Windows to prompt users before synchronizing by selecting the Ask
me before synchronizing the items check box.
• Once finished setting the synchronization settings, click OK.
• Click Close to close the Items to Synchronize dialog box.

Figure 4.8: Configuring offline file synchronization settings.

As you can see, using offline files can provide more flexibility in dealing with both server
performance scalability as well as mobile clients. Next, we’ll look at how Microsoft is
structuring its existing technologies to support enterprise file serving.


The Microsoft Approach to High-Availability File Serving


Microsoft’s approach to high-availability file serving is twofold. Microsoft addresses high-
availability and performance concerns by offering the following technologies with the
company’s OSs:
• Server clusters (also known as failover clusters)
• Distributed File System (DFS)
• Active Directory (AD)
Let’s look at how Microsoft applies these technologies to both high-availability and high-
performance file serving.

MSCS
Microsoft offers server clustering support with WS2K3 Enterprise and Datacenter OSs. Server
clustering has been available with Microsoft's enterprise-class server OSs since NT. Microsoft
Cluster Service (MSCS) supports clusters as large as 8 nodes and utilizes a shared nothing
cluster architecture.
In using a shared nothing as opposed to a shared disk architecture, only one node in the cluster
can access a physical disk at a time. This limitation results in the cluster supporting failover but
not load balancing. Microsoft’s Network Load Balancing cannot run in conjunction with
Microsoft’s server cluster service and is primarily geared to users and servers accessing read-
only data. The reason is that with the Microsoft Load Balancing cluster model, each node in the
cluster maintains its own separately managed storage resources. This setup makes the load
balanced cluster impractical for typical file-serving applications.
With failover support, Microsoft server clusters provide fault tolerance. If one node in the cluster
fails, another node can host the virtual server that originally ran on the failed node. Although
failover support is essential for mission-critical applications, the limitations of the shared nothing
architecture hamper the scalability of Windows clusters as file servers. As only one physical
node in the cluster can access a shared disk at a time, the single cluster node can, in time, become
an I/O bottleneck as user demand increases. To solve this problem, Microsoft offers two
solutions: split the file-serving duties across two clusters or deploy DFS.
Many organizations move to a cluster-based file serving architecture in an effort to better support
consolidation, availability, and performance. If a cluster is divided in two to support performance
demands, you’re starting a backward trend in which you’re adding managed systems to the
network. The second solution offered by Microsoft is to run DFS on top of the MSCS.


DFS
DFS has increasingly grown in popularity as a complementary file-serving technology because it
offers the following advantages:
• Maintain consistent replicated file data between two hosts
• Provide transparent access to network resources for users and applications
• Provide load balancing for file access across replica links
Figure 4.9 illustrates an architectural example of using DFS to add performance load balancing
to Microsoft server clusters.

Figure 4.9: Running DFS on top of two MSCS clusters.

In theory, you could set up DFS to replicate data between two server clusters. As DFS load
balances requests across replica links in a round-robin fashion, you would have a mechanism to
offer load balanced fault tolerant data access. However, many in the field have found File
Replication Service (FRS), the replication engine behind DFS replication, to be unreliable at
best. If you are serious about using DFS as a means to balance access across multiple file server
clusters, strongly consider investing in a third-party product to guarantee reliable replication of
the data. For example, NuView’s StorageX provides this capability and would allow you to
manage a reliable DFS architecture across your enterprise.


AD Integration
AD integration of Windows applications and services started with Win2K and has become even
stronger in WS2K3. The AD database is extensible, so any vendor could add AD schema
extensions to their products to allow their products to integrate with AD.
Microsoft has been pushing AD integration of its products for several years, and many
organizations have started to buy into this philosophy. On the file-serving side, for example, you
could publish all your shares in AD using the Active Directory Users and Computers Microsoft
Management Console (MMC) snap-in. To provide a granular view of shares, you could publish
shares into specific organizational units (OUs) based on the users or departments that will need
to access them. By performing a simple directory browse through My Network Places, users or
administrators can browse to an OU in the directory to view all the published shares.
A good practice is to publish shares into AD at the OU level and create desktop shortcuts for users
to their respective OUs. This way, when they look inside their OU folder, they see only the
shares that you have published for them. This setup prevents users from randomly browsing the
network for resources and can add a level of access transparency similar to DFS. If a shared
folder’s location moves, you will just have to update the published object’s UNC path in AD.
The movement of the share will be transparent to the user.
Although Microsoft may have a home court advantage with its file servers, other commercial
product vendors are offering competing high-performance and high-availability file-serving
solutions.

Commercial File Serving Solutions


Today, countless products exist for providing file-serving solutions in the Windows space.
Among the sea of available products, two major vendors stand out: PolyServe and Symantec
(formerly VERITAS). This section will look at each of these product offerings as alternatives to
existing Microsoft solutions.

PolyServe NAS Cluster


PolyServe NAS Clusters, like MSCS clusters, offer failover support for file-serving applications.
However, PolyServe's approach to clustering is significantly different in that it offers true
shared data clustering. Unlike MSCS, PolyServe’s Matrix Server platform does not employ a
shared nothing architecture, meaning that several servers can access files on the cluster’s shared
storage simultaneously. This setup provides for true load-balancing support in addition to the
failover support found with MSCS.
Unlike traditional NAS products, PolyServe’s NAS cluster is designed to run on a standard OS,
such as WS2K3; it also runs on industry-standard hardware. With this approach, no significant
hardware investments are needed to deploy this technology. Remember that even with 8-node
scalability, MSCS clusters provide only a single access head for a single virtual file server. Thus,
the virtual file server cannot take full advantage of the processing power of all 8 nodes with
MSCS and is instead relegated to using the horsepower of a single node.


With the PolyServe architecture, multiple nodes can host virtual server resources simultaneously,
so you can have true load-balancing support as well as a much easier scalability model. Earlier, it
was mentioned that MSCS could scale by splitting a cluster into two clusters and possibly
configuring DFS to run on top of them. This solution will require that you effectively double
your hardware and OS investment in order to meet the scalability need. With the PolyServe
architecture, a performance problem can be managed by simply adding one more server to the
cluster or by reallocating some processing to an underutilized node that is already in the cluster.

Symantec Cluster
Before the growth in popularity of the PolyServe Matrix Server clustering architecture,
VERITAS (now Symantec) stood as the leading cluster service provider in the market. Similar to
Microsoft's approach to clustering, Symantec's product enables only a single node in the cluster
to access a file. Thus, this architecture cannot provide true load balancing. However, VERITAS
Cluster Server for Windows does offer an intelligent agent that can dynamically move a virtual
server in the cluster to an underutilized node. VERITAS clusters can also scale to 32 nodes,
which allows for significantly more growth within a single cluster compared with MSCS.

Current Trends in Windows File Serving


Today, Windows-based file serving has been dominated by two major trends: server
consolidation and storage consolidation. This section will explore the impact of these two trends
on the Windows file-serving landscape.

Benefits of Consolidation
Server sprawl, which was a common trait of the late 1990s and early 2000s, led to the server
consolidation movement. The major problem with server sprawl was that it led to increased TCO
of nearly all IT entities. More servers equated to more management, more software licensing,
and additional hardware and power costs.
As the availability of high-performance system hardware improved, consolidation became an
easy sell. For example, if an organization can consolidate from 250 servers to 80 servers without
sacrificing performance, the TCO savings would easily reach several hundred thousand dollars a
year. For network administrators, the argument for consolidation is more than just about money.
Having fewer servers to maintain can mean fewer trips into the office at 11:00 PM on a Saturday
night. With fewer systems to manage, administrators have fewer systems to repair and keep up.
The organization as a whole has significantly fewer software licenses and service contracts to
maintain.
In essence, consolidation really means utilizing your server resources to their full potential.
Instead of running 12 servers with an average CPU utilization of 6 percent, why not consolidate
to a single server and take full advantage of the one server’s CPU resources?

Keep in mind that consolidating to virtual machines running on a commercial VM host product such
as VMWare ESX Server or Microsoft Virtual Server could lead to fewer physical systems but likely will
not reduce your number of managed systems or related software licensing. File server consolidation
is best achieved via clustering, which can reduce the number of both physical systems and managed
systems.


Benefits of Shared Storage


Consolidating storage resources is another major file serving trend. With the growth of managed
data, maintaining data availability has become an increasingly challenging problem to
manage. In consolidating storage resources to a SAN, storage can be allocated as it’s needed,
which will give you a greater return on your storage investment by not wasting resources through
over-allocation. In addition, consolidating to a SAN opens more backup and data management
possibilities.

Chapter 6 will discuss storage scenarios in more detail.

Deploying Enterprise-Class Windows File-Serving Solutions


As we’ve explored, there are many solutions at your fingertips. To help determine the right
solution for your environment, let’s look at some general guidelines for deploying Windows file-
serving solutions.

Pre-Deployment Considerations
The tendency of IT administrators is often to deploy first and customize later. For those that
practice this approach, planned customizations take months or even years to complete. With the
file server deployed and operational, justifying spending additional time on the project may be
difficult—especially with the myriad tasks already on the IT staff’s list.
To deploy a file server right the first time, planning has to be an important part of the process.
One major part of the planning process is deciding which technologies should be used to
complement the file server. Table 4.1 lists the most common file serving problems as well as the
available technologies that can alleviate or manage the potential problems.
File Serving Problem                           Solution
Improve availability of file versions          Deploy and configure Shadow Copies for Shared Folders
Limit user usage of file server resources      Deploy and configure disk quotas
Provide failover support                       Deploy and configure a server cluster
Provide failover and load-balancing support    Deploy a PolyServe Matrix shared data cluster
Provide offline access to data                 Deploy and configure Offline Files
Provide antivirus protection                   Deploy antivirus solution that is compatible with any installed
                                               file serving applications as well as your backup product
Prevent unauthorized access                    Determine the necessary permissions for each user or group
                                               that has access to the server

Table 4.1: Solutions for the most common file serving deployment problems.
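
Disk quotas appear in Table 4.1 but are not otherwise covered in this chapter. As a quick illustration (the volume, byte values, and account name are examples only), per-user NTFS quotas on a WS2K3 volume can be set from the command line with fsutil:
fsutil quota modify D: 9000000000 10000000000 CONTOSO\jsmith
The first value is the warning threshold and the second is the hard limit, both expressed in bytes.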

Unsupported antivirus products can significantly degrade file server performance by triggering a scan
of each file during a file server backup. Your backup product vendor should be able to tell you which
antivirus products have been tested and thus are supported.


Validating Server and Storage Requirements


At a minimum, WS2K3 requires a 550MHz CPU and 256MB of RAM. As you already know,
this setup won’t go very far on an enterprise-class file server. For CPU scalability, the installed
Windows OS will determine the number of CPUs that are supported. The maximum CPUs
supported by each WS2K3 OS are:
• WS2K3 Standard—4 CPUs
• WS2K3 Enterprise Edition—8 CPUs
• WS2K3 Datacenter Edition—32 CPUs
In file-serving applications, Microsoft research has shown that going from one to two processors
will improve performance anywhere from 1.4 to 1.6 times, depending on the original client load.
Going from 1 to 8 processors will result in an improvement between 2.4 and 3.2 times. When
sizing RAM, Microsoft has estimated that 1GB of RAM can effectively handle as many as
100,000 simultaneous open file handles. In general, the OS will use as much as 500MB of RAM
for OS tasks. Thus, a system with 2GB of RAM will have 1.5GB of addressable RAM for open
files. The more open file content that can be stored in RAM, the less the file server has to rely on
disk paging to serve up the file to users. This results in improved performance. As you can see,
the amount of RAM in the file server can play a huge role in the amount of simultaneous open
files that can be supported.
Chapter 3 described the many available methods for providing fault-tolerant storage access—this
section will focus on sizing. WS2K3 by itself will consume about 1.5GB of disk storage.
Microsoft recommends that for file-serving deployments, you allocate 1.5 times the amount of
physical RAM to the paging file. So a file server with 2GB of RAM should have a 3GB paging
file. Performance of the page file can also be improved by locating it on a separate disk. This
setup clears an I/O channel for just paging operations.
Aside from the OS storage requirements, you will also need to budget for program files and log
files. Each application vendor should be able to provide appropriate sizing guidelines for their
log files.
Finally, you will also need to budget for the file data itself. This task is often predictable because
you should have on hand information about the current file server capacity as well as some
historical data showing capacity over the past 12 to 18 months. This data should allow you to
predict requirements 18 to 24 months out. For new deployments, it’s always best to plan on
future capacity. If you are in a situation in which you are unsure of how much data to budget for,
growth can be estimated by looking at reports from previous backups. For example, examining
the size of a server’s monthly full backup over the past 12 months should provide a reasonable
baseline for expected storage growth.

Another major aspect of file server deployment involves backup planning. This topic is covered in
Chapter 6.


Summary
This chapter presented several technologies that aid in deploying reliable Windows file-serving
solutions. Tools such as Shadow Copies for Shared Folders can give you more flexibility with
data availability by allowing users to recover their own files and providing the ability to perform
snapshots of open files during business hours. With the abundance of new tools comes greater
complexity—that complexity can be reduced by consolidating file-serving applications to shared
data clusters. Although other solutions exist, only shared data clusters can provide the benefit of
server consolidation, load balancing, and failover support.
This chapter was fully devoted to Windows file serving, which really only represents a part of
the file-serving landscape. The next chapter will look at the issues and technologies surrounding
Linux file serving in the enterprise.


Chapter 5: Building High-Performance, Scalable, and


Resilient Linux File-Serving Solutions
The last chapter took a close look at the world of Windows file serving. This chapter will take a
similar approach with Linux file serving. Although many of the challenges facing file serving
today remain consistent between both Windows and Linux operating systems (OSs), the
approaches to solving the challenges presented by enterprise file serving for these OSs certainly
differ.
In this chapter, you will see the world of file serving from a Linux perspective. Along the way,
you’ll get a close look at the several challenges facing file serving on Linux platforms. We will
also examine the current alternatives (both commercial and open source) for solving the
performance, scalability, and availability file-serving issues.
Another major aspect of Linux file serving is the ability to integrate Linux file servers onto
Windows-based networks. With a Windows Active Directory (AD)-dominated client base,
building Linux file-serving solutions that can seamlessly integrate into an AD infrastructure is
deemed critical by many organizations. To that end, this chapter will also introduce several of
the technologies that promote file serving across the heterogeneous enterprise.

Chapter 6 will provide detailed procedures and examples of Linux and Windows integration concepts,
such as simplifying authentication using winbind single sign-on and mapping user home folders
between both Windows and Linux desktops.

Before we turn to examining the technologies that are being used to solve today’s Linux file-
serving problems, let’s first take a look at the current file-serving landscape.

Challenges Facing the Linux File-Serving Landscape


Today’s Linux-based file servers face similar challenges to their Windows-based counterparts.
Among these challenges are:
• Performance
• Scalability
• Availability
• Integration
Let’s start with a look at performance.


Performance
As an organization grows, so do its demands on file serving. To accommodate growth, several
elements of the file server may need to be evaluated:
• CPU utilization
• Memory usage
• Disk performance
• Network bottlenecks
Any of these issues can seriously degrade system performance. Problems such as CPU or
memory usage may be overcome with a simple upgrade. The same may hold true for disk
performance. Upgrading to a U320 SCSI or SATA 2.0 hard disks could be a relatively
inexpensive solution, depending on the server capacity.
Network bottlenecks are often the result of having a single network access point for a file server.
This setup is often the case with traditional standalone file servers as well as Network Attached
Storage (NAS) appliances. In these instances, often one of the easiest ways to streamline
performance management is to consolidate to a shared data cluster. Shared data clustering not
only gives you the ability to balance client traffic across several servers but also can provide an
alternative to decommissioning servers that have reached their maximum CPU or memory limit
and thus cannot be upgraded further. Later in this chapter, additional time is spent analyzing the
benefits of shared data clustering as the baseline for Linux file serving in comparison with the
traditional approaches.

For more information about performance tuning and data path optimization, turn back to Chapter 3.

Scalability
In time, scalability issues often result in many of the performance problems that were noted in
the last section. As your organization’s data requirements increase, how does your file server
respond? In some organizations, scalability problems are not easy to predict. In some instances,
server and storage resources are over-allocated due to anticipating too much growth for one
division within an organization. On the flip side, if other resources grow beyond your existing
forecasts, some servers may quickly reach capacity. Running at capacity could result in several
problems, such as hitting a performance bottleneck or running out of available disk space.
To be fully prepared for the pains of scalability, it is important for your file-serving infrastructure
to be just as dynamic as the flow of the business processes within your organization.


Availability
With data access being critical to countless business processes, availability of data is also a
significant consideration among today’s Linux file servers. If a server crashes due to a hardware
or software failure, or even from human error, how does the network respond? If the answer is
that the administrators are running around scrambling for parts or are troubleshooting software, it
means that a particular IT shop is not taking advantage of the many high-availability
technologies that are currently available. If a file server is crucial to your organization’s day-to-
day operations, its data should be resilient to any server-based hardware or software failure.

Integration
Integration is another significant concern among those managing Linux file servers. If your
organization is running a Windows domain, ensuring that your Linux file servers and domain
controllers can seamlessly work together is also very important to the success of your file-
serving infrastructure. Managing permissions and authentication between to two OS platforms in
many cases presents challenges for administrators. However, with knowledge of the right tools
and integration techniques, the two OSs can play together.

Linux-Windows integration is not a topic to be taken lightly; therefore, most of Chapter 6 deals with
how to effectively mesh the two environments together.

Another major integration concern with Linux file-server management is that of configuring
multiple file servers to coexist in a SAN. Although all major Linux distributions offer fibre
channel support, most have limited support in terms of distributed file locking across shared data
in a SAN. Another weakness that exists in some of today’s Linux file-serving solutions is a lack
of reliable multipath support in the SAN. To take advantage of a redundant SAN, predictable
multipath support on fibre channel HBAs attached to Linux servers is crucial. When faced with
these problems, many Linux shops have turned toward tested and certified solutions offered by
third-party hardware and software product vendors.
The previous sections have hit on the major problems that exist in the Linux file-serving
landscape; let’s look at the methods many organizations are currently using to provide file
services to their networks.


Existing Linux File-Serving Solutions


Today, there are predominantly four architectures for offering Linux-based file serving:
• Standalone
• NAS
• DFS
• Clustered
This section will take a look at each of these four approaches.

Standalone
The standalone approach to file serving has stood the test of time and is still well suited for many
small businesses. With this approach, a single server provides shared data access to users and
applications. This approach is suitable for small organizations that do not live and die by the
availability of their file services. If availability is critical, one of the next three architectures
would be a better bet.

NAS
NAS has been a very popular architecture for Linux file serving in recent years. As many NAS
appliances are easy to deploy, include built-in redundant components, and can offer several
terabytes of storage, they have been viewed as an easy choice for many organizations.
As the last chapter mentioned, one of the problems faced by NAS appliances, however, is
growth. If an organization outgrows one NAS, they will need to buy another one. NAS
appliances run on proprietary hardware, so a NAS cannot be redeployed for other uses if it no
longer serves a file-serving need. Another drawback to NAS is sprawl. If an organization deals
with file data growth by continuing to add multiple NAS appliances to the LAN, management
costs for a network that could grow to host several NAS devices would inevitably go up. One
other problem with NAS appliances relates to performance. It is difficult for NAS appliances to
be as resilient to high network traffic as other architectures such as shared data clusters.
One final drawback to NAS-based file serving as seen by many organizations is the high cost of
a NAS appliance. As nearly all NAS vendors sell products that run on proprietary hardware, cost
is another factor that sways organizations toward other Linux-based file serving technologies.


DFS
Like the Windows DFS options discussed in Chapter 4, Linux file servers can also participate in
a DFS hierarchy. Linux file servers running DFS via Samba 3.0 can accept connections from any
DFS-aware Windows clients, such as Windows 98, Windows 2000 (Win2K), or Windows XP.
With DFS support on Samba, there are two ways to integrate Linux file serving into a DFS
hierarchy:
• Create links on a Microsoft DFS root that map to Samba Common Internet File System
(CIFS) file shares on a Linux file server
• Configure the Linux file server as the DFS root
Most AD shops that run DFS configure Windows DFS controllers as DFS roots and simply
create DFS links to any CIFS file shares on Linux Samba servers. This approach allows
organizations to take advantage of some of the Windows DFS features that have not yet made it
into Samba, such as DFS root replicas and AD site-awareness.
If your preference is to run your entire file-serving infrastructure on Linux, you may opt to
configure the DFS root on a Linux box, then point each DFS link to other Linux file servers.
DFS is unique in file-serving architectures in that it does not have to represent an absolute
choice. Instead, DFS can complement other file-serving approaches such as standalone, NAS, or
clustered. The ability to deliver transparent access to file shares could free up administrators to
migrate file shares to other servers without having to impact users. Instead, all that would need to
be updated would be the DFS link that exists at the DFS root so that it references the new shared
folder location.
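As a sketch of the second option (the paths, share names, and target server name below are hypothetical), a Samba 3.0 server can act as a DFS root by enabling MS-DFS support in smb.conf and then creating DFS links as specially formatted symbolic links inside the root share:
[global]
   host msdfs = yes

[dfsroot]
   path = /export/dfsroot
   msdfs root = yes

ln -s 'msdfs:fileserver2\data' /export/dfsroot/data
A client that opens \\sambaserver\dfsroot\data is then transparently referred to the data share on fileserver2.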

Basic Samba setup is covered later in this chapter.

Clustered
Another major approach to Linux file serving is to implement a clustered file server. For Linux
file serving, two open source cluster solutions currently exist:
• Linux-HA failover clustering
• LVS load-balanced clustering
These solutions are described in the next two sections.


Failover Clustering
Open source failover clustering on Linux is provided by the High-Availability Linux Project
(http://www.linux-ha.org). Linux-HA clusters can be configured on nearly any Linux
distribution. The key to the operation of Linux-HA clusters is Heartbeat. Heartbeat is the
monitoring service that will allow one node to monitor the state of another and assume control of
the cluster’s virtual IP address if the primary node fails. Heartbeat also provides the ability to
automate the startup of services on the standby node.
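As a minimal sketch (the node names, interface, virtual IP address, and managed service are examples only), a two-node Heartbeat version 1 configuration is driven by three files in /etc/ha.d: ha.cf, haresources, and authkeys. The ha.cf file might contain:
keepalive 2
deadtime 30
bcast eth1
node rs1 rs2
auto_failback on
and the haresources file, identical on both nodes, a single line:
rs1 192.168.1.100 smb
This entry declares rs1 as the preferred owner of the 192.168.1.100 virtual IP address and the smb service; if rs1 fails, Heartbeat on rs2 claims the address and starts the service. The authkeys file simply holds a shared authentication secret for the heartbeat traffic.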
Many Linux vendors have jumped on the Heartbeat bandwagon. One such vendor is SUSE,
which includes the Heartbeat setup packages on its SUSE Linux Enterprise Server setup CD. For
distributions such as Red Hat Enterprise Linux, you can download Heartbeat and all necessary
dependant packages from http://www.ultramonkey.org. Ultra Monkey provides the software and
documentation of Linux-HA on Red Hat distributions. Figure 5.1 shows a simple Linux-HA
failover cluster.

Figure 5.1: A 2-node Linux-HA failover cluster.

Note that in Figure 5.1, each node is maintaining its own copy of local storage. For file serving,
this setup can prove to be very challenging. In order for each cluster node to present a consistent
view of file system data, the local storage on each node will need to be continually synchronized.
To maintain consistency across the local storage in the cluster, many organizations turn to
rsync. With rsync, you can configure incremental block-level replication to run between each
node in the failover cluster. Doing so will ensure that the second node in the cluster (RS2) will
have up-to-date data in the event of a failover of the first node (RS1).
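For example (the export path and host name are illustrative), a cron job on RS1 could push incremental changes of the exported data to RS2 over SSH every few minutes:
rsync -az --delete -e ssh /srv/export/ rs2:/srv/export/
The --delete option keeps the standby copy from accumulating files that have been removed from the primary, at the cost of making the replica only as current as the last run.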
Of course, this functionality comes with a significant drawback. For the sake of supporting
failover, you would need to double your storage investment. For clusters consisting of more than
two nodes, this investment would be proportionally higher. As you can imagine, this presents
significant problems when facing storage growth.

For more information about configuring incremental file replication using rsync, visit
http://rsync.samba.org.


Load-Balanced Clustering
Most Linux load-balanced clusters are based on the Linux Virtual Server (LVS) Project.
Compared with the Microsoft network load-balanced cluster architecture, you will see that LVS
uses a fundamentally different approach. With LVS, one or two servers outside of the cluster are
used to distribute client traffic among cluster members. Thus, to build a 3-node LVS cluster,
you’ll need at least four servers. Figure 5.2 illustrates this configuration.

Figure 5.2: A 3-node load-balanced cluster.

In Figure 5.2, the server labeled as Load Balancer accepts incoming client requests and directs
them to an internal Real Server (RS). Each RS is a cluster node. With the load balancer directing
client traffic, the RS nodes in the cluster can be located anywhere that has TCP/IP connectivity
to the load balancer. Thus, each RS does not have to be on the same LAN segment. As the load
balancer is the director for all client requests, having one server as the load balancer does have
one fundamental flaw—a lack of fault tolerance. If the load balancer fails, the entire cluster is brought
down. To avoid this problem, most LVS cluster implementations use two systems as load
balancers. One system serves as the active load balancer, and the second system is passive, only
coming online in the event that the active system fails. Figure 5.3 shows a fault-tolerant LVS
cluster.


Figure 5.3: A 3-node fault-tolerant load-balanced cluster.

As with the failover cluster, the LVS load-balanced cluster by default allows for each real server
to maintain independent local storage. This setup again means that to maintain consistency
across the cluster, a replication tool such as rsync will need to be employed.
Now that you have seen the basic operation of an LVS cluster, you may be wondering whether
the load balancer acts as a bottleneck for client access. The answer lies entirely in which LVS
architecture is applied to the cluster.

LVS Architecture
LVS is generally configured in one of three ways:
• LVS via Network Address Translation (NAT)
• LVS via IP tunneling
• LVS via direct routing
In the next three sections, we’ll look at each of these unique configurations.


LVS via NAT


With the LVS via NAT architecture, the load balancer server is dual-homed and NATs all traffic
to the real servers on an internal LAN. Figures 5.2 and 5.3 show this configuration. With NAT,
each load balancer server directs client traffic into the internal LAN and to a real server. When
the real server replies, the reply goes back through the load balancer system before returning to
the requesting client.
This approach can present both a performance bottleneck as well as scalability limits. Most LVS
cluster implementations cannot scale beyond 10 to 20 nodes and still see any gains in
performance.

LVS via IP Tunneling


Several advantages exist with the LVS via IP tunneling approach, most notably scalability. Unlike
configuring LVS via NAT, the IP tunneling approach causes the load balancer server to direct
client requests to the real servers via a Virtual Private Network (VPN) tunnel. Replies from the
real servers will use a different network. This approach does not have the scalability limitations
of LVS via NAT.
With use of VPN tunneling, this cluster can easily be distributed among multiple sites and
connected via the Internet. However, this approach is usually best suited for load balancing
between FTP servers and is rarely applied as a high-performance file-serving solution.

LVS via Direct Routing


The LVS via direct routing approach is similar to LVS via NAT, except that reply traffic will not
flow back through the load balancer; instead, replies will be sent directly from the real servers to
the requesting client. As with LVS via NAT, real servers connect to the load balancer via the
LAN. Replies from the real servers would return to the client over a different LAN segment that
is routable to the requesting client.
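
Continuing the hypothetical ipvsadm sketch from the NAT section, only the forwarding switch
changes for direct routing; the -g (gatewaying) option tells the load balancer to rewrite only the
frame's destination MAC address, so each real server must also carry the virtual IP on a
non-ARPing interface in order to answer clients directly:
ipvsadm -A -t 10.0.0.1:80 -s rr
ipvsadm -a -t 10.0.0.1:80 -r 192.168.10.11:80 -g     # -g selects direct routing instead of NAT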
Unlike the LVS via IP tunneling approach, this method is more sensible for LAN-based file
serving. However, it is still far from the best solution for enterprise file serving. The currently
available commercial solutions are far superior to their open source counterparts.

Although open source clustering technologies have emerged as methods for increasing the
availability and performance of file servers, many organizations are wary of open source technologies
due to a lack of support. If a failure occurs, help may be days (instead of minutes) away.


Commercial File-Serving Solutions


There are several commercial file-serving solutions in the Linux space, including:
• PolyServe NAS Cluster
• VERITAS (now part of Symantec) Cluster Server
• Red Hat Linux Cluster Suite and GFS
In the next three sections, each of these enterprise file-serving solutions will be looked at in
closer detail.

PolyServe NAS Cluster


PolyServe NAS Cluster provides all the benefits of NAS (consolidation, ease of management,
high availability) as well as all the advantages of both Linux-HA and LVS clustering. PolyServe
NAS Clusters offer failover support for file-serving applications as well as true shared data
clustering. In a PolyServe Matrix Server cluster, each node in the cluster shares a common
storage pool in a SAN. Thus, with all cluster shares being in a common location, there is no need
to replicate file server data between nodes. In comparison with the 3-node Linux-HA cluster
shown earlier in Figure 5.1, migrating to a PolyServe NAS Cluster platform will allow you to
immediately triple the amount of storage available for the cluster. Assuming that a Linux-HA
failover cluster had 500GB of local storage attached to each node, the cluster would have
1500GB of total storage, of which only 500GB is truly writable. The reason is that the local
storage on each node must mirror the storage of the other nodes in the cluster. If the same
storage resources were applied to a PolyServe NAS Cluster, all 1500GB of storage would be
writable. Figure 5.4 provides a comparison between a PolyServe Matrix Server cluster and a
Linux-HA cluster.


Figure 5.4: PolyServe NAS Cluster vs. Linux-HA cluster.


The fact that multiple nodes in a PolyServe NAS Cluster can simultaneously access shared files
provides for high-performance load balancing as well as failover support. Thus, with this
architecture, you can get the benefits of open source clustering products as well as a maximum
return on your storage investment.
Aside from PolyServe’s better approach to clustering, it also has advantages over traditional
NAS vendors such as Network Appliance and EMC. Unlike NetApp and EMC, PolyServe’s
NAS Cluster can be deployed on industry-standard Intel or AMD platforms running Linux.
Unlike with a traditional NAS appliance, the answer to a performance bottleneck is not to purchase
yet another separate NAS; instead, you can simply add another node to the cluster.

For more information about the PolyServe NAS Cluster solution, download the white paper UNIX to
Linux Migration at http://www.polyserve.com.

VERITAS Cluster
Similar to the Windows clustering solution described in Chapter 4, VERITAS offers a
comparable clustering solution for Linux. Although this product offers failover support, it does
not provide the load balancing support that is found in PolyServe NAS Clusters.
VERITAS does make up for its lack of load-balancing support by offering other features such as
an intelligent agent that can dynamically move a virtual server in the cluster to an underutilized
node. VERITAS clusters can scale to 32 nodes, giving you plenty of room for growth potential.
If performance and availability are primary concerns, the VERITAS cluster solution has trouble
delivering in performance-demanding environments. This shortcoming is essentially due to the
fact that VERITAS Linux clusters can only provide failover support and do not allow multiple
nodes in the cluster simultaneous access to the same file.

Red Hat Cluster Suite and Global File System


Red Hat offers its own commercial clustering product, which is the company’s adaptation of the
Linux-HA project. Unlike Linux-HA, which is available for free via download and with SUSE
Linux, Red Hat’s Cluster Suite must be purchased as a separate add-on to the Red Hat Enterprise
Advanced Server OS. The Red Hat Cluster Suite provides support for as many as 8-node failover
clusters.
The Red Hat Cluster Suite supports shared storage via SCSI or fibre channel, a management UI
to simplify configuration, and a shared cluster quorum. In a significant diversion from many
traditional Linux server-management practices, Red Hat only supports management of its Cluster
Suite using its Cluster Manager GUI tool. If you want to change cluster configuration files via a
text editor, you’re on your own! The Red Hat Cluster Suite also supports Global File System
(GFS), which provides for better integration with storage networks. GFS supports
simultaneous reads and writes to a single shared file system in a SAN. This feature allows
clusters configured in the Red Hat Cluster Suite to offer both failover and load-balancing
support, similar to the PolyServe NAS Cluster.


Deploying Performance-Based Scalable Linux File-Serving Solutions


Now that you are aware of the available alternatives, let’s take a look at some considerations for
deploying Linux file-serving solutions.

Pre-Deployment Considerations
The tendency of IT administrators is often to deploy first and customize later. For those who
practice this approach, planned customizations can take months or even years to complete. After all,
with the file server deployed and operational, justifying spending additional time on the file
server may be difficult, especially if you’re like many IT folks and have countless other tasks on
your list.
To deploy a file server right the first time, planning has to be an important part of the process.
One major part of the planning process is deciding which technologies should be used to
complement the file server. Table 5.1 lists the most common file-serving problems as well as the
available technologies that can alleviate or manage the potential problems.
File-Serving Problem                            Solution
Limit user usage of file-server resources       Deploy and configure disk quotas (see the example
                                                following Table 5.1)
Provide failover support                        Deploy and configure a Linux-HA cluster
Provide failover and load-balancing support     Deploy a third-party product
Provide antivirus protection                    Deploy an antivirus solution that is compatible with
                                                any installed file-serving applications as well as
                                                your backup product
Prevent unauthorized access                     Determine the necessary permissions for each user or
                                                group that has access to the server

Table 5.1: Solutions for the most common file-serving deployment problems.
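
As a brief, hedged illustration of the disk quota option listed in Table 5.1 (the user name jsmith
and the /home file system are hypothetical, and quota support must already be enabled for that file
system, for example with the usrquota mount option), a per-user limit might be set as follows:
setquota -u jsmith 4500000 5000000 0 0 /home    # ~4.5GB soft and 5GB hard block limits, no inode limits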

With some of the general requirements under your belt, let’s look at the process of sizing up both
server and storage requirements.

Server Sizing
One of the most difficult aspects of deploying any server is the process of determining the
server’s hardware requirements. This task can be difficult and the result is often an educated
guess based on past experience. To help administrators in their quest to build servers that are
perfect for their needs, many hardware vendors offer online sizing tools. To improve their accuracy,
sizing tools are typically organized by server purpose, such as file server, and by OS.
One such tool is the IBM eServer Workload Estimator, which is available at http://www-
912.ibm.com/wle/EstimatorServlet. Figure 5.5 shows this tool.


Figure 5.5: The IBM eServer Workload Estimator tool.

In the example in Figure 5.5, the Workload Estimator is being used to size a Samba server
running on SUSE Linux Enterprise Server 9. Note that the tool allows you to size server
hardware requirements based on factors such as concurrent user sessions, average user
throughput, and average storage allocation per user.
Once you provide the estimator with the necessary information (or accept the default settings),
the tool will recommend server hardware that will meet your performance requirements. For
SUSE Linux Enterprise Server 9 Samba servers, IBM offered the general guidelines that Table
5.2 shows.
Environment                       Recommended CPU     Recommended RAM     Server Platform
Large (400 concurrent users)      1.9GHz 4-core       4GB                 P5 550 Express
Medium (200 concurrent users)     1.9GHz 2-core       2GB                 P5 520 Express
Small (85 concurrent users)       1.65GHz 1-core      2GB                 P5 505 Express

Table 5.2: IBM Linux file-server sizing recommendations.


If your preferred server vendor does not offer an online tool to assist in Linux file-server sizing, you
can probably pass along your requirements to your local vendor representative. The local rep should
be able to use an internal tool or consult an engineer to arrive at the proper server sizing
requirements for your environment. As each server application and server uses system resources
slightly differently, there is no one-size-fits-all tool for server resource sizing.

Storage Sizing
Storage sizing starts with allocating adequate internal disks for the OS, applications, log files,
and the paging file. For file-server deployments, a best practice is to allocate 1.5 times the
amount of physical RAM to the paging file. Thus, a file server with 4GB of RAM should have a
6GB paging file. For optimal performance, the paging file should be stored on a separate disk,
which clears an I/O channel for just paging operations. For log file storage sizing, you should
consult with the application vendors for each application running on the server.
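
On Linux, the paging file takes the form of one or more swap areas. A minimal sketch of dedicating
a separate disk partition to swap (the device name /dev/sdb1 is hypothetical) looks like this:
mkswap /dev/sdb1                                         # write a swap signature to the partition
swapon /dev/sdb1                                         # activate the swap space immediately
echo "/dev/sdb1 swap swap defaults 0 0" >> /etc/fstab    # make the swap space persistent across reboots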
Once you have determined the storage requirements for the OS, paging file, and applications,
you can then move on to the storage requirements for the data itself. This value is often
predictable because you should have on hand information about the current file server capacity
as well as some historical data showing capacity over the past 12 to 18 months. For file server
data sizing, a good practice is to requisition ample storage to meet the expected data growth for
the next 18 to 24 months. When unsure about past storage growth, backup logs can usually
provide the information you need. A simple method is to review the statistics for monthly
full backups over the past year. From that history, you can calculate the rate of storage
growth and project capacity needs over the next 1.5 to 2 years.
Once you have a handle on how much storage you need, you can work with your preferred
storage vendor to decide the type and size of disk drives that you’ll need to purchase. As with
server sizing, most storage vendors offer sizing tools that can assist you in determining the
storage devices that will meet your disk storage requirements.
One such tool that can help in identifying the hardware components that could support your
storage requirements is the HP StorageWorks Sizing tool, which is available at
http://h30144.www3.hp.com/. With this tool, you can enter your planned capacity and RAID
level and the tool will generate information about the hard disks to use to meet your requirements
as well as the overall storage efficiency of your planned storage system. Being able to view
efficiency is very helpful when comparing different RAID levels. Figure 5.6 shows a portion of
the HP StorageWorks Sizing tool output.


Figure 5.6: Comparing RAID level efficiency using the HP StorageWorks Sizing Tool.

In the example that Figure 5.6 shows, a 1TB RAID 5 was compared with a 1TB RAID 10. The
tool shows that the RAID 10 configuration would be 49 percent efficient, while the RAID 5
would be 74 percent efficient. The tool also allows you to see information about the disk size to
be used as well as the total amount of storage to be purchased. For example, the 1TB RAID 5
would incorporate a total of twelve 146GB hard disks, for a total raw capacity of 1752GB. The usable
capacity would be 1293GB. Once you have your Linux file-serving hardware sized, you are
ready for deployment and management of the essential Linux file-serving services.

Managing Enterprise-Class Linux File Serving


Regardless of whether you have a standalone, NAS or clustered file server, the protocols that
enable file sharing on Linux file servers are the same. This section will look at the roles of the
following protocols and services as they pertain to Linux file serving:
• Network File System (NFS)
• Samba
Let’s start with a look at NFS.

NFS
NFS has long been the de facto file-sharing protocol on UNIX and Linux servers. NFS has stood
the test of time because it provides a simple and efficient means for sharing data between
systems on a network. NFS has continued to evolve and improve with age, as proven by the
recent improvements introduced in NFS v4.


What Is New in NFS v4?


NFS v4 is currently supported on both the SUSE 10 and Red Hat Enterprise Linux 4
distributions. NFS v4 offers several new features that significantly improve both the performance
and security of NFS. The following list highlights some of the most significant improvements
brought about by NFS v4:
• Improved security—Supports Kerberos v5 and Simple Public Key Mechanism 3
(SPKM3)
• Better ACL management—Supports named attributes; user and group information is
stored in strings instead of numeric values
• Better firewall compatibility—The disparate NFS protocols (ACL, mount, NFS, NLM,
and stat) are now combined into a single protocol specification
• File delegation—NFS clients can now modify files stored locally in their own cache
without having to send requests back to the NFS server; this feature provides for a
significant network performance improvement
• Lease-based file locking—NFS v4 clients lock files based on a share reservation; if an
NFS v4 client loses contact with a server, once its lease on a locked file expires, that file
is free to be accessed by other users
• Supports file migration and replication—File migration and replication are now
supported via NFS
With a general overview of NFS under your belt, let’s examine the steps for getting this service
up and running.

NFS Setup Checklist


Setting up NFS is a relatively straightforward process. Let’s start by looking at the general steps
for configuring and enabling NFS on a Linux file server:
• Define the folders to publish as shares in the /etc/exports file.
• Set local permissions for each shared folder as necessary.
• Define the hosts and logical networks that are allowed access to the NFS service by
editing the /etc/hosts.allow and /etc/hosts.deny files.
• Start the NFS service.
• Mount a shared folder from an NFS client.
Here’s an example of /etc/exports configured to share a folder named /public:
/public/ *(ro,root_squash,sync)
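
From an NFS client, mounting this export takes a single command (the server name fileserver1 and
the local mount point are hypothetical):
mount -t nfs fileserver1:/public /mnt/public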


Once the shares are defined in the /etc/exports file, you then need to ensure that you have the
proper local permissions set for each exported folder. This step is necessary to protect against
unauthorized access, modification, or deletion of shared files.
Network access can be restricted on a host-by-host or network-by-network basis by editing
/etc/hosts.allow and /etc/hosts.deny. When a connection is attempted to a Linux file server, the
connecting host’s IP address is first evaluated against the /etc/hosts.allow file; if a match exists,
access is granted. If no match exists, the /etc/hosts.deny file is checked, and a match there causes
the host to be denied access. If no match exists in either file, the host is allowed access by
default. If you want to deny all traffic from any hosts or services
not explicitly listed in the /etc/hosts.allow file, you would add the following line to the
/etc/hosts.deny file:
ALL: ALL
Although denying all traffic not explicitly granted access is the most secure method of locking
down a file server, you will need to remember this setting in the event that you are setting up
additional network services or applications on the file server. If the new service or application is
not allowed in /etc/hosts.allow, clients will not be able to connect to the service.
Once you have created the implicit deny rule in /etc/hosts.deny, you would then need to edit
/etc/hosts.allow to grant access to the appropriate hosts or network segments. The following
example shows how to configure /etc/hosts.allow to grant NFS access to hosts on the
172.16.1.0/24 subnet:
lockd: 172.16.1.
rquotad: 172.16.1.
mountd: 172.16.1.
statd: 172.16.1.
At this point, you can start the NFS service and you are on your way. Linux distributions are
continually improving their GUI management tools, and such is particularly the case with SUSE
Linux 9. NFS can be fully configured within minutes by using SUSE Linux’s YaST, as Figure
5.7 shows.


Figure 5.7: Configuring NFS using YaST.
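
For administrators who prefer the command line, a minimal sketch of activating the exports and
starting the service looks like the following (init script and service names vary by distribution;
nfsserver is used on SUSE, while Red Hat uses nfs):
exportfs -ra                     # re-read /etc/exports and apply any changes
/etc/init.d/nfsserver start      # on SUSE; on Red Hat, use: service nfs start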

Now that you have seen how to complete the initial setup of NFS, let’s take a quick look at
Samba.

For more information about NFS, point your Web browser to http://nfs.sourceforge.net.

Samba
Samba provides the functionality for Linux file servers to host shared folders that are accessible
via the CIFS protocol, which is the default file-sharing protocol for all Windows-based OSs.
Both Red Hat Enterprise 4 and SUSE Linux 9 run Samba 3. With Samba 3, major improvements
were made that allowed for reliable authentication between Windows AD domain controllers and
Samba servers. Although the reliability improvements are significant, Samba’s feature set is
closer to that of a Windows NT Primary Domain Controller (PDC) than that of a Win2K or
Windows Server 2003 (WS2K3) AD domain controller. This limitation of Samba is expected to
change in Samba v4.


What Is Coming in Samba 4.0?


The upcoming release of Samba 4.0 is being hailed as Samba’s first true challenge to AD.
Among the planned features for Samba 4.0 are:
• Support for AD logon and administration protocols
• An internal Lightweight Directory Access Protocol (LDAP) server
• Internal Kerberos server
• Flexible (extensible) database architecture
• Full NTFS semantics
• Much better scalability
Many administrators have relied on Samba 3 for its ability to provide highly available CIFS file
serving. With so many planned enhancements in Samba 4, its pending arrival has garnered
significant buzz in the industry.

Samba Deployment
Samba deployment is similar in approach to NFS deployment, with the exception that additional
attention will need to be paid to Windows authentication—considering Samba file servers most
often are used to provide access to Windows client systems. As with NFS, Samba can be
configured using YaST on SUSE Linux or with the Samba Server Configuration tool (see Figure
5.8) on Red Hat Linux.

Figure 5.8: Red Hat Linux Samba server configuration.


The Samba server configuration is stored in the /etc/samba/smb.conf file. The following
example shows the smb.conf file settings that match the /public share definition shown in the
Samba Server Configuration tool earlier:
[public]
comment = Company Docs
path = /public
writeable = yes

This code creates a writable share named “public.” In addition to defining the shares and level of
share access, you need to set permissions for the shared files and folders. In the next chapter, you
will see how to set permissions on a Linux Samba server for Windows user accounts residing in
an AD domain. As so many Samba issues in the enterprise are directly related to Windows, the
bulk of the information on fully deploying Samba is provided in Chapter 6.
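
As a quick, local-account sketch (the user name jsmith and group name staff are hypothetical;
AD-integrated permissions are covered in Chapter 6), the following commands validate the
configuration, set file system permissions on the shared directory, and add a Samba password entry
for an existing Linux user:
testparm                         # check /etc/samba/smb.conf for syntax errors
chgrp -R staff /public           # give the staff group ownership of the share contents
chmod -R 2775 /public            # group-writable, with the setgid bit so new files inherit the group
smbpasswd -a jsmith              # create a Samba password entry for the Linux user jsmith
/etc/init.d/smb restart          # restart the Samba service (the service name can vary by distribution)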

For additional information about Samba setup and configuration, read the Official Samba HOWTO at
http://www.samba.org.

Current Trends in Linux File Serving


Linux file servers have gained from three major trends in the IT industry:
• Migration from UNIX to Linux
• Server consolidation
• Storage consolidation
This section will look at the impact of these three trends on the Linux file-serving landscape.

Migration from UNIX to Linux


When Linux first burst onto the IT scene, many thought that it would be a serious challenger to
Windows. Although the question of whether Linux will ever be able to overtake Windows still
remains to be decided, UNIX OSs have suffered substantially at the hands of Linux.
To most, moving from UNIX to Linux is a no-brainer. Many of the enterprise applications that
run on UNIX, such as Oracle, also run on Linux. As Linux OSs can run on industry-standard
Intel-based hardware platforms, Linux servers are far less expensive than UNIX servers that run
on proprietary hardware. Being proprietary can also mean that an organization will need to pay
more to maintain a UNIX server. This cost is not only related to the proprietary hardware in the
server but also the higher cost to pay an administrator that has the specialized skills to maintain
the UNIX server. With Linux file-serving solutions being able to offer comparable performance
to UNIX servers at a fraction of the price, migrating legacy UNIX boxes to Linux is a logical
step.


Benefits of Consolidation
Another logical step that many have taken in file-server management is toward consolidation.
Both proprietary UNIX servers and NAS appliances have been major contributors to server
sprawl. For organizations that have anywhere from two to five NAS appliances, management
overhead is becoming more difficult as the network expands. As with UNIX migration, the
detriments of server sprawl are easy to spot and have led to a flood of organizations
consolidating dozens of UNIX servers and NAS appliances to Linux clusters.
The bottom line with consolidation is that it can result in significant yearly savings. Take for
example a consolidation project that reduces 60 servers to two 15-node PolyServe Matrix
clusters. In this case, the TCO savings could easily reach several hundred thousand dollars a
year. Fewer servers can also mean fewer software updates. With less to maintain, IT shops can
stretch their budgets further.
Consolidation is not about getting smaller for the sake of getting smaller but is instead about
getting the most out of your existing hardware investments. Having several servers with 30
percent CPU utilization, for example, means that you have several servers that have CPUs doing
nothing 70 percent of the time. If your organization has paid for the hardware, it should very well
get the most out of it.
Again, since consolidation is about reducing hardware costs and system management, it is
important to keep in mind that file server consolidation is best suited for shared data clustering.
Clustering provides the ability to configure failover support and load-balanced data access for
critical file servers. Other approaches to consolidation, such as those that consolidate file servers
to virtual machines, only reduce the amount of managed hardware on the network. They do not
reduce the number of managed systems on the network and thus will not help with reducing
software licensing costs. Thus, although there are several ways to go about file-server
consolidation, consolidating to a shared data cluster that can offer the benefit of load balancing,
failover, and streamlined management from a single console has been deemed the most logical
methodology by many organizations in the IT community.

Storage Consolidation
Most organizations also consolidate their storage resources while in the process of consolidating
file-server resources. Storage consolidation offers several benefits:
• More efficient utilization of purchased storage resources
• Simpler storage scalability
• Ability to back up and protect data using methods that are not available to traditional
DAS storage
• Ability to share data between redundant servers instead of having each server maintain its
own local copy of data files


When combined with server consolidation to a shared data cluster, sharing disk resources
between servers in a SAN also allows for true load balancing of data access to storage. With
consolidated storage, when a need arises for additional disk resources, the disks can simply be
added to the SAN and then mapped to the server that needs them. This method is more efficient
for managing storage than the traditional process of “marrying” a disk array to a single server.
If you allow it, the complexity of networks will only continue to grow over time. Warding off
network complexity requires you to be proactive. The instinct when growth occurs is always to buy more
parts. More parts only add further complexity and, in turn, more management costs.
Streamlining your network with consolidated server and storage resources will ultimately lead to
better TCO. When combined with shared data clustering, consolidation will also result in vastly
improved reliability and performance.

Summary
In this chapter, you were presented with the state of the Linux file-serving world as well as best
practices for optimizing Linux file serving in production. The final chapter will look at the
management issues surrounding heterogeneous networks. In particular, you’ll see how to
configure winbind authentication on your Linux file servers for the sake of supporting user
authentication to a Linux server via AD. For environments that are running both Windows and
Linux desktops, you will also see how to set up user home folders to be shared across both
Windows and Linux workstations.
After tackling the most challenging Windows-Linux integration issues, the chapter will then
examine modern backup methodologies that are used to maintain data availability and disaster
protection for both Windows and Linux file servers.


Chapter 6: Managing High-Performance, Scalable, and Resilient Data Across the Enterprise
The two previous chapters took an in-depth look at both Windows and Linux file-serving
solutions and how these two operating systems (OSs) present individual challenges and
advantages in the areas of performance, scalability, availability, and integration. As vendors have
worked to manage and mitigate these challenges, varying new technologies have been developed
to meet the need to manage data. As each new incarnation is adopted, heterogeneous networks
have developed over time, presenting challenges in the areas of integration, backup and recovery,
and freedom to manage your storage solution the way you see fit.
This chapter will examine the broader enterprise picture and leverage what previous chapters
have discussed to develop a clearer understanding of the high-level responsibilities (and granular
realities) of managing data across the enterprise. This chapter will assume a holistic vantage
point to examine the challenges faced in the enterprise today and touch on key points to consider
when defining the strategy behind building highly scalable enterprise file-serving solutions. Let’s
begin by examining a few of the challenges faced in heterogeneous networks and how those
challenges come into play in the enterprise and progress into the areas of backup and recovery.

Challenges Facing Heterogeneous Networks


For reasons apparent to IT managers and administrators, it is appropriate to refer to large-scale
implementations of networked computer systems as an enterprise—a word that not only means
“an undertaking” but more specifically an undertaking of great scope, complication, and risk.
The scope of an enterprise may vary, but generally, all enterprise environments are complicated
landscapes comprised of one or more heterogeneous networks that come together to be defined
as one enterprise Wide Area Network (WAN). As IT managers work to align solutions with their
individual IT missions and goals, meet compliance requirements, increase their return on
investment, and drive cost efficiency, decisions are made that very rarely permit each solution to
align with a single OS or product offering—and what is decided upon gets added to the all-
encompassing umbrella referred to as the enterprise. Heterogeneous networks comprised of
many OSs and protocols present inherent challenges such as:
• Inhibited agility
• Complexity
• Integration concerns
• IT risk and compliance considerations
This section will focus on the numerous management concerns presented when managing a
heterogeneous network environment.


Inhibited Agility
Although you will often find strength in diversity, diversity also has a price. Protocols that are
not inherently compatible with one another complicate the environment, creating additional
management overhead and inhibiting the capability of the enterprise to remain agile. Agility
within the enterprise is the enterprise’s ability to quickly reconfigure IT resources to meet
changing business demands. An enterprise should strive to be flexible in its ability to quickly
respond to changing business requirements to meet the growing needs of the enterprise.
Remaining agile in an environment populated by years of accumulated, different storage
solutions is challenging, to say the least. Many data centers are faced with the realization that
although in the business world “more” often equates to “better,” the same is not the case in the
data center. Striving to meet shorter times to market for emerging business initiatives is often a
challenge that is hindered by the capacity of both the physical and virtual enterprise IT
infrastructure components required to meet the growing demands of the organization.

Complexity
Relying on multiple OSs, hardware platforms, and software platforms inhibits the ability of
administrators to centrally manage the entire environment. As the complexity of the enterprise
environment increases, so do the costs of managing the environment. In addition to the core
competencies required of the engineering, architecture, and support staff, these costs include
implementing and maintaining the systems and technologies required to support, maintain, and—
when disaster strikes—recover the environment. Whenever possible, steps should be taken to
simplify the enterprise architecture to minimize these costs and maximize the effectiveness of
ongoing support efforts.

Integration Concerns
Enterprise IT managers are continually driven to seek harmony in their environments.
Compatible systems reduce the total cost of ownership of the environment by allowing the
enterprise to standardize and simplify. As new systems, protocols, and applications are
developed by competing vendors, often with little internal desire to remain compatible with the
competition, the challenge of integrating enterprise systems escalates and becomes burdensome.
Focusing on integration will help you reduce cost by reducing complexity and simplify
management efforts by reducing or eliminating incompatible systems.

IT Risk and Compliance Considerations


Each OS, firmware revision, and supporting software application presents unique security and
compliance challenges. These range from the broad consideration of service packs and hotfixes
to granular OS and application configuration. The amount of time associated with managing and
mitigating risks and maintaining compliance across the enterprise will inherently impact server,
system, or network availability. As the enterprise strives to remain secure and compliant, the
impact to availability is often felt. These risks need to be analyzed for their potential to impact
business processes and goals.


Integrating Windows and Linux File-Serving Solutions


Windows and Linux file-serving solutions represent different, yet often complementary,
approaches to file serving. At their very core, these two OSs are dramatically different and
distant in design; as such, the solutions developed for each OS have been accordingly disparate.
Central to integrating these two platforms are the challenges presented by:
• Common Internet File System (CIFS) and Network File System (NFS) integration
• Managing Access Control Lists (ACLs)
• Integration with existing services
This section will focus on each of these challenges and how you can leverage the information
covered in the previous chapters to effectively integrate a cross-platform file-serving solution.

CIFS and NFS Integration


Chapter 4 presented Microsoft’s Shadow Copy and its reliance on CIFS as the entry point for
users to recover files. Chapter 5 presented NFS, which has risen to become the unmatched
standard file-sharing protocol on UNIX and Linux servers. In addition to the out-of-the-box
packaged distributions of UNIX and Linux, many NAS devices, of which the vast majority are
based on Linux, use NFS as well. Enterprise IT managers that are looking to maximize their
return on investment by utilizing the best Windows- and Linux-based technologies are now faced
with a bit of a dilemma—incompatibility.
The gap that exists between these two protocols can be bridged, but doing so presents
performance concerns. NFS is inherently slower than CIFS, so access from NFS to CIFS will be
slower than a direct CIFS connection. Software solutions exist that allow UNIX- and Linux-
based servers to provide remote file access functionality to PCs without requiring NFS. The most
widely used of these are Samba, Hummingbird NFS Maestro, and Microsoft Windows
Services for UNIX (SFU). Samba is a server-side installation, while NFS Maestro and SFU are
NFS redirectors that are installed on client workstations.
When considering the use of a redirector, it is important to keep in mind the scalability of the
solution and the architecture’s dependency upon it. Installing NFS Maestro on a few dozen
workstations that require access to a UNIX system for a special purpose, such as a small
accounting department with a need to access an NFS share to run a particular report, may be
acceptable in the short term. Success, however, often equates to scale, so pay heed to the overall
strategy. What may be communicated to you today as a special-purpose small-scale
implementation could very well develop into a situation that is much larger in scale than anyone
initially foresaw. Installing and managing a centralized server to bridge the gap is often a better
option not only to reduce complexity but also to help prevent application sprawl by preventing
the need for redirector software to be installed on multiple systems across the enterprise.


The use of Linux as a mainstream file-serving solution is becoming a reality, and many IT
managers are no longer being afforded the luxury of sticking with a single protocol for all their
file serving needs. This reality adds to the complexity of the environment, as NFS and CIFS
protocols may both be required. NFS vs. CIFS is not an all-or-nothing equation; both can exist
simultaneously to further enhance the capabilities of an existing infrastructure, and as
technologies are continually developed to further enable enterprise management, the inclusion
and integration of both CIFS and NFS file-sharing protocols can dramatically simplify the
inclusion of future systems into the multi-protocol file-sharing environment.
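
As a small, hedged illustration of this coexistence from the Linux side (the server name winfs1, the
share name, and the account are hypothetical), a Linux client with CIFS support in the kernel can
mount a Windows or Samba share directly alongside its NFS mounts:
mount -t cifs //winfs1/public /mnt/public -o username=jsmith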

Managing ACLs
Security management is becoming an increasingly complex task. Adapting to industry and
regulatory compliance needs puts the enterprise IT manager in a virtually continuous reactive
state. As new compliance directives are developed and subsequently adopted by your
organization, information security policies, standards, guidelines, and procedures are
implemented to meet these needs. Legislative compliance standards such as the Health Insurance
Portability and Accountability Act (HIPAA) of 1996 provide clear directives on what needs to be
protected from disclosure as well as severe penalties for organizations who fail to meet those
directives. The only piece this legislation doesn’t supply is the means to implement the changes
to meet the requirements—that decision is left to you.
Often the most challenging and certainly the most granular task an IT administrator may face is
managing ACLs. This task can be further compounded when file-serving resources are spread
across multiple platforms with dissimilar ACL architectures. Different ACLs supporting separate
application systems or users make it more difficult to meet legislative compliance requirements
because they lack the means to be centrally managed effectively.
Because the ACL security model in NTFS is more robust and fundamentally different than the
file-security model used in Linux, no one-to-one mapping can be made between them. The
fundamental problem occurs when a Windows client (which expects an NTFS ACL) accesses a
Linux file, or a Linux client (which expects Linux file permissions) accesses a Windows file. In
these cases, the file server must sometimes authorize the request using a user identity that has
been mapped from one system to the other—or, in some cases, even a set of permissions that has
been synthesized for one system based on the actual permissions for the file in the other system.
This setup creates its own set of security concerns that also need to be managed.
There is some good news in that NFS version 4 standardizes the use and interpretation of ACLs
across POSIX and Windows environments. This standardization will make centralized
management of ACLs between these two systems easier. It will also support named attributes.
User and group information is stored in the form of strings, not as numeric values. ACLs, user
names, group names, and named attributes are stored with UTF-8 encoding.

For more information on NFSv4 refer to the NFSv4 home page at http://www.nfsv4.org/.


Integration with Existing Services


Transparency is an important concept of enterprise architecture that aids in security efforts by
keeping authorized users relatively unaware of the inner-workings of the infrastructure and
makes systems and resources easier to locate and use. In few places is transparency more vital to
productivity than in the file-serving arena. The viability of a file-serving solution within an
enterprise environment is, to a great extent, dependent upon that solution’s capability to remain
transparent to the end user. File services should appear to users to be a conventional, centralized
file system. The number and location of servers and storage devices should be made invisible.

Backup and Recovery


The previous chapters spent time discussing the use of file servers as a centralized repository to
simplify the process of backing up and recovering systems. It is important when designing and
implementing file-serving solutions that due care be given to ensure that these solutions are also
backed up. The following list highlights a few simple rules to remember when designing a
backup strategy:
• A backup should be easy to do.
• A backup should be automated and rely on as little human interaction as possible.
• Backups should be made regularly.
• There should be at least two copies of the data stored on resilient media and kept at
different locations.
• A backup should rely on standard, well-established formats.
• A backup should not use compression. Uncompressed data is easier to recover if the
backup media is damaged or corrupted.
• A backup should be able to run without interrupting normal work.


A backup is simply one process in an overarching area of responsibility for an organization.


Whether the focus is on business continuity planning (BCP), disaster recovery, or compliance,
backup and recovery planning is going to be crucial to an organization’s ability to recover.
To effectively plan and implement a backup and recovery plan in support of business continuity
or disaster recovery, several processes need to be examined for inclusion:
• Risk analysis—Risk is a subjective term and although storage administrators tend to treat
all data as if it were mission critical, it is important to understand how much risk a line of
business is willing to accept in terms of data loss.
• Scheduling—It is a generally accepted rule that backups should not affect production.
Work with your business partners to understand their business requirements and
minimize the impact of backup operations on their environment.
• Review of logs—Backups generate logs that need to be reviewed regularly in order to
ensure that they’re performing properly.
• Testing—Backups are often taken for granted until, of course, you need to recover a
system and discover that a backup malfunctioned during a real disaster recovery
operation. Test backups often to ensure their quality and viability for recovery.
• Retention—All data is not created equal. Some can be discarded virtually immediately;
other data must be archived and maintained for years to meet legal, regulatory, or
compliance obligations. The amount of time data must be maintained is an important
metric in determining the proper archiving solution.

Disaster Planning Essentials


An area that is all too often overlooked or given too little emphasis is that of disaster recovery
planning. Systems that comprise an enterprise file-serving environment should be protected,
ideally in a secure data center with sufficient resources to operate autonomously should a natural
disaster or other emergency interrupt key services and utilities such as telecommunications and
power.
For disaster planning to be effective, it must be put into the proper context. Disaster planning is
often presented in a manner that is—for all intents and purposes—inaccurate. Enterprise
managers often approach disaster recovery planning with a perspective that doesn’t clearly
define the benefits. Managers also often fail to communicate the need for disaster recovery
planning effectively to the line of business operations that will depend upon it. The planning
itself doesn’t relate directly back to an immediate or pressing need, so it is set aside.
When presented in context, the need for disaster recovery planning is clear. Although you hope
to never need to use this parachute, there is peace of mind in knowing it is there to protect you.
The upfront cost and effort required for disaster recovery planning may be uncomfortable, but
when you clearly illustrate the risk and real potential for disaster, the response from the business
should be one of appreciation rather than one of remorse. Once you understand why disaster
recovery planning is essential and how to best communicate it, you can move forward to
examine the stages of disaster recovery planning and how you can begin.


Development
The first stage of disaster recovery planning requires the development of a plan to document the
procedures for responding to an emergency, providing extended backup operations during the
interruption, and managing recovery and reclamation of data and processes afterwards (should an
organization experience a loss of data access or processing capability). A disaster recovery plan
is an enterprise document that should outline the roles and processes of senior management as
well as IT management and other critical personnel in key areas such as security, facilities
engineering, and finance.

Disaster Planning Roles


From the vantage point of the IT manager, there will be several key roles to be fulfilled that
should be documented in the disaster recovery plan:
• Identifying and prioritizing mission-critical applications
• Recovering and reconstructing all critical data, systems, and supporting infrastructure
• Continuously reassessing the recovery site’s stability

Identifying and prioritizing mission-critical applications is only one step that requires a close and in-
depth understanding of business needs. Throughout this chapter, you will see references to the importance of
maintaining open lines of communication with storage users to better understand their business and
subsequently their current and future storage requirements.

Identify and Prioritize


Identifying and prioritizing mission-critical applications is a step that should be taken as part of
disaster recovery planning and then periodically reevaluated to align with changing business
needs. It is important that clear lines of communication be established early on between the
various lines of business supported by the enterprise infrastructure and IT management so that IT
managers can make informed decisions to protect their line of business partners. Reevaluating
this step periodically is important, not only to keeping those communication lines open but also
in providing visibility of the disaster recovery plan.

Recovery and Reconstruction


The speed, efficiency, and effectiveness of the recovery and reconstruction efforts should be the
focus of a sound disaster recovery plan. Many times, organizations spend a great deal of effort
and planning to ensure that the backup of systems themselves does not impact production and
pay little attention to the recovery time associated with the process.

Reassess
When operating in a disaster from a recovery site, an organization is in its most vulnerable state
of operations. Few organizations outside of government and the financial services industry maintain
multiple recovery sites that can be utilized during a disaster. IT managers must act vigilantly to
monitor and protect the stability of the recovery site.
As part of an organization’s disaster recovery planning efforts, a recovery team may be defined
with the mandate to implement the recovery procedures once a disaster is declared. The recovery
team’s primary duty is to get critical business functions operating at the alternative or backup
site.


Traditional Backup Methodologies


Backup methodologies vary with the size, scope, and criticality of the data they are intended to
protect. In the most basic traditional model, critical data from a key system is copied to storage
media such as a tape, or file server, separate from the client to provide protection in the event of
client failure. This process depends heavily on the system resources of the client to perform
either the tape backup or copy operation, which naturally affects the performance of the client
(by way of CPU utilization, network utilization, and so on).
Tape storage can become quite costly when adopted as a standard methodology as the use of
individual tape drives sprawls throughout the enterprise. In addition to the cost of personnel’s
time to handle the physical tapes and maintenance cost associated with the tape drive hardware
and replacement of tapes as they become unviable, there is the cost of storage and shipping to
consider when transporting the media off-site for safekeeping.
In the past, it has been generally accepted that a backup should be performed at least once every
24 hours. This number is arbitrary and should be reconciled against the actual business
continuity and disaster recovery requirements of the system you intend to recover.

Snapshots
Although some backup solutions simply copy data directly to a tape or another disk, some
solutions use another process that utilizes snapshots. A snapshot is a relative (or delta) copy of a
data set. It is differentiated from a mirror in that the copy retains links (pointers) back to the
original (or source) data rather than duplicating every block.
In the snapshot process, the backup software makes a copy of the pointers to the data, which
indicate its location, then relies on data movers to pick up the pointers and transfer the data.
Snapshot volumes are point-in-time copies of primary storage volumes. By creating snapshot
volumes, the primary volumes continue to be available for production operations, while the
snapshot volumes are used for offline operations such as backup, reporting, and testing. This
setup results in improved backup operations, data reporting, application testing, and many other
day-to-day operations.
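
On Linux, one common way to create such a point-in-time copy is an LVM snapshot. The following
sketch assumes a volume group named vg0 containing a logical volume named data (both hypothetical);
the snapshot is mounted read-only and backed up while the original volume remains in production:
lvcreate --size 2G --snapshot --name data-snap /dev/vg0/data    # reserve 2GB for copy-on-write changes
mount -o ro /dev/vg0/data-snap /mnt/backup                      # expose the point-in-time copy
# ...run the backup against /mnt/backup...
umount /mnt/backup
lvremove -f /dev/vg0/data-snap                                  # discard the snapshot when the backup completes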


Server-Free Backups
The biggest advantage to server-free backup is the reduction of workload on the target server. A
server-free backup, as the name implies, reduces the target servers’ CPU, memory, and I/O
consumption during the backup window by decreasing the servers’ involvement in the backup
process. Essentially, the data being backed up will move from the target server’s disk to a data
mover (see Figure 6.1). In a server-free architecture, the data mover is another server that is
dedicated to providing the actual transportation of the data. A data mover can also be a device, as
we’ll see in the next section, such as a Small Computer System Interface (SCSI) drive or router
that reads the data from a network drive and writes it to the backup device. The data mover also
manages the flow of data between the network drive and backup device to ensure that no
information is lost.

Figure 6.1: In a server-free design, a dedicated server acts as the data mover to free system resources on the
target servers.


Server-Less Backups
Like server-free backups, server-less backups offer the advantages of efficiency, scalability, fault
tolerance, and cost reduction, but are defined by a complete lack of dependence on a dedicated
server to fulfill the role of a data mover. In a server-less environment, either the storage device or
its supporting infrastructure are used to fulfill the role of the data mover. Technologies such as
the SCSI-3 Extended Copy (XCOPY) command can be used to read and write data directly
between a disk array and a secondary device and can take advantage of existing modules in
backup applications to coordinate the backup process. This method of backup reduces total cost
of ownership and operational costs by eliminating any need for additional servers and increases
backup performance by eliminating the intermediary server from the backup process (see Figure
6.2).

Figure 6.2: A server-less architecture relies upon the storage solution, or the components of its supporting
infrastructure, to transfer the data.


Some of the major advantages of using server-less backup include:


• Increased server efficiency
• Increased scalability
• Better fault tolerance
• Lower overall hardware costs
Utilizing a server-less architecture provides savings in the form of server elimination as there is
no need for a dedicated server to perform the role of a data mover. In addition, this setup saves
on network utilization, permitting the data to be transferred once, directly from the server to the
storage device, and thus eliminating the additional hop through an intermediary server.

Archiving and Migration


There are times when data needs to be held onto for the long haul. Aside from traditional backup
requirements, many organizations find that certain data may be required for legal, or compliance,
reasons to be maintained for months or even years. Archiving refers to the processes supporting
these needs and the storage requirements necessary to meet legal obligations.
When designing a file-serving storage solution, architects need to understand the archiving
requirements and how they pertain to the data being stored so that the solution can be designed to
meet those requirements. A tiered-storage approach provides insight into the value of data over
time by classifying data early and progressively re-classifying the data as its value changes based
upon differing motivators.
Take for example financial data whose value is immensely important during the time of and
immediately following a transaction. Loss of a large-scale financial database, such as those used
by credit card processors, during peak transaction hours could be catastrophic to the business.
Once the transactions have cleared and have been reconciled, the data still remains important and
the processor may need the data surrounding the transaction immediately on-hand for the next 30
days to facilitate refunds or for other internal business processes. As time passes, the data
becomes less critical but still important as information about the transaction is used in tax
reporting and, depending upon the industry and purpose of the transaction, regulatory
compliance may require the data be kept for several years.
Understanding the need and scope of the long-term storage requirement will help drive storage
decisions and the underlying financial motivations. As an understanding of the business need for
the data matures, this understanding drives transformation. Enterprise storage architects, armed
with an understanding of business needs, are compelled to consider storage architectures that
meet those needs especially in the areas of performance, scalability, and resiliency—which leads
many to consider migration to a consolidated storage architecture.


Unfortunately, storage consolidation isn’t as simple as buying a large, enterprise-class storage


array and migrating applications to the new platform. Migration takes time for planning if it’s
going to be successful. Essentially, there are four key areas that need to be assessed when
planning for storage migration:
• Assess the current environment—During this phase, you’ll need to identify your current
storage capacity and gather metrics surrounding its utilization. You’ll also need to re-
establish your understanding of the business being supported by the storage environment
and strive to understand the future requirements the business may be soon facing.
• Understand the current costs—This phase is important in gathering cost justification
for a storage consolidation effort. Work to gather the current storage hardware and
software costs and understand how those costs impact the bottom line. Remember to
focus attention on the cost of supporting the environment in time, personnel, support
contracts, and licensing agreements.
• Assess storage management capabilities—It is important to have a clear picture of your
current management capabilities so that you can make informed decisions in the same context. In
addition to administration, one must also consider the ability to monitor the environment
for performance, availability, and security.
• Understand the future business and legal requirements—Although you might not have a
crystal ball readily available to predict the future requirements that will be placed on your
storage infrastructure, you can leverage the experience of your business colleagues whom
often can provide a wealth of information about what the future may hold. New
compliance issues and regulations that your partners may be working to meet may have
significant impact on storage.
Once these steps have been completed, a clearer picture will have developed that will serve to
guide you through the migration process. To aid in understanding a few of the architectures
available, let’s first examine what has worked in the past and what is now being adopted for use
in enterprise backup architecture.

Successful Backup Architectures


There are many approaches to backup and recovery to be considered when developing an
enterprise file-serving solution. Each will be dependent on the size, scope, business continuity,
and disaster recovery needs of the file-serving solution you intend to develop. The three
architectures most commonly found within an enterprise environment are
• Disk-to-Tape (D2T)
• Disk-to-Disk (D2D)
• Disk-to-Disk-to-Tape (D2D2T)


D2T
D2T has been, and to a great extent still is, the most common method used to create backups.
D2T as a standalone backup solution still has many inroads within the enterprise, but its use as a
standalone enterprise standard for backup and recovery is diminishing.
New storage solutions and architectures such as NAS and SAN, as discussed in Chapter 2, have
opened avenues to consolidation. For small scale or systems isolated from the storage
infrastructure, D2T is viable as an independent solution, and the relative size-to-cost ratio of tape
storage makes it one of the most affordable solutions from a media standpoint. Within an
enterprise file-storage solution, D2T is commonly found as a component of the more robust and
resilient D2D2T architecture.

D2D
In D2D, a computer hard disk is backed up to another hard disk rather than to removable media. As a backup methodology, D2D enables both greater performance and higher capacity relative to tape or other removable media alternatives, which directly translates to shorter time to recovery. D2D can refer to a dedicated backup architecture in which one “disk” serves solely as backup media, and it can also refer to contingency backup solutions in which one system is routinely backed up to a second, identical system for recovery purposes.

D2D is often confused with Virtual Tape Library (VTL) technology. A VTL is a data storage technology that uses emulation to make hard disks behave as if they were tape drives. D2D differs in that it enables multiple backup and recovery operations to access the disk directly and simultaneously by using a true file system.

D2D has further advantages over D2T: in midsized to large-scale implementations, it can lower the total cost of ownership of the backup and recovery solution through increased automation and lower hardware costs.

D2D2T
In D2D2T, data is initially copied to backup storage on a disk storage system and then
periodically copied again to a tape or other removable media storage system. Traditionally, many
businesses have done backup directly to relatively inexpensive tape systems. Many high-
performance application systems, such as financial databases, however, have a production
assurance or business continuity need to have their data immediately ready to be restored from
secondary disk if and when the data on the primary disk becomes inaccessible.
As individual storage requirements have begun to be defined in terms of business criticality,
rather than in terms of storage devices, organizations have adopted the concept of storage
virtualization. In a storage virtualization system, IT managers can define an organization’s need
for storage in terms of storage policies rather than physical devices. For example, if a financial
database has a business requirement stating that no more than 15 minutes worth of data may be
lost as the result of a technology failure, D2D2T makes a great deal of sense as part of a storage virtualization strategy. Figure 6.3 demonstrates how D2D2T can be used as a backup architecture for such a
production database.
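The idea of expressing storage needs as policies rather than devices can be made concrete with a small sketch. The application names, recovery point objectives (RPOs), and tier choices below are hypothetical examples, not a schema from any particular product.

# storage_policies.py -- illustrative sketch of policy-driven tiering; all values are hypothetical.
POLICIES = {
    # application: maximum tolerable data loss (RPO), in minutes
    "finance_db": 15,
    "email": 60,
    "file_archive": 24 * 60,
}

def backup_tier(rpo_minutes):
    """Map a recovery point objective onto a backup architecture."""
    if rpo_minutes <= 60:
        return "D2D2T"   # frequent disk-to-disk copies, staged to tape later
    return "D2T"         # a nightly tape job is sufficient

for app, rpo in POLICIES.items():
    print(f"{app}: RPO {rpo} min -> {backup_tier(rpo)}")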


Figure 6.3: D2D2T and storage virtualization.

To meet a 15-minute goal, the supporting infrastructure would require a server to be dedicated to
providing that contingency role. Every 15 minutes, data is backed up to the contingency server;
then, at regular intervals, the contingency server itself can be backed up to tape, which eliminates the need to run a full backup against the production server. Other critical data, such as recent email,
may also benefit from such a system. Email is considered by many to be mission-critical data in
the short term, though dependency on this information fades over time. D2D backup keeps email backups readily available for short-term recovery; then, as the organization’s policy dictates, they can eventually be moved to tape for archival purposes.
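As a concrete illustration of the disk-to-disk stage described above, the following is a minimal sketch, assuming a Linux production server with rsync installed and SSH access to a contingency server; the source path, target host, and 15-minute interval are hypothetical placeholders rather than a prescribed configuration. The subsequent disk-to-tape stage would be handled separately by the backup application on the contingency server.

# d2d_contingency.py -- minimal sketch of the disk-to-disk stage of a D2D2T scheme.
# Assumes rsync over SSH to a contingency server; paths and host name are hypothetical.
import subprocess
import time

SOURCE = "/srv/finance/db_dumps/"                 # hypothetical export area on production
TARGET = "backup@contingency01:/backup/finance/"  # hypothetical contingency server path
INTERVAL_SECONDS = 15 * 60                        # 15-minute recovery point objective

def replicate_once():
    # -a preserves permissions and timestamps; --delete keeps the copy an exact mirror.
    result = subprocess.run(
        ["rsync", "-a", "--delete", SOURCE, TARGET],
        capture_output=True, text=True)
    if result.returncode != 0:
        # In production this would raise an alert rather than just print.
        print("replication failed:", result.stderr.strip())

if __name__ == "__main__":
    while True:
        replicate_once()
        time.sleep(INTERVAL_SECONDS)

In practice, the same cycle is usually expressed as a cron job or scheduled task rather than a long-running script, but the principle is identical.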
D2D2T has further advantages in that once the data has been moved to a secondary device—
either a dedicated server (as Figure 6.3 shows) or a disk dedicated to serving as backup media—
the data can then be examined by other applications. For example, if one of the businesses being
supported by the architecture desires reporting to be performed that is non-real-time and non-
intrusive, the backup data present on the secondary disk may serve as a means to access that data
without impacting production systems. Such methods should NEVER be used if the data itself is to be modified in any way, but for simple reporting, working from the backup copy is a good way to reduce the load on production systems.


Benefits of Share-Data Approaches


Share-data approaches to storage architecture help to reduce the risk of data loss, increase productivity and collaboration, and streamline backup and recovery processes and expenses, which is why they have become so popular. Over the years, as the ability to share data or “centralize” data storage has evolved, enterprise architects have battled to maintain availability,
scalability, and manageability across their enterprises. However, the need for expansion and
growth compounded by a sprawling file-serving architecture has crippled many storage solution
initiatives. The benefits of share-data approaches are
• Reduction of complexity
• Increased performance
• Increased scalability
• High availability
• Consolidation of storage
• Simplification of management
In many data centers, storage is an afterthought, something that is considered to be a byproduct
of an application installation. Because of this mindset, storage is often dedicated to a server or
collection of servers specific to the application systems they support. In a share-data approach,
all servers can see all the data, storage is consolidated, and this architecture aggregates I/O
performance and enables enterprise storage architects to greatly simplify storage management.
By approaching storage management with a share-data approach, the enterprise can consolidate
existing storage and scale the environment as a whole without directly impacting any one
application. This scalability means that a share-data approach is highly flexible, and by sharing the data throughout the storage infrastructure, the design is inherently fault tolerant, allowing for failover with virtually no application disruption. Share-data approaches centralize and simplify the storage infrastructure, containing it within a single manageable solution.

Comparison: Consolidated vs. Distributed Backup Architectures


Consolidation and simplification will be the focus of countless IT projects this year as
organizations strive to reduce their total cost of ownership and leverage new, higher-performing platforms that offer increased storage density. Depending upon the project and
subsequent storage application, consolidation can make a great deal of financial sense. The
results of such a project can be felt directly on the bottom line by reducing hardware and
software licensing costs, ongoing maintenance fees, and power and network requirements.


When comparing consolidated and distributed backup architectures, several key points of contrast come to light. The first and broadest in scope is that there are many different hardware and software options to choose from, each supporting its own protocols and design conventions, that come together to form the overall storage strategy. The following list highlights the key points to examine:
• Availability
• Scalability
• Interoperability
• Data protection

Distributed Approach
In a distributed environment, file servers and supporting infrastructure abound, and although
traditional methodologies for providing high data availability have driven the enterprise storage
infrastructure in this direction, such environments are more costly to maintain. High-availability
has for years been the siren song of the distributed approach. In a distributed environment, there
are more copies of the data on hand to meet the availability requirements of the business, but the
concept of high-availability is not exclusive to distributed environments. Today, consolidated
data storage can meet the demands of high availability as readily as many distributed approaches
and in an architecture that lends itself more easily to centralized management and scalability.
Scaling a distributed architecture to meet the growing needs of the enterprise brings significant costs: new software, servers, drive arrays, and supporting infrastructure must be brought online, and the cost quickly adds up. Over time, as new vendors present new options in storage, the sprawl of devices within your storage environment becomes
increasingly difficult to manage. Storage administrators now face new challenges as a byproduct
of years of distributing their environments and struggle to maintain interoperability of storage
products within the enterprise.
Planning and providing for common data protection within the enterprise storage environment is
a bit more difficult in the distributed approach. Consider, for example, the process of moving
large backup jobs in a distributed approach. With a dozen servers each being backed up
independently to separate, distributed storage devices, the operational and subsequent disaster
recovery of these systems can become quite complex. If experience has taught the enterprise
storage community anything, it’s that recovery needs to be as simple and swift a process as is
technically and humanly possible.


Consolidated Approach
Server and storage simplification and consolidation are delivering great benefits to enterprise IT infrastructure. Technologies are continually being developed to increase the return on investment
for costly mission-critical equipment. Servers, infrastructure, and storage devices aren’t just
costly to purchase, they’re also expensive to maintain and support. Thus, leveraging these new
advances can produce real financial savings.
Maintaining high-availability in a consolidated environment is no longer the exercise in futility it
once may have seemed. However, many still view the consolidated approach to storage architecture as putting too many eggs in one basket. The fact of the matter is that today
consolidated storage networks can be just as highly available and resilient as their distributed
counterparts. Redundancy methods and technologies—such as redundant host bus adapters
(HBAs) to protect against cable failure, multi-pathing software, and resilient connectivity
paths—have been developed to facilitate more highly available, robust, and resilient storage
solution architectures.
A consolidated architecture excels in the scalability arena. Existing consolidated solutions can be
scaled up much more easily than a distributed approach, which would often require not only the
expense of the additional equipment but also the supporting architecture. Providing for common
data protection is significantly simplified in a consolidated storage architecture. In its simplest
form, a consolidated storage solution can offer a virtual one-to-one backup architecture. To
illustrate the savings, consider an environment of 60 application servers geographically dispersed, each with its own individual storage requirements and supporting storage devices.
To maintain common data protection throughout this environment, the backup and recovery
efforts would need to be centered on those data storage devices and the subsequent supporting
infrastructure; whether this results in individual tape drives for the servers or some other
removable media, the cost to maintain the backups can become burdensome. Next consider the
same 60 devices, but instead of a tape drive, they’re all linked to a common SAN. The backup
effort has now been reduced by a factor of 60 because the data has been consolidated to a single,
albeit virtual, location. Figure 6.4 illustrates a potential layout for a dedicated SAN contingency environment.
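The arithmetic behind that factor-of-60 reduction can be made explicit. The figures below are purely illustrative assumptions (one tape drive and one backup job per server in the distributed case), not measured costs.

# backup_consolidation_math.py -- illustrative comparison only; all figures are assumptions.
SERVERS = 60

# Distributed: each server has its own tape drive and its own nightly backup job.
distributed_jobs = SERVERS           # 60 independent jobs to schedule and monitor
distributed_media_sets = SERVERS     # 60 sets of tapes to rotate and track

# Consolidated: all 60 servers share a SAN, so one backup job protects the shared storage.
consolidated_jobs = 1
consolidated_media_sets = 1

print(f"Backup jobs: {distributed_jobs} -> {consolidated_jobs} "
      f"({distributed_jobs // consolidated_jobs}x reduction)")
print(f"Media sets:  {distributed_media_sets} -> {consolidated_media_sets}")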


Figure 6.4: An example dedicated SAN contingency environment.

Data Recovery
The only reason to make a backup copy of any data is to be able to restore that data after it has
either been lost or damaged. Data recovery is the process of recovering data from primary
storage media when it cannot be accessed normally as a result of physical damage to the storage
device or logical damage to the file system that prevents it from being mounted by the host OS.
Physical damage of a storage device can occur for a multitude of reasons ranging from
malfunction of the physical inner workings of the device—such as a magnetic I/O head of a drive
making physical contact with one of the drive platters—to disasters that impact the equipment
directly as in the case of a fire or a flood. Most organizations lack the facilities, tools, and
experience to recover physically damaged media in-house and must rely on external data
recovery centers that specialize in the recovery of data from physically damaged media. This is a costly undertaking: not only is the data unavailable, burdening the lines of business that depend upon it, but the recovery process itself can be extremely expensive. The
impacts and implications of physical damage are strong motivators to design and implement
recovery solutions that reduce or completely eliminate dependency upon any one physical
device.


Logical damage is by far the most common data recovery focus; fortunately, the ease with which logical damage can be inflicted is largely offset by the relative ease (compared with physical damage) with which it can be repaired. Logical damage
is often the byproduct of a sudden loss of power to a file storage device that prevents the file
system structures from being completely written to the storage medium; however, problems with
hardware, drivers, supporting infrastructure, and system crashes can have the same effect. The
result is that the file system is left in an inconsistent state. This situation can cause a variety of
problems, such as strange behavior (for example, infinite directory recursion, drives reporting
negative amounts of free space), system crashes, or an actual loss of data.
Various programs exist to correct these inconsistencies and most OSs come with at least a
rudimentary repair tool for their native file systems. Linux, for instance, comes with the fsck
utility, and Microsoft Windows provides chkdsk. Third-party utilities are also available, and
some can produce superior results by recovering data even when the disk cannot be recognized
by the OS’s repair utility.
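To make the repair step concrete, the following is a minimal sketch that dispatches to the native repair utility by platform. The device and drive-letter values are hypothetical, and on Linux the target file system must be unmounted before fsck is run against it; treat this as an outline rather than a production recovery script.

# fs_repair.py -- minimal sketch; device and drive values are hypothetical placeholders.
import platform
import subprocess

def repair(target):
    if platform.system() == "Windows":
        # chkdsk /F attempts to fix file-system errors on the given volume.
        cmd = ["chkdsk", target, "/F"]
    else:
        # fsck -y answers "yes" to repair prompts; the device must be unmounted first.
        cmd = ["fsck", "-y", target]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # Examples only -- substitute the volume that is actually damaged before uncommenting.
    # repair("D:")          # hypothetical Windows volume
    # repair("/dev/sdb1")   # hypothetical Linux block device
    pass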

The Advantages of Freedom


Throughout this chapter, you have seen many pitfalls to data storage integrity and availability. There is an overwhelming abundance of storage solution vendors, each with its own architecture and agenda to fulfill. Although they may be working to align their interests with the needs of enterprise consumers, to date there is no single, universally accepted miracle solution that meets all the needs of all consumers. The question then becomes which solutions most closely align with the needs of your enterprise and how you can leverage those solutions to meet those needs. Central to this concern is the ability of today’s solutions to thrive in future environments. Freedom from vendor-defined proprietary solutions and standards is
critical in maintaining the flexibility and scalability of a storage architecture.

Benefits of Avoiding Proprietary Solutions


The drive for innovation is intoxicating. Vendors continually work to develop solutions and
market those solutions based upon a foundation of their own technology to further their stake in
the enterprise market; foundation being the operative word. When building an enterprise storage architecture, architects should approach their task like the proverbial wise man who built his house upon rock rather than sand.
Building an enterprise foundation that includes proprietary solutions is akin to building a house
upon sand, which shifts over time and provides little stability against the elements—which, in storage architecture, are the constant battering of change and its impact on the scalability, availability, and resilience of the solution. To avoid the pitfalls of this approach,
build a storage solution that embraces standardization and openness.


Proprietary solutions come with a price in that they can be difficult to manage, and these difficulties often increase in direct correlation to the size of the implementation. Although a
small organization can often withstand the year-to-year changes in proprietary solutions and
standards, larger organizations have a much more difficult time weathering the storms.
Proprietary solutions are, by definition, difficult, if not impossible, to integrate with other
solutions, so although a proprietary solution touted as highly scalable today may seem to meet the immediate needs of the business, a wise storage architect will weigh other factors, not the least of which is the solution’s ability to be integrated with other key architectures.

Uncapped Scalability and Performance


If an enterprise file storage system is to be measured by anything, it is scalability and performance. For a solution to be viable, it must perform to meet the needs of the business, and
to stay viable, it must scale to meet the growing needs of the organization.
In a distributed environment, scalability has been historically hindered by the dedication of
servers to fulfill a specific storage role. A consolidated share-data approach is unhindered by those constraints because new storage capacity can be brought online without directly
impacting any of the applications supported by the environment. New servers and storage can be
added as needed with no service disruption, enabling the storage environment to grow without
directly impacting the business it is designed to support.
Some third-party solution vendors, such as PolyServe and Network Appliance (NetApp), offer
solutions that enable truly uncapped scalability and performance. As your enterprise storage
requirements grow, such solutions’ architecture facilitates the growth of the storage environment
by providing flexibility in architecture and freedom of choice.

Architecture Flexibility
Some third-party solution architectures enable storage growth and the ability to “scale on
demand,” which means that your storage architecture can remain agile and flexible to meet the
growing needs of your business. By providing a centralized, consolidated, yet highly available and flexible storage architecture, these solutions enable the enterprise to better respond to changing business requirements, meeting increased demands for capacity and performance whose cost might once have been prohibitive.

Freedom of Choice
The storage solutions offered by PolyServe and NetApp are not constrained by hardware platform or OS to the same degree other solutions may be, and they enable multiple low-cost Linux- or Windows-based servers to function as a single, easy-to-use, highly available system. For
example, PolyServe’s Matrix Server includes a fully symmetric cluster file system that enables
scalable data sharing, high-availability services that increase system uptime and utilization, and
cluster and storage management capabilities for managing servers and storage as one solution
independent of hardware platform or supported application system. All of this equates to
freedom of choice, and because storage administrators are unburdened by a dependency on any
one particular hardware or software platform, they are free to reuse existing infrastructure or
more closely align their storage infrastructure with enterprise IT hardware and software roadmaps, easing administration, management, and compliance efforts.


Summary
Throughout this chapter, you have seen how storage solutions are often the critical pivot point
for management and business concerns. The abilities of an enterprise storage environment to
remain agile, reduce complexity, seamlessly integrate, and reduce risk are all concerns that have
a direct financial impact on business operations. Disaster recovery planning has been underscored as a central theme of storage management and one that deserves due care throughout the life of your enterprise environment. Remaining flexible and not allowing your environment to become constrained by proprietary solutions will give you the freedom required to remain agile and manage the environment as a whole.
As you have seen throughout the course of this guide, there are many disk, server, performance,
and availability choices at your disposal—each with their own benefits and limitations. As you
continue to work to bring harmony to your enterprise storage solution environment, battle current
file-serving storage growth problems, provide for data path optimization, and set out to build
high-performance, scalable, and resilient Windows and Linux file-serving solutions, remember
to keep an eye on the big picture. Storage solutions are about more than just providing storage;
they’re about enabling the business to succeed by providing quick and easy access to data
whenever and wherever it is required in a manner that is cost effective to implement, manage,
and maintain.
