
Intelligent Storage ™

Data Consolidation:
Benefits and Implementation Guidelines

A White Paper from Advanced Digital Information Corporation

www.adic.com
ADIC WHITE PAPER

Table of Contents

Introduction

Why the Push for Consolidation?

Consolidated Storage

Data Consolidation: It’s All about the Data

NAS Devices

SAN-Based Data Consolidation

Guidelines for Implementing Enterprise Data Consolidation

Conclusion

DATA CONSOLIDATION

Enterprise Data Consolidation: Guidelines for Leveraging the Benefits of Consolidated Data
This white paper discusses the differences between storage consolidation and
data consolidation. By consolidating data, organizations can realize significant
benefits in storage management and data availability, while better utilizing
storage capacity. This paper explains different approaches to data consolidation
—including NAS devices and SAN file systems—and offers guidelines for
planning and implementing this new approach to data management.

Introduction
The last two decades have seen the distribution of processing and data management
throughout the enterprise. Now we’re reversing that trend, particularly for storage.
Consolidation is the new IT mandate, and storage consolidation is a big part of it. Storage
consolidation is driven by the cost and complexity of managing growing amounts of critical
storage, and the recognition of data as a corporate asset that must be available and accessible.

There is an important, often overlooked distinction between consolidating storage and
consolidating the data itself. Storage consolidation addresses some of the problems facing
IT organizations today in terms of data management costs. But actual data consolidation is
essential to meet service level requirements for increasingly data-dependent organizations
without creating debilitating administrative overhead.

Until recently, data consolidation was difficult to achieve and was limited to file servers
and NAS devices. Now, a new generation of distributed, heterogeneous SAN file systems is
making data consolidation both possible and strategically important. For example, ADIC’s
StorNext Management Suite offers a SAN file system that provides scalable data consolidation
in heterogeneous environments, reducing the cost and effort of managing enterprise data.

Why the Push for Consolidation?


In the mainframe computing model, all computing and storage are consolidated. The advent of
powerful, smaller processors helped fuel the drive towards the client/server computing model.
This approach, involving multiple, interconnected programs distributed across different CPUs,
proved more flexible and scalable in many ways than mainframes.

Today, 20 years after the general introduction of client/server computing, businesses rely on
computing in new and strategic ways. As the needs for data and processing capacity grew,
this model generated islands of data that were critical to business operations. Managing the
distributed data has been especially difficult in the areas of data availability and data protection.


From a financial perspective, IT departments are spending increasing amounts of their budget
on storage, even if the overall budget is flat. Managing distributed storage is costly and
inefficient. Consolidation is being driven by escalating demands for storage, high service level
requirements, and the administrative costs of meeting those demands.

Most data still resides in a traditional, direct-attached storage model, distributed throughout
the enterprise. This is often true even for applications in which multiple computers serve the
same purpose with the same data.

Although the direct-attached model works well for smaller environments, it has serious
limitations as an enterprise strategy for managing data:

• Multiple management points make it difficult to administer storage and apply
consistent policies.

• Different storage platforms and configurations are served by different management
utilities, further adding to the cost of managing the data.

• Data is duplicated throughout the infrastructure, leading to versioning, transfer, and
synchronization problems.

Figure 1. A typical direct-attached storage environment, creating and maintaining multiple copies of data (diagram: clients on a local area network served by a file server farm)

In short, the direct-attached storage model distributes administration as well as data
throughout the enterprise. This makes it difficult and labor-intensive for organizations to
ensure adequate protection for their critical data. Administrative capacity is often the
bottleneck for growth using this model.


Reducing the cost of storage administration is the big payoff for consolidating storage.
According to IDC, storage administrators can manage up to nine times more data in a
consolidated SAN environment than in a direct-attached model. Even accounting for the
potentially higher costs of network administrators for this environment, management costs are
reduced by 33%. [Source: IDC: Leveraging Networks for Storage Consolidation, October 2001.]

The cost of storage itself is a secondary but still significant factor. The direct-attached model
is extremely wasteful. Because running out of storage can stop a server cold, most administrators
over-provision storage, trading unused storage for the promise of availability. Magnify
this over-provisioning by hundreds or thousands of servers and the waste is enormous.
In IT departments facing fixed budgets and growing service-level demands, this kind of
waste diverts funds from other areas of the IT budget.

Although the factors driving consolidation are strong, hard experience has taught us that
new technology paradigms don’t always result in a quick return on investment. So, rather
than taking an ad hoc, opportunistic approach to consolidating storage and data, it pays
to understand the basis for these technologies and to analyze what you hope to achieve.
First, we’ll compare the general concept of storage consolidation with data consolidation.
Then, we’ll review and contrast two different strategies for consolidating data: NAS devices
and SAN file systems.

Consolidated Storage
Storage consolidation is a first step towards reducing the cost and complexity of storage
management—and it’s as far as many organizations have gotten to date. Organizations have
been cautiously adopting storage area networks (SANs) in hopes of achieving the benefits of
consolidated storage.

Most SANs primarily serve to consolidate access by multiple servers to storage devices, such
as arrays or high-end tape devices. Network administrators then use storage virtualization
software to make the consolidated storage behave as a single virtual disk. The physical drives
are accessed via their Logical Unit Numbers (LUNs). Several companies provide LUN
virtualization software to ease the overall management of host-to-device connections,
but these techniques can be cumbersome to implement. To prevent different servers from
overwriting the same data, LUN masking and zoning techniques are employed to help ensure
that only one server can access any specific storage resource. Although the storage is
consolidated physically, it is still logically tied to a specific server.
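As a rough illustration of the single-owner rule that LUN masking enforces, the Python sketch below models an array's masking table as a map from LUN number to the hosts allowed to see it. The `StorageArray` class, the host names, and the strict one-host-per-LUN check are hypothetical simplifications; real arrays and switches implement masking and zoning in their own management tools, and many can present one LUN to several hosts (for example, in a cluster).

```python
from dataclasses import dataclass, field


@dataclass
class StorageArray:
    """Toy model of LUN masking: each LUN is presented only to the hosts listed for it."""

    masks: dict[int, set[str]] = field(default_factory=dict)

    def mask(self, lun: int, host: str) -> None:
        # Hypothetical one-server-per-LUN rule, mirroring the text: once a LUN
        # is presented to one host, presenting it to a second host is refused.
        owners = self.masks.setdefault(lun, set())
        if owners and host not in owners:
            raise PermissionError(f"LUN {lun} already belongs to {sorted(owners)}")
        owners.add(host)

    def can_access(self, host: str, lun: int) -> bool:
        # A host missing from a LUN's mask cannot see or write that LUN at all.
        return host in self.masks.get(lun, set())

    def visible_luns(self, host: str) -> list[int]:
        # The LUNs this host would discover when it scans the SAN.
        return sorted(lun for lun, owners in self.masks.items() if host in owners)


array = StorageArray()
array.mask(0, "server-a")
array.mask(1, "server-b")
print(array.visible_luns("server-a"))   # [0]
print(array.can_access("server-b", 0))  # False
```

Calling `array.mask(0, "server-b")` raises `PermissionError`: the storage sits in one physical array, yet each slice is still logically tied to a single server.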


The physically consolidated storage model, shown in Figure 2, has several advantages over
traditional, direct-attached storage:

• You now have fewer management points for managing storage resources. There are
fewer independent devices to back up, with storage devices in a centralized location.

• You can allocate storage more efficiently, hosting multiple servers with one large array
instead of a separate array for each. Or, you can use JBOD (just a bunch of disks) storage on
the SAN and add new disks to the virtual volumes as storage needs increase.

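Figure 2. The storage is consolidated, but the data is not; storage is partitioned between servers (diagram: SGI, Windows NT or 2000, Sun, and Linux servers, including Servers A and B, each owning its own partition of the shared SAN storage)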

However, the physically consolidated storage model still has some inefficiencies. If you look
closely at Figure 2, you can see that the link between server and data has not been broken.
The disk devices on the SAN are still sliced into individual partitions for each server; data
cannot be shared between them. Each operating system still “owns” its own slice of disk.
In fact, if Server B needs data from Server A, Server A must transfer the data over the LAN to
Server B. In workflow or other data-sharing environments, this generates a significant amount
of network traffic.

Consolidating only physical storage has several downsides. First, consider the common but
essential task of backup. Although the storage is consolidated, regular backups still occur
on a server-by-server basis, with backup data typically traveling through the LAN to the
backup server.


Second, administrators are still faced with issues of data synchronization and versioning.
Capacity planning occurs on a server-by-server basis. And because administrators have to
provision adequate storage for each server, you still have underutilized storage.

Figure 3. Backups from a physically shared array still travel through each host server (diagram: Servers A, B, and C and a backup server on the LAN, with SAN storage and a tape library)

Finally, this model does nothing to address the problem of data proliferation. Think again
about the direct-attached example. Assume that each server uses mirrored storage to ensure
availability of data; for three servers, this results in six copies of the same files. Larger server
farms clearly have significant data redundancy.

This is the state that many organizations find themselves in today. Having achieved the initial
benefits of storage consolidation in early SANs, they struggle with storage management tasks
such as backup and capacity planning, and still allocate a significant (and increasing) part of
the IT budget to storage. Although they have proven valuable, most SANs have failed to
deliver the dramatic, paradigm-shifting benefits promised by a complete overhaul of the
system architecture. What’s required is a full shift to data consolidation.


Data Consolidation: It’s All about the Data


Once you’ve consolidated storage in a SAN, the groundwork is in place to consolidate the
data itself. Logical consolidation is primarily a matter of software. The file system intelligence
moves from the individual server level to the SAN level, enabling multiple computers to use
shared storage as a single logical entity.

This model differs from the physically-consolidated model in several important ways. First
and foremost, no single server “owns” the data. Rather, a distributed, independent file system
owns the disk and manages data access between the various systems. Multiple clients, which
may run on different operating system (OS) platforms, can share concurrent access to files
without compromising data integrity. The benefits of data consolidation are many:

• Data availability improves; if one server is unavailable, its data remains available to
other servers.

• Productivity improves in workflow environments; no time is spent transferring data
between servers.

• Network bottlenecks caused by data transfers are eliminated.

• Redundant storage is eliminated; instead of four copies of a web page, for example, you
store a single copy on highly available storage.

Figure 4. Using data consolidation, all servers share the same pool of data (diagram: SGI, Windows NT or 2000, Sun, and Linux servers all accessing one shared pool of files)

But the big payoff, again, is in the cost of management. You no longer have to keep separate
disk partitions for each server. Provisioning storage is much easier—you can grow and resize
file systems according to their actual usage, rather than allocating capacity among multiple
machines. Capacity planning is greatly simplified, and there is less data to back up or replicate.
With simplified management, each storage administrator can manage a much larger pool of
storage, providing administrative scalability.
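The provisioning argument can be made concrete with a little arithmetic. The sketch below compares partitioned storage, where every server carries its own safety margin, with a single shared pool that carries one margin for everyone. All figures (100 servers, 40 GB in use on each, a 100% per-server margin, a 25% pool margin) are assumptions for illustration, as is the premise that a pool can run a smaller margin because individual servers rarely hit their peaks at the same time.

```python
# Hypothetical illustration: the server count, usage, and margin figures are
# assumptions chosen for the example, not measurements from this paper.


def partitioned_capacity(usage_gb: list[float], headroom: float) -> float:
    """Each server gets its own slice, sized to its own usage plus a safety margin."""
    return sum(u * (1 + headroom) for u in usage_gb)


def pooled_capacity(usage_gb: list[float], headroom: float) -> float:
    """One shared file system, sized to total usage plus a single safety margin."""
    return sum(usage_gb) * (1 + headroom)


usage = [40.0] * 100  # assumed: 100 servers, each actually using 40 GB
partitioned = partitioned_capacity(usage, headroom=1.0)  # assumed: provision 2x per server
pooled = pooled_capacity(usage, headroom=0.25)           # assumed: 25% margin on the pool

print(f"partitioned: {partitioned:.0f} GB, pooled: {pooled:.0f} GB")
# partitioned: 8000 GB, pooled: 5000 GB
```

Under these assumed numbers the pool needs 3,000 GB less raw capacity. Different workloads could widen or narrow the gap, so treat the result as a way to frame your own capacity plan rather than as a prediction.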

There are two general approaches to consolidating data in file systems: network file servers or
NAS, and distributed file systems on a SAN. The two approaches have significant architectural
and functional differences, described next.


NAS Devices
Network file servers using protocols such as NFS and CIFS have been around for years,
providing consolidated file services over local networks. Network Attached Storage (NAS)
devices offer the same functionality in specialized systems optimized for supporting network
file systems.

When a computer requests data from the NAS, the NAS accesses its storage array, obtains the
necessary disk blocks, and converts the data into a generic file format. Then it sends the data
across the LAN to the originating computer where it is converted to the local file format.

Figure 5. Using NAS, all data travels through the NAS and over the local network (diagram: servers on a LAN reaching disk only through a NAS appliance)

A NAS appliance offers the convenience of adding file storage to a network almost instantly.
It works well in many workgroup environments. But as your needs grow, NAS storage has
several limitations:

• Managing large files: The NAS approach works well when serving small (under 1 MB) files in
high-transaction environments. But with larger files, the conversion to file format introduces
significant latency. The local network may introduce latency as well, depending on
traffic. Sometimes users keep data that requires high performance locally instead of on
the NAS, compromising data consolidation.

• Scaling capacity: Some NAS vendors suggest that files larger than 10 MB be distributed
across multiple NAS devices. And often the best way to add significantly more storage is
to put yet another NAS device on the network. This only serves to further distribute, not
consolidate, data. Before you know it, you’re managing multiple NAS servers in a model
that looks very similar to the original direct-attached storage model.

• Protecting NAS data: Few NAS devices offer robust back-end processes for moving data
onto tape or other devices. Most often, backups must be processed through the NAS box
itself and then written to a backup server, creating a huge performance problem for
around-the-clock environments. It is also often difficult to integrate NAS backups into
enterprise-wide data protection policies: because of the “black box appliance” nature
of the NAS, the device’s own island of data is isolated from enterprise policies.


SAN-Based Data Consolidation


The other approach to consolidating data is to create a file system that works on the SAN,
enabling multiple computers to share access to the same set of files. This approach has the
benefit of providing the performance of direct-attached storage for all file types, while enabling
data sharing among different platforms.

Figure 6 illustrates how SAN file systems work:

Figure 6. Meta data travels the LAN, while data access occurs over the SAN (diagram: a meta data server and several servers on the LAN, all attached through a fabric, a high-speed, low-latency SAN, to shared disk)

Data access requests go through the meta data server via the local area network. If an
application requests read access to a shared file, it sends the request to the meta data server,
which does the following:

• Determines if the file is available for the request;

• Determines if the application or user is authorized to access the file;

• Directs the client to the file location on the SAN.

The client then reads the SAN data directly, without further interaction with the meta data
server. The meta data transaction that precedes the actual data transfer is short and fast.
This “out-of-band” approach eliminates potential performance bottlenecks, particularly for
large files. It contrasts sharply with the NAS model, in which all requests, meta data and data
alike, go through the NAS device.
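The request flow above can be sketched as a toy Python model. The meta data server answers the three questions in the list (is the file available, is the user authorized, where does it live) and returns only block locations; the client then reads those blocks from a shared byte array standing in for the SAN volume. The class names, the 4-byte block size, and the simple lock flag are hypothetical and are not taken from StorNext or any other product.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical toy block size in bytes


class FileUnavailable(Exception):
    """Raised when another client currently holds the file exclusively."""


@dataclass
class FileEntry:
    blocks: list[int]     # block numbers on the shared SAN volume
    size: int             # file length in bytes
    readers: set[str]     # users allowed to read the file
    locked: bool = False  # True while a writer holds an exclusive lock


class MetaDataServer:
    """Answers small requests over the LAN; never touches file contents."""

    def __init__(self, files: dict[str, FileEntry]):
        self.files = files

    def open_for_read(self, path: str, user: str) -> tuple[list[int], int]:
        entry = self.files.get(path)
        if entry is None:
            raise FileNotFoundError(path)
        # 1. Is the file available for this request?
        if entry.locked:
            raise FileUnavailable(path)
        # 2. Is this user authorized to read it?
        if user not in entry.readers:
            raise PermissionError(f"{user} may not read {path}")
        # 3. Direct the client to the file's location on the SAN.
        return entry.blocks, entry.size


def client_read(mds: MetaDataServer, san: bytearray, path: str, user: str) -> bytes:
    blocks, size = mds.open_for_read(path, user)  # short meta data exchange
    # Data path: the client reads the blocks straight from the SAN volume.
    data = b"".join(bytes(san[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]) for b in blocks)
    return data[:size]


san = bytearray(b"SAN-xxxxdata")
mds = MetaDataServer({"/proj/model.dat": FileEntry(blocks=[0, 2], size=8, readers={"alice"})})
print(client_read(mds, san, "/proj/model.dat", "alice"))  # b'SAN-data'
```

In a real deployment only the `open_for_read` exchange would cross the LAN; the block reads in `client_read` correspond to SAN traffic, which is why the meta data server does not become a bottleneck for large files.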


A SAN-based file system provides the following services:

• It supports clients of different OS platforms, enabling file sharing between a wide range
of systems. This lets organizations choose the right system for the right task, without
worrying about data format.

• It allows any SAN client to potentially read or write data without data integrity issues.
Often this is managed by a meta data server, which gives clients access to the file meta
data. Clients then access the file data directly from the SAN, at Fibre Channel speed.

• File sizes and bandwidth scale independently. Each SAN file system client has direct
access to the data at Fibre Channel speeds, and bandwidth can be added through
additional Fibre Channel connections.

Simplified data administration is a major benefit of data consolidation. Active storage
management capabilities can extend this benefit. Once data is consolidated into functional file
systems, it is easier to assign specific storage characteristics, such as bandwidth and
protection, to specific classes of data.

The shared file system has extraordinary knowledge about the data it controls: creation date,
last access, creating application, permissions, and so on. Active storage management systems
can leverage this knowledge to track data from inception to archival. For example, you can
ensure that critical data is replicated to another facility and stored on highly available storage,
while migrating little-used or older data to high capacity tape devices.
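A policy engine of the kind described here can be approximated as a function from file attributes to storage actions. The Python sketch below encodes the two example policies in this paragraph: critical data is replicated and kept on disk, and data not accessed for a set period is migrated to tape. The `FileRecord` fields, the 90-day threshold, and the action names are assumptions for illustration; they are not part of Total Data Life Management or the StorNext Management Suite.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime  # tracked by the shared file system
    critical: bool         # assumed: set by an administrator-defined policy class


def plan_actions(files: list[FileRecord], now: datetime,
                 archive_after: timedelta = timedelta(days=90)) -> list[tuple[str, str]]:
    """Map each file to one storage action, following the two example policies."""
    actions = []
    for f in files:
        if f.critical:
            # Critical data: replicate to another facility, keep on highly available disk.
            actions.append((f.path, "replicate-and-keep-on-disk"))
        elif now - f.last_access > archive_after:
            # Little-used data: migrate to high-capacity tape.
            actions.append((f.path, "migrate-to-tape"))
        else:
            actions.append((f.path, "keep-on-disk"))
    return actions


now = datetime(2003, 2, 1)
files = [
    FileRecord("/db/ledger", datetime(2003, 1, 31), critical=True),
    FileRecord("/cad/old-part.dwg", datetime(2002, 6, 1), critical=False),
    FileRecord("/cad/new-part.dwg", datetime(2003, 1, 20), critical=False),
]
for path, action in plan_actions(files, now):
    print(path, action)
```

Exempting critical files from migration is a design choice made for this sketch; a real policy might still copy old critical data to tape while keeping a disk-resident replica.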

ADIC refers to this concept as Total Data Life Management™, which is the foundation of the
StorNext Management Suite software. ADIC went beyond its SAN file system to create
integrated software that includes policy-driven, automated storage management—offering
administrative efficiencies beyond those inherent in the consolidated data model. This
automated data management and protection magnifies the administrative cost savings
of consolidating data, while ensuring consistent data protection processes.

Figure 7. Integrated solution: clients have shared access to policy-managed, consolidated SAN storage (diagram: SGI, Linux, Sun, and Windows NT or 2000 clients on a storage area network with RAID, JBOD, and an automated tape library under policy management)


Guidelines for Implementing Enterprise Data Consolidation


The maxim “think globally, act locally” applies well to the process of consolidating enterprise
data. Few organizations are in a position to make wholesale infrastructure changes—most will
phase in innovations. However, in selecting solutions for data consolidation, you need to plan
for the long-term data storage needs of your organization.

With that in mind, here are some general guidelines for data consolidation:

• Take an integrated approach to data consolidation. Data consolidation requires servers,
network infrastructure, disk storage, tape storage, and enabling software. A SAN alone
is not enough; you need the right software to leverage it fully. Although plug-and-play
SANs are the wave of the future, today you should look for vendors that help you create
a proven, integrated SAN file system.

• Choose platform-independent solutions. Avoid locking into one hardware vendor.
Throughout your enterprise, computer platforms are chosen for specific and important
reasons. One may offer better floating-point performance, while another may handle large-scale
parallel processing particularly well. Cost is another significant consideration in
selecting platforms. The distributed file system must support access from a wide range
of platforms if it is to offer true enterprise data consolidation.

• Integrate data protection into the solution. Your goal is reducing the cost of managing
storage, without compromising data protection. Make sure that data protection processes
like backups, vaulting, and replication are integrated with your SAN file system.

• Leverage opportunities for high availability. The shared file system itself enhances data
availability by providing multiple paths to data. With many paths to shared storage,
clustering and failover become much easier to implement.

• Build for growth. Be sure that your consolidated solution can grow both in terms of
bandwidth and capacity to handle present and future needs.

• Start automating data management. Leverage the file system’s knowledge about the
data and its usage in active storage management systems that track data from inception
through archival.

In selecting projects for SAN file systems, you’ll want to start with those that deliver the
greatest and most visible ROI. You may find that these projects address problems other than
storage administration costs, such as escalating hardware costs, productivity, or network
bottlenecks. Addressing these problems with a SAN file system will give you the metrics you
need to quantify and justify the administrative cost reduction case.


Here are some suggestions for choosing initial data consolidation projects:

• Replace multiple file servers with a SAN file system. Users are already accustomed to sharing
files on file servers; by using a SAN file system you can consolidate several servers into
a shared pool of storage, managed by a single SAN file system. Users will see an
immediate performance improvement in shared storage, while administrative overhead
is reduced. Available network bandwidth should also improve as file transfers are
removed from the LAN.

• Identify workflow applications, particularly those that are data-intensive. In many cases,
the shared file system pays for itself quickly by speeding workflow processes and
reducing redundant storage. It can also lead to a significant net increase in local area
network bandwidth, as large file transfers are removed from the network while servers
access SAN data directly. Good candidates for data consolidation include CAD/CAM
applications and geospatial processing, where multiple people work with large data files.

• Identify applications in which multiple servers provide the same basic functionality with
the same data; these are great candidates for early data consolidation. Moving to a shared file
system for data that is primarily read-only provides immediate and significant storage
savings, while reducing content distribution and synchronization efforts. The storage
cost savings alone can recoup the software investment rapidly.

Conclusion
According to IDC, “The trend toward networked storage will become the disk data storage
paradigm of the future precisely because of the business advantages.” [Source: IDC:
Leveraging Networks for Storage Consolidation, October 2001.] No one changes their IT
infrastructure without good reason to believe that the effort will pay off, either in immediate
savings or in enabling future growth and technology.

There are different ways to implement consolidation. If you focus on storage alone, you will
limit the return on your networked storage investment. To achieve all the potential benefits of
consolidation, you must consolidate the data itself. While NAS devices offer some consolidation
benefits, SAN-based file systems offer the most scalable approach for enterprise data
consolidation. They also support multiple operating systems and can be extended to include
automated data protection.

ADIC’s StorNext Management Suite includes a SAN-based file system that enables concurrent
file sharing between heterogeneous systems on a SAN. Integrated storage management
capabilities track and apply data protection and management policies throughout the life of the
data. By consolidating data and automating data management, you can achieve the promises
of consolidation with significant reductions in cost and administration, and improve service
levels for data.


ADIC Global Headquarters

11431 Willows Road NE
P.O. Box 97057
Redmond, WA 98073-9757 USA
www.adic.com
Toll-Free: 800.336.1233
Phone: 425.881.8004
Fax: 425.881.2296

ADIC and StorNext are registered trademarks, and Total Data Life Management is a trademark of Advanced Digital Information Corporation.
© 2003 Advanced Digital Information Corporation.

WPEDC 0203
