
SAN Workshop

Legal Disclaimer
 All or some of the products detailed in this presentation may still be under development and certain specifications, including but not limited to, release dates, prices, and product features, may change. The products may not function as intended and a production version of the products may never be released. Even if a production version is released, it may be materially different from the pre-release version discussed in this presentation. NOTHING IN THIS PRESENTATION SHALL BE DEEMED TO CREATE A WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT OF THIRD PARTY RIGHTS WITH RESPECT TO ANY PRODUCTS AND SERVICES REFERENCED HEREIN. Brocade, the Brocade B-weave logo, Brocade, Fabric OS, File Lifecycle Manager, MyView, Secure Fabric OS, SilkWorm, and StorageX are registered trademarks and the Brocade B-wing symbol and Tapestry are trademarks of Brocade Communications Systems, Inc. or its subsidiaries, in the United States and/or in other countries. FICON is a registered trademark of IBM Corporation in the U.S. and other countries. All other brands, products, or service names are or may be trademarks or service marks of, and are used to identify, products or services of their respective owners.
SAN Design Principles

August 2007

Agenda
- SAN Design Principles
- Other SAN Design Considerations


SAN Design Principles

Terminology
- SAN: The network for block-level storage connectivity
- Fibre Channel SAN: SAN based on SCSI over Fibre Channel (FCP) for open systems; includes switching elements (switches and directors), optics, cabling, and routers. FC represents more than 99% of the SAN installed base.
- Functional SAN: Grouping of Fibre Channel networks based on function and/or physical location. Typical functional SANs:
  - Disk SAN. Connectivity between individual hosts and disk LUNs. In most shops this is the biggest and most challenging SAN. Generally one or more paired fabrics (A/B).
  - Disk Replication SAN. Distance connectivity between storage subsystems for the purposes of data replication.
  - Tape SAN. Connectivity between tape users (hosts) and tape (real and/or virtual).


Terminology (continued)
- Physical Fabric: Network of interconnected Fibre Channel switches/directors. Services are distributed throughout all switches.
- Virtual Fabric: Logical grouping of ports in the physical fabric with a dedicated subset of fabric services (naming, zoning, etc.)
  - Hardware-based (e.g. Brocade M-series LPARs)
  - Software-based (e.g. Brocade Virtual Fabrics, Cisco VSANs)
- Routed SAN: Two or more fabrics connected via Fibre Channel inter-fabric routers. As in the IP/Ethernet world, it is not possible to scale a connectivity model indefinitely without moving to a hierarchical design. (The Internet is the most famous example.) SAN routers are more analogous to firewalls than to simple IP routers.


Terminology (continued)
- SAN Size:
  - Number of usable ports for all host and storage connectivity in all fabrics
  - In the Disk SAN there are always at least two independent sides for redundancy ("side A" and "side B"). There may be multiple fabrics in each "side".
  - In these discussions, when referring to the SAN size, we will generally be referring to one side
  - A scalable SAN design is able to meet the changing connection and bandwidth requirements of the business in a non-disruptive manner
- Fabric Size:
  - Number of usable ports for all host and storage connectivity in one physical fabric
  - In the Disk SAN there are always at least two independent fabrics for redundancy ("fabric A" and "fabric B")
  - A scalable fabric design is able to grow to the desired number of connections in a non-disruptive manner

Engineering Requirements
- The SAN design is an engineering project, just like a bridge
- In a bridge design the stakeholders have different requirements
  - Architect wants it to look slick and people to say "Wow, look at that bridge!"
  - Engineer wants it to be safe, never break and last 200 years
  - Users want it to have enough lanes to prevent congestion and always be open
  - Owner wants it to fit within budget, be easy and cost-effective to maintain and not have to worry about replacement for a long time


What Does the Fibre Channel SAN Do?
- It's the REALLY IMPORTANT network that provides the paths for I/O transactions (host-to-disk, host-to-tape, disk-to-disk, etc.)
  - No SAN, no storage
  - No storage, no applications
  - No applications, no business function
  - No business function, no customers
  - No customers, no revenue
  - No revenue... well, that's bad for everybody, including Brocade!
- The SAN should be invisible to the rest of the IT infrastructure
  - The best SAN is a silent SAN. Nobody should even know it's there!
- So, the most important thing about the SAN is that it has to work ALL THE TIME. It's all about AVAILABILITY!


SAN Business Requirements
- Availability/reliability - Always on
  - 99.999% availability (5 minutes per year of down time)
  - It must be as reliable as the old direct-attached technologies!
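The "5 minutes per year" figure follows directly from the availability percentage. A minimal sketch of the arithmetic, where the function name and the 365.25-day year are illustrative assumptions rather than anything from the slides:

```python
# Assumption: a year of 365.25 days; the slide only quotes the rounded result.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_minutes_per_year(nines):.2f} min/year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why five nines leaves only about five minutes a year.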


Storage Transaction Reliability
- What a pain when it happens during your web-based IP transaction
- Storage transactions cannot tolerate this
- Storage transactions need to be reliable with consistent response times

Key Business Requirement - Cost!
1. Connectivity costs - Port and bandwidth. Hardware, software. One time and recurring. Well understood. Mostly "up front".
2. Management costs - The people to run the SAN on a day-to-day basis. These costs are influenced by:
  - Rate of change
  - Complexity/simplicity of architecture
  - Quality of equipment
  - Number of points of potential congestion to be managed
  - Use of and quality of software tools
  - Outages (think of all the people that get involved when there is an unplanned outage!)
3. Business function impact costs
  - Outages
  - Lost opportunities

In the end it has to be about cost-effectively meeting the requirements of the business

Other Important Business Requirements
- Performance - Acceptable and consistent I/O response time. A few milliseconds.
- Efficiency - Efficiently use SAN resources (ports, bandwidth and storage)
- Flexibility - Easily define data paths as required to efficiently utilize capacity
- Scalability - Non-disruptively grow capacity as required
- Manageability - Does not require an army to manage!


SAN/Storage Management Costs - Stuff
- Not easy or practical to define industry standard costs, although many seek these mythical numbers and some experts claim to know them!
- It's all about the specific stuff in your environment:
  - Quantity: How much stuff do you have?
  - Quality: Do you have good stuff or bad stuff?
  - Diversity: Do you have one kind or many kinds of stuff?
  - Change: Do you have to make changes to your stuff all the time or does it stay the same?
  - Complexity: Is your stuff complex or simple?


Basic Design Inputs
1. SAN size - This is the "biggie". How many points of connectivity to design for? How many years out? How quickly will it grow?
2. I/O workloads - The workloads drive the need for the data paths in the first place (zero I/Os is easy to design for!):
  - Bandwidth - Data transferred per unit time (MB/sec). Congestion in the SAN data paths is caused by bandwidth demands. Most applications don't drive sustained high bandwidth (but backup does!)
  - Transaction rate - I/Os per unit time (IOPs). Congestion at storage ports may be caused by high IOPs. Applications like databases can drive high IOPs. IOPs themselves do not cause congestion in the SAN data paths.

Rarely are these workloads well known. At a minimum one needs to separate the high workload (bandwidth and IOPs) users from "everybody else".
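The two workload measures are linked by I/O size: bandwidth is roughly the transaction rate multiplied by the block size. A rough sketch of that relationship follows; the function name and the example workloads (a database at 8 KB per I/O, a backup stream at 256 KB per I/O) are hypothetical figures chosen for illustration, not measurements from the presentation.

```python
def bandwidth_mb_per_sec(iops, block_size_kb):
    """Approximate bandwidth (MB/sec) generated by a given I/O rate."""
    return iops * block_size_kb / 1024


# Hypothetical workloads, for illustration only.
database = bandwidth_mb_per_sec(iops=10_000, block_size_kb=8)
backup = bandwidth_mb_per_sec(iops=800, block_size_kb=256)

print(f"database: {database} MB/sec")
print(f"backup:   {backup} MB/sec")
```

Under these assumed numbers the backup stream issues far fewer I/Os yet moves more data, which is consistent with the point that bandwidth, not IOPs, congests the SAN data paths.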


Additional Design Inputs
3. Asset reuse
  - Will existing switches/directors be reused?
  - Driven by things like: depreciation schedule, maintenance costs, applicability in new design, compatibility
4. Migration plan
  - What inventory (hosts and storage) will be migrated?
  - What application outages are acceptable?
  - Are changes occurring with the storage subsystems also?
  - Are there power/cooling/space constraints?


Key SAN Design Decisions
1. Type of storage
  a) Total amount of storage
  b) Number of target ports
  c) Fan-in ratio
2. Fabric size and number of fabrics
3. Selection of switching elements
  a) Number of ports
  b) Bandwidth
  c) RAS
4. Architectural fabric model
  a) Flat
  b) Core-edge
  c) Mesh
  d) Network-centric


Type of Storage
- The "S" in SAN stands for Storage!
  - Large Tier 1 storage subsystems that allow many front-end target ports (32, 64, and larger) provide connectivity flexibility
  - Smaller Tier 2/3 storage subsystems have limited front-end ports (4 to 8), which constrain connectivity flexibility
- Host-to-storage fan-in ratio determines the required number of storage target ports:
  - Should be driven by workloads (bandwidth and IOPs)
  - Typical industry rule of thumb was historically 7:1 at 1Gbit
  - This increased to 12:1 at 2Gbit as storage devices became more advanced
  - In 4Gbit environments, 18:1 is not uncommon
  - However, real world production deployments range from 1:1 to 64:1
  - Consult storage array vendor for recommendations
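To make the fan-in rule of thumb concrete, the number of storage target ports can be estimated by dividing host connections by the fan-in ratio and rounding up. This is only a sketch: the function name and the 500-host example are assumptions, and a real design would start from measured workloads and the array vendor's guidance.

```python
import math


def target_ports_needed(host_ports, fan_in_ratio):
    """Storage target ports required for a given host-to-storage fan-in ratio."""
    return math.ceil(host_ports / fan_in_ratio)


# Hypothetical fabric side with 500 host connections.
for speed, ratio in (("1Gbit", 7), ("2Gbit", 12), ("4Gbit", 18)):
    print(f"{speed} at {ratio}:1 -> {target_ports_needed(500, ratio)} target ports")
```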

Size/Number of Fabrics
- Manageability
  - The size and number of fabrics that have to be managed will impact management costs
  - Size of an individual fabric and number of fabrics must be balanced. One fabric is easier to manage than many, but a single large and complex fabric (versus several smaller, simpler fabrics) increases the risk of human error.
  - Manageability applies to both physical and virtual fabrics. Virtual fabrics must be managed similarly to physical fabrics.
  - Today, the largest production fabrics have more than 4,000 connections, but the typical fabric size in a large SAN is 1,000 to 2,000 ports.
  - Large SANs can be built using a "managed unit of SAN" strategy
  - A managed unit represents a well understood unit of SAN capacity (ports and bandwidth) AND behavior. When we build this we know how it will behave!
  - Resource stranding is not an issue when the managed unit is BIG. These are SAN CONTINENTS, not SAN islands!

Size/Number of Fabrics (continued)
- Risk
  - A balanced approach to fabric size and risk must be taken
  - If a large fabric fails the impact to the business is more widespread
  - The overall storage provisioning and data path management process may become more complex as the fabric size increases
  - Using well-understood managed units reduces overall risks
  - Logical partitioning of fabrics (zoning, virtual fabrics, administrative domains, etc.) minimizes some but not all risks in a physical fabric
  - Fabric design must be resilient to withstand the failure of individual components without compromising the overall fabric availability


Size/Number of Fabrics (continued)
- Physical isolation (separate physical fabrics)
  - Separate physical locations (different data centers)
  - Business reasons (some big project wants their own stuff)
  - Very different change control needs (business critical 7x24 systems versus all the others)
  - Different RAS specifications (blade center embedded switches versus directors)
  - Incompatible I/O workloads (tape versus disk)
- Virtual isolation (separate virtual fabrics)
  - Minimize RSCN broadcasts
  - Increase security (restrict management or connectivity into a certain fabric)
  - Provide granularity of administration
  - Appropriate for some functionality but not a substitute for separate physical fabrics


Switch Selection
- Number of ports
  - Larger port-count allows larger fabrics to be built
  - Larger port-count makes host and storage locality (HASL) easier to maintain in flat / collapsed-core designs
- Bandwidth
  - Individual ports have bandwidth limitations (2Gb, 4Gb, ...)
  - Port cards and individual port card processors have limitations that may limit the aggregate port bandwidth. This is over-subscription at the component (ASIC, port card, supervisor card, etc.) level
  - Entire switch has finite bandwidth that may also limit the aggregate port bandwidth. This is over-subscription at the switching element level


Switch Selection (continued)
- Bandwidth Over-subscription
  - Over-subscription will always exist but must be used and managed such that AVAILABILITY and RELIABILITY are not compromised
  - Over-subscription adds complexity and may require active traffic monitoring and management
  - Multiple levels of over-subscription (e.g. at the ASIC, at the port card, at the ISL, at the storage target) may reduce initial connectivity costs but drive up overall TCO due to increased recurring management costs
  - Not all over-subscription is created equal. Need to understand the manner in which the resource is allocated. For example, the Brocade 32-port 4Gbit card seems to be 2:1 over-subscribed since the ingress bandwidth is 4Gbit x 32 = 128Gbit and the egress bandwidth is 64Gbit. 2:1 implies that two 4Gbit ingress ports share a single 4Gbit egress port. However, there are actually two groups of 16 ports with each group sharing eight 4Gbit egress ports (32Gbit total egress). So, the effective over-subscription ratio is more accurately defined as 16:8.
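The arithmetic in the 32-port card example can be sketched as follows. The function name and parameters are illustrative only; the port counts and the 4Gbit rate come from the example above.

```python
def oversubscription(ingress_ports, egress_ports, port_gbit=4):
    """Return (ingress Gbit, egress Gbit, ratio) for a group of ports."""
    ingress = ingress_ports * port_gbit
    egress = egress_ports * port_gbit
    return ingress, egress, ingress / egress


# Whole card: 32 ingress ports against 64Gbit (16 x 4Gbit) of egress.
print("card :", oversubscription(32, 16))

# One of the two port groups: 16 ingress ports sharing eight 4Gbit egress paths.
print("group:", oversubscription(16, 8))
```

Both views reduce to the same numeric ratio. What changes is the sharing domain: read as 16:8, up to eight ports in a group could in principle run at full rate at once, whereas a strict 2:1 pairing would tie each port to a single neighbor. That interpretation is an inference from the slide's wording, not a vendor specification.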

Switch Selection (continued)
- RAS (Reliability, Availability, Serviceability)
  - Directors are 5-Nines, switches may be less
  - Choice is driven by tolerance to outages at the physical fabric level and individual port level
  - Storage: 5-Nines
  - Some hosts may not need 5-Nines
  - Mixing different RAS levels in a fabric is not advisable, since fabric RAS is impacted when element failures occur
    - Fabric instability affects all connections
    - Isolation technologies (e.g. Brocade Access Gateway) mitigate these risks


Architectural Model
- When more than one switch is required, the manner in which they are inter-connected is defined by the architectural model
- Inter-Switch Links (ISLs) are used to connect the switches using one of the following models:
  - Flat/collapsed core model
  - Mesh model
  - Core-edge model
  - Network-centric model


Architectural Model (continued)
- The architectural model deployed should support a storage-centric approach:
  - Any host should be able to access storage capacity in any storage subsystem ("any port-to-any port" is NOT required)
  - Storage ports should be stable and not have to be moved
  - Data paths should be easily and consistently defined and easy to troubleshoot. Connecting a host and provisioning storage should not be a science project!
  - A properly designed SAN should not require a significant amount of inter-fabric routing within the data center
  - Routing should be used to address specific functions (e.g. DR, e-vaulting, etc.)
  - Layer 2 switching functionality should be the cornerstone of all I/O transactions


Architectural Model (continued)

FLAT/COLLAPSED core model
- Easy to manage - No ISLs (may use for management)
- Single director may be able to meet all connectivity needs (collapsed core)
- Expansion is done by adding switches
- Expansion may require storage port moves
- Switches can be any size
  - About 4 switches is the maximum reasonable size
  - About 1,000 ports when using 256-port switches
- All host-to-storage connections contained within the switch (HASL)
- High performance and easy to understand, but least flexible and scalable
- Best fit for static environments
- Host and storage placement may be optimized within the switch (e.g. local switching)


Architectural Model (continued)

MESH model
- Usually becomes a MESS!
- Not recommended except in very small environments
- Wouldn't mesh more than 3-4 switches
- Hard to expand a fabric; usually just add another
- Try to keep host and storage local (HASL) to a switch, but this is difficult due to the number and size of switches
- Difficult to manage traffic and the resulting congestion
- ISLs consume lots of ports. ISL port consumption grows by n*(n-1)
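The n*(n-1) growth can be checked with a few lines: a full mesh of n switches has n*(n-1)/2 switch pairs, and each ISL consumes one port on each end. The sketch below assumes a full mesh; the function name and the `isls_per_pair` parameter are illustrative additions.

```python
def full_mesh_isl_ports(n_switches, isls_per_pair=1):
    """Total switch ports consumed by ISLs in a full mesh of n switches."""
    pairs = n_switches * (n_switches - 1) // 2
    return pairs * 2 * isls_per_pair  # each ISL uses a port at both ends


for n in (2, 3, 4, 8):
    print(f"{n} switches -> {full_mesh_isl_ports(n)} ISL ports")
```

Doubling the mesh from 4 to 8 switches more than quadruples the ports lost to ISLs, before any redundant or trunked links are added.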


Architectural Model (continued)

CORE-EDGE model
- Storage connections are stable and centralized at the core
- Flexible host access to storage LUNs
- A familiar tiered design
  - Edge hosts (one hop to storage)
  - Core hosts (zero hops to storage)
- Second core may be added for growth (best to do it in the beginning!)
- Can be built using large and/or small switches
- Size of core dictates scalability of fabric
- Well-defined managed unit
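One way to see why the size of the core dictates scalability is to count ports. The sketch below is a simplified model, not a sizing tool: it assumes a single core, identical edge switches, equal port speeds, and that every core port not used for storage is available for ISLs. All names and the example figures are hypothetical.

```python
def core_edge_capacity(core_ports, storage_ports, edge_ports, isls_per_edge):
    """Rough port budget for a single-core core-edge fabric."""
    core_ports_for_isls = core_ports - storage_ports
    max_edges = core_ports_for_isls // isls_per_edge
    hosts_per_edge = edge_ports - isls_per_edge
    return {
        "max_edge_switches": max_edges,
        "edge_host_ports": max_edges * hosts_per_edge,
        "isl_oversubscription": hosts_per_edge / isls_per_edge,
    }


# Hypothetical: 256-port core with 64 storage ports, 64-port edge switches,
# and 8 ISLs from each edge switch to the core.
print(core_edge_capacity(256, 64, 64, 8))
```

In this model, adding ISLs per edge switch lowers ISL over-subscription but reduces the number of edge switches the core can accept, which is the trade-off the core size controls.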

Architectural Model (continued)

Network-Centric Model
- One large physical fabric
- Physical fabric must be partitioned using virtual fabrics for manageability, stability, performance and risk mitigation
- Routing fabric only contains ISLs from other edge fabrics
- Edge fabrics can either be flat or core-edge
- All ports have access to every other port via the core fabric (routing between virtual fabrics is required to get "any-to-any")
- Servers use zero to four hops to storage
- Can be built using large and/or small switches
- Storage connections are stable but scattered over multiple fabrics
- Very complex


Design Principles
1. Minimize the number of fabrics that have to be managed (virtual and physical)
- Fewer things are easier to manage
- Sometimes you have to build more (physical constraints, special security, etc.)
- Allows storage targets to be more easily shared
  - Doesn't introduce resource isolation
  - Make them big enough and inter-fabric routing is not needed
- Large disk storage subsystems with many target ports enable all disk LUNs to be accessible on all fabrics
- Must balance manageability and risk
- KEEP IT SIMPLE to maximize AVAILABILITY
- BIG Managed Units ("SAN Continents")


Design Principles (continued)

2. Minimize the number of switches per fabric (8-12)
- Fewer things are easier to manage
- Easier to visualize on a computer screen or piece of paper
- Less inter-switch coordination required during fabric events
- Less fabric disruption
- Blade centers with embedded switches are an exception; use isolation technologies (e.g. Access Gateway) to overcome
- Use large switching elements for large SANs


Design Principles (continued)

3. Limit the fabric size to about 2,000 target and initiator connections
- Minimizes per-fabric risk
- These are the biggest fabrics running today in environments with the most demanding AVAILABILITY requirements
- Approaches practical limits for zones, zone sets and port aliases
- Approaches the limits of management software tools
- Actual limits are much higher: production fabrics exist with over 4,000 usable ports. However, these are exceptional cases and not the rule.
- If more ports are needed, simply deploy additional big managed units and use routing if needed. Localizing most flows within a 2,000 port group is generally easy to do.
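As a rough illustration of building out in managed units, the number of fabrics follows from the SAN size per side and the roughly 2,000-connection guideline. The function names and the 5,000-port example are assumptions made for this sketch; the A/B doubling reflects the paired-fabric convention described earlier for the Disk SAN.

```python
import math


def managed_units_needed(ports_per_side, unit_size=2000):
    """Managed units (fabrics) required on one side of the SAN."""
    return math.ceil(ports_per_side / unit_size)


def total_fabrics(ports_per_side, unit_size=2000, sides=2):
    """Fabrics across all redundant sides (side A and side B by default)."""
    return managed_units_needed(ports_per_side, unit_size) * sides


# Hypothetical SAN needing 5,000 connections per side.
print(managed_units_needed(5000), "managed units per side")
print(total_fabrics(5000), "fabrics in total")
```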


Design Principles (continued)

4. Utilize switches with a high level of reliability, availability and serviceability (RAS)
- Even with redundant data paths you don't want paths to be lost; it's a big pain in the neck!
- Shared storage target ports should always connect to switching elements with high RAS
- The CORE should be high RAS
- If the fabric has demanding RAS requirements, all switches should have similar RAS specifications
- FC switches with high RAS are a cornerstone of a highly AVAILABLE storage environment
- Elements with high RAS characteristics are fundamental to HA environments


Basic Design Principles (continued)

5. Ensure that over-subscription does not cause congestion and degrade performance
- Where over-subscription is used, ensure that workloads (bandwidth) are well understood
- Design conservatively
- Classify hosts in at least two categories to isolate high bandwidth users
- Use trunking and DPS to maximize ISL utilization and minimize management
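The host classification step could look something like the sketch below. Everything here is hypothetical: the host names, the measured MB/sec figures, the 100 MB/sec threshold, and the assumption of roughly 400 MB/sec of usable bandwidth per 4Gbit ISL are illustrative values, not recommendations from the presentation.

```python
def classify_hosts(host_mb_per_sec, high_threshold_mb_per_sec):
    """Split hosts into (high bandwidth, everybody else) by peak MB/sec."""
    high = {h for h, bw in host_mb_per_sec.items() if bw >= high_threshold_mb_per_sec}
    return high, set(host_mb_per_sec) - high


def isl_demand_ratio(edge_host_mb_per_sec, isl_count, isl_mb_per_sec):
    """Peak host demand on an edge switch as a fraction of its ISL bandwidth."""
    return sum(edge_host_mb_per_sec) / (isl_count * isl_mb_per_sec)


# Hypothetical peak bandwidth per host, in MB/sec.
hosts = {"db1": 60, "web1": 5, "backup1": 250}
high, normal = classify_hosts(hosts, high_threshold_mb_per_sec=100)

# Only the "everybody else" hosts sit behind the ISLs in this sketch.
ratio = isl_demand_ratio([hosts[h] for h in normal], isl_count=2, isl_mb_per_sec=400)
print(sorted(high), sorted(normal), ratio)
```

A ratio well below 1.0 suggests the ISLs have headroom for the remaining hosts; the high bandwidth group would be attached where it does not cross ISLs at all.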


Basic Design Principles (continued)

6. Use a core-edge model for large environments
- The biggest fabrics can be built using a storage-centric core-edge model:
  - Should be used in those data centers with the largest SAN size requirements
  - Provides a stable location for all storage ports
  - Model is fully scalable using managed units and inter-fabric routing where needed
- Two-tier host attachment:
  - Keeps high bandwidth hosts from consuming ISL bandwidth
  - Reduces operational management overhead
- Creates a more stable environment, which increases AVAILABILITY
- A storage-centric flat model can be used in smaller sites


Basic Design Principles (continued)

7. Design for storage transactions
- Use storage subsystems with a sufficient number of target ports to meet the connectivity and throughput needs of the hosts
- In large environments plan for and build large fabrics to simplify data paths and eliminate the need for full time routing
- For critical applications, consider localizing traffic within the switching element by co-locating host and storage, and perhaps even within a blade or port group
- LAN-specific requirements like "any-to-any" are not practical. What is needed is flexible host-to-storage connectivity in order to efficiently utilize storage resources
- Utilize routing for transient requirements within the data center and for distance applications


Basic Design Principles (continued)

8. Keep it Simple
- It shouldn't be a science project to add hosts and storage or figure out the path between an initiator and target
- If the design is simple it will:
  - Be simpler to manage
  - Reduce the opportunity for mistakes
  - Maximize AVAILABILITY
- Build out in Managed Units and use big managed units for big SANs


Other SAN Design Considerations

Other SAN Design Considerations
- Server trends, consolidation and virtualization
- Distance solutions (replication, e-vaulting)
- SAN-based storage virtualization
- Tape SAN
- Backup-to-disk (B2D)


Server Consolidation
Server consolidation in the data center is impacting the SAN designs
- Partitioned hardware and server virtualization software
  - VMware
  - AIX MPARs and VIO servers
  - Bladed servers with embedded FC switches
- May result in higher I/O workloads concentrated on fewer SAN connections
- Host connections at the edge in a core-edge model will drive higher I/O workloads on average. This must be factored into the design, especially the bandwidth requirements between core and edge switches.
- Isolation technologies (e.g. Brocade Access Gateway) are needed to address embedded FC switch connectivity. These SAN switches are owned by the server group!
- Power consumption, cooling and cabling are more important due to density


Distance Solutions
Disk replication and e-vaulting must be accommodated
- Storage subsystem-based replication still dominates the market
- Leveraging TCP/IP using Fibre Channel over IP (FCIP) is the dominant communication method for open systems
- Director-based blades as well as standalone appliances exist. Choose your poison!
- Fast-write (and fast-read) functionality is required to overcome WAN latency
- A separate fabric (or fabrics) may be justified depending on: volume of data, technology used, compatibility, etc.
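A back-of-the-envelope sketch shows why fast-write matters over distance. It rests on two assumptions not stated in the slides: light travels through fibre at roughly 5 microseconds per kilometre one way, and a standard SCSI write over Fibre Channel needs two round trips (command and transfer-ready, then data and status), which fast-write reduces to one by acknowledging the transfer-ready locally. Device and protocol processing time is ignored.

```python
# Assumption: about 5 microseconds of propagation delay per km of fibre, one way.
US_PER_KM_ONE_WAY = 5.0


def write_latency_ms(distance_km, round_trips):
    """Propagation-only latency of one write, in milliseconds."""
    return 2 * distance_km * US_PER_KM_ONE_WAY * round_trips / 1000


distance_km = 1000  # hypothetical replication distance
print("standard write:", write_latency_ms(distance_km, round_trips=2), "ms")
print("fast-write:    ", write_latency_ms(distance_km, round_trips=1), "ms")
```

Under these assumptions, halving the round trips halves the distance penalty on every replicated write, which is the effect fast-write is meant to deliver.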

Storage Virtualization
How many times have you heard "This is the year of virtualized storage"?
- Well, the jury is still out on this one
- Everyone is jockeying for position
- The standards don't exist
- What's the best way to do it (in-band, out-of-band, in the network, in the storage subsystem, etc.)?
- The idea is to not be locked in to one storage vendor
- But then you may be locked in to one software vendor
- Who will win the software war? Do you want to place all your I/O transactions into the hands of a little startup ISV?
- Who will win the platform war? Where will this software run? A blade? An appliance? What vendor(s)?


The Tape SAN
- Tape has a very different I/O profile (large block, bandwidth intensive) than disk access (small block, transaction intensive)
- There's no real multi-pathing for fault tolerance (some exceptions). So, if you lose a component in the data path you basically lose access to the device.
- The tape targets (drives) and initiators (host users) consume lots of bandwidth for extended periods of time
- Tape I/O operations DO NOT behave well during fabric events (usually "abort the mission")
- Tape connectivity does not experience the same level of configuration changes (zoning, LUN masking, adds/changes/deletions) as disk connectivity

The Tape SAN (continued)

Tape SAN Design Rules
- Separate tape (and virtual tape) from the disk SAN
- This separates high bandwidth I/O workloads (tape and media servers) from IOPs-sensitive workloads (disk)
- Flat design model is best:
  - Minimize/eliminate over-subscription over ISLs
  - Most flexible drive sharing within a fabric
- Tape users should have a dedicated HBA into the Tape SAN
  - Tape and disk I/O bandwidth peaks during backup job
  - Disk I/O is small block
  - Tape I/O is large block
- Routing into the Tape SAN may be done on an exception basis (locally) or for long distance requirements (e.g. e-vaulting)
- In collapsed core designs, localize data paths


Backup-to-Disk
Backup-to-disk trends are also impacting the SAN designs
- B2D storage may be in the form of virtual tape or native file systems
- Even though it's disk, the I/O profile is like tape (large block, bandwidth intensive) versus disk (small block, transaction intensive)
- High bandwidth usage for long periods of time
- The potential for congestion due to over-subscription is increased
- Tape and virtual tape do not support real HA and multipathing, therefore a dual fabric design is not necessarily needed
- Should be part of the Tape SAN; treat it like real tape

Thanks!
