
White Paper

Cisco Application Centric Infrastructure Dual-Fabric Design Guide

© 2016 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information.
Contents

Overview
Cisco Services
Target Audience
Prerequisites
Introduction
Deployment Models for Interconnecting Cisco ACI Fabrics
Cisco ACI Stretched Fabric
Cisco ACI Dual-Fabric Design
Cisco ACI Dual-Fabric Design Overview
Reference Topology
Cisco Data Center Interconnect
vPC as DCI Transport
OTV as DCI Transport
VXLAN as DCI Transport
Dual-Fabric Layer 2 and Layer 3 Connectivity
Layer 2 Reachability Across Sites
Layer 3 Reachability Across Sites
Policy Consistency Across Cisco ACI Fabrics
Policy Consistency for Layer 2 Communication
Policy Consistency for Layer 3 Communication
Hypervisor Integration
L4-L7 Service Integration Models
Data Center Firewall Deployment
Cisco ASA Deployment Models
Cisco ASA Active-Standby Deployment
Cisco ASA Cluster Deployment
Multitenancy Support
Cisco UCS Director and Cisco ACI Dual-Fabric Design
Storage Considerations
Cisco ACI Dual-Fabric: Deployment Details
Validated Topology
Cisco ACI Fabric
Firewalls
Data Center Interconnect
WAN Connectivity
Logical Traffic Flow
Traffic from DC1 to the WAN
Traffic from the WAN to DC1
Traffic from the WAN to Data Center for Stretched Subnets
Routed Traffic from DC1 to DC2
Dual-Fabric Layer 2 and Layer 3 Connectivity
Deploying Layer 2 Connectivity Between Sites
Deploying Layer 3 Connectivity Between Sites
Deploying Hypervisor Integration
Cisco ASA Cluster Integration in a Cisco ACI Dual-Fabric Design
Cisco ASA Cluster Configuration: Admin Context
ASA Cluster Configuration: Tenant Context
WAN Integration Considerations
North-South Traffic Flows
Deploying VXLAN as a DCI Solution
Testing and Results
Traffic Generator: Emulated Device Configuration
Traffic Generator: Streams
Testing Overview
Results Summary
Test Results: Worst Affected Flows Only
Link from ACI Leaf 1 in DC1 to the local Nexus 9300 VXLAN DCI device
Nexus 9300 VXLAN DCI device node failure
Peer link failure between the Nexus 9300 DCI devices
Cisco ASA Cluster Member Failure (Slave Node in DC1)
Cisco ASA Cluster Member Failure (Master Node)
Cisco ASA Cluster Member Failure (Slave Node DC2)
Customer edge router: link with ACI fabric failure
Customer Edge Router WAN Link Failure
Cisco ACI Border Leaf Node Failure
Cisco ACI Spine Node Failure
Conclusion
Demonstrations of the Cisco ACI Dual-Fabric Design
For More Information

Overview
In the past few years, a new requirement has emerged for enterprises and service providers: they must provide a
data center environment that is continuously available. Customers expect applications to always be available, even
if the entire data center experiences a failure.

Enterprises and service providers also commonly need to be able to place workloads in any data center where
computing capacity exists. And they often need to distribute members of the same cluster across multiple data
center locations to provide continuous availability in the event of a data center failure.

To achieve such a continuously available and highly flexible data center environment, enterprises and service
providers are seeking an active-active architecture.

When planning an active-active architecture, you need to consider both active-active data centers and active-active
applications. To have active-active applications, you must first have active-active data centers. When you have
both, you have the capability to deliver new service levels by providing a continuously available environment.

A continuously available, active-active, flexible environment provides several benefits to the business:

● Increased uptime: A fault in a single location does not affect the capability of the application to continue to
perform in another location.
● Disaster avoidance: Shift away from disaster recovery and prevent outages from affecting the business in
the first place.
● Easier maintenance: Taking down a site (or a part of the computing infrastructure at a site) for maintenance
should be easier, because virtual or container-based workloads can be migrated to other sites while the
business continues to deliver service nondisruptively during the migration and while the site is down.
● Flexible workload placement: All the computing resources on the sites are treated as a resource pool,
allowing automation, orchestration, and cloud management platforms to place workloads anywhere, more
fully utilizing resources. Affinity rules can be set up on the orchestration platforms so that the workloads are
co-located on the same site or forced to exist on different sites.
● Extremely low recovery time objective (RTO): A zero or nearly zero RTO reduces or eliminates
unacceptable impact on the business of any failure that occurs.

This document provides a guide to designing and deploying Cisco® Application Centric Infrastructure (Cisco ACI™)
in two data centers in an active-active architecture that delivers the benefits listed here.

The design presented in this document helps enterprises and service providers achieve a fully programmable,
software-defined multiple–data center infrastructure that reduces total cost of ownership (TCO), automates IT
tasks, and accelerates data center application deployments.

Cisco Services
Cisco Services offerings are available to assist with the planning, design, deployment, support, optimization, and
operation of the solution described in this document.

Effective design and deployment are essential to reduce risk, delays, and the total cost of adopting an active-active
architecture.

For an overview of Cisco Services for Cisco ACI, see Services for Cisco Application Centric Infrastructure and
Cisco Nexus 9000 Series Switches.

Target Audience
The target audience for this document includes network and systems engineers and cloud, data center, and other
solution architects who are involved in the design of active-active data centers.

Prerequisites
To best understand the design presented in this document, the reader should have basic knowledge of Cisco ACI
and how it works and is designed for operation in a single site.

For more information, see the Cisco ACI white papers available at Cisco.com.

Introduction
This document explains the use of two independent Cisco ACI fabrics deployed in two data centers that are
interconnected through Cisco Data Center Interconnect (DCI) technologies such as virtual port channel (vPC),
Virtual Extensible LAN (VXLAN), and Overlay Transport Virtualization (OTV) at Layer 2 and Layer 3 to support an
active-active dual-data center design.

The goal of such a design is to support an active-active architecture and deliver its benefits as described in the
“Overview” section of this document. The design discussed in this document enables interconnection of two Cisco
ACI fabrics, each managed with a separate and dedicated Cisco Application Policy Infrastructure Controller (APIC)
cluster.

Each site contains Cisco Nexus 9000 Series Switches used as leaf and spine switches in the Cisco ACI fabric. The
number and models of the Cisco Nexus 9000 Series Switches used at each site are independent from each other.
For example, if the primary site requires more leaf switches than the secondary site, you can deploy Cisco Nexus
9500 platform modular spine switches at the primary site and deploy Cisco Nexus 9336PQ Switches as fixed spine
switches at the other site (the Cisco Nexus 9500 platform supports more interfaces, and so supports more leaf
switches, than the Cisco Nexus 9336PQ).

The Cisco ACI fabrics deployed in each data center location can be interconnected using the following DCI
technologies:

● Virtual port channel: You should use vPC technology only to interconnect two Cisco ACI fabrics in a point-
to-point manner. It requires the use of dark fibers or dense wavelength-division multiplexing (DWDM)
circuits between the fabrics. Connecting more than two sites with vPC is usually not recommended.
● Virtual Extensible LAN and Overlay Transport Virtualization: VXLAN and OTV are multipoint technologies
that you can use to interconnect more than two Cisco ACI fabrics; however, the focus of this document is on
a design with just two sites. With VXLAN and OTV, IP connectivity is required between the sites, and
connections can cross multiple Layer 3 devices (for example, some WAN routers).

Each Cisco ACI fabric is managed and configured independently from any other Cisco ACI fabric. The APIC cluster
on each site serves as the central point of management and operations for each fabric.

When operating two (or more) independent Cisco ACI fabrics, you need to synchronize policy across the sites to
provide an active-active architecture and support flexible workload placement and virtual machine live migration
(using VMware vMotion or similar technology). The process for achieving automated policy synchronization is
described later in this document.

To provide a true active-active architecture, in addition to supporting it from a network perspective, the design also needs to integrate Layer 4 through Layer 7 (L4-L7) services such as firewalls and server load balancers. The Cisco Adaptive Security Appliance (ASA) supports multisite active-active firewall clustering with sites located hundreds of miles (or kilometers) apart, and so this design uses ASA firewalls.

Note: Other firewall solutions with similar active-active capabilities can be used but are outside the scope of this
document.

Virtual machine manager (VMM) integration is performed on a per-site basis. The virtual machine controller, such as VMware vCenter, VMware vShield, or Microsoft System Center Virtual Machine Manager (SCVMM), is
integrated with the APIC cluster on each site. For example, vCenter in Site 1 would be integrated with the APIC
cluster in Site 1. VMware vSphere Release 6.0 and later adds a new feature, called Cross vCenter vMotion, that
supports the live migration of virtual machines between vCenter server instances. The design presented in this
document supports this new feature as one of the technologies for providing an active-active architecture.
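Although the per-site integration can be performed entirely through the APIC GUI, it is also easily scripted. The following Python fragment is a minimal sketch that registers a vCenter server as a VMware VMM domain through the APIC REST API; the APIC address, credentials, domain name, vCenter address, and data center name are hypothetical placeholders, and the object attributes should be verified against the APIC release in use.

    import requests

    APIC = "https://apic-site1.example.com"       # hypothetical APIC cluster address for Site 1
    session = requests.Session()
    session.verify = False                        # lab sketch only; validate certificates in production

    # Authenticate to the APIC (aaaLogin is the standard REST login call)
    session.post(APIC + "/api/aaaLogin.json",
                 json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

    # Create a VMware VMM domain and register the local vCenter with it.
    # All names, addresses, and credentials below are placeholders.
    vmm_domain = {
        "vmmDomP": {
            "attributes": {"name": "DC1-vCenter"},
            "children": [
                {"vmmUsrAccP": {"attributes": {"name": "vc-creds",
                                               "usr": "administrator@vsphere.local",
                                               "pwd": "password"}}},
                {"vmmCtrlrP": {"attributes": {"name": "vcenter-dc1",
                                              "hostOrIp": "10.1.1.10",
                                              "rootContName": "DC1"},
                               "children": [{"vmmRsAcc": {"attributes": {
                                   "tDn": "uni/vmmp-VMware/dom-DC1-vCenter/usracc-vc-creds"}}}]}}
            ]
        }
    }
    session.post(APIC + "/api/mo/uni/vmmp-VMware/dom-DC1-vCenter.json", json=vmm_domain)

Repeating the same call against the APIC cluster in Site 2, pointed at the vCenter local to that site, completes the per-site VMM integration; no VMM relationship is created across the DCI.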

Storage synchronization between the sites is an important consideration in an active-active architecture. Although
detailed deployment of a storage solution is outside the scope of this document, some overall considerations are
presented in the “Storage Considerations” section of this document.

Deployment Models for Interconnecting Cisco ACI Fabrics


When deploying Cisco ACI in two (or more) data centers, you can choose between two main deployment model
options for interconnecting them (Figure 1):

● Stretch a single Cisco ACI fabric between the two locations.


● Use two independent fabrics, one per site, and interconnect them.

Figure 1. Design Options for Interconnecting Cisco ACI Fabrics

Cisco ACI Stretched Fabric


A Cisco ACI stretched fabric is a partially meshed design that connects Cisco ACI leaf and spine switches
distributed in separate locations (Figure 2). The stretched fabric is functionally a single Cisco ACI fabric. The
interconnected sites are one administrative domain and one availability zone with shared fabric control planes
(using Intermediate System–to–Intermediate System [IS-IS] protocol, Cooperative Key Server Protocol [COOP],
and Multiprotocol Border Gateway Protocol [MP-BGP]). Administrators can manage the sites as one entity;
configuration changes made on any APIC node are applied to devices across the sites.

The stretched fabric is managed by a single APIC cluster, consisting of three APIC controllers, with two APIC
controllers deployed at one site and the third deployed at the other site. The use of a single APIC cluster stretched
across both sites, a shared endpoint database synchronized between spines at both sites, and a shared control
plane (IS-IS, COOP, and MP-BGP) defines and characterizes a Cisco ACI stretched fabric deployment.

Figure 2. ACI Stretched Fabric

Note: This document does not cover the Cisco ACI stretched fabric deployment model. For more information
about this model, please refer to the following document:
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_kb-aci-stretched-fabric.html.

Cisco ACI Dual-Fabric Design


In a Cisco ACI dual-fabric design, each site has its own Cisco ACI fabric, independent from each other, with
separate control planes, data planes, and management planes (Figure 3). The sites consist of two (or more)
administration domains and two (or more) availability zones with independent control planes (using IS-IS, COOP,
and MP-BGP). As a consequence, administrators need to manage the sites individually, and configuration changes
made on the APIC at one site are not automatically propagated to the APIC at the other sites. You can deploy an
external tool or orchestration system to synchronize policy between the sites.

A dual-fabric design has an APIC cluster per site, and each cluster includes three (or more) APIC controllers. The
APIC controllers at one site have no direct relationship or communication with the others at other sites. The use of
an APIC cluster at each site, independent from other APIC clusters, with an independent endpoint database and
independent control plane (using IS-IS, COOP, and MP-BGP) per site, defines a Cisco ACI dual-fabric design.

Figure 3. Cisco ACI Dual-Fabric Design

This document focuses on the design and deployment of a Cisco ACI dual-fabric design.

Cisco ACI Dual-Fabric Design Overview


This section provides an overview of the validated Cisco ACI dual-fabric deployment model.

Reference Topology
The validated Cisco ACI dual-fabric design consists of two Cisco ACI fabrics, one per site, interconnected through
one of the following DCI options: back-to-back vPC over dark fiber, back-to-back vPC over DWDM, or VXLAN or
OTV (Figure 4).

Figure 4. Cisco ACI Dual-Fabric Reference Topology

Each fabric is composed of Cisco Nexus 9000 Series spine and leaf switches, and each site has an APIC cluster
consisting of three or more APIC controllers. Between the sites, over the DCI links, Layer 2 is extended by
configuring a static endpoint group (EPG) binding that extends an EPG to the other site using the DCI technology.
At the remote site, a static binding using the same VLAN ID maps the incoming traffic to the correct EPG.

For Layer 3 connectivity between the sites, Exterior BGP (eBGP) peering is established between the border leaf switches. Each Cisco ACI fabric is configured with a unique autonomous system number (ASN). Over this eBGP peering, the IP prefixes for the subnets that are locally defined at each site are advertised.
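As an illustration of how this peering is defined on one fabric, the following Python sketch creates a simple L3Out in a tenant and configures an eBGP peer reachable over the DCI VLAN, using the APIC REST API. The tenant, VRF, node ID, interface, VLAN, addresses, and ASN values are placeholders; the L3Out is shown on a single border leaf port for brevity; and a complete configuration also requires an external routed domain and an external EPG (not shown).

    import requests

    APIC = "https://apic-site1.example.com"       # hypothetical APIC for the first fabric
    session = requests.Session()
    session.verify = False
    session.post(APIC + "/api/aaaLogin.json",
                 json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

    # L3Out running eBGP toward the remote fabric over the DCI VLAN (VLAN 3900 here).
    # Node ID, path, addresses, and the remote ASN are examples only.
    l3out = {
        "l3extOut": {
            "attributes": {"name": "L3Out-DCI"},
            "children": [
                {"l3extRsEctx": {"attributes": {"tnFvCtxName": "VRF1"}}},   # VRF of this tenant
                {"bgpExtP": {"attributes": {}}},                            # enables BGP on the L3Out
                {"l3extLNodeP": {
                    "attributes": {"name": "BorderLeafs"},
                    "children": [
                        {"l3extRsNodeL3OutAtt": {"attributes": {"tDn": "topology/pod-1/node-101",
                                                                "rtrId": "1.1.1.101"}}},
                        {"l3extLIfP": {
                            "attributes": {"name": "DCI-SVI"},
                            "children": [{"l3extRsPathL3OutAtt": {
                                "attributes": {"tDn": "topology/pod-1/paths-101/pathep-[eth1/1]",
                                               "ifInstT": "ext-svi",
                                               "encap": "vlan-3900",
                                               "addr": "192.168.100.1/29"},
                                "children": [{"bgpPeerP": {
                                    "attributes": {"addr": "192.168.100.2"},   # border leaf of the other fabric
                                    "children": [{"bgpAsP": {"attributes": {"asn": "65002"}}}]}}]}}]}}
                    ]
                }}
            ]
        }
    }
    session.post(APIC + "/api/mo/uni/tn-Tenant1/out-L3Out-DCI.json", json=l3out)

The second fabric mirrors this configuration with its own ASN (for example, 65001) and the other address in the shared DCI subnet; repeating the L3Out in each tenant yields the per-tenant VRF-lite eBGP sessions described later in this section.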

For the perimeter firewall, to handle north-south communication (WAN to data center and data center to WAN), the reference topology presented in this document deploys an active-active ASA cluster, with two ASA devices at each site. The topology has also been validated using an active-standby firewall design with, for example, the active ASA at Site 1 and the standby ASA at Site 2. In both cases, the firewalls are inserted without a service graph; instead, IP routing is used over a Layer 3 outside (L3Out) connection between the Cisco ACI fabric and the firewalls, with OSPF as the routing protocol.

The ASA cluster solution is better suited for an active-active architecture, because north-south communication flows through the local ASA nodes for IP subnets that are present at only one of the sites. When an ASA cluster is used, the cluster-control-link (CCL) VLAN is extended through the DCI links. For IP subnets that exist at both sites, if traffic entering through Site 1 needs to be sent to a host in Data Center 2, intracluster forwarding keeps the flows symmetrical.
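The cluster bootstrap on each ASA unit is a short configuration block. The Python sketch below pushes a minimal, hypothetical version of it with the netmiko library; the unit name, addresses, CCL port channel, and priority are placeholders, and the full admin-context configuration validated for this design is documented later in this guide.

    from netmiko import ConnectHandler

    asa_unit = {
        "device_type": "cisco_asa",
        "host": "10.0.0.31",              # placeholder management address of the first ASA in DC1
        "username": "admin",
        "password": "password",
        "secret": "enablepassword",
    }

    asa_cluster_config = [
        # The cluster interface mode (spanned or individual) is set separately with
        # "cluster interface-mode ..." before this block, according to the chosen design.
        "cluster group DC-CLUSTER",
        " local-unit asa-dc1-1",
        # The CCL below (port-channel1) rides on the VLAN extended across the DCI links
        " cluster-interface port-channel1 ip 192.168.255.1 255.255.255.0",
        " priority 1",                    # lower value = higher priority in the master election
        " enable",
    ]

    with ConnectHandler(**asa_unit) as conn:
        conn.enable()
        print(conn.send_config_set(asa_cluster_config))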

For inter-EPG filtering, Cisco ACI contracts were used during the validation process. A stateful firewall can also be
used for inter-EPG communication, but this option is beyond the scope of this document.

The ASA uses the Open Shortest Path First (OSPF) peering shown in the reference topology between the ASA
firewalls and the WAN edge routers to learn about the external networks and to advertise to the WAN edge devices
the subnets that exist in the Cisco ACI fabric. A detailed discussion of packet flow is provided in the section
“Logical Traffic Flow” later in this document.

Between the WAN edge routers and the WAN, the reference design uses eBGP because it provides demarcation
of the administrative domain and provides the option to manipulate routing policy.

The Cisco ACI dual-fabric design supports multitenancy. In the WAN edge routers, Virtual Routing and Forwarding
(VRF) provides logical isolation between the tenants, and within each VRF instance an OSPF neighborship is
established with the ASA firewall. In the ASA firewall, multiple contexts (virtual firewalls) are created, one per
tenant, so that the tenant separation is preserved. Tenant separation is maintained by creating multiple tenants in
the Cisco ACI fabric and extending Layer 3 connectivity to the firewall layer by using per-tenant (VRF) logical
connections (Layer 3 outside [L3Out] connections). Per-tenant eBGP sessions are also established between the Cisco ACI fabrics, effectively creating multiple parallel eBGP sessions between the fabrics in a VRF-lite model over the DCI extension.

Cisco Data Center Interconnect


To meet disaster-avoidance and workload-mobility requirements, Layer 2 domains (VLANs) need to be extended across different Cisco ACI fabrics. The simple requirement of routing between sites must also be met. Unlike with other networking approaches, Cisco ACI allows establishment of Layer 3 connectivity over vPC, using a Layer 3 dynamic routing protocol. The solution proposed in this document has been unified to use direct eBGP peering between the Cisco ACI fabrics over a VLAN offered by the DCI. The DCI network between the sites is then a Layer 2-enabled transport, used both to extend Layer 2 connectivity and to enable the establishment of routing peering.

Three DCI options are proposed (Figure 5):

● One very simple option, limited to dual-site deployments, uses vPC. In this case, the border leaf switches of both fabrics are simply connected back to back using either dark fiber or DWDM connections.
● The second option uses the most popular DCI technology: OTV. It still uses vPC to connect to the fabric, but it uses a Layer 3 routed connection over the core network.
● The third option is still emerging. It uses VXLAN technology to offer Layer 2 extension services across sites.

Figure 5. Layer 2 Extension Options

Both OTV and VXLAN allow you to interconnect more than two sites together. This document focuses on
interconnection of two sites, but technically you can connect more sites (if you have more than two sites, contact
your account team to be sure that you have full support).

Whatever technology is chosen for the interconnection, the DCI function must meet a set of requirements.
Remember that the aim of DCI is to allow transparency between sites with high availability: that is, to allow open
Layer 2 and Layer 3 extension while helping ensure that a failure in one data center is not propagated to another
data center.

To meet this goal, the main technical requirement is the capability to control Layer 2 broadcast, unknown unicast,
and multicast flooding at the data-plane level while helping ensure control-plane independence.

Layer 2 extension must be dual-homed for redundancy, but without allowing the creation of end-to-end Layer 2
loops that can lead to traffic storms that can overflow links and saturate the CPUs of switches and virtual
machines.

Thus, DCI deployments must also complement support for Layer 2 extension with storm control (Figure 6).

Figure 6. Intersite Storm Control

The storm-control rate limiter must be tuned to a value that is determined mainly by how much flooded traffic the virtual machine CPUs can absorb. This value also depends on the servers used and on the ratio of physical to virtual servers, because each virtual machine receives the broadcast flow. A good starting point is to rate-limit broadcast, unknown unicast, and multicast traffic to 100 Mbps, which is 1 percent of a 10-Gbps link.

Cisco ACI border leaf nodes can rate-limit broadcast, unknown unicast, and multicast traffic at ingress from the DCI vPC. This rate limiting is granular, allowing a different limit to be applied to each type of traffic:

● Broadcast traffic must be strictly limited, because it is the type of traffic that most heavily loads the CPUs.
● Unknown unicast traffic also must be strictly limited. Under normal conditions, the amount of this type of traffic should be small, because Address Resolution Protocol (ARP) exchanges cause remote MAC addresses to be learned.
● Layer 2 multicast traffic is more difficult to limit, because some applications use it. If possible, you should limit the intersite multicast traffic, because this traffic can reach the CPUs of multiple virtual machines and switches at the same time. However, you must verify, in each specific network environment, the amount of legitimate multicast traffic that is needed, to avoid degrading the applications that rely on it.

Note: For more information about how to configure storm control on the APIC, please refer to this link:
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_KB_Configuring_Traffic_Storm_Control
_in_APIC.html.
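A hedged sketch of the corresponding APIC-side configuration is shown below: it creates a storm-control interface policy through the REST API, with values that approximate the 1 percent guideline above. The class name stormctrlIfPol is the storm-control policy object, but the policy name, the DN, and the attribute names and units (percentage versus packets per second, and per-traffic-type rates in newer releases) are assumptions to verify against the APIC version in use.

    import requests

    APIC = "https://apic-site1.example.com"       # hypothetical
    session = requests.Session()
    session.verify = False
    session.post(APIC + "/api/aaaLogin.json",
                 json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

    # Storm-control interface policy: limit flooded traffic to roughly 1 percent of the
    # link bandwidth. Attribute names are assumptions to be checked per APIC release.
    storm_policy = {
        "stormctrlIfPol": {
            "attributes": {
                "name": "DCI-StormControl",
                "rate": "1.0",            # allowed flooded traffic, as a percentage of link speed
                "burstRate": "1.0"        # burst allowance, as a percentage of link speed
            }
        }
    }
    session.post(APIC + "/api/mo/uni/infra/stormctrlifp-DCI-StormControl.json", json=storm_policy)

    # The policy is then referenced from the vPC interface policy group that faces the DCI
    # devices, so that it is enforced at ingress on the border leaf ports.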

Another consideration is the speed of the DCI links. In a metropolitan environment, with short distances between
the sites, an active-active data center configuration, and live migration of virtual machines between the sites, the
amount of bandwidth can be high, and two or more 10-Gbps connections may be needed. In the case of disaster
recovery, with a long distance between the data centers, the amount of bandwidth is dictated mainly by the time
needed to replicate the data.

In addition, the Layer 2 transport must adapt to the core transport, which can be Multiprotocol Label Switching
(MPLS), IP, or DWDM, helping ensure fast convergence, flapping protection, and if possible, path diversity to allow
the heartbeats of server clusters to be transported on different physical paths.

Latency considerations are also part of the general DCI analysis. In the testing performed for the architecture
presented in this document, no latency was added between sites, but the basic assumption is that high latency is
supported because both Cisco ACI fabrics are totally independent from each other, and the only control planes
between them use either BGP or learning bridges (both supported over long distances).

Therefore, the latency considerations are not specific to the Cisco ACI dual-fabric model but instead are relative to
the other components of the solution. For example, in general virtual machine live migration is subject to limitations
introduced by the hypervisor. Traditionally, the maximum supported latency has been a round-trip time (RTT) of 10
milliseconds (ms), but this capability is evolving along with hypervisor-vendor recommendations (for example,
VMware supports RTT of up to 100 ms starting with vSphere Release 6.0).

An important consideration for applications is the location of the application data, with latency recommendations
based on storage replication. With asynchronous replication, there is no real limit, and so the limit depends on
disaster-recovery needs. Synchronous replication, however, has a strict limitation that depends on the deployed
technology. For example, EMC VPLEX and NetApp MetroCluster solutions support a maximum RTT latency limit of
10 ms.

Other types of clustering solutions deployed across data center sites may introduce similar limitations. For
example, server cluster extension is in general limited to 10 ms, but this limitation is evolving among cluster
vendors. Cluster extension considerations also apply to the deployment of active-active firewall solutions over
separate sites. Starting with Cisco ASA Software Release 9.5(1), ASA clusters are supported over two sites
deployed with 20 ms of RTT latency.

You must be sure to assess the impact that latency may have on applications deployed across data center sites.
You must especially be sure to assess the impact between the application tier and the database tier. In
deployments in which the latency between sites increases too much, the best approach usually is to deploy all
application tiers at the same site with local storage. In this case, you also will likely want applications to use
network services (such as firewalls and load balancers) that are locally deployed at the same site.

When planning DCI deployments, you also need to consider path optimization. The goal of this optimization is to
attract traffic from the WAN directly to the data center in which the requested resource is deployed. Two
technologies can be used for this optimization: Cisco Locator/ID Separation Protocol (LISP) and host-based
routing. Both methods can help ensure optimal inbound traffic delivery to endpoints that are part of IP subnets that
are extended across separate data center sites. Integration of these technologies with a Cisco ACI dual-fabric
design is not described in this document, and the access from the WAN analyzed in the “WAN Connectivity”
section of this document is traditional access with a firewall.

vPC as DCI Transport


In a very simple approach, two Cisco ACI fabrics can be directly connected back to back. As shown in Figure 7, on
each side, one pair of border leaf nodes can use a back-to-back vPC connection to extend Layer 2 and Layer 3
connectivity across sites. Unlike traditional vPC deployments on Cisco Nexus platforms, with Cisco ACI you don’t
need to create a vPC peer link or a peer-keepalive link between the border leaf nodes. Instead, those peerings are
established through the fabric.

Figure 7. Cisco ACI Dual-Fabric Design with Back-to-Back vPC

You can use any number of links to form the back-to-back vPC, but for redundancy reasons, two is the minimum,
and this is the number validated in this document.

This dual-link vPC can use dark fiber. It can also use DWDM, but only if the DWDM transport offers high quality of
service. Because the transport in this case is ensured by Link Aggregation Control Protocol (LACP), you should not
rely on a link that offers only three 9s (99.9 percent) or less resiliency. In general, private DWDM with high
availability is good enough.

When using DWDM, you need to keep in mind that loss of signal is not reported. With DWDM, one side may stay
up while the other side is down. Cisco ACI allows you to configure Fast LACP to detect such a condition, and the
design reported in this document validates this capability to achieve fast convergence.

OTV as DCI Transport


OTV is a MAC-in-IP technique that supports Layer 2 VPNs to extend LANs over any transport. The transport can
be Layer 2 based, Layer 3 based, IP switched, label switched, or anything else as long as it can carry IP packets.
By using the principles of MAC address routing, OTV provides an overlay that enables Layer 2 connectivity
between separate Layer 2 domains while keeping these domains independent and preserving the fault-isolation,
resiliency, and load-balancing benefits of an IP-based interconnection.

The core principles on which OTV operates are the use of a control protocol to advertise MAC address reachability
information (instead of using data-plane learning) and packet switching of IP encapsulated Layer 2 traffic for data
forwarding. OTV can be used to provide connectivity based on MAC address destinations while preserving most of
the characteristics of a Layer 3 interconnection.

Before MAC address reachability information can be exchanged, all OTV edge devices must become adjacent to
each other from an OTV perspective. This adjacency can be achieved in two ways, depending on the nature of the
transport network that interconnects the various sites. If the transport is multicast enabled, a specific multicast
group can be used to exchange control protocol messages between the OTV edge devices. If the transport is not
multicast enabled, an alternative deployment model is available starting from Cisco NX-OS Software Release
5.2(1). In this model, one OTV edge device (or more) can be configured as an adjacency server to which all other
edge devices register. In this way, the adjacency server can build a full list of the devices that belong to a given
overlay.

An edge device forwards Layer 2 frames into and out of a site over the overlay interface. There is only one
authoritative edge device (AED) for all MAC unicast and multicast addresses for each given VLAN. The AED role is
negotiated, on a per-VLAN basis, among all the OTV edge devices that belong to the same site (that is, that are
characterized by the same site ID).

The internal interface facing the Cisco ACI fabric can be a vPC on the OTV edge device side. However, the
recommended attachment model uses independent port channels between each AED and the Cisco ACI fabric, as
shown in Figure 8.

Figure 8. Connecting the Cisco ACI Fabric to a Pair of OTV Devices

Each OTV device defines a logical interface, called a join interface, that is used to encapsulate and decapsulate
Layer 2 Ethernet frames that need to be transported to remote sites.

OTV requires a site VLAN, which is assigned on each edge device that connects to the same overlay network.
OTV sends local hello messages on the site VLAN to detect other OTV edge devices in the site, and it uses the
site VLAN to determine the AED for the OTV-extended VLANs. Because OTV uses the IS-IS protocol for these hellos, the Cisco ACI fabric must run Software Release 11.1 or later. This requirement exists because previous releases prevented the OTV devices from exchanging IS-IS hello messages through the fabric.

Note: An important benefit of the OTV site VLAN is the capability to detect a Layer 2 back door that may be
created between the two Cisco ACI fabrics. To support this capability, you should use the same site VLAN on both
Cisco ACI sites.
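The following Python sketch, using the netmiko library, pushes the kind of OTV edge configuration described above to a Cisco Nexus OTV device (for example, a Cisco Nexus 7000 Series switch). It uses the adjacency-server (unicast-only) model so that no multicast is needed in the transport; the device address, site identifier, site VLAN, join interface, and extended VLAN range are placeholders, and the CLI should be verified against the NX-OS release in use.

    from netmiko import ConnectHandler

    otv_edge = {
        "device_type": "cisco_nxos",
        "host": "10.0.0.11",                  # placeholder address of the first OTV edge device in Site 1
        "username": "admin",
        "password": "password",
    }

    otv_config = [
        "feature otv",
        "otv site-identifier 0x1",            # shared by both OTV edge devices of this site
        "otv site-vlan 10",                   # same site VLAN recommended on both Cisco ACI sites (see the note above)
        "interface Overlay1",
        " otv join-interface Ethernet1/1",    # Layer 3 interface toward the IP transport
        " otv adjacency-server unicast-only", # this device acts as the adjacency server
        " otv extend-vlan 100-110",           # VLANs stretched between the Cisco ACI fabrics
        " no shutdown",
    ]

    with ConnectHandler(**otv_edge) as conn:
        print(conn.send_config_set(otv_config))

The other edge devices would instead point to this node with the "otv use-adjacency-server <ip> unicast-only" command, and the interfaces along the transport path need their MTU raised to absorb the 50-byte OTV overhead discussed later in this section.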

One of the main requirements of every LAN extension solution is Layer 2 connectivity between remote sites without
compromising the advantages of resiliency, stability, scalability, etc. obtained by interconnecting sites through a
routed transport infrastructure. OTV achieves this goal through four main functions:

● Spanning-tree isolation
● Unknown unicast traffic suppression

● ARP optimization
● Layer 2 broadcast policy control
OTV also offers a simple command-line interface (CLI), or it can easily be set up using a programming language
such as Python.

Because Ethernet frames are carried across the transport infrastructure after OTV encapsulation, you need to
consider the size of the maximum transmission unit (MTU).

Figure 9. OTV Encapsulation

As shown in Figure 9, OTV encapsulation increases the overall MTU size by 50 bytes. Consequently, you should
increase the MTU size of all the physical interfaces along the path between the source and destination endpoints to
account for those additional 50 bytes. An exception can be made when you are using the Cisco ASR 1000 Series
Aggregation Services Routers as the OTV platform, because these routers do support packet fragmentation.

Note: The figure above shows the new OTV encapsulation available on the Cisco Nexus 7000 Series Switches starting from Cisco NX-OS Software Release 7.2 and on F3-Series line cards.

In summary, OTV is designed for DCI, and it is still considered the most mature and functionally robust solution for
extending multipoint Layer 2 connectivity over a generic IP network. In addition, it offers native functions that allow
a stronger DCI connection and increased independence of the fabrics.

VXLAN as DCI Transport


VXLAN, one of many available network virtualization overlay technologies, is an industry-standard protocol and
uses underlay IP networks. It extends Layer 2 segments over a Layer 3 infrastructure to build Layer 2 overlay
logical networks. It encapsulates Ethernet frames in IP User Datagram Protocol (UDP) headers and transports the
encapsulated packets through the underlay network to the remote VXLAN tunnel endpoints (VTEPs) using the
normal IP routing and forwarding mechanism.

VXLAN has a 24-bit virtual network identifier (VNI) field that theoretically allows up to 16 million unique Layer 2
segments in the same network. Although the current network software and hardware limitations reduce the usable
VNI scale in actual deployments, the VXLAN protocol by design has at least lifted the 4096-VLAN limitation of the traditional IEEE 802.1Q VLAN namespace. VXLAN removes this limitation by decoupling Layer 2 domains from
the network infrastructure. The infrastructure is built as a Layer 3 fabric that doesn’t rely on Spanning Tree Protocol

for loop prevention or topology convergence. The Layer 2 domains reside on the overlay, with isolated broadcast
and failure domains.

The VTEP is a switch (physical or virtual) that originates and terminates VXLAN tunnels. The VTEP encapsulates
the end-host Layer 2 frames within an IP header to send them across the IP transport network, and it decapsulates
VXLAN packets received from the underlay IP network to forward them to local end hosts. The communicating
workloads are unaware of the VXLAN function.

VXLAN is a multipoint technology and can allow the interconnection of multiple sites. In the solution proposed in
this document, a VXLAN standalone network simply offers Layer 2 extension services to the Cisco ACI fabrics.
This Layer 2 DCI function is used both to stretch Layer 2 broadcast domains (IP subnets) across sites and to
establish Layer 3 peering between Cisco ACI fabrics to support routed communication.

As shown in Figure 10, logical back-to-back vPC connections are used between the Cisco ACI border leaf nodes
and the local pair of VXLAN DCI devices. Both DCI devices use a peer link between each other and connect to the
fabric border leaf nodes using either two or four links. Any edge VLAN is then connected to a VXLAN segment that
is transported using only one VNI (also called the VXLAN segment ID).

Figure 10. Using VXLAN as the DCI Option on Cisco Nexus 9000 in NX-OS Mode

The transport network between VTEPs can be a generic IP network. Unicast Layer 2 frames are encapsulated in
unicast Layer 3 VXLAN frames sent to the remote VTEP (both remote VXLAN devices advertise themselves in the
VXLAN network as a single anycast VTEP logical entity), and the packet is delivered to one of the remote DCI
nodes, with load balancing and backup. This backup is managed by the underlay routing protocol at the
convergence speed of this protocol. In the tests conducted for this document, BGP was used in conjunction with
Bidirectional Forwarding Detection (BFD) for fast convergence, but any other routing protocol, such as OSPF or IS-
IS, can also be used.

Layer 2 broadcast, unknown unicast, and multicast frames must be delivered across the VXLAN network. Two
options are available to transport this multidestination traffic:

● Use multicast in the underlay Layer 3 core network. This is the optimal choice when a high level of Layer 2
multicast traffic is expected across sites.
● Use head-end replication on the source VTEP to avoid any multicast requirement to the core transport
network. This is the option that is validated in this document.
VXLAN can also rate-limit broadcast, unknown unicast, and multicast traffic; as shown previously in Figure 6, this capability should be used in conjunction with the Cisco ACI storm-control capabilities.

VXLAN uses BGP with an Ethernet VPN (EVPN) address family to advertise learned hosts. The BGP design can
use edge-to-edge BGP peering, which is the best choice for a dual site, or it can use a route reflector if the network
is more complex, in which case Internal BGP (iBGP) can be used. VXLAN can provide Layer 2 and Layer 3 DCI
functions, both using BGP to advertise the MAC address, IP host address, or subnet connected. As previously
mentioned, in this document VXLAN is used as a pure Layer 2 DCI, and no Layer 3 option is used. The Layer 3 peering is established fabric to fabric over a dedicated VLAN carried in the VXLAN Layer 2 overlay.
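To make this behavior more concrete, the sketch below uses netmiko to push a minimal Layer 2-only VXLAN EVPN configuration, with BGP host reachability and ingress (head-end) replication, to one of the Nexus 9300 DCI devices. The VLAN, VNI, loopback, ASN, route-target, and neighbor values are placeholders, and the exact CLI (including BFD for the underlay BGP session) should be validated against the deployed NX-OS release.

    from netmiko import ConnectHandler

    vxlan_dci = {
        "device_type": "cisco_nxos",
        "host": "10.0.0.21",                       # placeholder address of a Nexus 9300 DCI node
        "username": "admin",
        "password": "password",
    }

    vxlan_config = [
        "nv overlay evpn",
        "feature bgp",
        "feature nv overlay",
        "feature vn-segment-vlan-based",
        "vlan 100",
        " vn-segment 10100",                       # maps the edge VLAN to a single VNI, as described above
        "interface nve1",
        " host-reachability protocol bgp",         # BGP EVPN advertises the learned MAC addresses
        " source-interface loopback1",             # anycast VTEP address shared by the local DCI pair
        " member vni 10100",
        "  ingress-replication protocol bgp",      # head-end replication; no multicast needed in the core
        " no shutdown",
        "router bgp 65101",
        " neighbor 10.255.255.2 remote-as 65102",  # DCI node at the remote site (placeholder address and ASN)
        "  address-family l2vpn evpn",
        "   send-community extended",
        "evpn",
        " vni 10100 l2",
        "  rd auto",
        "  route-target import 65535:10100",       # explicit route-targets so that both sites match
        "  route-target export 65535:10100",
    ]

    with ConnectHandler(**vxlan_dci) as conn:
        print(conn.send_config_set(vxlan_config))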

VXLAN is by nature a multipoint technology, so it can offer multisite connection.

One interesting VXLAN option is the capability to perform ARP suppression. Because VXLAN advertises both
Layer 2 MAC addresses and Layer 3 IP addresses and masks at the same time, the remote node can reply to ARP
locally without the need to flood the ARP request through the system.

Note: ARP suppression in VXLAN fabrics to extend only Layer 2 (and not Layer 3) connectivity is not supported
at the time of this writing, so it was not configured in validating this design.

Dual-Fabric Layer 2 and Layer 3 Connectivity


Deploying a Cisco ACI fabric allows you to transparently provide Layer 2 and Layer 3 communication between
endpoints connected to the Cisco ACI leaf nodes. This connectivity is achieved by deploying logical networks on
top of a purely routed fabric, using VXLAN tunnels established between Cisco ACI leaf nodes. The forwarding
behavior of the Cisco ACI fabric resembles that of a traditional network: Cisco ACI leaf nodes forward traffic based
on the destination MAC address (intrasubnet traffic) or IP address (intersubnet traffic). Bridge semantics are
preserved for traffic within a subnet (no time-to-live [TTL] decrement, no MAC address header rewrite, etc.).

The use of VXLAN starts and terminates at the Cisco ACI leaf layer and should not be confused with the potential
use of VXLAN as DCI technology as discussed in the previous section. When deploying a Cisco ACI dual-fabric
design, you thus must extend Layer 2 and Layer 3 reachability for endpoints connected to those separate fabrics
across the physical infrastructure that interconnects the sites.

Layer 2 Reachability Across Sites


You can achieve Layer 2 connectivity between data center sites by deploying any of the Layer 2 DCI technologies
described previously in the section “Cisco Data Center Interconnect” that allow a Layer 2 domain (usually
associated with a VLAN) to be extended between sites. To allow Layer 2 connectivity between endpoints
connected across independent Cisco ACI fabrics, you then must create a common Layer 2 broadcast domain that
spans the entire system (not only between the sites).

In a Cisco ACI fabric, a Layer 2 broadcast domain is represented by a logical entity called a bridge domain. In
Cisco ACI, a bridge domain is a Layer 2 forwarding construct used to constrain broadcast and multicast traffic.

Endpoints are always part of a bridge domain (BD). You can, however, group these endpoints into subgroups,
called endpoint groups, or EPGs, defined within a bridge domain. Each EPG can belong to only a given bridge
domain, but multiple EPGs can be part of the same bridge domain, as shown in Figure 11.

Figure 11. Multiple EPGs Can Be Part of the Same Bridge Domain

This association of endpoints into separate EPGs allows you to isolate them (for security policy enforcement) even
when they belong to the same Layer 2 domain.

The validated solution discussed in this document used a network-centric approach in which EPGs were mapped
to bridge domains one to one, as shown in Figure 12.

Figure 12. One-to-One Mapping Between Bridge Domains and EPGs

Workloads belonging to the same EPG are allowed to communicate freely with each other. The information about
the security group for specific endpoints is carried within the Cisco ACI fabric in a specific field in the VXLAN
header. As a consequence, this information is lost when the packets are decapsulated by the Cisco ACI border leaf
nodes and sent to the DCI connection.

Layer 2 reachability thus must be extended between endpoints across separate Cisco ACI fabrics to help ensure
that EPGs are correctly mapped to bridge domains at each data center site. Because a separate APIC cluster is
deployed at each site, this mapping must be configured independently, and it must be configured consistently.
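Because each APIC cluster is configured separately, one simple way to keep this mapping consistent is to drive both controllers from the same script. The fragment below is a sketch that creates the same one-to-one bridge domain and EPG pair on a given APIC and is then run against the APIC of each site; the tenant, VRF, application profile, and object names are placeholders.

    import requests

    def apic_login(apic, user, pwd):
        """Return a requests session authenticated to the given APIC (sketch)."""
        s = requests.Session()
        s.verify = False                 # lab sketch only
        s.post(apic + "/api/aaaLogin.json",
               json={"aaaUser": {"attributes": {"name": user, "pwd": pwd}}})
        return s

    def create_bd_and_epg(session, apic, tenant, vrf, bd, app, epg):
        """Create a bridge domain and its one-to-one mapped EPG on one APIC (sketch)."""
        payload = {
            "fvTenant": {
                "attributes": {"name": tenant},
                "children": [
                    {"fvBD": {"attributes": {"name": bd},
                              "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": vrf}}}]}},
                    {"fvAp": {"attributes": {"name": app},
                              "children": [{"fvAEPg": {
                                  "attributes": {"name": epg},
                                  "children": [{"fvRsBd": {"attributes": {"tnFvBDName": bd}}}]}}]}}
                ]
            }
        }
        session.post(apic + "/api/mo/uni/tn-" + tenant + ".json", json=payload)

    # Push the same names to both sites so that the VLAN = bridge domain = EPG relationship stays aligned
    for apic in ("https://apic-site1.example.com", "https://apic-site2.example.com"):
        s = apic_login(apic, "admin", "password")
        create_bd_and_epg(s, apic, "Tenant1", "VRF1", "BD1", "App1", "EPG1")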

Traffic originating from an endpoint belonging to a given EPG leaves the Cisco ACI fabric through a VLAN hand-off
performed by the Cisco ACI border leaf nodes. This process occurs independent of the specific DCI technology
deployed to interconnect the sites. Mapping an EPG on each site to a common VLAN ID allows you to provide
Layer 2 adjacencies between endpoints that are part of those security groups, as shown in Figure 13. It also allows
each site to classify the incoming packets into the correct EPG; in other words, the VLAN ID is used on ingress to determine the EPG membership of traffic coming from the other site.

Figure 13. Static VLAN-to-EPG Mapping for Layer 2 Reachability Across Sites

A static VLAN-to-EPG mapping is defined on the border leaf nodes to help ensure that the VLAN=bridge
domain=EPG equation is kept consistent on both sites. The result is a logical end-to-end extension of the Layer 2
broadcast domain, which allows the two endpoints to become Layer 2 adjacent even when connecting to separate
Cisco ACI fabrics. This extension also allows support for live migration (or vMotion, to use the VMware
nomenclature) of endpoints across the sites.

Note: The example in Figure 13 shows the use of the same EPG and bridge domain names on both Cisco ACI
fabrics (EPG1 and BD1). This approach is recommended to simplify the design from an operational point of view.
However, note that the main requirement for achieving Layer 2 adjacency between the endpoints is that you map
the same VLAN tag to an EPG on each side and then verify that the endpoints are connected to those specific
EPGs (independent of the specific names they may have).
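A minimal sketch of this static mapping on one fabric follows; it is posted through the APIC REST API and binds the EPG to the vPC that faces the DCI devices with a specific VLAN encapsulation. The vPC policy group name, pod and node IDs, VLAN ID, and tenant, application profile, and EPG names are placeholders, and the equivalent binding with the same VLAN ID must also be configured on the other fabric.

    import requests

    APIC = "https://apic-site1.example.com"       # hypothetical APIC of Site 1
    session = requests.Session()
    session.verify = False
    session.post(APIC + "/api/aaaLogin.json",
                 json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

    # Statically bind EPG1 to the vPC toward the DCI devices, using VLAN 100 as the
    # hand-off encapsulation. The same VLAN ID classifies returning traffic into EPG1.
    binding = {
        "fvRsPathAtt": {
            "attributes": {
                "tDn": "topology/pod-1/protpaths-101-102/pathep-[DCI_vPC_PolGrp]",  # placeholder vPC policy group
                "encap": "vlan-100",
                "mode": "regular",                # tagged hand-off toward the DCI devices
                "instrImmediacy": "immediate"
            }
        }
    }
    session.post(APIC + "/api/mo/uni/tn-Tenant1/ap-App1/epg-EPG1.json", json=binding)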

From a traffic flow perspective, Layer 2 communication between endpoints connected to separate fabrics is
achieved as shown in Figure 14.

Figure 14. Layer 2 Communication Between Endpoints

Traffic originating at EP1 is sent VXLAN-encapsulated across Cisco ACI Fabric 1, until it reaches the border leaf
nodes in Fabric 1. At that point, the traffic is de-encapsulated, and the VLAN hand-off to the DCI devices occurs.
The DCI devices in Fabric 1 perform the VLAN extension to the DCI devices in Fabric 2, which then forward the
VLAN-encapsulated traffic to the border leaf nodes of Site 2. After the border leaf nodes in Cisco ACI Fabric 2
receive the traffic, they classify it to the correct EPG based on the incoming VLAN ID, then they perform VXLAN re-
encapsulation, and the traffic is sent to the specific leaf node to which the destination is connected.

As a result of the traffic flow depicted in Figure 14, Cisco ACI Fabric 1 discovers EP2 as a local device connected
to the vPC local port on the border leaf nodes, and the opposite happens in Cisco ACI Fabric 2 (in which EP1 is
locally discovered on the border leaf nodes). This process is an important consideration for endpoint scalability,
because the mapping database in the spine devices of each fabric will need to maintain information about all the
local endpoints plus the endpoints in the remote fabric that belong to Layer 2 segments that are stretched across
sites. Keep in mind that the number of endpoints that can be stored in the spine node mapping database depends
mainly on the type of Cisco Nexus 9000 Series platforms deployed in that role, as shown in Figure 15.

Figure 15. Scalability of Cisco ACI Spine Node Mapping Database

Note: You can mix different types of platforms as spine nodes in the same Cisco ACI fabric. However, keep in mind
that the endpoint scalability value is the minimum common denominator of all the deployed switch models. For the
latest scalability numbers, please refer to the ACI Verified Scalability Guides available at
http://www.cisco.com/c/en/us/support/cloud-systems-management/application-policy-infrastructure-controller-
apic/tsd-products-support-series-home.html

To allow Layer 2 communication between endpoints deployed in separate Cisco ACI fabrics, several configurations
are needed:

● ARP flooding must be enabled in the bridge domains defined on the two Cisco ACI fabrics. This
configuration is required when EP1, for example, is trying to send an ARP request to EP2, but EP2 has not
yet been discovered by the border leaf nodes in Fabric 1. Enabling ARP flooding in Fabric 1 helps ensure
that the ARP request can be sent to Fabric 2 across the DCI connection, so that EP2 can receive it and
respond.

● Layer 2 unknown unicast flooding should be enabled in the bridge domains defined on the two Cisco ACI
fabrics. This configuration is needed in the event that the local border leaf nodes lose MAC address
information about remote endpoints that are part of the extended Layer 2 domain. A local endpoint (EP1)
may still have valid ARP information in its local cache and hence may still be creating data traffic directed to
the remote EP2 device. If you don’t enable unknown unicast flooding, the data traffic will be dropped at the
spine layer, because no information is available for the remote MAC address destination.

Note: For a more detailed description of the step-by-step packet flow, see the section “Cisco ACI Dual-Fabric:
Deployment Details.”
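The two bridge domain settings above correspond to the arpFlood and unkMacUcastAct attributes of the fvBD object and can be applied with a small REST call such as the following sketch, repeated on the APIC of each site; the tenant and bridge domain names are the placeholders used in the earlier examples.

    import requests

    APIC = "https://apic-site1.example.com"       # hypothetical; repeat against the Site 2 APIC as well
    session = requests.Session()
    session.verify = False
    session.post(APIC + "/api/aaaLogin.json",
                 json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

    # Enable ARP flooding and Layer 2 unknown unicast flooding on the stretched bridge domain
    bd_update = {
        "fvBD": {
            "attributes": {
                "name": "BD1",
                "arpFlood": "yes",            # flood ARP requests so they can cross the DCI
                "unkMacUcastAct": "flood"     # flood unknown unicast instead of relying on the spine proxy
            }
        }
    }
    session.post(APIC + "/api/mo/uni/tn-Tenant1/BD-BD1.json", json=bd_update)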

You also need to safely extend Layer 2 between data center sites to avoid the risk of creating end-to-end Layer 2
loops. Cisco ACI offers three main built-in functions to handle the creation of looped topologies, as shown in
Figure 16.

Figure 16. Cisco ACI Fabric Loopback Protection

These functions, and their impact on the specific dual-fabric design discussed in this document, are as follows:

● Link-Layer Discovery Protocol (LLDP) loop protection: Every time two ports of different leaf nodes that
belong to the same Cisco ACI fabric are connected together, the exchange of LLDP packets causes the
connection to be disabled. The connection is disabled because no leaf-to-leaf connections are ever allowed
for leaf nodes that belong to the same fabric. Note that the connection is not disabled when you connect two leaf nodes of different Cisco ACI fabrics, as is required when you deploy back-to-back vPC as a DCI solution.

● Spanning-tree loop detection: From a spanning-tree point of view, the Cisco ACI fabric is considered to be
like a wire. Therefore, when a Layer 2 device connected on the south side of the fabric originates a
spanning-tree BPDU frame, the Cisco ACI leaf node receiving the BPDU will forward it to all the other leaf
nodes that have local ports and that are part of the same EPG. Assuming that the EPG-to-VLAN mapping is
consistent on all the leaf nodes, this behavior allows the Layer 2 devices connected to the fabric to detect
and block a Layer 2 loop (as shown in the example in Figure 16).
In the context of the dual-fabric design discussed here, the recommended approach is to use vPC
connections between the border leaf nodes and the DCI devices. If you use back-to-back vPC or VXLAN as
DCI technologies, a single vPC logical connection is used on the Cisco ACI border leaf nodes, so a Layer 2
loop cannot be created. If you instead use OTV for DCI, two separate vPC connections are established
between the border leaf nodes and the local OTV devices, as shown earlier in Figure 8.
OTV has embedded capabilities to prevent the creation of data-plane end-to-end Layer 2 loops, helping
ensure that only one local OTV device is handling the Layer 2 traffic (unicast, multicast, and broadcast)
associated with each extended VLAN segment. This feature is very important because spanning-tree
BPDUs are not forwarded across the OTV logical connection, so those loops cannot be detected at the
control-plane level. A Layer 2 spanning-tree loop could be created, however, if you mistakenly connect two
local OTV devices. As shown in Figure 17, the Cisco ACI spanning-tree loop detection mechanism will help
ensure that those BPDUs are looped back toward the OTV devices, allowing them to break the loop.

Figure 17. Detecting a Layer 2 Loop Created by Local OTV Devices

● Miscabling Protocol (MCP) loop detection: This function is the latest addition to the set of Cisco ACI
loopback protections (available with Cisco ACI Software Release 11.1 and later). It detects Layer 2 loops
created southbound of the Cisco ACI fabric by sending MCP probes out from edge Layer 2 ports. Reception
of these probes (which are Layer 2 multicast frames) on a Layer 2 port on a different (or even on the same)
Cisco ACI leaf node indicates that a loop has been created south of the Cisco ACI fabric, and the interface
will be disabled. This mechanism can provide protection for the same scenarios as described for spanning-
tree loop detection. In addition, it can break Layer 2 loops that do not result in the generation of spanning-
tree BPDU frames. A typical example is the deployment of a firewall in transparent (or bridged) mode that
loops traffic between its interfaces because of a misconfiguration.

Note: MCP loop detection can be enabled at the interface level. It is disabled by default.

For the dual-fabric design, MCP can be used to detect local loops, as shown previously in Figure 17. MCP
cannot be used on Layer 2 ports that connect two separate Cisco ACI fabrics together, because a leaf
device in one Cisco ACI fabric always drops MCP frames that originate from a leaf device that belongs to a
separate Cisco ACI fabric (because of a unique identifier that is added in the MCP packet).
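A minimal sketch of how MCP might be enabled through the APIC REST API is shown below; it assumes the global MCP instance policy plus a reusable interface-level policy, with the key value as a placeholder. The interface-level policy still needs to be referenced from the relevant interface policy groups, and exact object names and required attributes may vary by software release.

POST https://<apic>/api/mo/uni/infra.xml

<infraInfra>
  <!-- Global MCP instance policy: enables MCP fabric-wide and sets the shared key -->
  <mcpInstPol name="default" adminSt="enabled" key="MyMcpKey"/>
  <!-- Interface-level MCP policy, to be referenced by the interface policy groups applied to edge ports -->
  <mcpIfPol name="MCP-Enabled" adminSt="enabled"/>
</infraInfra>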

Layer 3 Reachability Across Sites


The considerations described in the previous section apply to Layer 2 segments that are stretched across two
Cisco ACI fabrics. In the validated solution discussed in this document, each of those Layer 2 broadcast domains is
associated with a unique IP subnet. IP subnet information is associated with a bridge domain and with a Layer 3
multitenant construct called a VRF (or private network). The relationship between the EPG, bridge domain, and
VRF instances is shown in Figure 18.

Figure 18. Tenants, Private Networks, Bridge Domains, and EPGs

Note that a tenant is only a logical container of VRF instances, bridge domains, and EPGs that is usually defined
for administrative purposes (the administrator of each tenant usually has rights to modify only constructs that apply
to the administrator’s dedicated environment).

Note: For more information about multitenancy support in the Cisco ACI fabric, see the section “Multitenancy
Support.”

For the specific dual-fabric design under discussion, the relationship between these logical constructs is shown in
Figure 19.

Figure 19. Tenants, Private Networks, Bridge Domains, and EPGs for Dual-Fabric Design

In this case, a single private network (VRF instance) is defined for each tenant. All the bridge domains are
associated with the same VRF instance. Also, a single IP subnet is defined in each bridge domain, leading to a
one-to-one mapping between IP subnets and Layer 2 broadcast domains (a common networking practice).

This design raises a question, however: If endpoints that are part of the same IP subnet can be deployed across
separate Cisco ACI fabrics, where is the default gateway used when traffic needs to be routed to endpoints that
belong to different IP subnets?

Cisco ACI uses the concept of an anycast gateway: that is, every Cisco ACI leaf node can function as the default
gateway for the locally connected devices. When you deploy a dual-fabric design, you will want to use the anycast
gateway function across the entire system independent of the specific fabric to which an endpoint connects.
Figure 20 shows this model.

Figure 20. Pervasive Default Gateway Used Across Separate Cisco ACI Fabrics

The goal is to help ensure that a given endpoint always can use the local default gateway function on the Cisco
ACI leaf node to which it is connected. To support this model, each Cisco ACI fabric must offer the same default
gateway, with the same IP address (common virtual IP address 100.1.1.1 in the example) and the same MAC
address (common virtual MAC address). The latter is specifically required to support live mobility of endpoints
across different Cisco ACI fabrics, because with this approach the moving virtual machine preserves in its local
cache the MAC and IP address information for the default gateway.

Note: The capability to have the default gateway active on multiple sites requires ACI software release 1.2(1i) or
later.
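The sketch below illustrates, under stated assumptions, how this common gateway can be expressed on the bridge domain through the APIC REST API: the same virtual MAC address and virtual IP address (100.1.1.1 in the example above) are configured identically in both fabrics, while each fabric keeps a unique bridge domain MAC address and a unique physical gateway IP address. The MAC values and the physical IP address are hypothetical placeholders; verify the exact attributes against the APIC object model for your release.

<fvTenant name="Tenant1">
  <!-- mac is unique per fabric; vmac is the common virtual MAC configured identically in both fabrics -->
  <fvBD name="BD-Web" mac="00:22:BD:F8:19:F1" vmac="00:22:BD:AA:AA:AA">
    <!-- Unique physical gateway IP address for this fabric -->
    <fvSubnet ip="100.1.1.2/24"/>
    <!-- Common virtual gateway IP address configured identically in both fabrics -->
    <fvSubnet ip="100.1.1.1/24" virtual="yes"/>
  </fvBD>
</fvTenant>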

When routing must be performed from an endpoint connected to a Cisco ACI fabric, two scenarios are possible:

● The destination is an internal IP subnet. In the specific dual-fabric design discussed here, an internal IP
subnet is either a subnet that is only locally defined in the same Cisco ACI fabric to which the endpoint
belongs, or it is an IP subnet stretched across Cisco ACI fabrics (using the Layer 2 DCI capabilities to
extend Layer 2 broadcast domains).
● The destination is an external IP subnet. In this case, the IP subnet is either defined only in the remote Cisco ACI fabric (that is, not stretched across sites) or located in the WAN, and it is therefore considered part of an external Layer 3 network domain.
For routed communication between two endpoints connected to different Cisco ACI fabrics, the main difference in
these two scenarios is in the way that routed traffic is sent to the destination endpoint:

● If the destination endpoint is connected to an internal IP subnet, routing to the destination IP subnet is
performed in Cisco ACI Fabric 1, and the Layer 2 connection that stretches the bridge domain between data
center sites sends the traffic to the destination, as discussed earlier in the section “Layer 2 Reachability
Across Sites.” Figure 21 shows this scenario.

Figure 21. Routing to an Endpoint Connected to an Internal IP Subnet

● If the destination endpoint is connected to an external IP subnet, a Layer 3 routing path must be defined
between the two Cisco ACI fabrics. As shown in Figure 22, this approach helps ensure that traffic can be
routed across the DCI connection to reach the remote destination endpoint.

Figure 22. Routing to an Endpoint Connected to an External IP Subnet

You can create the Layer 3 peering between the two Cisco ACI fabrics shown in Figure 22 using an L3Out logical function enabled on the border leaf nodes in each site. Note that when you run Cisco Nexus 9000 Series platforms in Cisco ACI mode, you can establish dynamic Layer 3 peering over a vPC connection (a function that at the time of this writing is not supported when you deploy Cisco Nexus 9000 Series platforms in NX-OS standalone mode). As a consequence, the validated Cisco ACI dual-fabric design uses the same vPC logical interface both to bridge Layer 2 traffic and for routed communication with the remote site. With this approach, the vPC logical interface on the border leaf nodes is associated with the L3Out interface, and a specific VLAN is carried across the DCI connection to allow establishment of Layer 3 dynamic peering between the border leaf nodes.
Figure 23 shows the validated use of eBGP sessions between the border leaf nodes that belong to separate Cisco ACI fabrics; a configuration sketch of this L3Out follows the note after the figure.

Figure 23. Use of eBGP Control Plane between Cisco ACI Fabrics

Note: The same design model shown in Figure 23 is applicable independent of the DCI technology used to
extend Layer 2 domains between fabrics (vPC back-to-back, OTV, Virtual Private LAN Service [VPLS], VXLAN,
etc.).
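The following is a minimal, abbreviated sketch of such an L3Out defined through the APIC REST API on the Fabric 1 border leaf nodes. The tenant, VRF, and interface policy group names, the node IDs, the VLAN ID, the IP addresses, and the BGP AS number are hypothetical placeholders; an external EPG (shown in a later sketch) and a contract are also required before traffic can flow, and the exact attributes may vary by APIC release.

<fvTenant name="Tenant1">
  <l3extOut name="L3Out-DCI">
    <l3extRsEctx tnFvCtxName="VRF1"/>
    <!-- Enables the BGP control plane on this L3Out -->
    <bgpExtP/>
    <l3extLNodeP name="border-leafs">
      <l3extRsNodeL3OutAtt tDn="topology/pod-1/node-101" rtrId="101.101.101.101"/>
      <l3extRsNodeL3OutAtt tDn="topology/pod-1/node-102" rtrId="102.102.102.102"/>
      <l3extLIfP name="vpc-to-dci">
        <!-- SVI over the same vPC used to bridge Layer 2 traffic toward the DCI devices -->
        <l3extRsPathL3OutAtt tDn="topology/pod-1/protpaths-101-102/pathep-[vPC_DCI]"
                             ifInstT="ext-svi" encap="vlan-500">
          <l3extMember side="A" addr="10.50.1.1/29"/>
          <l3extMember side="B" addr="10.50.1.2/29"/>
          <!-- eBGP peering with the border leaf nodes of the remote fabric -->
          <bgpPeerP addr="10.50.1.4">
            <bgpAsP asn="65002"/>
          </bgpPeerP>
        </l3extRsPathL3OutAtt>
      </l3extLIfP>
    </l3extLNodeP>
  </l3extOut>
</fvTenant>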

Policy Consistency Across Cisco ACI Fabrics


A basic principle of Cisco ACI is that connectivity information (Layer 2 or Layer 3) alone is not enough to allow
communication between two endpoints. If the endpoints belong to two separate security groups (EPGs), a policy is
also required for them to successfully communicate.

Application of security policies (called contracts) between endpoints connected to a Cisco ACI fabric is a two-step process:

● First, the policy must be created and applied to (at least) two EPGs (one is called the provider of the policy,
the other is called the consumer).
● After the policy is in place, it can be applied to control communication between endpoints belonging to the
two different EPGs. For this mechanism to work, the endpoints must be classified and associated with the
correct EPG.
Association of an endpoint with a specific EPG is achieved by mapping the port to which the endpoint is connected
(which can be a physical or logical interface: for example, a vPC interface, a VLAN or a VXLAN ID) to that EPG.
This mapping can be static (when classifying traffic from physical endpoints: bare-metal servers, routers, switches,
etc.) or dynamic (when creating a VMM domain as the result of establishing a relationship between the APIC and
the VMM; for more information about this topic, see the section “Deploying Hypervisor Integration”).

As previously discussed, in the Cisco ACI dual-fabric design communication can be established between endpoints
connected to separate Cisco ACI fabrics. Because independent APIC clusters manage those fabrics, you must
help ensure that Layer 2 and Layer 3 traffic is properly classified at the point of entrance into the fabric, as
discussed in the following two sections.

Policy Consistency for Layer 2 Communication


As previously discussed, Layer 2 traffic originating from a remote Cisco ACI fabric is received by the Cisco ACI leaf
nodes of the local fabric through the Layer 2 DCI connection. This traffic normally carries a specific VLAN tag that
identifies the Layer 2 broadcast domain to which it belongs.

In the Cisco ACI dual-fabric design, one-to-one mapping is performed between those VLAN tags and the
corresponding EPGs. This approach allows traffic flows to be associated with the proper security group (EPG),
achieving a logical EPG extension across data center sites, as shown in Figure 24.

Figure 24. Contract Relationship with Static Binding

In the example in Figure 24, Web1 and App1 EPGs are defined on the APIC that manages Cisco ACI Fabric 1,
together with the contract (security policy) C1 that governs the communication between endpoints that belong to
those EPGs. Web2 and App2 are EPGs defined on the APIC that manages Cisco ACI Fabric 2, and a C2 contract
(security policy) is applied between them. In a scenario in which the two application tiers are stretched between the
two separate sites (with the goal of being able to freely move across data centers the workloads that belong to
those tiers), you must help ensure that endpoints that belong to the Web1 EPG in Fabric 1 are treated as part of
the same extended EPG as the endpoints that belong to the Web2 EPG defined on the APIC in Fabric 2.

An endpoint belonging to EPG Web1 and trying to communicate with an endpoint in the extended application EPG
(represented by the App1 EPG deployed in Fabric 1 and the App2 EPG deployed in Fabric 2) is subject to the C1
security policy defined on the APIC in Cisco ACI Fabric 1. This is the case independent of whether the application
endpoint is connected to Fabric 1 EPG App1 or Fabric 2 EPG App2, because routing between the web and
application IP subnets happens locally inside Cisco ACI Fabric 1, and traffic is then bridged across sites with the
VLAN tag (VLAN 301) associated with the application EPG.
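The following sketch shows how this might be expressed on the APIC of Fabric 1 through the REST API: the Web1 and App1 EPGs are statically bound to the vPC toward the DCI devices using the VLAN tags that extend each EPG (VLAN 301 for the application tier, as in the example above, and an assumed VLAN 300 for the web tier), and contract C1 is provided and consumed between them. The application profile, bridge domain, path, and filter names are hypothetical placeholders, and this is only a minimal sketch (domain associations and local endpoint bindings are omitted).

<fvTenant name="Tenant1">
  <vzFilter name="web-to-app-ports">
    <vzEntry name="tcp-80" etherT="ip" prot="tcp" dFromPort="http" dToPort="http"/>
  </vzFilter>
  <vzBrCP name="C1">
    <vzSubj name="web-to-app">
      <vzRsSubjFiltAtt tnVzFilterName="web-to-app-ports"/>
    </vzSubj>
  </vzBrCP>
  <fvAp name="App-Profile">
    <fvAEPg name="Web1">
      <fvRsBd tnFvBDName="BD-Web"/>
      <!-- Static binding of VLAN 300 on the vPC toward the DCI devices extends this EPG across sites -->
      <fvRsPathAtt tDn="topology/pod-1/protpaths-101-102/pathep-[vPC_DCI]" encap="vlan-300"/>
      <fvRsCons tnVzBrCPName="C1"/>
    </fvAEPg>
    <fvAEPg name="App1">
      <fvRsBd tnFvBDName="BD-App"/>
      <fvRsPathAtt tDn="topology/pod-1/protpaths-101-102/pathep-[vPC_DCI]" encap="vlan-301"/>
      <fvRsProv tnVzBrCPName="C1"/>
    </fvAEPg>
  </fvAp>
</fvTenant>

A mirrored configuration (Web2, App2, and contract C2, with the same VLAN tags and consistent filters) would be created on the APIC of Fabric 2, either manually or through an orchestrator.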

Note in the example in Figure 24 that return traffic from the endpoint in EPG App2 to the endpoint in EPG Web1 is
subject to the C2 contract (security policy) because the application endpoint is connected to Fabric 2. For this
reason, the two policies must be configured consistently to avoid unexpected behavior depending on the Cisco ACI
fabric to which the endpoints are physically connected. Manual synchronization is possible but is an operationally
complex and tedious process. Use of an orchestrator that communicates with both APIC clusters to properly
configure Cisco ACI parameters is the recommended approach, as discussed in the section “Cisco UCS Director and Cisco ACI Dual-Fabric Design.”

Also, this asymmetric behavior across the two fabrics does not represent a problem because Cisco ACI contracts
are stateless, so there is no need to see both legs of the same communication to allow traffic, as would be the case
with a L4-L7 stateful firewall implementation.

Note that when you define a specific filter entry associated with a Cisco ACI contract, you can enable a Stateful
option. This option is used to program a reflective access control list (ACL) in the hardware to allow TCP packets
only if the acknowledgment (ACK) flag is set, and it does not perform a true stateful inspection. This behavior is
shown in Figure 25, in which a stateful Cisco ACI contract is configured between EPG A and EPG B to allow only
traffic destined for port 80.

Figure 25. Use of Stateful Contract Policy

Enabling the Stateful flag creates a second entry that permits traffic from an endpoint in EPG B to travel to an
endpoint in EPG A (sourced from port 80) only if the ACK flag is set in the TCP header. The asymmetric traffic path
shown in Figure 24 does not create problems even when this stateful behavior is enabled.
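As a reference, the Stateful option maps to an attribute of the filter entry, as in the minimal sketch below (the filter and entry names are hypothetical); with this entry, return traffic sourced from port 80 is permitted only when the TCP ACK flag is set, as described above.

<vzFilter name="http-stateful">
  <vzEntry name="tcp-80" etherT="ip" prot="tcp" dFromPort="http" dToPort="http" stateful="yes"/>
</vzFilter>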

Policy Consistency for Layer 3 Communication


The need for policy consistency applies also to Layer 3 communication between separate Cisco ACI fabrics. There
are two scenarios to consider:

In the first scenario, in line with the approach taken in this document, you deploy separate EPGs in separate bridge
domains. In this case, endpoints that belong to different EPGs are also part of separate IP subnets, as shown in
Figure 26.

Figure 26. One EPG per Bridge Domain with a Unique IP Subnet

In this case, traffic can be classified into a specific security group (EPG) based simply on the IP subnet of the
sourcing endpoint. In the example in Figure 26, when endpoint 172.10.1.1, part of the Web1 EPG in Fabric 1,
sends traffic to Fabric 2, this traffic can be mapped to an Ext-Web1 EPG associated with the L3Out logical
construct defined on the border leaf nodes in Fabric 2. The external EPG associated with the L3Out interface is
used to model external Layer 3 networks that try to communicate with resources internal to the Cisco ACI fabric.

From a policy point of view, a specific security contract must always be provided between the external EPGs
associated with L3Out connections and the internal EPG to which the destination endpoint belongs (in this
example, App2 in Fabric 2). The main requirement here is that this contract must be consistent with the contract
used when a web endpoint tries to communicate with a local application endpoint (in other words, C1 and C2
should be consistent). This requirement helps ensure consistent policy enforcement of intrafabric and interfabric
communication between different application tiers.
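A minimal sketch of this classification on the Fabric 2 APIC is shown below: an external EPG associated with the L3Out toward Fabric 1 matches the web IP subnet (assumed here to be 172.10.1.0/24, consistent with the 172.10.1.1 endpoint in Figure 26) and consumes contract C2, which the internal App2 EPG provides. The L3Out and contract names reuse those from the earlier sketches and remain hypothetical placeholders.

<l3extOut name="L3Out-DCI">
  <!-- Traffic entering Fabric 2 from the remote web subnet is classified into this external EPG -->
  <l3extInstP name="Ext-Web1">
    <l3extSubnet ip="172.10.1.0/24" scope="import-security"/>
    <fvRsCons tnVzBrCPName="C2"/>
  </l3extInstP>
</l3extOut>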

Note that communication between an endpoint that is part of the Web1 EPG and an endpoint that is part of the
App2 EPG (shown in Figure 26) is always subject to two contracts: The C1 policy controls communication in Cisco
ACI Fabric 1 between the internal Web1 EPG and the external EPG Ext-App2. The C2 contract governs
communication in Fabric 2 between the Ext-Web1 EPG and the internal App2 EPG.

In the second scenario, multiple EPGs are associated with the same bridge domain. In the example shown in
Figure 27, in which a single IP subnet is associated with the bridge domain, endpoints that belong to different
EPGs are part of the same IP subnet.

Figure 27. Multiple EPGs per Bridge Domain

In this case, you cannot map an IP prefix to an External EPG because two specific IP addresses that are part of
the same IP subnet prefix may refer to endpoints that belong to different EPGs. A more precise approach is thus
required, in which you use the specific host route information to map the Layer 3 traffic from Fabric 1 to the proper
external EPG defined in Fabric 2.

Note: This second scenario is not discussed in the rest of this paper.

Hypervisor Integration
To provide tight integration between physical infrastructure and virtual endpoints, Cisco ACI can integrate with
hypervisor management servers (VMware vCenter, Microsoft SCVMM, and OpenStack are available options at the
time of this writing). These hypervisor management stations are usually referred to as virtual machine managers, or
VMMs. You can create one (or more) VMM domains by establishing a relationship between the VMM and the APIC
controller.

For more detailed information about APIC integration with vCenter and SCVMM, please refer to the following links:

● http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-
x/virtualization/b_ACI_Virtualization_Guide/b_ACI_Virtualization_Guide_chapter_010.html
● http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-
infrastructure/guide-c07-735992.html
In the dual-fabric solution, separate APIC clusters are deployed to manage different Cisco ACI fabrics; hence,
different VMM domains are created in separate sites. Depending on the specific deployment use case, you may
want to allow endpoint mobility across data center sites, which requires moving workloads across VMM domains.
At the time of this writing, the only possible solution is to integrate the APIC with VMware vSphere Release 6.0,
because this release introduces support for live migration between VMware ESXi hosts managed by different
vCenter servers (Figure 28).

Figure 28. Live Migration across vCenter Servers

Cisco ACI Release 11.2 introduces support for integration with vCenter 6.0, so it is the minimum recommended
release needed to support live migration across the dual-fabric deployment. Note that Cisco ACI Release 11.2
supports live mobility only when the native VMware vSphere Distributed Switch (DVS) virtual switch is used.
Starting with the next Cisco ACI release, support will be extended to deployments using the Cisco Application
Virtual Switch (AVS) on top of vSphere.

Note: For more information about the use of AVS with vSphere and the integration with Cisco ACI, please refer
to the following document: http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-
centric-infrastructure/white-paper-c11-731999.html

L4-L7 Service Integration Models


If you want to integrate L4-L7 services in a Cisco ACI deployment, you can select from the following options:

● Managed integration using a service graph with a device package: In this case, the Cisco ACI fabric
provides both connectivity and configuration of the L4-L7 device using the device package. The device
package normally is provided by the vendor and consists of a packaged set of scripts that the APIC uses to
configure the device. It is similar to a device driver in a typical operating system.
● Unmanaged integration using a service graph only, without a device package required: In this model, the
Cisco ACI fabric provides only connectivity to and from the L4-L7 device. The configuration of the device is
left to the end user and can be performed either using native tools provided by the manufacturer of the
device or using an orchestration tool for the device. This approach does not require the use of device
packages, which allows you to integrate essentially all third-party network service functions, even when a
device package from the specific vendor is not available.
● No integration, with traffic to and from the L4-L7 appliance mapped manually: This deployment model
doesn’t use either a service graph or a device package. Traffic is mapped to and from the L4-L7 appliance
either on the basis of Layer 2 bridging or using Layer 3 routing. In other words, the network service device is
connected to the Cisco ACI fabric as a generic physical resource.

Data Center Firewall Deployment


Data center firewalls can be categorized into two types:

● Internal firewalls: These are used for east-west traffic between security zones inside a data center. In many
scenarios, you may be able to use the contracts-based security natively provided by the Cisco ACI fabric.
● External (perimeter) firewalls: These protect north-south communication in and out of the data center. This
type of communication often requires stateful traffic inspection, so external firewalls are commonly deployed
using dedicated security appliances.

For the purposes of this document and the validation process, a Cisco ASA firewall is used for the perimeter, and
Cisco ACI fabric contracts are used to protect the east-west traffic.

The perimeter firewall often also handles other types of communication in the organization, and a dedicated team
often manages it, so the design discussed here uses the “no integration” deployment model. From the viewpoint of Cisco ACI, the firewall is therefore an externally managed physical device.

Cisco ASA Deployment Models


The ASA platform provides a number of options for deploying a redundant solution across dual data centers.

Cisco ASA Active-Standby Deployment


With the active-standby deployment option, you place one firewall appliance in one data center (DC1) and another
appliance in a second data center (DC2). One of the appliances will be active (forwarding traffic), and the other will
be in standby mode. Both appliances communicate with each other, and all configuration and session information is kept synchronized, so in the event of a failure the standby appliance is ready to take over immediately. If
multitenancy support is required, each firewall can host multiple contexts: one context per tenant. You can run

different contexts as active on different appliances, providing load sharing (for example, you can have half of the
contexts active in one appliance, and half active in the other appliance), as shown in Figure 29.

Figure 29. Per-Context Active-Standby Deployment

The active ASA firewall (or, more precisely, the active context) peers with the Cisco ACI fabric on its internal interface. At the
same time, an OSPF peering is established between the outside interface and the WAN edge routers, using the
Cisco ACI fabric as pure Layer 2 transport for the establishment of this peering.

Consider the network subnets deployed in the scenario in Figure 29:

● Subnet A: Available only in DC1 (10.100.14.0/24)


● Subnet B: Available only in DC2 (10.200.14.0/24)
● Subnet C: Stretched across and available in both data centers (10.1.4.0/24)
● Subnet D: Represents an external Layer 3 destination in the WAN

Assume that all these IP subnets are associated with the same tenant using ASA Context 1 for communication with
the external network domain. The following occurs:

● Traffic from subnet A (DC1 only) leaving the fabric and destined for subnet D in the WAN uses the local
ASA in DC1. This traffic uses the optimal forwarding path.
● Traffic from subnet B (DC2 only) destined for the WAN uses the Layer 3 DCI connection between the
fabrics to get to DC1, where the active ASA is located. This traffic uses a suboptimal forwarding path. A
suboptimal path is also used for the return traffic from subnet D to subnet B.

● Traffic from subnet C destined for the WAN:
◦ Traffic originating from DC1 uses the local ASA in DC1. This traffic uses the optimal forwarding path.
◦ Some devices currently located in DC2 will have to use the Layer 3 DCI between data centers to reach
the active ASA in DC1. This traffic uses a suboptimal forwarding path.
Note that the return traffic from subnet D to subnets A, B, and C follows a path that is symmetric to that in the
outbound direction, because the only entry point into the fabric is the active firewall deployed in Site 1.

The introduction of a second deployment model, the ASA cluster, is required if the goal is to improve the north-
south forwarding behavior.

Cisco ASA Cluster Deployment


The cluster deployment model uses ASA firewalls running in cluster mode. Cluster mode allows multiple ASA
appliances to be grouped into a single logical unit. Cluster mode offers several benefits, including:

● Scalability up to 16 nodes
● Simple management using the master node as a central point of management for the whole cluster
● State sharing using the cluster control link (CCL)
● High availability

ASA clustering differs from the traditional active-standby deployment model. In cluster mode, every member of the
cluster has the same configuration, is capable of forwarding every traffic flow, and can be active for all flows.

In the event of a failure, connectivity is maintained through the cluster because the connection information is
replicated to at least one other unit in the cluster. Each connection has a replicated copy residing on a different cluster unit that takes over if a failure occurs.

The benefits of clustering over multiple data centers include the following:

● Some of the north-south traffic flows in a multisite data center can be asymmetrical or suboptimal (as
discussed in the previous section). Clustering features force asymmetrical flows to become symmetrical.
● Clustering provides transparent stateful live migration with automatic redirection of flows.
● Clustering provides consistent firewall configurations between data centers.
● New connections can be offloaded to other members of the cluster if the firewall is too busy.
● Clustering provides strong redundancy and disaster recovery in the event of an appliance or link failure.

For a detailed introduction to cluster mode, refer to the following:


http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/VMDC/ASA_Cluster/ASA_Cluster.html.

The design discussed in this document uses two physical ASA 5585-X appliances in each data center, for a total of
four devices in the cluster. Within each data center, firewalls are attached to the Cisco ACI fabric using vPC
technology. Each firewall uses two vPCs: one for the data traffic, and one for the Cluster Control Protocol (CCP)
traffic: the CCL, as shown in Figure 30.

Figure 30. Cisco ASA Cluster Deployment Model

This design uses multiple VLANs to bring the traffic in and out of the firewall through the data port channel. VLANs
are also used to access the multiple contexts of the firewall.

The firewalls are running in routed mode. Within each site, each firewall peers on its inside interface with the local
Cisco ACI fabric using a dynamic routing protocol (OSPF). On the outside interface, the firewall peers with the local
WAN edge routers through the Cisco ACI fabric (the fabric performs only Layer 2 transport functions in this specific
case).

A dedicated VLAN is used to provide the CCL connectivity between cluster nodes within each site. This VLAN is
also extended through the DCI link to the other data center, allowing all four firewalls to communicate and build the
cluster. This connectivity uses the CCL vPC out of the ASA.

Figure 31 presents the same example as previously discussed for the active-standby ASA option, but showing the
traffic-path-optimization benefits of an ASA cluster deployment.

Figure 31. Traffic Path with Cisco ASA Cluster Deployment

● Traffic from subnet A (DC1 only) leaving the fabric and traveling to subnet D in the WAN uses one of the
local ASA devices in DC1. This traffic uses the optimal forwarding path.
● Traffic from subnet B (DC2 only) leaving the fabric and traveling to the WAN uses one of the local ASA
devices in the data center. This traffic uses the optimal forwarding path.
● Traffic from stretched subnet C:
◦ Traffic originating from DC1 uses one of the local ASA devices in DC1. This traffic uses the optimal
forwarding path.
◦ Traffic originating from devices currently located in DC2 uses one of the local ASA devices in DC2. This traffic uses the optimal forwarding path.
As you can see, the cluster mode uses the optimal forwarding path in all situations for outbound traffic. The same
is true for traffic from the WAN to the nonstretched IP subnets (A and B). For stretched subnet C, the WAN will
likely learn the IP prefix information from both sites, so traffic may be steered to either Site 1 or Site 2. The
deployment of an ASA cluster helps ensure that this communication can succeed even when it results in
asymmetric behavior (for instance, outbound traffic to the WAN leaves from Site 1 and reenters through Site 2).
This approach works because the ASA cluster nodes can redirect flows through the CCL to the cluster node that
holds the state information for each specific flow.

Note: Several techniques (global site load balancer [GSLB], LISP, etc.) can be used to influence the inbound
traffic flows destined for stretched subnets to help ensure that they are delivered to the correct sites: the site to
which the specific destination endpoint is connected. With these techniques, you would not have to perform cluster
redirection within ASA devices, which is a suboptimal traffic behavior. These inbound path optimization options are
beyond the scope of this document.

Multitenancy Support
The Cisco ACI dual-fabric design discussed in this document fully supports multitenancy. This section provides an
overview of this support.

For simplicity, the example in Figure 32 shows one Cisco ACI fabric. However, the design and the multitenancy
considerations are the same for the other site (or other sites, if more than two are used).

Figure 32. Multitenancy Support

Logical isolation across different tenants is maintained end to end as described here for traffic coming from the
WAN and going to the data center. The same logic applies for the return traffic from the data center to the WAN.

● In the WAN (for traffic between a WAN router and WAN edge router), multitenancy is achieved through well-
known WAN multitenant solutions, such as MPLS Layer 3 VPN and VRF-lite. Figure 32 shows a VRF-lite
approach between the WAN edge router and the WAN router. In this case, the routers are interconnected
through an IEEE 802.1Q trunk with multiple Layer 3 subinterfaces, with each interface associated with a
different VRF instance. There is one eBGP session per VRF instance between the routers for route
distribution.
● To preserve multitenancy between the WAN edge router and the firewall, the WAN edge router is
connected to the Cisco ACI fabric with a Layer 3 port-channel interface, and Layer 3 subinterfaces are
created and allocated to different VRF instances (VRF-Lite approach). On the Cisco ACI fabric, on a per-
tenant basis, an EPG is created and statically bound to the vPC to which the WAN edge router is connected. This same EPG is also statically bound to the vPC connected to the ASA firewall. Therefore,
the Cisco ACI fabric provides a transit Layer 2–only EPG and bridge domain on a per-tenant basis for the
Layer 3 subinterface on the WAN edge router to communicate with the firewall context associated with the
same tenant. Between the WAN edge router and the firewall context, OSPF is used as the routing protocol
for the WAN edge router to advertise WAN subnets to the firewall and for the firewall to advertise the tenant
subnets to the WAN.
● The ASA in this design is configured in multiple-context routed mode to provide multitenancy support, and it
runs OSPF within the context with the WAN edge router. The ASA provides the firewall services for traffic
entering and leaving the Cisco ACI fabric and is inserted in the data path through routing. When the traffic
leaves the ASA and travels toward the Cisco ACI fabric or toward the WAN edge router, the traffic is tagged
with the VLAN ID that was configured on the Cisco ACI fabric. This tagging helps ensure that the traffic is
forwarded by the fabric within the designated tenant.
● As previously mentioned in the section “Layer 3 Reachability Across Sites,” multitenancy is natively
supported in the Cisco ACI fabric itself. All configurations that dictate traffic forwarding in Cisco ACI are part
of a tenant. The application abstraction demands that EPGs always be part of an application network
profile, and the relationship between EPGs through contracts can span application profiles within the same
tenant and even between tenants.
Bridge domains and routing instances, which move IP packets across the fabric, provide the transport infrastructure for the workloads defined in the EPGs. Within a tenant, you define one or more Layer 3
networks (VRF instances), one or more bridge domains per network, and EPGs to divide the bridge
domains.
● In the Cisco ACI dual-fabric design, each tenant is also configured with two L3Out logical connections for
routed connectivity to external networks. The external networks in this case are the WAN and the subnets
that exist exclusively in the remote data center. One of the L3Out interfaces provides the OSPF peering
between the fabric and the ASA firewall, and the fabric uses it to receive the WAN routes and to advertise
the internal (tenant) subnets to the ASA, which then advertises them to the WAN.
The second L3Out interface is configured on each tenant (and is called L3Out-DCI in Figure 32). This
interface is configured with eBGP and is used to establish BGP peering between the fabrics in the data
centers so that each fabric can advertise its local subnets to the other fabric. Here, a local subnet is a
subnet that, for example, exists only in Site 2 but still needs to be reachable from Site 1. This subnet is
considered an external subnet from the perspective of Site 1, and it is learned through the eBGP peering

established through the L3Out-DCI. This eBGP peering session is transported over the DCI links that
interconnect the two sites, and one session is established per tenant (VRF-lite configuration).

Cisco UCS Director and Cisco ACI Dual-Fabric Design


Cisco UCS® Director is a unified infrastructure management solution that provides management from a single
interface for computing, networking, storage, and virtualization layers (Figure 33).

Figure 33. Cisco UCS Director as a Multidomain Cloud Management Platform

Cisco UCS Director uses a workflow orchestration engine with workflow tasks that support the computing,
networking, storage, and virtualization layers. Cisco UCS Director supports multitenancy, which enables policy-
based and shared use of the infrastructure.

Cisco UCS Director integrates with Cisco ACI by communicating with an APIC cluster. When Cisco UCS Director
establishes a connection with the APIC, it discovers all infrastructure elements in the Cisco ACI fabric.

To establish the connection from Cisco UCS Director to the APIC, you just provide the IP address of one of the
APIC controllers in the APIC cluster with a username and password. Cisco UCS Director will automatically discover
the IP address of other APIC nodes that are part of the same APIC cluster. If the IP address of the APIC that was
used to establish the connection goes down or is not reachable for 45 seconds, Cisco UCS Director tries to use
any reachable controller IP address to interact with the APIC cluster.

After Cisco UCS Director has established connection with the APIC, information about the Cisco ACI fabric
becomes available in Cisco UCS Director. A list of the information collected and displayed by Cisco UCS Director
is available in the document Cisco UCS Director Configuration Guide for Cisco ACI.

Cisco UCS Director can establish connection with one or more APIC clusters, including with APIC clusters
deployed at multiple sites as in the Cisco ACI dual-fabric design described in this document.

After Cisco UCS Director has established connection with multiple APIC clusters, it can deploy multitier
applications in one or more data centers and create Cisco ACI objects (EPGs, bridge domains, subnets, etc.) in
multiple APIC clusters simultaneously.

To deploy a multitier application in Cisco ACI, Cisco UCS Director creates an application profile or uses an existing
one. An application profile defines the following:

● Cisco ACI network tiers for delivering application resources for the associated tenant profile
● A suitable resource group, which defines the capacity and quality of the Cisco UCS physical, virtual,
computing, and storage resources for each application component
● Cisco ACI network services that are required to deliver the appropriate service quality and security for the
application

To perform automated provisioning of the Cisco ACI configuration, Cisco UCS Director uses workflows that consist
of tasks. Cisco UCS Director comes preconfigured with more than 200 workflow tasks specific to Cisco ACI, out of
a total of more than 1800 network, computing, and storage tasks. You can drag and drop the tasks to create the
desired workflow (Figure 34).

Figure 34. Cisco UCS Director Workflows

If you need some specific Cisco ACI tasks that are not available in the Cisco UCS Director library, you can use an
open-source tool to automatically generate Cisco UCS Director custom tasks for Cisco ACI automation. This tool is
available at https://github.com/erjosito/request and is shown in Figure 35.

Figure 35. Tool for Creating Cisco UCS Director Custom Tasks

You can use this tool to automatically generate WFDX files containing custom tasks that can be imported into
Cisco UCS Director. You need to capture the Cisco ACI representational state transfer (REST) calls (for example,
with the API inspector or the Save As function) and paste them in the tool.

You can then use these custom tasks to build more complex workflows by combining them with the preconfigured
Cisco ACI tasks.

In the Cisco ACI dual-fabric design discussed in this document, a single instance of Cisco UCS Director is
deployed, and it establishes connection with the APIC clusters in both data centers. After that, Cisco UCS
Director becomes the central and preferred platform for the provisioning of application network profiles, EPGs,
bridge domains, etc., as shown in Figure 36, because it keeps the configuration between the two APIC clusters
consistent. Cisco UCS Director discovers configuration changes performed directly in the APIC controllers as it
monitors the infrastructure for changes, and it reflects those changes in its object model. The configuration,
however, is not automatically synchronized with the other APIC cluster.

Figure 36. Cisco UCS Director Integration in a Cisco ACI Dual-Fabric Design

Cisco UCS Director includes several workflows for Cisco ACI provisioning, and these can easily be customized.
You can perform customization or create new workflows by yourself, or Cisco Advanced Services can assist with
the creation of the workflows.

A demonstration of the use of Cisco UCS Director with Cisco ACI to provision and deprovision a three-tier
application is available here.

Storage Considerations
The Cisco ACI dual-fabric design does not impose specific storage requirements. However, to achieve an active-
active architecture, you must consider the required storage architecture for your application.

For disaster recovery, asynchronous replication is usually acceptable. This approach allows data centers to be
physically located hundreds or even thousands of miles apart. If you use asynchronous replication, some amount
of data loss on application failover must be acceptable because not all data will be synchronized.

With synchronous replication, when disk I/O is performed by the application or by the file cache system on the
primary disk, the server waits for I/O acknowledgment from the local disk and from the remote disk before sending
an I/O acknowledgment to the application or to the file system cache. This mechanism allows failover without data
loss, but it limits the distance between the data centers. Usually, synchronous replication is limited to tens or
hundreds of miles and is commonly used within metropolitan areas.

For live migration of virtual machines, the storage infrastructure should provide host access to shared storage.
Therefore, during a vMotion operation, the migrating virtual machine must be on storage that is accessible to both
the source and target hypervisor hosts.

Depending on the hypervisor, live migration of virtual machines can be supported in environments without shared
storage. For example, the VMware vSphere 6.0 hypervisor can support this migration. In this case, you can use
vMotion to migrate virtual machines to different computing resources and storage devices simultaneously, which
means that you can migrate virtual machines across storage accessibility boundaries. This approach is useful for
performing cross-cluster and cross-data center migrations when the target cluster machines may not have access
to the source cluster’s storage. Note that vMotion migration in environments without shared storage takes
considerably longer than such migration in environments with shared storage.

Note: Recommendations about specific storage products and solutions for an active-active data center
architecture are beyond the scope of this document.

Cisco ACI Dual-Fabric: Deployment Details


This section of the document provides the details required to implement a low-level design and deployment plan for
a Cisco ACI dual-fabric design.

Validated Topology
Figure 37 shows the overall validated topology, its components, the software versions used, and the physical
connectivity. Subsequent subsections provide details about each specific area.

Figure 37. Cisco ACI Dual-Fabric Validated Topology

Cisco ACI Fabric
Each data center has its own Cisco ACI fabric with the Cisco Nexus 9000 Series used as leaf and spine switches
and three APIC controllers forming each APIC cluster per site.

The leaf switches are connected to the spine switches in a full bipartite graph, or Clos architecture. There are no
links between the leaf nodes or between the spine nodes.

The APIC controllers can be connected to any ports of the fabric leaf switches using 10 Gigabit Ethernet interfaces.
Each APIC has two 10-Gbps interfaces, and each interface should be connected to a different leaf switch. Each
APIC can be connected to a different pair or to the same pair of leaf switches. In the topology in Figure 37, all three
APICs are connected to the same pair of leaf switches (leaf 101 and leaf 102).

The Cisco ACI platform tested in this design guide uses these software releases:

● Cisco APIC Software Release 1.2(3c)


● Cisco Nexus 9000 Series Software Release n9000-11.2(3c)

Software releases later than the ones listed here also support the design discussed in this document.

Note: You should adjust the actual numbers and models of the leaf and spine switches and APIC controllers
based on the number of ports and bandwidth required in your specific deployment.

Firewalls
Two options for firewall configuration were validated: active-active failover and ASA cluster deployments.

For active-active failover, you divide the security contexts on the ASA into two failover groups. A failover group is
simply a logical group of one or more security contexts. One group is assigned to be active on the primary ASA,
and the other group is assigned to be active on the secondary ASA. When a failover occurs, it occurs at the failover
group level. In a Cisco ACI dual-fabric design using active-active failover, one ASA is located in data center 1, and
the other is located in data center 2; the ASAs are configured in routed mode with multiple contexts.

For the physical connection between the ASA and Cisco ACI, the ASAs connect to the Cisco ACI leaf switches
using an EtherChannel with LACP on the ASA side and a vPC on the Cisco ACI leaf switches. Separate
EtherChannels are used for the data and failover links. ASA does not allow user data and the failover link to share
interfaces, even if different subinterfaces are configured for user data and failover. The failover link and stateful
failover link (also known as the state link) share the same link, because this is the best way to reduce the number
of physical interfaces used.
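A minimal sketch of the corresponding failover configuration on the primary ASA (in the system execution space) is shown below. The failover EtherChannel name and number, the IP addresses, and the context-to-group assignments are hypothetical placeholders (the TnT-14 context name is reused from later in this document), and the secondary unit requires a matching configuration with "failover lan unit secondary".

failover lan unit primary
failover lan interface FOLINK Port-channel2
! The stateful failover (state) link shares the same EtherChannel as the failover link
failover link FOLINK Port-channel2
failover interface ip FOLINK 192.168.50.1 255.255.255.252 standby 192.168.50.2
failover group 1
 primary
failover group 2
 secondary
failover
!
! Context definition abbreviated; each context joins one of the two failover groups
context TnT-14
 join-failover-group 1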

The second firewall design validated in the Cisco ACI dual-fabric design uses ASA clustering. ASA clustering lets
you group multiple ASAs together as a single logical device. A cluster provides all the convenience of a single
device (management, integration into a network, etc.) while achieving the increased throughput and redundancy of
multiple devices.

ASA clustering is the preferred firewall option for the Cisco ACI dual-fabric design. In an ASA cluster, all the
firewalls actively forward traffic, allowing better utilization of the resources. Also, the cluster automatically manages
any asymmetric traffic, redirecting the traffic back to the specific ASA node that owns the connection.

In the Cisco ACI dual-fabric design with ASA clustering, each data center has two Cisco ASA 5585-X firewalls. Any
Cisco ASA firewall that supports ASA clustering can support the validated design. The four ASA firewalls present in

the topology are optimally configured in an all-active ASA cluster as detailed in the section “Cisco ASA Cluster
Integration” later in this document.

The ASA software release tested in this design guide is Cisco ASA Software Release 9.5(1).

For the physical connectivity to establish the ASA cluster, each ASA has two EtherChannels. One is used for data
traffic, and the other is used for the cluster control link, or CCL. As shown in Figure 38, the EtherChannel member
links are connected to two different Cisco ACI leaf switches, and a vPC is configured on the Cisco ACI fabric.

EtherChannels and vPC have built-in redundancy. If one link fails, traffic is rebalanced between remaining links. If
all the links in the EtherChannel fail on a particular device but other devices are still active, the failed device is
removed from the cluster.

The recommended way to connect ASAs to the Cisco ACI fabric is to use EtherChannels with LACP on the ASA
connected to a vPC with LACP on the Cisco ACI fabric.

Figure 38. Using EtherChannels on Cisco ASA Cluster Nodes

In the Cisco ACI dual-fabric design, the ASA cluster is configured in routed mode with multiple contexts using
individual interfaces. OSPF is used as the routing protocol for peering between the Cisco ACI fabric and the ASAs
and between the ASAs and the WAN edge routers using subinterfaces on the data EtherChannel to best utilize the
available interfaces.
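A minimal sketch of the clustering bootstrap on one ASA unit in DC1 is shown below; the member interfaces, unit name, CCL subnet, and cluster group name are hypothetical placeholders, and each of the four units needs an equivalent configuration with its own local-unit name, CCL IP address, and priority.

! Member links of the CCL EtherChannel (connected to a vPC on the local Cisco ACI leaf switches)
interface TenGigabitEthernet0/6
 channel-group 10 mode active
interface TenGigabitEthernet0/7
 channel-group 10 mode active
!
! Individual interface mode is used because the cluster runs in routed mode with multiple contexts
cluster interface-mode individual force
!
cluster group DC-CLUSTER
 local-unit DC1-ASA-1
 cluster-interface port-channel10 ip 10.254.14.1 255.255.255.0
 priority 1
 enable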

For the operation of the ASA cluster, each CCL has an IP address on the same subnet. This subnet should be
isolated from all other traffic and should include only the ASA CCL interfaces. In the case of the dual-fabric design,
the CCL must be extended between the data centers. To achieve this extension, the CCL links of all the devices
are placed in the same EPG and bridge domain in the Cisco ACI fabrics, and this EPG and bridge domain are
extended over the DCI links.

To help ensure CCL operation, the RTT between devices must be less than 20 milliseconds (ms). This maximum
latency enhances compatibility with cluster members installed on different sites. To check the latency, enter a ping
command on the CCL between devices.

For intersite clustering, you need to size the DCI bandwidth appropriately. You should reserve bandwidth on the
DCI for CCL traffic equivalent to the following calculation:

● (Number of cluster members per site/2) x CCL size per member

If the number of members differs at each site, use the larger number for your calculation.

The minimum bandwidth for the DCI should not be less than the size of the CCL for one member. For example, with four cluster members total at two sites (two members at each site) and a 5-Gbps CCL per member:

Reserved DCI bandwidth = 5 Gbps (2/2 x 5 Gbps)

For intersite clustering, do not configure connection rebalancing. You do not want connections rebalanced to
cluster members at a different site. The cluster implementation does not differentiate between members at multiple
sites for incoming connections. Therefore, connection roles for a given connection may span sites. This behavior is
expected.

Data Center Interconnect


In the Cisco ACI dual-fabric design, the sites are interconnected through DCI technologies so that Layer 2 and
Layer 3 communication can occur.

VXLAN and back-to-back vPC were validated as part of the Cisco ACI dual-fabric design. However, OTV is also
supported as a DCI option.

Note: At the time of this writing, OTV is still the recommended DCI solution because of its level of maturity and
the specific functions it offers in this environment.

Figure 39 shows the physical connectivity when VXLAN is used as the DCI option.

Figure 39. VXLAN as the DCI Option

A double-sided vPC is used between the Cisco ACI leaf switches and a pair of Cisco Nexus 9300 platform
switches used for DCI: that is, both the Cisco ACI leaf switches and the Cisco Nexus 9000 Series Switches running
in NX-OS mode run vPC. To provide increased resiliency and bandwidth, four links are bundled in the same vPC,
and LACP is used on the vPC on both sides.

The Cisco Nexus 9000 Series Switches running in NX-OS mode extend Layer 2 between the sites using VXLAN.
The software validated on those switches is Cisco NX-OS Release 7.0(3)I2(2a).

Between the Cisco Nexus 9000 Series Switches running in NX-OS mode, three links are recommended. Two links
are bundled in a port channel to form the vPC peer link. The third link (192.168.1.0/24 in the topology) is a Layer 3
routed link used to prevent a DCI network isolation scenario. In other words, if a Layer 3 uplink from the DCI
switches to the DCI Layer 3 network fails, then this link between the DCI switches is used as the alternative path to
protect the DCI switches from being isolated.

Note: Alternatively, you can create this Layer 3 peering between the Cisco Nexus 9000 Series Switches by
using a dedicated VLAN carried on the vPC peer link (with corresponding switch virtual interfaces [SVIs] defined on
the two switches).

To reach the other site, each DCI switch has a Layer 3 routed uplink to the WAN or DCI network that leads to the remote site (in the topology in Figure 39, the DCI WAN network is represented by the DCI router). The use of two
DCI routers is recommended, with each DCI switch connected to a different DCI router so that there is no single
point of failure.

Because the traffic between the sites is VXLAN encapsulated using unicast, one or multiple Layer 3 hops between
the sites are supported.
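The following is a minimal sketch of the flood-and-learn VXLAN configuration on one of the Cisco Nexus 9000 Series DCI switches running in NX-OS mode, using static ingress replication for the unicast transport between sites. The VLAN-to-VNI mapping, loopback address, and remote VTEP address are hypothetical placeholders; in a vPC pair, the two DCI switches would additionally share an anycast VTEP address (for example, a common secondary IP address on the loopback), and one VLAN/VNI pair is needed for each extended Layer 2 segment.

feature vn-segment-vlan-based
feature nv overlay
!
! Map the extended VLAN to a VXLAN network identifier (VNI)
vlan 301
  vn-segment 10301
!
interface loopback0
  ip address 10.10.10.1/32
!
! VTEP interface: unicast ingress replication toward the remote site VTEP
interface nve1
  no shutdown
  source-interface loopback0
  member vni 10301
    ingress-replication protocol static
      peer-ip 10.20.20.1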

WAN Connectivity
For connectivity from the Cisco ACI fabrics in the data centers to the WAN (remote sites), both data centers have a
local WAN exit point with one or more WAN edge routers.

For connectivity between the Cisco ACI fabrics and the WAN edge routers in each site, each WAN edge router
uses a local port channel to connect to both Cisco ACI border leaf nodes (LACP is used to allow dynamic bundling
of the physical links in the port channel).

The ports on the Cisco ACI fabric connected to the WAN edge router are statically bound to an EPG, hence providing Layer 2 services between the WAN edge routers and the ASA firewalls as well as connecting the WAN
edge routers in the two data centers through the DCI links. Figure 40 shows the physical connection.

Figure 40. Cisco ACI Fabric Connection to the WAN Edge Router

Any router that supports LACP, Layer 3 port channels, OSPF, BGP, and optionally VRF (when multitenancy is
required) can be used as the WAN edge router in the Cisco ACI dual-fabric design.
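As a reference, the sketch below shows one way the per-tenant WAN edge router configuration could look on a Cisco IOS router for tenant TnT-14, consistent with the routing output shown later in this document (OSPF process 14 toward the fabric and the ASA context, and BGP AS 100 toward the WAN in AS 300). The port-channel subinterface number, dot1Q VLAN, and BGP neighbor address are hypothetical placeholders.

vrf definition TnT-14
 address-family ipv4
 exit-address-family
!
! Layer 3 subinterface toward the Cisco ACI fabric (Layer 2 transit to the ASA context)
interface Port-channel1.1414
 encapsulation dot1Q 1414
 vrf forwarding TnT-14
 ip address 192.14.20.1 255.255.255.0
!
! OSPF peering with the ASA context; WAN routes learned from BGP are redistributed here
router ospf 14 vrf TnT-14
 redistribute bgp 100 subnets
 network 192.14.20.0 0.0.0.255 area 0
!
! eBGP peering with the WAN (one session per VRF); tenant subnets learned via OSPF are redistributed
router bgp 100
 address-family ipv4 vrf TnT-14
  redistribute ospf 14 match internal external 1 external 2
  neighbor 192.14.30.2 remote-as 300
  neighbor 192.14.30.2 activate
 exit-address-family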

From a logical perspective, when you connect the WAN edge router to the fabric, you are connecting a router to a
regular EPG port. In other words, the fabric is providing Layer 2 services between the WAN edge router and the
ASA firewall. No L3Out connection is established for this type of connectivity.

Note that when the WAN edge router is connected to a regular EPG port, Cisco Discovery Protocol and LLDP must
be disabled on the router or on the fabric port for the endpoint information to be learned, as shown in Figure 41.

Figure 41. Cisco Discovery Protocol and LLDP Required Configuration

Logical Traffic Flow


The test topology uses a number of test flows to validate the capabilities of the design. This document focuses on
two main types of traffic patterns:

● Traffic from the data center to the WAN (north-south traffic)


● Traffic between data centers (east-west traffic)

Traffic from DC1 to the WAN


In the first test flow, traffic originates from a Spirent traffic generator connected to Leaf3 in DC1 (10.100.14.0/24) and is sent to a WAN destination (192.14.99.0/24), as shown in Figure 42.

Figure 42. Traffic from DC1 to the WAN

The default gateway for the 10.100.14.0/24 subnet is located in the Cisco ACI fabric in DC1 (because this subnet is
defined only at this site). First consider the routing table in Leaf3, where the source Spirent port is located:
i06-9396-03# show ip route vrf TnT-14:TnT-14NET 192.14.99.0/24
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.14.99.0/24, ubest/mbest: 2/0


*via 10.0.192.95%overlay-1, [200/1], 1d17h, bgp-100, internal, tag 100
recursive next hop: 10.0.192.95/32%overlay-1
*via 10.0.192.91%overlay-1, [200/1], 1d17h, bgp-100, internal, tag 100
recursive next hop: 10.0.192.91/32%overlay-1

From the output, you can see two paths to the destination subnet advertised through BGP (this is the MP-BGP
VPNv4 process used by the Cisco ACI border leaf nodes to advertise external IP prefixes to the fabric). The next-
hop IP addresses are the VTEP addresses of Leaf1 and Leaf2.

Following the path to the destination, here is the routing table on Leaf1:
i06-9396-01# show ip route vrf TnT-14:TnT-14NET 192.14.99.0/24
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.14.99.0/24, ubest/mbest: 2/0


*via 192.14.31.102, vlan79, [110/1], 1d18h, ospf-default, type-2, tag
3489661028
*via 192.14.31.105, vlan79, [110/1], 1d18h, ospf-default, type-2, tag
3489661028

The next-hop IP addresses you see here belong to the two ASA devices in DC1 and are learned through OSPF. Based on the ECMP hash, the leaf node selects either ASA1 or ASA2; the ASA device that is selected becomes the flow owner.

Note: If there is any asymmetry in the return traffic, the ASA cluster will redirect the return flow through the CCL
back to the ASA node that owns the flow (the node that originally created state information for the outbound flow).

Following the path to the destination, here is the routing table on the first ASA node in DC1:
DC1-ASA-1_i05-5585-01/TnT-14/master# show route 192.14.99.0
Routing Table: TnT-14
Routing entry for 192.14.99.0 255.255.255.0
Known via "ospf 1", distance 110, metric 1
Tag Complete, Path Length == 1, AS 100, , type extern 2, forward metric 10

Last update from 192.14.20.1 on outside, 42:14:25 ago
Routing Descriptor Blocks:
* 192.14.20.1, from 192.14.1.1, 42:14:25 ago, via outside
Route metric is 1, traffic share count is 1
Route tag 3489661028

The next-hop IP address you see here belongs to the local WAN edge router in DC1, which communicates external
routing information to the ASA through OSPF.

Note: The test topology uses one customer edge router. However, for production environments you should
deploy a pair of routers to provide resiliency in the design.

Following the path to the destination, here is the routing table on the local WAN edge router:
le06-2911-01_DC1#show ip route vrf TnT-14 192.14.99.0

Routing Table: TnT-14


Routing entry for 192.14.99.0/24
Known via "bgp 100", distance 20, metric 0
Tag 300, type external
Redistributing via ospf 14
Advertised by ospf 14 subnets
Last update from 192.14.10.2 3d02h ago
Routing Descriptor Blocks:
* 192.14.10.2, from 192.14.10.2, 3d02h ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 300
MPLS label: none

The next-hop IP address on the path to the destination is the IP address of the WAN router. Notice that this information is learned through eBGP.

Finally, here is the routing table on the WAN router:


le06-2911-02_WAN#show ip route vrf TnT-14 192.14.99.0

Routing Table: TnT-14


Routing entry for 192.14.99.0/24, 2 known subnets
Attached (2 connections)
Variably subnetted with 2 masks
C 192.14.99.0/24 is directly connected, GigabitEthernet0/1/0.1238
L 192.14.99.1/32 is directly connected, GigabitEthernet0/1/0.1238

The destination subnet is locally attached on port Gi0/1/0.1238, which connects to the Spirent traffic generator.

This concludes the trace of traffic from DC1 to the WAN router. In a production deployment the WAN router will be
connected to a WAN network and will route the traffic to the remote sites.

Traffic from the WAN to DC1
For completeness, consider the routing information from the WAN back to DC1 (Figure 43).

Figure 43. Traffic from the WAN to DC1

Here is the routing information for the destination network in DC1 on the WAN router:
le06-2911-02_WAN#show ip route vrf TnT-14 10.100.14.0

Routing Table: TnT-14


Routing entry for 10.100.14.0/24
Known via "bgp 300", distance 20, metric 20
Tag 100, type external
Last update from 192.14.10.1 1d21h ago
Routing Descriptor Blocks:
* 192.14.10.1, from 192.14.10.1, 1d21h ago
Route metric is 20, traffic share count is 1
AS Hops 1
Route tag 100
MPLS label: none

The next hop points to the WAN edge router in DC1. The following shows the routing table on the WAN edge
router in DC1:
le06-2911-01_DC1#show ip route vrf TnT-14 10.100.14.0

Routing Table: TnT-14


Routing entry for 10.100.14.0/24
Known via "ospf 14", distance 110, metric 20, type extern 2, forward metric 11

Redistributing via bgp 100
Advertised by bgp 100 route-map OSPF-INTO-BGP
Last update from 192.14.20.102 on Port-channel10.1238, 1d21h ago
Routing Descriptor Blocks:
* 192.14.20.105, from 102.102.12.12, 1d21h ago, via Port-channel10.1238
Route metric is 20, traffic share count is 1
192.14.20.102, from 102.102.12.12, 1d21h ago, via Port-channel10.1238
Route metric is 20, traffic share count is 1

The destination network is reachable through two paths, represented by the ASA devices in DC1. Based on the ECMP hash, the router selects one of the ASAs as the next hop to the destination.

Note: As previously mentioned (and as shown in the example in Figure 43), for stateful traffic such as TCP, if
the flow owner is, for example, ASA1 in DC1, but the return traffic is hashed to ASA2, the firewall will use the CCL
to send the traffic back to the flow owner (ASA1).

Here is the routing table on ASA1:

DC1-ASA-1_i05-5585-01/TnT-14/master# show route 10.100.14.0

Routing Table: TnT-14


Routing entry for 10.100.14.0 255.255.255.0
Known via "ospf 1", distance 110, metric 20, type extern 2, forward metric 10
Last update from 192.14.31.12 on inside, 45:34:37 ago
Routing Descriptor Blocks:
* 192.14.31.12, from 102.102.12.12, 45:34:37 ago, via inside
Route metric is 20, traffic share count is 1
192.14.31.11, from 101.101.11.11, 46:00:56 ago, via inside
Route metric is 20, traffic share count is 1

The destination network is reachable through two paths: Leaf1 and Leaf2 of the Cisco ACI fabric in DC1. Based on the ECMP hash, the ASA selects one of these leaf nodes as the next hop to the destination. Here is the routing table on Leaf1:

i06-9396-01# show ip route vrf TnT-14:TnT-14NET 10.100.14.0


IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.100.14.0/24, ubest/mbest: 1/0, attached, direct, pervasive


*via 10.0.136.65%overlay-1, [1/0], 1d21h, static
recursive next hop: 10.0.136.65/32%overlay-1
via 192.14.31.12, vlan79, [110/20], 1d21h, ospf-default, type-2

Because the destination subnet is behind an anycast gateway, it is shown as directly attached, and the next-hop VTEP shown is Spine1. The Cisco ACI fabric handles the delivery of packets to the destination leaf and port using the endpoint information database.

The second destination (192.14.31.12) is a path to Leaf2 that Leaf1 learns from the OSPF peering used to
communicate with the ASA devices in DC1 (this path is not the preferred one and is not used).

You can trace the destination endpoint (the IP address behind the Spirent tester attached to Leaf3) from Leaf1 by
entering the following commands:
i06-9396-01# show system internal epm endpoint ip 10.100.14.101

MAC : 0000.0000.0000 ::: Num IPs : 1


IP# 0 : 10.100.14.101 ::: IP# 0 flags :
Vlan id : 0 ::: Vlan vnid : 0 ::: VRF name : TnT-14:TnT-14NET
BD vnid : 0 ::: VRF vnid : 2228225
Phy If : 0 ::: Tunnel If : 0x18010004
Interface : Tunnel4
Flags : 0x80004410 ::: sclass : 49183 ::: Ref count : 3
EP Create Timestamp : 01/13/2016 15:09:15.124249
EP Update Timestamp : 01/14/2016 17:27:33.225764
EP Flags : locally-aged,IP,sclass,timer,
::::

This command returns the details of the interface on which the destination endpoint is located (Tunnel4). After you
know the interface number, you can verify the destination VTEP:
i06-9396-01# show interface Tunnel4
Tunnel4 is up
MTU 9000 bytes, BW 0 Kbit
Transport protocol is in VRF "overlay-1"
Tunnel protocol/transport is ivxlan
Tunnel source 10.0.192.95/32 (lo0)
Tunnel destination 10.0.192.92
Last clearing of "show interface" counters never
Tx
0 packets output, 1 minute output rate 0 packets/sec
Rx
0 packets input, 1 minute input rate 0 packets/sec

The destination IP address is the VTEP IP address on Leaf3.

Traffic from the WAN to Data Center for Stretched Subnets


The preceding sections analyzed the traffic path for a subnet that is located only in DC1. For subnets that are
present in both data centers, the situation is slightly different. Because the design discussed here is fully
symmetrical, normally a subnet that is present in both data centers will be advertised to the branch router from two
sources (the customer edge router in DC1 and the customer edge router in DC2). If multipath is enabled, the WAN
router will load-share the traffic across both data centers. This approach may be acceptable, because when the
traffic reaches any of the four ASA devices on its way back to the data center, a flow lookup will be performed and

the traffic will be redirected to the ASA that owns the flow. A deterministic entry point for traffic destined for a
stretched subnet may also be desirable. You can, for example, treat one of the data centers as primary for this type
of traffic.
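If load-sharing across both data centers is preferred, eBGP multipath can be enabled on the WAN router. The following is a minimal IOS-style sketch; because the two advertisements arrive from different autonomous systems (100 and 200), the bestpath relax option is also needed, and these commands should be treated as an illustrative assumption rather than part of the validated configuration:

router bgp 300
 bgp bestpath as-path multipath-relax
 address-family ipv4 vrf TnT-14
  maximum-paths 2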

In the design validated, multipath is not enabled, and BGP is used to select the best path to the stretched subnet,
10.1.4.0/24.

Here is the output for the routing table on the WAN router:

le06-2911-02_WAN#show ip route vrf TnT-14 10.1.4.0


Routing Table: TnT-14
Routing entry for 10.1.4.0/24
Known via "bgp 300", distance 20, metric 20
Tag 100, type external
Last update from 192.14.10.1 17:43:37 ago
Routing Descriptor Blocks:
* 192.14.10.1, from 192.14.10.1, 17:43:37 ago
Route metric is 20, traffic share count is 1
AS Hops 1
Route tag 100

Best-path selection is left at the default setting, and 192.14.10.1 (DC1) is selected based on the lowest neighbor IP
address:

le06-2911-02_WAN#show bgp vpnv4 unicast vrf TnT-14 10.1.4.0


BGP routing table entry for 14:14:10.1.4.0/24, version 31
Paths: (2 available, best #2, table TnT-14)
Multipath: eBGP
Advertised to update-groups: 1
200
192.14.11.1 from 192.14.11.1 (192.20.20.1)
Origin incomplete, metric 20, localpref 100, valid, external
Community: 200:200
Extended Community: OSPF DOMAIN ID:0x0005:0x0000000E0200
OSPF RT:0.0.0.0:5:1 OSPF ROUTER ID:192.14.2.1:0
100
192.14.10.1 from 192.14.10.1 (192.10.10.1)
Origin incomplete, metric 20, localpref 100, valid, external, best
Community: 100:100
Extended Community: OSPF DOMAIN ID:0x0005:0x0000000E0200
OSPF RT:0.0.0.0:5:1 OSPF ROUTER ID:192.14.1.1:0

Routed Traffic from DC1 to DC2
To analyze the traffic flows between IP subnets that exist only in DC1 (10.100.14.0/24) and IP subnets that exist
only in DC2 (10.200.14.0/24), start in DC1 Leaf3, where the traffic originates. Here is the routing information in the
Cisco ACI fabric in DC1 (the output from Leaf3):
i06-9396-03# show ip route vrf TnT-14:TnT-14NET 10.200.14.0
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.200.14.0/24, ubest/mbest: 2/0


*via 10.0.192.91%overlay-1, [200/0], 17:54:46, bgp-100, internal, tag 200
recursive next hop: 10.0.192.91/32%overlay-1
*via 10.0.192.95%overlay-1, [200/0], 00:07:54, bgp-100, internal, tag 200
recursive next hop: 10.0.192.95/32%overlay-1
i06-9396-03#

The next-hop information you see here consists of the VTEP IP addresses of Leaf1 and Leaf2 (border leaf nodes),
where the Layer 3 DCI is connected.

Examine the routing table on one of the border leaf nodes (Leaf1):
i06-9396-01# show ip route vrf TnT-14:TnT-14NET 10.200.14.0
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.200.14.0/24, ubest/mbest: 2/0


*via 192.14.32.22%TnT-14:TnT-14NET, [200/0], 00:01:21, bgp-100, external, tag
200
recursive next hop: 192.14.32.22/32%TnT-14:TnT-14NET
*via 192.14.32.21%TnT-14:TnT-14NET, [200/0], 00:01:19, bgp-100, external, tag
200
recursive next hop: 192.14.32.21/32%TnT-14:TnT-14NET

The next-hop IP addresses are those of Leaf1 and Leaf2 (the border leaf nodes) in DC2. This information comes from the BGP peering that exists between the data centers.

Now verify the information on Leaf1 in DC2:


i06-9396-04# show ip route vrf TnT-14:TnT-14NET 10.200.14.0
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

10.200.14.0/24, ubest/mbest: 1/0, attached, direct, pervasive


*via 10.0.208.65%overlay-1, [1/0], 4d19h, static
recursive next hop: 10.0.208.65/32%overlay-1
via 192.14.31.14, vlan55, [110/20], 4d19h, ospf-default, type-2

The address 10.200.14.0/24 is local to DC2, so you can see that the subnet is locally attached.

Dual-Fabric Layer 2 and Layer 3 Connectivity


This section discusses in detail how to achieve Layer 2 and Layer 3 connectivity between sites in the Cisco ACI
dual-fabric design. The section provides a detailed packet walk and offers best-practice design recommendations.

Deploying Layer 2 Connectivity Between Sites


As previously mentioned, Layer 2 connectivity between endpoints is achieved by logically stretching their Layer 2
broadcast domain (bridge domain) across separate Cisco ACI fabrics. In the dual-fabric design discussed in this
document, the Cisco ACI fabrics are fully independent (that is, managed by different APIC clusters), and a bridge
domain = an EPG = an IP subnet. This design requires two main steps (shown in Figure 44).

Figure 44. Deploying Layer 2 Connectivity Between Sites

● Deploy Layer 2 DCI technology across sites to extend Layer 2 network connectivity across the entire
system.
● Logically extend the EPG to which the endpoints belong across sites, so that intersite Layer 2 traffic can be classified and security policies properly applied.

Note: As will be clarified in the discussion that follows, the name assigned to the EPG in each fabric
has only local significance (in the example in Figure 44, EP1 belongs to EPG Web1, and EP2 belongs to
EPG Web2, but they can be part of the same logical extended EPG).

This section presents the step-by-step procedure that allows successful intersite Layer 2 communication.

The first main step is to complete an ARP exchange between the endpoints connected to separate Cisco ACI
fabrics. For this discussion, consider the situation in which the two endpoints have just been connected to the
network and have not yet sent any packets (a probably unrealistic scenario, but useful to show how Cisco ACI can
handle this worst-case situation).

Figure 45 shows the sequence of steps required to send the ARP request between EP1 connected to Cisco ACI
Fabric 1 and EP2 connected to Cisco ACI Fabric 2.

Figure 45. ARP Request Delivered to an Endpoint in Fabric 2

1. EP1 generates a Layer 2 broadcast ARP request to resolve the MAC address of EP2 that is part of the same
IP subnet.
2. The local Cisco ACI leaf node receives the ARP request and uses it to learn the MAC and IP address
information of the locally connected EP1 (the Cisco ACI leaf node can look into the payload of the ARP
request to retrieve this information). Notice that the leaf node also classifies EP1 as part of the Web1 EPG
based on the IEEE 802.1Q tag used by the endpoint to send the ARP request to the leaf (alternatively, a
VXLAN tag can be used for the same purpose when you integrate the solution with AVS or Open vSwitch).
The MAC and IP information is then communicated to the spine nodes by using the COOP control plane, so
that it can be added to the spine hardware proxy database. Also, because EP2 is not known at this point, the
Cisco ACI leaf node encapsulates the ARP request and sends it to the anycast VTEP address that identifies
the proxy service on the spine nodes (each spine node can receive traffic sent to this address). The leaf node
inserts information about the source EPG Web1 in the VXLAN header.
3. The specific spine that receives the frame decapsulates the packet, and because EP2 is not known in the
database, it floods the ARP frame in the bridge domain. Note that for this flooding to occur, you must enable
ARP flooding in the bridge domain configuration, as shown in Figure 46.

Figure 46. Bridge Domain Flooding Settings

Note: Unknown unicast flooding (achieved by disabling the Hardware Proxy option) is required in case an endpoint in Fabric 1 still has a valid local ARP cache entry for another endpoint in the same IP subnet but, for some reason, the destination endpoint information is no longer present in the spine database. This scenario is probably rare, so you could instead enable Hardware Proxy to remove Layer 2 flooding from the fabric.

4. All the Cisco ACI leaf nodes on which the bridge domain is actively instantiated receive the VXLAN
encapsulated frame. One of the border leaf nodes decapsulates the frame and floods it out of the local
interfaces in the same bridge domain (and EPG), including the logical vPC connection used for Layer 2 DCI
communication. For this flooding to occur, you must use the VLAN-to-EPG mapping configuration discussed in the section “Layer 2 Reachability Across Sites” and shown in Figure 47.

Figure 47. Deploying Static EPG to VLAN Mapping

The configuration in Figure 47 maps the Web1 EPG to which EP1 belongs to a path represented by the DCI
vPC connection on the border leaf nodes, specifying that VLAN tag 1220 is used when sending the traffic
outside the local fabric. This process logically extends the Web1 EPG to Cisco ACI Fabric 2.

5. One of the border leaf nodes in Fabric 2 receives the IEEE 802.1Q tagged frame from the Layer 2 DCI
connection. A configuration similar to the one shown in Figure 47 is applied to the local border leaf nodes, so
that packets tagged with VLAN 1220 are classified as part of the Web2 EPG representing the logical extension
of the EPG to which EP1 belongs in Fabric 1.

Note: As previously mentioned, the names assigned to the EPGs in Cisco ACI Fabric 1 and Fabric 2 have
only local significance. What is important is for the VLAN-to-EPG mapping to be consistent on both sides to
help ensure the logical extension of the EPG across sites.

The border leaf node receiving the packet learns the MAC and IP addresses of EP1. From the point of view of
Cisco ACI Fabric 2, EP1 appears to be locally connected to this border leaf node, so a COOP update is also
sent to the local spines to save this information in the hardware database. Because the border leaf node at this
point does not know yet the location of EP2, the packet is VXLAN encapsulated and sent to the anycast VTEP
of the local spine nodes. As before, the local border leaf node inserts the Web2 EPG value in the VXLAN
header.

6. One of the spine nodes receives the frame, and because EP2 is not yet known in the spine proxy database, it
floods the ARP frame in the bridge domain, assuming that a setting similar to what is shown in Figure 46 is
also applied to the bridge domain in Fabric 2.
7. All the local Cisco ACI leaf nodes that have the specific bridge domain active receive the flooded ARP request
and learn EP1 MAC and IP address information. (The border leaf node in Fabric 2 is the next hop because it
encapsulated the packet received from the Layer 2 DCI connection before flooding it to the fabric.) The frame
is also flooded on all the local interfaces that are part of the bridge domain, so the frame also reaches the
intended destination EP2.
Figure 48 shows the sequence of events that allow EP2 to reply to the ARP request generated by EP1.

Figure 48. ARP Reply Delivered to the Endpoint in Cisco ACI Fabric 1

8. EP2 sends the ARP reply destined for the MAC address of EP1 that generated the request.
9. The Cisco ACI leaf node to which EP2 is connected receives the ARP reply and uses it to learn EP2 MAC and
IP address information and communicate it to the local spine nodes through COOP. The leaf node then
encapsulates the packet and sends it to the local border leaf nodes. At this point, Leaf6 already knows that
EP1 and EP2 have been locally discovered in Cisco ACI Fabric 2 as belonging to the same Web2 EPG, so
from a policy perspective, traffic is allowed.
10. The receiving border leaf node decapsulates the traffic and learns the EP2 information associated with the
local leaf node to which EP2 is connected. The border leaf node then performs a Layer 2 lookup and forwards
the packet to the Layer 2 DCI connection to Cisco ACI Fabric 1 (again, policy allows this communication
because EP1 and EP2 are part of the same Web2 EPG).
11. One of the border leaf nodes in Cisco ACI Fabric 1 receives the packet, decapsulates it, and learns EP2 as a
local device connected to the Layer 2 DCI interface. This event triggers a COOP update destined for the local
spine nodes. The border leaf node classifies EP2 as part of the locally defined Web1 EPG, performs a Layer 2 lookup, and encapsulates the traffic to the Proxy A VTEP address identifying the local spines (because no specific information for EP1 has been learned yet).
12. The spine node receiving the frame decapsulates it and then re-encapsulates it toward Leaf1. After decapsulating the packet and performing a Layer 2 lookup, the leaf node forwards it to the locally connected EP1 device.

After the ARP exchange is completed, all the leaf nodes have the proper information to allow successful Layer 2 data-plane communication between EP1 and EP2 (the second main step). As shown in Figure 49, VXLAN
encapsulation is used for communication within the fabric (at both sites) between the computing leaf node and the
border leaf node, whereas hand-off to a VLAN is performed to carry the traffic between sites through the deployed
Layer 2 DCI solution.

Figure 49. Layer 2 Data-Plane Communication Between Endpoints

1. EP1 generates a data packet with the MAC and IP destination addresses of EP2.
2. The local leaf node receives the packet and classifies it as part of the Web1 EPG. The local leaf performs the
Layer 2 lookup for the EP2 MAC address, determines that the packet needs VXLAN encapsulation, and sends
it to the local border leaf nodes. From a policy perspective, the leaf knows that EP2 belongs to the same Web1
EPG as EP1 (this information was previously learned on the data plane from the ARP reply).
3. One of the local border leaf nodes receives the packet, decapsulates it, and performs a Layer 2 lookup,
determining the need to forward the traffic through the DCI vPC connection to EP2.
4. One of the border leaf nodes in Fabric 2 receives the packet. The traffic is classified as part of the Web2 EPG
based on the static VLAN-to-EPG mapping previously described. Traffic destined for EP2 thus is allowed by
the policy, so after the Layer 2 lookup, the packet is VXLAN encapsulated and sent to Leaf6.
5. The leaf receives the packet, decapsulates it, performs a Layer 2 lookup, and forwards it locally to EP2.

Deploying Layer 3 Connectivity Between Sites


The previous section discussed how intra-IP subnet communication is handled between independent Cisco ACI
fabrics. This section describes how Layer 3 traffic is exchanged between endpoints that belong to separate IP
subnets.

As previously mentioned, the initial assumption for the design discussed in this document is that each EPG is
deployed as part of a separate bridge domain. The consequence is that endpoints that belong to separate EPGs
are deployed in different IP subnets. Therefore, communication between them is always routed.

To enable this routed communication between endpoints connected to different Cisco ACI fabrics, you must
perform the following two main steps:

1. Configure a consistent default gateway for bridge domains that are extended across sites.
2. Establish Layer 3 peering between the two Cisco ACI fabrics to allow routed communication between bridge
domains (IP subnets) that are not extended across sites.

As discussed in the introductory section of this document, the goal is to provide a distributed default gateway
function, so that independent of the site to which an endpoint is connected, a local default gateway on the
endpoint’s directly connected Cisco ACI leaf node is always available. Deploying a distributed default gateway
across independent Cisco ACI fabrics requires some configuration coordination to help ensure that a common
MAC address and IP address can be assigned to the gateway.

Cisco ACI Software Release 11.2 and later add a new option for the bridge domain configuration. This option
assigns a virtual MAC (vMAC) address for the fabric to use when replying to ARP requests for the default gateway
(Figure 50).

Figure 50. MAC and IP Address Configuration for an Extended Bridge Domain in Cisco ACI Fabric 1

Figure 50 shows the required bridge domain configuration for the IP and MAC addresses:

● Custom MAC Address: This is the physical MAC address associated with the SVI that is created on the
Cisco ACI leaf nodes to perform the default gateway functions. This MAC address is never used in ARP
reply messages sent to locally connected endpoints and should be unique per site.
● Virtual MAC Address: This is the MAC address value that is returned in the ARP replies sent from leaf switches to local endpoints; it is the destination MAC address that the endpoints use when sending to the fabric traffic that needs to be routed. A common value must be configured in separate Cisco ACI fabrics for the bridge
domains that must be extended across sites.
● Subnets: Two IP addresses are usually configured here. One (Primary IP Address = True) is a unique IP
address associated with the SVI deployed on the Cisco ACI leaf nodes on which this specific bridge domain
is deployed. The second IP address (Virtual IP = True) is the common (shared) default gateway IP address
consistently configured across fabrics and used by the connected endpoints.
The endpoints connected to this bridge domain thus are configured with the virtual IP address as the default
gateway, and when an ARP request for the gateway is sent, the leaf replies with the virtual MAC address
information. Thus, to enable transparent endpoint mobility across sites, you need to configure matching values in
the bridge domain deployed in Cisco ACI Fabric 2, as shown in Figure 51.

Figure 51. MAC and IP Address Configuration for an Extended Bridge Domain in Cisco ACI Fabric 2

Note: The custom MAC address for all the Layer 3 interfaces deployed in the Cisco ACI fabric has a default
value of 00:22:BD:F8:19:FF (shown earlier in Figure 50). This setting also applies to Layer 3 interfaces deployed in
different Cisco ACI fabrics. As a consequence, when interconnecting Cisco ACI fabrics using a Layer 2 DCI
solution, you have to modify the MAC address on one side for all the bridge domains that must be extended across
sites to avoid confusing the Layer 2 DCI devices deployed between the Cisco ACI fabrics. (In the example in
Figure 51, the Custom MAC address was assigned the value 00:22:BD:F8:19:02.)

After the distributed default gateway is consistently deployed across sites, Layer 3 communication can successfully
begin. This section considers two specific scenarios:

● Layer 3 communication between endpoints that belong to bridge domains that are extended across Cisco
ACI fabrics
● Layer 3 communication between endpoints that are part of bridge domains that are locally defined on
separate Cisco ACI fabrics

Note: In contrast to the discussion of Layer 2 communication, the discussion here assumes that endpoints have
already been discovered (that is, that endpoints generated some ARP traffic that populated the tables on the leaf
nodes in both fabrics).

To establish Layer 3 peering between separate Cisco ACI fabrics, the solution discussed in this document
proposes the establishment of eBGP sessions between the two pairs of border leaf nodes deployed in each Cisco
ACI fabric (Figure 52).

Figure 52. Establishing MP-EBGP Sessions Between Border Leaf Nodes

As shown in Figure 52, a full mesh of eBGP sessions is established between the border leaf nodes that are part of
the different Cisco ACI fabrics. Although the focus of the testing described in this document is on IPv4
communication, the same control plane can be used to exchange IPv6 prefixes for the various tenants deployed
across the sites.

The different border leaf nodes peer with each other as four routers on the same VLAN segment extended across
sites through the Layer 2 DCI solution of choice. This behavior is possible because the Cisco Nexus 9000 Series in
Cisco ACI mode supports the establishment of dynamic routing peering over a vPC connection. (vPC is used by
each pair of border leaf nodes to connect to the DCI functional block.)

For the configuration, the Layer 3 peering between sites requires the definition of a L3Out connection. Figure 53
shows the L3Out (L3-out-BGP) configuration defined for specific tenant TnT-14 on the border leaf nodes in Cisco
ACI Fabric 1.

Figure 53. Tenant-Specific L3Out for Layer 3 Peering Between Cisco ACI Fabrics

The configuration of L3Out shown in Figure 53 specifies the Cisco ACI devices (node-101 and node-102) that are
used to establish the eBGP peering with the second Cisco ACI fabric. The interface associated with the L3Out
connection is the same vPC logical connection used for Layer 2 communication across sites. A specific VLAN
(1239) is used to tag traffic that is routed through the L3Out connection. The border leaf nodes in Cisco ACI Fabric
1 use SVIs in that Layer 2 segment to establish eBGP peering with the remote border leaf nodes (Figure 54).

Figure 54. L3Out Logical Interface Configuration
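After this L3Out is deployed, the state of the cross-site eBGP sessions can be verified from the border leaf CLI with NX-OS-style commands such as the following (shown as indicative commands rather than captured output; exact syntax can vary by release):

show bgp sessions vrf TnT-14:TnT-14NET
show bgp ipv4 unicast summary vrf TnT-14:TnT-14NET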

An important configuration related to the L3Out DCI connection is the creation of the external network (External
EPG) used to represent the external destinations that can be reached through the L3Out connection (Figure 55).

Figure 55. Configuration of the External Network Associated with L3Out DCI

As shown in Figure 55, a prefix 0.0.0.0/0 is associated with the configured external EPG, so Cisco ACI classifies all
traffic received on this L3Out connection (mostly sourced from the remote data center site) as part of this specific
external EPG. Specific security policies (contracts) can hence be applied to all communication sent to internal
EPGs.

Figure 55 also shows the specific configurations associated with the external EPG:

● External Subnets for the External EPG: This configuration is used to classify traffic that is received through
this L3Out connection as part of this external EPG. In this specific example, all the traffic received (whatever
the source IP address) will be considered part of the external EPG and subject to the configured security
policies (contracts) for communication with other EPGs defined in Cisco ACI Fabric 1.
● Export Route Control Subnet and Aggregate Export: This configuration helps ensure that all the external
prefixes learned for this tenant through different L3Out connections are advertised from this L3Out DCI
interface to the border leaf nodes in Cisco ACI Fabric 2. Without this setting, no external prefixes would be
advertised by default.

Note: In this design, this configuration applies specifically to the external prefixes learned from the
L3Out connection to the firewall nodes local to Fabric 1.

Figure 56 shows the flag settings required to create the configuration just described and associated with the
0.0.0.0/0 subnet.

Figure 56. Setting the Flags for the 0.0.0.0/0 Prefix

Similar considerations apply to the creation of the L3Out DCI on the border leaf nodes in Cisco ACI Fabric 2. After
these configurations have been completed, routing traffic can flow across sites.

You should also enable Bidirectional Forwarding Detection (BFD) on the eBGP peering between border leaf nodes across sites. BFD is a detection protocol designed to provide fast forwarding-path failure-detection times for all media types, encapsulations, topologies, and routing protocols. It helps reduce the outage experienced under various failure scenarios (both link and switch failures). The configuration of BFD is performed in two steps:

1. On L3Out, set the BFD flag for each remote configured eBGP peer, as shown in Figure 57.

Figure 57. Enabling BFD for Each Remote eBGP Peer

2. Create a BFD interface profile associated with the logical Interface defined for L3Out (Figure 58).

Figure 58. Creating a BFD Interface Profile Associated with the L3Out Logical Interface

You can usually use the default interface profile, with the specific values shown in Figure 59.

Figure 59. BFD Default Interface Policy
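After BFD is enabled, the session state toward the remote border leaf nodes can be checked from the border leaf CLI; the following NX-OS-style commands are indicative examples only (exact syntax can vary by release):

show bfd neighbors vrf TnT-14:TnT-14NET
show bfd neighbors details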

As shown earlier in Figure 21 and Figure 22, there are two scenarios for routed communication across sites:

● Routing between endpoints that belong to IP subnets that are stretched across sites: In this case, the
routing always occurs locally in the Cisco ACI fabric in which the source endpoint is located. Then traffic is
sent to the destination in the second Cisco ACI fabric using the Layer 2 path across sites. This process is
similar to the process discussed earlier in the section “Deploying Layer 2 Connectivity Between Sites.”
● Routing between endpoints belonging to IP subnets that are not stretched between Cisco ACI fabrics: In
this case, the Layer 3 peering established across sites represents the Layer 3 path required to establish
communication. Figure 60 shows the steps required to enable this communication.

Note: For simplification, the discussion here assumes that EP1 and EP2 have already been discovered
within each Cisco ACI fabric.

Figure 60. Routing Between Endpoints in Nonstretched IP Subnets

1. EP1 generates a data-plane packet destined for EP2.


2. Leaf1 receives the packet. Because the destination IP subnet is not internal to Cisco ACI Fabric 1, Leaf1
performs a lookup in the longest-prefix match (LPM) table containing information for external destinations and
finds a match for the destination IP subnet 10.1.5.0/24. This information was obtained from an MP-BGP
update sent by the local border leaf nodes (BL1 and BL2) when they received routing updates from the remote
border leaf nodes deployed in Cisco ACI Fabric 2 (BL3 and BL4).
Note that the IP subnets associated with bridge domains defined in a given Cisco ACI fabric are advertised through an L3Out connection only if the following two conditions are met:

● The specific bridge domain is mapped to L3Out, as shown in Figure 61.

Figure 61. Association Between Bridge Domain and L3Out

● The IP subnet associated with the bridge domain to which the EPG belongs is marked for external
advertisement (Figure 62).

Figure 62. Configuring an IP Subnet to Be Advertised Externally

This configuration is required on both Cisco ACI fabrics to help ensure a successful exchange of IP prefixes
associated with IP subnets that are not stretched across sites.

3. One of the two local border leaf nodes receives the packet, decapsulates it, and performs a Layer 3 lookup,
finding a match for the IP subnet to which the destination belongs. The IP subnet advertisement is received
through the border leaf nodes deployed in the remote Cisco ACI fabric, so traffic is routed through the Layer 3
DCI infrastructure to one of the two remote next-hop border leaf nodes.

Note: Currently, host-route advertisement outside the Cisco ACI fabric is not supported, and only IP
prefixes for the endpoint subnets can be advertised out.

4. One of the border leaf nodes in Fabric 2 receives the packet and performs a Layer 3 lookup. Assuming that no
traffic was previously sent to EP2, no specific information will be programmed in the border leaf tables, so the
packet is sent to one of the spine nodes (encapsulated to the proxy VTEP address).
5. The receiving spine node decapsulates the traffic, finds the information for EP2 in the local database (because
the initial assumption was that the endpoints had already been discovered), and encapsulates the packet to
Leaf6.
6. Leaf6 decapsulates the packet and forwards it to EP2.
Figure 63 shows how traffic is then sent back to EP1.

Figure 63. Routing Between Endpoints in Nonstretched IP Subnets (Return Traffic)

7. EP2 sends a packet destined for the EP1 IP address. Leaf6 performs a Layer 3 lookup and encapsulates traffic to the local border leaf nodes (selecting BL3 or BL4 as the next hop).
8. One of the local border leaf nodes receives the packet, decapsulates it, and performs a Layer 3 lookup, finding
the information for the IP subnet destination learned through the eBGP session with the remote border leaf
nodes. The traffic is then routed through the Layer 3 DCI connection.
9. One of the border leaf nodes in Cisco ACI Fabric 1 receives the packet, performs a Layer 3 lookup, and
encapsulates it to Leaf1, following specific host-route information. (EP1-specific information was learned from
the previous traffic destined for Cisco ACI Fabric 2.)
10. Leaf1 receives the packet, decapsulates it, and sends it to the EP1 destination.

Deploying Hypervisor Integration


As discussed earlier in the section “Hypervisor Integration,” each fabric in a Cisco ACI dual-fabric design will likely
deploy a different VMM to manage its computing resources.

VMware vSphere Release 6.0 and later offers a solution to provide live mobility support even under those
circumstances, allowing live vMotion migration for workloads across ESXi servers managed by separate vCenter
server instances. This is known as cross vCenter server vMotion.

Note: Cisco ACI also supports integration with Microsoft Hyper-V and with OpenStack, but at the time of this
writing these hypervisors do not support mobility across servers that are part of separate VMM domains.

Figure 64 shows the solution validated in this document.

Figure 64. Integrating Multiple VMM Domains

Two separate VMM domains are created, one in each Cisco ACI fabric, by peering each local APIC with the local vCenter server. A separate DVS is then pushed to the ESXi computing clusters deployed in the two sites.

Figure 65. Creating Separate VMM Domains

As shown in Figure 65, each APIC controller establishes a relationship with a different vCenter server (DC1-VC6-1
in Site 1 and DC2-VC6-2 in Site 2). This leads to the creation of two separate DVSs (DVS-DC1-VC-vDS and DVS-
DC2-VC-vDS), which are then pushed to the separate clusters of ESXi hosts managed by each vCenter server.

Note that the names of the port groups in the example in Figure 65 are identical. This is the case because the
application profiles (and associated EPGs) created on the two independent APIC clusters were named the same.
This naming scheme is recommended for operational ease, but it is not required to enable mobility across the two
separate domains.

Figure 66 shows how the information for the separate VMM domains is displayed on the vSphere Web Client.

Figure 66. View of Separate VMM Domains in VMware vSphere 6.0

In Figure 66, you can see the two separate DVSs and the associated port groups.

As mentioned, the port groups are named the same because of the identical naming convention used for the EPGs
defined on the separate APIC clusters, but they are completely independent of each other. To verify this
independence, note that the VLAN tags associated with port groups are different because they are independently
negotiated between the APIC controllers and their local vCenter servers (Figure 67).

Figure 67. Identically Named Port Groups with Independent VLAN Tags

When migrating a virtual machine across ESXi servers managed by different vCenter servers, you must specify the
port group to which the virtual machine should be connected on the destination ESXi host, as shown in Figure 68.

Figure 68. Selecting the Virtual Machine Port Group on the Destination VMware ESXi Host

Note: The behavior discussed in this section differs from the behavior prior to vSphere 6.0. Prior to that release,
live migration was always performed in the context of the same port group of the same DVS and so required
consistent VLAN tagging and virtual machine attachment points across different ESXi hosts.

Cisco ASA Cluster Integration in a Cisco ACI Dual-Fabric Design


This section describes how to deploy and configure the ASA cluster in the Cisco ACI dual-fabric design.

The design discussed in this document uses two physical ASA 5585-X appliances in each data center, for a total of
four devices in the cluster. Within each site, firewalls are attached to the Cisco ACI fabric using vPC technology.
Each firewall uses two vPCs: one for the data traffic and one for the cluster control protocol (CCP) traffic (the
cluster control link, or CCL). Figure 69 shows the setup.

Figure 69. Cisco ASA Node Connection to the Cisco ACI Fabric

The data VLANs are not extended between the sites. The ASA cluster is deployed in routed mode with multiple
contexts using individual interfaces, and it is inserted in the data path using IP routing for the north-south traffic.
Specifically, OSPF is used as the routing protocol between each unit and its local Cisco ACI fabric.

The CCL VLAN and its Layer 2 segment, in contrast, are extended between the sites. Each Cisco ACI fabric provides a Layer 2 bridge
domain on a dedicated vPC for the CCL VLAN, which is then extended through DCI to the other site. Figure 70
shows this solution.

Figure 70. CCL Connectivity Extended Across Sites

Cisco ASA Cluster Configuration: Admin Context


This section describes the configuration used on the ASA cluster in the admin context (the configuration is performed from the master unit).

You can use the sample here as a reference to construct the configuration for all units by allocating a unique IP
address for each unit for the cluster interface. The IP address on the cluster interface on all units must be in the
same subnet: for example, 1.1.1.0/24 is used in this design.
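A minimal sketch of this bootstrap clustering configuration is shown here for the first unit in DC1. Treat it as an illustrative reconstruction rather than the verbatim validated listing: the CCL member interfaces and channel-group mode are assumptions, while the cluster group name (fw), the unit name, the CCL port channel (Port-channel11), and the CCL IP address (1.1.1.1) match the show cluster info output later in this section:

! Illustrative sketch only; the local-unit name, cluster-interface IP address,
! and priority must be unique on each of the four ASA units
cluster interface-mode individual force
!
interface TenGigabitEthernet0/6
 channel-group 11 mode on
interface TenGigabitEthernet0/7
 channel-group 11 mode on
interface Port-channel11
 description Cluster control link (CCL)
!
cluster group fw
 local-unit DC1-ASA-1_i05-5585-01
 cluster-interface Port-channel11 ip 1.1.1.1 255.255.255.0
 priority 1
 health-check holdtime 3
 enable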

For the CCL, as shown earlier in Figure 70, the Cisco ACI fabric simply provides a Layer 2 service between the
vPCs on the fabric to connect to ASA Port-channel11 (the CCL on the ASA side). Then the CCL VLAN is extended
between the sites (Figure 71).

Figure 71. Static Binding to EPG for the Cisco ASA Failover VLAN

The screenshot, taken on the APIC in DC1, shows the EPG ASA_failover configured with a static binding to the vPC connected to ASA 1 and ASA 2 (the ASAs in DC1), and also extending the CCL EPG through the DCI link. The VLAN ID used on the DCI link is different from the one used toward the ASAs, showing that the VLAN ID is significant only on the local link and that Cisco ACI bridges the two because they are in the same EPG.

Figure 72 shows a logical view of the results of the preceding configuration.

Figure 72. Logical Layer 2 Extension of the CCL Segment

After the configuration is applied, the ASA cluster is formed and becomes operational between the sites, as shown
here.
DC1-ASA-1_i05-5585-01/master# show cluster info
Cluster fw: On
Interface mode: individual
This is "DC1-ASA-1_i05-5585-01" in state MASTER
ID : 1
Version : 9.5(1)
Serial No.: JAD1928006U
CCL IP : 1.1.1.1
CCL MAC : 80e0.1d58.8608
Last join : 16:04:54 UTC Jan 16 2016
Last leave: N/A
Other members in the cluster:
Unit "DC2-ASA-2_E05-asa5585x-02" in state SLAVE
ID : 2
Version : 9.5(1)
Serial No.: JAD170600RE
CCL IP : 1.1.1.4
CCL MAC : acf2.c5f2.c5f0
Last join : 15:33:14 UTC Jan 16 2016
Last leave: N/A
Unit "DC2-ASA-1_i05-5585x-02" in state SLAVE
ID : 3
Version : 9.5(1)
Serial No.: JAD1928009Y
CCL IP : 1.1.1.3
CCL MAC : 54a2.747c.b668
Last join : 15:21:08 UTC Jan 16 2016
Last leave: 15:15:05 UTC Jan 16 2016
Unit "DC1-ASA-2_E05-5585x-01" in state SLAVE
ID : 4
Version : 9.5(1)
Serial No.: JAD170900KX
CCL IP : 1.1.1.2
CCL MAC : acf2.c5f2.c584

Last join : 16:49:46 UTC Feb 1 2016
Last leave: 16:36:46 UTC Feb 1 2016

ASA Cluster Configuration: Tenant Context


This section describes the configuration used on the ASA cluster in a specific tenant context (the configuration is performed from the master unit).

Figure 73 shows the logical configuration.

Figure 73. Logical Connectivity View for a Specific Tenant

Each unit in the ASA cluster has an inside and an outside interface, with each interface assigned a unique IP
address.

Each ASA unit forms an OSPF neighborship with the local Cisco ACI border leaf nodes (nodes deployed in the
same data center site) through the inside interface. Each unit also forms an OSPF neighborship with the local edge
router through the outside interface.

The configuration for a tenant context (VRF instance), in this case named TnT-14, defines the inside and outside interfaces and the OSPF routing toward the Cisco ACI fabric and the local WAN edge router.
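A minimal sketch of such a context configuration is shown here. The allocated interface names, subinterface numbers, and IP address pool definitions are illustrative assumptions, while the main cluster (virtual) IP addresses, the per-unit address ranges, and the OSPF process ID match the outputs that follow:

! Illustrative sketch only (context TnT-14, entered from the master unit)
ip local pool inside-pool 192.14.31.102-192.14.31.105
ip local pool outside-pool 192.14.20.102-192.14.20.105
!
interface Port-channel10.3114
 nameif inside
 security-level 100
 ip address 192.14.31.100 255.255.255.0 cluster-pool inside-pool
!
interface Port-channel10.3120
 nameif outside
 security-level 0
 ip address 192.14.20.100 255.255.255.0 cluster-pool outside-pool
!
router ospf 1
 network 192.14.31.0 255.255.255.0 area 0
 network 192.14.20.0 255.255.255.0 area 0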

With this configuration, each ASA unit receives a unique address in the inside and outside interfaces, as shown
here.
DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show interface summary
DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0602.0004, MTU 1500
IP address 192.14.20.102, virtual IP 192.14.20.100, subnet mask
255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0602.0002, MTU 1500
IP address 192.14.31.102, virtual IP 192.14.31.100, subnet mask
255.255.255.0

DC2-ASA-2_E05-asa5585x-02:********************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0604.0004, MTU 1500
IP address 192.14.20.103, subnet mask 255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0604.0002, MTU 1500
IP address 192.14.31.103, subnet mask 255.255.255.0

DC2-ASA-1_i05-5585x-02:***********************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0606.0004, MTU 1500
IP address 192.14.20.104, subnet mask 255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0606.0002, MTU 1500
IP address 192.14.31.104, subnet mask 255.255.255.0

DC1-ASA-2_E05-5585x-01:***********************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0608.0004, MTU 1500
IP address 192.14.20.105, subnet mask 255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0608.0002, MTU 1500
IP address 192.14.31.105, subnet mask 255.255.255.0

To form the OSPF neighborship between the ASA units and the Cisco ACI fabrics, the Cisco ACI fabric is
configured with an external routed network, also known as L3Out, within the associated tenant.

The screenshots shown in Figures 74 through 77 are from the APIC in Data Center 2. The same configuration is applied on the APIC in Data Center 1 (using different IP addresses).

Figure 74. L3Out Connection Between the Cisco ACI Fabric and the Cisco ASA Nodes

Figure 75. L3Out: Defining Logical Node Profiles

Figure 76. L3Out: Defining Logical Interface Profiles

Figure 77. L3Out: Defining a Second Logical Interface Profile

With this configuration, the ASA forms an OSPF neighborship with the Cisco ACI fabric on its inside interface. As
shown in the following output, each ASA has three OSPF neighbors on its inside interface. For example, ASA DC1-
ASA-1_i05-5585-01 forms an OSPF neighborship with the two local border leaf nodes (192.14.31.11 and
192.14.31.12) as well as with the other ASA (192.14.31.105) in the same data center.

Each ASA also forms an OSPF neighborship with the local WAN edge router on its outside interface (192.14.20.1 in Data Center 1), as well as with the other ASA in the same data center (192.14.20.105, as seen by DC1-ASA-1_i05-5585-01).

The following output shows the OSPF neighbors for each ASA unit in the cluster.
DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show ospf neighbor

DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
Neighbor ID Pri State Dead Time Address Interface
101.101.11.11 1 2WAY/DROTHER 0:00:02 192.14.31.11 inside
102.102.12.12 1 FULL/BDR 0:00:02 192.14.31.12 inside
192.14.31.105 1 FULL/DR 0:00:02 192.14.31.105 inside
192.14.1.1 10 FULL/DR 0:00:02 192.14.20.1 outside
192.14.31.105 1 FULL/DROTHER 0:00:02 192.14.20.105 outside

DC2-ASA-2_E05-asa5585x-02:********************************************
Neighbor ID Pri State Dead Time Address Interface
192.14.31.104 1 2WAY/DROTHER 0:00:02 192.14.31.104 inside
201.201.11.11 1 FULL/DR 0:00:02 192.14.31.13 inside
202.202.12.12 1 FULL/BDR 0:00:02 192.14.31.14 inside
192.14.2.1 1 FULL/DROTHER 0:00:02 192.14.20.2 outside
192.14.31.104 1 FULL/DR 0:00:02 192.14.20.104 outside

DC2-ASA-1_i05-5585x-02:***********************************************
Neighbor ID Pri State Dead Time Address Interface
192.14.31.103 1 2WAY/DROTHER 0:00:02 192.14.31.103 inside
201.201.11.11 1 FULL/DR 0:00:02 192.14.31.13 inside
202.202.12.12 1 FULL/BDR 0:00:02 192.14.31.14 inside
192.14.2.1 1 FULL/DROTHER 0:00:02 192.14.20.2 outside
192.14.31.103 1 FULL/BDR 0:00:02 192.14.20.103 outside

DC1-ASA-2_E05-5585x-01:***********************************************
Neighbor ID Pri State Dead Time Address Interface
101.101.11.11 1 FULL/DROTHER 0:00:02 192.14.31.11 inside
102.102.12.12 1 FULL/BDR 0:00:02 192.14.31.12 inside
192.14.31.102 1 FULL/DROTHER 0:00:02 192.14.31.102 inside
192.14.1.1 10 FULL/DR 0:00:02 192.14.20.1 outside
192.14.31.102 1 FULL/BDR 0:00:02 192.14.20.102 outside

WAN Integration Considerations
The previous section discussed how the ASA firewall establishes connectivity with the Cisco ACI fabric and the
WAN edge routers. This section discusses how the WAN edge routers peer with the WAN devices to allow external
communication into the Cisco ACI fabric.

Figure 78 shows the routing protocol peering sessions established in the validated solution documented here.

Figure 78. Cisco ACI Dual-Fabric Routing Peering

Figure 78 shows only one node per site, whether a firewall or a WAN edge router, but in reality you should
duplicate the nodes per site to help ensure full redundancy.

As previously discussed, eBGP peering between the Cisco ACI fabric in DC1 and the Cisco ACI fabric in DC2 is required to allow east-west communication between subnets that are localized to only one fabric (that is, not stretched across the entire system).

Peering between the firewall and the WAN edge router uses OSPF to meet convergence requirements.

Note that the WAN edge routers also have an eBGP peering between them, established over a Layer 2 path offered by Cisco ACI through the DCI connection. This peering is optional; it is needed only to cover the case in which one of the sites becomes isolated from the WAN because of a major service provider failure. If each site has dual connections to a pair of service providers, this failure scenario becomes improbable.
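On the WAN edge router, this peering model translates into mutual redistribution between the local OSPF process and BGP, plus eBGP sessions toward the WAN and toward the remote WAN edge. A minimal IOS-style sketch for the DC1 WAN edge is shown here; the OSPF process ID (14), the BGP autonomous systems (100 local, 300 WAN, 200 remote site), and the OSPF-INTO-BGP route-map name match the outputs shown earlier, while the cross-site neighbor address and the route-map contents are illustrative assumptions:

router ospf 14 vrf TnT-14
 redistribute bgp 100 subnets
!
! Placeholder route map; the validated design applies OSPF tag-based filtering here
route-map OSPF-INTO-BGP permit 10
!
router bgp 100
 address-family ipv4 vrf TnT-14
  redistribute ospf 14 match internal external 1 external 2 route-map OSPF-INTO-BGP
  ! eBGP session toward the WAN router (AS 300)
  neighbor 192.14.10.2 remote-as 300
  neighbor 192.14.10.2 activate
  ! eBGP session toward the DC2 WAN edge (AS 200) over the stretched DCI VLAN
  neighbor 192.14.12.2 remote-as 200
  neighbor 192.14.12.2 activate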

North-South Traffic Flows


When an IP subnet is defined in only one fabric, traffic ingresses and egresses optimally through the local WAN and firewall path, as shown in Figure 79.

Figure 79. Ingress and Egress Traffic Paths for Localized IP Subnets

As previously discussed, one of the main reasons to use a Cisco ACI dual-fabric design is the capability to stretch
IP subnets across separate sites. Traffic originating from an endpoint connected to those stretched subnets always
will follow the optimal path because of the existence of a local default gateway. Figure 80 shows this behavior.

Figure 80. Egress Traffic Path for Stretched IP Subnets

For inbound traffic coming from the WAN, one option is to not perform any optimization and to let the WAN select
the best path to reach the stretched IP prefix, whether the destination workload is located in DC1 or DC2.

Note: At the time of this writing, Cisco ACI does not offer the capability to advertise host-route information
through L3Out.

If for operational reasons you want to ensure that inbound traffic for extended subnets always enters through DC1,
then you can advertise those routes in the WAN with a better metric through the WAN edge router connection in
Site 1. You can achieve this goal in several ways. The specific validated scenario prepends the autonomous
system path (AS-Path) because BGP is the routing protocol used to peer with the WAN, as discussed later in this
section.
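The validated scenario derives the prepend from an OSPF route tag, as described later in this section. Purely as an illustrative alternative, a static prefix-list-based prepend applied on the Site 2 WAN edge router could look like the following sketch; the prefix-list and route-map names, the prepend count, and the WAN-facing neighbor address are hypothetical.

ip prefix-list STRETCHED-SUBNETS seq 5 permit 10.1.4.0/24
!
route-map PREPEND-STRETCHED permit 10
 match ip address prefix-list STRETCHED-SUBNETS
 set as-path prepend 200 200
route-map PREPEND-STRETCHED permit 20
!
router bgp 200
 address-family ipv4 vrf TnT-14
  neighbor 192.14.11.2 route-map PREPEND-STRETCHED out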

Figure 81 shows the inbound traffic behavior after this configuration.

After the traffic enters through data center 1 (DC1), if the destination workload is local to DC1, it is delivered directly. If the destination workload is located in data center 2 (DC2) but the connection state is owned by the firewall in DC1, then the firewall function is performed in DC1 and the traffic is then sent to DC2 over the Layer 2 DCI connection.

The right side of Figure 81 shows the case in which the traffic enters through DC1 but the connection state is owned by the firewall in DC2. In this case, the firewall in DC1 redirects the traffic to the firewall in DC2 over the ASA cluster control link (CCL), which has previously been extended over the DCI links.

Figure 81. Inbound Traffic Flows from the WAN

In this validated design, to improve convergence after a failure, OSPF is used on the ASA firewalls, while BGP is used on the WAN side. The WAN edge router interconnects the two routing protocols.
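For completeness, the redistribution in the opposite direction (BGP into OSPF toward the ASA) would look something like the following sketch on the WAN edge router; whether the validated setup used full redistribution or simply originated a default route is not shown in the captured configuration, so treat this as an assumption.

router ospf 14 vrf TnT-14
 redistribute bgp 100 subnets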

The main goal for the WAN is to help ensure reachability through backup paths to the pair of data centers. The
WAN also needs to help ensure some form of inbound traffic localization when the destination is a subnet stretched
across sites, and this goal can be difficult to achieve. Cisco ACI does not allow /32 host routing, so inbound
optimization based on advertisement of specific host routes is not possible.

However, such optimization can be performed in several ways. This section discusses the solution that was
validated.

All WAN edge routers require peering with each other. This peering helps ensure that if isolation occurs at the WAN edge, traffic can still flow through the other site. The peering is implemented over a stretched VLAN that joins all the WAN edge routers; a dedicated EPG, bridge domain, and DCI VLAN are created to allow the DC1 WAN edge routers to connect to the DC2 WAN edge routers.

In this document, each data center is in a different BGP autonomous system (AS). Cross-site peering between
WAN edges thus uses eBGP.

As stated earlier, it is important to ensure optimized access to local subnets that are not stretched across sites. You must therefore differentiate the routes advertised by one fabric from those advertised by the other. To do so, you use an OSPF tag to control redistribution into BGP, and you use the same tag to prepend the AS-path so that you can control traffic coming back to the Cisco ACI dual sites. The Cisco ACI fabric sets this tag when routes learned through the L3Out DCI eBGP peering are redistributed into OSPF. Therefore, only the subnets isolated on the other fabric carry the tag, and local subnets are not tagged. When you append this tag to the BGP AS-path, you help ensure that this suboptimal path is used only in the event of a failure.

To better understand the BGP setup, refer to the BGP best-path selection algorithm described here:
http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13753-25.html.

This section now analyzes the routing announcements from the data centers.

By default, the Cisco ACI fabric (based on NX-OS) sets the OSPF route tag. On the WAN edge router, you then need to append it to the BGP AS-path when redistributing OSPF into BGP.
router bgp 100
address-family ipv4 vrf TnT-14
redistribute ospf 14 route-map OSPF-INTO-BGP

route-map OSPF-INTO-BGP permit 4
 set weight 0
 set as-path tag
 set community 100:100

The resulting localization can then be observed from the remote site:
le06-2911-02_WAN#sh ip bgp vpnv4 all
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 14:14 (default for vrf TnT-14)
* 10.1.4.0/24 192.14.11.1 20 0 200 ?
*> 192.14.10.1 20 0 100 ?
* 10.100.14.0/24 192.14.11.1 1 0 200 4294967295 i
*> 192.14.10.1 20 0 100 ?
*> 10.200.14.0/24 192.14.11.1 20 0 200 ?
* 192.14.10.1 1 0 100 4294967295 i

The subnet 10.100.14.0/24 is deployed only on Cisco ACI Fabric 1. It is known from both WAN edges, but the
preferred path is the Site 1 WAN edge router, because it has the shortest autonomous system path.

The subnet 10.200.14.0/24 is known through Site 2.

The stretched subnet 10.1.4.0/24 can be known through Site 1 and through Site 2.

Here is how to avoid a routing redistribution loop. Because the Cisco ACI fabrics have an eBGP peering between them, creating an eBGP peering between the WAN edges would create a loop. Normally, BGP would avoid this loop using the AS-path, but in the proposed design the ASA firewalls use OSPF, which forces the Cisco ACI fabric and the WAN edge to redistribute between OSPF and BGP and therefore breaks the AS-path loop-prevention rule. To prevent loops caused by this mutual redistribution, a BGP community is used to control advertisement between sites.

OSPF advertisements are redistributed with a local community, in this specific example 100:100, and the eBGP
peering between the WAN edge devices is configured to drop all received routing advertisements carrying such
community value.
le06-2911-03_DC2#sh run | sec bgp
router bgp 200
address-family ipv4 vrf TnT-14
neighbor 192.14.15.1 send-community both
neighbor 192.14.15.1 route-map BGP-IN-INTER_SITE in

route-map BGP-IN-INTER_SITE deny 5
 match community 100
route-map BGP-IN-INTER_SITE permit 10
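The inbound route map above references community list 100, whose definition is not shown in the captured configuration; it would be along the lines of the following sketch (shown here as an expanded list matching the 100:100 community string, which is an assumption).

ip community-list 100 permit 100:100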

Now look at the traffic egressing the data center. The Cisco ACI fabric peers using OSPF with the ASA, which peers with the local WAN edge routers. The WAN edge router peers with the remote router and announces the remote prefixes locally toward the ASA.

The Cisco ACI fabric then chooses the exit path from the data center, and it has two L3Out paths: one through the local ASA and WAN edge router, and one through the eBGP peering with the other site's Cisco ACI fabric. By default, eBGP has a better administrative distance than OSPF, so traffic would cross the interfabric link before reaching the WAN, which is not what is wanted. You therefore need to change the administrative distance of eBGP in the Cisco ACI fabric to a value higher than the OSPF administrative distance (a higher number means lower preference) to give preference to the local OSPF routes through the ASA (Figure 82).

Figure 82. Change the Default eBGP Administrative Distance
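In Cisco ACI this change is made through the APIC, as shown in Figure 82, rather than through a switch CLI. Purely as a conceptual illustration of the intent, the equivalent NX-OS-style BGP configuration would resemble the sketch below, where the autonomous system number and the distance values (eBGP 210, iBGP 200, local 220) are example assumptions chosen so that eBGP becomes less preferred than OSPF (administrative distance 110).

router bgp 65001
  vrf TnT-14
    address-family ipv4 unicast
      distance 210 200 220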

When traffic reaches the local WAN edge, it will use the WAN network and not go through the other data center
site through the inter-WAN edge peering, because traffic will exit through the shortest autonomous system path.
le06-2911-01_DC1#sh ip bgp vpnv4 all
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 14:14 (default for vrf TnT-14)
* 192.14.99.0 192.14.15.2 4294967290 0 200 300 i
*> 192.14.10.2 0 4294967295 0 300 i

Deploying VXLAN as a DCI Solution


The goal of this section is to explain how and why to set up the VXLAN DCI network between the Cisco ACI fabrics. It assumes a basic knowledge of VXLAN and is not intended to replace a complete VXLAN EVPN design guide. At the time of this writing, a VXLAN reference guide is available at this link:
http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/guide-c07-734107.html.

The VXLAN functionality used in this DCI design is deliberately limited. VXLAN is a powerful technology that provides Layer 2 and Layer 3 overlays and features such as the anycast gateway and ARP suppression. However, for the Cisco ACI dual-fabric design discussed here, the only real requirement is Layer 2 VLAN extension (Layer 3 functions are not needed).

Several steps are needed to set up Layer 2 connectivity across sites through VXLAN:

1. Enable required features.


2. Create the underlay network.
a. Configure loopback.

b. Configure DCI links and the core network.

i. Configure core routing.

ii. Configure BFD.

3. Create the overlay network.


a. Configure anycast VTEP loopback.

b. Configure the Layer 2 VNI.

c. Configure the network virtualization endpoint (NVE) tunnel interface.

d. Configure BGP EVPN peering.

e. Configure BGP EVPN VNI control advertisement.

f. Example: Write a Python script to generate the extension for 1000 VLANs.

4. Connect the DCI to Cisco ACI using vPC.


a. Create vPC.

b. Optionally, configure storm control on the VXLAN side.

The following sample configuration provides details for these steps.

1. Enable required features.
The solution requires MP-BGP EVPN for the control plane, VXLAN for the data-plane encapsulation, vPC (with
LACP enabled) to connect the Cisco ACI border leaf nodes to the local VXLAN devices (Nexus 9000 in NX-OS
mode), and BFD to control the underlay convergence across the VXLAN core. On the VXLAN leaf nodes in
both sites, you need to configure the following:
nv overlay evpn
feature bgp
feature vn-segment-vlan-based
feature lacp
feature vpc
feature lldp
feature bfd
feature nv overlay
2. Create the underlay network.
You need IP connectivity between sites. This connectivity requires a routing protocol, and the design depends on the core routing. The design proposed in this document assumes that the core offers a BGP connection, which may be the case if the core is an independent global network or an MPLS solution from a service provider. The same approach can also be used over a simple fiber or DWDM network if you want to increase site independence for the control plane by placing the sites in two different autonomous systems connected through eBGP.

a. Configure loopback for the underlay network.

You should use a different loopback for the underlay and for the overlay. In the process of recovering from a node-down event, the overlay loopback is kept in the down state until a specific delay timer expires (180 seconds by default). Sharing a common loopback would therefore also delay the reestablishment of routing connectivity in the underlay network (in addition to other services, such as TACACS, that normally require connectivity to the loopback interface). Below is the configuration of the loopback for the underlay network; the configuration of the loopback for the overlay network is shown in the overlay section later in this document.
interface loopback0
description Loopback for BGP peering
ip address 11.11.11.11/32
b. Configure DCI links and the core network (Figure 83).

i. Configure core routing.

In the testing performed, the core network is very simple. Each VXLAN DCI device has only two ports to the core. One port (e1/45) is connected to the long-distance network (which could be either an IP/MPLS network or a fiber/DWDM network), and a second port (e1/46) is connected to the other DCI node at the same site as the backup path. In practice, this alternative path can be a VLAN on the vPC peer link; you do not need to use a dedicated physical port.

The MTU must be increased on these ports to accommodate the VXLAN tunnel encapsulation added to the Layer 2 frame, which represents 50 additional bytes of overhead. The Cisco Nexus 9000 Series does not fragment VXLAN traffic, so the core network must support this larger frame size.

Figure 83. VXLAN Frame Format

In this testing, a jumbo MTU size was configured for all the Layer 3 interfaces along the path.
interface Ethernet1/45
description Core underlay link
no switchport
mtu 9216
ip address 192.168.2.11/24
no shutdown
!
interface Ethernet1/46
description Core underlay backup path
no switchport
mtu 9216
ip address 192.168.1.11/24
no shutdown
You then must enable a routing protocol to help ensure connectivity and backup. In this testing, eBGP is used so that separate autonomous systems can be deployed in different sites.

One recommended option is to enable BGP dampening. With this option enabled, if a long-distance link starts to flap (a realistic case when you use a service provider DWDM connection), the routes learned over that link accumulate a penalty after several flaps and are suppressed from the BGP routing table.

Here is a sample of the BGP configuration for enabling the underlay connectivity.


router bgp 100
router-id 11.11.11.11
address-family ipv4 unicast

dampening
network 11.11.11.11/32
neighbor 192.168.1.12
remote-as 100
address-family ipv4 unicast
next-hop-self
neighbor 192.168.2.1
bfd
remote-as 300
update-source Ethernet1/45
address-family ipv4 unicast
next-hop-self
The DCI core design has several options, depending on whether an Interior Gateway Protocol (IGP) or BGP is used. The tested solution uses an eBGP design and so requires the next-hop-self feature to help ensure the reachability of the next hop on any path.

The underlay can easily be verified using ping and sh ip bgp. All peers should be up.
DC1-93-01_i05-9372-01# sh ip bgp
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 45, local router ID is 11.11.11.11
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-
best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist,
I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop      Metric  LocPrf  Weight  Path
* i10.10.10.10/32     192.168.3.1           100          0  300 i
*>e                   192.168.2.1                        0  300 i
*>l11.11.11.11/32     0.0.0.0               100      32768  i
* i11.11.12.12/32     192.168.1.12          100          0  i
*>l                   0.0.0.0               100      32768  i
*>i12.12.12.12/32     192.168.1.12          100          0  i
* i21.21.21.21/32     192.168.3.1           100          0  300 200 i
*>e                   192.168.2.1                        0  300 200 i
* i21.21.22.22/32     192.168.3.1           100          0  300 200 i
*>e                   192.168.2.1                        0  300 200 i
* i22.22.22.22/32     192.168.3.1           100          0  300 200 i
*>e                   192.168.2.1                        0  300 200 i

ii. Configure BFD.

The long-distance links are usually ‘fragile’ links, especially when they use DWDM, so you should enable BFD to verify end-to-end connectivity with a lightweight protocol that does not overload the switch CPU and that provides a very fast failure trigger.

You can tune BFD with very fast hellos down to 50 ms, but the recommendation is to use timers with 3 x
150 ms, which leads to detection of link failure within 450 ms. That time is fast enough to reach
subsecond convergence after a DCI link failure.
bfd interval 150 min_rx 150 multiplier 3
bfd startup-timer 0
router bgp 100
neighbor 192.168.2.1 remote-as 300
bfd
Verify BFD neighborship as follows:
DC1-93-01_i05-9372-01# sh bfd neighbors

OurAddr        NeighAddr     LD/RD                  RH/RS  Holdown(mult)  State  Int      Vrf
192.168.2.11   192.168.2.1   1090519041/1090519041  Up     4409(3)        Up     Eth1/45  default
3. Create the overlay network.
a. Configure anycast VTEP loopback.

As mentioned in the underlay discussion, you should use a different loopback for the overlay. The overlay
loopback will be automatically shut down on node recovery, potentially affecting any other function associated
with it.

To help ensure backup and load balancing from the VXLAN core, use the anycast capability: configuring the same IP address for the same service on two nodes creates an anycast address. Here, the service is the VTEP, which is identical on the pair of DCI nodes. Configure a secondary IP address on loopback 1 of both DCI nodes; the two DCI nodes (vPC peers) must have exactly the same secondary loopback IP address. They both advertise this anycast VTEP address in the underlay network so that the upstream devices learn the /32 route from both vPC VTEPs and can load-share VXLAN unicast encapsulated traffic between them.
interface loopback1
ip address 11.11.11.12/32
ip address 11.11.12.12/32 secondary
The primary address shown above is not actually used; it was provisioned only to enable routing on the interface. The recommendation, however, is to use a separate loopback, as discussed in the section “Create the underlay network.”

b. Configure the Layer 2 VNI.

The system uses the VNI, also called the VXLAN segment ID, along with the VLAN ID to identify the Layer 2
segments in the VXLAN overlay network.

Each VLAN is associated with only one Layer 2 VNI. A VLAN can have either global significance, which limits the number of VLANs to about 4000, or per-port significance, which allows up to 16 million segments. Of course, there is a physical limitation on the total number of active VNIs that a switch can handle, and this value evolves and depends on the hardware used, the software loaded, and the testing recommendations.

At the time of this testing, the Cisco Nexus 9300 platform is limited to 1000 active Layer 2 VNIs. This number
is evolving with new software and hardware releases, so check the latest updates. The configuration used in
the testing reported here is limited to 1000 extended VLANs between the Cisco ACI fabrics and uses the
global definition of a VLAN.

The VNI is defined using 24 bits. In this testing, the VNI was derived by adding 30000 to the VLAN number (that is, by prefixing the VLAN ID with the digit 3).
vlan 1,1001-2000
vlan 1001
vn-segment 31001
vlan 1002
vn-segment 31002

vlan 2000
vn-segment 32000
c. Configure the NVE tunnel interface.

Next create the NVE interface that is used as the VXLAN tunnel interface.

In this testing, because of the simplicity of the core DCI network, unicast is used to replicate Layer 2
broadcast, unknown unicast, and multicast traffic. In a dual-site scenario, there is not much advantage in using
multicast in the core given that multidestination traffic needs to be sent only to a remote pair of VXLAN
devices.
interface nve1
no shutdown
source-interface loopback1
host-reachability protocol bgp
member vni 31001
ingress-replication protocol bgp
member vni 31002
ingress-replication protocol bgp

member vni 32000
ingress-replication protocol bgp
The NVE interface relies on BGP for host reachability advertisement, and it uses loopback1 as the VTEP. In addition, ingress replication of multidestination traffic is performed toward any VTEP learned through BGP that has the corresponding VNI defined.
DC1-93-02_i05-9372-02# sh nve vni ingress
Interface VNI Replication List Source Up Time
--------- -------- ----------------- ------- -------

nve1 31001 21.21.22.22 IMET 03:24:36

nve1 31002 21.21.22.22 IMET 03:24:36


Alternatively, multicast could have been used to transport the VXLAN-encapsulated broadcast, unknown unicast, and multicast traffic in an optimal way, and it could be a better solution for a multiple-site DCI with a multicast-capable core. In that case, some additional considerations apply that are outside the scope of this testing.

For details on how to configure VXLAN with a multicast-enabled underlay, please refer to the VXLAN Network
with MP-BGP EVPN Control Plane Design Guide available at
http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/guide-c07-734107.html
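Purely as a hedged sketch that was not part of the validated configuration, a multicast-enabled underlay would replace ingress replication with a multicast group per VNI and require PIM on the underlay interfaces, along the following lines; the rendezvous point address and group addresses are illustrative assumptions.

feature pim
ip pim rp-address 192.168.100.1 group-list 239.1.0.0/16
!
interface Ethernet1/45
  ip pim sparse-mode
!
interface nve1
  source-interface loopback1
  host-reachability protocol bgp
  member vni 31001
    mcast-group 239.1.1.1
  member vni 31002
    mcast-group 239.1.1.1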

d. Configure BGP EVPN peering.

MP-BGP EVPN can transport Layer 2 information such as MAC addresses, and also Layer 3 information such
as host IP addresses (host routes) and IP subnets. For this purpose, it uses two forms of routing
advertisements:

● Type 2
◦ Used to announce host MAC and host IP address information for the endpoint directly connected to the
VXLAN fabric
◦ Extended community: router MAC address (for Layer 3 VNI) and sequence number
● Type 5
◦ Advertises IP subnet prefixes or host routes (associated, for example, with locally defined loopback
interfaces)
◦ Extended community: router MAC address, uniquely identifying each VTEP node
In the solution presented in this document, VXLAN is used only to extend a Layer 2 broadcast domain. Therefore, BGP uses only Type-2 advertisements, carrying MAC address information without the IP information populated.

First, each DCI node has to establish MP-BGP EVPN peering sessions with every remote DCI node, edge to
edge. Because each data center is in a different autonomous system, multihop MP-eBGP EVPN is used.
router bgp 65500
neighbor 21.21.21.21
remote-as 65501
update-source loopback0
ebgp-multihop 10
address-family l2vpn evpn
send-community both
route-map NEXT-HOP-UNCHANGED out
neighbor 22.22.22.22
remote-as 65501
update-source loopback0
ebgp-multihop 10
address-family l2vpn evpn
send-community both
route-map NEXT-HOP-UNCHANGED out
!
route-map NEXT-HOP-UNCHANGED permit 10
set ip next-hop unchanged
This peering also must be performed on all DCI nodes of other sites.

The multihop MP-eBGP must be able to cross the core network, because overlay edge-to-edge peering is
performed, so enough hops must be allowed.

By default, an eBGP speaker sets its own update-source address as the next hop for all network layer reachability information (NLRI). In VXLAN EVPN, however, the next hop advertises the VTEP address, which is not the same IP address as the loopback used for the MP-eBGP peering, so intermediate nodes must not change the next hop when they readvertise the routes. Each eBGP speaker therefore applies an outbound route map that keeps the next hop unchanged for the EVPN address family, as shown in the configuration above.
sh l2route evpn mac all
Topology Mac Address Prod Next Hop (s)
----------- -------------- ------ ---------------
1181 0004.1401.0006 Local Po2
1181 0004.1402.0001 BGP 21.21.22.22
Here, the leaf node has detected two different MAC addresses. One is locally connected and detected from
the local Cisco ACI fabric, and the other was learned on the remote VTEP, which is connected to the second
Cisco ACI fabric.
sh bgp l2vpn evpn 0004.1402.0001
Advertised path-id 1
Path type: external, path is valid, is best path, no labeled nexthop
Imported from
21.21.21.21:33948:[2]:[0]:[0]:[48]:[0004.1402.0001]:[0]:[0.0.0.0]/112
AS-Path: 65501 , path sourced external to AS
21.21.22.22 (metric 0) from 21.21.21.21 (21.21.21.21)
Origin IGP, MED not set, localpref 100, weight 0
Received label 31001
Extcommunity: RT:65500:31001 SOO:21.21.22.22:0 ENCAP:8
In this BGP NLRI, the next hop is 21.21.22.22, which is the remote VTEP, and the received label identifies the Layer 2 VNI (31001 in the output above).

e. Configure BGP EVPN VNI control advertisement.

The BGP advertisement for a MAC address appends the MAC address to a route distinguisher (rd) as
recommended by the EVPN address family standard, but VXLAN makes no use of this element. More
interesting is the route-target (RT) use. When a VTEP announces a MAC address (with its appended route
distinguisher), it associates an extended community with a discriminator that allows the remote VTEP to learn
the MAC address as part of a specific L2VNI (and consequently as part of the local VLAN mapped to it).

The route target can be generated automatically, but automatic generation embeds the local autonomous system number. Because the proposed solution uses a different autonomous system at each site, the automatically generated values would not match, so you cannot rely on automatic generation of the route target. Therefore, the recommended approach is to explicitly define the route target in a symmetric fashion on every VTEP.
evpn
vni 31001 l2
rd auto
route-target import 65500:31001
route-target export 65500:31001

vni 32000 l2
rd auto
route-target import 65500:32000
route-target export 65500:32000
sh bgp l2vpn evpn 0004.1402.0001

Advertised path-id 1
Path type: external, path is valid, is best path, no labeled nexthop
Imported from
21.21.21.21:33948:[2]:[0]:[0]:[48]:[0004.1402.0001]:[0]:[0.0.0.0]/112
AS-Path: 65501 , path sourced external to AS
21.21.22.22 (metric 0) from 21.21.21.21 (21.21.21.21)
Origin IGP, MED not set, localpref 100, weight 0
Received label 31001
Extcommunity: RT:100:31001 SOO:21.21.22.22:0 ENCAP:8
f. Example: Write Python script to generate extension for 1000 VLANs.

The VXLAN CLI requires a few lines of configuration for each VNI that you set up. First you need to associate a VLAN with a VNI. Then you need to create an entry in the NVE interface and define a route target under the EVPN configuration.

In the testing performed, the VXLAN DCI is set up to extend 1000 VLANs across sites, which can require many lines of configuration. Here is an example of a Python script run on each of the four DCI switches to populate the VNI configuration automatically. You can also run a similar script on a server to populate multiple switches from there (a sketch of that approach is shown after the on-box script).
python
from cli import *

# Extend VLANs 1900-2000: map each VLAN to a VNI, add it to the NVE
# interface with BGP ingress replication, and define its EVPN route targets.
i = 1900
while i < 2001:
    vni = "3%i" % i
    command = "conf ; vlan " + str(i) + " ; vn-segment " + str(vni) + " ; exit"
    print command
    cli(command)
    command = "conf ; int nve1 ; member vni " + str(vni) + " ; ingress-replication protocol bgp ; exit"
    print command
    cli(command)
    command = "conf ; evpn ; vni " + str(vni) + " l2 ; rd auto ; route-target import 100:" + str(vni) + " ; route-target export 100:" + str(vni) + " ; exit"
    print command
    cli(command)
    i = i + 1
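The on-box script above can also be adapted to run centrally. A minimal sketch using the netmiko library (an assumption, since the document does not specify the off-box tooling) might look like the following, pushing the same three configuration blocks to each DCI switch over SSH; the management addresses and credentials are hypothetical placeholders.

from netmiko import ConnectHandler  # assumption: netmiko is installed on the management server

# Hypothetical management IP addresses of the four DCI switches
DCI_SWITCHES = ["192.0.2.11", "192.0.2.12", "192.0.2.21", "192.0.2.22"]

def vni_config(vlan_start=1900, vlan_end=2000, asn=100):
    """Build the VLAN/VNI, NVE, and EVPN configuration lines for a VLAN range."""
    lines = []
    for vlan in range(vlan_start, vlan_end + 1):
        vni = 30000 + vlan  # same VLAN-to-VNI mapping as the on-box script
        lines += [
            "vlan %d" % vlan,
            "  vn-segment %d" % vni,
            "interface nve1",
            "  member vni %d" % vni,
            "    ingress-replication protocol bgp",
            "evpn",
            "  vni %d l2" % vni,
            "    rd auto",
            "    route-target import %d:%d" % (asn, vni),
            "    route-target export %d:%d" % (asn, vni),
        ]
    return lines

if __name__ == "__main__":
    config = vni_config()
    for host in DCI_SWITCHES:
        # Open an SSH session to the switch and push the configuration
        conn = ConnectHandler(device_type="cisco_nxos", host=host,
                              username="admin", password="password")
        conn.send_config_set(config)
        conn.save_config()   # equivalent to 'copy running-config startup-config'
        conn.disconnect()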
4. Connect the DCI nodes to Cisco ACI using vPC.
a. Create vPC.

The vPC between the DCI nodes and the Cisco ACI fabric is not unusual. The implementation on the DCI side is simple because the vPC is not associated with any SVI.

Nevertheless, special considerations apply when a node that is part of the vPC domain recovers. You need to meet the conditions described here.

Traffic originating from the Cisco ACI fabric should start using the vPC leg to the recovered leaf node only
when the node has fully reestablished connectivity (control plane and data plane) with the rest of the network.

If a node is brought up before the device can successfully establish routing adjacencies, traffic will be
temporarily black-holed (the outage may last several seconds). The recommended solution for this problem is
to configure delay restore for the vPC leg connections, as shown in the sample configuration here:

vpc domain 1
peer-switch
role priority 1000
peer-keepalive destination 10.50.138.4
delay restore 180
peer-gateway
auto-recovery
ip arp synchronize
vPC connections between the Cisco ACI border leaf nodes and the recovering DCI node will be kept down for 180 seconds, providing the time required for the switch to reestablish routing adjacencies and exchange routing information with the neighbor devices.

Traffic destined for the Cisco ACI fabric and received from the other data center will be steered to the
recovering device as soon as it starts advertising the anycast VTEP IP address to the underlay control plane.

This advertisement may lead to the opposite problem from the one just discussed. If the anycast VTEP address is advertised before the vPC peer link and the vPC leg connection to the fabric are recovered, traffic will be black-holed as well.

NX-OS Software Release 7.0(3)I2(2) and later offers an option natively on Cisco Nexus 9000 Series Switches
to eliminate this concern. The option keeps the loopback interface used as the VTEP in the down state for a
certain period of time to help ensure that the recovering node can reestablish connectivity with the vPC peer
before inbound traffic is received from the fabric core. As shown in the sample output here, the default hold-
down-time value is 180 seconds. This timer can be tuned from 0 to 1000 seconds.
sh nve interface nve 1 detail
Source Interface hold-down-time: 180
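If the default value needs to be tuned, the hold-down timer is configured under the NVE source interface. The following is a sketch only: the 120-second value is an arbitrary example, and the exact keyword should be verified against the NX-OS release in use.

interface nve1
  source-interface loopback1
  source-interface hold-down-time 120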
To create the vPC with Cisco Nexus 9000 Series Switches in standalone NX-OS mode, a vPC peer link between the pair of DCI nodes is required.
interface port-channel1
switchport mode trunk
spanning-tree port type network
vpc peer-link
Then a standard vPC connection is created to the fabric. Both two-link and four-link bundles are supported.
interface port-channel2
switchport mode trunk
switchport trunk allowed vlan 1000-1919
mtu 9216
vpc 2
b. Optionally, configure storm control on the VXLAN side.

As stated earlier, you need to control the amount of traffic that is forwarded from one data center to the other. Known unicast traffic is not a particular challenge because it reaches only one virtual machine, but if the destination MAC address is unknown, the traffic may be flooded to all the servers in the VNI/VLAN.

Unknown unicast traffic is rare, especially because data centers are no longer spanning-tree based, but both Cisco ACI and the standalone VXLAN devices can control it. The DCI model requires you to drastically rate-limit unknown unicast traffic.

The case of broadcast traffic is more sensitive. Broadcast frames reach the virtual machine host CPU, so the limit must be strict. Measurements show that a standard virtual machine can handle about 100 Mbps of broadcast traffic, which corresponds to a limit of 1 percent on a 10-Gbps link.

Multicast traffic also consists of sensitive frames that reach the CPU, but the rate limiting depends on the
application’s requirements. Below is an example config that applies storm control on the DCI nodes for traffic
coming from the Cisco ACI fabric.
interface port-channel2
storm-control broadcast level 1.00
storm-control multicast level 1.00
storm-control unicast level 1.00

Figure 84 provides a summary of the configuration steps needed to enable VXLAN as a Layer 2 DCI.

Figure 84. Overview of the VXLAN DCI Setup

Testing and Results


This section presents the results of the tests performed for the Cisco ACI dual-fabric design.

Traffic Generator: Emulated Device Configuration


To test the solution, five Spirent traffic generator ports were used: two in each data center and one behind the
WAN router.

DC1 used the following tester ports and emulated endpoints:

● Device name: TnT-14_AP1App_DC1


● Connected to Leaf3, port e1/3, emulating hosts: 10.100.14.100-105 belonging to the subnet that is present
only in DC 1.
● Device name: TnT-14_AP1Web_DC1

● Connected to Leaf3, port e1/4, emulating hosts: 10.1.4.100-105 belonging to the subnet stretched between
both data centers.
DC2 used the following tester ports and emulated endpoints:

● Device name: TnT-14_AP1App_DC2


● Connected to Leaf3, port e1/3, emulating hosts: 10.200.14.100-105 belonging to the subnet that is present
only in DC 2.
● Device name: TnT-14_AP1Web_DC2
● Connected to Leaf3, port e1/4, emulating hosts: 10.1.4.110-115 belonging to the subnet stretched between
both data centers.
Behind the WAN router, the following emulated endpoints were used:

● Device name: TnT-14_Wan


● Emulating hosts 192.14.99.111-116, belonging to a subnet present at a remote site such as a branch office.

Traffic Generator: Streams


The tests used the following traffic patterns:

● DC1 to DC2, bidirectional intra-subnet traffic:


TnT-14_AP1Web_DC1 <-> TnT-14_AP1Web_DC2
● DC1 to DC2, bidirectional inter-subnet traffic:
TnT-14_AP1App_DC1<-> TnT-14_AP1App_DC2
● DC1 to WAN: from stretched subnet but originating in DC1, bidirectional traffic:
TnT-14_AP1Web_DC1<-> TnT-14_Wan
● DC1 to WAN: from DC1 local subnet, bidirectional traffic:
TnT-14_AP1App_DC1<-> TnT-14_Wan
● DC2 to WAN: from stretched subnet but originating in DC2, bidirectional traffic:
TnT-14_AP1Web_DC2<-> TnT-14_Wan
● DC2 to WAN: from DC2 local subnet, bidirectional traffic:
TnT-14_AP1App_DC2<-> TnT-14_Wan

Each stream was configured for a 10-Mbps load, with a fixed frame length of 1024 bytes and a UDP header.

Testing Overview
Testing was divided into two areas:

● Failure testing for devices outside the Cisco ACI fabrics:


◦ DCI nodes and links
◦ ASA nodes and links
◦ Customer edge router

● Failure testing for the Cisco ACI fabric components
◦ Leaf nodes
◦ Spine nodes

Results Summary
Table 1 summarizes the tests performed and the results.

Table 1. Test Results

Test Title                                                              On Failure   On Recovery

Link from ACI Leaf 1 in DC1 to the local Nexus 9300 VXLAN DCI device    320 ms       122 ms
Nexus 9300 VXLAN DCI device node failure                                390 ms       1529 ms
Peer link failure between the Nexus 9300 DCI devices                    735 ms       1593 ms
ASA cluster member failure (slave node in DC1)                          3255 ms      214 ms
ASA cluster member failure (master node)                                3947 ms      0 ms
ASA cluster member failure (slave node in DC2)                          3038 ms      0 ms
Customer edge router: link with ACI fabric failure                      3094 ms      20 ms
Customer edge router WAN link failure                                   2745 ms      0 ms
Cisco ACI border leaf node failure                                      2494 ms      135 ms
Cisco ACI spine node failure                                            280 ms       0 ms

Test Results: Worst Affected Flows Only


This document reports only the top 4 to 10 worst affected flows. Testing can be repeated live upon request. The
test bed is available for further testing at the Cisco Proof-of-Concept (CPoC) facilities globally.

Link from ACI Leaf 1 in DC1 to the local Nexus 9300 VXLAN DCI device

On Failure

Worst case: 320 ms

On Recovery

Worst case: 122 ms

Nexus 9300 VXLAN DCI device node failure

On Failure

Worst case: 390 ms

On Recovery

Worst case: 1.5 seconds

Peer link failure between the Nexus 9300 DCI devices

The vPC secondary brings down its port channels because the peer link is down.

DC1-93-02_i05-9372-02# show vpc


Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id : 1
Peer status : peer link is down
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 inconsistency reason : Consistency Check Not Performed
vPC role : secondary
Number of vPCs configured : 1
Peer Gateway : Enabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Enabled, timer is off.(timeout = 240s)
Delay-restore status : Timer is off.(timeout = 180s)
Delay-restore SVI status : Timer is off.(timeout = 10s)

vPC Peer-link status


---------------------------------------------------------------------
id Port Status Active vlans
-- ---- ------ --------------------------------------------------
1 Po1 down -

vPC status
----------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
-- ---- ------ ----------- ------ ------------
2 Po2 down success success -

On Failure

Worst case: 735 ms

On Recovery

Worst case: 1.6 seconds

Cisco ASA Cluster Member Failure (Slave Node in DC1)

This test focuses on failure of the ASA cluster member (ASA 2 in DC1).

This was the state of the cluster before the failure:


DC1-ASA-1_i05-5585-01/master# show cluster info
Cluster fw: On
Interface mode: individual
This is "DC1-ASA-1_i05-5585-01" in state MASTER
ID : 1
Version : 9.5(1)
Serial No.: JAD1928006U
CCL IP : 1.1.1.1
CCL MAC : 80e0.1d58.8608
Last join : 11:38:25 UTC Feb 25 2016
Last leave: N/A
Other members in the cluster:
Unit "DC2-ASA-2_E05-asa5585x-02" in state SLAVE
ID : 2
Version : 9.5(1)
Serial No.: JAD170600RE
CCL IP : 1.1.1.4
CCL MAC : acf2.c5f2.c5f0
Last join : 11:48:31 UTC Feb 25 2016
Last leave: 11:46:39 UTC Feb 25 2016
Unit "DC2-ASA-1_i05-5585x-02" in state SLAVE
ID : 3
Version : 9.5(1)
Serial No.: JAD1928009Y
CCL IP : 1.1.1.3
CCL MAC : 54a2.747c.b668
Last join : 11:26:26 UTC Feb 25 2016
Last leave: 11:24:34 UTC Feb 25 2016
Unit "DC1-ASA-2_E05-5585x-01" in state SLAVE
ID : 4
Version : 9.5(1)
Serial No.: JAD170900KX
CCL IP : 1.1.1.2
CCL MAC : acf2.c5f2.c584
Last join : 11:43:08 UTC Feb 25 2016
Last leave: 11:41:16 UTC Feb 25 2016

Here is a summary of the connections across the cluster nodes before ASA 2 in DC1 is powered off for the test. This unit was handling 13 connections:

DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show conn count


DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
16 in use, 30 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 9 in use, 24 most used
centralized connections: 5 in use, 15 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
12 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 20 most used
centralized connections: 5 in use, 14 most used

DC2-ASA-1_i05-5585x-02:***********************************************
23 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 27 most used
centralized connections: 5 in use, 35 most used

DC1-ASA-2_E05-5585x-01:***********************************************
13 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 21 most used
centralized connections: 5 in use, 18 most used
DC1-ASA-1_i05-5585-01/TnT-14/master#

On Failure

Worst case: 3.2 seconds

At this point, all connections that were handled by the unit that went down have been rebalanced to the other units in the cluster:

DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show conn count


DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
20 in use, 30 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 25 most used
centralized connections: 3 in use, 16 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
15 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 16 in use, 24 most used
centralized connections: 7 in use, 18 most used

DC2-ASA-1_i05-5585x-02:***********************************************
26 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 12 in use, 28 most used
centralized connections: 7 in use, 36 most used

On Recovery

Worst case: 214 ms

Cisco ASA Cluster Member Failure (Master Node)

This test focused on the failure of the ASA cluster master node (ASA 1 in DC1).

This was the state of the cluster before the failure:

DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show conn count


DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
22 in use, 33 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 7 in use, 25 most used
centralized connections: 5 in use, 23 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
12 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 24 most used
centralized connections: 5 in use, 20 most used

DC2-ASA-1_i05-5585x-02:***********************************************
23 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 28 most used
centralized connections: 5 in use, 37 most used

DC1-ASA-2_E05-5585x-01:***********************************************
7 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 11 most used
centralized connections: 5 in use, 12 most used

On Failure

Worst case: 3.9 seconds

A new master, ASA 2 in DC1, was elected based on its priority:

DC1-ASA-2_E05-5585x-01/slave>
Cluster unit DC1-ASA-2_E05-5585x-01 transitioned from SLAVE to MASTER

DC1-ASA-2_E05-5585x-01/master>
DC1-ASA-2_E05-5585x-01/master>

Connections have been rebalanced across the other nodes:

DC1-ASA-2_E05-5585x-01/TnT-14/master# cluster exec show conn count


DC1-ASA-2_E05-5585x-01(LOCAL):****************************************
19 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 16 most used
centralized connections: 4 in use, 13 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
13 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 14 in use, 26 most used
centralized connections: 5 in use, 22 most used

DC2-ASA-1_i05-5585x-02:***********************************************
24 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 30 most used
centralized connections: 5 in use, 39 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#

On Recovery
The node rejoins the cluster as a slave:
DC1-ASA-2_E05-5585x-01/TnT-14/master#
Beginning configuration replication to Slave DC1-ASA-1_i05-5585-01
End Configuration Replication to slave.
FROM DC1-ASA-1_i05-5585-01:
Cluster unit DC1-ASA-1_i05-5585-01 transitioned from DISABLED to SLAVE

Traffic is not affected.

Connections are rebalanced back onto the node:


DC1-ASA-2_E05-5585x-01/TnT-14/master# cluster exec show conn count
DC1-ASA-2_E05-5585x-01(LOCAL):****************************************
22 in use, 28 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 6 in use, 16 most used
centralized connections: 7 in use, 17 most used

DC1-ASA-1_i05-5585-01:************************************************
9 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 11 most used
centralized connections: 5 in use, 9 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
13 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 26 most used
centralized connections: 5 in use, 23 most used

DC2-ASA-1_i05-5585x-02:***********************************************
24 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 9 in use, 30 most used
centralized connections: 5 in use, 42 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#

Cisco ASA Cluster Member Failure (Slave Node in DC2)

This test focused on failure of the ASA cluster node (ASA 1 in DC2).

This was the state of the cluster before the failure:


DC1-ASA-2_E05-5585x-01/TnT-14/master# cluster exec show conn count
DC1-ASA-2_E05-5585x-01(LOCAL):****************************************
20 in use, 28 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 6 in use, 16 most used
centralized connections: 5 in use, 17 most used

DC1-ASA-1_i05-5585-01:************************************************
7 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 11 most used
centralized connections: 5 in use, 9 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
16 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 26 most used
centralized connections: 7 in use, 24 most used

DC2-ASA-1_i05-5585x-02:***********************************************
27 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 9 in use, 30 most used
centralized connections: 7 in use, 44 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#

On Failure

Worst case: 3 seconds

On Recovery
The node rejoins the cluster:
DC1-ASA-2_E05-5585x-01/TnT-14/master#

Beginning configuration replication to Slave DC2-ASA-1_i05-5585x-02


End Configuration Replication to slave.
FROM DC2-ASA-1_i05-5585x-02:
Cluster unit DC2-ASA-1_i05-5585x-02 transitioned from DISABLED to SLAVE

Connections are rebalanced back onto the node:


DC1-ASA-2_E05-5585x-01/TnT-14/master# cluster exec show conn count
DC1-ASA-2_E05-5585x-01(LOCAL):****************************************
20 in use, 28 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 6 in use, 17 most used
centralized connections: 5 in use, 20 most used

DC1-ASA-1_i05-5585-01:************************************************
8 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 22 most used
centralized connections: 5 in use, 13 most used

DC2-ASA-2_E05-asa5585x-02:********************************************
29 in use, 37 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 4 in use, 26 most used
centralized connections: 5 in use, 31 most used

DC2-ASA-1_i05-5585x-02:***********************************************
7 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 18 in use, 18 most used
centralized connections: 5 in use, 11 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#

Traffic is not affected.

Customer edge router: link with ACI fabric failure

This test focused on failure of the link between the local customer edge router in DC1 and ACI Fabric 1. This router uses a port-channel interface to connect to the fabric. Logically, the fabric provides connectivity from the customer edge router to the outside interface of the local ASA in DC1.

This was the state of the router before the failure:


le06-2911-01_DC1#show interfaces po10
Port-channel10 is up, line protocol is up
Hardware is GEChannel, address is c47d.4ff5.c4a9 (bia c47d.4ff5.c4a8)
MTU 1500 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 3/255, rxload 3/255
Encapsulation 802.1Q Virtual LAN, Vlan ID 1., loopback not set
Keepalive set (10 sec)
ARP type: ARPA, ARP Timeout 04:00:00
No. of active members in this channel: 2
Member 0 : GigabitEthernet0/1 , Full-duplex, 1000Mb/s
Member 1 : GigabitEthernet0/2 , Full-duplex, 1000Mb/s
No. of Non-active members in this channel: 0
No. of PF_JUMBO supported members in this channel : 0

<…snip>

le06-2911-01_DC1#show ip ospf neighbor

Neighbor ID Pri State Dead Time Address Interface


192.14.31.102 1 FULL/DROTHER 00:00:02 192.14.20.102 Port-
channel10.1238
192.14.31.105 1 FULL/BDR 00:00:02 192.14.20.105 Port-
channel10.1238

One member of port-channel10 was failed by pulling the cable between the router and the ACI leaf switch.

On Failure

Worst case: 3 seconds

On Recovery

Worst case: 20 ms

Customer Edge Router WAN Link Failure

This test focused on failure of the link between the customer edge router in DC1 and the WAN router in DC1. In the test environment, this router has a single link to the WAN. Logically, BGP is used to peer with the WAN router (next hop 192.14.10.2). The fabric also provides the Layer 2 path used for BGP peering with the customer edge router in DC2 (next hop 192.14.15.2).

This was the BGP table on the router before the failure:
le06-2911-01_DC1#show ip bgp vpnv4 vrf TnT-14
BGP table version is 541, local router ID is 192.10.10.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, x best-
external, f RT-Filter
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path


Route Distinguisher: 14:14 (default for vrf TnT-14)
*> 10.1.4.0/24 192.14.20.102 20 0 ?
*> 10.100.14.0/24 192.14.20.102 20 0 ?
*> 10.200.14.0/24 192.14.20.102 1 0 4294967295 i
*> 101.101.11.11/32 192.14.20.102 12 0 ?
*> 102.102.12.12/32 192.14.20.102 12 0 ?
* 192.14.13.0 192.14.15.2 4294967290 0 200 300 i
*> 192.14.10.2 0 4294967295 0 300 i
*> 192.14.20.0 0.0.0.0 0 0 ?
*> 192.14.31.0 192.14.20.102 11 0 ?
* 192.14.99.0 192.14.15.2 4294967290 0 200 300 i
*> 192.14.10.2 0 4294967295 0 300 i
le06-2911-01_DC1#

On Failure

Worst case: 2.7 seconds


le06-2911-01_DC1#show ip bgp vpnv4 vrf TnT-14
BGP table version is 547, local router ID is 192.10.10.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, x best-
external, f RT-Filter
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 14:14 (default for vrf TnT-14)
*> 10.1.4.0/24 192.14.20.102 20 0 ?
*> 10.100.14.0/24 192.14.20.102 20 0 ?
*> 10.200.14.0/24 192.14.20.102 1 0 4294967295 i
*> 101.101.11.11/32 192.14.20.102 12 0 ?
*> 102.102.12.12/32 192.14.20.102 12 0 ?
*> 192.14.13.0 192.14.15.2 4294967290 0 200 300 i
*> 192.14.20.0 0.0.0.0 0 0 ?
*> 192.14.31.0 192.14.20.102 11 0 ?
*> 192.14.99.0 192.14.15.2 4294967290 0 200 300 i

The backup path over the Cisco ACI fabric to the customer edge router in DC2 is now being used to reach the subnets present at remote sites such as branch offices.

On Recovery

Traffic is not affected.

Cisco ACI Border Leaf Node Failure

This test focused on failure of the Cisco ACI border node (leaf 101).

On Failure

Worst case: 2.4 seconds

On Recovery

Worst case: 135 ms

Cisco ACI Spine Node Failure

This test focused on failure of the Cisco ACI spine node (spine 101). Because each leaf can be reached through
two different spine nodes, the failure of one of the spine nodes has little impact on traffic.

On Failure

Worst case: 280 ms

On Recovery

Traffic is not affected.

Conclusion
This document provides a guide to designing and deploying Cisco Application Centric Infrastructure in two data centers using an active-active architecture. Enterprises and service providers require a data center environment that is continuously available and that protects against single points of failure, including failure of an entire data center. The solution described in this document addresses those needs by offering a software-defined multiple-data center infrastructure that reduces total cost of ownership (TCO), accelerates data center application deployment, and supports business continuity.

The document defines the characteristics of a Cisco ACI dual-fabric deployment consisting of independent Cisco
Application Policy Infrastructure Controller clusters for each site and provides an overview of the design. For data
center interconnection for Layer 2 extension, two options are presented: one based on dark fiber that uses back-to-
back vPC, and one that uses VXLAN as a Layer 2 DCI over a Layer 3 core.

In Cisco ACI, in addition to considering connectivity, you need to consider Cisco ACI policy. The document
discussed policy design and application for Layer 2 and Layer 3 traffic between sites and between the WAN and
the Cisco ACI data centers.

Cisco ACI also supports integration with virtual machine managers and hypervisors, allowing the fabric to provide
network services to virtual machines. This document discussed how this integration works for a dual-fabric design
across two data centers, and how Cisco ACI supports cross-data center live migration when used in combination
with VMware vSphere Release 6.0 or later.

The document also discussed how security services are integrated in an active-active dual-data center design through the use of Cisco ASA firewalls in active-standby mode or, preferably, in ASA cluster mode.

Multitenancy is built into Cisco ACI. This document describes how to preserve and maintain multitenancy across data centers and also for flows to the WAN. The document also discusses integration with the WAN at both sites and the traffic flows between remote sites, such as branch offices, and the data centers.

The design presented in this document has been validated by Cisco in a lab environment that replicates a real-world customer setup, and detailed test results, including convergence times, are therefore provided.

Ultimately, this document provides a reference guide to help you design and deploy Cisco ACI in two data centers
to meet the business needs of an always-available, multisite network infrastructure.

Demonstrations of the Cisco ACI Dual Fabric Design


See the following links for videos that demonstrate the design presented in this document:

● Cisco ACI Multisite: Geo-Distributed Wireless LAN Controllers over Two Cisco ACI Fabrics: This
demonstration shows a geographically distributed redundant solution for a Cisco Wireless LAN Controller
deployed on top of the design described in this document.
● Cisco ASA Cluster over Two Cisco ACI Fabrics in a Dual–Data Center Design: This demonstration shows
the ASA cluster working as explained in this document.
● Cisco ACI Dual DC Innovations: Cisco ACI Across Two Data Centers, Intersite Toolkit, and Cross-vCenter vMotion: This video demonstrates some of the Cisco ACI innovations introduced in Cisco ACI Release 1.2(1) that were used in the design presented in this document, including Cisco ACI integration with VMware vSphere 6.0 and its support for cross-vCenter vMotion and for virtual IP address and virtual MAC address capabilities for optimized forwarding.

For More Information


For more information about Cisco ACI, please refer to the documents available at
http://www.cisco.com/c/en/us/solutions/data-center-virtualization/application-centric-infrastructure/white-paper-
listing.html.

Printed in USA C11-737077-00 05/16

