Preface
    Introduction ............................................. 2
    Authors .................................................. 4
    Acknowledgements ......................................... 5
    Organization of this book ................................ 6
    Intended Audience ........................................ 9
    Book Writing Methodology ................................. 10
Why a New Approach
    Introduction ............................................. 12
    Why VXLAN Overlay ........................................ 14
    Why a Control Plane ...................................... 16
    Looking Ahead ............................................ 18
Fundamental Concepts 20
    Introduction ............................................. 21
    What is VXLAN? ........................................... 22
    How Does VXLAN Work? ..................................... 25
    Networking in a VXLAN Fabric ............................. 30
Software Overlays 33
    Introduction ............................................. 34
    Host-Based Overlay ....................................... 35
Single-POD VXLAN Design
    Introduction ............................................. 43
    Underlay ................................................. 45
    Overlay .................................................. 52
    Host Connectivity ........................................ 62
External Connectivity for the VXLAN Fabric
    Introduction ............................................. 66
    Layer 3 Connectivity ..................................... 67
    Layer 2 Connectivity ..................................... 82
    Integration and Migration ................................ 87
Layer4-Layer7 Services 91
    Introduction ............................................. 92
    Use Cases ................................................ 98
Multi-POD & Multi-Site Designs 114
    Introduction ............................................. 115
    Fundamentals ............................................. 116
    Multi-POD Design ......................................... 122
    Design Options ........................................... 135
    Building the Multi-Site Inter-Connectivity ............... 138
Management and Operations
    Introduction ............................................. 146
    Management tasks ......................................... 148
    Available Tools .......................................... 153
Acronyms 162
    Acronyms ................................................. 163
Preface
Introduction
VXLAN EVPN
For many years now, VLANs have been the de-facto method for providing network segmentation in data center networks. Standardized as IEEE 802.1Q, VLANs leverage traditional loop prevention techniques such as Spanning Tree Protocol, which not only imposes restrictions on network design and resiliency, but also results in an inefficient use of available network links due to the blocking of the redundant paths required to ensure a loop-free network topology.
Modern data centers require an evolution from the restraints of traditional Layer 2 networks. Cisco, in partnership with other leading vendors, proposed the Virtual Extensible LAN (VXLAN) standard to the IETF as a solution to the data center network challenges posed by traditional VLAN technology and the Spanning Tree Protocol. At its core, VXLAN provides the benefits of elastic workload placement, higher scalability of Layer 2 segmentation, and connectivity extension across the Layer 3 network boundary. However, without an intelligent control plane, VXLAN has its limits due to its flood-and-learn behavior.
In summary, the advantages provided by a VXLAN EVPN solution are as follows:

• Standards-based overlay (VXLAN) with standards-based control plane (BGP)
• Layer 2 MAC and Layer 3 IP information distribution by control plane (BGP)
• Forwarding decision based on scalable control plane (minimizes flooding)
• Integrated Routing/Bridging (IRB) for optimized forwarding in the overlay
• Leverages Layer 3 ECMP – all links forwarding – in the underlay
• Significantly larger namespace in the overlay (16M segments)
• Integration of physical and virtual networks with hybrid overlays
• Facilitation of Software-Defined Networking (SDN)
Authors
• Brenden Buresh - Systems Engineering
• Dan Eline - Systems Engineering
• David Jansen - Systems Engineering
• Jason Gmitter - Systems Engineering
• Jeff Ostermiller - Systems Engineering
• Jose Moreno - Systems Engineering
• Kenny Lei - Technical Marketing
• Lilian Quan - Technical Marketing
• Lukas Krattiger - Technical Marketing
• Max Ardica - Technical Marketing
• Rahul Parameswaran - Technical Marketing
• Rob Tappenden - Systems Engineering
• Satish Kondalam - Technical Marketing
Acknowledgements
Fundamental Concepts
This chapter shifts the focus from the "Why" to the "What". Essential concepts for understanding the technology are laid out to set the necessary foundation for understanding the rest of the book. The basics of VXLAN technology are articulated, as well as the fundamentals of networking in a VXLAN Fabric.
Software Overlays
The intersection of virtual and physical networking is discussed in order to help the reader gain the required perspective to decide how to best implement VXLAN technology to support these virtualized environments.
Layer4-Layer7 Services
Ethernet routers and switches are not the only elements providing network services in a data center. Layer 4-Layer 7 devices like firewalls or application delivery controllers are often indispensable for secure and efficient application delivery. This chapter addresses how to connect these network appliances to the VXLAN Fabric so the data center network offers the best performance and availability end-to-end.
Intended Audience
Introduction
IT is evolving toward a cloud consumption model. This transition affects the way applications are being architected and implemented, driving an evolution in data center infrastructure design to meet these changing requirements. As the foundation of the modern data center, the network must also take part in this evolution while also meeting the increasing demands of server virtualization and new microservices-based architectures. This demands a new paradigm that must deliver on the following areas:
• Flexibility to allow workload mobility across any floor tile in any site
• Resiliency to maintain service levels even in failure conditions (better fault isolation)
• Multi-tenancy capabilities and better workload segmentation
• Performance to provide for adequate bandwidth and predictable latency, independent of scale for demanding workloads
• Scalability from small environments to cloud scale while maintaining the above characteristics
As a result, modern data center networks are evolving from traditional hierarchical designs to horizontally-oriented spine-leaf architectures with hosts and services distributed throughout the network. These networks are capable of supporting the increasingly common east-west traffic flows experienced in modern applications. In addition, there are clustering technologies and virtualization techniques that require Layer 2 adjacency.
Cisco Application Centric Infrastructure (ACI) is an innovative data center architecture that simplifies, optimizes and accelerates the entire application lifecycle through a common policy management framework. ACI provides a turnkey solution to build and operate an automated cloud infrastructure. An alternative option is a VXLAN Fabric with a BGP EVPN control plane, which provides a scalable, flexible and manageable solution to support the growing demands of cloud environments.
Network overlays are a technique used in state-of-the-art data centers to create a flexible infrastructure over an inherently static network by virtualizing the network. Before going into the details of how overlays work, the challenges they face, and the solutions to overlay problems, it's worth spending some time to understand why traditional networks are so static.
Cisco, in partnership with other leading vendors, proposed the Virtual Extensible LAN (VXLAN) standard to the IETF as a solution to the data center network challenges posed by traditional VLAN technology. The VXLAN standard provides for the elastic workload placement and higher scalability of Layer 2 segmentation that is required by today's application demands.
VXLAN is designed to provide the same Ethernet Layer 2 network services as VLANs do today, but with greater extensibility and flexibility. Implementing VXLAN technologies in the network will provide the following benefits to every workload in the data center:

• Flexible placement of any workload in any rack throughout and between data centers
• Decoupling between physical and virtual networks
• Large Layer 2 network to provide workload mobility
• Centralized management, provisioning, and automation from a controller
• Scale, performance, agility and streamlined operations
• Better utilization of available network paths in the underlying infrastructure
Why a New Approach 16
Secondly, there must be a control plane where the location of a device or application can be looked up and the result used to encapsulate the packet so that it may be forwarded to its destination.

Thirdly, there must be a way to update the control plane such that it is always accurate. Having the wrong information in the control plane could result in packets being sent to the wrong location and likely dropped.

The second task, control plane lookup and encapsulation, is really an issue of performance and capacity. If these functions were performed in software, they would consume valuable CPU resources and add latency when compared to hardware solutions.
MP-BGP EVPN
MP-BGP EVPN for VXLAN provides a distributed control plane solution that significantly improves the ability to build and interconnect SDN overlay networks. The MP-BGP EVPN control plane for VXLAN offers the following key benefits:

• Control plane learning for end host Layer 2 and Layer 3 reachability information
• Ability to build a more robust and scalable VXLAN overlay network
• Supports multi-tenancy
• Provides integrated routing and bridging
• Minimizes network flooding through protocol-driven host MAC/IP route distribution
• ARP suppression to minimize unnecessary flooding
• Peer discovery and authentication to improve security
• Optimal east-west and north-south traffic forwarding
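As a minimal sketch, enabling the MP-BGP EVPN control plane on an NX-OS VTEP looks broadly like the following (the AS number and neighbor address are illustrative, not from this book):

    nv overlay evpn
    feature bgp

    router bgp 65000
      router-id 10.0.0.11
      neighbor 10.0.0.1 remote-as 65000
        address-family l2vpn evpn
          send-community extended

In a typical spine-leaf design, the spines act as BGP route reflectors so that each leaf VTEP peers only with the spines rather than with every other leaf.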
Looking Ahead
A prominent example of the need for this flexible protocol extension is Service Chaining and the related NSH approach.
Creation of yet another encapsulation protocol stands to add more confusion to the already crowded encapsulation protocol space. The extensibility of VXLAN-GPE and NSH promises to both reduce the amount of encapsulation in the industry and accommodate future network encapsulation requirements. Geneve, VXLAN-GPE, and NSH are all recent protocol drafts proposed to the IETF. The three protocols provide similar approaches to achieve flexible protocol mappings. While Geneve uses variable-length options, VXLAN-GPE and NSH use fixed-size options. Cisco supports open standards and will continuously reevaluate support for future encapsulations.
Introduction
What is VXLAN?
Virtual Extensible LAN (VXLAN), as defined in RFC 7348, is an overlay technology designed to provide Layer 2 and Layer 3 connectivity services over a generic IP network. IP networks provide increased scalability, balanced performance and predictable failure recovery. VXLAN achieves this by tunneling Layer 2 frames inside of IP packets. VXLAN requires only IP reachability between the VXLAN edge devices, provided by an IP routing protocol.
The terminology used when describing the key components of a VXLAN Fabric includes:

• VTEP – Virtual Tunnel Endpoint: the hardware or software element at the edge of the network responsible for instantiating the VXLAN tunnel and performing VXLAN encapsulation and decapsulation
• VNI – Virtual Network Instance: a logical network instance providing Layer 2 or Layer 3 services and defining a Layer 2 broadcast domain
• VNID – Virtual Network Identifier: a 24-bit segment ID that allows the addressing of up to 16 million logical networks to be present in the same administrative domain
• Bridge Domain: a set of logical or physical ports that share the same flooding or broadcast characteristics
Alternatively, a software-based VTEP removes the dependency on hardware switches, albeit at the expense of performance. Additionally, VXLAN deployments could adopt hybrid approaches, where the VXLAN tunnels are established between hardware and software VTEPs. More information on this can be found in the Software Overlays chapter.
As discussed in the introduction, the use of VXLAN technology brings several benefits to data center networking, which include:

• Multi-tenancy: VXLAN Fabrics inherently support multi-tenancy both at Layer 2 (separate Layer 2 VNIs represent logically isolated bridging domains) and Layer 3 (by defining different VRFs for each supported tenant)
• Mobility: the overlay capability offered by VXLAN provides a Layer 2 extension service across the data center to provide flexible deployment and mobility of physical and virtual endpoints
• Increased Layer 2 segment scale: VLAN-based designs are limited to a maximum of 4,096 Layer 2 segments due to the use of a 12-bit VLAN ID. VXLAN introduces a 24-bit VNID that theoretically supports up to 16 million distinct segments
• Multi-path Layer 2 support: traditional Layer 2 networks support one active path because Spanning Tree (STP) expects and enforces a loop-free topology by blocking redundant paths. A VXLAN Fabric leverages a Layer 3 underlay network for the use of multiple active paths
Data Plane
VXLAN requires an underlying transport network that performs data plane forwarding. This data plane forwarding is required to provide unicast communication between endpoints connected to the Fabric. The following diagram illustrates data plane forwarding in a VXLAN network.
Two different approaches can be taken to allow transmission of BUM traffic across the VXLAN Fabric:

1. Leverage multicast technology in the underlay network (Protocol Independent Multicast or PIM) to make use of the native replication capabilities of the Fabric spines to deliver traffic to all the edge VTEP devices.

2. In scenarios where multicast cannot be deployed, it is possible to make use of the source-replication capabilities of the VTEP nodes, which create multiple unicast copies of the BUM frames to be sent to each remote VTEP device. This approach is not as efficient as using multicast for BUM traffic replication.
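With the first approach, each Layer 2 VNI is associated with an underlay multicast group on the VTEP. A minimal NX-OS-style sketch of this mapping (the group address is illustrative):

    interface nve1
      no shutdown
      source-interface loopback0
      member vni 30000
        mcast-group 239.1.1.1

BUM frames for VNI 30000 are then sent to 239.1.1.1, and the underlay multicast tree replicates them to every VTEP that has joined that group.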
VXLAN doesn't change the semantics of Layer 2 or Layer 3 forwarding, and allows the VTEP to perform bridging and routing functions while leveraging the VXLAN tunnel for data plane forwarding. As such, the VTEP offers a set of different gateway functions as outlined in the following diagram.

• Layer 2 Gateway: VXLAN-to-VLAN bridging maps a VNI segment to a VLAN to create a common bridge domain
• Layer 3 Gateway (VXLAN Router): VXLAN-to-VXLAN routing provides Layer 3 connectivity between two VNIs natively, so no decapsulation function is required
• Layer 3 Gateway (VXLAN Router): VXLAN-to-VLAN routing provides Layer 3 connectivity between a VNI and a VLAN
Control Plane
The VXLAN RFC has to date only concerned itself with the transport (data plane) of traffic, ensuring connectivity to all hosts in a VXLAN domain. The control plane, or the method by which VXLAN reachability and learning occurs, was achieved through what is known as flood-and-learn behavior. Simply speaking, flood and learn is a data-driven methodology wherein a VTEP that doesn't know the location of a given destination MAC floods the frame onto the VXLAN's associated multicast group. Multicast is typically used in order to provide a more manageable approach to multi-destination traffic. Instead of learning the source interface associated with a frame's source MAC address, the host learns the encapsulating source IP address of the remote VTEP. The flood-and-learn methodology is concerned with both the discovery (between peers) of VTEPs as well as remote endpoint location learning.
In traditional Layer 2 access networks, the Layer 3 default gateway is most commonly placed at the aggregation layer. Generally, the pair of aggregation switches leverages a first-hop redundancy protocol such as HSRP, VRRP or GLBP to provide a redundant default gateway IP address. Depending on configuration and protocol, these may be configured for active/standby or active/active redundancy.
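As a reference point for this traditional model, a minimal NX-OS-style HSRP configuration on one of the aggregation switches might look like the following (addresses and group numbers are illustrative):

    feature hsrp

    interface Vlan10
      no shutdown
      ip address 10.10.10.2/24
      hsrp 10
        ip 10.10.10.1
        priority 110
        preempt

Hosts in VLAN 10 use the virtual address 10.10.10.1 as their default gateway; the peer switch runs the same group with a lower priority and takes over if this switch fails.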
With asymmetric IRB, the ingress VTEP performs both routing and bridging, whereas the egress VTEP performs only bridging. As a result, the return traffic will take a different VNI than the source traffic. This necessitates that the source and destination VNIs reside on both the ingress and egress VTEPs. This leads to a more complex configuration, as all switches need to be configured for all possible VNIs. Perhaps a more pressing consideration is the scaling implications of all devices potentially needing to learn a considerably larger number of endpoints.

In symmetric IRB, both the ingress and egress VTEPs provide both L2 and L3 forwarding. This results in predictable forwarding behavior. As a result, only the VNIs of locally-attached endpoints need to be defined on a VTEP (plus the transit L3 VNI), which in turn simplifies configuration and reduces scale requirements through optimized use of ARP and the MAC address table. This results in better scale in terms of the total number of VNIs a VXLAN Fabric can support.

It is important to keep in mind that, as both methods are defined in the standard, consideration must be given to device selection and the implications for interoperability. For example, Cisco supports only symmetric IRB on the Nexus platforms, as it offers better scalability.
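The transit L3 VNI mentioned above is configured per tenant VRF. A minimal NX-OS-style sketch of symmetric IRB for one tenant (all names and numbers are illustrative):

    vrf context Tenant-A
      vni 50000
      rd auto
      address-family ipv4 unicast
        route-target both auto evpn

    vlan 2500
      vn-segment 50000

    interface Vlan2500
      no shutdown
      vrf member Tenant-A
      ip forward

    interface nve1
      member vni 50000 associate-vrf

Routed traffic between any two subnets of Tenant-A crosses the fabric inside L3 VNI 50000, so each VTEP needs only the L2 VNIs of its locally-attached endpoints plus this one transit VNI.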
Software Overlays
Introduction
Server virtualization has transformed the way in which data centers are operated, and the vast majority of data centers today implement it to some degree. However, making the assumption that those data centers run exclusively virtualized workloads would be a mistake. Many organizations still make use of mainframes, for example. Moreover, new applications that do not require server virtualization are coming into the mainstage, such as cloud-based software that makes use of Linux containers, or modern scale-out applications such as Big Data, which deliver operational benefits and scale without the need of a hypervisor.

Although VXLAN is a generic overlay concept that is commonly deployed in the network, it is sometimes associated with server virtualization and hypervisors. This chapter covers the advantages and disadvantages of implementing VXLAN on virtualized hosts, and how to realize the most benefit out of this technology, keeping in mind that one of the main reasons for interest in VXLAN is its openness, which avoids vendor lock-in (vendor or hypervisor).
Host-Based Overlay
Server virtualization offers significant benefits, including flexibility and agility in delivering compute services in the data center. Traditionally, networking to the hypervisor is provided via VLAN transport, and there is a new trend to adopt host-based VXLAN overlays to improve agility and automation of the network layer.

However, a software-only overlay often results in a sub-optimal network solution which does not take into account the broader aspects of operations, integration, and performance for the network as a whole.
In addition to the CPU impact introduced with host-based overlays, the network team has to provide extra effort in troubleshooting due to the lack of correlation between the overlay and the underlay networks. In regards to CPU impact, the performance of a software VTEP is dependent on the CPU and memory available on the hypervisor. Some implementations run the VTEP function in kernel space, others in user space. Both options must deliver the necessary packet processing required for efficient application delivery. These solutions typically struggle to deliver line-rate throughput even with hardware assistance at the server NIC.
Additionally, host-based overlay network solutions are primarily focused on networking for virtual servers, without consideration for physical workloads or other existing services inside or outside the data center. Connectivity to both physical servers and resources beyond the virtual network typically requires gateways, either in software or hardware, which must be integrated with the physical network.
The VM Tracker (VMT) function on Nexus leaf switches provides visibility of the hypervisor hosts and the VMs connected to the VXLAN Fabric so the network can make decisions upon that information. For example, the VM Tracker auto-config feature enables automated provisioning of network resources to support virtual machines in a VMware vSphere environment. VM Tracker communicates with VMware vCenter Server to retrieve information relating to the virtual network configuration. The information includes:

• Physical port attachments
• VDS port group assignment to VMs

The network configuration that is dynamically provisioned, depending on the previous information, includes the following attributes:

• VLAN provisioning
• VNI allocation
• L3 gateway
• VRF provisioning
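As a sketch of how VM Tracker is pointed at vCenter on an NX-OS leaf (the connection name, vCenter address and credentials below are placeholders, not values from this book):

    feature vmtracker

    vmtracker connection VC1
      remote ip address 192.168.10.5 port 443 vrf management
      username admin password Example123
      connect

Once the connection is established, the switch can correlate its physical ports with the hypervisor hosts and VMs attached to them.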
This solution provides the performance of hardware-based encapsulation without having to upgrade the host physical NICs. This is especially relevant in the case of virtual network functions. For example, certain host-based overlays use a virtual gateway to interact with the rest of the world, as already mentioned in a previous section. The performance discussion is critical in this case, because that single virtual gateway might become a bottleneck for the whole virtual environment.
Further details about the Cisco Virtual Topology System are provided in the Management and Operations chapter, but below is a brief summary of the Cisco VTS architecture.

The Virtual Topology Controller (VTC) is the single point of management for hybrid overlays to configure, manage and operate a VXLAN Fabric with an MP-BGP EVPN control plane. The management layer supports integration with hypervisors such as VMware vSphere or OpenStack/KVM so that network constructs can be provisioned directly from the hypervisor user interface. The northbound REST APIs enable integration with third-party tools.

The control plane is represented by a virtualized IOS-XR router that provides integration with MP-BGP EVPN and advertises reachability information to the software VTEP itself over an API. The software VTEP, named Virtual Topology Forwarder (VTF), provides VXLAN encapsulation capability in the hypervisor.

More details on the Cisco Virtual Topology System architecture are available at https://www.cisco.com/go/vts
Single-POD VXLAN Design
Introduction
In classic hierarchical network designs, the access and aggregation layers together provide Layer 2 and Layer 3 functionality as a building block for data center connectivity. In smaller data center environments, this single building block would provide sufficient scale to meet the entire demands for connectivity and performance. As the environment scales to meet the increased demands of the larger data center, this building block is typically replicated, with an additional core layer introduced to connect these together. These building blocks are commonly referred to as a Point of Delivery, or POD, and allow for consistent, modular scale as the environment grows.

A single VXLAN POD can scale to hundreds of switches and thousands of ports, which will meet the demands of many enterprise data center environments; however, to meet more complex or larger scale requirements, the VXLAN POD may be replicated in the form of a multi-POD design. In a typical deployment with multiple data center locations, these VXLAN Fabrics, whether single- or multi-POD-based, will be deployed together as a multi-site VXLAN design. Both the multi-POD and multi-site deployment types are described further in the Multi-POD and Multi-Site Designs chapter. Additionally, the connectivity of Layer 2 and Layer 3 to the external network domain is covered in the External Connectivity for the VXLAN Fabric chapter.
Underlay
In building a VXLAN EVPN Fabric, it is essential to construct an appropriate underlay network, as this will provide a scalable, available and functional foundation to support the overlay. This section includes important considerations for the underlay design.
MTU
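VXLAN encapsulation adds roughly 50 bytes of overhead to each frame (outer Ethernet, IP, UDP and VXLAN headers), so the underlay links must carry an MTU at least 50 bytes larger than the largest endpoint frame. A common practice — sketched here with an illustrative interface — is simply to enable jumbo frames on all fabric-facing interfaces:

    interface Ethernet1/1
      mtu 9216

This avoids fragmentation or drops of encapsulated traffic regardless of the MTU used by the attached servers.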
IP Addressing

Following are the IP address requirements for a couple of different scenarios.
In the example above, in a small network with 4 spine and 6 leaf switches, there will be a minimum of 24 point-to-point links, requiring a total of 68 addresses for the Fabric underlay (two addresses per point-to-point link plus two loopback addresses per switch: 24 × 2 + 10 × 2 = 68). This number grows quickly with scale, increasing to 408 addresses in a larger scenario of 4 spine and 40 leaf switches (160 × 2 + 44 × 2 = 408).
forwarding behavior of BGP; therefore, specific attention is needed to achieve multi-pathing equivalent to what would be achieved when using SPF-based IGPs in the underlay.
When selecting routing protocols for use in the underlay, it is imperative to consider how the overlay control plane protocol functions and should be configured. By using the same protocol for the underlay and overlay, the clear separation of these two domains can become blurred. Therefore, when designing an overlay network, it is good practice to independently build a transport network, as has been done in MPLS. The deployment of an IGP in the underlay offers this separation of underlay and overlay control protocols. This provides a very lean routing domain for the transport network that consists of only loopback and point-to-point interfaces. At the same time, MAC and IP reachability for the overlay exists in a different protocol, namely MP-BGP EVPN.
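Such a lean IGP-based transport network can be sketched as follows for one leaf switch, here using OSPF (the process name, addresses and interface numbers are illustrative):

    feature ospf

    router ospf UNDERLAY
      router-id 10.0.0.11

    interface loopback0
      ip address 10.0.0.11/32
      ip router ospf UNDERLAY area 0.0.0.0

    interface Ethernet1/1
      no switchport
      ip address 10.1.1.1/31
      ip ospf network point-to-point
      ip router ospf UNDERLAY area 0.0.0.0
      no shutdown

Only loopbacks and point-to-point links participate in the IGP; overlay MAC and IP reachability is carried separately by MP-BGP EVPN.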
IP Multicast Recommendation
IP multicast provides an efficient mechanism for the distribution of multi-destination traffic in the Fabric underlay.

To deploy IP multicast in the underlay, a Protocol Independent Multicast (PIM) routing protocol needs to be enabled and must be consistent across all the devices in the underlay network. The two common PIM protocols are Sparse Mode (PIM-ASM) and Bidirectional (PIM-Bidir). This implies the requirement to deploy Rendezvous Points (RPs).
It is important to remember that the VTEP nodes represent the sources and destinations of the multicast traffic used to carry BUM traffic between endpoints connected to those devices.
Nor
mally, the RPs would be de ployed on the spine nodes, given the cen
tral po
si
tion
those de
vices play in the Fabric.
• Multicast Source Discovery Protocol (MSDP): this option has been around for a long time and is widely available across different switches and routers. MSDP sessions are established between RP devices to exchange information about sources and receivers for each given multicast group
• PIM with Anycast RP: this option is currently supported only on Cisco Nexus platforms and leverages PIM as the control plane to synchronize state between RPs
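As an illustration, a minimal PIM Anycast RP sketch on NX-OS might look like the following; the anycast RP address (10.254.254.250), the group range, and the spine loopback addresses (10.254.254.1 and 10.254.254.2) are hypothetical values, and the same configuration would be applied on both spines:

feature pim
interface loopback250
description Anycast-RP
ip address 10.254.254.250/32
ip pim sparse-mode
ip pim rp-address 10.254.254.250 group-list 239.0.0.0/8
ip pim anycast-rp 10.254.254.250 10.254.254.1
ip pim anycast-rp 10.254.254.250 10.254.254.2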
Ingress Replication
Ingress replication, also known as Head-End replication, may be used as an alternative to IP multicast to carry the BUM traffic inside the Fabric. One reason for using this alternate method is that IP multicast is not always an available option due to hardware and software constraints. IP multicast may also not be preferred due to perceived complexity by the network operations team.
interface nve1
no shutdown
source-interface loopback0
host-reachability protocol bgp
member vni 30000
ingress-replication protocol bgp
member vni 30001
ingress-replication protocol bgp
Overlay
After building a solid foundation for the VXLAN network with the underlay, the overlay concepts are equally important to provide the required functionality and flexibility.
MP-BGP EVPN
EVPN uses MP-BGP as the routing protocol to distribute reachability information for the VXLAN overlay network, including endpoint MAC addresses, endpoint IP addresses, and subnet reachability information.
• VNID for the L2VNI and VNID for the L3VNI for the tenant VRF
• BGP next-hop IP address identifying the originating VTEP device
• Router MAC address of the originating VTEP device
The EVPN Type-2 route has an embedded sequence number used for endpoint movement tracking. When an endpoint moves from one VTEP to another VTEP, the new VTEP will detect it as a newly attached local host. It will send a new EVPN Type-2 routing update with the reachability information for this endpoint. When doing so, it will increment the sequence number by one. When the rest of the VTEPs receive the new route with the higher sequence number, they will update their routing information for the endpoint using the new VTEP as the next hop.
As with traditional VLAN deployments, communication between endpoints belonging to separate L2VNIs is possible only through a Layer 3 routing function.
vlan 100
vn-segment 30000
vlan 101
vn-segment 30001
Once the VLAN-to-VNI mappings have been defined, it is then required to associate those created L2VNIs to an NVE logical interface, as shown in the configuration sample below.
interface nve1
no shutdown
source-interface loopback0
host-reachability protocol bgp
member vni 30000
suppress-arp
mcast-group 239.239.239.100
member vni 30001
suppress-arp
mcast-group 239.239.239.101
In the definition of the NVE logical interface, the loopback interface created as part of the underlay configuration is specified to be used for VXLAN encapsulation and decapsulation.
It is also required to associate the EVPN control plane to the VXLAN deployment, instead of the original flood-and-learn model. At the time of writing, this configuration has a global scope for a given VXLAN deployment; hence, it is not possible to mix the two modes of operation (control plane or flood-and-learn based) in the same Fabric.
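On NX-OS, enabling the EVPN control plane for VXLAN is a global configuration; a minimal sketch of the feature set typically required on each leaf is shown below (exact feature availability may vary by platform and release):

feature bgp
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn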
Finally, as part of the L2VNI configuration, it is possible to enable ARP suppression. This removes the need to flood ARP requests across the Fabric, which usually represents the large majority of L2 broadcast traffic. ARP suppression can be enabled since each leaf node learns about all the endpoints connected to the Fabric via the EVPN control plane. When receiving an ARP request originated by a locally connected endpoint trying to identify the MAC of a remotely connected endpoint, the leaf can then perform a lookup in a local cache populated upon reception of EVPN updates. If the MAC/IP information for the remote endpoint is available, the leaf can then reply to the local endpoint with the ARP mapping information on behalf of the remote endpoint. If the MAC/IP information for the remote endpoint is not available, the ARP request is flooded across the Fabric by encapsulating the packet in a VXLAN frame destined to the multicast group associated to the L2VNI of the local endpoint. ARP suppression can also be enabled or disabled on a per-L2VNI basis.
Because most endpoints send ARP requests to announce themselves to the network right after they come online, the local VTEP will immediately have the opportunity to learn their MAC and IP addresses and distribute this information to other VTEPs through the MP-BGP EVPN control plane. Therefore, most active IP hosts in VXLAN EVPN should be learned by the VTEPs either through local learning or control plane-based remote learning. As a result, ARP suppression reduces the network flooding caused by host ARP learning behavior.
The logical Layer 2 segment created by mapping a locally significant VLAN with a globally significant L2VNI is normally associated with an IP subnet. When endpoints connected to the L2VNI need to communicate with endpoints belonging to different IP subnets, they send the traffic to their default gateway. Deploying VXLAN EVPN allows support for a distributed default gateway functionality on each leaf node, a deployment model commonly referred to as Distributed Anycast Gateway. In a VXLAN deployment, the various Layer 2 segments defined by combining local VLANs and global VNIs can be associated to a VRF if they need to communicate.
Communication between local endpoints connected to different L2VNIs can occur via normal Layer 3 routing in the context of the VRF (i.e., no VXLAN encapsulation is required).
The deployment of Symmetric Integrated Routing and Bridging (IRB), already introduced in the Fundamental Concepts chapter, requires the introduction of a transit Layer 3 VNI (L3VNI) offering L3 segmentation services per tenant VRF. Each VRF instance is mapped to a unique L3VNI in the network. Different L2VNIs for the same tenant are usually associated to the same VRF. As a result, inter-VXLAN routing is performed through the L3VNI within a particular VRF instance.
The Symmetric IRB model assumes that the default gateway for all the L2VNIs is fully distributed to all the leaf nodes. At the time of this writing, the distributed gateway model is the only one supported with VXLAN EVPN and can be enabled by applying the configuration below on all the leaf nodes:
vlan 100
vn-segment 30000
interface Vlan100
no shutdown
vrf member Tenant-1
ip address 192.168.100.1/24 tag 21921
fabric forwarding mode anycast-gateway
The additional required configuration for each defined VRF is shown below.
vlan 2500
name L3_Tenant1
vn-segment 50000
interface Vlan2500
description L3_Tenant1
no shutdown
mtu 9216
vrf member Tenant-1
ip forward
interface nve1
member vni 50000 associate-vrf
Because the same anycast gateway MAC address is used on all leaf nodes, when an endpoint moves from one VTEP to another VTEP, it doesn't need to send another ARP request to re-learn the gateway MAC address.
Host Connectivity
• Using an Active/Standby attachment mode, where the endpoint leverages one or more active links to one leaf switch and one or more standby links to a second leaf switch. This ensures the endpoint can survive the failure of a single leaf switch and regain network connectivity simply by activating the standby links. This configuration does not require any specific functionality to be supported on the leaf, as normal Layer 2 learning and forwarding can be performed to deliver traffic to the locally connected endpoints.
• Using an Active/Active attachment mode, with static or dynamic bundling of physical interfaces using Link Aggregation Control Protocol (LACP). This ensures that all available links are always active and used to send and receive traffic. This model requires that the leaf switches support a Multi-Chassis Link Aggregation (MC-LAG) functionality to appear as a single logical entity to the locally connected endpoints. Cisco Nexus switches offer Virtual Port-Channel (vPC) to achieve this.
In a VXLAN Fabric, there are some additional aspects to consider.
interface loopback0
description VTEP
ip address 10.254.254.102/32
ip address 10.254.254.1/32 secondary
This suboptimal behavior can be avoided by grouping endpoints based on the type of connectivity (Active/Standby vs. LACP) and connecting them to separate sets of leaf switches.
External Connectivity for VXLAN Fabric
Introduction
In addition to external connectivity, the VXLAN Fabric will typically be deployed into an existing data center environment, so interoperability with the existing network and the ability to migrate workloads to the new Fabric will be very relevant.
This chapter provides detail on both Layer 2 and Layer 3 external connectivity to the VXLAN Fabric and how to use those concepts to deploy a VXLAN Fabric into an existing data center.
Layer 3 Connectivity
Multi-tenancy is one of the primary use cases for deployment of a VXLAN BGP EVPN Fabric. Different VRFs could be defined and segmented for different organizations, business units, mergers and acquisitions, user groups, or applications, or simply for security segmentation and policy enforcement.
In the context of VXLAN BGP EVPN, each instance (i.e., VRF/VLAN) is logically isolated, but physically integrated into the overall Fabric as a shared infrastructure. When extending Layer 3 connectivity outside the VXLAN Fabric, two different scenarios are usually considered:
1 Extend the logical isolation between VRFs into the externally routed domain. This scenario is typically deployed when connecting the VXLAN Fabric to the campus network or to the WAN, as shown in the figure below.
The border node represents the edge of the VXLAN Fabric and normally terminates the VXLAN data plane encapsulation to provide Layer 3 hand-off functionality toward the edge router. The border node role could be implemented on a leaf or spine switch. The edge router takes care of extending multi-tenancy connectivity across the external network, leveraging one of the deployment options discussed in the sections below. It is worth noting this model allows full support for overlapping IP address space across different tenants, providing end-to-end logical isolation.
2 Provide shared access to a common external service. This scenario allows different tenants to have common access to shared resources such as the Internet.
The simple use case shown above does not allow overlapping IP address space across different tenants, as this merges all the routing information into the "Default VRF" routing table. As an extension to the previous example, access to shared resources may be provided by front-ending each tenant with a security device. This provides an enforcement point for security policy when a tenant needs to access external resources or to communicate with other tenants, as shown in the figure below.
1 Border node on a leaf device is termed border leaf. This is a natural choice, as the leaf nodes are deployed as VTEP devices capable of supporting the required control plane and data plane functionalities. Deploying the VTEP capabilities only on the leaf nodes keeps the configuration on the spine switches much simpler. The spine provides the Fabric backplane functionality, routing VXLAN-encapsulated traffic between the leaf nodes. The border leaf only services north-south communication.
2 Border node on a spine device is termed border spine. This deployment option provides the advantage of optimizing the north-south communication with external resources. At the same time, it introduces the requirement to deploy a spine device that is capable of supporting VXLAN control and data plane functionality (VTEP). The border spine will most likely also serve as BGP Route Reflector (RR) and Multicast Rendezvous Point (RP). The border spine services north-south as well as east-west communication.
A good network design always provides resiliency and redundancy for key network elements. The border node performs a key function, interconnecting the VXLAN Fabric to the external network domain, so it is critical to ensure resiliency. It is recommended to design the Fabric with redundant border nodes and edge routers, each leveraging redundant physical connections, as shown below.
Regarding Layer 3 hand-off functionality, it is a fair assumption that the links between the border nodes and the edge routers are routed interfaces. Depending on how Layer 3 communication is extended outside the VXLAN Fabric, those Layer 3 interfaces could be dedicated for each tenant or shared across multiple tenants. The following sections provide an overview of the different deployment options. All the scenarios depict a border leaf deployment, but the same considerations can be applied in the border spine case.
VRF-Lite Hand-Off
The use of VRF enables the ability to have multiple routing tables that are completely independent and isolated. VRF-Lite represents a common and well-known mechanism to extend the tenant Layer 3 VRF information beyond the VXLAN Fabric.
• At the control plane level, the border node is responsible for exchanging per-tenant routing information between the VXLAN Fabric and the external network. The border node runs IPv4 or IPv6 unicast routing for each of the tenant VRFs with the external edge routing device to learn the external routes and to advertise the Fabric subnet/host routes to the external network. The border node also redistributes and advertises the external routes through MP-BGP EVPN to the internal nodes in the Fabric.
• The routing protocol used to communicate with the edge router can be BGP or an IGP routing protocol of your choice. When using BGP to peer with external routers, MP-BGP EVPN automatically imports the BGP routes learned from the VRF-Lite IPv4 or IPv6 unicast address family into the L2VPN EVPN address family. This represents a common option adopted in many real-world deployments. With other routing protocols, redistribution of routes is required to ensure routes are exchanged between the VXLAN Fabric and the external router.
interface Ethernet1/10.100
encapsulation dot1q 100
vrf member Tenant-1
ip address 192.168.5.254/30
vrf Tenant-1
address-family ipv4 unicast
advertise l2vpn evpn
In this example, the "advertise l2vpn evpn" command under the VRF IPv4 address family ensures that the routes learned through the L2VPN EVPN address family are advertised to the external peer over the VRF-Lite IPv4 unicast session.
A similar configuration, with the exception of the EVPN address family-specific commands, must then be applied on the edge router to ensure the BGP session can be established with the border node.
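For illustration only, the matching edge router side might be sketched as follows in NX-OS-style syntax, reusing the peering subnet from the earlier sample; the interface name and the AS numbers (65000 for the Fabric, 65099 for the external domain) are hypothetical:

interface Ethernet1/1.100
encapsulation dot1q 100
vrf member Tenant-1
ip address 192.168.5.253/30
router bgp 65099
vrf Tenant-1
neighbor 192.168.5.254 remote-as 65000
address-family ipv4 unicast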
• Physical Routed Ports: this implies using a dedicated physical interface for each tenant
• Sub-Interfaces: one logical sub-interface can be carved for each tenant to carry traffic on the same physical connection.
As shown above, it is important to note that for each VRF, manual configuration is required along the entire Layer 3 path. Since VRF-Lite needs to be configured on a hop-by-hop basis, scalability becomes a concern for large numbers of tenants/VRFs; this is the advantage of an MPLS hand-off.
MPLS Hand-Off
In many two-device deployments, the edge router can act as an MPLS provider edge node. Alternatively, a single-device solution can be used to terminate MPLS and VXLAN routing on the same device. This solution merges the border node and the MPLS Provider Edge (PE) router functionalities into a single physical device, usually referred to as the Border PE node. This scenario is depicted below.
This section summarizes the steps for configuring the Border PE device deployed on a Cisco NX-OS based platform using manual configuration, with reference to the simple network topology shown below.
The sample below shows a Border PE example configuration.
Note: The additional route-targets have to match the ones used in MPLS L3VPN for each VRF.
interface Ethernet1/10
ip address 192.168.5.254/30
ip router ospf MPLS-CORE
mpls ip
LISP Hand-Off
In Active/Active data center deployments, workload mobility allows applications to move between geographically dispersed locations. This brings the challenge of ingress route optimization when the workloads change location. Locator/Identifier Separation Protocol (LISP) solves this challenge by routing the client traffic to the correct location where the resources are located. The routing information for LISP does not add any additional prefixes to the underlay routing domain.
LISP is a directory of addresses and their locations, not a traditional routing protocol. LISP uses a demand-based model where edge devices request location information as required. This demand model is in contrast with the push model used by routing protocols and results in a reduced load on the device's hardware tables. LISP has other advantages, noted below:
• Mobility: EID portability
• Scalability: On-demand routing
• Security: Tenant ID-based segmentation
• DCI: Ingress route optimization
In this scenario, the spine device acts as a LISP xTR. A LISP xTR refers to a device that can act as both a LISP Ingress Tunnel Router (ITR) and a LISP Egress Tunnel Router (ETR). With LISP, regular IPv4/IPv6 host routes originating from the data center are not advertised, which helps optimize the routing table.
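A heavily abbreviated LISP xTR sketch on a Nexus border spine might look like the following; the EID prefix, the locator address (10.254.254.3), the map-server/map-resolver address (10.99.99.99), and the key are all hypothetical, and exact command support varies by platform and release:

feature lisp
ip lisp itr-etr
ip lisp database-mapping 192.168.100.0/24 10.254.254.3 priority 1 weight 100
ip lisp itr map-resolver 10.99.99.99
ip lisp etr map-server 10.99.99.99 key LISP-KEY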
North-South Traffic with VXLAN Host in the POD
In this scenario, packet forwarding involves two encapsulations:
1 LISP encapsulation between the external sites and the border spine
2 VXLAN encapsulation between the border spine and the leaf
The following scenario discusses host detection and packet forwarding:
4 When remote sites want to talk to the data center hosts, they send an inquiry to the mapping system requesting the location of the host. The mapping system replies with the location of the LISP site gateway where the destination EID is located.
5 Communication is then established between the remote client and the data center host, leveraging the LISP and VXLAN technologies as described earlier
Layer 2 Connectivity
There are two major use cases for Layer 2 hand-off and connectivity. The first is for migration scenarios, where the VXLAN Fabric needs to be connected to an existing non-VXLAN network infrastructure. The second is the extension of Layer 2 broadcast domains between separate VXLAN Fabrics, referred to as multi-site.
A vPC border node pair on the VXLAN Fabric can be used as a redundant Layer 2 gateway for the hand-off. In this case, the two environments can be connected via a vPC without introducing loops to the extended Layer 2 networks.
In the illustration above:
The border leaf switches are aware of the IP and MAC addresses of all the endpoints connected to the VTEPs in the VXLAN Fabric, so traffic received from the CE POD can be VXLAN-encapsulated and forwarded inside the Fabric towards the destination VTEP.
When a device in the VXLAN Fabric sends a multicast packet:
When a device in the CE POD sends a multicast packet:
Loop Prevention
As Layer 2 is extended outside the VXLAN Fabric, it is important to remember that the border node participates in the VXLAN Fabric, both from a control and data plane perspective. VXLAN does not currently provide any integration with Spanning Tree (STP), meaning VXLAN does not forward BPDUs across the Fabric. Therefore, establishing redundant Layer 2 connections between the VXLAN Fabric and the external network may result in the creation of a loop, as highlighted in the figure below.
Layer 2 Interconnect
Using the techniques described earlier in this chapter, the new VXLAN Fabric can be interconnected with the existing network, leveraging vPC and loop prevention techniques such as BPDU Guard, Root Guard, and storm control to deliver a redundant Layer 2 path between the two environments.
This migration is a four-step process:
1 Disable the default gateway in the existing network
2 Configure the gateway IP address as a Distributed Anycast Gateway in the new VXLAN Fabric. By using the MAC address of the original default gateway, the endpoints do not need to re-ARP for the new default gateway
3 Ensure that the subnet is advertised upstream to the Layer 3 network core
4 Remove the default gateway and routing configuration from the existing network
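As a sketch of the gateway configuration step on a Nexus leaf, assuming the legacy default gateway used the (hypothetical) MAC address 0000.0a0a.0a0a and subnet 192.168.10.0/24:

fabric forwarding anycast-gateway-mac 0000.0a0a.0a0a
interface Vlan10
no shutdown
vrf member Tenant-1
ip address 192.168.10.1/24
fabric forwarding mode anycast-gateway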
Re-learning the gateway's MAC address is not required if the anycast gateway in the VXLAN Fabric can take over the same MAC address that the old default gateway had. One restriction is that, as described in previous chapters, there is a single MAC address for the whole Fabric for all Distributed Anycast Gateways, so if the default gateways in the legacy network had multiple MAC addresses (for example, if multiple HSRP or VRRP groups were used), a migration where the MAC address of the default gateway stays the same will not be possible.
Introduction
A VXLAN Fabric provides Layer 2 and Layer 3 connectivity; however, additional services are required in the data center. These services are provided by dedicated appliances (physical or virtual) and require connectivity to the fabric. These dedicated functions are referred to as Layer4-Layer7 services.
Traditional hierarchical network designs connect Layer4-Layer7 services at the aggregation layer. Within a VXLAN Fabric, Layer4-Layer7 appliances can be connected to any leaf switch or connected to a dedicated leaf pair referred to as a "service leaf".
• Intrusion Detection (IDS) / Intrusion Prevention (IPS): The solution detects attacks and prevents systems from being compromised. It also prevents a compromised system from originating suspicious network activity. Examples are network reconnaissance with ping sweeps and port scans.
• WAN Optimization: The goal of this service is to improve the user experience through techniques such as optimization of the TCP stack, compression, and content caching.
• Application Delivery Controllers (ADC): The ADC includes server load balancing, SSL offload, and other application functionality. ADCs can be deployed by themselves or in tandem with other service nodes.
Deployment Models
In addition to the functionality of the Layer4-Layer7 services, an important factor to consider is how to deploy the service appliances. The following section describes different deployment models for Layer4-Layer7 services.
Virtual vs Physical
Layer4-Layer7 services come in different form factors, including physical and virtual appliances. There are certain considerations required for virtual appliances, including the following:
• With virtual appliances, there is typically a virtual switch between the physical leaf and the VM hosting the services appliance
• Virtual services have different NIC redundancy models; these functions are provided by the hypervisor
Layer4-Layer7 Services 94
The decision whether to use virtual or physical appliances requires additional considerations, including that physical appliances are generally specialized hardware which offers better performance than generic x86 platforms, particularly with encryption services.
Transparent vs Routed
There are two deployment models with service appliances: transparent mode and routed mode. In transparent mode, the service appliance is deployed as a bump-in-the-wire and does not change any MAC information. With transparent mode, a failsafe mechanism needs to be implemented to prevent Layer 2 data plane loops.
On the other hand, routed deployments are not prone to Layer 2 loops because they follow IP routing semantics. Layer4-Layer7 appliances inserted in routed mode can participate in dynamic routing protocols. The benefit of implementing a dynamic routing protocol is that it allows for Route Health Injection (RHI), which influences the ingress routing path to the services appliance.
The following figure illustrates the one-arm design option.
Physical Connectivity
Layer4-Layer7 services have different connectivity and redundancy deployment models, as discussed below.
• No redundancy: one logical interface maps to one physical interface, resulting in a single network connection
• Redundancy at the NIC level (port-channel): one logical interface maps to multiple physical interfaces. These two interfaces are configured as a single port-channel connected to a single leaf switch
• Redundancy at the NIC and switch level (vPC): one logical interface maps to multiple physical interfaces. These two interfaces are configured as a single port-channel connected to two different leaf switches. The two different switches are implemented as a vPC pair.
Redundancy Model
Different redundancy models will have an impact on how the network will behave in case of a Layer4-Layer7 appliance outage:
• No redundancy: This mode is sometimes used for non-critical environments, and is typically deployed in conjunction with virtual Layer4-Layer7 appliances that leverage High Availability features of the hypervisor.
• Active/Standby: Two Layer4-Layer7 appliances are deployed, and one of them handles all traffic. When the active device fails, the standby device will become active. The network converges away from the failed appliance while the previous standby node becomes active. With the Active/Standby model, traffic flows are deterministic, and this simplifies the forwarding path through the network.
• Clustering (Active/Active): There are two different models of clustering, where all services appliances are serving the workload. While one model uses the approach of a local port-channel per services appliance, the second model represents the services cluster as a single port-channel.
Use Cases
In this design, the VXLAN Fabric provides a Layer 2-only service. All communication that requires crossing the Layer 2 demarcation must be sent to the firewall to be routed. For example:
vlan 1100
name WEB
vn-segment 30100
vlan 1101
name APPLICATION
vn-segment 30101
vlan 1102
name DATABASE
vn-segment 30102
For example, an ASA firewall with four physical ports grouped in two logical port-channels:
int po10.1100
vlan 1100
nameif WEB
security-level 100
ip address 192.168.110.1 255.255.255.0
int po10.1101
vlan 1101
nameif APPLICATION
security-level 100
ip address 192.168.111.1 255.255.255.0
int po10.1102
vlan 1102
nameif DATABASE
security-level 100
ip address 192.168.112.1 255.255.255.0
int po20
nameif OUTSIDE
security-level 50
ip address 192.168.100.254 255.255.255.0
The firewall becomes the single point for inter-subnet communication in the fabric; consequently, it is important to properly size the appliance for resiliency, performance, and scale reasons. When a failure occurs in an active/standby deployment, the newly-active firewall will notify the network of the change, normally sending GARP (gratuitous ARP) or RARP (reverse ARP) packets. These will trigger the re-learning of the MAC addresses on the ports connected to the standby firewall.
From a logical standpoint, the fabric is the default gateway for the servers. For example, the servers are deployed in the 192.168.100.0/24 subnet, and the VXLAN Fabric anycast gateway is configured as the server's default gateway of 192.168.100.1.
Example:
vlan 100
name UnProtected-SVI
vn-segment 30000
vlan 1100
name Protected-VLAN
vn-segment 31000
interface Vlan100
no shutdown
vrf member Tenant-1
no ip redirects
ip address 192.168.100.1/24 tag 21921
fabric forwarding mode anycast-gateway
In this configuration, VLAN 100 (unprotected) is the outside interface and VLAN 1100 (protected) is the inside interface.
The firewall configuration to stitch VLAN 100 to VLAN 1100 would be as follows:
firewall transparent
int po10.100
vlan 100
nameif sviVLAN
bridge-group 1
security-level 0
int po10.1100
vlan 1100
nameif serverVLAN
bridge-group 1
security-level 100
SVIs are defined on the VTEP for both INSIDE-VRF and OUTSIDE-VRF, and the VTEP will peer with the firewall on each of these VRFs to dynamically learn routing information to go from one VRF to the other.
Firewall Configuration:
int po10.3001
vlan 3001
nameif OUTSIDE
security-level 50
ip address 10.30.1.2 255.255.255.252
VTEP A Configuration
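The original configuration for VTEP A is not reproduced here; a minimal sketch consistent with the firewall sample above could look like the following, showing only the OUTSIDE-VRF side, with a hypothetical VLAN/VNI pair and hypothetical BGP AS numbers (65000 for the fabric, 65030 for the firewall):

vlan 3001
vn-segment 39001
interface Vlan3001
no shutdown
vrf member OUTSIDE-VRF
ip address 10.30.1.1/30
router bgp 65000
vrf OUTSIDE-VRF
neighbor 10.30.1.2 remote-as 65030
address-family ipv4 unicast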
Traffic from VTEP 1 will be encapsulated towards VTEP A, decapsulated, and sent to the firewall. The firewall enforces the policy and sends the traffic back to VTEP A on the INSIDE-VRF. VTEP A will encapsulate the traffic and send it to the destination VTEP 2, where traffic is decapsulated and sent to the endpoint.
Firewall Failover
When the active firewall fails and the standby firewall takes over, routes are withdrawn from services VTEP A. As the previous standby becomes active, routes are now advertised to the fabric through services VTEP B.
If it is not desirable to run a dynamic routing protocol on the firewall, there is a need for static routes pointing to the firewall as next hop. It is critical to ensure that only the VTEP serving the active firewall is advertising the static route.
The first way to accomplish this task is to track active firewall reachability by validating that it is locally learned via HMM (Host Mobility Manager). The second approach is to configure the static route at all the compute VTEPs instead of the services VTEPs. Both approaches are introduced to ensure that only the route towards the service VTEP with the active firewall is used.
The approach using HMM tracking ensures that if the active firewall is connected to VTEP A, only VTEP A will have and advertise the static route. VTEP A will track how the static route's next hop (the firewall IP) is learned. Only if the next hop is learned as an HMM route (directly connected) will VTEP A advertise the static route through redistribution. If the active firewall fails and the standby takes over, VTEP A starts to learn the next-hop IP through BGP, and VTEP B starts to know the firewall's IP address as the next hop through HMM. VTEP A will then withdraw the tracked routes, and VTEP B starts advertising its routes into the fabric.
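This HMM-based tracking can be sketched on a services VTEP roughly as shown below. The track number, VRF name, and firewall next-hop address are illustrative assumptions; the exact syntax should be verified against the NX-OS release in use.

```
! Track whether the firewall next hop is learned locally via HMM
track 10 ip route 10.30.1.1/32 reachability hmm
  vrf member INSIDE-VRF

! The static route is installed and redistributed only while the track is up
vrf context INSIDE-VRF
  ip route 0.0.0.0/0 10.30.1.1 track 10
```

With this in place, the VTEP connected to the standby firewall sees the next hop via BGP rather than HMM, so its tracked static route stays down and is not advertised.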
For example:
• Client traffic enters the interface presenting the virtual IP address (VIP)
• The ADC decides which real server to send the request to
• The ADC then translates the destination address, which was previously the VIP, to the IP address of the real server
• The request towards the real server exits the same interface the client request came in on
• The source IP address is translated via source NAT
• The real server will see the ADC IP address as the source IP
The VIP is advertised in the VXLAN Fabric as an EVPN Type-2 route. Alternatively, the ADC can be implemented with a dynamic routing protocol and advertise the VIP as an EVPN Type-5 route.
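As an illustration of the Type-5 option, if the ADC peers with the services VTEP over a routing protocol such as OSPF, the learned VIP prefix can be redistributed into the tenant BGP address family. The VRF, AS number, process tag, and policy names below are hypothetical examples, not a definitive configuration.

```
route-map ADC-VIP permit 10
  match ip address prefix-list VIP-PREFIXES

router bgp 65000
  vrf Tenant-A
    address-family ipv4 unicast
      ! Prefixes redistributed here are carried as EVPN Type-5 routes
      redistribute ospf ADC-OSPF route-map ADC-VIP
```

The prefix-list keeps the redistribution scoped to the VIP ranges rather than everything the ADC advertises.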
Traffic flow is as follows:
• Client traffic will be encapsulated by VTEP 1 towards services VTEP A
• VTEP A decapsulates and sends the traffic to the active ADC
• The ADC sends the traffic destined to the real server back to services VTEP A
• VTEP A encapsulates and sends the traffic to the destination VTEP 2
• Traffic gets decapsulated at VTEP 2 and sent to the real server
• The response back from the real server is sent back to the ADC, since the ADC performed source NAT
Layer4-Layer7 Services 111
Consideration needs to be given to the placement of the devices so traffic does not take excessive hops across the fabric when going between the firewall and the load balancer. It is common to have multiple firewalls and ADCs connected to a dedicated pair of switches acting as service nodes. The placement of the appliances in the VXLAN Fabric is thus consolidated under a service node pair. Traffic flow is as follows:
• Traffic will be VXLAN-encapsulated from the client VTEP 1 towards the services VTEP A.
• The service VTEP responsible for the active firewall decapsulates and sends the traffic to the active firewall.
• The firewall then sends the traffic towards the ADC's VIP address. This is done with the assumption that the firewall and the ADC are connected to the same service VTEP. If the firewall and ADC are on different VTEPs, traffic will be VXLAN-encapsulated towards the service VTEP hosting the ADC.
• The ADC then sends the traffic destined to the real server back to the services VTEP, which encapsulates and sends it to the destination VTEP 2.
• Traffic gets decapsulated at VTEP 2 and sent to the real server.
• The response back from the real server is sent back to the ADC as the ADC is using source NAT. With the usage of source NAT, the X-Forwarded-For HTTP header field is inserted to preserve client IP address visibility. Subsequently, the traffic will be inspected by the firewall on its way back to the client.
The diagram below shows a logical representation of a service chain.
The diagram below shows a physical representation of a VXLAN Fabric with a dedicated service VTEP pair. Firewalls and ADCs are commonly connected to the services VTEPs. This can be achieved with or without vPC (vPC shown in the diagram).
To avoid additional encapsulations and decapsulations, affinity can be created between the active firewall and the active ADC, and they can be placed on the same services VTEPs.
Multi-POD & Multi-Site Designs
Introduction
In an increasingly competitive, globally connected business environment, organizations are faced with enormous pressure to ensure continuous availability of critical business applications. With digital strategies driving innovative new business opportunities, these organizations are looking for IT infrastructures that offer the agility, performance, and availability required to support these new application infrastructures.
As a consequence, data center networks are being built as scalable, highly available network fabrics which are distributed across multiple data centers, whether separated within or across a metro area, or across the globe.
Fundamentals
A Point of Delivery (POD) is a network building block which can easily be replicated within a data center. The predictable and homogeneous characteristics of a POD provide self-containment and a pre-assigned scale and performance requirement (POD planning). The architecture of a POD should be modular to allow for it to be replicated and interconnected, keeping a homogeneous design.
In classic hierarchical network design, the POD is formed by the Access and Aggregation Layers, where the Aggregation Layer provided the Layer 2 demarcation. Layer 2 traffic is terminated and routed across the Core to reach other PODs or external networks. With the demarcation at the Aggregation Layer, a Layer 2 VLAN or an IP subnet
is localized within a single POD; therefore, Layer 2 communication between PODs is not possible. As a consequence, host mobility across PODs is difficult to implement.
The interconnection within a multi-POD site can be achieved in various ways. Spines can be interconnected back to back, an additional super-spine layer can be introduced, or PODs can be interconnected at designated leaf switches.
In contrast to a single-site deployment, a networking solution for multiple sites must also address the need to maintain a level of separation. Any event, whether planned or unplanned, impacting one site should not spread to any other site, as it would impact overall application availability.
Design criteria to be considered for such deployments include:
• Physical Connectivity: In many cases, given the constraints outlined above, the availability of connectivity services may be limited. As an example, dark fiber or wavelength services availability may be limited or cost-prohibitive over large distances, whereas a routed Layer 3 or MPLS service may be readily available at an achievable price point. The design must take into consideration the need to allow for multiple connection types, ranging from high-bandwidth dark fiber through to bandwidth-constrained, service provider-delivered Layer 3 services.
• Fault Isolation: When connecting multiple discrete network environments together, the risk of a failure event propagating between sites increases significantly unless controls are applied to restrict the control plane and data plane activity. Examples include selection and configuration of control plane protocols such as BGP, and the control or restriction of data plane activity such as ARP suppression/spoofing and storm control.
In subsequent chapters, the options for multi-POD and multi-site deployment are explored further, including back-to-back vPC, OTV, and PBB-EVPN for a comprehensive DCI solution, in order to maintain control plane and data plane isolation and at the same time provide workload mobility.
Multi-POD Design
Scalability
When designing for control plane scale for an inter-POD Fabric, platform OIF, multicast groups, and VTEPs need to be considered in addition to host MAC and MAC/IP. It is important to look at the hardware-verified scalability guidelines.
For example, in a simple multi-POD scenario, if the spine supports 256 OIFs, then subtract 2 OIFs for the uplink towards the L3 core, leaving 254 OIFs for southbound connectivity to the leafs in the vPC domains. This would give 254 leafs, or 127 vPC domains, to connect southbound if each leaf in the vPC domain has a single link to each spine in the POD.
Looking closer at the above example, both vPC VTEP switches independently send the IP PIM register to the Rendezvous Point for the multicast group of the VXLAN VNI. Both source the register packets from the anycast VTEP address, and each installs the corresponding (*, G) entry in its multicast routing table with the VTEP interface (NVE1) in the output interface (OIF) list.
In addition, consideration needs to be given to host MAC and IP scale per leaf. A leaf will learn all BGP routes across the multi-POD environment but will not program the hardware tables, the Forwarding Information Base and Routing Information Base (FIB/RIB), unless the leaf needs to know about them. If the leaf knows about the VRF and is importing its route-targets, it will program the RIB with the MAC/IP routes. In addition, the leaf only programs the FIB with the MAC addresses of the VNIs of the VRFs it has locally defined.
IP Gateway Localization
In networks without Distributed Anycast Gateway, the default gateway is made redundant through the use of a First Hop Redundancy Protocol (FHRP). When a network segment spans multiple physical locations, the same concept can force all traffic through a single VTEP. Alternatively, you can provide localization by having an active instance of the default gateway in each location. Using localization provides a more optimal forwarding path between subnets within the same location. If application workload mobility is required between locations, it is important to maintain the same default gateway IP and MAC address. With gateway localization, endpoints do not need to relearn this information at the new location.
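On NX-OS, the Distributed Anycast Gateway keeps the gateway IP and MAC identical on every participating VTEP, which is what lets endpoints move without relearning. A minimal sketch, with example VLAN, VRF, and address values:

```
! Same virtual gateway MAC configured fabric-wide on every VTEP
fabric forwarding anycast-gateway-mac 2020.0000.00aa

interface Vlan100
  vrf member Tenant-A
  ip address 192.168.10.1/24
  fabric forwarding mode anycast-gateway
  no shutdown
```

Because every VTEP answers for the same gateway IP/MAC, each location has a local active gateway instance.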
In the illustration above, the VTEP leafs in blue are the Distributed Anycast Gateway for Layer 2 VNI "Blue" while the VTEP leafs in green are the Distributed Anycast Gateway for Layer 2 VNI "Green".
Often, switch hardware platforms with more control plane capacity and higher bandwidth are chosen for the spine layer. Also, due to their centralized location in a POD, the spine nodes are often chosen as the control point for MP-BGP EVPN route distribution. For example, in an MP-iBGP Fabric, the spine nodes are often chosen to be the iBGP route reflectors. In this case, peering on the spine nodes between PODs can take advantage of the more scalable control plane and the complete set of EVPN routing information on the spine nodes.
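A sketch of the iBGP EVPN route-reflector role on a spine node follows; the AS number and neighbor address are examples only:

```
router bgp 65000
  neighbor 10.1.1.11 remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      route-reflector-client
```

Each leaf would peer only with the spines, which reflect EVPN routes to all other leafs in the POD.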
MP-iBGP vs MP-eBGP
MP-BGP EVPN distributes the Layer 2 and Layer 3 reachability information for the VXLAN overlay network. It supports both iBGP and eBGP topologies, which provides the design flexibility to run MP-BGP in a multi-POD environment. It is not within the scope of this book to document all the possible combinations of iBGP and/or eBGP designs in a multi-POD Fabric. The common practice designs will be discussed to illustrate the design principles.
The figure below describes a common multi-POD design in which each POD runs MP-iBGP EVPN between leafs and spines, whereas MP-eBGP EVPN is used to interconnect the PODs. The drawing does not indicate any physical topology for connecting multiple PODs together; rather, it depicts the peering topology. Conceptually, the Route Reflectors (RR) of different PODs are exchanging EVPN routes via MP-eBGP so that reachability information can be extended from one POD to another.
• By default, a router overwrites the next-hop in the route to itself when sending a route to its eBGP peers.
• If each AS generates EVPN route-targets (RT) automatically, they may end up having different RTs for the same L2VNI or L3VNI, as the auto-RT function often uses the BGP AS number as one of the elements to derive EVPN RTs. So additional caution needs to be applied when configuring the EVPN RT import and export policies to ensure that routes within the same VNI have the same import/export RTs on VTEPs in different PODs, so that the route distribution can be complete end-to-end.
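Both considerations can be addressed on the inter-POD eBGP EVPN session: preserve the originating VTEP as the BGP next hop, and pin explicit RTs instead of relying on auto-RT. The AS numbers, neighbor address, VRF name, and RT values below are illustrative assumptions:

```
route-map NH-UNCHANGED permit 10
  set ip next-hop unchanged

router bgp 65001
  neighbor 10.99.1.2 remote-as 65002
    address-family l2vpn evpn
      send-community extended
      route-map NH-UNCHANGED out

! Pin identical RTs for the L3VNI in both ASes instead of auto-RT
vrf context Tenant-A
  vni 50001
  address-family ipv4 unicast
    route-target import 999:50001 evpn
    route-target export 999:50001 evpn
```

The same explicit RT values would be configured on VTEPs in every POD so imports match across AS boundaries.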
Another design is to use a single BGP AS across all PODs so that the multi-POD Fabric runs EVPN MP-iBGP.
• iBGP by design preserves the BGP next-hop. Therefore, when an EVPN route is distributed within an iBGP topology, the originating VTEP address will be preserved in the BGP next-hop.
• iBGP does not change the EVPN Route-Target (RT) value while distributing the routes.
• The auto-RT function will generate the same EVPN RT for the same VNI across different PODs. This ensures that VTEPs in different PODs will have consistent import and export RT values for the same VNI.
(Diagram: MP-eBGP vs. MP-iBGP peering topologies)
Cabling
Ingress replication can have scale issues, as the switch needs to replicate BUM packets as many times as there are VTEPs that own the VNI needing to see that traffic. As an example, with 50 VTEPs that own the same VNI and require BUM traffic, replication needs to be performed 50 times. Replicated BUM transmissions consume a lot of bandwidth in the network. In contrast, IP multicast across a multi-POD environment is a much more scalable solution to handle BUM traffic, as the fabric natively provides the capabilities for the required replication. IP multicast reduces network load, improves performance, and increases scalability across multi-POD environments.
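The two BUM replication options are selected per VNI under the NVE interface. A minimal sketch, with example VNI and group values:

```
! Option 1: underlay IP multicast handles BUM replication
interface nve1
  source-interface loopback1
  host-reachability protocol bgp
  member vni 30001
    mcast-group 239.1.1.1

! Option 2: head-end (ingress) replication, no underlay multicast required
interface nve1
  member vni 30002
    ingress-replication protocol bgp
```

With the multicast option, a single copy of each BUM packet is sent into the underlay and the multicast tree performs the fan-out.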
More details and configuration examples can be found at http://www.cisco.com/c/en/us/support/docs/ip/ip-multicast/115011-anycast-pim.html
The underlay can be built with any routing protocol. BGP may not be the best choice as an underlay protocol, as it is a path vector routing protocol that does not take into account link speed or path cost, and in a multi-POD environment multiple paths with different link speeds might be used to interconnect the PODs. Driving simplicity in the routing design in the underlay will help to improve overall convergence in the overlay. Tuning IGP timers may help improve convergence time; however, there is no generic recommendation, and this must be qualified and validated for each deployment.
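As one example of such tuning, OSPF SPF and LSA throttle timers can be lowered on NX-OS. The values below are purely illustrative and, as stated above, must be validated for each deployment:

```
router ospf UNDERLAY
  ! start / hold / max-wait intervals in milliseconds (illustrative values)
  timers throttle spf 50 100 300
  timers throttle lsa 50 100 300
```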
Service Integration
In a multi-POD design, it is a recommended practice to have all the services infrastructure, such as firewalls or load balancers, connected to a separate services node POD. This helps with scalability and high availability for services across a multi-POD design.
Design Options
These design considerations include the following aspects.
• Determining the Inter-Site Border Connection Points: The Fabric border provides an edge function to allow for external connectivity in and out of the Fabric, and also provides an attachment point for the DCI services which deliver the required inter-site connectivity. Although the Fabric border for Layer 2 / Layer 3 External Connectivity and DCI services have similar characteristics, they may or may not be combined, depending on factors detailed in the External Connectivity chapter.
• DCI Service Delivery: An appropriate selection of DCI service will be a primary factor in the multi-site design, as each will have different properties, as explained further in the External Connectivity chapter.
• L3 services including L3VPN, VRF Lite, LISP or VXLAN
• L2 services including Ethernet over Dark Fibre/DWDM, OTV, PBB-EVPN, MPLS EVPN, VPLS or VXLAN
• Virtual Network Identifiers (VNI) - Layer 2 and Layer 3
• MAC Addresses
• IP Host Routes (IPv4/IPv6)
A continuously available, active/active, flexible environment provides several benefits to the business:
• Increased uptime
• Disaster avoidance
• Easier maintenance
• Flexible workload placement
• Extremely low RTO
It is important to remember that host reachability information is contained within a single site and extended using a DCI technology. The Layer 3 diagrams below demonstrate independent control planes in each site and highlight how to extend Layer 2 connectivity.
Layer 2 extension must be dual-homed for redundancy while prohibiting end-to-end Layer 2 loops that would lead to traffic storms, causing link overflows and saturating switch CPUs and virtual machine CPUs. This is why, in Data Center Interconnect deployments, one key complementary feature to Layer 2 extension is storm control.
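Storm control is applied per interface, with a threshold expressed as a percentage of interface bandwidth. A sketch on a DCI-facing port; the interface and the 1% threshold are example values only:

```
interface Ethernet1/1
  ! Suppress broadcast/multicast exceeding 1% of link bandwidth (example value)
  storm-control broadcast level 1.00
  storm-control multicast level 1.00
```

Thresholds should be sized against the legitimate BUM load expected on the extended segments.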
New functionalities are being added to the VXLAN control plane which would make it a very viable DCI solution in the future. This is further discussed in the introduction chapter.
As depicted in the diagram below, logical back-to-back vPC connections are used between the VXLAN border leaf nodes and the local pair of VXLAN DCI devices to interconnect multiple sites.
MPLS-Based Approach
This approach uses a single device to interconnect multi-site VXLAN fabrics and achieve segmentation using MPLS L3VPNs. The single device, called a Border PE, can be used to terminate MPLS and VXLAN routing on the same device. The External Connectivity chapter provides additional details about using an MPLS handoff to the VXLAN fabric. The same principles can be used to provide multi-site interconnectivity as well.
LISP-Based Approach
The third approach for interconnecting multi-site VXLAN fabrics is LISP. It offers the same segmentation benefits as MPLS and can be used as an alternative solution. The External Connectivity chapter provides additional details about using a LISP handoff to the VXLAN fabric. The same principles can be used to provide multi-site interconnectivity as well.
Operations & Management
Introduction
For the last 20 years, networks have been managed as independent elements leveraging purpose-built protocols and interfaces, such as Simple Network Management Protocol (SNMP), Command-Line Interface (CLI), and NETCONF, to name a few. These protocols have served network administrators well, and have mostly fulfilled their objectives for Fault, Configuration, Accounting, Performance and Security management tasks (also known as the FCAPS framework). However, to meet the new scale requirements, the network has to be viewed and managed as a system to enable faster and more consistent delivery of services.
Several years ago, the server industry, driven by scale requirements, went through the same transition. Server teams were faced with the need to manage large pools of resources, which drove the need for more automated configuration management tools. Today, server management teams leverage popular configuration management tools such as Puppet, Chef or Ansible. These tools are changing organizational processes, which supports agile development and DevOps initiatives.
Management tasks
Multiple traditional frameworks exist to define what the operations of IT infrastructure entail, such as IT Infrastructure Library (ITIL) or FCAPS. Some organizations have started to incorporate IT operational practices from other areas of the industry, such as application development practices taken from DevOps (Development + Operations), as is covered in the next section.
• Day 0: install
• Day 1: configure/optimize
• Day N: upgrade and monitoring
Day 0
Traditionally, Day 0 activities have included installing the device into a rack, powering it up, some basic bootstrap configuration, and, optionally, updating the firmware. This is how many organizations have dealt with Day 0 tasks until now. As a consequence, it is not uncommon to see a network with multiple versions of software deployed and differing standards of configuration. In order to reduce the inconsistencies in the network when the equipment is deployed, automation of the initial deployment is a crucial first step. This provides a solid foundation for successful network operation.
Day 1
After the base configuration and common software releases have been deployed across the Fabric, the next step is to provision the overlay and device-specific configurations. These configuration steps include items such as MP-BGP, Multicast, VNIs, VRFs, VLANs, Anycast Gateway and core capabilities.
For this phase, a few options exist to help automate configuration deployment. There are tools such as Cisco Prime Data Center Network Manager (DCNM), Cisco Nexus Fabric Manager (NFM), and Python scripts or scripting languages that can configure the devices directly or via an API. Another option is configuration management tools (CMT) such as Puppet, Chef, and Ansible that deliver configuration standardization. Instead of just pushing configuration commands to the switches, a CMT checks the running configuration and updates changes to the configuration. This allows the creation of manifests, recipes, or playbooks with the desired end state of the specific elements in the network. For example, the spine switches would have a very different configuration than the leaf switches, but the leaf switch configurations would likely be very similar to one another across the fabric.
As a result of virtualization and cloud provisioning, another item to consider is VMM integration. Whether or not the configuration of a switch should be dynamically modified based on a trigger event is discussed at length in the Software Overlay chapter.
Day N
Once the network is configured, running, and optimized, changes and software upgrades to the Fabric will be needed. CMT solutions can automate software upgrades and configuration changes to multiple devices.
Other important Day-N tasks are configuration backups, revision control, and the ability to roll back to a previous snapshot. This traditional configuration management can be done with tools such as DCNM, NFM, the aforementioned CMT solutions, or with open-source tools such as RANCID.
Monitoring the network and reacting to events is a critical part of Day N operations. Traditional network management tools used SNMP to monitor device parameters such as interface utilization or available memory. With NX-OS programmability functions, using new tools such as Carbon/Graphite, Zenoss, or Splunk enables access to richer information. Linux-based monitoring agents can be installed natively on the switch. Examples such as OpenTSDB (http://opentsdb.net/) provide a collector agent which sends information to a central repository for consolidation.
Visibility is another important Day-N function. Traditional visibility tools are still available with a VXLAN-based solution, including network TAPs, switch port analyzer (SPAN), NetFlow and/or sFlow, where applicable. Nexus Data Broker (NDB) clients can be leveraged to consolidate SPAN from leaf switches into a common switch aggregation point to build scalable network TAP and SPAN aggregation infrastructures.
An additional tool within VXLAN OAM is the "tissa"-based tracepath, following the "draft-tissa-nvo3-oam-fm" IETF draft. This tool not only gets the exact path plotting from an underlay perspective, but it also derives the specific VTEP where the destination is actually attached. Furthermore, with additional input parameters it is possible to identify the egress VTEP and the underlay path from ingress to egress VTEP, including all intermediate hops, as well as all involved interfaces. In addition, the load and error counters for those interfaces can be provided as well.
The sample output below shows a "tissa"-based overlay pathtrace. The functionality exposes the physical path (underlay) from leaf via spine to border, while the request was initiated in the VXLAN overlay.

errors:0
bandwidth:42949672970000000
Available Tools
• Traditional (CLI, scripting)
• Off-the-shelf tools
• DevOps (Puppet, Chef, Ansible)
Traditional Tools
1 VXLAN configuration is command-intensive. The creation of new tenants or segments requires multiple lines of configuration, potentially across a large number of devices.
2 VXLAN technology depends on the presence of a considerable number of underlying protocols, making it more burdensome to deploy when compared to other technologies like Spanning Tree or FabricPath.
Python Scripting
Python scripting has been used by network operators for years; however, with NX-OS running on the switches, scripting can be taken to a whole new dimension. APIs and Software Development Kits (SDKs) are available for NX-OS. An example of an SDK for NX-OS is the nxtoolkit, which is freely available for download: https://github.com/datacenter/nxtoolkit.
An example Python script for VXLAN is located at the following: https://github.com/erjosito/evpn_shell. This script is essentially an external CLI that can be used to create, delete, and view tenants, VNIs, and relevant configuration elements across all VTEPs in a VXLAN EVPN Fabric. This script makes use of infrastructure variables such as management IP addresses, credentials, etc., and with a single command deploys all the required VXLAN EVPN configuration to create a tenant or a network inside of a tenant.
Data formats such as XML and JSON are used to structure commands and outputs, and they eliminate the need to parse human-readable strings formatted in paragraphs and tables. String parsing is commonly used in scripting but has version dependencies. That puts a burden on lifecycle management for these automation scripts, which has kept many organizations from using them. The APIs available in NX-OS are an improvement over traditional scripting methods, and will improve automation processes.
Off-the-Shelf Tools
In the context of a VXLAN-based solution, DCNM can be utilized for the following purposes:
1 Firstly, to provide for the Fabric underlay configuration. DCNM has built-in Power On Auto Provisioning (POAP) support to deliver zero-touch auto-provisioning of the network devices that build the VXLAN Fabric.
3 DCNM supports monitoring of the performance and utilization of the network switches, as well as fault management and syslog aggregation.
4 Managing the software running on the switches and performing software upgrades and downgrades.
This provisioning can be performed in a top-down (push) fashion, where DCNM tracks deployment events and simply pushes the required CLI config for the access port onto the switch.
Alternatively, a more dynamic mechanism is possible, where the leaf switches "pull" the configuration from the LDAP database of DCNM based on a specific event, such as a local attachment of an endpoint. A typical example of this more dynamic mechanism is the support on the VXLAN leaf nodes of a functionality called Virtual Machine Tracker Auto-Config (VM Tracker), which automatically provisions a specific tenant configuration. The commands required for provisioning the tenant are stored in the form of a configuration profile. A configuration profile is a set of commands that will be required
for provisioning a particular tenant, except the required parameters are written as variables instead of actual values in a command.
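A configuration profile might look like the sketch below, where the `$`-prefixed parameters are filled in at instantiation time. The profile name, parameter names, and overall structure are illustrative assumptions; consult the platform documentation for the exact profile syntax:

```
configure profile TENANT-NETWORK-PROFILE
  vlan $vlanId
    vn-segment $segmentId
  interface vlan $vlanId
    vrf member $vrfName
    ip address $gatewayIp/$maskLen tag 12345
    fabric forwarding mode anycast-gateway
    no shutdown
end
```

When VM Tracker detects a local endpoint attachment, the profile is instantiated with the values for that tenant network.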
Specific to VXLAN management, DCNM provides the following capabilities:
• DCNM provides integrated Power-On Auto Provisioning (POAP) to boot new switches for a greenfield Fabric or add new switches to an existing VXLAN Fabric. DCNM manages this POAP workflow so that an admin simply assigns a device to a preconfigured template.
• In addition, the POAP configuration Diff/Sync feature lets the admin know if a device's configuration does not match its POAP template and then lets the user resolve these differences.
• DCNM also presents topology views showing physical and overlay networks on the same page, helping network admins quickly identify the extent of virtual overlay networks on a Fabric.
• DCNM also presents smart topology views showing virtual port channels (vPCs) and virtual device contexts. In topology view, DCNM shows VXLAN Tunnel Endpoint status as well as VXLAN search. DCNM shows VXLAN Network Identifier (VNI) status and other VXLAN information on a per-switch basis.
• Built-in search allows admins to search by VM Name, VM IP Address, VM MAC Address, VNI, or Switch ID.
More information on Cisco Data Center Network Manager can be found at: http://www.cisco.com/go/dcnm.
Ignite
Day-0 tasks are extremely important in order to have a consistent Fabric. Ignite is a simple hands-off approach to bootstrap a device with the appropriate code level and initial device setup. To achieve that, Ignite leverages the POAP capabilities of Cisco Nexus switches.
Ignite is an open-source tool that can be downloaded at no cost from GitHub: https://github.com/datacenter/ignite.
Cisco NFM has a fabric-wide focus and allows for the auto-provisioning and management of the whole network. NFM provides point-and-click methods for performing fabric management tasks such as adding, removing, and configuring network components such as switch pools, switches, switch interfaces, VRFs, port channels and broadcast domains.
• Creation: NFM allows for a zero-touch boot-up of the Fabric, performing some Day-0 operations like cabling topology verification and automatic VXLAN underlay provisioning
• Connection: NFM fully manages the entire VXLAN configuration, removing the associated operational hurdles. This essentially implies that a user does not necessarily need to know that VXLAN with MP-BGP EVPN is deployed as the key functionality to enable endpoint communication
• Expansion: there are more Day-N types of operations, such as zero-touch addition of switches to the Fabric and auto-upgrade of existing fabric devices
• Fault Management: NFM offers a built-in fault management system
• Reporting: Cisco NFM communicates with the switches deployed in the fabric by leveraging software agents embedded in the switches
More information regarding Cisco Nexus Fabric Manager is available at: http://www.cisco.com/go/nexusfabricmanager.
1 Support for a mix of software and hardware VTEPs
2 Integration with the hypervisor layer
3 Support of a multivendor Fabric
4 Overlay and underlay operated by different teams
• Virtual Topology Controller: this is a management platform that offers ways to deploy tenants and networks over a GUI or a northbound RESTful API. It integrates with VMware vCenter and with OpenStack/KVM, so customers can manage the overlay directly from the VMM. The Virtual Topology Controller will roll out the required changes using southbound APIs such as NX-API or NETCONF/YANG.
Cisco VTS supports flood-and-learn as well as MP-BGP EVPN control planes. It includes functionality such as ARP suppression capabilities, symmetric IRB, VTEP authentication, and fast convergence upon network failures and endpoint mobility.
More information regarding Cisco Virtual Topology System is available at: https://www.cisco.com/go/vts.
DevOps Tools
Configuration Management Tools (CMT) are a new generation of intent-based tools that have gained great popularity, mainly in the Linux community. They can be classified into two categories: agent-based and agentless tools.
• In agent-based configuration management, changes are made centrally on a master node and are pulled down and executed by the agent: the device agents periodically connect with the master for configuration information, and only the changes that are needed are pulled down and executed.
• Agentless configuration management is push-based instead of pull-based. Configuration management scripts are run on the master, and the master connects to the managed devices and executes the task over an API.
Puppet and Chef are examples of agent-based configuration management tools. With these agent-based systems, the user leverages a custom declarative language to describe the configuration that should be applied to the remote systems. Both of these tools have similar functionality which is continually evolving. Puppet recently released modules to configure, provision, and manage Cisco VXLAN-based fabrics plus several standard top-of-rack switch features.
Puppet uses modules that include descriptions of which features are supported, and manifests that are the actual descriptions of how those devices should be configured. Manifests can be static, dynamically incorporate conditions, or even use Ruby logic. Some conditions will depend on which system is being managed, and a wealth of that information is gathered by Puppet's companion tool "facter". The Puppet agent pulls the manifest from the Puppet server (Puppet Master) and implements it.
The Chef architecture is very similar, but instead of manifests the jargon is "recipes": that is where the expected state of the managed devices is documented. Recipes can be grouped together in Cookbooks for easier management. As already described, Chef runs in a client/server architecture, but it has an additional standalone mode called "Chef solo".
As with Puppet, some examples of Chef recipes for Cisco NX-OS are available on GitHub at https://github.com/cisco/cisco-network-chef-cookbook.
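A recipe follows the same declarative pattern; the fragment below is a sketch assuming the cisco_vxlan_vtep resource from the cookbook cited above, with placeholder values.

```ruby
# Illustrative Chef recipe; the cisco_vxlan_vtep resource is assumed to
# come from the cisco-network-chef-cookbook, and the values are placeholders.
cisco_vxlan_vtep 'nve1' do
  action            :create
  host_reachability 'evpn'
  source_interface  'loopback1'
  shutdown          false
end
```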
Ansible is an example of an agentless configuration management system that manages nodes via SSH and has the ability to execute the scripts locally on the managed node, or from the local server by connecting via the Cisco NX-API. Ansible uses the concept of Modules, Tasks, Plays, and Playbooks to manage the configuration on the remote devices.
• Modules: units of work that Ansible ships out to remote machines. Some modules come pre-installed; custom modules can be manually installed as well
• Tasks: combinations of modules with arguments and descriptive names
• Plays: mappings of hosts or groups to their tasks
• Playbooks: collections of Plays by which Ansible orchestrates, configures, administers, or deploys systems. Playbooks are written in YAML
Summary Table
The following table illustrates how the tools discussed above contribute to the Day-0, Day-1, or Day-N operations of network fabrics:
Tool      Day 0   Day 1   Day N
CLI                 X       X
Python              X       X
Ignite      X
Ansible             X       X
Puppet              X       X
Chef                X       X
Acronyms
ACI: Application Centric Infrastructure
ADC: Application Delivery Controllers
API: Application Program Interface
ARP: Address Resolution Protocol
BGP: Border Gateway Protocol
CLI: Command-Line Interface
DAG: Distributed Anycast Gateway
EVPN: Ethernet Virtual Private Network
GENEVE: Generic Network Virtualization Encapsulation
IDS: Intrusion Detection System
IEEE: Institute of Electrical and Electronics Engineers
IGP: Interior Gateway Protocol
IPS: Intrusion Prevention System
IRB: Integrated Routing and Bridging
LISP: Locator/ID Separation Protocol
MP-BGP: Multi-Protocol BGP
MPLS: Multi-Protocol Label Switching
MSDP: Multicast Source Discovery Protocol
MTU: Maximum Transmission Unit
NAT: Network Address Translation
NLRI: Network Layer Reachability Information
NSH: Network Service Header
NVO: Network Virtualization Overlay
OAM: Operations, Administration and Management
OTV: Overlay Transport Virtualization
PIM: Protocol-Independent Multicast
RP: Rendezvous Point
SDK: Software Development Kit
SDN: Software Defined Networking
SNMP: Simple Network Management Protocol
VMM: Virtual Machine Manager
VNI: Virtual Network Instance
vPC: Virtual Port-Channel
VRF: Virtual Routing and Forwarding
VTC: Virtual Topology Controller
VTEP: Virtual Tunnel Endpoint
VTF: Virtual Topology Forwarder
VTS: Virtual Topology System