Vxlan Evpn

Preface 1
. . . . . Introduction
.................................................................... 2
. . . . . Authors
.................................................................... 4
. . . . . Acknowledgements
.................................................................... 5
. . . . . Organization
. . . . . . . . . . . . .of
. . this
. . . . book
................................................. 6
. . . . . Intended
. . . . . . . . . Audience
........................................................... 9
. . . . . Book
. . . . . Writing
. . . . . . . .Methodology
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Why a New Approach 11
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
. . . . . Why
. . . . . VXLAN
. . . . . . .Overlay
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
. . . . . Why
. . . . . a. .Control
. . . . . . . Plane
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
. . . . . Looking
. . . . . . . . Ahead
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Fundamental Concepts 20
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
. . . . . What
. . . . . .is. .VXLAN?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
. . . . . How
. . . . . Does
. . . . . VXLAN
. . . . . . . Work?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
. . . . . Networking
. . . . . . . . . . . .in. .a. VXLAN
. . . . . . . Fabric
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Software Overlays 33
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
. . . . . Host-Based
. . . . . . . . . . . .Overlay
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Single-POD VXLAN Design 42
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
. . . . . Underlay
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
. . . . . Overlay
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
. . . . . Host
. . . . . Connectivity
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
External Connectivity for VXLAN Fabric 65
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
. . . . . Layer
. . . . . .3. .Connectivity
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
. . . . . Layer
. . . . . .2. .Connectivity
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
. . . . . Integration
. . . . . . . . . . . and
. . . . Migration
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Layer4-Layer7 Services 91
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
. . . . . Use
. . . . Cases
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Multi-POD & Multi-Site Designs 114
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
. . . . . Fundamentals
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
. . . . . Multi-POD
. . . . . . . . . . .Design
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .122
. . . . . Design
. . . . . . . Options
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .135
. . . . . Building
. . . . . . . . the
. . . .Multi-Site
. . . . . . . . . .Inter-Connectivity
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .138
Operations & Management 145
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .146
. . . . . Management
. . . . . . . . . . . . .tasks
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .148
. . . . . Available
. . . . . . . . .Tools
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .153
Acronyms 162
. . . . . Acronyms
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .163
Preface
Preface 2
Introduction
VXLAN EVPN
For many years now, VLANs have been the de-facto method for pro vid
ing net
work seg
-
men ta
tion in data center net
works. Standardized as IEEE 802.1Q, VLANs lever
age tra
-
di
tional loop preven
tion techniques such as Spanning Tree Pro
to
col which not only im
-
poses restric
tions on net
work design and resiliency, but it also re
sults in an in
ef
fi
cient
use of avail
able network links due to the block ing of re
dundant paths, re quired to en-
sure a loop free network topology.
VLANs also use a 12-bit VLAN iden ti

fier to address Layer 2 seg
ments, al
lowing for the
ad
dress
ing of up to a practi
cal limit of ~4,000 VLANs. In mod ern data center net
work
de
ploy
ments, VLANs have be come a lim it
ing fac
tor to IT departments and cloud
providers as they build in
creas
ingly large and com plex, multi-ten
ant data centers.
Mod ern data centers require an evo lution from the re straints of tradi
tional Layer 2 net -
works. Cisco, in partner ship with other lead ing vendors, proposed the Vir tual Ex ten
si
-
ble LAN (VXLAN) stan dard to the IETF as a so lu
tion to the data cen ter network chal -
lenges posed by tra di
tional VLAN tech nol
ogy and the Span ning Tree Pro tocol. At its
core, VXLAN pro vides ben efits of elastic work load placement, higher scal a
bil
ity of Layer
2 segmen ta
tion, and con nec tiv
ity ex ten
sion across the Layer 3 net work bound ary.
How ever, without an intel
li
gent con trol plane, VXLAN has its lim its due to its flood and
learn behavior.
Preface 3
Multi-Protocol Border Gate way Pro to

col (MP-BGP) in troduced new Net work Layer
Reacha
bil
ity In
for
ma tion (NLRI) to carry both Layer 2 MAC and Layer 3 IP in for
mation
at the same time. By hav ing the combined set of MAC and IP in for
ma tion avail
able for
for
warding deci
sions, opti
mized routing and switch
ing within a net work be comes fea si
-
ble and the need for flood and learn be hav
ior which lim
its its abil
ity to scale. The ex
ten -
sion that allows BGP to trans port Layer 2 MAC and Layer 3 IP in for
mation is called
EVPN – Eth ernet Vir
tual Private Network.
In sum
mary, the ad
van
tages pro
vided by a VXLAN EVPN so
lu
tion are as follows:
• Stan
dards-based Over
lay (VXLAN) with stan
dards-based con
trol plane (BGP)
• Layer 2 MAC and Layer 3 IP in
for
ma
tion dis
tri
b
u
tion by con
trol plane (BGP)
• For
ward
ing de
ci
sion based on scal
able con
trol plane (min
i
mizes flooding)
• In
te
grated Rout
ing/Bridg
ing (IRB) for Op
ti
mized For
ward
ing in the Overlay
• Lever
ages Layer 3 ECMP – all links for
ward
ing – in the underlay
• Sig
nif
i
cantly larger name
space in the over
lay (16M segments)
• In
te
gra
tion of phys
i
cal and vir
tual net
works with hy
brid overlays
• Fa
cil
i
ta
tion of Soft
ware-De
fined-Net
work
ing (SDN)
This book ex plores VXLAN EVPN, be gin

ning with the in troductory stages, gaining an
un
derstand ing of terms and con cepts and evolv ing through de ployments within a single
data center to mul ti
ple data centers. The book also ad dresses de sign and in
tegra
tion of
L4-L7 net work ser vices, co-exis
tence with brown field en vi
ron ments, and the tools
needed to build, op er
ate, and maintain a VXLAN EVPN Fab ric. At the con
clusion of this
book, readers will have a solid foun dation of VXLAN EVPN and a com pre
hension of real-
world use cases that can be im me di
ately uti
lized to assist in develop
ment of a plan to
suc
cessfully tran
sition to a next genera
tion data center Fabric.
VXLAN BGP EVPN fea tures and function

al
ity dis
cussed within are avail
able on the fol
-
low
ing Cisco Nexus Se
ries Switches:
• Cisco Nexus 9000 Se

ries start
ing NX-OS 7.0
• Cisco Nexus 7000 Se
ries and Nexus 5600 start
ing NX-OS 7.3
Preface 4
Authors
This book rep

resents a col
lab
o
ra
tive ef
fort be
tween Tech ni
cal Mar
keting and Sales En
-
gi
neers dur
ing a week-long inten
sive session at Cisco Headquar
ters in San Jose, CA.
• Bren
den Bu
resh - Sys
tems Engineering
• Dan Eline - Sys
tems Engineering
• David Jansen - Sys
tems Engineering
• Jason Gmit
ter - Sys
tems Engineering
• Jeff Os
ter
miller - Sys
tems Engineering
• Jose Moreno - Sys
tems Engineering
• Kenny Lei - Tech
ni
cal Marketing
• Lil
ian Quan - Tech
ni
cal Marketing
• Lukas Krat
tiger - Tech
ni
cal Marketing
• Max Ardica - Tech
ni
cal Marketing
• Rahul Para
meswaran - Tech
ni
cal Marketing
• Rob Tap
pen
den - Sys
tems Engineering
• Satish Kon
dalam - Tech
ni
cal Marketing
Preface 5
Acknowledgements
A spe cial thanks to Cisco’s Insieme and EISG BU Ex ec

u
tives, Tech
ni
cal Mar
ket
ing and
Engi
neer ing teams, who sup ported the re
al
iza
tion of this book. Thanks to Carl Sol
der,
James Christo pher, Joe On isick, Matt Smorto, Victor Moreno and Yousuf Khan for sup -
porting this effort. Thanks to Cisco Sales Leadership for sup
port
ing the group of in
di
-
vid
ual con
trib
u
tors who have ded
i
cated their time in au
thor
ing this book.
We would also like to thank Cynthia Broder

ick for her ex
cep
tional re
source orga
ni
za
-
tion and sup
port through out our journey, and Shilpa Grandhi for mak ing sure that
every
thing worked smoothly and that all thir
teen writ
ers made it to the end.
We are also gen

uinely ap
pre
cia
tive to our Book Sprint (www.
booksprints.
net) team:
• Adam Hyde (Founder)

• Hen
rik van Leeuwen (Illustrator)
• Juan Car
los Gutiérrez Bar
quero (Tech
ni
cal Support)
• Julien Taquet (Book Producer)
• Laia Ros (Facilitator)
• Raewyn Whyte (Proof Reader)
Laia and the team cre

ated an en abling en
vi
ron
ment that al
lowed us to exer
cise our col
-
lab
o
ra
tive and tech
ni
cal skills to pro
duce this tech
ni
cal pub
li
ca
tion to meet a grow ing
demand.
Preface 6
Organization of this book
Readers can read this book se

quen tially, or go di
rectly to in
dividual chapters. Gener
ally,
chapters should be self-contained, but in order not to du pli
cate infor
ma tion, point
ers
to other parts of the book may exist. Where ap plic
a
ble, hyper links will take the reader
to In
ter
net pages that pro
vide addi
tional levels of details.
Why a New Approach

The mo ti
va
tion for the de
vel
opment of this new tech nol
ogy is ex
plored, as well as the
ben
efits orga
ni
za
tions can extract from it. A brief overview explains how the data and
con
trol planes of VXLAN EVPN con tribute to solv
ing busi
ness challenges that many or-
ga
ni
zations are con fronted with.
Fundamental Concepts
This chap ter shifts the focus from the "Why" to the "What". Es
sential con cepts for un
-
derstand
ing the tech nol
ogy are laid out, to set the nec
es
sary founda tion for under
-
standing the rest of the book. The ba sics of VXLAN tech
nol
ogy are articu
lated, as well
as the funda
men tals of net
working in a VXLAN Fabric.
Software Overlays
The inter
sec
tion of vir
tual and phys i
cal net
working is dis
cussed in order to help the
reader gain the required perspective to decide how to best imple
ment VXLAN tech nol
-
ogy to support these vir
tu
al
ized environments.
Single-POD VXLAN Design

Preface 7
Single-POD VXLAN Design

A deep dive into the inner work ings of the VXLAN pro to
col, in
clud
ing best prac tices,
design recommen da
tions and lessons learned about both the un der
lay and over lay el
e
-
ments of a VXLAN Fab ric with MP-BGP EVPN. Even though this pub li
ca
tion should not
be con sid
ered as a con fig
u
ra
tion guide, com mand examples are in
cluded so the reader
in
terested in the ac
tual deployment can un der
stand the re
quired con fig
u
ra
tion for each
of the indi
vid
ual compo nents of the technology.
External Connectivity for VXLAN Fabrics

After a VXLAN Fab ric is up and run
ning, the next step is connecting it to the rest of the
world. This chap ter discusses the de
tails of con
necting the VXLAN Fab ric over Layer 2
and Layer 3 with non-VXLAN net works. These techniques are used in the last sec tion of
the chapter to demon strate a pro
ce
dure for migrat
ing a brownfield legacy net work into
a VXLAN Fab ric.
Layer4-Layer7 Services
Ether
net routers and switches are not the only el ements pro viding net work ser
vices in
a data center. Layer 4-Layer 7 de vices like fire
walls or appli
ca
tion de liv
ery con
trollers
are often indispensable for se
cure and effi
cient ap
pli
cation deliv
ery. This chapter ad-
dresses how to con nect these net
work ap pli
ances to the VXLAN Fab ric so the data cen-
ter net
work of fers the best per
for
mance and avail abil
ity end-to-end.
Multi-POD and Multi-Site Designs

Most or gani
za
tions today have busi ness con ti
nuity re
quirements that de ter
mine
how net work in fra
struc
ture is de ployed in mul ti
ple ge
o
graph i
cal lo
ca
tions. Whether
net
works are de ployed across two rooms in the same build ing or across sites thou
sands
of miles apart, this chapter il
lus
trates how to resolve the dis
tributed network prob
lem,
achieving at the same time work load mobil
ity and fault containment.
Operations and Management

Preface 8
Operations and Management

Ef
fi
cient man
age
ment prac
tices can op
ti
mize the way com
puter net
works are mon
i
-
tored and deployed, and VXLAN Fab rics are no ex
cep
tion to this. This chap
ter de
scribes
how to use both tradi
tional and mod ern network tech
niques to man age a VXLAN Fab -
ric. Off-the-shelf net
work man agement soft ware will be discussed as well as open
source approaches or DevOps-inspired tools such as Pup
pet, Chef and Ansible.
Preface 9
Intended Audience
The intended au di

ence for this book is network profes
sionals with a gen eral need to
un
derstand how to de ploy VXLAN net works in their or
gani
zations to unleash the full
po
ten
tial of mod ern networking. While in
ter
ested net
work ad min is
tra
tors will reap the
most ben e
fits from this content, the in
for
mation in
cluded within this book may be of
use to every IT pro fes
sional in
ter
ested in net
working technolo
gies. Ele
ments in this
book ex plore how VXLAN and EVPN solve net work challenges that have daunted the in -
dustry for years, as well as how to de ploy con
structs that are typi
cally seen in tra
di
-
tional networks with this new technology.
Preface 10
Book Writing Methodology
How many en gi

neers do you need to write a book? Thir teen! Thirteen highly-skilled
pro
fes
sion
als got to
gether in Build
ing 31 in Cisco headquarters in San Jose, Cal
i
fornia.
Thir
teen In, One Out: Thirteen indi
vid
u
ally-se
lected highly-skilled profes
sion
als from
di
verse backgrounds accepted the challenge to duel thoughts over the course of five
days. Fig ur
ing out how to har ness the brain power and col labo
rate ef fectively at first
seemed to be nearly im pos
si
ble, how ever, op posites at
tracted and the team per sisted
through the hur dles. The Book Sprints (www. booksprints.net) method ol
ogy cap tured
each of our strengths, fos tered a team-ori ented en vi
ronment, and ac celer
ated the
overall time to com pletion. The as sem bled group lever aged their near two hun dred
years of ex pe
ri
ence and a thou sand hours of dili gent au thorship which re sulted in this
publi
cation. Repre
senting four con ti
nents and seven na tional
i
ties, after five long days,
one book was pro duced. Fu eled by Chi nese, In dian, Japan ese, Mex i
can and Ital ian food,
but first and fore most by Amer i
can cof fee, together with their fa cil
i
ta
tor from Book
Sprints, the writers poured their ex peri
ence and knowl edge into this publication.
Why a New Approach
Introduction
IT is evolv
ing to
ward a cloud con sumption model. This tran si
tion af
fects the way appli
-
cations are being architected and im
ple
mented, driving an evo lu
tion in data center in
-
frastruc
ture design to meet these chang ing re
quirements. As the foun da
tion of the
mod ern data center, the net
work must also take part in this evolu
tion while also meet -
ing the in
creas
ing de
mands of server virtualization and new mi croservices-based ar
chi
-
tec
tures. This demands a new par
a
digm that must de liver on the fol
lowing areas:
• Flex
i
bil
ity to allow work
load mo
bil
ity across any floor tile in any site
• Re
siliency to main
tain ser
vice lev
els even in fail
ure con
di
tions (bet
ter fault
isolation)
• Multi-ten
ancy ca
pa
bil
i
ties and bet
ter work
load segmentation
• Per
for
mance to pro
vide for ad
e
quate band
width and pre
dictable la
tency, in
de
pen
-
dent of scale for de
mand
ing workloads
• Scal
a
bil
ity from small en
vi
ron
ments to cloud scale while main
tain
ing the above
characteristics
As a re
sult, mod ern data center networks are evolv ing from tradi
tional hi
er
archi
cal de
-
signs to hori
zontally-ori
ented spine-leaf architec
tures with hosts and ser vices dis
trib
-
uted through out the network. These net works are ca pable of supporting the in
creas-
ingly common east-west traf fic flows ex
peri
enced in mod ern appli
ca
tions. In ad
di
tion,
there are clus ter
ing technolo gies and vir
tual
iza
tion techniques that re quire Layer 2
adjacency.
Evolv ing user demands and ap pli

ca
tion require ments sug gest a dif
fer
ent approach that
is sim ple, and more agile. Ease of provi
sioning and speed are now crit i
cal perfor
mance
met rics for data cen ter network in fra
structure that sup ports phys i
cal, virtual, and
cloud en vi
ron
ments - without com promising scala
bil
ity or se
cu
rity. These are the main
drivers for the in
dus
try to look at Software De fined Net work (SDN) solutions.
Cisco Ap
pli
cation Cen
tric In
fra
struc
ture (ACI) is an in
nov
a
tive data center ar
chi
tec
ture
that sim
pli
fies, op
ti
mizes and ac cel
er
ates the en tire appli
ca
tion life
cy
cle through a
common pol icy manage
ment frame work. ACI pro vides a turnkey so lu
tion to build and
oper
ate an auto
mated cloud in fra
structure. An alternative option is a VXLAN Fab ric
with BGP EVPN con trol plane that provides a scal
able, flexi
ble and man age
able so
lu
tion
to support growing de
mands of cloud environments.
This chapter in

tro
duces the con
cepts of VXLAN EVPN and the prob
lem it has been de
-
signed to solve.
Why VXLAN Overlay
Network over lays are a technique used in state-of-the-art data centers to cre
ate a flex
-
i
ble in
fra
structure over an in her
ently sta
tic net
work by vir
tu
al
iz
ing the network. Be -
fore going into the de tails of how over
lays work, the chal
lenges they face, and the so lu
-
tions to overlay problems, it's worth spending some time to understand why traditional
net
works are so static.
When networks were first de vel

oped, there was no such thing as an ap pli
ca
tion mov ing
from one place to an other while it was in use. As a re sult, the origi
nal ar
chitects of
TCP/IP used the IP ad dress as both the identity of a de
vice and its lo
cation on the net-
work. This was a per fectly rea
sonable thing to do as com puters and their ap pli
ca
tions
did not move, or at least they did not move very fast or very often.
Today in the modern data cen ter, ap

pli
cations are often deployed on vir
tual ma
chines
(VMs) or con
tain
ers. The vir
tu
al
ized appli
cation workload can be stretched across mul
-
ti
ple lo
ca
tions. The ap pli
ca
tion end points (VMs, con tainers) can also be mo bile among
dif
fer
ent hosts. Their iden ti
ties (IP addresses) no longer in di
cate their lo ca
tion. Due to
the tight coupling of an end point's lo
cation with its iden tity in the tra di
tional network
model, the end point may need to change its IP ad dress to in di
cate the new lo ca
tion
when it moves. This breaks the seam less mo bil
ity model re quired by the vir tual
ized ap-
pli
ca
tions. Therefore, the net work needs to evolve from the sta tic model to a flex i
ble
one in order to con tinu
ously sup port com mu ni
cations among ap pli
cation end points re-
gardless of where they are. One ap proach is to sep a
rate the iden tity of an end point
from its physi
cal lo
cation on the net work so the lo cations can be changed at will with -
out breaking the com muni
cations to the end point. This is where over lays come into the
picture.
An overlay takes the orig i

nal mes sage sent by an ap pli
cation and encapsulates it with
the lo
ca
tion it needs to be de liv
ered to be fore send ing it through the net work. Once
the message ar rives at its final des ti
nation, it is de
capsulated and deliv
ered as de sired.
The identi
ties of the de vices (ap pli
cations) com mu
ni
cating are in the origi
nal mes sage,
and the locations are in the en capsulation, thus separat
ing the lo
cation from the iden -
tity. This en

cap
su
la
tion and de
cap
su
la
tion is done on a per-packet basis and there
fore
must be done very quickly and efficiently.
Today, ac cord ing to mar ket re

search, approxi
mately 60-70% of all ap pli
ca
tion work
-
loads are vir
tu al
ized, however, more than 80% of the servers in use today are not run -
ning a hypervi
sor. Of course, every data cen ter is unique and the mix of servers run
ning
vir
tu
al
ized work loads vs. non-virtu
al
ized work loads covers the en
tire spectrum. Any
network solution for the data center must address this mix.
Cisco, in partner
ship with other lead ing vendors, pro
posed the Vir tual Ex
ten si
ble LAN
(VXLAN) stan dard to the IETF as a solu
tion to the data center network chal
lenges posed
by tradi
tional VLAN technol
ogy. The VXLAN stan dard provides for the elastic work load
placement and higher scala
bil
ity of Layer 2 segmenta
tion that is re
quired by today’s ap -
pli
ca
tion demands.
VXLAN is designed to provide the same Eth ernet Layer 2 network services as VLANs do
today, but with greater exten
si
bil
ity and flex
i
bil
ity. Im
ple
menting VXLAN tech nolo
gies
in the net
work will pro
vide the fol
lowing bene
fits to every work
load in the data center:
• Flex
i
ble place
ment of any work
load in any rack through
out and be
tween data
centers
• De
cou
pling be
tween phys
i
cal and vir
tual networks
• Large Layer 2 net
work to pro
vide work
load mobility
• Cen
tral
ized Man
age
ment, pro
vi
sion
ing, and au
toma
tion, from a controller
• Scale, per
for
mance, agility and stream
lined operations
• Bet
ter uti
liza
tion of avail
able net
work paths in the un
der
ly
ing infrastructure
Why a Control Plane
When im plementing an over

lay, there are three major tasks that have to be ac
com-
plished. Firstly, there must be a mech a
nism to forward pack
ets through the net
work.
Tradi
tional net
working mecha
nisms are ef
fec
tive for this.
Sec
ondly, there must be a con
trol plane where the lo
ca
tion of a de
vice or ap
pli
ca
tion
can be looked up and the re
sult used to en
cap
su
late the packet so that it may be for
-
warded to its destination.
Thirdly, there must be a way to up date the control plane such that it is al
ways accu
-
rate. Having the wrong infor
mation in the con
trol plane could re
sult in packets being
sent to the wrong lo
ca
tion and likely dropped.
The first task, for

warding the packet, is some thing that net
work ing equipment has al
-
ways deliv
ered. Perfor
mance, cost, reli
a
bil
ity, and sup
porta
bil
ity are fun
da
mental con
-
sid
er
a
tions for the net
work which must equally apply to both the phys
i
cal and over
lay
net
works respectively.
The sec
ond task, con trol plane lookup and encapsu
la
tion, is re
ally an issue of per
for
-
mance and capacity. If these functions were per
formed in soft ware, they would con -
sume valu
able CPU re sources and add latency when com pared to hard ware solutions.
The third com po

nent of an over lay is the means by which mod i
fi
ca
tions to the control
plane are updated across all network el e
ments. This updat
ing is a real challenge and a
concern for any data center administra
tor due to the po
tential for ap
pli
ca
tion impact
from packet loss if the con
trol plane malfunctions.
VXLAN Control Plane

VXLAN Control Plane

VXLAN as an over lay tech
nology does not pro vide many of the mech a
nisms for scale
and fault tol
erance that other net work ing tech
nologies have de vel
oped and are now
tak
ing for granted. In a VXLAN net work, each switch builds a data base with the lo
cally
connected hosts. A mech a
nism is required so that other switches learn about those
hosts. In a tra
di
tional network, there is no mech a
nism to dis trib
ute this in
for
ma
tion.
The only con trol plane previ
ously available was a data plane-dri ven model called flood
and learn. For a host to be reach able, its in
for
ma
tion has to be flooded across the net -
work. Ether
net net works have operated with this defi
ciency for decades.
While the de mand for scal able net

works increases, the effects of flood and learn need
to be miti
gated. For a VXLAN over lay, a con
trol plane is re
quired that is ca
pable of dis-
trib
ut
ing the Layer 2 and Layer 3 host reach abil
ity in
for
mation across the net work.
Early im
plementations of VXLAN lacked the ability to carry Layer 2 network reachabil
ity
in
for
mation, therefore, Eth
er
net VPN (EVPN) ex tensions were added to Multi-Pro tocol
BGP (MP-BGP) to carry this information.
MP-BGP EVPN
MP-BGP EVPN for VXLAN pro vides a distrib
uted control plane solu
tion that sig
nif
i
-
cantly im
proves the ability to build and in ter
con
nect SDN over lay net
works. MP-BGP
EVPN control plane for VXLAN of fers the fol
low
ing key benefits:
• Con
trol plane learn
ing for end host Layer 2 and Layer 3 reach
a
bil
ity information.
• Abil
ity to build a more ro
bust and scal
able VXLAN over
lay network
• Sup
ports multi-tenancy
• Pro
vides in
te
grated rout
ing and bridging
• Min
i
mizes net
work flood
ing through pro
to
col-dri
ven host MAC/IP route
distribution
• ARP sup
pres
sion to min
i
mize un
nec
es
sary flooding
• Peer dis
cov
ery and au
then
ti
ca
tion to im
prove security
• Op
ti
mal east-west and north-south traf
fic forwarding
Looking Ahead
Even though VXLAN tech nology has attained a consid

er
able degree of ma tu
rity in a
very short time, the in
dus
try is al
ready de
sign
ing the next evo
lu
tion of this technology.
Generic Protocol Encapsulation (VXLAN-GPE)

VXLAN is one of many data plane en capsu
la
tions available. Examples of other UDP-
based en cap
sula
tions are LISP (Lo ca
tor/ID Sep a
ra
tion Pro to
col) and OTV (Over lay
Trans port Vir
tual
iza
tion). These three en capsula
tions are very sim i
lar, the dif
fer
ences
lying in the overlay shim header. While all three use the same size header, the field al lo-
cation and the nam ing are slightly dif
fer
ent. Within the en capsu
la
tion, there are also
varia
tions. While VXLAN main tains an in
ner-MAC header, LISP only car ries an in
ner-IP
header. It becomes ev i
dent that an ap proach for header ex tensions is needed to avoid
adding yet an other UDP-based encapsulation.
VXLAN-GPE was in vented to bring some con sol

i
da
tion in the UDP-based en capsula
tion
family. A major part of VXLAN-GPE is the in clu
sion of a proto
col-type field to de fine
what is being encapsu
lated and set the meaning for the var
i
ous flags and options in the
overlay shim header. This pro to
col type describes the packet pay load; currently de-
fined types in
clude IPv4, IPv6, Eth
er
net, and Network Service Header (NSH).
A prominent ex
ample for the need of this flex
i
ble pro
to
col ex
ten
sion is Ser
vice Chain
ing
and the re
lated NSH approach.
NSH en ables the possi

bil
ity of dynam i
cally speci
fy
ing that certain net
work traffic is sent
through a chain of one or more net work ser vices. The goal of NSH is to cre ate a topol-
ogy-in dependent way of spec i
fy
ing a ser vice path. NSH also in cludes a num ber of
manda tory, fixed-size con text head ers de signed to cap ture net work plat
form in forma-
tion. NSH even con tains an op tional variable length meta data field for ad
di
tional exten-
si
bil
ity and is de
signed to in clude all required information in
side fixed-size fields.
Cre
ation of yet an
other en
cap
su
la
tion pro
to
col stands to add more con
fu
sion to the al
-
ready crowded en capsu
la
tion pro to
col space. The ex ten
si
bil
ity of VXLAN-GPE and
NSH promises to both re duce the amount of en capsu
la
tion in the in
dustry and accom -
modate future network en capsu
la
tion requirements. Gen eve, VXLAN-GPE, and NSH
are all re
cent pro
to
col drafts proposed to the IETF. The three pro to
cols pro
vide simi
lar
ap
proaches to achieve flex i
ble pro
tocol map pings. While Gen eve uses variable length
op
tions, VXLAN-GPE and NSH use fixed size op tions. Cisco sup ports open stan dards
and will contin
u
ously reevalu
ate support for future encapsulations.
Evolution of the EVPN Control Plane

The current implemen ta
tion of the EVPN con trol plane is fo
cused on de liv
ering scal
able
data center Fabrics with mo bil
ity and seg
men ta
tion. As EVPN con trol plane im ple
men -
ta
tions become more com plete, the EVPN con trol plane may ad dress ad ditional use-
cases such as DCI. The com plete theo
ret
i
cal def
i
n
i
tion of the EVPN con trol plane is
captured in a se
ries of Inter
net drafts being worked on at the IETF. The gen eral speci
fi
-
ca
tion of EVPN ac com mo dates use cases be yond the Data Cen ter Fab ric, includ
ing
Layer 2 Data Cen ter In
tercon
nect.
In order to prop erly ad

dress the DCI re quire
ments, the EVPN con trol plane imple
men-
ta
tion must be ex panded to in clude the multi-hom
ing functional
ity de
fined in the EVPN
speci
fi
ca
tion to deliver fail
ure con
tain
ment, loop protec
tion, site-aware ness, and op
ti
-
mized mul ti
cast repli
ca
tion.
Fundamental Concepts
Introduction
In the net work

ing world, an over lay network is a vir
tual net
work run ning on top of a
phys i
cal net
work infrastruc
ture. The phys i
cal net
work pro vides an un der
lay function,
of
fering the con
nectivity and services re
quired to support the vir
tual net
work instances
deliv
ered in the overlay. The virtual network al
lows for an independent set of network
ser
vices to be of
fered re
gard
less of the un derlay in
fra
struc ture, even though those ser -
vices may be the same. As an exam ple, it is possi
ble to deliver Layer 2 con
nectiv
ity ser
-
vices on top of a Layer 3 net
work infrastructure via an over lay net
work. A com mon ex -
ample of this would be VPLS ser
vice offered over a car rier's MPLS infrastructure.
An overlay network typ

i
cally provides transport of network traf fic be
tween tun nel end
-
points on top of the underlay by en capsu
lat
ing and de capsu
lat
ing traffic be
tween tun-
nel endpoints. The tunnel end point may be de livered through a phys i
cal net
work de-
vice, and perform tunnel encapsula
tion/decapsula
tion in hardware. It also may be vir-
tual, with the tun
nel endpoint process run ning in a hypervi
sor. A hardware tun nel end
-
point pro vides greater per for
mance lever ag
ing hardware-based for ward
ing, but has
less flex
i
bil
ity implementing new ca pabil
i
ties. In con
trast, a soft
ware end
point pro
vides
in
creased flex i
bil
ity but at the cost of limited performance.
This chapter provides an overview of the con

cepts re
quired to have a basic un
der
-
stand
ing of the technol
ogy and how it works.
What is VXLAN?
Vir
tual Exten
si
ble LAN (VXLAN) as de fined in RFC 7348 is an over lay technol
ogy de -
signed to provide Layer 2 and Layer 3 con nectiv
ity ser
vices over a generic IP net work.
IP net
works pro vide in
creased scal
a
bil
ity, bal
anced per for
mance and pre dictable fail
ure
re
covery. VXLAN achieves this by tun neling Layer 2 frames in side of IP packets. VXLAN
re
quires only IP reach
a
bil
ity be
tween the VXLAN edge de
vices, pro
vided by an IP rout
-
ing protocol.
There are pros and cons to consider when se lect

ing the un
der
lay rout
ing pro
to
col and
these are dis
cussed in more de
tail in the Sin
gle-POD VXLAN De sign Chapter.
The VXLAN stan

dard de
fines the packet for
mat il
lus
trated by the fol
low
ing diagram:
Figure: VXLAN Packet Format

VXLAN uses an 8-byte header that consists of a 24-bit identi

fier (VNID) and mul ti
ple re
-
served bits. The VXLAN header, along with the orig i
nal Ethernet frame, is placed in the
UDP pay load. The 24-bit VNID is used to iden tify Layer 2 seg ments and to main tain
Layer 2 iso
la
tion be
tween the seg
ments. With 24 bits allo
cated for the VNID, VXLAN can
sup
port up to 16 mil
lion log
i
cal segments.
The ter
mi
nol
ogy used when de
scrib
ing the key com
po
nents of a VXLAN Fab
ric include:
• VTEP – Vir
tual Tun
nel End
point: The hard
ware or soft
ware el
e
ment at the edge of
the net
work responsi
ble for in
stan
ti
at
ing the VXLAN tun
nel and per
form
ing VXLAN
en
capsu
la
tion and decapsulation
• VNI – Vir
tual Net
work In
stance: a log
i
cal net
work in
stance pro
vid
ing Layer 2 or
Layer 3 ser
vices and defin
ing a Layer 2 broad
cast domain
• VNID – Vir
tual Net
work Iden
ti
fier: a 24-bit seg
ment ID that al
lows the ad
dress
ing of
up to 16 mil
lion log
i
cal net
works to be pre
sent in the same ad
min
is
tra
tive domain
• Bridge Do
main: A set of log
i
cal or phys
i
cal ports that share the same flood
ing or
broad
cast characteristics
The VXLAN tun nel endpoint func

tion can be per
formed by a hard ware device or by a
soft
ware entity such as a hyper
vi
sor. The main advan
tage of using a hard
ware-based
tunnel end
point is the enhanced per for
mance of
fered through the capa
bil
i
ties of the
switch ASICs.
Al
ter
na
tively, a software-based VTEP re moves the de pendency from the hard ware
switches, al
beit at the ex
pense of per
for
mance. Addi
tion
ally, VXLAN de ploy
ments could
adopt hybrid approaches, where the VXLAN tun nels are es
tablished be
tween hard
ware
and software VTEPs. More in for
mation on this can be found in the Soft ware Overlays
chapter.
As dis
cussed in the in
troduc
tion, the use of VXLAN tech
nol
ogy brings sev
eral ben
e
fits
to Data Cen
ter net
work ing which include:
• Multi-ten
ancy: VXLAN Fab
rics in
her
ently sup
port multi-ten
ancy both at Layer 2
(sep
arate Layer 2 VNIs rep resent log
i
cally iso
lated bridg
ing do
mains) and Layer
3 (by defin
ing dif
fer
ent VRFs for each supported tenant)
• Mo
bil
ity: The over
lay ca
pa
bil
ity of
fered by VXLAN pro
vides Layer 2 ex
ten
sion ser
-
vice across the data cen
ter to pro
vide flex
i
ble de
ploy
ment and mo
bil
ity of phys
i
cal
and vir
tual endpoints
• In
creased Layer 2 seg
ment scale: VLAN-based de
signs are lim
ited to a max
i
mum of
4,096 Layer 2 seg
ments due to the use of a 12 bit VLAN ID. VXLAN in tro
duces a 24-
bit VNID that the
o
ret
i
cally sup
ports up to 16 mil
lion dis
tinct segments
• Multi-path Layer 2 sup
port: Tra
di
tional Layer 2 net
works sup
port one ac
tive path
because Spanning Tree (STP) ex
pects and en forces a loop-free topol
ogy by blocking
re
dundant paths. A VXLAN Fab ric lever
ages a Layer 3 under
lay net
work for the use
of mul
ti
ple ac
tive paths
How Does VXLAN Work?
Data Plane
VXLAN re quires an under
ly
ing transport net
work that performs data plane forwarding.
This data plane for
warding is re
quired to pro
vide uni
cast com muni
ca
tion be
tween end -
points connected to the Fabric. The fol
low
ing di
a
gram il
lus
trates data plane for
warding
in a VXLAN network.
Figure: VXLAN Overlay Network

At the same time, the un

der
lay net
work can be used to de
liver multi-des
ti
na
tion traf
fic
to end
points con nected to a com mon Layer 2 broad cast do
main in the over
lay net
work.
Often this traf
fic is re
ferred to as BUM, since it in
cludes Broad
cast, Unknown Uni cast
and Multi
cast traffic.
Two dif
fer
ent ap
proaches can be taken to allow trans
mis
sion of BUM traf
fic across the
VXLAN Fabric:
1 Lever
age mul
ti
cast tech
nol
ogy in the un
der
lay net
work (Pro
to
col In
de
pen
dent Mul
-
ti
cast or PIM), to make use of the na
tive repli
ca
tion ca
pa
bil
i
ties of the Fab
ric spines
to de
liver traf
fic to all the edge VTEP devices.
2 In sce
nar
ios where mul
ti
cast can
not be de
ployed, it is pos
si
ble to make use of the
source-repli
ca
tion ca
pa
bil
i
ties of the VTEP nodes that cre
ate mul
ti
ple uni
cast
copies of the BUM frames to be sent to each re
mote VTEP de
vice. This ap
proach is
not as ef
fi
cient as using mul
ti
cast for BUM traf
fic replication.
VXLAN doesn't change the se man tics of Layer 2 or Layer 3 for warding and al
lows the
VTEP to per form bridg ing and routing functions while lever ag
ing the VXLAN tun nel for
data plane forward ing. As such, the VTEP of fers a set of dif
fer
ent gateway functions as
out
lined in the fol
lowing diagram.
• Layer 2 Gate
way: VXLAN to VLAN bridg
ing maps a VNI seg
ment to a VLAN to cre
ate
a com
mon bridge domain
• Layer 3 Gate
way (VXLAN Router): VXLAN to VXLAN rout
ing pro
vides Layer 3 con
-
nec
tiv
ity be
tween two VNIs na
tively so no de
cap
su
la
tion func
tion is required
• Layer 3 Gate
way (VXLAN Router): VXLAN to VLAN rout
ing pro
vides Layer 3 con
nec
-
tiv
ity be
tween a VNI and a VLAN
Figure: VXLAN Gateway Functions
Control Plane
Control Plane
The VXLAN RFC has to date only con cerned itself with the trans port (data plane) of
traffic, en
sur
ing con nec
tivity to all hosts in a VXLAN do main. The con trol plane, or
method by which VXLAN reach a
bil
ity and learning oc curs, was achieved through what
is known as flood and learn be havior. Simply speak ing, flood and learn is a data-dri ven
method ol
ogy wherein a VTEP that doesn’t know the lo cation of a given des ti
nation
MAC floods the frame onto the VXLAN’s as so
ci
ated mul ti
cast group. Mul ti
cast is typ i
-
cally used in order to pro vide a more man ageable approach to multi-des ti
nation traffic.
Instead of learning the source in terface asso
ci
ated with a frame’s source MAC ad dress,
the host learns the en capsulating source IP ad dress of the re mote VTEP. Flood and
learn method ol
ogy is concerned with both the dis covery (be tween peers) of VTEPs as
well as re mote endpoint location learning.
While flood and learn method ol

ogy presents a rea
sonably low bar rier to entry for net
-
work ven dors to im ple
ment a VXLAN stack, the draw back to flood and learn is first and
foremost scal
a
bil
ity. The amount of ad ditional mul
ti
cast traf
fic in
troduced into an envi
-
ronment can be dif fi
cult to pre
dict and as such has been a bar rier to adoption for some
enter
prise customers.
In order to ad dress the con cerns of scala

bil
ity, the con cept of a con trol plane to man -
age MAC learn ing and VTEP peer dis covery is de sirable, and prefer ably one that could
be based on ex ist
ing pro tocols that are gen er
ally well un der stood. Multi-Pro tocol Bor -
der Gate way Pro to col (MP-BGP) with Eth er
net Vir tual Pri vate Net work (EVPN) ex ten -
sions has been pro posed as the IETF stan dard con trol plane for VXLAN. Based on the
exist
ing MP-BGP stan dard, the MP-BGP EVPN con trol plane pro vides pro tocol-based
VTEP peer dis cov ery and end point reach abil
ity in for
ma tion dis tri
b
u
tion that al lows
more scal able VXLAN over lay network de signs. The MP-BGP EVPN con trol plane in tro-
duces a set of fea tures that re duces the amount of traf fic flood ing in the over lay net -
work and en ables op ti
mal forward ing for both east-west and north-south traf fic. Rel e-
vant to the data cen ter use case, EVPN pro vides reach abil
ity infor
mation for both L2
and L3 end points. Ex tending this level of reach abil
ity, and adding the ca pa
bil
ity for ARP
sup pression, re duces the re quired amount of flood ing in the net work. One ad di
tional
ben e
fit of the EVPN con trol plane is that it pro vides VTEP peer dis covery and au then ti
-
cation, mit i
gat
ing the risk of rogue VTEPs in the VXLAN over lay network.
In order to un derstand MP-BGP EVPN func tional

ity, it is help
ful to have a back ground
under
standing of MP-BGP as it is com monly used in MPLS net works. A tra di
tional
MPLS net work has a full mesh of BGP routers or route re flec
tors for scaling that ex-
change reach a
bil
ity and pro
file infor
mation for L3VPNs (or L2VPNs in the case of VPLS
for ex
ample). The com bi
na
tion of route dis tin
guishers (RD) and VPNv4 ad dresses en -
sure the abil
ity to uniquely iden tify a tar
get, and routes can be se lec
tively learned using
route tar
get (RT) filtering.
In the EVPN con trol plane, there are tech

ni
cally three data plane options: Multi-Pro
to
-
col Label Switch ing (MPLS, draft-ietf-l2vpn-evpn), Provider Back bone Bridging (PBB,
draft-ietf-l2vpn-pbb-evpn), and Net work Vir
tual
iza
tion Overlay (NVO, draft-ietf-bess-
evpn-overlay).
Figure: EVPN IETF Draft
For the pur

poses of this book, NVO will be as
sumed when dis
cussing EVPN.
Networking in a VXLAN Fabric
In tra
di
tional Layer 2 access networks, the Layer 3 de fault gateway is most com monly
placed at the aggre
gation layer. Gen er
ally, the pair of aggregation switches lever
age a
first-hop re
dundancy pro to
col such as HSRP, VRRP or GLBP to pro vide a re
dun
dant de
-
fault gate
way IP address. Depend ing on con fig
u
ra
tion and pro tocol, these may be con
-
fig
ured for ac
tive/standby or ac
tive/ac
tive redundancy.
With the rise of vir tu

al
iza
tion in the data cen ter, the phys
i
cal design of the net work and
its logi
cal repre
sen
ta
tion are in creasingly dif
fer
ent. Vir
tu
al
iza
tion en cour ages workload
mo bil
ity and this introduces in ef
fi
cien
cies, given that de fault gate way place ment was
pred i
cated upon the phys i
cal lo
cation of network resources. Traf fic forwarding contin
-
ues to func tion, however, the in herent inef
fi
ciency created by traf fic hair-pin
ning is
suboptimal.
Distributed Anycast Gateway

The use of the MP-BGP EVPN con trol plane intro
duces Dis trib
uted Any cast Gate
way
function
al
ity. In this model, the de fault gate
way func tion is fully dis trib
uted across all
leaf nodes within the VXLAN Fab ric. Leverag
ing the Dis trib
uted Any cast Gateway func-
tion provides improved effi
ciency and higher cross-sec tional band width while elim
i
nat
-
ing the need to run a First Hop Re dun dancy Protocol (FHRP). Fur ther
more, routed traf-
fic between work loads connected to the same leaf is lo cally forwarded with out hav
ing
to be sent to the spine layer. By de creasing hop count, the Dis tributed Anycast Gate
way
greatly reduces net work latency.
Integrated Routing and Bridging

EVPN VXLAN Fab rics intro
duce In te
grated Rout ing and Bridging (IRB) functional
ity,
which offers the capa
bil
ity of both Layer 2 and Layer 3 for warding di
rectly at the leaf
switch. This is funda
men tal to the Distributed Any cast Gateway func tion which pro -
vides a dis
trib
uted de
fault gate way capabil
ity clos
est to the endpoints.
Asymmetric vs Symmetric Forwarding

The EVPN draft de fines two dif
fer
ent meth
ods for routing traf
fic be
tween VXLAN over
-
lays. The first method is referred to as asym
met ric IRB and the sec ond is known as
symmetric IRB.
With asym metric IRB, the ingress VTEP is per form ing both rout ing and bridg ing,
whereas the egress VTEP is only per form ing bridging. As a result, the re
turn traf
fic will
take a different VNI than the source traf fic. This necessi
tates that the source and des ti
-
na
tion VNIs re side on both the ingress and egress VTEPs. This leads to a more com plex
config
uration as all switches need to be con fig
ured for all pos si
ble VNIs. Perhaps a
more press ing con sid
er
a
tion is the scal
ing im pli
cations of all de
vices poten
tially need -
ing to learn a con sider
ably larger number of endpoints.
Figure: Asymmetric IRB

In symmetric IRB, both the ingress and egress VTEP pro vide both L2 and L3 forwarding.
This re
sults in predictable for
warding be
havior. As a re
sult, only the VNIs of lo
cally-at
-
tached end points need to be de fined in a VTEP (plus the tran sit L3 VNI), which in turn
sim
pli
fies con
fig
u
ration and re
duces scale requirements through op ti
mized use of ARP
and the MAC ad dress table. This re
sults in bet
ter scale in terms of the total number of
VNIs a VXLAN Fab ric can support.
Figure: Symmetric IRB
It is important to keep in mind that as both meth ods are defined in the standard, con -
sidera
tion must be given to de vice se
lec
tion and the impli
cations for in
ter
op
er
ability.
For ex ample, Cisco supports only symmetric IRB on the Nexus plat forms as it offers
better scalability.
Software Overlays
Introduction
Server vir
tual
ization has transformed the way in which data cen ters are op er
ated, and
the vast majority of data centers today im ple
ment it to some de gree. How ever, making
the as
sump tion that those data cen ters run ex
clu
sively vir
tu
al
ized work loads would be
a mis
take. Many or ga
ni
za
tions still make use of main frames, for ex am ple. Moreover,
new ap pli
ca
tions that do not re quire server vir
tual
iza
tion are com ing into the main
stage, such as cloud-based soft ware that makes use of Linux con tain
ers, or modern
scale-out ap pli
ca
tions such as Big Data, that de liver oper
a
tional bene
fits and scale
without the need of a hypervisor.
Al
though VXLAN is a generic over lay con cept that is com monly de ployed in the net -
work, it is some
times as
soci
ated with server vir tu
al
ization and hypervi
sors. This chap-
ter covers the ad
van
tages and dis ad
vantages of im plementing VXLAN on vir tual
ized
hosts, and how to re
al
ize the most ben e
fit out of this technology, keeping in mind that
one of the main rea
sons for inter
est in VXLAN is its open ness, that avoids vendor lock-
in (ven
dor or hypervisor).
Host-Based Overlay
Server virtual
iza
tion of
fers sig
nif
i
cant bene
fits in
cluding flex
i
bil
ity and agility in de
liv
er
-
ing com pute ser vices in the data cen ter. Tradi
tion
ally, network ing to the hypervi
sor is
provided via VLAN trans port, and there is a new trend to adopt host-based VXLAN
overlays to im prove agility and automation of the network layer.
Figure: Host-Based Overlay
The host-based over lay typ

i
cally runs between host VTEPs over an IP transport and of
-
fers ease of de
ploy
ment and au tomation ca
pa
bil
ity, em
power
ing the server team to de
-
liver vir
tual net
working services without needing to in
volve the network team. This
abil
ity to au
to
mate networking di
rectly through the Vir
tual Machine Manager (VMM) as
a soft
ware only over
lay often re
sults in a sub-opti
mal network solu
tion which does not
take into ac
count the broader aspects of op er
a
tions, in
te
gra
tion, and per
for
mance for
the net
work as a whole.
In addi
tion to the CPU im pact introduced with host-based over lays, the net work team
has to pro vide extra ef forts in trou bleshooting due to the lack of cor relation between
the over lay and the un derlay net
works. In re gards to CPU im pact, the per formance of a
software VTEP is de pen dent on CPU and mem ory avail
able on the hypervi
sor. Some im -
plemen ta
tions run the VTEP func tion in kernel space, others in user space. Both op -
tions must de liver the nec es
sary packet pro cess
ing required for ef
fi
cient ap pli
ca
tion
deliv
ery. These so lu
tions typ i
cally struggle to deliver line-rate through put even with
hard ware assis
tance at the server NIC.
Addi
tion
ally, host-based overlay net
work solutions are pri
marily fo
cused on net working
for vir
tual servers with
out con sid
er
a
tion for phys i
cal workloads or other ex ist
ing ser
-
vices in
side or outside the data center. Con nectiv
ity to both phys i
cal servers and re-
sources be yond the virtual network typi
cally require gateways, either in software or
hardware which must be in te
grated with the phys i
cal network.
In summary, when eval

u
at
ing host-based over lay so
lutions, it is crit
i
cal to con
sider the
broader business and techni
cal im
pli
ca
tions for the data cen ter in
cluding li
cens
ing
cost, com
pute overhead, perfor
mance penalty, ad di
tional gate way in fra
struc
ture re
-
quire
ments, and the im
pact to network operations.
An Alternative to Host-Based Overlays

A network-based over lay de ploys net work switches as VXLAN tun nel endpoints
(VTEPs). Com pared to host-based over lay so
lutions, network over lays de
liver hard -
ware-ac cel
erated en cap
sula
tion provided by ASICs, which is the core of a net work
switch. By of floading the VXLAN over lay functions to the net work switch, a sim ple
VLAN-based de ployment can be used to con nect phys i
cal or vir
tual workloads to the
VXLAN Fab ric. By re moving the bur den of data traf fic forwarding and en capsu
la
tion
from the hy pervi
sor, CPU re sources are freed. In other words, the hy per
vi
sor can allo
-
cate all avail
able hard ware re sources to its key function, serving Vir
tual Machines (VMs)
and applications.
With a VXLAN EVPN Fab ric and the as so

ci
ated op
er
ation and man agement tools, it is
pos
si
ble to de
liver the flexi
bil
ity of au
to
mated net work provi
sioning for vir
tual ma -
chines while overcoming some of the pre vi
ously dis
cussed limi
tations af
fect
ing the
host-based over lay model. De
ploy
ing a network-based over lay does not have an impact
on the over
all network per
for
mance, vis
i
bil
ity, and trou
bleshooting of net
work issues.
Figure: Network-Based Overlay
The VM Tracker (VMT) func tion on Nexus leaf switches pro vides vis
i
bil
ity of the hyper
vi
sor
hosts and the VMs con nected to the VXLAN Fab ric so the net work can take de ci
sions upon
that infor
mation. For example, the VM Tracker auto-con fig fea
ture en ables au to
mated
provi
sion
ing of network re
sources to sup port vir
tual machines in a VMware vSphere en vi
-
ronment. VM Tracker com mu ni
cates with VMware vCen ter Server to re trieve in
for
mation
re
lat
ing to the vir
tual net
work con fig
u
ra
tion. The information includes:
• vSphere ESX host to VM mappings

• VM state
• Phys
i
cal port attachments
• VDS port groups as
sign
ment to VM
The net
work con fig
u
ra
tion that is dy
nami
cally pro
vi
sioned, de
pend
ing on the pre
vi
ous
in
for
ma
tion, in
cludes the follow
ing attributes:
• VLAN provisioning
• VNI allocation
• L3 gateway
• VRF provisioning
Figure: vCenter Integration for Dynamic Network Configuration

For ex ample, proper VLAN and VNI con fig

ura
tion is de
ployed on a spe cific leaf when
the first VM as so
ci
ated to that VNI is lo
cally connected and removed when the last VM
is mi
grated to a differ
ent server or pow ered off. With such function
al
ity, the server ad
-
minis
tra
tor can achieve the agility re
quired to quickly de ploy VMs, and at the same time
provi
sion network resources in a dy
namic manner.
This so
lu
tion provides the perfor
mance of hard ware-based en capsu
la
tion without hav
-
ing to up
grade the host phys i
cal NICs. This is especially rel
e
vant in the case of virtual
network functions. For example, certain host-based over lays use a virtual gate
way to
in
ter
act with the rest of the world, as already men tioned in a pre vi
ous section. The per
-
for
mance dis cus
sion is crit
i
cal in this case, because that sin gle vir
tual gateway might
become a bottle
neck for the whole vir tual environment.
In conclusion, by leverag ing network-based over lays it is pos

si
ble to achieve line-rate
through put for both east-west and north-south traf fic flows while elim i
nating the need
for soft
ware gate ways. At the same time, the net work ad minis
trator gains vis i
bil
ity of
the vir
tual infra
structure at tached to the Fabric, as
sisting with the on go ing operations
and troubleshoot ing of the en vi
ronment. Fi
nally, VM Tracker func tional
ity ensures that
the vir
tual
iza
tion admin istra
tor gains net
work agility on top of the ben e
fits that server
vir
tu
al
iza
tion already provides.
Hybrid Overlays with VXLAN EVPN

As previ
ously discussed, pure host-based over lays bring lit
tle value to data cen
ters, but
there are sit
u
ations where a hybrid approach might solve some chal lenges or use cases.
Ser
vice Providers have very spe cific re
quire
ments regarding net work man age
ment and
op
er
ations including:
• Sup
port for a mix of soft
ware and hard
ware VTEPs
• In
te
gra
tion with the hy
per
vi
sor layer
• Sup
port multi-ven
dor Fabrics
• Over
lay and un
der
lay are op
er
ated by dif
fer
ent teams
Hybrid VXLAN overlays consist of both host-based soft

ware VTEPs and switch-based
hardware VTEPs. A co
hesive oper
a
tional and manage
ment model is needed to in
te
grate
the two types of VTEP together. Cisco Virtual Topol
ogy Sys
tem (VTS) is an ex
ample of
such a solution.
Fur
ther de
tails about the Cisco Vir tual Topology Sys
tem are provided in the Manage-
ment And Op era
tions chap
ter, but below is a brief sum
mary of Cisco VTS architecture.
Figure: Hybrid Overlays
Cisco Vir tual Topology Sys

tems (VTS) pro
vi
sions hard
ware and soft
ware VTEPs. The
abil
ity to in
te
grate a VXLAN soft
ware-based VTEP allows the de
ploy
ment of the VXLAN
technology on top of legacy net work hardware or to com ple
ment hard
ware-based
VTEP deployments.
The Virtual Topol ogy Controller (VTC) is the single point of man agement for hy brid
overlays to con
figure, manage and op er
ate a VXLAN Fab ric with MP-BGP EVPN con trol
plane. The man age ment layer sup ports in
te
gra
tion with hy per
vi
sors such as VMware
vSphere or Open stack/KVM so that net work con structs can be directly provi
sioned
from the hy per
visor User In
terface. The northbound REST APIs en able in
te
gra
tion with
third party tools.
The con
trol plane is rep
re
sented by a vir tual
ized IOS-XR router to provide in
te
gra
tion
with MP-BGP EVPN and ad vertise reachabil
ity in
for
ma
tion to the soft
ware VTEP itself
over an API. The soft ware VTEP named Vir tual Topol
ogy For warder (VTF) provides
VXLAN encapsu
la
tion ca
pa
bil
ity in the hypervisor.
More de
tails on Cisco Vir
tual Topol
ogy Sys
tem ar
chi
tec
ture are avail
able at https://
www.
cisco.
com/ go/vts
Single-POD VXLAN
Design
Introduction
In clas
sic hi
er
ar
chi
cal network designs, the access and ag gre
ga
tion lay
ers to
gether pro-
vide Layer 2 and Layer 3 func tional
ity as a building block for data cen ter con
nectiv
ity.
In smaller data center envi
ron
ments, this sin gle build ing block would provide suf
fi
cient
scale to meet the en tire de
mands for con nectivity and perfor
mance. As the en vi
ron-
ment scales to meet the in creased demands of the larger data cen ter, this building
block is typ
i
cally repli
cated with an addi
tional core layer in
tro
duced to con nect these
to
gether. These build ing blocks are com monly re ferred to as a Point of De liv
ery, or
POD, and allow for con sis
tent, mod
u
lar scale as the en
vi
ron
ment grows.
Figure: Hierarchical Network Design
When de sign

ing a VXLAN Fab ric, a sin
gle-POD also de
fines a sin
gle VXLAN Fab
ric based
on a scal
able spine-leaf ar
chi
tecture as shown in the di
a
gram below.
Figure: VXLAN Fabric
A single VXLAN POD can scale to hun dreds of switches and thou sands of ports which
will meet the de mands of many en ter
prise data cen ter en
vi
ron
ments; how ever, to meet
more com plex or larger scale re quirements, the VXLAN POD may be repli cated in the
form of a multi-POD de sign. In a typ i
cal deployment with mul ti
ple data center lo
ca
-
tions, these VXLAN Fab rics, whether sin gle or multi-POD- based, will be de ployed to-
gether as a multi-site VXLAN de sign. Both the multi-POD and multi-site de ploy
ment
types are de scribed fur ther in the multi-POD and multi-site De signs chapter. Ad
di
tion
-
ally, the connectiv
ity of Layer 2 and Layer 3 to the ex ter
nal net
work do main is covered
in the Exter
nal Con nectivity for the VXLAN Fab ric chapter.
This chapter ex

plores the design con
sid
er
a
tions for building a sin
gle VXLAN POD com -
pris
ing the under
lay net
work foun da
tion and the over lay net
work to gether with their
as
soci
ated data and control planes, as well as guidelines for endpoint connectiv
ity to
the Fabric.
Underlay
In build
ing a VXLAN EVPN Fab ric, it is es
sential to construct an appro
pri
ate under
lay
net
work as this will pro vide a scal
able, available and func tional foun
da
tion to support
the overlay. This sec
tion in
cludes impor tant consid
era
tions for the un
der
lay design.
Routed Interface Considerations
MTU
In order to improve the through put and net work perfor

mance, it is recommended to
avoid frag
menta
tion and re assembly on network devices per
form ing VXLAN encapsula
-
tion and de
capsu
la
tion. It is there
fore re
quired to in
crease the max i
mum transmission
unit (MTU) in the transport net work by at least 50 bytes (54 if an 802.1Q header is pre-
sent in the encap
sulated frame). If the over lay uses a 1500-byte MTU, the trans port
net
work needs to be config
ured to ac
commo date 1550 byte (1554 bytes if in
clud
ing the
802.1Q header) frames as a min
i
mum. Jumbo frame sup port in the trans
port network is
strongly rec
ommended if the over lay ap
pli
ca
tions use frame sizes larger than 1500
bytes.
In order to ensure that VXLAN en

cap
sulated pack
ets can be success
fully carried across
the Fabric, the in
crease of MTU must be config
ured on all the Layer 3 inter
faces con-
necting the Fabric nodes.
Routed Interface Addressing

The con nec
tiv
ity between network de
vices in a VXLAN Fabric typ
i
cally lever
age routed
point-to-point in ter
faces which can be sim
ply addressed with a /30 or even a /31 sub-
net mask. In a large data cen ter Layer 3 under
lay net
work, there will be many routed
links, lead
ing to high IP address consumption.
Fol
low
ing are the IP ad
dress re
quire
ment for a cou
ple dif
fer
ent scenarios.
Figure: IP Address Requirement
In the ex
ample above, in a small net
work with 4 spine and 6 leaf switches, there will be
a mini
mum of 24 point-to-point links, re quir
ing a total of 68 ad
dresses for the Fabric
under
lay. This number will ex
ponen
tially in
crease to 408 in a larger scale sce
nario of 4
spine and 40 leaf switches.
A recom mended approach is to use IP unnumbered for the inter

face IP address con fig
-
u
ration re
quir
ing only a sin
gle IP ad
dress per de
vice, regard
less of the num ber of links
deployed. As shown below, in the smaller scale ex am ple the total num ber of IP ad -
dresses con sumed would be re duced to 16 for the en tire Fab
ric under
lay. When ex -
pand ing the net
work, the IP ad
dress re
quire
ment will increase lin
early with the num ber
of devices.
Loopback Interface Addressing

As high lighted in the ex
amples below, each leaf switch with a VTEP should have a min i
-
mum of two loop back in
terfaces. The first loop
back is used as Router-ID (RID) and for
as
signing an IP ad dress to the un numbered Layer 3 links. The second loop
back repre-
sents the VTEP IP ad dress used as source and des ti
na
tion for VXLAN encapsu
lated
traffic.
Figure: IP Address Requirement with IP Unnumbered
Routing Protocol Considerations

The choice of rout ing pro to
col for the un derlay net work has nu mer ous op tions, how -
ever, it is typ
i
cally de termined by what pro to
cols are al ready in use and are fa miliar to
the net work admin istrator. In mak ing this de ci
sion, it is impor
tant to con sider the pro -
to
col con ver
gence char acter
is
tics as this will de termine the over all speed of con ver
-
gence of the over lay net work. Specif i
cally, Open Short est Path First (OSPF) and In ter
-
mediate System - In terme di
ate System (IS-IS) are two types of In te
rior Gate way Pro to
-
col (IGP) that are par ticu
larly suitable for multi-stage spine-leaf Fab rics. As the spine-
leaf design in
herently pro vides mul ti
ple paths be tween leaf switches via the spine, SPF-
based pro to
cols will com pute a topol ogy con sist
ing of mul ti
ple equal cost paths
through the net work and pro vide rapid con ver
gence around failures.
While BGP also has merit as an un der

lay rout ing proto
col, it is a Path Vec tor Protocol
and pri marily con siders Au tonomous Sys tems (AS) to cal cu late paths. De spite this, an
experi
enced net work en gi
neer can ma nip
ulate BGP to achieve com pa
ra
ble con ver-
gence out comes to SPF-based rout ing proto
cols by leverag ing the many at trib
utes and
options available. The main per ceived ad vantage of using BGP in the un der
lay is having
only one rout ing pro to
col in use across the en tire net
work (un derlay + overlay). While
this offers simpli
fi
cation, po tential dis
advan
tages exist due to the ad di
tional config
ura
-
tion required. Since the over lay predom i
nantly uses a single path from VTEP to VTEP, it
is as
sumed that the un derlay pro vides multi-path for ward ing. This is not the de fault
for
warding be
hav
ior of BGP, there
fore, spe
cific at
ten
tion is needed to achieve equiv
a
-
lent multi-pathing as would be achieved when using SPF-based IGPs in the underlay.
When se lecting rout ing pro to cols for use in the un derlay, it is imperative to consider
how the over lay control plane pro to
col func tions and should be con figured. By using
the same pro tocol for the un der lay and over lay, a clear sep a
ration of these two do mains
can become blurred. There fore, when de signing an over lay network, it is a good prac -
tice to indepen dently build a trans port network as has been done in MPLS. The de ploy-
ment of an IGP in the un der lay offers this sep a
ra
tion of un der lay and over lay control
proto
col. This pro vides a very lean rout ing do main for the trans port net work that con -
sists of only loop back and point-to-point in ter
faces. At the same time, MAC and IP
reachabil
ity for the over lay ex ists in a dif
fer
ent pro tocol, namely MP-BGP EVPN.
OSPF Deployment Recommendation

OSPF is a link-state routing pro
to
col com monly used in en ter
prise en
vi
ronments. The
OSPF de fault in
ter
face type used for Eth er
net in
ter
faces is “Broadcast,” which in
her
-
ently re
sults in a Des
ig
nated Router (DR) and/or Backup Des ig
nated Router (BDR) elec
-
tion thus reducing rout
ing up
date traffic. While this is fine in a Multi-Access net
work
(such as a shared Ether
net seg
ment), it is unnec
essary in a point-to-point network.
In a point-to-point net work, the “Broad cast” in

ter
face type of OSPF adds a DR/BDR
election process and an ad ditional Type 2 Link State Ad ver
tise
ment (LSA). This re
sults
in un nec
es
sary addi
tional over head, which can be avoided by chang ing the in
terface
type to “point-to-point”. In this way, the DR/BDR elec tion process can be avoided, re-
ducing the amount of time to bring up the OSPF ad ja
cency between the leaf and spine
switches. In addi
tion, with the point-to-point in ter
face mode, the need for Type-2 LSAs
is re
moved with only Type-1 LSA needed since there is no Multi-Ac cess (or Broadcast)
segment present. As a result, the OSPF LSA data base remains lean.
IS-IS Deployment Recommendation

IS-IS Deployment Recommendation

An
other standard based IGP rout ing pro
tocol is In
ter
mediate System – In ter
me
di
ate
Sys
tem (IS-IS). This link state rout
ing pro
tocol is gain
ing pop u
lar
ity with fast con
ver
-
gence in a large-scale en vi
ronment al
though has pri mar
ily been de ployed in ser
vice
provider en
vi
ronments. IS-IS uses Con nection
less Net work Pro to
col (CLNP) for com -
mu ni
ca
tion be
tween peers and doesn’t de pend on IP. There is no SPF cal cu
la
tion on
link change and SPF cal cula
tion only hap pens when there is a topol ogy change which
helps with faster conver
gence and sta bility in the under
lay. No sig
nif
i
cant tun
ing is re
-
quired for IS-IS to achieve an effi
cient, fast con verg
ing under
lay network.
IP Multicast Recommendation
IP mul ti
cast pro
vides an ef
fi
cient mech
a
nism for the dis
tri
b
u
tion of multi-des
ti
na
tion
traf
fic in the Fab
ric underlay.
To deploy IP multi
cast in the underlay, a Pro
to
col Inde
pendent Mul ti
cast (PIM) rout
ing
pro
tocol needs to be en abled and must be con sis
tent across all the devices in the un
-
der
lay network. The two com mon PIM pro to
cols are Sparse-Mode (PIM-ASM) and Bidi -
rec
tional (PIM-Bidir). This implies the requirement to deploy rendezvous Points (RPs).
Multicast Rendezvous Point (RP) Consideration

Several meth ods are available to achieve a highly avail
able RP de ployment, including for
example the use of pro tocols such as auto-RP and Boot strap. How ever, to im
prove the
con ver
gence ex peri
ence in the RP fail ure sce
nario, the rec ommen da
tion is to deploy
Any cast RP, which con sists of using a common IP ad dress on dif fer
ent devices to iden-
tify the RP. Sim ple sta
tic RP map ping con
fig
u
ra
tion is then ap plied to each node in the
Fabric to as
soci
ate multi
cast groups to the RP, so that each source or re ceiver can then
uti
lize the local RP that is the closest from a topolog
i
cal point of view.
It is im
por
tant to re
mem ber that the VTEP nodes rep re
sent the sources and des
ti
na
-
tions of the mul
ti
cast traf
fic used to carry BUM traf
fic be
tween endpoints con
nected to
those devices.
Nor
mally, the RPs would be de ployed on the spine nodes, given the cen
tral po
si
tion
those de
vices play in the Fabric.
Figure: Multicast RP Placement
When de ploy

ing Anycast RP, it is crit
i
cal to syn
chro
nize infor
mation between the dif-
fer
ent RPs deployed in the network, as it may hap pen that sources and re ceivers join
dif
fer
ent RPs, de
pending where they are con nected in the network. Two mech anisms
are supported on Cisco Nexus plat forms to syn chro
nize state in
for
mation be
tween RPs:
• Mul
ti
cast Source Dis
cov
ery Pro
to
col (MSDP): this op
tion has been around for a long
time and it is widely avail
able across differ
ent switches and routers. MSDP sessions
are es
tablished be
tween RP de vices to exchange in
for
mation about source and re
-
ceivers for each given mul ti
cast group
• PIM with Any
cast RP: this op
tion is cur
rently sup
ported only on Cisco Nexus plat
-
forms and lever
ages PIM as con
trol plane to syn
chro
nize state be
tween RPs
Ingress Replication
Ingress Replication
Ingress repli ca
tion, also known as Head-End repli ca
tion, may be used as an al ter
native
to IP mul ti
cast to carry the BUM traf fic in
side the Fabric. One reason for using this al
-
ter
nate method is that IP mul ti
cast is not always an avail able op
tion due to hardware
and soft ware con straints. IP multi
cast may also not be pre ferred due to perceived com -
plexity by the network op era
tions team.
When de ploy

ing ingress replica
tion it is im por
tant to consider the overall scale of the
Fabric and the amount of multi-des ti
na
tion traffic ex
pected in the en
vi
ronment. This is
be
cause for VXLAN EVPN ingress repli cation, the VXLAN VTEP uses a list of IP ad -
dresses of other VTEPs in the net work to send BUM traf fic as uni
cast traffic, cre
at
ing
multi
ple copies of the same traffic type. It is worth noticing that the deployment of the
MP-BGP con trol plane enables the list of VTEPs con nected to the same VXLAN Fab ric
to be dynami
cally built. These IP addresses are ex changed be tween VTEPs through the
BGP EVPN con trol plane.
interface nve1
no shutdown
source-interface loopback0
host-reachability protocol bgp
member vni 30000
ingress-replication protocol bgp
member vni 30001
ingress-replication protocol bgp
As shown in the configu

ration sample above, the ingress replica
tion mode is con fig
-
urable on a per-L2VNI. It is not pos
si
ble to mix mul
ti
cast and ingress repli
ca
tion for the
same L2VNI in the same VXLAN Fabric.
Overlay
After build
ing a solid foun
dation for the VXLAN net work with the un der
lay, the overlay
concepts are equally impor
tant to provide the re
quired func
tion
al
ity and flexibility.
VXLAN EVPN Control Plane

As an in
dustry standard overlay technol
ogy, VXLAN has seen in creasing adoption in the
data cen ter space. EVPN is the con trol plane for VXLAN and pro vides an effi
cient
method for route learn ing and distri
b
u
tion in the VXLAN over lay network. The rout ing
in
for
mation in
cludes Layer 2 MAC routes, Layer 3 Host IP routes, and Layer 3 sub net IP
routes. EVPN con trol plane also in
troduces multi-ten ancy sup port to the VXLAN over -
lay net
work, as well as a VTEP peer dis covery, se
curity, and authenti
cation mechanism.
This sec
tion is in
tended to pro vide a deeper un der
stand ing of the VXLAN EVPN con trol
plane.
MP-BGP EVPN
EVPN uses MP-BGP as the rout ing pro
to
col to dis
trib
ute reacha
bil
ity in
for
ma
tion for
the VXLAN overlay network, in
clud
ing end
point MAC ad dresses, endpoint IP ad
dresses,
and sub
net reacha
bil
ity information.
EVPN is an other MP-BGP ad dress fam

ily lever
ag
ing simi
lar constructs as the VPNv4 ad -
dress fam ily tra
di
tion
ally de
ployed in MPLS VPN ar chitec
tures. Those con structs in
-
clude VRFs, Route Dis tin
guish
ers (RD) and Route Tar gets (RT). The pe culiar
ity of the
EVPN con trol plane when com pared to VPNv4 is the ca pabil
ity of ex
chang ing not only
IP but also MAC ad dress information.
Virtual Routing and Forwarding (VRF)

Virtual Routing and Forwarding (VRF)

Vir
tual Rout
ing and Forward ing (VRF) de fines the Layer 3 routing do
main for each ten
-
ant supported in the VXLAN Fab ric. In VXLAN EVPN net works, each ten
ant VRF has a
Layer 3 VNI used as a vir
tual backbone for rout ing within the VRF.
Route Distinguisher (RD)

Route Distin
guisher (RD) is the identi
fier of a VRF since each VRF has its own unique RD
in the network. When an EVPN ad vertise
ment is sent out to the peers, the RD of the
VRF to which this route be longs is prepended to the orig i
nal route it
self to ren
der it
unique within the net work. This al
lows dif fer
ent VRFs to use overlap
ping IP addresses
so that dif
fer
ent tenants can have true au tonomy for IP address management. The RD
can be auto
mati
cally de
fined to simplify configuration.
Route Target (RT)

Route Tar get (RT) is an ex tended at tribute in EVPN route up dates used to con trol route
distri
b
u
tion in a multi-ten ant network. EVPN VTEPs have an im port RT setting and an
export RT set ting for each VRF and each L2VNI. When a VTEP ad ver
tises EVPN routes,
it af
fixes its ex
port RT in the route up date. The routes will be re
ceived by other VTEPs
in the net work. These de vices will com pare the RT value carried with the route against
their own local im port RT set ting. If the two values match, the route will be ac cepted
and pro grammed in the rout ing table. Oth er
wise, the route will not be imported. The
RT can be au tomati
cally defined to sim plify configuration.
EVPN Route Types

The EVPN con
trol plane ad
ver
tises dif
fer
ent types of rout
ing information:
• Type-2 - End
point reach
a
bil
ity in
for
ma
tion, in
clud
ing MAC and IP ad
dresses of the
endpoints
• Type-3 - Mul
ti
cast route ad
ver
tise
ment-an
nounc
ing ca
pa
bil
ity and in
ten
tion to use
Ingress Repli
ca
tion for spe
cific VNIs
• Type-5 - IP pre
fix route used to ad
ver
tise in
ter
nal IP sub
net and ex
ter
nally learned
routes onto the VXLAN Fabric
The EVPN route up

date also in
cludes the fol
low
ing information:
• VNID for the L2VNI and VNID for the L3VNI for the ten
ant VRF
• BGP next-hop IP ad
dress iden
ti
fy
ing the orig
i
nat
ing VTEP device
• Router MAC ad
dress of the orig
i
nat
ing VTEP device
Route Reflector Placement

As dis
cussed in the pre vi
ous chap ter, iBGP is the most com mon rout ing proto
col de-
ployed for the EVPN con trol plane in VXLAN Fab rics. With iBGP, there is a re
quirement
to have a full mesh be tween all of the iBGP speak ers. To help scale and sim plify the
iBGP config
u
ra
tion, it is recom mended to im plement iBGP Route Re flec
tors (RR). The
place
ment of the iBGP route re flec
tors is recom mended to be im plemented on the
spines as they are cen tral to all of the leaf switches. In this case, two of the spines will
have BGP route re flector con fig
ured and all of the leaf switches will be con fig
ured as
the BGP route re flec
tor clients. The route re flec
tor will re
flect EVPN routes for the
VTEP leaf switches.
Figure: iBGP Route Reflector Placement

Endpoint Detection and Tracking

A VTEP in MP-BGP EVPN de tects at
tached endpoints via local learning. MAC ad dresses
are learned in the data plane from the in coming Ether
net frames whereas the IP ad -
dress is learned via ARP or Gra tu
itous ARP (GARP) con trol plane pack ets sent by the
endpoint. Al
ter
nately, the learning can be achieved by using a con trol plane or through
man age
ment plane in te
gration be
tween the VTEP and the local hosts.
Once a VTEP de tects its local endpoints, it will in

stall a Host Mo
bil
ity Man
ager (HMM)
route to track it. The VTEP will also con struct an EVPN Type-2 route to ad ver
tise the
learned MAC and IP ad dress of the endpoint to the rest of the VTEPs in the same Fabric.
The EVPN Type-2 route has an em bedded se quence num ber used for end point move-
ment track ing. When an end point moves from one VTEP to an other VTEP, the new VTEP
will de
tect it as a newly attached local host. It will send a new EVPN Type-2 rout ing up
-
date with the reach a
bil
ity in
for
mation for this end
point. When doing so, it will in
cre
ment
the sequence num ber by one. When the rest of VTEPs re ceive the new route with the
higher sequence num ber they will up date their routing infor
mation for the endpoint
using the new VTEP as the next hop.
Layer 2 Logical Isolation (Layer 2 VNIs)

The creation of VXLAN over lay net
works pro vides the logi
cal ab
strac
tion allowing end
-
points connected to differ
ent leaf nodes sep a
rated by mul ti
ple Layer 3 Fab ric nodes to
func
tion as they were con nected to the same Layer 2 seg ment. This log i
cal Layer 2 seg
-
ment is usually re
ferred to as Layer 2 Vir
tual Network Instance (L2VNI).
The VXLAN seg ments are indepen

dent of the underly
ing network topology; like
wise,
the underly
ing IP net
work be tween VTEPs is inde
pen dent of the VXLAN over lay. The
combi
nation of lo
cally de
fined VLANs and their map ping to asso
ci
ated L2VNIs allows
the cre
ation of Layer 2 log
i
cal seg
ments that can be extended across the Fabric.
As with tra
di
tional VLAN de ploy
ments, communi
cation be
tween end points be
long
ing to
sep
a
rate L2VNIs is pos
si
ble only through a Layer 3 rout
ing function.
The sam ple below shows the cre

ation of VLAN-to-VNI map
pings on a VTEP de
vice,
which is usu
ally a leaf node.
vlan 100
vn-segment 30000
vlan 101
vn-segment 30001
Once the VLAN-to-VNI map pings have been de fined, it is then re
quired to asso
ci
ate
those cre
ated L2V
NIs to an NVE log
i
cal in
ter
face, as shown in the config
u
ra
tion sample
below.
interface nve1
no shutdown
source-interface loopback0
host-reachability protocol bgp
member vni 30000
suppress-arp
mcast-group 239.239.239.100
member vni 30001
suppress-arp
mcast-group 239.239.239.101
In the de
f
i
ni
tion of the NVE logi
cal in
terface, the loop
back in
ter
face cre
ated as part of
the underlay con fig
u
ra
tion is spec i
fied to be used for VXLAN en capsu
la
tion and
decapsulation.
It is also re
quired to as soci
ate the EVPN con trol plane to the VXLAN de ploy
ment, in-
stead of the origi
nal flood and learn model. At the time of writ ing, this con
fig
u
ra
tion has
a global scope for a given VXLAN de ployment, hence, it is not possi
ble to mix the two
modes of op er
a
tion (con trol plane or flood and learn based) in the same Fabric.
When mul ti

cast is the de
ploy
ment choice for handling the repli
ca
tion of BUM traf
fic, a
spe
cific mul
ti
cast group is as
so
ci
ated to each de
fined L2VNI. The as sign
ment of multi
-
cast groups to the L2V

NIs is quite flex
i
ble and the cho
sen con
fig
u
ra
tion de
pends on the
fol
low
ing considerations:
• Using a unique mul

ti
cast group for each de
fined VNI would allow the most gran
u
lar
dis
tri
bu
tion of BUM traffic, which will be only flooded to the leaf nodes where that
specific L2VNI is de
fined. On the other side, this de sign choice would drasti
cally in
-
crease the amount of mul ti
cast state in the Fab
ric leaf and spine devices.
• Using a com
mon mul
ti
cast group for all the de
fined L2V
NIs would re
duce at a min
i
-
mum the amount of mul ti
cast state in the core of the network, but would cause
BUM traf fic for a given L2VNI to be flooded to all the leaf nodes even where that
spe
cific VNI is not present (the traf
fic would then be dis
carded by the leaf).
The gen erally rec

ommended ap proach is a bal
ance of the two options above and sug-
gests to assign a com mon multi
cast group to all the L2V
NIs de
fined for a given ten
ant
(VRF); dif
ferent mul
ti
cast groups can in
stead be used across tenants.
Fi
nally, as part of the L2VNI con fig
uration, it is pos
si
ble to en able ARP sup pression. This
removes the need to flood ARP re quests across the Fab ric, which usu ally repre
sents the
large ma jor
ity of L2 broad cast traffic. ARP sup pres
sion can be en abled since each leaf
node learns about all the end points con nected to the Fab ric via the EVPN con trol plane.
When re ceiving an ARP re quest orig i
nated by a lo cally con nected end point trying to
identify the MAC of the re motely con nected end point, the leaf can then per form a
lookup in a local cache pop u
lated upon re cep tion of EVPN up dates. If the MAC/IP in -
for
ma tion for the remote end point is avail able, the leaf can then reply to the local end -
point with the ARP map ping in forma tion on be half of the re mote end point. If the
MAC/IP in formation for the re mote end point is not avail able, the ARP re quest is
flooded across the Fab ric by encap sulating the packet in a VXLAN frame des tined to the
multi
cast group as soci
ated to the L2VNI of the local end point. ARP sup pres
sion can also
be enabled or disabled on a per L2VNI basis.
Because most end points send ARP re quests to announce them selves to the net work
right after they come online, the local VTEP will im medi
ately have the op portunity to
learn their MAC and IP ad dresses and dis trib
ute this in
forma tion to other VTEPs
through the MP-BGP EVPN con trol plane. Therefore, most ac tive IP hosts in VXLAN
EVPN should be learned by the VTEPs ei ther through local learn ing or control plane-
based re
mote learn
ing. As a re
sult, ARP sup
pres
sion re
duces the net
work flood
ing
caused by host ARP learn
ing behavior.
Layer 3 Multi-Tenancy (VRFs and Layer 3 VNIs)
The log i
cal Layer 2 segment cre ated by map ping a lo cally sig
nif
i
cant VLAN with a glob-
ally sig
nif
i
cant L2VNI is nor mally associ
ated with an IP sub net. When end points con-
nected to the L2VNI need to com mu nicate with end points be long
ing to dif
fer
ent IP
subnets, they send the traf fic to their de fault gateway. De ploy
ing VXLAN EVPN al lows
support for a dis
tributed default gateway func tional
ity on each leaf node, a deployment
model com monly re ferred to as Distrib
uted Any cast Gate way. In a VXLAN de ploy
ment,
the vari
ous Layer 2 seg ments de fined by com bining local VLANs and global VNIs can be
asso
ci
ated to a VRF if they need to communicate.
Communi ca
tion be
tween local end
points con
nected to dif
fer
ent L2V
NIs can occur via
nor
mal Layer 3 rout ing in the con
text of the VRF (i.e. no VXLAN en cap
sula
tion is
required).
The deploy
ment of Sym metric In
te
grated Rout ing and Bridg ing (IRB), al
ready intro
-
duced in the Fun damen tal Concepts chap ter, re
quires the in tro
duction of a tran sit
Layer 3 VNI (L3VNI) of fer
ing L3 segmen ta
tion services per ten ant VRF. Each VRF in -
stance is mapped to a unique L3VNI in the net work. Differ
ent L2VNIs for the same ten -
ant are usu
ally as
soci
ated to the same VRF. As a re sult, the in
ter-VXLAN rout ing is per-
formed through out the L3VNI within a par tic
u
lar VRF instance.
The Sym metric IRB model as sumes that the de
fault gate
way for all the L2VNIs is fully
dis
trib
uted to all the leaf nodes. At the time of this writ
ing, the dis
trib
uted gateway
model is the only one sup ported with VXLAN EVPN and can be en abled by ap
plying the
config
u
ra
tion below on all the leaf nodes:
fabric forwarding anycast-gateway-mac 2020.2020.2020
vlan 100
vn-segment 30000
interface Vlan100
no shutdown
vrf member Tenant-1
ip address 192.168.100.1/24 tag 21921
fabric forwarding mode anycast-gateway
The first com mand de fines a com mon vir

tual MAC ad dress to be used for the de fault
gateway. The same value is used for all the IP subnets associ
ated with the L2VNI seg -
ments, in de
pendently from the VRF they be long to (fabric-wide config
u
ra
tion). The
“fabric for
ward
ing mode any cast-gate
way” command is used to en able the distrib
uted
default gate
way functional
ity on the VTEP nodes. This com mand must be ap plied to all
the SVIs for the VLANs that are mapped to L2VNIs.
The con fig

ura
tion shown above must be re peated for all the local VLANs mapped to
L2VNIs if there is a re quirement to route traf
fic to sep
arate IP sub
nets. The SVI is as
so
-
ci
ated to a spe cific VRF in stance (“vrf mem ber” com mand). The use of VRFs pro vides
Layer 3 logi
cal isola
tion, a concept often re
ferred to as “multi-tenancy”.
The ad
di
tional re
quired con
fig
u
ra
tion for each de
fined VRF is shown below.
vlan 2500
name L3_Tenant1
vn-segment 50000
vrf context Tenant-1

vni 50000
rd auto
address-family ipv4 unicast
route-target import auto
route-target import auto evpn
route-target export auto
route-target export auto evpn
interface 2500
description L3_Tenant1
no shutdown
mtu 9216
vrf member Tenant-1
ip forward
interface nve1
member vni 50000 associate-vrf
In MP-BGP EVPN, any VTEP in a VNI can be the Dis trib

uted Any cast Gateway for end
hosts in an IP sub net by sup porting the same vir tual gateway IP ad dress and vir
tual
gateway MAC ad dress. When using Dis trib
uted Any cast Gate way with EVPN, routed
traf
fic from an end point is always processed by the clos est leaf node. This capa
bility
enables op ti
mal forwarding for north bound traf fic from end points in the VXLAN over lay
network. East-West traf fic between end points con nected to the same leaf is lo cally
routed by that leaf. This is es pecially im por
tant for ap pli
cations with rack aware ness
like some Hadoop dis tri
bu
tions. A Dis trib
uted Any cast Gate way also offers seamless
host mo bil
ity in the VXLAN over lay net work. The gate way IP and vir tual MAC address
are iden ti
cally provi
sioned on all VTEPs within a VNI, there fore, when an end host
moves from one VTEP to another VTEP, it doesn’t need to send an
other ARP re
quest to
re-learn the gate
way MAC address.
Figure: Distributed Anycast Gateway
Multicast in the Overlay

At the time of writing of this publi
cation, there is no stan dard cov er
ing the im ple
men-
ta
tion of IP multi
cast in the VXLAN over lay. This implies that if Layer 3 mul ti
cast ser
-
vices are required in the VXLAN over lay net works, the only possi
bil
ity is connecting ex
-
ter
nal multi
cast routers to the Fab ric. Con fig
u
ra
tion of the ex
ternal mul ti
cast routers is
outside the scope of this publication.
Host Connectivity
When con nect ing end

points (bare-metal servers, hyper
vi
sor hosts, network ser
vice
nodes, etc.) to the net
work in a re
dun
dant fash
ion, two op
tions are available:
• Using an Ac
tive/Standby at
tach
ment mode, where the end
point lever
ages one or
more ac
tive links to one leaf switch and one or more standby links to a sec
ond leaf
switch. This en
sures the end point can sur vive the fail
ure of a single leaf switch and
regain net
work connec
tiv
ity sim ply by ac
ti
vating the standby links. This con fig
u
ra
-
tion does not re
quire any spe cific func
tional
ity to be sup
ported on the leaf, as nor -
mal Layer 2 learn
ing and for ward ing can be per formed to de liver traf
fic to the lo
-
cally con
nected endpoints.
• Using an Ac
tive/Ac
tive at
tach
ment mode, sta
tic or dy
namic bundling of phys
i
cal
in
ter
faces using Link Ag gre
ga
tion Control Pro
to
col (LACP). This en sures that all
avail
able links are al
ways ac
tive and used to send and re
ceive traf
fic. This model re-
quires that the leaf switches sup port a Multi-Chassis Link Aggregation (MC-LAG)
func
tion
al
ity to ap
pear as a single log
i
cal en
tity to the lo
cally con
nected end
points.
Cisco Nexus switches offer Vir tual Port-Chan nel (vPC) to achieve this.
Figure: VTEPs and Server Attachment Models

In a VXLAN Fab
ric, there are some ad
di
tional as
pects to consider.
• When the end

point con
nects in Ac
tive/Standby mode, each leaf switch is con
fig
-
ured with an independent VTEP IP ad
dress. Traf
fic orig
i
nated by the server will al-
ways be VXLAN en cap
su
lated and de
cap
sulated by the leaf switch con nected to the
ac
tive port. Re
mote VTEPs will al ways point to VTEP 1 or VTEP 2 (de pend ing on
which link is ac
tive) when remote endpoints need to send traf fic to de
vices lo
cally
connected to those leaf switches.
• When the end
point con
nects in vPC mode (Ac
tive/Ac
tive), the two VXLAN leaf
switches are de ployed as part of the same vPC do main and a com mon Any cast
VTEP is de fined. As a conse
quence, no mat ter which physi
cal uplink is used by the
local endpoint to send the traffic into the network, re
mote VTEPs al ways as so
ciate
the endpoint information to the source Any cast VTEP ad dress. This is criti
cal for
the consis
tent Layer 2 MAC learn ing of the endpoint within the Fab ric, in order to
avoid continu
ous flap
ping of in
formation in the MAC tables of the re
mote VTEPs.
In the Cisco NX-OS im ple

menta
tion, the Any
cast VTEP ad
dress is de
fined as a com
mon
secondary IP address as
so
ci
ated to the VTEP loop back in
ter
face of both VXLAN leaf
switches part of the same vPC domain.
interface loopback0
description VTEP
ip address 10.254.254.102/32
ip address 10.254.254.1/32 secondary
It is worth noting that once a pair of VXLAN switches is con fig

ured as part of a vPC do-
main, the Any cast VTEP is al ways used as next-hop for all the EVPN ad ver
tise
ments rel-
a
tive to di
rectly con nected end points. This is valid also for local endpoints con
nected in
Ac tive/Standby fash ion. The con sequence is that roughly half of the flows des tined to
those de vices may be de liv
ered from the spines to the VTEP de vice connected to the
standby end points (the spines have two equal cost paths to reach the Any cast VTEP IP
address); the traffic would hence have to take an extra hop across the peer-link in
order to be de liv
ered to the ac tive in
ter
face of the endpoint.
This subopti
mal behav
ior can be avoided by group
ing end
points based on the types of
con
nectiv
ity (Active/Standby vs LACP) and con nect
ing them to sepa
rate sets of leaf
switches.
External Connectivity
for VXLAN Fabric
Introduction
In real world data cen ter de

ployments, the Fabric is never an iso
lated envi
ronment and
connectiv
ity with ex
ter
nal net
works is al
ways required. The re quired con nec
tiv
ity typ
i
-
cally depends on which type of ex ter
nal net
works the VXLAN Fab ric is con
nected to.
For exam ple, when connecting to the campus, WAN, or the In ter
net, Layer 3 routing is
nor
mally used. When ex tending Layer 2 out
side of the VXLAN Fab
ric, ad
di
tional con
-
nec
tiv
ity con
sid
er
a
tions are required.
In addi
tion to ex
ter
nal con nec
tiv
ity, the VXLAN Fab ric will typ
i
cally be de
ployed into an
ex
isting data center envi
ron
ment, so in ter
oper
ability with the ex ist
ing net
work and the
abil
ity to mi
grate work loads to the new Fab ric will be very relevant.
This chap
ter provides de
tail on both Layer 2 and Layer 3 exter
nal connectiv
ity to the
VXLAN Fab ric and how to use those concepts to de
ploy a VXLAN Fabric into an ex ist
ing
data center.
Layer 3 Connectivity
Multi-tenancy is one of the primary use cases for deployment of a VXLAN BGP EVPN
Fab
ric. Dif
fer
ent VRFs could be de fined and seg
mented as differ
ent orga
ni
zations, busi
-
ness units, mergers and acqui
si
tions, user-groups, ap
pli
ca
tions, or simply se
curity seg
-
menta
tion and policy enforcement.
In the con text of VXLAN BGP EVPN, each in stance (i.e. VRF/VLAN) is logi
cally isolated,
but phys i
cally in
te
grated into the overall Fabric as a shared infra
struc
ture. When ex -
tending Layer 3 con nectiv
ity out
side the VXLAN Fab ric, two dif
fer
ent scenarios are
usually considered:
1 Ex
tend the log
i
cal iso
la
tion be
tween VRFs into the ex
ter
nally routed do
main. This
sce
nario is typ
i
cally de
ployed when con
nect
ing the VXLAN Fab
ric to the cam
pus
net
work or to the WAN, as shown in the fig
ure below.
Figure: Extending Layer 3 Multi-Tenancy to the External Layer 3 Domain

The bor
der node rep
re
sents the edge of the VXLAN Fab
ric and nor
mally ter
mi
nates
the VXLAN data plane en cap
sula
tion to pro vide Layer 3 hand-off func tion
al
ity to-
ward the edge router. The bor der node role could be im ple
mented on a leaf or
spine switch. The edge router takes care of ex tending multi-ten
ancy con nectiv
ity
across the ex
ter
nal network, leveraging one of the de ploy
ment options discussed in
the sec
tions below. It is worth not ing this model allows full sup
port for overlapping
IP ad
dress space across differ
ent ten ants, provid
ing end-to-end logi
cal isolation.
2 Pro
vide shared ac
cess to a com
mon ex
ter
nal ser
vice. This sce
nario al
lows dif
fer
ent
ten
ants to have com
mon ac
cess to shared re
sources such as the Internet.
Figure: Accessing External Layer 3 Shared Resources
The simple use case shown above does not allow over lapping IP address space across
dif
fer
ent ten
ants, as this merges all the routing in
formation into the “Default VRF” rout
-
ing table. As an extension to the previ
ous ex
am ple, ac
cess to shared re sources may be
provided by front-end ing each ten
ant with a security device. This pro
vides an en
force-
ment point for se curity pol
icy when a ten ant needs to ac cess ex
ter
nal re
sources or to
com muni
cate with other ten ants as shown in the fig ure below.
In this case, it is common to leverage NAT (Network Address Trans

la
tion) func
tion
al
ity
of
fered by a firewall for ten
ants that must sup
port over
lap
ping ad
dress space.
Figure: Secure Access to External Layer 3 Shared Resources
The above di

agrams il
lus
trate how Layer 3 con nec
tiv
ity ex
ternal to the VXLAN Fab
ric is
pro
vided by a bor
der node de vice. The first design deci
sion to make is the place
ment of
the bor
der node. Two main op tions are usually considered:
1 Bor
der node on a leaf de
vice is termed bor
der leaf. This is a nat
ural choice as the
leaf nodes are de
ployed as VTEP de
vices ca
pa
ble of sup
port
ing the re
quired con
trol
plane and data plane func
tion
al
i
ties. De
ploy
ing the VTEP ca
pa
bil
i
ties only on the
leaf nodes keeps the con
fig
u
ra
tion on the spine switches much sim
pler. The spine
pro
vides the Fab
ric back
plane func
tion
al
ity, rout
ing VXLAN en
cap
su
lated traf
fic be
-
tween the leaf nodes. The bor
der leaf only ser
vices north-south communication.
2 Bor
der node on a spine de
vice is termed bor
der spine. This de
ploy
ment op
tion pro
-
vides the ad
van
tage of op
ti
miz
ing the north-south com
mu
ni
ca
tion with ex
ter
nal
re
sources. At the same time, it in
tro
duces the re
quire
ment to de
ploy a spine de
vice
that is ca
pa
ble of sup
port
ing VXLAN con
trol and data plane func
tion
al
ity (VTEP).
The bor
der spine will most likely also serve as BGP Route Re
flec
tor (RR) and Mul
ti
-
cast Ren
dezvous Point (RP). The bor
der spine ser
vices north-south as well as east-
west communication.
A good net
work de sign al
ways pro vides resiliency and redun dancy for key network ele
-
ments. The bor der node per forms a key func tion, in
ter
connecting the VXLAN Fab ric to
the ex
ter
nal network do main, so it is crit
i
cal to ensure re
siliency. It is rec
om
mended to
de
sign the Fab ric with re
dundant bor der nodes and edge routers, each lever ag
ing re-
dundant physi
cal connec
tions, as shown below.
Figure: Redundant Border Nodes and Connections to the Edge Routers
Regard
ing Layer 3 hand-off function
al
ity, it is a fair assump tion that the links be
tween
the border nodes and the edge routers are routed in terfaces. Depending on how Layer
3 commu ni
ca
tion is ex
tended out
side the VXLAN Fab ric, those Layer 3 in
terfaces could
be ded
i
cated for each ten
ant or shared across mul
ti
ple ten
ants. The fol
low
ing sec
tions
pro
vide an overview of the dif
fer
ent de
ploy
ment options. All the sce
narios de
pict a bor
-
der leaf de
ploy
ment, but the same con sid
er
a
tions can be ap plied in the border spine
case.
VRF-Lite Hand-Off
The use of VRF enables the abil
ity to have mul ti
ple rout
ing ta
bles that are com
pletely
in
de
pendent and iso
lated. VRF-Lite represents a com mon and well-known mech a
nism
to ex
tend the ten
ant Layer 3 VRF infor
mation beyond the VXLAN Fabric.
The VRF-Lite ap proach dictates using a two-box so lu

tion where the bor der node and
the edge router are phys i
cally in
de
pendent devices. With VRF-Lite, con nectiv
ity for dif
-
fer
ent tenants from the VXLAN Fab ric is ex
tended ex ter
nally on a hop-by-hop basis.
The border leaf par
tic
i
pates in the VXLAN Fab ric and has the full VTEP con fig
uration to
per
form the VXLAN en capsula
tion and de cap
sula
tion along with rout ing toward the
edge routing device.
For this to hap

pen, the fol
low
ing two re
quire
ments must be met:
• At the con
trol plane level, the bor
der node is re
spon
si
ble for ex
chang
ing per-ten
ant
rout
ing in
for
ma tion between the VXLAN Fab ric and the ex
ter
nal network. The bor-
der node runs IPv4 or IPv6 uni cast rout
ing for each of the ten
ant VRFs with the ex -
ter
nal edge rout ing de
vice to learn the external routes and to adver
tise the Fab
ric
sub
net/host routes to the ex ter
nal net
work. The bor der node also re dis
trib
utes
and adver
tises the exter
nal routes through MP-BGP EVPN to the in ter
nal nodes on
the Fabric.
• The rout
ing pro
to
col used to com
mu
ni
cate with the edge router can be BGP or an
IGP routing pro
tocol of your choice. When using BGP to peer with ex ternal routers,
MP-BGP EVPN au to
mat i
cally im ports the BGP routes learned from the VRF-lite
IPv4 or IPv6 unicast address fam ily into the L2VPN EVPN ad dress family. This rep
-
re
sents a com mon op tion adopted in many real world de ploy
ments. With other
rout
ing pro to
cols, re
distri
b
ution of routes is re quired to en sure routes are ex -
changed be tween the VXLAN Fab ric and the exter
nal router.
When the bor der node learns the exter

nal routes from the edge router, it advertises the
prefixes in
side the VXLAN Fab ric do
main as EVPN Type-5 routes. This in for
ma tion is
dis
tributed to the other VTEP nodes. At the same time, the bor der node is config
ured to
send EVPN routes learned from the L2VPN EVPN ad dress fam
ily to the IPv4 or IPv6 uni-
cast address family and ad
ver
tise them to the ex
ternal edge router.
The sam ple config

u
ra
tion below shows the ex
ample where eBGP with a sub-in
ter
face is
used as routing proto
col be
tween the bor
der node and the edge router.

vni 50000
rd auto
interface Ethernet1/10.100
encapsulation dot1q 100
vrf member Tenant-1
ip address 192.168.5.254/30
router bgp 65500

router-id 10.254.254.200
neighbor 10.254.254.3
remote-as 65500
update-source loopback1
address-family l2vpn evpn
send-community both
neighbor 192.168.1.1
remote-as 65535
prefix-list filter-host-routes out
vrf Tenant-1
advertise l2vpn evpn
ip prefix-list filter-host-routes seq 10 deny 0.0.0.0/0 eq 32

ip prefix-list filter-host-routes seq 20 permit 0.0.0.0/0 le 32
In this ex
ample, the “ad
ver
tise l2vpn evpn” com
mand under the VRF IPv4 ad
dress fam
ily
ensures that:
• All the EVPN Fab

ric in
ter
nal IP pre
fixes are ad
ver
tised from EVPN into the VRF
• All the ex
ter
nal IP pre
fixes learned from the edge router are ad
ver
tised from the
VRF into EVPN
• By de
fault, all the Fab
ric in
ter
nal pre
fixes, in
clud
ing host routes for the con
nected
end points, are ad ver
tised to
ward the edge router. If this is not the de
sir
able be
hav
-
ior, it is pos
si
ble to apply route pol
icy to elim
i
nate host routes.
Sim
i
lar con
fig
u
ration with the ex
cep
tion of the EVPN ad dress family spe
cific commands
must then be ap plied on the edge router to en sure the BGP session can be es tab
lished
with the border node.
At the data plane level, traf

fic for dif
fer
ent ten
ants must be car
ried be
tween the bor der
node and the edge router. This can be achieved by ded i
cat
ing an in
ter
face (log
i
cal or
physi
cal) to each VRF. The available options are:
• Phys
i
cal Routed Ports: this im
plies using a ded
i
cated phys
i
cal in
ter
face for each
tenant
• Sub-In
ter
faces: one log
i
cal sub-in
ter
face can be carved for each ten
ant to carry
traf
fic on the same phys
i
cal connection.
As shown above, it is im portant to note, that for each VRF, man
ual con fig
u
ra
tion is re
-
quired along the en tire Layer 3 path. Since VRF-lite needs to be con figured on a hop-
by-hop basis, scala
bil
ity be
comes a con cern for large numbers of ten
ants/VRFs; this is
the ad
vantage of an MPLS hand-off.
MPLS Hand-Off
In many two-de vice de ployments, the edge router can act as an MPLS-provider edge
node. Alter
na
tively, a sin
gle device solution can be used to ter
minate MPLS and VXLAN
routing on the same de vice. This so lu
tion merges the bor der node and the MPLS
Provider Edge (PE) router func tion
al
i
ties into a sin
gle phys
i
cal de
vice, usu
ally re
ferred
to as the Bor
der PE node. This sce nario is depicted below.
Figure: Single Device Solution with Border PE Nodes
This sec
tion sum
ma rizes the steps for con
fig
ur
ing the Border PE device de
ployed on a
Cisco NX-OS based plat form using man ual con
fig
u
ra
tion, with ref
er
ence to the sim
ple
net
work topology shown below.
Figure: Single Device Configuration Example
The sam
ple con
fig
u
ra
tion below shows a Bor
der PE ex
am
ple configuration.

vni 50000
rd auto
route-target import 65535:1
route-target export 65535:1
Note: The additional route-target have to match the one used in MPLS L3VPN
for each VRF.
interface Ethernet1/10
ip address 192.168.5.254/30
ip router ospf MPLS-CORE
mpls ip
router bgp 65500

router-id 10.254.254.200
neighbor 10.254.254.3
remote-as 65500
update-source loopback1
address-family l2vpn evpn
import vpn unicast reoriginate
send-community both
neighbor 192.168.1.1
remote-as 65535
address-family vpnv4 unicast
import l2vpn evpn reoriginate
vrf Tenant-1
The Bor der PE re-orig

i
nates IP pre fixes from the VXLAN Fab ric EVPN ad dress fam ily to
the MPLS VPNv4 ad dress fam ily and vice versa. The re
quired com mands to achieve this
are “import vpn unicast reo
rigi
nate” or “import l2vpn evpn reo
rig
i
nate” re
spectively in
the opposite ad
dress-family. It is re
quired to use an eBGP peer ing between the Bor der
PE and the MPLS PEs. For the im port and export to MPLS L3VPN, the ap pro priate
route-targets have to be cho sen for each VRF.
LISP Hand-Off
In Active/Ac tive data center de ploy
ments, work load mo bil
ity al
lows ap pli
ca
tions to
move be tween ge ograph
i
cally dis
persed lo
ca
tions. This brings the chal lenge of ingress
route opti
miza tion when the work loads change loca
tion. Lo
cator/Iden ti
fier Separa
tion
Proto
col (LISP) solves this challenge by rout
ing the client traf
fic to the cor rect lo
cation
where the re sources are lo
cated. The routing in
for
mation for LISP does not add any ad -
di
tional prefixes to the un
derlay rout
ing domain.
LISP, as de fined in RFC 6830, is a rout ing archi

tecture that en ables a new par a
digm for
IP ad
dress ing. IP addresses are scoped in two dis tinct name spaces: End point Iden ti
fiers
(EIDs), which are as signed to end-hosts, and Rout ing Locators (RLOCs), which are as -
signed to net work ing devices. The LISP pro to
col pro vides all the mes sag
ing nec essary
to maintain and ac cess a map ping data base in which EIDs are cor re
lated to RLOCs. LISP
uses a map-and-en capsulate forward ing model in which traf fic destined for an EID is
encap
sulated and sent to the RLOC of the de vice through which it is con nected, based
on the re sults of a lookup in a map ping data base. Traffic is sent to a de vice's RLOC
rather than di rectly to the des ti
na
tion EID. This ap proach re lieves the core net work of
the re
spon si
bil
ity of handling EID in for ma
tion. Using this ap proach, the LISP ar chitec-
ture aug ments the cur rent routed in fra
structure to fa cil
i
tate new func tional
ity with
mini
mal dis ruption to the exist
ing network infrastructure.
LISP is a di
rectory of ad
dresses and their lo
ca
tions, not a tra
di
tional rout
ing pro to
col.
LISP uses a de mand-based model where edge-de vices re
quest lo
ca
tion in
formation as
re
quired. This demand model is in contrast with the push model used by rout ing proto
-
cols and results in a re
duced load on the device's hardware ta
bles. LISP has other ad -
vantages noted below:
• Mo
bil
ity: EID portability
• Scal
a
bil
ity: On-de
mand routing
• Se
cu
rity: Ten
ant ID-based segmentation
• DCI: Ingress route optimization
LISP map pings can be clas si

fied to give VPN and ten ant se man tics to each prefix han-
dled by LISP. This clas si
fi
ca
tion is encoded in the LISP con trol plane as stip
u
lated in the
stan dard def
i
n
i
tion of the pro to
col. The LISP data plane also sup ports segmentation of
traffic into multi
ple VPNs. LISP binds VRFs to in stance IDs, and then these IDs are in -
cluded in the LISP header to pro vide data plane (traf fic flow) sep a
ra
tion for single or
multi-hop for ward ing. The LISP multi-ten ancy so lu
tion promises to ex ceed the scal a
-
bil
ity of current seg men ta
tion solu
tions sig
nif
i
cantly be cause it uses on-de mand rout -
ing and does not re quire main te
nance of tra
di
tional rout ing adjacencies.
Figure: Border Spine and LISP Hand-Off
Border Spine and LISP Hand-Off

Border Spine and LISP Hand-Off

This sec
tion fo
cuses on a de sign where the spine de vice acts as a border node and sup -
ports LISP handoff. Since all of the spines are connected to the WAN edge routers, this
al
lows us to have ECMP from the spine de vices to given exter
nal sites. This al
lows hosts
connected to the VXLAN Fab ric to communi
cate to the ex ter
nal sites. Although we
focus on the sce
nario where the spines act as bor der nodes, a sim i
lar de
sign can be im-
plemented on border leaf nodes.
In this sce
nario, the spine device acts as a LISP xTR. A LISP xTR refers to a device that
can act as both a LISP Ingress Tun nel Router (ITR) and a LISP Egress Tun nel Router
(ETR). With LISP, regu
lar IPv4/IPv6 host routes origi
nat
ing from the data center are not
adver
tised which helps op ti
mize the rout
ing table.
LISP has a map ping data

base sys tem that keeps track of routes learned from all spine
devices in the EVPN Fab ric. LISP also tracks addresses on remote sites and adds them
to the map ping database. Routes learned from leaf VTEPs are added into the Rout ing
In
formation Base (RIB) at the xTR. LISP selects these routes from the RIB and adds them
dynami
cally to the mapping data base as Loca
tor-Identity mappings.
The spine ad

vertises a de
fault route to at
tract north
bound traf
fic from the leaf VTEPs.
When a leaf VTEP re ceives a packet and it does not have a spe cific route, it sends the
packet to the spine using the over lay. The spine de
cap
su
lates the pack ets, performs a
lookup in the LISP map ping data base, does a LISP encapsu
la
tion and for wards the
packet north
bound across the WAN.
The spine de vices continue to have L3V NIs config

ured, as they act as VTEPs for north -
bound traffic com ing from at tached leaf de vices. They would also act as tun nel end
-
points for south bound traffic com ing from the re mote LISP xTR des tined to the hosts
con
nected to the leaf. The spine de vices would not need to be con figured with L2VNIs
and has the ad van
tage of al
low ing Layer 3 multi-pathing across the VXLAN Fabric.
North-South Traf
fic with VXLAN Host in the POD
In this sce
nario, packet for
ward
ing in
volves two encapsulations:
1 LISP en
cap
su
la
tion be
tween the ex
ter
nal sites and the bor
der spine
2 VXLAN en
cap
su
la
tion be
tween the bor
der spine and the leaf
The fol
low
ing sce
nario dis
cusses host de
tec
tion and packet forwarding:
1 The VXLAN Fab

ric can be con
nected across an IP cloud to con
nect to ex
ter
nal sites
for north-south traf
fic using LISP. Mak
ing use of the Bor
der PE provider edge so
lu
-
tion to con
nect the data cen
ters and ex
ter
nal sites using LISP.
2 In the VXLAN Fab

ric, the host routes and MAC ad
dress in
for
ma
tion are dis
trib
uted
in the MP-BGP EVPN con
trol plane from the leaf nodes, which means that the Fab
-
ric it
self per
forms the host de
tec
tion. The LISP site gate
ways use these host routes
for trig
ger
ing the LISP en
cap
su
la
tion and de-encapsulation.
3 When the LISP site gate

way (Bor
der PE, also run
ning MP-BGP EVPN in the Fab
ric)
de
tects this host based on the route re
ceived in BGP, it sends a map-reg
is
ter mes
-
sage to the map sys
tem data
base to reg
is
ter the new IP ad
dress in its own data
center
4 When re
mote sites want to talk to the data cen
ter hosts, they send an in
quiry to the
map
ping sys
tem re
quest
ing the lo
ca
tion of the host. The map
ping sys
tem replies
with the lo
ca
tion of the LISP site gate
way where the des
ti
na
tion EID is located.
5 Com
mu
ni
ca
tion is then es
tab
lished be
tween the re
mote client and the data cen
ter
host lever
ag
ing the LISP and VXLAN tech
nolo
gies as de
scribed earlier
Layer 3 Connectivity Summary

Layer 3 Connectivity Summary

Ex
ternal Layer 3 con
nec
tiv
ity from a VXLAN Fab
ric can be achieved using three dif
fer
-
ent technologies.
• VRF-lite pro
vides an IP hand-off using sub-in
ter
faces with IEEE 802.1Q tags to sep
-
a
rate the VRFs
• MPLS uses VPN la
bels to sep
a
rate traf
fic on a per-VRF basis
• LISP uses an IP-in-IP en
cap
su
la
tion and in
stance-ID to seg
re
gate the VRFs
Layer 2 Connectivity
There are two major use-cases for Layer 2 hand-off and con nectiv
ity. The first is for
mi
gra
tion sce
nar
ios, where the VXLAN Fab ric needs to be con nected to an ex ist
ing
non-VXLAN net work infra
struc
ture. The sec ond is the ex ten
sion of Layer 2 broad cast
do
mains between sepa
rate VXLAN Fab rics, referred to as multi-site.
Layer 2 vPC Hand-Off

In this sce
nario, the VXLAN Fabric is connected to an exter
nal Layer 2 network via Eth-
er
net 802.1Q VLAN trunks. This ex ternal Layer 2 net
work will be referred to as a Clas
si
-
cal Ether
net (CE) POD. Layer 2 connectiv
ity can be ex
tended be tween the two en vi
ron-
ments, taking the form of an L2VNI on the VXLAN Fab ric and a tra
di
tional VLAN on the
CE POD.
A vPC border node pair on the VXLAN Fab ric can be used as re
dun
dant Layer 2 gate
way
for the hand-off. In this case, the two en
vi
ronments can be connected via a vPC with
out
in
tro
ducing loops to the extended Layer 2 networks.
Figure: Traffic Flow Between a VXLAN Fabric and a CE POD

In the il
lus
tra
tion above:
• BL1 and BL2 pro

vide Layer 2 bor
der leaf func
tion
al
ity for the VXLAN Fabric
• The bor
der leafs form a back-back vPC to re
dun
dantly con
nect with the CE pod ag
-
gre
ga
tion switches
• The as
sump
tion is that the ag
gre
ga
tion layer switches sup
port vPC
vPC Unicast Communication

With vPC, the pair of bor der leaf switches shares a single vir
tual VTEP IP address and
MAC ad dress. This al
lows both devices to han
dle the for
ward ing and re
ceipt of uni
cast
traffic.
When the VTEPs learn MAC reach a

bil
ity informa tion for devices in the CE POD, they
in
ject this infor
mation into the EVPN Fab ric con trol plane. They as so
ci
ate the vir
tual
VTEP IP ad dress to the end point MAC ad dresses con nected to the CE pod. This en sures
that the other VTEPs within the VXLAN Fab ric receive this information and pro gram it
in their Layer 2 for warding ta
bles. Any leaf in the VXLAN Fab ric can reach re sources
connected to the CE pod by en cap
sulat
ing traffic in VXLAN pack ets destined to the sin
-
gle vir
tual VTEP next-hop ad dress. This im plies traf fic in the underlay Fabric net
work
can be load-bal anced across Equal Cost Mul ti
path (ECMP) paths. In the event of a fail -
ure of a bor der leaf node, mini
mal im pact is ob served given the re dundancy built into
the Fabric. MAC ad dresses learned from the CE POD in the data plane are synched
across the vPC peer link so both bor der leaf switches are ca pable of for
ward ing uni
cast
Layer 2 traffic di
rected to
wards them.
The border leaf switches are aware of the IP and MAC ad dresses of all the endpoints
connected to the VTEPs in the VXLAN Fab ric, so traf
fic re
ceived from the CE POD can
be VXLAN en capsu
lated and for
warded in
side the Fab ric to
wards the des ti
na
tion VTEP.
vPC Multicast Communication

vPC Multicast Communication

For BUM traf
fic han
dling, the bor
der leaf sim
ply floods the packet to the vPC which
sup
ports an Ac tive-Standby model for multi-des ti
na
tion packet for
warding. Only one of
the vPC peers is se lected as the des
ig
nated forwarder on a per group basis and is re -
sponsi
ble for for
warding the BUM traffic to avoid cre
at
ing mul
ti
ple copies of the same
packets.
When a de
vice in the VXLAN Fab
ric sends a mul
ti
cast packet:
• Both vPC bor

der nodes re
ceive the mul
ti
cast traf
fic en
cap
su
lated in VXLAN
• The des
ig
nated for
warder will de
cap
su
late the packet and for
ward it to the CE POD
• The non-des
ig
nated for
warder switch drops the ingress packet
When a de
vice in the CE POD sends a mul
ti
cast packet:
• When the packet reaches the bor

der nodes, one of the vPC peers is des
ig
nated as
the for
warder for that VLAN. That vPC peer will take re
spon
si
bil
ity for en
cap
su
lat
-
ing the mul
ti
cast packet into the VXLAN EVPN Fabric.
Loop Prevention
As Layer 2 is extended out side the VXLAN Fab ric, it is important to re
mem ber that the
bor
der node par tic
i
pates in the VXLAN Fab ric, both from a con trol and data plane per -
spective. VXLAN does not cur rently provide any in tegration with Span ning Tree (STP),
mean ing VXLAN does not for ward BPDUs across the Fab ric. Therefore, es
tab
lish
ing re
-
dundant Layer 2 con nec
tions between the VXLAN Fab ric and the exter
nal network may
re
sult in the cre
ation of a loop as highlighted in the fig ure below.
Figure: Creation of a Layer 2 Loop
In order to have a multi-homed loop-free topol ogy, Cisco rec

om
mends using vPC for
the south
bound connectiv
ity of Edge De
vices as shown below.
Figure: Layer 2 Loop-Free Topology with vPC
Since the bor der nodes are already partic

i
pat
ing in the Layer 3 hand-off, it is a nat ural
choice to lever age them to extend Layer 2 con nectiv
ity out
side the Fab ric. The recom -
men dation is to enable vPC on the bor der node and pro vi
sion par al
lel in
ter
faces be -
tween the bor der node for the Layer 2 VLANs that need to be ex tended. It is im por
tant
to note, that the bor der node does not have to be used for Layer 2 ex tension, and it is
possi
ble to leverage another pair of leaf switches for this func tion. It is not rec om -
mended to lever age the border spine for the Layer 2 con nectiv
ity be
tween lo cations as
the spine de vices should be in
de
pendent nodes.
Integration and Migration
Green field scenar

ios do not require much focus on in te
gra
tion with legacy technolo
-
gies. This sec tion fo
cuses on brown field sce
nar
ios, using the in
for
mation pro
vided in
pre
vious sections of this chapter.
Brownfield data centers typ

i
cally in
te
grate new net
work tech
nolo
gies, VXLAN Fab
rics
are no ex
ceptions, using one of the fol
low
ing methodologies:
• Layer 2 POD expansion: the new VXLAN Fab

ric will be con
nected with the ex
ist
ing
net
work using Layer 2.
• Layer 3 POD addition: the new VXLAN Fab
ric will be con
nected with the ex
ist
ing
net
work using Layer 3
Since the im

ple
menta
tion of the new VXLAN Fab ric as an ad
di
tional Layer 3, POD in the
DC does not typi
cally in
volve migrat
ing work
loads from the ex ist
ing net
work to the
new Fabric. The next section will focus on the first use case, de
ploy
ing a VXLAN Fab
ric
as a Layer 2 ex
ten
sion of an ex
ist
ing network.
Expansion of an Existing POD with a VXLAN Fabric

This sec
tion provides an overview of mi grating to a VXLAN Fab ric from an exist
ing net
-
work that could have been built lever ag
ing vPC, Fab ric
path or tra
ditional STP technolo
-
gies. The VXLAN Fab ric is built as a new POD de ployment and the scope does not cover
conver
sion of the ex ist
ing network de vices into VXLAN nodes. The goal is to pro vide
net
work integra
tion and a path for the mi gration of endpoints and ser vices to the new
VXLAN Fab ric with min i
mal service disrup tion. Once the integration is complete, the
Layer 3 and L4-L7 ser vices could op tionally be migrated to the VXLAN Fabric.
Figure: Layer 2 and Layer 3 Interconnect to Assist Integration and Migration
Layer 2 Interconnect
Using the tech
niques de
scribed ear
lier in this chap
ter, the new VXLAN Fab ric can be in
-
ter
con
nected with the ex ist
ing net
work, lever ag
ing vPC and loop pre vention tech-
niques such as BPDU Guard, Root Guard and storm con trol to de
liver a redun
dant
Layer 2 path be
tween the two environments.
At the VXLAN Fab ric bor

der nodes, the same VLAN IDs need to be used in order to map
Layer 2 seg ments from the exist
ing network to L2VNIs to es
tab
lish Layer 2 connectiv
ity.
Vir
tual ma chines and endpoints can now be seam lessly moved from the ex ist
ing net -
work to the new VXLAN Fab ric with min i
mal im
pact. The endpoints in the VXLAN Fab -
ric will still have Layer 2 con nectiv
ity with the endpoints that have not yet been mi -
grated. The default gate

way for all Layer 2 seg
ments still re
sides in the orig
i
nal net
work
at this point.
Moving Endpoints to the VXLAN Fabric

Once Layer 2 con nectiv
ity between the legacy net work and the new VXLAN Fab ric is op-
er
a
tional, work
loads can be mi grated. Migrat
ing phys
i
cal servers will typ
i
cally re
quire re-
ca
bling and a ser
vice disruption for the server being migrated. On the other hand, vir tual
machines can be migrated over live mi gration with
out any notice
able network impact.
Moving L4-L7 Network Services to the VXLAN Fabric

L4-L7 services appli
ances are es sentially end
points, so they may be mi grated in the
same way as the server work loads. It may be pos si
ble to migrate vir
tual L4-L7 ap pli
-
ances without dis
rup tion de
pend ing on their capa
bil
i
ties and config
u
ra
tion. In the case
of phys
i
cal ap
pli
ances, high availabil
ity fea
tures such as cluster
ing can help to min i
mize
dis
rup
tion and allow for mi gra
tion to the new Fabric.
You can find addi

tional de
tails about how L4-L7 ser
vices can be con
nected to a VXLAN
Fab
ric in the chap
ter Layer 4-Layer 7 Services.
Moving the Default Gateway

Once all end points have been mi grated to the VXLAN Fab ric, it would be sub opti
mal to
still have the de fault gate
way in the ex
ist
ing network. The next step in the mi gration is
to move the de fault gateway to the new Fab ric. The new Fab ric must have an ex ist
ing
Layer 3 up link into the network core to pre serve con nec
tivity to the ex
ist
ing routed
network.
This mi
gra
tion is a four-step process:
1 Dis
able the de
fault gate
way in the ex
ist
ing network
2 Con
fig
ure the gate
way IP ad
dress as a Dis
trib
uted Any
cast Gate
way in the new
VXLAN Fab
ric. By using the MAC ad
dress of the orig
i
nal de
fault gate
way, the end
-
points do not need to re-ARP for the new de
fault gateway
3 En
sure that the sub
net is ad
ver
tised up
stream to the Layer 3 net
work core
4 Re
move the de
fault gate
way and rout
ing con
fig
u
ra
tion from the ex
ist
ing network
Note that al though step 3 can be con fig

ured in ad vance, step 2 needs to be done se-
quentially after step 1. There will be a short out
age from step 1 until the anycast gate
-
way is fully functional in the VXLAN Fab ric and all end
points have learned the MAC ad -
dress for the new gateway.
Re
learning the gate way's MAC ad dress is not re quired if the anycast gate way in the
VXLAN Fab ric can over take the same MAC ad dress that the old default gateway had.
One restric
tion is that, as it has been de scribed in previ
ous chapters, there is a single
MAC ad dress for the whole Fab ric for all Distrib
uted Anycast Gateway, so if the default
gate
ways in the legacy net work had mul ti
ple MAC ad dresses (for example if multi
ple
HSRP or VRRP groups were used), a mi gra tion where the MAC ad dress of the de fault
gate
way stays the same will not be possible.
In case there are still end points in the orig

i
nal net
work, they will have to use the any -
cast gate way in the VXLAN Fab ric to communicate with other net work segments. The
ARP sup pression mech a
nism may cause traf fic black
holing so it should not be en abled
until all endpoints have been mi grated. That is the reason why, in order to reduce the
com plexity of the mi gra
tion, the recommendation is to com pletely migrate all end-
points from the legacy net work to the new VXLAN Fab ric. Once that is done, the whole
VXLAN con fig
ura
tion can be re
moved from the legacy network.
At this point, com muni

cation be
tween sub nets where the de fault gate
way is still in the
legacy network and sub nets whose de fault gate way has already been mi grated to the
VXLAN Fab ric is sub
opti
mal and can po ten tially fol
low asymmet ric paths. There fore, it
is rec
ommended to com plete the migration of all end points and seg ments from the old
network to the new VXLAN Fab ric in a period of time as short as possible.
Layer4-Layer7
Services
Introduction
This chapter provides an overview of Lay

er4-Lay
er7 ser
vices, de
ploy
ment mod
els, a
focus on design and on de
ploy
ment use-cases.
A VXLAN Fab
ric pro
vides Layer 2 and Layer 3 con
nec
tiv
ity; how
ever, ad
di
tional ser
vices
are required in the data cen ter. These services are provided by dedi
cated appli
ances
(physi
cal or vir
tual), and re
quire con
nectiv
ity to the fab
ric. These dedi
cated functions
are referred to as Layer4-Lay
er7 services.
Tradi
tional hi
er
ar
chi
cal net
work designs connect Layer4-Layer7 services at the ag
gre
-
ga
tion layer. Within a VXLAN Fab ric, Lay
er4-Layer7 appli
ances can be con nected to any
leaf switch or connected to a ded
i
cated leaf pair re
ferred to as a “ser
vice leaf".
There are dif

fer
ent con
nec
tiv
ity op
tions for the phys
i
cal and virtual ap
pli
ances. The fol
-
low
ing sec
tion dis
cusses dif
fer
ent options for con
nectivity for the Lay
er4-Lay er7 ser
-
vices devices.
Layer4-Layer7 Device Types

Depending on the re
quire
ments, mul
ti
ple Lay
er4-Layer7 ser
vices may be imple
mented
to pro
vide a complete net
work and ser
vice func
tion stack. These func
tions in
clude the
following:
• State
ful Layer 4 firewalling: Many or
ga
ni
za
tions im
ple
ment net
work se
cu
rity on
dedi
cated firewalls where complex firewall poli
cies are en
forced. The fire
wall poli
-
cies per
mit or deny com mu
ni
ca
tion between differ
ent orga
ni
za
tional or ap
pli
ca
tion
tiers. There are many other func tions fire
walls can per form such as Net work Ad -
dress Trans la
tion (NAT).
• Ap
pli
ca
tion Firewalls: Most at
tack vec
tors today focus on the ap
pli
ca
tion. The at
-
tacks lever
age stan
dard TCP ports to ex ploit appli
ca
tion vulner
a
bil
i
ties. Ex
am ples
in
clude SQL Code In jec
tion or Cross-Site Scripting. Appli
ca
tion-level firewalls can
help pre
vent these types of mod ern day attacks.
• In
tru
sion De
tec
tion (IDS) / In
tru
sion Pre
ven
tion (IPS): The so
lu
tion de
tects at
-
tacks and prevents sys
tems from being com pro
mised. It also pre
vents a compro
-
mised system from origi
nat
ing sus
pi
cious net
work ac
tiv
ity. Ex
amples are net
work
re
connais
sance with ping sweeps and port scans.
• WAN Optimization: The goal of this ser
vice is to im
prove the user ex
pe
ri
ence
through tech niques such as op
ti
miza
tion of the TCP stack, com
pres
sion, and con
-
tent caching.
• Ap
pli
ca
tion De
liv
ery Con
trollers (ADC): The ADC in
cludes server load bal
anc
ing,
SSL offload and other appli
ca
tion function
al
ity. ADCs can be de
ployed by them
-
selves or in tan
dem with other service nodes.
Some Lay er4-Layer7 ap

pli
ance ven dors might in
te
grate sev eral of these above cat
e
-
gories in a sin
gle prod
uct such as FW and IPS. In ad di
tion, another commonly used
term is a ser
vice-chain, when mul ti
ple Lay
er4-Layer7 devices are imple
mented in se
-
quence, such as WAN op ti
miza
tion, FW and ADC.
Deployment Models
In addi
tion to the func
tional
ity of the Layer4-Layer7 ser
vices, an impor
tant fac
tor to
consider is how to deploy the ser vice ap
pli
ances. The fol
lowing sec
tion de
scribes dif
-
fer
ent deploy
ment mod els for Layer4-Layer7 services.
Virtual vs Physical
Layer4-Lay
er7 ser
vices come in dif
fer
ent form factors in
clud
ing physi
cal and vir
tual ap
-
pli
ances. There are cer
tain con
sid
era
tions re
quired for vir
tual ap
pli
ances, in
cluding the
following:
• With vir
tual ap
pli
ances, there is typ
i
cally a vir
tual switch be
tween the phys
i
cal leaf
and the VM host
ing the ser
vices appliance
• Vir
tual ser
vices have dif
fer
ent NIC re
dun
dancy mod
els; these func
tions are pro
-
vided by the hypervisor
The deci
sion whether to use virtual or phys
i
cal ap
pli
ances re
quires ad di
tional con
sid
er
-
a
tions in
cluding that phys
i
cal ap
pli
ances are gen er
ally spe
cial
ized hard ware which of-
fers better perfor
mance than generic x86 plat forms, particu
larly with en cryp
tion
services.
Transparent vs Routed
There are two de ploy
ment models with ser vice appli
ances, trans
parent mode and
routed mode. In trans
par
ent mode, the ser
vice ap
pli
ance is de
ployed as a bump-in-the-
wire and does not change any MAC in for
mation. With transpar
ent mode, a fail safe
mecha
nism needs to be im
ple
mented to prevent Layer 2 data plane loops.
Figure: Layer4-Layer7 Service in Transparent Mode
On the other hand, routed de ployments are not prone to Layer 2 loops be cause they
fol
low IP routing se
man tics. Lay
er4-Layer7 ap
pli
ances inserted in routed mode can par -
tic
i
pate with dy namic rout ing proto
cols. The ben
e
fit of imple
ment ing a dy
namic rout-
ing proto
col is that it al
lows for Route Health Injec
tion (RHI) that influ
ences the ingress
routing path to the ser vices appliance.
Figure: Layer4-Layer7 Service in Routed Mode
One-arm vs Two-arm designs

One-arm vs Two-arm designs

Fire
walls have two or more in ter
faces, an in
ter
nal in
ter
face, and an ex
ter
nal inter
face.
ADC can be con nected in a two-arm or one-arm mode; one-arm mode im plements a
sin
gle logi
cal or phys
i
cal in
ter
face. The ADC typ i
cally im
plements Network Ad dress
Transla
tion (NAT) to en sure that the return traf
fic is sent back to the orig i
nal ADC
appliance.
The fol
low
ing fig
ure il
lus
trates the one-arm de
sign option.
Figure: Layer4-Layer7 service in one-arm mode
Physical Connectivity
Layer4-7 ser
vices have dif
fer
ent con
nec
tiv
ity and re
dun
dancy de
ploy
ment mod
els, as
dis
cussed below.
• No redundancy: one log
i
cal in
ter
face maps to one phys
i
cal in
ter
face, re
sult
ing in a
sin
gle net
work connection
• Re
dun
dancy at the NIC level (port-channel): one log
i
cal in
ter
face maps to mul
ti
ple
phys
i
cal in
ter
faces. These two inter
faces are con
fig
ured as a sin
gle port-chan
nel
con
nected to a sin
gle leaf switch
• Re
dun
dancy at the NIC and switch level (vPC): one log
i
cal in
ter
face maps to mul
ti
-
ple phys
i
cal in
ter
faces. These two in ter
faces are con
fig
ured as a sin
gle port-chan-
nel con
nected to two dif fer
ent leaf switches. The two dif
fer
ent switches are imple
-
mented as a vPC pair.
Figure: Physical Connectivity Options
Redundancy Model
Dif
fer
ent re
dundancy models will have an im
pact on how the net
work will be
have in
case of an Lay
er4-Lay
er7 ap
pli
ance outage:
• No redundancy: This mode is some
times used for non-crit
i
cal en
vi
ron
ments, and is
typ
i
cally de
ployed in con junc
tion with vir
tual Lay
er4-Lay
er7 ap
pli
ances that lever
-
age High Availabil
ity fea
tures of the hypervisor.
• Ac
tive/Standby: Two Lay
er4-Lay
er7 ap
pli
ances are de
ployed, and one of them
handles all traf
fic. When the ac tive device fails, the standby device will become ac-
tive. The net work con verges away from the failed ap pli
ance while the pre vi
ous
standby node be comes ac tive. With the ac tive / standby model, traf fic flows are
deter
minis
tic and this simpli
fies the for
ward ing path through the network.
• Clus
ter
ing (Active/Active): There are two dif
fer
ent mod
els of clus
ter
ing, where all
ser
vices appli
ances are serving the workload. While one model uses the ap
proach
of a local port-chan nel per services ap
pli
ance, the sec
ond model repre
sents the
ser
vices clus
ter as a sin
gle port-channel.
Integration into the VXLAN Fabric

Integration into the VXLAN Fabric

In most cases the Lay er4-Lay er7 appli
ance is seen by the fab ric as an endpoint. This
does not require any ad di
tional con trol plane interaction with the fab
ric. However, in
some cases, the Lay er4-Layer7 vendor has im ple
mented VXLAN en cap
su
la
tion support
to the ser
vice appli
ance. This pro vides the flexi
bil
ity to lever
age VXLAN for data plane
in
te
gra
tion. In this sce
nario, the service appli
ance would act as a VTEP.
When con sid

ering in
te
grating Lay
er4-Lay er7 ser
vice appli
ances into a VXLAN Fab ric,
the implemen tation de
tail needs to align between the two. For ex am ple, if the VXLAN
Fabric is running with a BGP EVPN con trol plane, the ser
vice appli
ance needs to sup -
port this deploy ment model also. Within this book, use of a ser vice appli
ance as a VTEP
is not considered.
Use Cases
There are mul ti

ple options that are pos si
ble to de
ploy Layer 4-Layer 7 ser vices in a
VXLAN Fab ric: phys i
cal sin
gle-arm, vPC-based, ADC de ploy
ment in ac tive/standby
mode, vir
tual ac tive/active fire
walls in routed mode, transparent vir
tual in
trusion pre
-
ven
tion sys
tems, etc. The fol low
ing sections focus on the most fre
quent use cases.
Firewall as Default Gateway

Using the fire
wall as the de
fault gate
way is one of the sim
plest use cases.
In this de
sign, the VXLAN Fabric pro
vides a Layer 2-only ser
vice. All com
mu
ni
cation
that requires cross
ing the Layer 2 demar
cation must be sent to the fire
wall to be
routed.
For example:
vlan 1100
name WEB
vn-segment 30100
vlan 1101
name APPLICATION
vn-segment 30101
vlan 1102
name DATABASE
vn-segment 30102
The firewall will have a log

i
cal Layer 3 in ter
face in each VNI that will serve as the de
-
fault gate
way for all endpoints. Rout ing between IP subnets, rep
re
sented by a VNI, has
to flow through the fire wall. The firewall becomes the Layer 3 gateway for all VNIs for
the VXLAN Fabric.
Figure: Firewall as a Default Gateway with a Layer 2 VXLAN Fabric
For ex
am ple, an ASA fire
wall with four phys
i
cal ports grouped in two log
i
cal port-
channels:
int po10.1100
vlan 1100
nameif WEB
security-level 100
ip address 192.168.110.1 255.255.255.0
int po10.1101
vlan 1101
nameif APPLICATION
security-level 100
ip address 198.168.111.1 255.255.255.0
int po10.1102
vlan 1102
nameif DATABASE
security-level 100
ip address 198.168.112.1 255.255.255.0
int po20
nameif OUTSIDE
security-level 50
ip address 192.168.100.255 255.255.255.0
The firewall becomes the sin gle point for in ter-subnet com mu nication in the fab
ric,
conse
quently, it is important to prop erly size the appli
ance for resilient, per
for
mance,
and scale reasons. When a fail ure occurs in an active/standby de ploy ment, the newly-
ac
tive fire
wall will no
tify the network of the change, nor mally send ing GARP (gra tu
itous
ARP) or RARP (re verse ARP) pack ets. These will trigger the re-learn ing of the MAC ad -
dresses on the ports con nected to the standby firewall.
Transparent Firewall Insertion

Another popular op
tion for deploying firewalls is to trans
par
ently in
sert the fire
wall
into the net
work, between the server's de fault gateway and the server it
self. Some rea
-
sons to use transpar
ent fire
wall in
sertion include:
• Abil
ity to add fire
wall ser
vices with
out chang
ing ex
ist
ing IP ad
dress
ing of the
servers
• Mul
ti
cast streams can eas
ily tra
verse the firewall
• Non-IP traf
fic can be for
warded via the firewall
• Pro
to
cols such as HSRP and VRPP can pass through the firewall
• Rout
ing Pro
to
cols can es
tab
lish ad
ja
cen
cies through the firewall
From a log
i
cal standpoint, the fab
ric is the default gate
way for the servers. For exam
ple,
the servers are deployed in the 192.168.100.
0/ 24 subnet and the VXLAN Fab ric any
cast
gate
way is config
ured as the server's de fault gate
way of 192.168.100.1.
The firewall needs to be inserted trans par

ently into the datapath. In
stead of the servers
being de ployed in the same VLAN/VNI as the de fault gateway, the servers will be con-
fig
ured in a differ
ent VLAN/VNI. For ex ample, the default gate
way resides in VLAN 100
(unprotected), while the servers are being placed in VLAN 1100 (pro tected). The fire
wall
in transparent mode is stitch ing both VLAN/VNI to gether, mean ing the fire
wall is in
the data path between VLAN 100 and VLAN 1100. When ever a server re quires reaching
the default gateway, the traf
fic has to pass the firewall.
Figure: Transparent Firewall
The firewall enforces the se

cu
rity poli
cies applied for data passing between the pro -
tected and un protected VLANs and main tains the ap propri
ate forwarding between
them. This de sign can be used to deploy a micro-segmenta
tion service in
side of a sub
-
net for servers that might have been com promised. As an ex ample, in
stead of read-
dressing the servers you can dy nami
cally move them be hind a fire
wall and isolate the
in
fected hosts from the rest of the fabric.
Ex
am
ple:
vlan 100
name UnProtected-SVI
vn-segment 30000
vlan 1100
name Protected-VLAN
vn-segment 31000
interface Vlan100
no shutdown
vrf member Tenant-1
no ip redirects
ip address 192.168.100.1/24 tag 21921
fabric forwarding mode anycast-gateway
In this con
figu
ra
tion, VLAN 100 (unpro
tected) is the out
side in
ter
face and VLAN 1100
(pro
tected) is the in
side interface.
The fire
wall con
fig
u
ra
tion to stitch VLAN 100 to VLAN 1100 would be as follows:
firewall transparent
int po10.100
vlan 100
nameif sviVLAN
bridge-group 1
security-level 0
int po10.1100
vlan 1100
nameif serverVLAN
bridge-group 1
security-level 100
Integrating Layer 3 Firewall - Multi-Tenancy

Integrating Layer 3 Firewall - Multi-Tenancy

A com mon require
ment is to provide se
curity pol
icy for in
ter-ten ant traf
fic and for ac
-
cessing shared-services in a ded
i
cated VRF. As we have seen, the VXLAN Fab ric pro
-
vides multi-tenancy through MP-BGP and VRF tech nologies. Multi-ten ant com mu ni
ca
-
tion is routed throughout the VXLAN Fab ric and tenant isola
tion is maintained.
A Layer 3 firewall in

volves sep a
rat
ing differ
ent se
cu
rity zones using different sub
nets.
The firewall routes traffic be
tween sub nets and applies the fire
wall rules.When inte
-
grating Layer 3 firewall in a VXLAN EVPN Fab ric using Distrib
uted Any cast Gate
way,
each of these zones must cor re
spond to a VRF on the fab ric. The traf
fic within a VRF
will be routed by the fabric and traffic be
tween the VRFs will be routed by the firewall.
Figure: Layer 3 Firewall Traffic Flow
The example below shows a con

fig
u
ra
tion snip
pet from VTEP A run
ning OSPF with the
firewall.
SVIs are defined on VTEP for both IN SIDE-VRF and OUT SIDE-VRF and the VTEP will
peer with a fire
wall on each of these VRF to dy
nam
i
cally learn rout
ing in
for
ma
tion to go
from one VRF to the other.
FIREWALL Configuration:
int po10.3001
vlan 3001
nameif OUTSIDE
security-level 50
ip address 10.30.1.2 255.255.255.252
int po10.3002&#; vlan 3002&#; nameif INSIDE&#; security-level

100&#; ip address 10.30.2.2 255.255.255.252&#; router ospf 1
network 10.30.1.0 255.255.255.0 area 0
network 10.30.2.0 255.255.255.0 area 0
VTEP A Configuration
interface VLAN 3001

description outside_vlan
vrf member OUTSIDE-VRF
ip address 10.30.1.1/30
ip router ospf 1 area 0
interface VLAN 3002

description inside_vlan
VRF member Tenant-1
ip address 10.30.2.1/30
ip router ospf 1 area 0
router bgp 65500

vrf OUTSIDE-VRF
redistribute ospf 1 route-map OSPF_OUT

vrf Tenant-1
address-family ipv4 unciast
redistribute ospf 1 route-map OSPF_TENANT1
Inspecting these routes on VTEP 1
show ip route ospf-1 vrf OUTSIDE-VRF

IP Route Table for VRF "OUTSIDE-VRF"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.100.0/24, ubest/mbest: 1/0

*via 10.30.1.2 Vlan3001, [110/41], 1w5d, ospf-1, intra
The OSPF routes are ad ver

tised by the VTEP into the VXLAN Fab ric. All other VTEPs
will import these routes in each VRF, pointing to VTEP A as the next hop. The ex am
ple
below shows the rout ing table on VTEP 1. VTEP A's IP ad
dress 10.
30.
1.
2 (OUT SIDE-VRF)
is the next hop.
VTEP1# show ip route 192.168.100.0/24 vrf OUTSIDE-VRF

IP Route Table for VRF "OUTSIDE-VRF"
192.168.100.0/24 ubest/mbest: 1/0

*via 10.30.1.2%default, [200/41], 1w1d, bgp-65500, internal, tag 65500
(evpn) segid: 55555 tunnelid: 0xa010112 encap: VXLAN
Traf
fic from VTEP 1 will be en capsu
lated to
wards VTEP A, de cap
sulated and sent to the
fire
wall. The firewall enforces the policy and sends the traf
fic back to VTEP A on the IN-
SIDE-VRF. VTEP A will en capsu
late the traf
fic and send it to the des ti
na
tion VTEP 2
where traffic is de
cap sulated and sent to the endpoint.
Firewall Failover
When the ac tive fire
wall fails and the standby fire
wall takes over, routes are with
drawn
from ser vices VTEP A. As the pre vi
ous standby be
comes ac tive, routes are now adver
-
tised to the fabric through ser vices VTEP B.
If it is not de
sir
able to run a dy
namic rout
ing pro
to
col on the fire
wall, there is a need
for sta
tic routes point ing to the fire
wall as next hop. It is crit
i
cal to en
sure that only the
VTEP serv ing the ac
tive fire
wall is ad
vertis
ing the sta
tic route.
The first way to ac com plish this task is to track ac
tive fire
wall reacha
bil
ity by val
i
dat
ing
it is lo
cally learned via HMM (Host Mo bil
ity Manager). The sec ond approach is to con -
figure the sta tic route at all the com pute VTEPs in stead of the services VTEPs. Both ap -
proaches are in troduced to en sure that only the route to wards the ser vice VTEP with
the ac tive firewall is used.
The ap proach using HMM track ing ensures that if the ac tive fire
wall is con
nected to
VTEP A, only VTEP A will have and ad ver
tise the static route. VTEP A will track how the
sta
tic route's next hop (firewall IP) is learned. Only if the next hop is learned as an HMM
route (directly con nected), VTEP A will ad vertise the static route through re dis
tri
b
u
-
tion. If the active fire
wall fails and the standby takes over, VTEP A starts to learn the
next hop IP through BGP and VTEP B starts to know the fire wall’s IP address as next
hop through HMM. VTEP A will then with draw the tracked routes and VTEP B starts ad -
ver
tis
ing its routes into the fabric.
For example:
VRF context Tenant-1

ip route 0.0.0.0/0 10.30.2.2 track 10
track 10 ip route 0.0.0.0/0 reachability hmm
vrf member Tenant-1
VTEPA# show track 10

Track 10
IP Route 0.0.0.0/0 Reachability
Reachability is UP
VTEPA# show ip route 0.0.0.0 vrf Tenant-1

IP Route Table for VRF "Tenant-1"
0.0.0.0/0, ubest/mbest: 1/0

*via 10.30.2.2 [1/0], 00:00:08, static
Firewall Failure on VTEP A caused the track to go down causing VTEP A to

withdraw the static route
VTEPA# show track 10

Track 10
IP Route 0.0.0.0/0 Reachability
Reachability is DOWN
VTEPA# show ip route 0.0.0.0 vrf Tenant-1

Route not found

In this case, where the static route is con fig

ured in the com pute at
tached VTEPs (VTEP
1 and VTEP 2), no ad di
tional config
u
ration is neces
sary as recur
sive route lookup will
ensure that the static route is only active if the next hop is reach able. Only the VTEP
with the active fire
wall will ad
ver
tise the firewall IP. This ap
proach en sures that traf
fic
will only be routed towards the VTEP with the ac tive firewall.
VRF context Tenant-1

ip route 0.0.0.0/0 10.30.2.2
VTEP1# show ip route 0.0.0.0 vrf Tenant-1

0.0.0.0/0, ubest/mbest: 1/0

*via 10.30.2.2 [1/0], 00:00:08, static
VTEP1# show ip route 10.30.2.2/32 vrf Tenant-1

10.30.2.2/32 ubest/mbest: 1/0

*via 10.254.254.111%default, [200/41], 1w1d, bgp-65500, internal, tag
65500 (evpn) segid: 50000 tunnelid: 0xa010112 encap: VXLAN
Firewall Failure on VTEP A (10.254.254.111) caused the recursive lookup to

change toward VTEP B (10.254.254.112)
VTEP1# show ip route 10.30.2.2/32 vrf Tenant-1

10.30.2.2/32 ubest/mbest: 1/0

*via 10.254.254.112%default, [200/41], 00:00:01, bgp-65500, internal, tag
65500 (evpn) segid: 50000 tunnelid: 0xa010112 encap: VXLAN
Integrating Application Delivery Controllers

ADC is another cat
e
gory of net
work service that many ap
pli
ca
tions re
quire. The in
tro
-
duc
tion described the dif
fer
ent de
ploy
ment modes to bring an ADC into a net work.
This sec
tion fo
cuses on the one-arm design with source NAT.
In the one-arm de

ploy
ment model, the traf
fic flow is as follows:
• Client traf
fic en
ters the in
ter
face pre
sent
ing the vir
tual IP ad
dress (VIP)
• ADC de
cides which real server to send the re
quest to
• ADC then trans
lates the des
ti
na
tion ad
dress, which was pre
vi
ously the VIP, with the
IP ad
dress of the real server.
• The re
quest to
wards the real server is ex
it
ing the same in
ter
face as the client re
-
quest came from
• The source IP ad
dress is trans
lated via source NAT
• The real server will see the ADC IP ad
dress as the source IP
The ADC is con nected to a ser

vice VTEP or a pair of ser vice VTEPs with vPC. ADCs are
com monly deployed as a High-Availabil
ity pair. Within the HA pair, the ac
tive ADC ad
-
ver
tises the VIP to the ser
vice VTEP. This can be achieved by sim ple MAC/IP learn ing
ad
ver
tised in the VXLAN Fab
ric as an EVPN Type-2 route. Al
ter
na
tively, the ADC can be
im
ple
mented with a dy
namic rout
ing pro
to
col and ad
ver
tise the VIP as an EVPN Type-5
route.
Figure: ADC Traffic Flow
Traf
• Client traf
fic will be en
cap
su
lated by VTEP 1 to
wards ser
vices VTEP A
• VTEP A de
cap
su
lates and sends the traf
fic to the ac
tive ADC
• The ADC sends the traf
fic des
tined to the real server back to ser
vices VTEP A
• VTEP A en
cap
su
lates and sends the traf
fic to the des
ti
na
tion VTEP 2
• Traf
fic gets de
cap
su
lated at VTEP 2 and sent to the real server
• The re
sponse back from the real server is sent back to the ADC, since the ADC per
-
formed source NAT
Application Delivery Controller Failover

When the ac tive ADC fails and standby one takes over, routes are withdrawn from the
ser
vices VTEP A. The newly ac tive ADC ad
vertises the VIP on ser
vice VTEP B, the same
way as the previ
ous ac
tive ADC has done it on VTEP A. Service VTEP B is now respon
si
-
ble for re
quests to
wards the ADC VIP.
Integrating Service Chaining

Typi
cally, or
ga
ni
za
tions find that Layer 4-Layer 7 ser vices are not inserted indi
vidu
ally,
but as a chain. For example, a cer
tain appli
ca
tion might re quire se
cu
rity services from a
fire
wall, and load bal ancing services from an Ap pli
cation Deliv
ery Con troller. Tying
these net work ser
vices together is com monly referred to as a service chain.
Con sid
er
a
tion needs to be given to place ment of the de
vices so traffic does not take ex -
cessive hops across the fab ric when going between the fire
wall and the load bal ancer. It
is com mon to have mul ti
ple fire
walls and ADCs con nected to a ded i
cated pair of
switches as service nodes. The place ment of the appli
ances in the VXLAN Fab ric is con
-
sol
i
dated to a pair of ser
vices under a service node pair. Traf
• Traf
fic will be VXLAN-en
cap
su
lated from the client VTEP 1 to
wards the ser
vices
VTEP A.
• The ser
vice VTEP re
spon
si
ble for the ac
tive fire
wall de
cap
su
lates and sends the
traf
fic to the ac
tive firewall.
• The fire
wall then sends the traf
fic to
wards the ADC's VIP ad
dress. This is done with
the assumption that the fire
wall and the ADC are con nected to the same ser
vice
VTEP. If fire
wall and ADC are on dif fer
ent VTEPs, traf
fic will be VXLAN-en
cap
su
-
lated to
wards the service VTEP hosting the ADC.
• ADC then sends the traf
fic des
tined to the real server back to the ser
vices VTEP,
which en
cap
su
lates and sends it to the des
ti
na
tion VTEP 2.
• Traf
fic gets de
cap
su
lated at VTEP 2 and sent to the real server.
• The re
sponse back from the real server is sent back to the ADC as the ADC is using
source NAT. With the usage of source NAT, the X-For warded-For HTTP header
field is going to be in serted to pre serve client IP ad
dress vis
i
bil
ity. Sub
se
quently,
the traffic will be in
spected by the fire
wall on its way back to the client.
The di
a
gram below shows a log
i
cal rep
re
sen
ta
tion of a ser
vice chain.
Figure: Layer 4-Layer 7 Service Chain
The di
agram below shows a phys i
cal rep
re
sen
ta
tion of VXLAN Fabric with a dedi
cated
ser
vice VTEP pair. Fire
walls and ADCs are com monly con nected to the ser
vices VTEPs.
This can be achieved with or with
out vPC (vPC shown in diagram).
Figure: VXLAN Fabric with Service Leaf

To avoid addi
tional en
cap
su
la
tions and de
capsu
la
tions, affin
ity can be cre
ated be
tween
the ac
tive fire
wall and the active ADC, and they can be placed on the same ser vices
VTEPs.
Multi-POD &
Multi-Site
Designs
Introduction
In an increas
ingly compet i
tive, globally con nected business en
vironment, or gani
za
tions
are faced with enor mous pres sures to en sure con tin
u
ous avail
ability of crit
i
cal business
appli
ca
tions. With dig i
tal strate gies dri ving innov
a
tive new busi ness op por tu
ni
ties,
these or ga
ni
za
tions are look ing for IT infrastruc
tures that offer the agility, per formance
and avail
abil
ity re
quired to sup
port these new ap
pli
ca
tion infrastructures.
When build ing the IT infra

structure to sup
port these business crit
i
cal envi
ron
ments,
today's data cen ter deploy
ments re quire ge
o
graph
i
cal di
ver
sity and scale, ensur
ing the
abil
ity to de
liver rapid scale, high per
for
mance and "al
ways on" availability.
As a conse
quence, data center networks are building built as scal
able, highly avail
able
net
work fabrics which are dis
trib
uted across mul ti
ple data centers, whether sepa
rated
within or across a metro area, or across the globe.
This chapter presents dif

ferent de
ploy
ment op tions for the in
ter
con
nection of VXLAN
Fab
rics, dis
tin
guish
ing between the multi-POD and multi-site ap proaches based on the
spe
cific needs for scala
bil
ity and avail
abil
ity and the exist
ing phys
i
cal and oper
a
tional
constraints.
Fundamentals
A Point of Deliv
ery (POD) is a network build
ing block which can eas ily be repli
cated
within a data center. The pre
dictable and ho
mogeneous char ac
ter
is
tics of a POD pro -
vide self-con
tainment and a pre-as signed scale and per for
mance re quire ment (POD
plan
ning). The archi
tec
ture of a POD should be mod u
lar to allow for it to be replicated
and in
ter
con
nected, keep
ing a ho
mo
ge
neous design.
Figure: 3 Tier Architecture
In classic hi
er
ar
chi
cal net
work de sign, the POD is formed by the Ac cess and Aggrega
-
tion Layer, where the Ag gre
gation Layer pro vided the Layer 2 de mar
ca
tion. Layer 2
traf
fic is ter
mi
nated and routed across the Core to reach other PODs or ex ter
nal net
-
works. With the de marca
tion at the Aggre
gation Layer, a Layer 2 VLAN or an IP Subnet
is lo
calized within a sin
gle-POD, there
fore Layer 2 commu nica
tion between PODs is not
pos si
ble. As a con
se
quence, host mobil
ity across PODs is dif
fi
cult to implement.
Figure: 2-Stage Clos Architecture
With the evo lu

tion of the hierar
chical Core/Ag gre
gation/Ac cess de sign into spine-leaf
topologies, the loca
tion of func tions and in terconnectiv
ity within a POD shifts. Sim ply
migrat
ing the topol ogy from a hi er
ar
chical design to spine-leaf does not bring about a
change in func tions, how ever, the ad di
tion of over lays introduces greater ver satil
ity.
With Inte
grated Route and Bridg ing (IRB) and VXLAN, the leaf not only pro vides the de -
fault gate
way but also a Layer 2 bridg ing service to other leaf switches. With this ap -
proach, it is possi
ble to ex
tend Layer 2 ser vices beyond a sin gle-POD by using an over -
lay with end-to-end en cap
sula
tion. When struc tur
ing mul ti
ple PODs and en abling ex -
tended Layer 2 and Layer 3 ser vices, use cases such as host mo bil
ity are now eas ier to
implement.
Figure: Interconnecting Two Clos Networks
The in
ter
connec
tion within a multi-POD site can be achieved in var i
ous ways. Spines
can be in
ter
con
nected back to back, an ad
di
tional su
per-spine layer can be in
tro
duced,
or PODs can be inter
con
nected at des
ig
nated leaf switches.
When mul ti

ple physi
cal lo
ca
tions are present, multi-site de
signs come into con sid
er
a
-
tion. A site defines a set of PODs (multi-POD) which share the same do main con structs,
provid
ing the same set of Layer 2 and Layer 3 seg ments at a given phys i
cal lo
ca
tion. As
a re
sult, within a given site the end-to-end over lay en
capsu
la
tion starts at a leaf in one
POD and can ex tend to a leaf in an
other POD.
Multi-POD de signs can be stretched across phys i

cal lo
ca
tions, how ever, this is not the
rec
ommendation given the high avail
abil
ity re
quire
ments for ge ographi
cally dis
persed
data cen
ters. Fur
ther de
sign as
pects are covered through out this chapter.
In a multi-site de sign, the most sig nif

i
cant aspects to consider are how to con nect the
sites with each other at the con trol and data plane lev els. When con sid
er
ing north-
south con nectiv
ity, the first op
tion in a multi-POD de sign is to con
sol
i
date all ex
ter
nal
connec
tiv
ity into a sin gle point of ac cess. Al
ter
nately, sin
gle points of access for each
in
di
vid
ual POD for dis trib
uted ingress and egress for warding can be defined.
The east-west com muni

ca
tion has to solve the chal lenge of con necting sites to each
other al
lowing workload mo bil
ity, but at the same time isolat
ing the sites so that they
are in
de
pen dent from each other from a busi ness con
ti
nu
ity perspec
tive. A fault in one
site should not propa
gate to the other.
Why Deploy Multiple PODs?

The op ti
mal way to ef fi
ciently scale a sys
tem is through mod u
lar
ity. Any mono lithic ar
-
chi
tecture will only grow to a cer tain point, after which in ef
fi
ciencies will appear. A
data cen ter is an exam ple of a sys
tem that requires a flex
i
ble way to scale the net work
in
fra
structure. Fre quently a data cen ter build-out starts in a single room and later ex -
pands across mul ti
ple rooms.
Beside scale, phys i

cal fa
cil
ity and in frastructure lay
outs can be an other mo ti
vation for
multi-POD de signs. Multi-POD de signs fit very well in situ
ations where a phys i
cal lo
ca-
tion is parti
tioned across mul ti
ple rooms with lim ited cabling, but main taining end-to-
end Layer 2 and Layer 3 con nectiv
ity is still re
quired. Any ser vice within one POD can
be made avail able to any other POD within this multi-POD topol ogy. As an ex am ple,
con sider a high avail abil
ity (HA) clus ter being de ployed at a sin gle phys i
cal location but
spread across dif fer
ent rooms due to the site's local HA ca pa
bil
i
ties (different Power
Distri
b
u
tion Unit - PDU, Un in
ter
rupt ible Power Sup ply - UPS etc.).
Why Deploy Multiple Sites?

Mod ern data cen ter en
vi
ronments must meet the needs for high avail abil
ity within the
data cen ter and across geo
graph i
cally-distributed data cen ter infra
structure. This type
of dis
tributed ar
chi
tec
ture offers multi
ple ben e
fits for highly available ap
pli
cation de
liv
-
ery. Applica
tions can be delivered in an ac tive/ac tive or active/standby de ploy
ment
model and form the foun da
tion for an effective busi ness conti
nuity or dis
aster re
cov
ery
strategy.
There are many fac tors which deter

mine the ap plic
a
bil
ity and design of the multi-site
data center envi
ron ment includ
ing physi
cal con
straints such as site lo ca
tion and re-
quire
ments for ge o
graphi
cal di
ver
sity. Other con sid
era
tions in
clude band width and
ser
vice avail
abil
ity for in
fra
struc
ture such as dark fiber or wave length service, and la
-
tency which may im

pact ap
pli
ca
tion per
for
mance. These fac
tors de
ter
mine the Re
cov
-
ery Point Ob jec
tive (RPO) and Re
cov
ery Time Ob
jec
tive (RTO) for ap
pli
ca
tion
availability.
In contrast to a sin
gle site de
ployment a net working solu
tion for mul
ti
ple sites must
also address the need to main tain a level of sep
a
ra
tion. Any event whether planned or
un
planned im pact
ing one site should not spread to any other site as it would im
pact
over
all ap
pli
ca
tion availability.
When de ploying a network infra

structure based on VXLAN EVPN, the con sis
tent deliv
-
ery of Layer 2, Layer 3 and IP mul ti
cast ser
vices must be main tained. Together, these
allow for the de liv
ery of dis
tributed appli
ca
tion ar
chitec
tures and geograph i
cally-dis
-
persed clus tered infra
struc
ture to sup port highly avail
able stor
age ac
cess and com pute
virtualization.
De
sign cri
te
ria to be con
sid
ered for such de
ploy
ments include:
• Phys
i
cal Connectivity: In many cases, given the con
straints out
lined above, the
avail
ability of connectiv
ity services may be lim ited. As an exam ple, dark fiber or
wave length ser vices availabil
ity may be limited or cost-prohibi
tive over large dis-
tances, whereas a routed Layer 3 or MPLS ser vice may be read ily avail
able at an
achiev able price point. The de sign must take into con sid
er
a
tion the need to allow
for mul ti
ple connection types rang ing from high band width dark fiber through to
bandwidth-con strained service provider-de liv
ered Layer 3 services.
• Fault Isolation: When con
nect
ing mul
ti
ple dis
crete net
work en
vi
ron
ments to
-
gether, the risk of a failure event prop a
gat
ing be
tween sites increases signif
i
cantly
unless controls are applied to restrict the control plane and data plane activ
ity. Ex
-
am ples in
clude selec
tion and con fig
u
ration of control plane pro
to
cols such as BGP,
and the con trol or re stric
tion of data plane ac tiv
ity such as ARP sup pres
-
sion/spoof ing and storm control.
Based on these cri

te
ria, the multi-site so
lu
tion must de
liver the ap
pro
pri
ate set of fea-
tures and func tion
al
ity required to meet the spe cific demands of a par tic
ular
deployment.
In sub
se
quent chap ters the op
tions for multi-POD and multi-site de ploy
ment are ex -
plored fur
ther, in
cluding back-to-back vPC, OTV, and PBB-EVPN for a com prehensive
DCI so
lu
tion in order to maintain con
trol plane and data plane iso
la
tion and at the same
time pro
vide work load mobility.
Multi-POD Design
With a multi-POD de sign, mul

ti
ple data cen
ter PODs are in ter
connected using redun-
dant Layer 3 paths and run the same VXLAN EVPN con trol plane. Each POD can have its
own Clos Fabric ar
chi
tec
ture with in
de
pendent spine and leaf layers. Phys
i
cal con
nec-
tiv
ity be
tween PODs can be es tab
lished by in
ter
con
necting them on ei ther the leaf or
spine layer. This ses
sion dis
cusses dif
fer
ent op
tions in the multi-POD fab
ric design.
Placement of Inter-POD Connecting Points

The phys
i
cal connec
tions be
tween PODs pro vide the Layer 3 path for the EVPN con trol
and data planes, as a part of the un der
lay IP transport network. There fore, the in
ter-
POD links do not bear any spe cial func
tional requirements for VXLAN EVPN. The min i
-
mum requirements for in
ter-POD links are Layer 3 IP uni cast and mul ti
cast rout
ing in
the un
der
lay. Place
ment of the in ter-POD con necting points is flex
i
ble as it can be ei-
ther on a leaf node or a spine node. Con sider
a
tion around the link speeds, the optic
types, and/or the ca
bling plan could be dri
ving factors for the de
ci
sion to in
ter
con
nect
PODs on leaf or spine nodes. Fig
ures below il
lus
trate the topologies options.
Figure: Multi-POD Topology with Inter-Connections on Leaf Nodes or Spine Nodes

To achieve high avail

abil
ity for in
ter-POD con
nec
tiv
ity, the rec
om
men
da
tion is to
lever
age re
dundant in
ter-POD paths using two or more de vices in each POD. Since the
in
ter-POD connections only need to provide Layer 3 con
nectiv
ity, the re
dun
dant in
ter
-
connect
ing de
vices do not need to be in vPC pair.
Scale Multi-POD with Multi-Stage Clos Architecture

When the num ber of PODs in creases, sim
ple yet scal
able multi-POD de sign be
comes an
im
por tant deci
sion point. Fol
lowing the n-stage data center fab
ric de
sign prin
ci
ple, one
de
sign op tion to connect mul ti
ple data cen
ter PODs is to in
troduce a super spine layer
that inter
connects the spine layer of each POD.
This essen tially builds a multi-stage hi er

ar
chical fab
ric topology. MP-BGP EVPN is run -
ning be tween the Fab ric nodes to dis trib
ute the VXLAN EVPN routes. This multi-stage
fabric design with a su per-spine layer sim pli
fies the interconnec tion topology among
PODs, mak ing it eas
ier to scale the num ber of PODs. It is the most ef fi
cient way of pro-
viding con sistent forward ing hop counts for in ter-POD traf fic. If un der
lay multi
cast
replica
tion is used to trans port the VXLAN Fab ric BUM traf fic, a multi-stage Clos Fab ric
design also helps re duce the num ber of Mul ti
cast Out put In terfaces (OIF) required.
Since most switch plat forms sup port a limited num ber of mul ti
cast OIFs, the su per-
spine will allow the VXLAN Fab ric to scale without ex ceeding the max i
mum num ber of
OIFs sup ported on a sin gle spine device.
Figure: Multi-Stage Fabric

Scalability
When de
sign
ing for con
trol plane scale for an in
ter-POD Fab
ric, plat
form OIF, mul
ti
cast
groups, and VTEPs need to be con sid
ered in addition to host MAC and MAC/IP. It is
im
por
tant to look at the hard
ware ver
i
fied scal
a
bil
ity guidelines.
For ex ample, in a simple multi-POD sce nario, if the spine supports 256 OIF, then sub -
tract 2 OIF for the up link to
wards the L3 core, leav ing 254 OIFs for southbound con nec
-
tiv
ity to the leafs in the vPC domains. This would give 254 leaves, or 127 vPC do mains, to
con nect south bound if each leaf in the vPC domain has a single link to each spine in the
POD.
Looking closer at the above ex ample, both vPC VTEP switches in dependently send the
IP PIM regis
ter to the Rendezvous Point for the mul ti
cast group of the VXLAN VNI. Both
source the reg is
ter packets from the any cast VTEP ad dress and each installs the cor
re
-
sponding (*, G) entry in their mul ti
cast rout
ing ta
bles with the VTEP inter
face (NVE1) in
the out
put interface (OIF) list.
In addi
tion, consid
er
a
tion needs to be given to host MAC and IP scale per leaf. A leaf
will learn all BGP routes across the multi-POD en vi
ronment but will not program the
hard ware tables Forwarding Infor
mation Base / Routing Infor
ma
tion Base (FIB/RIB)
unless the leaf needs to know about them. If the leaf knows about the VRF and is im -
porting the route-tar gets it will pro
gram the RIB for the MAC/IP routes. In ad di
tion,
the leaf only programs the FIB with the MAC ad dress of the VNIs of the VRFs it has lo
-
cally defined.
Building the Overlay

Building the Overlay
Data Plane Operation
Like a single-POD, the tun nels run be

tween VTEP de vices in a multi-POD fabric. In a
multi-POD fab ric the tun nel headend VTEPs can re side in dif
fer
ent PODs if the traffic
tra
verses the POD bound ary, which re
sults in an in
ter-POD tun nel. VXLAN encapsula
-
tion and de cap
sula
tion only take places on the ingress and egress VTEPs. The other de -
vices along the for ward ing path only need to route the en cap
sulated VXLAN pack ets.
This provides very effi
cient end-to-end, single-tunnel over
lay data plane processing.
Figure: Data Plane Operation
IP Gateway Localization
In net
works with out Distrib
uted Any cast Gate way the de fault gate way is made re dun -
dant through the use of a First Hop Re dun
dancy Pro to
col (FHRP). When a net work seg -
ment spans across mul tiple physi
cal lo
ca
tions, the same con cept can force all traf fic
through sin gle VTEP. Al terna
tively you can pro vide local
ization by hav ing an active in-
stance of the de fault gateway in each lo ca
tion. Using lo calization provides an more op -
ti
mal for
ward ing path be tween sub nets within the same lo cation. If ap
pli
ca
tion work -
load mobil
ity is re
quired be tween locations, it is important to main tain the same de fault
gate
way IP and MAC ad dress. With gate way lo caliza
tion, end points do not need to re-
learn these information at the new location.
Multi-POD VXLAN Fab ric with Dis

trib
uted Any
cast Gate
way makes it easy to meet this
re
quire
ment. Simi
lar to sin
gle-POD, all the VTEPs serv
ing the same IP sub
net can use
the Dis
trib
uted Anycast Gateway.
Figure: Gateway Localization with Distributed Anycast Gateway
In the il
lus
tra
tion above, the VTEP leafs in blue are Dis
trib
uted Any
cast Gate
way for
Layer 2 VNI "Blue" while the VTEP leafs in green are Distrib
uted Any
cast Gate
way for
Layer 2 VNI "Green".
Control Plane Operation

EVPN MP-BGP con trol pro
to
col runs through out multi-PODs the same way as it does
within a single-POD. The fab ric nodes run ning EVPN ex change MP-BGP EVPN routes
with one an other. Each VTEP de vice detects its local endpoints and in stalls HMM
routes for end point tracking. The HMM routes are au to
mati
cally in
jected into MP-BGP
EVPN ad dress-fam ily and dis trib
uted to other EVPN nodes as EVPN type-2 routes.
Upon receiv ing the EVPN routes, the rest of the VTEP de vices will in
stall the endpoint
reacha
bil
ity infor
mation into their L2 RIB and L3 RIB ta bles. Fur ther program ming of
the hardware for warding ta
bles, includ
ing the MAC-ad dress table and host/LPM for -
warding tables based on the RIB in for
mation will happen if they pos sess the cor re
-
sponding L2VNI and L3VNI information.
Placement of MP-BGP EVPN Peering

In a VXLAN Fab ric the same EVPN con trol plane runs through out the entire envi
ron-
ment. Within a POD, EVPN ses sions are formed be tween leaf and spine nodes. Be tween
PODs, EVPN peer ing does not nec es
sar
ily need to co in
cide with the physi
cal connec-
tion topol
ogy. The fol
lowing drawing depicts the avail
able de
signs in which EVPN MP-
BGP peering occurs be
tween the con nected leafs or be tween the spine nodes of dif fer
-
ent PODs.
Often, switch hard ware platforms with more con trol plane capacity and higher band -
width are cho sen for the spine layer. Also, due to their cen tral
ized loca
tion in a POD,
the spine nodes are often cho sen as the control point for MP-BGP EVPN route dis tri
b
u
-
tion. For ex
ample, in a MP-iBGP Fab ric, the spine nodes are often cho sen to be the iBGP
route reflec
tors. In this case, peer
ing on the spine nodes be tween PODs can take ad -
vantage of the more scal able con
trol plane and the com plete set of EVPN rout ing in
for
-
mation on the spine nodes.
MP-iBGP vs MP-eBGP
MP-BGP EVPN dis trib
utes the Layer 2 and Layer 3 reach a
bil
ity in
for
mation for the
VXLAN over lay network. It sup
ports both iBGP and eBGP topol ogy, which pro vides the
design flex
i
bil
ity to run MP-BGP in a multi-POD en vi
ronment. It is not within the scope
of this book to doc u
ment all the pos
si
ble combi
nations of iBGP and/or eBGP de signs in
a multi-POD Fab ric. The com mon practice designs will be discussed to illus
trate the
design principles.
The Figure below de scribes a com mon multi-POD de sign in which each POD runs MP-
iBGP EVPN be tween leafs and spines whereas MP-eBGP EVPN is used to in ter
con
nect
the PODs. The draw ing does not in di
cate any physi
cal topol
ogy for connect
ing mul
ti
ple
PODs to gether rather, it de picts the peering topol
ogy. Con cep
tu
ally, the Route Re
flec-
tors (RR) of dif
fer
ent PODs are ex changing EVPN routes via MP-eBGP so that reach abil
-
ity in
for
mation can be ex tented from one POD to another.
Figure: eBGP Peering Among PODs
With this design, BGP peer ing among mul ti

ple PODs is simple. EVPN routes can be dis -
trib
uted among PODs through MP-eBGP peer ing without the need for addi
tional con-
fig
u
ra
tion. Addi
tional con
sider
a
tions need to be given to how to preserve the at
tributes
in an EVPN route when it is dis trib
uted within the Fab
ric as eBGP default be
havior may
cause some of the at trib
utes to be overwritten:
• By de
fault, a router over
writes the next-hop in the route to it
self when send
ing a
route to its eBGP peers.
• If each AS gen
er
ates EVPN route-tar
gets (RT) au
to
mat
i
cally, they may end up hav
-
ing dif
fer
ent RTs for the same L2VNI or L3VNI as often the auto-RT func tion uses
the BGP AS num ber as one of the el
e
ments to derive EVPN RTs. So ad di
tional cau-
tion needs to be applied when config
ur
ing the EVPN RT im port and ex
port policies
to ensure the routes within the same VNI shall have the same im port/ex port RTs
on VTEPs with differ
ent PODs so that the route dis tri
b
u
tion can be complete end-
to-end.
An
other de
sign is to use a sin
gle BGP AS across all PODs so that the multi-POD Fab
ric
runs EVPN MP-iBGP.
Figure: iBGP Peering Among PODs
With this design, addi

tional iBGP design prin
ci
ples need to be applied to ensure EVPN
routes are distributed end-to-end through the BGP AS. As a loop pre vention mecha-
nism, iBGP has the rule that routes learned from one iBGP peer will not be ad ver
tised to
the other iBGP peer. That is the rea son why iBGP route re flec
tors (RR) are needed to
re
flect routes be tween the peers. In this multi-POD de sign, RRs from dif fer
ent PODs
are fur
ther in
terconnected within the same BGP AS, there is a need for an other layer of
RRs to pass MP-iBGP EVPN routes among the POD RRs. How ever, by de
sign MP-iBGP
preserves EVPN route at trib
utes bet
ter than MP-eBGP.
• iBGP by de
sign pre
serves BGP next-hop. There
fore, when an EVPN route is dis
trib
-
uted within an iBGP topol
ogy, the orig
i
nat
ing VTEP ad
dress will be pre
served in the
BGP next-hop.
• iBGP does not change the EVPN Route-Tar
get (RT) value while dis
trib
ut
ing the routes.
• Auto-RT func
tion will gen
er
ate the same EVPN RT for the same VNI across dif
fer
ent
PODs. This ensures that VTEPs in dif
fer
ent PODs will have con
sis
tent im
port and
ex
port RT value for the same VNI.
When com par

ing the two com mon EVPN MP-BGP de signs, each of them of
fers sim
plic
-
ity in one as
pect while in
tro
duc
ing com
plex
ity in an
other. The follow
ing table sum
ma -
rizes the comparison.
MP-eBGP MP-iBGP
BGP Peering Simple Complex
EVPN Route Distribute Complex Simple
If a multi-POD Fab ric is de

ployed in a net work that already has a BGP de ployment, the
deci
sion on whether to use MP-iBGP or MP-eBGP peer ing will de
pend on the ex ist
ing
BGP de ployment. It is worth noting that in the context of multi-POD de sign, the advan
-
tage of using dif
fer
ent BGP AS's for bet ter control plane segmen ta
tion is not sig
nif
i
cant
as the en tire multi-POD fab ric is under the same ad min
is
tra
tive scope and MP-BGP
EVPN domain.
Building the Underlay
Cabling
Most of the ex ist

ing ca
bling in
fra
structures were de signed to han dle 3-tier phys i
cal
cable layouts which may lend itself to a multi-Pod topol ogy due to limited cabling ca-
pac
ity. In N+1 POD en vi
ronments, a cabling infra
struc
ture pro
vid
ing cen tral core con -
nec
tiv
ity will ad
dress scale out and fa cil
i
tates adding more ca pacity. Another cabling
op
tion is multi-POD using dark fiber, MAN or DWDM.
Figure: Cabling Infrastructure
IP Multicast Replication vs Ingress Replication

The MP-BGP EVPN con trol plane is used to dis cover end
points and ex change host in -
formation be
tween leaf nodes, IP Mul ti
cast or Ingress Repli
ca
tion is used for BUM traf -
fic. There are two ways to repli cate BUM traf fic through
out the fabric, ei
ther one to
many, or one to one many times. ARP re quests are one ex am ple of BUM traf fic that
needs to be repli
cated using a mul ti
cast group or ingress replication.
Ingress replica
tion can have scale is sues as the switch needs to repli cate BUM pack ets
as many times as there are VTEPs that own the VNI need ing to see that traffic. As an
ex
ample, with 50 VTEPs that own the same VNI that re quire BUM traf fic, repli
ca
tion
needs to be per formed 50 times. Repli cated BUM trans missions con
sume a lot of band -
width in the net work. In contrast, IP mul ti
cast across a multi-POD en vironment is a
much more scal able so
lu
tion to han dle BUM traf fic as the fab
ric na
tively pro vides the
ca
pa
bil
i
ties for the required repli
ca tion. IP multi
cast reduces network load, im proves
per
for
mance, and in creases scal
a
bility across multi-POD environments.
When an any cast RP is con

fig
ured, the re
stric
tion of hav
ing one ac
tive RP per multi
cast
group instead de ploy re
dun
dant RPs for the same group range. The RP routers share a
sin
gle uni
cast IP address be
tween PODs. This method pro vides RP re
dundancy and load
shar
ing within the do main. Sources from one RP are known to other RPs in other PODs
using the Mul ti
cast Source Discov
ery Protocol (MSDP). Sources and re ceivers use the
clos
est RP, as deter
mined by the IGP. During an RP fail
ure, sources and re
ceivers seam -
lessly failover to a new RP based on the un

der
lay rout
ing do
main. In multi-POD en
vi
-
ronments, PIM-SM RP and RP re dundancy should be po si
tioned lo
cally in
side of each
re
spective POD. Any cast RP clus
ter
ing can be used for RP redun
dancy across PODs but
for bet
ter control of the mul
ti
cast en
vi
ron
ment MSDP is the rec om mended so lu
tion to
connect multi
ple PIM-SM domains.
Figure: MSDP for Inter Site Multicast

More de
tails of con
fig
ura
tion ex
am
ples can be found at http://
www.
cisco.
com/
c/
en/
us/
support/ docs/ip/ip-multicast/
115011-anycast-pim.html
Multi-POD Routing Design

Multi-POD IP Routing de
signs need to take into con sid
er
a
tion the re
quirements for
both un
der
lay and over
lay. The underlay will es
tab
lish Layer 3 connectiv
ity be
tween
VTEPs de
ployed across multi
ple PODs.
The underlay should only con

sist of VTEP reach
a
bil
ity in
for
ma
tion for all net
work
ing
de
vices in the fabric.
When in ter

con nect
ing an EVPN multi-POD en vi
ronment, it is im
portant to main tain as
much as POD in depen
dence as pos si
ble. In very large multi-POD en vi
ron
ments it may
be bene
fi
cial to have multi
ple IGP areas to im prove fault toler
ance across the PODs. As
an example, each POD can be op tionally a Stub Area. As a Stub Area, each DC area
knows its own topol ogy and has a default route towards the bor der leaf; while the back-
bone area has a view of the full multi-POD Fab ric. However, for most de signs a single
area across mul ti
ple PODs will suf
fice as simplic
ity out
weighs complexity.
The un derlay can be built with any rout ing pro to
col. BGP may not be the best choice as
an un der
lay proto
col as it is a dis
tance vec tor rout
ing protocol and it does not take into
account link speed or path cost, and in a multi-POD en vi
ronment mul ti
ple paths with
dif
ferent link speeds might be used to in ter
con nect the PODs. Dri ving sim plic
ity in the
rout ing design in the under lay will help to im prove overall conver
gence in the over lay.
Tun ing IGP timers may help im prove con ver
gence time, how ever, there is no generic
recom men da
tion, and this must be qual i
fied and vali
dated for each deployment.
Other IGPs such as OSPF, ISIS would be a better op

tion for un
derlay rout
ing. Please
refer to the Sin
gle-POD design chap
ter for a more de tailed dis
cussion on routing
protocols.
Service Integration
In a multi-POD design, it is a rec
om
mended prac tice to have all the ser
vices in
fra
struc
-
ture such as fire
walls or load bal ancers con nected to a sep a
rate services node POD.
This helps with scal
a
bil
ity and high avail
abil
ity for services across a multi-POD design.
Design Options
When con nect

ing data cen
ter sites based on VXLAN Fab rics, there are a num ber of de-
sign consid
er
a
tions which will de
ter
mine the over
all per
formance, availabil
ity, and scale
of the environment.
These de
sign con
sid
er
a
tions in
clude the fol
low
ing aspects.
• De
ter
min
ing the In
ter-Site Bor
der Con
nec
tion Points: The Fab
ric bor
der pro
vides
an edge func tion to allow for exter nal con
nectiv
ity in and out of the Fabric and also
provides an attachment point for the DCI ser vices which de liver the re
quired inter-
site connec
tivity. Al
though the Fab ric bor
der for Layer 2 Layer 3 Ex ter
nal Connec-
tiv
ity and DCI ser vices have sim i
lar char
ac
teris
tics, they may or may not be com -
bined de pending on fac tors de
tailed in the Ex
ternal Con nec
tiv
ity chapter.
• DCI Ser
vice De
liv
ery: An ap
pro
pri
ate se
lec
tion of DCI ser
vice will be a pri
mary fac
-
tor in the multi-site de
sign as each will have dif
fer
ent prop
er
ties as ex
plained fur
-
ther in the Ex
ter
nal Con
nec
tiv
ity chapter.
• L3 ser
vices in
clud
ing L3VPN, VRF Lite, LISP or VXLAN
• L2 ser
vices in
clud
ing Eth
er
net over Dark Fibre/DWDM, OTV, PBB-EVPN, MPLS
EVPN, VPLS or VXLAN
Figure:Multi-Site DCI over L3 Service
Figure: Multi-Site DCI over L2 Service
Border Leaf Scalability

Border Leaf Scalability

The bor
der leaf pro
vides the at
tach
ment point be
tween mul
ti
ple Fab
rics, de
liv
er
ing
Layer 3 rout ing and Layer 2 ex tension be
tween sites. In order to per form host routing
for the traf
fic tra
vers
ing the DCI trans port, it must also maintain a host rout
ing table in
hardware for con nectiv
ity within and across mul ti
ple sites.
The key fac

tors which typ
i
cally de
ter
mine multi-site scale at Layer 2 and Layer 3
include:
• Vir
tual Net
work Iden
ti
fiers (VNI) - Layer 2 and Layer 3
• MAC Addresses
• IP Host Routes (IPv4/IPv6)
Building the Multi-Site Inter-Connectivity
The need for creat

ing mul
ti
ple sites is to en
sure that any im pact in one avail
abil
ity zone
will have min
i
mal to zero impact on the other avail abil
ity zones. An inde
pendent fab ric
is one that has its own control plane and data plane. Multi-site pro vides the ability to
in
tercon
nect the inde
pen
dent fab rics using a DCI so lu
tion such as back-to-back vPC,
OTV, EVPN, PBB-EVPN and Layer 3 con
nec
tiv
ity with VRF-lite, MPLS or LISP.
A contin
u
ously avail
able, ac
tive/ac
tive, flex
i
ble en
vi
ron
ment pro
vides sev
eral ben
e
fits
to the business:
• In
creased uptime
• Dis
as
ter avoidance
• Eas
ier maintenance
• Flex
i
ble work
load placement
• Ex
tremely low RTO
It is important to re
mem ber that host reach a
bil
ity in
for
mation is con
tained within a
single site and extended using a DCI technology. The Layer 3 di a
grams below demon -
strate independent con
trol planes in each site and will high
light how to extend Layer 2
connectivity.
Figure: Layer 3 DCI for VXLAN Interconnect
To provide true ac

tive/ac
tive archi
tec
ture, it is also re
quired to inte
grate Layer4-Lay
-
er7 ser
vices such as fire
walls and ADCs. Cisco Adap tive Secu
rity Appli
ance (ASA) pro
-
vides support for multi-site ac
tive/ac tive fire
wall cluster
ing with sites located hun-
dreds of kilo
meters/miles apart.
Layer 2 Reachability Across Sites

Due to re
quirements for dis
as
ter avoid
ance and workload mo bil
ity, there is a re
quire
-
ment to ex
tend VLANs across differ
ent VXLAN Fabrics. The dis
cus sion below will cover
three DCI op
tions: vPC-based, OTV, and VXLAN.
The DCI solu

tion should pro vide Layer 2 and Layer 3 ex ten
sion, and en sure that a fail
-
ure in one data center will not be prop a
gated to the other data cen ter. To prevent this
from happening, the key tech ni
cal require
ment is the ca pa
bil
ity to control the broad -
cast, un
known uni cast and mul ti
cast flood at the data plane level while ensuring control
plane independence.
Layer 2 exten
sion must be dual homed for re dundancy while pro hibit
ing end-to-end
Layer 2 loops that would lead to traffic storms causing link overflows, satu
rate switch
CPUs and virtual machine CPUs. This is why in Data Cen ter In
ter
con nect deploy
ments,
one key com ple
mentary fea
ture to Layer 2 ex
ten
sion is storm control.
Figure: Storm Control
VPC as a DCI Transport

VPC as a DCI Transport

Two VXLAN fab rics can be directly con nected using back-to-back vPC. On each side,
one pair of border nodes are leveraging a back-to-back vPC con nection to ex
tend Layer
2 con
nectiv
ity across sites. This dual link vPC could use dark fiber or DWDM. vPC is ex -
tremely simple to config
ure how ever there are certain lim
i
ta
tions to using vPC as a DCI
such as:
• Can
not in
ter
con
nect more than two sites
• Lack of fail
ure boundaries
• Site in
de
pen
dence is not preserved
OTV as DCI Transport

OTV pro vides a proven and extremely sim ple way to in ter
con
nect mul ti
ple data cen
-
ters. OTV has been designed for the data cen ter in
tercon
nect space, and is consid
ered
the most ma ture and func
tion
ally rich so
lu
tion to extend multi-point Layer 2 con nec
-
tiv
ity over a generic IP network. In ad di
tion, it offers na
tive functions that allow
strengthen
ing the DCI con
nection and in
creas ing the in
de
pendence of the fabrics.
• Span
ning Tree (STP) isolation
• Un
known Uni
cast traf
fic suppression
• ARP optimization
• Layer 2 broad
cast pol
icy control
• Fault isolation
• Site independence
• Fail
ure bound
ary preservation
Figure: OTV as a DCI
VXLAN as DCI Transport

VXLAN allows Layer 2 ex ten
sion, not only inside a data cen ter but also the abil
ity to
allow Layer 2 in
tercon
nec tions be
tween mul ti
ple sites. Today VXLAN EVPN as a DCI of -
fers some ben e
fits but is not yet mature in terms of na tive functions like pro
vided by
OTV.
New function
al
i
ties are being added to the VXLAN con trol plane which would make it a
very vi
able DCI so lu
tion in the fu
ture. This is fur
ther discussed in the in
tro
duc
tion
chapter.
As de
picted in the dia
gram below, logi
cal back-to-back vPC connec
tions are used be-
tween the VXLAN bor der leaf nodes and the local pair of VXLAN DCI de
vices to in
ter
-
con
nect multi
ple sites.
Figure: VXLAN as a DCI
Layer 3 Reachability Across Sites

In ad
di
tion to the Layer 2 solutions, the VXLAN fab
ric can also be con
nected across
Layer 3 boundaries using MPLS L3VPN, VRF-Lite and LISP.
VRF-lite Based Approach

VRF-lite uses a two-de vice ap proach to provid
ing Layer 3 con nectiv
ity be
tween multi-
site fab rics. In this approach, each ten ant VRF in the fab ric is ex
tended using a subin -
terface per ten ant with In terior Gate
way Pro to
col (IGP) or Ex ter
nal Border Gate way
Pro to
col (eBGP) peer ing between the bor der node and the edge router at each site. The
External Con nectiv
ity chap ter dis
cusses in detail the pro
cedure to con nect a Layer 3
hand off to the VXLAN fab ric. The same prin ci
ples can be used to pro vide multi-site in
-
tercon nect using VRF-lite.
MPLS-Based Approach
MPLS-Based Approach
This ap proach uses a sin gle de vice to in ter
connect multi-site VXLAN fab rics and
achieve segmen ta
tion using MPLS L3VPNs. The sin gle device called Bor der PE can be
used to termi
nate MPLS and VXLAN rout ing on the same de vice. The Ex ter
nal Con nec
-
tiv
ity chap
ter pro
vides addi
tional details about using a MPLS hand off to the VXLAN fab -
ric. The same princi
ples can be used to pro vide multi-site in
ter
connectiv
ity as well.
LISP-Based Approach
The third ap proach for interconnect
ing multi-site VXLAN fab rics is LISP. It of
fers the
same seg men ta
tion bene
fits as MPLS and can be used as an al ter
native so
lu
tion. The
External Con nec
tiv
ity chapter pro
vides ad
di
tional de
tails about using a LISP hand off to
the VXLAN fab ric. The same prin ci
ples can be used to pro vide multi-site inter
con nec-
tiv
ity as well.
Operations &
Management
Introduction
This final chap

ter is fo
cused on pro
vid
ing guidance for oper
a
tional as
pects of build
ing,
op
erat
ing and main tain
ing a VXLAN Fabric. The lat
ter half of the chapter cov
ers APIs
and off-the-shelf and open source tools for automating the man age
ment of the fabric.
For the last 20 years, networks have been man aged as in
de
pen dent ele
ments lever aging
pur
pose-built pro to
cols and inter
faces, such as Simple Network Man agement Pro tocol
(SNMP), Com mand-Line In terface (CLI), and Net Conf, to name a few. These pro to
cols
have served net work ad minis
trators well, and have mostly ful filled their ob
jec
tives for
Fault, Config
u
ration, Accounting, Perfor
mance and Se cu
rity man age
ment tasks (also
known as the FCAPS frame work). How ever, to meet the new scale re quirements, the
net
work has to be viewed and man aged as a system to enable faster and more con sis
-
tent de
liv
ery of services.
Sev
eral years ago, the server in
dus
try, dri
ven by scale re
quire
ments, went through the
same tran
si
tion. Server teams were faced with the need to man age large pools of re
-
sources that drove the need for more au to
mated con figu
ra
tion man age
ment tools.
Today, server man age
ment teams lever age popular con figu
ra
tion man age
ment tools
such as Puppet, Chef or Ansi
ble. These tools are chang ing orga
ni
zational processes
which sup
port agile de
vel
op
ment and De vOps initiatives.
It is questionable when this dis ruption will im

pact the net work in dustry; however, the
con fig
u
ra
tion man agement tools listed above are now able to pro vide com pa
ra
ble ways
to man age network ele
ments. Over time, IT or ga
nization will evolve to this new way of
man ag
ing IT infra
structure, how ever, it is im
portant that or gani
za
tions have time to
com plete this tran
si
tion. IT sys
tems re quire the ability to support the exist
ing manage-
ment par adigms as well as these new mod els during a transition phase to wards more
ef
fi
cient processes.
VXLAN tech nol

ogy will ben e
fit from so lu
tions that can consis
tently de
ploy con fig
ura
-
tions across multi
ple switches when cre at
ing new tenants or new net works. The man -
agement and op era
tions of VXLAN will de pend on tools that can pro vide vis
i
bil
ity and
di
ag
nostic analy
sis of the un
der ly
ing infrastructure.
Management tasks
VXLAN Fab rics are no dif

fer
ent from other technologies, given that they require foun
-
da
tional in
fra
structure that needs to be op
er
a
tionally man aged in a sim
plifed manner.
Mul
ti
ple tra
di
tional frame
works exist to de
fine what the op
er
a
tions of IT in
fra
struc
ture
entail, such as IT In fra
struc
ture Library (ITIL) or FCAPS. Some or ga
niza
tions have
started to in cor
porate IT oper
a
tional prac
tices from other areas of the industry such as
appli
cation devel
op
ment taken from De vOps (De vel
op
ment + Oper
ations), as is cov
ered
in the next section.
Agility is one of the main ob jec

tives that most data center leaders covet, and is part of
the over all oper
a
tions process. Au tomat
ing the data center en ables pro
vision
ing net-
work re sources in a reli
able man ner while main
tain
ing configu
ration con
sis
tency to re-
duce down time. In place of the stan dard com mand line in ter
face (CLI), network au -
toma tion can be used to sim plify these commands, for con sis
tency, standardiza
tion,
and re
duction of human error. Net
work provi
sion
ing normally starts with script-level
au
tomation and can progress to more ad
vanced mod els of deployment.
Whatever tools or process are used, whether au

toma
tion or man
ual in
ter
ven
tion, there
are some basic tasks that need to be per
formed. The net
work manage ment life
cy
cle is
di
vided into three dif
fer
ent phases:
• Day 0: install
• Day 1: configure/optimize
• Day N: up
grade and monitoring
Day 0
Day 0
Tradi
tion
ally, Day 0 activ
i
ties have included in stalling the de vice into a rack, powering it
up, some basic boot strap con fig
ura
tion, and op tionally, updating the firmware. This is
how many or gani
za
tions have dealt with Day 0 tasks until now. As a con se
quence, it is
not un common to see a net work with mul ti
ple versions of soft ware deployed and dif -
fer
ing standards of con fig
ura
tion. In order to re duce the in con sis
ten
cies in the network
when the equip ment is de ployed, au toma tion of the ini tial deployment is a crucial first
step. This pro vides a solid foundation for suc cessful network operation.
Day 1
After the base con fig
u
ra
tion and com mon software re
leases have been de ployed across
the Fabric, the next step is to provi
sion the over
lay and de
vice-specific config
u
ra
tions.
These con fig
ura
tion steps in
clude items such as MP-BGP, Mul ti
cast, VNIs, VRFs, VLANs,
Anycast Gate way and core capabilities.
The VXLAN Fab ric re

quires more config
u
ra
tion than was previously needed in tra
di
-
tional de
signs. The burden of the ad
di
tional VXLAN Fabric config
u
ration re
quire
ments
can be eased by automation.
For this phase a few op tions exist to help au to
mate con fig
u
ration deployment. There
are tools such as Cisco Prime Data Cen ter Net work Man ager (DCNM), Cisco Nexus Fab -
ric Manager (NFM), Python scripts or script ing languages that can con fig
ure the de vices
di
rectly or via API. Another option is con figu
ra
tion man agement tools (CMT) such as
Puppet, Chef, and An si
ble that deliver con fig
u
ra
tion standard iza
tion. Instead of just
pushing config
u
ra
tion com mands to the switches, CMT checks the run ning con figu
ra
-
tion and updates changes to the con fig
u
ra tion. This al
lows the cre ation of man i
fests,
recipes, or playbooks with the de sired end state of the spe cific el
ements in the net -
work. For ex ample, the spine switches would have a very dif ferent config
uration than
the leaf switches, but the leaf switch con figura
tions would likely be very sim i
lar to one
another across the fabric.
As a result of vir
tu
al
iza
tion and cloud pro vi
sion
ing, an
other item to consider is VMM in-
te
gration. Whether or not the con fig
u
ra
tion of a switch should be dy nami
cally mod
i
fied
based on a trig ger event is dis
cussed at length in the Software Over lay chapter.
Day N
Once the net
work is con
fig
ured, run
ning, and op
ti
mized, changes and soft
ware up
-
grades to the Fab ric will be needed. CMT so lu
tions can au
to
mate soft
ware up
grades
and config
u
ra
tion changes to multi
ple devices.
Another impor
tant Day-N task is con
fig
u
ra
tion backups, revi
sion con
trol, and the abil
ity
to roll back to a pre
vi
ous snap
shot. This tra
di
tional config
ura
tion Management can be
done with tools such as DCNM, NFM, the afore mentioned CMT so lu
tions, or with open
source tools such as RANCID.
Mon i
tor
ing the net work and re acting to events is a criti
cal part of Day N op er
a
tions.
Tradi
tional network man agement tools used SNMP to mon i
tor device parameters such
as inter
face utiliza
tion or avail able mem ory. With NX-OS pro gram mabil
ity functions,
using new tools, such as Car bon/Graphite, Zenoss, or Splunk en ables access to richer
in
for
ma tion. Linux-based mon i
tor
ing agents can be in stalled natively on the switch. Ex -
amples such as OpenTSDB (http:// opentsdb.net/) provide a col lec
tor agent which
sends infor
ma tion to a central repos i
tory for consolidation.
Vis
i
bil
ity is an
other impor
tant Day-N func tion. Tra
di
tional vis
i
bil
ity tools are still avail
-
able with a VXLAN-based so lu
tion in
cluding network TAPs, switch port an
alyzer (SPAN),
Netflow and/or sFlow, where ap plic
a
ble. Nexus Data Bro ker (NDB) clients can be lever -
aged to con sol
i
date SPAN from leaf switches into a com mon switch ag grega
tion point
to build scalable net
work TAPs and SPAN ag gre
ga
tion infrastructures.
VXLAN OAM (Operations, Administration and Management)

With the ad di
tion of the over lay, a level of in
di
rection has been in tro
duced, result
ing in
a cer
tain degree of abstraction from the un der
ly
ing network. This ab strac
tion be
came a
simple and ef
fi
cient way of pro vid
ing services with out considering the in
ter
medi
ate de-
vices. When ser vice degradation oc curs, the ef
fective physi
cal com po
nent or path used
can become hid den. To iden tify the path the ap pli
ca
tion traffic takes from end point to
endpoint requires tremen dous ef fort. The prob lem is ex acerbated because VXLAN
changes the UDP source-port to achieve en tropy. In addi
tion with ECMP the num ber of
paths increases signif
i
cantly, mag nify
ing the problem scope.
VXLAN OAM (Op er

a
tions, Ad ministra
tion, and Man agement) and CFM (Con nectivity
Fault Man agement) pro vide a sim ple and com pre
hensive so lu
tion for prob lems de -
scribed ear lier. Rather than try ing to un derstand load-bal ancing hashing al
gorithms to
fig
ure out which path has been used, a sin gle probe could pro vide the re
spective feed -
back, and ac knowl edge reach a
bil
ity of the ex pected desti
na
tion. In the case where one
path is af
fected by per for
mance degra dation, the probe could de ter
mine this by return -
ing potential packet loss sta tis
tics. In the pres ence of an applica
tion or payload pro file,
the probe will mimic ap pli
ca
tion be hav ior, and the result will plot the ef
fec
tive phys i
cal
path used by the work loads in question.
An ad di
tional tool within VXLAN OAM is the “tissa”-based tra cepath, fol lowing the
“draft-tissa-nvo3-oam-fm” IETF draft. This tool not only gets the exact path plot ting
from an un derlay perspec
tive, but it also de
rives the spe cific VTEP where the des ti
na
-
tion is actu
ally at
tached. Further
more, with ad di
tional input pa rameters it is possible to
identify the egress VTEP, the un derlay path from ingress to egress VTEP in cluding all
in
termediate hops, as well as all in volved interfaces. In ad di
tion, the load and error
coun ters for those inter
faces can be pro vided as well.
The sam ple out put below shows a "tissa"-based over lay path
trace. The function
al
ity ex
-
poses the phys i
cal path (un
der
lay) from leaf via spine to border, while the request was
ini
ti
ated in the VXLAN overlay.
Path trace Request to peer ip 10.254.254.200 source ip 10.254.254.102

Sender handle: 38
Hop Code ReplyIP IngressI/f EgressI/f State

====================================================
1 !Reply from 10.254.254.101, Eth2/1 Eth4/5 UP / UP
Input Stats:
discards:0
errors:0
unknown:0
bandwidth:42949672970000000
Output Stats:
discards:0
errors:0
bandwidth:42949672970000000
2 !Reply from 10.254.254.200, Eth6/1 - UP / -

Input Stats:
discards:0
errors:0
unknown:0
bandwidth:42949672970000000
VXLAN OAM is im ple

mented within NX-OS plat forms and can be ex e
cuted using CLI or
with an API-dri ven ap proach using NX-API. It is pos si
ble to execute the var i
ous probes
across a pro gram matic in ter
face and also re trieve the sta tis
ti
cal in
for
ma tion from the
VTEP in the same way. As an ac knowl edgment of the probe ex ecu
tion, a sta tis
tic identi
-
fier is sent. The sta tis
tic identi
fier provides cur rent and his toric statistics from the
VTEP local OAM data base in a pro gram matic or CLI dri ven way. Fur ther en hance ments
in VXLAN OAM in clude the in troduction of periodic probes and re spective no ti
fi
ca
tion
to more proac tively man age the over lay net
work with its phys i
cal un
der lay. In order to
make the col lected path in for
mation and statistics mean ingful, VXLAN OAM is going to
in
te
grate with VXLAN re lated man age ment systems like DCNM or VTS.
Available Tools
There are multi

ple ap
proaches to address the challenges described in the pre
vi
ous sec
-
tion. These ap
proaches can be classi
fied into the fol
low
ing categories:
• Tra
di
tional (CLI, scripting)
• Off-the-shelf tools
• De
vOps (Pup
pet, Chef, Ansible)
Traditional Tools
Command Line Interface
For VXLAN there are a num

ber of new com
mands to help with con
fig
ur
ing, mon
i
tor
ing
and trou
bleshooting the fab
ric. However, it needs to be noted that network op
er
a
tors
who rely on CLI to manage their fab
ric will be con
fronted with two issues:
1 VXLAN con
fig
u
ra
tion is com
mand-in
ten
sive. The cre
ation of new ten
ants or seg
-
ments re
quires mul
ti
ple lines of con
fig
u
ra
tion, po
ten
tially across a large num
ber of
devices.
2 VXLAN tech
nol
ogy de
pends on the pres
ence of a con
sid
er
able num
ber of un
der
ly
-
ing pro
to
cols, mak
ing it more bur
den
some to de
ploy when com
pared to other
tech
nolo
gies like Span
ning Tree or FabricPath.
Python Scripting
Python script
ing has been used by net work opera
tors for years; however, with NX-OS
run
ning on the switches, script
ing can be taken to a whole new di mension. APIs and
Soft
ware Devel
opment Kits (SDKs) are avail
able for NX-OS. An ex ample of an SDK for
NX-OS is the nx toolkit which is freely avail
able for down load: https://github.
com/
datacenter/nxtoolkit.
An ex
am
ple Python script for VXLAN is lo
cated at the fol
low
ing: https://
github.
com/
erjosito/evpn_
shell. This script is es
sen tially an ex
ter
nal CLI that can be used to cre-
ate, delete, and view ten ants, VNIs, and rel e
vant config
u
ra
tion el
e
ments across all
VTEPs in a VXLAN EVPN Fab ric. This script makes use of in fra
struc
ture vari
ables such
as man age
ment IP ad dresses, credentials etc. and with a sin gle command de ploys all
the required VXLAN EVPN con fig
u
ra
tion to cre ate a tenant or a net
work inside of a
tenant.
For more script

ing ex
am
ples, please check GitHub (https://
github.
com/datacenter) or
the Cisco De
vel
oper Community for NXOS (https://
opennxos.cisco.
com).
Scripting with Other Programming Languages

Multi
ple script
ing lan
guages can make use of NX-OS APIs using HTTP, as sum
ing they
can parse JSON or XML strings, even if no SDK is avail
able for that spe
cific language.
There are two APIs avail

able in NXOS:
• NX-API: HTTP-based API over which CLI com

mands are sent to the de
vice. The
out
puts can be sent back in JSON or XML format.
• REST API: REST
ful API lever
ages an ob
ject model. Being com
pletely ob
ject-based
this makes de
vel
op
ment of SDKs pos si
ble. Python SDKs are avail
able in Github (see
https://
github.
com/ datacenter). Commands and con fig
u
ra
tion are sent using XML
or JSON for
mat, and command out puts are re
turned similarly.
Languages such as XML and JSON are used to struc ture com mands and out puts and
they elimi
nate the need to parse hu man-read able strings formatted in para
graphs and
ta
bles. String parsing is commonly used in script
ing but has ver sion de
penden
cies. That
puts a bur den on life cy
cle man
agement for these au toma tion scripts that have kept
many or gani
za
tions from using them. The APIs avail able in NXOS are an im provement
over tradi
tional script
ing meth
ods, and will improve the automa tion processes.
Off-the-Shelf Tools
Off-the-Shelf Tools
Cisco Data Center Network Manager (DCNM)
Cisco DCNM is a gen eral pur

pose Net work Manage
ment Soft ware (NMS) / Op er
a
tional
Support Soft
ware (OSS) prod uct tar
geted at NX-OS networking equip
ment. It sup
ports
clas
si
cal Span
ning Tree deploy
ments with or without Vir
tual Port Chan
nels, Fab
ric
Path
and VXLAN.
In the con
text of a VXLAN-based so
lu
tion, DCNM can be uti
lized for the fol
low
ing
purposes:
1 Firstly, to pro
vide for the Fab
ric un
der
lay con
fig
u
ra
tion. DCNM has built-in Power
On Auto Pro
vi
sion
ing (POAP) sup
port to de
liver zero-touch auto-pro
vi
sion
ing of
the net
work de
vices that build the VXLAN Fabric.
2 Once the Fab

ric is up and run
ning, DCNM can also be uti
lized for pro
vi
sion
ing the
VXLAN over
lay configuration.
3 DCNM sup
ports mon
i
tor
ing of the per
for
mance and uti
liza
tion of the net
work
switches, as well as fault man
age
ment and sys
log aggregation.
4 Man
ag
ing the soft
ware run
ning on the switches and per
form
ing soft
ware up
grades
and downgrades.
This pro
vi
sion
ing can be per
formed in a top-down (push) fashion, where DCNM tracks
de
ployment events and simply pushes the re
quired CLI con
fig for the ac
cess port onto
the switch.
Al
terna
tively, a more dy namic mech anism is pos si
ble, where the leaf switches “pull” the
config
u
ration from the LDAP data base of DCNM based on a spe cific event, such as a
local at
tach ment of an end point. A typi
cal ex am ple of this more dy namic mech a
nism is
the support on the VXLAN leaf nodes of a func tion
al
ity called Vir
tual Machine Tracker
Auto-Con fig (VM Tracker), which au tomat i
cally provi
sions a specific ten
ant con
fig
u
ra
-
tion. The com mands re quired for pro vi
sion ing the ten ant are stored in the form of a
config
u
ration pro
file. A config
u
ra
tion pro file is a set of com mands that will be required
for provi
sion
ing a par
tic
ular ten
ant, ex
cept the re
quired pa
ra
me
ters are writ
ten as vari
-
ables instead of ac
tual val
ues in a command.
Spe
cific to VXLAN man
age
ment DCNM pro
vides the fol
low
ing capabilities:
• DCNM pro
vides in
te
grated Power-On Auto Pro
vi
sion
ing (POAP) to boot new
switches for a green
field Fab
ric or add new switches to an exist
ing VXLAN Fab ric.
DCNM man ages this POAP work flow so that an admin sim
ply assigns a de
vice to a
pre
con
fig
ured template.
• In ad
di
tion, the POAP con
fig
u
ra
tion Diff/Sync fea
ture lets the admin know if a de
-
vice’s con
fig
u
ration does not match its POAP tem
plate and then lets the user re
-
solve these differences.
• DCNM also pre
sents topol
ogy views show
ing phys
i
cal and over
lay net
works on the
same page, helping net
work ad
mins quickly iden
tify the ex
tent of vir
tual over
lay
net
works on a Fabric.
• DCNM also pre
sents smart topol
ogy views show
ing vir
tual port chan
nels (vPCs) and
vir
tual de
vice contexts. In topology view, DCNM shows VXLAN Tun nel endpoint
sta
tus as well as VXLAN search. DCNM shows VXLAN net work iden
ti
fier (VNI) sta
-
tus and other VXLAN in formation on a per-switch basis.
• Built-in search al
lows ad
mins to search by VM Name, VM IP Address, VM MAC Ad
-
dress, VNI, or Switch ID.
More in
for
ma
tion on Cisco Data Cen
ter Net
work Man
ager can be found at: http://
www.
cisco.
com/go/dcnm.
Ignite
Day-0 tasks are ex tremely im
portant in order to have a con sis
tent Fabric. Ig
nite is a
simple hands-off approach to bootstrap a device with the ap
propri
ate code level and
ini
tial de
vice setup. To achieve that, Ig
nite leverages the POAP ca pabil
i
ties of Cisco
Nexus switches.
Ig
nite is an open-source tool that can be down
loaded at no cost from Github: https://
github.com/ datacenter/
ignite.
In order to have a POAP en vi

ronment that al
lows for the au
tomation of de
ploy
ment of
firmware and initial con
fig
ura
tion, there are some ex ter
nal components that Ig
nite
requires:
• A DHCP server to boot

strap the in
ter
face and DNS in
for
ma
tion of switches that are
boot
ing up.
• A TFTP server that con
tains the con
fig
u
ra
tion script used to au
to
mate the soft
ware
image in
stal
la
tion and con
fig
u
ra
tion process.
• An Ubuntu server where Ig
nite will be in
stalled, that con
tains the de
sired soft
ware
im
ages and rules to dy
nam
i
cally build con
fig
u
ra
tion files.
Cisco Nexus Fabric Manager

The Cisco Nexus Fab ric Man ager (NFM) is a man age
ment sys tem de signed to highly
sim
plify and op
ti
mize the full life
cy
cle management of a switch fab
ric built with NX-OS
based platforms (at the time of writ ing of this book, NFM sup port is lim
ited to the
Nexus 9000 family).
Cisco NFM has a fabric-wide focus and allows for the auto-provi
sion
ing and manage-
ment of the whole net
work. NFM pro vides point-and-click meth ods for per
form
ing fab
-
ric man
agement tasks such as adding, re
mov ing, and con
fig
ur
ing net
work com po
nents
such as switch
pools, switches, switch inter
faces, VRFs, port chan nels and broadcast
domains.
Cisco NFM builds a VXLAN EVPN Fab ric, but ab

stracts the com
plexity . It is still pos
si
ble
to log into the switches and view the con fig
u
ra
tion that has been de ployed by Cisco
NFM, trou bleshoot with the CLI, or use any other stan dard moni
toring so lution to ver-
ify the state of the network.
Cisco NFM cov

ers var
i
ous phases of the Fab
ric man
age
ment lifecycle:
• Cre
ation: NFM al
lows for a zero-touch boot up of the Fab
ric, per
form
ing some
Day-0 oper
ations like ca
bling topol
ogy ver
i
fi
ca
tion and au
to
matic VXLAN un
der
lay
provisioning
• Con
nec
tion: NFM fully man
ages the en
tire VXLAN con
fig
u
ra
tion, re
mov
ing the op
-
era
tional as
so
ci
ated hur
dles. This es
sen
tially im
plies that a user does not nec
es
sar
-
ily need to know that VXLAN with MP-BGP EVPN is de ployed as the key func
tion
al
-
ity to en
able endpoint communication
• Ex
pan
sion: there are more day-N type of op
er
a
tions, such as zero-touch ad
di
tion
of switches to the Fab
ric and auto-up
grade of ex
ist
ing fab
ric devices
• Fault Management: NFM of
fers a built-in fault man
age
ment system
• Re
port
ing: Cisco NFM com
mu
ni
cates to the switches de
ployed in the fab
ric by
lever
ag
ing soft
ware agents em
bed
ded into the switches
More infor
ma
tion re
gard
ing Cisco Nexus Fab
ric Man
ager is avail
able at: http://
www.
cisco.
com/go/nexusfabricmanager.
Cisco Virtual Topology System (VTS)

Ser
vice providers have very spe
cific re
quire
ments re
gard
ing data cen
ter net
work man
-
age
ment and operations:
1 Sup
port for a mix of soft
ware and hard
ware VTEPs
2 In
te
gra
tion with the hy
per
vi
sor layer
3 Sup
port of a mul
ti
ven
dor Fabric
4 Over
lay and un
der
lay op
er
ated by dif
fer
ent teams
VTS is an add-on to a VXLAN Fab

ric con
sist
ing of the fol
low
ing elements:
• Vir
tual Topol
ogy Con
troller: this is a man
age
ment plat
form that of
fers ways to de
-
ploy ten
ants and net works over a GUI or a north
bound REST ful API. It in
tegrates
with VMware vCen ter and with Openstack/KVM, so cus tomers can man age the
over
lay di
rectly from the VMM. The Vir
tual Topol
ogy Con
troller will roll out the re
-
quired changes using southbound APIs such as NX-API or NetConf/YANG.
• IOS XRv: this is a vir

tual router in
stance that can take over re
quired con
trol plane
func
tion
al
ity in case of a de
ploy
ment con sist
ing ex
clu
sively of soft
ware VTEPs. This
component is responsi
ble for dis
trib
ut
ing routes to hard ware VTEPs over EVPN
BGP, and to software VTEPs using the REST CONF API.
• Vir
tual Topol
ogy For
warder (VTF): this is a soft
ware VTEP that can be in
stalled in a
VMware vSphere host or an Open stack compute node. It is con
trolled by the Vir
tual
Topology Controller, of
fer
ing L2 and L3 connec
tiv
ity be
tween VMs run ning in the
local or re
mote servers. VTF is a virtual ma
chine running in user space, so it does
not need any mod i
fi
ca
tion to the vSphere code. VTF ex ploits perfor
mance op ti
-
mization tech
nologies such as the open-source-licensed DPDK (http:// dpdk.org/)
and Cisco Vector Packet Pro cess
ing (VPP).
Cisco VTS sup ports flood and learn as well MP-BGP EVPN con trol planes. It includes
func
tional
ity such as ARP sup
pression ca
pa
bil
i
ties, sym
metric IRB, VTEP authen ti
cation
and fast con ver
gence upon network fail
ures and end point mobility.
One im por

tant con
cept to un der
stand is that Cisco Vir
tual Topology System does not
man age the un
der
lay. It is as
sumed that the required underlay con
fig
u
ra
tion is al
ready
in place.
More in
for
ma
tion re
garding Cisco Vir
tual Topol
ogy Sys
tem is avail
able at: https://
www.
cisco.
com/go/vts.
DevOps Tools
Con fig
u
ra
tion Man
agement Tools (CMT) are a new gen er
a
tion of in
tent-based tools
that have gained great pop u
lar
ity, mainly in the Linux com munity. They can be clas
si
-
fied into two cat
e
gories: Agent-based and agent less tools.
• In agent-based con
fig
u
ra
tion man
age
ment, changes are made cen
trally on a mas
ter
node, and are pulled down and ex e
cuted by the agent. The device agents pe
ri
od
i
-
cally con
nect with the master for con
fig
u
ra
tion in
for
ma
tion and the changes are
pulled down and exe
cuted. Only the changes that are needed are pulled.
• Agent
less Con
fig
u
ra
tion Man
age
ment is push-based in
stead of pull-based. Con
fig
u
-
ra
tion management scripts are run on the mas ter and the mas
ter con
nects to the
man aged de
vices and ex
e
cutes the task over an API.
Puppet and Chef are ex am ples of agent-based con fig
u
ration man age
ment tools. With
these agent-based sys tems, the user lever ages a cus tom de clar
a
tive lan
guage to de-
scribe the system con fig
u
ration which needs to be con fig
ured on the re mote systems.
Both of these tools have sim i
lar func
tional
ity which is con tin
u
ally evolv
ing. Pup
pet re
-
cently released mod ules to con fig
ure, pro vi
sion, and man age a Cisco VXLAN-based
Fabrics plus sev
eral standard top-of-rack switch features.
Puppet uses mod ules that in clude descriptions about which fea tures are supported,
and man i
fests that are the ac tual de
scriptions of how those de vices should be con fig
-
ured. Man i
fests can be sta tic, dynami
cally in
corpo
rate condi
tions or even use Ruby
logic. Some con ditions will depend on which sys tem is being man aged, and a wealth of
that in
for
ma tion is gathered by Pup pet's com pan
ion tool "fac
ter". The Puppet agent will
pull the manifest from the Pup pet server (Puppet Master) and imple
ment it.
There are some ex

am
ples man
i
fests in Github under https://
github.
com/
cisco/
cisco-
network-puppet-module.
Chef ar
chi
tec ture is very sim
i
lar, but in
stead of mani
fests the jar
gon is "recipes", that is
where the ex pected state of the man aged devices is doc u
mented. Recipes can be
grouped to gether in Cook books for eas ier management. As al ready de scribed, Chef
runs in a client/server ar chi
tecture, but it has an addi
tional standalone mode called
"Chef solo".
As with Pup
pet, some ex
am
ples of Chef recipes for Cisco NX-OS are avail
able in Github
under https://
github.
com/
cisco/ cisco-network-chef-cookbook.
An
si
ble is an example of an agentless based con fig
u
ra
tion manage ment sys tem that
manages nodes via SSH and has the abil ity to ex
e
cute the scripts locally on the man-
aged node or on the local server con nects via the Cisco NX-API. An si
ble uses the con
-
cept of Mod ules, Tasks, Plays, and Play
books to man age the config
ura
tion on the re
-
mote devices.
• Mod
ules: units of work that An
si
ble ships out to re
mote ma
chines. Some mod
ules
pre-in
stalled, cus
tom mod
ules can be man
u
ally in
stalled as well
• Tasks: com
bi
na
tion of mod
ules with ar
gu
ments and de
scrip
tion names
• Plays: map
ping of hosts or groups to their tasks
• Play
books: col
lec
tion of Plays by which An
si
ble or
ches
trates, con
fig
ures, ad
min
is
-
ters, or de
ploys sys
tems. Play
books are writ
ten in YAML
Summary Table
The fol
lowing table il
lus
trates how the tools dis
cussed above con
tribute to the Day 0, 1
or N oper
a
tions of network fabrics:
Day0 Day1 DayN
CLI X X
Python X X
Cisco Data Center X X X

Network Manager
Cisco Nexus Fabric X X X

Manager
Ignite X
Cisco Virtual Topology X X

System
Ansible X X
Puppet X X
Chef X X
Acronyms
Acronyms 163
Acronyms
ACI: Ap
pli
ca
tion Cen
tric Infrastructure
ADC: Ap
pli
ca
tion De
liv
ery Controllers
API: Ap
pli
ca
tion Pro
gram Interface
ARP: Ad
dress Res
o
lu
tion Protocol
BD: Bridge Domain
BGP: Bor
der Gate
way Protocol
CLI: Com
mand-Line Interface
DAG: Dis
trib
uted Any
cast Gateway
DCNM: Data Cen

ter Net
work Manager
ECMP: Equal Cost Multi-Path
ETR: Egress Tun

nel Router
EVPN: Eth
er
net Vir
tual Pri
vate Network
FCAPS: Fault, Con

fig
u
ra
tion, Ac
count
ing, Per
for
mance and Security
GEN
EVE: Generic Net
work Vir
tu
al
iza
tion Encapsulation
GPE: Generic Pro

to
col Encapsulation
Acronyms 164
IDS: In
tru
sion De
tec
tion System
IEEE: In
sti
tute of Elec
tri
cal and Elec
tron
ics Engineers
IGP: In
te
rior Gate
way Protocol
IPS: In
tru
sion Pre
ven
tion System
IRB: In
te
grated Rout
ing and Bridging
ITR: Ingress Tun

nel Router
LISP: Lo
ca
tor/ID Sep
a
ra
tion Protocol
LSA: Link State Advertisement
MP-BGP: Multi-Pro
to
col BGP
MPLS: Multi-Pro
to
col Label Switching
MSDP: Mul
ti
cast Source Dis
cov
ery Protocol
MTU: Max
i
mum Trans
mis
sion Unit
NAT: Net
work Ad
dress Translation
NDB: Nexus Data Broker
NFM: Nexus Fab

ric Manager
NLRI: Net
work Layer Reach
a
bil
ity Information
NSH: Net
work Ser
vice Header
NVO: Net
work Vir
tu
al
iza
tion Overlay
Acronyms 165
OAM: Op
er
a
tions, Ad
min
is
tra
tion and Management
OTV: Over
lay Trans
port Virtualization
PBB: Provider Back

bone Bridges
PIM: Pro
to
col-In
de
pen
dent Multicast
POD: Point of Delivery
PVP: Path Vec

tor Protocol
RD: Route Distinguisher
RP: Ren
dezvous Point
RR: Route Reflector
RT: Route Target
SDK: Soft
ware De
vel
op
ment Kit
SDN: Soft
ware De
fined Networking
SNMP: Sim
ple Net
work Man
age
ment Protocol
VMM: Vir
tual Ma
chine Manager
VNI: Vir
tual Net
work Instance
VNID: VXLAN Net

work Identifier
vPC: Vir
tual Port-Channel
VRF: Vir
tual Rout
ing and Forwarding
Acronyms 166
VTC: Vir
tual Topol
ogy Controller
VTEP: Vir
tual Tun
nel Endpoint
VTF: Vir
tual Topol
ogy Forwarder
VTS: Vir
tual Topol
ogy System

Vxlan Evpn

Enviado por

Dados do documento

Descrição original:

Título original

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Vxlan Evpn

Enviado por

Direitos autorais:

Formatos disponíveis

Preface 1

Why a New Approach 11

Single-POD VXLAN Design 42

External Connectivity for VXLAN Fabric 65

Operations & Management 145

VLANs also use a 12-bit VLAN iden​ ti​

Multi-Pro​to​col Bor​der Gate​ way Pro​ to​

This book ex​ plores VXLAN EVPN, be​ gin​

VXLAN BGP EVPN fea​ tures and func​tion​

• Cisco Nexus 9000 Se​

This book rep​

A spe​ cial thanks to Cisco’s In​sieme and EISG BU Ex​ ec​

We would also like to thank Cyn​thia Brod​er​

We are also gen​

• Adam Hyde (Founder)

Laia and the team cre​

Organization of this book

Read​ers can read this book se​

Why a New Approach

Single-POD VXLAN Design

Single-POD VXLAN Design

External Connectivity for VXLAN Fabrics

Multi-POD and Multi-Site Designs

Operations and Management

Operations and Management

The in​tended au​ di​

Book Writing Methodology

How many en​ gi​

Evolv​ ing user de​mands and ap​ pli​

This chap​ter in​

Why VXLAN Overlay

When net​works were first de​ vel​

Today in the mod​ern data cen​ ter, ap​

An over​lay takes the orig​ i​

tity. This en​

Today, ac​ cord​ ing to mar​ ket re​

Why a Control Plane

When im​ ple​ment​ing an over​

The first task, for​

The third com​ po​

VXLAN Control Plane

VXLAN Control Plane

While the de​ mand for scal​ able net​

Even though VXLAN tech​ nol​ogy has at​tained a con​sid​

Generic Protocol Encapsulation (VXLAN-GPE)

VXLAN-GPE was in​ vented to bring some con​ sol​

NSH en​ ables the pos​si​

Evolution of the EVPN Control Plane

In order to prop​ erly ad​

In the net​ work​

An over​lay net​work typ​

This chap​ter pro​vides an overview of the con​

There are pros and cons to con​sider when se​ lect​

The VXLAN stan​

Figure: VXLAN Packet Format

VXLAN uses an 8-byte header that con​sists of a 24-bit iden​ti​

The VXLAN tun​ nel end​point func​

How Does VXLAN Work?

Figure: VXLAN Overlay Network

At the same time, the un​

Figure: VXLAN Gateway Functions

While flood and learn method​ ol​

VLANs also use a 12-bit VLAN iden ti

Multi-Protocol Border Gate way Pro to

This book ex plores VXLAN EVPN, be gin

VXLAN BGP EVPN fea tures and function

• Cisco Nexus 9000 Se

This book rep

A spe cial thanks to Cisco’s Insieme and EISG BU Ex ec

We would also like to thank Cynthia Broder

We are also gen

Laia and the team cre

Readers can read this book se

The intended au di

How many en gi

Evolv ing user demands and ap pli

This chapter in

When networks were first de vel

Today in the modern data cen ter, ap

An overlay takes the orig i

tity. This en

Today, ac cord ing to mar ket re

When im plementing an over

The first task, for

The third com po

While the de mand for scal able net

Even though VXLAN tech nology has attained a consid

VXLAN-GPE was in vented to bring some con sol

NSH en ables the possi

In order to prop erly ad

In the net work

An overlay network typ

This chapter provides an overview of the con

There are pros and cons to consider when se lect

The VXLAN stan

VXLAN uses an 8-byte header that consists of a 24-bit identi

The VXLAN tun nel endpoint func

At the same time, the un

While flood and learn method ol

In order to ad dress the con cerns of scala

In order to un derstand MP-BGP EVPN func tional

In the EVPN con trol plane, there are tech

For the pur

With the rise of vir tu

The host-based over lay typ

In summary, when eval

With a VXLAN EVPN Fab ric and the as so

For ex ample, proper VLAN and VNI con fig

In conclusion, by leverag ing network-based over lays it is pos

Hybrid VXLAN overlays consist of both host-based soft

Cisco Vir tual Topology Sys

When de sign

This chapter ex

In order to improve the through put and net work perfor

In order to ensure that VXLAN en

A recom mended approach is to use IP unnumbered for the inter

While BGP also has merit as an un der

In a point-to-point net work, the “Broad cast” in

When de ploy

When de ploy

As shown in the configu

EVPN is an other MP-BGP ad dress fam

The EVPN route up

Once a VTEP de tects its local endpoints, it will in

The VXLAN seg ments are indepen

The sam ple below shows the cre

When mul ti

cast groups to the L2V

• Using a unique mul

The gen erally rec