Edit: For those of you who want to take a look first-hand at these packets, the Wireshark PCAP files referenced in this post can be found here.
One of the hottest topics in networking today is Data Center Virtualized Workload Mobility (VWM). For those of you that have been hiding under a rock for the past few years, workload mobility basically means the ability to dynamically and seamlessly reassign hardware resources to virtualized machines, often between physically disparate locations, while keeping this transparent to the end users. This is often accomplished through VMware vMotion, which allows for live migration of virtual machines between sites, or through the similar implementations in Microsoft's Hyper-V and Citrix's Xen hypervisors.
One of the typical requirements of workload mobility is that the hardware resources used must be on the same layer 2 network segment, e.g. the VMware host machines must be in the same IP subnet and VLAN in order to allow for live migration of their VMs. The big design challenge then becomes: how do we allow for live migrations of VMs between Data Centers that are not in the same layer 2 network? One solution to this problem that Cisco has devised is a relatively new technology called Overlay Transport Virtualization (OTV).
As a side result of preparing for INE's upcoming CCIE Data Center Nexus Bootcamp, I've had the privilege (or punishment, depending on how you look at it) of delving deep into the OTV implementation on Nexus 7000. My goal was to find out exactly what was going on behind the scenes with OTV. The problem I ran into, though, was that none of the external Cisco documentation, design guides, white papers, Cisco Live presentations, etc. really contained any of this information. What is out there on OTV is mainly marketing info, i.e. buzzword bingo, or very basic config snippets on how to implement OTV. In this blog post I'm going to discuss the details of my findings about how OTV actually works, with the most astonishing of these results being that OTV is, in fact, a fancy GRE tunnel.
From a high level overview, OTV is basically a layer 2 over layer 3 tunneling protocol. In essence OTV
accomplishes the same goal as other L2 tunneling protocols such as L2TPv3, Any Transport over MPLS (AToM),
or Virtual Private LAN Services (VPLS). For OTV specifically this goal is to take Ethernet frames from an end
station, like a virtual machine, encapsulate them inside IPv4, transport them over the Data Center Interconnect
(DCI) network, decapsulate them on the other side, and out pops your original Ethernet frame.
For this specific application OTV has some inherent benefits over other designs such as MPLS L2VPN with AToM or VPLS. The first is that OTV is transport agnostic. As long as there is IPv4 connectivity between Data Centers, OTV can be used. AToM and VPLS both require that the transport network be MPLS aware, which can limit your selection of Service Providers for the DCI. OTV, on the other hand, can technically be used over regular Internet connectivity.
Another advantage of OTV is that provisioning is simple. AToM and VPLS tunnels are Provider Edge (PE) side protocols, while OTV is a Customer Edge (CE) side protocol. This means that for AToM and VPLS the Service Provider has to pre-provision the pseudowires. Even though VPLS supports enhancements like BGP auto-discovery, provisioning of MPLS L2VPN still requires administrative overhead. OTV is much simpler in this case because, as we'll see shortly, the configuration is just a few commands that are controlled by the CE router, not the PE router.
The next thing we have to consider with OTV is how exactly this layer 2 tunneling is accomplished. After all, we could just configure static GRE tunnels on our DCI edge routers and bridge over them, but this is probably not the best design option for either control plane or data plane scalability.
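For contrast, that manual alternative would look something like the following classic IOS sketch (my own illustration, not from this lab; the interface names and tunnel endpoints are hypothetical, and this ignores all the loop prevention and multihoming problems OTV solves):

! Hypothetical manual alternative: static GRE tunnel plus IRB bridging
bridge irb
bridge 1 protocol ieee
!
interface Tunnel0
 tunnel source 150.1.38.3
 tunnel destination 150.1.78.7
 bridge-group 1
!
interface GigabitEthernet0/1
 description LAN segment being extended between sites
 bridge-group 1

With more than two sites this quickly becomes a full mesh of tunnels with no coordinated loop prevention, which is exactly the scaling problem OTV's control plane addresses.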
The way that OTV implements the control plane portion of its layer 2 tunnel is what is sometimes described as MAC in IP Routing. Specifically, OTV uses Intermediate System to Intermediate System (IS-IS) to advertise the VLAN and MAC address information of the end hosts over the Data Center Interconnect. For those of you that are familiar with IS-IS, this should immediately sound suspect. After all, IS-IS isn't an IP protocol; it's part of the legacy OSI stack. This means that IS-IS is directly encapsulated over layer 2, unlike OSPF or EIGRP, which ride over IP at layer 3. How then can IS-IS be encapsulated over the DCI network that is using IPv4 for transport? The answer? A fancy GRE tunnel.
The next significant portion of OTV's operation is how it actually sends packets in the data plane. Assuming for a moment that the control plane just works, and the DCI edge devices learn about all the MAC addresses and VLAN assignments of the end hosts, how do we actually encapsulate layer 2 Ethernet frames inside of IP to send over the DCI? What if there is multicast traffic running over the layer 2 network? Also, what if there are multiple sites reachable over the DCI? How does it know specifically where to send the traffic? The answer? A fancy GRE tunnel.
Next I want to introduce the specific topology that will be used for us to decode the details of how OTV is working
behind the scenes. Within the individual Data Center sites, the layer 2 configuration and physical wiring is not
relevant to our discussion of OTV. Assume simply that the end hosts have layer 2 connectivity to the edge routers.
Additionally assume that the edge routers have IPv4 connectivity to each other over the DCI network. In this
specific case I chose to use RIPv2 for routing over the DCI (yes, you read that correctly), simply so I could filter it
from my packet capture output, and easily differentiate between the routing control plane in the DCI transport
network vs. the routing control plane that was tunneled inside OTV between the Data Center sites.
N7K1-3:
vlan 172
name OTV_EXTEND_VLAN
!
vlan 999
name OTV_SITE_VLAN
!
spanning-tree vlan 172 priority 4096
!
otv site-vlan 999
otv site-identifier 0x101
!
interface Overlay1
otv join-interface Ethernet1/23
otv control-group 224.100.100.100
otv data-group 232.1.2.0/24
otv extend-vlan 172
no shutdown
!
interface Ethernet1/23
ip address 150.1.38.3/24
ip igmp version 3
ip router rip 1
no shutdown
N7K2-7:
vlan 172
name OTV_EXTEND_VLAN
!
vlan 999
name OTV_SITE_VLAN
!
spanning-tree vlan 172 priority 4096
!
otv site-vlan 999
otv site-identifier 0x102
!
interface Overlay1
otv join-interface port-channel78
otv control-group 224.100.100.100
otv data-group 232.1.2.0/24
otv extend-vlan 172
no shutdown
!
interface port-channel78
ip address 150.1.78.7/24
ip igmp version 3
ip router rip 1
As you can see, the configuration for OTV really isn't that involved. The specific portions of the configuration that are relevant are as follows:
Extend VLANs
These are the layer 2 segments that will actually get tunneled over OTV. Basically these are the VLANs that your virtual machines reside on, between which you want to do the VM mobility. In our case this is VLAN 172, which maps to the IP subnet 172.16.0.0/24.
Site VLAN
Used to synchronize the Authoritative Edge Device (AED) role within an OTV site. This matters when you have more than one edge router per site. OTV only allows a specific Extend VLAN to be tunneled by one edge router at a time for the purpose of loop prevention. Essentially this Site VLAN lets the edge routers talk to each other and figure out which one is active/standby on a per-VLAN basis for the OTV tunnel. The Site VLAN should not be included in the extend VLAN list.
Site Identifier
Should be unique per DC site. If you have more than one edge router per site, they must agree on the Site Identifier, as it's used in the AED election.
Overlay Interface
The logical OTV tunnel interface.
OTV Join Interface
The physical link or port-channel that you use to route upstream towards the DCI.
OTV Control Group
Multicast address used to discover the remote sites in the control plane.
OTV Data Group
Used when youre tunneling multicast traffic over OTV in the data plane.
IGMP Version 3
Needed to send (S,G) IGMP Report messages towards the DCI network on the Join Interface.
At this point that's basically all that's involved in the implementation of OTV. It just works, because all the behind-the-scenes stuff is hidden from us from a configuration point of view. A quick test of this from the end hosts shows us that:
R2#ping 255.255.255.255
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 255.255.255.255, timeout is 2 seconds:
Reply to request 0 from 172.16.0.3, 4 ms
Reply to request 1 from 172.16.0.3, 1 ms
Reply to request 2 from 172.16.0.3, 1 ms
Reply to request 3 from 172.16.0.3, 1 ms
Reply to request 4 from 172.16.0.3, 1 ms
R2#traceroute 172.16.0.3
Type escape sequence to abort.
Tracing the route to 172.16.0.3
VRF info: (vrf in name/id, vrf out name/id)
  1 172.16.0.3 0 msec *  0 msec
The fact that R3 responds to R2's packets going to the all hosts broadcast address (255.255.255.255) implies that they are in the same broadcast domain. How specifically is it working, though? That's what took a lot more investigation.
To simplify the packet level verification a little further, I changed the MAC address of the four end devices that are
used to generate the actual data plane traffic. The Device, IP address, and MAC address assignments are as
follows:
The first thing I wanted to verify in detail was what the data plane looked like, and specifically what type of tunnel
encapsulation was used. With a little searching I found that OTV is currently on the IETF standards track in draft
format. As of writing, the newest draft is draft-hasmit-otv-03. Section 3.1 Encapsulation states:
3. Data Plane
3.1. Encapsulation

[Bit diagram from the draft, summarized: an outer IPv4 header (Version, IHL, Type of Service, Total Length, Identification, Flags, Fragment Offset, Header Checksum, etc.); a UDP header with UDP Checksum = 0; and an OTV shim header consisting of a flags octet |R|R|R|R|I|R|R|R|, an Overlay ID field, an Instance ID field, and a Reserved field; followed by the original frame.]
A quick PING sweep of packet lengths with the Don't Fragment bit set allowed me to find the encapsulation overhead, which turns out to be 42 bytes, as seen below:
None of my testing, however, could verify what the encapsulation header was. The draft says that the transport is supposed to be UDP port 8472, but none of my logging produced results showing that any UDP traffic was even in the transit network (save for my RIPv2 routing). After much frustration, I finally broke out the sniffer and took some packet samples. The first capture below shows a normal ICMP ping between R2 and R3.
MPLS? GRE? Where did those come from? That's right, OTV is in fact a fancy GRE tunnel. More specifically, it is an Ethernet over MPLS over GRE tunnel. My poor little PINGs between R2 and R3 are in fact encapsulated as ICMP over IP over Ethernet over MPLS over GRE over IP over Ethernet (IoIoEoMPLSoGREoIP for short). Let's take a closer look at the encapsulation headers now:
In the detailed header output we see our transport Ethernet header, which in a real deployment can be anything depending on what the transport of your DCI is (Ethernet, POS, ATM, Avian Carrier, etc.). Next we have the IP OTV tunnel header, which surprised me in a few aspects. First, all documentation I read said that without the use of an OTV Adjacency Server, unicast can't be used for transport. This is only true up to a point. Multicast, it turns out, is only used to establish the control plane, and to tunnel multicast over multicast in the data plane. Regular unicast traffic over OTV is encapsulated as unicast, as seen in this capture.
The next header after IP is GRE. In other words, OTV is basically the same as configuring a static GRE tunnel between the edge routers and then bridging over them, along with some enhancements (hence fancy GRE). The OTV enhancements (which we'll talk about shortly) are the reason why you wouldn't just configure GRE statically. Nevertheless this surprised me, because even in hindsight the only mention of OTV using GRE I found was here. What's really strange about this is that Cisco's OTV implementation doesn't follow what the standards track draft says, which is UDP, even though the authors of the OTV draft are Cisco engineers. Go figure.
The next header, MPLS, makes sense since the prior encapsulation is already GRE. Ethernet over MPLS over GRE is already well defined and used in deployment, so there's no real reason to reinvent the wheel here. I haven't verified this in detail yet, but I'm assuming that the MPLS label value would be used in cases where the edge router has multiple overlay interfaces, in which case the label in the data plane would quickly tell it which overlay interface the incoming packet is destined for. This logic is similar to MPLS L3VPN, where the bottom-of-stack VPN label tells a PE router which CE facing link the packet is ultimately destined for. I'm going to do some more testing later with a larger, more complex topology to actually verify this, though, as all data plane traffic over this tunnel is currently sharing the same MPLS label value.
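To make the GRE and MPLS portions of the header stack concrete, here is a small Python sketch (my own illustration, not Cisco code) that packs the two 4-byte headers as they appear on the wire in these captures: a basic GRE header whose Protocol Type is 0x8847 (MPLS unicast), followed by a 32-bit MPLS label stack entry (20-bit label, 3 EXP bits, bottom-of-stack bit, 8-bit TTL):

```python
import struct

GRE_PROTO_MPLS_UNICAST = 0x8847  # EtherType value carried in the GRE header

def gre_header() -> bytes:
    """Basic 4-byte GRE header: no checksum/key/sequence flags set,
    protocol type = MPLS unicast."""
    return struct.pack("!HH", 0x0000, GRE_PROTO_MPLS_UNICAST)

def mpls_label_word(label: int, exp: int = 0, bos: int = 1, ttl: int = 255) -> bytes:
    """Pack a 32-bit MPLS label stack entry: label(20) | EXP(3) | S(1) | TTL(8)."""
    word = (label << 12) | (exp << 9) | (bos << 8) | ttl
    return struct.pack("!I", word)

def unpack_label(entry: bytes) -> int:
    """Recover the 20-bit label value from a packed label stack entry."""
    (word,) = struct.unpack("!I", entry)
    return word >> 12

# Together these two headers account for 8 of OTV's 42 bytes of overhead
shim = gre_header() + mpls_label_word(100)
print(len(shim))               # 8
print(unpack_label(shim[4:]))  # 100
```

The label value 100 here is arbitrary; the point is only the bit layout that a sniffer has to decode to get from the GRE payload to the inner Ethernet frame.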
Next we see the original Ethernet header, which is sourced from R2's MAC address 0000.0000.0002 and destined to R3's MAC address 0000.0000.0003. Finally we have the original IP header and the final ICMP payload. The key with OTV is that this inner Ethernet header and its payload remain untouched, so from the end host perspective it looks like all the devices are just on the same LAN.
Now that it was apparent that OTV was just a fancy GRE tunnel, the IS-IS piece fell into place. Since IS-IS runs directly over layer 2 (e.g. Ethernet), and OTV is an Ethernet over MPLS over GRE tunnel, IS-IS can encapsulate as IS-IS over Ethernet over MPLS over GRE (phew!). To test this, I changed the MAC address of one of the end hosts and looked at the IS-IS LSP generation of the edge devices. After all, the goal of the OTV control plane is to use IS-IS to advertise the MAC addresses of end hosts in that particular site, as well as the particular VLAN that they reside in. The configuration steps and packet capture result of this are as follows:
R3#conf t
Enter configuration commands, one per line.
R3(config)#int gig0/0
R3(config-if)#mac-address 1234.5678.9abc
R3(config-if)#
*Aug 17 22:17:10.883: %LINK-5-CHANGED: Interface GigabitEthernet0/0, changed state to reset
*Aug 17 22:17:11.883: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0, changed
state to down
*Aug 17 22:17:16.247: %LINK-3-UPDOWN: Interface GigabitEthernet0/0, changed state to up
*Aug 17 22:17:17.247: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0, changed
state to up
The first thing I noticed about the IS-IS encoding over OTV is that it uses IPv4 Multicast. This makes sense, because if you have 3 or more OTV sites you don't want to have to send your IS-IS LSPs as replicated Unicast. As long as all of the AEDs at all sites have joined the control group (224.100.100.100 in this case), the LSP replication should be fine. This multicast forwarding can also be verified in the DCI transport network core in this case as follows:
N7K2-8#show ip mroute
IP Multicast Routing Table for VRF "default"
(*, 224.100.100.100/32), uptime: 20:59:33, ip pim igmp
Incoming interface: Null, RPF nbr: 0.0.0.0
Outgoing interface list: (count: 2)
port-channel78, uptime: 20:58:46, igmp
Ethernet1/29, uptime: 20:58:53, igmp
(150.1.38.3/32, 224.100.100.100/32), uptime: 21:00:05, ip pim mrib
Incoming interface: Ethernet1/29, RPF nbr: 150.1.38.3
Outgoing interface list: (count: 2)
port-channel78, uptime: 20:58:46, mrib
Ethernet1/29, uptime: 20:58:53, mrib, (RPF)
(150.1.78.7/32, 224.100.100.100/32), uptime: 21:00:05, ip pim mrib
Incoming interface: port-channel78, RPF nbr: 150.1.78.7
Outgoing interface list: (count: 2)
port-channel78, uptime: 20:58:46, mrib, (RPF)
Ethernet1/29, uptime: 20:58:53, mrib
(*, 232.0.0.0/8), uptime: 21:00:05, pim ip
Incoming interface: Null, RPF nbr: 0.0.0.0
Outgoing interface list: (count: 0)
Note that N7K1-3 (150.1.38.3) and N7K2-7 (150.1.78.7) have both joined the (*, 224.100.100.100) group. A very important point here is that the control group for OTV is an Any Source Multicast (ASM) group, not a Source Specific Multicast (SSM) group. This implies that your DCI transit network must run PIM Sparse Mode and have a Rendezvous Point (RP) configured in order to build the shared tree (RPT) for the OTV control group used by the AEDs. You technically could use Bidir, but you really wouldn't want to for this particular application. The way they chose to implement this kind of surprised me, because there are already more efficient ways of doing source discovery for SSM, for example how Multicast MPLS L3VPN uses the BGP Multicast MDT AFI/SAFI to advertise the (S,G) pairs of the PE routers. I suppose the advantage of doing OTV this way, though, is that it makes the OTV config very straightforward from an implementation point of view on the AEDs, and you don't need an extra control plane protocol like BGP to exchange the (S,G) pairs before you actually join the tree. The alternative to this of course is to use the Adjacency Server and just skip using multicast altogether. This however will result in unicast replication in the core, which can be bad, mkay?
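Given that ASM requirement, the DCI core routers would need something along these lines (a sketch only; the RP address and interface here are hypothetical, not from this lab):

! Sketch: PIM Sparse Mode with a static RP covering the OTV control group
feature pim
!
ip pim rp-address 150.1.0.1 group-list 224.0.0.0/4
!
interface Ethernet1/29
 ip pim sparse-mode

Any of the usual RP mechanisms (static, Auto-RP, BSR, Anycast-RP) would do, as long as a shared tree can be built for 224.100.100.100.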
Also, for added fun, the actual MAC address routing table in the IS-IS control plane can be verified as follows:
VLAN MAC-Address    Metric  Uptime    Owner    Next-hop(s)
---- -------------- ------  --------  -------  -----------
172  0000.0000.0002         01:22:06  site     port-channel27
172  0000.0000.0003 42      01:20:51  overlay  N7K1-3
172  0000.0000.000a 42      01:18:11  overlay  N7K1-3
172  0000.0000.001e         01:20:36  site     port-channel27
172  1234.5678.9abc 42      00:19:09  overlay  N7K1-3
LSPID                 Seq Number   Checksum  Lifetime  A/P/O/T
N7K2-7.00-00        * 0x000000A3   0xA36A    893       0/0/0/1
    Instance      :  0x000000A3
    Area Address  :  00
    NLPID         :  0xCC 0x8E
    Hostname      :  N7K2-7               Length : 6
    Extended IS   :  N7K1-3.01            Metric : 40
    Vlan : 172 : Metric : 1    MAC Address : 0000.0000.001e
    Vlan : 172 : Metric : 1    MAC Address : 0000.0000.0002
    Digest Offset :
N7K1-3.00-00          0x00000099   0xBAA4    1198      0/0/0/1
    Instance      :  0x00000094
    Area Address  :  00
    NLPID         :  0xCC 0x8E
    Hostname      :  N7K1-3               Length : 6
    Extended IS   :  N7K1-3.01            Metric : 40
    Vlan : 172 : Metric : 1    MAC Address : 1234.5678.9abc
    Vlan : 172 : Metric : 1    MAC Address : 0000.0000.000a
    Vlan : 172 : Metric : 1    MAC Address : 0000.0000.0003
    Digest Offset :
N7K1-3.01-00          0x00000090   0xCBAB    718       0/0/0/1
    Instance      :  0x0000008E
    Extended IS   :  N7K2-7.00            Metric : 0
    Extended IS   :  N7K1-3.00            Metric : 0
    Digest Offset :
So at this point we see that our ICMP PING was actually ICMP over IP over Ethernet over MPLS over GRE over IP over Ethernet, and our routing protocol was IS-IS over Ethernet over MPLS over GRE over IP over Ethernet :/ What about multicast in the data plane, though? At this point verification of multicast over the DCI core is pretty straightforward, since we can just enable a routing protocol that uses multicast, like EIGRP, and look at the result. This can be seen below:
R2#config t
Enter configuration commands, one per line.
R2(config)#router eigrp 1
R2(config-router)#no auto-summary
R2(config-router)#network 0.0.0.0
R2(config-router)#end
R2#
R3#config t
Enter configuration commands, one per line.
R3(config)#router eigrp 1
R3(config-router)#no auto-summary
R3(config-router)#network 0.0.0.0
R3(config-router)#end
R3#
*Aug 17 22:39:43.419: %SYS-5-CONFIG_I: Configured from console by console
*Aug 17 22:39:43.423: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.0.2 (GigabitEthernet0/0) is
up: new adjacency
R3#show ip eigrp neighbors
IP-EIGRP neighbors for process 1
H   Address          Interface   Hold  Uptime    SRTT  RTO   Q    Seq
                                 (sec)           (ms)        Cnt  Num
0   172.16.0.2       Gi0/0       11    00:00:53        200
Our EIGRP adjacency came up, so multicast obviously is being tunneled over OTV. Let's see the packet capture result:

We can see EIGRP being tunneled inside the OTV payload, but what's with the outer header? Why is EIGRP using the ASM 224.100.100.100 group instead of the SSM 232.1.2.0/24 data group? My first guess was that link local multicast (i.e. 224.0.0.0/24) gets encapsulated as control plane instead of as data plane. This would make sense, because you would want control plane protocols like OSPF, EIGRP, PIM, etc. tunneled to all OTV sites, not just the ones that joined the SSM feeds. To test if this was the case, the only change I needed to make was to have one router join a non-link-local multicast group, and have the other router send ICMP pings. Since they're effectively in the same LAN segment, no PIM routing is needed in the DC sites, just basic IGMP Snooping, which is enabled in NX-OS by default. The config on the IOS routers is as follows:
R2#config t
Enter configuration commands, one per line.
R2(config)#ip multicast-routing
R2(config)#int gig0/0
R2(config-if)#ip igmp join-group 224.10.20.30
R2(config-if)#end
R2#
R3#ping 224.10.20.30 repeat 1000 size 1458 df-bit
Type escape sequence to abort.
Sending 1000, 1458-byte ICMP Echos to 224.10.20.30, timeout is 2 seconds:
Packet sent with the DF bit set
Reply to request 0 from 172.16.0.2, 1 ms
Reply to request 1 from 172.16.0.2, 1 ms
Reply to request 2 from 172.16.0.2, 1 ms
This was more as expected. Now the multicast data plane packet was getting encapsulated as ICMP over IP over Ethernet over MPLS over GRE over IP *Multicast* over Ethernet, using the OTV data group. The payload wasn't decoded, as I think even Wireshark was dumbfounded by this string of encapsulations.
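The encapsulation choice observed above can be summarized in a few lines of Python (my own sketch of the behavior seen in the captures, not Cisco's actual selection logic; the group addresses are the ones from this lab):

```python
import ipaddress

# Groups from the lab config above
CONTROL_GROUP = "224.100.100.100"   # ASM control group
DATA_GROUP_POOL = "232.1.2.0/24"    # SSM data group range

LINK_LOCAL = ipaddress.ip_network("224.0.0.0/24")

def otv_outer_group(inner_dest: str) -> str:
    """Pick the outer multicast group OTV appears to use when tunneling
    a multicast frame, based on the captures in this post."""
    addr = ipaddress.ip_address(inner_dest)
    if addr in LINK_LOCAL:
        # Link-local multicast (OSPF, EIGRP, PIM hellos, etc.) rides the
        # ASM control group so it reaches every OTV site
        return CONTROL_GROUP
    # Other multicast traffic is mapped into the SSM data group range
    return f"a group from {DATA_GROUP_POOL}"

print(otv_outer_group("224.0.0.10"))    # EIGRP hellos -> 224.100.100.100
print(otv_outer_group("224.10.20.30"))  # -> a group from 232.1.2.0/24
```

How a given non-link-local group maps to a specific address inside 232.1.2.0/24 is not visible from these captures, so the sketch deliberately leaves that unspecified.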
In summary we can make the following observations about OTV:
OTV encapsulation has 42 bytes of overhead that consists of:
New Outer Ethernet II Header 14 Bytes
New Outer IP Header 20 Bytes
GRE Header 4 Bytes
MPLS Header 4 Bytes
OTV uses both Unicast and Multicast transport
ASM Multicast is used to build the control plane for OTV (IS-IS, ARP, IGMP, EIGRP, etc.)
Unicast is used for normal unicast data plane transmission between sites
SSM Multicast is used for normal multicast data plane transmission between sites
Optionally ASM & SSM can be replaced with the Adjacency Server
GRE is the ultimate band-aid of networking
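That 42-byte figure and the 1458-byte DF-bit pings used earlier line up arithmetically; here is a quick sanity check in Python (the 1500-byte core MTU is an assumption about this lab's transport links):

```python
# OTV encapsulation overhead, per the header breakdown above
OUTER_ETHERNET = 14   # new outer Ethernet II header
OUTER_IP       = 20   # new outer IPv4 header
GRE            = 4    # basic GRE header
MPLS           = 4    # single MPLS label stack entry

overhead = OUTER_ETHERNET + OUTER_IP + GRE + MPLS
print(overhead)  # 42

# With an (assumed) 1500-byte IP MTU in the DCI core, the largest inner IP
# packet that gets through with DF set is 1500 - 42 = 1458 bytes, exactly
# the size used for the DF-bit multicast pings earlier in the post.
# (The inner Ethernet header and the outer one are both 14 bytes, so the
# arithmetic works out the same either way you count the 42.)
CORE_MTU = 1500
max_inner_packet = CORE_MTU - overhead
print(max_inner_packet)  # 1458
```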
Now the next time someone is throwing around fancy buzzwords about OTV, DCI, VWM, etc., you can say "oh, you mean that fancy GRE tunnel?"

I'll be continuing this series in the coming days and weeks with other Data Center, and specifically CCIE Data Center, related technologies. If you have a request for a specific topic or protocol that you'd like to see the behind-the-scenes details of, drop me a line at bmcgahan@ine.com.
Happy Labbing!
Tags: gre, IS-IS, MPLS, multicast, otv
About Brian McGahan, CCIE #8593:
Brian McGahan was one of the youngest engineers in the world to obtain the CCIE, having achieved his first CCIE in Routing & Switching at the age of 20 in 2002. Brian has been teaching and developing CCIE training courses for over 8 years, and has assisted thousands of engineers in obtaining their CCIE certification. When not teaching or developing new products Brian consults with large ISPs and enterprise customers in the midwest region of the United States.
Find all posts by Brian McGahan, CCIE #8593 | Visit Website
Joshua Walton (August 17, 2012 at 11:54 pm):
Well done, Brian!
Luciano (August 18, 2012 at 12:01 am):
Brian,
Sorry for being off-topic, but I don't know where else to ask this. I am interested in the new CCNP SP track and I have noticed that currently there is no specific documentation for this track. I would love to see a CCNP SP ATC from you.. there is really nothing like that or, for that matter, specific documentation out there right now.. time is precious, so having to-the-point training materials would be convenient.
Thanks
laurent (August 18, 2012 at 2:10 am):
Simply amazing post Brian!!!! That is just Wowwww!
Thanks a lot for sharing!
Regards,
Laurent
Al (August 18, 2012 at 4:46 am):
Hello Brian, this is by far the best explanation of OTV I have come across.
Would love to see something in the same vein for virtual port channels
Alexander Lim (August 18, 2012 at 7:04 am):
Excellent post Brian!
Do you agree with Cisco that OTV is the best way of extending VLANs over DCI for an enterprise network?
Suggestion for next post: FabricPath.
CJ Infantino (August 20, 2012 at 7:52 am):
So you're saying STP stops at the OTV Edge devices? Does that mean STP is only local to each DC?
That would be preferable if that was the case. It never seems like a good idea to have a L2 DCI. Stretching VLANs is just creating one massive failure domain.
Now if you have a broadcast storm, etc., in one VLAN in DC1, you're taking out DC2 as well. Not a good design if you ask me.
CJ
AB (September 1, 2012 at 8:32 pm):
Hi Brian,
Is OTV VRF-aware, i.e. can I have the Join-interface in a VRF? Also, does the latest NX-OS support SVI or loopback interfaces in the OTV VDC?
Thanks.
AB.
Krunal (August 18, 2012 at 3:15 pm):
Excellent post Brian, you really nailed it down. I was so surprised to see that the Ethernet frames are not encapsulated in UDP but actually in GRE over MPLS. Cisco might have done this because encapsulating TCP packets over UDP would not make sense. One thing I figured out is that UDP and GRE+MPLS have the same header length, 16 bytes. If you look at the OTV draft at http://tools.ietf.org/pdf/draft-hasmit-otv-03.pdf, the expiration date is Jan 9th 2012. Also, on the newer implementation of OTV on ASR 1000, the show otv command actually tells you it does GRE/IPv4 encapsulation. I suppose when the draft is resubmitted it will include the GRE+MPLS changes. From Cisco's OTV implementation point of view, the ASIC on the line card does not need to change, as the total 42 bytes of overhead remains intact in both implementations and the ASIC can parse the same length of packet. This would be a software change instead.
Just for curiosity and for everyone's benefit, if you can upload the actual .pcap file, I would really appreciate it.
August 18, 2012 at 5:38 pm
Krunal
Thank you Brian.
Aryan (August 18, 2012 at 9:22 pm):
Hey Brian,
Can you do a detailed post on how vPC works (low level information) and how layer 2 loop prevention happens in a vPC scenario?
Also covering details of vPC vs vPC+
Thanks!
Marcio Costa (August 19, 2012 at 9:19 am):
You rock man!! Thanks for the blog about it.
It would be good to see a comparison of these DCI technologies from the design point of view.
Marcio Costa (August 20, 2012 at 6:26 pm):
Thanks Brian. You got my point. I'm already waiting to read this article/blog
August 28, 2012 at 2:12 am
Surya
Hi all.
To me OTV is no go in DCI area. Just because it doesnt support unknown unicast flooding.
How many customers use Microsft NLB or other similar technologies ?
My prefered scenario is currently Trill over DWDM.
Richard Chan (August 20, 2012 at 5:59 pm):
Hi Brian,
"the VMware Host machines must be in the same IP subnet and VLAN in order to allow for live migration their VMs."
Could you clarify: I thought it is the VMware guest that must end up in the same VLAN; the vMotion VLANs can be routed (though that might not be supported).
Devang (August 23, 2012 at 11:34 am):
How is OTV different from VPLS?
Rob (August 27, 2012 at 4:37 am):
Great post, thanks a lot. Looking forward to the Data Center curriculum (hopefully online classes?)
You made an argument for OTV vs VPLS and AToM, but what about L2TP? Are OTV and L2TP not pretty much the same, with L2TP being supported by cheaper hardware?
Brett Gianpetro (August 29, 2012 at 8:44 am):
Nice work. This is good stuff.
I am wondering if the MPLS label is used to encode the VLAN information. The Cisco OTV FAQ that you linked to notes that an OTV shim is added to the header to encode VLAN information. I'm thinking that maybe OTV shim = MPLS label. Can you try to extend another VLAN over OTV using the same overlay interface and see if it generates a different MPLS label?
Jake Howering (August 29, 2012 at 10:18 pm):
Really a great post. Understanding the header details has significant implications re: HW support for other similar overlay technologies like VXLAN and LISP.
Nice job!
Victor Moreno (August 31, 2012 at 4:22 pm):
Very nice work Brian. And clearly well received by the readers.
Regarding the encapsulation: keep in mind that OTV is shipping on ASICs that were finalized well before we even conceived OTV. So to get the solution to the users, we worked with the existing ASIC capabilities and managed to deliver a hardware accelerated Ethernet in IP tunnel. As you well describe, the trick is to do this in two stages and utilize the capabilities of the existing hardware (Eth-in-MPLS + MPLS-in-GRE). We do not really use the MPLS bits for any MPLS purpose (that is a very important point); we use it as the OTV shim that carries segment (VLAN) information.
We did, however, design OTV with an ideal header in mind, and that is what we proposed to the standards bodies. It is also what you will see in future hardware, as well as in the new wave of technology proposals (LISP and VXLAN use the same exact header proposed by OTV to the IETF).
I wouldn't trivialize OTV as just a fancy GRE tunnel. Although we use the GRE encap (because the secret sauce isn't really in the encapsulation), the real value OTV brings is in its control plane and the way it handles traffic and simplifies configuration. In other words, the encap is not really that important, and we are working on getting most of these to converge into one.
Victor Moreno (September 7, 2012 at 5:57 pm):
Hi Brian,
Here are the answers to your questions from a little while back.
Q: I have a few questions if you don't mind. With the current implementation, does the label value encode the VLAN?
A: Yes.
Q: Are there plans to move towards the UDP encap in future releases?
A: Yes, there are benefits to the UDP encap, so once the HW is available, we will support both modes.
Q: If the EoMPLSoGRE is already hardware accelerated in the ASIC, what's the advantage of using the UDP encap?
A: The UDP encap is more efficient, but more importantly, the UDP encap allows better entropy, as the core devices can hash on UDP port numbers and the encapsulated traffic doesn't get polarized to a single path.
Q: Why not just amend the next draft proposal with the EoMPLSoGRE format?
A: That would be very confusing, as the ideal encap is UDP and our other overlay efforts are converging on this UDP encap (LISP, VXLAN, OTV). EoMPLSoGRE was simply a way to get to market in a timely manner.
Q: Another point that others have brought up is the security of OTV. I'm sure, as you know, even in MPLS L3VPN environments many designs require encryption due to compliance. Are there any plans to integrate GETVPN or other similar tunneling techniques into the AED itself, or is it assumed that this should be done on your true L3 edge device, such as an ASR1K upstream of the AED?
A: The right tools for the right job in the right places. The encapsulated traffic can be easily encrypted by the WAN edge routers, like all other inter-DC traffic is encrypted. No need to raise the cost of high density port offerings like the N7K by adding crypto HW, as you probably want to manage the policy at the WAN edge anyway and there is little incentive to encrypt the traffic between DC aggregation and WAN edge.
Q: I saw some recent documents talking about the integration of GET and LISP on ASR.
A: Yes, LISP plays a role in CPE devices like the ASR1K and ISRs, therefore it makes sense for the product to support such a solution.
The ASR implementation of OTV is also integrated with crypto, allowing the encapsulated traffic to be encrypted. The model is similar to what you would do across multiple boxes in larger networks, only you do it on a single router at smaller sites that don't require the speeds and densities of a Nexus switching infrastructure.
AB (September 1, 2012 at 8:38 pm):
Hi Brian,
Is OTV VRF-aware, i.e. can I have the Join-interface in a VRF? Also, is the limitation of a separate VDC for SVI routing removed in the newer versions of NX-OS, and can SVI or Loopback interfaces be used as Join-interfaces?
Thanks.
AB.