When you create a Universal or “regular” Distributed Logical Router in NSX you have the option of deploying an appliance along with it.
This post will cover why you would deploy an appliance along with a DLR, how to configure the appliance for high availability, and how the DLR high availability feature works.
Deploy if you need dynamic routing on your DLR
Enable HA and add an IP address on a dedicated logical switch
Disable vSphere HA for the DLR appliance
Pick Universal if you'll eventually go cross-vCenter and will only connect to logical switches (no distributed port groups)
Which is which?
UDLR vs DLR
The Universal Distributed Logical Router (UDLR), introduced with cross-vCenter NSX in 6.2, requires a universal Segment ID pool, a universal transport zone, and universal logical switches (which are created by selecting a universal transport zone during switch creation).
When you add interfaces to a UDLR the only option is Logical Switches and the table will only show Universal Logical Switches to pick from.
An appliance deployed with a DLR will have a simple name in vSphere like MyFirstDLR-0
UDLR appliances have names like edge-768ba914-619b-40de-a5a7-4cf8d0c0640e-0-UDLR
Both DLR and UDLR create the same appliance – a 1 vCPU / 512 MB RAM VM with 1 GB of disk (two ~500 MB VMDKs). Interestingly, the type is vShield Edge:
Which is also what it greets you with at the initial login:
To deploy or not to deploy
Speaking of deploying DLR appliances – do you need one?
The NSX UI says under “Deploy Edge Appliance”:
Deploys NSX Edge Appliance to support Firewall and Dynamic routing.
and for “Enable High Availability”:
Enable HA, for enabling and configuring High Availability.
However, you can configure the DLR firewall without deploying an appliance (you can test this in Hands On Labs in about 5 minutes). I believe the UI means you can’t protect the appliance itself with a firewall unless you deploy one – which is a little circular 🙂
The official 6.3 docs state in the section “Add a Logical (Distributed) Router”:
An edge appliance (also called a logical router virtual appliance) is required for dynamic routing and the logical router appliance’s firewall, which applies to logical router pings, SSH access, and dynamic routing traffic.
All of which appears to mean that an appliance is required for SSH access (which would be to the deployed appliance) and dynamic routing. The “logical router pings” must mean pinging from the appliance itself – because you can certainly ping the configured IPs for the DLR without an appliance deployed.
In the same section the docs state:
High availability is required if you are planning to do dynamic routing.
which is not true: you only need a deployed appliance (with or without HA) for dynamic routing to work.
Note that if you choose to deploy an appliance you must select an HA Interface Configuration (logical switch or distributed port group) even if you didn’t enable HA. However, you don’t have to add an IP address – that’s optional, and new to version 6.3.
More on high availability
If you enable DLR HA during deployment, it automatically creates two appliances; if you specified two different resource pool/datastore pairs, it creates one in each location. And, like other VMs managed by NSX Manager, it sets them to auto-start on the host.
If you don’t enable DLR HA, you can add another appliance later but it will just sit there undeployed until HA is enabled.
Note that when you enable HA, the Logical Router Appliances table gets a new column to let you know whether the heartbeat is up.
With high availability enabled, dynamic routing functionality can be recovered in seconds instead of the time it would take vSphere High Availability to restart the appliance.
By default, the secondary appliance waits for about 15 seconds after a failure to take over dynamic routing updates. This delay is configurable as “Declare Dead Time” in the HA Configuration tile in the DLR configuration tab.
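If you prefer the API, the declare-dead-time also appears in the edge’s high-availability configuration. Roughly like the fragment below – this is a hedged sketch from memory, not copied from the API guide, and `edge-1` is a placeholder edge ID; check your NSX for vSphere API documentation before using it:

```xml
PUT /api/4.0/edges/edge-1/highavailability/config

<highAvailability>
  <enabled>true</enabled>
  <!-- seconds of silence before the standby declares the active dead -->
  <declareDeadTime>15</declareDeadTime>
</highAvailability>
```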
Using DLR HA also keeps the VM under NSX management – the configuration could break if vSphere HA restarted it on a different host.
Speaking of NSX management – if you need to move the appliance between hosts or datastores, use the edge management page instead of vSphere tools.
I’m not entirely sure how configuration updates are made to the control VM – since network connectivity to the NSX manager is not a requirement. Possibly via VMware Tools? If anyone knows please drop me a line.
How DLR HA works
Note that this is via observation, Wireshark and contemplation. I’ll update if I learn more.
When you configured HA on 6.2, there was no option to add an IP address; you just picked a logical switch or port group to connect to.
6.3 added the ability to configure an HA IP address.
This IP is used as an extra check before the secondary appliance takes over. I see no traffic referencing the IP other than an ARP from the secondary after a failure of the primary.
DLR IP Fun
If you deploy a DLR appliance with no interfaces, the VM will only show automatic IPv6 addresses.
As soon as you enable HA, the appliance gets an IPv4 APIPA address:
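APIPA here means an IPv4 link-local address from 169.254.0.0/16 (RFC 3927) – the range hosts self-assign without DHCP. A quick sketch to recognize one (the function name and addresses are my own examples, not anything from NSX):

```python
import ipaddress

def is_apipa(addr: str) -> bool:
    """True if addr is an IPv4 link-local (APIPA) address, i.e. in 169.254.0.0/16."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local

print(is_apipa("169.254.1.23"))   # True  - the kind of address the HA appliances pick
print(is_apipa("192.168.10.1"))   # False - a normal configured interface IP
```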
If you configure the HA IP address, that will show up as well:
A fully configured DLR vm with multiple interfaces and HA enabled will show a bunch of IPs.
Note the DLR VM has all of the same IP addresses configured for the DLR. At some point I’ll post about the IP addresses, how configuring OSPF can change them, and the difference between uplink and internal interfaces.
For now, know that while the two high-availability VMs have almost the same IPs, they have different APIPA addresses and different MAC addresses on all interfaces.
One of the VMs is set to Active by NSX, and that is the VM exchanging dynamic routing updates.
Meanwhile, the two HA VMs are chatting constantly via the APIPA addresses.
If the secondary sees about 15 seconds of silence from the primary, it tries an ARP to the HA IP (if configured).
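That detection logic can be sketched as a tiny state machine. This is a toy model of the observed behavior only – the class, method names, and structure are my invention, not anything from the NSX appliance:

```python
import time

class StandbyMonitor:
    """Toy model of the standby appliance's failure detection:
    declare-dead timer plus an ARP sanity check on the HA IP."""

    def __init__(self, declare_dead_time=15.0):
        self.declare_dead_time = declare_dead_time
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        """Called whenever a heartbeat arrives from the active appliance."""
        self.last_heartbeat = time.monotonic()

    def peer_silent(self, now=None):
        """Has the active been quiet longer than the declare-dead time?"""
        now = time.monotonic() if now is None else now
        return (now - self.last_heartbeat) > self.declare_dead_time

    def should_take_over(self, ha_ip_answered_arp, now=None):
        """Take over only if the peer is silent AND the HA IP
        (if configured) did not answer the ARP probe."""
        return self.peer_silent(now) and not ha_ip_answered_arp
```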
If there is no response, the secondary takes over dynamic routing functionality and sends some gratuitous ARPs to update other VMs’ ARP tables (I believe only routing partners need this), since the MAC address behind the IPs has changed.
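A gratuitous ARP is just an ordinary broadcast ARP frame whose sender and target IP are the same, so every host that holds the IP in its cache refreshes the MAC. A minimal construction (the MAC and IP values are made up for illustration):

```python
import struct

def gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP frame: broadcast Ethernet destination,
    sender IP == target IP, so receivers refresh their ARP caches."""
    assert len(mac) == 6 and len(ip) == 4
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)  # dst, src, EtherType=ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)      # Ethernet/IPv4, op=reply
    arp += mac + ip                                      # sender MAC + sender IP
    arp += b"\xff" * 6 + ip                              # target MAC (broadcast) + same IP
    return eth + arp

frame = gratuitous_arp(bytes.fromhex("005056000001"), bytes([192, 168, 10, 1]))
```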
When this happens, NSX records a 30202 “Major” event.
When the other VM recovers (or a new one is deployed) it assumes the standby role and begins the heartbeat chat with the current active.
Since the HA IP is not used for anything other than responding to ARPs, you could really stick it anywhere. For now I will go with a dedicated logical switch just for the HA heartbeating, and a unique IP on a unique subnet for the HA IP.