tl;dr: Enable HA during DLR deployment, don’t specify an HA IP address (if prompted), and use a unique logical switch for HA.
Edits: Some info from VMware below. Also, if you are upgrading from 6.3 I would remove the HA IP address first!
I wrote a few posts last year on the DLR components of NSX – specifically the Control VM that handles dynamic routing partnerships.
There are a few interesting changes to the Control VM in 6.4.0 that I wanted to get down on paper, because they can result in a call to support if not handled right.
Enable HA and set an IP
Both of the issues concern the initial config wizard for the DLR. You are prompted on the first page to enable HA.
Make sure you enable HA here! You may well be unable to enable it later w/o a call to support.
Note that whether or not you enable it, on the fourth screen you’ll need to set an HA interface connection.
Also on that fourth page, note that you might be able to set an IP address (see my old posts on what happened with 6.3 when you set it). If you don’t enable HA on the first screen, you will be able to set an HA IP. If you do enable HA, you might still be able to set one.
If you see an entry for HA IP, do not set an IP address here. At this stage it isn’t that bad: even though you can add one, the DLR won’t actually retain the IP address you set here.
Look ma! No IP!
The problem comes when you didn’t enable HA during install and go to add it later, or when you disable HA on an existing DLR. When HA is disabled you can see, and add, an IP address under HA Interface Config:
Compare that to a DLR with HA enabled:
Now if you go and enable HA with that IP address set, you are in a world of hurt.
I just deleted the DLR instead of calling support, so maybe they have a workaround, but the best bet is: don’t do it!
EDIT: This is news to VMware apparently. Also I would really suggest removing the HA IP (if you configured it) before upgrading to 6.4!
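If you want to sanity check a DLR before flipping HA on or upgrading, the rule above is easy to put in a few lines. This is only a sketch of the logic: the dict layout (`ha_enabled`, `ha_ip`) and the function name are made up for illustration and are not the NSX API, so you would have to fill the dict in from wherever you keep your DLR settings.

```python
def ha_ip_warnings(dlr):
    """Return warnings for a DLR described by a plain dict.

    The keys 'ha_enabled' and 'ha_ip' are hypothetical, not NSX API fields.
    """
    warnings = []

    # An HA IP left behind is what bites you when HA is enabled or on upgrade.
    ha_ip = dlr.get("ha_ip")
    if ha_ip:
        warnings.append(
            "HA IP %s is set: remove it before enabling HA or upgrading to 6.4" % ha_ip
        )

    # HA that was skipped at deployment may be hard to turn on afterwards.
    if not dlr.get("ha_enabled", False):
        warnings.append("HA is disabled: enabling it later may need a support call")

    return warnings


if __name__ == "__main__":
    for line in ha_ip_warnings({"ha_enabled": False, "ha_ip": "192.0.2.10"}):
        print(line)
```

An empty list means neither of the two conditions from this section applies.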
Other Issue
The other issue is the “Connected To:” network for the HA interface. In 6.3 you could easily set the same network for a regular (uplink/internal) interface and the HA interface, and with 6.4 you can still easily set them to be the same during the initial install.
But, after deployment, you can’t set the HA interface to one already used by an interface.
But you can set an interface to the one currently used by HA.
Is it a bug? Are you not supposed to use the same network for HA and an interface? If I find out I’ll let you know, but for now I’m creating a unique logical switch just for the control VM HA traffic.
EDIT: Per VMware, set a dedicated network for HA, or use an uplink. Exploiting the interface to set an internal network will cause problems (it will always fail the IP check).
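VMware's guidance boils down to one check: the HA network can be dedicated or shared with an uplink, but it should not be shared with an internal interface. Here is a rough sketch of that check. The interface dicts (`name`, `type`, `network`) and the logical switch names are invented for the example and do not come from NSX.

```python
def check_ha_network(ha_network, interfaces):
    """Return names of internal interfaces that share the HA network.

    'interfaces' is a list of dicts with hypothetical keys:
    'name', 'type' ('uplink' or 'internal') and 'network'.
    """
    clashes = []
    for iface in interfaces:
        # Sharing with an uplink is allowed; sharing with an internal is not.
        if iface["network"] == ha_network and iface["type"] == "internal":
            clashes.append(iface["name"])
    return clashes


if __name__ == "__main__":
    interfaces = [
        {"name": "transit", "type": "uplink", "network": "ls-transit"},
        {"name": "web", "type": "internal", "network": "ls-web"},
    ]
    print(check_ha_network("ls-ha", interfaces))   # dedicated switch
    print(check_ha_network("ls-web", interfaces))  # shared with an internal
```

Any name in the returned list is an internal interface sitting on the HA network, which is the setup VMware says will cause problems.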
Bonus
I currently have a support ticket open on why I lose traffic during a control VM failover. Prior to 6.2.5 you would lose pings because the Active control VM would pull all routes from the hosts on its way down, but that was resolved.
Note that I still saw it in 6.3.2 so YMMV.
Now what I see is the Edges changing the internal DLR networks to “Weight 32768, AS Path ?” briefly when the Secondary takes over. I have my BGP timers set to 1/3. I’ll post what support says when I hear from them.