VMware vSphere, MS Clusters and vMotion resulting in cluster service failures

A quick guide, based on our own experience.

When running an active/passive MS Failover Cluster (MSFoC) in a VMware environment, you need to be aware of how the cluster behaves when a vMotion event happens on the active node. Because networking and activity on the active node are momentarily interrupted during the final cutover of the VM to the new host, the passive node can often see this as a failure and try to take over. As the active node is not actually down, both nodes then try to run the services, and ‘split-brain’ detection shuts the service down on both nodes.

Our way around this issue is to set DRS for these cluster nodes to manual, making sure we’re aware they are there. When a vMotion is needed (generally for maintenance, or for manual load balancing), we ensure the MSFoC service is shut down on the passive node first. That way we can move whatever we need to without triggering a false takeover.

Bear this in mind when running MSFoC in VMware. The impact seems to be more common with shared-disk clusters, but we have also seen it on non-shared-disk clusters.


Cisco UCS and VLAN conflicts

If you ever find yourself with conflicting VLANs as a result of a UCS code upgrade, don’t panic. If, like us, you use NFS and the NFS VLAN conflicts with a VSAN FCoE VLAN, your NFS traffic will likely be impacted. In our case, once the conflict was resolved by moving the VSAN FCoE VLAN to an unused VLAN ID, we still had no network traffic on the VLAN we use for NFS. We had to change the VLAN number for that VLAN to something else, commit the change, then change it back. This forced a refresh of the VLAN config and restored connectivity.

VMware vCloud Director 5.1 VXLAN install issues

Here is an interesting issue I hit today. Installing VMware vCloud Director 5.1 with vShield Manager 5.1.2a was giving me problems creating the VXLAN backend. Each time it would fail to install the VIB with “vib-module for agent not installed on host …. (vshield-vxlan-service)”. After lots of searching, I raised an SR with VMware. Then, just after raising the SR, I found my answer, thanks to this post


Snipped here in case it vanishes from feedreader….


When I tried to enable VXLAN in my vCloud Director setup I got the following error: “VIB module for agent is not installed on host vShield-VXLAN-Service”

This is solved by installing the VXLAN VIB which is available as a download on your vShield Manager: https://vsm-ip/bin/vdn/vibs/5.1/vxlan.zip on all your ESXi hosts in your VCD Cluster.

I used vCenter Update Manager to update all my ESXi hosts. No reboot is required.

After that, log in to your vCloud Networking and Security appliance (previously known as vShield Manager), navigate to Networks, Network Virtualization and click on Preparation. Click Resolve to prepare your hosts.


Hope this note helps someone.
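
If Update Manager is not an option, the VIB install step in the snippet above could probably be done from the command line instead. This is only a rough sketch of that alternative, not what the quoted post did. It assumes SSH is enabled on the ESXi hosts, that vxlan.zip has already been downloaded from vShield Manager into the current directory, and that /tmp on the host has room for it. The host names, the run helper and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch only: copy vxlan.zip to each ESXi host and install it with esxcli.
# Assumptions (not from the original post): SSH is enabled on the hosts and
# vxlan.zip sits in the current directory. Host names are placeholders.

# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_vxlan_vib() {
    for host in "$@"; do
        # esxcli wants an absolute path to the offline bundle on the host.
        run scp vxlan.zip "root@${host}:/tmp/vxlan.zip"
        run ssh "root@${host}" "esxcli software vib install -d /tmp/vxlan.zip"
        # Check that a VXLAN VIB now shows up in the installed list.
        run ssh "root@${host}" "esxcli software vib list | grep -i vxlan"
    done
}
```

Try it with DRY_RUN=1 first, e.g. "DRY_RUN=1; install_vxlan_vib esx01 esx02", to see exactly what would be run. The Resolve step in vShield Manager is still needed afterwards.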

Rescan SCSI bus in Linux

Useful if you’ve just added a disk to a Linux server and want to rescan to be able to use it:

echo "- - -" > /sys/class/scsi_host/host0/scan
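
The new disk is not always behind host0, so it can be handy to loop over every SCSI host. A minimal sketch, assuming a POSIX shell and a root session; the function name and its optional directory argument are my own additions, the latter only there so the loop can be tried against a scratch directory.

```shell
#!/bin/sh
# Rescan every SCSI host adapter, not just host0.
# The optional first argument overrides the sysfs directory; it is an
# assumption of this sketch so the loop can be tested without touching /sys.
rescan_scsi_hosts() {
    base="${1:-/sys/class/scsi_host}"
    for host in "$base"/host*; do
        # Skip if the glob matched nothing or the entry has no scan file.
        [ -e "$host/scan" ] || continue
        # "- - -" is a wildcard for channel, target and LUN.
        echo "- - -" > "$host/scan"
        echo "rescanned ${host##*/}"
    done
}
```

Run rescan_scsi_hosts with no arguments from a root shell (sudo in front of the echo will not work, as the redirect happens in your own shell), then check for the new disk with lsblk or fdisk -l.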

vCenter custom alarms

I had a requirement today to create a custom alarm for vCenter 5.0, one that would alert me to a failed virtual disk consolidation event. This has been causing me a few issues and I want to know about it before the client does.

For anyone trying to do this, I found this page that tells all:

The long and the short of it: you create a new alarm for an Event and, for the triggering event, use the API definition. In this case, it was “com.vmware.vc.VmDiskFailedToConsolidateEvent”.

FYI, Veeam provide a nice breakdown of these API definitions:

Now, any time I get one of these events, my team will get notified and we can take action to rectify it.