As I’m sure most of the active VMware users and enthusiasts are aware, vSphere 5.5 was released to the masses last weekend. I eagerly downloaded a copy and have installed it on a lab machine. I’ve not played with the full suite yet – just the ESXi 5.5 hypervisor.

The install went smoothly on the HP DL360G5 I was using. Unfortunately, the server only has 32GB RAM so I cannot test for myself that the 32GB limit for the “free” hypervisor is removed. I can confirm that under the “licensed features” heading the “Up to 8-way virtual SMP” entry is still there but the “Up to 32 GB of memory” entry is removed (when using a “freebie” license key). So that looks good 🙂 As I said, I’ve not installed the entire suite yet, only the hypervisor, so I am only using the Windows client currently. Don’t do what I did and upgrade a VM’s hardware version: you won’t be able to manage it via the Windows client, which does not support the latest features (including newer VM hardware versions).

Anyway, one of the first things I check when I install ESXi onto a machine is that the hardware status is correctly reported under the Configuration tab. Disks go bad, PSUs fail or get unplugged and fans stop spinning so I like to ensure that ESXi is reporting the server hardware health correctly. To my dismay I found that the disk health was not being reported for the P400i attached storage, after installing from the HP OEM customised ESXi 5.5 ISO. Now this is not entirely unexpected, as the HP G5 servers are not supported with ESXi 5.5. Drat!

By following the VMware Twitterati, I’ve learnt that various ESXi 5.0 and 5.1 drivers have been successfully used on ESXi 5.5 (specifically for Realtek network cards, the drivers for which have been dropped from ESXi 5.5). So I figured I’d have a go at using the ESXi 5.0/5.1 HP providers on this ESXi 5.5 install.

I downloaded “hp-esxi5.0uX-bundle-1.4-16.zip” from the “HP ESXi Offline Bundle for VMware ESXi 5.x” page on HP’s website, which can be reached from http://h18000.www1.hp.com/products/servers/software/vmware-esxi/offline_bundle.html.

This ZIP file contains a few .vib files, intended for VMware ESXi 5.0 or 5.1. The VIB we are looking for is called “hp-smx-provider-500.03.02.00.23-434156.vib”. Extract this .vib file and upload it to your favorite datastore. Now, enable the ESXi shell (or SSH) and connect to the ESXi host’s console. Use the following command:


esxcli software vib install -v file:///vmfs/volumes/datastore1/hp-smx-provider-500.03.02.00.23-434156.vib

and reboot the host. You should now see this software component listed under Software Components within the Health Status section. You should also see the health of the P400i and its associated storage listed. So far so good. However, on my server the HP P400i controller was showing as a yellow “Warning”. Hmm. Not sure why.
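If you would rather double-check the install from the shell than from the Health Status tab, something along these lines should do. This is only a sketch: it assumes the VIB registers under the name “hp-smx-provider” (inferred from the file name, not confirmed) and that the name is the first column of the “esxcli software vib list” output. The vib_installed helper is hypothetical and not part of esxcli.

```shell
#!/bin/sh
# Sketch: check whether a named VIB appears in the installed-VIB listing.
# vib_installed is a hypothetical helper. It reads the listing on stdin,
# so it can be tried away from an ESXi host as well.
vib_installed() {
    # $1 = VIB name, assumed to be the first column of the listing
    grep -q "^$1[[:space:]]"
}

# On the ESXi host itself (skipped when esxcli is not available).
# The name "hp-smx-provider" is an assumption based on the file name.
if command -v esxcli >/dev/null 2>&1; then
    if esxcli software vib list | vib_installed hp-smx-provider; then
        echo "hp-smx-provider is installed"
    else
        echo "hp-smx-provider not found"
    fi
fi
```

If the name turns out to be different on your host, run “esxcli software vib list” on its own and look for the HP entries.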

So, I figured maybe there was an incompatibility between these older HP agents and the newer versions from the HP OEM CD. So, I decided to reinstall ESXi from the plain VMware ESXi 5.5 ISO.

So, a fresh install results in fan status, temperature readings and power supply status being reported and, as expected, no P400i storage health.

So, let’s install “hp-esxi5.0uX-bundle-1.4.5-3.zip”. Yes it’s a newer version than I used above, only because I found it after I’d reinstalled the vanilla ESXi.


esxcli software vib install -d file:///vmfs/volumes/datastore/hp/hp-esxi5.0uX-bundle-1.4.5-3.zip
reboot

Hey presto! Green health status. I pulled a drive from a RAID array and the status indicated the failure and then the subsequent rebuild. Certainly seems to be a workable solution to extend the life of these perfectly serviceable lab machines 🙂

I would expect this status monitoring to work for P800 controllers too.

One can also install hp-HPUtil-esxi5.0-bundle-1.5-31.zip to get access to some HP utilities at the ESXi command line.

 

I recently got to set up a Shuttle XH61V for a friend. I’ve read a few posts about how they make good VMware ESXi hosts for power-conscious folk running home labs. I figured this would be a good time to see just how power hungry, or not, one of these boxes is and how well ESXi runs.

The box would end up with an Intel i3-2120 processor, 16GB RAM (2 * Crucial CT102464BF1339) and a 128GB Crucial mSATA SSD (CT128M4SSD3). Quite a beefy XBMC media centre PC built from a selection of new bits and pre-owned bits! Anyhoo, while putting the components together I took some power readings along the way:

 

| Description | Power (VA) | Power (W) |
|---|---|---|
| Power supply alone, i.e. without computer attached | 20VA | 2W |
| Power supply with bare case off | 20VA | 2W |
| Power supply with bare case on (turned on but obviously doing nothing) | 24VA | 3W |
| PSU + case + 2*8GB DIMMs (turned on but obviously doing nothing) | 24VA | 3W |
| PSU + case + CPU + 2*8GB DIMMs (idling at BIOS) | 46VA | 37W |
| PSU + case + CPU + 2*8GB DIMMs + SSD (idling at BIOS) | 46VA | 37W |
| PSU + case + CPU + 2*8GB DIMMs + SSD (switched off) | 24VA | 3W |
| Installing ESXi | 32VA – 46VA | |
| ESXi with no VMs (High Performance power option) | 40VA | |
| ESXi with no VMs (Balanced power option) | 32VA | 21W |
| ESXi with no VMs (Low power option) | 32VA | 21W |
| ESXi with three busy VMs (Balanced power option) | 64VA | |
| Windows 7 x64 SP1 idle (balanced, low, high power options) | 32VA | 21W |
| Windows 7 x64 SP1 put into sleep mode | 28VA | 3W |

 

So, not too shabby when it idles. I will be interested in seeing what power a 22nm 3rd or 4th generation processor would consume while idling. It seems that this i3-2120 CPU idles at approximately 18W. During a heavy workload, the processor seems to consume approximately 21W extra for a total of roughly 40W – not quite the 65W TDP max Intel quotes.
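For anyone wanting to play with the numbers, here is a rough sketch of the arithmetic. It assumes the 18W idle figure is the ESXi idle reading (21W) minus the reading for the case with no CPU fitted (3W), which is my reading of the figures rather than something stated outright. It also works out the power factor (W divided by VA) for a couple of rows. The helper names are made up for illustration.

```shell
#!/bin/sh
# power_factor WATTS VA: real power divided by apparent power, two decimals.
power_factor() {
    LC_ALL=C awk -v w="$1" -v va="$2" 'BEGIN { printf "%.2f\n", w / va }'
}

# idle_delta LOADED_W BASELINE_W: watts attributable to whatever was added.
idle_delta() {
    echo $(( $1 - $2 ))
}

# Readings taken from the measurements listed in this post.
power_factor 21 32   # ESXi idle, Balanced: prints 0.66
power_factor 37 46   # idling at BIOS:      prints 0.80
idle_delta 21 3      # ESXi idle minus case without CPU: prints 18
```

The power factor clearly moves with load, so converting the VA-only rows (such as the 64VA busy reading) into watts can only ever be an estimate.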

I installed it with the standard ESXi 5.1U1 installation media. No issues, once I found a suitable USB drive to USB boot from! Both onboard NICs were seen and the mSATA SSD was recognised too.

Note: It seems the included Realtek 8186 has reliability issues under VMware ESXi 5.1. The odd thing is that when I first installed ESXi 5.1 it worked fine and I was able to use it successfully. However, after I rebooted a couple of times, the NIC stopped working properly. It manages to get a DHCP IP address and is pingable for about 30 seconds before it drops off the network. No log entries on the host or the switch indicate the cause. Very curious!

Well, another tidbit (and maybe I’m slow coming to find this out)… from the VMware vCenter Server 5.1.0b release notes:

vSphere Client. In vSphere 5.1, all new vSphere features are available only through the vSphere Web Client. The traditional vSphere Client will continue to operate, supporting the same feature set as vSphere 5.0, but not exposing any of the new features in vSphere 5.1.

vSphere 5.1 and its subsequent update and patch releases are the last releases to include the traditional vSphere Client. Future major releases of VMware vSphere will include only the vSphere Web Client.

For vSphere 5.1, bug fixes for the traditional vSphere Client are limited to security or critical issues. Critical bugs are deviations from specified product functionality that cause data corruption, data loss, system crash, or significant customer application down time where no workaround is available that can be implemented.

WOAH!

Firstly, none of the new features of 5.1 are available through the current vSphere client. I’ve not really checked this in detail, or which features this includes, but this surprises me.

Secondly, in the course of studying for the VCP5 exam I found the existing web client described as something of a lightweight tool, not intended for general use by VMware admins. If the existing Windows vSphere client is being done away with, VMware will need to do some SERIOUS work in getting the future web client up to scratch. Not only that but they will have the tough task of choosing what web browsers to support and the various versions thereof.

Thirdly, what about all the third party plugins? Many of those will need to be updated and rewritten to run on the vCenter server/vCSA. Now this is quite a big one since it will require plugins to be changed from running on the Windows based vSphere client to running on a Linux appliance. Not a totally trivial task I would wager.

Fourthly, upgrading from 4.x/5.x to 6.x will probably be fraught with some challenges in needing to switch back and forth between the “old” Windows client and the “new” web client. I expect some serious planning and testing will need to be performed to ensure that all operational tasks can be completed before, during and after the migrations/upgrades.

Can’t say I’m pleased about this. I can see massive challenges in keeping web browsers working smoothly with such a core and critical application. Even getting relatively simple websites to render equivalently across browsers can be challenging, let alone something as complex as a vSphere administration console.

However, I can see why VMware would want to do this – for their big vCloud push. “One client to rule them all” for the administrators/providers of clouds all the way down to their end customers. Quite a vision, but I wonder if they can pull it off.

There is a risk here that Microsoft’s Hyper-V will gain a foothold when this comes to pass. I imagine coupled with the removal of the thick client will be the removal of the installable vCenter Server. If this comes to pass then some Microsoft shops are likely to question the introduction of a Linux appliance into their server estate when a Hyper-V and Microsoft based platform will be available (and quite possibly already included in their existing MS licenses).

Customers are fickle and can switch allegiance quite quickly… I hope VMware has considered this and doesn’t shoot itself in the foot!

 

Well, while studying for my VCP5 I have discovered that the “Solution/Database Interoperability” support matrix for vCenter databases is not quite as straightforward as I would have expected. For instance, MS SQL Server 2008 Enterprise R2 SP2 is only supported with VMware vCenter Server 5.0U2. For MS SQL Server shops, it appears as if the currently “safe” DB options are MS SQL Server 2008 R2 (no SP) and MS SQL Server 2008 SP2. These appear to have the broadest support – of course if you are installing vSphere 5.0U1 or U2 (which you probably should be) then you can use R2 SP1. Once more, it pays to check the HCLs carefully. You can determine the version and service pack level of MS SQL Servers using the information in KB321185 from Microsoft.

Also, the matrix has a footnote for some of the MS SQL Server versions stating they are not supported with vCSA – but not all the MS SQL Server entries have this note, implying some versions of MS SQL Server are supported with vCSA.

And for Oracle support – that is a bit of a minefield too. Various versions of 10gR2 and 11gR2 are supported with various patch sets. Once again, do your homework carefully!

 

Well, in short the GIGABYTE G1.Sniper M3 motherboard does support Intel VT-d and both ESXi 5 and 5.1 can use it. I have tested this with BIOS version f9 and “beta” BIOS versions f10c, f10d and f10e and all show VT-d as an option when a compatible processor is installed. Note that this option is not shown unless a suitable (generally Intel i5 or i7 non-k CPU) processor is installed. The “VT-d” option is shown below the “Intel Virtualization Technology” option on the “BIOS Features” page of the BIOS setup.

I have had mixed success with actually passing through devices to VMs. Generally the cards in PCI-E slots and configured for pass through worked as expected within a VM during my testing (USB3, NICs, Hauppauge 1700 cards). However, devices on the motherboard (SATA, Audio, LAN, Video) and PCI-E graphics cards do not work. For the most part, these devices pass through but the devices don’t start under Windows, drivers fail to attach or cause blue screens when accessed (yes, Mr ATI graphics card with your atikmpag.sys driver BSOD).

Until I actually did these tests I was not sure if this motherboard did or did not support VT-d / Intel Virtualization Technology for Directed I/O / VMware VMDirectPath I/O. I already had this motherboard in an HTPC with an Intel i3 (dual core with hyper threading) which, by the way, ran ESXi adequately. I wanted to play with VT-d so I took a punt on an Intel i7 processor and luckily it worked. If not, my backup plan was to also procure an ASRock motherboard, most of which seem to have working VT-d support.

I had hoped to run a virtual HTPC with an ATI graphics card passed through on this computer. Unfortunately the virtualisation gods do not seem to be happy with this idea at the moment. Still, this box makes a decent whitebox ESXi host, apart from the onboard Intel 82579V NIC which ESXi does not support out of the box. A custom driver needs to be injected into the ESXi installation ISO, unless you have a supported PCI-E NIC in which case the driver can be installed post-install.

Note1: While playing with passthrough and various options of internal graphics and PCI-E graphics BIOS configurations I got to the point where I could no longer get graphics from the onboard graphics card. I found a couple of posts on the Internet about this too. Even resetting/clearing CMOS did not resolve this. As per the other posts, I reflashed the BIOS and it sorted it out. Weird behaviour and unexpected – I could not get the BIOS to save the option to use IGFX (Internal graphics) rather than PEG (PCI-E graphics) as the “Init Display First” option.

Note2: The following are the graphics cards I attempted to pass through to the VMs. Note that I tried both VMware ESXi 5.0U2 build 914586 and 5.1 build 799733 and 914609 with motherboard BIOS f9, f10d and f10e.

Asus ATI Radeon HD 5450 – passed through and seen by VM but has atikmpag.sys BSOD 0x0116 “Attempt to reset the display driver and recover from timeout failed.” whenever I connected the monitor to the graphics card or tried to enable the display on the card.

Asus ATI Radeon HD 6450 – exactly as above.

Asus NVIDIA Geforce GT610 – passed through and seen by the VM. However the device driver is unable to start in Windows.

Note3: While trying to get the graphics cards to work properly I tried various combinations of additional/advanced settings including:

On the VM:

pciHole.start 1200
pciHole.end   2200
pciPassthru0.maxMSIXvectors  16
pciPassthru0.msiEnabled   FALSE

On the host, edit of: /etc/vmware/passthru.map by adding

#ATI Radeon HD
1002 ffff bridge false
#tried with flr/d3d0/link/bridge in the third column

Note4: In ESXi5.1 ACS checking is enforced more strictly resulting in quad-port NICs (and other devices apparently) not successfully getting configured for passthrough. After each reboot the devices still show as needing a reboot. The console logs show a message similar to:

WARNING: PCI: ssss: nnn:nnn:nn.n: Cannot change ownership to PASSTHRU (non-ACS capable switch in hierarchy)

This can be bypassed (at your own peril) using the host advanced option: disableACSCheck=true

Use the vSphere console: Configuration/Software/Advanced Settings/VMKernel/VMKernel.Boot.DisableAcsCheck

More info can be found at this informative post. This option got the quad port NICs passed through successfully but did not make any difference to the ATI or NVIDIA cards.

Note5: Currently ESXi 5.1 up to and including build 914609 does not seem to allow USB controller passthrough via VMDirectPath I/O. You can select the device for passthrough but once the host is rebooted, the device is unselected. I am not sure if this is a bug or a conscious decision by VMware. A cynic like myself might think this is intentional, as without the ability to pass through a USB controller there is no way to pass through a real keyboard and mouse into a VM and hence no need to get GPUs working with passthrough. (Hmm – maybe a bluetooth USB device passed into a VM and then paired with a BT keyboard and mouse?? Something for another day).

 

 

Well today I’ve been upgrading a couple of my servers from VMware ESXi 3.5 and ESXi 4.1 to ESXi 5.0. For the most part this went smoothly and without any drama.

The HP DL360 G5 upgrade from ESXi 4.1 to 5.0 went smoothly and the upgrade process maintained all the settings and configuration properly. The hardware health monitors were working before and after the upgrade without the need for any additional fiddling. I used the VMware ESXi 5.0U1 ISO from HP.com for this server.

The HP ML110 G5 needed to be reinstalled as it was running ESXi 3.5 and there is no direct upgrade path to 5.0. After recreating the vSwitches and associated VM port groups I was up and running. I used the HP.com image once more and to my surprise the hardware health monitoring now shows the RAID status of the SmartArray E200 controller. In the past, when using the HP providers on ML110G5 hardware, purple screens were common. Now, the server seems stable and displays the storage health status. A win for the day!

Note that this server needed a further tweak as the SCSI passthrough of the SCSI attached LTO3 drive stopped working after the installation of ESXi5.0. A bit of Googling revealed that the following would solve this problem:

esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --vendor="HP" --model="Ultrium 3-SCSI"

So the VM could now see the attached tape drive. However VMware appear to have changed their passthrough or SCSI subsystem since ESXi3.5 and as a result I’ve had to reduce my tape block size. In the past I was able to read and write 512kB blocks (tar -b 1024)  however I’ve had to drop this to 128kB blocks (tar -b 256). If I get some time, I will attempt  to work out the exact limit and update this post.
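As a reminder of where those numbers come from: the tar blocking factor is counted in 512-byte units, so the record size is simply the factor multiplied by 512. A small sketch of that arithmetic follows. The record_kib helper is made up for illustration, and the intermediate factors in the loop are only candidates one might try when hunting for the limit, not values I have tested.

```shell
#!/bin/sh
# record_kib FACTOR: record size in KiB for "tar -b FACTOR"
# (tar counts the blocking factor in 512-byte blocks).
record_kib() {
    echo $(( $1 * 512 / 1024 ))
}

record_kib 1024   # prints 512 (the size that worked under ESXi 3.5)
record_kib 256    # prints 128 (the size that works under ESXi 5.0)

# Candidate factors between the two, for narrowing down the limit.
for factor in 256 384 512 768 1024; do
    echo "tar -b $factor -> $(record_kib "$factor") KiB records"
done
```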

For the Dell PE840 upgrade, I used the Dell ESXi 5.0 customised ISO. Again, the upgrade from 4.1 preserved the configuration of the server. To my dismay the RAID status of the PERC 5/i was now missing. Turns out the Dell ISO is lacking the providers for storage health. Long story short, after some searching I got the health status back. I initially tried the Dell OpenManage VIB (OM-SrvAdmin-Dell-Web-6.5.0-542907.VIB-ESX50i_A02.zip) which didn’t appear to change much. The useful info was here on the RebelIT website which referred to using the VIB from LSI.com. This made sense as the Dell PERC 5/i is basically an LSI MegaRAID SAS 8480E. I downloaded the VIB (VMW-ESX-5.0.0-LSIProvider-500.04.V0.24-261033-456178.zip) from LSI.com. Note that the 8480E is not listed as supported by this release, but it works – PHEW! I guess the PERC 5/i is getting long in the tooth now, but given it works like a champ there is no need to upgrade. Note that I had to extract the .zip file and then install the VIB from the server’s console as:

esxcli software vib install -v /vmfs/volumes/datastore1/vmware-esx-provider-LSIProvider.vib

So now all three servers have been upgraded to ESXi 5.0 and have full hardware health status available which is being monitored via Nagios. Now the fun begins, upgrading the hardware version and VMware Tools for all the VMs….

Last night I was upgrading some ESX 3.5 VMs from “flexible” NICs to “VMXNET Enhanced” NICs and ran into a problem with one of the servers. As an aside, doing some rudimentary throughput testing using the iperf tool I was surprised to see the significant throughput increase and CPU usage decrease when switching from flexible (i.e. VMXNET) to VMXNET Enhanced (i.e. VMXNET2) NICs. Well worth doing – apart from the gotcha I ran into…

The one VM which showed a problem is running Quagga (a BGP daemon) to inject routes for a local AS112 server. Once I switched over the NICs the remote Cisco router started logging the following errors:

  • %TCP-6-BADAUTH: No MD5 digest from
  • %TCP-6-BADAUTH: Invalid MD5 digest from

and would not bring up the BGP peering session. BGP MD5 authentication is enabled between the router and the Quagga daemon. The use of “debug ip tcp transactions” also shows the invalid MD5 signatures.

I initially suspected it was related to an offloaded checksumming issue which I previously observed (http://communities.vmware.com/thread/250159). Turns out it was not a case of incorrect TCP checksums being calculated (as with the incorrect UDP checksums in the above post).

Digging a bit deeper I came across a post on the quagga-users mailing list describing a similar problem to the one I was observing. The usage of MD5 checksums “complicates” the process of offloading checksumming to NICs.

The VMXNET Enhanced NIC enables additional offloading from the VM to the NIC to enhance performance. This works well for many use cases but causes problems when using TCP MD5 checksums. In my case, turning off “TCP Segmentation Offload” has worked around the problem. Adding a command such as

ethtool -K eth0 tso off
ethtool -K eth0 sg off

to a startup script has worked around the issue to some degree. In an ideal world, the VMXNET driver should allow “tx-checksumming” to be turned off using ethtool as well.
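A startup script along these lines is one way to apply the two settings and confirm that they stuck. Treat it as a sketch: the function names are hypothetical, where you hook it in (rc.local or similar) depends on your distro, and the feature label matched here is the one printed by the older ethtool on this VM. Newer ethtool versions print “tcp-segmentation-offload” instead.

```shell
#!/bin/sh
# Sketch of a boot-time helper (hypothetical names; adjust for your distro).

# offload_state LABEL: read "ethtool -k" output on stdin and print the
# setting ("on"/"off") shown for the feature with that label.
offload_state() {
    awk -F': *' -v want="$1" '
        { label = $1; sub(/^[ \t]+/, "", label) }   # newer ethtool indents lines
        label == want { print $2 }
    '
}

# disable_offloads IFACE: turn off TSO and scatter-gather, then print the
# resulting TSO state. Needs root and a real interface, so it is only
# defined here and not called.
disable_offloads() {
    ethtool -K "$1" tso off
    ethtool -K "$1" sg off
    ethtool -k "$1" | offload_state "tcp segmentation offload"
}
```

Calling “disable_offloads eth0” as root from the startup script should then print “off” if the change took effect.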

In fairness to VMware on this one, this issue appears to not be specific to virtual machines but may in fact be observed on physical hardware with NICs providing offload functions.

Having used the above two ethtool commands, to allow the BGP session to be established, I still continue to see the following errors on the Cisco router:

TCP0: bad seg from x.x.x.x -- no MD5 string: port 179 seq
%TCP-6-BADAUTH: No MD5 digest from

%TCP-6-BADAUTH: Invalid MD5 digest from

 

Interestingly, using the flexible and hence original VMXNET VNIC within the VM (but oddly the same actual VMXNET driver binary!!) tx-checksumming and scatter-gather is enabled for the NIC:

$ ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
$

For the record, all the above is using the latest ESX 3.5 VMware tools build 317866 from VMwareTools-3.5.0-317866.tar.gz.

 

Doing a tcpdump (from a physical server) of the traffic between the router and the VM reveals that packets which leave the VM with a valid MD5 signature (as per tcpdump from within the VM) arrive on the wire with an invalid MD5 signature (tcpdump -s0 -M md5password port 179). This indicates that VMware ESXi may in fact be altering the packets between the VM and the wire in some way which is invalidating the MD5 signature 🙁

For now, some errors with an established session are better than lots of errors and no BGP session. Ideally there would be no errors being logged by the Cisco router.