Introduction

While working in a lab environment recently I wanted to vMotion a VM between two ESXi hosts. The vMotion failed due to CPU incompatibilities, which was not entirely unexpected. These particular ESXi hosts are not in a vSphere cluster, so enabling EVC (Enhanced vMotion Compatibility), which would resolve the issue, is not an option.

Attempting to vMotion a VM from host A to host B gave errors about:

  • MOVBE
  • 3DNow! PREFETCH and PREFETCHW

After powering the VM down, migrating it to host B and powering it on, an attempt to vMotion from host B to host A gave errors about:

  • PCID
  • XSAVE
  • Advanced Vector Extensions (AVX)
  • Half-precision conversion instructions (F16C)
  • Instructions to read and write FS and GS base registers
  • XSAVE SSE State
  • XSAVE YMM State

Manual VM CPUID mask configuration

As the error messages indicate, Enhanced vMotion Compatibility (EVC) would enable the vMotion to take place between mixed CPUs within a cluster. Mixing host CPU models within a vSphere cluster is generally considered bad practice and should be avoided where possible. In this instance, the hosts are not in a cluster so EVC is not even an option.

As indicated above, shutting down the VM and doing a cold migration is possible. This issue only relates to the case where I want to migrate running VMs between hosts containing processors with different feature sets.

For the two hosts in question, I know (based on Intel ARK and the VMware EVC processor support KB) that the Intel “Westmere” Generation baseline ought to be the highest compatible EVC mode; one of the processors is an Intel Avoton C2750 and the other is an Intel i7-3770S (“Ivy Bridge”). The Avoton falls into the Westmere category for EVC. We will come back to EVC later on.

I suspected it would be possible to create a custom CPU mask to enable vMotion between these two hosts. In general, the features supported by a given processor are exposed via the CPUID instruction. By default, VMware ESXi manipulates the results of CPUID instructions executed by a VM as part of the virtualisation process. EVC further manipulates these feature bits to hide CPU features from VMs, ensuring that “well behaved VMs” can keep running when migrated between hosts containing processors with different features.

In this instance, “well behaved VMs” refers to VMs running code which uses the CPUID instruction to determine available features. If the guest OS or application uses the CPUID instruction to determine available processor features then, when moved via vMotion to a different host, that same set of features will still be reported. If a guest uses some other mechanism to determine processor feature availability (e.g. the processor model name), or merely assumes a given feature will be present, then the VM or application may crash or hit other unexpected errors.

So back to this experiment. Attempting to go from host A to host B indicated only two feature incompatibilities. I turned to the Intel developer’s manual (64-ia-32-architectures-software-developer-vol-2a-manual.pdf) for the detail of the CPUID instruction. CPUID called with EAX=0x80000001 exposes the PREFETCHW capability at bit 8 of ECX. Similarly, with EAX=0x1 the MOVBE capability is exposed at bit 22 of ECX.

As an initial test, I did a cold migration of the VM to host A and edited the VM’s CPUID mask settings, placing a “0” in the two bit positions identified above.

In summary, this passes the CPUID result of the host through a mask filter. The mask filter can hide, set or pass through each CPUID feature bit. In this instance I am hiding the two bits identified above through the use of a “0” in those bit positions; the other flag characters available are described in the legend of the mask editor.

I chose “0” rather than “R” as I need to hide the feature from the guest OS and I do not care if the destination host actually has that feature or not.
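
For reference, the same mask can be written directly as .vmx parameters (these are the same lines that show up in the host A -> host B summary further down). The mask strings are read right to left, with the rightmost character being bit 0, so the “0” at bit 8 of leaf 0x80000001 ECX hides PREFETCHW and the “0” at bit 22 of leaf 1 ECX hides MOVBE:

cpuid.80000001.ecx = "-----------------------0--------"
cpuid.1.ecx = "---------0----------------------"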

I saved this configuration and powered on the VM. I was able to successfully perform a vMotion from host A to host B. I was also able to vMotion the VM back to host A.

I performed a vMotion back to host B and powered the VM off. I then powered the VM back on while it was on host B. I tried to vMotion back to host A, which again failed with the same errors as shown above. The reason it failed in the reverse direction is that the VM picks up its masked capabilities at power-on and keeps that set of capabilities until it is powered off again. So by powering on the VM on host B, it got a different set of capabilities to when it was powered on on host A. This also explains why the original vMotion attempts produced two different sets of errors depending on the direction.

To get the masks needed for a vMotion from host B to host A, I took another look at the developer’s manual, performed some Google-fu, and identified the CPUID bits needed to mask the unsupported features:

PCID: CPUID EAX=1, result in ECX bit 17
XSAVE: CPUID EAX=1, result in ECX bit 26
AVX: CPUID EAX=1, result in ECX bit 28
F16C: CPUID EAX=1, result in ECX bit 29

FSGSBASE: CPUID EAX=7, result in EBX bit 00

XSAVE SSE: CPUID EAX=0xd, result in EAX bit 01
XSAVE YMM: CPUID EAX=0xd, result in EAX bit 02 (YMM are 256-bit AVX registers)

The first four are easy, as the vSphere client allows one to edit the EAX=1 CPUID results directly. With that configuration in place, the vMotion from host B to host A only showed the last three errors (FSGSBASE, XSAVE SSE and XSAVE YMM), which is expected as no masking had yet been put in place for those leaves.
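
Expressed as a .vmx parameter, that client-side edit corresponds to the following line, with a “0” at bits 17, 26, 28 and 29 of the leaf 1 ECX result:

cpuid.1.ecx = "--00-0--------0-----------------"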

To put the masking in place for EAX=0x7 and EAX=0xd we need to edit the virtual machine’s .VMX file. We can do this by editing the .vmx file directly or by using the Configuration Parameters dialogue under Options/Advanced/General in the VM’s settings. The following two parameters (the first for FSGSBASE, the second for the XSAVE states) were added:

cpuid.7.ebx = "-------------------------------0"
cpuid.d.eax = "-----------------------------00-"

Powering on the VM succeeded; however, the vMotion to host A failed with the same error about the FS & GS base registers (the XSAVE errors were gone). Surprisingly, when I checked the .vmx directly, the cpuid.7.ebx line was missing. For some reason the VI client does not appear to save this entry. So I removed the VM from the inventory, added that line to the .VMX directly and then re-registered the VM.
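
The same remove/edit/re-register dance can be done from the ESXi shell. A rough sketch, with the datastore path and VM name as placeholders for your own:

vim-cmd vmsvc/getallvms                                         # note the Vmid of the VM
vim-cmd vmsvc/unregister <vmid>                                 # remove the VM from the inventory
vi /vmfs/volumes/datastore1/MyVM/MyVM.vmx                       # add the cpuid.7.ebx line
vim-cmd solo/registervm /vmfs/volumes/datastore1/MyVM/MyVM.vmx  # re-register the VM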

I was now able to power on the VM on host B and vMotion back and forth. I was not able to do the same when the VM was powered on on host A. I needed to merge the two sets of masks.

At this stage we would have the following in the .vmx file:

for host A -> host B:
cpuid.80000001.ecx = "-----------------------0--------"
cpuid.1.ecx = "---------0----------------------"

for host B -> host A:
cpuid.1.ecx = "--00-0--------0-----------------"
cpuid.7.ebx = "-------------------------------0"
cpuid.d.eax = "-----------------------------00-"

(Note that some default entries get added automatically; they are all dashes, apart from a cpuid.80000001.edx entry containing dashes and a single H.)

We merge our two sets of lines to obtain:

cpuid.80000001.ecx = "-----------------------0--------"
cpuid.1.ecx = "--00-0---0----0-----------------"
cpuid.7.ebx = "-------------------------------0"
cpuid.d.eax = "-----------------------------00-"

At this stage we can now power on the VM on either host and migrate in either direction. Success. Using these four lines of config, we have masked the specific features which vSphere was highlighting as preventing vMotion. This also shows how we can hide or expose specific CPUID features on a VM-by-VM basis.

Manual EVC Baseline Configuration

Back to EVC. The default EVC masks can be determined by creating a cluster (even without any hosts) and enabling EVC. You can then see the default masks put in place on the host by EVC. Yes, EVC puts a default mask in place on the hosts in an EVC enabled cluster. The masked off CPU features are then not exposed to the guests at power-on and are not available during vMotion compatibility checks.

The default baselines for the Westmere and Sandy Bridge EVC modes are shown below:

[Table: default EVC baseline CPUID masks for the Westmere and Sandy Bridge EVC modes]

The differences are highlighted. The leaf 1 (i.e. CPUID with EAX=1) EAX result relates to processor family and stepping information. The three leaf 1 ECX flags relate to AES, XSAVE and TSC-Deadline respectively. The three leaf 0xD EAX flags are for x87, SSE and AVX XSAVE state. The three leaf 0xD ECX flags relate to the maximum size needed for the XSAVE area.

Anyway, I’ve digressed. The masks I created above only dealt with the specific differences between my two processors. To determine a generic “Westmere”-compatible mask on a per-VM basis, we can start from VMware’s EVC baselines above. The EVC masks show which feature bits are hidden (zeros) and which may be passed through to guests (ones), so we can see exactly which feature bits a particular EVC mode hides. To convert an EVC baseline into a VM CPUID mask, I keep the zeros and change the ones to dashes; I chose dashes rather than ones so that the default guest OS masks and host flags still take effect. This gives the following Westmere feature-flag masks for a VM:

cpuid.80000001.ecx = "0000000000000000000000000000000-"
cpuid.80000001.edx = "00-0-000000-00000000-00000000000"
cpuid.1.ecx = "000000-0-00--000---000-000------"
cpuid.1.edx = "-000-------0-0-------0----------"
cpuid.d.eax = "00000000000000000000000000000000"
cpuid.d.ecx = "00000000000000000000000000000000"
cpuid.d.edx = "00000000000000000000000000000000"
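
The zeros-stay, ones-become-dashes conversion is mechanical. Assuming you have an EVC baseline register value copied out as a 32-character binary string, a one-liner along these lines produces the VM mask format (the input here is just an illustrative placeholder, not a real baseline):

echo "00000000000000001111111111111111" | sed 's/1/-/g'
# gives: 0000000000000000----------------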

I did not map the cpuid.1.eax flags as I did not want to mess with the CPU family/stepping flags. Also, the EVC masks listed did not show the cpuid.7.ebx line I needed for the FSGSBASE feature. Sure enough, using only the seven lines above meant I could not vMotion from host B to host A. So, adding

cpuid.7.ebx = "-------------------------------0"

to the .VMX then allowed the full vMotion I was looking for. The ESXi hypervisor must alter other flags beyond those shown on the EVC configuration page.

TL;DR

To configure a poor man’s EVC on a VM-by-VM basis for a Westmere feature set, add the following lines to the VM’s .VMX file.

cpuid.80000001.ecx = "0000000000000000000000000000000-"
cpuid.80000001.edx = "00-0-000000-00000000-00000000000"
cpuid.1.ecx = "000000-0-00--000---000-000------"
cpuid.1.edx = "-000-------0-0-------0----------"
cpuid.7.ebx = "-------------------------------0"
cpuid.d.eax = "00000000000000000000000000000000"
cpuid.d.ecx = "00000000000000000000000000000000"
cpuid.d.edx = "00000000000000000000000000000000"
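
The VM must be powered off when the lines are added. As a rough sketch of applying them from the ESXi shell (the VM name and datastore are placeholders), editing the .vmx and asking hostd to re-read it should avoid the unregister/re-register dance described earlier:

vi /vmfs/volumes/datastore1/MyVM/MyVM.vmx    # append the cpuid.* lines above
vim-cmd vmsvc/getallvms                      # find the VM's Vmid
vim-cmd vmsvc/reload <vmid>                  # re-read the edited .vmx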

 

Appendix

1

Useful thread -> https://communities.vmware.com/thread/467303

The above thread covers manipulating the guest CPUID. An interesting option is mentioned in post 9, which enables more extensive CPUID manipulation than is possible by default. In my tinkering above, I did not need this option.

monitor_control.enable_fullcpuid = TRUE

Note too that the vmware.log of any VM can be used to see the CPUID information of the host, as also mentioned in post 9:

As for extracting the results, you can write a program to query the CPUID function(s) of interest, or you can just look in the vmware.log file of any VM.  All current VMware hypervisors log all CPUID information for both the host and the guest
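
For example, assuming a VM that lives on datastore1, something along these lines pulls the CPUID dump out of its log:

grep -i cpuid /vmfs/volumes/datastore1/MyVM/vmware.log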

Post 13, again from user jmattson (ex-VMware, now at Google), reveals a simple way to configure the processor name visible to guests:

cpuid.brandstring = "whatever you want"

2

This thread https://communities.vmware.com/thread/503236, again involving jmattson, discusses cpuid masks – and gives an insight into how EVC masks interact with the VM cpuid masks.

3

This post https://v-reality.info/2014/08/vsphere-vm-version-impact-available-cpu-instructions/ reveals that the virtual hardware version of a given VM also plays a role in the CPUID mask a VM is given. I found this interesting as it does give us another reason to actively upgrade the hardware versions of VMs.

4

A little gem is mentioned at the bottom of https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1029785 – it seems that IBM changed the BIOS default for some of their servers. The option was changed to enable the AES feature by default. This resulted in identical servers configured with BIOS defaults, which were added to a vSphere cluster, having two different sets of CPUID feature bits set (AES enabled on some and disabled on others) resulting in vMotions not being possible between all servers.

A recent comment I made on Chris Wahl’s blog seems to have generated a little bit of interest, so I thought I would expand upon my thinking about this.

The comment I wrote in response to the “The New Haswell Fueled ESXi 5.5 Home Lab Build” post is included here:

Great review of your lab kit once more!

I’ve been looking at new lab kit myself and have been considering E3-1200v3 vs E5-2600v2 processors. Obviously an E5 based machine is going to be more expensive than an E3 based one. However, the really big draw for me is the fact the E5 procs can use more than 32GB of RAM.

Looking at some rough costs of 2* E3-1265Lv3 32GB X10SL7-F-0 servers vs 1* E5-2640v2 64GB X9SRH-7F server (same cases and roughly similar cooling components) it seems that the two E3 servers are more expensive.

Do you consider 32GB to still be sufficient for a home lab? And in a year or three’s time? I’ve not considered the cost-benefit of an E3 now and replacing it (at probably similar cost) with the “latest and greatest” equivalent in two years’ time. I guess it depends on one’s expected depreciation time frame.

Who would have thought that VCAP-DCD scale-out vs scale-up questions would be relevant to one’s home lab :)

(In fairness, I was looking at having this “lab” environment also run a couple of “production” VMs for home use concurrently, so the 32GB would not be dedicated to the LAB)

 

Going through various lab scenarios with some VMware software prompted me to consider options as far as home lab or standalone ESXi hardware was concerned. Also, I have some ageing server hardware which still runs vSphere but cannot run nested 64-bit VMs (yes, fairly old E54xx and E5300 CPUs, without EPT and other cool new features) and which needs to be replaced.

So, my requirements when considering the hardware for a home lab were

  • modern Intel CPU (I just prefer them… my choice – no debate 🙂  ), so i7, E3-1200v2 or E5-2600v2 are all in the running
  • sufficient RAM
  • remote KVM (i.e. iLO, DRAC, etc), remote media and remote power cycle
  • cost is a factor so high end server kit is probably out
  • power consumption must be considered. Electricity is not cheap these days, and less power translates for the most part into less heat too

Now all are fairly self explanatory apart from the RAM. In my particular case, I would want to have the LAB kit also run a “production” workload of a couple of home servers (currently approximately 6-8GB RAM depending on TPS and workload). This presents a couple of challenges. How to separate things out sufficiently? I could go for a “big” server with more than 32GB of RAM or have more than one “smaller” server.

In terms of efficiency and reduced complexity, a single bigger server is probably going to give more scope for expansion in the RAM department. In my experience, RAM is constrained before CPU for most situations (even more so with my VMs ). So, the 32GB limit of the i7 and E3 is definitely something to consider. The E5 gives 64GB and upward, depending on chipset, motherboard, DIMMs in use, etc.

So given I currently need about 7GB RAM for my “production” workload, that would leave 24GB (of a maxed out i7/E3 CPU) for the LAB.  Is 24GB sufficient for my lab purposes? That is a question I am still grappling with before actually purchasing any new kit.

I have managed to run a Site Recovery Manager set up under VMware Workstation on a desktop machine (i7 and 32GB RAM). The performance was “OK” but I noticed a few packets being lost every 15 minutes or so. (I started the ping tests after noticing some network connectivity issues). I attributed this packet loss to the VR traffic causing some bottleneck somewhere, but that is a post for another time.

Clearly 32GB is sufficient for many workloads. Even 8GB would be sufficient for a small two node ESXi cluster with vCenter on a minimalist Windows VM. So – what is the necessary LAB RAM size? Well, to answer that you need to look at the workload you intend to run.

Not only that, you need to factor in how long this lab needs to last and what your expansion plan would be. Do you have space for more than one machine?

So to wrap up with some take-away bullet points to consider when thinking about home/small vSphere labs:

  • 32GB “hosts” (be they ESXi to run nested ESXi and other VMs or Workstation to run nested ESXi and other VMs) are still perfectly viable for the VCP-DCV/VCAP-DCA/VCAP-DCD exams
  • 32GB “hosts” may struggle with cloud lab setups. More VMs doing more things
  • 32GB is a limit imposed by the choice of CPU (i7/E3) and cannot be expanded beyond. Worth bearing in mind if one only has space for a single machine that needs to last for a few years
  • Less RAM and “smaller” CPUs will tend to use less power, create less heat and produce less noise than bigger machines and will be more suited for home use
  • Fewer larger hosts will likely be more “home” friendly
  • More smaller hosts will likely give more lab opportunities – real FT, real DRS, real DPM
  • Scale up vs scale out – factor all options. For instance, my rough costing spreadsheet, as mentioned above, showed a single E5+64GB RAM server was cheaper than two E3 with 32GB servers
  • the i7 and E3 servers tend to be single socket while the E5 can be dual socket capable

Next time I come to replace my lab I will probably lean towards a single E5 with 64GB RAM (if RAM prices have dropped by then) on a SuperMicro motherboard, or an E3 with 32GB plus a much smaller Intel NUC or Shuttle based box for “production” workloads.

So – yes, 32GB is currently sufficient for many home lab uses… but not all 🙂

 

VCAP-DCD

Well, I managed to pass the “VMware Certified Advanced Professional – Data Center Design” (VCAP5-DCD) exam yesterday! Hurray.

First – a shout out to the various blogs which helped with the studying. Unfortunately, I don’t have a central list thereof to post here, but if I get time to collate the links I will update this post. A very useful summary of the DCD content is available at http://professionalvmware.com/wp-content/uploads/2012/11/VCAP5DCD_StudyOutline.pdf.

In summary, this exam is all about general design processes with an obvious slant towards the VMware virtualisation platform. So you need to know the VMware “base” vSphere offerings along with the detail of “general design principles”. This exam is probably not going to be easy for a day-to-day vSphere admin, as it is not about testing technical knowledge of the product set. Having been in a variety of architecture roles for the last number of years, I can attest to this exam being a fairly good representation of the real-world thought processes necessary to go from capturing requirements through to implementation and subsequent support. If only we could follow these processes for all projects 🙂

So what to cover in preparation? Well, follow the blueprint! It may seem obvious, but for this exam you need to read all (well, at least most of!!) the referenced documentation. I don’t think you need much (if any) hands-on lab time to prepare for this exam. The various available options in the products can be learnt from the documentation. Saying that though, I did do some hands-on exercises to reinforce the learning. Various books are incredibly useful too, including “VCAP5-DCD Official Cert Guide” by Paul McSharry, “VMware vSphere Design 2nd Ed” by Forbes Guthrie and Scott Lowe, and “VMware vSphere 5/5.1 Clustering Deep Dive” by Duncan Epping and Frank Denneman. “Mastering VMware vSphere 5” or “Mastering vSphere 5.5” (v5.5, less so for the exam I suppose) by Scott Lowe et al are great books and definitely worth reading, although they can be skipped for the DCD in my opinion if you don’t have the time.

I would point out that the exam is broadly focussed on vSphere 5.0 as opposed to 5.1/5.5. Don’t rule out any technologies  “removed” or “deprecated” by 5.1 and 5.5!

The exam itself. Well, 3h45 is a long time for an exam. It flew by for me and I managed to finish with 15 minutes to spare. Somehow I made up time after the halfway point, which was a pleasant surprise. The 100 questions, of which 6 were the “Visio-like” design drawing type, all covered content from the blueprint. I don’t think there was anything which rang alarm bells as “whoa, where did that come from” – just a couple of questions where I thought “drat, didn’t cover that in enough detail”. Remember, you cannot go back in this exam – so if a subsequent question shows you answered something incorrectly earlier, try not to let it get to you – move forward and stay focussed.

The design drawing questions are fairly straightforward if you can understand what they are trying to test. That was the first problem I had – with a couple of them I struggled to understand what they were actually trying to get me to draw, as I found some of the wording to be a little ambiguous. The rest were fairly straightforward: put down a few building blocks and link them together. Ah, and there is the second problem – when you are putting things into other things (for example, a VM into an ESXi host) sometimes they would not stick, and as such I was not sure if the tool “registered” the placement. Anyway, I tried not to get bogged down by this and quickly moved forward regardless. Do practise with the simulation tool on the VMware web site.

The remaining questions are split between choose-1, choose-2 or choose-3 multiple choice and match-the-left-column-to-the-right-column type questions. The multiple choice questions are generally the easier ones, although you need to pay attention to the exact wording and match it to phrases used in the prep material when describing the various terms. Keep an eye on the definitions in the “VCAP5-DCD Official Cert Guide” and “VMware vSphere Design 2nd Ed” books. The match-the-left-to-the-right questions would be easy if they were all 1:1 mappings, which unfortunately they are not. Some are 1:1, some are n:1 and others are 1:n. Tricky stuff! I consider myself pretty good at the requirements, risk, assumption and constraint stuff, but some of the terms/phrases they used could be a little ambiguous – let’s hope they accept various answers. In these situations, I tried not to overthink the wording and just read it at face value 🙂

So, all in all I think this is a pretty decent exam which does a good job of evaluating a candidate’s understanding of the topics at hand. I don’t think this is one of those exams where one can simply memorise a number of facts and pass.

 

As I’m sure most of the active VMware users and enthusiasts are aware, vSphere 5.5 was released to the masses last weekend. I eagerly downloaded a copy and have installed it on a lab machine. I’ve not played with the full suite yet – just the ESXi 5.5 hypervisor.

The install went smoothly on the HP DL360G5 I was using. Unfortunately, the server only has 32GB RAM so I cannot test for myself that the 32GB limit for the “free” hypervisor is removed. I can confirm that under the “licensed features” heading the “Up to 8-way virtual SMP” entry is still there but the “Up to 32 GB of memory” entry is removed (when using a “freebie” license key). So that looks good 🙂 As I said, I’ve not installed the entire suite yet, only the hypervisor, so I am only using the Windows client currently. Don’t do what I did and upgrade a VM’s hardware version – you won’t be able to manage it via the Windows client – which does not support the latest features (including newer VM hardware versions).

Anyway, one of the first things I check when I install ESXi onto a machine is that the hardware status is correctly reported under the Configuration tab. Disks go bad, PSUs fail or get unplugged and fans stop spinning so I like to ensure that ESXi is reporting the server hardware health correctly. To my dismay I found that the disk health was not being reported for the P400i attached storage, after installing from the HP OEM customised ESXi 5.5 ISO. Now this is not entirely unexpected, as the HP G5 servers are not supported with ESXi 5.5. Drat!

By following the VMware Twitterati, I’ve learnt that various ESXi 5.0 and 5.1 drivers have been successfully used on ESXi 5.5 (specifically for Realtek network cards, the drivers for which have been dropped from ESXi 5.5). So I figured I’d have a go at using the ESXi 5.0/5.1 HP providers on this ESXi 5.5 install.

I downloaded “hp-esxi5.0uX-bundle-1.4-16.zip” from HP’s website; it is available on the “HP ESXi Offline Bundle for VMware ESXi 5.x” page, which can be reached from http://h18000.www1.hp.com/products/servers/software/vmware-esxi/offline_bundle.html.

This ZIP file contains a few .vib files intended for VMware ESXi 5.0 or 5.1. The VIB we are looking for is called “hp-smx-provider-500.03.02.00.23-434156.vib”. Extract this .VIB and upload it to your favourite datastore. Now, enable the ESXi shell (or SSH) and connect to the ESXi host’s console. Use the following command:


esxcli software vib install -v file:///vmfs/volumes/datastore1/hp-smx-provider-500.03.02.00.23-434156.vib

and reboot the host. You should now see this software component listed under Software Components within the Health Status section. You should also see the health of the P400i and its associated storage. So far so good. However, on my server the HP P400i controller was showing a yellow “Warning”. Hmm, not sure why.
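
To confirm that the provider VIB actually installed, something like the following can be run from the shell (the exact name will match the .vib you installed):

esxcli software vib list | grep -i smx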

I figured maybe there was an incompatibility between these older HP agents and the newer versions from the HP OEM CD, so I decided to reinstall ESXi from the plain VMware ESXi 5.5 ISO.

A fresh install resulted in fan status, temperature readings and power supply status being reported, and (as expected) no P400i storage health.

So, let’s install “hp-esxi5.0uX-bundle-1.4.5-3.zip”. Yes it’s a newer version than I used above, only because I found it after I’d reinstalled the vanilla ESXi.


esxcli software vib install -d file:///vmfs/volumes/datastore/hp/hp-esxi5.0uX-bundle-1.4.5-3.zip
reboot

Hey presto! Green health status. I pulled a drive from a RAID array and the status indicated the failure and then the subsequent rebuild. Certainly seems to be a workable solution to extend the life of these perfectly serviceable lab machines 🙂

I would expect this status monitoring to work for P800 controllers too.

One can also install hp-HPUtil-esxi5.0-bundle-1.5-31.zip to get access to some HP utilities at the ESXi command line.

 

I recently got to set up a Shuttle XH61V for a friend. I’ve read a few posts about how they make good VMware ESXi hosts for power-conscious folk running home labs. I figured this would be a good time to see just how power hungry, or not, one of these boxes is and how well ESXi runs on it.

The box would end up with an Intel i3-2120 processor, 16GB RAM (2 * Crucial CT102464BF1339) and a 128GB Crucial mSATA SSD (CT128M4SSD3). Quite a beefy XBMC media centre PC, built from a selection of new and pre-owned bits! Anyhoo, while putting the components together I took some power readings along the way:

 

Description Power (VA) Power (W)
Power supply alone, i.e. without computer attached 20VA 2W
Power supply with bare case off 20VA 2W
Power supply with bare case on (turned on but obviously doing nothing) 24VA 3W
PSU + case + 2*8GB DIMMs (turned on but obviously doing nothing) 24VA 3W
PSU + case + CPU + 2*8GB DIMMs (idling at BIOS) 46VA 37W
PSU + case + CPU + 2*8GB DIMMs + SSD (idling at BIOS) 46VA 37W
PSU + case + CPU + 2*8GB DIMMs + SSD (switched off) 24VA 3W
Installing ESXi 32VA – 46VA
ESXi with no VMs (High Performance power option) 40VA
ESXi with no VMs (Balanced power option) 32VA 21W
ESXi with no VMs (Low power option) 32VA 21W
ESXi with three busy VMs (Balanced power option) 64VA
Windows 7 x64 SP1 idle (balanced, low, high power options) 32VA 21W
Windows 7 x64 SP1 put into sleep mode 28VA 3W

 

So, not too shabby when it idles. I will be interested to see what power a 22nm 3rd or 4th generation processor would consume while idling. It seems that this i3-2120 CPU idles at approximately 18W. Under a heavy workload, the processor seems to consume approximately 21W extra, for a total of roughly 40W – not quite the 65W TDP Intel quotes.

I installed it with the standard ESXi 5.1U1 installation media. No issues, once I found a suitable USB drive to boot from! Both onboard NICs were seen and the mSATA SSD was recognised too.

Note: It seems the included Realtek 8186 NIC has reliability issues under VMware ESXi 5.1. The odd thing is that when I first installed ESXi 5.1 it worked fine and I was able to use it successfully. However, after I rebooted a couple of times, the NIC no longer really works. It manages to get a DHCP IP address and is pingable for about 30 seconds before it drops off the network. No log entries on the host or the switch indicate the cause. Very curious!

Well, in short the GIGABYTE G1.Sniper M3 motherboard does support Intel VT-d and both ESXi 5 and 5.1 can use it. I have tested this with BIOS version f9 and “beta” BIOS versions f10c, f10d and f10e and all show VT-d as an option when a compatible processor is installed. Note that this option is not shown unless a suitable (generally Intel i5 or i7 non-k CPU) processor is installed. The “VT-d” option is shown below the “Intel Virtualization Technology” option on the “BIOS Features” page of the BIOS setup.

I have had mixed success with actually passing through devices to VMs. Generally the cards in PCI-E slots and configured for pass through worked as expected within a VM during my testing (USB3, NICs, Hauppauge 1700 cards). However, devices on the motherboard (SATA, Audio, LAN, Video) and PCI-E graphics cards do not work. For the most part, these devices pass through but the devices don’t start under Windows, drivers fail to attach or cause blue screens when accessed (yes, Mr ATI graphics card with your atikmpag.sys driver BSOD).

Until I actually did these tests I was not sure if this motherboard did or did not support VT-d /Intel Virtualization Technology for Directed I/O / VMware VMDirectPath I/O. I already had this motherboard in a HTPC with an Intel i3 (dual core with hyper threading) which, by the way, ran ESXi adequately. I wanted to play with VT-d so  I took a punt on an Intel i7 processor and luckily it worked. If not, my backup plan was to also procure an ASRock motherboard, most of which seem to have working VT-d support.

I had hoped to run a virtual HTPC with an ATI graphics card passed through on this computer. Unfortunately the virtualisation gods do not seem to be happy with this idea at the moment. Still, this box makes a decent whitebox ESXi host, apart from the onboard Intel 82579V NIC which ESXi does not support out the box. A custom driver needs to be injected into the ESXi installation ISO, unless you have a supported PCI-E NIC in which case the driver can be installed post-install.

Note1: While playing with passthrough and various combinations of internal graphics and PCI-E graphics BIOS settings, I got to the point where I could no longer get output from the onboard graphics. I found a couple of posts on the Internet about this too. Even resetting/clearing the CMOS did not resolve it. As per the other posts, reflashing the BIOS sorted it out. Weird and unexpected behaviour – I could not get the BIOS to save the option to use IGFX (internal graphics) rather than PEG (PCI-E graphics) as the “Init Display First” option.

Note2: The following are the graphics cards I attempted to pass through to the VMs. Note that I tried both VMware ESXi 5.0U2 build 914586 and 5.1 build 799733 and 914609 with motherboard BIOS f9, f10d and f10e.

Asus ATI Radeon HD 5450 – passed through and seen by VM but has atikmpag.sys BSOD 0x0116 “Attempt to reset the display driver and recover from timeout failed.” whenever I connected the monitor to the graphics card or tried to enable the display on the card.

Asus ATI Radeon HD 6450 – exactly as above.

Asus NVIDIA Geforce GT610 – passed through and seen by the VM. However the device driver is unable to start in Windows.

Note3: While trying to get the graphics cards to work properly I tried various combinations of additional/advanced settings including:

On the VM:

pciHole.start = "1200"
pciHole.end = "2200"
pciPassthru0.maxMSIXvectors = "16"
pciPassthru0.msiEnabled = "FALSE"

On the host, editing /etc/vmware/passthru.map and adding:

#ATI Radeon HD
1002 ffff bridge false
#tried with flr/d3d0/link/bridge in the third column

Note4: In ESXi5.1 ACS checking is enforced more strictly resulting in quad-port NICs (and other devices apparently) not successfully getting configured for passthrough. After each reboot the devices still show as needing a reboot. The console logs show a message similar to:

WARNING: PCI: ssss: nnn:nnn:nn.n: Cannot change ownership to PASSTHRU 
(non-ACS capable switch in hierarchy)

This can be bypassed (at your own peril) using the host advanced option disableACSCheck=true, which is set via the vSphere client under Configuration / Software / Advanced Settings / VMkernel / VMkernel.Boot.DisableAcsCheck. More info can be found at this informative post. This option got the quad-port NICs passed through successfully, but it did not make any difference to the ATI or NVIDIA cards.
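
If you prefer the command line, the same boot option can be set with esxcli followed by a reboot; treat this as a sketch based on the option name above and verify the setting name on your particular build first:

esxcli system settings kernel set --setting=disableACSCheck --value=TRUE
reboot
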
Note5: Currently ESXi 5.1, up to and including build 914609, does not seem to allow USB controller passthrough via VMDirectPath I/O. You can select the device for passthrough, but once the host is rebooted the device is deselected. I am not sure if this is a bug or a conscious decision by VMware. A cynic like myself might think it is intentional, as without the ability to pass through a USB controller there is no way to pass a real keyboard and mouse into a VM, and hence no need to get GPUs working with passthrough. (Hmm – maybe a Bluetooth USB device passed into a VM and then paired with a BT keyboard and mouse?? Something for another day.)

 

 

Well today I’ve been upgrading a couple of my servers from VMware ESXi 3.5 and ESXi 4.1 to ESXi 5.0. For the most part this went smoothly and without any drama.

The HP DL360 G5 upgrade from ESXi 4.1 to 5.0 went smoothly and the upgrade process maintained all the settings and configuration properly. The hardware health monitors were working before and after the upgrade without the need for any additional fiddling. I used the VMware ESXi 5.0U1 ISO from HP.com for this server.

The HP ML110 G5 needed to be reinstalled as it was running ESXi 3.5 and there is no direct upgrade path to 5.0. After recreating the vSwitches and associated VM port groups I was up and running. I used the HP.com image once more and, to my surprise, the hardware health monitoring now shows the RAID status of the SmartArray E200 controller. In the past, when using the HP providers on ML110 G5 hardware, purple screens were common. Now the server seems stable and displays the storage health status. A win for the day!

Note that this server needed a further tweak as the SCSI passthrough of the SCSI attached LTO3 drive stopped working after the installation of ESXi5.0. A bit of Googling revealed that the following would solve this problem:

esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --vendor="HP" --model="Ultrium 3-SCSI"
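
To check that the claim rule was registered, something along these lines can be used:

esxcli storage nmp satp rule list | grep -i ultrium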

So the VM could now see the attached tape drive. However, VMware appear to have changed their passthrough or SCSI subsystem since ESXi 3.5, and as a result I’ve had to reduce my tape block size. In the past I was able to read and write 512kB blocks (tar -b 1024); however, I’ve had to drop this to 128kB blocks (tar -b 256). If I get some time, I will attempt to work out the exact limit and update this post.

For the Dell PE840 upgrade, I used the Dell ESXi 5.0 customised ISO. Again, the upgrade from 4.1 preserved the configuration of the server. To my dismay, the RAID status of the PERC 5/i was now missing. It turns out the Dell ISO lacks the providers for storage health. Long story short, after some searching I got the health status back. I initially tried the Dell OpenManage VIB (OM-SrvAdmin-Dell-Web-6.5.0-542907.VIB-ESX50i_A02.zip), which didn’t appear to change much. The useful info was here on the RebelIT website, which referred to using the VIB from LSI.com. This made sense, as the Dell PERC 5/i is basically an LSI MegaRAID SAS 8480E. I downloaded the VIB (VMW-ESX-5.0.0-LSIProvider-500.04.V0.24-261033-456178.zip) from LSI.com. Note that the 8480E is not listed as supported by this release, but it works – PHEW! I guess the PERC 5/i is getting long in the tooth now, but given it works like a champ there is no need to upgrade. Note that I had to extract the .zip file and then install the VIB from the server’s console as:

esxcli software vib install -v /vmfs/volumes/datastore1/vmware-esx-provider-LSIProvider.vib

So now all three servers have been upgraded to ESXi 5.0 and have full hardware health status available which is being monitored via Nagios. Now the fun begins, upgrading the hardware version and VMware Tools for all the VMs….