
How I increased IOPS 200 times with XenServer and PVS

UPDATED INFO HERE: http://virtexperience.com/2014/03/10/an-update-about-my-experience-with-pvs/

In my previous blog post I described how the new PVS 7.1 “Cache to RAM with overflow to disk” option does not give you any more IOPS than cache to disk: http://virtexperience.com/2013/11/05/pvs-7-1-ram-cache-with-overflow-to-disk/

During that test, I discovered that intermediate buffering in the PVS device can improve performance 3 times on XenServer, but with some more experimenting I got up to 200 times the IOPS on a PVS device booted on XenServer. Here is a short description of what I did and how I measured it.

First, a little disclaimer: these are observations made in a lab environment. DO NOT implement this in production without testing properly.

I’m using Iometer to test; here is my Iometer setup for this test and for my previous cache to RAM test.

  • Target Disk: c:
  • Maximum disk size 204800 sectors (100MB)
  • Default Access specification
  • Update frequency 10 secs
  • Run time 1 minute
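
As a quick sanity check on the 204800 figure, here is a minimal sketch of the arithmetic. It assumes Iometer counts the maximum disk size in 512-byte sectors; the sector size and the function name are assumptions for illustration, not something stated in the setup above.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def sectors_to_mib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return sectors * sector_bytes / (1024 * 1024)


test_file_mib = sectors_to_mib(204800)
print(f"Iometer test file: {test_file_mib:.0f} MiB")
```

A test file this small fits comfortably in RAM, which is worth keeping in mind when reading the IOPS numbers further down.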

I know there has been a lot of discussion about whether or not to enable intermediate buffering in the PVS device. Citrix has a KB article that describes this: http://support.citrix.com/article/CTX126042

Basically, some disk controllers will give you better performance with buffering enabled and some will give you worse. In a virtual environment, it’s actually the virtual disk controller that counts. So I tested on Hyper-V 2012 R2 and XenServer 6.2 to see whether the virtual disk controller could deliver better performance with this option. I did a baseline with a Windows 2008 R2 image on the same hardware on both hypervisors. Without buffering enabled, Hyper-V was 2x faster than XenServer. With buffering enabled, the VM got 3x the performance on XenServer, but on Hyper-V it actually got slower. So in my test the virtual disk controller on Hyper-V reacted negatively to disk buffering. Please let me know if you know a way to tune this on Hyper-V.

Reading further in the Citrix KB description of disk buffering, you will see that the RAM on the disk controller is used when intermediate disk buffering is enabled. By chance I stumbled upon this article, which describes how to increase XenServer performance by adding more RAM to dom0: http://blogs.citrix.com/2012/06/23/xenserver-scalabilitiy-performance-tuning/

Reading the documentation on dom0 RAM, it’s clear that the virtual disk controller uses dom0 RAM, and that with many VMs you have to increase dom0 RAM to keep it from becoming a bottleneck for disk IO. The only reason not to increase it is if you are short on RAM and want to save it. Well, today most servers have 128 or 256 GB of RAM, so I guess increasing dom0 from 752 MB to 2940 MB is okay.


I followed the guide to increase dom0 RAM to 3GB.
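
For orientation, the dom0 size is ultimately passed to the Xen hypervisor as a `dom0_mem` boot option. Below is a minimal sketch that only builds that option string for the two sizes mentioned above, assuming both are in megabytes. The helper name is hypothetical, and how the option gets applied depends on the XenServer version, so treat the linked guide as the authority for the actual steps.

```python
def dom0_mem_option(megabytes: int) -> str:
    """Build a Xen boot option that pins dom0 memory to a fixed size.

    Hypothetical helper for illustration; setting the initial size and
    the max to the same value keeps dom0 from ballooning.
    """
    if megabytes <= 0:
        raise ValueError("dom0 memory must be a positive number of megabytes")
    return f"dom0_mem={megabytes}M,max:{megabytes}M"


# Sizes from the text above (assumed to be MB): default and increased.
print(dom0_mem_option(752))
print(dom0_mem_option(2940))
```

A host reboot is needed before a changed dom0 size takes effect, since it is a hypervisor boot parameter.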

Booting a local VM and running Iometer showed no difference. Booting a PVS VM without intermediate buffering was no better either: about 150 total IOPS, since my PVS cache was on a slow SATA disk. Then I enabled intermediate buffering in the PVS image:

REG ADD HKLM\SYSTEM\CurrentControlSet\Services\BNIStack\Parameters /v WcHDNoIntermediateBuffering /t REG_DWORD /d 2 /f
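
The registry key path is written with backslash separators, which are easy to lose when copying the command around. As a hedged sketch, the snippet below assembles the command from its parts, assuming the BNIStack service parameters live under `HKLM\SYSTEM\CurrentControlSet\Services`. The helper names are hypothetical, and the script only prints the commands; it does not touch the registry.

```python
# Assumed key layout: the standard Services hive under HKLM.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\BNIStack\Parameters"
VALUE_NAME = "WcHDNoIntermediateBuffering"


def reg_add_command(data: int) -> str:
    """Return a REG ADD command line that sets the DWORD value to `data`."""
    return f'REG ADD "{KEY}" /v {VALUE_NAME} /t REG_DWORD /d {data} /f'


def reg_query_command() -> str:
    """Return a REG QUERY command line to read the value back."""
    return f'REG QUERY "{KEY}" /v {VALUE_NAME}'


# 2 is the value used in this post to enable intermediate buffering.
print(reg_add_command(2))
print(reg_query_command())
```

Running the printed REG QUERY line inside the image before sealing it is a cheap way to confirm the value actually landed in the right key.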

Booting up and running Iometer again, I was shocked: 35000 IOPS! That is more than 200 times better! After moving the cache to an SSD, I got 55000 total IOPS. Wow! I had to repeat the test several times with different operating systems, but the results were similar. I also tested this with “Cache to RAM with overflow to disk”, and now I was able to get 55000 IOPS there too. It means that cache to device RAM with overflow to disk is, in effect, “cache to host RAM”, and gives the same result as cache to device disk.
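
The "more than 200 times" figure follows directly from the numbers above. A minimal sketch of the arithmetic, treating the roughly 150 IOPS unbuffered result as the baseline (that figure is approximate, so the ratios are too):

```python
BASELINE_IOPS = 150  # approximate: PVS VM without buffering, cache on SATA

# IOPS measured with intermediate buffering enabled and 3 GB dom0 RAM.
results = {
    "buffering, cache on SATA": 35000,
    "buffering, cache on SSD": 55000,
}


def speedup(iops: float, baseline: float = BASELINE_IOPS) -> float:
    """Return how many times faster `iops` is than the baseline."""
    return iops / baseline


for label, iops in results.items():
    print(f"{label}: {iops} IOPS, about {speedup(iops):.0f}x the baseline")
```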

Checking with uberAgent for Splunk, I measured a total login time of 1.5 seconds! (No GPOs, local profile, no login scripts.)

So the conclusion to my test is:

Intermediate disk buffering together with XenServer with lots of dom0 RAM performs much better than PVS cache to RAM, but without the risk of a crash when RAM is full. How this will work in a production environment, however, I don’t know yet. I guess LoginVSI could be used to simulate workloads and get a better picture of this, but I haven’t been able to do so yet.

Please test this yourself and verify if you get the same result as me.

MCS will not be able to get the same boost, but adding more dom0 RAM is still a good idea. I don’t know if it’s possible to enable similar buffering with MCS; I guess this is something for Citrix to look into.

I’ve tested with the following operating systems:

  • Windows 7 x64
  • Windows 2008 R2 x64
  • Windows 2012 R2 x64
  • Windows 8.1 x64

Below is a diagram showing the data from all my tests.
