TrainSignal: Introduction to EMC Storage Essentials – YouTube

Microsoft Releases iSCSI Software Target 3.3 to Windows Server 2008 R2 Users – Reposted

Microsoft released on Monday iSCSI Software Target 3.3, a Windows Server 2008 R2 addition that allows for shared block storage in storage area networks using the iSCSI protocol.

According to Microsoft’s announcement, iSCSI Software Target 3.3 is the first release that can be used in a production environment. The product enables “storage consolidation and sharing on a Windows Server by implementing the iSCSI (Internet Small Computer Systems Interface) protocol, which supports SCSI-block access to [a] storage device over a TCP/IP network,” according to the product overview at Microsoft’s Download Center.

This type of storage architecture offers a number of benefits, according to the product overview. It can be used to achieve high availability with Microsoft’s Hyper-V hypervisor using the “live migration” feature. Storage for application servers can be consolidated, including on a Windows failover cluster. Finally, Microsoft iSCSI Software Target 3.3 supports the remote booting of diskless computers “from a single operating system image using iSCSI.”

Microsoft’s team ran this release of the software through extensive testing, particularly with Windows Server failover clusters and Hyper-V, according to the announcement. One scenario involved using Microsoft iSCSI Software Target in a “two-node Failover Cluster,” with 92 Hyper-V virtual machines storing data to one of the nodes. The team introduced a failure in the main node and found that all 92 virtual machines switched to the second node without a noticeable effect on the underlying application.

Microsoft is recommending using Service Pack 1 with Windows Server 2008 R2 for this release of Microsoft iSCSI Software Target. The product can be installed in a Hyper-V virtual machine. It doesn’t work with a core installation of Windows Server 2008 R2.
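
If you want to try the new target from a Windows initiator, the built-in iscsicli tool can discover it and log on to it. The following is only a rough sketch run from an elevated prompt; the portal IP address and target IQN are placeholders, so substitute your own values.

= = = = = = = = = = = = = = = = = =
# Make sure the Microsoft iSCSI Initiator service is running
Start-Service msiscsi

# Point the initiator at the target portal (the IP address is a placeholder)
iscsicli QAddTargetPortal 192.168.1.50

# List the targets the portal exposes
iscsicli ListTargets

# Log on to one of the listed targets (the IQN is a placeholder)
iscsicli QLoginTarget iqn.1991-05.com.microsoft:filer-target1-target
= = = = = = = = = = = = = = = = = =

Once the logon succeeds, the target’s LUN shows up in Disk Management like any local disk and can be brought online and formatted there.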

About the Author

Kurt Mackie is senior news producer for the 1105 Enterprise Computing Group.

The Disk Is Offline Because of Policy Set by an Administrator

Note from Tanny:

This post did not work for me, but it is worth sharing. In my case it was simply a matter of bringing the storage resource online in the cluster resource manager.

For a 2008 R2 clustered environment, take a look at the cluster resource manager.
In my case it was a matter of bringing the storage resource online. We swing a LUN between different servers for quick backups and restores. The instructions below did not work for me, but usually, after presenting the LUN to the cluster or to any of the standalone servers, a quick rescan will bring the disk online and keep the previous drive letter.
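
If you prefer the command line over Failover Cluster Manager, the FailoverClusters PowerShell module (available on 2008 R2 and later) can do the same thing. This is only a sketch; the resource name used below is a made-up example, so list your resources first and use the real name.

= = = = = = = = = = = = = = = = = =
Import-Module FailoverClusters

# List the cluster resources and their current state to find the offline disk
Get-ClusterResource

# Bring the storage resource online (the resource name is just an example)
Start-ClusterResource -Name "Cluster Disk 2"
= = = = = = = = = = = = = = = = = =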

Source (reposted from the Happy SysAdm blog): The disk is offline because of policy set by an administrator

You have just installed or cloned a VM with Windows Server 2008 Enterprise or Datacenter, or you have upgraded the VM to Virtual Hardware 7, and under Disk Management you get an error message saying:
“the disk is offline because of policy set by an administrator”.
This is because, and this is by design, all virtual machine disk files (VMDK) are presented from Virtual Hardware 7 (the version introduced with ESX/ESXi 4.0) to VMs as SAN disks.
At the same time, and this is by design too, Microsoft has changed how SAN disks are handled by its Windows 2008 Enterprise and Datacenter editions.
In fact, on Windows Server 2008 Enterprise and Windows Server 2008 Datacenter (and this is true for R2 too), the default SAN policy is now VDS_SP_OFFLINE_SHARED for all SAN disks except the boot disk.
Having the policy set to Offline Shared means that your SAN disks will simply be offline on startup of your server, and if your paging file is on one of these secondary disks it will be unavailable.
Here’s the solution to this annoying problem.
What you have to do first is query the current SAN policy from the command line with DISKPART, using the SAN command:
= = = = = = = = = = = = = = = = = =
DISKPART.EXE
 
DISKPART> san
 
SAN Policy : Offline Shared
= = = = = = = = = = = = = = = = = =
Once you have verified that the applied policy is Offline Shared, you have two options to set the disk to Online.
The first one is to log in to your system as an Administrator, click Computer Management > Storage > Disk Management, right-click the disk and choose Online.
The second one is to make a SAN policy change, then select the offline disk, force a clear of its readonly flag and bring it online. Follow these steps:
= = = = = = = = = = = = = = = = = =
DISKPART> san policy=OnlineAll
 
DiskPart successfully changed the SAN policy for the current operating system.
DISKPART> LIST DISK
 
Disk ###  Status    Size     Free     Dyn  Gpt
--------  --------  -------  -------  ---  ---
Disk 0    Online    40 GB    0 B
* Disk 1  Offline   10 GB    1024 KB
 
DISKPART> select disk 1
 
Disk 1 is now the selected disk.
 
DISKPART> ATTRIBUTES DISK CLEAR READONLY
 
Disk attributes cleared successfully.
 
DISKPART> attributes disk
Current Read-only State : No
Read-only : No
Boot Disk : No
Pagefile Disk : No
Hibernation File Disk : No
Crashdump Disk : No
Clustered Disk : No
 
DISKPART> ONLINE DISK
 
DiskPart successfully onlined the selected disk.
= = = = = = = = = = = = = = = = = =
Once that is done, the drive mounts automagically.
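
On newer systems (Windows Server 2012 and later, where the Storage module cmdlets are available) the same fix can be applied from PowerShell instead of DISKPART; on 2008 R2, stick with the DISKPART steps above. A rough equivalent:

= = = = = = = = = = = = = = = = = =
# Show any disks that are currently offline
Get-Disk | Where-Object IsOffline

# Clear the read-only flag and bring disk 1 online (match the number to LIST DISK output)
Set-Disk -Number 1 -IsReadOnly $false
Set-Disk -Number 1 -IsOffline $false
= = = = = = = = = = = = = = = = = =
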
  1. So, I’m trying all this but the return message I get in DiskPart is “DiskPart failed to clear disk attributes.” Any further advice?

    DISKPART> san policy=OnlineAll

    DiskPart successfully changed the SAN policy for the current operating system.

    DISKPART> rescan

    Please wait while DiskPart scans your configuration…

    DiskPart has finished scanning your configuration.

    DISKPART> select disk 1

    Disk 1 is now the selected disk.

    DISKPART> attributes disk clear readonly

    DiskPart failed to clear disk attributes.

    DISKPART> attributes disk
    Current Read-only State : Yes
    Read-only : Yes
    Boot Disk : No
    Pagefile Disk : No
    Hibernation File Disk : No
    Crashdump Disk : No
    Clustered Disk : Yes

    DISKPART> san

    SAN Policy : Online All

    (Note from Tanny: take a look at the cluster resource manager and bring the storage resource online.)

  2. I see your problem. Have you checked that you have full access to the volume you want to change attributes for? Is it a cluster resource? I think so, because your log says “clustered disk: yes”. In that case you should stop all nodes but one, and then you will be allowed to use DiskPart to reset the flags. The general idea is to grant the server you are connected to write access to the volume.
    Let me know if you need more help and, if so, please post more details about your configuration (servers and LUNs).
    Regards

  3. I am having this same problem. It is in a cluster and I have shut down the other node. I am still unable to change the read-only flag.
    Please help?!

  4. Wacky problem – a SAN volume mounted to a 2008 (not R2) 32-bit Enterprise server had been working fine. After a reboot of the server, the disk was offline. Putting it back online was no problem, and the DiskPart details for the volume showed “Read-only: No”. We got support from Dell and found that the volume was listed as read-only. Simple fix: change the volume to “Read-only: No” with DiskPart. Four hours later, the volume was marked as read-only again. No changes were made by us, and there was nothing in the Windows logs.
    The disk is a Dell/EMC SAN LUN, fibre connected, for the exclusive use of this machine. Another LUN of almost the same size is attached the same way to this machine, with no problems. I’d appreciate any thoughts or places to look.

  5. Ahhh, nice! A perfect tutorial! Thanks a lot!

  6. Great article! I just spent 2 hours trying to figure out why my san disks weren’t showing and this was the fix.

    Thank you!

  7. Thank you, thank you, thank you! This article helped me with an IBM DS3000 and an IBM System x3650M3 Windows Server 2008 R2. Thumbs up to you! I’d be still trying to figure why I couldn’t configure these drives!

  8. These settings are good for Windows Server 2008 and 2008 R2. It breaks again with R2 SP1 ;-(. Is there any solution for R2 SP1?

  9. Thanks. Very helpful.

  10. Wonderful article..thanks a lot dude!!

  11. This worked perfectly for me. I tried figuring it out on my own but just couldn’t get it to work within VMware Workstation.

  12. Let me know how to remove the read-only attribute and bring the disk online. If I access the SAN directly, it is possible. I have two servers: on one server the disk shows as online, but on the second server it shows the disk as reserved and offline.

    I’m also trying all of this, but I get the same problem: the return message in DiskPart is “DiskPart failed to clear disk attributes.” Any further advice?

    DISKPART> san policy=OnlineAll

    DiskPart successfully changed the SAN policy for the current operating system.

    DISKPART> rescan

    Please wait while DiskPart scans your configuration…

    DiskPart has finished scanning your configuration.

    DISKPART> select disk 1

    Disk 1 is now the selected disk.

    DISKPART> attributes disk clear readonly

    DiskPart failed to clear disk attributes.

    DISKPART> attributes disk
    Current Read-only State : Yes
    Read-only : Yes
    Boot Disk : No
    Pagefile Disk : No
    Hibernation File Disk : No
    Crashdump Disk : No
    Clustered Disk : Yes

    DISKPART> san

    SAN Policy : Online All

  13. Exactly the answer I was looking for!

  14. Well done – fixed me right up.

  15. Perfect answer for a vexing problem. I had no clue where to look.

  16. This is a really helpful article! Many thanks.

  17. Thanks for your reply!

  18. Thanks, this was very helpful for me.

  19. Hi, same problem here: the disk says it’s a clustered disk, but I don’t have it in Failover Cluster Manager. It’s just a dedicated disk presented to one server from the SAN. I have cleared simultaneous connections and only one server is connected now, but it still won’t come online. Any help would be great.
    Thanks

Citrix Blog: Turbo Charging your IOPS with the new PVS Cache in RAM with Disk Overflow Feature! – Part One

With PVS 7.1 and later you may have noticed a new caching option called “Cache in Device RAM with Hard Disk Overflow”.  We actually implemented this new feature to address some application compatibility issues with Microsoft ASLR and PVS.  You can check out CTX139627 for more details.

One of the most amazing side effects of this new feature is that it can give a significant performance boost and drastically increase your IOPS for PVS targets while reducing and sometimes eliminating the IOPS from ever hitting physical storage!!!

My colleague Dan Allen and I have recently been conducting testing of this new feature in both lab and real customer environments, and we want to share some of our results along with some “new” recommended practices for PVS that encourage everyone to start taking advantage of this new cache type. This is a two-part blog series: in this first part I recap the various write cache options for PVS and discuss some of the new results we are seeing with the new cache type. In Part 2 of the series, Dan Allen dives deeper into the performance results and provides guidelines for properly sizing the RAM for both XenApp and VDI workloads.

PVS Write Cache Types

When using Provisioning Services, one of the most important settings for ensuring optimal performance is the write cache type. I am sure that most of you are already familiar with the various types, but I will review them here again as a refresher!

  1. Cache on server: this write-cache type places the write-cache on the PVS server.  By default it is placed in the same location as the vDisk, but a different path can be specified.
    This write-cache type provides poor performance when compared to other write cache types, limits high availability configurations, and should almost never be used in a virtual environment.  This option was typically only used when streaming to physical end points or thin clients that are diskless.
  2. Cache on device’s hard drive:  this write-cache type creates a write-cache file (.vdiskcache) on the target devices’ hard drive.  It requires an NTFS formatted hard drive on the target device to be able to create this file on the disk. This cache type has been our leading Citrix best practice for environments to date and most of our deployments use this write-cache type as it provides the best balance between cost and performance. To achieve the highest throughput to the write-cache drive, Intermediate Buffering should almost always be used (caution should be used with target devices hosted on Hyper-V where we have occasionally seen adverse effects).  Intermediate Buffering allows writes to use the underlying buffers of the disk/disk driver before committing them to disk allowing the PVS disk drive to continue working rather than waiting for the write on disk to finish, therefore increasing performance.  By default this feature is disabled.  For more information on Intermediate Buffering, including how to enable it, please refer to CTX126042.
  3. Cache in device RAM:  this write-cache type reserves a portion of the target device’s memory for the write cache, meaning that whatever portion of RAM is used for write-cache is not available to the operating system.  The amount of memory reserved for write-cache is specified in the vDisk properties.  This option provides better throughput, better response times, and higher IOPS for write-cache than the previous types because it writes to memory rather than disk.
    There are some challenges with this option, though.  First of all, there is no overflow, so once the write cache is filled the device will become unusable (might even blue screen).  Therefore, there has to be plenty of RAM available for the target devices to be able to operate and not run out of write-cache space, which can be expensive, or just not possible because of memory constraints on the physical host.  Second, if there is a need to store persistent settings or data such as event logs, a hard drive will still be required on each target.  On the flip side, this hard disk will not be as large or use as many IOPS as when using “Cache on device’s hard drive” since the write cache will not be on it.  We have typically seen customers successfully use this feature when virtualizing XenApp since you do not run as many XenApp VMs on a physical host (compared to VDI), so often times there is enough memory to make this feature viable for XenApp.
  4. Cache on device RAM with overflow on hard disk:  this is a new write-cache type and is basically a combination of the previous two, but with a different underlying architecture.  It provides a write-cache buffer in memory and the overflow is written to disk.  However, the way that memory and disk are used is different than with “Cache in device RAM” and “Cache in device’s hard drive” respectively.  This is how it works:
  • Just as before, the buffer size is specified in the vDisk properties. By default, the buffer is set to 64 MB but can be set to any size.
  • Rather than reserving a portion of the device’s memory, the cache is mapped to Non-paged pool memory and used as needed, and the memory is given back to the system if the system needs it.
  • On the hard drive, instead of using the old “.vdiskcache” file, a VHDX (vdiskdif.vhdx) file is used.
  • On startup, the VHDX file is created and is 4 MB due to the VHDX header.
  • Data is written to the buffer in memory first.  Once the buffer is full, “stale” data is flushed to disk.
  • Data is written to the VHDX in 2 MB blocks, instead of 4 KB blocks as before.  This will cause the write-cache file to grow faster in the beginning than the old “.vdiskcache” cache file.  However, over time, the total space consumed by this new format will not be significantly larger as data will eventually back fill into the 2 MB blocks that are reserved.

A few things to note about this write-cache type:

  • The write-cache VHDX file will grow larger than the “.vdiskcache” file format. This is due to the VHDX format using 2 MB blocks vs. 4 KB blocks. Over time, the size of the VHDX file will normalize and become closer in size to what the “.vdiskcache” would be, as data will eventually back fill into the 2 MB blocks that are reserved. The point at which the size normalizes varies by environment depending on the workload. (A quick way to check the overflow file’s size on a running target is sketched after these notes.)
  • Intermediate buffering is not supported with this write-cache type (this cache type is actually designed to replace it).
  • System cache and vDisk RAM cache work in conjunction.  What I mean by this is that if there is block data that is moved from the PVS RAM cache into the disk overflow file, but it is still available in the Windows System Cache, it will be re-read from memory rather than disk.
  • This write-cache type is only available for Windows 7/2008 R2 and later.
  • This cache type addresses interoperability issues with Microsoft ASLR and Provisioning Services write-cache where we have seen application and printer instability that result in undesirable behavior.  Therefore, this cache type will provide the best stability.
  • A PVS 7.1 hotfix is required for this write-cache type to work properly: 32-bit and 64-bit.
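
To see how much has actually spilled out of RAM on a running target, you can simply check the size of the overflow file on the write-cache drive. This is just a quick sketch; the drive letter is an assumption, so use whatever letter your write-cache disk has.

= = = = = = = = = = = = = = = = = =
# Report the current size of the PVS overflow file (the drive letter is an assumption)
Get-Item 'D:\vdiskdif.vhdx' |
    Select-Object Name, @{Name='SizeMB'; Expression={[math]::Round($_.Length / 1MB, 1)}}
= = = = = = = = = = = = = = = = = =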

New PVS RAM Cache Results

Now, a review of the newest cache type wouldn’t be complete if we didn’t share some results of some of our testing.  I will summarize some of the impressive new results we are seeing and in Part 2 of the series, Dan Allen will dive much deeper into the results and provide sizing considerations.

Test Environment 1

Physical Hardware

Server CPU: 2 x 8 core CPU Intel 2.20 GHz
Server RAM: 256 GB
Hypervisor: vSphere 5.5

Storage: EMC VNX 7500.  Flash in tier 1 and 15K SAS RAID 1 in tier 2. (Most of our IOPS stayed in tier 1)

XenApp Virtual Machine

vServer CPU: 4 vCPU
vServer RAM: 30 GB
vServer OS: Windows 2012
vServer Disk: 30 GB Disk (E: disk for PVS write cache on tier 1 storage)

We ran 5 tests using IOMETER against the XenApp VM so that we could compare the various write cache types.  The 5 tests are detailed below:

  1. E: Drive Test:  This IOMETER test used an 8 GB file configured to write directly on write-cache disk (E:) bypassing PVS.  This test would allow us to know the true underlying IOPS provided by the SAN.
  2. New PVS RAM Cache with disk Overflow:  We configured the new RAM cache to use up to 10 GB RAM and ran the IOMETER test with an 8 GB file so that all I/O would remain in the RAM.
  3. New PVS RAM Cache with disk Overflow: We configured the new RAM cache to use up to 10 GB RAM and ran the IOMETER test with a 15 GB file so that at least 5 GB of I/O would overflow to disk.
  4. Old PVS Cache in Device RAM: We used the old PVS Cache in RAM feature and configured it for 10 GB RAM.  We ran the IOMETER test with an 8 GB file so that the RAM cache would not run out, which would make the VM crash!
  5. PVS Cache on Device Hard Disk:  We configured PVS to cache on device hard disk and ran IOMETER test with 8 GB file.

With the exception of the size of the IOMETER test file as detailed above, all of the IOMETER tests were run with the following parameters:

  • 4 workers configured
  • Depth Queue set to 16 for each worker
  • 4 KB block size
  • 80% Writes / 20% Reads
  • 90% Random IO / 10% Sequential IO
  • 30 minute test duration

Test # | Total IOPS | Read IOPS | Write IOPS | Total MBps | Read MBps | Write MBps | Avg Response Time (ms)
1 | 18,412 | 3,679 | 14,733 | 71.92 | 14.37 | 57.55 | 0.89
2 | 71,299 | 14,258 | 57,041 | 278.51 | 55.69 | 222.81 | 0.86
3* | 68,938 | 13,789 | 55,149 | 269.28 | 53.86 | 215.42 | 0.92
4 | 14,498 | 2,899 | 11,599 | 56.63 | 11.32 | 45.30 | 1.10
5 | 8,364 | 1,672 | 6,692 | 32.67 | 6.53 | 26.14 | 1.91

* In test scenario 3, when the write-cache first started to overflow to disk the IOPS count dropped to 31,405 and the average response time was slightly over 2 ms for a brief period of time.  As the test progressed, the IOPS count gradually increased back up and the response time decreased. This was due to the PVS driver performing the initial flush of large amounts of data to the disk to make enough room in RAM so that most of the data could remain in RAM.  Even during this initial overflow to disk, the total IOPS was still nearly twice as fast as what the underlying disk could physically provide!
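
As a quick sanity check on the table, throughput follows directly from IOPS and the 4 KB block size used in these tests. For example, for Test 1:

= = = = = = = = = = = = = = = = = =
# Throughput = IOPS x block size; the numbers are taken from Test 1 in the table above
$iops      = 18412
$blockSize = 4KB
[math]::Round(($iops * $blockSize) / 1MB, 2)   # ~71.92 MBps, matching the reported total
= = = = = = = = = = = = = = = = = =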

As you can see from the numbers above, we are getting some amazing results from our new RAM Cache feature. In our test, the tier one storage was able to provide us a raw IOPS capability of a little over 18K IOPS, which is pretty darn good! However, when using our new RAM Cache with overflow feature, we were able to get more than 70K IOPS when staying in RAM, and we were able to maintain nearly 69K IOPS even when we had a 15 GB workload and only a 10 GB RAM buffer. There are also a few other very interesting things we learned from this test:

  • The old PVS Cache in RAM feature could not push above 14K IOPS.  This is most likely due to the old driver architecture used by this feature.  The new Cache in RAM with disk overflow is actually more than 4 times faster than the old RAM cache!
  • The PVS “Cache on device hard disk” option, which uses the old .vdiskcache type, could only drive about 50% of the IOPS that the actual flash SAN storage could provide. Again, this is due to limitations in the older driver architecture.

It is quite obvious that the new Cache in Device RAM with Hard Disk Overflow option is definitely the best option from a performance perspective, and we encourage everyone to take advantage of it.  However, it is critical that proper testing be done in a test system/environment in order to understand the storage requirements for the write cache with your particular configuration.  Chances are that you will need more disk space, but exactly how much will depend on your particular workload and how large your RAM buffer is (the larger the RAM buffer, the less disk space you will need), so make sure you test it thoroughly before making the switch.

Check out Part 2 of this blog series for more information on various test configurations and sizing considerations.

Another thing you might want to consider is whether your write-cache storage should be thick or thin provisioned.  For information on this topic please refer to my colleague Nick Rintalan’s recent post Clearing the Air (Part 2) – Thick or Thin Provisioned Write Cache.

Finally, I would like to thank my colleagues Chris Straight, Dan Allen, and Nick Rintalan for their input and participation during the gathering of this data.

Happy provisioning!

Migs

Citrix Blog: Turbo Charging your IOPS with the new PVS Cache in RAM with Disk Overflow Feature! – Part Two

My colleague Miguel Contreras and I have done quite a bit of testing with the new PVS Cache in RAM with Hard Disk Overflow feature with some amazing results!  If you haven’t already read it, I would recommend that you check out Part One of this series.

In part two of this series I will walk you through some more detailed analysis and results from testing in our labs as well as testing in real world customer environments. I will also provide you some recommendations on how much memory you should plan to allocate in order to take advantage of this new feature.

The exciting news is that we can reduce the IOPS for both XenApp RDS workloads and Windows VDI workloads to less than 1 IOPS per VM, regardless of the number of users!  When properly leveraged, this feature will eliminate the need to buy fast & expensive SSD storage for XenApp and VDI workloads.  Storage and IOPS requirements have been a major pain point in delivering VDI on a large scale and now with PVS and this new feature, we eliminate IOPS as an issue without buying any SSD storage!

IOPS Overview

Before I jump into the tests and numbers, I think it is important to give a quick overview of IOPS (I/O Operations per Second) and how I present some of the numbers.  Whenever I try to explain IOPS, I always tell people that IOPS calculations are a bit of a voodoo art and calculated using very “fuzzy” math.

For example, a VM might boot in 2 minutes and consume an average of 200 IOPS during boot when placed on a SATA disk. That same VM when placed on an SSD might consume 1,600 IOPS and boot in 15 seconds. So how many IOPS do I need to boot the VM? The reality is that I need about 24,000 TOTAL I/O operations, but the number per second will vary greatly depending upon what the acceptable boot time is for the VM.

Using the previous example, if a 4-minute boot time is acceptable and the VM needs 24,000 I/O operations to boot, then the VM requires access to 100 IOPS during the boot phase. To determine the required minimum number of IOPS, it is important to run more than one VM simultaneously and to use a tool such as LoginVSI to find the point at which a VM no longer provides an acceptable user experience. This is why I used LoginVSI to run multiple concurrent sessions, and why I also provide the total I/O operations used during heavy phases such as boot and logon in addition to the IOPS.
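
To make the arithmetic explicit, here is the same calculation as a tiny sketch; the numbers are the hypothetical ones from the example above.

= = = = = = = = = = = = = = = = = =
# Required IOPS = total I/O operations / acceptable time window
$totalIo     = 24000     # total I/O operations needed to boot the VM
$bootSeconds = 4 * 60    # acceptable boot time of 4 minutes
$totalIo / $bootSeconds  # = 100 IOPS needed during the boot phase
= = = = = = = = = = = = = = = = = =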

If you want some more gory details about IOPS then I recommend you check out the following BriForum 2013 presentation delivered by myself and Nick Rintalan; find it on YouTube here.

Additionally, I would recommend you also check out Jim Moyle’s IOPS white paper he wrote several years ago.  It does a great job of explaining IOPS; get the PDF here.

Since XenApp with RDS workloads and Windows VDI workloads have completely different usage profiles, I have separate results and recommendations for each.  I will start with Windows VDI below.

Windows 7 and XenDesktop VDI

In testing the new PVS feature with VDI workloads, we ran the tests with three different scenarios. For all scenarios we used LoginVSI with a Medium workload as the baseline test. For the first test we used Machine Creation Services (MCS) as the provisioning technology. MCS does not offer any enhanced caching capabilities, so this test would give us the baseline number of IOPS that a typical Windows 7 desktop would consume.

Here are some more details on this baseline test…

Windows 7 VDI Baseline with MCS on Hyper-V 2012 R2

  • Single Hyper-V host with hyper-threaded Quad Core CPU and 32 GB RAM
  • A single dedicated 7200 RPM SATA 3 disk with 64 MB cache was used for hosting the Windows 7 VMs
  • Windows 7 x64 VMs: 2 vCPU with 2.5 GB RAM
  • UPM and Folder Redirection were properly configured and fully optimized such that the profile was less than 10 MB in size. Refer to this blog post or watch the BriForum 2013 presentation delivered by Nick Rintalan and yours truly on YouTube.

Below are the Boot IOPS numbers for the MCS test.

 

# of VMs | Boot Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
1 VM | 2 minutes | 24,921 per VM | 213 | 184 | 29 | 86% / 14%
5 VMs | 6 minutes | 25,272 per VM | 70 | 60 | 10 | 86% / 14%

 

As you can see from the above table, whether booting 1 VM or 5 VMs, approximately the same number of total I/O operations is consumed for each VM.  The IOPS per VM are less when booting multiple VMs because the VMs are sharing the same disk, and the amount of time required to boot each VM also increases proportionally.  For this baseline test I used 5 VMs because that is the approximate number of traditional Windows 7 VMs you can run on a single SATA 3 disk and still get acceptable performance.  Also, my definition of “boot” is not simply how long it takes to get to the Control-Alt-Delete logon screen, but how long it takes for most services to fully startup and for the VM to successfully register with the Citrix Desktop Delivery Controller in addition to displaying the logon screen.

The next baseline MCS test I ran was to determine the IOPS consumed during the logon and initial application start-up phase. This phase includes the logon and initial launch of several applications.

 

# of VMs | Logon Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
1 VM | 25 seconds | 4,390 per VM | 175 | 103 | 72 | 59% / 41%
5 VMs | 90 seconds | 4,249 per VM | 48 | 23 | 25 | 48% / 52%

 

Just like the boot phase, the total I/O operations generated are pretty much the same whether logging on 1 user or 5 users. The overall logon duration was a little longer because the 5 session launches were spread out over 60 seconds. Each individual logon session averaged 30 seconds to complete the logon and launch the first applications.

The final baseline MCS test run was to determine the steady state IOPS generated during the LoginVSI Medium workload.

 

# of VMs | Session Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
1 VM | 45 minutes | 22,713 per VM | 8.5 | 3 | 5.5 | 35% / 65%
5 VMs | 45 minutes | 20,009 per VM | 7.5 | 2 | 5.5 | 27% / 73%

 

The steady state IOPS number is the one that is typically of most interest and what is used in most sizing equations when dealing with a large number of VMs. It is not that we do not care about IOPS during the boot or logon phases; however, these phases typically represent only a very small percentage of the overall load generated throughout the day during production time. If properly designed, well over 80% of all VDI boot phases should occur during non-production time, such as overnight or very early in the morning. Additionally, as long as we have properly designed and implemented our profile strategy, we can keep the impact of logon IOPS confined to a short period of time and a manageable amount.
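
To give a rough idea of what such a sizing equation looks like in practice, here is a sketch using the steady-state baseline above (7.5 IOPS per VM at a 27% read / 73% write ratio). The VM count and the RAID 10 write penalty of 2 are assumptions for illustration, not figures from these tests.

= = = = = = = = = = = = = = = = = =
# Steady-state IOPS estimate for a hypothetical 500-VM deployment
$vmCount       = 500          # assumed deployment size
$iopsPerVm     = 7.5          # steady-state baseline from the MCS table above
$writeFraction = 0.73         # 27% read / 73% write
$frontEnd      = $vmCount * $iopsPerVm
# Back-end IOPS with an assumed RAID 10 write penalty of 2
$backEnd       = ($frontEnd * (1 - $writeFraction)) + ($frontEnd * $writeFraction * 2)
"{0} front-end IOPS, roughly {1} back-end IOPS" -f $frontEnd, [math]::Round($backEnd)
= = = = = = = = = = = = = = = = = =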

Now that we have the baseline number of IOPS that a standard Windows 7 desktop will consume, let’s see how PVS 7.1 with the new caching feature performs.

Windows 7 – PVS 7.1 RAM Cache with 256 MB on Hyper-V 2012 R2

This test was configured just like the MCS baseline test and run on the same hardware.

  • Single Hyper-V host with hyper-threaded Quad Core CPU and 32 GB RAM
  • A single dedicated 7200 RPM SATA 3 disk with 64 MB cache was used for hosting the write cache disk for the Windows 7 VMs
  • Windows 7 x64 VMs: 2 vCPU with 2.5 GB RAM
  • PVS 7.1 Standard Image with RAM Cache set at 256 MB (PVS on separate host)
  • Windows Event Logs were redirected directly to the write cache disk so that they persist and their I/O would not be cached in RAM
  • The profile was fully optimized with UPM and Folder Redirection (profile share on separate host)

For this test I went straight to booting 11 VMs, which was the maximum number of VMs I could run on my host with 32 GB RAM.

 

# of VMs | Boot Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
11 VMs | 3.5 minutes | 536 per VM | 2.5 | 1.4 | 1.1 | 56% / 44%

 

These results are amazing!  With only 256 MB of RAM used per VM for caching, we are able to reduce the dreaded boot storm IOPS to only 2.5 sustained IOPS!  We have always known that PVS essentially eliminates the read IOPS due to the PVS server caching all the read blocks in RAM, but now we also have the PVS target driver eliminating most of the write IOPS as well!

Now let’s see what happens during logon phase.

 

# of VMs | Logon Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
11 VMs | 3 minutes | 101 per VM | 0.56 | 0.05 | 0.51 | 9% / 91%

 

I launched the 11 sessions at 15-second intervals. It took only 15 seconds for each session to fully log on and launch the initial set of applications. As a result, the total duration for which I tracked the logon IOPS for the 11 VMs lasted 180 seconds. During this time, we generated less than one IOPS per VM!

Now let’s see what happened during the steady state.

 

# of VMs | Session Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
11 VMs | 45 minutes | 290 per VM | 0.1 | 0.001 | 0.1 | 1% / 99%

 

Yes, you are reading it correctly. We generated one tenth of an I/O operation per second per VM. Total IOPS generated by all 11 VMs was only 1.1! The read IOPS were so low that they could actually be considered zero. With 11 VMs actively running for 45 consecutive minutes, we generated a total of 1 read I/O per minute.

I know that some of you are probably thinking that a fully optimized profile solution where the profile is only 10MB in size and everything is redirected might be hard to implement. Sometimes customers get stuck keeping the AppData folder in the profile instead of redirecting it which will significantly increase logon load and could also overrun the RAM cache. For this reason, I reran the tests with a bloated and non-redirected AppData folder to see how it would impact the results.  I stopped redirecting the AppData folder and bloated the AppData folder in the UPM share for each user to be over 260 MB in size containing 6,390 files and 212 subfolders.  Additionally, I disabled UPM profile streaming so that the logon would have to fully wait for the profile to entirely download and that 100% of it would download at logon.  Since the RAM cache is limited to 256 MB and the users’ profiles are now over 270MB in size, this would guarantee that we would overrun the RAM cache before the logon completed and the LoginVSI tests began.

Since my VMs were running on Hyper-V with the Legacy Network Adapter (100 Mb limit), the logons definitely increased in duration with the bloated profile. It took approximately 3 minutes and 30 seconds per user to complete the logon process and launch the first applications. For this reason, I staggered the sessions to log on at a rate of 1 per minute.

Here are the logon and steady state results for this test with the bloated profile.

 

# of VMs | Logon Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
11 VMs | 16 minutes | 4,390 per VM | 4.56 | 0.22 | 4.34 | 5% / 95%

 

Even with a bloated profile, we generated less than 5 IOPS per user during the logon phase. Below are the steady state results.

 

# of VMs | Session Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
11 VMs | 45 minutes | 2,301 per VM | 0.85 | 0.02 | 0.83 | 2% / 98%

 

The steady state numbers with the bloated profile were also higher than with the optimized profile; however, we still maintained less than 1 IOPS per VM!

Now let’s see what happens when we run a similar test using a much larger server with VMware as opposed to my meager lab server running Hyper-V.

Windows 7 – PVS 7.1 RAM Cache with 512 MB on VMware vSphere 5.5

Here are some details on the server and environment.

  • 48 Core AMD Server with 512 GB RAM
  • NFS LUN connected to SAN
  • vSphere 5.5
  • 150 Windows 7 x64 VMs on host: 2 vCPU with 3 GB RAM
  • McAfee Anti-virus running within each Windows 7 guest VM
  • Write Cache disk for each VM on NFS LUN
  • PVS 7.1 Standard Image with RAM Cache set at 512 MB (PVS on separate host)
  • Profiles fully optimized with UPM and Folder Redirection

I simultaneously initiated a boot of all 150 Windows 7 VMs on the host. For those of you who have tried booting a lot of VMs simultaneously on a single host, you know that this typically crushes the host and is not something we would normally do. However, I was feeling brave, so I went for it! Amazingly, all 150 VMs were fully booted and registered with the XenDesktop Delivery Controller in just under 8 minutes! It took 8 minutes to boot because the CPUs on the host were pegged.

Here are the IOPS results during the 150 VM boot phase.

 

# of VMs | Boot Duration | Total IO Operations | IOPS per VM | Read IOPS per VM | Write IOPS per VM | Read/Write Ratio
150 VMs | 8 minutes | 655 per VM | 1.36 | 0.5 | 0.86 | 37% / 63%

 

For the Logon test, I configured LoginVSI to launch a session every 12 seconds.  At that rate it took just over 30 minutes to logon all of the sessions. Here are the logon results.

 

# of VMs | Logon Duration | Total IO Operations | IOPS per VM | Read IOPS | Write IOPS | Read/Write Ratio
150 VMs | 32 minutes | 1,144 per VM | 0.59 | 0.01 | 0.58 | 2% / 98%

 

Now let’s look at the steady state.

 

# of VMs | Session Duration | Total IO Operations | IOPS per VM | Read IOPS | Write IOPS | Read/Write Ratio
150 VMs | 30 minutes | 972 per VM | 0.535 | 0.003 | 0.532 | 1% / 99%

 

As you can see from both the logon and steady state phases above, we are able to keep the total IOPS per Windows 7 VM to less than one!

The results with the new PVS 7.1 RAM Cache with Hard Disk Overflow feature are simply amazing. Even with a RAM cache of only 256 MB, we are able to significantly reduce IOPS, even in situations where users have large, bloated profiles.

So, what kind of benefits does this new feature provide for XenApp workloads?  I ran several tests on both Hyper-V and vSphere as well, so let’s see how it worked.

XenApp 6.5 on Windows 2008 R2

For XenApp I ran tests very similar to the ones I ran for VDI. I based all the tests on the LoginVSI Medium workload, and I used the same user accounts with the same profile settings. For the XenApp tests I only tested the fully optimized profile scenario. We typically have fewer VMs with larger memory configurations on our hypervisor hosts with XenApp, so I tested the PVS RAM cache feature with 1 GB, 3 GB, and 12 GB of RAM allocated for caching. The results are below.

XenApp 6.5 2008 R2 hosted on Hyper-V 2012 R2

I used the same Hyper-V host in my personal lab for testing XenApp that I used for the VDI tests.  My host was configured as follows.

  • Hyper threaded Quad Core CPU with 32 GB RAM
  • Hyper-V 2012 R2
  • A single dedicated 7200 RPM SATA 3 disk with 64 MB cache was used for hosting the write cache disk for the XenApp VMs.
  • 2 Windows 2008 R2 XenApp 6.5 VMs configured as:
    • 4 vCPU with 14 GB RAM
    • 60 launched LoginVSI Medium sessions (30 per VM)

Test#1 PVS XenApp Target with 1 GB RAM Cache

For this test I configured LoginVSI to launch 1 session every 30 seconds.  The logon duration lasted 30 minutes.

 

# of Users | Logon Duration | Total IO Operations | IOPS | Read IOPS | Write IOPS | Read/Write Ratio
60 Users | 30 minutes | 19,687 | 10.9 | 0.1 | 10.8 | 1% / 99%

 

The average IOPS during the logon phase was less than 11 IOPS total for 60 users! That was a little over 5 IOPS per XenApp VM during the peak logon period.

Here are the steady state values.

 

# of Users | Session Duration | Total IO Operations | IOPS | Read IOPS | Write IOPS | Read/Write Ratio
60 Users | 45 minutes | 16,411 | 6 | 3.7 | 2.3 | 61% / 39%

 

Our average IOPS during the 45-minute steady state was 6, which works out to 3 IOPS per XenApp VM and only 0.1 IOPS per user.

These are very impressive results for only using 1 GB of Cache on a VM that has 14 GB of RAM.  At the peak point in the test where each XenApp VM had 30 active LoginVSI Medium sessions running, the VM had only committed a little over 10 GB RAM.  So I decided to increase the amount of RAM Cache to 3 GB and rerun the test.

Test#2 PVS XenApp Target with 3 GB RAM Cache

I ran the 3 GB RAM test with the same settings as the previous test with the exception of having LoginVSI launch a session every 20 seconds instead of every 30 seconds.

 

# of Users | Logon Duration | Total IO Operations | IOPS | Read IOPS | Write IOPS | Read/Write Ratio
60 Users | 20 minutes | 1,673 | 1.4 | 0.1 | 1.3 | 7% / 93%

 

Here are the steady state results.

 

# of Users | Session Duration | Total IO Operations | IOPS | Read IOPS | Write IOPS | Read/Write Ratio
60 Users | 45 minutes | 7,947 | 2.95 | 0.05 | 2.9 | 2% / 98%

 

The IOPS results with only 1 GB of cache per XenApp VM were quite impressive; however, by increasing the cache to 3 GB we were able to reduce the IOPS even more. We were able to get the average IOPS for an entire XenApp VM hosting 30 users to less than 2 total IOPS.

Now that we see the results from the small Hyper-V server in my personal lab, what happens when we run a larger XenApp workload on a real production quality server?

XenApp 6.5 2008 R2 hosted on VMware vSphere 5.5

  • 48 Core AMD Server with 512 GB RAM
  • NFS LUN connected to SAN
  • vSphere 5.5
  • 10 Windows 2008 R2 XenApp 6.5 VMs on host configured as:
    • 6 vCPU with 48 GB RAM
    • Write Cache disk for each VM on NFS LUN
    • PVS 7.1 Standard Image with RAM Cache set at 12 GB (PVS on separate host)
    • Profiles fully optimized with UPM and Folder Redirection

For this test I had 10 XenApp VMs on the host and each VM had 48 GB RAM with the PVS RAM Cache configured to allow up to 12 GB for caching.  I launched 300 LoginVSI Medium workload sessions so that each VM hosted 30 sessions.  I configured LoginVSI to launch 1 user every 8 seconds so the total login time took 40 minutes.

Here are the results for the logon phase.

 

# of Users | Logon Duration | Total IO Operations | IOPS | Read IOPS | Write IOPS | Read/Write Ratio
300 Users | 43 minutes | 18,603 | 7.2 | 0.1 | 7.1 | 1% / 99%

 

Here are the results for the steady state.

 

# of Users | Session Duration | Total IO Operations | IOPS | Read IOPS | Write IOPS | Read/Write Ratio
300 Users | 45 minutes | 16,945 | 6.27 | 0.02 | 6.25 | 1% / 99%

 

As you can see from the numbers above, the logon and steady state phases are nearly identical, with approximately 7 IOPS being the total amount generated for 300 users across 10 XenApp VMs. That is less than 1 IOPS per XenApp VM. It is obvious from these results that 100% of the I/O remained in cache, as the PVS RAM Cache never overflowed to disk.

Summary and Recommendations

As you can see from the results, the new PVS RAM Cache with Hard Disk Overflow feature is a major game changer when it comes to delivering extreme performance while eliminating the need to buy expensive SAN I/O for both XenApp and pooled VDI desktops delivered with XenDesktop. One of the reasons this feature gives such a performance boost even with modest amounts of RAM is that it changes the profile of how I/O is written to disk. A XenApp or VDI workload traditionally sends mostly 4 KB random write I/O to the disk. This is the hardest I/O for a disk to service and is why VDI has been such a burden on the SAN. With this new cache feature, all I/O is first written to memory, which is a major performance boost. When the cache memory is full and overflows to disk, it will flush to a VHDX file on the disk. We flush the data using 2 MB page sizes. VHDX with 2 MB page sizes gives us a huge I/O benefit because instead of 4 KB random writes, we are now asking the disk to do 2 MB sequential writes. This is significantly more efficient and allows data to be flushed to disk with fewer IOPS.
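
To put that change of I/O profile into numbers, using the block sizes from the text: flushing 2 MB of dirty cache as 4 KB random writes would take 512 separate I/Os, whereas the new cache flushes it as a single 2 MB sequential write to the VHDX.

= = = = = = = = = = = = = = = = = =
# I/O count needed to flush 2 MB of dirty cache data (block sizes taken from the text)
$flushBytes = 2MB
$smallBlock = 4KB
$flushBytes / $smallBlock   # 512 x 4 KB random writes, replaced by one 2 MB sequential write
= = = = = = = = = = = = = = = = = =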

With the exception of the 150 Windows 7 VM test above, there was no virus protection or other security software running on my VMs.  Additionally, there was no 3rd party monitoring software.  3rd party monitoring, virus protection and other software packages are sometimes configured to write directly to the D: drive or whatever letter is assigned to the PVS write cache disk.  This is so that the I/O is not lost between reboots.  If you configure such software in your environment, you need to calculate the additional IOPS that are generated and factor it into your planning.

Here are the key takeaways.

  • This solution can reduce your IOPS required per user to less than 1!!!
  • You no longer need to purchase or even consider purchasing expensive flash or SSD storage for VDI. This is a HUGE cost savings!  VDI can now safely run on cheap tier 3 SATA storage!
  • Every customer using PVS should upgrade and start using this new feature ASAP no matter how much RAM you have. This feature is actually required to fix issues with ASLR.
  • This feature uses non-paged pool memory and will only use it if it is free. If your OS runs low on RAM, it will simply not allocate any more RAM for caching. Also, a great new feature of this cache type is that it gives back memory and disk space as files/blocks in cache are deleted. In previous versions of PVS, once a block was committed to RAM or disk, it was never freed. This is no longer the case! If you size properly, the risk of running out of RAM is very low. (A quick way to keep an eye on non-paged pool usage is sketched after this list.)
  • For VDI workloads even a small amount of RAM can make a HUGE difference in performance.  I would recommend configuring at least 256 MB of cache for VDI workloads.
    • For Windows 7 32-bit VMs you should allocate 256 MB RAM for caching.
    • For Windows 7 x64 VMs with 3 – 4 GB RAM, I would recommend allocating 512 MB RAM for caching.
    • For Windows 7 x64 VMs with more than 4 GB, feel free to allocate more than 512 MB.
  • For XenApp workloads I would recommend allocating at least 2 GB for caching.  In most configurations today, you probably have much more RAM available on XenApp workloads, so for maximum performance I would allocate even more than 2 GB RAM if possible.  For example, if you have 16 GB RAM in your XenApp VM, you should safely be able to allocate at least 4 GB for the PVS RAM Cache. Of course, you should always test first!
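
As a simple way to keep an eye on the non-paged pool that the RAM cache draws from, the standard memory counter can be sampled on a target device. The sampling interval and sample count below are arbitrary choices for illustration.

= = = = = = = = = = = = = = = = = =
# Sample non-paged pool usage (in MB) on a PVS target while the RAM cache fills
Get-Counter -Counter '\Memory\Pool Nonpaged Bytes' -SampleInterval 5 -MaxSamples 5 |
    ForEach-Object { [math]::Round($_.CounterSamples[0].CookedValue / 1MB, 0) }
= = = = = = = = = = = = = = = = = =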

I wish you luck as you implement this amazing new PVS feature and significantly reduce your IOPS while boosting the user experience!