Size Matters: PVS RAM Cache Overflow Sizing

The Citrix Blog

By Amit Ben-Chanoch · Published January 19, 2015 · 28 Comments
Tags:  CD-XenAppXenDesktop · CitrixDeveloper · provisioning_services · Ram Cache with Overflow · Sizing · storage · testing
Products:  XenApp · XenDesktop
Teams:  Consulting
RAM cache with overflow to hard disk (introduced in PVS 7.1, hotfix required) can almost eliminate IOPS and has therefore become the new leading practice. However, this feature can have a significant impact on write cache sizing. This stems from the difference in how the write cache storage is allocated with this new option. To understand this feature and how to properly design your solution with it, Citrix Worldwide Consulting organized a series of tests, the results of which are shared in this post. I would also like to recognize my co-authors Martin Zugec, Jeff PinterParsons, and Brittany Kemp as this wasn’t a one-man job 🙂.
Before we begin, let me note the scope of the article. We will not be covering any IOPS information as that has already been covered very well in previous blogs. If you have not already, we recommend you read the blogs by Miguel Contreras and Dan Allen:

Turbo Charging your IOPS with the new PVS Cache in RAM with Disk Overflow Feature! – Part One
Turbo Charging your IOPS with the new PVS Cache in RAM with Disk Overflow Feature! – Part Two.

What is covered, however, is why sizing the write cache may change, what to do about it, and a deep dive into how it works. This was all tested in the lab, and we would like to thank the Citrix Solutions Lab team for providing the hardware and support that helped make this possible. The test environment and methodology are summarized at the end of the blog.
Lessons Learned
This testing reinforced some previous recommendations and produced new ones based on the additional findings. Based on the results, Citrix Worldwide Consulting proposes the following recommendations for the new RAM cache with overflow to hard disk option.
If transitioning an environment from Cache on HDD to Cache in RAM with overflow to disk and the RAM buffer is not increased from the 64MB default setting, allocate twice as much space to the write cache as a rule of thumb. Remember this storage does not need to be on a SAN, but can be cheaper local storage.
The size of the RAM buffer greatly impacts the size of the write cache as any data that can be stored in the RAM buffer will not consume any space on the write cache. Therefore it is recommended to allocate more than the default 64MB RAM buffer. Not only will an increased RAM buffer reduce the IOPS requirements (see blogs mentioned above), but the write cache size requirements will be reduced as well. A larger RAM buffer may alleviate the larger write cache requirement for environments that do not have storage capacity. With enough RAM you can even eliminate the need for ever writing to the storage. For desktop operating systems start with 256-512MB and for server operating systems start with 2-4GB.
Defragment the vDisk before deploying the image and after major changes. Defragmenting the vDisk resulted in write cache savings of over 30% in some tests. This affects anyone who uses versioning, as defragmenting a versioned vDisk is not recommended: it creates excessively large versioned disks (.AVHD files). Run defragmentation after merging the vDisk versions.
Note: Defragment the vDisk by mounting the .VHD on the PVS server and running a manual defragmentation on it. This allows for a more robust defragmentation as the OS is not loaded. An additional 15% reduction in the write cache size was seen with this approach over standard defragmentation.
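The starting points above can be written down as a small helper. This is only a sketch and not a Citrix tool: the function names are hypothetical, the numbers are the starting ranges and the "twice as much" rule of thumb quoted in this post, and the behavior for a larger RAM buffer (keeping the legacy size) is an assumption, since the post gives no multiplier for that case.

```python
DEFAULT_RAM_BUFFER_MB = 64

# Starting ranges quoted in this post, in MB. Tune them per workload.
STARTING_RAM_BUFFER_MB = {
    "desktop": (256, 512),
    "server": (2048, 4096),
}


def starting_ram_buffer_mb(os_type):
    """Return the (low, high) starting RAM buffer range in MB for an OS type."""
    try:
        return STARTING_RAM_BUFFER_MB[os_type]
    except KeyError:
        raise ValueError(f"unknown os_type: {os_type!r}") from None


def write_cache_starting_size_gb(legacy_write_cache_gb,
                                 ram_buffer_mb=DEFAULT_RAM_BUFFER_MB):
    """Rule-of-thumb write cache disk size when moving from cache on HDD.

    With the default 64MB buffer, the post suggests allocating twice the
    legacy size. For a larger buffer this sketch simply keeps the legacy
    size (an assumption); measure your own workload to refine it.
    """
    if ram_buffer_mb <= DEFAULT_RAM_BUFFER_MB:
        return legacy_write_cache_gb * 2
    return legacy_write_cache_gb


print(starting_ram_buffer_mb("desktop"))
print(write_cache_starting_size_gb(10))
```

Treat the output as a first allocation to validate in a pilot, not as a guaranteed upper bound.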
The rest of this post is a deep dive into how the RAM Cache Overflow feature works, along with the test results that back up the recommendations above. The test methodology and environment are also detailed, which can help if you plan to perform additional testing in your own environment.
RAM Cache Write Cache Sizing vs Cache in HDD
As mentioned above, the write cache with the new RAM cache overflow option can grow much larger and much faster than the previously recommended cache on device HDD option if not designed properly. Where the old option writes the data to the write cache in 4KB clusters, the new option now reserves 2MB blocks on the write cache which are composed of 4KB clusters. 
To illustrate the difference, reference the visual below. Note that although in the write cache a 2MB block is composed of 512 clusters (4KB * 512 = 2MB), for the sake of simplicity let’s assume each block consists of 8 clusters.

When the target device boots up, the OS is not aware of the write cache and writes to the logical disk that it is presented (the vDisk). By default, the Windows OS sets the file system to use 4KB clusters and therefore writes data into 4KB clusters.
The PVS driver then redirects this data at the block level to the write cache. Assume we start with an empty area of the vDisk and write multiple files that add up to just over 2MB of data. The OS writes the files onto consecutive clusters on the vDisk, which are redirected to the write cache. They completely utilize one 2MB block, but also reserve an additional 2MB block which is underutilized.
Since the OS sees the remaining clusters as empty, they can still be written to, so the reserved block is utilized further if the target device writes additional data.
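The arithmetic behind this example can be sketched in a few lines. The block and cluster sizes are the ones given in this post. The function name and the assumption that the write starts exactly on a block boundary are ours, for illustration only.

```python
CLUSTER_BYTES = 4 * 1024                 # default NTFS cluster size used in the post
BLOCK_BYTES = 2 * 1024 * 1024            # unit reserved in the write cache
CLUSTERS_PER_BLOCK = BLOCK_BYTES // CLUSTER_BYTES


def blocks_reserved(start_byte, length_bytes):
    """Count the 2MB blocks touched by one contiguous write on the vDisk."""
    if length_bytes <= 0:
        return 0
    first = start_byte // BLOCK_BYTES
    last = (start_byte + length_bytes - 1) // BLOCK_BYTES
    return last - first + 1


# "Just over 2MB": one full block plus a single extra cluster.
written = BLOCK_BYTES + CLUSTER_BYTES
reserved = blocks_reserved(0, written) * BLOCK_BYTES

print(CLUSTERS_PER_BLOCK)                 # 512
print(reserved // (1024 * 1024), "MB reserved")
print(f"utilization: {written / reserved:.1%}")
```

In this case two blocks (4MB) are reserved for slightly more than 2MB of data, so roughly half of the reserved space is still free for later writes to the same region.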

To test this behavior we ran a series of tests on both the RAM cache with overflow option as well as the cache on device HDD for comparison.
Below is a comparison of an office worker test run on a Windows 7 target device. We can see from the data the following:
The RAM Cache option uses significantly more space on the write cache drive than the cache on device HDD. The reason for this is explained in the next section.
The RAM Cache option is heavily impacted by the size of the RAM buffer. This is expected since much of the data does not get flushed to the disk as it remains in memory.

Running a similar test on Server 2012 R2 with 40 users, the same results are seen. Again the RAM Cache Overflow option with the default RAM buffer requires considerably more space than the Cache in HDD option, and the requirement is again greatly reduced by increasing the RAM buffer.

To further investigate the effects, we decided to run specific tests to characterize the behavior of the write cache. We created 1,000 files on the target device each 1MB in size. This should translate into 1000MB of data being written to the write cache. As can be seen below, this was the case for the cache in HDD option, but using the RAM Cache Overflow option the result was doubled. We see a slight savings with an increased RAM buffer, but the savings is simply the additional files that have been cached in memory. Keep reading if you would like to know why we see this behavior and how to minimize the effect.

Effects of Fragmentation on Write Cache Size
The culprit behind the excessive write cache size above is fragmentation. It arises because PVS redirection is block based and not file based: the redirection to the write cache follows the logical block-level structure of the vDisk. To illustrate this, consider the scenario below.
Note again that for illustrative purposes we are using 8 clusters in a block while on the disk each block is composed of 512 clusters.

Looking at a section of the vDisk that has existing data on it, consider what happens when data is fragmented across multiple clusters. The OS will write data to the free clusters that are available to it. As it does this, the PVS driver will redirect this data to the write cache and create 2MB blocks to store it. However, the clusters in the 2MB blocks on the write cache correspond to the clusters that the OS sees and therefore only a few clusters within each block are used and the remainder is unutilized.
As is shown below, six 1KB (6KB) files can actually result in a 6MB increase to the write cache! Consider segments such as this all over the vDisk and the result is a far larger write cache drive.
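The effect can be modeled by counting how many distinct 2MB blocks a set of written clusters falls into. This is a simplified sketch, not the PVS driver's logic. The cluster positions are made up, and the layout with six files landing in three different blocks is an assumption chosen to match the 6MB figure above, using the simplified 8 clusters per block from the illustration.

```python
def write_cache_growth_mb(cluster_indices, clusters_per_block=512, block_mb=2):
    """Write cache space reserved (MB) for writes to the given vDisk clusters.

    Each cluster maps to the block containing it, and every touched block
    costs a full 2MB regardless of how many of its clusters hold data.
    """
    touched_blocks = {c // clusters_per_block for c in cluster_indices}
    return len(touched_blocks) * block_mb


# Six small files, one cluster each, scattered over free clusters in three
# blocks (hypothetical positions, 8 clusters per block as in the illustration).
fragmented = [1, 5, 10, 14, 19, 22]
print(write_cache_growth_mb(fragmented, clusters_per_block=8))   # 6

# The same six files written to consecutive free clusters.
contiguous = [0, 1, 2, 3, 4, 5]
print(write_cache_growth_mb(contiguous, clusters_per_block=8))   # 2
```

The data volume is identical in both cases. Only the placement of the free clusters changes how much write cache is reserved.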

The results shown in the previous section are of vDisks that were not defragmented prior to testing. Once the vDisk is defragmented, the areas of the disk should be arranged more like the visual below. In this case new data that is written will utilize the space much more efficiently resulting in a smaller write cache size.

Let’s look at the tests again, this time after defragmenting the vDisk. We can immediately see that although defragmentation does affect the cache on HDD size, it has a much greater impact when using the RAM cache with overflow option. Now although the write cache is reduced, it is not completely minimized as certain files may not be defragmented. The test on Windows 7 was still significantly higher when using the default RAM buffer, but is actually lower when using a 256MB RAM buffer.
In the case of Server 2012 R2 which had a high 16% fragmentation, the write cache size drops by over 30% after defragmentation. Once a larger RAM buffer is added the write cache drops significantly lower than the cache in HDD option.

To illustrate how the disk may not be completely defragmented, refer to the following test run. In this test we created 25,000 x 1KB files, a little extreme but done to help prove a point. As can be seen below, the first several thousand files quickly grow the write cache as they are written by the OS to partially fragmented portions of the disk. Once those portions of the disk are occupied, the remaining files are written consecutively and the write cache grows at a much slower pace.
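The shape of that growth curve can be reproduced with a toy model. Everything here is an assumption for illustration: the number of leftover holes (one free cluster in each of 1,000 blocks), the idea that the OS fills those holes before moving on to contiguous free space, and one 4KB cluster per 1KB file. The numbers are not the measured test results.

```python
def simulate_growth(n_files, hole_clusters, contiguous_start,
                    clusters_per_block=512, block_mb=2):
    """Return the cumulative write cache size in MB after each file written.

    Files go to the scattered free clusters in hole_clusters first, then to
    consecutive clusters beginning at contiguous_start.
    """
    touched_blocks = set()
    sizes = []
    holes = iter(hole_clusters)
    next_contiguous = contiguous_start
    for _ in range(n_files):
        cluster = next(holes, None)
        if cluster is None:
            cluster = next_contiguous
            next_contiguous += 1
        touched_blocks.add(cluster // clusters_per_block)
        sizes.append(len(touched_blocks) * block_mb)
    return sizes


# Hypothetical layout: one free cluster left in each of the first 1,000
# blocks, followed by a large contiguous free region.
holes = range(0, 1000 * 512, 512)
curve = simulate_growth(25_000, holes, contiguous_start=1000 * 512)

print(curve[999])    # size once the fragmented region is filled
print(curve[-1])     # size after all 25,000 files
```

In this model the first 1,000 files each reserve a new block, while the remaining 24,000 share a few dozen blocks, which is the fast-then-slow pattern described above.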

Final Caveat
The final scenario to consider which can impact the size of the write cache with the RAM Cache Overflow option is when pre-existing data on the vDisk is modified, for example a system file that was already on the vDisk. Should this file be modified, the change will be redirected to the write cache and the corresponding clusters modified there. This again can result in underutilized space as 2MB blocks are created on the write cache for changes that are far smaller, as can be seen below.
In this case defragmentation cannot mitigate the issue and the potential for a larger write cache still exists.

Test Methodology
The test methodology included two rounds of testing to compare the write cache size. The first round used an automated office worker workload on both Windows 7 and Server 2012 R2. Each test was performed 3 times for repeatability as follows:
The vDisk write cache option was set for each scenario as described under test configurations below.
A single target machine was booted up running PVS target software 7.1.3.
A startup script was launched to log the size of the write cache to a file on a file share
vdiskdif.vhdx was monitored for RAM cache with overflow option
.vdiskcache was monitored for cache on device HDD option
A typical office worker workload was simulated.
A single user was simulated on Windows 7
40 users were simulated on Server 2012 R2
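A logging script along the lines of the one described in these steps might look like the sketch below. It is not the script used in the tests: the function names, the CSV layout, and the sampling interval are assumptions, and the actual paths to the write cache file and the file share depend on your environment.

```python
import os
import time
from datetime import datetime


def log_file_size(cache_path, log_path, now=None):
    """Append one 'timestamp,size_in_bytes' line to the log and return the size."""
    size = os.path.getsize(cache_path)
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp},{size}\n")
    return size


def monitor(cache_path, log_path, interval_s=60, samples=None):
    """Sample the write cache file size every interval_s seconds.

    Runs until interrupted when samples is None, otherwise takes exactly
    that many samples.
    """
    taken = 0
    while samples is None or taken < samples:
        log_file_size(cache_path, log_path)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

For the RAM cache with overflow option, you would point `cache_path` at the `vdiskdif.vhdx` file on the write cache drive, and for cache on device HDD at `.vdiskcache`. A call such as `monitor(r"D:\vdiskdif.vhdx", r"\\fileserver\logs\target01.csv")` uses a hypothetical drive letter and share name.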
The second round of testing was performed to demonstrate the effect of small files on the write cache growth and was performed only on Windows 7. This was done by generating files of different sizes and comparing the write cache size. The file creation scenarios are outlined under test configuration below. Each test was performed 3 times for repeatability as follows:
The vDisk write cache option was set for each scenario as described under test configurations below.
A single target machine was booted up running PVS target software 7.1.3.
A startup script was launched to log the size of the write cache to a file on a file share
vdiskdif.vhdx was monitored for RAM cache with overflow option
.vdiskcache was monitored for cache on device HDD option
A second startup script was launched which generated data by creating an equal number of files of a specific size. There was no user login on the machine during the duration of the test.
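The file generation step can be approximated with a short script. Again, this is a sketch under assumptions rather than the script used in the tests: the function name, file naming scheme, target directory, and the use of random bytes as content are ours.

```python
import os


def generate_files(target_dir, count, size_bytes, prefix="testfile"):
    """Create `count` files of `size_bytes` each and return total bytes written."""
    os.makedirs(target_dir, exist_ok=True)
    payload = os.urandom(size_bytes)   # same random content reused for every file
    total = 0
    for i in range(count):
        path = os.path.join(target_dir, f"{prefix}_{i:06d}.bin")
        with open(path, "wb") as handle:
            handle.write(payload)
        total += size_bytes
    return total
```

The two scenarios discussed in this post would correspond to `generate_files(target, 1000, 1024 * 1024)` for 1,000 x 1MB files and `generate_files(target, 25000, 1024)` for 25,000 x 1KB files, where `target` is a directory on the target device's system drive (a hypothetical location such as `C:\testdata`).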
Test Environment
The environment used for testing consisted of HP Gen7 (Infrastructure) and Gen8 (Targets) physical blade servers with excess capacity for the purposes of the tests. The components of interest are summarized below as well as a visual summary of the test environment. The entire environment was virtualized on Hyper-V 2012 R2. All PVS components were run using PVS Software 7.1.3.1 (Hotfix CPVS71003).
2 x Provisioning Services Servers on Windows 2012 R2
4 vCPU and 16GB vRAM
1 XenApp 7.5 Target Device on Windows 2012 R2
8 vCPU and 44GB of vRAM
50GB Write Cache Drive on EMC SAN
1 XenDesktop 7.5 Target Device on Windows 7 SP1 (64-bit)
2 vCPU and 3GB of vRAM
15GB Write Cache Drive on EMC SAN
Optimized as per CTX127050
The environment used a 10Gbps core switch with 3Gbps allocated to PVS streaming

Test Configurations
There were three write cache configurations that were tested for both XenApp 7.5 and XenDesktop 7.5 to provide baseline and comparison write cache size data.
1.    Cache on device hard disk: This provided a baseline disk size for the well-known legacy write cache option. It was used for comparison against caching on the target device RAM with overflow to hard disk.
2.    Cache on device RAM with overflow to hard disk (default 64MB): This test was performed to evaluate the growth of the write cache using the default RAM buffer.
3.    Cache on device RAM with overflow to hard disk (optimized): This test was performed to evaluate the write cache disk growth when using a larger RAM buffer as is the current recommendation. RAM buffers of 256MB for Desktop OS and 2048MB for Server OS were tested. Note that with a large enough RAM buffer the write cache will never write to disk.
After additional investigation, several tests were repeated with a freshly defragmented disk to investigate the effects. The two vDisks the tests were run on were the following:
The first set of runs on Windows 7 was tested on a versioned vDisk. The vDisk had a single version which was used to apply the optimization as best practice. On Server 2012 R2 the first set of runs was on a base vDisk which was not defragmented and had 16% fragmentation.
The second set of runs used the same vDisk once it had been merged and defragmented. The fragmentation on the vDisk after merging was 1%.

Thanks for reading my post,
Amit Ben-Chanoch
Worldwide Consulting
Desktop & Apps Team
Project Accelerator
Virtual Desktop Handbook

28 Comments
Lukas Pravda
on 1 week ago · Reply
Good article, thanks for it.
Anyways, the fragmentation finding sounds most of all like a reason to implement some fast compression for RAM cache.
Martin Zugec
on 1 week ago · Reply
FancyCache strikes again? 🙂
Lukas Pravda
on 6 days ago · Reply
Not really. FC never reached relevancy for resource economy to be a question. However since your research indicates a potential saving coming from more efficient space usage, the compression of empty space is the first thought. 🙂 Cheers! 🙂
Trentent
on 1 week ago · Reply
Great article!
Rachel Berry
on 1 week ago · Reply
Great job! Exactly the kind of technical blog we need more of! Awesome work!
Sacha
on 1 week ago · Reply
Great article! Thanks! Im already a big fan of PVS with cache to ram with overflow – this article give me a deeper and better understanding about whats happen under de hood!
Tom Link
on 1 week ago · Reply
Very great thanks for the work
Andrew Wood
on 1 week ago · Reply
An interesting read – good to see a return of technical blogs.
As I understood it, even with the RAM+HDD the pagefile is *always* written to disk – it is never held RAM If that is true, how can it be that the statement ” with a large enough RAM buffer the write cache will never write to disk” – is true? Is it that the pagefile is not always written to disk?
Carl Fallis
on 1 week ago · Reply
That is true the pagefile is always handled directly by the operating system which is more efficient than going through our cache. if you properly size the RAM (I am talking about OS and application not ram cache) for your workload then the use of a pagefile to extend virtual memory requirements will be minimum. There are also a technical reason that the pagefile is not placed in our cache, you really do not want to be in the IO path when a Page Fault occurs.
Martin Zugec
on 1 week ago · Reply
Hi Andrew,
As Carl explained, pagefile will always go directly to the write cache DISK (same as properly configured event logs, antivirus definitions etc.).
However, the article is talking about write cache (no disk here 🙂 ), the naming convention can be a bit confusing here.
David Pisa
on 1 week ago · Reply
Hi Amit,
thanks a lot for sharing this great article. Are you able to measure any performance difference between fragmented and not-fragmented base image disk?
Martin Zugec
on 1 week ago · Reply
Hello David,
We havent been able to measure any scalability data (that was not the goal of our testing), however I dont believe there should be any measurable impact.
On the other hand, fragmentation of the write cache disk itself (the one attached to the target device) might actually impact the performance, but thats another story.
Martin
Paul Stansel
on 1 week ago · Reply
Excellent article and I love the level of detail. One thing I have to caution on though is this statement I keep seeing Citrix make: ” Remember this storage does not need to be on a SAN, but can be cheaper local storage.” While that is absolutely true, using local drives creates its own headaches at any type of scale. PVS tools for instance can no longer create new machines when you have an ESX cluster that spans multiple physical hosts, each with their own local storage. You also lose the ability to vmotion for HA in a cluster. So yes, you can save on disk. But your management becomes a lot more manual. Just keep the tradeoffs in mind.
Marc
on 1 week ago · Reply
Hm,, my storage guys will not apprecieta defragmenting a vdisk that is run from Netapp storage.
Maybe this caveat can be attcked by citrix in the driver part, ie to fool the os that data is written in the corresponding cluster but in reality it is optimized ?
Carl Fallis
on 1 week ago · Reply
defragment of the VDisk occurs once on a base disk (or after a merge to base disk) that is the only time it would occur. It can not be handled at the driver level.
Marc
on 1 week ago · Reply
Is there an article that compares performance of cache to ram with overflow and cache to disk with intermediate buffering ?
I would like to know of cache to ram with overflow is worth implementing, as it has its caveats as stated in this article ( defragmenting a vhd that is on a netapp metrocluster is killer for performance and effectively causes 4times the writes).
Right now i cache to local storage with intermediate buffering on.
Carl Fallis
on 1 week ago · Reply
There has not been a performance comparison, but there is a significant reduction in IOPS and the more RAM Cache dedicated to the target there should be a significant increase in performance since you are not accessing a slow device like the disk. The other major reason to use this new cache type is that is resolves an interoperability issue between the PVS legacy cache and Microsofts ASLR, which can cause random application crashes.
Martin Zugec
on 1 week ago · Reply
Hello Marc,
I would like to apologize for the confusion, but the disk defragmentation is actually needed on the base vDisk only (so its a one time operation and doesnt need to be done for all the target devices).
Martin

Dan Allen
on 6 days ago · Reply
You should have absolutely no worry about defragging the base VHD on a NetApp volume. You will not hurt the NetApp and will not increase the writes. In fact you will be decreasing the writes! You must keep in mind that we are talking about defragmenting the base VHD only after initial creation or after making major changes. This will not happen very often and it is only happening once per base disk! The amount of writes you save on each target are significant. Think about it this way, you generate a few extra NetApp writes to defrag a single 40GB VHD base disk. That happens once. You then have 5,000 Windows 7 target devices that boot every day and you save 100 MB of writes per target device because you defragged the base. That means you are now saving 500GB of disk writes to the NetApp every day! That is well worth a single defrag on a 40 GB VHD file!
Cheers,
Dan
Tobias Kreidl
on 1 week ago · Reply
Very interesting article and the details are truly appreciated. A couple of observations. First, file-based storage can have certain advantages over block allocation, especially since some operating systems like Linux do a pretty good job of minimizing fragmentation in the first place. This is, of course, much more problematic if you have limited storage and very frequent rewrites. Second, one way to mitigate this might be the basic means that disk fragmentation uses, namely to maintain pointers for wherever the smallest amount of contiguous space is available such that writes are performed there to optimize what fits best.
This still, of course, does not completely avoid the issue, as chunks will still fragment over time. To address this, two options come to mind. One would be to over-allocate by using fairly large chunks of multiple contiguous blocks (and yes, you would have to use pointers to track how they are strung together, just as with a standard file written to a disk). At least when deleted, you would end up with the same consistent size of holes and you would not lose much efficiency in refilling them (though over time, nevertheless requiring more pointers). The other option would be while storing the information in RAM and before the RAM gets too full, spend the extra time doing constant defragmenting on the disk. The danger there is of course if there is a crash, you lose your information plus trying to keep up. Yet another option that could address that particular weakness would be to use an intermediate SSD drive to hold the overflow from RAM that needs to be eventually committed to permanent storage and temporarily store the data there while the defragmentation on the permanent storage takes place. Once caught up, the permanent write could take place. A caveat is that this could lead to bottlenecks if the defragmenting operations are still too slow. There could be a ” write before it is too late” option and perhaps a way to catch up then later.
Kyrian
on 1 week ago · Reply
Great info! Thanks for sharing.
Shinz Antony
on 1 week ago · Reply
Hey Amit,
Great work! Thanks for sharing
Nageshwar
on 1 week ago · Reply
Great one…….a lot of info to know.thanks again.

Dan Allen
on 6 days ago · Reply
Fantastic Article!!!
gabriel byrne
on 3 days ago · Reply
Excellent article. I have been using the overflow with 4GB buffer on W2k12 and the results are very positive
Martyn Dews
on 3 days ago · Reply
A very interesting article, thanks for posting.
Im interested in the impact of de-fragging a vDisk in a production environment. The article states that for best results the VHD should be mounted on the PVS server and defragged from there. I assume that the vDisk has to be offline at this point. Is this correct?
This is fine for an initial vDisk base image that has not been rolled out to production yet but for an active production vDisk which has had a major change and needs to be de-fragged could be a little more difficult to perform.
We are looking at making use of RCwDO but need to understand implications for the support and maintain teams taking into account this research on de-fragging.
Am I being unduly concerned?

Amit Ben-Chanoch
on 3 days ago · Reply
Hi Martyn,
To defragment the vDisk either from the PVS server or from within the OS both require the vDisk to be out of production as we do not recommend defragging a versioned vDisk. In our testing we just saw better results from defragmenting outside of the OS.
I am assuming that the major changes you are performing on an active vDisk is done through versioning and in that case what you can do is defragment the vDisk as part of your process of merging the versions which should be done regularly. To get better storage utilization, you can perform the merging more often.
That should definitely not deter you from using the RAM Cache with Overflow option as the performance is simply too good not too.
Martyn Dews
on 2 days ago · Reply
Thanks for the feedback Amit.