ESX Deployment Appliance (EDA) from VMware Solution Exchange

Source: VMware URL

Overview

EDA is an appliance dedicated to deploying ESX servers quickly and easily. It includes a script builder for rapidly creating %post scripts, so ESX servers are not only installed very quickly but also completely configured for direct import into vCenter.
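
To give a feel for the kind of scripting such a script builder produces, here is a minimal, hypothetical ESXi 5.x kickstart fragment (on ESXi 5.x, post-configuration commands typically live in a %firstboot section); the FQDN and port group name are illustrative placeholders, not EDA output:

%firstboot --interpreter=busybox
# Hypothetical post-install configuration (placeholder values).
# Name the host so it can be imported into vCenter by FQDN.
esxcli system hostname set --fqdn=esx01.example.local
# Add a VM port group to the default standard vSwitch.
esxcli network vswitch standard portgroup add --portgroup-name="VM Network 2" --vswitch-name=vSwitch0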

Highlights
  • ESX deployment
  • script builder
  • deploy AND configure
Description

New in 1.05: a small update to support more installation media for the ESXi 5.x installer; also redeployed with WS9.

New in 1.02: full support for ESXi 5. Includes scripts for the new version and supports older hardware where the installer doesn’t recognize any local disks!

New in v0.95:
– Full ESXi installable scripting
– New Scriptbuilder editor (should be way more intuitive)
– Lots of new scriptlets!

New in v0.9:
– ESX 4 support
– Stateless ESXi 4 support
– ESXi 4 installable support
– Boot from SAN support (ESX 4 only)
– (Initial) console configuration

New in 0.87:
Don’t forget to download the latest patch (0.87-1) for fixes to hostname input and more!
– editing the order of the scriptparts
– bulk creation and deletion of ESX hostnames/ip
– an fs.php page that allows for small remote updates
– ESXi support fixed again

New in 0.85:
– ESXi support (it works, just not unattended yet; hints appreciated!)
– New scriptbuilder interface makes building scripts even easier and more accessible
– Scriptpart editor: rudimentary but working nicely 😉

New in v0.81:
– Samba passwords are changed now too
– User password configurable
– Support for DHCP number when unconfigured
– Persistent network names removed (finally; now it always uses eth0)
– Some work has been done on the interface (configuration pages)
– Rebuilt the hard disk to clean up the VMDK file (saves another 200MB)

New in v0.80: some added features:
– Root password in ks.cfg now configurable from the interface
– FQDN names of ESX hosts derived from the hostname entry and DHCP domain
– Fixed the active adapters to vSwitch script
– A stop button for DHCP
– Initial DHCP configuration
– Removed AppArmor
– Fixed some VM compatibility between Workstation and ESX
– Complete rebuild on the bootstrapped 8.04 JeOS appliance!

Benefits of ESXi Host Local Flash storage

Count The Ways – Flash as Local Storage to an ESXi Host
Posted: 21 Jul 2014   By: Joel Grace

When performance trumps all other considerations, flash technology is a critical component for achieving the highest levels of it. By deploying Fusion ioMemory, a VM can achieve near-native performance results. This is known as pass-through (or direct) I/O.

The process of achieving direct I/O involves passing the PCIe device to the VM, where the guest OS sees the underlying hardware as its own physical device. The ioMemory device is then formatted with a file system by the guest OS, rather than presented as a virtual machine file system (VMFS) datastore. This provides the lowest latency and the highest IOPS and throughput. Multiple ioMemory devices can also be combined to scale to the demands of the application.

Another option is to use ioMemory as a local VMFS datastore. This solution provides high VM performance while maintaining the ability to use features like thin provisioning, snapshots, VM portability, and Storage vMotion. With this configuration, the ioMemory device can be shared by VMs on the same ESXi host, with specific virtual machine disks (VMDKs) stored there for application acceleration.

Either of these options can be used for each of the following design examples.

Benefits of Direct I/O:

  • Raw hardware performance of flash within a VM with direct I/O
  • Ability to use RAID across ioMemory cards to drive higher performance within the VM
  • Use of any file system to manage the flash storage

Considerations of Direct I/O:

  • The ESXi host may need to be rebooted and the CPU VT flag enabled
  • The Fusion-io VSL driver will need to be installed in the guest VM to manage the device
  • Once assigned to a VM, the PCI device cannot be shared with any other VMs

Benefits of Local Datastore:

  • High performance of flash storage for VM VMDKs
  • Maintains VMware functions like snapshots and Storage vMotion

Considerations of Local Datastore:

  • Not all VMDKs for a given VM have to reside on local flash – use shared storage for OS VMDKs and flash for application data VMDKs

SQL/SIOS

Many enterprise applications provide their own high availability (HA) features when deployed in bare metal environments. These features can be used inside VMs to provide an additional layer of protection to an application, beyond that of VMware HA.

Two great SQL examples of this are Microsoft’s Database Availability Groups and SteelEye DataKeeper. Fusion-io customers leverage these technologies in bare metal environments to run all-flash databases without sacrificing high availability. The same is true for virtual environments.

By utilizing shared-nothing, cluster-aware application HA, VMs can still benefit from the flexibility provided by virtualization (hardware abstraction, mobility, etc.) while also taking advantage of local flash storage resources for maximum performance.

Benefits:

  • Maximum application performance
  • Maximum application availability
  • Maintains the software-defined datacenter

Operational Considerations:

  • 100% virtualization is a main goal, but performance is critical
  • Does the virtualized application have additional HA features?
  • A SAN/NAS-based datastore can be used for Storage vMotion if a host needs to be taken offline for maintenance

CITRIX

The Citrix XenDesktop and XenApp application suites also present interesting use cases for local flash in VMware environments. Oftentimes these applications are deployed in a stateless fashion via Citrix Provisioning Services (PVS), where several desktop clones or XenApp servers boot from centralized, read-only golden images. Citrix Provisioning Services stores all data changes made during a user’s session in a user-defined write cache location. When a user logs off or the XenApp server is rebooted, this data is flushed clean. The write cache location can be stored across the network on the PVS servers or on local storage devices. Storing this data on a local Fusion-io datastore on the ESXi host drastically reduces access time to active user data, making for a better Citrix user experience and higher VM density.

Benefits:

  • Maximum application performance
  • Reduced network load between VMs and the Citrix PVS server
  • Avoids slow performance when the SAN is under heavy IO pressure
  • More responsive applications for a better user experience

Operational Considerations:

  • Citrix Personal vDisks (persistent desktop data) should be directed to the PVS server storage for resiliency.
  • PVS vDisk images can also be stored on ioDrives in the PVS server, further increasing performance while eliminating the dependence on SAN altogether.
  • ioDrive capacity is determined by Citrix write cache sizing best practices, typically a 5GB .vmdk per XenDesktop instance.

70 desktops x 5GB write cache = 350GB total cache size (365GB ioDrive could be used in this case).

VMware users can boost their systems to achieve maximum performance and acceleration using flash memory. Flash memory maintains maximum application availability during heavy I/O pressure and makes applications more responsive, providing a better user experience. Flash can also reduce network load between VMs and the Citrix PVS server.

Joel Grace, Sales Engineer
Source: http://www.fusionio.com/blog/count-the-ways.

VMware KB (1008205): Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)

Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions) (1008205)

Details

This article provides information about esxtop and latency statistics that can be used when troubleshooting performance issues with SAN-connected storage (Fibre Channel or iSCSI).

Note: In ESXi 5.x, you may see messages indicating that performance has deteriorated. For more information, see Storage device performance deteriorated (2007236).

Solution

http://youtu.be/e-Yq1BL5rY8

The interactive esxtop utility can be used to provide I/O metrics over various devices attached to a VMware ESX host.

Configuring monitoring using esxtop

 To monitor storage performance per HBA:

  1. Start esxtop by typing esxtop at the command line.
  2. Press d to switch to disk view (HBA mode).
  3. To view the entire Device name, press SHIFT + L and enter 36 at the "Change the name field size" prompt.
  4. Press f to modify the fields that are displayed.
  5. Press b, c, d, e, h, and j to toggle the fields and press Enter.
  6. Press s and then 2 to alter the update time to every 2 seconds and press Enter.
  7. See Analyzing esxtop columns for a description of relevant columns.

Note: These options are available only in VMware ESX 3.5 and later.

To monitor storage performance on a per-LUN basis:

  1. Start esxtop by typing esxtop from the command line.
  2. Press u to switch to disk view (LUN mode).
  3. Press f to modify the fields that are displayed.
  4. Press b, c, f, and h to toggle the fields and press Enter.
  5. Press s and then 2 to alter the update time to every 2 seconds and press Enter.
  6. See Analyzing esxtop columns for a description of relevant columns.

To increase the width of the device field in esxtop to show the complete naa id:

  1. Start esxtop by typing esxtop at the command line.
  2. Press u to switch to the disk device display.
  3. Press L to change the name field size.

    Note: Be sure to use an uppercase L.

  4. Enter the value 36 to display the complete naa identifier.

To monitor storage performance on a per-virtual machine basis:

  1. Start esxtop by typing esxtop at the command line.
  2. Type v to switch to disk view (virtual machine mode).
  3. Press f to modify the fields that are displayed.
  4. Press b, d, e, h, and j to toggle the fields and press Enter.
  5. Press s and then 2 to alter the update time to every 2 seconds and press Enter.
  6. See Analyzing esxtop columns for a description of relevant columns.
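
If you want to capture these metrics for offline review rather than watch them interactively (a suggestion, not part of the original KB steps; the output path is just an example), esxtop's batch mode can write all counters to a CSV file that can be opened in a spreadsheet or in Windows Performance Monitor:

# esxtop -b -d 2 -n 30 > /tmp/esxtop-capture.csv

This samples every 2 seconds for 30 iterations, roughly a minute of data.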

Analyzing esxtop columns

Refer to this table for relevant columns and descriptions of these values:

Column      Description
CMDS/s      The total number of commands per second, including IOPS (Input/Output Operations Per Second) and other SCSI commands (such as SCSI reservations, locks, vendor string requests, and unit attention commands) being sent to or coming from the device or virtual machine being monitored. In most cases CMDS/s = IOPS, unless there are a lot of metadata operations (such as SCSI reservations).
DAVG/cmd    The average response time, in milliseconds, per command being sent to the device.
KAVG/cmd    The amount of time the command spends in the VMkernel.
GAVG/cmd    The response time as perceived by the guest operating system; calculated with the formula DAVG + KAVG = GAVG.

These columns are for both reads and writes, whereas xAVG/rd is for reads and xAVG/wr is for writes. The combined value of these columns is the best way to monitor performance, but a high read or write response time may indicate that the read or write cache is disabled on the array. All arrays perform differently; however, DAVG/cmd, KAVG/cmd, and GAVG/cmd should not exceed 10 milliseconds (ms) for sustained periods of time.
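
As a quick worked example of the formula above (the numbers are hypothetical, not taken from the KB): if esxtop reports DAVG/cmd = 12 ms and KAVG/cmd = 2 ms for a device, then GAVG/cmd = 12 + 2 = 14 ms. The guest sees roughly 14 ms of latency, and because most of it is device latency (DAVG), the investigation should start at the array and fabric rather than in the VMkernel.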

Note: VMware ESX 3.0.x does not include direct functionality to monitor individual LUNs or virtual machines using esxtop. Inactive LUNs lower the average for DAVG/cmd, KAVG/cmd, and GAVG/cmd. These values are also visible from the vCenter Server performance charts. For more information, see the Performance Charts section in the Basic System Administration Guide.

If you experience high latency times, investigate the current performance metrics and running configuration of the switches and the SAN targets. Check for errors or logging that may suggest a delay in operations being sent, received, or acknowledged. This includes the array's ability to process I/O from a spindle-count perspective, and its ability to handle the load presented to it.

If the response time increases to over 5000 ms (or 5 seconds), VMware ESX will time out the command and abort the operation. These events are logged; abort messages and other SCSI errors can be reviewed in these logs:

  • ESX 3.5 and 4.x – /var/log/vmkernel
  • ESXi 3.5 and 4.x – /var/log/messages
  • ESXi 5.x – /var/log/vmkernel.log

The type of storage logging you may see in these files depends on the configuration of the server. You can find the value of these options by navigating to Host > Configuration > Advanced Settings > SCSI > SCSI.Log* or SCSI.Print*.
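
A quick way to check those logs for aborts or the latency warnings mentioned above (a sketch; the exact message strings vary by ESX/ESXi version), for example on ESXi 5.x:

# grep -i abort /var/log/vmkernel.log
# grep -i "performance has deteriorated" /var/log/vmkernel.log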

Additional Information

You can also collect performance snapshots using vm-support. For more information, see Collecting performance snapshots using vm-support (1967).

See also Creating an alarm or email notification that triggers when Disk Latency or DAVG is high

CITRIX XENDESKTOP AND PVS: A WRITE CACHE PERFORMANCE STUDY

Thursday, July 10, 2014   Source: Exit | the | Fast | Lane

If you’re unfamiliar, PVS (Citrix Provisioning Server) is a vDisk deployment mechanism available for use within a XenDesktop or XenApp environment that uses streaming for image delivery. Shared read-only vDisks are streamed to virtual or physical targets in which users can access random pooled or static desktop sessions. Random desktops are reset to a pristine state between logoffs while users requiring static desktops have their changes persisted within a Personal vDisk pinned to their own desktop VM. Any changes that occur within the duration of a user session are captured in a write cache. This is where the performance demanding write IOs occur and where PVS offers a great deal of flexibility as to where those writes can occur. Write cache destination options are defined via PVS vDisk access modes which can dramatically change the performance characteristics of your VDI deployment. While PVS does add a degree of complexity to the overall architecture, since its own infrastructure is required, it is worth considering since it can reduce the amount of physical computing horsepower required for your VDI desktop hosts. The following diagram illustrates the relationship of PVS to Machine Creation Services (MCS) in the larger architectural context of XenDesktop. Keep in mind also that PVS is frequently used to deploy XenApp servers as well.

[Image: relationship of PVS to Machine Creation Services (MCS) within the XenDesktop architecture]

PVS 7.1 supports the following write cache destination options (from the Citrix documentation linked in the references below):

  • Cache on device hard drive – Write cache can exist as a file in NTFS format, located on the target device's hard drive. This write cache option frees up the Provisioning Server since it does not have to process write requests and does not have the finite limitation of RAM.
  • Cache on device hard drive persisted (experimental phase only) – The same as Cache on device hard drive, except the cache persists. At this time, this write cache method is an experimental feature only, and is only supported for NT6.1 or later (Windows 7 and Windows 2008 R2 and later).
  • Cache in device RAM – Write cache can exist as a temporary file in the target device's RAM. This provides the fastest method of disk access since memory access is always faster than disk access.
  • Cache in device RAM with overflow on hard disk – When RAM is zero, the target device write cache is only written to the local disk. When RAM is not zero, the target device write cache is written to RAM first.
  • Cache on a server – Write cache can exist as a temporary file on a Provisioning Server. In this configuration, all writes are handled by the Provisioning Server, which can increase disk IO and network traffic.
  • Cache on server persistent – This cache option allows for the saving of changes between reboots. Using this option, after rebooting, a target device is able to retrieve changes made from previous sessions that differ from the read-only vDisk image.

Many of these were available in previous versions of PVS, including cache to RAM, but what makes v7.1 more interesting is the ability to cache to RAM with the ability to overflow to HDD. This provides the best of both worlds: extreme RAM-based IO performance without the risk since you can now overflow to HDD if the RAM cache fills. Previously you had to be very careful to ensure your RAM cache didn’t fill completely as that could result in catastrophe. Granted, if the need to overflow does occur, affected user VMs will be at the mercy of your available HDD performance capabilities, but this is still better than the alternative (BSOD).

Results

Even when caching directly to HDD, PVS shows lower IOPS-per-user numbers than MCS does on the same hardware. We decided to take things a step further by testing a number of different caching options. We ran tests on both Hyper-V and ESXi using our three standard user VM profiles against the LoginVSI low, medium, and high workloads. For reference, below are the standard user VM profiles we use in all Dell Wyse Datacenter enterprise solutions:

Profile Name | vCPUs per Virtual Desktop | Nominal RAM (GB) per Virtual Desktop | Use Case
Standard     | 1 | 2 | Task Worker
Enhanced     | 2 | 3 | Knowledge Worker
Professional | 2 | 4 | Power User

We tested three write caching options across all user and workload types: cache on device HDD, RAM + overflow (256MB), and RAM + overflow (512MB). Doubling the amount of RAM cache on the more intensive workloads paid off big, netting a reduction of host IOPS to nearly 0. That's almost 100% of user-generated IO absorbed completely by RAM. We didn't capture the IOPS generated in RAM here using PVS, but as the fastest medium available in the server, and from previous work done with other in-RAM technologies, I can tell you that 1600MHz RAM is capable of tens of thousands of IOPS per host. We also tested thin vs. thick provisioning using our high-end profile when caching to HDD, just for grins. Ironically, thin provisioning outperformed thick for ESXi, while the opposite proved true for Hyper-V. To achieve these impressive IOPS numbers on ESXi it is important to enable intermediate buffering (see links at the bottom). I've highlighted the more impressive RAM + overflow results in the tables below. Note: IOPS per user below indicates IOPS generation as observed at the disk layer of the compute host; it does not mean these sessions generated close to no IOPS.

Hypervisor | PVS Cache Type | Workload | Density | Avg CPU % | Avg Mem Usage (GB) | Avg IOPS/User | Avg Net KBps/User
ESXi | Device HDD only | Standard | 170 | 95% | 1.2 | 5 | 109
ESXi | 256MB RAM + Overflow | Standard | 170 | 76% | 1.5 | 0.4 | 113
ESXi | 512MB RAM + Overflow | Standard | 170 | 77% | 1.5 | 0.3 | 124
ESXi | Device HDD only | Enhanced | 110 | 86% | 2.1 | 8 | 275
ESXi | 256MB RAM + Overflow | Enhanced | 110 | 72% | 2.2 | 1.2 | 284
ESXi | 512MB RAM + Overflow | Enhanced | 110 | 73% | 2.2 | 0.2 | 286
ESXi | HDD only, thin provisioned | Professional | 90 | 75% | 2.5 | 9.1 | 250
ESXi | HDD only, thick provisioned | Professional | 90 | 79% | 2.6 | 11.7 | 272
ESXi | 256MB RAM + Overflow | Professional | 90 | 61% | 2.6 | 1.9 | 255
ESXi | 512MB RAM + Overflow | Professional | 90 | 64% | 2.7 | 0.3 | 272

For Hyper-V we observed a similar story and did not enable intermediate buffering, at the recommendation of Citrix. This is important! Citrix strongly recommends not using intermediate buffering on Hyper-V, as it degrades performance. Most other numbers are well in line with the ESXi results, save for the cache-to-HDD numbers being slightly higher.

Hypervisor | PVS Cache Type | Workload | Density | Avg CPU % | Avg Mem Usage (GB) | Avg IOPS/User | Avg Net KBps/User
Hyper-V | Device HDD only | Standard | 170 | 92% | 1.3 | 5.2 | 121
Hyper-V | 256MB RAM + Overflow | Standard | 170 | 78% | 1.5 | 0.3 | 104
Hyper-V | 512MB RAM + Overflow | Standard | 170 | 78% | 1.5 | 0.2 | 110
Hyper-V | Device HDD only | Enhanced | 110 | 85% | 1.7 | 9.3 | 323
Hyper-V | 256MB RAM + Overflow | Enhanced | 110 | 80% | 2 | 0.8 | 275
Hyper-V | 512MB RAM + Overflow | Enhanced | 110 | 81% | 2.1 | 0.4 | 273
Hyper-V | HDD only, thin provisioned | Professional | 90 | 80% | 2.2 | 12.3 | 306
Hyper-V | HDD only, thick provisioned | Professional | 90 | 80% | 2.2 | 10.5 | 308
Hyper-V | 256MB RAM + Overflow | Professional | 90 | 80% | 2.5 | 2.0 | 294
Hyper-V | 512MB RAM + Overflow | Professional | 90 | 79% | 2.7 | 1.4 | 294

Implications

So what does it all mean? If you're already a PVS customer, this is a no-brainer: upgrade to v7.1 and turn on "cache in device RAM with overflow to hard disk" now. Your storage subsystems will thank you. The benefits are clear in ESXi and Hyper-V alike. If you're deploying XenDesktop soon and debating MCS vs. PVS, this is a very strong mark in the "pro" column for PVS. The fact of life in VDI is that we always run out of CPU first, but that doesn't mean we get to ignore or undersize for IO performance; that's important too. Enabling RAM to absorb the vast majority of user write cache IO allows us to stretch our HDD subsystems even further, since their burdens are diminished. Cut your local disk costs by 2/3 or stretch those shared arrays 2 or 3x. PVS cache in RAM + overflow allows you to design your storage around capacity requirements, with less need to overprovision spindles just to meet IO demands (resulting in wasted capacity).

References:

DWD Enterprise Reference Architecture

http://support.citrix.com/proddocs/topic/provisioning-7/pvs-technology-overview-write-cache-intro.html

When to Enable Intermediate Buffering for Local Hard Drive Cache

XENAPP 7.X ARCHITECTURE AND SIZING

XenApp 7.x Architecture and Sizing

Source: http://weestro.blogspot.com/2014/05/xenapp-7x-architecture-and-sizing.html

Peter Fine here from Dell CCC Solution Engineering, where we just finished an extensive refresh of our XenApp recommendation within the Dell Wyse Datacenter for Citrix solution architecture. Although not called "XenApp" in XenDesktop versions 7 and 7.1 of the product, the name has returned officially for version 7.5. XenApp is still architecturally linked to XenDesktop from a management infrastructure perspective but can also be deployed as a standalone architecture from a compute perspective. The best part of it all now is flexibility: start with XenApp or start with XenDesktop, then seamlessly integrate the other at a later time with no disruption to your environment. All XenApp really is now is a Windows Server OS running the Citrix Virtual Delivery Agent (VDA). That's it! XenDesktop, on the other hand, is a Windows desktop OS running the VDA.

Architecture

The logical architecture depicted below displays the relationship between the two use cases, outlined in red. All of the infrastructure that controls the brokering, licensing, etc. is the same between them. This simplification of architecture comes as a result of XenApp shifting from the legacy Independent Management Architecture (IMA) to XenDesktop's FlexCast Management Architecture (FMA). It just makes sense, and we are very happy to see Citrix make this move. You can read more about the individual service components of XenDesktop/XenApp here.

[Image: logical architecture with the two use cases outlined]

Expanding the architectural view to include the physical and communication elements, XenApp fits quite nicely with XenDesktop and complements any VDI deployment. For simplicity, I recommend using compute hosts dedicated to XenApp and XenDesktop, respectively, for simpler scaling and sizing. Below you can see the physical management and compute hosts on the far left side, with each of their respective components considered within. Management will remain the same regardless of what type of compute host you ultimately deploy, but there are several different deployment options. Tier 1 and Tier 2 storage are handled the same way when XenApp is in play, and can make use of local or shared disk depending on your requirements. XenApp also integrates nicely with PVS, which can be used for deployment and easy scale-out scenarios. I have another post queued up for PVS sizing in XenDesktop.

[Image: physical management and compute hosts with their respective components]

From a stack view perspective, XenApp fits seamlessly into an existing XenDesktop architecture or can be deployed into a dedicated stack. Below is a view of a Dell Wyse Datacenter stack tailored for XenApp running on either vSphere or Hyper-V using local disks for Tier 1. XenDesktop slips easily into the compute layer here with our optimized host configuration. Be mindful of the upper scale when utilizing a single management stack, as 10K users and above is generally considered very large for a single farm. The important point to note is that the network, management, and storage layers are completely interchangeable between XenDesktop and XenApp. Only the host config in the compute layer changes slightly for XenApp-enabled hosts, based on our optimized configuration.

[Image: Dell Wyse Datacenter stack tailored for XenApp on vSphere or Hyper-V]

Use Cases

There are a number of use cases for XenApp, which ultimately relies on the Windows Server RDSH role (terminal services). The age-old and most obvious use case is hosted shared sessions, i.e., many users logging into and sharing the same Windows Server instance via RDP. This is useful for managing access to legacy apps, providing a remote access/VPN alternative, or controlling access to an environment that can only be reached via the XenApp servers. The next step up naturally extends to application virtualization, where instead of multiple users being presented with and working from a full desktop, they simply launch the applications they need from another device. These virtualized apps, of course, consume a full shared session on the backend even though the user only interacts with a single application. Either scenario can now be deployed easily via Delivery Groups in Citrix Studio.

XenApp also complements full XenDesktop VDI through the use of application off-load. It is entirely viable to load every application a user might need within their desktop VM, but this comes at a performance and management cost. Every VDI user on a given compute host will have a percentage of allocated resources consumed by running these applications, which all have to be kept up to date and patched unless they are part of the base image. Leveraging XenApp with XenDesktop provides the ability to off-load applications and their loads from the VDI sessions to the XenApp hosts. Let XenApp absorb those burdens for the applications that make sense. Now instead of running MS Office in every VM, run it from XenApp and publish it to your VDI users. Patch it in one place, shrink your gold images for XenDesktop, and free up resources for other, more intensive, non-XenApp-friendly apps you really need to run locally. Best of all, your users won't be able to tell the difference!

Optimization

We performed a number of tests to identify the optimal configuration for XenApp. There are a number of ways to go here: physical, virtual, or PVS streamed to physical/virtual, using a variety of caching options. There are also a number of ways in which XenApp can be optimized. Citrix wrote a very good blog article covering many of these optimization options, most of which we confirmed. The one outlier turned out to be NUMA, where we really didn't see much difference with it turned on or off. We ran through the following test scenarios using the core DWD architecture with LoginVSI light and medium workloads for both vSphere and Hyper-V:

  • Virtual XenApp server optimization on both vSphere and Hyper-V to discover the right mix of vCPUs, oversubscription, RAM and total number of VMs per host
  • Physical Windows 2012 R2 host running XenApp
  • The performance impact and benefit of enabling NUMA to keep the RAM accessed by a CPU local to its adjacent DIMM banks
  • The performance impact of various provisioning mechanisms for VMs: MCS, PVS write cache to disk, PVS write cache to RAM
  • The performance impact of increased user idle time, to simulate less than 80% concurrency of user activity on any given host

To identify the best XenApp VM config we tried a number of configurations, including a mix of 1.5x CPU core oversubscription, fewer very beefy VMs, and many less beefy VMs. It is important to note that we based this on the 10-core Ivy Bridge part (E5-2690v2) that features Hyper-Threading and Turbo Boost. These things matter! The highest density and best user experience came with 6 VMs, each outfitted with 5 vCPUs and 16GB RAM. Of the delivery methods we tried (outlined in the table below), Hyper-V netted the best results regardless of provisioning methodology. We did not get a better density between PVS caching methods, but PVS cache in RAM completely removed any IOPS generated against the local disk. I'll go more into PVS caching methods and results in another post.

Interestingly, of all the scenarios we tested, the native Server 2012 R2 + XenApp combination performed the poorest. PVS streamed to a physical host is another matter entirely, but unfortunately we did not test that scenario. We also saw no benefit from enabling NUMA. There was a time when a CPU accessing an adjacent CPU’s remote memory banks across the interconnect paths hampered performance, but given the current architecture in Ivy Bridge and its fat QPIs, this doesn’t appear to be a problem any longer.

The "Dell Light" workload below was adjusted to account for less than the 80%+ user concurrency we typically plan for in traditional VDI. Citrix observed that XenApp users in the real world tend not to work all at the same time. Fewer users working concurrently means freed resources and the opportunity to run more total users on a given compute host.

The net of this study shows that the hypervisor and XenApp VM configuration matter more than the delivery mechanism. MCS and PVS ultimately netted the same performance results but PVS can be used to solve other specific problems if you have them (IOPS).

[Image: test result charts]

* CPU % for ESX Hosts was adjusted to account for the fact that Intel E5-2600v2 series processors with the Turbo Boost feature enabled will exceed the ESXi host CPU metrics of 100% utilization. With E5-2690v2 CPUs the rated 100% in ESXi is 60000 MHz of usage, while actual usage with Turbo has been seen to reach 67000 MHz in some cases. The Adjusted CPU % Usage is based on 100% = 66000 MHz usage and is used in all charts for ESXi to account for Turbo Boost. Windows Hyper-V metrics by comparison do not report usage in MHz, so only the reported CPU % usage is used in those cases.

** The "Dell Light" workload is a modified VSI workload to represent a significantly lighter type of user. In this case the workload was modified to produce about 50% idle time.

†Avg IOPS observed on disk is 0 because it is offloaded to RAM.

Summary of configuration recommendations:

  • Enable Hyper-Threading and Turbo for oversubscribed performance gains.
  • NUMA did not show to have a tremendous impact enabled or disabled.
  • 1.5x CPU oversubscription per host produced excellent results. 20 physical cores x 1.5 oversubscription netting 30 logical vCPUs assigned to VMs.
  • Virtual XenApp servers outperform dedicated physical hosts with no hypervisor so we recommend virtualized XenApp instances.
  • Using 10-Core Ivy Bridge CPUs, we recommend running 6 x XenApp VMs per host, each VM assigned 5 x vCPUs and 16GB RAM.
  • PVS cache in RAM (with HD overflow) will reduce the user IO generated to disk to almost nothing, but may require greater RAM densities on the compute hosts. 256GB is a safe high water mark using PVS cache in RAM, based on a 21GB cache per XenApp VM (see the quick sizing check below).
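
As a quick sizing check on that high water mark (this reads the 21GB cache as provisioned per VM on top of the 16GB assignment; the headroom allowance is an assumption, not a measured figure):

6 XenApp VMs x (16GB VM RAM + 21GB PVS RAM cache) = 6 x 37GB = 222GB
222GB + hypervisor and management overhead ≈ 256GB high water mark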

Resources:

Dell Wyse Datacenter for Citrix – Reference Architecture

XenApp/ XenDesktop Core Concepts

Citrix Blogs – XenApp Scalability

  1. Do you have anything on XenApp 7.5 + HDX 3D? This is super helpful, but there is even less information on sizing for XenApp when GPUs are involved.

    Reply

  2. Unfortunately we don’t yet have any concrete sizing data for XenApp with graphics, but this is teed up for us to tackle next. I’ll add some of the architectural considerations, which will hopefully help.

    Reply

  3. Two questions:
    1. Did you include antivirus in your XenApp scalability considerations? If not, physical box overhead with Win 2012 R2 and 1 AV instance is minimal, when compared to 6 PVS streamed VMs outfitted with 6 AV instances respectively (I am not recommending to go physical though).
    2. When suggesting PVS cache in RAM to improve scalability of XenApp workloads, do you consider CPU, not the IO, to be the main culprit? After all, you only have 20 cores in a 2 socket box, while there are numerous options to fix storage IO.

    PS. Some of your pictures are not visible

    Reply

  4. Hi Alex,

    1) Yes, we always use antivirus in all testing that we do at Dell. Real world simulation is paramount. Antivirus used here is still our standard McAfee product, not VDI-optimized.

    2) Yes, CPU is almost always the limiting factor and exhausts first, ultimately dictating the limits of compute scale. You can see here that PVS cache in RAM didn’t change the scale equation, even though it did use slightly less CPU, but it all but eliminates the disk IO problem. We didn’t go too deep on the higher IO use cases with cache in RAM but this can obviously be considered a poor man’s Atlantis ILIO.

    Thanks for stopping by!

    Reply


VNX 5300/VMware: Troubleshoot ESXi connectivity to SAN via iSCSI connection

Troubleshoot VMware ESXi/ESX to iSCSI array connectivity:

Note: A rescan is required after every storage presentation change to the environment.

1. Log into the ESXi/ESX host and verify that the VMkernel interface (vmk) on the host can vmkping the iSCSI targets with this command:

# vmkping target_ip

If you are running an ESX host, also check that the Service Console interface (vswif) on the host can ping the iSCSI target with:

# ping target_ip

Note: Pinging the storage array only applies when using the Software iSCSI initiator. In ESXi, ping and ping6 both run vmkping. For more information about vmkping, see Testing VMkernel network connectivity with the vmkping command (1003728).

2. Use netcat (nc) to verify that you can reach the iSCSI TCP port (default 3260) on the storage array from the host:

# nc -z target_ip 3260

Example output:

Connection to 10.1.10.100 3260 port [tcp/http] succeeded!

Note: The netcat command is available with ESX 4.x and ESXi 4.1 and later.

3. Verify that the host bus adapters (HBAs) are able to access the shared storage. For more information, see Obtaining LUN pathing information for ESX or ESXi hosts (1003973).

4. Confirm that no firewall is interfering with iSCSI traffic. For details on the ports and firewall requirements for iSCSI, see Port and firewall requirements for NFS and SW iSCSI traffic (1021626). For more information, see Troubleshooting network connection issues caused by firewall configuration (1007911).

Note: Check the SAN and switch configuration, especially if you are using jumbo frames (supported from ESX 4.x). To test the ping to a storage array with jumbo frames from an ESXi/ESX host, run this command:

# vmkping -s MTUSIZE IPADDRESS_OF_SAN -d

Where MTUSIZE is 9000 – (a header of) 216, which is 8784, and the -d option indicates “do not fragment”.
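
For example, using the example array address from the netcat step above:

# vmkping -s 8784 10.1.10.100 -d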

5. Ensure that the LUNs are presented to the ESXi/ESX hosts. On the array side, ensure that the LUN IQNs and access control list (ACL) allow the ESXi/ESX host HBAs to access the array targets. For more information, see Troubleshooting LUN connectivity issues on ESXi/ESX hosts (1003955).

Additionally, ensure that the HOST ID on the array for the LUN (on ESX it shows up under LUN ID) is less than 255. The maximum LUN ID is 255. Any LUN that has a HOST ID greater than 255 may not show as available under Storage Adapters, though on the array it may reside in the same storage group as the other LUNs that have host IDs less than 255. This limitation exists in all versions of ESXi/ESX from ESX 2.x to ESXi 5.x. This information can be found in the maximums guide for the particular version of ESXi/ESX having the issue.

6. Verify that a rescan of the HBAs displays presented LUNs in the Storage Adapters view of an ESXi/ESX host. For more information, see Performing a rescan of the storage on an ESXi/ESX host (1003988).
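
If you prefer the command line to the Storage Adapters view (a sketch assuming ESXi 5.x; classic ESX 3.x/4.x hosts use esxcfg-rescan instead), you can trigger the rescan and list the detected devices with:

# esxcli storage core adapter rescan --all
# esxcli storage core device list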

7. Verify your CHAP authentication. If CHAP is configured on the array, ensure that the authentication settings for the ESXi/ESX hosts are the same as the settings on the array. For more information, see Checking CHAP authentication on the ESXi/ESX host (1004029).

8. Consider pinging any ESXi/ESX host iSCSI initiator (HBA) from the array’s targets. This is done from the storage array side.

9. Verify that the storage array is listed on the Storage/SAN Compatibility Guide. For more information, see Confirming ESXi/ESX host hardware (System, Storage, and I/O) compatibility (1003916).

Note: Some array vendors have a minimum-recommended microcode/firmware version to operate with VMware ESXi/ESX. This information can be obtained from the array vendor and the VMware Hardware Compatibility Guide.

10. Verify that the physical hardware is functioning correctly, including:

◦ The Storage Processors (sometimes known as heads) on the array

◦ The storage array itself

◦ Check the SAN and switch configuration, especially if you are using jumbo frames (supported from ESX 4.x). To test the ping to a storage array with jumbo frames from ESXi/ESX, run this command:

# vmkping -s MTUSIZE STORAGE_ARRAY_IPADDRESS

Where MTUSIZE is 9000 – (a header of) 216, which is 8784.

Note: Consult your storage array vendor if you require assistance.

11. Perform some form of network packet tracing and analysis, if required. For more information, see:

◦ Capturing virtual switch traffic with tcpdump and other utilities (1000880)

◦ Troubleshooting network issues by capturing and sniffing network traffic via tcpdump (1004090)