VMware KB (2007236): Storage device performance deteriorated

Storage device performance deteriorated (2007236)

Details

This message is received when the latency on the device is higher than its average latency; the increase in device latency is what generates this event. It occurs when either the latency ratio relative to the last time the log was updated reaches 30, or that ratio has doubled since the last log.

Some of the possible reasons why the device latency increased are:

  • Changes made on the target
  • Disk or media failures
  • Overload conditions on the device
  • Failover

For more/related information, see Information regarding vSphere storage device latency performance metrics (2001676).

Impact

The storage device performance declines.

Example

This is an example of the message received when this occurs:

Device naa.5000c5000b36354b performance has deteriorated. I/O latency increased from average value of 1832 microseconds to 19403 microsecond

Note: The message shows microseconds which can be converted to milliseconds: 19403 microseconds = 19.403 milliseconds.

Solution

If you observe that this latency stays high for a sustained period of time, it indicates a storage performance concern and you must check the logs on the storage array for any indication of a failure. If failures are logged on the storage array side, take corrective action. Contact your storage vendor for information about checking logs on the array.

Also check if these messages are generated when there were any scheduled tasks, such as backups, replications, etc., as these can also cause intermittent performance hits.

If the message is generated because of an overload condition, attempt to reduce the load on the affected storage device.

If running a LUN replication tool, pause the task from the storage end and attempt a storage vMotion to a different datastore. This should help improve the I/O operations.

For more/related information on diagnosing overload/performance issues, see Using esxtop to identify storage performance issues (1008205).


VMware KB (1008205): Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)

Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions) (1008205)

Details

This article provides information about esxtop and latency statistics that can be used when troubleshooting performance issues with SAN-connected storage (Fibre Channel or iSCSI).

Note: In ESXi 5.x, you may see messages indicating that performance has deteriorated. For more information, see Storage device performance deteriorated (2007236).

Solution

http://youtu.be/e-Yq1BL5rY8

The interactive esxtop utility can be used to provide I/O metrics over various devices attached to a VMware ESX host.

Configuring monitoring using esxtop

 To monitor storage performance per HBA:

  1. Start esxtop by typing esxtop at the command line.
  2. Press d to switch to disk view (HBA mode).
  3. To view the entire Device name, press SHIFT + L and enter 36 in Change the name field size.
  4. Press f to modify the fields that are displayed.
  5. Press b, c, d, e, h, and j to toggle the fields and press Enter.
  6. Press s and then 2 to alter the update time to every 2 seconds and press Enter.
  7. See Analyzing esxtop columns for a description of relevant columns.

Note: These options are available only in VMware ESX 3.5 and later.

To monitor storage performance on a per-LUN basis:

  1. Start esxtop by typing esxtop from the command line.
  2. Press u to switch to disk view (LUN mode).
  3. Press f to modify the fields that are displayed.
  4. Press b, c, f, and h to toggle the fields and press Enter.
  5. Press s and then 2 to alter the update time to every 2 seconds and press Enter.
  6. See Analyzing esxtop columns for a description of relevant columns.

To increase the width of the device field in esxtop to show the complete naa id:

  1. Start esxtop by typing esxtop at the command line.
  2. Press u to switch to the disk device display.
  3. Press L to change the name field size.

    Note: Ensure that you use an uppercase L.

  4. Enter the value 36 to display the complete naa identifier.

To monitor storage performance on a per-virtual machine basis:

  1. Start esxtop by typing esxtop at the command line.
  2. Type v to switch to disk view (virtual machine mode).
  3. Press f to modify the fields that are displayed.
  4. Press b, d, e, h, and j to toggle the fields and press Enter.
  5. Press s and then 2 to alter the update time to every 2 seconds and press Enter.
  6. See Analyzing esxtop columns for a description of relevant columns.

Analyzing esxtop columns

Refer to this table for relevant columns and descriptions of these values:

Column      Description
CMDS/s      The total number of commands per second, including IOPS (Input/Output Operations Per Second) and other SCSI commands such as SCSI reservations, locks, vendor string requests, and unit attention commands being sent to or coming from the device or virtual machine being monitored. In most cases CMDS/s = IOPS, unless there are a lot of metadata operations (such as SCSI reservations).
DAVG/cmd    The average response time, in milliseconds, per command being sent to the device.
KAVG/cmd    The amount of time the command spends in the VMkernel.
GAVG/cmd    The response time as it is perceived by the guest operating system. This number is calculated with the formula: DAVG + KAVG = GAVG.

These columns report combined values for both reads and writes, whereas xAVG/rd is for reads and xAVG/wr is for writes. The combined value of these columns is the best way to monitor performance, but a high read or write response time may indicate that the read or write cache is disabled on the array. All arrays perform differently; however, DAVG/cmd, KAVG/cmd, and GAVG/cmd should not exceed 10 milliseconds (ms) for sustained periods of time.

Note: VMware ESX 3.0.x does not include direct functionality to monitor individual LUNs or virtual machines using esxtop. Inactive LUNs lower the average for DAVG/cmd, KAVG/cmd, and GAVG/cmd. These values are also visible from the vCenter Server performance charts. For more information, see the Performance Charts section in the Basic System Administration Guide.

If you experience high latency times, investigate the current performance metrics and running configuration of the switches and the SAN targets. Check for errors or log entries that may suggest a delay in operations being sent, received, or acknowledged. This includes the array's ability to process I/O from a spindle-count perspective, or the array's ability to handle the load presented to it.
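
As a quick illustration of the 10 ms guideline and the GAVG formula above, here is a minimal PowerShell sketch (the device names and latency values are made-up samples, not output from esxtop or any VMware tool):

  # Hypothetical samples; DAVG and KAVG are in milliseconds
  $samples = @(
      [pscustomobject]@{ Device = 'naa.5000c5000b36354b'; DAVG = 1.8;  KAVG = 0.1 },
      [pscustomobject]@{ Device = 'naa.exampledevice002'; DAVG = 19.4; KAVG = 0.2 }
  )
  foreach ($s in $samples) {
      $gavg = $s.DAVG + $s.KAVG   # GAVG = DAVG + KAVG
      if ($gavg -gt 10) {
          Write-Output ("{0}: GAVG of {1:N1} ms exceeds the 10 ms guideline" -f $s.Device, $gavg)
      }
  }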

If the response time increases to over 5000 ms (or 5 seconds), VMware ESX will time out the command and abort the operation. These events are logged; abort messages and other SCSI errors can be reviewed in these logs:

  • ESX 3.5 and 4.x – /var/log/vmkernel
  • ESXi 3.5 and 4.x – /var/log/messages 
  • ESXi 5.x – /var/log/vmkernel.log

The type of storage logging you may see in these files depends on the configuration of the server. You can find the value of these options by navigating to Host > Configuration > Advanced Settings > SCSI > SCSI.Log* or SCSI.Print*.
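
If you prefer to pull these values remotely, a rough PowerCLI sketch along these lines should list them (this assumes VMware PowerCLI is installed; the vCenter and host names are placeholders):

  # Requires VMware PowerCLI; server and host names are placeholders
  Connect-VIServer -Server 'vcenter.example.local'
  $vmhost = Get-VMHost -Name 'esxi01.example.local'
  # List the SCSI logging-related advanced options for the host
  Get-AdvancedSetting -Entity $vmhost |
      Where-Object { $_.Name -like 'SCSI.Log*' -or $_.Name -like 'SCSI.Print*' } |
      Select-Object Name, Value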

Additional Information

You can also collect performance snapshots using vm-support. For more information, see Collecting performance snapshots using vm-support (1967).

See also Creating an alarm or email notification that triggers when Disk Latency or DAVG is high

CITRIX XENDESKTOP AND PVS: A WRITE CACHE PERFORMANCE STUDY

Thursday, July 10, 2014   Source: Exit | the | Fast | Lane

image

If you’re unfamiliar, PVS (Citrix Provisioning Server) is a vDisk deployment mechanism available for use within a XenDesktop or XenApp environment that uses streaming for image delivery. Shared read-only vDisks are streamed to virtual or physical targets in which users can access random pooled or static desktop sessions. Random desktops are reset to a pristine state between logoffs while users requiring static desktops have their changes persisted within a Personal vDisk pinned to their own desktop VM. Any changes that occur within the duration of a user session are captured in a write cache. This is where the performance demanding write IOs occur and where PVS offers a great deal of flexibility as to where those writes can occur. Write cache destination options are defined via PVS vDisk access modes which can dramatically change the performance characteristics of your VDI deployment. While PVS does add a degree of complexity to the overall architecture, since its own infrastructure is required, it is worth considering since it can reduce the amount of physical computing horsepower required for your VDI desktop hosts. The following diagram illustrates the relationship of PVS to Machine Creation Services (MCS) in the larger architectural context of XenDesktop. Keep in mind also that PVS is frequently used to deploy XenApp servers as well.

image

PVS 7.1 supports the following write cache destination options (from Link):

  • Cache on device hard drive – Write cache can exist as a file in NTFS format, located on the target-device’s hard drive. This write cache option frees up the Provisioning Server since it does not have to process write requests and does not have the finite limitation of RAM.
  • Cache on device hard drive persisted (experimental phase only) – The same as Cache on device hard drive, except cache persists. At this time, this write cache method is an experimental feature only, and is only supported for NT6.1 or later (Windows 7 and Windows 2008 R2 and later).
  • Cache in device RAM – Write cache can exist as a temporary file in the target device’s RAM. This provides the fastest method of disk access since memory access is always faster than disk access.
  • Cache in device RAM with overflow on hard disk – When RAM is zero, the target device write cache is only written to the local disk. When RAM is not zero, the target device write cache is written to RAM first.
  • Cache on a server – Write cache can exist as a temporary file on a Provisioning Server. In this configuration, all writes are handled by the Provisioning Server, which can increase disk IO and network traffic.
  • Cache on server persistent – This cache option allows for the saving of changes between reboots. Using this option, after rebooting, a target device is able to retrieve changes made from previous sessions that differ from the read only vDisk image.

Many of these were available in previous versions of PVS, including cache to RAM, but what makes v7.1 more interesting is the ability to cache to RAM with the ability to overflow to HDD. This provides the best of both worlds: extreme RAM-based IO performance without the risk since you can now overflow to HDD if the RAM cache fills. Previously you had to be very careful to ensure your RAM cache didn’t fill completely as that could result in catastrophe. Granted, if the need to overflow does occur, affected user VMs will be at the mercy of your available HDD performance capabilities, but this is still better than the alternative (BSOD).

Results

Even when caching directly to HDD, PVS shows lower IOPS per user than MCS does on the same hardware. We decided to take things a step further by testing a number of different caching options. We ran tests on both Hyper-V and ESXi using our standard 3 user VM profiles against LoginVSI's low, medium, and high workloads. For reference, below are the standard user VM profiles we use in all Dell Wyse Datacenter enterprise solutions:

Profile Name   Number of vCPUs per Virtual Desktop   Nominal RAM (GB) per Virtual Desktop   Use Case
Standard       1                                     2                                      Task Worker
Enhanced       2                                     3                                      Knowledge Worker
Professional   2                                     4                                      Power User

We tested three write caching options across all user and workload types: cache on device HDD, RAM + Overflow (256MB) and RAM + Overflow (512MB). Doubling the amount of RAM cache on the more intensive workloads paid off big, netting a host IOPS reduction to nearly 0. That's almost 100% of user-generated IO absorbed completely by RAM. We didn't capture the IOPS generated in RAM here using PVS, but as the fastest medium available in the server, and from previous work done with other in-RAM technologies, I can tell you that 1600MHz RAM is capable of tens of thousands of IOPS per host. We also tested thin vs thick provisioning using our high end profile when caching to HDD, just for grins. Ironically, thin provisioning outperformed thick for ESXi, while the opposite proved true for Hyper-V. To achieve these impressive IOPS numbers on ESXi it is important to enable intermediate buffering (see links at the bottom). The RAM + overflow results below are the most impressive. Note: IOPS per user below indicates IOPS generation as observed at the disk layer of the compute host. This does not mean these sessions generated close to no IOPS.

Hypervisor   PVS Cache Type                Workload       Density   Avg CPU %   Avg Mem Usage GB   Avg IOPS/User   Avg Net KBps/User
ESXi         Device HDD only               Standard       170       95%         1.2                5               109
ESXi         256MB RAM + Overflow          Standard       170       76%         1.5                0.4             113
ESXi         512MB RAM + Overflow          Standard       170       77%         1.5                0.3             124
ESXi         Device HDD only               Enhanced       110       86%         2.1                8               275
ESXi         256MB RAM + Overflow          Enhanced       110       72%         2.2                1.2             284
ESXi         512MB RAM + Overflow          Enhanced       110       73%         2.2                0.2             286
ESXi         HDD only, thin provisioned    Professional   90        75%         2.5                9.1             250
ESXi         HDD only, thick provisioned   Professional   90        79%         2.6                11.7            272
ESXi         256MB RAM + Overflow          Professional   90        61%         2.6                1.9             255
ESXi         512MB RAM + Overflow          Professional   90        64%         2.7                0.3             272
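
To put those per-user numbers in host terms: at a density of 170 Standard sessions, the HDD-only configuration works out to roughly 170 x 5 ≈ 850 IOPS hitting the local disks, versus roughly 170 x 0.3 ≈ 51 IOPS with the 512MB RAM + Overflow cache, which is the near-elimination of host disk IO described above.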

For Hyper-V we observed a similar story, and we did not enable intermediate buffering at the recommendation of Citrix. This is important! Citrix strongly recommends not using intermediate buffering on Hyper-V as it degrades performance. Most other numbers are well in line with the ESXi results, save for the cache to HDD numbers being slightly higher.

Hypervisor   PVS Cache Type                Workload       Density   Avg CPU %   Avg Mem Usage GB   Avg IOPS/User   Avg Net KBps/User
Hyper-V      Device HDD only               Standard       170       92%         1.3                5.2             121
Hyper-V      256MB RAM + Overflow          Standard       170       78%         1.5                0.3             104
Hyper-V      512MB RAM + Overflow          Standard       170       78%         1.5                0.2             110
Hyper-V      Device HDD only               Enhanced       110       85%         1.7                9.3             323
Hyper-V      256MB RAM + Overflow          Enhanced       110       80%         2                  0.8             275
Hyper-V      512MB RAM + Overflow          Enhanced       110       81%         2.1                0.4             273
Hyper-V      HDD only, thin provisioned    Professional   90        80%         2.2                12.3            306
Hyper-V      HDD only, thick provisioned   Professional   90        80%         2.2                10.5            308
Hyper-V      256MB RAM + Overflow          Professional   90        80%         2.5                2.0             294
Hyper-V      512MB RAM + Overflow          Professional   90        79%         2.7                1.4             294

Implications

So what does it all mean? If you're already a PVS customer, this is a no-brainer: upgrade to v7.1 and turn on "cache in device RAM with overflow to hard disk" now. Your storage subsystems will thank you. The benefits are clear in both ESXi and Hyper-V alike. If you're deploying XenDesktop soon and debating MCS vs PVS, this is a very strong mark in the "pro" column for PVS. The fact of life in VDI is that we always run out of CPU first, but that doesn't mean we get to ignore or undersize for IO performance, as that's important too. Enabling RAM to absorb the vast majority of user write cache IO allows us to stretch our HDD subsystems even further, since their burdens are diminished. Cut your local disk costs by 2/3 or stretch those shared arrays 2 or 3x. PVS cache in RAM + overflow allows you to design your storage around capacity requirements with less need to overprovision spindles just to meet IO demands (resulting in wasted capacity).

References:

DWD Enterprise Reference Architecture

http://support.citrix.com/proddocs/topic/provisioning-7/pvs-technology-overview-write-cache-intro.html

When to Enable Intermediate Buffering for Local Hard Drive Cache

FUN WITH STORAGE SPACES IN WINDOWS SERVER 2012 R2 (Reposted From Exit | the | Fast | Lane)

Wednesday, September 17, 2014

Source: Exit | the | Fast | Lane

image

Previously I took a look at a small three disk Storage Spaces configuration on Windows 8.1 for my Plex media server build. Looking at the performance of the individual disks, I grew curious, so I set out to explore Spaces performance in Server 2012 R2 with one extra disk. Windows 8.1 and Server 2012 R2 are cut from the same cloth, with the Server product adding performance tiering, deduplication and means for larger scale. At its core, Storage Spaces is essentially the same between the products and accomplishes a singular goal: take several raw disks and create a pool with adjustable performance and resiliency characteristics. The execution of this, however, is very different between the products.

For my test bed I used a single Dell PowerEdge R610 which has 6 local 15K SAS disks attached to a PERC H700 RAID controller with 1GB battery-backed cache. Two disks in RAID 1 for the OS (Windows Server 2012 R2) and four disks left for Spaces. Spaces is a very capable storage aggregation tool that can be used with large external JBODs and clustered across physical server nodes, but for this exercise I want to look only at local disks and compare performance between three and four disk configurations.

Questions I’m looking to answer:

  • Do 2 columns outperform 1? (To learn about columns in Spaces, read this great article)
  • Do 4-disk Spaces greatly outperform 3-disk Spaces?
  • What advantage, if any, does controller cache bring to the table?
  • Are there any other advantages in Spaces for Server 2012 R2 that outperform the Windows 8.1 variant (aside from SSD tiering, dedupe and scale)?

No RAID

The first rule of Storage Spaces is: thou shalt have no RAID in between the disks and the Windows OS. Spaces handles the striping and resiliency manipulation itself, so it can't have hardware or software RAID in the way. The simplest way to accomplish this is via the use of a pass-through SAS HBA, but if you only have a RAID controller at your disposal, it can work by setting all disks to RAID 0. For Dell servers, in the OpenManage Server Administrator (OMSA) console, create a new Virtual Disk for every physical disk you want to include, and set its layout to RAID-0. *NOTE – There are two uses of the term "virtual disk" in this scenario. Virtual Disk in the Dell storage sense is not to be confused with virtual disk in the Windows sense, although both are ultimately used to create a logical disk from a physical resource.

image

To test the native capability of Spaces with no PERC hardware cache assist, I set the following policy on each Virtual Disk:

image

Make some Spaces!

Windows will now see these local drives as assignable disks to a Storage Pool. From the File and Storage Services tab in Server Manager, launch the New Storage Pool Wizard. This is where creating Spaces in Windows 8.1 and Server 2012 R2 diverges in a major way. What took just a few clicks on a single management pane in Win8.1 takes several screens and multiple wizards in Server 2012 R2. Also, all of this can be done in PowerShell using the new-storagepool and new-virtualdisk cmdlets (see the sketch at the end of this walkthrough). First name the new storage pool:

image

Add physical disks to the pool, confirm and commit.

image

You will now see your new pool created under the Storage Spaces section. Next create new virtual disks within the pool by right-clicking and selecting “New Virtual Disk”.

image

Select the pool to create the Virtual Disk within and name it. You'll notice that this is where you would enable performance tiering if you had the proper disks in place, using at least one SSD and one HDD.

image

Choose the Storage Layout (called resiliency in Win8.1 and in PowerShell). I previously determined that Mirror Spaces are the best way to go, so I will be sticking with that layout for these investigations.

image

Choices are good, but since we want to save committed capacity, choose Thin Provisioning:

image

Specify the size:

image

Confirm and commit:

image

Now we have a pool (Storage Space) and a Virtual Disk; next we need to create a volume. Honestly, I don't know why Microsoft didn't combine this next step with the previous Virtual Disk creation dialogs, as a Virtual Disk with no file system or drive letter is utterly useless.

image

Next select the size of the volume you want to create (smaller than or equal to the vDisk size specified previously), assign a drive letter and file system. It is recommended to use ReFS whenever possible. Confirm and commit.

image

Finally, a usable Storage Space presented to the OS as a drive letter!
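
For reference, here is a rough PowerShell sketch of the same flow using the new-storagepool and new-virtualdisk cmdlets mentioned earlier. The friendly names, volume label and 500GB size are placeholders rather than values from this build, so adjust to taste:

  # Pool all local disks that are eligible for pooling (names are placeholders)
  $disks = Get-PhysicalDisk -CanPool $true
  $subsystem = Get-StorageSubSystem | Select-Object -First 1
  New-StoragePool -FriendlyName 'Pool01' -StorageSubSystemFriendlyName $subsystem.FriendlyName -PhysicalDisks $disks

  # Mirror resiliency and thin provisioning, as chosen in the wizard above
  New-VirtualDisk -StoragePoolFriendlyName 'Pool01' -FriendlyName 'vDisk01' -ResiliencySettingName Mirror -ProvisioningType Thin -Size 500GB

  # Initialize, partition and format the new disk with ReFS
  Get-VirtualDisk -FriendlyName 'vDisk01' | Get-Disk |
      Initialize-Disk -PartitionStyle GPT -PassThru |
      New-Partition -AssignDriveLetter -UseMaximumSize |
      Format-Volume -FileSystem ReFS -NewFileSystemLabel 'Space01'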

Performance Baseline

First, to set a baseline, I created a 4-disk RAID10 set using the integrated PERC H700 with 1GB cache (enabled). I copied a 6GB file back and forth between the C: and M: drives to test 100% read and 100% write operations. Write IO was fairly poor, not even breaking 20 IOPS, the RAID10 double write penalty in full effect. Reads were much better, as expected, nearing 300 IOPS at peak.

image

3-Disk Mirror Space Performance

To force the server to go to disk for all IO operations (since I simply moved the same 6GB file between volumes), I used a little utility called Flush Windows Cache File by DelphiTools. This very simple CLI tool flushes the file cache and memory working sets along with the modified and purged memory standby lists.

Test constants: 3-disk Mirror space, PERC caching disabled, single column (default for 3 disks), 6GB file used to simulate a 100% read operation and a 100% write operation:

image

image

Here are a few write operations followed by a read operation to demonstrate the disk performance capabilities. Write IOPS spike ~30 with read IOPS ~170. Interestingly, the write performance here is better than RAID10 on the PERC, while read performance is almost half.

image

Turning the PERC caching back on doesn’t help performance at all, for reads or writes.

image

4-Disk Space, 1 Column Performance

Test constants: 4-disk Mirror space, PERC caching enabled, single column, 6GB file used to simulate a 100% read operation and a 100% write operation:

When adding a 4th disk to a 3-disk Space, the column layout has already been determined to be singular, so this does not change. To use a two column layout with four disks, the space would need to be destroyed and recreated. Let’s see how four disks perform with one column.

image

You’ll notice in the vDisk output that the Allocated size is larger now with the 4th disk, but the number of columns is unchanged at 1.

image

Adding the 4th disk to the existing 3-disk Space doesn’t appear to have helped performance at all. Even with PERC caching enabled, there is no discernable difference in performance between 3 disks or 4 using a single column, so the extra spindle adds nothing here. Disabling PERC cache appears to make no difference so I’ll skip that test here.
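
A quick way to confirm the column count from PowerShell, and to create a two-column mirror when four or more disks are available, is sketched below (friendly names are placeholders; remember the column count can only be set at creation time):

  # Show how many columns the existing virtual disk is using
  Get-VirtualDisk -FriendlyName 'vDisk01' |
      Select-Object FriendlyName, ResiliencySettingName, NumberOfColumns

  # A two-column mirror has to be created that way from the start
  New-VirtualDisk -StoragePoolFriendlyName 'Pool01' -FriendlyName 'vDisk02' -ResiliencySettingName Mirror -ProvisioningType Thin -Size 500GB -NumberOfColumns 2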

image

4-Disk Space, 2 Column Performance

Test constants: 4-disk Mirror space, PERC caching enabled, two columns, 6GB file used to simulate a 100% read operation and a 100% write operation.

Using 4 disks will by default create Virtual Disks with 2 columns, as confirmed in PowerShell; this is also the minimum number of disks required to use two columns:

image

WOW! Huge performance difference with four disks and two columns. The read performance is consistently high (over 700 IOPS) with the writes performing in a wider arc upwards of 300 IOPS. This is a 1000% performance increase for writes and 300% increase for reads. Literally off the chart! Spaces is allowing each of these participating disks to exhibit their full performance potential individually.

image

Thinking this was too good to be true, I disabled the PERC cache and ran the test again. This is no fluke; I got very close to the same results the second time, with all disks participating. Incredible.

image

Plex Performance

Since I began this whole adventure with Plex in mind, it's fitting that I should end it there too. Streaming a compatible file across the network yielded the following results. Skipping around in the file, all four disks were read from at some point during the stream, half spiking ~110 IOPS, the other two ~80 IOPS. This is twice the observed performance of my home 3-disk 7.2k SATA-based HTPC.

image

Conclusions:

Using 15K SAS disks here, which are theoretically capable of ~200 IOPS each, performance is very good overall and seems to mitigate much of the overhead typically seen in a traditional RAID configuration. Spaces provides the resilient protection of a RAID configuration while allowing each disk in the set to maximize its own performance characteristics. The 3-disk, one-column Spaces configuration works but is very low performing. Four disks with one column don't perform any better, so there is significant value in running a configuration that supports two or more columns. Columns are what drive the performance differential, and the cool thing about Spaces is that they are tunable. Four disks with a two-column configuration should be your minimum consideration for Spaces, as the performance increase from adding that second column is substantial. Starting with three disks and adding a fourth will persist your single-column configuration, so it is not advised. Best to just start with at least four disks. PERC caching neither added to nor detracted from performance when used in conjunction with Spaces, so you could essentially ignore it altogether. Save your money and invest in SAS pass-through modules instead. The common theme that sticks out here is that the number of columns absolutely matters when it comes to performance in Spaces. The column-to-disk correlation ratio for a two-way mirror is 1:2, so you have to have a minimum of four disks to use two columns. Three disks and one column work fine for Plex, but if you can start with four, do that.

Resources:

Storage Spaces FAQs

Storage Spaces – Designing for Performance

CLUSTERING SERVER 2012 R2 WITH ISCSI STORAGE

Wednesday, December 31, 2014

Source: Exit The Fast Lane

Yay, last post of 2014! Haven’t invested in the hyperconverged Software Defined Storage model yet? No problem, there’s still time. In the meanwhile, here is how to cluster Server 2012 R2 using tried and true EqualLogic iSCSI shared storage.

EQL Group Manager

First, prepare your storage array(s), by logging into EQL Group Manager. This post assumes that your basic array IP, access and security settings are in place.  Set up your local CHAP account to be used later. Your organization’s security access policies or requirements might dictate a different standard here.

image

Create and assign an Access Policy to the VDS/VSS in Group Manager, otherwise this volume will not be accessible. This will make subsequent steps easier when it's time to configure ASM.

image

Create some volumes in Group Manager now so you can connect your initiators easily in the next step. It’s a good idea to create your cluster quorum LUN now as well.

image

Host Network Configuration

First configure the interfaces you intend to use for iSCSI on your cluster nodes. Best practice says that you should limit your iSCSI traffic to a private Layer 2 segment, not routed and only connecting to the devices that will participate in the fabric. This is no different from Fibre Channel in that regard, unless you are using a converged methodology and sharing your higher bandwidth NICs. If using Broadcom NICs you can choose Jumbo Frames or hardware offload; the larger frames will likely net a greater performance impact. Each host NIC used to access your storage targets should have a unique IP address able to access the network of those targets within the same private Layer 2 segment. While these NICs can technically be teamed using the native Windows LBFO mechanism, best practice says that you shouldn't, especially if you plan to use MPIO to load balance traffic. If your NICs will be shared (not dedicated to iSCSI alone) then LBFO teaming is supported in that configuration. To keep things clean and simple I'll be using 4 NICs: 2 dedicated to LAN, 2 dedicated to iSCSI SAN. Both LAN and SAN connections are physically separated to their own switching fabrics as well, which is also a best practice.

image
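
If you want to script the NIC setup, something along these lines works (interface aliases and IP addresses are placeholders, and the jumbo frame property name and value vary by NIC driver):

  # Assign static addresses to the two dedicated iSCSI NICs
  New-NetIPAddress -InterfaceAlias 'iSCSI-1' -IPAddress 10.10.50.11 -PrefixLength 24
  New-NetIPAddress -InterfaceAlias 'iSCSI-2' -IPAddress 10.10.51.11 -PrefixLength 24

  # Optionally enable jumbo frames; the exact DisplayName/DisplayValue depend on the driver
  Set-NetAdapterAdvancedProperty -Name 'iSCSI-1','iSCSI-2' -DisplayName 'Jumbo Packet' -DisplayValue '9014 Bytes'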

MPIO – the manual method

First, start the MS iSCSI service, which you will be prompted to do, and check its status in PowerShell using get-service -name msiscsi.

image

Next, install MPIO using Install-WindowsFeature Multipath-IO

Once installed and your server has been rebooted, you can set additional options in PowerShell or via the MPIO dialog under File and Storage Services > Tools.

image

Open the MPIO settings and tick "add support for iSCSI devices" under Discover Multi-Paths. Reboot again. Any change you make here will prompt you to reboot, so make all changes at once so you only have to do this one time.

image
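
The PowerShell equivalent of these steps looks roughly like this (a reboot is still required after claiming iSCSI devices for the Microsoft DSM):

  # Start the Microsoft iSCSI initiator service and set it to start automatically
  Set-Service -Name msiscsi -StartupType Automatic
  Start-Service -Name msiscsi
  Get-Service -Name msiscsi

  # Install MPIO, then claim iSCSI devices for the Microsoft DSM and reboot
  Install-WindowsFeature -Name Multipath-IO
  Enable-MSDSMAutomaticClaim -BusType iSCSI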

The easier way to do this from the onset is using the EqualLogic Host Integration Tools (HIT Kit) on your hosts. If you don’t want to use HIT for some reason, you can skip from here down to the “Connect to iSCSI Storage” section.

Install EQL HIT Kit (The Easier Method)

The EqualLogic HIT Kit will make it much easier to connect to your storage array as well as configure the MPIO DSM for the EQL arrays. Better integration, easier to optimize performance, better analytics. If there is a HIT Kit available for your chosen OS, you should absolutely install and use it. Fortunately there is indeed a HIT Kit available for Server 2012 R2.

image

Configure MPIO and PS group access via the links in the resulting dialog.

image

In ASM (launched via the “configure…” links above), add the PS group and configure its access. Connect to the VSS volume using the CHAP account and password specified previously. If the VDS/VSS volume is not accessible on your EQL array, this step will fail!

image

Connect to iSCSI targets

Once your server is back up from the last reboot, launch the iSCSI Initiator tool and you should see any discovered targets, assuming they are configured and online. If you used the HIT Kit you will already be connected to the VSS control volume and will see the Dell EQL MPIO tab.

image

Choose an inactive target in the discovered targets list and click Connect; be sure to enable multi-path in the pop-up that follows, then click Advanced.

image

Enable CHAP log on, specify the user/pw set up previously:

image

If your configuration is good the status of your target will change to Connected immediately. Once your targets are connected, the raw disks will be visible in Disk Manager and can be brought online by Windows.

image
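
If you skipped the HIT Kit and want to connect targets from PowerShell instead of the GUI, a sketch along these lines should work (the discovery address, target IQN and CHAP credentials are placeholders):

  # Register the group discovery address and list discovered targets
  New-IscsiTargetPortal -TargetPortalAddress 10.10.50.100
  Get-IscsiTarget

  # Connect a target persistently with MPIO and one-way CHAP
  Connect-IscsiTarget -NodeAddress 'iqn.2001-05.com.equallogic:0-example-volume01' -IsMultipathEnabled $true -IsPersistent $true -AuthenticationType ONEWAYCHAP -ChapUsername 'chapuser' -ChapSecret 'chapsecret1234'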

When you create new volumes on these disks, save yourself some pain down the road and give them the same label as what you assigned in Group Manager! The following information can be pulled out of the ASM tool for each volume:

image

Failover Clustering

With all the storage pre-requisites in place you can now build your cluster. Setting up a Failover Cluster has never been easier, assuming all your ducks are in a row. Create your new cluster using the Failover Cluster Manager tool and let it run all compatibility checks.

image

Make sure your patches and software levels are identical between cluster nodes or you’ll likely fail the clustering pre-check with differing DSM versions:

image

Once the cluster is built, you can manipulate your cluster disks and bring any online as required. Cluster disks cannot be brought online until all nodes in the cluster can access the disk.

image

Next add your cluster disks to Cluster Shared Volumes to enable multi-host read/write and HA.

image

The new status will be reflected once this change is made.

image

Configure your Quorum to use the disk witness volume you created earlier. This disk does not need to be a CSV.

image

Check your cluster networks and make sure that iSCSI is set to not allow cluster network communication. Make sure that your cluster network is set up to allow cluster network communication as well as client connections. This can of course be further segregated if desired, using additional NICs to separate cluster and client communication.

image

Now your cluster is complete and you can begin adding HA VMs (if using Hyper-V), SQL, File Server, or other roles as required.
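
For the script-minded, the clustering steps above condense to something like this (node, cluster, disk and network names plus the IP address are placeholders):

  # Validate and build the cluster
  Test-Cluster -Node 'node1','node2'
  New-Cluster -Name 'CLU01' -Node 'node1','node2' -StaticAddress 10.10.10.50

  # Add the shared disks, promote a data disk to CSV and set the disk witness quorum
  Get-ClusterAvailableDisk | Add-ClusterDisk
  Add-ClusterSharedVolume -Name 'Cluster Disk 2'
  Set-ClusterQuorum -DiskWitness 'Cluster Disk 1'

  # Keep the iSCSI network out of cluster communication (Role 0 = none)
  (Get-ClusterNetwork -Name 'iSCSI').Role = 0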

References:

http://blogs.technet.com/b/keithmayer/archive/2013/03/12/speaking-iscsi-with-windows-server-2012-and-hyper-v.aspx

http://blogs.technet.com/b/askpfeplat/archive/2013/03/18/is-nic-teaming-in-windows-server-2012-supported-for-iscsi-or-not-supported-for-iscsi-that-is-the-question.aspx