EMC VNX2 and VNX Future

Update 2013-10: StorageReview on EMC Next Generation VNX
Update 2013-08: News reports that VNX2 will come out in Sep 2013

While going through the Flash Memory Summit 2012 slide decks, I came across the session Flash Implications in Enterprise Storage Designs by Denis Vilfort of EMC, which provided information on the performance of the CLARiiON, VNX, a VNX2 and VNX Future.

A common problem with SAN vendors is that it is almost impossible to find meaningful performance information on their storage systems. The typical practice is to cite some meaningless numbers like IOPS to cache or the combined IO bandwidth of the FC ports, conveying the impression of massive IO bandwidth while actually guaranteeing nothing.

VNX (Original)

The original VNX was introduced in early 2011? The use of the new Intel Xeon 5600 (Westmere-EP) processors was progressive. The decision to employ only a single socket was not.

[Image: EMC VNX]

EMC did provide the table below on their VNX mid-range systems in the document “VNX: Storage Technology High Bandwidth Application” (h8929) showing the maximum number of front-end FC and back-end SAS channels along with the IO bandwidths for several categories.

[Image: EMC VNX]

It is actually unusual for a SAN storage vendor to provide such information, so good for EMC. Unfortunately, there is no detailed explanation of the IO patterns for each category.

Now obviously the maximum IO bandwidth can be reached in the maximum configuration, that is with all IO channels and all drive bays populated. There is also no question that maximum IO bandwidth requires all back-end IO ports populated and a sufficient number of front-end ports populated. (The VNX systems may support more front-end ports than necessary for configuration flexibility?)

However, it should not be necessary to employ the full set of hard disks to reach maximum IO bandwidth. This is because SAN systems are designed for capacity and IOPS. There are Microsoft Fast Track Data Warehouse version 3.0 and 4.0 documents for the EMC VNX 5300 or 5500 system. Unfortunately Microsoft has backed away from bare table scan tests of disk rates in favor of a composite metric. But it does seem to indicate that 30-50MB/s per disk is possible in the VNX.
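As a rough sanity check, here is a minimal Python sketch of the bandwidth implied by that 30-50MB/s per-disk figure; the disk counts below are hypothetical examples, not EMC-published configurations.

def scan_bandwidth_gb_per_sec(disks, mb_per_disk):
    # Aggregate sequential bandwidth if every disk sustains mb_per_disk MB/s.
    return disks * mb_per_disk / 1000.0

for disks in (75, 150, 300):
    low = scan_bandwidth_gb_per_sec(disks, 30)
    high = scan_bandwidth_gb_per_sec(disks, 50)
    print(f"{disks:4d} disks: {low:.1f}-{high:.1f} GB/s")

Even a few hundred disks at those per-disk rates would cover the 10-12GB/s figures discussed here, which is the point: maximum bandwidth should not require the maximum drive count.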

What is needed is a document specifying the configuration strategy for high bandwidth specific to SQL Server. This includes the number and type of front-end ports, the number of back-end SAS buses, the number of disk array enclosures (DAE) on each SAS bus, the number of disks in each RAID group and other details for each significant VNX model. It is also necessary to configure the SQL Server database file layout to match the storage system structure, but that should be our responsibility as DBA.

It is of interest to note that the VNX FTDW reference architectures do not employ Fast Cache (flash caching) or (auto) tiered storage. Both of these are an outright waste of money on DW systems and actually impede performance. It does make good sense to employ a mix of 10K/15K HDD and SSD in the DW storage system, but we should use the SQL Server storage engine features (filegroups and partitioning) to place data accordingly.

A properly configured OLTP system should also employ separate HDD and SSD volumes, again using filegroups and partitioning to place data correctly. The reason is that the database engine itself is a giant data cache, with perhaps as much as 1000GB of memory. What do we really expect to be in the 16-48GB SAN cache that is not in the 1TB database buffer cache? The IO from the database server is likely to be very misleading in terms of what data is important and whether it should be on SSD or HDD.

CLARiiON, VNX, VNX2, VNX Future Performance

Below are performance characteristics of the EMC mid-range line: CLARiiON, VNX, VNX2 and VNX Future. Given how rarely such figures are published, I found the following diagrams highly interesting and noteworthy. Here, the CLARiiON bandwidth is cited as 3GB/s and the current VNX as 12GB/s (versus 10GB/s in the table above).

[Image: EMC VNX]

I am puzzled that the VNX is only rated at 200K IOPS. That would correspond to 200 IOPS per disk and 1000 15K HDDs at low queue depth. I would expect there to be some capability to support short-stroking and high queue depth to achieve greater than 200 IOPS per 15K disk. The CLARiiON CX4-960 supported 960 HDD, yet the IOPS cited corresponds to the queue depth 1 performance of 200 IOPS x 200 HDD = 40K. Was there some internal issue in the CLARiiON? I do recall a CX3-40 generating 30K IOPS over 180 x 15K HDD.

A modern SAS controller can support 80K IOPS, so the VNX 7500 with 8 back-end SAS buses should handle more than 200K IOPS (HDD or SSD), perhaps as high as 640K? So is there some limitation in the VNX storage processor (SP), perhaps the inter-SP communication? Or a limitation of the write cache, which requires a write to memory in both SPs?
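The arithmetic behind the two paragraphs above, as a small Python sketch; the 200 IOPS per 15K HDD at queue depth 1 and the 80K IOPS per SAS controller are the rule-of-thumb figures used in the text, not measured values.

IOPS_PER_15K_HDD_QD1 = 200        # rule-of-thumb figure from the text
IOPS_PER_SAS_CONTROLLER = 80_000  # modern SAS controller, per the text

# Disk-limited view: a 200K IOPS rating implies ~1000 x 15K HDD at queue depth 1
print(200_000 / IOPS_PER_15K_HDD_QD1, "HDD implied by a 200K IOPS rating")

# CLARiiON CX4-960: the cited figure matches only ~200 HDD at queue depth 1
print(IOPS_PER_15K_HDD_QD1 * 200, "IOPS from 200 HDD at queue depth 1")

# Controller-limited view: 8 back-end SAS buses on the VNX 7500
print(IOPS_PER_SAS_CONTROLLER * 8, "IOPS if all 8 SAS buses ran at 80K each")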

VNX2?

Below (I suppose) is the architecture of the new VNX2. (Perhaps VNX2 will come out in May with EMC World?) In addition to transitioning from the Intel Xeon 5600 (Westmere) to the E5-2600 series (Sandy Bridge-EP), the diagram indicates that the new VNX2 will be dual-processor (socket), instead of the single socket used across the entire line of the original VNX. Considering that the 5500 and up are not entry systems, the single-socket decision was disappointing.

[Image: EMC VNX]

VNX2 provides a 5X increase in IOPS, to 1M, and a 2.3X increase in IO bandwidth, to 28GB/s. LSI mentions a FastPath option that dramatically increases the IOPS capability of their RAID controllers from 80K to 140-150K IOPS. My understanding is that this is done by completely disabling the cache on the RAID controller. The resources required to implement caching for a large array of HDDs can actually impede IOPS performance, and caching is even more of a drag on an array of SSDs.

The bandwidth objective is also interesting. The 12GB/s IO bandwidth of the original VNX would require 15-16 FC ports at 8Gbps (700-800MBps per port) on the front-end. The VNX 7500 has a maximum of 32 FC ports, implying 8 quad-port FC HBAs, 4 per SP.

The 8 back-end SAS buses imply 4 dual-port SAS HBAs per SP, as each SAS bus requires one SAS port on each SP. Together with the FC HBAs, this implies 8 HBAs per SP. The Intel Xeon 5600 processor connects over QPI to a 5520 IOH with 36 PCI-E gen 2 lanes, supporting 4 x8 and 1 x4 slots, plus a x4 Gen1 for other functions.

In addition, a link is needed for inter-SP communication. If one x8 PCI-E gen2 slot is used for this, then write bandwidth would be limited to 3.2GB/s (per SP?). A single socket should only be able to drive 1 IOH even though it is possible to connect 2. Perhaps the VNX 7500 is dual-socket?
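A back-of-envelope sketch of the per-SP PCI-E budget implied by the slot layout above, assuming roughly 0.4GB/s of usable bandwidth per gen2 lane (the same assumption behind the 3.2GB/s figure for an x8 slot).

NET_GB_PER_GEN2_LANE = 0.4          # ~3.2 GB/s per x8 gen2 slot, as above

slots = [(4, 8), (1, 4)]            # (count, lanes): four x8 slots plus one x4
total = sum(count * lanes * NET_GB_PER_GEN2_LANE for count, lanes in slots)
inter_sp = 8 * NET_GB_PER_GEN2_LANE # one x8 slot reserved for the inter-SP link

print(f"total usable PCI-E bandwidth per SP: {total:.1f} GB/s")
print(f"left for FC and SAS after the inter-SP link: {total - inter_sp:.1f} GB/s")

On those assumptions, a single-socket Westmere SP tops out not far above the 12GB/s the original VNX is rated for.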

An increase to 28GB/s could require 40 x 8Gbps FC ports (if 700MB/s is the practical limit of one port). A 2-socket Xeon E5-2600 should be able to handle this easily, with 4 memory channels and 5 x8 PCI-E gen3 slots per socket.
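The front-end port counts work out as follows, assuming the ~700MB/s of practical throughput per 8Gbps FC port stated above.

import math

MB_PER_8G_FC_PORT = 700   # practical per-port limit assumed in the text

for target_gb_per_sec in (12, 28):
    ports = math.ceil(target_gb_per_sec * 1000 / MB_PER_8G_FC_PORT)
    print(f"{target_gb_per_sec} GB/s needs roughly {ports} x 8Gbps FC ports")
    # at 750-800 MB/s per port, the 12 GB/s case drops to the 15-16 ports cited earlier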

VNX Future?

The future VNX is cited as 5M IOPS and 112GB/s. I assume this might involve the new NVM Express driver architecture, supporting distributed queues and high parallelism. Perhaps the reason both VNX2 and VNX Future are described is that the basic platform is ready, but not all the components needed to support the full bandwidth are?

[Image: EMC VNX]

The 5M IOPS should be no problem with an array of SSDs, and the new NVM Express architecture of course. But the 112GB/s bandwidth is curious. The number of FC ports required, even at a future 16Gbit/s, is too large to be practical. When expensive storage systems are finally able to do serious IO bandwidth, it will also be time to ditch FC and FCoE. Perhaps the VNX Future will support InfiniBand? The purpose of having extreme IO bandwidth capability is to be able to deliver all of it to a single database server on demand, not a little dribble here and there. If not, then the database server should have its own storage system.

The bandwidth is also too high for even a dual-socket E5-2600. Each Xeon E5-2600 has 40 PCI-E gen3 lanes, enough for 5 x8 slots. The nominal bandwidth per PCI-E gen3 lane is 1GB/s, but the realizable bandwidth might be only 800MB/s per lane, or 6.4GB/s per x8 slot. A 2-socket system in theory could drive 64GB/s. The storage system is comprised of 2 SPs, each SP being a 2-socket E5-2600 system.

To support 112GB/s, each SP must be able to simultaneously move 56GB/s on the storage side and 56GB/s on the host-side ports, for a total of 112GB/s per SP. In addition, suppose the 112GB/s figure is for reads and that the write bandwidth is 56GB/s. Then it is also necessary to support 56GB/s over the inter-SP link to guarantee write-cache coherency (unless it has been decided that write-caching flash on the SP is stupid).
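Put as a sketch, using the figures above (40 gen3 lanes per E5-2600 socket at roughly 0.8GB/s realizable per lane):

LANES_PER_SOCKET = 40
NET_GB_PER_GEN3_LANE = 0.8          # ~80% of the 1 GB/s nominal
SOCKETS_PER_SP = 2

sp_pcie_budget = LANES_PER_SOCKET * NET_GB_PER_GEN3_LANE * SOCKETS_PER_SP
print(f"realizable PCI-E bandwidth per 2-socket SP: {sp_pcie_budget:.0f} GB/s")

system_target = 112                  # GB/s, split across 2 SPs
per_sp_needed = system_target / 2 + system_target / 2   # host side + storage side per SP
print(f"needed per SP (host side + storage side): {per_sp_needed:.0f} GB/s")
# ...and that is before counting any inter-SP traffic for mirrored write cache.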

Is it possible the VNX Future has more than 2 SPs? Perhaps each SP is a 2-socket E5-4600 system, but the 2 SPs are linked via QPI? Basically this would be a 4-socket system running as 2 separate nodes, each node having its own OS image. Or could it simply be a 4-socket system? Later this year, Intel should be releasing Ivy Bridge-EX, which might have more bandwidth? Personally I am inclined to prefer a multi-SP system over a 4-socket SP.

Never mind, I think Haswell-EP will have 64 PCIe gen4 lanes at 16GT/s. That is 2GB/s per lane raw and 1.6GB/s per lane net, 12.8GB/s per x8 slot and roughly 100GB/s per socket. I still think it would be a good trick if one SP could communicate with the other over QPI instead of PCIe. Write-caching SSD at the SP level is probably stupid if the flash controller is already doing this? Perhaps the SP memory should be used for SSD metadata? In any case, there should be coordination between what each component does.
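For what it is worth, the speculative Haswell-EP arithmetic above works out as follows; the lane count and transfer rate are this post's guesses, not published specifications.

NET_GB_PER_LANE = 1.6               # speculative: 2 GB/s raw per lane, ~80% realizable
print(NET_GB_PER_LANE * 8, "GB/s per x8 slot")     # 12.8
print(NET_GB_PER_LANE * 64, "GB/s per socket")     # 102.4, i.e. roughly 100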

Summary

It is good to know that EMC is finally getting serious about IO bandwidth. I was of the opinion that the reason Oracle got into the storage business was that they were tired of hearing complaints from customers resulting from bad IO performance on the multi-million dollar SAN.

My concern is that the SAN vendor field engineers have been so thoroughly indoctrinated in the SaaS concept, in which only capacity matters, while having zero knowledge of bandwidth, that they will not be able to properly implement the IO bandwidth capability of the existing VNX, not to mention the even higher bandwidth of VNX2 and VNX Future.

Updates will be kept on QDPMA Storage.

Published Monday, February 25, 2013 8:27 AM by jchang

MY TROUBLESHOOTING A HP FLEX 10 FCOE CONNECTION TO A CISCO FIBER CHANNEL MDS SWITCH

Source: Brian's Virtually Useful Blog: VNX

First, some good links for documents about this:

Cisco, VMware, HP

The Problem:

Upon building a new HP chassis full of blades and hoping to connect the new blades to storage, I was building zoning rules connecting them to our EMC VNX. Inside Cisco Fabric Manager I did not see the HP blades listed, so I could not build the zoning rules. The other suspicious item was that inside HP Virtual Connect Manager, under the “SAN Fabrics” tab, the status showed a warning.

The Setup:

I had set this up as a fully meshed fabric with NPIV enabled everywhere: one HP chassis, two Flex 10 modules, two Cisco MDS switches, and one EMC VNX. My Flex 10 modules were each connected directly to both of the Cisco MDS switches. Each of the Cisco MDSs was connected directly to each of the SPs on the EMC VNX (fully meshed).

The Solution:

I worked on this for some time. After changing the connections on the Flex 10s to both go directly to the SAME Cisco MDS switch (removing the mesh from the compute side), HP Virtual Connect finally showed happy, the Cisco MDS began to see all the HP blades, and I was able to connect storage. So what did I do wrong originally? I am sure there is a great reason, but what part of Fiber Channel for Dummies did I miss?

EMC VNX STORAGE POOL DESIGN & CONFIGURATION

Source: Brian's Virtually Useful Blog: EMC VNX Storage Pool Design & Configuration

The EMC whitepaper on VNX best practices, h8268_VNX_Block_best_practices.pdf, is the way to go; I used version 31.5 as it is the most current one available.

Storage Pools vs Raid Groups vs MetaLuns

From a design perspective, MetaLUNs were basically replaced by Storage Pools. Storage Pools allow for the wide striping across many drives that MetaLUNs offer, but with a lot less complexity. MetaLUNs are now generally used to combine or expand a traditional LUN. RAID Groups have a maximum size of 16 disks, so for larger stripes they aren't a viable option. For situations where guaranteed performance isn't critical, go with Storage Pools; use RAID Groups if you need deterministic (guaranteed) performance. The reason is that you are probably going to create multiple LUNs out of your storage pool, which could lead to one busy LUN affecting the others.

Raid Level Selection

Assuming you're going with a Storage Pool, your options are RAID 5, 6, or 1/0. If you are using large drives (over 1TB), RAID 5 is not a good choice because of long rebuild times; RAID 6 is almost certainly the way to go. Always use the suggested drive counts in the pools: RAID 5 is 5 disks, or a number that evenly divides by 5; RAID 6 and RAID 1/0 is 8 disks, or a number that evenly divides by 8. If you use fewer than the recommended count you will be wasting space.
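A minimal sketch of that drive-count rule, assuming the multiples given above (5 for RAID 5, 8 for RAID 6 and RAID 1/0):

RECOMMENDED_MULTIPLE = {"raid5": 5, "raid6": 8, "raid10": 8}

def pool_drive_count_ok(raid_type, drives):
    # True when the pool size evenly divides by the recommended group size.
    return drives > 0 and drives % RECOMMENDED_MULTIPLE[raid_type] == 0

print(pool_drive_count_ok("raid5", 25))   # True: five 5-drive groups
print(pool_drive_count_ok("raid6", 30))   # False: not a multiple of 8, space is wasted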

How Big of a Storage Pool do I start with?

Create a homogeneous storage pool with the largest practical number of drives. Pools are designed for ease of use, and the pool dialog algorithmically implements many best practices. It is better to start with a large pool: when you add disks to a pool it does not (currently) restripe the existing data, so if you only added 5 disks to an existing 50-disk pool, the new LUNs would have much lower performance. The smallest practical pool is 20 drives (four R5 RAID groups). It is recommended practice to segregate the storage system's pool-based LUNs into two or more pools when availability or performance may benefit from separation.
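A toy illustration of why the small expansion hurts: with no restripe, LUNs created after the expansion are laid out mostly on the newly added drives. The drive counts are just the example from the paragraph above.

original_drives = 50     # existing pool
added_drives = 5         # small expansion, no restripe of existing data

existing_lun_span = original_drives
new_lun_span = added_drives
print(f"new LUNs span ~{new_lun_span} drives vs {existing_lun_span} for existing LUNs "
      f"({new_lun_span / existing_lun_span:.0%} of the spindles)")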

I am only covering a small portion of what you need to know.

When dealing with storage there are thousands of options: homogeneous vs. heterogeneous drives, thick vs. thin provisioning, FAST VP pools, drive speeds, FAST Cache and flash drives, storage tiering. The whitepaper above does a great job of detailing all of that, and I won't try to improve on what EMC has said.

EMC FAST: Whether to File and/or Block Tier

Source: Just Be Better… — EMC FAST: Whether to File and/or Block Tier

Storage performance needs in today's data center can change on a moment's notice. Data that once needed the backing of a freight train may only need the performance of a Vespa tomorrow. Having the ability to react to the ever-changing needs of one's data in an automated fashion allows efficiencies never before seen in EMC's midrange product line. Generally, as data ages its importance lessens, both from a business and a usage perspective. Utilizing FAST allows EMC customers to place data on the appropriate storage tier based on application requirements and service levels. Choosing between cost (SATA/NL-SAS) and performance (EFDs/SAS) is a thing of the past. Below are the what, when and why of EMC's FAST. The intent is to help one make an informed decision based on the needs of their organization.

Block Tiering (What, When and Why)

The What: FAST for VNX/CLARiiON is an array-based feature that utilizes Analyzer to move block-based data (slices of LUNs). By capturing performance characteristics, it can intelligently make predictions on where that data will be best utilized. Data is moved at the sub-LUN layer in 1GB slices, eliminating the need for and overhead of moving the full LUN. This means that portions of a LUN could exist on multiple disk types (FC, SATA, EFD). Migrations are seamless to the host and occur bidirectionally based on performance needs, i.e. FC to SATA, FC to EFD, SATA to FC, SAS to NL-SAS, etc. FAST is utilized at the Storage Pool layer and is not available within traditional RAID Groups. To utilize FAST v2 (which is sub-LUN tiering) you must be at FLARE 30 or above (4.30.000.5.xxx), and have both the Analyzer and FAST enablers installed on the array. Existing LUNs/data can migrate seamlessly and non-disruptively into storage pools using the VNX/CLARiiON LUN migration feature. Additionally, FAST operates with other array-based features such as SnapView, MirrorView, SAN Copy, RecoverPoint, etc., without issue. All FAST operations and scheduling are configurable through Unisphere.
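A toy sketch of the sub-LUN mechanic described above: the LUN is tracked in 1GB slices and the hottest slices get promoted to the flash tier during the scheduled relocation window. The slice statistics and tier size here are made up purely for illustration.

slice_iops = {0: 850, 1: 12, 2: 430, 3: 5, 4: 1210, 5: 60}   # slice index -> observed IOPS
efd_slices_available = 2                                     # flash-tier capacity, in 1GB slices

hottest = sorted(slice_iops, key=slice_iops.get, reverse=True)[:efd_slices_available]
print("promote slices", sorted(hottest), "to EFD; leave the rest on SAS/NL-SAS")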

The When: Automated tiering is a scheduled batch event and does not happen dynamically.

The Why: To better align your application service levels with the best storage type, and for ease of management, since storage pools are a requirement for FAST. Storage pools allow for concise management and eased growth from one location. Individual RAID Group and MetaLUN management is not needed to obtain high-end service levels with the use of storage pools and FAST. The idea going forward is to minimize disk purchasing requirements by moving hot and cold data to and from disk types that meet specific service levels for that data. If data is accessed frequently, it makes sense that it lives on either EFD (enterprise flash drives) or FC/SAS. If data is not accessed frequently, it ideally should live on SATA/NL-SAS. By utilizing FAST in your environment, you are utilizing your array in the most efficient manner while minimizing cap-ex costs.

File Tiering (What, When and Why)

The What: FAST for VNX File/Celerra utilizes the Cloud Tiering Appliance (or what was FMA, previously known as Rainfinity). The CTA utilizes a policy engine that allows movement of infrequently used files across different storage tiers based on last access times, modify times, size, filename, etc. As data is moved, the user perception is that the files still exist on primary storage. File retrieval (or recall) is initiated simply by clicking on the file; the file is then copied back to its original location. The appliance itself is available as a virtual appliance that can be imported into your existing VMware infrastructure via vCenter, or as a physical appliance (HW plus the software). Unlike FAST for VNX/CLARiiON, FAST for file allows you to tier across arrays (Celerra <-> VNX, Isilon or third-party arrays) or cloud service providers (namely Atmos, with other service providers coming). The introduction of CTA to your environment is non-disruptive. All operations for CTA are configurable through the CTA GUI. In summary, CTA can be used as a tiering engine, an archiving engine or a migration engine based on the requirements of your business. From an archiving perspective, CTA can utilize both Data Domain and Centera targets for long-term enforced file-level retention. As a migration engine, CTA can be utilized for permanent file moves from one array to another during technology refreshes or platform conversions. Note: CTA has no knowledge of the storage type; it simply moves files from one tier to another based on pre-defined criteria.
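A toy sketch of the kind of file-level policy the CTA applies, flagging files whose last access time is older than a cutoff as candidates for a lower tier; the path and threshold are hypothetical, and the real appliance has its own policy engine and GUI.

import os
import time

ROOT = "/mnt/nas_share"      # hypothetical file system export
CUTOFF_DAYS = 180            # hypothetical "infrequently used" threshold

cutoff = time.time() - CUTOFF_DAYS * 86400
for dirpath, _dirs, files in os.walk(ROOT):
    for name in files:
        path = os.path.join(dirpath, name)
        if os.stat(path).st_atime < cutoff:
            print("tier-down candidate:", path)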

The When: Automated tiering is designed to run at scheduled intervals (in batches); it does not happen dynamically or, I should say, continually.

The Why: Unstructured data, data that exists outside of a pre-defined data model such as SQL, is growing at an alarming rate. Think about how many Word docs, Excel spreadsheets, pictures, and text files exist in your current NAS or general file-served environments. Out of that number, what percentage hasn't been touched since its initial creation? A fair assessment would be 50% of that data; a more accurate assessment would probably be 80%. Archiving and tiering via CTA simply allows for more efficient use of your high-end and low-end storage. If 80% of your data is not accessed, or is accessed infrequently, it has no business being on fast spinning disk (FC or SAS). Ultimately this allows you to curb future spending on pricey high-end disk and focus more on purchasing capacity for where your data should sit: on low-end storage.

***Update***

As brought to my attention on the twitters (thanks -> @PBradz and @veverything), there is of course another option. Historically, data LUNs used by the data movers for file-specific data (CIFS, NFS) have only been supported on traditional RAID Group LUNs. With the introduction of the VNX, support has been extended to pool LUNs. This means that you can utilize FAST block tiering for the data that encompasses those LUNs. A couple of things to keep in mind when designing and utilizing it in this manner (more info here)…

  • The entire pool should be used by file only
  • Thick LUNs only within the pool
  • Same tiering policy for each pool LUN
  • Utilize compression and dedupe on the file side. Stay clear of block thin provisioning and compression.

There are of course numerous other recommendations that should be noted if you decide to go this route. Personally, it's taken me a while to warm up to storage pools. Like any new technology, it needs to gain my trust before I go all in on recommending it. Inherent bugs and inefficiencies early on have caused me to be somewhat cautious. Assuming you walk the line on how your pools are configured, this is a very viable means to file tier (so to speak) with the purchase of FAST block only. That being said, there is still benefit to using the CTA for long-term archiving, primarily off array, as currently FAST block is array-bound only. Define the requirements up front so you're not surprised on the back end as to what the technology can and cannot do. If the partner you're working with is worth their salt, you'll know all applicable options prior to that PO being cut…

UNEXPLAINED LUN TRESPASSES ON EMC VNX EXPLAINED …

Unexplained LUN trespasses on EMC VNX explained … | Duco Jaspars – vConsult

Recently I saw some unexplained LUN trespasses on an EMC VNX that is used in a vSphere 5 environment where we use VAAI.

Since we use pools on the VNX, it is advised to keep a LUN on the owning SP, to prevent unnecessary traffic over the internal bus between SPA and SPB. EMC says:

Avoid trespassing pool LUNs. Trespassing pool LUNs to another SP may adversely affect performance. After a pool LUN trespass, a pool LUN's private information remains under control of the original owning SP. This causes the trespassed LUN's I/Os to continue to be handled by the original owning SP. When this happens, both SPs are being used in handling the I/Os. Involving both SPs in an I/O increases the time used to complete an I/O.

When we investigated the cause of these trespasses, we noticed they seemed to happen during a Storage vMotion. The vmkernel.log showed (we use EMC PowerPath/VE in this environment) that PowerPath/VE seemed to be responsible for these trespasses:

PowerPath: EmcpEsxLogEvent:1252: Info:Mpx:Assigned volume 6006016029302F00A8445F18F351E111 to SPA

We could not understand why PowerPath decided to trespass these LUNs.

We opened a case with EMC and were pointed to EMC KB article emc287223
(EMC confirmed this is also true for PowerPath/VE 5.7)

This article says:

So what seems to happen is that when you do a Storage vMotion on a VNX from a datastore on one SP to a datastore on the other SP, after a certain amount of traffic has passed the internal bus between the SPs, the VNX decides it is better if both LUNs that hold the datastores are owned by the same SP, to keep the traffic caused by the Storage vMotion on one SP.

The effect of this decision is that the Storage vMotion traffic no longer traverses the internal bus, but a side effect is that all host traffic, which I consider more important than the Storage vMotion traffic, is now traversing the internal bus.

The scary part is this:

In the event where there are both high application I/O and VAAI XCOPY I/Os, it is possible that the LUNs will trespass back and forth.

That is not what we would like to see.

I would prefer to see some more intelligence in the VNX that would say “OK, I know this is Storage vMotion traffic I am sending over my internal bus, so if it takes a little longer to complete, it's OK, as long as I keep the path for my host traffic optimal by keeping the LUN on the owning SP.” So optimize for host traffic, not for Storage vMotion traffic.

The article also says the message in the vmkernel.log file generated by PowerPath/VE is not the correct message. Instead of “Assigned volume to SPx”, which suggests PowerPath/VE is responsible for the trespass, it should read “followed to SPx”.

I was told EMC engineering is aware of this issue, and I hope they are working on a solution.

By the way, we think this happens not only when using PowerPath/VE but also when using VMware's native multipathing, since this seems to be a VNX issue and not a PowerPath/VE-specific issue. We have not confirmed this in tests yet.

So to complete this story, why is this only happening when I use VAAI?

That's pretty simple to explain. If I do a Storage vMotion without VAAI, the host is responsible for moving the data between both datastores: the host reads from a LUN on SPA and writes to a LUN on SPB, so no traffic passes the internal bus.

So if this is an issue in your environment, you have two possible workarounds:

  1. Only do Storage vMotions between LUNs on the same SP, as EMC suggests, which could be an issue, especially when using Storage DRS (unless, of course, you only create Datastore Clusters with LUNs on the same SP, i.e. a Datastore Cluster for LUNs on SPA and a Datastore Cluster for LUNs on SPB)
  2. Disable the VAAI XCOPY primitive (HardwareAcceleratedMove)
    See Disabling the VAAI functionality in ESX/ESXi on how to do this.

If you do not see this as a major problem in your environment, just log in to your VNX now and then and trespass the LUNs back to their owning SPs.

EMC VNX – VSPHERE INTEGRATION – LUN MIGRATION

Source: EMC VNX – vSphere Integration – LUN Migration – YouTube

UNDOCUMENTED CELERRA / VNX FILE COMMANDS

Undocumented Celerra / VNX File commands | The SAN Guy

The .server_config command is undocumented by EMC; I assume they don't want customers messing with it. Use these commands at your own risk. 🙂

Below is a list of some of those undocumented commands; most are meant for viewing performance stats. I've had EMC support use the fcp command during a support call in the past. When using the command for fcp stats, I believe you need to run the 'reset' command first, as it enables the collection of statistics.

There are likely other parameters that can be used with .server_config but I haven’t discovered them yet.

TCP Stats:

To view TCP info:
.server_config server_x -v "printstats tcpstat"
.server_config server_x -v "printstats tcpstat full"
.server_config server_x -v "printstats tcpstat reset"

Sample Output (truncated):
TCP stats :
connections initiated 8698
connections accepted 1039308
connections established 1047987
connections dropped 524
embryonic connections dropped 3629
conn. closed (includes drops) 1051582
segs where we tried to get rtt 8759756
times we succeeded 11650825
delayed acks sent 537525
conn. dropped in rxmt timeout 0
retransmit timeouts 823

SCSI Stats:

To view SCSI IO info:
.server_config server_x -v "printstats scsi"
.server_config server_x -v "printstats scsi reset"

Sample Output:
This output needs to be in a fixed width font to view properly.  I can’t seem to adjust the font, so I’ve attempted to add spaces to align it.
Ctlr:  IO-pending  Max-IO  IO-total  Idle(ms)   Busy(ms)  Busy(%)
0:     0           53      44925729  122348758  19159954  13%
1:     0           1       1         141508682  0         0%
2:     0           1       1         141508682  0         0%
3:     0           1       1         141508682  0         0%
4:     0           1       1         141508682  0         0%

File Stats:

.server_config server_x -v "printstats filewrite"
.server_config server_x -v "printstats filewrite full"
.server_config server_x -v "printstats filewrite reset"

Sample output (Full Output):
13108 writes of 1 blocks in 52105250 usec, ave 3975 usec
26 writes of 2 blocks in 256359 usec, ave 9859 usec
6 writes of 3 blocks in 18954 usec, ave 3159 usec
2 writes of 4 blocks in 2800 usec, ave 1400 usec
4 writes of 13 blocks in 6284 usec, ave 1571 usec
4 writes of 18 blocks in 7839 usec, ave 1959 usec
total 13310 blocks in 52397489 usec, ave 3936 usec

FCP Stats:

To view FCP stats, useful for checking SP balance:
.server_config server_x -v "printstats fcp"
.server_config server_x -v "printstats fcp full"
.server_config server_x -v "printstats fcp reset"

Sample Output (Truncated):
This output needs to be in a fixed width font to view properly.  I can’t seem to adjust the font, so I’ve attempted to add spaces to align it.
Total I/O Cmds: +0%——25%——-50%——-75%—–100%+ Total 0
FCP HBA 0 |                                                                                            | 0%  0
FCP HBA 1 |                                                                                            | 0%  0
FCP HBA 2 |                                                                                            | 0%  0
FCP HBA 3 |                                                                                            | 0%  0
# Read Cmds: +0%——25%——-50%——-75%—–100%+ Total 0
FCP HBA 0 |                                                                                            | 0% 0
FCP HBA 1 |                                                                                            | 0% 0
FCP HBA 2 |                                                                                            | 0% 0
FCP HBA 3 |  XXXXXXXXXXX                                                          | 25% 0

Usage:

‘fcp’ options are:       bind …, flags, locate, nsshow, portreset=n, rediscover=n
rescan, reset, show, status=n, topology, version

‘fcp bind’ options are:  clear=n, read, rebind, restore=n, show
showbackup=n, write

Description:

Commands for ‘fcp’ operations:
fcp bind <cmd> ……… Further fibre channel binding commands
fcp flags ………….. Show online flags info
fcp locate …………. Show ScsiBus and port info
fcp nsshow …………. Show nameserver info
fcp portreset=n …….. Reset fibre port n
fcp rediscover=n ……. Force fabric discovery process on port n
Bounces the link, but does not reset the port
fcp rescan …………. Force a rescan of all LUNS
fcp reset ………….. Reset all fibre ports
fcp show …………… Show fibre info
fcp status=n ……….. Show link status for port n
fcp status=n clear ….. Clear link status for port n and then Show
fcp topology ……….. Show fabric topology info
fcp version ………… Show firmware, driver and BIOS version

Commands for ‘fcp bind’ operations:
fcp bind clear=n ……. Clear the binding table in slot n
fcp bind read ………. Read the binding table
fcp bind rebind …….. Force the binding thread to run
fcp bind restore=n ….. Restore the binding table in slot n
fcp bind show ………. Show binding table info
fcp bind showbackup=n .. Show Backup binding table info in slot n
fcp bind write ……… Write the binding table

NDMP Stats:

To Check NDMP Status:
.server_config server_x -v "printstats vbb show"

CIFS Stats:

This will output a CIFS report, including all servers, DC’s, IP’s, interfaces, Mac addresses, and more.

.server_config server_x -v "cifs"

Sample Output:

1327007227: SMB: 6: 256 Cifs threads started
1327007227: SMB: 6: Security mode = NT
1327007227: SMB: 6: Max protocol = SMB2
1327007227: SMB: 6: I18N mode = UNICODE
1327007227: SMB: 6: Home Directory Shares DISABLED
1327007227: SMB: 6: Usermapper auto broadcast enabled
1327007227: SMB: 6:
1327007227: SMB: 6: Usermapper[0] = [127.0.0.1] state:active (auto discovered)
1327007227: SMB: 6:
1327007227: SMB: 6: Default WINS servers = 172.168.1.5
1327007227: SMB: 6: Enabled interfaces: (All interfaces are enabled)
1327007227: SMB: 6:
1327007227: SMB: 6: Disabled interfaces: (No interface disabled)
1327007227: SMB: 6:
1327007227: SMB: 6: Unused Interface(s):
1327007227: SMB: 6:  if=172-168-1-84 l=172.168.1.84 b=172.168.1.255 mac=0:60:48:1c:46:96
1327007227: SMB: 6:  if=172-168-1-82 l=172.168.1.82 b=172.168.1.255 mac=0:60:48:1c:10:5d
1327007227: SMB: 6:  if=172-168-1-81 l=172.168.1.81 b=172.168.1.255 mac=0:60:48:1c:46:97
1327007227: SMB: 6:
1327007227: SMB: 6:
1327007227: SMB: 6: DOMAIN DOMAIN_NAME FQDN=DOMAIN_NAME.net SITE=STL-Colo RC=24
1327007227: SMB: 6:  SID=S-1-5-15-7c531fd3-6b6745cb-ff77ddb-ffffffff
1327007227: SMB: 6:  DC=DCAD01(172.168.1.5) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD02(172.168.29.8) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD03(172.168.30.8) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD04(172.168.28.15) ref=2 time=0 ms
1327007227: SMB: 6: >DC=SERVERDCAD01(172.168.1.122) ref=334 time=1 ms (Closest Site)
1327007227: SMB: 6: >DC=SERVERDCAD02(172.168.1.121) ref=273 time=1 ms (Closest Site)
1327007227: SMB: 6:
1327007227: SMB: 6: CIFS Server SERVERFILESEMC[DOMAIN_NAME] RC=603
1327007227: UFS: 7: inc ino blk cache count: nInoAllocs 361: inoBlk 0x0219f2a308
1327007227: SMB: 6:  Full computer name=SERVERFILESEMC.DOMAIN_NAME.net realm=DOMAIN_NAME.NET
1327007227: SMB: 6:  Comment=’EMC-SNAS:T6.0.41.3′
1327007227: SMB: 6:  if=172-168-1-161 l=172.168.1.161 b=172.168.1.255 mac=0:60:48:1c:46:9c
1327007227: SMB: 6:   FQDN=SERVERFILESEMC.DOMAIN_NAME.net (Updated to DNS)
1327007227: SMB: 6:  Password change interval: 0 minutes
1327007227: SMB: 6:  Last password change: Fri Jan  7 19:25:30 2011 GMT
1327007227: SMB: 6:  Password versions: 2, 2
1327007227: SMB: 6:
1327007227: SMB: 6: CIFS Server SERVERBKUPEMC[DOMAIN_NAME] RC=2 (local users supported)
1327007227: SMB: 6:  Full computer name=SERVERbkupEMC.DOMAIN_NAME.net realm=DOMAIN_NAME.NET
1327007227: SMB: 6:  Comment=’EMC-SNAS:T6.0.41.3′
1327007227: SMB: 6:  if=172-168-1-90 l=172.168.1.90 b=172.168.1.255 mac=0:60:48:1c:10:54
1327007227: SMB: 6:   FQDN=SERVERbkupEMC.DOMAIN_NAME.net (Updated to DNS)
1327007227: SMB: 6:  Password change interval: 0 minutes
1327007227: SMB: 6:  Last password change: Thu Sep 30 16:23:50 2010 GMT
1327007227: SMB: 6:  Password versions: 2
1327007227: SMB: 6:
 

Domain Controller Commands:

These commands are useful for troubleshooting a windows domain controller connection issue on the control station.  Use these commands along with checking the normal server log (server_log server_2) to troubleshoot that type of problem.

To view the current domain controllers visible on the data mover:

.server_config server_2 -v "pdc dump"

Sample Output (Truncated):

1327006571: SMB: 6: Dump DC for dom='<domain_name>’ OrdNum=0
1327006571: SMB: 6: Domain=<domain_name> Next trusted domains update in 476 seconds1327006571: SMB: 6:  oldestDC:DomCnt=1,179531 Time=Sat Oct 15 15:32:14 2011
1327006571: SMB: 6:  Trusted domain info from DC='<Windows_DC_Servername>’ (423 seconds ago)
1327006571: SMB: 6:   Trusted domain:<domain_name>.net [<Domain_Name>]
   GUID:00000000-0000-0000-0000-000000000000
1327006571: SMB: 6:    Flags=0x20 Ix=0 Type=0x2 Attr=0x0
1327006571: SMB: 6:    SID=S-1-5-15-d1d612b1-87382668-9ba5ebc0
1327006571: SMB: 6:    DC=’-‘
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x547,1355
1327006571: SMB: 6:   Trusted domain: <Domain_Name>
1327006571: SMB: 6:    Flags=0x22 Ix=0 Type=0x1 Attr=0x1000000
1327006571: SMB: 6:    SID=S-1-5-15-76854ac0-4c527104-321d5138
1327006571: SMB: 6:    DC=’\<Windows_DC_Servername>’
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x0,0
1327006571: SMB: 6:   Trusted domain:<domain_name>.net [<domain_name>]
1327006571: SMB: 6:    Flags=0x20 Ix=0 Type=0x2 Attr=0x0
1327006571: SMB: 6:    SID=S-1-5-15-88d60754-f3ed4f9d-b3f2cbc4
1327006571: SMB: 6:    DC=’-‘
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x547,1355
DC=DC0x0067a82c18 <Windows_DC_Servername>[<domain_name>](10.3.0.5) ref=2 time(getdc187)=0 ms LastUpdt=Thu Jan 19 20:45:14 2012
    Pid=1000 Tid=0000 Uid=0000
    Cnx=UNSUCCESSFUL,DC state Unknown
    logon=Unknown 0 SecureChannel(s):
    Capa=0x0 Nego=0x0000000000,L=0 Chal=0x0000000000,L=0,W2kFlags=0x0
    refCount=2 newElectedDC=0x0000000000 forceInvalid=0
    Discovered from: WINS

To enable or disable a domain controller on the data mover:

.server_config server_2 -v "pdc enable=<ip_address>"  Enable a domain controller

.server_config server_2 -v "pdc disable=<ip_address>"  Disable a domain controller

MemInfo:

.server_config server_2 -v "meminfo"

Sample Output (truncated):

CPU=0
3552907011 calls to malloc, 3540029263 to free, 61954 to realloc
Size     In Use       Free      Total nallocs nfrees
16       3738        870       4608   161720370   161716632
32      18039      17289      35328   1698256206   1698238167
64       6128       3088       9216   559872733   559866605
128       6438      42138      48576   255263288   255256850
256       8682      19510      28192   286944797   286936115
512       1507       2221       3728   357926514   357925007
1024       2947       9813      12760   101064888   101061941
2048       1086        198       1284    5063873    5062787
4096         26        138        164    4854969    4854943
8192        820         11        831   19562870   19562050
16384         23         10         33       5676       5653
32768          6          1          7        101         95
65536         12          0         12         12          0
524288          1          0          1          1          0
Total Used     Total Free    Total Used + Free
all sizes   18797440   23596160   42393600

MemOwners:

.server_config server_2 -v "help memowners"

Usage:
memowners [dump | showmap | set … ]

Description:
memowners [dump] – prints memory owner description table
memowners showmap – prints a memory usage map
memowners memfrag [chunksize=#] – counts free chunks of given size
memowners set priority=# tag=# – changes dump priority for a given tag
memowners set priority=# label=’string’ – changes dump priority for a given label
The priority value can be set to 0 (lowest) to 7 (highest).

Sample Output (truncated):

1408979513: KERNEL: 6: Memory_Owner dump.
nTotal Frames 1703936 Registered = 75,  maxOwners = 128
1408979513: KERNEL: 6:   0 (   0 frames) No owner, Dump priority 6
1408979513: KERNEL: 6:   1 (3386 frames) Free list, Dump priority 0
1408979513: KERNEL: 6:   2 (40244 frames) malloc heap, Dump priority 6
1408979513: KERNEL: 6:   3 (6656 frames) physMemOwner, Dump priority 7
1408979513: KERNEL: 6:   4 (36091 frames) Reserved Mem based on E820, Dump priority 0
1408979513: KERNEL: 6:   5 (96248 frames) Address gap based on E820, Dump priority 0
1408979513: KERNEL: 6:   6 (   0 frames) Rmode isr vectors, Dump priority 7

Storage Area Networking Best Practices: EMC VNX

vnx | The SAN Guy

Storage Area Networking Best Practices:

1.  Single Initiator Zoning.  Only put one initiator and the targets it will access in one zone.  I’ve already practiced this for many years, but it was interesting to hear the reasons for doing it that way.  It drastically reduces the number of server queries to the switch.

2.  Dynamic Interface Management.  Use Brocade port fencing.  It can prevent a SAN outage by giving you the ability to shut down a single host or port.

3.  Monitor for Congestion.  Congestion from one host can spread and cause problems in other operating environments.  Definitely enable Brocade bottleneck detection.

4.  Periodic SAN health checks.  Use the EMC SANity or Brocade SAN Health tools regularly.  If you're reading this, you're very likely already doing that. 🙂

5.  Monitor for bit errors.  These can lead to bb_credit loss.  The Brocade parameter portcfglongdistance should be set (bb_credit recovery option), which will prevent the problem.  Bit errors could still be a problem on F ports, however.

6.  Cable Hygiene.  Always clean your cables!  They are a major contributor to physical connectivity problems.

7.  Target Releases.  Always run a target release of Firmware.

VNX NAS Configuration Best Practices:

1. For transactional NAS, always use the correct transfer size for your workload.  The latest VNX OE release supports up to 1MB.

2. Use Jumbo frames end to end.

3. 10G is available now. Don’t use 1G anymore!

4. Use link aggregation for redundancy and load balancing, failsafe for high availability.

5. Use AVM for typical NAS configurations, MVM for transactional NAS where you have high concurrency from a single NFS stream to a single NFS export.

6. Use Direct Writes for apps that use concurrent IO or are asynchronous (Oracle, VMWare, etc.).

7. For NAS Pool LUNs, use thick LUNs that are all the same size, balance the SP designation for each one, use the same tiering policy for each one, use approximately 1 LUN for every 4 drives, and use thin enabled file systems to maximize the consumption of the highest tier.

8. Always enable FASTCache for NAS LUNs.  They benefit from it even more than block LUNs.

EMC VNX LUN MIGRATION PROVEN BEST PRACTISES

Source: EMC VNX LUN Migration Proven Best Practises ~ EMC VNX Training & SAN NAS Unified Storage Certification

LUN Migration moves data from a source LUN to a destination LUN (of the same or larger size) within a single storage system. This migration is accomplished without disruption to the applications running on the host. LUNs can be the target or source of LUN migration operations. LUN Migration can enhance performance or increase disk utilization for your changing business needs and applications by allowing the user to change LUN type and characteristics, such as RAID type or size, while their production volume remains online. LUNs can be moved between pools or to a traditional LUN in another RAID group. When migrating a thin LUN to another thin LUN, only the consumed space is copied. When migrating a thick Pool LUN or traditional LUN to a thin LUN, the space reclamation feature is invoked and only the consumed capacity is copied.

LUN migration allows data to be moved from one LUN to another regardless of RAID type, disk type, LUN type, speed, and number of disks in the RAID group or Storage Pool, with some restrictions. The process involved in the migration is the same across all VNX systems. On VNX series with the Thin Provisioning enabler installed, LUNs can be migrated from a Thin or Thick LUN to a traditional LUN and vice versa.
The LUNs used for migration may not be private LUNs or Hot Spares, nor may they be in the process of binding, expanding or migrating. Either LUN, or both LUNs, may be metaLUNs. Neither LUN may be a component LUN of a metaLUN. The destination LUN may not be part of SnapView Snapshot, SnapView Clone or MirrorView operation. This includes Clone Private LUNs, Write Intent Log LUNs, and Reserved LUN Pool LUNs.
Note the Destination LUN is required to be at least as large as the Source LUN, but may be larger.
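A minimal pre-check of the restrictions listed above, as a sketch; the field names are hypothetical and not any Unisphere or naviseccli API.

def can_migrate(source, destination):
    # Both LUNs must be ordinary LUNs that are not busy with another operation.
    for lun in (source, destination):
        if lun.get("is_private") or lun.get("is_hot_spare") or lun.get("busy"):
            return False
        if lun.get("is_meta_component"):          # component LUNs of a metaLUN are excluded
            return False
    if destination.get("in_replication"):         # SnapView / MirrorView / clone targets excluded
        return False
    # Destination must be at least as large as the source.
    return destination["size_gb"] >= source["size_gb"]

print(can_migrate({"size_gb": 500}, {"size_gb": 500}))   # True
print(can_migrate({"size_gb": 500}, {"size_gb": 400}))   # False: destination too small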

The LUN Migration feature is transparent to any host accessing the Source LUN, though there may be a performance impact. Copying of data proceeds while the Source LUN is available for read/write access, and the copy process may be terminated at any time. Once all data is copied, the Destination LUN assumes the full identity of the Source LUN, and the Source LUN is unbound as a security measure. The host accessing the Source LUN sees no identity change, though of course the LUN size may have changed. In that case, a host utility, such as Microsoft Windows diskpart, can be used to make the increased space available to the host OS.