iSCSI configuration in a cluster

Configuring the Microsoft iSCSI target software for use in a cluster

Source: http://clusteringformeremortals.com/2011/03/24/configuring-the-microsoft-iscsi-target-software-for-use-in-a-cluster/

Now that StarWind has stopped offering a free, limited version of their iSCSI target software, you might be looking for an alternative for your labs. Microsoft has recently made its iSCSI target software available as part of the Windows Storage Server 2008 R2 download on TechNet and MSDN. It is not for use in production and has some of its own licensing restrictions, but it works fine and it is free for TechNet and MSDN subscribers.

I recorded some really quick and dirty videos that aim to show you how to configure the iSCSI target and iSCSI initiator software in under 7 minutes. At the end, you will have a shared disk array ready to start your shared storage cluster. Hopefully when I get some more time I’ll actually write these steps out, but in a pinch this will give you the general idea of what needs to be done. There are plenty of other features, but for a lab environment this will do the trick.

http://screencast.com/t/2qUUDvZo6Zka – configuring the iSCSI target software and iSCSI initiator on the client

http://screencast.com/t/7m9ElSIdAbP – configuring the iSCSI initiator….continued
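If you prefer the command line to the GUI shown in the videos, the initiator side can also be scripted with the built-in iscsicli utility. This is only a rough sketch; the portal address and target IQN below are placeholders for whatever your target server actually presents.

# Register the target portal (replace with the IP of your iSCSI target server)
iscsicli QAddTargetPortal 192.168.1.10

# List the targets advertised by the portal
iscsicli ListTargets

# Log in to a target (replace with an IQN returned by ListTargets)
iscsicli QLoginTarget iqn.1991-05.com.microsoft:target1-lab-target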

Repost: CLUSTERING 101 – FIVE OPTIONS FOR SQL SERVER HIGH AVAILABILITY ON VMWARE

Source: CLUSTERING 101 – FIVE OPTIONS FOR SQL SERVER HIGH AVAILABILITY ON VMWARE

Re-posted from “Clustering For Mere Mortals”

As part of my Clustering 101 Webinar Series I take a look at five options for providing high availability for SQL Server running on VMware. The webinar was recorded and can be viewed here.

http://discover.us.sios.com/asset-reg-webinar-clustering-101-vmware-sql-server.html

Step-by-Step: Configuring a 2-node multi-site cluster on Windows Server 2008 R2 – Part 1

Source: Clustering for mere mortals

STEP-BY-STEP: CONFIGURING A 2-NODE MULTI-SITE CLUSTER ON WINDOWS SERVER 2008 R2 – PART 1

CREATING YOUR CLUSTER AND CONFIGURING THE QUORUM: NODE AND FILE SHARE MAJORITY

Introduction

Welcome to Part 1 of my series “Step-by-Step: Configuring a 2-node multi-site cluster on Windows Server 2008 R2″. Before we jump right in to the details, let’s take a moment to discuss what exactly a multi-site cluster is and why I would want to implement one. Microsoft has a great webpage and white paper that you will want to download to get all of the details, so I won’t repeat everything here. But basically a multi-site cluster is a disaster recovery solution and a high availability solution all rolled into one. A multi-site cluster gives you the best recovery point objective (RPO) and recovery time objective (RTO) available for your critical applications. With the introduction of Windows Server 2008 failover clustering, a multi-site cluster has become much more feasible thanks to cross-subnet failover and support for high-latency network communications.

I mentioned “cross-subnet failover” as a great new feature of Windows Server 2008 Failover Clustering, and it is a great new feature. However, SQL Server has not yet embraced this functionality, which means you will still be required to span your subnet across sites in a SQL Server multi-site cluster. As of TechEd 2009, the SQL Server team reported that they plan on supporting this feature, but they say it will come sometime after SQL Server 2008 R2 is released. For the foreseeable future you will be stuck with spanning your subnet across sites in a SQL Server multi-site cluster. There are a few other network-related issues that you need to consider as well, such as redundant communication paths, bandwidth and file share witness placement.

Network Considerations

All Microsoft failover clusters must have redundant network communication paths. This ensures that a failure of any one communication path will not result in a false failover and ensures that your cluster remains highly available. A multi-site cluster has this requirement as well, so you will want to plan your network with that in mind. There are generally two things that will have to travel between nodes: replication traffic and cluster heartbeats. In addition to that, you will also need to consider client connectivity and cluster management activity. You will want to be sure that whatever networks you have in place, you are not overwhelming the network or you will have unreliable behavior. Your replication traffic will most likely require the greatest amount of bandwidth; you will need to work with your replication vendor to determine how much bandwidth is required.

With your redundant communication paths in place, the last thing you need to consider is your quorum model. For a 2-node multi-site cluster configuration, the Microsoft recommended configuration is a Node and File Share Majority quorum. For a detailed description of the quorum types, have a look at this article.

The most common cause of confusion with the Node and File Share Majority quorum is the placement of the File Share Witness. Where should I put the server that is hosting the file share? Let’s look at the options.

Option 1 – place the file share in the primary site.

This is certainly a valid option for disaster recovery, but not so much for high availability. If the entire site fails (including the Primary node and the file share witness), the Secondary node in the secondary site will not come into service automatically; you will need to force the quorum online manually. This is because it will be the only remaining vote in the cluster. One out of three does not make a majority! If you can live with a manual step being involved for recovery in the event of a disaster, then this configuration may be OK for you.

Option 2 – place the file share in the secondary site.

This is not such a good idea. Although it solves the problem of automatic recovery in the event of a complete site loss, it exposes you to the risk of a false failover. Consider this: what happens if your secondary site goes down? In this case, your primary server (Node1) will also go offline, as it is now only a single node in the primary site and will no longer have a node majority. I can see no good reason to implement this configuration as there is too much risk involved.

Option 3 – place the file share witness in a 3rd geographic location

This is the preferred configuration as it allows for automatic failover in the event of a complete site loss and eliminates the possibility of a failure of the secondary site causing the primary node to go offline. By having a 3rd site host the file share witness you have eliminated any one site as a single point of failure, so now the cluster will act as you expect and automatic failover in the event of a site loss is possible. Identifying a 3rd geographic location can be challenging for some companies, but with the advent of cloud-based utility computing like Amazon EC2 and GoGrid, it is well within the reach of all companies to put a file share witness in the cloud and have the resiliency required for effective multi-site clusters. In fact, you may consider the cloud itself as your secondary data center and just fail over to the cloud in the event of a disaster. I think the possibilities of cloud-based computing and disaster recovery configurations are extremely enticing, and in fact I plan on doing a whole blog post on just that in the near future.

Configure the Cluster

Now that we have the basics in place, let’s get started with the actual configuration of the cluster. You will want to add the Failover Clustering feature to both nodes of your cluster. For simplicity’s sake, I’ve called my nodes PRIMARY and SECONDARY. This is accomplished very easily through the Add Features Wizard as shown below.

Figure 1 – Add the Failover Clustering Role

Next you will want to have a look at your network connections. It is best if you rename the connections on each of your servers to reflect the network that they represent. This will make things easier to remember later.

Figure 2 – Change the names of your network connections

You will also want to go into the Advanced Settings of your Network Connections (hit Alt to see Advanced Settings menu) of each server and make sure the Public network is first in the list.

Figure 3 – Make sure your public network is first

Your private network should only contain an IP address and subnet mask. No default gateway or DNS servers should be defined. Your nodes need to be able to communicate across this network, so make sure they can reach each other; add static routes if necessary.

Figure 4 – Private network settings
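For reference, the same private-network settings can be applied from the command line. The interface name and addresses below are only examples; substitute your own:

# Static IP and mask only; no gateway or DNS on the private network
netsh interface ipv4 set address name="Private" static 10.10.10.11 255.255.255.0

# Optional persistent route if the other node's private subnet is different
route -p add 10.10.20.0 mask 255.255.255.0 10.10.10.1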

Once you have your network configured, you are ready to build your cluster. The first step is to “Validate a Configuration”. Open up the Failover Cluster Manager and click on Validate a Configuration.

Figure 5 – Validate a Configuration

The Validation Wizard launches and presents you the first screen as shown below. Add the two servers in your cluster and click Next to continue.

Figure 6 – Add the cluster nodes

A multi-site cluster does not need to pass the storage validation (see Microsoft article). To skip the storage validation process, click on “Run only the tests I select” and click Continue.

Figure 7 – Select “Run only tests I select”

In the test selection screen, unselect Storage and click Next.

Figure 8 – Unselect the Storage test

You will be presented with the following confirmation screen. Click Next to continue.

Figure 9 – Confirm your selection

If you have done everything right, you should see a summary page that looks like the following. Notice that the yellow exclamation point indicates that not all of the tests were run. This is to be expected in a multi-site cluster because the storage tests are skipped. As long as everything else checks out OK, you can proceed. If the report indicates any other errors, fix the problem, re-run the tests, and continue.

Figure 10 – View the validation report
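If you prefer PowerShell over the wizard, the same validation can be run from an elevated prompt. This is only a rough equivalent using the node names from this example, skipping the storage tests as discussed above:

# Load the failover clustering cmdlets (Windows Server 2008 R2)
Import-Module FailoverClusters

# Validate both nodes, ignoring the storage tests
Test-Cluster -Node PRIMARY, SECONDARY -Ignore Storage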

You are now ready to create your cluster. In the Failover Cluster Manager, click on Create a Cluster.

Figure 11 – Create your cluster

The next step asks whether or not you want to validate your cluster. Since you have already done this, you can skip this step. Note that this will pose a bit of a problem later on if you are installing SQL Server, as its setup requires that the cluster has passed validation before proceeding. When we get to that point I will show you how to bypass this check via a command-line option in the SQL Server setup. For now, choose No and click Next.

Figure 12 – Skip the validation test

Next, you must choose a name and IP address for administering this cluster. This will be the name that you will use to administer the cluster, not the name of the SQL cluster resource which you will create later. Enter a unique name and IP address and click Next.

Note: This is also the computer name that will need permission to the File Share Witness as described later in this document.

Figure 13 – Choose a unique name and IP address
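Cluster creation can also be scripted. Here is a minimal sketch using the FailoverClusters module, assuming the cluster name MYCLUSTER and a placeholder administrative IP address:

Import-Module FailoverClusters

# Create the cluster with a static administrative IP; -NoStorage skips adding shared disks,
# which a multi-site cluster does not have at this point
New-Cluster -Name MYCLUSTER -Node PRIMARY, SECONDARY -StaticAddress 192.168.1.50 -NoStorage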

Confirm your choices and click Next.

Figure 14 – Confirm your choices

Congratulations! If you have done everything right you will see the following Summary page. Notice the yellow exclamation point; obviously something is not perfect. Click on View Report to find out what the problem may be.

Figure 15 – View the report to find out what the warning is all about

If you view the report, you should see a few lines that look like this.

Figure 16 – Error report

Don’t fret; this is to be expected in a multi-site cluster. Remember, we said earlier that we would be implementing a Node and File Share Majority quorum. We will change the quorum type from the current Node Majority (not a good idea in a two-node cluster) to a Node and File Share Majority quorum.

Implementing a Node and File Share Majority quorum

First, we need to identify the server that will hold our File Share witness. Remember, as we discussed earlier, this File Share witness should be located in a 3rd location, accessible by both nodes of the cluster. Once you have identified the server, share a folder as you normally would. In my case, I created a share called MYCLUSTER on a server named DEMODC.

The key thing to remember about this share is that you must give the cluster computer name read/write permissions to the share at both the share level and the NTFS level. If you recall back at Figure 13, I created my cluster and gave it the name “MYCLUSTER”. You will need to make sure you give the cluster computer account read/write permissions as shown in the following screen shots.

Figure 17 – Make sure you search for Computers

Figure 18 – Give the cluster computer account NTFS permissions

Figure 19 – Give the cluster computer account share level permissions
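If you would rather script the share and its permissions, something along these lines should work. The domain name (DEMO) and folder path are assumptions based on this example; the cluster computer account is simply the cluster name with a trailing $:

# Create the folder and share it, granting the cluster computer account Change access
New-Item -Path C:\MYCLUSTER -ItemType Directory
net share MYCLUSTER=C:\MYCLUSTER '/GRANT:DEMO\MYCLUSTER$,CHANGE'

# Grant the same account Modify rights at the NTFS level
icacls C:\MYCLUSTER /grant 'DEMO\MYCLUSTER$:(OI)(CI)M'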

Now with the shared folder in place and the appropriate permissions assigned, you are ready to change your quorum type. From Failover Cluster Manager, right-click on your cluster, choose More Actions and Configure Cluster Quorum Settings.

Figure 20 – Change your quorum type

On the next screen choose Node and File Share Majority and click Next.

Figure 21 – Choose Node and File Share Majority

In this screen, enter the path to the file share you previously created and click Next.

Figure 22 – Choose your file share witness

Confirm that the information is correct and click Next.

Figure 23 – Click Next to confirm your quorum change to Node and File Share Majority

Assuming you did everything right, you should see the following Summary page.

Figure 24 – A successful quorum change

Now when you view your cluster, the Quorum Configuration should say “Node and File Share Majority” as shown below.

Figure 25 – You now have a Node and File Share Majority quorum
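For reference, the same quorum change can be made from PowerShell. A minimal sketch using the share from this example:

Import-Module FailoverClusters

# Switch the cluster to a Node and File Share Majority quorum
Set-ClusterQuorum -Cluster MYCLUSTER -NodeAndFileShareMajority \\DEMODC\MYCLUSTER

# Confirm the new quorum configuration
Get-ClusterQuorum -Cluster MYCLUSTER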

The steps I have outlined up until this point apply to any multi-site cluster, whether it is a SQL, Exchange, File Server or other type of failover cluster. The next step in creating a multi-site cluster involves integrating your storage and replication solution into the failover cluster. This step will vary depending upon your replication solution, so you really need to be in close contact with your replication vendor to get it right. In Part 2 of my series, I will illustrate how SteelEye DataKeeper Cluster Edition integrates with Windows Server Failover Clustering to give you an idea of how one replication vendor’s solution works.

Other parts of this series will describe in detail how to install SQL, File Servers and Hyper-V in multi-site clusters. I will also have a post on considerations for multi-node clusters of three or more nodes.

THE DISK IS OFFLINE BECAUSE OF POLICY SET BY AN ADMINISTRATOR

Note from Tanny:

This post did not work for me, but it is worth sharing. In my case it was just a matter of bringing the storage resource online in the cluster resource manager.

For a 2008 R2 clustered environment, take a look at the cluster resource manager.
In my case it was a matter of bringing the storage resource online. We swing a LUN between different servers for quick backups and restores. The instructions did not work for me, but usually after presenting the LUN to the cluster or to any of the stand-alone environments, a quick rescan will bring the disk online and keep the previous drive letter.

Source: (Repost from the Happy SysAdm Blog) The disk is offline because of policy set by an administrator

You have just installed or cloned a VM with Windows 2008 Enterprise or Datacenter, or you have upgraded the VM to Virtual Hardware 7, and under Disk Management you get an error message saying:
“the disk is offline because of policy set by an administrator”.
This is because, by design, all virtual machine disk files (VMDK) are presented to VMs as SAN disks starting with Virtual Hardware 7 (the virtual hardware version introduced with ESX 4.0).
At the same time, and this is by design too, Microsoft has changed how SAN disks are handled by its Windows 2008 Enterprise and Datacenter editions.
In fact, on Windows Server 2008 Enterprise and Windows Server 2008 Datacenter (and this is true for R2 too), the default SAN policy is now VDS_SP_OFFLINE_SHARED for all SAN disks except the boot disk.
Having the policy set to Offline Shared means that your SAN disks will simply be offline at startup of your server, and if your paging file is on one of these secondary disks it will be unavailable.
Here’s the solution to this annoying problem.
What you have to do first is query the current SAN policy from the command line with DISKPART by issuing the following SAN command:
= = = = = = = = = = = = = = = = = =
DISKPART.EXE
 
DISKPART> san
 
SAN Policy : Offline Shared
= = = = = = = = = = = = = = = = = =
Once you have verified that the applied policy is Offline Shared, you have two options to set the disk to Online.
The first one is to log in to your system as an Administrator, click Computer Management > Storage > Disk Management, right-click the disk and choose Online.
The second one is to make a SAN policy change, then select the offline disk, force a clear of its read-only flag and bring it online. Follow these steps:
= = = = = = = = = = = = = = = = = =
DISKPART> san policy=OnlineAll
 
DiskPart successfully changed the SAN policy for the current operating system.
DISKPART> LIST DISK
 
  Disk ###  Status    Size     Free     Dyn  Gpt
  --------  --------  -------  -------  ---  ---
  Disk 0    Online    40 GB    0 B
* Disk 1    Offline   10 GB    1024 KB
 
DISKPART> select disk 1
 
Disk 1 is now the selected disk.
 
DISKPART> ATTRIBUTES DISK CLEAR READONLY
 
Disk attributes cleared successfully.
 
DISKPART> attributes disk
Current Read-only State : No
Read-only : No
Boot Disk : No
Pagefile Disk : No
Hibernation File Disk : No
Crashdump Disk : No
Clustered Disk : No
 
DISKPART> ONLINE DISK
 
DiskPart successfully onlined the selected disk.
= = = = = = = = = = = = = = = = = =
Once that is done, the drive mounts automagically.
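If you hit this regularly (for example on freshly cloned VMs), the same DISKPART steps can be scripted. A rough sketch, assuming the offline disk is disk 1:

# Write the DISKPART commands to a script file and run it non-interactively
$dpScript = @"
san policy=OnlineAll
select disk 1
attributes disk clear readonly
online disk
"@
$dpScript | Set-Content -Path "$env:TEMP\online-disk.txt"
diskpart.exe /s "$env:TEMP\online-disk.txt"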
  1. So, I’m trying all this, but the return message I get in DiskPart is “DiskPart failed to clear disk attributes.” Any further advice?

    DISKPART> san policy=OnlineAll

    DiskPart successfully changed the SAN policy for the current operating system.

    DISKPART> rescan

    Please wait while DiskPart scans your configuration…

    DiskPart has finished scanning your configuration.

    DISKPART> select disk 1

    Disk 1 is now the selected disk.

    DISKPART> attributes disk clear readonly

    DiskPart failed to clear disk attributes.

    DISKPART> attributes disk
    Current Read-only State : Yes
    Read-only : Yes
    Boot Disk : No
    Pagefile Disk : No
    Hibernation File Disk : No
    Crashdump Disk : No
    Clustered Disk : Yes

    DISKPART> san

    SAN Policy : Online All

    (Note from Tanny: take a look at the cluster resource manager and bring the storage resource online.)

  2. I see your problem. Have you checked that you have full access to the volume you want to change attributes for? Is it a cluster resource? I think so, because your log says “Clustered Disk : Yes”. In that case you should stop all nodes but one, and then you will be allowed to use DiskPart to reset the flags. The general idea is to grant the server you are connected to write access to the volume.
    Let me know if you need more help and, if so, please post more details about your configuration (servers and LUNs).
    Regards

  3. I am having this same problem. It is in a cluster and I have shut down the other node. I am still unable to clear the read-only flag.
    Please help?!

  4. Wacky problem – a SAN volume mounted to a 2008 (not R2) 32-bit Enterprise server had been working fine. After a reboot of the server, the disk was offline. Putting it back online was no problem, and the DiskPart details for the volume show “Read Only: No”. Got support from Dell and found that the volume was listed as read-only. Simple fix: change the volume to “Read Only: No” with DiskPart. Four hours later, the volume is marked as read-only again. No changes were made by us, and there is nothing in the Windows logs.
    The disk is a Dell/EMC SAN LUN, fiber connected, for the exclusive use of this machine. We have another LUN, almost the same size, attached the same way to this machine, with no problems there. I would appreciate any thoughts or places to look.

  5. Ahhh, nice! A perfect tutorial! Thanks a lot!


  6. Great article! I just spent 2 hours trying to figure out why my SAN disks weren’t showing and this was the fix.

    Thank you!


  7. Thank you, thank you, thank you! This article helped me with an IBM DS3000 and an IBM System x3650M3 running Windows Server 2008 R2. Thumbs up to you! I’d still be trying to figure out why I couldn’t configure these drives!


  8. These settings are good for Windows Server 2008 R1 and R2. It breaks again with R2 SP1 :-(. Is there any solution for R2 SP1?


  9. Thanks. Very helpful.


  10. Wonderful article… thanks a lot, dude!!


  11. This worked perfectly for me. I tried figuring it out on my own but just couldn’t get it to work within VMware Workstation.


  12. Let me know how I can remove the read-only attribute and bring the disk online. If I access the SAN directly then it is possible.
    I have two servers; on one server the disk shows as online, but on the second server it displays a reserved/offline message for the disk.

    I’m also trying all this, but the return message I get in DiskPart is the same problem: “DiskPart failed to clear disk attributes.” Any further advice?

    DISKPART> san policy=OnlineAll

    DiskPart successfully changed the SAN policy for the current operating system.

    DISKPART> rescan

    Please wait while DiskPart scans your configuration…

    DiskPart has finished scanning your configuration.

    DISKPART> select disk 1

    Disk 1 is now the selected disk.

    DISKPART> attributes disk clear readonly

    DiskPart failed to clear disk attributes.

    DISKPART> attributes disk
    Current Read-only State : Yes
    Read-only : Yes
    Boot Disk : No
    Pagefile Disk : No
    Hibernation File Disk : No
    Crashdump Disk : No
    Clustered Disk : Yes

    DISKPART> san

    SAN Policy : Online All


  13. Exactly the answer I was looking for!


  14. Well done – fixed me right up.


  15. Perfect answer for a vexing problem. I had no clue where to look.


  16. This is a really helpful article! Many thanks.


  17. Thanks for your reply!

  18. Thanks, this was very helpful for me.


  19. Hi, same problem here. The disk says it’s a clustered disk, but I don’t have it in the Failover Cluster Manager. It’s just a dedicated disk to one server from the SAN. I have cleared simultaneous connections and only one server is connected now, but it still won’t come online. Any help would be great.
    Thanks


CLUSTERING SERVER 2012 R2 WITH ISCSI STORAGE

Wednesday, December 31, 2014

Source: Exit The Fast Lane

Yay, last post of 2014! Haven’t invested in the hyperconverged Software Defined Storage model yet? No problem, there’s still time. In the meantime, here is how to cluster Server 2012 R2 using tried and true EqualLogic iSCSI shared storage.

EQL Group Manager

First, prepare your storage array(s) by logging into EQL Group Manager. This post assumes that your basic array IP, access and security settings are in place. Set up your local CHAP account to be used later. Your organization’s security access policies or requirements might dictate a different standard here.


Create and assign an Access Policy to the VDS/VSS in Group Manager, otherwise this volume will not be accessible. This will make subsequent steps easier when it’s time to configure ASM.

Create some volumes in Group Manager now so you can connect your initiators easily in the next step. It’s a good idea to create your cluster quorum LUN now as well.


Host Network Configuration

First, configure the interfaces you intend to use for iSCSI on your cluster nodes. Best practice says that you should limit your iSCSI traffic to a private Layer 2 segment, not routed and only connecting to the devices that will participate in the fabric. This is no different from Fibre Channel in that regard, unless you are using a converged methodology and sharing your higher-bandwidth NICs. If using Broadcom NICs you can choose Jumbo Frames or hardware offload; the larger frames will likely net a greater performance impact. Each host NIC used to access your storage targets should have a unique IP address able to access the network of those targets within the same private Layer 2 segment. While these NICs can technically be teamed using the native Windows LBFO mechanism, best practice says that you shouldn’t, especially if you plan to use MPIO to load balance traffic. If your NICs will be shared (not dedicated to iSCSI alone) then LBFO teaming is supported in that configuration. To keep things clean and simple I’ll be using 4 NICs: 2 dedicated to LAN, 2 dedicated to iSCSI SAN. Both LAN and SAN connections are physically separated onto their own switching fabrics as well; this is also a best practice.


MPIO – the manual method

First, start the MS iSCSI service, which you will be prompted to do, and check its status in PowerShell using Get-Service -Name msiscsi.


Next, install MPIO using Install-WindowsFeature Multipath-IO

Once installed and your server has been rebooted, you can set additional options in PowerShell or via the MPIO dialog under File and Storage Services > Tools.


Open the MPIO settings and tick “Add support for iSCSI devices” under Discover Multi-Paths. Reboot again. Any change you make here will ask you to reboot, so make all of your changes at once so you only have to do this one time.
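For those who prefer to script these steps, here is a possible PowerShell equivalent on Server 2012 R2; Enable-MSDSMAutomaticClaim does the same thing as ticking the iSCSI box in the MPIO dialog:

# Start the iSCSI initiator service and make it start automatically
Start-Service -Name msiscsi
Set-Service -Name msiscsi -StartupType Automatic

# Install MPIO and claim iSCSI devices for the Microsoft DSM
Install-WindowsFeature -Name Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI

# A reboot is still required before the MPIO changes take effect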


The easier way to do this from the onset is using the EqualLogic Host Integration Tools (HIT Kit) on your hosts. If you don’t want to use HIT for some reason, you can skip from here down to the “Connect to iSCSI Storage” section.

Install EQL HIT Kit (The Easier Method)

The EqualLogic HIT Kit will make it much easier to connect to your storage array as well as configure the MPIO DSM for the EQL arrays. Better integration, easier to optimize performance, better analytics. If there is a HIT Kit available for your chosen OS, you should absolutely install and use it. Fortunately there is indeed a HIT Kit available for Server 2012 R2.


Configure MPIO and PS group access via the links in the resulting dialog.


In ASM (launched via the “configure…” links above), add the PS group and configure its access. Connect to the VSS volume using the CHAP account and password specified previously. If the VDS/VSS volume is not accessible on your EQL array, this step will fail!


Connect to iSCSI targets

Once your server is back up from the last reboot, launch the iSCSI Initiator tool and you should see any discovered targets, assuming they are configured and online. If you used the HIT Kit you will already be connected to the VSS control volume and will see the Dell EQL MPIO tab.


Choose an inactive target in the discovered targets list and click Connect; be sure to enable multi-path in the pop-up that follows, then click Advanced.


Enable CHAP log on and specify the user/password set up previously:


If your configuration is good the status of your target will change to Connected immediately. Once your targets are connected, the raw disks will be visible in Disk Manager and can be brought online by Windows.
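On Server 2012 R2 the same connection can also be made with the iSCSI PowerShell cmdlets instead of the GUI. The portal address, IQN, and CHAP credentials below are placeholders only:

# Register the EQL group IP as a target portal (replace with your group IP)
New-IscsiTargetPortal -TargetPortalAddress 10.10.50.10

# List the discovered targets, then connect with CHAP and multi-path enabled
Get-IscsiTarget
Connect-IscsiTarget -NodeAddress 'iqn.2001-05.com.equallogic:0-example-volume1' `
    -AuthenticationType ONEWAYCHAP -ChapUsername 'chapuser' -ChapSecret 'chapsecret123' `
    -IsMultipathEnabled $true -IsPersistent $true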


When you create new volumes on these disks, save yourself some pain down the road and give them the same label as what you assigned in Group Manager! The following information can be pulled out of the ASM tool for each volume:


Failover Clustering

With all the storage pre-requisites in place you can now build your cluster. Setting up a Failover Cluster has never been easier, assuming all your ducks are in a row. Create your new cluster using the Failover Cluster Manager tool and let it run all compatibility checks.


Make sure your patches and software levels are identical between cluster nodes or you’ll likely fail the clustering pre-check with differing DSM versions:


Once the cluster is built, you can manipulate your cluster disks and bring any online as required. Cluster disks cannot be brought online until all nodes in the cluster can access the disk.


Next add your cluster disks to Cluster Shared Volumes to enable multi-host read/write and HA.
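This can also be done in PowerShell. A quick sketch, assuming a cluster disk resource named “Cluster Disk 2”:

Import-Module FailoverClusters

# List the available physical disk resources, then add one to Cluster Shared Volumes
Get-ClusterResource | Where-Object { $_.ResourceType -eq 'Physical Disk' }
Add-ClusterSharedVolume -Name 'Cluster Disk 2'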


The new status will be reflected once this change is made.


Configure your Quorum to use the disk witness volume you created earlier. This disk does not need to be a CSV.
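A rough PowerShell equivalent, assuming the witness disk resource is named “Cluster Disk 1”:

# Use the small quorum LUN created earlier as a disk witness
Set-ClusterQuorum -NodeAndDiskMajority 'Cluster Disk 1'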


Check your cluster networks and make sure that the iSCSI network is set to not allow cluster network communication. Make sure that your cluster network is set up to allow cluster network communication as well as allowing client connections. This can of course be further segregated if desired using additional NICs to separate cluster and client communication.
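These settings map to the Role property of each cluster network and can be scripted as well; the network names here are assumptions based on this example:

# 0 = no cluster communication (iSCSI), 3 = cluster and client traffic (LAN)
(Get-ClusterNetwork -Name 'iSCSI').Role = 0
(Get-ClusterNetwork -Name 'LAN').Role = 3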


Now your cluster is complete and you can begin adding HA VMs (if using Hyper-V), SQL Server, File Server, or other roles as required.

References:

http://blogs.technet.com/b/keithmayer/archive/2013/03/12/speaking-iscsi-with-windows-server-2012-and-hyper-v.aspx

http://blogs.technet.com/b/askpfeplat/archive/2013/03/18/is-nic-teaming-in-windows-server-2012-supported-for-iscsi-or-not-supported-for-iscsi-that-is-the-question.aspx

DHCP: Clustering DHCP in Windows Server 2012 R2

Microsoft Step-by-Step: Configure DHCP for Failover

Microsoft: Understand and Deploy DHCP Failover

Microsoft Blog: Pierre Roman Step-by-Step: DHCP High Availability with Windows Server 2012 R2

The following is a repost of “DHCP Failover with Microsoft Server 2012 R2” from Windowsnetworking.com by Scott D. Lowe.  This guy knows his stuff.

Introduction

One of the great features in Windows Server 2012 R2 is the DHCP failover for Microsoft DHCP scopes. Those who have experienced Microsoft DHCP management in Windows 2000/2003/2008 will recall that one of the long-requested features was a true load balancing and failover option.

Prior to Windows Server 2012, the only failover option was to have full copies of the scope definitions on a secondary server with the scope disabled. You would have to manually enable the scope in the event of a failure at the primary server, but this would be time consuming and could cause IP address conflicts as machines requested new IP information from the new DHCP server.

Alternatively, you could load balance the scopes by using the same scope and gateway information, but with different portions of the scope active on each server. This can be done by using a technique known as DHCP scope splitting.


Load Balancing versus Failover

One of the first things that can confuse many Systems Administrators is the difference between load balancing and failover. They aren’t always mutually exclusive, so it is even more important to understand the key differences.

Load balancing is the use of an active-active configuration that shares services between multiple nodes, which may be spread among remote sites for safety and redundancy. In application environments, load balancing will be configured based on a balancing algorithm that may be as simple as round robin, and as complex as route cost based on latency and response for service delivery by locality.

Failover is where the environment suffers an outage of a service which triggers the failover of that service function to a secondary server or site. The assumption for most failover configurations is that the primary server is completely unavailable. With our DHCP failover, we can actually mix the two roles of failover and load balancing by operating the scope on multiple servers across your data centers. This hybrid operational model greatly reduces the risk of service loss.

What about our DHCP Configuration on Network Equipment?

As you may already know, we require configuration at our routers, and potentially Layer 3 switches, to make our client nodes aware of the DHCP servers that are servicing the subnet. This is done with the DHCP Relay Agent, sometimes known as the DHCP Helper address.

By defining the DHCP Relay, any nodes on that switch with access to the appropriate VLAN will go through the normal process of requesting an IPv4 DHCP address. This is done with the DHCPDISCOVER request which is forwarded by the router to the DHCP server by the DHCP Relay Agent. The DHCP server returns a DHCPOFFER, followed by a DHCPREQUEST, and finally a DHCPACK confirming the IP address and lease information.

This is all good, but what happens in the case where we have more than one DHCP server, and require multiple DHCP Relay Agent addresses at the router? This is a great question, and it explains why we will really appreciate the new Windows Server 2012 R2 DHCP services.

If we have multiple 2003/2008 DHCP servers, the scopes must be disabled on the second server, otherwise both DHCP servers will be replying to the DHCPDISCOVER broadcast and may hand out the same IP address to more than one workstation since they aren’t aware of each other.

Under Windows Server 2012 R2, we have the new failover DHCP model that has the primary server actively servicing DHCP requests, while the failover instance is aware of, but not active in, the process.

Let’s take a look at the configuration of a failover scope before we go any further.

Configuring a DHCP Failover Scope

There is no difference in the base configuration of a DHCP scope to prepare it for being protected with a failover scope on a secondary server. Here is a sample scope named Failover Scope A (10.20.30.0/24) that we configure just like a normal scope.

Figure 1

The IP address configuration is also done in the typical way that we have done in any DHCP server up to this point.

Figure 2

Now that we have our demo scope available, we will right-click the Scope in the DHCP Manager console and select Configure Failover:

Figure 3

The Failover Configuration wizard opens up with a list of scopes that are available to protect. In our sample, we have the 10.20.30.0/24 scope:

Figure 4

Next up we are asked about assigning a partner server for our scope. We will be using one named PartnerServer in our example:

Figure 5

By default, a name is chosen for the replication relationship. It is ideal to pick something that will be meaningful to your team for continued management.

The parameters in our failover relationship allow us to set the failover mode, which can be Active-Active (load balance) or Active-Passive (hot standby). In load balancing mode, there is an option to set the weight of the scope distribution at the partner. The default is a 50% split scope.

Figure 6

In this case, we are configuring a Hot standby server.

Figure 7

You will need a shared secret password configured:

Figure 8

Once you complete the failover scope wizard, the process will complete and the scope will be created on the target server. The process will show the status as successful for each of the steps in the wizard:

Figure 9

With the scope configured for hot standby failover, we now have a fully operational scope that is getting asynchronous updates from the source server to keep track of current leases, reservations, options, and any other scope-specific configuration parameters. It’s just that easy!

For our own curiosity, it is good to check what the failover scope looks like. You simply open the target DHCP partner server, expand the scopes and right-click the failover scope in the DHCP Manager window. On the properties page there is a Failover tab which displays all of the information about the connection.

Figure 10

Now that failover is configured, you will also see additional options available on the context menu for the protected scope. Using these options, you can remove the failover configuration, force replication of the scope, and force replication of the relationship information:

Figure 11

Options outside the GUI

If you’ve been managing DHCP up to this point on Windows 2008 or earlier servers, you will know that most of the configuration is done in the GUI. In fact, much of the configuration was only available in the GUI in previous versions.

Under Windows Server 2012 and Windows Server 2012 R2, PowerShell has become vastly more powerful and important in the Microsoft ecosystem. New and enhanced DHCP Cmdlets are available by simply importing the DHCP Server module. This is done with a simple one-liner Import-Module DHCPServer as you can see below.

Once loaded, we can query all of the DHCP CmdLets using Get-Command *dhcp* as shown partially here:

Figure 12
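As an illustration, the hot-standby relationship built in the wizard above could also be created with these cmdlets. The server names, relationship name, and shared secret below are only placeholders drawn from the walkthrough, so verify the parameters with Get-Help Add-DhcpServerv4Failover before relying on them:

Import-Module DhcpServer

# Create a hot-standby failover relationship for the demo scope
Add-DhcpServerv4Failover -ComputerName 'DHCP1' -Name 'DHCP1-PartnerServer' `
    -PartnerServer 'PartnerServer' -ScopeId 10.20.30.0 `
    -ServerRole Active -SharedSecret 'Pa55w.rd'

# Review the relationship and force a replication of the scope
Get-DhcpServerv4Failover -ComputerName 'DHCP1'
Invoke-DhcpServerv4FailoverReplication -ComputerName 'DHCP1' -ScopeId 10.20.30.0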

Better Features and Better Management

This is a great step forward with the Microsoft DHCP tool set which provides better failover options, better load balancing options, and more options for managing the environment. By extending the management into a scriptable format with PowerShell, this is an excellent tool for administrators to move towards more orchestration of their Microsoft DHCP environments.