What is Clustering?
A cluster is a group of independent computer systems, referred to as nodes, that work together as a unified computing resource. A cluster provides a single name for clients to use and a single administrative interface, and it ensures that data remains consistent across nodes.
Microsoft servers provide three technologies to support clustering:
·         Network Load Balancing (NLB),
·         Component Load Balancing (CLB), and
·         Microsoft Cluster Service (MSCS) Failover Cluster.

Network Load Balancing

Network Load Balancing acts as a front-end cluster, distributing incoming IP traffic across a group of servers, and is ideal for enabling incremental scalability and outstanding availability for e-commerce Web sites. Up to 32 servers can share a single virtual IP address, and NLB enhances scalability by distributing client requests across them. As traffic increases, additional servers can be added to the cluster, up to the 32-server limit. NLB also provides high availability by automatically detecting the failure of a server and repartitioning client traffic among the remaining servers within 10 seconds, while providing users with continuous service.

Component Load Balancing

Component Load Balancing distributes workload across multiple servers running a site's business logic. It provides for dynamic balancing of COM+ components across a set of up to eight identical servers. Both CLB and Microsoft Cluster Service can run on the same group of machines.

Failover Clustering

Cluster Service acts as a back-end cluster; it provides high availability for applications such as databases, messaging, and file and print services. MSCS attempts to minimize the effect of failure on the system when any node (a server in the cluster) fails or is taken offline.

Figure 1. Three Microsoft server technologies support clustering
MSCS failover capability is achieved through redundancy across the multiple connected machines in the cluster, each with an independent failure state. The maximum number of nodes per cluster depends on the Windows version:
Windows 2003        - 8 Nodes
Windows 2008/2008R2 - 16 Nodes
Windows 2012/2012R2 - 64 Nodes

Each node has its own memory, system disk, operating system, and subset of the cluster's resources. If a node fails, another node takes ownership of the failed node's resources (this process is known as "failover"). Microsoft Cluster Service then registers the network address for the resource on the new node so that client traffic is routed to the system that is available and now owns the resource. When the failed resource is later brought back online, MSCS can be configured to redistribute resources and client requests appropriately (this process is known as "failback").
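For example, a manual failover and failback can be driven with cluster.exe. This is a minimal sketch; the group name "SQL Server (SQLINST1)" and the node names Node1/Node2 are illustrative, not from this article:

:: Show which node currently owns each group
cluster group /status

:: Move the group to the other node (a manual failover)
cluster group "SQL Server (SQLINST1)" /move:Node2

:: Once the original node is healthy again, move the group back (failback)
cluster group "SQL Server (SQLINST1)" /move:Node1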

Microsoft Cluster Service is based on the shared-nothing clustering model. The shared-nothing model dictates that while several nodes in the cluster may have access to a device or resource, the resource is owned and managed by only one system at a time.

Microsoft Cluster Service comprises three key components:
the Cluster Service, the Resource Monitor, and resource DLLs.

The Cluster Service

The Cluster Service is the core component and runs as a high-priority system service. The Cluster Service controls cluster activities and performs such tasks as coordinating event notification, facilitating communication between cluster components, handling failover operations and managing the configuration. Each cluster node runs its own Cluster Service.

The Resource Monitor

The Resource Monitor is an interface between the Cluster Service and the cluster resources, and runs as an independent process. The Cluster Service uses the Resource Monitor to communicate with the resource DLLs. The DLL handles all communication with the resource, so hosting the DLL in a Resource Monitor shields the Cluster Service from resources that misbehave or stop functioning. Multiple copies of the Resource Monitor can be running on a single node, thereby providing a means by which unpredictable resources can be isolated from other resources.

The Resource DLL

The third key Microsoft Cluster Service component is the resource DLL. The Resource Monitor and resource DLL communicate using the Resource API, which is a collection of entry points, callback functions and related structures and macros used to manage resources.

What is a Quorum?

What is a quorum? To put it simply, a quorum is the cluster’s configuration database. The database resides in a file named \MSCS\quolog.log. The quorum is sometimes also referred to as the quorum log.
Although the quorum is just a configuration database, it has two very important jobs. First, it tells the cluster which node should be active. Think about it for a minute: for a cluster to work, all of the nodes must function in a way that allows the virtual server to behave in the desired manner, and for that to happen, each node must have a crystal clear understanding of its role within the cluster. This is where the quorum comes into play. The quorum tells the cluster which node is currently active and which node or nodes are in standby.
It is extremely important for nodes to conform to the status defined by the quorum. It is so important in fact, that Microsoft has designed the clustering service so that if a node cannot read the quorum, that node will not be brought online as a part of the cluster.
The other thing that the quorum does is to intervene when communications fail between nodes. Normally, each node within a cluster can communicate with every other node in the cluster over a dedicated network connection. If this network connection were to fail though, the cluster would be split into two pieces, each containing one or more functional nodes that cannot communicate with the nodes that exist on the other side of the communications failure.
When this type of communications failure occurs, the cluster is said to have been partitioned. The problem is that both partitions have the same goal: to keep the application running. The application can't run on multiple servers simultaneously, though, so there must be a way of determining which partition gets to run the application. This is where the quorum comes in. The partition that "owns" the quorum is allowed to continue running the application. The other partition is removed from the cluster.
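On Windows Server 2003, you can check which resource currently acts as the quorum from any node; a minimal sketch with cluster.exe:

:: Display the current quorum resource and the path to the quorum log
cluster /quorum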

Types of Quorums

So far in this article, I have been describing a quorum type known as a standard quorum. The main idea behind a standard quorum is that it is a configuration database for the cluster, stored on a shared hard disk that is accessible to all of the cluster's nodes.

In Windows Server 2003, Microsoft introduced a new type of quorum called the Majority Node Set Quorum (MNS). The thing that really sets an MNS quorum apart from a standard quorum is the fact that each node has its own, locally stored copy of the quorum database.

Windows Server 2003 therefore offers three quorum options:
1) Quorum Disk (the standard quorum)
2) Local Only Quorum
3) MNS (Majority Node Set)

Windows 2008/2008 R2/2012 have different types of quorums; these quorum modes are described in the "Types of Quorums" list later in this article.
How Clustering Works
In a two-node Active/Active setup, if one of the nodes fails, the other active node takes over the active resources of the failed instance. When creating a two-node cluster, it is always preferred that each node be connected to a shared disk array using either Fibre Channel or SCSI cables.

The shared data in the cluster must be stored on shared disks; otherwise, when a failover occurs, the node that is taking over in the cluster cannot access it. As we are already aware, clustering does not help protect data or the shared disk array it is stored on, so it is very important that you select a shared disk array that is very reliable and includes fault tolerance.
Both nodes of the cluster are also connected to each other via a private network. Each node uses this private network to keep track of the status of the other node. For example, if one of the nodes experiences a hardware failure, the other node will detect this and automatically initiate a failover.

When clients initiate a connection, how do they know what to do when a failover occurs? This is the most intelligent part of Microsoft Cluster Services. When a user establishes a connection with SQL Server, it is through SQL Server's own virtual name and virtual TCP/IP address. This name and address are shared by both of the servers in the cluster. In other words, both nodes can be defined as preferred owners of this virtual name and TCP/IP address.
Usually, a client will connect to the SQL Server cluster using the virtual name used by the cluster; as far as the client is concerned, there is only one physical SQL Server, not two. Assuming that node X of the SQL Server cluster is the node running SQL Server 'A' in an Active/Active cluster design, node X will respond to the client's requests. But if node X fails and failover to the next node Y occurs, the cluster will still retain the same SQL Server virtual name and TCP/IP address 'A', although now a new physical server will be responding to the client's requests.

During the failover period, which can last up to several minutes, clients will be unable to access SQL Server, so there is a small amount of downtime when failover occurs. The exact amount of time depends on the number and sizes of the databases on SQL Server, and how active they are.

Clustering Terms
Cluster Nodes
A cluster node is a server within a cluster group. A cluster node can be Active or Passive, depending on the SQL Server instance installation.

Heartbeat
The heartbeat is a check-up mechanism between two nodes, carried out over the private network, to see whether the other node is up and running. This check occurs at regular intervals known as time slices. If the heartbeat is not functioning, a failover is initiated and another node in the cluster will take over the active resources.

Private Network
The Private Network is available among the cluster nodes only. Every node will have a Private Network IP address, which can be pinged from one node to another. This is used to check the heartbeat between the two nodes.

Public Network
The Public Network is available for external connections. Every node will have a Public Network IP address, which can be connected to from any client within the network.

Shared Cluster Disk Array
A shared disk array is a collection of storage disks accessed by the cluster; this could be a SAN or a SCSI RAID. Windows Clustering supports shared-nothing disk arrays: only one node can own a given disk resource at any given time, and all other nodes are denied access until they own the resource (ownership changes during a failover). This protects the data from being overwritten when two computers would otherwise have access to the same drives concurrently.

Quorum Drive
This is a logical drive assigned on the shared disk array specifically for Windows Clustering. The clustering service constantly writes the state of the cluster to this drive; corruption or failure of this drive can bring down the entire cluster setup.

Cluster Name
This name refers to Virtual Cluster Name, not the physical node names or the Virtual SQL Server names. It is assigned to the cluster as a whole.

Cluster IP Address
This IP address refers to the address that all external connections use to reach the active cluster node.

Cluster Administrator Account
This account must be configured at the domain level, with administrator privileges on all nodes within the cluster group. This account is used to administer the failover cluster.

Cluster Resource Types
This includes any services, software, or hardware that can be configured within a cluster. Ex: DHCP, File Share, Generic Application, Generic Service, Internet Protocol, Network Name, Physical Disk, Print Spooler, and WINS.

Cluster Group
Conceptually, a cluster group is a collection of logically grouped cluster resources. It may contain cluster-aware application services, such as SQL Server 2000.

SQL Server Network Name (Virtual Name)
This is the SQL Server Instance name that all client applications will use to connect to the SQL Server.

SQL Server IP Address (Virtual IP Address)
This refers to the TCP/IP address that all client applications will use to connect to SQL Server; the Virtual Server IP address.

SQL Server 2000 Full-text
Each SQL Virtual Server has one full-text resource.

Microsoft Distributed Transaction Coordinator (MS DTC)
Certain SQL Server components require MS DTC to be up and running. MS DTC is shared by all named/default instances in the cluster group.

SQL Server Virtual Server Administrator Account
This is the SQL Server service account, and it must follow all the rules that apply to SQL Service user accounts in a non-clustered environment.

How to Cluster Windows Server 2003

Before Installing Windows 2003 Clustering

Before you install Windows 2003 clustering, you need to perform a series of important preparation steps. This is especially important if you didn't build the cluster nodes yourself, as you want to ensure everything is working correctly before you begin the actual cluster installation. Once these steps are complete, you can install Windows 2003 clustering. Here are the steps you must take (a command-line sketch of the connectivity checks follows the list):
  • Double check to ensure that all the nodes are working properly and are configured identically (hardware, software, drivers, etc.).
  • Check to see that each node can see the data and Quorum drives on the shared array or SAN. Remember, only one node can be on at a time until Windows 2003 clustering is installed.
  • Verify that none of the nodes has been configured as a Domain Controller.
  • Check to verify that all drives are NTFS and are not compressed.
  • Ensure that the public and private networks are properly installed and configured.
  • Ping each node in the public and private networks to ensure that you have good network connections. Also ping the Domain Controller and DNS server to verify that they are available.
  • Verify that you have disabled NetBIOS for all private network cards.
  • Verify that there are no network shares on any of the shared drives.
  • If you intend to use SQL Server encryption, install the server certificate with the fully qualified DNS name of the virtual server on all nodes in the cluster.
  • Check all of the error logs to ensure there are no nasty surprises. If there are, resolve them before proceeding with the cluster installation.
  • Add the SQL Server and Clustering service accounts to the Local Administrators group of all the nodes in the cluster.
  • Check to verify that no antivirus software has been installed on the nodes. Antivirus software can reduce the availability of clusters and must not be installed on them. If you want to check for possible viruses on a cluster, you can always install the software on a non-node and then run scans on the cluster nodes remotely.
  • Check to verify that the Windows Cryptographic Service Provider is enabled on each of the nodes.
  • Check to verify that the Windows Task Scheduler service is running on each of the nodes.
  • If you intend to run SQL Server 2005 Reporting Services, you must then install IIS 6.0 and ASP .NET 2.0 on each node of the cluster.
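As a minimal sketch of the connectivity checks above (the node names NODE1/NODE2, the domain controller DEMODC, and the 10.0.0.x private addresses are all illustrative):

:: Verify public-network connectivity and name resolution to each node
ping NODE1
ping NODE2

:: Verify the Domain Controller / DNS server is reachable
ping DEMODC

:: Verify the private (heartbeat) network between the nodes
ping 10.0.0.1
ping 10.0.0.2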

That is a lot of things to check, but each one is important. If skipped, any one of these steps could prevent your cluster from installing or working properly.


How to Install Windows Server 2003 Clustering

Now that all of your physical nodes and the shared array or SAN are ready, you can install Windows 2003 clustering. In this section, we take a look at the process from beginning to end.
To begin, you must start the Microsoft Windows 2003 Clustering Wizard from one of the nodes. While it doesn't make any difference to the software which physical node is used to begin the installation, I generally select one of the physical nodes to be my primary (active) node, and start working there. This way, I won't potentially get confused when installing the software.
If you are using a SCSI shared array, and for many SAN shared arrays as well, you will want to make sure that the second physical node of your cluster is turned off when you install cluster services on the first physical node. This is because Windows 2003 doesn't know how to deal with a shared disk until the cluster service is installed. Once you have installed the cluster service on the first physical node, you can turn on the second physical node, boot it, and then proceed with installing the cluster service on that node.
New in Windows Server 2003
These are some of the improvements Windows Server 2003 has made in clustering:
·         Larger clusters: The Enterprise Edition now supports up to 8-node clusters; previous editions supported only 2-node clusters. The Datacenter Edition supports 8-node clusters as well, up from 4-node support in Windows 2000.
·         64-bit support: This feature allows clustering to take advantage of the 64-bit version of Windows Server 2003, which is especially important to being able to optimize SQL Server 2000 Enterprise Edition.
·         High availability: With this update to the clustering service, the Terminal Server directory service can now be configured for failover.
·         Cluster Installation Wizard: A completely redesigned wizard allows you to join and add nodes to the cluster. It also provides additional troubleshooting by allowing you to view logs and details if things go wrong. It can save you some trips to the Add/Remove Programs applet.
·         MSDTC configuration: You can now configure MSDTC once and it is replicated to all nodes. You no longer have to run the comclust.exe utility on each node.

1) Installing a SQL Server SP on a cluster (SQL Server 2005 through 2012)
When applying a Service Pack on a cluster, follow a rolling upgrade. In SQL Server 2005, when a patch is applied the entire instance faces downtime, because the resource database and all binaries are patched at the same time and there is only one resource database, on the shared disk.

In SQL Server 2008 onwards, however, each node has its own resource database, so patching can be split between the nodes. We can first patch the passive node and restart it (during this time the business runs from the active node), then perform a failover and patch the previously active node (the business now runs from the new active node). A command-line sketch of the failover step follows.
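A minimal sketch of the rolling-patch failover step with cluster.exe; the group name "SQL Server (SQLINST1)" and the node names are illustrative:

:: Node2 (passive) is already patched; fail the instance over to it
cluster group "SQL Server (SQLINST1)" /move:Node2

:: Node1 is now passive; patch it, then optionally fail back
cluster group "SQL Server (SQLINST1)" /move:Node1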

2) Configuring Backups on a cluster?
Backups on a cluster should be taken to a dedicated SAN shared clustered disk that is part of the clustered group. SQL backups on a cluster cannot go to a local drive, so a clustered disk is the right configuration setting given the risk of failover. A sketch of such a backup follows.
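As a sketch, assuming a hypothetical virtual server name SQLVS1, a database MyDB, and a clustered backup drive S: (all illustrative):

:: Back up to the shared clustered disk so the backup survives a failover
sqlcmd -S SQLVS1 -Q "BACKUP DATABASE MyDB TO DISK = 'S:\Backups\MyDB.bak' WITH INIT"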

Jobs can be created and configured to use the shared disk, so that even if a failover occurs, the jobs will re-run as per their schedule and continue to use the shared disk.

3) How many IPs are required for a Cluster?
This question can be answered only when we know the number of nodes.

If the number of nodes is n, then the number of IPs required is 2(n) + 3: two per node (public and private), plus three additional addresses.
The three additional IPs are:
1) Windows Cluster Virtual IP
2) SQL Server Virtual IP
3) MSDTC IP

4) Multiple Instance cluster (Active-Active)
If there are multiple instances on a cluster, so as to utilize the hardware resources of both nodes optimally, that configuration is called an Active-Active or multi-instance cluster.

5) Adding a Disk on a cluster
Adding a disk is a multi-step process (see the command-line sketch after these steps).

1) First, add the disk in Cluster Administrator as a clustered disk.
2) After adding the clustered disk, make sure it is added to the SQL Server cluster group.
3) After adding it to the SQL Server cluster group, set a dependency of the main SQL Server service on the newly added clustered disk.
4) Verify in the sys.dm_io_cluster_shared_drives DMV in the SQL Server instance that the newly added drive is visible.
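A minimal command-line sketch of steps 2-4; the resource name "Disk S:", the group "SQL Server (SQLINST1)", the SQL resource name "SQL Server", and the virtual server SQLVS1 are all illustrative and will differ in your cluster:

:: Move the new clustered disk into the SQL Server cluster group
cluster res "Disk S:" /moveto:"SQL Server (SQLINST1)"

:: Make the SQL Server service depend on the new disk
cluster res "SQL Server" /adddep:"Disk S:"

:: Verify the instance can see the clustered drive
sqlcmd -S SQLVS1 -Q "SELECT * FROM sys.dm_io_cluster_shared_drives"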

7) What are dependencies in a cluster? What is Dependency Report?
Dependencies are important for Cluster functionality.

SQL Server Agent -> AND -> SQL Server Main Service -> AND -> All Disks + SQL Server Name
SQL Server Name -> AND -> SQL Server Virtual IP

As a rule of thumb, dependencies in SQL Server clustered instances will ideally all be AND dependencies (except in the case of Multi-Subnet Failover Clusters). A sketch of viewing and adding a dependency follows.
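These dependencies can be inspected or adjusted from the command line; a sketch using default resource names, which may differ in your cluster:

:: List the resources that SQL Server depends on
cluster res "SQL Server" /listdep

:: Add a dependency of SQL Server Agent on the SQL Server resource
cluster res "SQL Server Agent" /adddep:"SQL Server"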

8) Possible Owners and Preferred Owners

Possible Owners:-
It is the list of all the nodes that are configured for a clustered instance. If a failover occurs, the failover target WILL/MUST be one of the members of this list; if a node is not a possible owner, the failed-over instance will not come online on that node. If no possible-owner nodes are up, the group will still fail over to a node that is not a possible owner, but it will not come online there.

Preferred Owners:-
Preferred owners are the nodes we would like the group to run on under ideal conditions, though not necessarily the only nodes it can run on. For example, if nodes 1 and 3 are "Preferred" owners, and nodes 1, 2 and 3 are Possible owners, then if the service is on node 1 and node 1 fails, the service will move to node 3, and will only go to node 2 if both 1 and 3 are unavailable.
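Both lists can be viewed and set with cluster.exe; a sketch with illustrative resource, group, and node names:

:: List the possible owners of the SQL Server resource
cluster res "SQL Server" /listowners

:: Set the preferred owners of the instance group to Node1, then Node3
cluster group "SQL Server (SQLINST1)" /setowners:Node1,Node3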

10) Clustering Commands?
cluster /list (lists the clusters in the domain)
cluster node /status (status of each node)
cluster group /status (status and current owner of each group)
cluster network /status (status of the public and private networks)
cluster netinterface /status (status of each node's network interfaces)
cluster resource /status (status of each clustered resource)
cluster group "SQL Server (SQLSEENU143)" /move:Node2 (manual failover of the instance group)

11) How to read Quorum log?
The cluster log can be read from the C:\Windows\Cluster\Reports\Cluster.log file on each node.

Reading it from the Quorum drive is not recommended; as a local admin we would not have admin rights on the MSCS cluster directory on the Quorum drive.

cluster log /gen
Generates a recent cluster.log on each node (here, Node1 and Node2).

12) Cluster Checks? IsAlive and LookAlive?
The LookAlive check (known as the basic resource health check) verifies that SQL Server is running on the current node. By default it runs every 5 seconds. If the LookAlive check fails, Windows clustering performs an IsAlive check.

The IsAlive check (known as the thorough resource health check) runs every 60 seconds and verifies whether the instance is up and running by issuing SELECT @@SERVERNAME through the resource DLL. If this query fails, the check runs additional retry logic to avoid stress-related failures.
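You can approximate the IsAlive query by hand to confirm the instance answers on its virtual name; SQLVS1 below is an illustrative virtual server name:

:: Roughly what the IsAlive check issues every 60 seconds
sqlcmd -S SQLVS1 -Q "SELECT @@SERVERNAME"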


13) How to failover SQL Server cluster using a command?
cluster group "SQL Server (SQL2K12)" /move:Node22

14) Split-brain situation
A split-brain scenario happens when all the network communication links between two or more cluster nodes fail. In these cases, the cluster may be split into two or more partitions that cannot communicate with each other.

HA clusters usually use a heartbeat private network connection to monitor the health and status of each node in the cluster. If heartbeat communication fails for any network reason, a split-brain situation (partitioning) occurs: every node thinks the other node is down, and there is a risk of each node starting the services. To avoid this risk, the quorum updates the nodes about the well-being of the other nodes; the quorum acts as a point of communication until the private network is back up and running.

The node that owns the quorum resource puts a reservation on the device every three seconds; this guarantees that the second node cannot write to the quorum resource. When the second node determines that it cannot communicate with the quorum-owning node and wants to grab the quorum, it first puts a reset on the bus.

The reset breaks the reservation; the second node then waits about 10 seconds, to give the first node time to renew its reservation at least twice, and then tries to put its own reservation on the quorum. If the second node's reservation succeeds, it means the first node failed to renew its reservation, and the only reason for that failure is that the node is dead. At this point, the second node can take over the quorum resource and restart all the resources.

15) What is the significance of MSDTC? Can we configure multiple MSDTCs?
MSDTC is used for distributed transactions between clustered SQL Server instances and any other remote data source. If we need to enlist a query on a clustered instance in a distributed transaction, we need MSDTC running on the cluster as a clustered resource. It can run on any node in the cluster; we usually have it running on the passive node.

1) Before installing SQL Server on a failover cluster, Microsoft strongly recommends that you install and configure Microsoft Distributed Transaction Coordinator (MS DTC)

2) SQL Server requires MS DTC in the cluster for distributed queries and two-phase commit transactions, as well as for some replication functionality.

3) Microsoft only supports running MSDTC on cluster nodes as a clustered resource. We do not recommend or support running MSDTC in stand-alone mode on a cluster. Using MSDTC as a non-clustered resource on a Windows cluster is problematic and it can cause data corruption if a cluster failover occurs.

4) To help ensure availability between multiple clustered applications, Microsoft highly recommends that the MS DTC have its own resource group and resources.
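A sketch of creating MS DTC as a clustered resource in its own group with cluster.exe; it assumes a group named "MSDTC Group" that already contains a disk and a network name/IP, and all names here are illustrative:

:: Create the DTC resource in its own group, then bring it online
cluster res "MSDTC" /create /group:"MSDTC Group" /type:"Distributed Transaction Coordinator"
cluster res "MSDTC" /online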

16) Why is the SQL Server service set to Manual on a cluster?
When a node restarts, it should not attempt to start SQL Server on its own; the cluster service controls where the instance runs. Hence, by design, in SQL Server clustering the services are configured as Manual.

17) What to do if Quorum fails? (Windows Task)
A quorum crash/failure is usually a disk corruption that has crashed the quorum. Ideally, we have the Windows team addressing this issue, with monitoring in place to catch it.

18) Intro to Mirror/Log Shipping on Cluster?

19) Service SID?
It is a mechanism that assigns privileges to the service itself, rather than to the account under which the service runs.

Service SIDs improve security because they enable you to run the service account with the least privileges required.

20) How to troubleshoot a cluster?
1) Open Cluster Administrator (cluadmin.msc) and check Cluster Events.
2) Issues can be disk related, network related, service related (SQL Server), or cluster related.
3) Depending on the issue, contact the respective team.
4) If SQL Server is the issue, check why the service went down:
   i) Check the Event Viewer
  ii) Check the SQL Server Error Log
 iii) Verify any errors and troubleshoot as per the issue.

5) Additional sources of troubleshooting.

   The C:\Windows\Cluster\Reports\cluster.log file will help in identifying the underlying issue in the cluster on the specific node. The cluster log is present on both nodes.

Cluster Aware Applications:
SQL Server Database Services
SQL Server Analysis Services

Cluster Unaware Applications:
SQL Server Reporting Services
Integration Services
Notification Services

Instance Aware Services:
SQL Server Main Service
SQL Server Agent Service
SQL Server Full Text Search

Instance Unaware Services:
Browser Service
VSS Writer
SQL Server AD Helper

Prerequisites for Configuring SQL Server on a Cluster:-
 1) SQL Server Media on both nodes
 2) Create three Global Groups (Domain Groups) Optional
 3) Create service accounts
 4) SQL Server Virtual IP (Ask Network Team)
 5) SQL Server Virtual Name
 6) S: drive for Data/Log files as Shared iSCSI drives
 7) Components that are cluster aware are Database Services and Analysis Services
 8) Configure MSDTC as a Cluster Resource
 9) Add Disk Dependency in SQL Server group to the SQLData drives.
10) Hardware check on both nodes (equal)
11) Validate Windows Cluster

Sequence of Cluster Resources during Failover (a command-line exercise of this sequence is sketched after the lists):

Stopping Order
1) SQL Server Agent Service
2) SQL Server Main Service
3) SQL Server IP
4) SQL Server Name
5) All Disk(s)

Starting Order
1) All Disks
2) SQL Server IP
3) SQL Server Name
4) SQL Server Main Service
5) SQL Server Agent Service
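The sequence is handled automatically during failover, but you can exercise it manually by taking the whole group offline and back online; the group name below is illustrative:

:: Resources stop in dependency order (Agent first, disks last)
cluster group "SQL Server (SQLINST1)" /offline

:: Resources start in reverse order (disks first, Agent last)
cluster group "SQL Server (SQLINST1)" /online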

Scenarios in Cluster:

1) Applying an SP in a SQL Server 2005 cluster
2) Adding a disk to the cluster for SQL Server
3) Failovers and failbacks
4) Adding/deleting a node in a cluster
5) Preferred owners and possible owners
6) LookAlive and IsAlive
7) Changing the virtual IP for the SQL Server cluster
8) Master database corruption in a SQL Server cluster
9) IP addresses needed for a two-node cluster configuration

If it is a two-node cluster -> 2(N)+3 => 2(2)+3 = 7 IPs

1 Public, 1 Private at Node1
1 Public, 1 Private at Node2
1 IP for the Windows Cluster
1 Virtual IP for the SQL Cluster
1 IP for MSDTC

Beyond the base seven, some environments also use 1 IP for the Quorum and 1 IP for backups (if third-party backup solutions are used).

Adding a new disk to the cluster:

1) Contact the storage team to check the possibility of extending the existing disk or adding a new disk. Extending a disk sometimes involves downtime, so it depends on the customer providing a downtime window.

2) Once the disk is extended/added, ensure the Windows team makes the disk a clustered disk.

Cluadmin.msc->Storage->Add Disk

3) Add the disk as a resource under SQL Server Cluster Group.
Cluadmin.msc->SQLServer Cluster Group-> Add Storage-> Add the new clustered disk.

4) Set Disk Dependency.
Cluadmin.msc->SQLServer Cluster Group->Right click on SQL Server Main Service->Properties->Dependencies->Insert->AND (Disk Number).

Single Instance:

Active/Passive clustering means having an instance running as Active on one node while the second node is always passive, ready to take over responsibilities when the first node crashes.

The terminology has been changed to Single Instance Cluster to avoid confusion.

Multiple Instance:
Active/Active clustering simply means having two separate instances running in the cluster—one (or more) per machine.

The terminology has been changed to Multi-Instance Cluster.

MSDTC is an acronym for Microsoft Distributed Transaction Coordinator.

The Microsoft Distributed Transaction Coordinator service (MSDTC) tracks all parts of the transactions process, even over multiple resource managers on multiple computers.

This helps ensure that the transaction is committed, if every part of the transaction succeeds, or is rolled back, if any part of the transaction process fails.

Do we need MSDTC? Is it Compulsory?

SQL Server 2005 does require MSDTC for setup, since it uses transactions to control setup on multiple nodes. However, SQL Server 2008/2008 R2/2012 and SQL Server 2014 setup does NOT require MSDTC to install SQL Server.

Having one passive node dedicated to failovers:

Assume a 3-node cluster; 2 nodes are active and 1 node is allocated to be passive.

Having multiple passive nodes dedicated to failovers:

Assume a 5-node cluster; 3 nodes are active and 2 nodes are allocated to be passive.

Geo Cluster:-
A geo-cluster is a cluster between two different subnets or groups of subnets. These subnets may be present at the same site or in different geographies.

Typically, geo-clustering comes into play when clustering between different data centers.

The maximum number of instances on a clustered instance is 25; 50 instances are possible if we choose SMB file shares.

The reason for the 25-instance limit on a cluster is the availability of shared-disk drive letters.
Number of Nodes on a Cluster:-

Windows 2003               - 8 Nodes
Windows 2008/2008R2 - 16 Nodes
Windows 2012/2012R2 - 64 Nodes

1) The quorum stores the cluster configuration; it is also called the cluster configuration database.
2) The quorum contains information about the active owner.
3) The quorum helps in communication during a heartbeat breakdown.

Types of Quorums:

1) Node Majority quorum mode -
This model requires an odd number of nodes (for example, 3, in which case the cluster can survive 1 node failure). The cluster stays alive as long as a majority of the votes is available.

-- Let's say we start our cluster with N nodes; then, at any point in time, we must have at least (N + 1)/2 nodes alive and working. This means the cluster can sustain up to (N - 1)/2 node failures.
Example - (N + 1)/2
If N = 11 then (11 + 1)/2 = 6
So at any point in time it needs at least 6 working nodes.

2) Node and Disk Majority quorum mode -
This model is a combination of nodes and a quorum disk, and it is used when there is an even number of nodes.

This quorum model can be used for clusters where the nodes are all in one data center. An extra vote is added in the form of the disk, so that the risk of failure is reduced.

If there are 4 nodes, the disk adds an extra vote, allowing the cluster to survive up to 2 failures.

-- Let's say we start our cluster with N nodes plus the disk; then, at any point in time, we must have at least (N + 1 + 1)/2 votes alive and working. This means the cluster can sustain up to (N + 1 - 1)/2 node failures.
Example - (N + 1 + 1)/2
If N = 10 then (10 + 1 + 1)/2 = 6
So at any point in time it needs at least 6 votes including the disk vote, that is 5 working nodes + the disk vote.

3) Node and File Share Majority quorum mode -
This model is a combination of Node Majority and a file share witness.

An extra vote is added in the form of the file share witness, so that the risk of failure is reduced.

4) No Majority: Disk Only quorum mode -

This is the traditional Windows 2003 quorum disk model; Microsoft recommends discontinuing use of this model.

Only the disk holds the quorum, so there is a high risk of the cluster failing if the quorum disk crashes.

Step-by-Step Configuring a 2-node multi-site cluster on Windows Server 2008 R2

Option 1 – place the file share in the primary site.
Option 2 – place the file share in the secondary site.
Option 3 – place the file share witness in a 3rd geographic location

Configure the Cluster

Add the Failover Clustering feature: Add the Failover Clustering feature to both nodes of your cluster from the Add Features Wizard.

Change the names of your network connections: It is best if you rename the connections on each of your servers to reflect the network that they represent. This will make things easier to remember later.

Make sure your public network is first: Go into the Advanced Settings of your Network Connections (hit Alt to see the Advanced Settings menu) on each server and make sure the Public network is first in the list.

Private network settings: Your private network should contain only an IP address and subnet mask; no default gateway or DNS servers should be defined. Your nodes need to be able to communicate across this network, so add static routes if necessary (a sketch follows).
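For example, if the two private subnets were 10.0.1.0/24 and 10.0.2.0/24 with a router at 10.0.1.1 (all illustrative addresses), a persistent static route on the first node might look like:

:: Persistent route so the private networks can reach each other
route -p add 10.0.2.0 mask 255.255.255.0 10.0.1.1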

Validate a Configuration: The first step is to “Validate a Configuration”.
Open up the Failover Cluster Manager and click on Validate a Configuration.

Add the cluster nodes: The Validation Wizard launches and presents you the first screen as shown below. Add the two servers in your cluster and click next to continue.

Select “Run only tests I select”: A multi-site cluster does not need to pass the storage validation (see the Microsoft article). To skip the storage validation process, click on “Run only the tests I select” and click Continue.

Unselect the Storage test: In the test selection screen, unselect Storage and click Next.

Confirm your selection: You will be presented with the following confirmation screen. Click Next to continue.

View the validation report: If you have done everything right, you should see a summary page. Notice that the yellow exclamation point indicates that not all of the tests were run; this is to be expected in a multi-site cluster because the storage tests are skipped. As long as everything else checks out OK, you can proceed. If the report indicates any other errors, fix the problem, re-run the tests, and continue.

Create your cluster: You are now ready to create your cluster. In the Failover Cluster Manager, click on Create a Cluster.

Skip the validation test: The next step asks whether or not you want to validate your cluster. Since you have already done this, you can skip this step. Note that this will pose a bit of a problem later on when installing SQL Server, as setup requires that the cluster has passed validation before proceeding. When we get to that point, I will show you how to bypass this check via a command-line option in the SQL Server setup (sketched below). For now, choose No and Next.
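For reference, the bypass mentioned above is a setup rule-skip switch. A sketch of the SQL Server 2008-era invocation; verify the exact rule name against your setup version before relying on it:

:: Skip the cluster-validation rule when launching clustered SQL Server setup
setup.exe /SkipRules=Cluster_VerifyForErrors /ACTION=InstallFailoverCluster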

Choose a unique name and IP address: Create a name and an IP address for administering this cluster. This will be the name that you will use to administer the cluster, not the name of the SQL cluster resource, which you will create later. Enter a unique name and IP address and click Next.

Note: This is also the computer name that will need permission to the File Share Witness as described later in this document.

Confirm your choices: Confirm your choices and click next.

View the report to find out what the warning is all about: If you have done everything right, you will see the Summary page. Notice the yellow exclamation point; obviously something is not perfect. Click on View Report to find out what the problem may be.

Implementing a Node and File Share Majority quorum

We need to identify the server that will hold our File Share witness. Remember, as we discussed earlier, this File Share witness should be located in a 3rd location, accessible by both nodes of the cluster. Once you have identified the server, share a folder as you normally would. In my case, I created a share called MYCLUSTER on a server named DEMODC.

The key thing to remember about this share is that you must give the cluster computer name read/write permissions to the share at both the share level and the NTFS level. If you recall back at Figure 13, I created my cluster and gave it the name “MYCLUSTER”; it is that cluster computer account that needs the read/write permissions.

Give the cluster computer account share-level permissions (a command-line sketch follows).
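A minimal sketch of creating the witness share and granting the rights from the command line; the DEMO domain name is assumed, and MYCLUSTER$ is the cluster computer account from the example above:

:: Create the share and grant the cluster computer account full share access
net share MYCLUSTER=C:\MYCLUSTER /GRANT:DEMO\MYCLUSTER$,FULL

:: Grant matching NTFS (modify) permissions to the cluster computer account
icacls C:\MYCLUSTER /grant DEMO\MYCLUSTER$:(OI)(CI)M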

Change your quorum type: Now with the shared folder in place and the appropriate permissions assigned, you are ready to change your quorum type. From Failover Cluster Manager, right-click on your cluster, choose More Actions and Configure Cluster Quorum Settings.

Choose Node and File Share Majority: On the next screen choose Node and File Share Majority and click next.

Choose your file share witness: In this screen, enter the path to the file share you previously created and click next.

Click Next to confirm your quorum change to Node and File Share Majority: Confirm that the information is correct and click next.

A successful quorum change: Assuming you did everything right, you should see the following Summary page.
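You can confirm the change from the command line as well; cluster.exe is still present on Windows Server 2008 R2 (a sketch):

:: Display the quorum configuration; it should now show the file share witness
cluster /quorum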
