This
document provides some methods for testing whether your failover cluster is
configured properly.
To
verify that the private and public networks are communicating properly, perform
the following steps. It is imperative to know the IP address for each network
adapter in the cluster, as well as for the IP cluster resources
From
each Server Node ping
From
a Client machine ping
For
SQL Server to be able to manage its resources and perform correctly the service
account must be part of the cluster ACL. To ensure that this is configured
properly, execute the following in a SQL Query Analyzer window:
select * from ::fn_virtualservernodes()
If
there is no output, ensure the account SQL Server is running under is part of
the cluster ACL.
Test3: Verifying SQL Server 2000
Failover Cluster Dependencies
Cluster resources may be dependent
on other resources to start before they are brought online. Make sure the
following dependencies exist.
Resource |
Dependencies |
Cluster
IP Address |
None |
Cluster
Name |
Cluster
IP Address |
Quorum |
None |
MS
DTC |
Cluster
Name, Cluster IP, Disk Resource (Quorum is the default) |
SQL Server IP Address
|
None |
SQL
Server Network Name |
SQL
Server IP Address |
SQL
Server (the virtual server itself) |
SQL
Server Network Name, Disk Resources associated with the instance |
SQL
Server Agent |
SQL
Server |
SQL
Server FullText |
SQL
Server |
Perform
a failover of SQL Server virtual server to ensure that all resources fail over
and restart on another node with no problems or affecting any other groups.
Check Windows Event Log and SQL Server Error Log to ensure that no error
occurred.
Initiating
a failure simulates failure of cluster resource and makes cluster service
perform all the steps necessary to failover to the other node. By default the
cluster service checks the failure condition three times before it decides that
it is a real failure and it has to perform a failover. So initiating failure
does not actually fail over the resource but performs all checks.
To
initiate a failure, right click on any resource under SQL Server Group and
select Initiate Failure.
Check
Windows Event log and SQL Server Error Log to ensure that no error occurred.
Next three tests involve breaking the cluster physically. Engineering is against doing this as it might damage the disk when you pull the plug. |
In
this test, you will first need to ensure that all of the default groups are
located on the primary node. Then you will physically turn off (flip the
switch) the primary node. If it is not possible to switch off the node,
shutdown the cluster service in the primary node.
All
resources should failover to the secondary node.
Check
Windows Event log and SQL Server Error Log for any potential error messages
after this occurs. Connect to the database from a SQL Server Client.
Once
you have checked for any potential problems, turn on the primary node and wait
until it fully boots (or restart the cluster service).
Now
turn off the secondary node (or stop the cluster service). All resources should
failover to the primary node. (If you have Cluster Administrator open in
primary node it will generate an error, open Cluster Administrator in secondary
node to see if the node failed over)
Check
Windows Event log and SQL Server Error Log for any potential error messages
after this occurs. Connect to the database from a SQL Server Client.
Once
the groups fail back to the primary node, turn the secondary node back on (or
restart the cluster service), and wait until it boots up fully.
This
test is similar in concept as the above test, which initiates a failover. But
instead of simulating a computer failure, it simulates a network-related error.
From
the primary node, remove the network cable from the public network card (or
disable the network card from Control Panel - Network and Dial-up Connections).
This will simulate a failure of the primary node, and should initiate a
failover to the secondary node.
All
resources should failover to the secondary node.
Check
Windows Event log and SQL Server Error Log for any potential error messages
after this occurs. Connect to the database from a SQL Server Client.
If
there are no errors plug the network cable back into the primary node (or
re-enable the network card), and then remove the network cable from the public
network card on the secondary node (or disable the network card).
As
before, all resources should failover to the primary node.
Check
Windows Event log and SQL Server Error Log for any potential error messages
after this occurs. Connect to the database from a SQL Server Client.
If
there are no errors plug the network cable back into the public network card on
the secondary node (or re-enable the network card).
If
you are doing this test remotely, create local user account in each node with
administrative privileges and connect to the node which you are testing using
remote desktop on the heartbeat IP so that you can re-enable the public network
card after the test.
The
cluster should continue to function as other two networks (private and public)
are configured to allow internal communications.
This
test is designed to help uncover potential issues with the shared disk array.
This test is designed to simulate what would happen if the controller card or
cable connected from a node to the shared disk array fails.
From
the primary node, remove the cable from the card used to connect to the shared
array (or disable the Private Network card that connects the node to filer).
This will simulate a failure of the primary node, and should initiate a
failover to the secondary node.
All
resources should failover to the secondary node. It will take some time for
cluster to realize that the disks are not accessible, but eventually cluster
service will stop and SQL Server will fail over. It takes more time for fail
over if the private network card in the node that owns the Quorum drive is
disabled.
Check
Windows Event log and SQL Server Error Log for any potential error messages
after this occurs. Connect to the database from a SQL Server Client.
Once
you have checked for any potential problems, plug the cable back into the
primary node (or enable the Private Network card that connects the node to
filer).
Now
remove the cable from the card used to connect to the shared array on the
secondary node (or disable the Private Network card that connects the node to
filer).
All
resources should failover to the primary node.
Check
Windows Event log and SQL Server Error Log for any potential error messages
after this occurs. Connect to the database from a SQL Server Client.
Once
you have checked for any potential problems, plug the cable back into the
secondary node (or enable the Private Network card that connects the node to
filer).
Failover
the cluster during the following activities
Activity |
Expected
Result |
Running
a large transaction |
Rollback |
Backup |
Incomplete
backup � RESTORE HEADERONLY command reveals that the backup file is
incomplete. If the backup is appended to an existing file, the file rollbacks
to original size. |
Restore |
Restore
fails leaving database in Loading state. Data and log files remain in disks. |
DBCC
CHECKDB |
Fails
with connectivity error. No Impact on database |