Sunday, March 30, 2014

SQL Server Database Mirroring: automating mirror failovers

Problem:

You have a bunch of databases on a production server and you decided to go with SQL Server mirroring as a DR solution. Your IT dept, in its infinite wisdom, thought that it is a waste of resources to have a witness server and you were asked to somehow adjust with a Principal and a Mirror server. Your databases are critical and though the option of automated failovers is ruled out due to lack of witness, you would like to put process in place such that you are always battle ready to deal with a possible disaster and recover quickly. The solution should be designed in such a way that even the most junior of your DBAs should be able to handle it.

Solution:

For the sake of this explanation, lets us assume my production server is YIN and Mirror server is YANG. So what happens when a disaster occurs? Assuming that you already have mirroring setup between YIN and YANG and it is functional, you might have to do the following after a disaster.

  • When YIN goes down, databases on YANG server will be in restoring/disconnected state. You'll bring all those databases online  by performing a forced failover with allow data loss.
  • When the failover is done, you might want to fix orphaned users
  • You now make necessary changes to whichever mechanism your application uses to connect to the databases to point to the new server. In our case, we store database names and connection strings in a SQL that sits on a well maintained cluster (lets call this MOTHERSHIP), so we update those connection strings with IP address of YANG. 
  • We have lot of business specific SQL Agent jobs running on our production servers apart from regular maint jobs. These jobs exist both on YIN and YANG. The jobs are kept in disabled state on mirror server. Now that we had a disaster and we failover to the evil YANG, we might have to enable these jobs. 
  • Send a notification that failover of some sort occurred. 
What Next?

Ok, you did all of that and we were back in business with minimal downtime. Our former Principal server was resurrected and is back online. YANG is currently the Principal and you might want to add YIN back to mirroring setup as the new mirror. After a couple of days, you are sure that YIN is now stable and it should be primary because it probably has better resources compared to YANG. You might have to do a soft (planned) failover. So following are various scenarios that are possible in a setup like this:
  • Planned Failover from Yin to Yang. Stop jobs on Yin, enable jobs on Yang, fix users, update connection strings
  • Planned Failover from  Yang to Yin etc.,
  • Forced Failover on Yin
  • Forced Failover on Yang
  • Resume Mirroring on Yin
  • Resume Mirroring on Yang
Enough of theory. Lets get our hands dirty. Creating batch scripts for each of these scenarios appeared like the easiest simplest (if not elegant) solution to me. The actual logic that performs a failover can be a stored procedures and the batch scripts can be stored on a server which is highly available. 

Create these stored procedures on both Principal and Mirror Servers.
 Example procedure below. Click links above for other sps. 

USE [master]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_ForcedFailover]
AS

/* stored procedure for doing a forced failover of all database to YANG
example usage:  EXEC master.dbo.usp_MirrorFailover
*/

DECLARE @sql NVARCHAR(300)
 ,@maxid INT
 ,@i INT
 ,@dbname SYSNAME
DECLARE @MirrorDBs TABLE (
 [dbid] INT identity(1, 1) NOT NULL PRIMARY KEY
 ,dbname SYSNAME NOT NULL
 )

SET NOCOUNT ON

INSERT INTO @MirrorDBs (dbname)
SELECT DB_NAME(database_id)
FROM sys.database_mirroring
WHERE mirroring_role = 2 -- Mirror
 AND mirroring_state IN (0,1) -- Disconnected or Suspended
 AND mirroring_safety_level = 2

--select * from @MirrorDBs
SELECT @maxid = max([dbid])
 ,@i = 1
FROM @MirrorDBs;

WHILE @i <= @maxid
BEGIN
 SELECT @dbname = dbname
 FROM @MirrorDBs
 WHERE [dbid] = @i

 SET @sql = 'ALTER DATABASE ' + QUOTENAME(@dbname) + ' SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;'

 PRINT @sql

 EXEC (@sql);

 SET @i = @i + 1
END

SET NOCOUNT OFF


Next, create scripts for enabling/disabling jobs:
declare @sql nvarchar(max) = '';
select
@sql += N'exec msdb.dbo.sp_update_job @job_name = ''' + name + N''', @enabled = 1;
' from msdb.dbo.sysjobs
where enabled = 0
order by name;

print @sql;
exec (@sql);


Create scripts for fixing users

DECLARE @sql NVARCHAR(300)
 ,@maxid INT
 ,@i INT
 ,@dbname SYSNAME
DECLARE @Mirrors TABLE (
 [dbid] INT identity(1, 1) NOT NULL PRIMARY KEY
 ,dbname SYSNAME NOT NULL
 )

SET NOCOUNT ON

INSERT INTO @mirrors (dbname)
SELECT DB_NAME(database_id)
FROM sys.database_mirroring
WHERE mirroring_role = 1 -- Principal 
 AND mirroring_state = 4 -- Synchronized
 AND mirroring_safety_level = 2

--select * from @mirrors
SELECT @maxid = max([dbid])
 ,@i = 1
FROM @mirrors;

WHILE @i <= @maxid
BEGIN
 SELECT @dbname = dbname
 FROM @mirrors
 WHERE [dbid] = @i

 SET @sql = 'USE ' + QUOTENAME(@dbname) + 
 ' EXEC sp_change_users_login ''Auto_Fix'', ''myUser1'';
EXEC sp_change_users_login ''Auto_Fix'', ''myUser2'';
 '

 PRINT @sql

 EXEC (@sql);

 SET @i = @i + 1
END
 
Update connection strings on MOTHERSHIP server
USE ConnectionDB
GO
UPDATE COMPANY SET ConnectionString='Data Source=10.10.2.10;User ID=fancyuser;Password=fancypass;Max Pool Size=200' 
WHERE ConnectionString='Data Source=10.10.2.11;User ID=fancyuser;Password=fancypass;Max Pool Size=200';
Finally, the batch file that brings all these together:
@echo off
goto :startbatch

Description
=============================================================
This batch file is used for 
1. Performing a soft failover of YIN to YANG. YANG will become the primary
2. Fix Database users
3. Update connection strings in connectiondb database post failover
4. Disable jobs on YIN
5. Enable jobs on YANG
6. Send a notification email 
===============================================================

:startbatch

set source_server=10.10.2.10
set mothership_server=10.10.2.23
set mirror_server=10.10.2.11
set serverlogin=fancyuser
set serverpass=fancypass
set connectiondb=connectiondb

set home_dir=d:\mirrorme\Failover_YIN_TO_YANG
set failover_script=%home_dir%\failover.sql
set failover_out=%home_dir%\failover_out.txt
set fixusers_script=%home_dir%\fixusers.sql
set fixusers_out=%home_dir%\fixusers_out.txt
set update_connectiondb=%home_dir%\update_connectiondb.sql
set update_connectiondb_out=%home_dir%\update_connectiondb_out.txt
set disablejobs_script=%home_dir%\disablejobs.sql
set disablejobs_out=%home_dir%\disablejobs_out.txt
set enablejobs_script=%home_dir%\enablejobs.sql
set enablejobs_out=%home_dir%\enablejobs_out.txt




REM Failover from YIN to YANG
sqlcmd -S tcp:%source_server%,1433 -d master -U %serverlogin% -P %serverpass% -s "," -W -h -1 -i %failover_script% -o %failover_out%

REM Run fix users script on YANG to deal with orphan users
sqlcmd -S tcp:%mirror_server%,1433 -d master -U %serverlogin% -P %serverpass% -s "," -W -h -1 -i %fixusers_script% -o %fixusers_out%

REM update connectiondb database with new connection string
sqlcmd -S tcp:%connectiondb_server%,1433 -d %connectiondb% -U %serverlogin% -P %serverpass% -s "," -W -h -1 -i %update_connectiondb% -o %update_connectiondb_out%

REM disable jobs on YIN because we failedover to YANG
sqlcmd -S tcp:%source_server%,1433 -d master -U %serverlogin% -P %serverpass% -s "," -W -h -1 -i %disablejobs_script% -o %disablejobs_out%

REM Enable jobs on YANG since it is the new primary
sqlcmd -S tcp:%mirror_server%,1433 -d master -U %serverlogin% -P %serverpass% -s "," -W -h -1 -i %enablejobs_script% -o %enablejobs_out%


REM send emails on completion
C:\work\tool\sendemail -t allcoolsdudes@Nirvanainc.com -cc -u "Mirroring failover of YIN to YANG" -m "Database Mirroring Failover of YIN to YANG has been triggered and completed. SQL Agent jobs on YIN were disabled and jobs on YANG were enabled. Check logs attached. " -s 192.24.2.110:25 -f "LordOfServers@NirvanaInc.com" -a %failover_out% -a %fixusers_out% -a %update_connectiondb_out% -a %disablejobs_out% -a %enablejobs_out%


To do:

Update this post with scripts for all other possible failover scenarios: