Thursday, April 24, 2014
Upload backups to Amazon S3 bucket using boto, Python
Here is a script to upload backups taken within the last day to an S3 bucket. We would like to exclude transaction log backups, so I used "if 'LOG' not in name" - you may change it accordingly.

import os
import time

from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection('YOURKEY', 'SECRETKEY')
bucket = conn.get_bucket('MyBeautifulBucket')

age = 86400  # 1 day = 24*60*60
backupdir = 'O:\\Backups\\'

for root, dirs, files in os.walk(backupdir, topdown=False):
    for name in files:
        if 'LOG' not in name:  # exclude transaction log backups
            f = os.path.join(root, name)
            now = time.time()
            modified = os.stat(f).st_mtime
            if modified > now - age:  # only backups from the last 24 hours
                k = Key(bucket)
                k.key = 'mySubBucket/' + str(name)
                k.set_contents_from_filename(f)
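To confirm what landed in the bucket, here is a minimal sketch that lists the uploaded keys; it reuses the same connection, bucket and prefix assumed in the script above:

from boto.s3.connection import S3Connection

conn = S3Connection('YOURKEY', 'SECRETKEY')
bucket = conn.get_bucket('MyBeautifulBucket')

# list everything under the prefix the backups were uploaded to
for key in bucket.list(prefix='mySubBucket/'):
    print key.name, key.size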
Saturday, April 12, 2014
Run a query on multiple schemas postgres
If you have one huge database with multiple schemata in it, one for each customer (single database, multi-tenant model), then from time to time you might have to run a query on all schemata - for example, getting a build number or a user count. You could either write a shell script that reads the tenant list and builds SQL statements to execute dynamically, or you could use a PL/pgSQL function in Postgres. Below is a function that gets a count of users from every schema by querying the table tUser:
-- create a record type for the returned data
DROP FUNCTION IF EXISTS count_em_all ();
DROP TYPE IF EXISTS table_count;

CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER);

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count AS $$
DECLARE
    the_count RECORD;
    t_name    RECORD;
    r         table_count%ROWTYPE;
BEGIN
    -- list the schemas
    FOR t_name IN
        SELECT DISTINCT schemaname
        FROM pg_catalog.pg_tables
        WHERE tablename = 'tUser'
        --AND schemaname NOT ILIKE 'something|_you|_liketofilter%' ESCAPE '|'
    LOOP
        -- run the query on each schema; the_count keeps the single returned row
        FOR the_count IN EXECUTE
            'SELECT COUNT(*) AS "count" FROM "' || t_name.schemaname || '"."tUser"'
        LOOP
        END LOOP;

        r.table_name := t_name.schemaname;
        r.num_rows   := the_count.count;
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM count_em_all();
SQL Server: Setup Mirroring Step by Step using TSQL
It is 2014 and people should be using AlwaysOn instead of mirroring. But some clients are still on SQL Server 2008 R2, and I am about to set up mirroring for one such customer in a couple of days, so posting these steps here won't hurt. There is a handy GUI wizard that lets you set up mirroring pretty easily, but if you have to mirror around 30 databases, using scripts is a lot easier.
Our setup is simple. SQL01 and SQL02 are our two servers - SQL01 is the principal and SQL02 is the mirror. We are not using a witness for this setup.
Steps on Mirror Server (SQL02)
1. On SQL01, take a full backup and also a transaction log backup of the database you would like to mirror. (So your database would have to be in FULL recovery mode, otherwise no mirroring for you!)
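The backup commands themselves are not shown here; a minimal sketch, with the file names assumed to match the restore script in step 2:

USE master;
-- full backup of the database to be mirrored
BACKUP DATABASE [MirrorMe] TO DISK = N'K:\Backups\MirrorMe_041214.bak' WITH INIT, STATS = 10;
-- transaction log backup, needed so the mirror can be brought close to the principal
BACKUP LOG [MirrorMe] TO DISK = N'K:\Backups\MirrorMe_log.trn' WITH INIT, STATS = 10;
GO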
2. Restore database on SQL02
USE master;
RESTORE DATABASE [MirrorMe]
FROM DISK = N'K:\Backups\MirrorMe_041214.bak'
WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10;

RESTORE LOG [MirrorMe]
FROM DISK = N'K:\Backups\MirrorMe_log.trn'
WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10;
GO
3. Create an endpoint for the mirror instance, SQL02. An endpoint is necessary for database mirroring/AlwaysOn to receive connections from other servers.
CREATE ENDPOINT [endpoint_SQL02]
AUTHORIZATION [BINYOGA\yogirk]
STATE = STARTED
AS TCP (LISTENER_PORT = 7024, LISTENER_IP = ALL)
FOR DATABASE_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE, ENCRYPTION = REQUIRED ALGORITHM RC4);
GO
4. Create a login to use for the mirroring setup and grant it CONNECT on the endpoint.
USE [master];
CREATE LOGIN [BINYOGA\iMirror] FROM WINDOWS WITH DEFAULT_DATABASE = [master];
GRANT CONNECT ON ENDPOINT::endpoint_SQL02 TO [BINYOGA\iMirror];
GO
5. You might want to create and maintain the same set of logins/users on both the principal and the mirror server. When you do a failover, you can simply fix orphaned users and your application can keep talking to the database. The query below scripts out the existing logins so you can recreate them on the other server.
SELECT 'create login [' + p.name + '] '
    + CASE WHEN p.type IN ('U', 'G') THEN 'from windows ' ELSE '' END
    + 'with '
    + CASE WHEN p.type = 'S'
           THEN 'password = ' + master.sys.fn_varbintohexstr(l.password_hash) + ' hashed, '
                + 'sid = ' + master.sys.fn_varbintohexstr(l.sid) + ', '
                + 'check_expiration = ' + CASE WHEN l.is_expiration_checked > 0 THEN 'ON, ' ELSE 'OFF, ' END
                + 'check_policy = ' + CASE WHEN l.is_policy_checked > 0 THEN 'ON, ' ELSE 'OFF, ' END
                + CASE WHEN l.credential_id > 0 THEN 'credential = ' + c.name + ', ' ELSE '' END
           ELSE '' END
    + 'default_database = ' + p.default_database_name
    + CASE WHEN LEN(p.default_language_name) > 0 THEN ', default_language = ' + p.default_language_name ELSE '' END
FROM sys.server_principals p
LEFT JOIN sys.sql_logins l ON p.principal_id = l.principal_id
LEFT JOIN sys.credentials c ON l.credential_id = c.credential_id
WHERE p.type IN ('S', 'U', 'G')
  AND p.name <> 'sa'
  --AND p.name IN ('myLogin1','myLogin2');
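If users do end up orphaned after a failover, here is a minimal sketch for re-mapping them on the new principal; the database and user names are placeholders:

USE MirrorMe;
GO
-- report database users whose SIDs no longer match a server login
EXEC sp_change_users_login 'Report';
-- re-map an orphaned database user to the server login of the same name
ALTER USER [myAppUser] WITH LOGIN = [myAppUser];
GO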
6. Add the databases to mirroring. You'll notice that the partner endpoint uses port 7023, which is yet to be created on our principal server, SQL01.
USE master;
GO
--assuming that you have already restored these databases and at least one transaction log backup WITH NORECOVERY
ALTER DATABASE MirrorMe01 SET PARTNER = 'TCP://10.10.2.10:7023';
ALTER DATABASE MirrorMe02 SET PARTNER = 'TCP://10.10.2.10:7023';
ALTER DATABASE MirrorMe03 SET PARTNER = 'TCP://10.10.2.10:7023';
ALTER DATABASE MirrorMe04 SET PARTNER = 'TCP://10.10.2.10:7023';
Steps on Principal Server (SQL01)
1. Now switch to the principal server, in this case SQL01, and create an endpoint.
--you can check existing endpoints, if any, using the queries below
SELECT name, role_desc, state_desc FROM sys.database_mirroring_endpoints;
SELECT name, port FROM sys.tcp_endpoints;

--create an endpoint
CREATE ENDPOINT [endpoint_SQL01]
AUTHORIZATION [BINYOGA\yogirk]
STATE = STARTED
AS TCP (LISTENER_PORT = 7023, LISTENER_IP = ALL)
FOR DATABASE_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE, ENCRYPTION = REQUIRED ALGORITHM RC4);
GO
2. Create the mirroring login and grant it CONNECT on the endpoint.
USE [master];
CREATE LOGIN [BINYOGA\iMirror] FROM WINDOWS WITH DEFAULT_DATABASE = [master];
GRANT CONNECT ON ENDPOINT::endpoint_SQL01 TO [BINYOGA\iMirror];
GO
3. Add databases to Mirroring
USE master;
ALTER DATABASE MirrorMe01 SET PARTNER = 'TCP://10.10.2.21:7024';
ALTER DATABASE MirrorMe02 SET PARTNER = 'TCP://10.10.2.21:7024';
ALTER DATABASE MirrorMe03 SET PARTNER = 'TCP://10.10.2.21:7024';
ALTER DATABASE MirrorMe04 SET PARTNER = 'TCP://10.10.2.21:7024';
GO
Done, you should now see the databases in mirrored state.
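A quick way to verify from either server is the database mirroring DMV:

SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_partner_instance
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;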
Saturday, April 5, 2014
Shell script: Analyze schema (update statistics) - PostgreSQL
ANALYZE in PostgreSQL is like sp_updatestats in SQL Server. ANALYZE gathers statistics about the number of rows in a table and stores them in the pg_statistic system catalog, which the query planner uses to create optimal query plans. Below is a shell script that updates statistics on a single schema. If you are on a single-database, multi-tenant model, this is pretty useful for updating the target tenant schema without analyzing the entire database.
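Each statement the script generates is just a plain ANALYZE; run manually against a single table it would look like this (the schema and table names are placeholders):

ANALYZE VERBOSE "tenant01"."tUser";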
Before executing this, you might want to add postgres user, password and host to your pgpass file, so that you can execute this shell script without being prompted for password.
vi /root/.pgpass

#Add details to the file in the format hostname:port:database:username:password
*:*:*:dbadmin:SysAdminPwd

#set file permissions
chmod 600 /root/.pgpass
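To confirm the password file is picked up, the following should connect without prompting (connection details match the defaults assumed by the script below):

psql -U dbadmin -p 7432 -d MyDatabase -c "SELECT 1;"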
Now the shell script. You can download the script from here.
#!/bin/bash
#
# Title      : analyze_schema.sh
# Description: Shell script to update statistics (ANALYZE) on all tables of a schema
#
# Options:
#   Schema name that requires a statistics update
#
# Requirements:
#   psql installed and its path added to the PATH system variable.
#
# Examples:
#   $HOME/analyze_schema.sh "MySchemaName"
#

SCHEMANAME="$1"
DBNAME="${2:-MyDatabase}"
DBUSER="${3:-dbadmin}"
DBPORT="${4:-7432}"
ANALYZESQL="analyze_schema.sql"
ANALYZELOG="analyze_schema.log"

test=$(psql -A -1 -t -U $DBUSER -d $DBNAME -p $DBPORT -c "select * from pg_namespace where nspname='$SCHEMANAME'")
if [[ "$test" == "" ]]; then
  echo "Schema \"$SCHEMANAME\" in db \"$DBNAME\" for dbuser \"$DBUSER\" and port \"$DBPORT\" does not exist"
  exit 1
fi

GENANALYZESQL="
select 'ANALYZE VERBOSE \"'||current_database()||'\".\"'||t.schemaname||'\".\"'||t.tablename||'\"; ' as analyze_statement
from pg_tables t
where schemaname = '$SCHEMANAME'
order by t.tablename;
"

echo "Started generating analyze script for schema \"$SCHEMANAME\" at $(date +'%Y-%m-%d %T')"
psql -A -q -1 -t -p $DBPORT -U $DBUSER -d $DBNAME -c "$GENANALYZESQL" > "$ANALYZESQL" 2>"$ANALYZELOG"
if ! [[ "$(cat $ANALYZELOG)" == "" ]]; then
  echo "Generation of analyze script for schema \"$SCHEMANAME\" in db \"$DBNAME\" for dbuser \"$DBUSER\" and port \"$DBPORT\" had error(s):"
  cat "$ANALYZELOG"
  exit 1
fi

echo "Analyzing schema \"$SCHEMANAME\" started at $(date +'%Y-%m-%d %T')"
psql -A -q -1 -t -p $DBPORT -U $DBUSER -d $DBNAME -f "$ANALYZESQL" 2>&1 | /usr/sbin/prepend_datetime.sh >> "$ANALYZELOG"
echo "Analyzing schema \"$SCHEMANAME\" ended at $(date +'%Y-%m-%d %T')"
echo "Analyzing schema sql script is in file $ANALYZESQL. Analyze log is below:"
cat "$ANALYZELOG"
Tuesday, April 1, 2014
Delete files older than x days - Python, bash etc
Our Postgres logs directory had logs from almost 6 months, and we only needed 5 days' worth of data. It's pretty straightforward to remove the old files using bash:
find /db/logs/archive_data/* -type f -mtime +5 | xargs rm
But this sometimes fails with an "argument list too long" error, because the shell expands the glob to more file names than it can pass to a single command.
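One way to avoid the shell expanding the glob at all is to let find handle the deletion itself (this assumes GNU find, which supports -delete):

find /db/logs/archive_data/ -type f -mtime +5 -delete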
Here is another way in Python:
Example usage: Save this as cleanup.py and run - python cleanup.py /db/data 5
from sys import argv
import os, time

script, dir, age = argv

print "Searching directory %s for files older than %s day(s)" % (str(argv[1]), str(argv[2]))

# convert age to seconds. 1 day = 24*60*60
age = int(age) * 86400

for file in os.listdir(dir):
    now = time.time()
    filepath = os.path.join(dir, file)
    modified = os.stat(filepath).st_mtime
    if modified < now - age:
        if os.path.isfile(filepath):
            os.remove(filepath)
            print 'Deleted: %s (%s)' % (file, modified)