Binary Yoga: September 2012

Tuesday, September 18, 2012

Create a database user if it doesn't already exist

Similar to OBJECT_ID function which can be used to check whether it already exists before creating/dropping, we can use SUSER_ID function check whether the database user exists before creating. This can be handy when you have a user rebuild script and it often errors out when the script tries to add a user that is already in the database

IF SUSER_ID('Cherry') IS NULL
 CREATE USER [Cherry] FOR LOGIN [Cherry]
ELSE Print 'Already exists'

Sunday, September 2, 2012

Delete duplicate rows using CTE, Surrogate Key, and SELECT DISTINCT

Once again, this is from one of the chapters from the book SQL Server 2008 Bible. I have used other sources like MSDN, Stackoverflow and Google in general in writing this post :)

Removing Duplicates

Lets create a table and populate it with some values

USE AdventureWorks;
GO 

SELECT TOP 100 * INTO dba_PersonCopy FROM Person.Person
GO

--Lets see the data

SELECT * FROM dba_PersonCopy
GO

The Original table has Primary Key on Business_Entity_Id, but the table we generated using SELECT INTO will not inherit primary keys, indexes etc. Since we would like to deal with duplicates on a table that doesn't have a primary key this suits our purpose.So lets go ahead and add insert top 50 records a couple of times so that we have 200 rows, 100 of them duplicate and waiting to be removed. I added 100 records 50 at a time by using GO [Count] syntax. Check it out, its cool.

INSERT INTO dba_PersonCopy 
 SELECT TOP 50 * FROM Person.Person
GO 2

Lets check if the table indeed has duplicates

SELECT BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 AdditionalContactInfo,
 Demographics,
 rowguid,
 ModifiedDate,
 COUNT (*) AS DuplicateCount
FROM dba_PersonCopy
 GROUP BY BusinessEntityID,
  Persontype,
  NameStyle,
  Title,
  FirstName,
  MiddleName,
  LastName,
  Suffix,
  EmailPromotion,
  AdditionalContactInfo,
  Demographics,
  rowguid,
  ModifiedDate
 HAVING COUNT(*) >1;

At this point I faced an error.

Error: Msg 305, Level 16, State 1, Line 25
The XML data type cannot be compared or sorted, except when using the IS NULL operator.

It seems the demographics column and AdditionalContactInfo are the offending ones. While I need to study more on this problem, since its out of topic, I will just get rid of the columns

ALTER TABLE dba_PersonCopy DROP COLUMN Demographics;
ALTER TABLE dba_PersonCopy DROP COLUMN AdditionalContactInfo;
GO

Modified query to check duplicates

SELECT BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 rowguid,
 ModifiedDate,
 COUNT (*) AS DuplicateCount
FROM dba_PersonCopy
 GROUP BY BusinessEntityID,
  Persontype,
  NameStyle,
  Title,
  FirstName,
  MiddleName,
  LastName,
  Suffix,
  EmailPromotion,
  rowguid,
  ModifiedDate
 HAVING COUNT(*) >1;

Take a look at the DuplicateCount column and you'll see duplicates. So how do we go about eliminating them?

1. Removing duplicates by Over() clause with a Row_number() function and a partition. Details about ROW_NUMBER() Function here

SELECT BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 rowguid,
 ModifiedDate,
 ROW_NUMBER() 
OVER 
(
 PARTITION BY
 BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 rowguid,
 ModifiedDate
 ORDER BY BusinessEntityID
) AS RowNumber
FROM dba_PersonCopy;

When you execute this, you'll see duplicates will have the RowNumber column greater than 1. We now need a CTE (Common table expression) to delete the duplicate rows by using a WHERE clause

 
BEGIN TRAN

;WITH DupesCTE 
AS
(SELECT BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 rowguid,
 ModifiedDate,
 ROW_NUMBER() 
OVER 
(
 PARTITION BY
 BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 rowguid,
 ModifiedDate
 ORDER BY BusinessEntityID
) AS RowNumber
FROM dba_PersonCopy
)
DELETE DupesCTE WHERE RowNumber >1
--ROLLBACK
-- 100 rows affected, duplicates removed!

Btw, I struggled for about half an hour with syntax issues here. It said there is a syntax error at the keyword 'AS'. The problem is, I missed a ";" at the beginnig! Thanks to the answer on this thread or I'd be stuck forever.

2. Next method is to delete duplicates by using a Surrogate Key. We'll create a Key column to uniquely identify each column since such a column doesn't exist on the table already

ALTER TABLE dba_PersonCopy 
 ADD TheKey INT IDENTITY NOT NULL
 CONSTRAINT PK_PersonCopyDuplicates PRIMARY KEY;
SELECT * FROM dba_PersonCopy

Now we issue a DELETE command to get rid of the duplicates

BEGIN TRAN
DELETE dba_PersonCopy
 WHERE EXISTS (
  SELECT * FROM dba_PersonCopy AS T1
   WHERE T1.BusinessEntityID = [dba_PersonCopy].BusinessEntityID
   AND T1.PersonType =[dba_PersonCopy].Persontype
   AND T1.NameStyle = [dba_PersonCopy].NameStyle
   AND T1.Title = [dba_PersonCopy].NameStyle
   AND T1.FirstName = [dba_PersonCopy].FirstName
   AND T1.LastName = [dba_PersonCopy].LastName
   AND T1.MiddleName = [dba_PersonCopy].MiddleName
   AND T1.Suffix = [dba_PersonCopy].Suffix
   AND T1.EmailPromotion = [dba_PersonCopy].EmailPromotion
   AND T1.rowguid = [dba_PersonCopy].rowguid
   AND T1.ModifiedDate = [dba_PersonCopy].ModifiedDate
   AND T1.TheKey > [dba_PersonCopy].TheKey)
--ROLLBACK

I'll just Drop the primary key and the column to test next example

ALTER TABLE dba_PersonCopy DROP CONSTRAINT PK_PersonCopyDuplicates;
ALTER TABLE dba_PersonCopy DROP COLUMN TheKey;

3. Deleting using SELECT DISTANT

SELECT DISTINCT 
 BusinessEntityID,
 Persontype,
 NameStyle,
 Title,
 FirstName,
 MiddleName,
 LastName,
 Suffix,
 EmailPromotion,
 rowguid,
 ModifiedDate INTO dba_PersonCopy_Noduplicates 
FROM dba_PersonCopy;

SELECT COUNT(*) FROM dba_PersonCopy_NoDuplicates
-- 100 rows remaining

I like the CTE option the best of all these methods

Return data from an UPDATE and DELETE -- OUTPUT clause

It is possible to view the changes you've made using the OUTPUT clause while you update a table. Lets test that. I created a copy of Person.Person table from AdventureWorks to test this with top 100 rows.

--Lets delete something 
USE AdventureWorks;
GO

BEGIN TRAN
DELETE TOP (10)  dba_PersonCopy 
OUTPUT deleted.* 

--ROLLBACK

You'll see The deleted Rows.

--Lets Do an update and see what was updated and what was it updated to

UPDATE dba_PersonCopy 
 SET PersonType = 'NT'
 OUTPUT deleted.Persontype AS OldPersonType, inserted.PersonType AS UpdatePersonType
WHERE rowguid = '92C4279F-1207-48A3-8448-4636514EB7E2'

More info at BOL

Saturday, September 1, 2012

Insert results of a stored procedures into a temporary table

I found many people asking similar questions on stackexhange and other places. Of all the methods, I liked this answer from stackoverflow which uses OPENROWSET. Below is the method (I didn't write this, just saving it here for my future reference. Full credit goes to the original posted on StackOverflow) You first need to enable adhoc distributed queries using sp_configure in order to use openrowset.

EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'Ad Hoc Distributed Queries', 1
RECONFIGURE
GO


-- Then use the openrowset query to insert into temp table

SELECT * INTO #MyTempTable FROM OPENROWSET('SQLNCLI', 'Server=(local)\BINYOGA;Trusted_Connection=yes;',
     'EXEC getBusinessLineHistory')

SELECT * FROM #MyTempTable

I was reading the SQL Server Bible yesterday and learned that we can insert the results of a stored procedure into a table using the below syntax


INSERT INTO dbo.MyTable[(columns)]
  EXEC StoredProcedure Parameters;

So when I tried to test that, there was an error -

So, it looks like you are not allowed to use a temptable there. You need to create the table first (it can be a temp table) so that it has the right number of columns as the output.

-- Lets create a simple stored procedure
CREATE PROCEDURE dba_databaselist
AS
SELECT name FROM sysdatabases

--Create the temp table
CREATE TABLE #MyTempTable
(
 name varchar(30),
 
)

--insert from stored procedure

INSERT #myTempTable
 EXEC dba_databaselist

-- verify

SELECT * FROM #myTempTable

UPDATE: Here is another way to do it. Declare a table variable and insert the results of stored procedure to the table.

 

-- Lets create a stored procedure. I am gonna use AdventureWorks database
USE AdventureWorks
GO
CREATE PROC dba_Person
AS
SET NOCOUNT ON;
SELECT FirstName, Lastname 
FROM Person.Person;
--another result set
SELECT FirstName, Lastname
FROM Family.dbo.Person
RETURN;

--Lets execute the stored procedure

Exec AdventureWorks.dbo.dba_Person

-- You'll see two result sets, now lets declare a table variable and insert the data from result sets

USE AdventureWorks;
GO
DECLARE @MyTableVar table(
    FirstName varchar(30),
 LastName varchar(30)
    );

-- Now, INSERT/EXECUTE command
INSERT @MyTableVar
 EXEC dbo.dba_Person
-- Verify
SELECT * FROM @MyTableVar

NOTE: I have emulated examples from SQL Server Bible 2008 book in this post.