Wednesday, June 5, 2013

Manipulating Excel data with Python: Building SQL statements from data in excel

I got an excel doc with a couple of sheets from our dev team. Each sheet has two columns from one of our database tables. Lets call these columns CurrentID, LegacyID. I need to update the LegacyID based on CurrentID. Each sheet has more than 2000 values and it is pointless trying to build the SQL statements manually. Its pretty easy to get this done using TSQL. Just import this data into a temporary table and build update statements using TSQL. Since I have been trying to learn Python, I thought I'd give it a try. I used xlrd module. Check the documentation here. Below code writes the update statements to a text file from sheet1.

import xlrd

book = xlrd.open_workbook('D:\Scripts\UpdateID01.xls')
sheet = book.sheet_by_index(0)

myList = []
for i in range(1,sheet.nrows):
   myList.append(sheet.row_values(i))

#print myList
outFile = open('D:\Scripts\update.txt', 'wb')

for i in myList:
    outFile.write("\nUPDATE MyTable SET LegacyID = '%s' WHERE CurrentID = '%s'" %( int(i[1]), str(i[0])))

The second sheet had white spaces at the end of CurrentID values and the update statements will fail if they are not removed. To deal with it, use str.replace method.

import xlrd

book = xlrd.open_workbook('D:\Scripts\UpdateID01.xls')
sheet = book.sheet_by_index(1)

myList = []
for i in range(1,sheet.nrows):
   myList.append(sheet.row_values(i))

#print myList
outFile = open('D:\Scripts\updatemyesi.txt', 'wb')

for i in myList:
    i[0] = str(i[0]).replace('  ', '')
    outFile.write("\nUPDATE MyTable SET LegacyID = 0 WHERE CurrentID = '%s'" %(str(i[0])))