Optimization Story

time to read 11 min | 2058 words

I just got out of struggling with a very long query. If you missed it, I'm working on SQL quite a bit lately, and I keep discovering new things about it. Here is a query similar to the one I was working on, the idea is to update the Summary table with new information about actions that haven't been summed yet. For instance, a transaction that occured three days ago, but only now entered the database. This query took over 20 minutes to run before I killed it. (Sorry for the rather complex query, I'm afraid it doesn't really translate itself with simpler ones).

INSERT INTO LogSummary

(

      PartitionId,

      SummaryDate,

      ActionId,  

      SuccessCount,

      FailCount        

)

SELECT

      dbo.LogSummaryPartitionId(td.TradeDate),

      td.TradeDate,

      la.ActionId,

      SUM(case Result when 'V' then 1 else 0 end) as SuccessCount,

      SUM(case Result when 'X' then 1 else 0 end) as FailCount

FROM LogActions la join (SELECT TradeDate FROM TradeDates where Status = 1) td

ON    la.Date = td.TradeDate

WHERE NOT EXISTS (SELECT 1 FROM LogSummary ls

      WHERE ls.PartitonId = dbo.LogSummaryPartitionId(td.TradeDate)

      AND   ls.SummaryDate = td.TradeDate

      AND ls.ActionId = la.ActionID)

GROUP BY ls.ActionId, td.TradeDate

 

I know that this query doesn't make much sense, by the way, it is the best I could come up with for the purpose of discussion.

If we look at the query we can see that we have a call to LogSummaryPartitionId() function and a subquery make sure we are not inserting duplicate rows (which also call to LogSummaryPartitionId() ). When you consider that the first LogSummaryPartitionId() call and the sub query needed to run per row, and that the second LogSummaryPartitionId() call run per each row in both queries, I think you can realize why it took such a long time to run.

Considerring that the information from LogSummaryPartitionId() doesn't change that often, I decided to move it outside the query, so it would be calculated once, instead of per row. It ended up looking like this:

SELECT dbo.LogSummaryPartitionId(TradeDate) as PartitionId, TradeDate

INTO #TradeDateWithPartitionId

FROM TradeDates WHERE Status = 1

 

INSERT INTO LogSummary

(

      PartitionId,

      SummaryDate,

      ActionId,  

      SuccessCount,

      FailCount        

)

SELECT

      td.PartitonId,

      td.TradeDate,

      la.ActionId,

      SUM(case Result when 'V' then 1 else 0 end) as SuccessCount,

      SUM(case Result when 'X' then 1 else 0 end) as FailCount

FROM LogActions la join #TradeDateWithPartitionId td

ON    la.Date = td.TradeDate

WHERE NOT EXISTS (SELECT 1 FROM LogSummary ls

      WHERE ls.PartitonId = td.PartitonId

      AND   ls.SummaryDate = td.TradeDate

      AND ls.ActionId = la.ActionID)

GROUP BY ls.ActionId, td.TradeDate

 

DROP TABEL #TradeDateWithPartitionId

This query run in twenty seconds. Now that I call impressive.