Optimization Story
I just got out of struggling with a very long query. If you missed it, I'm working on SQL quite a bit lately, and I keep discovering new things about it. Here is a query similar to the one I was working on, the idea is to update the Summary table with new information about actions that haven't been summed yet. For instance, a transaction that occured three days ago, but only now entered the database. This query took over 20 minutes to run before I killed it. (Sorry for the rather complex query, I'm afraid it doesn't really translate itself with simpler ones).
INSERT INTO LogSummary
(
PartitionId,
SummaryDate,
ActionId,
SuccessCount,
FailCount
)
SELECT
dbo.LogSummaryPartitionId(td.TradeDate),
td.TradeDate,
la.ActionId,
SUM(case Result when 'V' then 1 else 0 end) as SuccessCount,
SUM(case Result when 'X' then 1 else 0 end) as FailCount
FROM LogActions la join (SELECT TradeDate FROM TradeDates where Status = 1) td
ON la.Date = td.TradeDate
WHERE NOT EXISTS (SELECT 1 FROM LogSummary ls
WHERE ls.PartitonId = dbo.LogSummaryPartitionId(td.TradeDate)
AND ls.SummaryDate = td.TradeDate
AND ls.ActionId = la.ActionID)
GROUP BY ls.ActionId, td.TradeDate
I know that this query doesn't make much sense, by the way, it is the best I could come up with for the purpose of discussion.
If we look at the query we can see that we have a call to LogSummaryPartitionId() function and a subquery make sure we are not inserting duplicate rows (which also call to LogSummaryPartitionId() ). When you consider that the first LogSummaryPartitionId() call and the sub query needed to run per row, and that the second LogSummaryPartitionId() call run per each row in both queries, I think you can realize why it took such a long time to run.
Considerring that the information from LogSummaryPartitionId() doesn't change that often, I decided to move it outside the query, so it would be calculated once, instead of per row. It ended up looking like this:
SELECT dbo.LogSummaryPartitionId(TradeDate) as PartitionId, TradeDate
INTO #TradeDateWithPartitionId
FROM TradeDates WHERE Status = 1
INSERT INTO LogSummary
(
PartitionId,
SummaryDate,
ActionId,
SuccessCount,
FailCount
)
SELECT
td.PartitonId,
td.TradeDate,
la.ActionId,
SUM(case Result when 'V' then 1 else 0 end) as SuccessCount,
SUM(case Result when 'X' then 1 else 0 end) as FailCount
FROM LogActions la join #TradeDateWithPartitionId td
ON la.Date = td.TradeDate
WHERE NOT EXISTS (SELECT 1 FROM LogSummary ls
WHERE ls.PartitonId = td.PartitonId
AND ls.SummaryDate = td.TradeDate
AND ls.ActionId = la.ActionID)
GROUP BY ls.ActionId, td.TradeDate
DROP TABEL #TradeDateWithPartitionId
This query run in twenty seconds. Now that I call impressive.
Comments
Comment preview