Piping, Primary Keys and Three Hours I'll not get back

time to read 5 min | 826 words

Okay, so I mentioned that I'd a query that violated a primary key constraint, and I couldn't figure out why. Well, I finally got a hold of what was going on. It's simple and non-obvious at the same time. Let's start with a code example that shows the problem and then I'll put in a couple of words about the reasons behind it. First, let's create a table with a unique constraint (easier than a primary key):

CREATE TABLE Foo

(

      Bar INT UNIQUE

)

Now, let's insert some data into the table:

INSERT INTO Foo

SELECT 1 UNION ALL

SELECT 2 UNION ALL

SELECT 3

So far, everything is nice and cozy, but now I want to add more numbers, so I want to make sure that I won't break the code, so now I'll add a guard clause to the next inserts:

INSERT INTO Foo

SELECT Kabum FROM

(

      SELECT 4 as Kabum UNION ALL

      SELECT 5 as Kabum UNION ALL

      SELECT 5 as Kabum

) Bang

WHERE NOT EXISTS

      (SELECT 1 FROM Foo where Bang.Kabum = Bar)

Running the above result in this error:

Violation of UNIQUE KEY constraint 'UQ__Foo__5832119F'. Cannot insert duplicate key in object 'dbo.Foo'.

Do you realize what happened here? We failed because when we checked it was just fine to put 5 into Foo, but when we actually did put the second 5 into Foo, that caused a constraint violation. Go figure that when you're dealing with sixteen million records.

The solution to this problem is to add a distinct to the select statement, that would make sure that your own query isn't causing the very same query to fail.

On that note, I wonder if I can call it a concurrency bug.