Transactions and concurrency
I just had a head-scratching, hair-pulling, what-the-hell-is-going-on bug. Basically, a classic producer / consumer issue. Each work item is composed of several sub-items, which should always be processed as a single unit (exactly one time). The problem was that I suddenly started to get duplicate processing of the same work item, but with different sub-items each time.
I went over the code with a fine-tooth comb and couldn't see anything wrong. I investigated the database, and everything was fine there as well. I knew I was in trouble when I considered going down to the SQL Server protocol level to check whether network problems were somehow causing the issue.
Here is a simplified version of the producer (and yes, I would never write this kind of code for production, test, or anything but a short and to-the-point demo). As you can see, it merely inserts batches of 500 records into a table.
private static void Producer()
{
    SqlConnection connection = new SqlConnection(connectionString);
    int count = 0;
    while (true)
    {
        connection.Open();
        SqlTransaction sqlTransaction = connection.BeginTransaction(isolationLevel);
        // insert a batch of 500 rows, all sharing the same count, in a single transaction
        for (int i = 0; i < 500; i++)
        {
            SqlCommand sqlCommand = connection.CreateCommand();
            sqlCommand.Transaction = sqlTransaction;
            sqlCommand.CommandText = "INSERT INTO t (Id) VALUES(@p1)";
            sqlCommand.Parameters.AddWithValue("@p1", count);
            sqlCommand.ExecuteNonQuery();
            sqlCommand.Dispose();
        }
        sqlTransaction.Commit();
        Console.WriteLine("Wrote 500 records with count " + count);
        count += 1;
        connection.Close();
    }
}
And here is the consumer, which reads from the table and ensures that it always sees complete batches of 500:
private static void Consumer()
{
    SqlConnection connection = new SqlConnection(connectionString);
    while (true)
    {
        connection.Open();
        SqlTransaction sqlTransaction = connection.BeginTransaction(isolationLevel);
        SqlCommand sqlCommand = connection.CreateCommand();
        sqlCommand.Transaction = sqlTransaction;
        sqlCommand.CommandText = "SELECT COUNT(*) FROM t GROUP BY id";
        SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
        if (sqlDataReader.RecordsAffected != -1)
            Console.WriteLine("Read: {0}", sqlDataReader.RecordsAffected);
        while (sqlDataReader.Read())
        {
            int count = sqlDataReader.GetInt32(0);
            Console.WriteLine("Count = {0}", count);
            // every id was inserted exactly 500 times in a single transaction,
            // so anything other than 500 means we saw a partial batch
            if (count != 500)
                Environment.Exit(1);
        }
        sqlDataReader.Dispose();
        sqlCommand.Dispose();
        sqlTransaction.Commit();
        connection.Close();
    }
}
Note that I have highlighted the isolationLevel in red in both. Here is the code that runs both methods:
private static void Main()
{
    Delete();
    new Thread(Producer).Start();
    new Thread(Consumer).Start();
}
If I set isolationLevel to ReadCommitted or RepeatableRead, this consistently fails within a couple of seconds: the consumer manages to do a partial read of the records that the producer inserted. If I set the isolationLevel to Serializable or Snapshot, it behaves as expected.
I may be missing something, but I would expect ReadCommitted to only allow the second transaction to read... well, committed rows. Just to point out, the real scenario doesn't involve aggregation, and I am seeing the same issue there. I suppose that the new records were committed (and thus made visible) after the query had already scanned that part of the table, and thus it missed some of the rows, while Snapshot and Serializable force either a static image all the way through or waiting until the end.
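As an aside, Snapshot isolation is off by default in SQL Server 2005 and has to be enabled at the database level before IsolationLevel.Snapshot can be requested. A quick sketch of doing that (the helper name and the database name are placeholders):
private static void EnableSnapshotIsolation()
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        SqlCommand sqlCommand = connection.CreateCommand();
        // 'MyDb' is a placeholder for the actual database name
        sqlCommand.CommandText = "ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON";
        sqlCommand.ExecuteNonQuery();
    }
}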
Any comments?
Comments
Oren, what is the actual failure?
Just not sure what you mean by 'fails in a couple of seconds'.
Cheers,
Stuart
Deadlock probably?
@Stuart,
I mean that it fails by giving a result that is not 500.
This means that the consumer is able to see only a partial result from the other transaction.
@Andrew,
There is no deadlock.
It is just that the second transaction is able to see partial results from the first transaction.
Read this article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/16/1345.aspx
I think this has your answer.
Cheers,
Stu
Interesting issue. Replicating your code, I get the same results, and the behavior seems clearly wrong. From subjective testing it appears to be some kind of race around the row/page locks being acquired and released between the two, as you can make it work cleanly by changing the Consumer statement to:
SELECT COUNT(*) FROM t WITH (TABLOCK) GROUP BY id
While it seems that technically the transaction is working correctly, this is a bit of a trap for the unwary regarding the consistency of the read results.
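Applied to the consumer above, that would mean something along these lines (just a sketch; TABLOCK takes a shared lock on the whole table for the duration of the statement, so the scan cannot interleave with a committing batch):
SqlCommand sqlCommand = connection.CreateCommand();
sqlCommand.Transaction = sqlTransaction;
// the TABLOCK hint makes the scan take a shared table lock, which blocks
// concurrent inserts until the statement completes
sqlCommand.CommandText = "SELECT COUNT(*) FROM t WITH (TABLOCK) GROUP BY id";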
@Stuart,
That was what I was thinking, but this is plain ugly.
I expected that either all the new rows or none of the new rows would be visible to the transaction.
This makes me wonder what is the point of read committed in this scenario.
@Jeremy,
I understand what is going on; it is basically scanning the table and finding newly committed rows, but skipping some of the rows committed in the same transaction.
It looks like exactly the wrong thing to do as far as I am concerned, certainly a very nasty trap.
I guess that is why Oracle devs are laughing at SQL Server, and why there is the new Snapshot level.
I haven't tried this yet, but as far as I remember, neither the Read Committed nor the Repeatable Read isolation level will prevent you from reading phantoms, which might be part of the issue.
However, I'm also not clear as to why the consumer is expecting to get a count of 500... since the select query groups by the ID. To be consistent, shouldn't the consumer also loop asking for the count of records for a specific id?
@Tomas,
This isn't a case of phantom rows, at least not as I understand them. There isn't any deletion done at any point in time.
The producer inserts 500 rows per transaction, the consumer reads the counts per id.
If there were true isolation, it would always see 500 records per id; the issue is that it manages to see a partial result and gets a count that is different from 500.
@Ayende: As I understand them, phantoms apply to both added as well as deleted rows. But I'll be the first one to say I'm not sure if that's the case here!
Does anybody actually know how the isolation levels work? Does the guy who wrote it even know?
@El,
Yes, they do.
That doesn't mean that it is easy to figure out.
Like most concurrent stuff, this is hard.
@Ayende,
Whilst I'm probably stating the obvious, it is interesting that you can change the semantics of the select using table hints (HOLDLOCK), so that whilst the INSERT is using ReadCommitted, the SELECT can use the consistency of the SERIALIZABLE locking semantics.
I did try this, and it never failed.
Also, the smaller the size of the insert transaction, the less frequently it fails.
Cheers,
Stuart
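For reference, the HOLDLOCK variant applied to the consumer query would look something like this (again, just a sketch):
// HOLDLOCK gives this statement serializable locking semantics,
// even though the surrounding transaction is only ReadCommitted
sqlCommand.CommandText = "SELECT COUNT(*) FROM t WITH (HOLDLOCK) GROUP BY id";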
SQL Server 2005 currently supports 6 different isolation level behaviours, and only 2 of them (Snapshot and Serializable) prevent phantom reads. About your code, are you sure the actual queries don't have any WHERE clause?
They definitely have a where clause.
I thought that phantom reads were for records that were deleted, not partially created.
What this shows is that some of the information is made available to the second transaction, but not all of it.
I took the following from the definition of the Phantom phenomenon in the SQL standard (actually I took it from SQL-99, but I recall it as being more or less the same in SQL-92):
P3 ("Phantom"): SQL-transaction T1 reads the set of rows N that satisfy some <search condition>.
SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.
I don't have an issue with that; I have an issue with getting a partial view of the results of T2.
Incidentally, while looking for the SQL-92 standard, I found this very interesting paper from Microsoft Research:
ftp://ftp.research.microsoft.com/pub/tr/tr-95-51.pdf
It is the theoretical foundation for the Snapshot isolation level, the feature introduced in SQL Server 2005. Although they probably didn't invent it, it is quite amazing to discover that the time to market from their R&D awareness to the moment it was released in their flagship database product was more or less 10 years.
I just wrote a post about this issue:
http://ale.riolo.co.uk/2007/04/transaction-and-concurrency-on-sql.html
I found out that when using a table with a primary key (I used a surrogate key through an identity column), everything works as expected.
I have run into this issue on production, in a table that most definitely had a PK (identity).
An identity PK (clustered) basically sorts the table, so the scan would tend to be linear. The code I had in production used a WHERE clause that filtered by several columns, thus probably producing a non-linear scan.
Ayende,
Have you tried to run your code with the READ_COMMITTED_SNAPSHOT db option set to ON? This keeps a row version for rows that are affected by the statement in the transaction, and will present those versions of the rows instead of doing locking. I would imagine that this setting (which is OFF by default) would create consistent reads/writes while not impacting concurrency too much.
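For what it's worth, that option is a database-level setting; turning it on would look something like this (the database name is a placeholder, and WITH ROLLBACK IMMEDIATE is there because the option can't be changed while other connections to the database are active):
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    SqlCommand sqlCommand = connection.CreateCommand();
    // 'MyDb' is a placeholder; ROLLBACK IMMEDIATE drops other open
    // connections so the option can be switched
    sqlCommand.CommandText =
        "ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE";
    sqlCommand.ExecuteNonQuery();
}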
Is it the same as SNAPSHOT isolation?
Semantically, they're the same. From what I understand, the SNAPSHOT isolation level is effective at the transaction level (i.e. the snapshot is created when the transaction starts), whereas READ COMMITTED with the snapshot option is at the statement level (i.e. the snapshot is created before the statement is executed).