Transactions and concurrency

I just had a head-scratching, hair-pulling, what-the-hell-is-going-on bug. It is basically a classic producer / consumer issue. Each work item is composed of several sub-items, which should always be processed as a single unit (exactly once). The problem was that I suddenly started to get duplicate processing of the same work item, but with different sub-items each time.

I went over the code with a fine-tooth comb and I couldn't see anything wrong. I investigated the database, and everything was fine there as well. I knew that I was in trouble when I considered going down to the SQL Server protocol level to check whether network problems were somehow causing this issue.

Here is a simplified version of the producer (and yes, I would never write this kind of code for production, test, or anything but a short and to the point demo). As you can see, it merely generates batches of 500 records into a table.

private static void Producer()
{
    SqlConnection connection = new SqlConnection(connectionString);
    int count = 0;
    while (true)
    {
        connection.Open();
        SqlTransaction sqlTransaction = connection.BeginTransaction(isolationLevel);
        for (int i = 0; i < 500; i++)
        {
            SqlCommand sqlCommand = connection.CreateCommand();
            sqlCommand.Transaction = sqlTransaction;
            sqlCommand.CommandText = "INSERT INTO t (Id) VALUES(@p1)";
            sqlCommand.Parameters.AddWithValue("@p1", count);
            sqlCommand.ExecuteNonQuery();
            sqlCommand.Dispose();
        }
        sqlTransaction.Commit();
        Console.WriteLine("Wrote 500 records with count " + count);
        count += 1;
        connection.Close();
    }
}

And here is the consumer, which reads from the table and verifies that each batch of 500 rows shows up as a complete unit:

private static void Consumer()
{
    SqlConnection connection = new SqlConnection(connectionString);
    while (true)
    {
        connection.Open();
        SqlTransaction sqlTransaction = connection.BeginTransaction(isolationLevel);
        SqlCommand sqlCommand = connection.CreateCommand();
        sqlCommand.Transaction = sqlTransaction;
        sqlCommand.CommandText = "SELECT COUNT(*) FROM t GROUP BY Id";
        SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
        if (sqlDataReader.RecordsAffected != -1)
            Console.WriteLine("Read: {0}", sqlDataReader.RecordsAffected);
        while (sqlDataReader.Read())
        {
            int count = sqlDataReader.GetInt32(0);
            Console.WriteLine("Count = {0}", count);
            if (count != 500)
                Environment.Exit(1);
        }
        sqlDataReader.Dispose();
        sqlCommand.Dispose();
        sqlTransaction.Commit();
        connection.Close();
    }
}

Note the isolationLevel that both methods pass to BeginTransaction; that is the interesting part. Here is the code that runs both methods:

private static void Main()
{
    Delete();
    new Thread(Producer).Start();
    new Thread(Consumer).Start();
}
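
For completeness, the snippets reference a few things that aren't shown. Here is roughly what I am running with; this is a minimal sketch, and the connection string and table schema here are illustrative, not part of the repro itself:

// using System.Data;
// using System.Data.SqlClient;
// using System.Threading;

private static readonly string connectionString =
    "Data Source=.;Initial Catalog=test;Integrated Security=SSPI";

private static readonly IsolationLevel isolationLevel =
    IsolationLevel.ReadCommitted; // the knob being tested

// Resets the table between runs; the table itself is just:
//     CREATE TABLE t (Id INT NOT NULL)
private static void Delete()
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (SqlCommand sqlCommand = connection.CreateCommand())
        {
            sqlCommand.CommandText = "DELETE FROM t";
            sqlCommand.ExecuteNonQuery();
        }
    }
}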

If I set isolationLevel to ReadCommitted or RepeatableRead, this consistently fails within a couple of seconds: the consumer manages to do a partial read of the records that were inserted by the producer. If I set isolationLevel to Serializable or Snapshot, it behaves as expected.
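
As an aside, if you want to run the Snapshot variant yourself, note that snapshot isolation is off by default in SQL Server and has to be enabled at the database level first; something like this, assuming the database is named test:

// Snapshot isolation is off by default; enable it once per database,
// e.g. for a database named "test":
//     ALTER DATABASE test SET ALLOW_SNAPSHOT_ISOLATION ON
SqlTransaction sqlTransaction =
    connection.BeginTransaction(IsolationLevel.Snapshot);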

I may be missing something, but I would expect ReadCommitted to only allow the second transaction to read... well, committed rows. Just to point out, the real scenario doesn't involve aggregation, and I am seeing the same issue there. I suppose that the new records were committed (and thus made visible) after the query had already scanned that part of the table, so it missed some of the rows, while Snapshot and Serializable force either a static image of the data all the way through, or waiting until the writer is done.
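
If that guess is right (READ COMMITTED only promises that each row was committed at the moment it was read, not that the whole result set reflects a single point in time), then making the reader hold its locks for the duration of the transaction should also fix it, without changing the isolation level. Something like this table hint, though I haven't tried it against this exact repro:

// HOLDLOCK keeps the shared (range) locks until the end of the
// transaction, i.e. serializable semantics for this one table:
sqlCommand.CommandText = "SELECT COUNT(*) FROM t WITH (HOLDLOCK) GROUP BY Id";

Alternatively, turning on READ_COMMITTED_SNAPSHOT at the database level makes READ COMMITTED use row versioning, which gives each statement a consistent view of the data.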

Any comments?