The importance of comments

time to read 22 min | 4333 words

A while ago I got a comment to a post of mine “if you [ in here the commenter is talking to another commenter ] think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.

Let us look at the following piece of code.

   1: private long GetNewLength(long current)
   2: {
   3:     DateTime now = DateTime.UtcNow;
   4:     TimeSpan timeSinceLastIncrease = (now - _lastIncrease);
   5:     if (timeSinceLastIncrease.TotalSeconds < 30)
   6:     {
   7:         _increaseSize = Math.Min(_increaseSize * 2, MaxIncreaseSize);
   8:     }
   9:     else if (timeSinceLastIncrease.TotalMinutes > 2)
  10:     {
  11:         _increaseSize = Math.Max(MinIncreaseSize, _increaseSize / 2);
  12:     }
  13:     _lastIncrease = DateTime.UtcNow;
  14:     current = Math.Max(current, 256 * PageSize);
  15:     var actualIncrease = Math.Min(_increaseSize, current / 4);
  16:     return current + actualIncrease;
  17: }

This piece of code is responsible for decided the new file size when we run out of room in the current space we use.

That is 10 lines of code. And what happens is quite clear, even if I say so myself. But the why for this code is lost. In particular, there is a lot of reasoning behind the actual logic here. You can see that we play with the actual increase size, to make sure that if we increase often, we will reserve more space from the OS. That seems clear enough, but what about lines 14 – 15?

In this case, just before those two lines we have:

   1: // At any rate, we won't do an increase by over 25% of current size, to prevent huge empty spaces
   2: // and the first size we allocate is 256 pages (1MB)
   3: // 
   4: // The reasoning behind this is that we want to make sure that we increase in size very slowly at first
   5: // because users tend to be sensitive to a lot of "wasted" space. 
   6: // We also consider the fact that small increases in small files would probably result in cheaper costs, and as
   7: // the file size increases, we will reserve more & more from the OS.
   8: // This also plays avoids "I added 300 records and the file size is 64MB" problems that occur when we are too
   9: // eager to reserve space
  10:  

And yes, this isn’t about comments such as // increment by 1. I try writing comments like that when explaining policy or heuristics. Or when I am doing something pretty funky all around for some (hopefully) good reason.

Another example might be this:

   1: // here we try to optimize the amount of work we do, we will only 
   2: // flush the actual dirty pages, and we will do so in sequential order
   3: // ideally, this will save the OS the trouble of actually having to flush the 
   4: // entire range
   5: long start = sortedPagesToFlush[0];
   6: long count = 1;
   7: for (int i = 1; i < sortedPagesToFlush.Count; i++)
   8: {
   9:     var difference = sortedPagesToFlush[i] - sortedPagesToFlush[i - 1];
  10:     // if the difference between them is not _too_ big, we will just merge it into a single call
  11:     // we are trying to minimize both the size of the range that we flush AND the number of times
  12:     // we call flush, so we need to balance those needs.
  13:     if (difference < 64)
  14:     {
  15:         count += difference;
  16:         continue;
  17:     }
  18:     FlushPages(start, count);
  19:     start = sortedPagesToFlush[i];
  20:     count = 1;
  21: }
  22: FlushPages(start, count);

Again, relatively short code. And really easy to follow. But good luck trying to figure out that this code is responsible for a 100% perf boost, or that you need to balance multiple aspects to get an optimal solution without getting comments.

Now, I think that when people talk about “good code doesn’t need comments”, they think about code like this (all samples taken from a popular OSS project):

   1: var query = _backInStockSubscriptionRepository.Table;
   2: //customer
   3: query = query.Where(biss => biss.CustomerId == customerId);
   4: //store
   5: if (storeId > 0)
   6:     query = query.Where(biss => biss.StoreId == storeId);
   7: //product
   8: query = query.Where(biss => !biss.Product.Deleted);
   9: query = query.OrderByDescending(biss => biss.CreatedOnUtc);

Yes, I can read, thank you. And here is an example of two comments, the first one good, and the second bad:

   1: if (_commonSettings.UseStoredProceduresIfSupported && _dataProvider.StoredProceduredSupported)
   2: {
   3:     //stored procedures are enabled and supported by the database. 
   4:     //It's much faster than the LINQ implementation below 
   5:  
   6:     #region Use stored procedure
   7:     
   8:     //pass category identifiers as comma-delimited string
   9:     string commaSeparatedCategoryIds = "";
  10:     if (categoryIds != null)
  11:     {
  12:         for (int i = 0; i < categoryIds.Count; i++)

The first comment explain why. And that is crucial. You should only need to explain the how very rarely, and then, you need a comment to justify why you need the comment. (Performance, maybe?)