The importance of comments

time to read 22 min | 4333 words

A while ago I got a comment to a post of mine “if you [ in here the commenter is talking to another commenter ] think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.

Let us look at the following piece of code.

   1: private long GetNewLength(long current)

   2: {

   3:     DateTime now = DateTime.UtcNow;

   4:     TimeSpan timeSinceLastIncrease = (now - _lastIncrease);

   5:     if (timeSinceLastIncrease.TotalSeconds < 30)

   6:     {

   7:         _increaseSize = Math.Min(_increaseSize * 2, MaxIncreaseSize);

   8:     }

   9:     else if (timeSinceLastIncrease.TotalMinutes > 2)

  10:     {

  11:         _increaseSize = Math.Max(MinIncreaseSize, _increaseSize / 2);

  12:     }

  13:     _lastIncrease = DateTime.UtcNow;

  14:     current = Math.Max(current, 256 * PageSize);

  15:     var actualIncrease = Math.Min(_increaseSize, current / 4);

  16:     return current + actualIncrease;

  17: }

This piece of code is responsible for decided the new file size when we run out of room in the current space we use.

That is 10 lines of code. And what happens is quite clear, even if I say so myself. But the why for this code is lost. In particular, there is a lot of reasoning behind the actual logic here. You can see that we play with the actual increase size, to make sure that if we increase often, we will reserve more space from the OS. That seems clear enough, but what about lines 14 – 15?

In this case, just before those two lines we have:

   1: // At any rate, we won't do an increase by over 25% of current size, to prevent huge empty spaces

   2: // and the first size we allocate is 256 pages (1MB)

   3: //

   4: // The reasoning behind this is that we want to make sure that we increase in size very slowly at first

   5: // because users tend to be sensitive to a lot of "wasted" space.

   6: // We also consider the fact that small increases in small files would probably result in cheaper costs, and as

   7: // the file size increases, we will reserve more & more from the OS.

   8: // This also plays avoids "I added 300 records and the file size is 64MB" problems that occur when we are too

   9: // eager to reserve space

10:

And yes, this isn’t about comments such as // increment by 1. I try writing comments like that when explaining policy or heuristics. Or when I am doing something pretty funky all around for some (hopefully) good reason.

Another example might be this:

   1: // here we try to optimize the amount of work we do, we will only

   2: // flush the actual dirty pages, and we will do so in sequential order

   3: // ideally, this will save the OS the trouble of actually having to flush the

   4: // entire range

   5: long start = sortedPagesToFlush[0];

   6: long count = 1;

   7: for (int i = 1; i < sortedPagesToFlush.Count; i++)

   8: {

   9:     var difference = sortedPagesToFlush[i] - sortedPagesToFlush[i - 1];

  10:     // if the difference between them is not _too_ big, we will just merge it into a single call

  11:     // we are trying to minimize both the size of the range that we flush AND the number of times

  12:     // we call flush, so we need to balance those needs.

  13:     if (difference < 64)

  14:     {

  15:         count += difference;

  16:         continue;

  17:     }

  18:     FlushPages(start, count);

  19:     start = sortedPagesToFlush[i];

  20:     count = 1;

  21: }

  22: FlushPages(start, count);

Again, relatively short code. And really easy to follow. But good luck trying to figure out that this code is responsible for a 100% perf boost, or that you need to balance multiple aspects to get an optimal solution without getting comments.

Now, I think that when people talk about “good code doesn’t need comments”, they think about code like this (all samples taken from a popular OSS project):

   1: var query = _backInStockSubscriptionRepository.Table;

   2: //customer

   3: query = query.Where(biss => biss.CustomerId == customerId);

   4: //store

   5: if (storeId > 0)

   6:     query = query.Where(biss => biss.StoreId == storeId);

   7: //product

   8: query = query.Where(biss => !biss.Product.Deleted);

   9: query = query.OrderByDescending(biss => biss.CreatedOnUtc);

Yes, I can read, thank you. And here is an example of two comments, the first one good, and the second bad:

   1: if (_commonSettings.UseStoredProceduresIfSupported && _dataProvider.StoredProceduredSupported)

   2: {

   3:     //stored procedures are enabled and supported by the database.

   4:     //It's much faster than the LINQ implementation below

5:

   6:     #region Use stored procedure

7:

   8:     //pass category identifiers as comma-delimited string

   9:     string commaSeparatedCategoryIds = "";

  10:     if (categoryIds != null)

  11:     {

  12:         for (int i = 0; i < categoryIds.Count; i++)

The first comment explain why. And that is crucial. You should only need to explain the how very rarely, and then, you need a comment to justify why you need the comment. (Performance, maybe?)

Tweet Share Share 27 comments

Tags:

Comments

29 Aug 2013
09:40 AM

tobi

I wish this blog was read by my work colleagues. They feel compelled to add the most useless XML docs to all members, but document nothing else. A property FirstName does not need to be documented as "The person's first name.".

29 Aug 2013
09:44 AM

Xing Yang

In the first example, parts of the comment can be replaced by two extra variables:

long firstAllocateSize = 256 * PageSize long maxAllowedIncrease = current / 4

And other parts can be expressed in test case name:

[Test] public void Size_increment_should_be_limited_so_my_customer_wont_scream()

Comment can be good sometimes, but better when it's organic part of the code.

29 Aug 2013
09:55 AM

Xing Yang

For the second example, maybe a batchSize variable will do the same.

29 Aug 2013
09:56 AM

Ayende Rahien

Xing, Tests aren't located near the code, they cannot explain _reasoninig_.

29 Aug 2013
10:48 AM

Arnis Lapsa

This applies when code is in more simple form without 'why' than it would be when 'why' inclusion were forced with abstractions that do just that.

Single responsibility principle plays important role in this.

29 Aug 2013
11:37 AM

Rob Lyndon

I think the days of

// Make a cup of tea x.MakeACupOfTea();

are numbered, if not entirely gone. Few developers who are committed to learning would advocate that kind of verbosity. In a similar vein, I'm seeing less and less of this sort of thing.

// Some lame smart-alec remark x.MakeACupOfTea();

Top tip: You're not as funny as you think you are.

However, I have had at least one manager who maintained that ALL comments were evil, and should be expunged from the codebase. In some ways, I can understand where he's coming from. In this day and age, it is perfectly possible to make a career out of writing bog-standard LOB e-commerce apps to sell cheese graters online. Fire up your .NET MVC4 template; write a few views, controllers and API calls; steer clear of innovation; use standard, time-tested patterns everywhere. If you need to explain yourself, you're doing too much. Ultimately, that's a bit of a soul-destroying way of working, and a waste of your own talent. But it pays the bills, and for some people, that is the only consideration worth your time.

29 Aug 2013
11:44 AM

Carlos Ble

Although the comments for your first snippet make sense, as they explain the rationale behind the algorithm, I think the code can be rewritten to be more readable:

1: private long NewLength(long currentSize) 2: { 3: increaseSize = calculateIncreaseSizeBasedOnLengthChangeFrequency(); 4: currentSize = atLeastTheMininumNumberOfPagesConfigured(currentSize); 5: var actualIncrease = recalculateIncreaseAlwaysKeepingItBelow25percent(increaseSize, current); 6: return currentSize + actualIncrease; 7: }

Or:

1: private long NewLength(long currentSize) 2: { 3: return currentSize + recalculateIncreaseAlwaysKeepingItBelow25percent( 4: increaseBasedOnLengthChangeFrequency(), 5: atLeastTheMininumNumberOfPagesConfigured(currentSize)); 6: }

I have removed the underscore from the instance members on purpose. I don't think it helps at all. I also have removed the "Get" from the method, it does not bring any value to me.

In general, comments are necessary when they add context to the code, when that context can't be inferred from the code itself. Usually referring to where the code is plugged, how is it consumed. Complex algorithms certainly deserve comments.

Thanks :-)

29 Aug 2013
11:59 AM

Rafal

what about:

var increase = lastUpdateTime < 2 minutes ? 20% : 10% this doesn't require comments and probably works just as well

29 Aug 2013
12:29 PM

Joshka

The comment about the first increase being 1MB seems to be a lie. The first allocated size by this code is at least 1MB + MinIncreaseSize.

29 Aug 2013
12:56 PM

Gene Hughson

Absolutely agree.

Both the 'throw in a bunch of inane comments' and 'all comments are teh evil' camp seem to come down to preferring absolutes over thought. The right measure, IMHO, is value, not slogan conformance.

29 Aug 2013
13:15 PM

Rasmus Schultz

@Carlos: I'm not sure sure if you're being facetious or not :-)

Using names like "recalculateIncreaseAlwaysKeepingItBelow25percent" instead of "recalculateIncrease" communicates both what you're doing and how you're doing it - but it still doesn't communicate why. When there's a longer explanation behind the "why", I don't think you could cram all of that into a method-name even if you wanted to. And when a method-name communicates how it's doing something, you add a new burden of renaming the method every time the "how" changes.

One way to make code more self-documenting, is to avoid using literal values - I use lots of constants to make methods more self-documenting, for example, EXPIRATION_TIME when seen in an expression computing an expiration date, is more semantic than for example the literal expression 606024.

Most people use constants for values they need to use more than once - try using them simply to give name and meaning to certain values or expressions. The code shown in this post could use some of that.

29 Aug 2013
13:15 PM

joshka

Second example (pseudo-ish code)

foreach (var pageRange in pages.Sort().PagesRangesOptimalToFlushInASingleCall())
{
    pageRange.Flush();
}

...

Fourth example:

IWhateverStrategy FastestWhateverStrategyAvailable()
{
    if (CanUseStoredProcedures())
        return new StoredProcedureWhateverStrategy();
    else
        return new LinqWhateverStrategy();
}

...

29 Aug 2013
13:23 PM

Jeff

I tend to lean towards the "you really shouldn't need comments" camp (I'm not saying it's an absolute, but I don't know that I've seen a really good example of 'need'). This blog expresses a common argument in support of comments. As Xing and Carlos have alluded, when you find yourself agreeing that there should be a comment to explain the 'why', it's typically a case where the code violates a SOLID principle or exercise poor naming choices.

I would aver that the examples above suffer from both of those mishaps.

29 Aug 2013
13:47 PM

Rob Lyndon

@Jeff, I really don't understand why you'd say that when the comment explains precisely what user experience the strategy is addressing, and the attitudes of the users, based on their feedback over several years. Do you really think it's better to try to embed that information within variable names and method names?

You could say that such information belongs in external documentation, but I'd say it's at its most useful when it's right there next to the place it's being applied.

I don't see how the code snippet violates any kind of SOLID Principle: certainly not the Single Responsibilty Principle. The method is called GetNewLength(); it does what it says it does. You can inject an external strategy if you want, but unless other strategies are used, or that strategy is used elsewhere, that's just code bloat for the sake of dogma.

29 Aug 2013
14:48 PM

Carlos Ble

@Rasmus, not am not trying to be facetious. I just wanted to leave the code better than I found it. Obviously it is not the perfect code, it can be improved. Code can always be improved. One improvement could be this:

recalculateIncrease(increaseSize, current).alwaysKeepingItBelow(MAX_INCREASE).percent()

Usually method names should not reveal how they are implemented, IF that "how" is not significant. IF that "how" can be extended. In the case of concrete algorithms though, the "how" can't be changed, it is significant and it can be part of the method name if necessary. SRP and OCP always depend on the nature of the problem, the domain. Given that I don't really know the context of this particular problem and don't care about the algorithm, I can't really tell if there are several reasons to change or not, or whether it is open to extension enough.

The "why" is usually part of the context, which is what I sometimes need to explain with comments.

Cheers :-)

29 Aug 2013
15:13 PM

Jiggaboo

That comment was from funny guy named Howard Chu to me. I wonder what he says now. He will probably tell you to get any computer programming course Ayende. Just like he did last time.

29 Aug 2013
17:12 PM

Stacey

I comment a lot, and much of it is very verbose and unnecessary - but I am still in that rookie "fresh out of school solo developer" phase of my career, and I think I could do a lot worse than just having foolish comment pattern.

Much of it is just my OCD demanding things look nice and even on the page. It seems silly, but ... it is what it is. I haven't found it to hurt my development time, at least.

29 Aug 2013
17:14 PM

Piotr Perak

My thoughts exactly.

29 Aug 2013
17:49 PM

Jiggaboo

I think I don't understand first 2 sentences. It' either my english or yours.

“if you think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.

What is very false statement? The only statement I see is from Howard Chu. But now I think that you think about "good code doesn't need comments". Please someone with better english skills correct me but I think it isn't clear. That's why I posted my first comment above.

So I wanted to make it clear. I said exactly "Good code doesn't need comments most of the time.". Notice "most of the time". The time comments are needed is what you show in your examples. That's exactly what I meant. In everyday I see things like // increment x, // call webservice, or my favourite // default constructor :)

29 Aug 2013
17:50 PM

Howard Chu

Actually Ayende was quoting Daniel Moreira Yokoyama. This entire post completely agrees with what I said.

29 Aug 2013
17:55 PM

Ayende Rahien

Carlos, How do you do the reasoning behind it? Size vs responsiveness vs angry emails in the mailing list?

29 Aug 2013
18:14 PM

Carlos Ble

Sorry @Ayende, I am not sure what are you asking. I am not angry at all :-) I am not worrying about size or responsiveness, just code clarity. I like code that can be read like prose, that I can understand without much thinking. Just reading like a book. Sorry if I am looking angry, might be my English :-s

29 Aug 2013
18:57 PM

peter

many times I pseudo-code up some logic, and then fill in the code between the comments. They don't come out until the base is stable.

Perhaps the oss comments you found are similar.

30 Aug 2013
00:49 AM

joshka

Another counterpoint. Comments support developers reading the code in future times. Code naming etc. support every developer that uses that code from a point external to that method. For me, I think the latter is the more useful optimisation.

30 Aug 2013
02:23 AM

David Gudeman

In any complex program, comments should be used to help the reader understand the model, if nothing else. Sure, you can use meaningful names like "getPageSize", but that doesn't help if the reader doesn't know what a page is in your program. "setUpdateFlag" is a nice name, but if the reader doesn't know which of several possible kinds of changes you consider an "update", it isn't very helpful. As a general rule, describing the underlying model and concepts of the program takes more words than you can reasonably fit into a variable name. Once the reader understands your model, only then do the names start being really helpful.

It would be like starting a work of fiction with ambiguous references: "The Regulator dropped his cloak and stepped forward into the breach." What's a "Regulator"? A title for some kind of court officer? A robot? What's a "cloak"? A piece of clothing? Some form of magical invisibility? What's the "breach"? A breach in a battle line? A breach in a ship's hull? And is he stepping into the breach to seal the breach or to exploit the breach?

In fiction, that kind of uncertainty is fine because the point is to hook the reader and reveal things later. In programming, your responsibility is to make things as clear as possible with as little effort as possible on the part of the reader. So early explanations of the overall model are important.

30 Aug 2013
09:35 AM

Tudor

There is another good reason for code comments - until your code is less-than-perfect and not yet self-explanatory, do add comments to compensate for this - do not avoid comments in the hope that someday you will refactor the code to be perfect.

We live in an imperfect world, and not all developers write self-explanatory code that "reads like prose" from start - do not use this ideal as an excuse to deliver code without any comments that only you will understand.

First make sure that your source control system is backed-up regularly and that you know how to restore it and upgrade it to a new version, before deciding that the commit comments can replace comments in the source code telling why a change was made.

31 Aug 2013
16:46 PM

Phil

I advocate commenting what the intent of the method/class is. That is what it supposed to do. I can read the code to see what it does, but is that what it is supposed to do? Obviously, there should be unit tests to validate what the code is supposed to do.

I am not saying there should a huge prose of comments added to you code. In .NET, adding xml doc comments to indicate what exceptions are thrown in what cases is important. Of course the developer can read the code to determine which exceptions are thrown, but they should'nt have to. This be comes more of a problem when using 3rd party libraries via NuGet. Is the developer expected to read through the Entity Framework/Json.NET/Unity/StructureMap/... source code to determine what errors they should avoid? Of course not.

As usual, great post Ayende.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB