The importance of comments
A while ago I got a comment to a post of mine “if you [ in here the commenter is talking to another commenter ] think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.
Let us look at the following piece of code.
1: private long GetNewLength(long current)2: {
3: DateTime now = DateTime.UtcNow;
4: TimeSpan timeSinceLastIncrease = (now - _lastIncrease);
5: if (timeSinceLastIncrease.TotalSeconds < 30)6: {
7: _increaseSize = Math.Min(_increaseSize * 2, MaxIncreaseSize);
8: }
9: else if (timeSinceLastIncrease.TotalMinutes > 2)10: {
11: _increaseSize = Math.Max(MinIncreaseSize, _increaseSize / 2);
12: }
13: _lastIncrease = DateTime.UtcNow;
14: current = Math.Max(current, 256 * PageSize);
15: var actualIncrease = Math.Min(_increaseSize, current / 4);
16: return current + actualIncrease;17: }
This piece of code is responsible for decided the new file size when we run out of room in the current space we use.
That is 10 lines of code. And what happens is quite clear, even if I say so myself. But the why for this code is lost. In particular, there is a lot of reasoning behind the actual logic here. You can see that we play with the actual increase size, to make sure that if we increase often, we will reserve more space from the OS. That seems clear enough, but what about lines 14 – 15?
In this case, just before those two lines we have:
1: // At any rate, we won't do an increase by over 25% of current size, to prevent huge empty spaces2: // and the first size we allocate is 256 pages (1MB)3: //4: // The reasoning behind this is that we want to make sure that we increase in size very slowly at first5: // because users tend to be sensitive to a lot of "wasted" space.6: // We also consider the fact that small increases in small files would probably result in cheaper costs, and as7: // the file size increases, we will reserve more & more from the OS.8: // This also plays avoids "I added 300 records and the file size is 64MB" problems that occur when we are too9: // eager to reserve space10:
And yes, this isn’t about comments such as // increment by 1. I try writing comments like that when explaining policy or heuristics. Or when I am doing something pretty funky all around for some (hopefully) good reason.
Another example might be this:
1: // here we try to optimize the amount of work we do, we will only2: // flush the actual dirty pages, and we will do so in sequential order3: // ideally, this will save the OS the trouble of actually having to flush the4: // entire range5: long start = sortedPagesToFlush[0];6: long count = 1;7: for (int i = 1; i < sortedPagesToFlush.Count; i++)8: {
9: var difference = sortedPagesToFlush[i] - sortedPagesToFlush[i - 1];
10: // if the difference between them is not _too_ big, we will just merge it into a single call11: // we are trying to minimize both the size of the range that we flush AND the number of times12: // we call flush, so we need to balance those needs.13: if (difference < 64)14: {
15: count += difference;
16: continue;17: }
18: FlushPages(start, count);
19: start = sortedPagesToFlush[i];
20: count = 1;
21: }
22: FlushPages(start, count);
Again, relatively short code. And really easy to follow. But good luck trying to figure out that this code is responsible for a 100% perf boost, or that you need to balance multiple aspects to get an optimal solution without getting comments.
Now, I think that when people talk about “good code doesn’t need comments”, they think about code like this (all samples taken from a popular OSS project):
1: var query = _backInStockSubscriptionRepository.Table;
2: //customer3: query = query.Where(biss => biss.CustomerId == customerId);
4: //store5: if (storeId > 0)6: query = query.Where(biss => biss.StoreId == storeId);
7: //product8: query = query.Where(biss => !biss.Product.Deleted);
9: query = query.OrderByDescending(biss => biss.CreatedOnUtc);
Yes, I can read, thank you. And here is an example of two comments, the first one good, and the second bad:
1: if (_commonSettings.UseStoredProceduresIfSupported && _dataProvider.StoredProceduredSupported)2: {
3: //stored procedures are enabled and supported by the database.4: //It's much faster than the LINQ implementation below5:
6: #region Use stored procedure7:
8: //pass category identifiers as comma-delimited string9: string commaSeparatedCategoryIds = "";10: if (categoryIds != null)11: {
12: for (int i = 0; i < categoryIds.Count; i++)
The first comment explain why. And that is crucial. You should only need to explain the how very rarely, and then, you need a comment to justify why you need the comment. (Performance, maybe?)
Comments
I wish this blog was read by my work colleagues. They feel compelled to add the most useless XML docs to all members, but document nothing else. A property FirstName does not need to be documented as "The person's first name.".
In the first example, parts of the comment can be replaced by two extra variables:
long firstAllocateSize = 256 * PageSize long maxAllowedIncrease = current / 4
And other parts can be expressed in test case name:
[Test] public void Size_increment_should_be_limited_so_my_customer_wont_scream()
Comment can be good sometimes, but better when it's organic part of the code.
For the second example, maybe a
batchSize
variable will do the same.Xing, Tests aren't located near the code, they cannot explain _reasoninig_.
This applies when code is in more simple form without 'why' than it would be when 'why' inclusion were forced with abstractions that do just that.
Single responsibility principle plays important role in this.
I think the days of
// Make a cup of tea x.MakeACupOfTea();
are numbered, if not entirely gone. Few developers who are committed to learning would advocate that kind of verbosity. In a similar vein, I'm seeing less and less of this sort of thing.
// Some lame smart-alec remark x.MakeACupOfTea();
Top tip: You're not as funny as you think you are.
However, I have had at least one manager who maintained that ALL comments were evil, and should be expunged from the codebase. In some ways, I can understand where he's coming from. In this day and age, it is perfectly possible to make a career out of writing bog-standard LOB e-commerce apps to sell cheese graters online. Fire up your .NET MVC4 template; write a few views, controllers and API calls; steer clear of innovation; use standard, time-tested patterns everywhere. If you need to explain yourself, you're doing too much. Ultimately, that's a bit of a soul-destroying way of working, and a waste of your own talent. But it pays the bills, and for some people, that is the only consideration worth your time.
Although the comments for your first snippet make sense, as they explain the rationale behind the algorithm, I think the code can be rewritten to be more readable:
1: private long NewLength(long currentSize) 2: { 3: increaseSize = calculateIncreaseSizeBasedOnLengthChangeFrequency(); 4: currentSize = atLeastTheMininumNumberOfPagesConfigured(currentSize); 5: var actualIncrease = recalculateIncreaseAlwaysKeepingItBelow25percent(increaseSize, current); 6: return currentSize + actualIncrease; 7: }
Or:
1: private long NewLength(long currentSize) 2: { 3: return currentSize + recalculateIncreaseAlwaysKeepingItBelow25percent( 4: increaseBasedOnLengthChangeFrequency(), 5: atLeastTheMininumNumberOfPagesConfigured(currentSize)); 6: }
I have removed the underscore from the instance members on purpose. I don't think it helps at all. I also have removed the "Get" from the method, it does not bring any value to me.
In general, comments are necessary when they add context to the code, when that context can't be inferred from the code itself. Usually referring to where the code is plugged, how is it consumed. Complex algorithms certainly deserve comments.
Thanks :-)
what about:
var increase = lastUpdateTime < 2 minutes ? 20% : 10% this doesn't require comments and probably works just as well
The comment about the first increase being 1MB seems to be a lie. The first allocated size by this code is at least 1MB + MinIncreaseSize.
Absolutely agree.
Both the 'throw in a bunch of inane comments' and 'all comments are teh evil' camp seem to come down to preferring absolutes over thought. The right measure, IMHO, is value, not slogan conformance.
@Carlos: I'm not sure sure if you're being facetious or not :-)
Using names like "recalculateIncreaseAlwaysKeepingItBelow25percent" instead of "recalculateIncrease" communicates both what you're doing and how you're doing it - but it still doesn't communicate why. When there's a longer explanation behind the "why", I don't think you could cram all of that into a method-name even if you wanted to. And when a method-name communicates how it's doing something, you add a new burden of renaming the method every time the "how" changes.
One way to make code more self-documenting, is to avoid using literal values - I use lots of constants to make methods more self-documenting, for example, EXPIRATION_TIME when seen in an expression computing an expiration date, is more semantic than for example the literal expression 606024.
Most people use constants for values they need to use more than once - try using them simply to give name and meaning to certain values or expressions. The code shown in this post could use some of that.
Second example (pseudo-ish code)
...
Fourth example:
...
I tend to lean towards the "you really shouldn't need comments" camp (I'm not saying it's an absolute, but I don't know that I've seen a really good example of 'need'). This blog expresses a common argument in support of comments. As Xing and Carlos have alluded, when you find yourself agreeing that there should be a comment to explain the 'why', it's typically a case where the code violates a SOLID principle or exercise poor naming choices.
I would aver that the examples above suffer from both of those mishaps.
@Jeff, I really don't understand why you'd say that when the comment explains precisely what user experience the strategy is addressing, and the attitudes of the users, based on their feedback over several years. Do you really think it's better to try to embed that information within variable names and method names?
You could say that such information belongs in external documentation, but I'd say it's at its most useful when it's right there next to the place it's being applied.
I don't see how the code snippet violates any kind of SOLID Principle: certainly not the Single Responsibilty Principle. The method is called GetNewLength(); it does what it says it does. You can inject an external strategy if you want, but unless other strategies are used, or that strategy is used elsewhere, that's just code bloat for the sake of dogma.
@Rasmus, not am not trying to be facetious. I just wanted to leave the code better than I found it. Obviously it is not the perfect code, it can be improved. Code can always be improved. One improvement could be this:
recalculateIncrease(increaseSize, current).alwaysKeepingItBelow(MAX_INCREASE).percent()
Usually method names should not reveal how they are implemented, IF that "how" is not significant. IF that "how" can be extended. In the case of concrete algorithms though, the "how" can't be changed, it is significant and it can be part of the method name if necessary. SRP and OCP always depend on the nature of the problem, the domain. Given that I don't really know the context of this particular problem and don't care about the algorithm, I can't really tell if there are several reasons to change or not, or whether it is open to extension enough.
The "why" is usually part of the context, which is what I sometimes need to explain with comments.
Cheers :-)
That comment was from funny guy named Howard Chu to me. I wonder what he says now. He will probably tell you to get any computer programming course Ayende. Just like he did last time.
I comment a lot, and much of it is very verbose and unnecessary - but I am still in that rookie "fresh out of school solo developer" phase of my career, and I think I could do a lot worse than just having foolish comment pattern.
Much of it is just my OCD demanding things look nice and even on the page. It seems silly, but ... it is what it is. I haven't found it to hurt my development time, at least.
My thoughts exactly.
I think I don't understand first 2 sentences. It' either my english or yours.
“if you think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.
What is very false statement? The only statement I see is from Howard Chu. But now I think that you think about "good code doesn't need comments". Please someone with better english skills correct me but I think it isn't clear. That's why I posted my first comment above.
So I wanted to make it clear. I said exactly "Good code doesn't need comments most of the time.". Notice "most of the time". The time comments are needed is what you show in your examples. That's exactly what I meant. In everyday I see things like // increment x, // call webservice, or my favourite // default constructor :)
Actually Ayende was quoting Daniel Moreira Yokoyama. This entire post completely agrees with what I said.
Carlos, How do you do the reasoning behind it? Size vs responsiveness vs angry emails in the mailing list?
Sorry @Ayende, I am not sure what are you asking. I am not angry at all :-) I am not worrying about size or responsiveness, just code clarity. I like code that can be read like prose, that I can understand without much thinking. Just reading like a book. Sorry if I am looking angry, might be my English :-s
many times I pseudo-code up some logic, and then fill in the code between the comments. They don't come out until the base is stable.
Perhaps the oss comments you found are similar.
Another counterpoint. Comments support developers reading the code in future times. Code naming etc. support every developer that uses that code from a point external to that method. For me, I think the latter is the more useful optimisation.
In any complex program, comments should be used to help the reader understand the model, if nothing else. Sure, you can use meaningful names like "getPageSize", but that doesn't help if the reader doesn't know what a page is in your program. "setUpdateFlag" is a nice name, but if the reader doesn't know which of several possible kinds of changes you consider an "update", it isn't very helpful. As a general rule, describing the underlying model and concepts of the program takes more words than you can reasonably fit into a variable name. Once the reader understands your model, only then do the names start being really helpful.
It would be like starting a work of fiction with ambiguous references: "The Regulator dropped his cloak and stepped forward into the breach." What's a "Regulator"? A title for some kind of court officer? A robot? What's a "cloak"? A piece of clothing? Some form of magical invisibility? What's the "breach"? A breach in a battle line? A breach in a ship's hull? And is he stepping into the breach to seal the breach or to exploit the breach?
In fiction, that kind of uncertainty is fine because the point is to hook the reader and reveal things later. In programming, your responsibility is to make things as clear as possible with as little effort as possible on the part of the reader. So early explanations of the overall model are important.
There is another good reason for code comments - until your code is less-than-perfect and not yet self-explanatory, do add comments to compensate for this - do not avoid comments in the hope that someday you will refactor the code to be perfect.
We live in an imperfect world, and not all developers write self-explanatory code that "reads like prose" from start - do not use this ideal as an excuse to deliver code without any comments that only you will understand.
First make sure that your source control system is backed-up regularly and that you know how to restore it and upgrade it to a new version, before deciding that the commit comments can replace comments in the source code telling why a change was made.
I advocate commenting what the intent of the method/class is. That is what it supposed to do. I can read the code to see what it does, but is that what it is supposed to do? Obviously, there should be unit tests to validate what the code is supposed to do.
I am not saying there should a huge prose of comments added to you code. In .NET, adding xml doc comments to indicate what exceptions are thrown in what cases is important. Of course the developer can read the code to determine which exceptions are thrown, but they should'nt have to. This be comes more of a problem when using 3rd party libraries via NuGet. Is the developer expected to read through the Entity Framework/Json.NET/Unity/StructureMap/... source code to determine what errors they should avoid? Of course not.
As usual, great post Ayende.
Comment preview