Readability matters
In one of my recent posts about performance, a suggestion was raised:
Just spotted a small thing, you could optimise the call to:
_buffer[pos++] = (byte)'\\';
with a constant as it's always the same.
There are two problems with this suggestion. Let us start with the obvious one first. Here is the disassembly of the code:
b[0] = (byte) '/';
00007FFC9DC84548 mov rcx,qword ptr [rbp+8]
00007FFC9DC8454C mov byte ptr [rcx],2Fh
b[0] = 47;
00007FFC9DC8454F mov rcx,qword ptr [rbp+8]
00007FFC9DC84553 mov byte ptr [rcx],2Fh
As you can see, in both cases, the exact same instructions are carried out.
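That is because the compiler evaluates the cast of a character literal at compile time, so both forms end up as the same constant. We no longer use compilers that had 4KB of memory to work with and that required hand-holding, along with intimate familiarity with how the specific compiler version you wrote the code for behaved.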
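One way to check this without reading disassembly is to lean on the C# language rules: a const initializer must be a compile-time constant expression, so if the cast compiles there, the compiler has already computed the value. The sketch below is only an illustration; the class and method names are made up for this example and are not from the code discussed in the post.

```csharp
using System;

public static class ConstantFoldingSketch
{
    // A const initializer must be a compile-time constant expression.
    // This line compiling at all shows the cast is evaluated by the compiler.
    public const byte Slash = (byte)'/';

    // Hypothetical helper: writes the separator using the readable form.
    public static void WriteWithCharLiteral(byte[] b)
    {
        b[0] = (byte)'/';
    }

    // Hypothetical helper: writes the same separator using the raw ASCII code.
    public static void WriteWithNumber(byte[] b)
    {
        b[0] = 47;
    }
}
```

Both helpers store the same byte, so the only difference between them is how much the reader has to remember.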
The other problem is closely related. I've been working with code for the past 20 years. And while I remember the ASCII codes for some characters, when reading b[0] = 47, I would have to go and look it up. That puts a really high burden on the reader of a parser, where this is pretty much all that happens.
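To make the burden concrete, here is a small sketch of the same check written both ways. It assumes a hypothetical JSON-like format; the class, the method names, and the returned labels are invented for illustration and are not taken from the parser in question.

```csharp
using System;

public static class ReadableParserSketch
{
    // Readable form: each case label shows the character being parsed.
    public static string Classify(byte current)
    {
        switch (current)
        {
            case (byte)'{': return "start object";
            case (byte)'}': return "end object";
            case (byte)'[': return "start array";
            case (byte)']': return "end array";
            case (byte)'"': return "string";
            case (byte)':': return "name separator";
            case (byte)',': return "value separator";
            default: return "other";
        }
    }

    // Same logic with raw ASCII codes: the reader needs a lookup table.
    public static string ClassifyWithNumbers(byte current)
    {
        switch (current)
        {
            case 123: return "start object";
            case 125: return "end object";
            case 91: return "start array";
            case 93: return "end array";
            case 34: return "string";
            case 58: return "name separator";
            case 44: return "value separator";
            default: return "other";
        }
    }
}
```

The two methods return the same label for every input byte. The first one can be reviewed at a glance, while the second requires checking each number against an ASCII table.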
I recently saw this in practice when I looked at the Smaz library. I ported it to C# and, along the way, made sure that it was much more understandable (at least in my opinion). This resulted in a totally unexpected pull request that ported my C# port to Java. Making the code more readable made it accessible and possible to work with, whereas before it was an impenetrable black box.
Consider what this means for larger projects, where there are large sections that are marked with "there be dragons and gnarly bugs"… This really kills system and team productivity.
In the case of the Smaz library port, because the code was easier to work with, Peter was able not just to port it to Java, but also to repurpose it into a useful utility for compressing MIME types very efficiently.
Comments
How about:
b[0] = 47; // Byte code for the '/' character
???
BudGoode, why would I want to do that? The compiler already handles that for me.
(byte)'\' is better than 47, I agree.
But I propose an even more elegant solution, IMHO, which I use in my code:
private static readonly byte Separator = (byte)'\\';
...
b[0] = Separator; // more meaningful
Andrei, the code in question is dealing with dozens of such constants. It is a parser, and it is important for the readability of the code that you actually see what term you are currently parsing.
I think the problem these days is knowing what the compiler actually optimizes and what it does not. I get that some lower-level programmers (C++) make these optimizations at the price of readability. So I guess this comes down to whether I know how to optimize or whether the compiler will optimize it for me. For sure, knowing what the compiler will optimize is better for readability, but is it also easier for a developer to learn? I don't think so. It would be awesome to have a cheat sheet of what the compiler optimizes and what it does not. Any suggestions? :)
Biosek, you can pretty much assume that the compiler will do all the basic optimizations. In other words, if you look at the code and there is an "obvious" optimization, there is a very high chance that the compiler already does it.