Readability matters


In one of my recent posts about performance, a suggestion was raised:

Just spotted a small thing, you could optimise the call to:

_buffer[pos++] = (byte)'\\';

with a constant as it's always the same.

There are two problems with this suggestion. Let us start with the obvious one first. Here is the disassembly of the code:

            b[0] = (byte) '/';
00007FFC9DC84548  mov         rcx,qword ptr [rbp+8]
00007FFC9DC8454C  mov         byte ptr [rcx],2Fh

            b[0] = 47;
00007FFC9DC8454F  mov         rcx,qword ptr [rbp+8]
00007FFC9DC84553  mov         byte ptr [rcx],2Fh

As you can see, in both cases, the exact same instructions are carried out.

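That is because we no longer work with compilers that had only 4KB of memory to work with, needed hand-holding, and demanded intimate familiarity with how the specific compiler version we wrote the code for behaved. The cast of a character literal to a byte is a constant expression, so the compiler folds it into the same value, 2Fh (47), at compile time.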

The other problem is closely related. I've been working with code for the past 20 years, and while I remember the ASCII codes for some characters, when reading b[0] = 47 I would have to go and look it up. That puts a really high burden on the reader of a parser, where this kind of assignment is pretty much all that happens.
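To make the point concrete, here is a minimal sketch of the kind of byte-level scanning a parser does, written with character literals rather than numeric codes. The SeparatorCounter class and its CountSegments method are hypothetical names invented for this illustration; they are not taken from the code discussed in the original post.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not from the original post.
public static class SeparatorCounter
{
    // Counts the non-empty segments in an ASCII path, treating both
    // '/' and '\' as separators.
    public static int CountSegments(byte[] path)
    {
        int segments = 0;
        bool inSegment = false;

        foreach (byte b in path)
        {
            // (byte)'/' and (byte)'\\' are compile-time constants (47 and 92),
            // so this costs the same as writing the numbers directly.
            if (b == (byte)'/' || b == (byte)'\\')
            {
                inSegment = false;
            }
            else if (!inSegment)
            {
                inSegment = true;
                segments++;
            }
        }

        return segments;
    }

    public static void Main()
    {
        byte[] path = Encoding.ASCII.GetBytes("/usr/local\\bin");
        Console.WriteLine(CountSegments(path)); // prints 3
    }
}
```

Written as b == 47 || b == 92, the condition would behave identically, but the reader would need an ASCII table to know which characters are being matched.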

I recently saw this when I looked at the Smaz library. I ported it to C# and, along the way, made sure that the code was much more understandable (at least in my opinion). This resulted in a totally unexpected pull request that ported my C# port to Java. Making the code more readable made it accessible and possible to work with, whereas before it was an impenetrable black box.

Consider what this means for larger projects, where large sections are marked with "there be dragons and gnarly bugs"… This really kills the productivity of both systems and teams.

In the case of the Smaz library port, because the code was easier to work with, Peter was able not just to port it to Java, but also to repurpose it into a useful utility for compressing MIME types very efficiently.