Readability matters

Feb 11 2016

In one of my recent posts about performance, a suggestion was raised:

Just spotted a small thing, you could optimise the call to:
_buffer[pos++] = (byte)'\\';
with a constant as it's always the same.

There are two problems with this suggestion. Let us start with the obvious one first. Here is the disassembly of the code:

b[0] = (byte) '/';

00007FFC9DC84548 mov rcx,qword ptr [rbp+8]

00007FFC9DC8454C mov byte ptr [rcx],2Fh

b[0] = 47;

00007FFC9DC8454F mov rcx,qword ptr [rbp+8]

00007FFC9DC84553 mov byte ptr [rcx],2Fh

As you can see, in both cases, the exact same instructions are carried out.

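That is because we are no longer using compilers that had 4KB of memory to work with and that required hand holding and intimate familiarity with how the specific compiler version we targeted behaved. The cast of a character literal is a constant expression, so the compiler folds (byte)'/' into 47 (2Fh) before any machine code is generated.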
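A minimal sketch of that folding, not taken from the original code: the class and member names here are hypothetical. The `const` declaration only compiles if its initializer is a compile-time constant, which is a cheap way to confirm that the cast costs nothing at runtime.

```csharp
public static class ConstantFoldingSketch
{
    // Hypothetical names, for illustration only.
    // A const initializer must be a constant expression, so this line
    // compiling at all shows the cast is resolved by the compiler.
    public const byte SlashFromChar = (byte)'/';
    public const byte SlashFromNumber = 47;

    // Writes the separator using the readable character literal.
    public static byte[] WriteWithCharLiteral()
    {
        var b = new byte[1];
        b[0] = (byte)'/';
        return b;
    }

    // Writes the same byte using the raw ASCII code.
    public static byte[] WriteWithNumber()
    {
        var b = new byte[1];
        b[0] = 47;
        return b;
    }
}
```

Both methods store the same byte, so choosing between them is purely a question of which one the next reader can understand without an ASCII table.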

The other problem is closely related. I've been working with code for the past 20 years, and while I remember the ASCII codes for some characters, when reading b[0] = 47 I would have to go and look it up. That puts a really high burden on the reader of a parser, where this is pretty much all that happens.
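To make the burden concrete, here is a small sketch of the kind of dispatch a parser does over and over. The `TokenSketch` class, the `Kind` enum, and both method names are assumptions made up for this illustration, not code from the parser under discussion.

```csharp
public static class TokenSketch
{
    // Hypothetical token kinds for a JSON-like format.
    public enum Kind { ObjectStart, ArrayStart, String, Escape, Unknown }

    // Character literals: each case states which term is being parsed.
    public static Kind Classify(byte b)
    {
        switch (b)
        {
            case (byte)'{': return Kind.ObjectStart;
            case (byte)'[': return Kind.ArrayStart;
            case (byte)'"': return Kind.String;
            case (byte)'\\': return Kind.Escape;
            default: return Kind.Unknown;
        }
    }

    // Same behavior with raw ASCII codes: the reader has to look up
    // each number to learn what the branch handles.
    public static Kind ClassifyWithNumbers(byte b)
    {
        switch (b)
        {
            case 123: return Kind.ObjectStart;
            case 91: return Kind.ArrayStart;
            case 34: return Kind.String;
            case 92: return Kind.Escape;
            default: return Kind.Unknown;
        }
    }
}
```

The two methods return the same result for every input byte. Only the first one can be reviewed without looking anything up.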

I recently saw this when I looked at the Smaz library. I ported it to C#, and along the way I made sure that it was much more understandable (at least in my opinion). This resulted in a totally unexpected pull request that ported my C# port to Java. Making the code more readable made it accessible and possible to work with, whereas before it was an impenetrable black box.

Consider what this means for larger projects, where there are large sections that are marked with "there be dragons and gnarly bugs"… This really kills system and team productivity.

In the case of the Smaz library port, because the code was easier to work with, Peter was able not just to port it to Java, but also to repurpose it into a useful utility for compressing MIME types very efficiently.

Tags:

development

Comments

12 Feb 2016
11:33 AM

BudGoode

How about:

b[0] = 47; // Byte code for the '/' character

???

12 Feb 2016
11:34 AM

Oren Eini

BudGoode, why would I want to do that? The compiler already handles that for me.

13 Feb 2016
20:01 PM

Andrei Rînea

(byte)'/' is better than 47, I agree.

But I propose an even more elegant solution, IMHO, which I use in my code:

private static readonly byte Separator = (byte)'/';

...

b[0] = Separator; // more meaningful

13 Feb 2016
21:12 PM

Oren Eini

Andrei, the code in question is dealing with dozens of such constants. It is a parser, and it is important for the readability of the code that you actually see what term you are currently parsing.

14 Feb 2016
00:07 AM

Biosek

I think the problem these days is knowing what the compiler actually optimizes and what it does not. I get that some lower-level programmers (C++) make these optimizations at the price of readability. So I guess this comes down to whether I know how to optimize or whether the compiler will optimize it for me. For sure, knowing what the compiler will optimize is better for readability, but is it also easier for a developer to learn? I don't think so. It would be awesome to have some cheat sheet of what the compiler optimizes and what it does not. Any suggestions? :)

14 Feb 2016
07:12 AM

Oren Eini

Biosek, you can pretty much assume that the compiler will do all the basic optimizations. In other words, if you look at the code and there is an "obvious" optimization, there is a very high chance that the compiler already does it.

Oren Eini

CEO of RavenDB