Making code faster: Micro optimizations and parallel work


Nov 24 2016


I really wanted to leave this series of posts alone. Getting 135 times faster should be fast enough for everyone, just like 640KB was.

Unfortunately, performance optimization is addictive. Last time, we left it at 283 ms per run. But we still left some performance on the table. I mean, we had inefficient code like this:

Just look at it. Analysis showed that it is always called with 2, 4 or 8 only. So we naturally simplified things:

Forcing the inlining of those methods also helped, and pushed us further toward 240 ms.
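
The original listings are not preserved in this copy of the post, so what follows is only a minimal sketch of the kind of change described: a general, loop-based digit parser replaced by fixed-width variants that are marked for aggressive inlining. The names `ParseDigits`, `ParseInt2` and `ParseInt4`, and the `byte[]` plus offset signatures, are assumptions made for illustration; the code in this series works on `byte*`.

```csharp
using System.Runtime.CompilerServices;

public static class FixedWidthParsers
{
    // General version: loops over an arbitrary number of ASCII digits.
    public static int ParseDigits(byte[] buffer, int offset, int size)
    {
        var value = 0;
        for (var i = 0; i < size; i++)
            value = value * 10 + (buffer[offset + i] - '0');
        return value;
    }

    // Specialized versions: no loop and no size parameter, only the
    // three widths that are ever requested (2, 4 and 8 digits).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ParseInt2(byte[] buffer, int offset)
    {
        return (buffer[offset] - '0') * 10 + (buffer[offset + 1] - '0');
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ParseInt4(byte[] buffer, int offset)
    {
        return ParseInt2(buffer, offset) * 100 + ParseInt2(buffer, offset + 2);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ParseInt8(byte[] buffer, int offset)
    {
        return ParseInt4(buffer, offset) * 10000 + ParseInt4(buffer, offset + 4);
    }
}
```

`AggressiveInlining` is a hint to the JIT rather than a guarantee, so it is worth confirming the effect with a profiler.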

Another cost was the date diff calculation. We had already optimized for the case where both timestamps fall on the same day, but in our dataset we have about 2 million records that cross the day line. So we further optimized for the scenario where the year and month are the same and only the day is different. That pushed us further toward 220 ms.
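
A minimal sketch of that tiered approach, assuming 19-byte timestamps laid out as `yyyy-MM-dd`, one separator byte, then `HH:mm:ss`. The class and helper names are hypothetical, and the sketch uses `byte[]` plus offsets rather than the `byte*` used in the series.

```csharp
using System;

public static class DateDiff
{
    private static int TwoDigits(byte[] b, int pos)
    {
        return (b[pos] - '0') * 10 + (b[pos + 1] - '0');
    }

    // Seconds since midnight, read from the "HH:mm:ss" part at offset 11.
    private static int SecondsOfDay(byte[] b, int pos)
    {
        return TwoDigits(b, pos + 11) * 3600 + TwoDigits(b, pos + 14) * 60 + TwoDigits(b, pos + 17);
    }

    private static bool SameBytes(byte[] b, int x, int y, int count)
    {
        for (var i = 0; i < count; i++)
            if (b[x + i] != b[y + i]) return false;
        return true;
    }

    private static DateTime DateOnly(byte[] b, int pos)
    {
        var year = TwoDigits(b, pos) * 100 + TwoDigits(b, pos + 2);
        return new DateTime(year, TwoDigits(b, pos + 5), TwoDigits(b, pos + 8));
    }

    public static int DiffInSeconds(byte[] b, int start, int end)
    {
        var timeDiff = SecondsOfDay(b, end) - SecondsOfDay(b, start);

        // Fast path 1: identical "yyyy-MM-dd" (10 bytes), only the time differs.
        if (SameBytes(b, start, end, 10))
            return timeDiff;

        // Fast path 2: identical "yyyy-MM" (7 bytes). Within one month the
        // day difference is a plain subtraction, no calendar rules needed.
        if (SameBytes(b, start, end, 7))
            return (TwoDigits(b, end + 8) - TwoDigits(b, start + 8)) * 86400 + timeDiff;

        // Slow path: let DateTime handle month lengths and leap years.
        return (int)(DateOnly(b, end) - DateOnly(b, start)).TotalSeconds + timeDiff;
    }
}
```

The ordering matters: the cheapest check handles the most common case, and the full calendar arithmetic runs only for the few records that cross a month boundary.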

At this point the profiler was basically laughing at us, and we had no real avenues to move forward, so I made the code use 4 threads, each processing the file at different locations.
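
The overall shape of that change can be sketched as follows. This is an illustration under simplifying assumptions, not the code from the post: the input is a pair of in-memory arrays (hypothetical `ids` and `durations`) instead of a memory-mapped file, and `SumDurationsById` is an invented name. Each worker scans its own slice and accumulates into a private array, so no locking is needed until the final merge.

```csharp
using System.Threading.Tasks;

public static class ParallelScan
{
    public static long[] SumDurationsById(int[] ids, int[] durations, int maxId, int workers)
    {
        var partials = new long[workers][];
        var tasks = new Task[workers];

        for (var w = 0; w < workers; w++)
        {
            var worker = w; // copy the loop variable so each lambda sees its own value
            tasks[w] = Task.Run(() =>
            {
                // Slice boundaries: each slice ends exactly where the next begins,
                // and the last slice ends at ids.Length.
                var start = (int)((long)ids.Length * worker / workers);
                var end = (int)((long)ids.Length * (worker + 1) / workers);

                var local = new long[maxId + 1];
                for (var i = start; i < end; i++)
                    local[ids[i]] += durations[i];
                partials[worker] = local;
            });
        }
        Task.WaitAll(tasks);

        // Merge the per-worker results on a single thread.
        var totals = new long[maxId + 1];
        foreach (var local in partials)
            for (var i = 0; i <= maxId; i++)
                totals[i] += local[i];
        return totals;
    }
}
```

Keeping the accumulators private to each worker avoids contention on shared counters, at the price of one extra array per worker.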

That gave me 73 ms, with 5,640 KB allocated and a peak working set of 300,580 KB.

527 times faster than the original version.
Allocates 1,350 times less memory.
1/3 of the working set.
Able to process 3.7 GB / sec.

Note that at this point, we are relying on the file being in the file system cache, because if I were reading it from disk, I wouldn’t be able to do more than 100 – 200 MB / sec.

Here is the full code. Write code like this at your peril.


Comments

24 Nov 2016
12:41 PM

Jesús López

Your code is buggy

allStats = new int[4][];
...
for (var i = 0; i < stats.Length; i++)
{
    var value = stats[i][0] + stats[i][1] + stats[i][2] + stats[i][3];

stats.Length is 4.

Also, when you fix this bug, you will have to deal with another problem: there is no guarantee that the stats arrays are the same size, so the following code can fail with IndexOutOfRangeException:

var value = stats[0][i] + stats[1][i] + stats[2][i] + stats[3][i];
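
One way to address both points is sketched below, under the assumption that each thread produces its own `int[]` of per-id totals. The `StatsMerge` class and `Merge` method are hypothetical names. The loop walks every slot of every array, and the result is sized to the longest array so that shorter ones cannot cause an out-of-range access.

```csharp
using System;

public static class StatsMerge
{
    // Sums per-thread counters slot by slot, tolerating arrays of different lengths.
    public static long[] Merge(int[][] perThread)
    {
        var longest = 0;
        foreach (var stats in perThread)
            longest = Math.Max(longest, stats.Length);

        var totals = new long[longest];
        foreach (var stats in perThread)
            for (var i = 0; i < stats.Length; i++)
                totals[i] += stats[i];
        return totals;
    }
}
```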

24 Nov 2016
15:09 PM

Johnny Lee

You forgot to mention one optimization you made in the final version.

The code now handles the case where the year and month are the same, but the days are different. There's no need to worry about leap years in this case, so the code just has to add the difference in days, converted to seconds, as well.

The bookkeeping code I added showed the following for the sample data:

same day count = 5,454,657
same month, different day count = 333,066
other = 11,104 (calls the full, slow datetime parse code)

Jesus's bug report means that the summary.txt file is empty since there are no non-zero values in the first 4 slots.

Once you fix Jesus's bug, the file contents are wrong because of another bug.

As I commented in yesterday's article, there's a bug in the Parse() method.

The first line of Parse() has the start and end positions swapped:

        public static void Parse(byte* buffer, out int id, out int duration)
        {
            duration = DiffTimesInSecond(buffer + 20, buffer);
            id = ParseInt8(buffer + 40);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int DiffTimesInSecond(byte* start, byte* end)
        {

24 Nov 2016
15:31 PM

Johnny Lee

There's a third bug, this time in the WriteFormattedTime() method.

In the output, the timespan numbers are supposed to be separated by colons, but WriteFormattedTime() uses the wrong indices for writing the seconds value, thereby overwriting the second colon:

        private static void WriteFormattedTime(TimeSpan ts, byte[] temp, int pos)
        {
            var hours = ts.Hours;
            temp[pos] = (byte)(hours / 10 + '0');
            temp[pos + 1] = (byte)(hours % 10 + '0');

            var min = ts.Minutes;
            temp[pos + 3] = (byte)(min / 10 + '0');
            temp[pos + 4] = (byte)(min % 10 + '0');

            var sec = ts.Seconds;
            temp[pos + 5] = (byte)(sec / 10 + '0');  // <<< should be + 6
            temp[pos + 6] = (byte)(sec % 10 + '0');  // <<< should be + 7
        }
    }

24 Nov 2016
15:52 PM

Johnny Lee

After fixing the above-mentioned 3 bugs, app runtime is ~135ms.

You can reduce the runtime further by not creating a TimeSpan object for each value output in the WriteOutput() method. Instead, you calculate the seconds, minutes and hours directly. Along with inlining the WriteFormattedTime() function, this reduces the runtime by about 10-15ms in the parallel version.

With the above change, the runtime is around 120-125ms on my machine.
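
A minimal sketch of that idea, assuming the duration is already available as a whole number of seconds and stays below 100 hours. The signature is hypothetical. Unlike the method quoted earlier, this version also writes the colons itself, so it does not depend on a pre-filled template.

```csharp
public static class TimeFormatting
{
    // Writes "HH:mm:ss" into temp[pos .. pos + 7] without allocating a TimeSpan.
    // Assumes 0 <= totalSeconds < 360000 (that is, under 100 hours).
    public static void WriteFormattedTime(int totalSeconds, byte[] temp, int pos)
    {
        var hours = totalSeconds / 3600;
        var min = totalSeconds / 60 % 60;
        var sec = totalSeconds % 60;

        temp[pos] = (byte)(hours / 10 + '0');
        temp[pos + 1] = (byte)(hours % 10 + '0');
        temp[pos + 2] = (byte)':';
        temp[pos + 3] = (byte)(min / 10 + '0');
        temp[pos + 4] = (byte)(min % 10 + '0');
        temp[pos + 5] = (byte)':';
        temp[pos + 6] = (byte)(sec / 10 + '0');
        temp[pos + 7] = (byte)(sec % 10 + '0');
    }
}
```

Note one behavioral difference: `TimeSpan.Hours` wraps at 24 because whole days are reported separately, while the direct division here keeps counting hours past 23.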

24 Nov 2016
17:35 PM

Johnny Lee

N.B. I re-read today's entry and found where you mentioned the optimization for the same month - I was distracted by the pretty pix of source code. :)

I found a fourth bug in the code.

In the Main() function, the code that calculates the start and end ranges for each Task to process is wrong.

The integer division used does an implicit floor() of the division result. That messes up the calculation for the end point since the bias is to round down to the nearest integer.

Here's debug output of the start, end, and total entries in the file for each Task.

Task | Start   | End    | Total
-----+---------+--------+--------
 0   | 0       |1449706 |5798827
 1   | 1449706 |2899412 |5798827
 2   | 2899412 |4349118 |5798827
 3   | 4349118 |5798824 |5798827

The previous Task's range end should be the next Task's range start.

The problem is that the last Task's range end stops short of the actual entries end, so the code doesn't handle all the entries in the file.

Fixing the bug shouldn't affect the runtime significantly.
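
One possible fix, sketched with a hypothetical `RangeSplitter.Split` helper: compute each boundary from the total directly, instead of adding a floored chunk size repeatedly. The end of one range is then by construction the start of the next, and the last range ends exactly at the total.

```csharp
public static class RangeSplitter
{
    // Splits [0, total) into `parts` contiguous ranges with exclusive ends.
    public static (long Start, long End)[] Split(long total, int parts)
    {
        var ranges = new (long Start, long End)[parts];
        for (var i = 0; i < parts; i++)
        {
            // Multiply before dividing so the rounding error does not accumulate.
            ranges[i] = (total * i / parts, total * (i + 1) / parts);
        }
        return ranges;
    }
}
```

With the 5,798,827 entries from the table above and four tasks, the last range ends at 5,798,827 rather than 5,798,824, so the three trailing entries are no longer dropped.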

Thanks for writing all the blog entries. It's been an interesting read.

25 Nov 2016
20:45 PM

Oren Eini

Jesús, thanks for noticing. After fixing this issue, I'm seeing runs of about 130 ms.

25 Nov 2016
20:49 PM

Oren Eini

Johnny, thanks for noticing. I fixed the issue with the start/end swap, but I don't follow where I'm still handling leap years when the year and month are the same. The only time that happens is in the full-blown date parsing, and it seems there aren't enough such records in the data file to justify more complexity there.

25 Nov 2016
21:08 PM

Oren Eini

Johnny, good ideas. I also converted the writes to use mmap, and I write to it in parallel. Performance is now ~100 ms.

29 Nov 2016
17:55 PM

Johnny Lee

Some more optimizations, which reduce the Parse runtime by about 20-25%.

For DiffTimesInSecond, the time conversion code can be replaced with

                var st = (*(long*)(start + 11)) & 0x0F07000F07000F03;
                var st1 = (st * 2561) >> 8;
                var st2 = ((st1 & 0x000000003F00001F) * 0xE1000003C) >> 24;
                var st3 = (st1 >> 48);
                var startTime = (int)(st3 + st2) & 0x1FFFF;

                var en = (*(long*)(end + 11)) & 0x0F07000F07000F03;
                var en1 = (en * 2561) >> 8;
                var en2 = ((en1 & 0x000000003F00001F) * 0xE1000003C) >> 24;
                var en3 = (en1 >> 48);
                var endTime = (int)(en3 + en2) & 0x1FFFF;

                var diff = endTime - startTime;

For ParseInt8, replace with

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int ParseInt8(byte* buffer)
        {
            var n = (*(long*)(buffer)) & 0x0F0F0F0F0F0F0F0F;

            var n1 = (n) * 2561 >> 8;
            var n2 = (n1 & 0x00FF00FF00FF00FF) * 6553601 >> 16;
            var num = (int)((n2 & 0x0000FFFF0000FFFF) * 42949672960001 >> 32);

            return num;
        }

30 Nov 2016
07:34 AM

Oren Eini

Johnny, Can you explain what you are doing there?

01 Dec 2016
05:55 AM

Johnny Lee

See https://johnnylee-sde.github.io/Fast-numeric-string-to-int/

and https://johnnylee-sde.github.io/Fast-time-string-to-seconds/
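
For readers who do not follow the links, here is a commented sketch of how the two snippets above work. It keeps the same masks and multipliers but reads the eight bytes through `BinaryPrimitives.ReadInt64LittleEndian`, so it needs no unsafe code and does not depend on the machine's byte order. The class name `Swar` and the method name `TimeToSeconds` are assumptions made for illustration.

```csharp
using System;
using System.Buffers.Binary;

public static class Swar
{
    // Parses 8 ASCII digits in three multiply/shift/mask rounds instead of 8 loop steps.
    public static int ParseInt8(byte[] ascii, int offset)
    {
        unchecked // the multiplications overflow on purpose; only the low bits matter
        {
            // '0'..'9' are 0x30..0x39, so masking with 0x0F leaves one digit per byte.
            var n = BinaryPrimitives.ReadInt64LittleEndian(new ReadOnlySpan<byte>(ascii, offset, 8))
                    & 0x0F0F0F0F0F0F0F0F;

            // 2561 = 10 * 256 + 1: every byte becomes digit + 10 * previous digit.
            // After the shift, the even bytes hold two-digit values (0..99).
            var pairs = ((n * 2561) >> 8) & 0x00FF00FF00FF00FF;

            // 6553601 = 100 * 65536 + 1: combines adjacent pairs into values 0..9999.
            var quads = ((pairs * 6553601) >> 16) & 0x0000FFFF0000FFFF;

            // 42949672960001 = 10000 * 2^32 + 1: combines the two halves.
            return (int)((quads * 42949672960001) >> 32);
        }
    }

    // Converts "HH:mm:ss" to seconds since midnight with the same trick.
    public static int TimeToSeconds(byte[] ascii, int offset)
    {
        unchecked
        {
            // Keep only the digit bits; the two ':' bytes are masked to zero.
            var t = BinaryPrimitives.ReadInt64LittleEndian(new ReadOnlySpan<byte>(ascii, offset, 8))
                    & 0x0F07000F07000F03;

            // Byte 0 now holds the hours, byte 3 the minutes, byte 6 the seconds.
            var t1 = (t * 2561) >> 8;

            // 0xE1000003C = (3600 << 24) + 60: after the shift, the low bits
            // hold hours * 3600 + minutes * 60.
            var hm = ((t1 & 0x3F00001F) * 0xE1000003C) >> 24;
            var s = t1 >> 48;

            // 17 bits are enough for 0..86399; anything above is leftover garbage.
            return (int)(s + hm) & 0x1FFFF;
        }
    }
}
```

Neither method validates its input: non-digit bytes produce meaningless results instead of an error, which is the trade-off for removing the branches.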

01 Dec 2016
06:19 AM

Oren Eini

Johnny, That is pretty awesome stuff, thanks


Oren Eini

CEO of RavenDB
