Premature optimizations
I just finished a second de-optimization for NMemcached. This is the second such change I have made, turning the code from the async read pattern to a simple serial read from a stream. Together, the two changes cut the time to complete my simple (and trivial) benchmark from 5768.5 ms to 2768.6 ms.
That is less than 50% of the original time! In both cases, I started with heavy use of BeginXyz in order to get as much parallelism as possible, but that turned out to be a bad decision: in the many cases where the data was already there, I would pay the price of an async call instead of just grabbing the data from the kernel buffer.
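To make the "simple serial read" concrete, here is a minimal sketch in Python rather than .NET. It is not the NMemcached code: the helper name read_exact, the sample request bytes, and the use of a local socket pair are all assumptions made for illustration. It only shows the shape of the idea, which is that a plain blocking read returns right away when the bytes are already sitting in the kernel buffer.

```python
import socket


def read_exact(sock, count):
    """Read exactly `count` bytes using plain blocking recv calls.

    Hypothetical helper for illustration only.
    """
    chunks = []
    remaining = count
    while remaining > 0:
        # If the data is already buffered by the kernel, recv returns
        # immediately; no callback or async state machine is involved.
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full request arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# A connected pair of local sockets stands in for client and server.
client, server = socket.socketpair()
client.sendall(b"get foo\r\n")      # the data is now in the kernel buffer
request = read_exact(server, 9)     # serial read, no async call
client.close()
server.close()
```

The loop is needed because a single recv may return fewer bytes than requested; the serial version stays short because each step simply waits for the previous one.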
Comments
That does mean that you incur more thread contention, though, since in cases where the data isn't available you have to block a thread for the read operation.
Did you try checking the amount of data waiting on the socket before performing the read? If so, how did that work out?
It would seem so, wouldn't it?
But that isn't what tends to happen. I have set a short timeout value for the read, which means that if the data isn't available, I abort the request.
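A rough sketch of that timeout behavior, again in Python and not the actual NMemcached code. The helper name read_or_abort and the 50 ms timeout are assumptions picked for the example.

```python
import socket


def read_or_abort(sock, max_bytes, timeout_seconds):
    """Blocking read that gives up if nothing arrives within the timeout.

    Hypothetical helper: returns the bytes read, or None when the
    timeout expires and the caller should abort the request.
    """
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(max_bytes)
    except socket.timeout:
        return None


sender, receiver = socket.socketpair()

# Nothing has been sent yet, so the read times out and the request is aborted.
silent = read_or_abort(receiver, 1024, 0.05)

# Once data is buffered, the same call returns it without waiting.
sender.sendall(b"ping")
ready = read_or_abort(receiver, 1024, 0.05)

sender.close()
receiver.close()
```

With a short timeout, a thread is blocked for at most that long on a slow client, which is the trade being made against the cost of the async call.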
The road to hell is paved with optimization