The cost of timing out
Let’s assume that you want to make a remote call to another server. Your code looks something like this:
var response = await httpClient.GetAsync("https://api.myservice.app/v1/create-snap", cancellationTokenSource.Token);
This is simple, and it works, until you realize that you have a problem. By default, this request will time out after 100 seconds. You can set a shorter timeout using the HttpClient.Timeout property, but that will lead to other problems.
The problem is that internally, inside HttpClient, if you are using a Timeout, it will call Cancellation
Well, in theory, but there is a problem with this approach. Let's look at how this actually works, shall we?
It ends up setting up a Timer instance, as you can see in the code. The problem is that this will modify a global value (well, one of them; by default there are N timers in the process, where N is the number of CPUs that you have on the machine).
What that means is that in order to register a timeout, you need to take a lock. If you have a high concurrency situation, setting up the timeouts may be incredibly expensive.
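To make the cost concrete, here is a minimal sketch of what a hand-written per-request timeout looks like. The class and method names (PerRequestTimeout, RunWithTimeoutAsync) are made up for illustration; CancellationTokenSource.CreateLinkedTokenSource and CancelAfter are the standard .NET APIs. The point is only that every single call schedules its own timer, whether or not the timeout ever fires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PerRequestTimeout
{
    // Hypothetical helper: wraps any cancellable operation with its own timeout.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken token)
    {
        // A new linked source per call, so the caller's token still works.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(token);

        // This schedules a timer for this one call. Under high concurrency,
        // this registration is the setup cost discussed above.
        cts.CancelAfter(timeout);

        // If the timer fires first, the operation observes cancellation and
        // an OperationCanceledException surfaces to the caller.
        return await operation(cts.Token).ConfigureAwait(false);
    }
}
```

With an HttpClient, the call would look something like PerRequestTimeout.RunWithTimeoutAsync(t => httpClient.GetAsync(url, t), TimeSpan.FromSeconds(15), token).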
Given that the timeout is usually a fixed value, within RavenDB we solved that in a different manner. We set up a set of timers that go off periodically and use those instead. We can request a task that will be completed the next time the timer for that timeout duration fires. This way, we'll not be contending on the global locks, and we'll have a single value to set when the timeout happens.
The code we use ends up being somewhat more complex:
var sendTask = httpClient.GetAsync("https://api.myservice.app/v1/create-snap", cancellationTokenSource.Token);
var waitTask = TimeoutManager.WaitFor(TimeSpan.FromSeconds(15), cancellationTokenSource.Token);
// Wait for whichever task finishes first, without blocking the calling thread.
if (await Task.WhenAny(sendTask, waitTask) == waitTask)
{
        throw new TimeoutException("The request to the service timed out.");
}
var response = await sendTask;
Because we aren't spending a lot of time doing setup for a (rare) event, we can complete things a lot faster.
I don't like this approach, to be honest. I would rather have a better system in place, but it is a good workaround for a serious problem when you are dealing with high-performance systems.
You can see how we implemented the TimeoutManager inside RavenDB. The goal was to get roughly the same time frame, but we are absolutely fine with doing roughly the right thing, rather than paying the full cost of doing this exactly as needed. For our scenario, roughly is more than accurate enough.
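For readers who want the general shape of the idea without digging through the RavenDB source, here is a minimal sketch of a shared periodic timer. This is not RavenDB's TimeoutManager: the SharedTimeout class and its members are hypothetical, it handles a single fixed period, and it ignores cancellation tokens. It only assumes the standard System.Threading.Timer and TaskCompletionSource types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: one timer shared by every caller that wants
// "roughly this long" as a timeout.
public sealed class SharedTimeout : IDisposable
{
    private readonly Timer _timer;
    private TaskCompletionSource<bool> _next = NewSource();

    public SharedTimeout(TimeSpan period)
    {
        // The only timer registration, done once for all callers.
        _timer = new Timer(_ => Tick(), null, period, period);
    }

    // Completes on the next tick. Asking for it is just a field read.
    public Task NextTick => Volatile.Read(ref _next).Task;

    // Waits for two ticks, so the caller is released after at least
    // roughly one full period and at most roughly two.
    public async Task WaitAsync()
    {
        await NextTick.ConfigureAwait(false);
        await NextTick.ConfigureAwait(false);
    }

    private void Tick()
    {
        // Swap in a fresh source first, then release everyone who was
        // waiting on the old one. A single value is set per tick.
        var finished = Interlocked.Exchange(ref _next, NewSource());
        finished.TrySetResult(true);
    }

    private static TaskCompletionSource<bool> NewSource() =>
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    public void Dispose() => _timer.Dispose();
}
```

The trade-off is precision. With a 15 second period, a caller in this sketch is released somewhere between roughly 15 and 30 seconds after it asked, rather than at exactly 15 seconds. That is the "roughly the right thing" bargain: a cheaper setup in exchange for a less exact deadline.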
 

Comments
Interesting approach! Would also be interesting to see benchmarks for both approaches?
Did you look at using a hashed timer wheel too? (paper: http://cseweb.ucsd.edu/users/varghese/PAPERS/twheel.ps.Z, impl: https://github.com/wangjia184/HashedWheelTimer)
Cocowalla,
Thanks for the reference, I remember the hashed wheel timer paper from a while ago. Would be interesting to check the project properly, but I don't really like the dedicated thread and associated behavior.
You just exposed some of the hidden ugliness of async/await. There is some elegance in letting the kernel manage scheduling, synchronization and i/o, maybe paying small perf penalty is worth it?
Rafal,
The kernel isn't magic, it is doing the same thing, usually. It is nice in terms of how things work, but there is a HUGE cost associated with it compared to async.
Yep, it's my main concern that the async code is just re-implementing everything that the kernel does already. And you can't even cancel an I/O operation without support from the kernel (can't just interrupt any operation and say 'time's up'). You say the overhead of context switch is huge, i dont have any idea how huge, but i wonder if it wouldn't be better if Microsoft spent time on improving thread scheduler instead of inventing the whole async/await thing (that has a nasty habit of taking over and transforming all your code after you use it once)
Rafal,
Well, to start with, Linux exists :-) The issue of the Kernel's cost is all about the time that it takes to switch from User mode to Kernel mode, which is expensive. There are literally whole subsystems in both Windows & Linux dedicated to avoiding that.
See how gettimeofday is implemented as vDSO (https://www.man7.org/linux/man-pages/man7/vdso.7.html) to reduce those costs. Or the IO_URing impl.
sometimes it's good to disagree just to get interesting discussion :) One more thought i have here - the need for non blocking code comes mainly from the IO - network or disk. At least from the perspective of typical web applications. Everything else is pretty much instant and can be done synchronously. And with IO there's no way to avoid the kernel mode, all the data transfers with io devices happen there. Even if you share the memory buffer with userspace the signaling needs to go thru system calls. And .net async code is no exception here, so i wonder if it really reduces the number of kernel mode switches compared to sync code. For sure it can reduce the number of threads if you only use async io and can get more work done inside same system thread, but the number of system calls will be more or less the same as with sync code. So doesnt it all boil down to the efficiency of system thread scheduler vs the .net task queue? Dont think i have a clear point here, just a feeling that converting all code to async is a bit too much if you only need to make async io easier to use and i wonder if there are any less radical and less bloated but still effective approaches.
Rafal,
IO_URing is there specifically to avoid hitting the kernel directly. You have a shared queue using memory map that both usermode and kernel mode can access. Note that .NET is getting IO_URing support. And even beforehand, epoll and friends mean that there is one thread doing IO waits, and everything else isn't hitting the kernel. A key aspect here is the number of kernel transitions, by the way.