When your memory won’t go down
A user reported something really mysterious: RavenDB was using a lot of memory on their machine. Why the mystery? Because RavenDB is very careful to account for the memory it uses, precisely because we have run into many such cases before.
According to RavenDB’s own stats, we got:
- 300 MB in managed memory
- 300 MB in unmanaged memory
- 200 MB in mmap files
-----------------------------------------------
- 1.7 GB total working set
You might notice the discrepancy in the numbers. I’ll admit that we started debugging this from the bottom up; in this case, we used “strace -k” to figure out who was doing the allocations. In particular, we watched for anonymous mmap calls (which are a way to ask the OS for memory) and traced who was making them. As it turns out, it was the GC. But when we asked the GC itself how much managed memory we were using, we got a far smaller number. What is going on?
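A sketch of this kind of tracing session; the exact strace flags and the sample output lines below are illustrative assumptions, not the actual RavenDB trace:

```shell
# On the live process, one would run something like (illustrative):
#   strace -f -k -e trace=mmap -p <ravendb-pid>
# -k prints a user-space stack trace for each call, which is what points
# the finger at the GC. Below we filter a captured sample for anonymous
# mmap calls -- MAP_ANONYMOUS means "give me plain memory", as opposed to
# a file-backed mapping:
cat > /tmp/mmap-trace.log <<'EOF'
mmap(NULL, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a40000000
mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3a41000000
EOF
grep -c MAP_ANONYMOUS /tmp/mmap-trace.log
```

Only the first sample line is an anonymous mapping, so the count printed is 1; the file-backed mmap (the mmap files in the stats above) is filtered out.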
By default, RavenDB is optimized to use as much of the system’s resources as it can, to provide low latency and lightning-fast responses. One part of that is the use of the Server GC mode with the RetainVM flag. This means that instead of returning memory to the operating system, the GC keeps it around in case it is needed again. Here, we got to the point where most of the memory in the process was being held by the GC, just in case we’d need it again.
The fix was to set the RetainVM flag to false, which means that the GC will release memory that is not in use back to the OS much more quickly.
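On .NET Core this maps to the System.GC.RetainVM configuration knob; a sketch of what the fix looks like in a runtimeconfig.json (assuming a .NET Core host; the surrounding structure is the standard runtime options format):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.RetainVM": false
    }
  }
}
```

The same knob can also be flipped with the COMPlus_GCRetainVM=0 environment variable, which is handy when you can’t touch the config file.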
Comments
The behavior you are describing suggests you set the gcServer element to "true" in your .config for garbage collection? The .NET Framework will create two managed heap regions for each CPU when using server garbage collection (a normal heap and a Large Object Heap), along with a dedicated thread; that is one cause of additional memory consumption when using server garbage collection. The standard workstation garbage collection only creates a single set of normal and Large Object Heaps. More heaps = greater memory requirement. It's a balancing act.
https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/
Michael, Yes, we are using the server mode as well. That shouldn't be an issue because I'm not worried so much about the heap usage, I'm worried about the working set usage.
The working set will increase when gcConcurrent is set to "true". Beginning with .NET 4.5 it defaults to "true" when gcServer is "true". You may want to try setting gcConcurrent to "false" as a test:
```xml
<runtime>
  <gcServer enabled="true"/>
  <gcConcurrent enabled="false"/>
</runtime>
```
This configuration is used by Microsoft for its compilers (e.g. csc.exe).
If you have to deal with large managed objects (i.e. objects >= 85,000 bytes), your application would benefit from setting GCSettings.LargeObjectHeapCompactionMode to GCLargeObjectHeapCompactionMode.CompactOnce periodically, so that the next gen 2 collection will defrag the Large Object Heap. Otherwise, your app could run out of memory when you actually have sufficient remaining memory overall (when attempting to allocate memory for large objects). You can also just call GC.Collect() after setting this value to immediately defrag the LOH (not generally recommended, but sometimes...). This GCSettings.LargeObjectHeapCompactionMode setting reverts back to the default behavior after a garbage collection (i.e. it won't defrag again until the property is set to GCLargeObjectHeapCompactionMode.CompactOnce).
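The one-shot compaction pattern described in the comment above can be sketched as follows (a minimal illustration of the GCSettings API the comment names, not code from RavenDB):

```csharp
using System;
using System.Runtime;

class LohCompactionDemo
{
    static void Main()
    {
        // Ask the GC to compact the Large Object Heap on the next
        // blocking gen 2 collection. This is a one-shot request.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        // Force the collection now. Not generally recommended in
        // production, but useful to defrag the LOH at a known quiet point.
        GC.Collect();

        // As the comment notes, the setting reverts to the default
        // behavior once the collection has run.
        Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode);
        // prints "Default"
    }
}
```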
Michael, Yes, I'm aware of this. That isn't actually what is going on. We typically have a low amount of managed memory in use, but we sometimes have peaks. It looks like RetainVM is keeping the memory in the _working set_, not just in the allocated virtual memory. On the face of it, this seems wrong, and MS seems to agree, see: https://github.com/dotnet/coreclr/issues/15790
Great information in the link, thank you.