Tracking down RavenDB I/O usage in Linux
Today I had to look into the a customer whose RavenDB instance was burning through a lot of I/O. The process is somewhat ingrained in me by this point, but I thought that it would make for a good blog post so I’ll recall that next time.
Here is what this looks like from the point of view of the disk:
We are seeing a lot of reads in terms of MB/sec and a lot of write operations (but far less in terms of bandwidth). That is the external details, can we figure out more? Of course.
We start our investigation by running:
sudo iotop -ao
This command gives us the accumulative time for threads that do I/O. One of the important things that RavenDB is to tag its threads with the tasks that they are assigned. Here is a sample output:
TID PRIO USER DISK READ DISK WRITE> SWAPIN IO COMMAND 2012 be/4 ravendb 1748.00 K 143.81 M 0.00 % 0.96 % Raven.Server -c /ravendb/config/settings.json [Follower thread] 9533 be/4 ravendb 189.92 M 86.07 M 0.00 % 0.60 % Raven.Server -c /ravendb/config/settings.json [Indexing of Use] 1905 be/4 ravendb 162.73 M 72.23 M 0.00 % 0.39 % Raven.Server -c /ravendb/config/settings.json [Indexing of Use] 1986 be/4 ravendb 154.52 M 71.71 M 0.00 % 0.38 % Raven.Server -c /ravendb/config/settings.json [Indexing of Use] 9687 be/4 ravendb 185.57 M 70.34 M 0.00 % 0.59 % Raven.Server -c /ravendb/config/settings.json [Indexing of Car] 1827 be/4 ravendb 172.60 M 65.25 M 0.00 % 0.69 % Raven.Server -c /ravendb/config/settings.json ['Southsand']
In this case, we see the top 6 threads in terms of I/O (for writes). We can see that we have a lot of of indexing and documents writes. That said, thread names in Linux are limited to 14 characters, so we probably need to give better names to them.
That is part of the task, let’s look at the cost in terms of reads:
TID PRIO USER DISK READ> DISK WRITE SWAPIN IO COMMAND 11191 be/4 ravendb 2.09 G 31.75 M 0.00 % 7.58 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool] 11494 be/4 ravendb 1353.39 M 14.54 M 0.00 % 19.62 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool] 11496 be/4 ravendb 1265.96 M 4.97 M 0.00 % 16.56 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool] 11211 be/4 ravendb 1120.19 M 42.66 M 0.00 % 2.83 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool] 11371 be/4 ravendb 1114.50 M 35.25 M 0.00 % 5.00 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool] 11001 be/4 ravendb 1102.55 M 43.35 M 0.00 % 3.12 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool] 11340 be/4 ravendb 825.43 M 26.77 M 0.00 % 4.85 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
That is a lot more complicated, however. Now we don’t know what task this is running, only that something is reading a lot of data.
We have the thread id, so we can make use of that to see what it is doing:
sudo strace -p 11191 -c
This command will track the statistics on the systems calls that are issued by the specified thread. I’ll typically let it run for 10 – 30 seconds and then hit Ctrl+C, giving me:
% time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 90.90 3.868694 681 5681 82 futex 8.28 0.352247 9 41035 sched_yield 0.79 0.033589 1292 26 pwrite64 0.03 0.001246 52 24 1 recvfrom 0.01 0.000285 285 1 restart_syscall 0.00 0.000000 0 2 madvise 0.00 0.000000 0 2 sendmsg ------ ----------- ----------- --------- --------- ---------------- 100.00 4.256061 46771 83 total
I’m mostly interested in the pwrite64 system call. RavenDB uses mmap() for most of its data access, so that is harder to read, but we can get a lot of information from the output. Now I’m going to run the following command:
sudo strace -p 11191 -e trace=pwrite64
This will give us a trace of all the pwrite64() system calls from that thread, looking like this:
pwrite64(315, "\365\275"..., 4113, 51080761896) = 4113 pwrite64(315, "\344\371"..., 4113, 51080893512) = 4113
There is an strace option (-y) that can be used to show the file paths for system calls, but I forgot to use it, no worries, I can do:
sudo ls -lh /proc/11191/fd/315
Which will give me the details on this file:
lrwx------ 1 root root 64 Aug 7 09:21 /proc/11783/fd/315 -> /ravendb/data/Databases/Southsand/PeriodicBackupTemp/2022-08-07-03-30-00.ravendb-encrypted-full-backup.in-progress
And that tells me everything that I need to know. The reason we have high I/O is that we are generating a backup file. That explains why we are seeing a lot of reads (since we need to read in order to generate the backup).
The entire process is mostly about figuring out exactly what is going on, and RavenDB is very careful about leaving as many breadcrumbs as possible to make it easy to follow.
Comments
Comment preview