Production post mortem: The opener of files and destroyer of peace

time to read 2 min | 307 words

A customer opened a support call about a recurring problem that they had in their system.

System.InvalidOperationException: Errno: 1224='The requested operation cannot be performed on a file with a user-mapped section open.
' (rc=0) - 'Failed to rvn_write_header 'headers.one', reason : FailOpenFile'. FailCode=FailOpenFile.
   at Sparrow.Server.Platform.PalHelper.ThrowLastError(FailCodes rc, Int32 lastError, String msg) in C:\Builds\RavenDB-Stable-4.2\42040\src\Sparrow.Server\Platform\PalHelper.cs:line 35
   at Voron.StorageEnvironmentOptions.DirectoryStorageEnvironmentOptions.WriteHeader(String filename, FileHeader* header) in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\StorageEnvironmentOptions.cs:line 706
   at Voron.Impl.FileHeaders.HeaderAccessor.Modify(ModifyHeaderAction modifyAction) in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\Impl\FileHeaders\HeaderAccessor.cs:line 200
   at Voron.Impl.Journal.WriteAheadJournal.JournalApplicator.SyncOperation.UpdateDatabaseStateAfterSync() in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\Impl\Journal\WriteAheadJournal.cs:line 998
   at Voron.Impl.Journal.WriteAheadJournal.JournalApplicator.LockTaskResponsible.RunTaskIfNotAlreadyRan() in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\Impl\Journal\WriteAheadJournal.cs:line 1192

And the code that failed was:

Based on that information, the issue is obvious. Someone is poking their nose into RavenDB’s files. It is important to understand that this will cause the I/O operation to fail. And since this is part of a critical path in managing the file, it will force the entire database to unload.

We asked the customer to make sure that all indexing services, anti virus programs and the like would be instructed to ignore the RavenDB directories and were assured that this is already the case.

That led us to a problem. If no one else is touching our files, is it possible that it is our own code that is causing the issue?

I asked the customer to use Process Monitor to track system calls made on that particular file.  It took a day or so, but they came back with:
8:13:31.4865562 AM MsMpEng.exe 3112 CreateFileMapping D:\RavenDB\Server\RavenData\System\headers.one FILE LOCKED WITH WRITERS SyncType: SyncTypeCreateSection, PageProtection: PAGE_EXECUTE

We had our culprit, Windows Defender would mess around in our files, causing some I/O operations to fail.  The customer added RavenDB’s directories to Windows Defender’s excluded list as well, and that should be the end of things.

Overall, this is a short issue, but it was only short because we were able to prove what other process was causing us issues.