Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 4 min | 681 words

I mentioned that RavenDB 4.0 uses x509 client certificate for authentication, right? As it turns out, this can create some issues for us when we need to do more than just blind routing to the right location.

Imagine that our proxy is setup in front of the RavenDB Docker Swarm not just for handling routing but to also apply some sort of business logic. It can be that you want to do billing per client basis on their usage, or maybe even inspect the incoming data into RavenDB to protect against XSS (don’t ask, please). But those are strange requirements. Let us go with something that you can probably emphasize with more easily.

We want to have a RavenDB cluster that uses a Let’s Encrypt certificate, but that certificate has a very short life time, typically around 3 months. So you probably don’t want to setup these certificates within RavenDB itself, because you’ll be replacing them all the time. So we want to write a proxy that would handle the entire process of fetching, renewing and managing Let’s Encrypt certificates for our database, but the certificates that the RavenDB cluster will use are internal ones, with much longer expiration times.

So far, so good. Except…

The problem that we have here is that here we have a problem. Previously, we used the SNI extension in our proxy to know where we are going to route the connection, but now we have different certificates for the proxy and for the RavenDB server. This means that if we’ll try to just pass the connection through to the RavenDB node, the client will detect that it isn’t using a trusted certificate and fail the request. On the other hand, if we terminated the SSL connection at the proxy, we have another issue, we use x509 client certificate for ensuring that the user actually have the access they desire. And we can’t just pass the client certificate forward, since we terminated the SSL connection.

Luckily, we don’t have to deal with a true man in the middle simulation here, because we can configure the RavenDB server to trust the proxy. All we are left now is to figure out how the proxy can tell the RavenDB server what is the client certificate that the proxy authenticated. A common way to do that is to send the client certificate details over in a header, and that would work, but…

Sending the certificate details in a header has two issues for us. First, it would mean that we need to actually parse and mutate the incoming data stream. Not that big a deal, but it is something that I would like to avoid if possible. Second, and more crucial for us, we don’t want to have to validate the certificate on each and every request. What we want to do is take advantage on the fact that connections are reused and do all the authentication checks once, when the client connects to the server. Authentication doesn’t cost too much, but when you are aiming at tens and thousands of requests a second, you want to reduce costs as much as possible.

So we have two problems, but we can solve them together. Given that the RavenDB server can be configured to trust the proxy, we are going to do the following. Terminate the SSL connection at the proxy, and validate the client certificate (just validate the certificate, not check permissions or such) and then the magic happens. The proxy will generate a new certificate, signed with the proxy own key and registering the original client certificate thumbprint in the new client certificate (caching that certificate, obviously). Then the proxy route the request to its destination, signed with its own client certificate. The RavenDB server will recognize that this is a proxied certificate, pull the original certificate thumbprint from the proxied client certificate and use that to verify the permissions to assign to the user.

The proxy can then manage things like refreshing the certificates from Let’s Encrypt and RavenDB can get proxied requests.

time to read 4 min | 661 words

RavenDB 4.0 uses x509 client certificates for authentication. That is good, because it means that we get both encryption and authentication on both ends, but it does make is more complex to handle some deployment scenarios. It turns out that there is quite a big demand for doing things to the data that goes to and from RavenDB.

We’ll start with the simpler case, of having dynamic deployment on Docker, with nodes that may be moved from location to location. Instead of exposing the nastiness of the internal network to the outside world with URLs such as (https://129-123-312-1.rvn-srv.local:59421) we want to have nice and clean urls such as https://orders.rvn.cluster. The problem is that in order to do that, we need to put a proxy in place.

That is pretty easy when you deal with HTTP or plain TCP, but much harder when you deal with HTTPS and TLS because you also need to handle the encrypted stream. We looked at various options, such as Ngnix and Traefik as well as a peek at Squid but we rule them out for various reasons, mostly related to the deployment pattern (Ngnix doesn’t handle dynamic routing), feature set (Traefik doesn’t handle client certificates properly) and usecase (Squid seems to be much more focused on being a cache). All of them didn’t support the proper networking model we want (1:1 connection matches from client to server, which we would really like to preserve because it simplify authentication costs significantly).

So I set out to explore what it would take to build an SSL Proxy to fit our needs. The first thing I looked at was how to handle routing. Given a user that type https://orders.rvn.cluster in the browser, how does this translate to actually hitting an internal Docker instance with a totally different port and host?

The answer, as it turned out, is that this is not a new problem. One of the ways to do that is to just intercept the traffic. We can do that because in this deployment model, we control both the proxy and the server, so we can put the certificate fro “orders.rvn.cluster” in the proxy, decrypt the traffic and then forward it to the right location. That works, but it means that we have a man in the middle. Is there another option?

As it turns out, this is such a common problem that there are multiple solutions for it. These are SNI (Server Name Indication) and ALPN (Application Layer Protocol Negotiation), both of which allow the client to specify what they want to get from the server as part of the initial (and unencrypted) negotiation. This is pretty sweet from the point of view of the proxy, because it can make routing decisions without needing to do the TLS negotiation but not so much for the user if they are currently trying to check “super-shady.site”, since while the contents of their request is masked, the destination is not. I’m not sure how big of a security problem this is (the end IP isn’t encrypted, after all, and even if you host a thousands sites on the same server, it isn’t that big a deal to narrow it down).

Anyway, the key here is that this is possible, so let’s make this happen. The solution is almost literally pulled from the StreamExtended readme page.

We get a TCP stream from a client, and we peek into it to read the TLS header, at which point we can pull the server name out. At this point, you’ll note, we haven’t touched SSL and we can forward the stream toward its destination without needing to inspect any other content, just carrying the raw bytes.

This is great, because it means that things like client authentication can just work and authenticate against the final server without any complexity. But it can be a problem if we actually need to do something with the traffic. I’ll discuss how to handle this properly in the next post.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 4 days from now
  3. What happens when a sparse file allocation fails? - 6 days from now
  4. NTFS has an emergency stash of disk space - 8 days from now
  5. Challenge: Giving file system developer ulcer - 11 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}