The bare minimum a distributed system developer should know about Certificates
After explaining all the ways that trust can be subverted by DNS, CA and a random wave to wipe away the writing in the sand, let us get down to actual details about what matters here.
HTTPS / SSL / TLS, whatever it is called this week, provides confidentiality over the wire for the messages you are sending. What it doesn’t provide is confidentiality about who you talked to. This may seem non-obvious at first, because the entire communication is encrypted, so how can a 3rd party know who I’m talking to?
Well, there are two main ways. It can happen through a DNS query. If you need to go to “http://my-awesome-service”, you need to know what the IP of that is, and for that you need to do a DNS query. There are DNS systems that are encrypted, but they aren’t widely deployed and in general you can assume that people can listen to your DNS and figure out what you are doing. If you go to “that-bad-place”, it is probably visible on someone’s logs somewhere.
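To make the DNS leak concrete, here is a minimal sketch that builds a classic unencrypted DNS query by hand. The function name `build_dns_query` and the query id are my own choices for illustration; the packet layout follows the standard DNS wire format. The point is only that the name you are asking about travels as readable bytes.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (classic, unencrypted DNS)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, then a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


packet = build_dns_query("my-awesome-service")
# The label is sitting in the packet as plain ASCII for anyone on the path to read.
print(b"my-awesome-service" in packet)
```

For a multi-label name such as `a.com` the dots are replaced by length bytes, but every label is still plain text.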
But the other way that someone can know who you are talking to is that you told them so. How did you do that?
Well, let’s consider one of the primary reasons we have HTTPS. A user has to validate that the hostname they used matches the hostname on the certificate. That seems pretty reasonable, right? But that single requirement pretty much invalidates the notion of confidentiality of who I’m talking to.
Consider the following steps:
- I go to “https://my-awesome-service”
- This is resolved to IP address 28.23.155.123
- I’m starting an SSL connection to that IP, at port 443. Initially, of course, the connection is not encrypted, but I’ve just initiated the SSL connection.
At that point, any outside observer that can listen to the raw network traffic knows what site you have visited. But how can this be? Well, at this point, the server needs to return a reply, and it needs to do that using a certificate.
Let us go with the “secure” option and say that we are simply sending over the wire “open ssl connection to 28.23.155.123”. What does this tell the listener? Well, since at this point the server doesn’t know what the client wants, it must reply with a certificate. That certificate must be the same for all such connections, and the user will abort the connection if the certificate does not match the expected hostname.
What are the implications there? Well, even assuming that I don’t have a database of matching IP addresses to their hostnames (which I would most assuredly do), I can just connect myself to the remote server and get the certificate. At this point, I can just inspect the hostname from the certificate and know what site the user wanted to visit. This is somewhat mitigated by the fact that a certificate may contain multiple hostnames or even wildcards, but even that match gives me quite a lot of information about who you are talking to.
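As a rough sketch of what such an observer could do, the code below connects to an IP address without saying which hostname it wants and lists the names on the certificate that comes back. The function names are hypothetical, and I am assuming the server presents a certificate that chains to a CA trusted by the local machine (otherwise the handshake fails and nothing is returned).

```python
import socket
import ssl


def hostnames_from_cert(cert: dict) -> list[str]:
    """Collect the DNS names from a dict shaped like SSLSocket.getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def fetch_cert_hostnames(ip: str, port: int = 443, timeout: float = 5.0) -> list[str]:
    """Connect by IP only (no hostname sent) and read the names off the certificate."""
    ctx = ssl.create_default_context()
    # We do not know the hostname, which is the whole point, so skip that check.
    ctx.check_hostname = False
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock) as tls:
            return hostnames_from_cert(tls.getpeercert())


# Offline illustration with a made-up certificate dict:
sample = {
    "subjectAltName": (
        ("DNS", "my-awesome-service"),
        ("DNS", "*.my-awesome-service"),
        ("IP Address", "28.23.155.123"),
    )
}
print(hostnames_from_cert(sample))
```

Calling `fetch_cert_hostnames("28.23.155.123")` would be the live version of the same thing. A wildcard entry narrows things down less precisely, but it still tells the observer which family of sites you are talking to.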
However, not sending who I want to talk to over the initial connection has a huge cost associated with it. If the server doesn’t know who you want, each IP address may serve only a single hostname (otherwise we may reply with the wrong certificate). Indeed, one of the reasons HTTPS was expensive was this tying of a whole IP address to a single hostname. On the other hand, if we sent the hostname we were interested in, the server would be able to host multiple HTTPS websites on the same machine, and select the right certificate at handshake time.
Two extensions to the handshake matter here. The first is SNI (Server Name Indication), which is basically a field in the SSL connection handshake that says what the hostname is. The other is ALPN (Application-Layer Protocol Negotiation), which allows you to select how you want to talk to the server. This can be very useful if one client wants to talk to the server using HTTP/1.1 and another using HTTP/2. Those have totally different semantics, so routing based on ALPN can make things much easier.
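You can see both fields without touching the network. The sketch below uses Python's in-memory TLS objects to produce the first handshake message a client would send (the ClientHello) and checks that the hostname and the offered protocols appear in it as readable bytes. The helper name `client_hello_bytes` is mine; the hostname is the one used throughout this post.

```python
import ssl


def client_hello_bytes(hostname: str, alpn: list[str]) -> bytes:
    """Return the raw first handshake message a TLS client would send."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(alpn)          # ALPN: which protocols we can speak
    incoming = ssl.MemoryBIO()            # bytes "from the server" (none here)
    outgoing = ssl.MemoryBIO()            # bytes we would put on the wire
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)  # SNI
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its hello and now waits for a reply.
        pass
    return outgoing.read()


hello = client_hello_bytes("my-awesome-service", ["h2", "http/1.1"])
print(b"my-awesome-service" in hello)   # the SNI value, unencrypted
print(b"http/1.1" in hello)             # the ALPN offer, unencrypted
```

Nothing has been encrypted yet at this stage, so anyone watching the connection sees the same bytes.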
At this point, the server can make all sorts of interesting decisions with regards to the connection. For example, based on the SNI field, it may forward the connection to another machine, either as the raw SSL stream or by stripping the SSL and sending the unencrypted data to the final destination. The first case, of forwarding the raw SSL stream is the more interesting scenario, because we can do that without having the certificate. We just need to inspect the raw stream header and extract the SNI value, at which point we route that to the right location and send the connection on its merry way.
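Here is a minimal sketch of that last case: pulling the SNI value out of the raw bytes and choosing a backend, without holding any certificate. It assumes the whole ClientHello arrived in one buffer and in a single TLS record, which a real proxy cannot take for granted. The `ROUTES` table, its addresses and the function names are hypothetical.

```python
import struct

# Hypothetical routing table: SNI hostname -> (backend address, port).
ROUTES = {"a.com": ("10.0.0.1", 443), "b.com": ("10.0.0.2", 443)}


def extract_sni(data: bytes):
    """Return the SNI hostname from a raw TLS ClientHello, or None if absent."""
    try:
        if data[0] != 0x16:                # record type 0x16 = handshake
            return None
        pos = 5                            # skip record header: type, version, length
        if data[pos] != 0x01:              # handshake type 0x01 = ClientHello
            return None
        pos += 4                           # skip handshake type + 3-byte length
        pos += 2 + 32                      # skip client version + random
        pos += 1 + data[pos]               # skip session id
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len              # skip cipher suites
        pos += 1 + data[pos]               # skip compression methods
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:              # extension 0 = server_name
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = data[pos + 2]
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                if name_type != 0:         # 0 = host_name
                    return None
                return data[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len                 # not SNI, move to the next extension
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                        # truncated or malformed input


def pick_backend(client_hello: bytes):
    """Choose where to forward the still-encrypted stream, based only on SNI."""
    return ROUTES.get(extract_sni(client_hello))
```

After `pick_backend` returns, the proxy would open a TCP connection to that address and replay the bytes it already read, then copy data in both directions untouched.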
I might do more posts like this, but I would really appreciate feedback, both on whether the content is good and on what additional topics you would like me to cover.
More posts in "The bare minimum a distributed system developer should know about" series:
- (20 Nov 2017) Binding to IP addresses
- (15 Nov 2017) HTTPS Negotiation
- (06 Nov 2017) DNS
- (03 Nov 2017) Certificates
- (01 Nov 2017) Transport level security
- (31 Oct 2017) networking
Comments
I find this topic fascinating. Can we rely on certificates and standard HTTPS/TLS or should we take matters into our own hands in the application layer to provide an alternate or additional layer of proof using something like Secure Remote Password Protocol and Zero-knowledge proof for client and server side validation/verification of identity (not user, but system) against a shared secret that is not passed across the wire? In other words, how much do we trust the certificate is what I'm asking, I suppose.
Topic is great, more content would be AWESOME please!!
What I'd like to know in particular, can server receive a packet, opportunistically try decrypting with both HTTPS://A.COM and HTTPS://B.COM identities -- and go with what fits?
Ideally it should also recognise if incoming packet is HTTP or HTTPS too -- so even the port can be shared.
I do understand the handshake isn't just request-response -- hence asking people in the know.
Oleg: no. The handshake goes like:
Now they've got an encrypted connection.
But the server has to send down a certificate first, and it has to know which certificate to send back. It could send down all certificates it knows how to handle, but then the client would have to indicate its choice somehow.
If the only use case were HTTPS, then you could do that: the server would resolve which cert you used by seeing which decrypts a session key that can decrypt its input in a way that starts with 'HTTP/'. But the system is intended for use in arbitrary protocols built on top of TCP. Heck, there's a draft RFC for adding STARTTLS support to Telnet. So it's not worth it trying to detect after the fact what cert the connection needs.
It's much more reliable to have the client tell you in the initial packet what it's trying to connect to.
As an aside, ALPN means you can support, say, telnet and HTTP and SMTP and VNC on the same port. If you really need to.
dhasenan thanks!
It looks like you've missed one option though.
When server gets encrypted request, the server already KNOWS what protocols it supports. Server knows whether to expect HTTP/ or something else.
So server should be able to decrypt and choose the logical host safely (unless there are other troubles I might be missing).
The benefit of not having to layer another protocol is very material.
Tyler, if you control the protocol (i.e., TCP + SSL), sure. You can use an additional handshake or something like that on top of this, and ignore any certificate chain errors. The problem is that you are still open to MitM attacks, because anyone in the middle can just proxy the connection back and forth between client and server, wait until you have done your validation, and then, when you think that you know who you are talking to, listen to or modify everything that goes in between.
The issue isn't just encrypting the connection but also knowing who is on the other side.
If you already know who is on the other side, you can use the certificate thumbprint to validate the connection on both ends, without needing to trust a CA. But that leaves you with the problem of how to let the other side know what the right thumbprint is.
A good example of that might be a connection string for a database that would include the expected certificate thumbprint and fail otherwise, not checking any CA trust along the way. You'll presumably trust that the DBA gave you the right cert thumbprint.
Except that certificates change, and when your admin updates the cert (because it expired), all clients are going to break, so it isn't quite as simple.
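A minimal sketch of the thumbprint check, assuming SHA-256 over the DER-encoded certificate and a set of accepted thumbprints so that an old and a new cert can both be valid during a rotation. The function names are hypothetical. In Python the DER bytes of the peer certificate would come from `getpeercert(binary_form=True)` on the connected socket.

```python
import hashlib
import hmac


def thumbprint(der_cert: bytes) -> str:
    """SHA-256 thumbprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, accepted: set[str]) -> bool:
    """True if the certificate matches any thumbprint we were told to expect."""
    actual = thumbprint(der_cert)
    # Accepting more than one pin lets the admin publish the next cert in advance.
    return any(hmac.compare_digest(actual, pin.lower()) for pin in accepted)
```

The client would fail the connection when `is_pinned` returns False, regardless of what any CA says about the certificate.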
I followed your logic until you reached the assumption that the SRPP and ZK do not establish a private once per connection symmetric encryption key that would prevent a man in the middle from ever being able to capture and read the data in the middle. And since only the real client and real server know the shared key and that key is never passed across the wire, the man in the middle impersonation is impossible. It does add overhead and you definitely need control of the protocol. No SSL or TLS involved. This is what my little MessageWire library does along with many other SRPP and ZK implementations. But as far as I can tell, the complexity of implementation and lack of any real standard prevents its general use, so you're stuck trusting certs.
Oleg, Suggesting specific topics for discussion is the best way to ensure that there will be more content.
Technically speaking, you can detect if an incoming connection is HTTP or HTTPS, but you must reply in the same protocol. That isn't trivial, see this post:
https://ayende.com/blog/180513/the-bare-minimum-a-distributed-system-developer-should-know-about-https-negotiation?key=6329dd9044194fbcb7017d018491f05b
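The detection part on its own can be sketched as a peek at the first bytes: a TLS connection opens with a handshake record (byte 0x16 followed by major version 0x03), while a plain HTTP request opens with an ASCII method name. The function name and the list of methods are my own choices for illustration, and this says nothing about how to reply, which is the hard part.

```python
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ", b"PATCH ")


def sniff_protocol(first_bytes: bytes) -> str:
    """Guess whether a new connection speaks TLS or plain HTTP."""
    # TLS record header: content type 0x16 (handshake), then major version 0x03.
    if len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"
    # Plain HTTP requests start with the method name in ASCII.
    if first_bytes.startswith(HTTP_METHODS):
        return "http"
    return "unknown"
```

On a real socket you would read those bytes with `sock.recv(8, socket.MSG_PEEK)` so that they are still there for whichever handler takes over.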
As for trying to decrypt the packet, that isn't how it works. SSL works by having the client & server negotiate a shared secret for the connection, so just having the certificate for both a.com and b.com won't help you to figure out what is going on if you have started from the middle of the connection. And you need to send the certificate back as one of the very first things that you do during the establishment of the SSL connection. That is why SNI is so important, it allows the user to select which certificate to send.
Oleg, Actually, the server can't know that. Imagine that you have an SSL proxy that directs connections for HTTPS, S/MIME, sftp. So getting a connection doesn't tell you what kind of data is going through. This is intentional because it makes it very widely useful. Clients can use ALPN to indicate what application level protocol they need, but that is mostly so you can select the proper certificate or maybe route it internally more easily.
Tyler, The problem here is that you need a shared secret between both client & server.
Imagine that I'm setting up a new server, let's call it my-awesome-service-2. Now, how do I let all the users know that not only this service exists, but what is the shared secret that they should use to access it? How do I rotate this key, etc?
This is a level below authentication, because you need to know who you are talking to. Indeed, you can avoid MitM by encrypting the data itself, but I assume you were talking about identity of the 2nd party, not the data encryption.
In that case the shared secret is the user's password. If the server side does not also know that password or a hash of it, then the connection fails and the password never crosses the wire, in any form. This is how my MessageWire and ServiceWire libraries work.
I don't mean to suggest that these libraries or the general idea behind them are a solution to the problem, per se, only that they may get the juices flowing as part of the conversation. Hopefully it's interesting and of some use. Cheers, T
Tyler, Sure, that is easy, but this requires you to have a shared secret that you passed around. If you want to connect to another server without pre arranging such a secret, I don't think you have something better available than certs.
Oren, that is true, but one assumes that you have already passed a password to the user or the user has created one via a cert protected TLS interaction. That one time interaction seems less vulnerable to the man in the middle compromise. And if that interaction had been compromised and you were talking to a server that was not your own, then your server when you make a connection using SRPP would not know your secret or would have a different one than your client does and the connection would fail. Of course one might suppose the client would then always interact with that same fake man in the middle server, but that might be stretching it. I'm not suggesting SRPP is perfect for establishing shared secrets, but I do believe that for making and verifying connection identity in both ends on a repeated basis with unique encryption on every connection, it is superior to certs. Just my humble opinion. -T
Tyler, What you are describing is pretty much how SSH works. The first time you connect to a server, it remembers the key, and if it changes, it freaks out. The difference here is in terms of usage. Consider RavenDB specifically, it is easy to have this done for the usual communication, but what about using REST tools or the browser itself? They don't know anything about this custom protocol, which means a lot of operational overhead. By using SSL directly and HTTPS in particular, we gain something very valuable, inter-operability with pretty much everything in the market. And that is important.
Oren, thanks for posting the new article on the very related topic today, answering one of my questions!
I understand the extra load in recognizing the intended protocol from the leading frame. However that's a minor technical difficulty solvable with finite small resources. There are only so many (handful) protocols the proxy needs to support.