The bare minimum a distributed system developer should know about: Certificates

Nov 03 2017

After explaining all the ways that trust can be subverted by DNS, CA and a random wave to wipe away the writing in the sand, let us get down to actual details about what matters here.

HTTPS / SSL / TLS, whatever it is called this week, provides confidentiality over the wire for the messages you are sending. What it doesn’t provide is confidentiality about who you talked to. This may seem non-obvious at first, because the entire communication is encrypted, so how can a third party know who I’m talking to?

Well, there are two main ways. It can happen through a DNS query. If you need to go to “https://my-awesome-service”, you need to know its IP address, and for that you need to do a DNS query. There are DNS systems that are encrypted, but they aren’t widely deployed, and in general you can assume that people can listen to your DNS queries and figure out what you are doing. If you go to “that-bad-place”, it is probably visible in someone’s logs somewhere.
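To make this concrete, here is a minimal sketch in Python of what a classic, unencrypted DNS query looks like on the wire. The function names and the hostname “that-bad-place.example” are made up for illustration; the point is only that the name you asked for sits in the packet in plain bytes, so anyone who can see the packet can read it back.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (plain DNS, no encryption)."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: every label is prefixed by its length, then a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def hostname_seen_by_observer(packet: bytes) -> str:
    """Recover the queried name from the raw packet, as a passive listener could."""
    labels = []
    pos = 12  # the DNS header is always 12 bytes long
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


packet = build_dns_query("that-bad-place.example")
print(hostname_seen_by_observer(packet))
```

No key or secret is involved in the second function. That is the whole problem.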

But the other way that someone can know who you are talking to is that you told them so. How did you do that?

Well, let’s consider one of the primary reasons we have HTTPS: the user has to validate that the hostname they used matches the hostname on the certificate. That seems pretty reasonable, right? But that single requirement pretty much invalidates the notion of confidentiality about who I’m talking to.

Consider the following steps:

1. I go to “https://my-awesome-service”.
2. This is resolved to the IP address 28.23.155.123.
3. I start an SSL connection to that IP, on port 443. Initially, of course, the connection is not encrypted; I’ve only just initiated the SSL connection.

At that point, any outside observer that can listen to the raw network traffic knows which site you have visited. But how can this be? Well, at this point, the server needs to return a reply, and it needs to do that using a certificate.

Let us go with the “secure” option and say that we are simply sending over the wire “open ssl connection to 28.23.155.123”. What does this tell the listener? Well, since at this point the server doesn’t know what the client wants, it must reply with a certificate. That certificate must be the same for all such connections, and the user will abort the connection if the certificate does not match the expected hostname.

What are the implications there? Well, even assuming that I don’t have a database matching IP addresses to their hostnames (which I most assuredly would), I can just connect to the remote server myself and get the certificate. At this point, I can just read the hostname from the certificate and know what site the user wanted to visit. This is somewhat mitigated by the fact that a certificate may contain multiple hostnames or even wildcards, but even a partial match gives me quite a lot of information about who you are talking to.
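Here is a rough sketch of the name matching involved, in Python. It is a simplified version of the check a client performs (real implementations have more rules), and the list of certificate names is invented for the example. An observer who fetched the certificate would run the same logic in reverse: the names on the certificate are the candidate sites.

```python
def matches(pattern: str, hostname: str) -> bool:
    """Match a hostname against one certificate name, allowing a left-most wildcard."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if not pattern.startswith("*."):
        return pattern == hostname
    # "*.cdn.example" covers exactly one extra label, such as "img.cdn.example",
    # but neither "cdn.example" itself nor "a.b.cdn.example".
    head, _, rest = hostname.partition(".")
    return bool(head) and rest == pattern[2:]


def certificate_covers(certificate_names: list[str], hostname: str) -> bool:
    """True if any name listed on the certificate matches the requested hostname."""
    return any(matches(name, hostname) for name in certificate_names)


# Hypothetical names taken from a certificate served on some IP address
names_on_certificate = ["my-awesome-service.example", "*.cdn.example"]

print(certificate_covers(names_on_certificate, "img.cdn.example"))
print(certificate_covers(names_on_certificate, "unrelated.example"))
```

Getting the names in the first place needs nothing special. In Python, for instance, `ssl.get_server_certificate` will fetch the certificate from any address you point it at; that step needs network access and is left out here.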

However, not sending who I want to talk to over the initial connection has a huge cost associated with it. If the server doesn’t know who you want, each IP address may serve only a single hostname (otherwise we may reply with the wrong certificate). Indeed, one of the reasons HTTPS was expensive was this tying of a whole IP address to a single hostname. On the other hand, if we send the hostname we are interested in, the server is able to host multiple HTTPS websites on the same machine and select the right certificate at handshake time.

There are two mechanisms that matter here. One is SNI (Server Name Indication), which is basically a field in the SSL connection handshake that says what the hostname is. The other is ALPN (Application-Layer Protocol Negotiation), which allows you to select how you want to talk to the server. This can be very useful if you want to connect to the same server from one client using HTTP/1.1 and from another using HTTP/2. Those have totally different semantics, so routing based on ALPN can make things much easier.
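As a small illustration, this Python sketch builds the first message a TLS client would send, entirely in memory, and checks that both the SNI hostname and the ALPN protocol list are sitting in it unencrypted. The hostname is made up, and `build_client_hello` is just a helper name I picked; it assumes a stock Python `ssl` module with no encrypted-hello features turned on.

```python
import ssl


def build_client_hello(hostname: str, alpn: list[str]) -> bytes:
    """Produce the first bytes a TLS client would put on the wire, with no network."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(alpn)  # ALPN: the application protocols we can speak
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what ends up in the SNI field of the handshake
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client wrote its hello and is now waiting for the server
    return outgoing.read()


hello = build_client_hello("my-awesome-service.example", ["h2", "http/1.1"])
print(b"my-awesome-service.example" in hello)  # SNI, readable by anyone on the path
print(b"http/1.1" in hello)                    # ALPN offer, also readable
```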

At this point, the server can make all sorts of interesting decisions with regard to the connection. For example, based on the SNI field, it may forward the connection to another machine, either as the raw SSL stream or by stripping the SSL and sending the unencrypted data to the final destination. The first case, forwarding the raw SSL stream, is the more interesting scenario, because we can do that without having the certificate. We just need to inspect the raw stream header and extract the SNI value, at which point we route the connection to the right location and send it on its merry way.
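A minimal sketch of that routing decision, in Python. It parses the client's first handshake message by hand and looks up a backend by SNI, with no certificate or private key involved. The `BACKENDS` table and its addresses are hypothetical, and the parser assumes the whole hello arrived in one read and one TLS record, which a real proxy cannot take for granted.

```python
import ssl
import struct
from typing import Optional

# Hypothetical routing table: SNI hostname -> backend address
BACKENDS = {"a.example": ("10.0.0.1", 443), "b.example": ("10.0.0.2", 443)}


def extract_sni(data: bytes) -> Optional[str]:
    """Read the SNI hostname from a raw TLS ClientHello without decrypting anything."""
    try:
        if data[0] != 0x16 or data[5] != 0x01:  # handshake record holding a ClientHello
            return None
        pos = 5 + 4                  # record header (5) + handshake type and length (4)
        pos += 2 + 32                # client version + random
        pos += 1 + data[pos]         # session id
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len        # cipher suites
        pos += 1 + data[pos]         # compression methods
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = min(pos + ext_total, len(data))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:        # 0 = server_name extension
                # layout: list length (2), name type (1), name length (2), name
                if data[pos + 2] != 0:
                    return None
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                return data[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass                         # truncated or malformed input
    return None


def choose_backend(client_hello: bytes):
    """Pick where to forward the still-encrypted stream, or None if unknown."""
    return BACKENDS.get(extract_sni(client_hello))


def sample_client_hello(hostname: str) -> bytes:
    """Test helper: generate a real ClientHello in memory using the ssl module."""
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ssl.create_default_context().wrap_bio(
        incoming, outgoing, server_hostname=hostname
    )
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass
    return outgoing.read()


print(choose_backend(sample_client_hello("b.example")))
```

After picking the backend, the proxy would replay the bytes it already read and then copy data in both directions untouched. The client still validates the certificate of the real server, so the proxy never sees plaintext.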

I might do more posts like this, but I would really appreciate feedback, both on whether the content is good and on what additional topics you would like me to cover.

Comments

03 Nov 2017
11:46 AM

Tyler Jensen

I find this topic fascinating. Can we rely on certificates and standard HTTPS/TLS or should we take matters into our own hands in the application layer to provide an alternate or additional layer of proof using something like Secure Remote Password Protocol and Zero-knowledge proof for client and server side validation/verification of identity (not user, but system) against a shared secret that is not passed across the wire? In other words, how much do we trust the certificate is what I'm asking, I suppose.

03 Nov 2017
13:10 PM

Oleg Mihailik

Topic is great, more content would be AWESOME please!!

What I'd like to know in particular, can server receive a packet, opportunistically try decrypting with both HTTPS://A.COM and HTTPS://B.COM identities -- and go with what fits?

Ideally it should also recognise if incoming packet is HTTP or HTTPS too -- so even the port can be shared.

03 Nov 2017
13:13 PM

Oleg Mihailik

I do understand the handshake isn't just request-response -- hence asking people in the know.

03 Nov 2017
17:32 PM

dhasenan

Oleg: no. The handshake goes like:

The client sends a packet to start the connection. This can contain some headers. It can't be encrypted because the client doesn't have a way to encrypt anything that the server can decode at this point.
The server sends its certificate back in cleartext. Again, they haven't got anything they can use to encrypt the cert with. And certificates are not sensitive, except insofar as they can identify what site you're on (which your DNS traffic can do just as well).
The client validates the certificate, then encrypts a session key with the public key portion of the certificate and sends it back to the server.

Now they've got an encrypted connection.

But the server has to send down a certificate first, and it has to know which certificate to send back. It could send down all certificates it knows how to handle, but then the client would have to indicate its choice somehow.

If the only use case were HTTPS, then you could do that: the server would resolve which cert you used by seeing which decrypts a session key that can decrypt its input in a way that starts with 'HTTP/'. But the system is intended for use in arbitrary protocols built on top of TCP. Heck, there's a draft RFC for adding STARTTLS support to Telnet. So it's not worth it trying to detect after the fact what cert the connection needs.

It's much more reliable to have the client tell you in the initial packet what it's trying to connect to.

As an aside, ALPN means you can support, say, telnet and HTTP and SMTP and VNC on the same port. If you really need to.

03 Nov 2017
20:37 PM

Oleg Mihailik

dhasenan thanks!

It looks like you've missed one option though.

When server gets encrypted request, the server already KNOWS what protocols it supports. Server knows whether to expect HTTP/ or something else.

So server should be able to decrypt and choose the logical host safely (unless there are other troubles I might be missing).

The benefit of not having to layer another protocol is very material.

03 Nov 2017
21:56 PM

Oren Eini

Tyler, if you control the protocol (i.e., TCP + SSL), sure. You can use an additional handshake or something like that on top of this, and ignore any certificate chain errors. The problem is that you are still open to MitM attacks, because anyone in the middle can just proxy the connection back and forth between client and server, wait until you have done your validation, and then, when you think you know who you are talking to, listen to or modify everything that goes in between.

The issue isn't just encrypting the connection but also knowing who is on the other side.

If you already know who is on the other side, you can use the certificate thumbprint to validate the connection on both ends, without needing to trust a CA. But that leaves you with the problem of how to let the other side know what the right thumbprint is.

A good example of that might be a connection string for a database that would include the expected certificate thumbprint and fail otherwise, not checking any CA trust along the way. You'll presumably trust that the DBA gave you the right cert thumbprint.

Except that certificates change, and when your admin updates the cert (because it expired), all clients are going to break, so it isn't quite as simple.
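The thumbprint check described in this comment could look roughly like the Python sketch below. Everything here is an assumption for illustration: the function names, the choice of SHA-256, and in particular the idea of accepting a set of thumbprints so that an old and a new certificate can overlap during rotation.

```python
import hashlib
import hmac


def thumbprint(der_certificate: bytes) -> str:
    """SHA-256 thumbprint of a certificate in DER form, as uppercase hex."""
    return hashlib.sha256(der_certificate).hexdigest().upper()


def normalize(value: str) -> str:
    """Accept thumbprints written as 'ab:cd:...' or 'AB CD ...' as well as plain hex."""
    return value.replace(":", "").replace(" ", "").upper()


def is_pinned_certificate(der_certificate: bytes, expected: set[str]) -> bool:
    """Accept the peer only if its certificate is one we were told to expect.

    No CA chain is consulted; trust comes entirely from the expected set.
    """
    actual = thumbprint(der_certificate)
    return any(hmac.compare_digest(actual, normalize(e)) for e in expected)


# Stand-in bytes; a real caller would pass the peer's DER-encoded certificate
old_cert, new_cert = b"old certificate bytes", b"new certificate bytes"
expected = {thumbprint(old_cert), thumbprint(new_cert)}

print(is_pinned_certificate(new_cert, expected))
print(is_pinned_certificate(b"someone else", expected))
```

With Python's `ssl` module, the DER bytes of the peer certificate are available from `SSLSocket.getpeercert(binary_form=True)`. Allowing more than one thumbprint softens the rotation problem but does not remove it: someone still has to get the new value to every client ahead of time.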

03 Nov 2017
22:11 PM

Tyler Jensen

I followed your logic until you reached the assumption that the SRPP and ZK do not establish a private once per connection symmetric encryption key that would prevent a man in the middle from ever being able to capture and read the data in the middle. And since only the real client and real server know the shared key and that key is never passed across the wire, the man in the middle impersonation is impossible. It does add overhead and you definitely need control of the protocol. No SSL or TLS involved. This is what my little MessageWire library does along with many other SRPP and ZK implementations. But as far as I can tell, the complexity of implementation and lack of any real standard prevents its general use, so you're stuck trusting certs.

03 Nov 2017
23:33 PM

Oren Eini

Oleg, Suggesting specific topics for discussion is the best way to ensure that there will be more content.

Technically speaking, you can detect if an incoming connection is HTTP or HTTPS. but you must reply in the same protocol. That isn't trivial, see this post:

https://ayende.com/blog/180513/the-bare-minimum-a-distributed-system-developer-should-know-about-https-negotiation?key=6329dd9044194fbcb7017d018491f05b

As for trying to decrypt the packet, that isn't how it works. SSL works by having the client & server negotiate a shared secret for the connection, so just having the certificates for both a.com and b.com won't help you figure out what is going on if you start from the middle of the connection. And you need to send the certificate back as one of the very first things that you do while establishing the SSL connection. That is why SNI is so important: it allows the client to tell the server which certificate to send.
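Telling HTTP from HTTPS on the first bytes is the easy part, as this Python sketch shows. The function name and the list of HTTP methods are my own choices; the check relies on the fact that a TLS connection opens with a handshake record (first byte 0x16, then major version 0x03), while plain HTTP opens with a method name in ASCII.

```python
def sniff_protocol(first_bytes: bytes) -> str:
    """Guess whether a new connection is speaking TLS or plain HTTP."""
    # TLS handshake record: content type 0x16 followed by major version 0x03
    if len(first_bytes) >= 3 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"
    http_methods = (
        b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ", b"PATCH ",
    )
    if first_bytes.startswith(http_methods):
        return "http"
    return "unknown"


print(sniff_protocol(b"\x16\x03\x01\x02\x00\x01"))
print(sniff_protocol(b"GET / HTTP/1.1\r\nHost: a.com\r\n\r\n"))
```

A server would typically look at these bytes without consuming them, for example with `socket.recv(n, socket.MSG_PEEK)`, and then hand the connection to the right handler. The hard part is everything after the detection.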

03 Nov 2017
23:38 PM

Oren Eini

Oleg, actually, the server can't know that. Imagine that you have an SSL proxy that directs connections for HTTPS, S/MIME, sftp. So getting a connection doesn't tell you what kind of data is going through. This is intentional, because it makes SSL very widely useful. Clients can use ALPN to indicate which application-level protocol they need, but that is mostly so you can select the proper certificate or maybe route it internally more easily.

03 Nov 2017
23:42 PM

Oren Eini

Tyler, The problem here is that you need a shared secret between both client & server.

Imagine that I'm setting up a new server, let's call it my-awesome-service-2. Now, how do I let all the users know that not only this service exists, but what is the shared secret that they should use to access it? How do I rotate this key, etc?

This is a level below authentication, because you need to know who you are talking to. Indeed, you can avoid MitM by encrypting the data itself, but I assume you were talking about identity of the 2nd party, not the data encryption.

03 Nov 2017
23:52 PM

Tyler Jensen

In that case the shared secret is the user's password. If the server side does not also know that password or a hash of it, then the connection fails and the password never crosses the wire, in any form. This is how my MessageWire and ServiceWire libraries work.

04 Nov 2017
00:44 AM

Tyler Jensen

I don't mean to suggest that these libraries or the general idea behind them are a solution to the problem, per se, only that they may get the juices flowing as part of the conversation. Hopefully it's interesting and of some use. Cheers, T

05 Nov 2017
07:17 AM

Oren Eini

Tyler, Sure, that is easy, but this requires you to have a shared secret that you passed around. If you want to connect to another server without pre arranging such a secret, I don't think you have something better available than certs.

05 Nov 2017
15:29 PM

Tyler Jensen

Oren, that is true, but one assumes that you have already passed a password to the user or the user has created one via a cert protected TLS interaction. That one time interaction seems less vulnerable to the man in the middle compromise. And if that interaction had been compromised and you were talking to a server that was not your own, then your server when you make a connection using SRPP would not know your secret or would have a different one than your client does and the connection would fail. Of course one might suppose the client would then always interact with that same fake man in the middle server, but that might be stretching it. I'm not suggesting SRPP is perfect for establishing shared secrets, but I do believe that for making and verifying connection identity in both ends on a repeated basis with unique encryption on every connection, it is superior to certs. Just my humble opinion. -T

05 Nov 2017
18:21 PM

Oren Eini

Tyler, what you are describing is pretty much how SSH works. The first time you connect to a server, it remembers the key, and if it changes, it freaks out. The difference here is in terms of usage. Consider RavenDB specifically: it is easy to have this done for the usual communication, but what about using REST tools or the browser itself? They don't know anything about this custom protocol, which means a lot of operational overhead. By using SSL directly, and HTTPS in particular, we gain something very valuable: interoperability with pretty much everything on the market. And that is important.
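The "remember the key on first contact, freak out if it changes" behavior can be sketched in a few lines of Python. This is not how SSH is actually implemented; the class name, the in-memory dictionary, and the return values are all made up to show the idea.

```python
import hashlib


class KnownHosts:
    """Trust-on-first-use store: remember each host's key fingerprint."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # host -> fingerprint (in memory only)

    def check(self, host: str, public_key: bytes) -> str:
        fingerprint = hashlib.sha256(public_key).hexdigest()
        known = self._seen.get(host)
        if known is None:
            # First contact: nothing to compare against, so we simply remember it
            self._seen[host] = fingerprint
            return "first-use"
        return "match" if known == fingerprint else "MISMATCH"


hosts = KnownHosts()
print(hosts.check("my-awesome-service", b"key-1"))
print(hosts.check("my-awesome-service", b"key-1"))
print(hosts.check("my-awesome-service", b"key-2"))
```

The weak spot is visible in the code: the first call accepts whatever key it is shown. It also only works for clients that run this logic, which is the interoperability cost mentioned above.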

15 Nov 2017
14:18 PM

Oleg Mihailik

Oren, thanks for posting the new article on the very related topic today, answering one of my questions!

I understand the extra load in recognizing the intended protocol from the leading frame. However that's a minor technical difficulty solvable with finite small resources. There are only so many (handful) protocols the proxy needs to support.

Oren Eini

CEO of RavenDB