Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 543 words

It is easy to think about a service that listen to the network as just that, it listens to the network. In practice, this is often quite a bit more complex than that.

For example, what happens when I’m doing something like this?

image

In this case, we are setting up a web server with binding to the local machine name. But that isn’t actually how it works.

At the TCP level, there is no such thing as machine name. So how can this even work?

Here is what is going on. When we specify a server URL in this manner, we are actually doing something like this:

image

And then the server is going to bind to each and every one of them. Here is an interesting tidbit:

image

What this means is that this service doesn’t have a single entry point, you can reach it through multiple distinct IP addresses.

But why would my machine have so may IP addresses? Well, let us take a look. It looks like this machine has quite a few network adapters:

image

I got a bunch of virtual ones for Docker and VMs, and then the Wi-Fi (writing on my laptop) and wired network.

Each one of these represent a way to bind to the network. In fact, there are also over 16 million additional IP addresses that I’m not showing, the entire 127.x.x.x range. (You probably know that 127.0.0.1 is loopback, right? But so it 127.127.127.127, etc.).

All of this is not really that interesting, until you realize that this has real world implications for you. Consider a server that has multiple network cards, such as this one:

image

What we have here is a server that has two totally separate network cards. One to talk to the outside world and one to talk to the internal network.

When is this useful? In pretty much every single cloud provider you’ll have very different networks. On Amazon, the internal network gives you effectively free bandwidth, while you pay for the external one. And that is leaving aside the security implications

It is also common to have different things bound to different interfaces. Your admin API endpoint isn’t even listening to the public internet, for example, it will only process packets coming from the internal network. That adds a bit more security and isolation (you still need encryption, authentication, etc of course).

Another deployment mode (which has gone out of fashion) was to hook both network cards to the public internet, using different routes. This way, if one went down, you could still respond to requests, and usually you could also handle more traffic. This was in the days where the network was often the bottleneck, but nowadays I think we have enough network bandwidth that program efficiency is of more importance and this practice somewhat fell out of favor.

time to read 3 min | 586 words

I mentioned in a previous post that an SSL connection will typically use a Server Name Indication in the initial (unencrypted) packet to let the server know  which address it is interested in. This allow the server to do things such as select the appropriate certificate to answer this initial challenge.

A more interesting scenario is when you want to force your users to always use HTTPS. That is pretty trivial, you setup a website to listen on port 80 and port 443 and redirect all HTTP traffic from port 80 to port 443 as HTTPS. Pretty much any web server under the sun already have some sort of easy to use configuration for that that. Let us see how this will look like if we were writing this using bare bones Kestrel.

This is pretty easy, right? We setup a connection adapter on port 80, so we can detect that this is using the wrong port and then just redirect it. Notice that there is some magic that we need to apply here. At the connection adapter, we deal with raw TCP socket, but we don’t want to mess around with that, so we just pass the decision up the chain until we get to the part that deal with HTTP and let it send the redirect.

Pretty easy, right? But about about when a user does something like this?

http://my-awesome-service:443

Note that in this case, we are using the HTTP protocol and not the HTTPS protocol. At that point, things are a mess. A client will make a request and send a TCP packet containing HTTP request data, but the server is trying to parse that as an SSL client help message. What will usually happen is that the server will look at the incoming packet, decide that this is garbage and just close the connection. That lead to some really hard to figure out errors and much forehead slapping when you figure out what the issue is.

Now, I’m sure that you’ll agree that anyone seeing a URL as listed about will be a bit suspicious. But what about these ones?

  • http://my-awesome-service:8080
  • https://my-awesome-service:8080

Unlike before, where we would probably notice that :443 is the HTTPS port and we are using HTTP, here there is no additional indication about what the problem is. So we need to try both. And if a user is getting connection dropped error when trying the connection, there is very little chance that they’ll consider switching to HTTPS. It is far more likely that they will start looking at the firewall rules.

So now, we need to do protocol sniffing and figure out what to do from there. Let us see how this will look like in code:

We read the first few bytes of the request and see if this is the start of an SSL TCP connection. If it is, we forward the call to the usual Kestrel HTTPS behavior. If it isn’t, we mark the request as must redirect and pass it, as is, to the request parsed and ready for action and then send the redirect back.

In this way, any request on port 80 will be sent to port 443 and an HTTP request on a port that listens to HTTPS will be told that it needs to switch.

One note about the code in this post. This was written at 1:30 AM as a proof of concept only. I’m pretty sure that I’m heavily abusing the connection adapter system, especially with regards to the reflection bits there.

time to read 4 min | 737 words

DNS is used to resolve a hostname to an IP. This is something that most developers already know. What is not widely known, or at least not talked so much is the structure of the DNS network. To the right you can find the the map of root servers, at least in a historical point of view, but I’ll get to it.

If we have root servers, then we also have non root servers, and probably non root ones. In fact, the whole DNS system is based on 13 well known root servers who then delegate authority to servers who own the relevant portion of the namespace, you can see that in the diagram below. It goes down like that for pretty much forever.

File:Dns-server-hierarchy.gif

Things become a lot more interesting when you start to consider that traversing the full DNS path is fast, but it is done trillions of times per day. Because of that, there are always caching DNS servers in the middle. This is where the TTL (time to live) aspect of DNS records come into play.

A DNS is basically just a distributed database with very slow updates. The root servers allow you to reach the owner of a piece of the namespace and from that you can extract the relevant records for that namespace. All of that is backed with the premise that DNS values change rarely and that you can cache them for long durations, typically minutes at the low end and usually for days.

This means that a DNS query will most often hit a cache along the way and not have to traverse the entire path. For that matter, portions of the path are also cached. For example, the DNS route for the “.com” domain is usually cached for 48 hours. So even if you are using a new hostname, you’ll typically be able to skip the whole “let’s go to the root server” and stop at somewhere along the way.

For developers, the most common usage of DNS is when you’ll edit the “/etc/hosts” to enable some scenario (such as local development with the real URLs). But most organizations has their own DNS (if only so you’ll be able to find other machines on the organization network). This include the ability to modify the results of the public DNS, although this is mostly done at coffee shops.

I also mentioned earlier that the the map above is a historical view of how things used to be. This is where things gets really confusing. Remember when I said that a DNS is mapping a hostname to IP? Well, the common view about an IP being a pointer to a single server is actually false. Welcome to the wonderful world of IP Anycast. Using anycast, you can basically specify multiple servers with the same IP. You’ll typically route to the nearest node and you’ll usually only do that for connectionless protocols (such as DNS). This is one of the ways that the 13 root servers are actually implemented. The IPs are routed to multiple locations.

This misdirection is done by effectively laying down multiple paths to the same IP address using the low level routing protocols (a developer will rarely need to concern themselves with that, this is the realm of infrastructure and network engineers). This is how the internet usually works, you have multiple paths that you can send a packet and you’ll chose the best one. In this case, instead of all the paths terminating in a single location, they’ll each terminate in a different one, but they will behave in the same manner. This is typically only useful for UDP, since each packet in such a case may reach a totally different server, so you cannot use TCP or any connection oriented protocols.

Another really interesting aspect of DNS is that there really isn’t any limitation on the kind of answers it returns. In other words, querying “localtest.me” will give you 127.0.0.1 back, even though this is an entry that reside on the global internet, not in your own local network. There are all sort of fun games that one can play with this approach, by making a global address point to a local IP address. One of them is the possibility of issuing a SSL certificate for a local server, which isn’t expose to the internet. But that is a hack for another time.

time to read 5 min | 848 words

trust-1418901_640After explaining all the ways that trust can be subverted by DNS, CA and a random wave to wipe away the writing in the sand, let us get down to actual details about what matters here.

HTTPS / SSL / TLS, whatever it is called this week, provides confidentially over the wire for the messages you are sending. What it doesn’t provide you is confidentially from knowing who you talked too. This may seem non obvious at first, because the entire communication is encrypted, so how can a 3rd party know who I’m talking about?

Well, there are two main ways. It can happen through a DNS query. If you need to go to “http://my-awesome-service”, you need to know what the IP of that is, and for that you need to do a DNS query. There are DNS systems that are encrypted, but they aren’t widely deployed and in general you can assume that people can listen to your DNS and figure out what you are doing. If you go to “that-bad-place”, it is probably visible on someone’s logs somewhere.

But the other way that someone can know who you are talking to is that you told them so. How did you do that?

Well, let’s consider one of the primary reasons we have HTTPS. a user has to validate that the hostname they used matched the hostname on the certificate. That seems pretty reasonable, right? But that single requirement pretty much invalidates the notion of confidentiality of who I’m talking to.

Consider the following steps:

  • I go to “https://my-awesome-service”
  • This is resolved to IP address 28.23.155.123
  • I’m starting an SSL connection to that IP, at port 443. Initially, of course, the connection is not encrypted, but I’ve just initiated the SSL connection.

At that point, any outside observer that can listen to the raw network traffic know what site you have visited. But how can this be? Well, at this point, the server needs to return a reply, and it needs to do that using a certificate.

Let us go with the “secure” option and say that we are simply sending over the wire “open ssl connection to 28.23.155.123”. What does this tell the listener? Well, since at this point the server doesn’t know what the client wants, it must reply with a certificate. That certificate must be the same for all such connections and the user will abort the connection if the certificate will not match the expected hostname.

What are the implications there? Well, even assuming that I don’t have a database of matching IP addresses to their hostnames (which I would most assuredly do), I can just connect myself to the remote server and get the certificate. At this point, I can just inspect the hostname from the certificate and know what site the user wanted to visit. This is somewhat mitigated by the fact that a certificate may contain multiple hostnames or even wildcards, but even that match gives me quite a lot of information about who you are talking to.

However, not sending who I want to talk to over the initial connection has a huge cost associated with it. If the server doesn’t know who you want, this means that each IP address may serve only a single hostname (otherwise we may reply with the wrong certificate. Indeed, one of the reasons HTTPS was expensive was this tying of a whole IP address for a single hostname. On the other hand, if we sent the hostname were were interested in, the server would be able to host multiple HTTPS websites on the same machine, and select the right certificate at handshake time.

There are two ways to do that, one is called SNI – Server Name Indication. Which is basically a header in the SSL connection handshake that says what the hostname is. The other is ALPN – Application Level Protocol Negotiation, which allows you to select how you want to talk to the server. This can be very useful if you want to connect to the server as one client with HTTP and on another using HTTP/2.0. That has totally different semantics, so routing based on ALPN can make things much easier.

At this point, the server can make all sorts of interesting decisions with regards to the connection. For example, based on the SNI field, it may forward the connection to another machine, either as the raw SSL stream or by stripping the SSL and sending the unencrypted data to the final destination. The first case, of forwarding the raw SSL stream is the more interesting scenario, because we can do that without having the certificate. We just need to inspect the raw stream header and extract the SNI value, at which point we route that to the right location and send the connection on its merry way.

I might do more posts like this, but I would really appreciate feedback. Both on whatever the content is good and what additional topics would you like me to cover?

time to read 6 min | 1182 words

padlock-174084_640
Transport level security, also known as TLS or SSL, is a way to secure a connection from prying eyes. This is done using math, so we know that this is good. I can also count to 20 if I take my shoes off, so you know you can trust me on that.

On very slightly more serious mode, SSL/TLS gives us one half of the security requirements. We can negotiate a secure chiper and a key and rest assured that no outside source can listen to what we are saying to the other side.

Notice that I said one half? The other half is knowing who is on the other side. This is usually done using certificates, which provide the public / private keys for the connection, and the signer of the certificate is what provides the identity of the remote connection. In other words, when I’m using SSL/TLS, I need to also know who am I going to be talking to, and then verify in some manner using the certificate that they provide me that they are indeed who they are.

Let us deconstruct the simplest of operations, GET https://my-awesome-service:

  • First, we need to find the IP of my-awesome-service.
  • Then, we negotiate an secured connection with this IP.
  • Profit?

This would seem like the end of things, but we need to dig a bit deeper. I’m contacting my-awesome-service, but before I can do that, I need to first check what IP maps to that name. To do that I need to do a DNS query. DNS is usually unsecured, so anyone can see what kind of host names you are asking for. What is more interesting, there is absolutely nothing that prevent a Bad Guy from spoofing DNS responses to you. In fact, this has been a very fruitful area of attacks.

There is DNS Sec, which will protect you from forged requests in the last mile, but less than 15% of the world wide record are actually signed using DNS Sec, so you can usually assume that you won’t be using that. In fact, even if the domain is signed, because so many domains aren’t, most systems will be configured to assume that an unsigned request is valid by default, instead of the other way around. This make things fun at security circles, I’m sure. But for our purposes, you should know that DNS is great, but you probably shouldn’t rely on it. Errors, mistakes and outright forgery is possible.

If you want to see a simple example, head over to “/etc/hosts” on Linux or “%windir%/system32/drivers/etc/hosts” on Windows and add some fake entries there. You can have fun with pointing stackoverflow.com to lmgtfy.com, for example.

You can do it like so:

54.243.173.79 stackoverflow.com

With 54.243.173.79 being the IP address of lmgtfy.com.  Once you have done that, requests that you think are going to stackoverflow.com will be sent to lmgtfy.com, with hilarity soon to follow.

Oh, except that this won’t work. StackOverflow is using HTTPS, and they are also using HSTS (HTTP Strict Transport Security). Basically, this means that once you have visited StackOverflow even once, your browser will remember that this domains require HTTPS to work, and will outright refuse to access the site without it.

But what is the problem? HSTS is great, but it just require HTTPS. So if I managed to spoof your DNS somehow (if I could modify the hosts file, I’m already admin and own the box, but assuming that I haven’t gotten there), all I would really need to do is to make sure that the websites I spoof give you a certificate. But here the second half of SSL come into play. The client making the request is going to validate that the hostname it provide is located in the certificate that the server provided. So far, that make sense. But the server could just generate whatever certificate it wants, no?

In order to prevent that, there is a chain of trust. Basically, you need to have a list of trusted root certificates that your trust, and you verify that the certificate that you got from the remote server was directly (or indirectly, in some cases) signed by them, presumably after some level of verification. Reading the actual list of trusted roots is interesting.

The Mozilla list has about 160 root certificates and includes such entities as the Government of Turkey, where all journalists will tell you that the government is free & fair (all those who would say otherwise are not there). On my Windows machine, there are about 50 root certificates, and at least at one point that included Equifax, who we know can be trusted. On a work machine, you can be fairly certain that there are additional root certificates (from the domain, for example). But for now, we’ll ignore the possibility of a bad trusted root certificate and assume that the system is working as it is meant to be. And to be fair, any violations are punished by revocation of the root certificate. This is the current state with the Equifax root certificate on my machine, for example, it has been revoked.

Another mitigation here is that there is an ongoing process to encourage certificate issuance transparency. That means that a domain can specify which CA are allowed to issue certificates for it (called key pinning). Of course, this is distributed via DNS, and we already seen that this ain’t too hot either, but it is a matter of defense in depth. Key pinning also create some fun ransomware options. If I can get control over your DNS records in some manner, including by spoofing them, I can set key pinning to a key that only I have, resulting in large number of users unable to access you site because it is not using the “correct” key.  But I’m digressing. There is also the notion that a browser can do something called OCSP (online certificate status protocol), which basically states that a user can query the CA for whatever the certificate is valid. The catch is that if the CA doesn’t answer (vs. answer that the cert is invalid), the certificate is assume to be valid. This is done because a CA going down may then take down significant parts of the internet, leaving aside such concerns as the latency issues that this would require.

If you think the notion of a rouge trusted root is fantasy, there have been multiple cases of false certificates (DigiNotar, Symantec, TrustWave, etc), each with hundreds of certificates being issues (or even blank checks certificates, which can be used to generate any certificate you wish for). To combat that, there is now an effort to implement Certificate Transparency. Basically, in order to trust a certificate, it must show up in a public list. That allow admins to check that no one issued certificates for their domains.

This post has gotten quite long, so I’ll leave you with this worrisome ending and continue talking about how this applies to distributed systems in the next post.

time to read 5 min | 925 words

floppy-disk-27810_640
As a developer, it is easy to operate at the level of message passing between systems, utilizing sophisticated infrastructure to communicate between nodes and applications. Everything works, everything in its place and we can focus on the Big Picture.

That works great, because while everything is humming, you don’t need to know any of the crap that lies beneath the surface. Unfortunately, if you are developing distributed systems, you kinda of really need to know these things. As in, you can’t do proper job with them if you can’t.

We’ll start from the basics. Any networked service needs to listen on the network, which means that need to provide the service with the details on what to listen on.

Now, typically you’ll write something like http://my-awesome-service and leave it as that, not thinking about this any further, but let break this apart for a bit. When you hand the tcp listener an address such as this one, it doesn’t really know what it can do with this. At the TCP level, such a thing is meaningless.

At the TCP level, we deal with IP and ports. For simplicity’s sake, I’m going to ignore IPv4 vs. IPv6, because they don’t matter that much at this level (except that they do, but we’ll get to that later). This means that we need to have some way to translate “my-awesome-service” into an IP address. At this point, you are probably recalling about such a thing as DNS (Domain Name System), which is exactly the solution for that. And you would be right, almost.

The problem is that we aren’t dealing with a flat network view. In other words, we can’t assume that the IP address that “my-awesome-service” is mapped to is actually available on the end machine our server is running on. But how can that be? The whole point is that I can just point my client there and the server is there.

The following is a non exhaustive list (but certainly exhausting) of… things that are in the way.

  • NAT (Network Address Translation)
  • Firewalls
  • Routers
  • Load balancers
  • Proxies
  • VPN
  • Monitoring tools
  • Security filtering
  • Containers

In the simplest scenario, imagine that you have pointed my-awesome-service to IP address 31.167.56.251. However, instead of your service being there, there is a proxy that will forward any connections on port 80 to an internal address at 10.0.12.11 at port 8080. That server is actually doing on traffic analysis, metering and billing for the cloud provider you are using, after which it will pass on the connection to the actual machine you are using, at 10.0.15.23 on port 8888. You might think that the journey is over, but you are actually forgot that you are running your service as a container. So the host machine needs to forward that to the guest container on 127.0.0.3 on port 38880.  And believe it or not, this is probably skipping half a dozen steps that actually occur in production.

If you want to look at the network route, you can do that using “traceroute” of “tracert”. Here is what a portion of the output looks like from my home to ayende.com. Note that this is how long it takes me to get to the IP address that the DNS says is hosting ayende.com, not actually routing from that server to the actual code that runs this blog.

myhome.mynet [10.0.0.138]
bzq-179-37-1.cust.bezeqint.net [212.179.37.1]
10.250.0.170
bzq-25-77-10.cust.bezeqint.net [212.25.77.10]
bzq-219-189-2.dsl.bezeqint.net [62.219.189.2]
bzq-219-189-57.cablep.bezeqint.net [62.219.189.57]
ae9.cr1-lon2.ip4.gtt.net [46.33.89.185]
et-2-1-0.cr3-sea2.ip4.gtt.net [141.136.110.193]
amazon-gw.ip4.gtt.net [173.205.58.86]
52.95.52.146
52.95.52.159
54.239.42.215
54.239.43.130
52.93.13.44
52.93.13.35
52.93.12.110
52.93.12.133

This stuff is complex, but usually, you don’t need to dive that deeply into this. At certain point, you are going to call to the network engineers and let them figure it out. We are focused on the developer aspect of understanding distributed systems.

Now, here is the question, to test if you are paying attention. What did your service bind to, to be able to listen to over the network?

If you just gave it “http://my-awesome-service” as the configuration value, there isn’t much it can do. It cannot bind to 31.168.56.251, since that URL does not exist on the container. So the very first thing that we need to understand as distributed systems developers is that “what do I bind to” can be very different from “what do I type to get to the service”.

This is the first problem, and it is usually a major hurdle to grok. Mostly because when we are developing internally, you’ll typically use either machine names or IP addresses and you’ll typically consider a flat network view, not the segmented one that is actually in place. Docker and containers actually make you think about some of that a lot more, but even so most people don’t consider this too much.

I’m actually skipping on a bunch of details here. For example, a server may want to listen to multiple IPs (internal & external), maybe with different behaviors on each. A server may actually have multiple network cards and want to listen to both (for more bandwidth). It is also quite common to have a dedicate operations network, so the server will listen to the public network to respond to general queries, but things like SNMP or management interface is only exposed to the (completely separated) ops network. And sometimes things are even weirder, with crazy topologies that look Escher paintings.

This post has gone on long enough, but I’ll have at least another post in this series.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 4 days from now
  3. What happens when a sparse file allocation fails? - 6 days from now
  4. NTFS has an emergency stash of disk space - 8 days from now
  5. Challenge: Giving file system developer ulcer - 11 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}