Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 487 words

RavenDB Cloud has a whole bunch of new features that were quietly launched over the past few months. I discuss them in this post. It turns out that the team keeps on delivering new stuff, faster than I can write about it.

The following new auto-scaling feature is a really interesting one because it is pretty simple to understand and has some interesting implications for production.

You need to explicitly enable auto-scaling on your cluster. Here is what that looks like:

Once you have enabled auto-scaling, which usually takes under a minute, you can click the Configure button to set your own policies:

Here is what this looks like:

The idea is very simple: we routinely measure the load on the system, and if CPU usage stays above the high threshold for a sustained period, we’ll trigger scaling to the next tier (or maybe higher, see the Upscaling / Downscaling step options) to provide additional resources to the system. If there isn’t enough load (as measured in CPU usage), we will downscale back to the lowest instance type.

Conceptually, this is a simple setup. You use a lot of CPU, and you get a bigger machine that has more resources to use, until it all balances out.
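To make the policy concrete, here is a minimal sketch of that kind of decision loop. It is not how RavenDB Cloud implements auto-scaling; the `Policy` fields, the threshold values, the sample window, and the use of a plain tier index are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Policy:
    # All values are hypothetical defaults, not RavenDB Cloud settings.
    high_cpu: float = 80.0   # percent; at or above this counts as "high"
    low_cpu: float = 20.0    # percent; at or below this counts as "low"
    window: int = 5          # consecutive samples needed to act
    up_step: int = 1         # tiers to move up per scaling event
    down_step: int = 1       # tiers to move down per scaling event


def next_tier(current: int, samples: Sequence[float],
              policy: Policy, tier_count: int) -> int:
    """Return the tier index to run on, given recent CPU samples (percent)."""
    recent = list(samples)[-policy.window:]
    if len(recent) < policy.window:
        return current  # not enough data to decide yet
    if all(s >= policy.high_cpu for s in recent):
        # Sustained high CPU: move up, but never past the top tier.
        return min(current + policy.up_step, tier_count - 1)
    if all(s <= policy.low_cpu for s in recent):
        # Sustained low CPU: move down, but never below the lowest tier.
        return max(current - policy.down_step, 0)
    return current  # mixed load: stay where we are
```

Requiring a full window of samples on the same side of a threshold is what keeps a short spike (or a short lull) from triggering a resize.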

Now, let’s talk about the implications of this feature. To start with, it means you only pay based on your actual load, and you don’t need to over-provision for peak load.

The design of this feature and RavenDB in general means that we can make scale-up and scale-down changes without any interruption in service. This allows you to let auto-scaling manage the size of your instances.

In the image above, you may have noticed that I’m using the PB line of products (PB10 … PB50). That stands for burstable instances, which consume CPU credits when in use. How this interacts with auto-scaling is really interesting.

As you use more CPU, you consume all the CPU credits, and your CPU usage becomes high. At this point, auto-scaling kicks in and moves you to a higher tier. That gives you both more baseline CPU credits and a higher CPU credits accrual rate.

Together with zero downtime upscaling and downscaling, this means you can benefit from the burstable instances' lower cost without having to worry about running out of resources.
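Here is a toy model of why moving up a tier helps with CPU credits. The credit accounting is deliberately simplified and the numbers are made up; it does not reflect how any cloud provider actually bills or accrues credits.

```python
from typing import Optional


def minutes_until_exhausted(balance: float, accrual_per_min: float,
                            usage_per_min: float,
                            horizon_min: int) -> Optional[int]:
    """Return the minute at which the credit balance hits zero, or None
    if it survives the whole horizon. All units are hypothetical."""
    for minute in range(1, horizon_min + 1):
        # Each minute we earn some credits and spend some on CPU work.
        balance += accrual_per_min - usage_per_min
        if balance <= 0:
            return minute
    return None


# Same workload (0.5 credits/min) on two made-up tiers.
small = minutes_until_exhausted(60.0, 0.25, 0.5, 1000)
large = minutes_until_exhausted(120.0, 0.5, 0.5, 1000)
print(small, large)
```

With these made-up numbers, the smaller tier runs dry after 240 minutes, while the larger tier earns credits as fast as it spends them and never runs out.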

Note that auto-scaling only applies to instances within the same family. So if you are running on burstable instances, you’ll get scaling from burstable instances, and if you are running on the P series (non-burstable), your auto-scaling will use P instances.

Note that we offer auto-scaling for development instances as well. However, a development instance contains only a single RavenDB instance, so auto-scaling will trigger, but the instance will be inaccessible for up to two minutes while it scales. That isn’t an issue for the production tier.

time to read 2 min | 341 words

Our cloud team just finished pushing a big set of features to production. Some of them are user facing and add some nice capabilities that I wanted to talk about. The most important feature we have in this cycle is directly exposing your instances’ metrics to you.

Here is what this looks like:

image

This is a significant quality of life improvement for both our users and the cloud support team, since that makes it much easier to understand what is going on from an operational perspective.

From experience, one of the most common issues that users run into is hitting the limits of their I/O. Disk I/O in the cloud is a… complicated beast. As a database, RavenDB is sensitive to the I/O platform that it is running on. We have now made it clear what exactly you are getting from the underlying system. This is what this looks like:

image

You can also raise those values, of course. In fact, you can now selectively raise your disk performance on Azure (you could always do that on AWS). This is what this looks like:

image

As you can see, you can change both the size of the disk (which is permanent) and the performance tier. On Azure, you may change the performance tier for the disk every 12 hours (6 hours on AWS), so that isn’t something that you enable instantly. It is a very useful feature if you are expecting a high load (such as a big import, deployment of new indexes on large databases, initial replication, etc). Once the load is complete, you can reduce the performance tier and use a cheaper disk for your needs.
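If you plan tier changes around a known heavy job, the waiting period is the thing to account for. Here is a small sketch of that check. The 12 and 6 hour intervals come from the text above; the provider keys and function names are hypothetical and are not part of any RavenDB Cloud API.

```python
from datetime import datetime, timedelta

# Minimum time between performance tier changes, per the post.
TIER_CHANGE_INTERVAL = {
    "azure": timedelta(hours=12),
    "aws": timedelta(hours=6),
}


def next_allowed_change(provider: str, last_change: datetime) -> datetime:
    """Earliest moment the disk performance tier can be changed again."""
    return last_change + TIER_CHANGE_INTERVAL[provider]


def can_change_tier(provider: str, last_change: datetime,
                    now: datetime) -> bool:
    """True if enough time has passed since the previous tier change."""
    return now >= next_allowed_change(provider, last_change)
```

In practice this means raising the tier ahead of the import or index deployment, and expecting to stay on the more expensive tier for at least one full interval before you can lower it again.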

The metrics and the ability to change the disk performance tier mean that you don’t need to contact support to either figure out what is wrong or what to do about it.
