RavenDB ShardingAdding a new shard to an existing cluster, the easy way
Continuing on the theme of giving a full answer to interesting questions on the mailing list in the blog, we have the following issue.
We have a sharded cluster, and we want to add a new node to the cluster, how do we do it? I’ll discuss a few ways in which you can handle this scenario. But first, let us lay out the actual scenario.
We’ll use the Customers & Invoices system, and we put the data in the various shard along the following scheme:
Customers | Sharded by region |
Invoices | Sharded by customer |
Users | Shared database (not sharded) |
We can configure this using the following:
var shards = new Dictionary<string, IDocumentStore> { {"Shared", new DocumentStore {Url ="http://rvn1:8080", DefaultDatabase = "Shared"}}, {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}}, {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}}, };ShardStrategy shardStrategy = new ShardStrategy(shards) .ShardingOn<Company>(company =>company.Region, region => { switch (region) { case "USA": case "Canada": return "NA"; case "UK": case "France": return "EU"; default: return "Shared"; } }) .ShardingOn<Invoice>(invoice => invoice.CompanyId) .ShardingOn<User>(user=> "Shared");
So far, so good. Now, we have so much work that we can’t just have two servers for customers & invoices, we need more. We change the sharding configuration to include 2 new servers, and we get:
var shards = new Dictionary<string, IDocumentStore> { {"Shared", new DocumentStore {Url = "http://rvn1:8080", DefaultDatabase = "Shared"}}, {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}}, {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}}, {"EU2", new DocumentStore {Url = "http://rvn4:8080", DefaultDatabase = "Europe-2"}}, {"NA2", new DocumentStore {Url = "http://rvn5:8080", DefaultDatabase = "NorthAmerica-2"}}, }; var shardStrategy = new ShardStrategy(shards); shardStrategy.ShardResolutionStrategy = new NewServerBiasedShardResolutionStrategy(shards.Keys, shardStrategy) .ShardingOn<Company>(company => company.Region, region => { switch (region) { case "USA": case "Canada": return "NA"; case "UK": case "France": return "EU"; default: return "Shared"; } }) .ShardingOn<Invoice>(invoice => invoice.CompanyId) .ShardingOn<User>(user => user.Id, userId => "Shared");
Note that we have a new shard resolution strategy, what is that? This is how we control lower level details of the sharding behavior, in this case, we want to control where we’ll write new documents.
public class NewServerBiasedShardResolutionStrategy : DefaultShardResolutionStrategy { public NewServerBiasedShardResolutionStrategy(IEnumerable<string> shardIds, ShardStrategy shardStrategy) : base(shardIds, shardStrategy) { } public override string GenerateShardIdFor(object entity, ITransactionalDocumentSession sessionMetadata) { var generatedShardId = base.GenerateShardIdFor(entity, sessionMetadata); if (entity is Company) { if (DateTime.Today < new DateTime(2015, 8, 1) || DateTime.Now.Ticks % 3 != 0) { return generatedShardId + "2"; } return generatedShardId; } return generatedShardId; } }
What is the logic? If we have a new company, we’ll call the GenerateShardIdFor(entity) method, and for the next 3 months, we’ll create all new companies (and as a result, their invoices) in the new servers. After the 3 months have passed, we’ll still generate the new companies on the new servers at a rate of two thirds on the new servers vs. one third on the old servers.
Note that in this case, we didn’t have to modify any data whatsoever. And over time, the data will balance itself out between the servers. In my next post, we’ll deal with how we actually split an existing shard into multiple shards.
More posts in "RavenDB Sharding" series:
- (22 May 2015) Adding a new shard to an existing cluster, splitting the shard
- (21 May 2015) Adding a new shard to an existing cluster, the easy way
- (18 May 2015) Enabling shards for existing database
Comments
Nice hack. But not agile enough for real-world scalable applications. Let's say I'm introducing a new shard as my current shard for the USA is dying due to the load. The provided solution would not help me as it would not decrease the load for the current shard (as only new documents are saved to a new node).
You announced a real shards rebalancer in one of the previous posts (if I got you right). What is the planned release? RavenDb 4.0?
Idsa, It says right there in the title, it is the easy way, there are most posts coming. And the server side sharding is probably 4.0, yes
@Idsa it's not a hack. "it would not decrease the load for the current shard " of course not, there's no free lunch. Adding shards AND rebalancing them INCREASES LOAD. So the time to act is always before you hit capacity (or even near it). So if you were monitoring the system well, Ayende's solution is a perfectly viable option. (You should have metrics on average reads/writes etc by shard, blind sharding is almost never helpful)
If you're ever in a situation where you need to rebalance shards because you're out of capacity, you've already lost. There's no meaningful solution other than taking an outage or putting your site into read-only mode (assuming it has that capability) and hoping that in read-only mode doesn't still produce too much stress for you to rebalance the shards.
Comment preview