Raven Replication: Scenarios
I am currently writing the tests for the replication bundle. I managed to overcome the biggest problem (my stupidity), and we now have some passing tests :-)
What I am looking for is more scenarios like this, so we would have as many tested scenarios as possible. I don't want code, or anything like that, just give me a set of scenarios to try against the replication system.
Feel free to make them as complex as you wish.
Here is an example of how I think about the tests that we currently have:
Scenario | Command | Result
Replicating PUT | PUT raven1/docs/ayende | 200
 | GET raven2/docs/ayende | 200
Replicating DELETE | PUT raven1/docs/ayende | 200
 | GET raven2/docs/ayende | 200
 | DELETE raven1/docs/ayende | 201
 | GET raven2/docs/ayende | 404
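To make the table above concrete, here is a rough sketch of how these two scenarios could be driven over HTTP. This is only an illustration in Python using requests; the server URLs and the polling loop that waits for replication to catch up are assumptions, not the actual test code.

```python
# Illustration only: the raven1/raven2 base URLs and the polling interval
# are assumptions, not taken from the real test suite.
import time
import requests

RAVEN1 = "http://raven1:8080"  # assumed address of the source server
RAVEN2 = "http://raven2:8080"  # assumed address of the destination server

def wait_for_status(url, expected, attempts=20, delay=0.25):
    """Poll the destination server until replication catches up (or give up)."""
    for _ in range(attempts):
        if requests.get(url).status_code == expected:
            return True
        time.sleep(delay)
    return False

# Replicating PUT: the document should show up on the second server
requests.put(f"{RAVEN1}/docs/ayende", json={"name": "ayende"})
assert wait_for_status(f"{RAVEN2}/docs/ayende", 200)

# Replicating DELETE: the delete should be replicated as well
requests.delete(f"{RAVEN1}/docs/ayende")
assert wait_for_status(f"{RAVEN2}/docs/ayende", 404)
```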
Comments
I'd probably test the usual 'bank transfer' scenario, ensuring transactions are atomic across replication too?
What about concurrent transactions, does replication update correctly there?
Andrew,
That is pretty meaningless, since Raven is transactional, and the replication system only operates on committed data.
Are you sure you have defeated your stupidity? I've been trying for thirty years with little effect...
Have you tested replication of indexes as well? Are they stored as data?
Robert,
No, indexes aren't replicated. This is because it is easier to just compute them than to send them on the wire
With CouchDB, I've been looking at using replication in two ways.
1) Using HAProxy as a load balancer, then using CouchDB to continuously replicate between the two nodes in a master-master approach. This in effect creates a database cluster in a very cost effective fashion - try doing that in SQL Server...
2) Similar fashion to above by using HAProxy, but in a master-child approach. The master holds all the data, then each child holds a subset done via the Replication Filters. Thinking about using this for geo-location of catalog data between our different stores... Having everything geo-located would be expensive; having just the subset would solve that problem. If a node goes down, then HAProxy falls back to the master, or more ideally to another node which holds the data.
I've been meaning to write a blog post about it....
Are these the kind of scenarios you are looking for?
Ben
Ben,
That helps, yes, although I was thinking about stuff lower down the stack.
With the master / master approach, don't you find that you get conflicts?
The expectation is that you'll set up the indexes on all machines as part of your initial setup.
Raven's indexes are the closest thing to a schema that it has. It doesn't make sense to replicate the indexes, because you might not want to pay the indexing cost on a backup only copy, or might want to have different projections.
Moving the index data is too costly when we can just recalculate it.
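For example, a setup script that pushes the same index definitions to every node keeps the machines identical without any index replication. A minimal sketch, assuming a hypothetical /indexes/<name> endpoint and payload shape (not necessarily the real HTTP API):

```python
# Hypothetical sketch: apply one index definition to every server at setup time.
# The /indexes/<name> endpoint and the Map payload are assumptions about the API.
import requests

SERVERS = ["http://raven1:8080", "http://raven2:8080"]  # assumed node URLs

INDEX_NAME = "DocsByName"
INDEX_DEFINITION = {"Map": "from doc in docs select new { doc.name }"}  # assumed shape

for server in SERVERS:
    response = requests.put(f"{server}/indexes/{INDEX_NAME}", json=INDEX_DEFINITION)
    response.raise_for_status()  # fail the setup script if any node rejects the index
```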
Why not replicate just the index definitions?
Jesus,
Because it removes the ability to have different indexes on different machines.
It complicates things because we now have to track whether an index was changed or not.
It means that we need to track when an index was changed, in case the user wanted to force an index refresh.
I don't see the benefit.
Ayende,
I see one benefit in the high availability scenario: simpler administration. Since you need both machines identical to serve the same requests, if automatic replication of indices is not in place, you need to repeat all index creations and modifications in all servers manually.
I'm not saying that all replicated servers must have the same indexes, but I think having the option to replicate index definitions to other servers would be nice.
I'd have to agree there; I'd prefer (and expect) indexes to be replicated (their definitions at least; let each server calculate the data though).
Simpler administration, and it's the expected behavior.
Even on a backup server, I'd opt to have it replicated - surely it doesn't matter if it is or isn't.
Minor point. Should the DELETE scenario return a 200 (OK) instead of a 201 (Created)?
Question about RavenDb.
CouchDB takes the approach that views are not indexed until the view is requested (in order to save CPU cycles). Personally I disagree with this approach because I would prefer that the view be ready for me, so to speak.
Also, views in Couch do not return stale results by default unless you pass the stale:"ok" param. I would in the majority of cases prefer to have the stale results and then have the view index update async after the request.
I'd be really interested to hear what approach you are taking with Raven and your thought process behind it.
Dokieboy,
Raven actually returns 204 (No Content), that is a typo on my part.
Martin,
Indexes in Raven are built in the background, and queries will return stale results (with the appropriate notification).
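If a caller needs non-stale results, one option is simply to retry the query until the index has caught up. A minimal sketch, assuming a hypothetical query URL and an IsStale flag in the response (the real API may differ):

```python
# Hypothetical sketch: re-run a query while the index reports stale results.
# The query URL and the "IsStale" field are assumptions about the response shape.
import time
import requests

def query_until_not_stale(server, index, query, attempts=20, delay=0.25):
    result = None
    for _ in range(attempts):
        result = requests.get(f"{server}/indexes/{index}",
                              params={"query": query}).json()
        if not result.get("IsStale", False):
            break
        time.sleep(delay)
    return result  # may still be stale if the index never caught up

results = query_until_not_stale("http://raven1:8080", "DocsByName", "name:ayende")
```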