What is the cost of storage, again?
So, my recent post about the actual costs of saving a few bytes from the field names in schema-less databases got a lot of exposure. David has been kind enough to post some real numbers about costs, which I am going to use for this post.
The most important part is here:
- We have now moved to a private cloud with Terremark and use Fibre SANs. Pricing for these is around $1000 per TB per month.
- We are not using a single server – we have 4 servers per shard so the data is stored 4 times. See why here. Each shard has 500GB in total data so that’s 2TB = $2000 per month.
So, that gives a price point of $4 per gigabyte per month.
Note that this is a per month cost, which means a whopping $48 per gigabyte per year. Now, that is a much higher cost than the 5 cents that I quoted earlier, but let us see what this gives us.
We will assume that the saving is actually higher than 1 GB; let us call it 10 GB across all fields in all documents, which seems a reasonable number.
That now costs $480 per year.
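To make the arithmetic explicit, here is a quick back-of-the-envelope calculation in Python, just restating the numbers above:

```python
# Back-of-the-envelope check of the numbers quoted above.
san_cost_per_tb_month = 1000      # $1,000 per TB per month on the Fibre SAN
replication_factor = 4            # each shard stores the data 4 times
data_per_shard_gb = 500           # 500 GB of actual data per shard

raw_tb = data_per_shard_gb * replication_factor / 1000    # 2.0 TB on disk
monthly_cost = raw_tb * san_cost_per_tb_month              # $2,000 per month
cost_per_gb_month = monthly_cost / data_per_shard_gb       # $4 per GB per month
cost_per_gb_year = cost_per_gb_month * 12                  # $48 per GB per year

saved_gb = 10                     # the assumed saving across all documents
print(saved_gb * cost_per_gb_year)                         # 480.0 dollars per year
```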
Now let us put this in perspective, okay?
At $75,000 a year (which is decidedly on the low end, I might add), that comes to less than 2 days of developer time.
It is also less than the cost of the following consumer items:
- The cheapest iPad - $499
- The price of the iPhone when it came out - $599
But let us talk about cloud stuff, okay?
- A single small Linux instance on EC2 – $746 per year.
In other words, your entire saving isn't even the cost of adding a single additional node to your cloud solution.
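A quick sanity check of those comparisons, assuming roughly 250 working days per year (that figure is my assumption, not from the post):

```python
# Sanity check of the comparisons above.
yearly_saving = 480            # $ per year, from the calculation above
developer_salary = 75_000      # $ per year, the low-end figure
working_days = 250             # assumption: ~250 working days per year

day_rate = developer_salary / working_days     # $300 per day
print(yearly_saving / day_rate)                # 1.6 -> less than 2 days

ec2_small_per_year = 746       # $ per year, small Linux instance on EC2
print(yearly_saving < ec2_small_per_year)      # True
```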
And to the nitpickers, please note that we are talking about data that is already replicated 4 times, so it already includes such things as backups. And back to the original problem: you are going to lose more than 2 days of developer time on this usage scenario when you have field names like tA.
A much better solution would have been to simply put the database on a compressed directory, which would slow down some IO (not really important to MongoDB, since it does most operations in RAM anyway), or to implement per-document compression, like you can do with RavenDB.
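To make the per-document idea concrete, here is a minimal sketch using plain zlib over JSON. The helper names are mine for illustration, not RavenDB's or MongoDB's actual API:

```python
import json
import zlib

def compress_document(doc: dict) -> bytes:
    # Hypothetical helper: serialize the document and compress it before
    # it is written to disk. Repeated field names compress very well.
    return zlib.compress(json.dumps(doc).encode("utf-8"))

def decompress_document(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))

doc = {"customerName": "Northwind", "totalAmount": 129.95}
blob = compress_document(doc)
assert decompress_document(blob) == doc
```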
Comments
If you use some kind of mapping the developer never has to see those field names. So what's wrong with it? As far as I'm concerned the names can be "\001" ,"\002", etc. as long as the developer doesn't ever see them.
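To illustrate the kind of mapping this comment describes, a minimal sketch; the field map and helper functions here are hypothetical, not a feature of any particular driver:

```python
# Hypothetical mapping layer: the application model uses readable names,
# while the stored document uses the abbreviated ones.
FIELD_MAP = {"customerName": "\x01", "totalAmount": "\x02"}
REVERSE_MAP = {v: k for k, v in FIELD_MAP.items()}

def to_storage(model: dict) -> dict:
    return {FIELD_MAP[k]: v for k, v in model.items()}

def from_storage(doc: dict) -> dict:
    return {REVERSE_MAP[k]: v for k, v in doc.items()}

stored = to_storage({"customerName": "Northwind", "totalAmount": 129.95})
assert from_storage(stored) == {"customerName": "Northwind", "totalAmount": 129.95}
```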
You hit the problem that people raise with the size of column names in your last sentence. MongoDB is in RAM. If you do not have your entire dataset in RAM, you will get nowhere near the performance that MongoDB is touted for, and this is exacerbated on EC2, where there are quite a few times per day when disk access can suddenly degrade to 400-600ms per request. If you don't have your entire dataset in RAM on EC2, you are not going to get the performance you want. That may not matter for your particular app, but it is an important consideration when trying to understand costs. It's not the disks - it's the RAM.
I'm not exactly sure what kind of mapping you are talking about, but would a developer not have to deal with "\001" and "\002" when updating the data model (and the mapping) to, say, support some new feature?
For me personally to go with something like that, it would have to be a pretty sizable saving - maintainability is really gonna suffer.
Again, you are considering only a single installation. If Raven is only to be used by you for a single application, then these figures are correct. If, however, Raven is used by many people, and even only 100 of those applications meet the criteria you mention above, then that is a total cost of $48,000 per annum.
$480 per annum may not sound like much, but when you are writing a developer tool which is (hopefully) going to be used by hundreds (if not thousands) of people, then $48,000 or even $480,000 per annum is much closer to the total cost per annum to your customers of using your tools.
Just trying to do what you yourself are doing, adding some "real life" perspective on it. You can't just look at 1 customer's app and say "it's inexpensive", you have to look at the big picture.
The short version.
Application developers judge the cost of developing a feature for their single app. Application tool developers need to judge the cost to all their customers combined - you are comparing apples and pears.
Peter,
a) They are talking about a single installation. They are a service provider.
b) Those numbers that they give are across all customers.
Oren, but are they one of YOUR customers talking about the installation of one of their apps?
What I am saying is that as a provider of tools you have to consider how much it costs all of your customers combined.
Peter,
We aren't talking about my stuff. We are talking about the scenario shown in the post that I linked to.
Where they are using MongoDB to store customer data.
I was once in a similar situation. Because we were running on $10/month shared hosting, I had to turn nvarchars into varchars to save space ;-) I reverted that immediately once we got our own server.
Ah, my mistake then. I was under the impression that you were using it to justify repeating identifier names in Raven rows.
"A much better solution would have been to simply put the database on a compressed directory, which would slow down some IO ..."
I don't agree.
Compression needs CPU. We got a lot more IO throughput by switching on compression (there is simply less to write and read). Previously our CPU was at about 40%; now it averages 70%. Compression saves us about 30% per file. After switching on compression, our IO-bound application was about 20% faster.
We are currently planning to switch on compression on all our production servers over Christmas, because using CPU cores for compression is even cheaper than adding hard disks and RAID for performance.
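A rough way to estimate that CPU-for-IO trade on your own data before flipping the switch might look like this; zlib here merely stands in for whatever compression the filesystem actually uses, and the sample payload is synthetic:

```python
import time
import zlib

def measure(payload: bytes, level: int = 6):
    # How much smaller does the data get, and how much CPU time does it cost?
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    return 1 - len(compressed) / len(payload), elapsed

# Stand-in payload; in practice you would read a representative data file.
sample = b'{"customerName": "Northwind", "totalAmount": 129.95}\n' * 100_000
ratio, cpu_seconds = measure(sample)
print(f"saves {ratio:.0%} of the bytes at a cost of {cpu_seconds:.3f}s CPU")
```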
Chris,
That is a very good point, I'll put out a new post about that.
Another solution: stop doing pointless micro-optimisations. If this leaves the DBAs with nothing to do, fire them - that's a real saving!