What is the cost of storage, again?

time to read 3 min | 447 words

So, I got a lot of exposure about my recent post about the actual costs of saving a few bytes from the fields names in schema-less databases. David has been kind enough to post some real numbers about costs, which I am going to use for this post.

The most important part is here:

  • We have now moved to a private cloud with Terremark and use Fibre SANs. Pricing for these is around $1000 per TB per month.
  • We are not using a single server – we have 4 servers per shard so the data is stored 4 times. See why here. Each shard has 500GB in total data so that’s 2TB = $2000 per month.

So, that gives a price point of 4 US Dollars per gigabyte.

Note that this is a pre month cost, which means that it is going to cost a whopping of 48$ per year. Now, that is much higher cost than the 5 cents that I gave earlier, but let us see what this gives us.

We will assume that the saving is actually higher than 1 GB, let us call it 10 GB across all fields across all documents. Which seems a reasonable number.

That thing now costs 480 $per year.

Now let us put this in perspective, okay?

At 75,000$ a year (which is decidedly on the low end, I might add), that comes to less than 2 days of developer time.

It is also less than the following consumer items

  • The cheapest iPad - 499$
  • The price of the iPhone when it came out - 599$

But let us talk about cloud stuff, okay?

  • A single small Linux instance on EC2 – 746$ per year.

In other words, your entire saving isn’t even the cost of adding the a single additional node to your cloud solution.

And to the nitpickers, please note that we are talking about data is is already replicated 4 times, so it already includes such things as backups. And back to the original problem. You are going to lose more than 2 days of developer time on this usage scenario when you have variable names like tA.

A much better solution would have been to simply put the database on a compressed directory, which would slow down some IO, but isn’t really important to MongoDB, since it does most operations in RAM anyway, or just implement document compression per document, like you can do with RavenDB.