What is the cost of storage, again?
So, my recent post about the actual costs of saving a few bytes from the field names in schema-less databases got a lot of exposure. David has been kind enough to post some real numbers about costs, which I am going to use for this post.
The most important part is here:
- We have now moved to a private cloud with Terremark and use Fibre SANs. Pricing for these is around $1000 per TB per month.
- We are not using a single server – we have 4 servers per shard so the data is stored 4 times. See why here. Each shard has 500GB in total data so that’s 2TB = $2000 per month.
So, that gives a price point of $4 per gigabyte per month.
Note that this is a per month cost, which means a whopping $48 per gigabyte per year. Now, that is a much higher cost than the 5 cents that I quoted earlier, but let us see what this gives us.
We will assume that the saving is actually higher than 1 GB; let us call it 10 GB across all fields in all documents, which seems a reasonable number.
That now costs $480 per year.
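To make the arithmetic explicit, here is a quick back-of-the-envelope calculation in Python, just restating the numbers above:

```python
# Back-of-the-envelope check of the numbers quoted above.
san_cost_per_tb_month = 1000      # $1,000 per TB per month on the Fibre SAN
replication_factor = 4            # each shard stores the data 4 times
data_per_shard_gb = 500           # 500 GB of actual data per shard

raw_tb = data_per_shard_gb * replication_factor / 1000    # 2.0 TB on disk
monthly_cost = raw_tb * san_cost_per_tb_month              # $2,000 per month
cost_per_gb_month = monthly_cost / data_per_shard_gb       # $4 per GB per month
cost_per_gb_year = cost_per_gb_month * 12                  # $48 per GB per year

saved_gb = 10                     # the assumed saving across all documents
print(saved_gb * cost_per_gb_year)                         # 480.0 dollars per year
```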
Now let us put this in perspective, okay?
At $75,000 a year (which is decidedly on the low end, I might add), that comes to less than 2 days of developer time.
It is also less than the cost of the following consumer items:
- The cheapest iPad - $499
- The price of the iPhone when it came out - $599
But let us talk about cloud stuff, okay?
- A single small Linux instance on EC2 – $746 per year.
In other words, your entire saving isn't even the cost of adding a single additional node to your cloud solution.
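A quick sanity check of those comparisons, assuming roughly 250 working days per year (that figure is my assumption, not from the post):

```python
# Sanity check of the comparisons above.
yearly_saving = 480            # $ per year, from the calculation above
developer_salary = 75_000      # $ per year, the low-end figure
working_days = 250             # assumption: ~250 working days per year

day_rate = developer_salary / working_days     # $300 per day
print(yearly_saving / day_rate)                # 1.6 -> less than 2 days

ec2_small_per_year = 746       # $ per year, small Linux instance on EC2
print(yearly_saving < ec2_small_per_year)      # True
```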
And to the nitpickers, please note that we are talking about data that is already replicated 4 times, so it already includes such things as backups. And back to the original problem: you are going to lose more than 2 days of developer time on this usage scenario when you have field names like tA.
A much better solution would have been to simply put the database on a compressed directory, which would slow down some IO (not really important to MongoDB, since it does most operations in RAM anyway), or to implement per-document compression, like you can do with RavenDB.
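To make the per-document idea concrete, here is a minimal sketch using plain zlib over JSON. The helper names are mine for illustration, not RavenDB's or MongoDB's actual API:

```python
import json
import zlib

def compress_document(doc: dict) -> bytes:
    # Hypothetical helper: serialize the document and compress it before
    # it is written to disk. Repeated field names compress very well.
    return zlib.compress(json.dumps(doc).encode("utf-8"))

def decompress_document(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))

doc = {"customerName": "Northwind", "totalAmount": 129.95}
blob = compress_document(doc)
assert decompress_document(blob) == doc
```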
Comments
If you use some kind of mapping the developer never has to see those field names. So what's wrong with it? As far as I'm concerned the names can be "\001" ,"\002", etc. as long as the developer doesn't ever see them.
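To illustrate the kind of mapping this comment describes, a minimal sketch; the field map and helper functions here are hypothetical, not a feature of any particular driver:

```python
# Hypothetical mapping layer: the application model uses readable names,
# while the stored document uses the abbreviated ones.
FIELD_MAP = {"customerName": "\x01", "totalAmount": "\x02"}
REVERSE_MAP = {v: k for k, v in FIELD_MAP.items()}

def to_storage(model: dict) -> dict:
    return {FIELD_MAP[k]: v for k, v in model.items()}

def from_storage(doc: dict) -> dict:
    return {REVERSE_MAP[k]: v for k, v in doc.items()}

stored = to_storage({"customerName": "Northwind", "totalAmount": 129.95})
assert from_storage(stored) == {"customerName": "Northwind", "totalAmount": 129.95}
```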
You hit the problem that people raise with the size of column names in your last sentence. MongoDB is in RAM. If you do not have your entire dataset in RAM, you will get nowhere near the performance that MongoDB is touted for, and this is exacerbated on EC2, where there are quite a few times per day when disk access can suddenly degrade to 400-600ms per request. If you don't have your entire dataset in RAM on EC2, you are not going to get the performance you want. That may not matter for your particular app, but it is an important consideration when trying to understand costs. It's not the disks - it's the RAM.
I'm not exactly sure what kind of mapping you are talking about, but would a developer not have to deal with "\001" and "\002" when updating the data model (and the mapping) to, say, support some new feature?
For me personally to go with something like that, it would have to be a pretty sizable saving - maintainability is really gonna suffer.
Again, you are considering only a single installation. If Raven is only to be used by you for a single application, then these figures are correct. If, however, Raven is used by many people, and even only 100 of those applications meet the criteria you mention above, then that is a total cost of $48,000 per annum.
$480 per annum may not sound like much, but when you are writing a developer tool which is (hopefully) going to be used by hundreds (if not thousands) of people, then $48,000 or even $480,000 per annum is much closer to the total cost per annum to your customers of using your tools.
Just trying to do what you yourself are doing, adding some "real life" perspective on it. You can't just look at 1 customer's app and say "it's inexpensive", you have to look at the big picture.
The short version.
Application developers judge the cost of developing a feature for their single app. Application tool developers need to judge the cost to all their customers combined - you are comparing apples and pears.
Peter,
a) They are talking about a single installation. They are a service provider.
b) Those numbers that they give are across all customers.
Oren, but are they one of YOUR customers talking about the installation of one of their apps?
What I am saying is that as a provider of tools you have to consider how much it costs all of your customers combined.
Peter,
We aren't talking about my stuff. We are talking about the scenario shown in the post that I linked to.
Where they are using MongoDB to store customer data.
I was once in a similar situation. Because we were running on $10/month shared hosting, I had to turn nvarchars into varchars to save space ;-) I reverted that immediately once we got our own server.
Ah, my mistake then. I was under the impression that you were using it to justify repeating identifier names in Raven rows.
"A much better solution would have been to simply put the database on a compressed directory, which would slow down some IO ..."
I don't agree.
Compression needs CPU. We got a lot more IO throughput by switching on compression (there is simply less to write and read). Previously our CPU was at about 40%; now it averages 70%. Compression saves us about 30% per file. After switching on compression, our IO-bound application was about 20% faster.
We are currently planning to switch on compression on all our production servers over Christmas, because using CPU cores for compression is even cheaper than adding hard disks and RAID for performance.
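A rough way to estimate that CPU-for-IO trade on your own data before flipping the switch might look like this; zlib here merely stands in for whatever compression the filesystem actually uses, and the sample payload is synthetic:

```python
import time
import zlib

def measure(payload: bytes, level: int = 6):
    # How much smaller does the data get, and how much CPU time does it cost?
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    return 1 - len(compressed) / len(payload), elapsed

# Stand-in payload; in practice you would read a representative data file.
sample = b'{"customerName": "Northwind", "totalAmount": 129.95}\n' * 100_000
ratio, cpu_seconds = measure(sample)
print(f"saves {ratio:.0%} of the bytes at a cost of {cpu_seconds:.3f}s CPU")
```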
Chris,
That is a very good point, I'll put out a new post about that.
Another solution: stop doing pointless micro-optimisations. If this leaves the DBAs with nothing to do, fire them - that's a real saving!