The best argument for scale out
I am writing a presentation, and I thought it would be interesting to get some numbers:
| Server | Cost |
|--------|------|
| PowerEdge T110 II (Basic) – 8 GB, 3.1 GHz Quad 4T | $1,350.00 |
| PowerEdge T110 II (Basic) – 32 GB, 3.4 GHz Quad 8T | $12,103.00 |
| PowerEdge C2100 - 192 GB, 2 x 3 GHz | $19,960.00 |
| IBM System x3850 X5 – 8 x 2.4 GHz, 2,048 GB | $645,605.00 |
| Blue Gene/P – 14 teraflops, 4,096 CPUs | $1,300,000.00 |
| K Computer (fastest supercomputer) - 10 petaflops, 705,024 cores, 1,377 TB | $10,000,000 annual operating cost; no data on actual cost to build |
And then what?
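One rough way to put the scale-up boxes on a single axis is to normalize by memory. The sketch below is just arithmetic on the prices and RAM sizes listed in the table above; the two supercomputers are left out because their pricing isn't comparable.

```python
# Cost per GB of RAM for the boxes in the table above.
# Prices and memory sizes are taken straight from the table; no other assumptions.
servers = [
    ("PowerEdge T110 II, 8 GB",        1_350.00,      8),
    ("PowerEdge T110 II, 32 GB",      12_103.00,     32),
    ("PowerEdge C2100, 192 GB",       19_960.00,    192),
    ("IBM System x3850 X5, 2,048 GB", 645_605.00,  2_048),
]

for name, price, ram_gb in servers:
    print(f"{name:30} ${price / ram_gb:>7,.0f} per GB of RAM")
```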
Comments
Then you start reading Jacek Dukaj with his idea of Ultimative Inclusions:) http://dukaj.pl/English
But hardware is only a minor component of the overall service cost; add power & cooling, datacenter rack space, networking equipment, software licensing, and management costs. All of these rise linearly with the number of machines and (almost) orthogonally to the processing power of each machine. And that doesn't even consider the huge added complexity (= cost!) of developing and running distributed software.
The bottom line is that there are a few textbook scenarios where scale-out is clearly superior, and many others where scale-up or a mixed approach is more effective. And even then, cost-effectiveness usually isn't the deciding factor.
So the best argument for scale-out is hardware cost? Maybe. But then it is also the best argument for scale-up.
We are missing context here. For what kind of apps? Business apps, web sites? Yes, I'd agree.
But then, I doubt that Blue Gene runs web sites :))
Most people never need more power than a single PowerEdge pizza-box server. Mind you, Stack Overflow ran for quite some time on 3 of those (I think they still do, not sure).
- 10 Dell R610 IIS web servers (3 dedicated to Stack Overflow): 1x Intel Xeon Processor E5640 @ 2.66 GHz (quad core, 8 threads), 16 GB RAM, Windows Server 2008 R2
- 2 Dell R710 database servers: 2x Intel Xeon Processor X5680 @ 3.33 GHz, 64 GB RAM, 8 spindles, SQL Server 2008 R2
- plus HAProxy servers, Redis servers, ...
http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html
IBM Sequoia in 2012 - 20 Petaflops - which is equivalent to the processing power of the human brain
Yeah, it kinda looks more like an argument for scale-up. To get 705,024 cores out of PowerEdge C2100s (at 12 cores per box) you're looking at $1,172,689,920.00, or $40,878,080.00 to get 4,096 CPUs (2,048 boxes) with the C2100...
Of course, that assumes you need to scale to that amount of power at some point...
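For the curious, here is the arithmetic behind those two figures as a quick sketch. The 12-cores-per-C2100 and 2-CPUs-per-C2100 counts are the assumptions the comment uses, not numbers from the post's table:

```python
# Back-of-the-envelope: how many PowerEdge C2100s it takes to match the
# supercomputers' core/CPU counts, and what that costs at list price.
# Assumed per-box counts (not in the post's table): 2 CPUs, 12 cores per C2100.
C2100_PRICE = 19_960.00
C2100_CPUS = 2
C2100_CORES = 12

K_COMPUTER_CORES = 705_024   # from the table above
BLUE_GENE_CPUS = 4_096       # from the table above

boxes_for_k_computer = K_COMPUTER_CORES / C2100_CORES   # 58,752 boxes
boxes_for_blue_gene = BLUE_GENE_CPUS / C2100_CPUS        # 2,048 boxes

print(f"Matching the K Computer's cores: ${boxes_for_k_computer * C2100_PRICE:,.2f}")
print(f"Matching Blue Gene/P's CPUs:     ${boxes_for_blue_gene * C2100_PRICE:,.2f}")
```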
@Josh Reuben: whose brain? :)) Do you think that if I get a job at IBM they would pay me accordingly? :D
@petar I'd be happy to sell 50% of my brain's processing power to some data center. It's a used brain in far from perfect condition, so maybe it has only 2 or 3 petaflops left, but still, that should be worth a few million a year.
Scooletz, I am not familiar with him; what is his Ultimate Inclusion?
Addys,
1 EC2 machine (large) for 1 year - $1,756.80
100 EC2 machines (large) for 1 year - $175,680.00
Seems to be a pretty linear scale to me. Actually, the more you use, the better deal you can get.
No one said it is going to be cheap, but it is usually more cost-effective.
Yes, distributed programming is more complex, but even if you are on a single machine, it isn't like you can assume only a single thread is running, and a lot of the same issues you have to deal with are there anyway.
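The "linear scale" point above fits in a few lines of arithmetic. This sketch just multiplies the per-instance annual figure quoted in the comment by the machine count; real AWS pricing (reserved instances, volume discounts) would only bend the curve downwards:

```python
# On-demand cost scales linearly with the number of machines:
# total = (annual cost per large instance) x (instance count).
# The $1,756.80/year figure is the one quoted in the comment above.
ANNUAL_COST_PER_LARGE = 1_756.80

for machines in (1, 10, 100):
    print(f"{machines:>4} large instance(s) for 1 year: ${machines * ANNUAL_COST_PER_LARGE:,.2f}")
```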
Strongly agree. I use a chart similar to this in my advanced networking class to show precisely why scaling out is often far better than scaling up after a certain point. The only problem is that a LOT of devs have no idea how to scale out or how to write that type of software, and no clue that mathematical landmines are waiting for them that can render their entire architecture unusable (not an opinion or observation; there are formal mathematical proofs for these).

You really need to take a look at optimistic concurrency models, serial equivalence of transactions, CAP, the Fischer (FLP) consensus problem, and queuing theory, and truly understand the algorithms that suffer from and address these problems. If not, you're opening yourself up to synchronization problems, deadlocks, consistency issues, and weird one-off Heisenbugs that aren't actually just edge cases and will utterly destroy you outside of a test environment: they don't repro regularly in test, but the rate at which they occur scales exponentially, so you see a ridiculous number of them start occurring in a big hurry. In conclusion, it's called computer science for a reason, so start sciencing! (And if sargable is a word, then sciencing is a word too.)
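To make the "optimistic concurrency" pointer slightly more concrete, here is a minimal, self-contained sketch of the pattern: readers note a version, writers only succeed if that version is still current, and conflicts are retried. The store and class names are purely illustrative, not any particular library's API.

```python
import threading

class ConcurrencyConflict(Exception):
    """Raised when the version we read is no longer the current one."""

class VersionedStore:
    """Toy in-memory store that keeps a version number next to each value."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, version)

    def read(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                raise ConcurrencyConflict(key)  # someone wrote since we read
            self._data[key] = (value, current + 1)

def increment(store, key, retries=10):
    """Optimistic read-modify-write: retry if another writer won the race."""
    for _ in range(retries):
        value, version = store.read(key)
        try:
            store.write(key, (value or 0) + 1, expected_version=version)
            return
        except ConcurrencyConflict:
            continue  # re-read the new value/version and try again
    raise RuntimeError("gave up after too many conflicts")

store = VersionedStore()
threads = [threading.Thread(target=increment, args=(store, "hits")) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.read("hits"))  # (10, 10): ten successful increments, version bumped each time
```

The shape is the same whether the "version" is a row version column, a document etag, or a vector clock: detect the conflict, then decide how to resolve or retry it.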
re @Addys "All these rise linearly with the # of machines...", and isn't it a point that precisely supports the original argument? The figures on this post show that scaling-up rises costs exponentially. Linear costs seem quite a jolly good bargain when it comes to scalability.
I'm all for scale out, even more so for cloud.
Especially in a web scenario there are diminishing returns for scale-up performance, while scale-out performance is nearly linear; that's in addition to the added redundancy, etc.
The additional benefit cloud presents is "Scale fast, fail fast". It's not the ability to turn on servers quickly, it's the ability to discard them when they're no longer needed.
He's a Polish hard SF writer :) The Ultimative Inclusion is the optimal computer you can get in a universe based on given physics; to create a better one, you need to create a new universe with 'better' constants. Each of his novels is a masterpiece. If you like SF (and I think you do), it's a very good addition to your to-read list :)
@Scooletz - Dukaj looks very interesting. Are there any English translations available?
@Sean, unfortunately not. Dukaj said at a convention that so far none of the negotiations with foreign publishers have ended in a contract, and he doesn't expect any to be successful in the near future :(