On Hadoop
Yesterday or the day before, I read the available chapters of Hadoop in Action. Hadoop is a MapReduce implementation in Java, and it includes some very interesting ideas.
The concept of MapReduce isn't new, but I liked seeing actual code examples, which made it much easier to follow what is really going on. As usual with an In Action book, a lot of the material relates to getting things set up, and since I don't usually work in Java, those parts were of little interest to me. But the core ideas are very interesting.
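The core map/shuffle/reduce flow can be sketched with the classic word-count example. This is plain Java meant only to illustrate the idea; a real Hadoop job implements its Mapper/Reducer API and runs distributed across a cluster:

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // "map" phase: emit a (word, 1) pair for every word in an input line
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1))
                     .collect(Collectors.toList());
    }

    // "shuffle" + "reduce": group the pairs by key, then sum each group's values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(
            Collectors.groupingBy(Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");
        // in Hadoop, the mapping happens in parallel on splits of the input
        List<Map.Entry<String, Integer>> mapped = input.stream()
            .flatMap(line -> map(line).stream())
            .collect(Collectors.toList());
        Map<String, Integer> counts = reduce(mapped);
        System.out.println(counts.get("the")); // 2
    }
}
```

The point of the model is that `map` and `reduce` are the only pieces you write; the framework owns splitting the input, shuffling pairs to reducers, and handling machine failures.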
It does seem to be limited to a fairly small set of scenarios, ones that, in essence, require indexing large sets of data. Some of the examples in the book made sense as theoretical problems, but I am still missing the concrete "order to cash" scenario: seeing how we take a business problem and turn it into a set of technical challenges that can be resolved by applying MapReduce to some part of the problem.
As I said, only the first 4 chapters are currently available, and I was reading the early access version, so it is likely this will be addressed as more chapters come in.
Comments
You might want to take a look at DryadLINQ (research.microsoft.com/en-us/projects/DryadLINQ/). It is a framework that extends LINQ to the Dryad distributed execution environment. Basically, you write LINQ queries (including action queries) and they are automatically distributed to a cluster.
Hadoop, to me at least, is more than just a MR implementation.
Hadoop includes a number of useful subsystems, including HDFS (the Hadoop Distributed File System). HDFS is distributed, replicated storage that feeds the splitting/grouping parts of the MR process.
I've been looking at HDFS as a purely low-tech option for long-term document storage. Since all of the documents are identified by a key, quick retrieval is easy, and the data is replicated across cheap machines. Since I could then build access methods on top, using MR to get at the data and filter/query the contents, the infrequent projections of data into some sort of document list/report would be easy to build.
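That storage idea can be sketched as a toy in plain Java: documents addressed by key, each write copied to several "nodes" so any replica can serve a read, and a scan-and-filter "report" on top standing in for an MR query. Every name here is illustrative; none of this is the real HDFS API:

```java
import java.util.*;
import java.util.stream.*;

// Toy key-addressed, replicated document store (illustration only, not HDFS).
public class DocStoreSketch {
    private final List<Map<String, String>> nodes; // each node maps key -> document
    private final int replicas;

    DocStoreSketch(int nodeCount, int replicas) {
        this.nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
        this.replicas = replicas;
    }

    // write the document to `replicas` consecutive nodes, chosen by key hash,
    // so losing a cheap machine loses no data
    void put(String key, String document) {
        int start = Math.floorMod(key.hashCode(), nodes.size());
        for (int i = 0; i < replicas; i++)
            nodes.get((start + i) % nodes.size()).put(key, document);
    }

    // quick retrieval: the key hash points straight at a node holding it
    String get(String key) {
        int start = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(start).get(key);
    }

    // "projection": scan every node, de-duplicate (replicas repeat), and filter,
    // the way an MR job would scan blocks to build an infrequent report
    List<String> report(java.util.function.Predicate<String> filter) {
        return nodes.stream()
                    .flatMap(node -> node.values().stream())
                    .distinct()
                    .filter(filter)
                    .sorted()
                    .collect(Collectors.toList());
    }
}
```

The trade-off the comment describes falls out of the shape: point reads by key are cheap, while anything else is a full scan, which is acceptable only because the list/report projections are infrequent.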
I've been spending more time in Java the past few weeks, and it has been nice to just pull down an OS project and use it instead of constantly thinking "Okay, now this is how they did it in Java, maybe I should port it to .NET"
Mind you, I'm not a convert away from .NET; I just appreciate a thriving ecosystem of Java open source projects that is helping us get things done without a lot of pain.
I think most order to cash scenarios don't involve a cluster doing the processing (though they may be load balanced to some degree), which is why you don't see too many examples like that. The kinds of problems Google has to solve are very different from most business problems. Unless the business scenario involves huge amounts of data that can't be represented in the normal ways, you're unlikely to really need all that, and the standard stuff will work fine.
pb,
My point was, I want to see the reasons for why you would do that.
Not how you do it, but what you are doing.
I haven't played around with Hadoop yet, but it looks like Amazon has added Hadoop as an option in their cloud offerings.
See link: aws.amazon.com/.../announcing-amazon-elastic-ma...