Tune and monitor the cluster
A single bad stick of RAM in one machine can make an entire cluster sluggish. As you build your applications and the Hadoop cluster itself, make sure you can monitor your jobs from start to finish. Chris Wensel, CTO and founder of Concurrent, said that you and your team have some important decisions to make as you design your processes and your cluster.
Wensel said that, overall, “reducing latency is your ultimate goal, but also reducing the likelihood of failure. The way these technologies were built, they weren’t intended for operational systems.” Indeed, Hadoop and its many sub-projects only recently added high-availability support for the underlying file system.
That means Hadoop can still be somewhat brittle. Wensel said teams must first “decide if your application is something with an SLA. Is it something that has to complete in two hours every day, or every 10 minutes? Is it something you don’t want to think about at 10 p.m. when the pager goes off? If it’s an application that’s driving revenue, you need to really think about that. If you decide it has an SLA, you need to adopt some structural integrity in the application itself.”
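As a rough illustration of treating an SLA as something the application checks for itself, here is a minimal Python sketch. It is not tied to any Hadoop API: the function name `sla_breached`, its parameters, and the two-hour window are assumptions made for the example, and the start and finish times would have to come from whatever scheduler or job tracker your team actually uses.

```python
from datetime import datetime, timedelta
from typing import Optional


def sla_breached(started_at: datetime,
                 finished_at: Optional[datetime],
                 sla: timedelta,
                 now: datetime) -> bool:
    """Return True if the job ran, or is still running, longer than its SLA.

    Hypothetical helper: a job with no finish time is measured against `now`,
    so a job that is still running can be flagged before it completes.
    """
    end = finished_at if finished_at is not None else now
    return (end - started_at) > sla


# Assumed example: a nightly job that must complete within two hours.
two_hours = timedelta(hours=2)
start = datetime(2024, 1, 1, 2, 0)

# Finished after 90 minutes: within the SLA.
on_time = sla_breached(start, datetime(2024, 1, 1, 3, 30), two_hours,
                       now=datetime(2024, 1, 1, 5, 0))

# Still running after 2.5 hours: the SLA is already broken.
overdue = sla_breached(start, None, two_hours,
                       now=datetime(2024, 1, 1, 4, 30))
```

A check like this could run on a timer and raise an alert during working hours, rather than leaving the failure to be discovered when the pager goes off at 10 p.m.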
Be ready for change
This tip is two-sided: it applies both to your cluster and to Hadoop as a whole. On the micro scale, keep in mind that your application is going to change once it hits live data. Said Concurrent’s Wensel: “The other side of the problem is that as you’re developing an application, as you get larger and larger data sets, your application changes. It is a challenge to build an application and have it grow to larger data sets. Be very conscious of the fact that things are changing.”