Big Data: How to “Write Once and Deploy” across Big Data Fabrics

Dick Weisinger, Formtek
May 13, 2014
http://formtek.com/blog/big-data-how-to-write-once-and-deploy-across-big-data-fabrics

Cascading is a Java-based framework for developing big data and data-centric applications, created and supported by Concurrent. The framework abstracts away the complex implementation details involved in writing big data applications. Cascading is used by companies such as eBay, LinkedIn, The Climate Corporation and Twitter, and it has been broadly adopted across industry segments including finance, telecom, marketing, entertainment, and enterprise IT. We covered the 2.5 release of the framework last November.

Concurrent announced the release of the Cascading 3.0 framework today. This release expands the set of underlying data engines, or fabrics, that can be plugged into the framework. Out of the box, Cascading 3.0 supports the local in-memory fabric, which has been part of Cascading since the beginning, as well as MapReduce, the default engine and the foundation on which Hadoop has been built. Beyond those, the 3.0 release adds support for Apache Tez. Support for Apache Spark is expected in the very near future, followed by support for Apache Storm.

Gary Nakamura, Concurrent CEO, said that “what we’ve done in Cascading 3.0 is that we’ve allowed for data applications to execute on different fabrics. So essentially we’ve made Hadoop and MapReduce, and Spark and Tez an implementation detail of the framework. The enterprise can now develop these applications unencumbered without having to think about latency or scale, and then simply pick the modality on which they want to deploy their application.”

Cascading has become a top choice for building data-centric applications. Since mid-2012, the framework has grown 10 percent month over month. Downloads have risen from roughly 20,000 per month to more than 150,000 per month, and there are now more than 7,000 deployments of Cascading.

Using Cascading takes some of the risk out of data-centric application development. Cascading provides a stable API and framework on which to build data applications. Once an application is written against that API, the underlying data engine, or fabric, can be swapped in or out depending on where you want to deploy it.
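To make the "write once, deploy anywhere" idea concrete, here is a minimal Java sketch of the general pattern. It is not Cascading's API: every class and method name below (FabricDemo, WordCountPipeline, Fabric, LocalFabric, ParallelFabric) is hypothetical. The application logic is defined once, and the engine that executes it sits behind a pluggable interface; a parallel stream merely stands in for a distributed fabric such as MapReduce or Tez.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only: every name here is hypothetical, not part of Cascading's API.
public class FabricDemo {

    // The application logic, written once: split a line into lowercase words.
    static final class WordCountPipeline {
        Stream<String> tokenize(String line) {
            return Arrays.stream(line.toLowerCase().split("\\W+"))
                         .filter(word -> !word.isEmpty());
        }
    }

    // The pluggable execution engine ("fabric"); the pipeline does not depend on which one runs it.
    interface Fabric {
        Map<String, Long> run(WordCountPipeline pipeline, List<String> input);
    }

    // Sequential, in-memory execution, similar in spirit to a local mode.
    static final class LocalFabric implements Fabric {
        @Override
        public Map<String, Long> run(WordCountPipeline pipeline, List<String> input) {
            Map<String, Long> counts = new TreeMap<>();
            for (String line : input) {
                pipeline.tokenize(line).forEach(word -> counts.merge(word, 1L, Long::sum));
            }
            return counts;
        }
    }

    // Parallel execution standing in for a distributed fabric: same pipeline, different engine.
    static final class ParallelFabric implements Fabric {
        @Override
        public Map<String, Long> run(WordCountPipeline pipeline, List<String> input) {
            return new TreeMap<>(input.parallelStream()
                    .flatMap(pipeline::tokenize)
                    .collect(Collectors.groupingByConcurrent(word -> word, Collectors.counting())));
        }
    }

    public static void main(String[] args) {
        WordCountPipeline pipeline = new WordCountPipeline();
        List<String> input = List.of("Write once, deploy", "deploy anywhere, once");
        // Swap engines without touching the pipeline definition.
        for (Fabric fabric : List.<Fabric>of(new LocalFabric(), new ParallelFabric())) {
            System.out.println(fabric.getClass().getSimpleName() + ": " + fabric.run(pipeline, input));
        }
    }
}
```

Both engines produce the same word counts for the same input; only the execution strategy changes. In miniature, that is the property the article describes: an application written against a stable API should not need rewriting when the enterprise moves it to a different fabric.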

Nakamura commented that “for anyone developing on Cascading today, it will become very easy for them to migrate their data-centric applications to new computation fabrics when the enterprise is ready to upgrade their Hadoop distribution. They could standardize on one API to solve a variety of business problems. ISVs can now leverage Cascading as the interface between their value-added solution and Hadoop or Spark or Storm without having to write directly to each of the different fabrics for the different modalities that they want to offer to their end customers. This translates to other data apps that are built on top of Cascading and they will benefit from this portability.”

What types of organizations are benefiting from using Big Data? Nakamura thinks that the businesses that benefit most are those that see the value in Big Data and then architect it deeply into their enterprise applications. “End users that are just stopping at ad-hoc have a hard time having conversations with their bosses about budgets going forward. The ones that are building and operationalizing data inside of Hadoop and are standing up enterprise applications and consistently delivering data products to their end users” are seeing the rewards that Big Data can deliver. “It’s not necessarily a conversation about how much money we are making off of this data product or how much money are we saving because of it. It’s more transformative.”