With Cascading 3.0, application developers can operationalize the Hadoop ecosystem


Maria Deutscher, Silicon Angle
May 13, 2014

Concurrent, an up-and-coming startup working to simplify the creation of data-driven applications, has pulled the curtains back on a revamped version of its flagship development framework that facilitates integration across the full spectrum of technologies in the Hadoop ecosystem to enable an entirely new set of use cases.

Cascading, as the San Francisco-based company’s software is called, is available under an Apache license and serves as an abstraction layer between Hadoop and the applications using it, shielding developers from the inherent complexity of MapReduce. The third release, introduced this morning, extends that simplicity to the dozens of complementary open source components available for the batch processing platform in order to make the capabilities of those tools more accessible to enterprise applications.

The new functionality represents a major step forward towards Concurrent’s vision for democratizing analytics, which is founded on the classical notion that business logic must be decoupled from the code that handles information.

“Building applications on top of Hadoop was very difficult. That’s why our founder Chris Wensel created a framework so you could have a separate business logic layer from the data layer, and it’s written in Java so any Java programmer can pick it up,” Gary Nakamura, the CEO of Concurrent, explained to SiliconANGLE in an exclusive interview on theCUBE during O’Reilly Fluent Conference 2013.

“The requirement for the enterprise is not to learn new skills for Hadoop but to leverage existing skills, existing systems and existing investments they already made in their infrastructure,” Nakamura added. Cascading now delivers that abstraction for the various specialized tools in the Hadoop ecosystem as well through both direct and indirect support.

Out of the box, the framework is compatible with Tez, a distributed execution engine that offers better performance and lower latency than MapReduce, a combination that is especially useful for fast-moving streaming workloads such as sensor data. Other technologies can be plugged into Cascading through a new built-in query planner, which Concurrent said will be used to add support for two additional Apache projects in the near future.

One of the items on the list is Spark, a separate implementation of the concepts detailed in the 2007 Microsoft research paper that Tez is based on. Spark is generally considered the more mature of the two and better suited for production as a result. The firm said that Cascading will eventually also work with Storm, a third real-time processing framework that was open sourced after Twitter acquired original developer BackType.

Of particular note is that, through the integrations and a new local caching function, the latest version of the framework allows for in-memory processing. That is significant because, as Wikibon co-founder and chief analyst Dave Vellante explained in a recent segment on theCUBE, eliminating the overhead associated with retrieving information from disk can improve application performance by several orders of magnitude, removing the I/O limitations that have historically prevented developers from taking full advantage of their data.
