Hortonworks Keen on Cascading-Tez Combo

Apr 21, 2014
Alex Woodie

http://www.datanami.com/datanami/2014-04-21/hortonworks_keen_on_cascading-tez_combo.html

In the future, it will be easier to build big data applications, and they’ll run faster and utilize more real-time data than today’s apps, too. Two vendors working to make that future a reality, Hortonworks and Concurrent, today announced they’ll work together to build and assemble the next generation of Hadoop apps running on YARN, Tez, and Apache Spark.

Hortonworks and Concurrent have been partners for some time. As one of the central Hadoop players, Hortonworks is well aware of Concurrent and its open-source Cascading development framework, which abstracts away the difficult part of writing MapReduce applications with an easy-to-use Java API and library.

Cascading is one of the success stories of first-gen Hadoop apps. Concurrent boasts more than 6,000 commercial deployments of its Cascading framework, and says customers like Nokia, Kohl’s, and Twitter are using it to simplify development of MapReduce apps on Hadoop. The product is being downloaded about 130,000 times per month, putting it on the cusp of big data rock star status.

With the upcoming launch of Cascading 3.0 in June, Concurrent will add support for Tez and Apache Spark, giving customers powerful new options for developing Hadoop applications beyond the MapReduce paradigm. Hortonworks, which already supports Tez with its Hadoop distribution HDP 2.1 and is currently offering a tech preview of Spark, likes where Cascading is headed and wanted to get ahead of customer demands, according to John Kreisa, vice president of corporate strategy for Hortonworks.

“Given the clear adoption patterns we’re seeing with Hadoop around building various data centric apps and the desire to put those apps into production, it made sense to deepen the relationship [with Concurrent],” Kreisa tells Datanami. “We know our customers want to develop apps. We know Cascading is popular–we see it in our user base. So it just made sense for us to take this next step and include it directly in the platform to accelerate the adoption of Hadoop.”

Previously, the two companies worked together to certify the integration and testing of HDP and Cascading, but it was up to customers to obtain the Cascading code and ensure that it worked. Under the expanded pact, Hortonworks will distribute the Cascading software development kit (SDK) as part of HDP and provide level one and level two technical support for customers; Concurrent will provide level three support.

In early June, Hortonworks will include support for the forthcoming release of Cascading 3.0 as a tech preview in the HDP sandbox environment. It will become generally available (GA) in late summer or early fall, says Tim Hall, vice president of product management for Hortonworks.

Hortonworks is a big believer in how Concurrent is building support for Tez into Cascading 3.0. “Tez is a significant leap forward,” Hall says. “It’s one of the critical things Hortonworks has been investing in from the open community for Hadoop, which is moving this from a batch-centric, mostly serialized approach to accessing data on Hadoop–that was MapReduce 1–and shifting this to a mixed workload environment that runs on YARN.”

The recent launch of HDP 2.1, which enabled Hive to use either the legacy MapReduce execution engine or the new Tez engine, is Hortonworks’ contribution. “Concurrent is going to follow that lead and go down that path as well with the Cascading SDK,” Hall says. “We will likely invest in working with the open source community to move some of these other tools from legacy MapReduce 2 to the next generation, which is Tez.”

Cascading 3.0 will also support Apache Spark, the in-memory framework that’s gaining a ton of momentum as yet another replacement for MapReduce. Hortonworks, whose developers largely spearheaded the development of Tez, is taking a bit of a wait-and-see approach regarding Cascading and its Spark prospects.

“One of the interesting things about the Cascading SDK is it does provide some additional libraries on top of the Java API. One of the ones we’re most interested in is Scalding libraries, which provides us a Scala interface,” Hall says. “Obviously having that access point, and seeing what the interest is in the community of Scala and the relationship of that Scalding SDK and how it does or does not work with Spark, will be something we’ll be looking at very closely with our customers.”