Cascading on Apache Tez -- Delivering on the promise of next generation compute

Gary Nakamura, Concurrent, Inc.
Sep 23, 2014
http://hortonworks.com/blog/cascading-on-apache-tez

Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.

In this guest blog, Gary Nakamura, CEO at Concurrent, talks about Concurrent’s recent milestone and the road ahead.

The “developer release” of Apache Tez is here, and we are happy to re-affirm our support for the community and the project.

Concurrent, the team behind Cascading, would like to add our congratulations to the Apache Tez community on achieving this milestone. This is an important project for the broader ecosystem, and we expect to see Tez continue to move forward quickly.

What Cascading on Tez Means for ISVs
It’s early days for Cascading on Apache Tez. Simultaneously delivering on performance, scale and reliability is no small feat, but we see Hortonworks and the Apache Tez community delivering on the promise of a next-generation compute engine.

Cascading and Tez together represent another important milestone, providing users and independent software vendors (ISVs) the flexibility to quickly build their data apps and then choose the appropriate compute engine for the business problem at hand (in-memory, batch mode, streaming or otherwise).

This week we announced that the latest Cascading 3.0 work-in-progress (WIP) release adds Apache Tez as a supported runtime platform. This was a significant milestone for Cascading in its own right: to make this support possible, we delivered a pluggable query planner. With this release, Cascading users can start testing their existing applications on the Apache Tez compute engine.

Thousands of enterprises around the world will welcome a more efficient, high-performance compute engine – one that delivers the reliability and scale they are accustomed to and allows them to migrate their business-critical data applications easily and seamlessly. Tez has promised this, and that commitment stands to benefit the entire Hadoop ecosystem.

What’s Next?
From here, we will work closely with the Tez community to run performance and scalability tests, and capture feedback from new and existing users. We will also work with the broader Cascading community to migrate Scalding, Cascalog, Lingual and Pattern to Apache Tez.

This is a big win for the community, our contributors, partners, ISVs and for enterprises driving their data strategy and next-generation data applications on Hadoop.

We share an unwavering commitment to developer productivity, ease of deployment and management, and above all, innovation for the future of data app development.

At the end of the day, we are all in the data business.

Resources for Cascading 3.0 on Apache Tez

Download Cascading 3.0 WIP and its documentation: http://www.cascading.org/wip
Cascading on Apache Tez Notes: https://github.com/cwensel/cascading/tree/wip-3.0/cascading-hadoop2-tez
Sample Applications: https://github.com/Cascading/cascading.samples/tree/wip-3.0
