DataSong Software

Key Takeaways

DataSong Software replaced its SAS-based prototype with Cascading and gained higher productivity, testability, and scalability for its application. Because developers did not have to learn a new syntax or programming model, they delivered a higher-quality product with faster time to market.


DataSong chose Cascading on Hadoop to streamline data manipulation, enable the development of reusable components, and bring its new application to market faster. Cascading is now an integral part of the company's own product, and DataSong also uses it to plan and manage complex jobs executed on Hadoop clusters. DataSong runs these jobs on the Rackspace Cloud, which lets it instantly provision as much or as little capacity as each customer job needs. Each customer job runs separately, with its data kept segregated.

Cascading made coding for Hadoop easier and more repeatable, and it was a much better fit for DataSong than raw MapReduce or less flexible tools such as Pig and Hive. Cascading's authors had already worked through many typical data-manipulation scenarios, which accelerated the company's development process. Cascading also made it easy to create reusable components. This reusability was key, since the company wanted to use one tool for many customers and turn around new features quickly. Finally, Cascading code is readable enough that other developers can look at it, understand what it is meant to do, and change it or reuse it elsewhere.

During product development, DataSong ported its existing prototype code from SAS to Hadoop and Cascading. It called on Concurrent, the company behind Cascading, to train its developers and analysts so they could become productive very quickly.


Cascading delivers faster time to market for new products and features by allowing multiple developers to work on the same code base and create reusable components. Developers can write in Java without having to think in MapReduce, which saves valuable development time. Cascading also provides far more flexibility than either Pig or Hive and does not require developers to learn a new syntax, which keeps development quick and easy.

DataSong's customers benefit from Cascading's role in the revenue-attribution and customer-level response-modeling modules. These modules identify the most profitable channels for every customer and guide marketers on where to spend promotional dollars most efficiently. Using Cascading, DataSong evaluates data from multiple marketing and purchasing channels to understand customer-level response and to generate actionable customer lists for each channel.

“Cascading is an important part of our DataSong product,” noted Brandon Mason, VP of Product, DataSong. “Our customers are global brands who need to analyze huge volumes of data to guide their marketing activities. With Cascading, we were able to develop a high volume data manipulation and analysis tool quickly and efficiently. The reusable components we created with Cascading will also speed time to market for future products and features.”

In the future, DataSong plans to use Cascading to develop new products and add features to existing ones. Cascading is now the company's internal standard tool for data manipulation and a key piece of its arsenal for building products that create business value from data.