With Scalding, SoundCloud accelerated their application development and simplified deployment for their critical data applications on Hadoop. SoundCloud used Scalding data workflows to handle data processing and incorporate a custom ranking algorithm used for their highly visible data products.
By adopting Cascading, AdMobius accelerated their application development for their critical enterprise data applications. The development team utilized Cascading workflows to handle the data processing, incorporate a custom scoring algorithm, and write out results to HDFS and MySQL.
DataSong Software replaced their SAS-based prototype with Cascading and realized higher productivity, testability, and scalability of its application. Developers did not have to learn a new syntax or programming model, and thus delivered a higher quality product with faster time to market.
Solusi247 integrated Cascading into its ETL application used to analyze call data records for its telco customers. Developers are able to provide differentiated functionality in the application, while under the hood, Cascading creates optimized MapReduce jobs.
The flexibility in extending Cascading functionality gives Scale Unlimited the power to analyze a wide range of data types. Its developers can quickly create Java applications that complete data workflows for client projects much faster than writing raw MapReduce.
Razorfish chose Cascading running on Amazon EMR instead of traditional data management systems to deploy their targeted advertising system for e-commerce customers. This enabled Razorfish to handle extremely large data sets at low infrastructure costs, while being able to rapidly scale the application infrastructure on demand.
Using Cascading deployed on Amazon Elastic MapReduce, Ion Flux created a scalable gene sequencing and realignment algorithm to analyze massive amounts of data. Ion Flux is able to readily hire Java programmers instead of MapReduce experts and focus on genomic problems rather than infrastructure technicalities.
Telenav has built a sophisticated machine learning framework using Cascading. Complex analytics algorithms were broken down into a collection of basic data flows and integrations. Developers build upon these reusable process flows to quickly develop new applications.
Cascading is the core component of Trulia’s data processing pipeline for data cleansing, entity resolution, and metadata extraction. This helps Trulia provide higher quality content with the newest listings and the best deals in their customers’ markets.
Cascading provides data scientists at The Climate Corporation a solid foundation to develop advanced machine learning applications in Cascalog that get deployed directly onto Amazon EMR clusters consisting of 2000+ cores. This results in significantly improved productivity with lower operating costs.
Etsy runs over 50 Cascading applications daily to study customer behavior and product sales. Programming in JRuby, Etsy can quickly test and create new applications on its e-commerce site that help it acquire new customers and sell more products.
Airbnb uses Cascading because it provides developers more control when conducting advanced data analysis workflows, data normalization, and cleansing. With Cascading, applications are easier to test and developers are more confident that applications will work.
Twitter has invested heavily in making Cascading a key component of their data analytics infrastructure. Cascading enables Twitter engineers to easily create complex data processing workflows in their favorite programming languages, while providing the scalability to seamlessly handle terabytes and petabytes of data.