Key Takeaways

Razorfish chose Cascading running on Amazon EMR, rather than a traditional data management system, to deploy their targeted advertising system for e-commerce customers. This let Razorfish handle extremely large data sets at low infrastructure cost while rapidly scaling the application infrastructure on demand.


To handle huge data sets and custom segmentation targeting for price-sensitive clients, Razorfish decided to move away from their rigid data infrastructure. The migration let Razorfish process vast amounts of data and scale rapidly at both the application and infrastructure levels. Razorfish selected ad serving integration, Amazon Web Services (AWS), Amazon Elastic MapReduce (a hosted Apache Hadoop service), Cascading, and a variety of other applications to power their targeted advertising system, based on these benefits:


Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays. Amazon Elastic MapReduce and Cascading let Razorfish focus on application development without having to worry about time-consuming setup, management, or tuning of Hadoop clusters or the compute capacity on which they run.

Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms.

Flexible: Hadoop with Cascading is flexible enough to allow “agile” implementation and unit testing of sophisticated algorithms.

Adaptable: Cascading simplifies the integration of Hadoop with external ad systems.

Scalable: AWS infrastructure helps Razorfish reliably store and process petabyte-scale data sets.

The AWS elastic infrastructure platform allows Razorfish to manage wide variability in load by provisioning and removing capacity as needed. Mark Taylor, Program Director at Razorfish, said, “With our implementation of Amazon Elastic MapReduce and Cascading, there was no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired. We completed development and testing of our first client project in six weeks. Our process is completely automated. Total cost of the infrastructure averages around $13,000 per month. Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.”