Airbnb uses Cascading because it gives developers more control over advanced data analysis workflows, data normalization, and cleansing. With Cascading, applications are easier to test, and developers are more confident that they will work as intended.
Airbnb chose Cascading for its ETL processes. Cascading gives the team more control over the underlying MapReduce jobs and allows developers to write custom code easily and in a single step. The development team found the Cascading API very easy to use, and the company already has dozens of tasks that run on it.
The analytics system includes both automated and manual functions. First, complicated infrastructure tasks, including data normalization and cleansing, are done by applications written using Cascading. Cascading is also used to reconstruct corrupted files and to combine multiple data files into one. Alongside Cascading, analysts use Pig and Hive to run batch scripts for ad hoc analysis. With these tools, analysts can study data important to the business, such as click-through rates, page statistics, drop-off rates, and number of bookings. The analysts also write queries for other indicators of interest, for example to compare regional performance and to identify potential problems with the site.
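As an illustration of the kind of work described above, the following is a minimal sketch in plain Python, not the Cascading API (which is a Java library) and not Airbnb's actual code. It normalizes and cleanses event records, combines several inputs into one, and computes a click-through rate per page. The record layout (`page`, `event`) and all function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict

VALID_EVENTS = {"impression", "click"}  # assumed event vocabulary


def normalize(record):
    """Trim whitespace and lowercase the page and event fields."""
    return {
        "page": (record.get("page") or "").strip().lower(),
        "event": (record.get("event") or "").strip().lower(),
    }


def is_clean(record):
    """Keep only records that name a page and carry a recognized event."""
    return bool(record["page"]) and record["event"] in VALID_EVENTS


def merge_sources(*sources):
    """Combine several record collections into one normalized, cleansed list."""
    merged = []
    for source in sources:
        for raw in source:
            record = normalize(raw)
            if is_clean(record):
                merged.append(record)
    return merged


def click_through_rates(records):
    """Return clicks divided by impressions for every page with impressions."""
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for record in records:
        counts[record["page"]][record["event"]] += 1
    return {
        page: c["click"] / c["impression"]
        for page, c in counts.items()
        if c["impression"] > 0
    }


# Two hypothetical input "files", each a list of raw records.
file_a = [
    {"page": " /Home ", "event": "Impression"},
    {"page": "/home", "event": "click"},
    {"page": "", "event": "click"},  # dropped: no page
]
file_b = [
    {"page": "/home", "event": "impression"},
    {"page": "/search", "event": "impression"},
    {"page": "/search", "event": "scroll"},  # dropped: unknown event
]

merged = merge_sources(file_a, file_b)
rates = click_through_rates(merged)
```

In a real deployment each stage would run as a distributed job over files in Hadoop; the sketch only shows how the stages compose.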
Airbnb uses Cascading for infrastructure work and for any functions that are more complex. Because Cascading is an API rather than a language, custom code can be written in a single step, without compiling or the need for a language-specific wrapper. Cascading also provides much more control over the underlying MapReduce jobs and over the analysis of user flows on the website. In development, Cascading simplifies the testing process for custom code and gives Airbnb more confidence that the application will work as intended when deployed. Because of these advantages, the Airbnb team is much more productive writing custom applications in Cascading and is able to react more quickly to business requirements.
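To make the point about testable custom code concrete, here is a hedged sketch of a user-flow calculation written as an ordinary function that can be checked on a small in-memory sample before it runs on real data. It is plain Python rather than Cascading code, and the funnel steps, session layout, and the name `funnel_drop_off` are assumptions made for this example only.

```python
def funnel_drop_off(sessions, steps):
    """Return the drop-off rate between each pair of consecutive funnel steps.

    A session counts as reaching step i only if it contains
    steps[0] through steps[i] in that order.
    """
    reached = [0] * len(steps)
    for events in sessions:
        position = 0  # index of the next funnel step this session must hit
        for event in events:
            if position < len(steps) and event == steps[position]:
                reached[position] += 1
                position += 1

    rates = {}
    for i in range(len(steps) - 1):
        if reached[i] > 0:  # avoid dividing by zero for unreached steps
            rates[(steps[i], steps[i + 1])] = 1 - reached[i + 1] / reached[i]
    return rates


# Hypothetical sample: four sessions moving through a three-step funnel.
steps = ["search", "listing", "booking"]
sessions = [
    ["search", "listing", "booking"],
    ["search", "listing"],
    ["search"],
    ["listing"],  # never entered the funnel at "search"
]

drop_off = funnel_drop_off(sessions, steps)
```

Because the logic lives in a single function with no dependency on the cluster, a handful of assertions against a sample like this can run in a normal unit-test suite, which is the kind of confidence before deployment that the paragraph above describes.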
Airbnb has been growing rapidly: more than 5 million guest nights have been booked since the company’s founding in 2008, over 4 million of them in the last 12 months alone. As Florian Leibert of Airbnb notes, “things happen fast here and we have to make changes on the fly. We’ll definitely be using Cascading for more projects. We’ve only just gotten started taking advantage of all that it can do for us.”