TeleNav has built a sophisticated machine learning framework using Cascading. Complex analytics algorithms were broken down into a collection of basic data flows and integrations. Developers build upon these reusable process flows to quickly develop new applications.
To meet these goals, TeleNav chose Cascading, an open source data processing, integration and process scheduling API for creating Enterprise-grade data and analytics applications on Apache Hadoop. Cascading solved all of the immediate manageability issues with MapReduce jobs and parallelized processing so complex, compute-intensive jobs finish in a reasonable amount of time. In contrast to other options, Cascading also had better integration with Java, automatic process flow optimization and the ability to easily transfer data to traditional databases and HBase.
TeleNav used Cascading to develop all the basic flows in its machine learning framework and to integrate with a variety of data sources. Cascading ‘Pipes’ and ‘Taps’ [1] are the components of its data processing flows, and Cascading’s ‘Tuple’ [2] concept is used extensively for data handling. The developers also implemented optimized binary files to store interim data, which can then be easily transferred or accessed by multiple flows.
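The relationship between these three concepts can be sketched in plain Java. This is a minimal illustration only and deliberately does not use the Cascading API: the class TupleFlowSketch, its methods, and the field names "user", "trips" and "frequent" are all hypothetical, and the sample rows are invented. It models a tuple as one record of named values, a tap as a source of tuples, and a pipe as a reusable processing step applied to each tuple.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Conceptual sketch only: NOT the Cascading API. All names here are hypothetical.
public class TupleFlowSketch {

    // A "tuple" modeled as one record: an ordered map of field name -> value.
    static Map<String, Object> tuple(String user, int trips) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("user", user);
        t.put("trips", trips);
        return t;
    }

    // A "tap" modeled as a connection to a data source; here an in-memory list
    // of invented sample rows stands in for a file or database.
    static List<Map<String, Object>> sourceTap() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(tuple("u1", 2));
        rows.add(tuple("u2", 9));
        rows.add(tuple("u3", 5));
        return rows;
    }

    // A "pipe" modeled as a reusable step that maps each input tuple to an
    // output tuple. This one adds a boolean "frequent" field.
    static Function<Map<String, Object>, Map<String, Object>> addFrequentFlag(int threshold) {
        return in -> {
            Map<String, Object> out = new LinkedHashMap<>(in); // copy, leave input untouched
            out.put("frequent", (Integer) in.get("trips") >= threshold);
            return out;
        };
    }

    // Runs a flow: read every tuple from the source, apply the pipe, collect the results.
    static List<Map<String, Object>> run(
            List<Map<String, Object>> source,
            Function<Map<String, Object>, Map<String, Object>> pipe) {
        List<Map<String, Object>> sink = new ArrayList<>();
        for (Map<String, Object> t : source) {
            sink.add(pipe.apply(t));
        }
        return sink;
    }

    public static void main(String[] args) {
        for (Map<String, Object> t : run(sourceTap(), addFrequentFlag(5))) {
            System.out.println(t);
        }
    }
}
```

Because the step returned by addFrequentFlag does not depend on where its tuples come from, the same step could be reused with a different source, which is the kind of reuse the framework described here relies on.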
This Cascading processing model allowed TeleNav to create sophisticated, reusable machine learning frameworks that are independent of the integration complexities that normally bottleneck project development. The result is a flexible, scalable system capable of supporting business needs as they arise.
“Cascading was used to create all the basic flows and data integrations in our machine learning framework,” noted Pramod Narasimha, Senior Software Engineer at TeleNav. “We can now build on top of these reusable process flows easily and quickly. Cascading manages and optimizes the MapReduce jobs automatically.”
Using Cascading, TeleNav has built a flexible, sophisticated machine learning framework that will be the basis for future development.
[1] A ‘Pipe’ is a data processing flow; a ‘Tap’ is a connection to a data source.
[2] A ‘Tuple’ is comparable to a database record, where each value corresponds to a column of the table.