Scalding is an extension to Cascading that enables application development in Scala, a powerful language well suited to functional programming. Made popular by Twitter, it is now used at leading Internet sites and enterprises to build data applications such as ETL pipelines, recommendation engines, and other predictive applications.
This class will make you a productive Scalding developer in one day. You will learn how to write Hadoop applications in Scala and run them both in a fast local mode, which is convenient for development and testing, and on a Hadoop cluster. The rich Scalding API provides a wealth of functions for manipulating data, which we cover with many examples. We also introduce Test Driven Development practices for Scalding and the external data stores API, which gives access to data in various relational and NoSQL stores. Next comes the Matrix API, useful for advanced Big Data problems. We conclude with a summary of best practices for Scalding.
The course has numerous examples and exercises, so expect to do a lot of programming! Hands-on exercises make up about 50% of the course.
The primary audience is developers who have a working knowledge of Scala and who want the ability to build data applications on Hadoop.
- Working knowledge of Scala basics, particularly Scala collections (experience with Cascading is not necessary)
- Familiarity with key Big Data concepts such as MapReduce and Hadoop (no hands-on experience required)
- Before class, students are expected to:
  - Set up Java SDK 1.6 or 1.7, Scala, and SBT
  - Install Scala IDE for Eclipse or IntelliJ with the Scala plugin (only if they want to use an IDE)
- Bring your own laptop with the minimum specifications:
  - 4 GB RAM minimum, 8 GB RAM preferred
  - 5 GB disk space
- Scala and its key features for data development
- Scalding processing model and APIs
- Taps, pipes, sinks, and more!
- Scalding: A Scala DSL for Cascading
- Executing in local mode and on Hadoop
- Working with Matrix API
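To give a flavor of the taps, pipes, and sinks listed above, here is a minimal sketch of a word-count job in the fields-based Scalding DSL. It assumes the Scalding library is on the classpath; the `Tokenizer` helper and the `input`/`output` argument names are illustrative choices, not part of the course material.

```scala
import com.twitter.scalding._

// Hypothetical helper: lower-cases a line and splits it on whitespace.
object Tokenizer {
  def tokenize(line: String): List[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toList
}

// A job reads from a source tap, transforms a pipe, and writes to a sink tap.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                       // source: one 'line field per text line
    .flatMap('line -> 'word) { line: String => Tokenizer.tokenize(line) }
    .groupBy('word) { _.size }                  // count occurrences of each word
    .write(Tsv(args("output")))                 // sink: tab-separated (word, size)
}
```

A job like this is typically launched through `com.twitter.scalding.Tool`, passing `--local` to run in local mode or `--hdfs` to run on a Hadoop cluster, followed by the `--input` and `--output` paths. The job code itself does not change between the two modes.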
Scalding and External Data Stores
- Working with SQL stores
- Working with NoSQL stores
Developing with Scalding
- Developing applications with Scalding
- Map-like operations
- Join operations
- Pipe operations
- Grouping and reducing
- Group operations
- Working with Scalding type-safe API
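The map-like, grouping, and type-safe API topics above can be combined in one small sketch. It assumes a Scalding version that provides `TypedPipe` and `TypedTsv`, and an input file of `user,amount` lines; the `Purchases` helper and the record layout are hypothetical, chosen only for illustration.

```scala
import com.twitter.scalding._
import scala.util.Try

// Hypothetical parser for lines of the form "user,amount".
object Purchases {
  def parse(line: String): Option[(String, Double)] =
    line.split(",").map(_.trim) match {
      case Array(user, amount) if user.nonEmpty =>
        Try(amount.toDouble).toOption.map(a => (user, a))
      case _ => None // skip malformed lines
    }
}

// Sums the amount spent per user with the type-safe API.
class SpendPerUserJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))
    .flatMap { line => Purchases.parse(line).toList } // map-like step: parse and filter
    .group                                            // key by user
    .sum                                              // reduce: add up amounts per key
    .toTypedPipe
    .write(TypedTsv[(String, Double)](args("output")))
}
```

Unlike the fields-based DSL, every step here carries a Scala type, so a mismatch between the parsed tuple and the sink type is reported by the compiler instead of failing at run time.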
Scalding Patterns and Best Practices
- Scalding Patterns
- Test Driven Development with Scalding
- Scalding Best Practices
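As a rough idea of what Test Driven Development with Scalding can look like, the sketch below uses Scalding's `JobTest` helper to run a small job against in-memory data, with no Hadoop cluster involved. The job, the file names `inputFile` and `outputFile`, and the sample lines are all made up for this example, and plain `assert` calls stand in for whichever test framework the course uses.

```scala
import com.twitter.scalding._

// Job under test: counts occurrences of each whitespace-separated word.
class CountWordsJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.toLowerCase.split("\\s+") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}

object CountWordsCheck {
  def main(unused: Array[String]): Unit = {
    JobTest(new CountWordsJob(_))
      .arg("input", "inputFile")
      .arg("output", "outputFile")
      // Fake source: (offset, line) pairs stand in for the text file.
      .source(TextLine("inputFile"), List((0, "hello world"), (1, "hello scalding")))
      // Inspect what the job would have written to the sink.
      .sink[(String, Int)](Tsv("outputFile")) { buffer =>
        val counts = buffer.toMap
        assert(counts("hello") == 2)
        assert(counts("world") == 1)
        assert(counts("scalding") == 1)
      }
      .run
      .finish
  }
}
```

Because sources and sinks are swapped for in-memory buffers, a test like this can be written before the job logic and rerun quickly during development.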