If your team is new to Cascading, the fastest way to learn it is the Cascading Quick Start training program.
The Quick Start program is a two-day training event that teaches you to apply Cascading and related tools to develop ETL and other data applications. Besides covering the core fundamentals of Cascading, the course teaches how to use Cascading to operationalize Hadoop applications. Students also learn and apply best practices for developing Cascading applications that deploy successfully the first time.
This is a hands-on class, with 50% of the time spent in labs that show how to quickly build and deploy Cascading applications on Hadoop. Learn best practices for developing applications that are scale-free, reusable, testable, and qualified for stringent enterprise operational needs. The class gives enterprise Java developers the skills and examples to build data applications on Hadoop, from ETL to predictive algorithms.
The primary audience is Java developers and architects working in enterprise companies. The course is suitable for Java developers of all levels.
Prerequisites:
- Enterprise Java development experience
- Before class, students are expected to have:
  - Set up Java SDK 1.6+ and Gradle (version 1.x)
  - Set up VirtualBox and Vagrant so that a 1-node Hadoop cluster can be installed
  - Installed Eclipse or IntelliJ as an IDE
- Bring your own laptop that meets these minimum specifications:
  - 4 GB RAM minimum, 8 GB RAM preferred
  - 5 GB of disk space
  - Ability to connect to a public Maven repository
Topics covered:
- ETL processes to clean and prepare data
- Processing data with filters, functions, aggregators and buffers
- Creating custom filters, functions, aggregators and buffers
- Merging and splitting data to optimize processing
- Applying different types of joins depending on data characteristics
- Best practices for using Cascading
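
To give a feel for the topics above, here is a minimal plain-Java sketch of the ideas behind them: a filter that drops malformed records, a function that cleans each record, an aggregator that counts per group, and a join against a lookup table. It is a conceptual analogy only and deliberately does not use the Cascading API. The class name `EtlSketch`, its method names, and the sample "status path" record format are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: plain Java, NOT the Cascading API.
// All names and the "status path" record format are hypothetical.
class EtlSketch {

    // Filter idea: return true for records that should be dropped.
    static boolean isMalformed(String line) {
        return line == null || line.trim().split("\\s+").length != 2;
    }

    // Function idea: turn one raw record into a cleaned {status, path} pair.
    static String[] parse(String line) {
        String[] parts = line.trim().split("\\s+");
        return new String[] { parts[0], parts[1].toLowerCase() };
    }

    // Aggregator idea: group the cleaned records by status and count each group.
    static Map<String, Long> countByStatus(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (isMalformed(line)) {
                continue; // the filter removes this record from the stream
            }
            String[] record = parse(line);
            counts.merge(record[0], 1L, Long::sum);
        }
        return counts;
    }

    // Join idea: attach a label to each status.
    // keepUnmatched = false behaves like an inner join (unlabeled statuses are dropped);
    // keepUnmatched = true behaves like a left join (the label is null when missing).
    static List<String> joinLabels(Map<String, Long> counts,
                                   Map<String, String> labels,
                                   boolean keepUnmatched) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            String label = labels.get(entry.getKey());
            if (label == null && !keepUnmatched) {
                continue;
            }
            rows.add(entry.getKey() + "," + label + "," + entry.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("200 /Home", "  200 /about ", "bad", "404 /missing", "");
        Map<String, String> labels = new HashMap<>();
        labels.put("200", "OK");

        Map<String, Long> counts = countByStatus(raw);
        System.out.println(counts);
        System.out.println(joinLabels(counts, labels, false));
        System.out.println(joinLabels(counts, labels, true));
    }
}
```

Which join to use depends on the data: the inner variant is appropriate when unmatched records are noise, while the left variant keeps them so that gaps in the lookup table stay visible.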