Course Outline
Introduction
Overview of Data Access Approaches (Hive, databases, etc.)
Overview of Spark Features and Architecture
Installing and Configuring Spark
Understanding DataFrames in Spark
Defining Tables and Importing Datasets
Querying DataFrames using SQL
Carrying out Aggregations, JOINs and Nested Queries
Uploading and Accessing Data
Querying Different Types of Data
- JSON, Parquet, etc.
Querying Data Lakes with SQL
Troubleshooting
Summary and Conclusion
Requirements
- Experience with SQL queries
- Programming experience in any language
Audience
- Data analysts
- Data scientists
- Data engineers
Testimonials (5)
A lot of practical examples, different ways to approach the same problem, and sometimes not-so-obvious tricks for improving the current solution.
Rafal - Nordea
Course - Apache Spark MLlib
Sufficient hands-on practice; the trainer is knowledgeable.
Chris Tan
Course - A Practical Introduction to Stream Processing
Practice tasks
Pawel Kozikowski - GE Medical Systems Polska Sp. Zoo
Course - Python and Spark for Big Data (PySpark)
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
This is one of the best hands-on programming courses with exercises I have ever taken.