Course not currently available
CPB101: Serverless Data Analysis with BigQuery and Cloud Dataflow Training Course
Type: Course
Methodology: Online
Description
This 8-hour, instructor-led course builds on CPB100, which is a prerequisite. Through a combination of instructor-led presentations, demonstrations, and hands-on labs, students learn how to carry out no-ops data warehousing, analysis, and pipeline processing.
This class is intended for data analysts and data scientists responsible for: analyzing and visualizing big data, implementing cloud-based big data solutions, deploying or migrating big data applications to the public cloud, implementing and maintaining large-scale data storage environments, and transforming/processing big data.
Objectives
Build up a complex BigQuery query using clauses, inner selects, built-in functions and joins
Load and export data to/from BigQuery
Identify need for nested, repeated fields and user-defined functions
Understand pipeline processing, terms and concepts
Write pipelines in Java or Python and launch them locally or in the Cloud
Implement Map and Reduce transforms in Dataflow pipelines
Join datasets as side inputs
Interoperate Dataflow, BigQuery and Cloud Pub/Sub for real-time streaming
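As a rough illustration of two of the objectives above (Map and Reduce transforms, and joining a dataset as a side input), the sketch below uses plain Python only. It does not use the Dataflow SDK, and the sample records, the `owners` lookup table, and the function names are all hypothetical, chosen just to show the shape of the idea.

```python
from collections import defaultdict

# Hypothetical sample data: (package, source line) pairs.
records = [
    ("pkg.a", "import java.util.List"),
    ("pkg.a", "import java.util.Map"),
    ("pkg.b", "import java.util.List"),
]

# Hypothetical small lookup table, playing the role of a side input:
# it is passed whole to every call of the map step.
owners = {"pkg.a": "team-1", "pkg.b": "team-2"}


def map_step(record, side_input):
    """Map: turn one record into a (key, 1) pair, enriched from the side input."""
    package, _line = record
    return (side_input.get(package, "unknown"), 1)


def reduce_step(pairs):
    """Reduce: group the pairs by key and sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(records, side_input):
    """Apply the map step to every record, then reduce the results."""
    return reduce_step(map_step(r, side_input) for r in records)


if __name__ == "__main__":
    print(run(records, owners))
```

In a real pipeline the map step would typically run in parallel across workers and the grouping would be handled by the service; this sketch only mirrors the logical steps.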
About this course
Knowledge of Google Cloud Platform Big Data & Machine Learning Fundamentals to the level of CPB100
Experience using a SQL-like query language to analyze data
Knowledge of either Python or Java
Subjects
- Java
- Data warehousing
- Data analysis
- Warehousing
- Public
- Export
- Logistics
- Database
- IT
- IT Development
Course programme
The basic thrust is to cover the foundations in Module 2, workloads they could migrate to GCP immediately (i.e., lift-and-shift) in Module 3, and the more transformational things (i.e., what’s next) in Module 4.
Module 0: Welcome [⅓ hr]
We assume that attendees may have attended CPB100.
- Logistics
- Introductions
A 3-hour (1.5 hours lecture + 1.5 hours hands-on) deep dive into the details of BigQuery.
- What is BigQuery?
- Queries and functions + lab
- Load and export data + lab
- Advanced Capabilities
- Performance and pricing
A 3-hour (1.5 hours lecture + 1.5 hours hands-on) deep dive into the details of Cloud Dataflow.
- What is Dataflow?
- Data pipeline + lab
- MapReduce in Dataflow + lab
- Side inputs + lab
- Streaming + demo
- Where to go from here
- Resources
