Cloud Engineering
Data Engineering on Google Cloud
Gain practical expertise in implementing core machine learning processes on Google Cloud in this comprehensive four-day course. Learn to design and build efficient data processing systems that power cutting-edge ML solutions.
4 Days

Target Audience
Perfect for developers who:
Manage ETL processes to ensure clean, validated data.
Design robust data pipelines and architectures.
Integrate analytics and ML into workflows seamlessly.
Query datasets, visualize insights, and deliver impactful reports.
What you'll learn
- Data Processing System Design
  - Learn to design and build robust data processing systems on Google Cloud tailored to meet diverse business needs.
- Batch and Streaming Data Processing
  - Implement autoscaling data pipelines using Dataflow to handle both batch and streaming data efficiently.
- Unstructured Data and Machine Learning
  - Work with unstructured data using Spark and ML APIs on Dataproc to enhance data processing and analysis.
- Real-Time Data Insights
  - Enable instant, actionable insights from streaming data to support time-sensitive decision-making.

Prerequisites for Success
Participants should have completed Google Cloud Fundamentals: Big Data & Machine Learning or have equivalent experience. Additional prerequisites include:
Basic SQL proficiency.
Experience with data modeling and ETL.
Application development skills in Python.
Familiarity with machine learning or statistics.

COURSE AGENDA
Introduction to Data Engineering
- Explore data engineering challenges and solutions.
- Learn about data lakes, data warehouses, and transactional databases.
- Understand data governance and access management.
- Build production-ready data pipelines.
- Lab: Perform data analysis using BigQuery.
Building a Data Lake
- Understand the role and structure of data lakes.
- Learn storage and ETL options on Google Cloud.
- Build and secure a data lake with Cloud Storage.
- Explore relational data lakes with Cloud SQL.
Building a Data Warehouse
- Dive into modern data warehouse concepts.
- Get started with BigQuery: loading data and exploring schemas.
- Optimize schemas with partitioning and clustering.
- Labs: Load data into BigQuery and work with JSON and array data.
Introduction to Building Batch Data Pipelines
- Learn the differences between EL, ELT, and ETL processes.
- Address data quality considerations in pipelines.
- Use ETL to resolve data quality issues.
Executing Spark on Dataproc
- Explore the Hadoop ecosystem and how it integrates with Google Cloud.
- Run optimized Apache Spark jobs on Dataproc.
- Lab: Execute Spark jobs using Dataproc.
Serverless Data Processing with Dataflow
- Understand why customers value Dataflow for real-time and batch processing.
- Learn Dataflow pipelines, including aggregation, side inputs, and windowing.
- Labs: Build Dataflow pipelines with Python/Java, including MapReduce and side inputs.
Manage Data Pipelines with Cloud Data Fusion & Cloud Composer
- Build batch data pipelines visually with Cloud Data Fusion.
- Use Cloud Composer (Apache Airflow) to orchestrate workflows.
- Labs: Build pipelines with Data Fusion and explore workflow orchestration with Composer.
Production ML Pipelines
- Explore ML workflows on Google Cloud using Vertex AI Pipelines and AI Hub.
- Lab: Run production-ready ML pipelines on Vertex AI.
Custom Model Building with AutoML
- Explore the power of AutoML for building vision, NLP, and tabular models.
- Understand how AutoML simplifies the machine learning process.