Data Integration with Cloud Data Fusion
Discover the power of Cloud Data Fusion! Learn about its components and capabilities, and how it streamlines data integration and management across diverse sources and formats.
2 Days

Target Audience
Who should attend?
Data engineers and data analysts looking to streamline data integration and management on Google Cloud
What you'll learn
Cloud Data Fusion Overview:
Explore its features, components, and use cases for data integration.
Pipeline Design & Execution:
Build and monitor batch and real-time pipelines, and transform data with Wrangler.
Integration & Governance:
Seamlessly integrate diverse data sources while ensuring metadata and lineage traceability.

Prerequisites for Success
Completion of Google Cloud Fundamentals: Big Data and Machine Learning or equivalent knowledge in cloud data and machine learning is recommended

COURSE AGENDA
Introduction
- Course Objectives: Understand the goals and expected outcomes of the course.
Introduction to Data Integration & Cloud Data Fusion
- Understand the importance of data integration, the challenges it addresses, and the roles involved. Explore industry tools and discover Cloud Data Fusion’s capabilities as an effective integration platform. Familiarize yourself with its UI components for building and managing pipelines.
Building Pipelines
- Cloud Data Fusion Architecture: Understand the architecture behind Cloud Data Fusion and how it supports scalable data integration.
- Core Concepts: Learn the core concepts that drive data integration in Cloud Data Fusion.
- Data Pipelines & DAGs: Understand data pipelines and the directed acyclic graphs (DAGs) used to build efficient workflows.
- Pipeline Lifecycle: Learn about the stages of a pipeline’s lifecycle, from design to execution.
- Designing Pipelines: Use Pipeline Studio to design and build data pipelines in Cloud Data Fusion.
Designing Complex Pipelines
- Branching, Merging & Joining: Learn how to branch, merge, and join different components of a data pipeline.
- Actions & Notifications: Set up actions and notifications to automate tasks and alert users on certain events.
- Error Handling & Macros: Implement error handling strategies and use macros to enhance pipeline flexibility.
- Pipeline Configurations: Explore pipeline configurations, including scheduling and importing and exporting pipelines.
Pipeline Execution Environment
- Schedules & Triggers: Learn to set up schedules and triggers to automate pipeline execution.
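As a brief illustration of the scheduling topic above: Cloud Data Fusion pipeline schedules are typically expressed in standard cron syntax. A sketch of a schedule that runs a pipeline every day at 02:00 might look like this (the exact field set depends on the schedule configuration used):

```
# minute hour day-of-month month day-of-week
0 2 * * *    # run the pipeline daily at 02:00
```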
- Execution Environment: Understand the components of the execution environment, including compute profiles and the clusters on which pipelines run.
Building Transformations & Preparing Data with Wrangler
- Wrangler: Understand how Wrangler helps in transforming and preparing data for downstream processing.
- Directives: Learn to use directives in Wrangler for data transformations.
- User-Defined Directives: Create custom directives to meet specific data transformation needs.
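To give a flavor of the directives covered above: a Wrangler transformation is a recipe of one-line directives applied in order. The sketch below is a hypothetical recipe (the column names `body`, `body_1`, and `customer_id` are illustrative assumptions, not from the course material):

```
parse-as-csv :body ','        // split the raw CSV line into columns
drop :body                    // remove the original raw column
rename :body_1 :customer_id   // give the first parsed column a meaningful name
set-type :customer_id integer // cast the column to an integer type
```

Each directive transforms the dataset in place, and the accumulated recipe can be applied to a full pipeline once it has been prototyped interactively on sample data.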
Connectors & Streaming Pipelines
- Data Integration Architecture: Understand the overall data integration architecture of Cloud Data Fusion.
- Connectors: Explore the various connectors available for integrating data from different sources.
- Cloud DLP API: Learn to use the Cloud Data Loss Prevention (DLP) API to protect sensitive data.
- Streaming Pipelines: Understand the reference architecture for streaming pipelines and how to build and execute them in Cloud Data Fusion.