Data Integration with Cloud Data Fusion
Discover the power of Cloud Data Fusion! Learn about its components and capabilities, and how it streamlines data integration and management across diverse sources and formats.
2 Days

Target Audience
Who should attend?
Data engineers and data analysts looking to streamline data integration and management on Google Cloud
What you'll learn
Cloud Data Fusion Overview:
Explore its features, components, and use cases for data integration.
Pipeline Design & Execution:
Build and monitor batch and real-time pipelines, and transform data with Wrangler.
Integration & Governance:
Seamlessly integrate diverse data sources while ensuring metadata and lineage traceability.

Prerequisites for Success
Completion of Google Cloud Fundamentals: Big Data and Machine Learning or equivalent knowledge in cloud data and machine learning is recommended

COURSE AGENDA
Introduction
- Course Objectives: Understand the goals and expected outcomes of the course.
Introduction to Data Integration & Cloud Data Fusion
- Understand the importance of data integration, the challenges it addresses, and the roles involved. Explore industry tools and discover Cloud Data Fusion’s capabilities as an effective integration platform. Familiarize yourself with its UI components for building and managing pipelines.
Building Pipelines
- Cloud Data Fusion Architecture: Understand the architecture behind Cloud Data Fusion and how it supports scalable data integration.
- Core Concepts: Learn the core concepts that drive data integration in Cloud Data Fusion.
- Data Pipelines & DAGs: Understand data pipelines and the directed acyclic graphs (DAGs) used to build efficient workflows.
- Pipeline Lifecycle: Learn about the stages of a pipeline’s lifecycle, from design to execution.
- Designing Pipelines: Use Pipeline Studio to design and build data pipelines in Cloud Data Fusion.
Designing Complex Pipelines
- Branching, Merging & Joining: Learn how to branch, merge, and join different components of a data pipeline.
- Actions & Notifications: Set up actions and notifications to automate tasks and alert users on certain events.
- Error Handling & Macros: Implement error handling strategies and use macros to enhance pipeline flexibility.
- Pipeline Configurations: Explore pipeline configurations, including scheduling and importing and exporting pipelines.
Pipeline Execution Environment
- Schedules & Triggers: Learn to set up schedules and triggers to automate pipeline execution.
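As a brief illustration of the scheduling topic above: Cloud Data Fusion pipeline schedules are typically expressed in standard cron syntax. A sketch of a schedule that runs a pipeline every day at 02:00 might look like this (the exact field set depends on the schedule configuration used):

```
# minute hour day-of-month month day-of-week
0 2 * * *    # run the pipeline daily at 02:00
```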
- Execution Environment: Understand the components of the execution environment, including compute profiles and the clusters on which pipelines run.
Building Transformations & Preparing Data with Wrangler
- Wrangler: Understand how Wrangler helps in transforming and preparing data for downstream processing.
- Directives: Learn to use directives in Wrangler for data transformations.
- User-Defined Directives: Create custom directives to meet specific data transformation needs.
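To give a flavor of the directives covered above: a Wrangler transformation is a recipe of one-line directives applied in order. The sketch below is a hypothetical recipe (the column names `body`, `body_1`, and `customer_id` are illustrative assumptions, not from the course material):

```
parse-as-csv :body ','        // split the raw CSV line into columns
drop :body                    // remove the original raw column
rename :body_1 :customer_id   // give the first parsed column a meaningful name
set-type :customer_id integer // cast the column to an integer type
```

Each directive transforms the dataset in place, and the accumulated recipe can be applied to a full pipeline once it has been prototyped interactively on sample data.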
Connectors & Streaming Pipelines
- Data Integration Architecture: Understand the overall data integration architecture of Cloud Data Fusion.
- Connectors: Explore the various connectors available for integrating data from different sources.
- Cloud DLP API: Learn to use the Cloud Data Loss Prevention (DLP) API to protect sensitive data.
- Streaming Pipelines: Understand the reference architecture for streaming pipelines and how to build and execute them in Cloud Data Fusion.