Data Warehousing & ETL/ELT with Modern Tools (e.g., Snowflake, Databricks)
Build scalable cloud data warehouses with tools like Snowflake, BigQuery, and Databricks. Learn ELT/ETL workflows using dbt, Airflow, and PySpark for analytics and BI. Ideal for data engineers building end-to-end pipelines.
Duration: 12
Lectures: 50
Category: Data Engineering & Big Data
Language: English & Japanese
$1,500.00
Data Warehousing & ETL/ELT with Modern Tools is a comprehensive, hands‑on course that teaches professionals to design, implement, and operate modern cloud‑native data‑warehouse ecosystems. It begins by tracing the evolution of data warehouses from costly on‑premise appliances to elastic platforms such as Snowflake, Databricks, and Google BigQuery, explaining how decoupled storage‑and‑compute architectures, micro‑partition pruning, and pay‑per‑second billing transformed the economics of analytics.

Learners master dimensional modeling concepts, including star and snowflake schemas, fact and dimension tables, surrogate keys, and slowly changing dimensions (Types 1, 2, and 3), so that data structures are optimized for OLAP workloads and BI tools such as Tableau, Power BI, and Looker. The curriculum contrasts OLTP and OLAP query patterns, clarifying why columnar storage and clustering keys accelerate large aggregations.

Students then shift from classic ETL pipelines to the modern ELT paradigm, leveraging the warehouse's own compute for set‑based transformations. Practical labs employ Fivetran, Talend, and Debezium to ingest batch, micro‑batch, and change‑data‑capture streams, landing raw records in a bronze layer, refining them into silver, and publishing curated gold marts. Using dbt, Apache Airflow, and Prefect, learners build dependency‑aware DAGs with testing, documentation, lineage, retries, and Slack alerts. Delta Lake on Databricks, together with Snowflake's time‑travel, zero‑copy cloning, and secure data sharing features, demonstrates reproducible pipelines and governed collaboration.

Optimization modules teach automatic clustering, materialized views, secondary indexes, result caching, partition pruning, and cost‑aware warehouse sizing with auto‑suspend policies. Students profile queries, interpret execution plans, and tune Spark jobs to reduce shuffle and spill.

Security lessons guide learners through role‑based access control, row‑level policies, dynamic masking, column‑level encryption, and audit logging, while governance topics integrate AWS Glue, Azure Purview, and Alation for cataloging, lineage visualization, and policy enforcement. Data‑quality frameworks such as Great Expectations and Monte Carlo introduce freshness SLAs, anomaly detection, and data‑health dashboards. Learners implement version control and CI/CD for data models and pipeline code with GitHub Actions or GitLab CI, plus Terraform scripts for repeatable infrastructure.

Real‑world use cases, including customer 360 analytics, IoT telemetry, financial regulatory reporting, and machine‑learning feature stores, illustrate design trade‑offs around latency, concurrency, retention, and compliance. Disaster‑recovery labs configure cross‑region replication, point‑in‑time restore, and failover testing, while cost‑management exercises analyze storage lifecycle policies, warehouse credit consumption, and rightsizing strategies.

By course completion, participants will have architected and deployed a production‑ready cloud data warehouse with automated ingestion, scalable ELT transformations, governed access, robust observability, and cost‑optimized performance. They graduate ready to excel as data engineers, analytics engineers, or platform architects who can deliver trusted, self‑service analytics and machine‑learning readiness across any industry.
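The short sketches below illustrate the style of hands‑on work covered in the labs; every table name, path, credential, and schedule in them is a placeholder rather than official course material. First, a minimal Type 2 slowly‑changing‑dimension update expressed as Delta Lake SQL run from PySpark, assuming hypothetical dim_customer and stg_customer tables with valid_from, valid_to, and is_current tracking columns (surrogate‑key generation is omitted for brevity):

```python
# A minimal SCD Type 2 sketch using Delta Lake MERGE on Databricks; table and
# column names are illustrative, not from the course materials.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd2_dim_customer").getOrCreate()

# Step 1: close out the current row for any customer whose tracked attribute changed.
spark.sql("""
    MERGE INTO dim_customer AS t
    USING stg_customer AS s
      ON t.customer_id = s.customer_id AND t.is_current = TRUE
    WHEN MATCHED AND t.address <> s.address THEN UPDATE SET
      is_current = FALSE,
      valid_to   = current_timestamp()
""")

# Step 2: insert a fresh current version for changed or brand-new customers
# (column order must match dim_customer; surrogate keys omitted for brevity).
spark.sql("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.name, s.address,
           current_timestamp() AS valid_from,
           NULL                AS valid_to,
           TRUE                AS is_current
    FROM stg_customer s
    LEFT JOIN dim_customer t
      ON t.customer_id = s.customer_id AND t.is_current = TRUE
    WHERE t.customer_id IS NULL
""")
```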
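Next, a sketch of the bronze‑to‑silver refinement step in a medallion layout, assuming a Spark session with Delta Lake available and illustrative lake paths and columns:

```python
# A minimal bronze-to-silver job; paths, table name, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_orders").getOrCreate()

# Raw, append-only records landed by the ingestion tool (e.g. Fivetran or Debezium).
bronze = spark.read.format("delta").load("/lake/bronze/orders")

# Refine: enforce types, derive a partition column, drop duplicates and bad rows.
silver = (
    bronze
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
)

# Publish a curated Delta table, partitioned so downstream queries can prune.
(silver.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("silver.orders"))
```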
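Orchestration topics can be pictured as a small dependency‑aware Airflow DAG with retries and a failure callback; this sketch assumes Airflow 2.4+ and treats the shell commands, dbt selectors, and alerting as placeholders:

```python
# A minimal Airflow DAG sketch: ingest -> dbt run -> dbt test, with retries and
# a placeholder failure callback standing in for a Slack alert.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator


def notify_on_failure(context):
    # Placeholder: in practice this would post to a Slack webhook or incident tool.
    print(f"Task {context['task_instance'].task_id} failed")


default_args = {
    "retries": 2,                        # rerun transient failures automatically
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="elt_orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
):
    ingest = BashOperator(task_id="ingest_bronze", bash_command="python ingest_orders.py")
    transform = BashOperator(task_id="dbt_run", bash_command="dbt run --select staging+")
    test = BashOperator(task_id="dbt_test", bash_command="dbt test --select staging+")

    ingest >> transform >> test  # explicit dependencies form the DAG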
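A freshness SLA of the kind introduced in the data‑quality modules reduces to a simple comparison; this framework‑agnostic sketch uses an illustrative two‑hour threshold:

```python
# A freshness SLA check; the SLA value and the source of the latest-load timestamp
# are illustrative assumptions.
from datetime import datetime, timedelta, timezone


def is_fresh(latest_loaded_at: datetime, sla: timedelta = timedelta(hours=2)) -> bool:
    """Return True when the most recent load is within the freshness SLA."""
    return datetime.now(timezone.utc) - latest_loaded_at <= sla


# In a real pipeline this value would come from the warehouse, e.g.
# SELECT MAX(loaded_at) FROM silver.orders
latest = datetime.now(timezone.utc) - timedelta(minutes=45)
print("silver.orders fresh:", is_fresh(latest))
```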
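Finally, cost‑management exercises of the sort described above often start from Snowflake's ACCOUNT_USAGE views; this sketch uses the snowflake-connector-python package with placeholder credentials to total warehouse credit consumption over the last seven days:

```python
# A credit-consumption report sketch; credentials are placeholders, and querying
# ACCOUNT_USAGE normally requires elevated privileges.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder
    user="analyst",         # placeholder
    password="***",         # placeholder
)

query = """
    SELECT warehouse_name, SUM(credits_used) AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits DESC
"""

cur = conn.cursor()
try:
    # Rows are (warehouse_name, credits) tuples, highest spenders first.
    for warehouse, credits in cur.execute(query):
        print(f"{warehouse}: {credits:.2f} credits")
finally:
    cur.close()
    conn.close()
```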