The Problem
Raw data sitting in a data warehouse is useless without transformation. I needed a repeatable, tested, and documented process to turn source data into analytics-ready tables, with proper orchestration of the runs.
What I Built
Medallion Architecture
Implemented three-layer data architecture for progressive data refinement:
```
models/
├── staging/        # Bronze → raw data cleanup
├── intermediate/   # Silver → business logic
└── marts/          # Gold → analytics-ready tables
    ├── dim_customers.sql
    ├── dim_products.sql
    └── fact_orders.sql
```
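To make the staging layer concrete, here is a minimal sketch of the kind of model that lives there; the `raw.customers` source and its column names are hypothetical:

```sql
-- models/staging/stg_customers.sql (illustrative; source and columns are assumptions)
WITH source AS (
    SELECT * FROM {{ source('raw', 'customers') }}
)

SELECT
    id                            AS customer_id,  -- rename to a consistent key name
    TRIM(email)                   AS email,        -- light cleanup only; no business logic here
    CAST(created_at AS TIMESTAMP) AS created_at
FROM source
```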
Dimensional Models
- Designed star schema with fact and dimension tables
- Implemented Type 2 slowly changing dimensions for historical tracking (see the snapshot sketch after this list)
- Enabled sub-second query performance on analytics workloads
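dbt snapshots are the built-in mechanism for Type 2 SCDs: dbt compares each run against the previous state and appends a new row version when a record changes. A minimal sketch, assuming a `raw.customers` source with a `customer_id` key and an `updated_at` column (both hypothetical):

```sql
-- snapshots/customers_snapshot.sql (illustrative; source and columns are assumptions)
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- dbt adds dbt_valid_from / dbt_valid_to columns to track each version of a row
SELECT * FROM {{ source('raw', 'customers') }}

{% endsnapshot %}
```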
Airflow Integration
- Orchestrated dbt runs via Apache Airflow for scheduled transformations
- Built DAGs with dependency management, retries, and alerting (see the sketch after this list)
- Containerized with Docker for consistent execution across environments
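A minimal sketch of such a DAG, invoking dbt via BashOperator inside the container; the project path, schedule, and alert email are assumptions, not the production values:

```python
# dags/dbt_pipeline.py -- illustrative; paths, schedule, and alerting are assumptions
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,              # alerting via Airflow's email config
    "email": ["data-alerts@example.com"],  # hypothetical address
}

with DAG(
    dag_id="dbt_transformations",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt_project && dbt test",
    )

    dbt_run >> dbt_test  # tests only run after transformations succeed
```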
Testing & CI/CD
- 50+ automated data quality tests (uniqueness, not-null, referential integrity, custom business rules; one sketched after this list)
- GitHub Actions workflow runs `dbt compile` and `dbt test`, and generates documentation on every PR
- Zero-downtime production deployments
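Custom business rules can be expressed as dbt singular tests: a SQL file in `tests/` that returns the rows violating the rule, so the test fails if any rows come back. A sketch against the `fact_orders` mart; the `order_id` and `order_total` columns are assumptions:

```sql
-- tests/assert_no_negative_order_totals.sql (illustrative; columns are assumptions)
-- dbt fails this test if the query returns any rows
SELECT
    order_id,
    order_total
FROM {{ ref('fact_orders') }}
WHERE order_total < 0
```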
Technical Approach
Incremental Processing: Only process new/changed records—critical for large tables.
```sql
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT * FROM {{ source('raw', 'orders') }}

{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

Snowflake Optimization: Clustering keys on frequently filtered columns (example below). Materialized views for common aggregations.
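With dbt's Snowflake adapter, clustering is declared per model through the `cluster_by` config; the clustering columns here are hypothetical examples of frequently filtered fields:

```sql
-- Illustrative: cluster the orders model on the columns dashboards filter by most
{{ config(
    materialized='incremental',
    unique_key='order_id',
    cluster_by=['order_date', 'customer_id']
) }}
```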
Results
- Sub-second query performance on analytics workloads
- 50+ automated tests ensuring data accuracy and consistency
- Full data lineage and documentation auto-generated
- Scalable architecture supporting future data sources
Key Technologies
dbt, Snowflake, Apache Airflow, SQL, Docker, GitHub Actions