From SSIS to Lakehouse: An enterprise migration guide

Written by

Paleti Lakshmikanth

Data Engineering

I have spent enough time working across data engineering and media analytics to know that the most expensive mistakes in a migration project rarely happen in the code. They happen in the planning phase, when teams make architectural decisions under deadline pressure without fully understanding the tradeoffs.

Moving from SSIS to a cloud-native lakehouse is not a lift and shift. It is a re-architecture of how data flows, scales, and serves the business. The packages move, but so does everything underneath them: the storage model, the transformation logic, the governance layer, and the way BI and AI consumers access the data.

This series is a production-grade blueprint for that migration, built on Azure Databricks with Azure Data Factory orchestration. It covers the full journey: foundation and governance, historical data migration, daily pipeline replacement, and BI and AI enablement. Each post corresponds to a phase of the migration, with real code, real tradeoffs, and the reasoning behind every decision.

Before any of that begins, though, there is a decision that shapes everything else. Which migration strategy is right for your situation?

The three migration strategies

There is no single correct way to migrate from SSIS to a lakehouse. The right approach depends on your timeline, your tolerance for risk, and how much technical debt you are willing to carry into the new platform. These are the three patterns worth understanding before you start.

Strategy A: Direct Gold (lift and shift)

In this approach, data from the legacy data warehouse is loaded directly into the Gold layer of the lakehouse. The existing denormalized structure is preserved as-is, with calculated metrics added to enable integration with Databricks AI/BI Genie.
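The essence of the Direct Gold pattern can be sketched in a few lines: the legacy row structure is preserved untouched, and only derived metrics are appended. The record shape, column names, and metric formulas below are hypothetical, chosen purely for illustration.

```python
# Illustrative sketch of the Direct Gold pattern: the legacy denormalized
# row is kept as-is, and calculated metrics are appended so AI/BI tools
# have ready-made measures. All names here are hypothetical.

def enrich_legacy_row(row: dict) -> dict:
    """Preserve the denormalized structure; only add derived metrics."""
    enriched = dict(row)  # no restructuring -- lift and shift
    enriched["order_total"] = row["quantity"] * row["unit_price"]
    enriched["discount_amount"] = enriched["order_total"] * row["discount_pct"]
    return enriched

legacy_row = {
    "order_id": 1001, "customer_name": "Acme Corp", "product_name": "Widget",
    "quantity": 4, "unit_price": 25.0, "discount_pct": 0.10,
}
gold_row = enrich_legacy_row(legacy_row)
# gold_row keeps every legacy column and gains order_total (100.0)
# plus a discount_amount derived from it
```

Note that nothing about the legacy model is questioned here, which is exactly why this strategy is fast and exactly why it carries the old structure's problems forward.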

Use when: You are working to a tight deadline and need zero disruption to existing reporting.

Pros: Fastest to implement, lowest initial cost, and business continuity is maintained from day one.

Cons: The architecture is rigid, CDC handling is poor, and the structure is difficult to evolve. You are essentially moving the problem rather than solving it.

Strategy B: Rebuilt Gold (full modernization)

This approach takes the historical, clean, denormalized data from the legacy data warehouse and loads it into the Bronze layer first. In the Silver layer, that data is normalized and refined. Transformations are then applied to produce a properly structured Gold layer, where the data is denormalized again for reporting and analytics.

The flow is: Sources to Bronze to Silver to Gold, rebuilt from the ground up.
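The normalize-then-redenormalize round trip is easiest to see on a single row. This sketch uses plain Python dictionaries with hypothetical entity and column names; in the real pipeline these would be Delta tables and Spark joins.

```python
# A minimal sketch of the Rebuilt Gold flow on one hypothetical row:
# Bronze keeps the legacy denormalized record, Silver splits it into
# normalized entities, and Gold joins them back for reporting.

bronze_row = {  # as landed from the legacy DW (denormalized)
    "order_id": 1001, "customer_id": "C01", "customer_name": "Acme Corp",
    "product_id": "P07", "product_name": "Widget", "quantity": 4,
}

# Silver: normalize into separate entities (the single source of truth)
silver_customers = {"C01": {"customer_name": "Acme Corp"}}
silver_products = {"P07": {"product_name": "Widget"}}
silver_orders = [{"order_id": 1001, "customer_id": "C01",
                  "product_id": "P07", "quantity": 4}]

# Gold: re-denormalize by joining Silver entities for BI consumption
gold = [
    {**order,
     **silver_customers[order["customer_id"]],
     **silver_products[order["product_id"]]}
    for order in silver_orders
]
# gold[0] carries order, customer, and product attributes in one row again
```

The point of the round trip is that the Gold structure is now derived from explicit, documented joins rather than inherited from undocumented legacy logic.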

Use when: You are building a long-term platform that requires support for AI, ML, and scalability.

Pros: Flexible, scalable, and fully governed. The architecture can evolve as requirements change.

Cons: Higher initial effort and longer time to first value.

Strategy C: Hybrid (enterprise reality)

This is the approach most production migrations actually use, and the one this series is built around. It works in four phases:

  • Phase 1: Load the legacy DW directly to Gold to keep the business running

  • Phase 2: Build the full Bronze to Silver to Gold pipeline in parallel

  • Phase 3: Validate the new pipeline against the legacy output

  • Phase 4: Cut over once confidence is established

Why hybrid wins: It delivers zero downtime, controlled risk, gradual cost optimization, and the ability to validate the new pipeline in parallel with SSIS before committing to the cutover.
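The Phase 3 validation step can be sketched as a simple reconciliation of key aggregates between the legacy Gold output and the rebuilt pipeline. The metric names and tolerance below are illustrative; in practice these aggregates would be computed from both Gold tables.

```python
# Sketch of Phase 3 parallel validation: compare key aggregates from the
# legacy output and the rebuilt pipeline before committing to cutover.

def validate_parallel(legacy: dict, rebuilt: dict, rel_tol: float = 1e-6) -> list:
    """Return the metrics that disagree beyond a relative tolerance."""
    mismatches = []
    for metric, legacy_value in legacy.items():
        new_value = rebuilt.get(metric)
        if new_value is None:
            mismatches.append((metric, legacy_value, None))
        elif abs(new_value - legacy_value) > rel_tol * max(abs(legacy_value), 1):
            mismatches.append((metric, legacy_value, new_value))
    return mismatches

legacy_gold = {"row_count": 1_250_000, "total_revenue": 48_731_902.55}
rebuilt_gold = {"row_count": 1_250_000, "total_revenue": 48_731_902.55}
assert validate_parallel(legacy_gold, rebuilt_gold) == []  # safe to cut over
```

An empty mismatch list over a sustained parallel-run window is what "confidence is established" means in Phase 4; any non-empty result points directly at the metric to investigate.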

The architecture at a glance

The full migration moves from a legacy SQL Server data warehouse with SSIS ETL into a cloud-native Databricks Lakehouse with Unity Catalog governance, Delta Lake storage, and a Medallion architecture.

Starting from your SQL Server OLTP system and legacy data warehouse, data flows through ADLS Gen2 into a Databricks Lakehouse governed by Unity Catalog, producing a clean star schema for Power BI and AI consumption.

The four source tables used throughout this series are Customers, Orders, Order Items, and Products, each flowing through all three Medallion layers with appropriate transformations at every stage.

The Medallion layers

  • Bronze: Raw data, append-only, with audit columns. No transformation.

  • Silver: Normalized, cleansed, and standardized. The single source of truth.

  • Gold: Denormalized for BI consumption, with star schema structure and aggregates.
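The Bronze contract in particular is worth making concrete: raw records are appended exactly as received, with audit metadata stamped at ingestion and all cleansing deferred to Silver. The audit column names below are illustrative conventions, not a fixed standard.

```python
# A minimal sketch of the Bronze contract: raw records are appended
# untouched, with audit columns stamped at ingestion. The column names
# (_ingested_at, _source_file, _batch_id) are illustrative conventions.

from datetime import datetime, timezone

def to_bronze(record: dict, source_file: str, batch_id: str) -> dict:
    """Append-only: never mutate source fields, only add audit metadata."""
    return {
        **record,  # raw payload preserved exactly as received
        "_ingested_at": datetime.now(timezone.utc).isoformat(),
        "_source_file": source_file,
        "_batch_id": batch_id,
    }

raw = {"customer_id": "C01", "customer_name": " acme corp "}  # dirty is fine here
bronze = to_bronze(raw, "customers_2024_06_01.csv", "batch-0042")
# Trimming and casing " acme corp " is deliberately deferred to Silver.
```

Keeping Bronze transformation-free is what makes reprocessing possible: if a Silver rule turns out to be wrong, the untouched raw data is still there to replay.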

The principles behind this blueprint

Before moving into the technical phases, it is worth being explicit about the principles that guide every decision in this series.

Use a hybrid approach. Keep the legacy system running while you rebuild. This is not a weakness in the strategy; it is the strategy. Parallel operation gives you the confidence to cut over on your terms rather than under pressure.

Do not trust denormalized data warehouse tables blindly. The flat structure of a legacy data warehouse embeds logic that is often undocumented and inconsistently applied. Rebuilding that logic explicitly in Silver and Gold is the only way to know what you actually have.

Adopt Medallion architecture. Bronze, Silver, and Gold are not just naming conventions. They represent distinct contracts about data quality, transformation state, and access patterns. Respecting those contracts makes the platform predictable and governable.

Govern everything with Unity Catalog. Centralized access control, lineage, and audit logging are not optional in an enterprise environment. They are what make the platform trustworthy to the business.

Automate everything with Terraform and Databricks Asset Bundles. Manual infrastructure is technical debt. Everything that can be codified should be.

What comes next

The remaining posts in this series walk through each phase of the migration in detail, with full code, configuration, and the reasoning behind each decision.

  • Post 2: Foundation, Unity Catalog setup, and governance

  • Post 3: Historical data migration, Bronze to Silver to Gold

  • Post 4: Daily pipeline and incremental CDC

  • Post 5: BI and AI enablement, and the complete SSIS to Databricks mapping reference


FAQs

Q1: What is the difference between a lift and shift migration and a full modernization? A lift and shift migration moves your existing data warehouse structure into the new platform with minimal changes. It is fast and low-risk in the short term but preserves the limitations of the original architecture. A full modernization rebuilds the data model from the ground up using Medallion architecture, which takes more time upfront but produces a platform that can scale and evolve with the business.

Q2: Why is the hybrid strategy recommended over a clean cutover? A clean cutover requires the new platform to be fully validated and production-ready before the legacy system is switched off. In practice, that validation period is difficult to compress without introducing risk. The hybrid approach keeps the legacy system running in parallel while the new pipeline is built and validated, which means the cutover happens when you are confident rather than when the deadline forces it.

Q3: What is Medallion architecture and why does it matter? Medallion architecture organizes data into three layers: Bronze for raw ingested data, Silver for cleansed and normalized data, and Gold for consumption-ready denormalized data. Each layer has a defined contract about data quality and transformation state. This structure makes the platform predictable for both engineers building pipelines and analysts consuming data.

Q4: Do I need Unity Catalog to follow this blueprint? Unity Catalog is strongly recommended for any enterprise migration on Databricks. It provides centralized governance, access control, data lineage, and audit logging across all workspaces. Without it, access management becomes fragmented and lineage is difficult to track. Unity Catalog requires Databricks Runtime 13.3 LTS or later and must be enabled at the account level by an Account Admin before workspace-level setup.

Q5: How do I know which migration strategy is right for my organization? The primary factors are timeline and risk tolerance. If you need to maintain reporting continuity with minimal disruption and are working to a tight deadline, Strategy A gives you the fastest path to the new platform. If you are building for the long term and have the runway for a proper rebuild, Strategy B produces the best architecture. For most enterprise environments, Strategy C is the realistic answer because it manages risk without sacrificing the quality of the end state.

Q6: What source systems does this blueprint assume? This blueprint is built around a SQL Server OLTP system as the primary source and a legacy denormalized data warehouse as the migration starting point. The patterns for Bronze ingestion, Silver normalization, and Gold star schema construction apply broadly to other source systems, but the specific connection configuration and ADF pipeline setup in later posts assume Azure-native infrastructure.

