Microsoft Fabric: what is it?
Microsoft Fabric is Microsoft’s unified, SaaS-based data and analytics platform that brings data engineering, data integration, lakehouse/warehouse storage, real-time analytics, data science, and Power BI into a single, OneLake-backed environment. It is positioned as an end‑to‑end, AI-powered data platform that replaces the usual patchwork of Azure Synapse, Data Factory, and separate lake/warehouse/BI stacks with one coherent surface.
Core idea
- All-in-one analytics: Covers ingestion, transformation, storage (lakehouse/warehouse), modeling, real-time processing, and BI in one product, exposed as persona-centric experiences (Data Engineering, Data Factory, Real-Time Intelligence, Data Science, Power BI, etc.).
- SaaS over OneLake: Everything runs as SaaS over OneLake, a logical data lake built on Azure Data Lake Storage (ADLS) that provides a single data plane with Delta/Parquet and “zero-copy” sharing across workloads.
Key components
- OneLake: Unified storage abstraction (lake-centric, Delta/Parquet-first) used by all Fabric workloads, enabling shared data across lakehouse, warehouse, and BI without duplication.
- Workloads: Data Factory for pipelines, Data Engineering (Spark notebooks, lakehouse), Data Warehouse (SQL-first), Real-Time Intelligence/Analytics for streaming, Data Science, and tightly integrated Power BI for semantic modeling and visualization.
- Copilot & AI: Embedded Copilot features assist with authoring pipelines, queries, and transformations, and with generating insights over Fabric artifacts.
Positioning and use cases
- Target: Enterprises that want a single Microsoft-native stack for modern analytics, reducing integration overhead between Azure Data Factory, Synapse, and Power BI, and centralizing governance with Purview-style controls.
- Typical workloads: Central analytics lake/warehouse, real-time dashboards, ML/AI over enterprise data, self-service BI over governed semantic models, and cross-M365 consumption (Excel, Teams, etc.).
How it differs from “classic” Azure data stack
- Unification vs. assembly: Instead of wiring ADF + ADLS + Synapse + Power BI yourself, Fabric presents a single product surface and capacity model.
- Lake-first semantics: Strong emphasis on an open, lake-centric architecture (Delta/Parquet, shortcuts, mirroring) rather than purely warehouse-centric designs.
If you want, a next step could be: “map Fabric concepts to an existing Synapse + ADF + Power BI architecture” or “an architecture comparison: Fabric vs Snowflake + dbt + Power BI” and derive migration patterns.
Security & Compliance Features
Microsoft Fabric provides a layered security architecture with the following capabilities:
| Feature | Description |
|---|---|
| Encryption | Data encrypted at rest and in transit by default |
| RBAC / IAM | Role-based access control at workspace, item, and row/column level; Entra ID (Azure AD) integration |
| MFA | Multi-factor authentication configurable for all users |
| Information Protection Labels | Microsoft Purview sensitivity labels applied to Fabric items for classification |
| Data Loss Prevention (DLP) | Rules to detect sensitive data (PII, credit card numbers, etc.) and block or alert on policy violations |
| Auditing & Logging | Detailed activity logs tracking user actions, data access, and changes for forensic/compliance reporting |
| Purview Integration | Lineage tracking, data cataloging, and governance policies via Microsoft Purview |
| Compliance Certifications | Supports GDPR, HIPAA, SOC 2, and other standards out of the box, with compliance reporting tools |
| Data Sovereignty | Multi-geo capacities for data residency requirements |
Migrating Azure Synapse Dedicated SQL to Fabric
Microsoft has built a Migration Assistant directly into Fabric to simplify the transition from Azure Synapse dedicated SQL pools:
Prerequisites
- Export a DACPAC file from your Synapse dedicated SQL pool (captures metadata: schema, views, stored procedures)
- Contributor/Admin permissions on the target Fabric workspace
- Read access to the source Synapse pool
Migration Steps
- Copy Metadata: In Fabric workspace → click Migrate → select "Analytical T-SQL warehouse" → upload DACPAC → name your new Fabric Warehouse → click Migrate
- Validate Schema: Review object deployment; T-SQL is largely compatible, with minor exceptions
- Copy Data: Use the built-in Copy Job wizard—connect to the source Synapse pool, select tables, map columns, choose one-time full copy mode, and run
Microsoft also offers free expert support via the Migration Factory program, plus DevOps, governance, and performance guidance in the official migration documentation.
Fabric REST API
Yes, Microsoft Fabric exposes a comprehensive REST API. The primary reference is the Microsoft Fabric REST API documentation.
Capabilities
- CRUD operations on pipelines, lakehouses, warehouses, semantic models, and other Fabric items
- Automation: Programmatically create, deploy, update, and delete objects, and trigger dataflows/pipelines
- Authentication: Uses Entra ID (Azure AD) tokens; from Fabric notebooks you can call `mssparkutils.credentials.getToken()` to obtain a bearer token
Example: Create a Lakehouse via API (Python)
```python
import requests

# Runs inside a Fabric notebook, where mssparkutils is available.
workspace_id = "<your-workspace-guid>"
url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/lakehouses"
token = mssparkutils.credentials.getToken("https://api.fabric.microsoft.com")
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
payload = {"displayName": "MyLakehouse"}

response = requests.post(url, headers=headers, json=payload)
# Expect 201 Created (some item types return 202 Accepted with an operation to poll)
```
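When the API returns 202 Accepted, Fabric points you at a long-running operation to poll. The sketch below is an illustrative polling helper, not part of the Fabric SDK; the status values assume the Running/Succeeded/Failed convention used by the Fabric REST API, and the getter is injectable so the logic can be exercised without real HTTP calls.

```python
import time

def poll_lro(get, operation_url, interval_s=2.0, max_attempts=30):
    """Poll a long-running operation until it leaves the Running state.

    `get` is any callable(url) -> dict with a "status" key, so this can be
    wired to requests (get=lambda u: requests.get(u, headers=headers).json())
    or to a stub in tests.
    """
    for _ in range(max_attempts):
        body = get(operation_url)
        status = body.get("status")
        if status not in ("Running", "NotStarted"):
            return status  # e.g. "Succeeded" or "Failed"
        time.sleep(interval_s)
    raise TimeoutError(f"Operation did not finish: {operation_url}")
```

With `requests`, you would typically pass `response.headers["Location"]` as `operation_url` after a 202 response.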
Suggested follow-ups:
- "Map Fabric RBAC model to existing Synapse + Power BI permissions" for a security migration plan.
- "Fabric REST API endpoints for semantic model deployment" if you want CI/CD automation patterns.
- "Compare Fabric DLP/Purview capabilities vs. Snowflake governance" for a cross-platform compliance view.
How to migrate data warehouses with minimal downtime to Fabric
Minimal-downtime migration to Microsoft Fabric is essentially a dual-running + cutover pattern: stand up Fabric in parallel, sync data incrementally, then switch workloads during a short outage window.
Principles for minimal downtime
- Run source and Fabric in parallel: Keep your existing warehouse (e.g., Synapse dedicated SQL) as the system of record while you build schemas and pipelines in Fabric.
- Use incremental sync, not one big bang: Do an initial full load to Fabric, then keep it in sync with change-only (CDC/incremental) loads until cutover.
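The full-load-then-incremental principle above reduces to simple watermark bookkeeping. This is an illustrative sketch in plain Python; the table and column names are hypothetical, and in practice the query would be issued by a Fabric pipeline or notebook:

```python
from datetime import datetime

def build_incremental_query(table, watermark_column, last_watermark):
    """Return a change-only extraction query for one sync cycle."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{last_watermark.isoformat()}'"
    )

def next_watermark(rows, watermark_column):
    """Advance the watermark to the max change timestamp seen in this batch."""
    return max(r[watermark_column] for r in rows)
```

Each cycle extracts rows newer than the stored watermark, loads them into Fabric, then persists the new watermark; at cutover the final cycle runs inside the short freeze window.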
High-level migration pattern
- Assess and prepare
- Schema-first migration
- Initial full data load
- Use Copy Job in Fabric Data Factory or Fabric pipelines to bulk load all historical data from the source warehouse (or via staging in ADLS/Blob as Parquet/CSV) into Fabric.
- For very large tables, partition loads (date or ID ranges) and leverage `COPY INTO` into the Fabric Warehouse to maximize throughput.
- Set up incremental synchronization
- Dual-running analytics
- Cutover with a short freeze
- Post-cutover safety net
Tactics that reduce downtime risk
- Metadata-driven pipelines: Use parameterized Fabric Data Factory pipelines to drive table-by-table loads and re-runs, simplifying incremental sync.
- Virtualization where possible: Use OneLake shortcuts or mirroring to virtually expose existing data without immediate full copies, then gradually materialize where needed.
- Environment parity: Recreate security (roles, row-level security, masking), maintenance jobs, and data quality checks in Fabric before cutover.
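A metadata-driven pipeline is just a table of load instructions expanded into per-table runs. The sketch below shows the shape of that expansion; the metadata entries and mode names are hypothetical examples, and in Fabric the resulting parameters would feed a parameterized Data Factory pipeline:

```python
# Hypothetical metadata driving a parameterized pipeline.
TABLES = [
    {"name": "dim_customer", "mode": "truncate_load", "watermark": None},
    {"name": "fact_sales", "mode": "incremental", "watermark": "modified_ts"},
]

def plan_runs(tables):
    """Expand load metadata into per-table pipeline run parameters."""
    runs = []
    for t in tables:
        params = {"table": t["name"], "mode": t["mode"]}
        if t["mode"] == "incremental":
            params["watermark_column"] = t["watermark"]
        runs.append(params)
    return runs
```

Adding a table to the sync then means adding one metadata row, not authoring a new pipeline.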
If you share more context
If you outline your current stack (e.g., “Synapse dedicated SQL + ADF + Power BI with CDC from SQL Server”), a concise next prompt could be: “Design a minimal-downtime migration plan from <current stack> to Microsoft Fabric, including detailed pipeline patterns and cutover sequence.”
Does Microsoft Fabric provide a public REST API for management?
Assess which Synapse components require refactoring for Fabric compatibility.
Best practices for near-zero-downtime data migration strategies.
How to design an incremental migration plan for large warehouses.
Tools to replicate live SQL data to Microsoft Fabric during cutover.
How to test and validate data consistency post-migration quickly.
Microsoft Fabric does expose public REST APIs for management, and a near‑zero‑downtime migration from Synapse to Fabric is typically done via refactor‑where‑needed, dual‑running, and incremental sync patterns.
1. Public REST APIs for management
- Microsoft publishes a Fabric REST API surface for automation and embedded analytics, covering workspaces, items (lakehouses, warehouses, semantic models), jobs, and admin operations.
- For pipelines specifically, Fabric Data Factory exposes REST endpoints to create, read, update, delete, and run pipelines on demand, authenticated via Entra ID bearer tokens.
- Example run‑pipeline call: POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{itemId}/jobs/instances?jobType=Pipeline with Authorization: Bearer <access-token>.
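The run-pipeline endpoint above can be assembled with a small helper; this is an illustrative sketch (the helper name is ours, not from the Fabric SDK), with the request construction separated from the send so it can be checked without network access:

```python
def build_run_pipeline_request(workspace_id, item_id, token):
    """Assemble URL and headers for the on-demand pipeline job endpoint."""
    url = (
        "https://api.fabric.microsoft.com/v1/workspaces/"
        f"{workspace_id}/items/{item_id}/jobs/instances?jobType=Pipeline"
    )
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```

Sending it is then one call, e.g. `requests.post(url, headers=headers)`, with the job instance status available afterwards via the jobs API.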
2. Synapse components that require refactoring
For a Synapse‑to‑Fabric move, the main refactor hotspots are:
- Mapping Data Flows: Synapse Mapping Data Flows are not directly available in Fabric and must be re‑implemented using Dataflows Gen2, Spark notebooks, or Data Factory transformations.
- T‑SQL incompatibilities: Certain dedicated‑pool T‑SQL features (some system views, DW‑specific hints, distribution directives, CTAS patterns) require adjustment for Fabric Data Warehouse semantics.
- Orchestration: ADF/Synapse pipelines with legacy activities (SSIS, certain custom activities, some integration runtime patterns) need translation to Fabric Data Factory equivalents or external orchestration.
- Security & governance: Synapse RBAC, row‑level security, masking, and Purview integration need re‑expression using Fabric’s workspace roles, object‑level permissions, and Purview‑backed governance.
A practical approach is to generate an inventory and compatibility report (via Microsoft’s Migration Assistant and/or partner tools) that tags objects as “lift‑and‑shift”, “minor refactor”, or “redesign”.
3. Near‑zero‑downtime migration strategies
Best‑practice patterns for near‑zero‑downtime revolve around parallel running + continuous sync:
- Parallel environments: Keep Synapse as the system of record while the Fabric Warehouse/Lakehouse is built and backfilled; only cut over once validated.
- Initial bulk load + CDC: Perform a full historical load into Fabric, then use change‑only pipelines (CDC/change-tracking timestamps) to keep Fabric nearly real‑time.
- Phased cutover: Move non‑critical workloads and reports first, then mission‑critical ones in a controlled, short read‑only window on the source.
Microsoft and partners explicitly recommend parallel‑running strategies to achieve “minimal or no downtime” for Fabric migrations.
4. Designing an incremental migration plan for large warehouses
A workable incremental plan for large Synapse warehouses typically has these stages:
- Stage 1 – Discovery and slicing
- Stage 2 – Schema and object migration
- Stage 3 – Initial backfill
- Stage 4 – Incremental sync
- Stage 5 – Workload migration and cutover
5. Tools to replicate live SQL data into Fabric
For live or near‑real‑time replication during cutover, typical tools/patterns include:
- Fabric Data Factory
- Third‑party CDC tools
- Striim, Fivetran, Qlik, Debezium‑based platforms, and similar log‑based CDC tools can stream changes into ADLS/OneLake or directly into Fabric sinks for minimal‑lag replication.
- Some Fabric‑focused migration accelerators (e.g., Kanerika FLIP) automate converting ADF/Synapse pipelines and validating row counts and checksums in parallel.
- Database‑native replication
6. Fast data consistency testing post‑migration
Efficient validation focuses on automated, aggregate‑level checks plus targeted deep dives:
- Row counts and checksums
- Business KPI parity
- Data quality and lineage checks
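The row-count and checksum check above can be made order-independent so the same function runs against source and target query results. This is an illustrative sketch (`table_fingerprint` is a hypothetical helper, not a Fabric or Synapse API):

```python
import hashlib

def table_fingerprint(rows, key_columns):
    """Order-independent (count, checksum) for a result set.

    XOR-folding per-row SHA-256 digests makes the checksum insensitive to
    row order, so source and target extracts can be compared directly.
    """
    digest = 0
    count = 0
    for row in rows:
        canonical = "|".join(str(row[c]) for c in key_columns)
        digest ^= int.from_bytes(
            hashlib.sha256(canonical.encode()).digest()[:8], "big"
        )
        count += 1
    return count, digest
```

Run it over the same SELECT on both sides; equal tuples mean the tables agree on those columns, and a mismatch localizes which table needs a deep-dive diff.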
If you share a concise description of your current Synapse/SQL topology (sizes, CDC mechanisms, SLAs), a tight next prompt would be: “Design a Fabric migration blueprint for <current environment> with concrete APIs, pipeline patterns, and CDC tool choices to achieve <RPO/RTO>.”
WHAT OTHER QUESTIONS SHOULD I BE ASKING?
You’re already asking the right migration questions; the next leverage points are around architecture, governance, operations, and vendor lock‑in in Fabric.
Architecture & capabilities
- How do Fabric Warehouse vs Lakehouse vs Real‑Time Analytics map to my current patterns (batch, streaming, ML, BI), and where should each land?
- What limits/quotas (capacity SKUs, concurrency, file sizes, streaming throughput) affect my largest workloads, and how do they compare with current Synapse limits?
Security, governance, and compliance
- How do Fabric’s workspace/scoped RBAC, RLS, CLS, and Purview integration map to my existing security and compliance controls (HIPAA/SOC 2/etc.)?
- What is my strategy for multi‑tenant isolation, data residency, and BYOK/encryption policies inside Fabric capacities?
Reliability, SRE, and operations
- How will observability work: logs, metrics, traces, and cost telemetry for Fabric workloads, and how do these integrate with my existing monitoring stack?
- What SLOs (latency, freshness, availability) will be defined for Fabric, and how will incident response, rollback, and disaster recovery be handled?
CI/CD, APIs, and platform integration
- How do I standardize CI/CD for Fabric artifacts (warehouse schema, lakehouse, semantic models, pipelines) using REST APIs, Git integration, and IaC‑like patterns?
- Which management-plane operations are covered by public Fabric REST APIs vs still requiring portal/manual steps, and what’s my automation strategy around that?
Cost, performance, and lock‑in
- What is the cost model for Fabric capacities vs current Synapse/ADF/Power BI, and how do I set guardrails to prevent runaway spend?
- How “open” is my data (Delta/Parquet + OneLake shortcuts), and what is my contingency plan to move workloads off Fabric if needed in the future?
If you want a focused next step, you could ask: “Generate a Fabric adoption checklist for an enterprise Synapse environment, with sections for architecture, security, CI/CD, SRE, and cost management.”
Here’s a Fabric adoption checklist structured around the questions you highlighted, tuned for a Synapse‑heavy shop.
Architecture & capabilities
- Map workloads to Fabric primitives
- Define which workloads live in Warehouse (SQL‑first EDW, governed reporting), Lakehouse (multi‑engine, ML, large/cheap storage), and Real‑Time Analytics/KQL (streaming, log/IoT analytics).
- For each current pattern (batch ETL, streaming, ML, ad‑hoc BI), decide whether it lands in Warehouse, Lakehouse, or KQL DBs, and document reference architectures for each path.
- Understand capacity limits and performance envelopes
- Document the Fabric capacity SKUs you plan to use and their implications: Spark VCores per Capacity Unit, max concurrent Spark jobs, queue limits, and burst factors.
- Confirm there are no hard “user concurrency per SKU” benchmarks, and plan to use the Capacity Metrics App to tune concurrency and scale decisions instead of static numbers.
Security, governance, and compliance
- Map identity, RBAC, RLS/CLS, and Purview
- Align Entra ID groups and Fabric workspace roles with existing Synapse/Power BI roles; design patterns for row‑level and object‑level security in Warehouse and Lakehouse.
- Define how Purview catalogs, lineage, DLP, and sensitivity labels in Fabric will mirror or replace current governance controls for HIPAA/SOC 2/GDPR workloads.
- Tenancy, residency, and encryption strategy
Reliability, SRE, and operations
- Observability and cost telemetry
- Standardize where Fabric logs and metrics go (Log Analytics, SIEM, APM) and how you will monitor query performance, Spark job health, and capacity utilization.
- Decide on cost‑monitoring patterns (Capacity Metrics App, tagging, budget alerts) and how they integrate into your existing FinOps dashboards.
- SLOs, incident response, and DR
CI/CD, APIs, and platform integration
- Standardize CI/CD and artifact lifecycle
- Decide how Warehouse schemas, Lakehouse structures, notebooks, and semantic models are versioned and deployed (Fabric Git integration, REST APIs, external DevOps pipelines).
- Define promotion flows (dev → test → prod) and how to handle schema migrations and data backfills as code rather than ad‑hoc portal actions.
- Clarify management surface vs manual ops
- Enumerate which management operations are available via public Fabric REST APIs (workspaces, items, jobs, pipelines) vs still requiring portal/PowerShell.
- Build an automation strategy around those APIs for provisioning, deployments, scheduled operations, and access management, so Fabric behaves like a first‑class platform in your stack.
Cost, performance, and lock‑in
- Capacity and cost guardrails
- Compare expected Fabric capacity costs (CUs/SKUs, pay‑as‑you‑go vs reserved) to current Synapse/ADF/Power BI spend, including overprovisioning and idle time.
- Define limits and governance: who can create new capacities/workspaces, what budget alerts exist, and how to regularly review heavy users and jobs.
- Openness and exit strategy
- Confirm that critical data is stored as Delta/Parquet in OneLake and accessible via open formats/engines, and list the external systems that must be able to read it directly.
- Sketch a contingency plan: if moving off Fabric, how would you extract metadata and data (e.g., Delta files, shortcuts, Parquet exports) into another lake/warehouse without major downtime?
If you share a brief description of your current Synapse/ADF/Power BI topology and regulatory constraints, the next prompt could be: “Given this environment, fill in this checklist with concrete design decisions and recommended Fabric SKUs, security models, and CI/CD patterns.”
Which workloads suit Fabric Warehouse versus Lakehouse for BI?
How to design a hybrid Warehouse plus Lakehouse in Fabric?
Best practices for real‑time analytics with Fabric KQL?
Capacity and quota limits for Fabric SKUs F64 to F1024?
How to size Fabric capacity for concurrent Spark jobs?
Fabric Warehouse is generally the BI-facing, modeled SQL layer, while Lakehouse is the engineering/ML/landing and refinement layer, with KQL for real‑time/log workloads; capacity sizing then depends on your Spark and query concurrency profile across those layers.
Warehouse vs Lakehouse for BI
- Use Fabric Warehouse when
- You have conformed, relational models (star/snowflake) that need strong governance, fine‑grained security, and high‑performance SQL for large BI user populations.
- Workloads are primarily Power BI/SQL‑centric, with predictable schemas and SLAs for reporting; Warehouse is optimized as the “gold” layer for curated BI datasets.
- Use Lakehouse when
- You need to ingest diverse/raw data (files, semi‑structured, large historical sets), run Spark‑based transformations, or support data science and ML on top of Delta tables.
- BI can still sit on Lakehouse via the SQL endpoint/Direct Lake, but this is usually for exploratory/“silver” and advanced analytics rather than tightly governed EDW‑style reporting.
Designing a hybrid Warehouse + Lakehouse
- Canonical pattern
- Design tips
- Keep heavy ETL, ML, and mixed‑format data in Lakehouse; keep dimensional models and SLA‑driven BI in Warehouse, but avoid redundant copies by sharing Delta where possible.
- Define clear ownership: data engineering owns Lakehouse layers; BI/analytics owns Warehouse models and semantic models, with contracts between them.
Real‑time analytics with Fabric KQL
- When to use KQL
- Best practices
- Model data as narrow, time‑partitioned tables and use summarization over time windows instead of row‑by‑row analysis; leverage KQL’s time‑series and window functions.
- Treat KQL as the “hot” store and periodically archive or downsample into Lakehouse/Warehouse for long‑term storage and historical BI.
Capacity/quotas for F64–F1024 (Spark focus)
From the Spark concurrency documentation table:
| SKU | Spark VCores (base) | Max Spark VCores with burst (3×) | Queue limit (Spark jobs) |
|---|---|---|---|
| F64 | 128 | 384 | 64 |
| F128 | 256 | 768 | 128 |
| F256 | 512 | 1536 | 256 |
| F512 | 1024 | 3072 | 512 |
| F1024 | 2048 | 6144 | 1024 |
- Burst lets you use up to 3× the base Spark VCores for concurrency across jobs, but does not change the base allocation used for capacity billing.
- Queue limit is the maximum number of Spark jobs that can be waiting when all cores are in use; beyond that, submissions are rejected.
Sizing Fabric for concurrent Spark jobs
- Translate workloads into VCore demand
- For each Spark workload type (ETL notebooks, streaming jobs, ML training), estimate required cores per job (e.g., small: 4–8, medium: 16–32, large: 64+).
- Multiply by expected concurrent jobs to get total active VCore demand, then map to SKU base VCores and treat 3× burst as a concurrency buffer, not a guarantee.
- Practical rules of thumb
- If most jobs are medium (16–32 VCores) and you need, say, 8–10 concurrent jobs, an F64 (128 base) is usually tight; F128 (256 base) is safer for headroom and burst.
- Use multiple smaller pools (e.g., 8–16 VCore configs) to increase concurrency without starving other jobs, and monitor queue lengths and job runtimes to iteratively adjust SKU and pool sizing.
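The sizing arithmetic above can be sketched directly against the SKU table. This is an illustrative heuristic, not Microsoft sizing guidance; the 25% headroom factor is an assumption, and real sizing should be validated with the Capacity Metrics App:

```python
# Base Spark VCores per SKU, from the table above.
SKU_BASE_VCORES = {"F64": 128, "F128": 256, "F256": 512, "F512": 1024, "F1024": 2048}

def smallest_sku(concurrent_jobs, vcores_per_job, headroom=1.25):
    """Pick the smallest SKU whose base VCores cover steady-state demand.

    Burst (3x) is deliberately excluded: it is a buffer for spikes,
    not guaranteed steady-state capacity.
    """
    demand = concurrent_jobs * vcores_per_job * headroom
    for sku, base in sorted(SKU_BASE_VCORES.items(), key=lambda kv: kv[1]):
        if base >= demand:
            return sku
    return None  # demand exceeds F1024 base; consider multiple capacities
```

For example, 8 concurrent medium jobs at 24 VCores each gives a demand of 240 VCores with headroom, which lands on F128 rather than F64, matching the rule of thumb above.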
If you share representative job counts, typical VCore sizes, and latency expectations, a good next prompt is: “Design a Fabric capacity plan (SKU + pool configs) for this Spark workload mix, and map each workload to Warehouse, Lakehouse, or KQL.”
Design a hybrid Fabric architecture for BI and streaming
When should I use Warehouse vs Lakehouse for BI metrics?
How to implement Direct Lake for Power BI in Fabric?
Best practices for syncing curated warehouse tables from lakehouse?
A solid hybrid Fabric design uses Lakehouse for ingestion/refinement, Warehouse for curated BI, Direct Lake for Power BI, and KQL for streaming, all over OneLake.
Hybrid Fabric architecture for BI + streaming
- Data flow
- Land raw/operational and streaming data into Lakehouse bronze/silver layers (files, semi‑structured, mirrored sources), and into KQL databases for hot, real‑time analytics.
- Publish curated, conformed “gold” dimensional data into Warehouse (and/or Lakehouse gold tables) feeding Power BI semantic models via Import/Direct Lake, depending on size and freshness needs.
- Workload placement
Warehouse vs Lakehouse for BI metrics
- Favour Warehouse when
- Metrics rely on stable star/snowflake models, strict governance, and many concurrent BI users; Warehouse exposes a SQL‑optimized, enterprise EDW surface with integrated semantic-layer controls.
- You want fine‑grained permissions, predictable performance, and straightforward SQL development for BI teams familiar with classic warehouse patterns.
- Favour Lakehouse when
- Metrics are built atop wide Delta tables or evolving schemas from data science/engineering teams, or when mixing structured with semi‑structured sources.
- You prioritize Direct Lake from Delta, rapid iteration, and ML/feature engineering on the same physical data as BI, accepting looser warehouse discipline.
Implementing Direct Lake for Power BI
- When to use Direct Lake
- Implementation steps
- Model curated Delta tables in Fabric Lakehouse or Warehouse (gold layer) with proper partitioning and surrogate keys.
- In Power BI (within the Fabric workspace), create a semantic model using Direct Lake storage mode against the lakehouse/warehouse; tune relationships, aggregations, and fallback behaviour (DirectQuery fallback applies under certain feature/SKU constraints).
Syncing curated Warehouse tables from Lakehouse
- Pattern: Lakehouse gold → Warehouse
- Use Fabric Data Factory or Spark to materialize gold tables in Lakehouse, then use Copy activities or INSERT/MERGE into Warehouse tables pointing to the same or derived Delta data.
- For many tables, drive this with a config‑driven pipeline that loops over table metadata, applying truncate‑and‑load or upsert patterns depending on change volume.
- Practices that minimize disruption
- Avoid relying on the Lakehouse SQL Endpoint for tight SLAs; pull directly from Delta files when syncing to Warehouse to reduce latency and sync issues.
- Use transactional truncate‑then‑insert or swap‑table patterns, and coordinate Power BI semantic model refresh (or Direct Lake preload) so users always see consistent versions of the curated tables.
If you paste a short description of your current BI stack (model size, refresh SLAs, streaming sources), a concise next prompt is: “Map this environment onto a concrete Fabric design: which metrics in Warehouse vs Lakehouse, which use Direct Lake vs Import, and how to wire KQL for streaming.”
What are common hybrid patterns combining Warehouse and Lakehouse?
How to design data flows for real-time streaming into Fabric?
When to use materialized views in the Warehouse for BI metrics?
Steps to implement Direct Lake with large fact tables?
How to configure semantic models for Direct Lake fallback behavior?
Common hybrid patterns in Fabric use Lakehouse for ingestion/refinement, Warehouse for curated BI, and Direct Lake semantic models to read shared Delta data, while Real‑Time Intelligence handles streaming into KQL and/or Lakehouse.
Hybrid Warehouse + Lakehouse patterns
- Medallion + Warehouse “gold”
- Bronze/silver in Lakehouse (raw → refined Delta), then push curated gold tables into Warehouse for star schemas and governed BI, using SQL modeling, constraints, and indexing.
- Both layers sit on OneLake/Delta, so you can avoid duplication by sharing Delta files where possible and only materializing dimensional structures in Warehouse.
- Exploration in Lakehouse, production in Warehouse
Real‑time streaming data flows into Fabric
- End‑to‑end streaming path
- Ingest from Event Hubs/Kafka/IoT Hub, or from databases with CDC, into Eventstream in Real‑Time Intelligence.
- Route and transform events in Eventstream, then land them in KQL databases (for hot analytics) and optionally in Lakehouse Delta tables for historical storage and downstream Warehouse consumption.
- Design practices
When to use Warehouse materialized views for BI metrics
- Good use cases
- Frequently queried aggregations over large fact tables (e.g., daily sales, rolling 7‑day metrics) where precomputing improves query latency and cost.
- Reusable business metrics that multiple reports and semantic models depend on; materialized views centralize logic and can be indexed/optimized like core tables.
- Design tips
- Build materialized views on stable, partitioned fact tables; refresh them on a schedule aligned with data arrival (batch or micro‑batch) to keep BI SLAs predictable.
Implementing Direct Lake with large fact tables
- Core steps
- Store large fact tables as well‑partitioned Delta tables in Lakehouse or Warehouse (gold), with surrogate keys and sensible clustering.
- In Power BI (Fabric workspace), create a semantic model using Direct Lake storage mode pointed at those Delta tables; ensure you meet capacity guardrails (table/model size, row counts) to avoid forced fallback.
- Performance considerations
Configuring semantic models for Direct Lake fallback
- Understanding fallback behavior
- Direct Lake semantic models using SQL endpoints can fall back to DirectQuery when queries touch views or RLS/OLS‑protected tables, or exceed capacity guardrails.
- Direct Lake on OneLake models created from Power BI Desktop do not support fallback to DirectQuery; the Direct Lake behavior property is fixed and greyed out.
- Controlling and monitoring fallback
- In model properties, use the Direct Lake behavior setting (where available) to allow or disallow fallback, choosing between “Direct Lake and DirectQuery” vs “Direct Lake only.”
- Monitor fallback via the `info.DELTATABLEMETADATASTORAGES()` DMV and build a small Power BI report to track tables and reasons for fallback (e.g., RLS, views, missing tables), then refactor the offending objects.
If you share a brief description of your main fact tables (row counts, update cadence, RLS needs), a tight next prompt is: “Propose a concrete Direct Lake + Warehouse design for these facts, including which ones allow fallback, where to use materialized views, and how to stream hot data into KQL/Lakehouse.”
Sample notebook code for streaming ingestion with Spark in Fabric Lakehouse
A minimal Spark Structured Streaming “hello world” for Fabric Lakehouse is just a regular Delta stream with Fabric‑style paths and checkpointing.
Basic notebook pattern: rate source → Delta table
```python
from pyspark.sql import functions as F

# Lakehouse-relative paths (default lakehouse attached to the notebook)
table_name = "streaming_rate_demo"
checkpoint_path = f"Files/checkpoints/{table_name}"

# Simple streaming source (generates rows over time)
stream_df = (
    spark.readStream
    .format("rate")
    .option("rowsPerSecond", 5)
    .load()
)

# Optional: add business columns
stream_df = stream_df.withColumn("ingest_ts", F.current_timestamp())

# Write the stream into a Delta table in the Lakehouse
query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)  # registers as a Lakehouse table under Tables/
)

query.awaitTermination()  # blocks the cell while the stream runs
```
File-based streaming into Lakehouse Delta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

source_path = "Files/incoming_csv"  # folder monitored for new files
checkpoint_path = "Files/checkpoints/csv_stream"
target_table = "fact_events_stream"

schema = StructType([
    StructField("event_id", StringType(), False),
    StructField("user_id", StringType(), True),
    StructField("event_type", StringType(), True),
    StructField("value", IntegerType(), True),
    StructField("event_ts", StringType(), True),
])

stream_df = (
    spark.readStream
    .schema(schema)
    .option("header", True)
    .csv(source_path)
)

query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(target_table)
)

query.awaitTermination()
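To exercise this stream before wiring it to real data, drop CSV files matching the schema into the monitored folder. A standard-library sketch of a file generator (in Fabric you would target the Lakehouse's Files/incoming_csv path; any local folder works for a dry run):

```python
import csv
import os

HEADER = ["event_id", "user_id", "event_type", "value", "event_ts"]

def write_sample_batch(folder, batch_id, n=3):
    """Write one CSV file matching the fact_events_stream schema."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"events_{batch_id:05d}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for i in range(n):
            writer.writerow([f"{batch_id}-{i}", f"user_{i}", "click", i,
                             "2024-01-01T00:00:00Z"])
    return path

sample_file = write_sample_batch("incoming_csv_demo", 1)
```

Each new file that lands in the folder becomes one micro-batch in the stream above.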
Streaming with foreachBatch upsert into a gold table
from delta.tables import DeltaTable

source_table = "bronze_events_stream"
gold_table = "gold_events_agg"  # must already exist as a Delta table
checkpoint_path = "Files/checkpoints/bronze_to_gold"

# Stream new rows appended to the bronze Delta table
stream_df = (
    spark.readStream
    .format("delta")
    .table(source_table)
)

def upsert_to_gold(batch_df, batch_id):
    # Keep one row per key: MERGE fails if several source rows
    # match the same target row
    batch_df = batch_df.dropDuplicates(["event_id"])
    gold = DeltaTable.forName(spark, gold_table)
    (
        gold.alias("t")
        .merge(
            batch_df.alias("s"),
            "t.event_id = s.event_id"
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

query = (
    stream_df.writeStream
    .foreachBatch(upsert_to_gold)
    .option("checkpointLocation", checkpoint_path)
    .start()
)

query.awaitTermination()
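The merge semantics are easy to mis-read, so here is a plain-Python stand-in for whenMatchedUpdateAll/whenNotMatchedInsertAll on event_id (a model of the behavior, not Delta code). It also shows why each micro-batch is deduplicated first: Delta's MERGE errors out when several source rows match the same target row.

```python
def dedupe_latest(rows, key="event_id"):
    """Keep the last row per key; Delta MERGE rejects batches where
    several source rows match one target row."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())

def merge_upsert(target, batch_rows, key="event_id"):
    """Dict model of whenMatchedUpdateAll / whenNotMatchedInsertAll:
    matched keys are overwritten, unmatched keys are inserted."""
    for row in dedupe_latest(batch_rows, key):
        target[row[key]] = row
    return target

gold = {"e1": {"event_id": "e1", "value": 1}}
gold = merge_upsert(gold, [
    {"event_id": "e1", "value": 10},  # matched -> updated
    {"event_id": "e2", "value": 2},   # not matched -> inserted
])
```

If you need "first write wins" instead of "last write wins", change dedupe_latest to keep the earliest row per key.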
These patterns are valid as notebooks or as Spark Job Definitions bound to a Lakehouse in Fabric.156157
Show a minimal Spark Structured Streaming notebook for Fabric Lakehouse
A minimal Spark Structured Streaming notebook in Fabric Lakehouse can be just a single input stream writing to a Delta table in the attached Lakehouse.170
1. Simple rate stream → Lakehouse Delta table
from pyspark.sql import functions as F

# --- Config: Lakehouse-relative locations ---
table_name = "demo_stream_rate"
checkpoint_path = f"Files/checkpoints/{table_name}"

# --- Streaming source (built-in generator) ---
stream_df = (
    spark.readStream
    .format("rate")  # emits rows with 'timestamp' and 'value'
    .option("rowsPerSecond", 5)
    .load()
)

# Optional: add metadata
stream_df = stream_df.withColumn("ingest_ts", F.current_timestamp())

# --- Write stream into a Lakehouse Delta table ---
query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)  # creates/uses a Lakehouse table
)

query.awaitTermination()
2. Minimal “file drop” stream → Delta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

source_path = "Files/incoming"  # monitored folder in the Lakehouse
checkpoint_path = "Files/checkpoints/demo_csv_stream"
table_name = "demo_csv_stream"

schema = StructType([
    StructField("id", StringType(), False),
    StructField("value", IntegerType(), True),
])

stream_df = (
    spark.readStream
    .schema(schema)
    .option("header", True)
    .csv(source_path)
)

query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)
)

query.awaitTermination()
Paste either example into a Fabric notebook attached to a Lakehouse, run it, then start dropping files (for the second example) or just watch the rate stream append rows.
Example code to write streaming data to a Delta table in Fabric
How to enable Change Data Capture on Fabric Lakehouse tables; how to deploy a Spark Job Definition for a streaming notebook in Fabric
For Fabric, treat this as three layers: stream → Delta table → CDC/Job Definition.174
1) Minimal streaming write to a Delta table
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql import functions as F

# Lakehouse-relative locations
checkpoint_path = "Files/checkpoints/events_stream"
table_name = "events_stream"

schema = StructType([
    StructField("id", StringType(), False),
    StructField("event_type", StringType(), True),
    StructField("value", IntegerType(), True),
])

# Stream from a source (replace with Event Hubs/Kafka/etc.)
df = (
    spark.readStream
    .schema(schema)
    .option("header", True)
    .csv("Files/incoming_events")  # folder in the Lakehouse
    .withColumn("ingest_ts", F.current_timestamp())
)

# Write streaming data into a Delta table
query = (
    df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)  # creates/uses a Delta table
)

query.awaitTermination()
This is the same pattern as the Event Hubs → Delta example: writeStream.format("delta").option("checkpointLocation", "...").outputMode("append").toTable("deltaeventstable").174
2) Enabling CDC (Delta Change Data Feed) on Lakehouse tables
In Fabric, you use Delta Change Data Feed (CDF) rather than classic SQL CDC.175176
- Enable CDF by default for new tables:
spark.conf.set(
    "spark.microsoft.delta.properties.defaults.enableChangeDataFeed",
    "true"
)
All newly created Delta tables now have CDF enabled automatically.175
- Or enable CDF on an existing table:
spark.sql("""
    ALTER TABLE lakehouse_table_name
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")
- Read changes for incremental processing:
changes_df = (
    spark.read.format("delta")
    .option("readChangeData", "true")
    .option("startingVersion", 0)
    .table("lakehouse_table_name")
)
This exposes _change_type, _commit_version, _commit_timestamp for downstream ETL/streaming.176175
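Downstream consumers typically replay these change rows into the current state of the table. A pure-Python sketch of that fold, using the standard CDF _change_type values (insert, update_preimage, update_postimage, delete):

```python
def apply_changes(state, changes, key="id"):
    """Replay Delta Change Data Feed rows, ordered by commit version,
    into a {key: row} snapshot. update_preimage rows carry the old
    values and are skipped; inserts/postimages upsert; deletes remove."""
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        payload = {k: v for k, v in row.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            state[payload[key]] = payload
        elif change_type == "delete":
            state.pop(payload[key], None)
    return state

snapshot = apply_changes({}, [
    {"id": "a", "value": 1, "_change_type": "insert", "_commit_version": 1},
    {"id": "a", "value": 1, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": "a", "value": 5, "_change_type": "update_postimage", "_commit_version": 2},
    {"id": "b", "value": 2, "_change_type": "insert", "_commit_version": 3},
    {"id": "b", "value": 2, "_change_type": "delete", "_commit_version": 4},
])
```

In Spark you would express the same fold with a window over _commit_version; the dict version just makes the semantics explicit.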
3) Deploying a streaming notebook as a Spark Job Definition
Fabric guides you to move streaming code from notebooks into a Spark Job Definition (SJD) and wire that into orchestration.177178
- Extract notebook code to a .py file
- Put your streaming logic (like the first snippet) into stream_to_delta.py with a main() function and an if __name__ == "__main__": main() guard.177
- Create the Spark Job Definition in Fabric portal
- Configure compute and schedule
That gives you a production‑ready pattern: structured streaming into Delta, CDF for downstream increments, and Spark Job Definitions for durable execution.
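When the notebook logic moves into stream_to_delta.py, it helps to take source, table, and checkpoint locations as command-line arguments so one script serves dev and prod. A sketch of that parameterization with argparse (the argument names are my own convention; a Spark Job Definition simply passes command-line arguments through to the script):

```python
import argparse

def parse_job_args(argv=None):
    """Arguments a Spark Job Definition can pass to stream_to_delta.py."""
    parser = argparse.ArgumentParser(description="stream_to_delta job")
    parser.add_argument("--source-path", default="Files/incoming_events")
    parser.add_argument("--table-name", default="events_stream")
    parser.add_argument("--checkpoint-path",
                        default="Files/checkpoints/events_stream")
    return parser.parse_args(argv)

# Example: override just the table name, as a dev deployment might
args = parse_job_args(["--table-name", "events_stream_dev"])
```

main() then reads these values instead of hard-coded constants, so the same artifact can be promoted across workspaces with different arguments.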