Microsoft Fabric: what is it?
Microsoft Fabric is Microsoft’s unified, SaaS-based data and analytics platform that brings data engineering, data integration, lakehouse/warehouse storage, real-time analytics, data science, and Power BI into a single, OneLake-backed environment. It is positioned as an end‑to‑end, AI-powered data platform that replaces the usual patchwork of Azure Synapse, Data Factory, and separate lake/warehouse/BI stacks with one coherent surface.
Core idea
- All-in-one analytics: Covers ingestion, transformation, storage (lakehouse/warehouse), modeling, real-time processing, and BI in one product, exposed as persona-centric experiences (Data Engineering, Data Factory, Real-Time Intelligence, Data Science, Power BI, etc.).
- SaaS over OneLake: Everything runs as SaaS over OneLake, a logical data lake built on Azure Data Lake Storage (ADLS) that provides a single data plane with Delta/Parquet and “zero-copy” sharing across workloads.
Key components
- OneLake: Unified storage abstraction (lake-centric, Delta/Parquet-first) used by all Fabric workloads, enabling shared data across lakehouse, warehouse, and BI without duplication.
- Workloads: Data Factory for pipelines, Data Engineering (Spark notebooks, lakehouse), Data Warehouse (SQL-first), Real-Time Intelligence/Analytics for streaming, Data Science, and tightly integrated Power BI for semantic modeling and visualization.
- Copilot & AI: Embedded Copilot features assist with authoring pipelines, queries, and transformations, and with generating insights over Fabric artifacts.
Positioning and use cases
- Target: Enterprises that want a single Microsoft-native stack for modern analytics, reducing integration overhead between Azure Data Factory, Synapse, and Power BI, and centralizing governance with Purview-style controls.
- Typical workloads: Central analytics lake/warehouse, real-time dashboards, ML/AI over enterprise data, self-service BI over governed semantic models, and cross-M365 consumption (Excel, Teams, etc.).
How it differs from “classic” Azure data stack
- Unification vs. assembly: Instead of wiring ADF + ADLS + Synapse + Power BI yourself, Fabric presents a single product surface and capacity model.
- Lake-first semantics: Strong emphasis on an open, lake-centric architecture (Delta/Parquet, shortcuts, mirroring) rather than purely warehouse-centric designs.
If you want, a next step could be: “map Fabric concepts to an existing Synapse + ADF + Power BI architecture” or “an architecture comparison: Fabric vs Snowflake + dbt + Power BI” and derive migration patterns.
Security & Compliance Features
Microsoft Fabric provides a layered security architecture with the following capabilities:
| Feature | Description |
|---|---|
| Encryption | Data encrypted at rest and in transit by default |
| RBAC / IAM | Role-based access control at workspace, item, and row/column level; Entra ID (Azure AD) integration |
| MFA | Multi-factor authentication configurable for all users |
| Information Protection Labels | Microsoft Purview sensitivity labels applied to Fabric items for classification |
| Data Loss Prevention (DLP) | Rules to detect sensitive data (PII, credit card numbers, etc.) and block or alert on policy violations |
| Auditing & Logging | Detailed activity logs tracking user actions, data access, and changes for forensic/compliance reporting |
| Purview Integration | Lineage tracking, data cataloging, and governance policies via Microsoft Purview |
| Compliance Certifications | Supports GDPR, HIPAA, SOC 2, and other standards out of the box, with compliance reporting tools |
| Data Sovereignty | Multi-geo capacities for data residency requirements |
Migrating Azure Synapse Dedicated SQL to Fabric
Microsoft has built a Migration Assistant directly into Fabric to simplify the transition from Azure Synapse dedicated SQL pools:
Prerequisites
- Export a DACPAC file from your Synapse dedicated SQL pool (captures metadata: schema, views, stored procedures)
- Contributor/Admin permissions on the target Fabric workspace
- Read access to the source Synapse pool
Migration Steps
- Copy Metadata: In Fabric workspace → click Migrate → select "Analytical T-SQL warehouse" → upload DACPAC → name your new Fabric Warehouse → click Migrate
- Validate Schema: Review object deployment; T-SQL is largely compatible, with minor exceptions
- Copy Data: Use the built-in Copy Job wizard—connect to the source Synapse pool, select tables, map columns, choose one-time full copy mode, and run
Microsoft also offers free expert support via the Migration Factory program, plus DevOps, governance, and performance guidance in the official migration documentation.
Fabric REST API
Yes, Microsoft Fabric exposes a comprehensive REST API. The primary reference is the Microsoft Fabric REST API documentation.
Capabilities
- CRUD operations on pipelines, lakehouses, warehouses, semantic models, and other Fabric items
- Automation: Programmatically create, deploy, update, and delete objects, and trigger dataflows/pipelines
- Authentication: Uses Entra ID (Azure AD) tokens; from Fabric notebooks you can call `mssparkutils.credentials.getToken()` to obtain a bearer token
Example: Create a Lakehouse via API (Python)
```python
import requests

# Runs inside a Fabric notebook, where mssparkutils is available.
workspace_id = "<your-workspace-guid>"
url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/lakehouses"
token = mssparkutils.credentials.getToken("https://api.fabric.microsoft.com")
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
payload = {"displayName": "MyLakehouse"}

response = requests.post(url, headers=headers, json=payload)
# Expect 201 Created (some item types return 202 Accepted with an operation to poll)
```
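When the API returns 202 Accepted, Fabric points you at a long-running operation to poll. The sketch below is an illustrative polling helper, not part of the Fabric SDK; the status values assume the Running/Succeeded/Failed convention used by the Fabric REST API, and the getter is injectable so the logic can be exercised without real HTTP calls.

```python
import time

def poll_lro(get, operation_url, interval_s=2.0, max_attempts=30):
    """Poll a long-running operation until it leaves the Running state.

    `get` is any callable(url) -> dict with a "status" key, so this can be
    wired to requests (get=lambda u: requests.get(u, headers=headers).json())
    or to a stub in tests.
    """
    for _ in range(max_attempts):
        body = get(operation_url)
        status = body.get("status")
        if status not in ("Running", "NotStarted"):
            return status  # e.g. "Succeeded" or "Failed"
        time.sleep(interval_s)
    raise TimeoutError(f"Operation did not finish: {operation_url}")
```

With `requests`, you would typically pass `response.headers["Location"]` as `operation_url` after a 202 response.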
Suggested follow-ups:
- "Map Fabric RBAC model to existing Synapse + Power BI permissions" for a security migration plan.
- "Fabric REST API endpoints for semantic model deployment" if you want CI/CD automation patterns.
- "Compare Fabric DLP/Purview capabilities vs. Snowflake governance" for a cross-platform compliance view.
How to migrate data warehouses with minimal downtime to Fabric
Minimal-downtime migration to Microsoft Fabric is essentially a dual-running + cutover pattern: stand up Fabric in parallel, sync data incrementally, then switch workloads during a short outage window.
Principles for minimal downtime
- Run source and Fabric in parallel: Keep your existing warehouse (e.g., Synapse dedicated SQL) as the system of record while you build schemas and pipelines in Fabric.
- Use incremental sync, not one big bang: Do an initial full load to Fabric, then keep it in sync with change-only (CDC/incremental) loads until cutover.
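The full-load-then-incremental principle above reduces to simple watermark bookkeeping. This is an illustrative sketch in plain Python; the table and column names are hypothetical, and in practice the query would be issued by a Fabric pipeline or notebook:

```python
from datetime import datetime

def build_incremental_query(table, watermark_column, last_watermark):
    """Return a change-only extraction query for one sync cycle."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{last_watermark.isoformat()}'"
    )

def next_watermark(rows, watermark_column):
    """Advance the watermark to the max change timestamp seen in this batch."""
    return max(r[watermark_column] for r in rows)
```

Each cycle extracts rows newer than the stored watermark, loads them into Fabric, then persists the new watermark; at cutover the final cycle runs inside the short freeze window.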
High-level migration pattern
- Assess and prepare
- Schema-first migration
- Initial full data load
- Use Copy Job in Fabric Data Factory or Fabric pipelines to bulk load all historical data from the source warehouse (or via staging in ADLS/Blob as Parquet/CSV) into Fabric.
- For very large tables, partition loads (date or ID ranges) and leverage `COPY INTO` into the Fabric Warehouse to maximize throughput.
- Set up incremental synchronization
- Dual-running analytics
- Cutover with a short freeze
- Post-cutover safety net
Tactics that reduce downtime risk
- Metadata-driven pipelines: Use parameterized Fabric Data Factory pipelines to drive table-by-table loads and re-runs, simplifying incremental sync.
- Virtualization where possible: Use OneLake shortcuts or mirroring to virtually expose existing data without immediate full copies, then gradually materialize where needed.
- Environment parity: Recreate security (roles, row-level security, masking), maintenance jobs, and data quality checks in Fabric before cutover.
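A metadata-driven pipeline is just a table of load instructions expanded into per-table runs. The sketch below shows the shape of that expansion; the metadata entries and mode names are hypothetical examples, and in Fabric the resulting parameters would feed a parameterized Data Factory pipeline:

```python
# Hypothetical metadata driving a parameterized pipeline.
TABLES = [
    {"name": "dim_customer", "mode": "truncate_load", "watermark": None},
    {"name": "fact_sales", "mode": "incremental", "watermark": "modified_ts"},
]

def plan_runs(tables):
    """Expand load metadata into per-table pipeline run parameters."""
    runs = []
    for t in tables:
        params = {"table": t["name"], "mode": t["mode"]}
        if t["mode"] == "incremental":
            params["watermark_column"] = t["watermark"]
        runs.append(params)
    return runs
```

Adding a table to the sync then means adding one metadata row, not authoring a new pipeline.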
If you share more context
If you outline your current stack (e.g., “Synapse dedicated SQL + ADF + Power BI with CDC from SQL Server”), a concise next prompt could be: “Design a minimal-downtime migration plan from <current stack> to Microsoft Fabric, including detailed pipeline patterns and cutover sequence.”
Does Microsoft Fabric provide a public REST API for management?
Assess which Synapse components require refactoring for Fabric compatibility.
Best practices for near-zero-downtime data migration strategies.
How to design an incremental migration plan for large warehouses.
Tools to replicate live SQL data to Microsoft Fabric during cutover.
How to test and validate data consistency post-migration quickly.
Microsoft Fabric does expose public REST APIs for management, and a near‑zero‑downtime migration from Synapse to Fabric is typically done via refactor‑where‑needed, dual‑running, and incremental sync patterns.
1. Public REST APIs for management
- Microsoft publishes a Fabric REST API surface for automation and embedded analytics, covering workspaces, items (lakehouses, warehouses, semantic models), jobs, and admin operations.
- For pipelines specifically, Fabric Data Factory exposes REST endpoints to create, read, update, delete, and run pipelines on demand, authenticated via Entra ID bearer tokens.
- Example run‑pipeline call: POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{itemId}/jobs/instances?jobType=Pipeline with Authorization: Bearer <access-token>.
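The run-pipeline endpoint above can be assembled with a small helper; this is an illustrative sketch (the helper name is ours, not from the Fabric SDK), with the request construction separated from the send so it can be checked without network access:

```python
def build_run_pipeline_request(workspace_id, item_id, token):
    """Assemble URL and headers for the on-demand pipeline job endpoint."""
    url = (
        "https://api.fabric.microsoft.com/v1/workspaces/"
        f"{workspace_id}/items/{item_id}/jobs/instances?jobType=Pipeline"
    )
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```

Sending it is then one call, e.g. `requests.post(url, headers=headers)`, with the job instance status available afterwards via the jobs API.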
2. Synapse components that require refactoring
For a Synapse‑to‑Fabric move, the main refactor hotspots are:
- Mapping Data Flows: Synapse Mapping Data Flows are not directly available in Fabric and must be re‑implemented using Dataflows Gen2, Spark notebooks, or Data Factory transformations.
- T‑SQL incompatibilities: Certain dedicated‑pool T‑SQL features (some system views, DW‑specific hints, distribution directives, CTAS patterns) require adjustment for Fabric Data Warehouse semantics.
- Orchestration: ADF/Synapse pipelines with legacy activities (SSIS, certain custom activities, some integration runtime patterns) need translation to Fabric Data Factory equivalents or external orchestration.
- Security & governance: Synapse RBAC, row‑level security, masking, and Purview integration need re‑expression using Fabric’s workspace roles, object‑level permissions, and Purview‑backed governance.
A practical approach is to generate an inventory and compatibility report (via Microsoft’s Migration Assistant and/or partner tools) that tags objects as “lift‑and‑shift”, “minor refactor”, or “redesign”.
3. Near‑zero‑downtime migration strategies
Best‑practice patterns for near‑zero‑downtime revolve around parallel running + continuous sync:
- Parallel environments: Keep Synapse as the system of record while the Fabric Warehouse/Lakehouse is built and backfilled; only cut over once validated.
- Initial bulk load + CDC: Perform a full historical load into Fabric, then use change‑only pipelines (CDC/change-tracking timestamps) to keep Fabric nearly real‑time.
- Phased cutover: Move non‑critical workloads and reports first, then mission‑critical ones in a controlled, short read‑only window on the source.
Microsoft and partners explicitly recommend parallel‑running strategies to achieve “minimal or no downtime” for Fabric migrations.
4. Designing an incremental migration plan for large warehouses
A workable incremental plan for large Synapse warehouses typically has these stages:
- Stage 1 – Discovery and slicing
- Stage 2 – Schema and object migration
- Stage 3 – Initial backfill
- Stage 4 – Incremental sync
- Stage 5 – Workload migration and cutover
5. Tools to replicate live SQL data into Fabric
For live or near‑real‑time replication during cutover, typical tools/patterns include:
- Fabric Data Factory
- Third‑party CDC tools
- Striim, Fivetran, Qlik, Debezium‑based platforms, and similar log‑based CDC tools can stream changes into ADLS/OneLake or directly into Fabric sinks for minimal‑lag replication.
- Some Fabric‑focused migration accelerators (e.g., Kanerika FLIP) automate converting ADF/Synapse pipelines and validating row counts and checksums in parallel.
- Database‑native replication
6. Fast data consistency testing post‑migration
Efficient validation focuses on automated, aggregate‑level checks plus targeted deep dives:
- Row counts and checksums
- Business KPI parity
- Data quality and lineage checks
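The row-count and checksum check above can be made order-independent so the same function runs against source and target query results. This is an illustrative sketch (`table_fingerprint` is a hypothetical helper, not a Fabric or Synapse API):

```python
import hashlib

def table_fingerprint(rows, key_columns):
    """Order-independent (count, checksum) for a result set.

    XOR-folding per-row SHA-256 digests makes the checksum insensitive to
    row order, so source and target extracts can be compared directly.
    """
    digest = 0
    count = 0
    for row in rows:
        canonical = "|".join(str(row[c]) for c in key_columns)
        digest ^= int.from_bytes(
            hashlib.sha256(canonical.encode()).digest()[:8], "big"
        )
        count += 1
    return count, digest
```

Run it over the same SELECT on both sides; equal tuples mean the tables agree on those columns, and a mismatch localizes which table needs a deep-dive diff.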
If you share a concise description of your current Synapse/SQL topology (sizes, CDC mechanisms, SLAs), a tight next prompt would be: “Design a Fabric migration blueprint for <current environment> with concrete APIs, pipeline patterns, and CDC tool choices to achieve <RPO/RTO>.”
WHAT OTHER QUESTIONS SHOULD I BE ASKING?
You’re already asking the right migration questions; the next leverage points are around architecture, governance, operations, and vendor lock‑in in Fabric.
Architecture & capabilities
- How do Fabric Warehouse vs Lakehouse vs Real‑Time Analytics map to my current patterns (batch, streaming, ML, BI), and where should each land?
- What limits/quotas (capacity SKUs, concurrency, file sizes, streaming throughput) affect my largest workloads, and how do they compare with current Synapse limits?
Security, governance, and compliance
- How do Fabric’s workspace/scoped RBAC, RLS, CLS, and Purview integration map to my existing security and compliance controls (HIPAA/SOC 2/etc.)?
- What is my strategy for multi‑tenant isolation, data residency, and BYOK/encryption policies inside Fabric capacities?
Reliability, SRE, and operations
- How will observability work: logs, metrics, traces, and cost telemetry for Fabric workloads, and how do these integrate with my existing monitoring stack?
- What SLOs (latency, freshness, availability) will be defined for Fabric, and how will incident response, rollback, and disaster recovery be handled?
CI/CD, APIs, and platform integration
- How do I standardize CI/CD for Fabric artifacts (warehouse schema, lakehouse, semantic models, pipelines) using REST APIs, Git integration, and IaC‑like patterns?
- Which management-plane operations are covered by public Fabric REST APIs vs still requiring portal/manual steps, and what’s my automation strategy around that?
Cost, performance, and lock‑in
- What is the cost model for Fabric capacities vs current Synapse/ADF/Power BI, and how do I set guardrails to prevent runaway spend?
- How “open” is my data (Delta/Parquet + OneLake shortcuts), and what is my contingency plan to move workloads off Fabric if needed in the future?
If you want a focused next step, you could ask: “Generate a Fabric adoption checklist for an enterprise Synapse environment, with sections for architecture, security, CI/CD, SRE, and cost management.”
Here’s a Fabric adoption checklist structured around the questions you highlighted, tuned for a Synapse‑heavy shop.
Architecture & capabilities
- Map workloads to Fabric primitives
- Define which workloads live in Warehouse (SQL‑first EDW, governed reporting), Lakehouse (multi‑engine, ML, large/cheap storage), and Real‑Time Analytics/KQL (streaming, log/IoT analytics).
- For each current pattern (batch ETL, streaming, ML, ad‑hoc BI), decide whether it lands in Warehouse, Lakehouse, or KQL DBs, and document reference architectures for each path.
- Understand capacity limits and performance envelopes
- Document the Fabric capacity SKUs you plan to use and their implications: Spark VCores per Capacity Unit, max concurrent Spark jobs, queue limits, and burst factors.
- Confirm there are no hard “user concurrency per SKU” benchmarks, and plan to use the Capacity Metrics App to tune concurrency and scale decisions instead of static numbers.
Security, governance, and compliance
- Map identity, RBAC, RLS/CLS, and Purview
- Align Entra ID groups and Fabric workspace roles with existing Synapse/Power BI roles; design patterns for row‑level and object‑level security in Warehouse and Lakehouse.
- Define how Purview catalogs, lineage, DLP, and sensitivity labels in Fabric will mirror or replace current governance controls for HIPAA/SOC 2/GDPR workloads.
- Tenancy, residency, and encryption strategy
Reliability, SRE, and operations
- Observability and cost telemetry
- Standardize where Fabric logs and metrics go (Log Analytics, SIEM, APM) and how you will monitor query performance, Spark job health, and capacity utilization.
- Decide on cost‑monitoring patterns (Capacity Metrics App, tagging, budget alerts) and how they integrate into your existing FinOps dashboards.
- SLOs, incident response, and DR
CI/CD, APIs, and platform integration
- Standardize CI/CD and artifact lifecycle
- Decide how Warehouse schemas, Lakehouse structures, notebooks, and semantic models are versioned and deployed (Fabric Git integration, REST APIs, external DevOps pipelines).
- Define promotion flows (dev → test → prod) and how to handle schema migrations and data backfills as code rather than ad‑hoc portal actions.
- Clarify management surface vs manual ops
- Enumerate which management operations are available via public Fabric REST APIs (workspaces, items, jobs, pipelines) vs still requiring portal/PowerShell.
- Build an automation strategy around those APIs for provisioning, deployments, scheduled operations, and access management, so Fabric behaves like a first‑class platform in your stack.
Cost, performance, and lock‑in
- Capacity and cost guardrails
- Compare expected Fabric capacity costs (CUs/SKUs, pay‑as‑you‑go vs reserved) to current Synapse/ADF/Power BI spend, including overprovisioning and idle time.
- Define limits and governance: who can create new capacities/workspaces, what budget alerts exist, and how to regularly review heavy users and jobs.
- Openness and exit strategy
- Confirm that critical data is stored as Delta/Parquet in OneLake and accessible via open formats/engines, and list the external systems that must be able to read it directly.
- Sketch a contingency plan: if moving off Fabric, how would you extract metadata and data (e.g., Delta files, shortcuts, Parquet exports) into another lake/warehouse without major downtime?
If you share a brief description of your current Synapse/ADF/Power BI topology and regulatory constraints, the next prompt could be: “Given this environment, fill in this checklist with concrete design decisions and recommended Fabric SKUs, security models, and CI/CD patterns.”
Which workloads suit Fabric Warehouse versus Lakehouse for BI?
How to design a hybrid Warehouse plus Lakehouse in Fabric?
Best practices for real‑time analytics with Fabric KQL?
Capacity and quota limits for Fabric SKUs F64 to F1024?
How to size Fabric capacity for concurrent Spark jobs?
Fabric Warehouse is generally the BI-facing, modeled SQL layer, while Lakehouse is the engineering/ML/landing and refinement layer, with KQL for real‑time/log workloads; capacity sizing then depends on your Spark and query concurrency profile across those layers.
Warehouse vs Lakehouse for BI
- Use Fabric Warehouse when
- You have conformed, relational models (star/snowflake) that need strong governance, fine‑grained security, and high‑performance SQL for large BI user populations.
- Workloads are primarily Power BI/SQL‑centric, with predictable schemas and SLAs for reporting; Warehouse is optimized as the “gold” layer for curated BI datasets.
- Use Lakehouse when
- You need to ingest diverse/raw data (files, semi‑structured, large historical sets), run Spark‑based transformations, or support data science and ML on top of Delta tables.
- BI can still sit on Lakehouse via the SQL endpoint/Direct Lake, but this is usually for exploratory/“silver” and advanced analytics rather than tightly governed EDW‑style reporting.
Designing a hybrid Warehouse + Lakehouse
- Canonical pattern
- Design tips
- Keep heavy ETL, ML, and mixed‑format data in Lakehouse; keep dimensional models and SLA‑driven BI in Warehouse, but avoid redundant copies by sharing Delta where possible.
- Define clear ownership: data engineering owns Lakehouse layers; BI/analytics owns Warehouse models and semantic models, with contracts between them.
Real‑time analytics with Fabric KQL
- When to use KQL
- Best practices
- Model data as narrow, time‑partitioned tables and use summarization over time windows instead of row‑by‑row analysis; leverage KQL’s time‑series and window functions.
- Treat KQL as the “hot” store and periodically archive or downsample into Lakehouse/Warehouse for long‑term storage and historical BI.
Capacity/quotas for F64–F1024 (Spark focus)
From the Spark concurrency documentation table:
| SKU | Spark VCores (base) | Max Spark VCores with burst (3×) | Queue limit (Spark jobs) |
|---|---|---|---|
| F64 | 128 | 384 | 64 |
| F128 | 256 | 768 | 128 |
| F256 | 512 | 1536 | 256 |
| F512 | 1024 | 3072 | 512 |
| F1024 | 2048 | 6144 | 1024 |
- Burst lets you use up to 3× the base Spark VCores for concurrency across jobs, but does not change the base allocation used for capacity billing.
- Queue limit is the maximum number of Spark jobs that can be waiting when all cores are in use; beyond that, submissions are rejected.
Sizing Fabric for concurrent Spark jobs
- Translate workloads into VCore demand
- For each Spark workload type (ETL notebooks, streaming jobs, ML training), estimate required cores per job (e.g., small: 4–8, medium: 16–32, large: 64+).
- Multiply by expected concurrent jobs to get total active VCore demand, then map to SKU base VCores and treat 3× burst as a concurrency buffer, not a guarantee.
- Practical rules of thumb
- If most jobs are medium (16–32 VCores) and you need, say, 8–10 concurrent jobs, an F64 (128 base) is usually tight; F128 (256 base) is safer for headroom and burst.
- Use multiple smaller pools (e.g., 8–16 VCore configs) to increase concurrency without starving other jobs, and monitor queue lengths and job runtimes to iteratively adjust SKU and pool sizing.
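The sizing arithmetic above can be sketched directly against the SKU table. This is an illustrative heuristic, not Microsoft sizing guidance; the 25% headroom factor is an assumption, and real sizing should be validated with the Capacity Metrics App:

```python
# Base Spark VCores per SKU, from the table above.
SKU_BASE_VCORES = {"F64": 128, "F128": 256, "F256": 512, "F512": 1024, "F1024": 2048}

def smallest_sku(concurrent_jobs, vcores_per_job, headroom=1.25):
    """Pick the smallest SKU whose base VCores cover steady-state demand.

    Burst (3x) is deliberately excluded: it is a buffer for spikes,
    not guaranteed steady-state capacity.
    """
    demand = concurrent_jobs * vcores_per_job * headroom
    for sku, base in sorted(SKU_BASE_VCORES.items(), key=lambda kv: kv[1]):
        if base >= demand:
            return sku
    return None  # demand exceeds F1024 base; consider multiple capacities
```

For example, 8 concurrent medium jobs at 24 VCores each gives a demand of 240 VCores with headroom, which lands on F128 rather than F64, matching the rule of thumb above.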
If you share representative job counts, typical VCore sizes, and latency expectations, a good next prompt is: “Design a Fabric capacity plan (SKU + pool configs) for this Spark workload mix, and map each workload to Warehouse, Lakehouse, or KQL.”
Design a hybrid Fabric architecture for BI and streaming
When should I use Warehouse vs Lakehouse for BI metrics?
How to implement Direct Lake for Power BI in Fabric?
Best practices for syncing curated warehouse tables from lakehouse?
A solid hybrid Fabric design uses Lakehouse for ingestion/refinement, Warehouse for curated BI, Direct Lake for Power BI, and KQL for streaming, all over OneLake.
Hybrid Fabric architecture for BI + streaming
- Data flow
- Land raw/operational and streaming data into Lakehouse bronze/silver layers (files, semi‑structured, mirrored sources), and into KQL databases for hot, real‑time analytics.
- Publish curated, conformed “gold” dimensional data into Warehouse (and/or Lakehouse gold tables) feeding Power BI semantic models via Import/Direct Lake, depending on size and freshness needs.
- Workload placement
Warehouse vs Lakehouse for BI metrics
- Favour Warehouse when
- Metrics rely on stable star/snowflake models, strict governance, and many concurrent BI users; Warehouse exposes a SQL‑optimized, enterprise EDW surface with integrated semantic-layer controls.
- You want fine‑grained permissions, predictable performance, and straightforward SQL development for BI teams familiar with classic warehouse patterns.
- Favour Lakehouse when
- Metrics are built atop wide Delta tables or evolving schemas from data science/engineering teams, or when mixing structured with semi‑structured sources.
- You prioritize Direct Lake from Delta, rapid iteration, and ML/feature engineering on the same physical data as BI, accepting looser warehouse discipline.
Implementing Direct Lake for Power BI
- When to use Direct Lake
- Implementation steps
- Model curated Delta tables in Fabric Lakehouse or Warehouse (gold layer) with proper partitioning and surrogate keys.
- In Power BI (within the Fabric workspace), create a semantic model using Direct Lake storage mode against the lakehouse/warehouse; tune relationships, aggregations, and fallback behaviour (DirectQuery fallback applies under certain feature/SKU constraints).
Syncing curated Warehouse tables from Lakehouse
- Pattern: Lakehouse gold → Warehouse
- Use Fabric Data Factory or Spark to materialize gold tables in Lakehouse, then use Copy activities or INSERT/MERGE into Warehouse tables pointing to the same or derived Delta data.
- For many tables, drive this with a config‑driven pipeline that loops over table metadata, applying truncate‑and‑load or upsert patterns depending on change volume.
- Practices that minimize disruption
- Avoid relying on the Lakehouse SQL Endpoint for tight SLAs; pull directly from Delta files when syncing to Warehouse to reduce latency and sync issues.
- Use transactional truncate‑then‑insert or swap‑table patterns, and coordinate Power BI semantic model refresh (or Direct Lake preload) so users always see consistent versions of the curated tables.
If you paste a short description of your current BI stack (model size, refresh SLAs, streaming sources), a concise next prompt is: “Map this environment onto a concrete Fabric design: which metrics in Warehouse vs Lakehouse, which use Direct Lake vs Import, and how to wire KQL for streaming.”
What are common hybrid patterns combining Warehouse and Lakehouse?
How to design data flows for real-time streaming into Fabric?
When to use materialized views in the Warehouse for BI metrics?
Steps to implement Direct Lake with large fact tables?
How to configure semantic models for Direct Lake fallback behavior?
Common hybrid patterns in Fabric use Lakehouse for ingestion/refinement, Warehouse for curated BI, and Direct Lake semantic models to read shared Delta data, while Real‑Time Intelligence handles streaming into KQL and/or Lakehouse.
Hybrid Warehouse + Lakehouse patterns
- Medallion + Warehouse “gold”
- Bronze/silver in Lakehouse (raw → refined Delta), then push curated gold tables into Warehouse for star schemas and governed BI, using SQL modeling, constraints, and indexing.
- Both layers sit on OneLake/Delta, so you can avoid duplication by sharing Delta files where possible and only materializing dimensional structures in Warehouse.
- Exploration in Lakehouse, production in Warehouse
Real‑time streaming data flows into Fabric
- End‑to‑end streaming path
- Ingest from Event Hubs/Kafka/IoT Hub, or from databases with CDC, into Eventstream in Real‑Time Intelligence.
- Route and transform events in Eventstream, then land them in KQL databases (for hot analytics) and optionally in Lakehouse Delta tables for historical storage and downstream Warehouse consumption.
- Design practices
When to use Warehouse materialized views for BI metrics
- Good use cases
- Frequently queried aggregations over large fact tables (e.g., daily sales, rolling 7‑day metrics) where precomputing improves query latency and cost.
- Reusable business metrics that multiple reports and semantic models depend on; materialized views centralize logic and can be indexed/optimized like core tables.
- Design tips
- Build materialized views on stable, partitioned fact tables; refresh them on a schedule aligned with data arrival (batch or micro‑batch) to keep BI SLAs predictable.
Implementing Direct Lake with large fact tables
- Core steps
- Store large fact tables as well‑partitioned Delta tables in Lakehouse or Warehouse (gold), with surrogate keys and sensible clustering.
- In Power BI (Fabric workspace), create a semantic model using Direct Lake storage mode pointed at those Delta tables; ensure you meet capacity guardrails (table/model size, row counts) to avoid forced fallback.
- Performance considerations
Configuring semantic models for Direct Lake fallback
- Understanding fallback behavior
- Direct Lake semantic models using SQL endpoints can fall back to DirectQuery when queries touch views or RLS/OLS‑protected tables, or exceed capacity guardrails.
- Direct Lake on OneLake models created from Power BI Desktop do not support fallback to DirectQuery; the Direct Lake behavior property is fixed and greyed out.
- Controlling and monitoring fallback
- In model properties, use the Direct Lake behavior setting (where available) to allow or disallow fallback, choosing between “Direct Lake and DirectQuery” vs “Direct Lake only.”
- Monitor fallback via the `info.DELTATABLEMETADATASTORAGES()` DMV and build a small Power BI report to track tables and reasons for fallback (e.g., RLS, views, missing tables), then refactor the offending objects.
If you share a brief description of your main fact tables (row counts, update cadence, RLS needs), a tight next prompt is: “Propose a concrete Direct Lake + Warehouse design for these facts, including which ones allow fallback, where to use materialized views, and how to stream hot data into KQL/Lakehouse.”
Sample notebook code for streaming ingestion with Spark in Fabric Lakehouse
A minimal Spark Structured Streaming “hello world” for Fabric Lakehouse is just a regular Delta stream with Fabric‑style paths and checkpointing.
Basic notebook pattern: rate source → Delta table
```python
from pyspark.sql import functions as F

# Lakehouse-relative paths (default lakehouse attached to the notebook)
table_name = "streaming_rate_demo"
checkpoint_path = f"Files/checkpoints/{table_name}"

# Simple streaming source (generates rows over time)
stream_df = (
    spark.readStream
    .format("rate")
    .option("rowsPerSecond", 5)
    .load()
)

# Optional: add business columns
stream_df = stream_df.withColumn("ingest_ts", F.current_timestamp())

# Write the stream into a Delta table in the Lakehouse
query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)  # registers as a Lakehouse table under Tables/
)

query.awaitTermination()  # blocks the cell while the stream runs
```
File-based streaming into Lakehouse Delta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

source_path = "Files/incoming_csv"  # folder monitored for new files
checkpoint_path = "Files/checkpoints/csv_stream"
target_table = "fact_events_stream"

schema = StructType([
    StructField("event_id", StringType(), False),
    StructField("user_id", StringType(), True),
    StructField("event_type", StringType(), True),
    StructField("value", IntegerType(), True),
    StructField("event_ts", StringType(), True),
])

stream_df = (
    spark.readStream
    .schema(schema)
    .option("header", True)
    .csv(source_path)
)

query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(target_table)
)

query.awaitTermination()
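To exercise this stream before wiring it to real data, drop CSV files matching the schema into the monitored folder. A standard-library sketch of a file generator (in Fabric you would target the Lakehouse's Files/incoming_csv path; any local folder works for a dry run):

```python
import csv
import os

HEADER = ["event_id", "user_id", "event_type", "value", "event_ts"]

def write_sample_batch(folder, batch_id, n=3):
    """Write one CSV file matching the fact_events_stream schema."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"events_{batch_id:05d}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for i in range(n):
            writer.writerow([f"{batch_id}-{i}", f"user_{i}", "click", i,
                             "2024-01-01T00:00:00Z"])
    return path

sample_file = write_sample_batch("incoming_csv_demo", 1)
```

Each new file that lands in the folder becomes one micro-batch in the stream above.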
Streaming with foreachBatch upsert into a gold table
from delta.tables import DeltaTable

source_table = "bronze_events_stream"
gold_table = "gold_events_agg"  # must already exist as a Delta table
checkpoint_path = "Files/checkpoints/bronze_to_gold"

# Stream new rows appended to the bronze Delta table
stream_df = (
    spark.readStream
    .format("delta")
    .table(source_table)
)

def upsert_to_gold(batch_df, batch_id):
    # Keep one row per key: MERGE fails if several source rows
    # match the same target row
    batch_df = batch_df.dropDuplicates(["event_id"])
    gold = DeltaTable.forName(spark, gold_table)
    (
        gold.alias("t")
        .merge(
            batch_df.alias("s"),
            "t.event_id = s.event_id"
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

query = (
    stream_df.writeStream
    .foreachBatch(upsert_to_gold)
    .option("checkpointLocation", checkpoint_path)
    .start()
)

query.awaitTermination()
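The merge semantics are easy to mis-read, so here is a plain-Python stand-in for whenMatchedUpdateAll/whenNotMatchedInsertAll on event_id (a model of the behavior, not Delta code). It also shows why each micro-batch is deduplicated first: Delta's MERGE errors out when several source rows match the same target row.

```python
def dedupe_latest(rows, key="event_id"):
    """Keep the last row per key; Delta MERGE rejects batches where
    several source rows match one target row."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())

def merge_upsert(target, batch_rows, key="event_id"):
    """Dict model of whenMatchedUpdateAll / whenNotMatchedInsertAll:
    matched keys are overwritten, unmatched keys are inserted."""
    for row in dedupe_latest(batch_rows, key):
        target[row[key]] = row
    return target

gold = {"e1": {"event_id": "e1", "value": 1}}
gold = merge_upsert(gold, [
    {"event_id": "e1", "value": 10},  # matched -> updated
    {"event_id": "e2", "value": 2},   # not matched -> inserted
])
```

If you need "first write wins" instead of "last write wins", change dedupe_latest to keep the earliest row per key.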
These patterns are valid as notebooks or as Spark Job Definitions bound to a Lakehouse in Fabric.156157
Show a minimal Spark Structured Streaming notebook for Fabric Lakehouse
A minimal Spark Structured Streaming notebook in Fabric Lakehouse can be just a single input stream writing to a Delta table in the attached Lakehouse.170
1. Simple rate stream → Lakehouse Delta table
from pyspark.sql import functions as F

# --- Config: Lakehouse-relative locations ---
table_name = "demo_stream_rate"
checkpoint_path = f"Files/checkpoints/{table_name}"

# --- Streaming source (built-in generator) ---
stream_df = (
    spark.readStream
    .format("rate")  # emits rows with 'timestamp' and 'value'
    .option("rowsPerSecond", 5)
    .load()
)

# Optional: add metadata
stream_df = stream_df.withColumn("ingest_ts", F.current_timestamp())

# --- Write stream into a Lakehouse Delta table ---
query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)  # creates/uses a Lakehouse table
)

query.awaitTermination()
2. Minimal “file drop” stream → Delta
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

source_path = "Files/incoming"  # monitored folder in the Lakehouse
checkpoint_path = "Files/checkpoints/demo_csv_stream"
table_name = "demo_csv_stream"

schema = StructType([
    StructField("id", StringType(), False),
    StructField("value", IntegerType(), True),
])

stream_df = (
    spark.readStream
    .schema(schema)
    .option("header", True)
    .csv(source_path)
)

query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)
)

query.awaitTermination()
Paste either example into a Fabric notebook attached to a Lakehouse, run it, then start dropping files (for the second example) or just watch the rate stream append rows.
Example code to write streaming data to a Delta table in Fabric
How to enable Change Data Capture on Fabric Lakehouse tables; how to deploy a Spark Job Definition for a streaming notebook in Fabric
For Fabric, treat this as three layers: stream → Delta table → CDC/Job Definition.174
1) Minimal streaming write to a Delta table
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql import functions as F

# Lakehouse-relative locations
checkpoint_path = "Files/checkpoints/events_stream"
table_name = "events_stream"

schema = StructType([
    StructField("id", StringType(), False),
    StructField("event_type", StringType(), True),
    StructField("value", IntegerType(), True),
])

# Stream from a source (replace with Event Hubs/Kafka/etc.)
df = (
    spark.readStream
    .schema(schema)
    .option("header", True)
    .csv("Files/incoming_events")  # folder in the Lakehouse
    .withColumn("ingest_ts", F.current_timestamp())
)

# Write streaming data into a Delta table
query = (
    df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_path)
    .toTable(table_name)  # creates/uses a Delta table
)

query.awaitTermination()
This is the same pattern as the Event Hubs → Delta example: writeStream.format("delta").option("checkpointLocation", "...").outputMode("append").toTable("deltaeventstable").174
2) Enabling CDC (Delta Change Data Feed) on Lakehouse tables
In Fabric, you use Delta Change Data Feed (CDF) rather than classic SQL CDC.175176
- Enable CDF by default for new tables:
spark.conf.set(
    "spark.microsoft.delta.properties.defaults.enableChangeDataFeed",
    "true"
)
All newly created Delta tables now have CDF enabled automatically.175
- Or enable CDF on an existing table:
spark.sql("""
    ALTER TABLE lakehouse_table_name
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")
- Read changes for incremental processing:
changes_df = (
    spark.read.format("delta")
    .option("readChangeData", "true")
    .option("startingVersion", 0)
    .table("lakehouse_table_name")
)
This exposes _change_type, _commit_version, _commit_timestamp for downstream ETL/streaming.176175
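Downstream consumers typically replay these change rows into the current state of the table. A pure-Python sketch of that fold, using the standard CDF _change_type values (insert, update_preimage, update_postimage, delete):

```python
def apply_changes(state, changes, key="id"):
    """Replay Delta Change Data Feed rows, ordered by commit version,
    into a {key: row} snapshot. update_preimage rows carry the old
    values and are skipped; inserts/postimages upsert; deletes remove."""
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        payload = {k: v for k, v in row.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            state[payload[key]] = payload
        elif change_type == "delete":
            state.pop(payload[key], None)
    return state

snapshot = apply_changes({}, [
    {"id": "a", "value": 1, "_change_type": "insert", "_commit_version": 1},
    {"id": "a", "value": 1, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": "a", "value": 5, "_change_type": "update_postimage", "_commit_version": 2},
    {"id": "b", "value": 2, "_change_type": "insert", "_commit_version": 3},
    {"id": "b", "value": 2, "_change_type": "delete", "_commit_version": 4},
])
```

In Spark you would express the same fold with a window over _commit_version; the dict version just makes the semantics explicit.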
3) Deploying a streaming notebook as a Spark Job Definition
Fabric guides you to move streaming code from notebooks into a Spark Job Definition (SJD) and wire that into orchestration.177178
- Extract notebook code to a .py file
- Put your streaming logic (like the first snippet) into stream_to_delta.py with a main() function and an if __name__ == "__main__": main() guard.177
- Create the Spark Job Definition in Fabric portal
- Configure compute and schedule
That gives you a production‑ready pattern: structured streaming into Delta, CDF for downstream increments, and Spark Job Definitions for durable execution.
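When the notebook logic moves into stream_to_delta.py, it helps to take source, table, and checkpoint locations as command-line arguments so one script serves dev and prod. A sketch of that parameterization with argparse (the argument names are my own convention; a Spark Job Definition simply passes command-line arguments through to the script):

```python
import argparse

def parse_job_args(argv=None):
    """Arguments a Spark Job Definition can pass to stream_to_delta.py."""
    parser = argparse.ArgumentParser(description="stream_to_delta job")
    parser.add_argument("--source-path", default="Files/incoming_events")
    parser.add_argument("--table-name", default="events_stream")
    parser.add_argument("--checkpoint-path",
                        default="Files/checkpoints/events_stream")
    return parser.parse_args(argv)

# Example: override just the table name, as a dev deployment might
args = parse_job_args(["--table-name", "events_stream_dev"])
```

main() then reads these values instead of hard-coded constants, so the same artifact can be promoted across workspaces with different arguments.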