context graph vs knowledge graph
A knowledge graph is a persistent, global model of entities and relationships; a context graph is a small, dynamic subgraph (or overlay) that represents what is relevant “right now” for a specific task, agent, or decision.
Core definitions
- Knowledge graph: Long‑lived graph of entities, relations, and sometimes rules or constraints, typically represented as triples $(subject, predicate, object)$, used as a semantic backbone across systems and time.
- Context graph: Task‑ or query‑specific graph capturing current user intent, recent events, applicable policies, and the subset of facts relevant to the ongoing interaction.
Structural and lifecycle differences
- Scope & size
- Temporal behavior
Role in AI agent systems
- Knowledge graph → “semantic backbone” / long‑term memory
- Context graph → “working state of now” / control surface
Governance and decision semantics
Some authors distinguish knowledge graph (what exists, factual relations) from governed context graph (what is allowed, under which authority, in this situation).
- Knowledge graph:
- Governed context graph:
How they work together in an agentic stack
In a typical multi‑agent architecture:
- The knowledge graph sits under everything as the global world model and institutionally governed truth layer.
- A context engine (or context graph builder) pulls a query‑driven subgraph, scored by relevance and filtered by policy, and exposes it as a compact context graph for the current agent turn or workflow.
- This context graph is then serialized (often as structured text or JSON) into the model’s prompt, giving it focused, policy‑aware grounding while keeping tokens in check and reducing hallucinations.
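The serialization step in the last bullet can be sketched in a few lines. This is illustrative, not a fixed API: `serialize_context_graph`, the character cap, and the sample subgraph are assumptions layered on the node/edge shapes defined later in this document.

```python
import json

def serialize_context_graph(subgraph: dict, max_chars: int = 4000) -> str:
    """Render a context graph as compact JSON for prompt injection.

    `subgraph` follows a {"nodes": [...], "edges": [...]} shape; the
    whitespace-free separators and the character cap keep tokens in check.
    """
    text = json.dumps(subgraph, separators=(",", ":"), sort_keys=True)
    return text[:max_chars]

# Hypothetical example data in the node/edge shape used throughout this doc
subgraph = {
    "nodes": [{"id": "case:1", "type": "Case", "name": "Login failure"}],
    "edges": [{"id": "e1", "type": "RAISED", "from": "cust:9", "to": "case:1"}],
}
prompt_block = "CONTEXT_GRAPH_JSON:\n" + serialize_context_graph(subgraph)
```

A hard character cap is a blunt instrument; in practice you would trim whole nodes by relevance score before truncating text.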
If you want, next step I can sketch:
- a minimal typed schema (entities/edges) for a domain KG, and
- a LangGraph‑style context‑graph builder node that projects task‑specific subgraphs for your agents.
Below is a concrete, copy‑pasteable sketch for both parts: a minimal typed domain KG schema and a LangGraph node that builds a task‑specific context graph view from it.
1. Minimal typed schema for a domain knowledge graph
Assume an enterprise “Customer‑Case‑Content” domain (tickets, docs, agents). This is property‑graph style, but easy to map to RDF triples.
Entity types (nodes)
// Core node types
type NodeType =
  | "Customer"
  | "User"     // internal user/agent
  | "Case"     // ticket / incident
  | "Document" // markdown, spec, runbook
  | "Product"
  | "Feature"
  | "Tag"
  | "Policy";

// Base node shape
interface KGNode {
  id: string;         // global, stable ID
  type: NodeType;
  name: string;
  createdAt?: string; // ISO 8601
  updatedAt?: string;
  // Arbitrary properties per type in a `props` bag
  props?: Record<string, any>;
}
// Examples of props by type
interface CustomerNode extends KGNode {
  type: "Customer";
  props?: {
    externalId?: string;
    segment?: "SMB" | "MidMarket" | "Enterprise";
    region?: string;
    isRegulated?: boolean; // e.g. healthcare/fintech
  };
}

interface CaseNode extends KGNode {
  type: "Case";
  props?: {
    status: "open" | "in_progress" | "resolved";
    priority: "p0" | "p1" | "p2" | "p3";
    openedAt: string;
    closedAt?: string;
    channel: "email" | "chat" | "api";
  };
}

interface DocumentNode extends KGNode {
  type: "Document";
  props?: {
    path: string;         // repo path, URL, etc.
    mimeType: string;
    embeddingId?: string; // pointer into vector store
  };
}
Relationship types (edges)
type EdgeType =
  | "OWNS"               // Customer -> Product
  | "USES_FEATURE"       // Customer -> Feature
  | "RAISED"             // Customer -> Case
  | "ASSIGNED_TO"        // Case -> User
  | "RELATES_TO_CASE"    // Document -> Case
  | "RELATES_TO_FEATURE" // Document -> Feature
  | "HAS_TAG"            // Case/Document -> Tag
  | "GOVERNED_BY"        // Node -> Policy
  | "SIMILAR_TO";        // symmetric similarity

interface KGEdge {
  id: string;
  type: EdgeType;
  from: string; // KGNode.id
  to: string;   // KGNode.id
  props?: {
    score?: number;  // similarity/relevance score
    since?: string;  // ISO 8601
    source?: string; // "etl", "llm_extraction", etc.
  };
}
You can then package a subgraph like this:
interface KGSubgraph {
  nodes: KGNode[];
  edges: KGEdge[];
}
This is sufficient for:
- Entity‑centric reasoning: “Which features does this regulated enterprise customer use?”
- Graph‑RAG: “Which documents are most related to this case and its product/feature neighborhood?”
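As a quick illustration of the entity‑centric case, here is a sketch of a traversal over the KGSubgraph shape above (expressed in Python to match the rest of the document; the sample data is made up):

```python
def features_used_by(subgraph: dict, customer_id: str) -> list:
    """Collect Feature nodes a customer is linked to via USES_FEATURE edges."""
    nodes = {n["id"]: n for n in subgraph["nodes"]}
    return [
        nodes[e["to"]]
        for e in subgraph["edges"]
        if e["type"] == "USES_FEATURE"
        and e["from"] == customer_id
        and e["to"] in nodes
    ]

# Hypothetical subgraph for demonstration
sample = {
    "nodes": [
        {"id": "cust:1", "type": "Customer", "name": "Acme"},
        {"id": "feat:sso", "type": "Feature", "name": "SSO"},
    ],
    "edges": [
        {"id": "e1", "type": "USES_FEATURE", "from": "cust:1", "to": "feat:sso"},
    ],
}
```

The same edge-scan pattern generalizes to any of the typed relations; for large graphs you would push this filter down into the database instead.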
2. LangGraph‑style context‑graph builder node
Goal: a node that, given current task state (e.g. active case, query text), queries the KG + vector store, then writes a compact contextGraph into the LangGraph state.
Assume:
- You have a kgClient with methods like get_node(id), neighborhood(id, depth, ...), and semantic_docs_for_case(case_id, k).
- You use LangGraph Python with TypedDict state.
State schema with context graph field
from typing import List, Literal, TypedDict, Optional, Dict, Any

NodeType = Literal[
    "Customer", "User", "Case", "Document", "Product", "Feature", "Tag", "Policy"
]

EdgeType = Literal[
    "OWNS",
    "USES_FEATURE",
    "RAISED",
    "ASSIGNED_TO",
    "RELATES_TO_CASE",
    "RELATES_TO_FEATURE",
    "HAS_TAG",
    "GOVERNED_BY",
    "SIMILAR_TO",
]

class KGNode(TypedDict, total=False):
    id: str
    type: NodeType
    name: str
    createdAt: Optional[str]
    updatedAt: Optional[str]
    props: Dict[str, Any]

class KGEdge(TypedDict, total=False):
    id: str
    type: EdgeType
    from_: str  # `from` is reserved in Python
    to: str
    props: Dict[str, Any]

class KGSubgraph(TypedDict):
    nodes: List[KGNode]
    edges: List[KGEdge]

class AgentState(TypedDict):
    messages: list  # your normal conversation/messages state
    active_case_id: Optional[str]
    query: Optional[str]
    context_graph: Optional[KGSubgraph]
Context‑graph builder node
from langgraph.graph import StateGraph, START, END

# Pseudocode client; you'd back this with ArangoDB/Neo4j/etc.
class KGClient:
    def get_node(self, node_id: str) -> Optional[KGNode]: ...

    def neighborhood(
        self,
        node_id: str,
        depth: int = 2,
        max_nodes: int = 50,
        allowed_types: Optional[list[NodeType]] = None,
    ) -> KGSubgraph: ...

    def semantic_docs_for_case(
        self,
        case_id: str,
        k: int = 10,
    ) -> KGSubgraph: ...

kg_client = KGClient()
def build_context_graph(state: AgentState) -> AgentState:
    case_id = state.get("active_case_id")
    query = state.get("query")

    nodes: dict[str, KGNode] = {}
    edges: dict[str, KGEdge] = {}

    if case_id:
        # 1) Structural neighborhood around the active case
        case_ego = kg_client.neighborhood(
            node_id=case_id,
            depth=2,
            max_nodes=50,
            allowed_types=[
                "Case",
                "Customer",
                "Product",
                "Feature",
                "Document",
                "Policy",
                "Tag",
            ],
        )
        for n in case_ego["nodes"]:
            nodes[n["id"]] = n
        for e in case_ego["edges"]:
            edges[e["id"]] = e

        # 2) Semantic expansion: top-k related documents
        doc_subgraph = kg_client.semantic_docs_for_case(case_id=case_id, k=10)
        for n in doc_subgraph["nodes"]:
            nodes[n["id"]] = n
        for e in doc_subgraph["edges"]:
            edges[e["id"]] = e

    # Optional: query-driven expansion if no active case but a free-text query
    # e.g., vector search + expansion via KG neighborhood; omitted for brevity.

    state["context_graph"] = {
        "nodes": list(nodes.values()),
        "edges": list(edges.values()),
    }
    return state
Wiring it into a LangGraph
from langgraph.graph import StateGraph, START, END

builder = StateGraph(AgentState)
builder.add_node("build_context", build_context_graph)

# Example: always build context first, then call an LLM node that consumes it
def call_model(state: AgentState) -> AgentState:
    # You'd serialize state["context_graph"] into the prompt here
    # and append the model's reply to `messages`.
    ...
    return state

builder.add_node("model", call_model)
builder.add_edge(START, "build_context")
builder.add_edge("build_context", "model")
builder.add_edge("model", END)

graph = builder.compile()
This gives you:
- A global KG schema you can implement in ArangoDB/Neo4j/Relational‑KG, compatible with GraphRAG patterns.
- A LangGraph node that projects a task‑specific context graph into state, ready to be serialized into prompts or used for policy checks before tool calls.
If you tell me your preferred backing store (ArangoDB vs Neo4j vs relational) I can turn the KGClient pseudocode into concrete queries (AQL, Cypher, or SQL + edge tables) and add a prompt template that injects context_graph for multi‑step agents.
Examples of knowledge graphs in enterprise AI applications
When to use context graphs over knowledge graphs
How do context graphs integrate with RAG
Building a governed context graph step by step
Knowledge graphs vs context graphs for AI agents
How to implement the KG schema in Rel or Neo4j
Example code for LangGraph node projecting subgraphs
Task-specific subgraph projection techniques for agents
Integrating KG with LangGraph state and context_schema
Building multi-agent workflows using context graphs
ALL 3 preferred backing stores (ArangoDB vs Neo4j vs relational): turn the KGClient pseudocode into concrete queries (AQL, Cypher, or SQL + edge tables) and add a prompt template that injects context_graph for multi‑step agents.
Below is a compact design you can lift into a spec or codebase. I’ll group by what you asked for, with concrete patterns and code.
1. When to use context graphs vs knowledge graphs
- Use a knowledge graph when you need a durable, governed world model: shared semantics across teams, cross‑system joins, lineage, and slow‑changing truth (products, org, policies, long‑term customer history).
- Use a context graph when you need a small, task‑scoped, policy‑aware slice for a specific run/agent: “what matters for this incident, this user, under these constraints, right now.”
For agent stacks, a solid pattern is: KG as backbone, context graph as working set injected into prompts and tools, then logged for traceability.
2. How context graphs integrate with RAG
Typical pattern (KG + vector RAG + context graph):
- Retrieve candidate docs from vector store with the user query or task description.
- Map chunks to KG nodes (Document, Section, Feature, Policy) via stored IDs or edges (e.g., Document -> Feature, Document -> Policy).
- Expand to a context graph by pulling:
  - Local KG neighborhood (customer, product, feature, prior cases).
  - Governing policies or SLAs.
  - A small set of top‑K related docs (semantic + graph similarity).
- Serialize the context graph into the model prompt (or a tool schema) with explicit types and relationships, not just raw text.
- Use the context graph for control, e.g., filter tools by policy node, mask PII nodes, or veto actions that violate a GOVERNED_BY -> Policy constraint.
This is essentially “GraphRAG with a decision layer”: the KG provides structure, the retriever provides content, the context graph provides governed, task‑specific state.
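The “decision layer” part can be made concrete with a small tool filter. This is a sketch: the `deniedTools` list on Policy nodes is a hypothetical convention, not part of the schema above.

```python
def allowed_tools(subgraph: dict, candidate_tools: list) -> list:
    """Filter an agent's tool list using Policy nodes in the context graph.

    Assumes a hypothetical `deniedTools` list in each Policy node's props;
    any node governed by such a policy contributes its denials.
    """
    nodes = {n["id"]: n for n in subgraph["nodes"]}
    denied = set()
    for e in subgraph["edges"]:
        if e["type"] == "GOVERNED_BY":
            policy = nodes.get(e["to"], {})
            denied.update(policy.get("props", {}).get("deniedTools", []))
    return [t for t in candidate_tools if t not in denied]

# Hypothetical example: a case governed by a policy that bans CSV export
ctx = {
    "nodes": [
        {"id": "case:1", "type": "Case", "name": "Incident"},
        {"id": "pol:1", "type": "Policy", "name": "PII export ban",
         "props": {"deniedTools": ["export_csv"]}},
    ],
    "edges": [{"id": "e1", "type": "GOVERNED_BY", "from": "case:1", "to": "pol:1"}],
}
```

Running this check before each tool call gives you a veto point that is driven by the graph rather than hard-coded in the agent.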
3. Building a governed context graph (step by step)
You can implement this as a pipeline that runs per task/agent turn:
- Identify the anchor(s).
- Pull the structural neighborhood.
- Add relevant content nodes.
- Overlay governance:
  - Attach Policy nodes referenced by GOVERNED_BY edges from Case, Customer, User, or Feature.
  - Optionally, add Risk/Authority edges, e.g. User -[HAS_ROLE]-> Role -[ALLOWS]-> Action.
- Normalize & cap.
- Emit as context_graph:
  - Store into graph state (LangGraph) and/or sign/log it for audit.
  - Optionally store a hash / ID and reference it in your traces.
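For the hash/audit step listed above, one simple approach (a sketch, not a mandated scheme) is a canonical-JSON fingerprint stored alongside the run trace:

```python
import hashlib
import json

def context_graph_fingerprint(subgraph: dict) -> str:
    """SHA-256 over a canonical JSON form of the context graph.

    Stable across dict key ordering (node/edge *list* order still matters),
    so the same graph always yields the same audit ID.
    """
    canonical = json.dumps(subgraph, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same content, different key order -> same fingerprint
g1 = {"nodes": [{"id": "a", "type": "Case"}], "edges": []}
g2 = {"edges": [], "nodes": [{"type": "Case", "id": "a"}]}
```

If node ordering is nondeterministic in your builder, sort the node and edge lists by id before hashing.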
4. Implementing the KG schema in three backends
4.1 Relational (PostgreSQL / “Rel‑style” schema)
A simple, explicit schema: one table per node type + a generic edges table.
-- Nodes
CREATE TABLE customers (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  external_id TEXT,
  segment TEXT,
  region TEXT,
  is_regulated BOOLEAN DEFAULT FALSE
);

CREATE TABLE cases (
  id TEXT PRIMARY KEY,
  customer_id TEXT REFERENCES customers(id),
  status TEXT,
  priority TEXT,
  opened_at TIMESTAMPTZ,
  closed_at TIMESTAMPTZ,
  channel TEXT
);

CREATE TABLE documents (
  id TEXT PRIMARY KEY,
  name TEXT,
  path TEXT,
  mime_type TEXT,
  embedding_id TEXT
);

CREATE TABLE policies (
  id TEXT PRIMARY KEY,
  name TEXT,
  path TEXT
);

CREATE TABLE tags (
  id TEXT PRIMARY KEY,
  name TEXT UNIQUE
);
-- Edges (typed, generic)
CREATE TABLE edges (
  id BIGSERIAL PRIMARY KEY,
  type TEXT NOT NULL, -- 'OWNS', 'RELATES_TO_FEATURE', ...
  from_id TEXT NOT NULL,
  to_id TEXT NOT NULL,
  since TIMESTAMPTZ,
  score DOUBLE PRECISION,
  source TEXT
  -- Optionally: foreign keys + CHECK constraints per type,
  -- or keep it soft-typed if you want flexibility
);

-- PostgreSQL has no inline INDEX clause in CREATE TABLE; create indexes separately
CREATE INDEX edges_from_idx ON edges (from_id);
CREATE INDEX edges_to_idx ON edges (to_id);
CREATE INDEX edges_type_from_idx ON edges (type, from_id);
CREATE INDEX edges_type_to_idx ON edges (type, to_id);
Example: task‑specific neighborhood query around a case:
-- 1) Get the case, then its direct edges
WITH anchor_case AS (
  SELECT * FROM cases WHERE id = $1
),
neighbors AS (
  -- direct edges from/to the case
  SELECT e.*
  FROM edges e
  WHERE e.from_id = $1 OR e.to_id = $1
)
SELECT *
FROM neighbors;
You can then join neighbors to customers, products, policies, documents by from_id / to_id and build a KGSubgraph in application code.
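That application-side assembly can look like the following sketch; the tuple shapes mirror the tables above, but your cursor layer (psycopg, SQLAlchemy) may hand you rows in a different form:

```python
def subgraph_from_rows(node_rows: list, edge_rows: list) -> dict:
    """Build a KGSubgraph dict from plain SQL result tuples.

    node_rows: (id, type, name) tuples, e.g. from per-type SELECTs.
    edge_rows: (id, type, from_id, to_id) tuples from the edges table.
    """
    return {
        "nodes": [
            {"id": i, "type": t, "name": name} for (i, t, name) in node_rows
        ],
        "edges": [
            {"id": str(i), "type": t, "from": f, "to": to}
            for (i, t, f, to) in edge_rows
        ],
    }

# Hypothetical query results
sg = subgraph_from_rows(
    node_rows=[("case:1", "Case", "Login failure"), ("cust:9", "Customer", "Acme")],
    edge_rows=[(1, "RAISED", "cust:9", "case:1")],
)
```

Keeping the conversion in one place means the LangGraph side never needs to know which backend produced the rows.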
4.2 Neo4j (Cypher)
Node labels and relationship types map directly to your schema.
// Example nodes
CREATE (c:Customer {id: $customerId, name: $name, segment: $segment});
CREATE (p:Product {id: $productId, name: $productName});
CREATE (k:Case {id: $caseId, status: $status, priority: $priority});
// Relationships
MATCH (c:Customer {id: $customerId}),
(p:Product {id: $productId})
MERGE (c)-[:OWNS]->(p);
MATCH (c:Customer {id: $customerId}),
(k:Case {id: $caseId})
MERGE (c)-[:RAISED]->(k);
Task‑specific subgraph around a case:
// ego network: Case + neighbors (depth <= 2), typed and capped
MATCH (case:Case {id: $caseId})
CALL apoc.path.subgraphNodes(case, {
  maxLevel: 2,
  limit: 200, // cap here; a LIMIT after the aggregating RETURN would be a no-op
  relationshipFilter: "RAISED|OWNS|USES_FEATURE|ASSIGNED_TO|HAS_TAG|GOVERNED_BY|RELATES_TO_CASE|RELATES_TO_FEATURE",
  labelFilter: "+Customer|+Product|+Feature|+User|+Document|+Policy|+Tag"
})
YIELD node
WITH collect(node) AS nodes
UNWIND nodes AS n
MATCH (n)-[r]->(m)
WHERE m IN nodes
RETURN collect(DISTINCT n) AS nodes, collect(DISTINCT r) AS rels;
You can add RAG overlays by adding SIMILAR_TO relationships or storing embedding_id on Document and querying external vector search.
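To run this from Python you would use the official neo4j driver (`session.run` with a `$caseId` parameter); the part worth sketching is the mapping from driver results onto the KGSubgraph shape. The helper below works on plain dicts so it can be tested without a database; the node/rel dict shapes are assumptions mirroring what the driver exposes, not the driver's actual objects.

```python
def neo4j_to_subgraph(nodes: list, rels: list) -> dict:
    """Map Neo4j-style node/relationship dicts onto the KGSubgraph shape.

    nodes: {"element_id": ..., "labels": [...], "properties": {...}}
    rels:  {"element_id": ..., "type": ..., "start": ..., "end": ...}
    (hypothetical plain-dict mirrors of driver objects)
    """
    return {
        "nodes": [
            {
                "id": n["properties"].get("id", n["element_id"]),
                "type": n["labels"][0] if n["labels"] else "Unknown",
                "name": n["properties"].get("name", ""),
                "props": n["properties"],
            }
            for n in nodes
        ],
        "edges": [
            {"id": r["element_id"], "type": r["type"],
             "from": r["start"], "to": r["end"], "props": {}}
            for r in rels
        ],
    }

# Hypothetical driver output, flattened to dicts
raw_nodes = [
    {"element_id": "4:x:1", "labels": ["Case"],
     "properties": {"id": "case:1", "name": "Login failure"}},
]
raw_rels = [
    {"element_id": "5:x:9", "type": "RAISED", "start": "cust:9", "end": "case:1"},
]
sg = neo4j_to_subgraph(raw_nodes, raw_rels)
```

Preferring the application-level `id` property over Neo4j's internal element id keeps the context graph stable across re-imports.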
4.3 ArangoDB (AQL)
Use one document collection per node type + edge collections.
// Example collections
// Customers, Cases, Documents, Policies, Tags (document collections)
// ownsEdges, raisedEdges, relatesToFeatureEdges, governedByEdges (edge collections)
Ego neighborhood around a Case:
LET caseDoc = DOCUMENT("Cases", @caseId)
FOR v, e, p IN 1..2 ANY caseDoc
GRAPH "EnterpriseGraph"
OPTIONS { bfs: true, uniqueVertices: "global" }
FILTER v.type IN ["Customer","Product","Feature","User","Document","Policy","Tag"]
LIMIT 200
RETURN { node: v, edge: e }
You can keep RAG integration by having a Documents collection with embedding_id and an external vector index; ArangoSearch can handle full‑text / metadata filters.
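The traversal above returns one {node, edge} pair per visited vertex, with possible repeats; folding those rows (e.g. from a python-arango cursor) into a de-duplicated KGSubgraph is a few lines. A sketch, assuming each document carries an application-level id field rather than Arango's _id:

```python
def aql_rows_to_subgraph(rows: list) -> dict:
    """De-duplicate {node, edge} rows from the AQL traversal into a KGSubgraph."""
    nodes, edges = {}, {}
    for row in rows:
        node = row.get("node")
        if node:
            nodes[node["id"]] = node
        edge = row.get("edge")
        if edge:  # the anchor vertex may arrive without an edge
            edges[edge["id"]] = edge
    return {"nodes": list(nodes.values()), "edges": list(edges.values())}

# Hypothetical cursor output with a duplicate visit
rows = [
    {"node": {"id": "cust:9", "type": "Customer"},
     "edge": {"id": "e1", "type": "RAISED"}},
    {"node": {"id": "cust:9", "type": "Customer"},
     "edge": {"id": "e1", "type": "RAISED"}},
]
sg = aql_rows_to_subgraph(rows)
```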
5. Example LangGraph node projecting subgraphs
This extends the earlier pseudocode and shows: projection, state integration, and context_schema usage.
State and context schemas
from typing import TypedDict, List, Literal, Optional, Dict, Any

from langgraph.graph import StateGraph, START, END
from langgraph.runtime import Runtime  # runtime context for context_schema
from langgraph.types import Command
NodeType = Literal[
    "Customer", "User", "Case", "Document", "Product", "Feature", "Tag", "Policy"
]

EdgeType = Literal[
    "OWNS",
    "USES_FEATURE",
    "RAISED",
    "ASSIGNED_TO",
    "RELATES_TO_CASE",
    "RELATES_TO_FEATURE",
    "HAS_TAG",
    "GOVERNED_BY",
    "SIMILAR_TO",
]

class KGNode(TypedDict, total=False):
    id: str
    type: NodeType
    name: str
    props: Dict[str, Any]

class KGEdge(TypedDict, total=False):
    id: str
    type: EdgeType
    from_: str
    to: str
    props: Dict[str, Any]

class KGSubgraph(TypedDict):
    nodes: List[KGNode]
    edges: List[KGEdge]

class AgentState(TypedDict):
    messages: list
    active_case_id: Optional[str]
    query: Optional[str]
    context_graph: Optional[KGSubgraph]

class Context(TypedDict):
    max_nodes: int
    max_depth: int
Subgraph‑projection node
class KGClient:
    def neighborhood(self, node_id: str, depth: int, max_nodes: int) -> KGSubgraph:
        """Backend-specific implementation (Arango, Neo4j, SQL)."""
        ...

    def semantic_docs_for_case(self, case_id: str, k: int) -> KGSubgraph:
        """Vector search + KG joins."""
        ...

kg_client = KGClient()

def build_context_graph(
    state: AgentState,
    runtime: Runtime[Context],
) -> AgentState:
    ctx = runtime.context
    max_nodes = ctx.get("max_nodes", 128)
    max_depth = ctx.get("max_depth", 2)

    case_id = state.get("active_case_id")
    query = state.get("query")

    nodes: Dict[str, KGNode] = {}
    edges: Dict[str, KGEdge] = {}

    if case_id:
        ego = kg_client.neighborhood(
            node_id=case_id,
            depth=max_depth,
            max_nodes=max_nodes,
        )
        for n in ego["nodes"]:
            nodes[n["id"]] = n
        for e in ego["edges"]:
            edges[e["id"]] = e

        docs = kg_client.semantic_docs_for_case(case_id=case_id, k=10)
        for n in docs["nodes"]:
            nodes[n["id"]] = n
        for e in docs["edges"]:
            edges[e["id"]] = e

    # TODO: query-based projection when no case_id

    state["context_graph"] = {
        "nodes": list(nodes.values())[:max_nodes],
        "edges": list(edges.values())[: max_nodes * 2],
    }
    return state
Wiring into a multi‑agent workflow
Supervisor routes to workers; all agents see the same context_graph on state.
from typing import Literal

def support_agent(state: AgentState) -> Command[Literal["security_agent", "__end__"]]:
    # serialize context graph into prompt
    ctx_graph = state.get("context_graph") or {"nodes": [], "edges": []}
    system_instructions = (
        "You are a support agent. Use the following context graph of entities and "
        "relations to answer. Do not invent entities not present in the graph.\n"
        f"CONTEXT_GRAPH_JSON:\n{ctx_graph}"
    )
    # call your model here; omitted
    ...
    return Command(
        goto="security_agent",
        update={"messages": state["messages"] + ["support reply"]},
    )

def security_agent(state: AgentState) -> Command[Literal["__end__"]]:
    # use context_graph to enforce policy: e.g., disallow PHI for non-compliant customers
    ctx_graph = state.get("context_graph") or {"nodes": [], "edges": []}
    # run checks over ctx_graph before finalizing response
    ...
    return Command(
        goto=END,
        update={"messages": state["messages"] + ["security checked"]},
    )
builder = StateGraph(AgentState, context_schema=Context)
builder.add_node("build_context", build_context_graph)
builder.add_node("support_agent", support_agent)
builder.add_node("security_agent", security_agent)

builder.add_edge(START, "build_context")
builder.add_edge("build_context", "support_agent")
# support_agent and security_agent route themselves via the Command they return,
# so no static edges are needed after "support_agent".

graph = builder.compile()

# Example invocation with runtime context (context_schema)
result = graph.invoke(
    {
        "messages": [],
        "active_case_id": "CASE-123",
        "query": "Customer cannot access feature X in region Y",
        "context_graph": None,
    },
    context={"max_nodes": 128, "max_depth": 2},
)
This uses context_schema for immutable run‑time parameters (budgets, backends, toggles) and state for evolving context_graph and messages.
6. Task‑specific subgraph projection techniques
You can define several projection modes and select them via context or supervisor policy:
- Anchor‑centric: ego network around Case, Customer, Product. Good for incident handling.
- Query‑semantic: vector search on docs, then expand to nearest entities in KG (lazy graph building / LazyGraphRAG style).
- Policy‑first: start from Policy and Role nodes, then intersect with customer/product neighborhood; use for strong governance.
- Historical trace: project only nodes/edges with since >= last_30_days to bias toward recent context.
Implementation‑wise you can expose a projection_mode in Context and branch inside build_context_graph.
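A minimal dispatch for that branching might look like this; the stubbed strategy functions stand in for real KG queries and are assumptions, not part of any library:

```python
from typing import Callable, Dict

def anchor_projection(state: dict) -> dict:
    # ego network around the active case (stub for a KG query)
    return {"mode": "anchor", "anchor": state.get("active_case_id")}

def semantic_projection(state: dict) -> dict:
    # vector search on the free-text query, then KG expansion (stub)
    return {"mode": "semantic", "query": state.get("query")}

def policy_first_projection(state: dict) -> dict:
    # start from Policy/Role nodes, intersect with the customer neighborhood (stub)
    return {"mode": "policy_first"}

PROJECTIONS: Dict[str, Callable[[dict], dict]] = {
    "anchor": anchor_projection,
    "semantic": semantic_projection,
    "policy_first": policy_first_projection,
}

def project(state: dict, projection_mode: str) -> dict:
    """Branch on projection_mode (as read from Context), defaulting to anchor."""
    return PROJECTIONS.get(projection_mode, anchor_projection)(state)
```

A registry dict keeps the modes open for extension (e.g. the historical-trace mode) without touching the builder node itself.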
7. Stronger follow‑up prompts you can reuse
You can paste any of these as next‑step questions:
- “Design a governed context graph specification for a healthcare SaaS (HIPAA), including node/edge types for PHI, policies, and audit trails, and show how to enforce it in a LangGraph multi‑agent workflow.”
- “Given a Neo4j‑backed enterprise KG, write complete Python code for a LangGraph node that runs a Cypher query to build a task‑specific context graph and injects it into an LLM prompt for RAG.”
- “For ArangoDB Community Edition, implement KGClient.neighborhood and KGClient.semantic_docs_for_case using AQL, including indices and collection design optimized for GraphRAG.”
- “Define a JSON serialization format and prompt template that lets an LLM reason reliably over a context_graph (nodes, edges, props) while minimizing tokens, and show an example multi‑turn dialogue.”
- “Extend the context‑graph pattern to a supervisor/worker multi‑agent design: specify how the supervisor modifies context_graph between agents to add tasks, constraints, and intermediate conclusions.”
LangGraph multi-agent collaboration examples
Code tutorial for StateGraph with checkpointer
Context graphs in production AI systems
Differences between LangGraph and Neo4j knowledge graphs
Scaling governed context graphs for enterprise
Here are tighter, higher‑leverage follow‑up prompts you can paste directly, tuned to your list and to enterprise / multi‑agent work.
Governance, KG vs context graph, production
1. Governed context graph for HIPAA SaaS
“Design a governed context graph specification for a HIPAA‑regulated healthcare SaaS: define concrete node/edge types for Patients, Encounters, PHI fields, Policies, Roles, and AuditEvents; show how this graph constrains a LangGraph multi‑agent workflow (intake agent, clinical‑support agent, compliance agent) so that PHI never leaves allowed channels and every decision is auditable.”
2. Context graphs in production AI systems
“Using real‑world patterns from Glean’s enterprise graph and modern ‘context layer’ architectures, outline how a governed context graph sits between raw data sources and AI agents in a large enterprise (identity, content, activity, policies). Specify required SLAs, observability, and governance controls to run this in production for search + RAG assistants across multiple business units.”
3. Scaling governed context graphs
“Propose an architecture to scale governed context graphs to 10k+ concurrent agent sessions in a large enterprise: detail how you separate the durable enterprise KG from per‑task context graphs, how you enforce row/edge‑level permissions, and how you index context graphs for post‑hoc decision forensics and policy‑violation detection.”
4. Knowledge graphs vs context graphs for AI agents
“Create a technical comparison (with tables) of knowledge graphs vs governed context graphs for AI agents, focusing on: schema design, update cadence, governance model, role in RAG, and impact on agent reliability. Include examples from enterprise tools like Glean and context‑layer architectures described by Atlan or similar vendors.”
LangGraph multi‑agent, context_graph, and checkpointer
5. Multi‑agent collaboration with context_graph
“Using LangGraph’s multi‑agent patterns (supervisor and peer collaboration), design a three‑agent system (research, reasoning, and policy‑check) that all read/write a shared context_graph field in state. Show the full Python StateGraph definition, the AgentState schema, and how the supervisor modifies context_graph between agents to add tasks, constraints, and intermediate conclusions.”
6. StateGraph with checkpointer for decision forensics
“Write a complete code tutorial that builds a LangGraph StateGraph with a SQLite or Couchbase checkpointer, focused on replaying and inspecting context_graph evolution over time. Demonstrate how to time‑travel to a specific step, dump the context graph at that step, and relate it to model actions for debugging non‑deterministic multi‑agent behaviour.”
7. LangGraph vs Neo4j knowledge graphs
“Explain the conceptual and practical differences between LangGraph (as an execution/state graph) and Neo4j (as a knowledge graph database). Then design an architecture where LangGraph nodes query Neo4j to build task‑specific context graphs, and describe how you’d trace from a LangGraph execution to the underlying Neo4j entities and relationships used.”
Backend‑specific KGClient implementations (ArangoDB, Neo4j, relational)
8. Neo4j‑backed KG + LangGraph node
“Given a Neo4j‑backed enterprise KG with labels Customer, Case, Document, Product, Feature, Policy, and Tag, write complete Python code for a KGClient plus a LangGraph node that: (1) runs Cypher to pull a 2‑hop ego network around a Case, (2) caps to N nodes, (3) returns a context_graph JSON object, and (4) injects that JSON into an LLM RAG prompt in a downstream node.”
9. ArangoDB Community Edition GraphRAG client
“For ArangoDB Community Edition, design collections and indices for Customers, Cases, Documents, Policies, Tags, and typed edge collections (e.g., raisedEdges, governedByEdges). Then implement KGClient.neighborhood and KGClient.semantic_docs_for_case in Python using AQL, optimized for GraphRAG patterns (including AQL examples and hints for performance).”
10. Relational implementation with edge tables
“Starting from a relational schema with node tables (customers, cases, documents, policies, tags) and a generic edges table, implement KGClient.neighborhood in Python using SQL (for PostgreSQL). Show how to: (1) run a bounded BFS from an anchor case, (2) assemble a KGSubgraph in memory, and (3) serialize it into a compact context_graph attached to LangGraph state.”
JSON serialization, prompts, and dialogue
11. JSON context_graph format + prompt template
“Define a compact JSON schema for context_graph (nodes, edges, props) tailored for LLM reasoning under tight token budgets. Then write a system prompt template that teaches the model how to use this graph for chain‑of‑thought reasoning without hallucinating entities. Finally, show a multi‑turn dialogue where the model updates its plan based on changes to context_graph between turns.”
12. Task‑specific subgraph projection techniques
“Enumerate and implement three task‑specific subgraph projection strategies (anchor‑centric, query‑semantic, and policy‑first) and show how each is exposed as a projection_mode parameter in LangGraph’s context_schema. Provide example Python code that switches projection logic inside build_context_graph based on this mode.”
Context graphs in enterprise AI
13. Enterprise AI examples using KGs and context graphs
“Survey 3–5 real or representative enterprise AI applications that use knowledge graphs (e.g., Glean, enterprise search, customer 360) and show how they could evolve to governed context graphs for agentic workflows. For each, outline entities, relationships, governance needs, and how a context graph would be constructed per request.”
If you tell me which of these you want to run first, I can expand it straight into code + diagrams tailored to your current stack (GCP + ArangoDB/Neo4j/Postgres + LangGraph).
Implement this context graph spec using LangGraph with checkpointer for persistence
Add multi-agent workflow for patient data queries in the healthcare graph
Compare governed context graph vs knowledge graph for HIPAA healthcare use
Example JSON schema for the healthcare context graph nodes and edges
Extend the spec with governance metrics and audit trail examples
YES TO ALL
Below is a compact “YES TO ALL” implementation sketch you can extend. I’ll keep it concrete and healthcare‑specific.
1) Healthcare context graph vs knowledge graph (HIPAA focus)
- Enterprise healthcare knowledge graph: durable, governed ontology and facts about Providers, Patients, Encounters, Diagnoses, Orders, Systems, Policies, etc.; supports lineage, semantic consistency, and compliance reporting.
- Governed context graph: small, per‑task graph to answer “what is relevant and allowed for this patient query, for this clinician, under this policy, right now?”, including PHI nodes and audit events tied to each agent step.
Key HIPAA‑specific differences:
- KG = long‑term semantic + data governance backbone (who the patient is, what happened, policy set and lineage).
- Context graph = short‑lived access + decision graph (who accessed what PHI, for what purpose, under which role/policy, and what the agent did with it).
2) JSON schema for the healthcare context graph
Minimal, PHI‑aware context_graph JSON:
{
  "nodes": [
    {
      "id": "patient:123",
      "type": "Patient",
      "name": "REDACTED",
      "props": {
        "mrn": "MRN-12345",
        "age": 54,
        "isPHI": true
      }
    },
    {
      "id": "encounter:789",
      "type": "Encounter",
      "name": "ED visit 2026-01-11",
      "props": {
        "encounterType": "ED",
        "start": "2026-01-11T12:34:00Z",
        "end": "2026-01-11T16:22:00Z",
        "isPHI": true
      }
    },
    {
      "id": "policy:hipaa-minimum-necessary",
      "type": "Policy",
      "name": "HIPAA Minimum Necessary",
      "props": {
        "reg": "45 CFR 164.514(d)",
        "link": "https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html",
        "isPHI": false
      }
    },
    {
      "id": "audit:step-5",
      "type": "AuditEvent",
      "name": "AgentAccess",
      "props": {
        "agentId": "clinical-assistant",
        "userId": "clinician:456",
        "timestamp": "2026-02-03T04:31:00Z",
        "action": "READ",
        "targetNodeIds": ["patient:123", "encounter:789"],
        "purposeOfUse": "TREATMENT",
        "isPHI": false
      }
    }
  ],
  "edges": [
    {
      "id": "e1",
      "type": "HAS_ENCOUNTER",
      "from": "patient:123",
      "to": "encounter:789",
      "props": {}
    },
    {
      "id": "e2",
      "type": "GOVERNED_BY",
      "from": "encounter:789",
      "to": "policy:hipaa-minimum-necessary",
      "props": {}
    },
    {
      "id": "e3",
      "type": "RECORDED_IN",
      "from": "audit:step-5",
      "to": "encounter:789",
      "props": {}
    }
  ]
}
Core node types: Patient, Encounter, Condition, Medication, Observation, Document, User, Role, Policy, AuditEvent.
Core edge types: HAS_ENCOUNTER, HAS_CONDITION, HAS_MEDICATION, HAS_OBSERVATION, HAS_DOCUMENT, HAS_ROLE, GOVERNED_BY, RECORDED_IN, PERMITTED_ACTION, DENIED_ACTION.
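Given the isPHI flags in the JSON above, a minimum-necessary mask can be applied before the context graph leaves the trust boundary. This is a sketch: real purpose-of-use rules would come from your Policy nodes, not a hard-coded "TREATMENT" check, and the sample patient data is made up.

```python
import copy

def mask_phi_nodes(subgraph: dict, purpose_of_use: str) -> dict:
    """Redact nodes flagged props.isPHI unless the purpose of use is TREATMENT.

    Returns a deep copy so the stored context graph stays intact for audit.
    """
    masked = copy.deepcopy(subgraph)
    if purpose_of_use == "TREATMENT":
        return masked
    for node in masked["nodes"]:
        if node.get("props", {}).get("isPHI"):
            node["name"] = "REDACTED"
            node["props"] = {"isPHI": True}  # drop MRN, dates, etc.
    return masked

# Hypothetical graph with one PHI node and one non-PHI node
phi_graph = {
    "nodes": [
        {"id": "patient:123", "type": "Patient", "name": "Jane Doe",
         "props": {"mrn": "MRN-12345", "isPHI": True}},
        {"id": "policy:1", "type": "Policy", "name": "HIPAA Minimum Necessary",
         "props": {"isPHI": False}},
    ],
    "edges": [],
}
```

Because the unmasked graph is kept for the audit trail, the mask runs at serialization time, not at build time.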
3) LangGraph implementation with checkpointer (Postgres)
Using a Postgres checkpointer for persistence and replay.
State + context schema
from typing import TypedDict, List, Literal, Optional, Dict, Any

from langgraph.graph import StateGraph, START, END
from langgraph.types import Command
from langgraph.runtime import Runtime  # runtime context for context_schema
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg_pool import ConnectionPool
NodeType = Literal[
"Patient", "Encounter", "Condition", "Medication",
"Observation", "Document", "User", "Role", "Policy", "AuditEvent"
]
EdgeType = Literal[
"HAS_ENCOUNTER", "HAS_CONDITION", "HAS_MEDICATION",
"HAS_OBSERVATION", "HAS_DOCUMENT", "HAS_ROLE",
"GOVERNED_BY", "RECORDED_IN", "PERMITTED_ACTION", "DENIED_ACTION"
]
class KGNode(TypedDict, total=False):
id: str
type: NodeType
name: str
props: Dict[str, Any]
# "from" is a Python keyword, so use the functional TypedDict syntax to keep
# the key aligned with the JSON payloads ("from", not "from_").
KGEdge = TypedDict(
    "KGEdge",
    {"id": str, "type": EdgeType, "from": str, "to": str, "props": Dict[str, Any]},
    total=False,
)
class KGSubgraph(TypedDict):
nodes: List[KGNode]
edges: List[KGEdge]
class AgentState(TypedDict):
messages: list
patient_id: Optional[str]
clinician_id: Optional[str]
query: Optional[str]
context_graph: Optional[KGSubgraph]
governance_metrics: Dict[str, Any]
class Context(TypedDict):
max_nodes: int
max_depth: int
projection_mode: Literal["anchor", "semantic", "policy_first"]
Checkpointer setup (Postgres)
DB_URI = "postgresql://user:pass@host:5432/langgraph?sslmode=require"
pool = ConnectionPool(conninfo=DB_URI, max_size=10)
# Pass the pool itself rather than a borrowed connection: a connection taken
# from `with pool.connection()` is returned to the pool when the block exits,
# which would leave the checkpointer holding a stale connection.
checkpointer = PostgresSaver(pool)
checkpointer.setup()
4) Multi‑agent workflow for patient data queries
Three agents: intake_agent, clinical_agent, compliance_agent. Supervisor is implicit via edges.919293
KG client (backend‑agnostic interface)
class HealthcareKGClient:
def neighborhood(
self,
patient_id: str,
depth: int,
max_nodes: int,
) -> KGSubgraph:
"""Backend-specific (ArangoDB / Neo4j / Postgres)."""
...
def semantic_docs_for_patient(
self, patient_id: str, k: int
) -> KGSubgraph:
"""RAG over guidelines, SOPs, patient-doc links."""
...
kg_client = HealthcareKGClient()
Context‑graph builder node (governed projection)
def build_context_graph(
state: AgentState,
runtime: Runtime[Context],
) -> AgentState:
ctx = runtime.context
patient_id = state.get("patient_id")
if not patient_id:
return state
max_nodes = ctx.get("max_nodes", 128)
max_depth = ctx.get("max_depth", 2)
nodes: Dict[str, KGNode] = {}
edges: Dict[str, KGEdge] = {}
# 1) Structural neighborhood for patient
ego = kg_client.neighborhood(patient_id, depth=max_depth, max_nodes=max_nodes)
for n in ego["nodes"]:
nodes[n["id"]] = n
for e in ego["edges"]:
edges[e["id"]] = e
# 2) Semantic expansion (guidelines, SOPs, etc.)
docs = kg_client.semantic_docs_for_patient(patient_id, k=10)
for n in docs["nodes"]:
nodes[n["id"]] = n
for e in docs["edges"]:
edges[e["id"]] = e
# Governance metrics (simple examples)
phi_nodes = [n for n in nodes.values() if n.get("props", {}).get("isPHI")]
policies = [n for n in nodes.values() if n["type"] == "Policy"]
state["context_graph"] = {
"nodes": list(nodes.values())[:max_nodes],
"edges": list(edges.values())[: max_nodes * 2],
}
state["governance_metrics"] = {
"phi_node_count": len(phi_nodes),
"policy_count": len(policies),
}
return state
Agents
def intake_agent(state: AgentState) -> AgentState:
# Clarify query, set patient_id / clinician_id, append to messages.
...
return state
def clinical_agent(state: AgentState) -> AgentState:
ctx_graph = state.get("context_graph") or {"nodes": [], "edges": []}
system_prompt = (
"You are a clinical decision-support assistant. "
"Use ONLY facts in CONTEXT_GRAPH to answer. Do NOT reveal direct identifiers.\n"
f"CONTEXT_GRAPH_JSON:\n{ctx_graph}"
)
# Call LLM with system_prompt + state["messages"]; update messages.
...
return state
def compliance_agent(state: AgentState) -> AgentState:
ctx_graph = state.get("context_graph") or {"nodes": [], "edges": []}
metrics = state.get("governance_metrics", {})
# Example: block if PHI nodes exist and no Policy nodes guarding them.
phi_cnt = metrics.get("phi_node_count", 0)
pol_cnt = metrics.get("policy_count", 0)
if phi_cnt > 0 and pol_cnt == 0:
state["messages"].append(
"Compliance check: response blocked due to missing policies for PHI access."
)
else:
state["messages"].append("Compliance check: response approved.")
# Optionally append an AuditEvent node to context_graph.
...
return state
Wiring with checkpointer
builder = StateGraph(AgentState, context_schema=Context)
builder.add_node("intake", intake_agent)
builder.add_node("build_context", build_context_graph)
builder.add_node("clinical", clinical_agent)
builder.add_node("compliance", compliance_agent)
builder.add_edge(START, "intake")
builder.add_edge("intake", "build_context")
builder.add_edge("build_context", "clinical")
builder.add_edge("clinical", "compliance")
builder.add_edge("compliance", END)
graph = builder.compile(checkpointer=checkpointer)
Invocation:
thread = {"configurable": {"thread_id": "pt-123-q-1"}}
result = graph.invoke(
{
"messages": [{"role": "user", "content": "Why was my ACE inhibitor stopped?"}],
"patient_id": "patient:123",
"clinician_id": "clinician:456",
"query": "medication change explanation",
"context_graph": None,
"governance_metrics": {},
},
config=thread,
context={"max_nodes": 128, "max_depth": 2, "projection_mode": "anchor"},
)
You can later replay / inspect the context_graph from checkpoints via the checkpointer API.94899088
5) Governance metrics and audit trail examples
Examples of metrics you can compute per run:86877985
- phi_node_count: number of PHI‑tagged nodes in the context graph.
- ungoverned_phi_nodes: PHI nodes without a GOVERNED_BY → Policy edge.
- agent_actions_per_policy: how many actions were taken under each policy (e.g., hipaa-minimum-necessary).
- access_log_coverage: ratio of PHI nodes in the context graph that also appear in AuditEvent.targetNodeIds.
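A sketch of how these metrics can be computed from a context_graph dict (pure Python, assuming the node/edge shapes from the JSON examples in this document; the function name is illustrative):

```python
def governance_metrics(subgraph: dict) -> dict:
    """Compute simple per-run governance metrics over a context graph."""
    nodes = subgraph.get("nodes", [])
    edges = subgraph.get("edges", [])
    # PHI nodes are tagged via props.isPHI.
    phi_ids = {n["id"] for n in nodes if n.get("props", {}).get("isPHI")}
    # A node is "governed" if it has an outgoing GOVERNED_BY edge.
    governed = {e["from"] for e in edges if e["type"] == "GOVERNED_BY"}
    # Audit coverage: PHI nodes referenced by some AuditEvent.targetNodeIds.
    audited = {
        t for n in nodes if n.get("type") == "AuditEvent"
        for t in n.get("props", {}).get("targetNodeIds", [])
    }
    return {
        "phi_node_count": len(phi_ids),
        "ungoverned_phi_nodes": sorted(phi_ids - governed),
        "access_log_coverage": (len(phi_ids & audited) / len(phi_ids)) if phi_ids else 1.0,
    }
```

Calling this inside `build_context_graph` (instead of the two ad-hoc counters there) makes the compliance agent's block/approve decision auditable from state alone.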
Audit trail pattern (conceptual):
- Every time an agent reads or writes PHI, add an AuditEvent node with edges RECORDED_IN (to Encounter or Document) and GOVERNED_BY (to Policy), containing agentId, userId, timestamp, action, purposeOfUse.878586
- Persist checkpoints so you can replay the evolution of context_graph and correlate it with AuditEvent nodes and external logs (EHR, FHIR server).
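The audit pattern can be sketched as a small helper that appends an AuditEvent node plus its RECORDED_IN edge to the current context_graph (field names follow the schema in this document; timestamps use UTC; the helper itself is an assumption, not a library API):

```python
import uuid
from datetime import datetime, timezone

def record_audit_event(subgraph, *, agent_id, user_id, action,
                       target_node_ids, purpose_of_use, encounter_id):
    """Append an AuditEvent node and RECORDED_IN edge to a context graph in place."""
    event_id = f"audit:{uuid.uuid4()}"
    subgraph["nodes"].append({
        "id": event_id,
        "type": "AuditEvent",
        "name": "AgentAccess",
        "props": {
            "agentId": agent_id,
            "userId": user_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "targetNodeIds": list(target_node_ids),
            "purposeOfUse": purpose_of_use,
            "isPHI": False,
        },
    })
    subgraph["edges"].append({
        "id": f"e-{event_id}",
        "type": "RECORDED_IN",
        "from": event_id,
        "to": encounter_id,
        "props": {},
    })
    return event_id
```

Each agent node can call this before returning state, so every checkpoint carries its own access trail.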
If you tell me which backend you want first (ArangoDB vs Neo4j vs Postgres) I can next give you a fully concrete HealthcareKGClient implementation (AQL, Cypher, or SQL) that populates this healthcare context_graph for the LangGraph workflow above.
Enterprise context graph schema for clinic patient records
An enterprise context graph schema for clinic patient records is a constrained, FHIR‑aligned slice of a broader clinical KG, focused on what a given agent/user is allowed to see and act on for one patient and episode of care.99100101
Core design principles
- Standards‑anchored: align node/edge types with FHIR resources (Patient, Encounter, Observation, MedicationStatement, Condition, Practitioner) and major vocabularies (SNOMED CT, LOINC, RxNorm, ICD‑10).10110299
- Patient‑episode centric: graph is anchored on a Patient and one or more Encounters/Visits, representing a clinical trajectory rather than the whole enterprise.100101
- Governance‑forward: embed Policy, Consent, and AuditEvent nodes, with GOVERNED_BY / RECORDED_IN edges, to support HIPAA auditability and minimum‑necessary access.10310499100
Node types
You can treat this as a minimal typed schema for the context graph (not the full enterprise KG).
- Patient (FHIR Patient)
  - Key props: mrn, age, sex, deidentifiedId, isPHI=true.
- Encounter (FHIR Encounter / Visit_occurrence in OMOP)100101
  - Props: encounterType, start, end, locationId, serviceType.
- Condition (FHIR Condition, ICD‑10/SNOMED‑coded)99101
  - Props: codeSystem, code, display, onsetDate, clinicalStatus.
- Medication (FHIR MedicationStatement / Drug_exposure, RxNorm‑coded)10210199
  - Props: rxnormCode, name, dose, route, start, end, status.
- Observation (FHIR Observation, LOINC‑coded)10110299
  - Props: loincCode, name, value, unit, referenceRange, effectiveTime.
- Procedure (FHIR Procedure)102101
  - Props: codeSystem, code, display, performedTime.
- Document (clinical note, guideline, SOP)
  - Props: docType, pathOrId, embeddingId, isPHI.
- Practitioner (FHIR Practitioner)101
  - Props: npi, role, department.
- User (application user or service principal)
  - Props: userType, externalId, associatedPractitionerId.
- Role (RBAC roles: “Attending”, “Resident”, “Billing”, “Researcher”)
  - Props: purposeOfUse, riskTier.
- Policy (HIPAA / local policies)10410399
  - Props: regCitation, name, link, scope (“TREATMENT”, “PAYMENT”, “OPERATIONS”).
- Consent (FHIR Consent)
  - Props: patientId, status, scope, categories, provision.
- AuditEvent (FHIR AuditEvent / HIPAA audit trail)105106107
  - Props: agentId, userId, timestamp, action, targets, purposeOfUse, outcome.
Each node includes generic properties like id, type, name, createdAt, updatedAt, and isPHI where relevant.99101
Edge types
Edges encode clinical relationships, user‑to‑data access, and governance.
- Patient‑centric clinical edges10099101
  - HAS_ENCOUNTER: Patient → Encounter
  - HAS_CONDITION: Encounter → Condition
  - HAS_MEDICATION: Encounter → Medication
  - HAS_OBSERVATION: Encounter → Observation
  - HAS_PROCEDURE: Encounter → Procedure
  - HAS_DOCUMENT: Encounter → Document or Patient → Document
- Provider and user relations99100101
  - PERFORMED_BY: Encounter/Procedure → Practitioner
  - AUTHORED_BY: Document/Observation → Practitioner
  - HAS_ROLE: User → Role
- Governance and consent10310499
  - GOVERNED_BY: Patient/Encounter/Document → Policy
  - PERMITTED_ACTION: Role → Policy (what actions the role may take under this policy)
  - DENIED_ACTION: Role → Policy (explicit prohibitions)
  - HAS_CONSENT: Patient → Consent
  - CONSENT_GOVERNED_BY: Consent → Policy
- Audit and access trails106107105
  - RECORDED_IN: AuditEvent → Encounter/Patient/Document (what the access relates to)
  - ACTED_BY: AuditEvent → User or Practitioner
You can optionally include SIMILAR_TO between Conditions, Medications, or Documents for GraphRAG‑style expansion.108109110
Example JSON snippet (context graph for one query)
{
"nodes": [
{
"id": "patient:123",
"type": "Patient",
"name": "REDACTED",
"props": {
"mrn": "MRN-12345",
"age": 54,
"sex": "female",
"isPHI": true
}
},
{
"id": "encounter:789",
"type": "Encounter",
"name": "Inpatient 2026-01-10",
"props": {
"encounterType": "inpatient",
"start": "2026-01-10T09:32:00Z",
"end": "2026-01-14T15:21:00Z",
"locationId": "ward:cardiology",
"isPHI": true
}
},
{
"id": "condition:ckd-3a",
"type": "Condition",
"name": "Chronic kidney disease, stage 3a",
"props": {
"codeSystem": "ICD-10",
"code": "N18.31",
"display": "Chronic kidney disease, stage 3a",
"onsetDate": "2024-11-02",
"clinicalStatus": "active",
"isPHI": true
}
},
{
"id": "med:ace-inhibitor",
"type": "Medication",
"name": "Lisinopril 10 mg",
"props": {
"rxnormCode": "29046",
"dose": "10 mg",
"route": "oral",
"start": "2024-12-01",
"end": "2026-01-10",
"status": "stopped",
"isPHI": true
}
},
{
"id": "policy:hipaa-minimum-necessary",
"type": "Policy",
"name": "HIPAA Minimum Necessary",
"props": {
"regCitation": "45 CFR 164.514(d)",
"scope": ["TREATMENT"],
"link": "https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html",
"isPHI": false
}
},
{
"id": "consent:de-identified-research",
"type": "Consent",
"name": "De-identified data for research",
"props": {
"status": "active",
"scope": "RESEARCH",
"categories": ["de-identified"],
"isPHI": false
}
},
{
"id": "audit:step-12",
"type": "AuditEvent",
"name": "ClinicalAgentRead",
"props": {
"agentId": "clinical-agent",
"userId": "user:clinician-456",
"timestamp": "2026-02-03T04:33:00Z",
"action": "READ",
"targetNodeIds": ["patient:123", "encounter:789", "med:ace-inhibitor"],
"purposeOfUse": "TREATMENT",
"outcome": "success",
"isPHI": false
}
}
],
"edges": [
{
"id": "e1",
"type": "HAS_ENCOUNTER",
"from": "patient:123",
"to": "encounter:789",
"props": {}
},
{
"id": "e2",
"type": "HAS_CONDITION",
"from": "encounter:789",
"to": "condition:ckd-3a",
"props": {}
},
{
"id": "e3",
"type": "HAS_MEDICATION",
"from": "encounter:789",
"to": "med:ace-inhibitor",
"props": {}
},
{
"id": "e4",
"type": "GOVERNED_BY",
"from": "encounter:789",
"to": "policy:hipaa-minimum-necessary",
"props": {}
},
{
"id": "e5",
"type": "HAS_CONSENT",
"from": "patient:123",
"to": "consent:de-identified-research",
"props": {}
},
{
"id": "e6",
"type": "RECORDED_IN",
"from": "audit:step-12",
"to": "encounter:789",
"props": {}
}
]
}
This is the context graph your agents would receive for a single patient query: small, patient‑episode‑scoped, PHI‑annotated, and explicitly governed by policies and consents.10010199
Suggested next step prompt
If you want to push this into implementation, you can paste:
“Implement this enterprise clinic context graph schema on top of a FHIR/OMOP‑aligned clinical knowledge graph:
- show how FHIR Patient/Encounter/Observation/MedicationStatement map into the node/edge types above,
- design AQL or Cypher queries that build a per‑patient, per‑encounter context graph, and
- integrate it into a LangGraph multi‑agent workflow with a checkpointer, including how AuditEvent nodes are appended at each agent step.”
Follow‑up topics covered below:
- Sample code for PostgresSaver checkpointer setup in LangGraph
- Multi‑agent roles for a healthcare query workflow
- HIPAA compliance checklist for patient data graphs
- Deploying a LangGraph healthcare agent with Redis checkpointing
- FHIR/OMOP mappings into the clinic context graph
- Example entities and edges for the clinic patient graph
- LangGraph implementation of the clinic context graph
- GraphRAG applications for patient journey analysis
Use Postgres first for the durable checkpointer and OMOP‑style tabular mapping, then layer ArangoDB/Neo4j if/when you want native graph ops. Below are targeted snippets for each item.
1) Sample PostgresSaver checkpointer setup (LangGraph)
from psycopg_pool import ConnectionPool
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import StateGraph, START, END
# 1) Connection + checkpointer
DB_URI = "postgresql://user:pass@host:5432/langgraph?sslmode=require"
pool = ConnectionPool(conninfo=DB_URI, max_size=10)
# Pass the pool itself: a connection borrowed via `with pool.connection()` is
# returned to the pool when the block exits, leaving the saver with a stale
# connection.
checkpointer = PostgresSaver(pool)
checkpointer.setup()  # creates tables if not exist
# 2) Define state
from typing import TypedDict, Optional, Dict, Any, List
class AgentState(TypedDict):
messages: List[Dict[str, Any]]
patient_id: Optional[str]
clinician_id: Optional[str]
query: Optional[str]
context_graph: Optional[Dict[str, Any]]
governance_metrics: Dict[str, Any]
# 3) Build graph and compile with checkpointer
def dummy_node(state: AgentState) -> AgentState:
return state
builder = StateGraph(AgentState)
builder.add_node("dummy", dummy_node)
builder.add_edge(START, "dummy")
builder.add_edge("dummy", END)
graph = builder.compile(checkpointer=checkpointer)
Checkpointing docs and patterns:117118119120
2) Multi‑agent roles for healthcare query workflow
Typical roles over the clinic context graph:121122123
- Intake agent
  - Normalizes the user question, identifies the patient/episode, detects intent (med change, lab interpretation, discharge question).
  - Sets patient_id, clinician_id, query in state.
- Context builder agent (tool node)
  - Calls HealthcareKGClient to build context_graph for that patient and encounter (per‑patient, per‑episode subgraph).
- Clinical reasoning agent
  - Uses context_graph to explain decisions (e.g., “Why was the ACE inhibitor stopped?”), referencing Conditions, Medications, Observations.
- Compliance / privacy agent
  - Checks governance_metrics and the context graph for ungoverned PHI before a response is released; records AuditEvent entries.
- Supervisor (optional)
  - Routes between the agents, retries failed steps, and decides when the workflow ends.
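As a sketch, an explicit supervisor can be a routing function that inspects state and returns the name of the next node, wired in via `add_conditional_edges`. The `clinical_answer` key is an assumption for illustration (it is not part of the AgentState defined earlier):

```python
def supervisor(state: dict) -> str:
    """Decide which agent runs next based on what the state already contains."""
    if not state.get("patient_id"):
        return "intake"           # still need to identify the patient/episode
    if not state.get("context_graph"):
        return "build_context"    # project the governed subgraph first
    if not state.get("clinical_answer"):
        return "clinical"         # hypothetical key set by the clinical agent
    return "compliance"           # final gate before the response is released
```

Wiring would look like `builder.add_conditional_edges("supervisor", supervisor, {"intake": "intake", "build_context": "build_context", "clinical": "clinical", "compliance": "compliance"})`, with each agent routing back to the supervisor.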
3) HIPAA compliance checklist for patient data graphs
Non‑exhaustive but practical list tied to graph design:122125126124
- Identify PHI and tag explicitly
  - Mark nodes and properties (isPHI, phiCategory) for all identifiers and quasi‑identifiers.
- Minimum necessary principle
  - Ensure per‑task context graphs only include nodes needed for that task (episode‑scoped, small neighborhoods).
- Access control and role‑based edges
  - Represent roles and policies explicitly (User –HAS_ROLE→ Role, Role –PERMITTED_ACTION→ Policy).
- Consent and purpose‑of‑use
  - Include Consent nodes and purposeOfUse fields in AuditEvent aligned with HIPAA TPO (Treatment/Payment/Operations).
- Audit trails
  - For each agent action touching PHI, create an AuditEvent node with ACTED_BY, RECORDED_IN, GOVERNED_BY edges.
- Data retention and deletion
  - Track createdAt, updatedAt, and retention‑policy indicators per node; support archival/purge workflows.
- De‑identification / pseudonymization
  - Maintain mappings between MRN and de‑identified IDs outside the graph used by agents, or segregated into a higher‑security KG/DB.
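The minimum‑necessary item can be enforced mechanically: filter a projected subgraph down to a per‑task allow‑list of node types before it reaches the prompt. A minimal sketch (the task→types mapping is illustrative, not a HIPAA‑complete policy):

```python
# Allowed node types per task; illustrative only.
TASK_ALLOWED_TYPES = {
    "medication_explanation": {"Patient", "Encounter", "Medication",
                               "Condition", "Policy", "AuditEvent"},
}

def minimum_necessary(subgraph: dict, task: str) -> dict:
    """Drop nodes outside the task's allow-list, plus edges that dangle as a result."""
    allowed = TASK_ALLOWED_TYPES.get(task, set())
    nodes = [n for n in subgraph["nodes"] if n["type"] in allowed]
    keep_ids = {n["id"] for n in nodes}
    edges = [e for e in subgraph["edges"]
             if e["from"] in keep_ids and e["to"] in keep_ids]
    return {"nodes": nodes, "edges": edges}
```

This slots naturally into `build_context_graph` as a final step, so no agent downstream ever sees out-of-scope node types.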
4) Deploying LangGraph healthcare agent with Redis checkpointing
For less heavy deployment (or dev/test) you can swap Postgres for Redis.120130
from redis import Redis
from langgraph.checkpoint.redis import RedisSaver  # pip install langgraph-checkpoint-redis
from langgraph.graph import StateGraph, START, END
redis_client = Redis(host="redis", port=6379, db=0)
# Constructor kwargs vary by langgraph-checkpoint-redis version; alternatively
# use RedisSaver.from_conn_string("redis://redis:6379/0").
checkpointer = RedisSaver(redis_client=redis_client)
checkpointer.setup()  # create the Redis indices used for checkpoints
builder = StateGraph(AgentState)
builder.add_node("intake", intake_agent)
builder.add_edge(START, "intake")
builder.add_edge("intake", END)
graph = builder.compile(checkpointer=checkpointer)
# Invoke with a thread id for replay
config = {"configurable": {"thread_id": "pt-123"}}
result = graph.invoke(
{
"messages": [{"role": "user", "content": "Why was my ACE inhibitor stopped?"}],
"patient_id": "patient:123",
"clinician_id": "clinician:456",
"query": "medication change explanation",
"context_graph": None,
"governance_metrics": {},
},
config=config,
)
Redis integration guidance:130120
5) FHIR / OMOP mappings into clinic context graph
High‑level mapping from FHIR/OMOP into the node types we defined:123131132133
- Patient node
  - FHIR: Patient (id, identifiers, birthDate, gender) → Patient node props: mrn, age, sex, deidentifiedId, isPHI.
  - OMOP: person (person_id, gender_concept_id, year_of_birth) → same node.
- Encounter node
  - FHIR: Encounter → Encounter node props: encounterType, start, end, locationId.
  - OMOP: visit_occurrence → same node; map visit_concept_id to type.
- Condition node
  - FHIR: Condition → Condition node props: codeSystem, code, display, clinicalStatus, onsetDate.
  - OMOP: condition_occurrence.
- Medication node
  - FHIR: MedicationStatement / MedicationRequest → Medication node props: rxnormCode, name, dose, route, status.
  - OMOP: drug_exposure.
- Observation node
  - FHIR: Observation → Observation node with loincCode, value, unit, effectiveTime.
  - OMOP: measurement.
Edges like HAS_ENCOUNTER, HAS_CONDITION, HAS_MEDICATION, HAS_OBSERVATION correspond to FHIR references (Encounter.subject, Condition.encounter, MedicationStatement.subject, Observation.encounter) or OMOP foreign keys (e.g., person_id, visit_occurrence_id).131132123
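As a sketch, the Patient mapping can be a small function from a FHIR Patient resource (as a parsed dict) into the node shape used in this document. Field paths follow the FHIR R4 Patient resource; the MRN extraction (any identifier typed "MR") is deliberately simplified:

```python
def fhir_patient_to_node(resource: dict) -> dict:
    """Map a FHIR R4 Patient resource (dict) to a Patient context-graph node."""
    mrn = None
    for ident in resource.get("identifier", []):
        # Simplified: treat any identifier whose type coding is "MR" as the MRN.
        codings = ident.get("type", {}).get("coding", [])
        if any(c.get("code") == "MR" for c in codings):
            mrn = ident.get("value")
    return {
        "id": f"patient:{resource['id']}",
        "type": "Patient",
        "name": "REDACTED",  # direct identifiers stay out of the context graph
        "props": {
            "mrn": mrn,
            "birthDate": resource.get("birthDate"),
            "sex": resource.get("gender"),
            "isPHI": True,
        },
    }
```

The other node types (Encounter, Condition, Medication, Observation) follow the same pattern: read the FHIR fields listed above, emit a node, and emit the corresponding HAS_* edge from the referenced parent.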
6) Example entities and edges for clinic patient graph
You can view this as the enterprise context graph schema you already started:121122123
- Entities: Patient, Encounter, Condition, Medication, Observation, Procedure, Document, Practitioner, User, Role, Policy, Consent, AuditEvent.
- Edges: HAS_ENCOUNTER, HAS_CONDITION, HAS_MEDICATION, HAS_OBSERVATION, HAS_PROCEDURE, HAS_DOCUMENT, PERFORMED_BY, AUTHORED_BY, HAS_ROLE, GOVERNED_BY, HAS_CONSENT, CONSENT_GOVERNED_BY, RECORDED_IN, ACTED_BY.
This schema is consistent with FHIR‑oriented clinical KGs and cohort‑building graphs in the literature.134123131121
7) LangGraph implementation of clinic context graph (Postgres backend)
A minimal HealthcareKGClient over a relational (OMOP‑ish) model that builds a per‑patient, per‑encounter graph (pseudo‑SQL, not optimized):
import psycopg
from typing import Dict, Any, List
class HealthcareKGClient:
def __init__(self, dsn: str):
self.dsn = dsn
def _fetchall(self, query: str, params: tuple) -> List[Dict[str, Any]]:
with psycopg.connect(self.dsn) as conn:
with conn.cursor() as cur:
cur.execute(query, params)
cols = [c.name for c in cur.description]
return [dict(zip(cols, row)) for row in cur.fetchall()]
def patient_context(self, patient_id: str, max_nodes: int = 128) -> Dict[str, Any]:
# Example: join person, visit_occurrence, condition_occurrence, drug_exposure, measurement
# Build nodes/edges from rows; omitted for brevity.
nodes: Dict[str, Dict[str, Any]] = {}
edges: List[Dict[str, Any]] = []
# 1) patient
person_rows = self._fetchall(
"""
SELECT person_id, year_of_birth, gender_concept_id
FROM person
WHERE person_id = %s
""",
(patient_id,),
)
# build Patient node, then add visits, conditions, etc.
return {"nodes": list(nodes.values())[:max_nodes], "edges": edges}
You then call this from build_context_graph in the LangGraph node as we sketched earlier.
8) GraphRAG applications for patient journey analysis
You can use the same context graph to support GraphRAG tasks such as:135136134
- “Summarize this patient’s journey around diagnosis X over the last N encounters.”
- “Find similar patients with comparable conditions, meds, and lab trajectories.”
- “Explain treatment changes in light of guidelines (documents) linked to the graph.”
This is where SIMILAR_TO edges (patients/encounters/documents) and cohort‑level KGs from EHR data become valuable.123134121
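A sketch of how SIMILAR_TO edges can be materialized from document embeddings, using plain cosine similarity; in practice you would delegate this to a vector index (pgvector, Neo4j vector index, ArangoSearch), but the edge shape is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def similar_to_edges(doc_embeddings: dict, threshold: float = 0.85) -> list:
    """Emit SIMILAR_TO edges between documents whose embeddings are close."""
    edges = []
    ids = sorted(doc_embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(doc_embeddings[a], doc_embeddings[b])
            if score >= threshold:
                edges.append({
                    "id": f"sim-{a}-{b}",
                    "type": "SIMILAR_TO",
                    "from": a,
                    "to": b,
                    "props": {"score": round(score, 3)},
                })
    return edges
```

These edges let the context builder hop from a patient's documents to similar guidelines or similar-patient documents during semantic expansion.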
If you want next, I can either:
- Write a concrete Postgres SQL schema + a full HealthcareKGClient.patient_context implementation, or
- Show a Cypher or AQL variant for the same context graph projection, using FHIR/OMOP mappings.
Postgres + OMOP is the most generally useful base, so I’ll give you a concrete SQL schema and a full HealthcareKGClient.patient_context for a clinic context graph. You can adapt this to your LangGraph node directly.
1) Concrete Postgres schema (OMOP‑aligned, graph‑friendly)
This is a minimal subset of OMOP CDM tables plus a generic edges table to make graph projection easy.137138139140141142
-- Core OMOP-like tables (highly simplified)
CREATE TABLE person (
person_id BIGINT PRIMARY KEY,
birth_datetime TIMESTAMPTZ,
gender_concept_id INTEGER,
race_concept_id INTEGER
);
CREATE TABLE visit_occurrence (
visit_occurrence_id BIGINT PRIMARY KEY,
person_id BIGINT NOT NULL REFERENCES person(person_id),
visit_concept_id INTEGER NOT NULL,
visit_start_datetime TIMESTAMPTZ NOT NULL,
visit_end_datetime TIMESTAMPTZ,
care_site_id BIGINT
);
CREATE TABLE condition_occurrence (
condition_occurrence_id BIGINT PRIMARY KEY,
person_id BIGINT NOT NULL REFERENCES person(person_id),
condition_concept_id INTEGER NOT NULL,
condition_start_datetime TIMESTAMPTZ,
visit_occurrence_id BIGINT REFERENCES visit_occurrence(visit_occurrence_id)
);
CREATE TABLE drug_exposure (
drug_exposure_id BIGINT PRIMARY KEY,
person_id BIGINT NOT NULL REFERENCES person(person_id),
drug_concept_id INTEGER NOT NULL,
drug_exposure_start_datetime TIMESTAMPTZ,
drug_exposure_end_datetime TIMESTAMPTZ,
visit_occurrence_id BIGINT REFERENCES visit_occurrence(visit_occurrence_id),
route_concept_id INTEGER,
dose_unit_concept_id INTEGER,
stop_reason VARCHAR(50)
);
CREATE TABLE measurement (
measurement_id BIGINT PRIMARY KEY,
person_id BIGINT NOT NULL REFERENCES person(person_id),
measurement_concept_id INTEGER NOT NULL,
measurement_datetime TIMESTAMPTZ,
value_as_number DOUBLE PRECISION,
unit_concept_id INTEGER,
visit_occurrence_id BIGINT REFERENCES visit_occurrence(visit_occurrence_id)
);
-- Clinical notes / documents
CREATE TABLE note (
note_id BIGINT PRIMARY KEY,
person_id BIGINT NOT NULL REFERENCES person(person_id),
visit_occurrence_id BIGINT REFERENCES visit_occurrence(visit_occurrence_id),
note_datetime TIMESTAMPTZ,
note_title TEXT,
note_text TEXT,
embedding_id TEXT
);
-- Governance tables
CREATE TABLE policy (
policy_id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
reg_citation TEXT,
scope TEXT, -- e.g., 'TREATMENT', 'RESEARCH'
link TEXT
);
CREATE TABLE consent (
consent_id BIGINT PRIMARY KEY,
person_id BIGINT NOT NULL REFERENCES person(person_id),
status TEXT, -- 'active', 'revoked'
scope TEXT,
categories TEXT, -- JSON/text list
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
CREATE TABLE app_user (
user_id BIGINT PRIMARY KEY,
external_id TEXT,
user_type TEXT, -- 'clinician', 'admin', 'service'
practitioner_id BIGINT -- nullable, if mapped to clinical provider
);
CREATE TABLE user_role (
role_id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
purpose_of_use TEXT -- 'TREATMENT', 'PAYMENT', etc.
);
CREATE TABLE user_role_link (
user_id BIGINT NOT NULL REFERENCES app_user(user_id),
role_id BIGINT NOT NULL REFERENCES user_role(role_id),
PRIMARY KEY (user_id, role_id)
);
CREATE TABLE audit_event (
audit_event_id BIGSERIAL PRIMARY KEY,
person_id BIGINT REFERENCES person(person_id),
visit_occurrence_id BIGINT REFERENCES visit_occurrence(visit_occurrence_id),
user_id BIGINT REFERENCES app_user(user_id),
agent_id TEXT, -- agent name in LangGraph
action TEXT, -- 'READ', 'WRITE', 'DISCLOSE'
purpose_of_use TEXT,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
outcome TEXT -- 'success', 'denied'
);
-- Generic edges to model governance relations at KG level
CREATE TABLE policy_edge (
edge_id BIGSERIAL PRIMARY KEY,
from_type TEXT NOT NULL, -- 'Patient','Encounter','Document','Role','Consent'
from_id BIGINT NOT NULL,
to_type TEXT NOT NULL, -- 'Policy'
to_id BIGINT NOT NULL,
edge_type TEXT NOT NULL, -- 'GOVERNED_BY','CONSENT_GOVERNED_BY','PERMITTED_ACTION'
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_visit_person ON visit_occurrence(person_id);
CREATE INDEX idx_condition_person ON condition_occurrence(person_id);
CREATE INDEX idx_drug_person ON drug_exposure(person_id);
CREATE INDEX idx_measurement_person ON measurement(person_id);
CREATE INDEX idx_note_person ON note(person_id);
CREATE INDEX idx_policy_edge_from ON policy_edge(from_type, from_id, edge_type);
CREATE INDEX idx_policy_edge_to ON policy_edge(to_type, to_id, edge_type);
This is compatible with OMOP CDM semantics while being simple enough to project into a context graph.138139140137
2) Full HealthcareKGClient.patient_context implementation (Python)
This client queries Postgres and returns a context_graph JSON compatible with the earlier schema.
import psycopg
from typing import Dict, Any, List, Tuple
Node = Dict[str, Any]
Edge = Dict[str, Any]
Subgraph = Dict[str, List[Dict[str, Any]]]
class HealthcareKGClient:
def __init__(self, dsn: str):
self.dsn = dsn
# --- helpers ---------------------------------------------------------
def _fetchall(self, conn, query: str, params: Tuple[Any, ...]) -> List[Dict[str, Any]]:
with conn.cursor() as cur:
cur.execute(query, params)
cols = [c.name for c in cur.description]
return [dict(zip(cols, row)) for row in cur.fetchall()]
def _add_node(self, nodes: Dict[str, Node], node: Node) -> None:
nodes[node["id"]] = node
def _edge(self, id_: str, type_: str, from_id: str, to_id: str, props=None) -> Edge:
return {
"id": id_,
"type": type_,
"from": from_id,
"to": to_id,
"props": props or {},
}
# --- public API ------------------------------------------------------
def patient_context(
self,
person_id: int,
max_visits: int = 10,
max_conditions: int = 50,
max_drugs: int = 50,
max_measurements: int = 50,
include_notes: bool = True,
) -> Subgraph:
"""
Build a per-patient, per-encounter clinic context graph from OMOP-like tables.
"""
nodes: Dict[str, Node] = {}
edges: List[Edge] = []
with psycopg.connect(self.dsn) as conn:
# 1) Patient
person_rows = self._fetchall(
conn,
"""
SELECT person_id, birth_datetime, gender_concept_id
FROM person
WHERE person_id = %s
""",
(person_id,),
)
if not person_rows:
return {"nodes": [], "edges": []}
p = person_rows[0]
patient_node_id = f"patient:{p['person_id']}"
patient_node = {
"id": patient_node_id,
"type": "Patient",
"name": f"Patient {p['person_id']}",
"props": {
"personId": p["person_id"],
"birthDatetime": p["birth_datetime"].isoformat() if p["birth_datetime"] else None,
"genderConceptId": p["gender_concept_id"],
"isPHI": True,
},
}
self._add_node(nodes, patient_node)
# 2) Encounters (visits)
visit_rows = self._fetchall(
conn,
"""
SELECT visit_occurrence_id, visit_concept_id,
visit_start_datetime, visit_end_datetime, care_site_id
FROM visit_occurrence
WHERE person_id = %s
ORDER BY visit_start_datetime DESC
LIMIT %s
""",
(person_id, max_visits),
)
for v in visit_rows:
enc_id = f"encounter:{v['visit_occurrence_id']}"
enc_node = {
"id": enc_id,
"type": "Encounter",
"name": f"Encounter {v['visit_occurrence_id']}",
"props": {
"visitConceptId": v["visit_concept_id"],
"start": v["visit_start_datetime"].isoformat() if v["visit_start_datetime"] else None,
"end": v["visit_end_datetime"].isoformat() if v["visit_end_datetime"] else None,
"careSiteId": v["care_site_id"],
"isPHI": True,
},
}
self._add_node(nodes, enc_node)
edges.append(self._edge(f"e-patient-enc-{v['visit_occurrence_id']}",
"HAS_ENCOUNTER", patient_node_id, enc_id))
if not visit_rows:
# Even if no visits, you might still have conditions or drugs, but
# for a context graph we usually stop here.
return {"nodes": list(nodes.values()), "edges": edges}
visit_ids = [v["visit_occurrence_id"] for v in visit_rows]
# 3) Conditions
condition_rows = self._fetchall(
conn,
"""
SELECT condition_occurrence_id, condition_concept_id,
condition_start_datetime, visit_occurrence_id
FROM condition_occurrence
WHERE person_id = %s
AND visit_occurrence_id = ANY(%s)
LIMIT %s
""",
(person_id, visit_ids, max_conditions),
)
for c in condition_rows:
cond_id = f"condition:{c['condition_occurrence_id']}"
node = {
"id": cond_id,
"type": "Condition",
"name": f"Condition {c['condition_occurrence_id']}",
"props": {
"conditionConceptId": c["condition_concept_id"],
"onset": c["condition_start_datetime"].isoformat()
if c["condition_start_datetime"] else None,
"isPHI": True,
},
}
self._add_node(nodes, node)
enc_id = f"encounter:{c['visit_occurrence_id']}"
edges.append(self._edge(f"e-enc-cond-{c['condition_occurrence_id']}",
"HAS_CONDITION", enc_id, cond_id))
# 4) Medications
drug_rows = self._fetchall(
conn,
"""
SELECT drug_exposure_id, drug_concept_id,
drug_exposure_start_datetime, drug_exposure_end_datetime,
visit_occurrence_id, stop_reason
FROM drug_exposure
WHERE person_id = %s
AND visit_occurrence_id = ANY(%s)
LIMIT %s
""",
(person_id, visit_ids, max_drugs),
)
for d in drug_rows:
med_id = f"med:{d['drug_exposure_id']}"
node = {
"id": med_id,
"type": "Medication",
"name": f"Medication {d['drug_exposure_id']}",
"props": {
"drugConceptId": d["drug_concept_id"],
"start": d["drug_exposure_start_datetime"].isoformat()
if d["drug_exposure_start_datetime"] else None,
"end": d["drug_exposure_end_datetime"].isoformat()
if d["drug_exposure_end_datetime"] else None,
"stopReason": d["stop_reason"],
"isPHI": True,
},
}
self._add_node(nodes, node)
enc_id = f"encounter:{d['visit_occurrence_id']}"
edges.append(self._edge(f"e-enc-med-{d['drug_exposure_id']}",
"HAS_MEDICATION", enc_id, med_id))
# 5) Observations / measurements
meas_rows = self._fetchall(
conn,
"""
SELECT measurement_id, measurement_concept_id,
measurement_datetime, value_as_number,
unit_concept_id, visit_occurrence_id
FROM measurement
WHERE person_id = %s
AND visit_occurrence_id = ANY(%s)
LIMIT %s
""",
(person_id, visit_ids, max_measurements),
)
for m in meas_rows:
obs_id = f"obs:{m['measurement_id']}"
node = {
"id": obs_id,
"type": "Observation",
"name": f"Measurement {m['measurement_id']}",
"props": {
"measurementConceptId": m["measurement_concept_id"],
"time": m["measurement_datetime"].isoformat()
if m["measurement_datetime"] else None,
"value": m["value_as_number"],
"unitConceptId": m["unit_concept_id"],
"isPHI": True,
},
}
self._add_node(nodes, node)
enc_id = f"encounter:{m['visit_occurrence_id']}"
edges.append(self._edge(f"e-enc-obs-{m['measurement_id']}",
"HAS_OBSERVATION", enc_id, obs_id))
# 6) Notes / documents
if include_notes:
note_rows = self._fetchall(
conn,
"""
SELECT note_id, visit_occurrence_id, note_datetime,
note_title, embedding_id
FROM note
WHERE person_id = %s
AND visit_occurrence_id = ANY(%s)
""",
(person_id, visit_ids),
)
for n in note_rows:
doc_id = f"doc:{n['note_id']}"
node = {
"id": doc_id,
"type": "Document",
"name": n["note_title"] or f"Note {n['note_id']}",
"props": {
"noteId": n["note_id"],
"time": n["note_datetime"].isoformat()
if n["note_datetime"] else None,
"embeddingId": n["embedding_id"],
"isPHI": True,
},
}
self._add_node(nodes, node)
enc_id = f"encounter:{n['visit_occurrence_id']}"
edges.append(self._edge(f"e-enc-doc-{n['note_id']}",
"HAS_DOCUMENT", enc_id, doc_id))
# 7) Governance: policies and consents attached to the patient
policy_rows = self._fetchall(
conn,
"""
SELECT pe.edge_id, pe.from_type, pe.from_id, pe.to_type, pe.to_id,
pe.edge_type,
p.name, p.reg_citation, p.scope, p.link
FROM policy_edge pe
JOIN policy p ON p.policy_id = pe.to_id
WHERE pe.from_type = 'Patient'
AND pe.from_id = %s
""",
(person_id,),
)
for r in policy_rows:
pol_node_id = f"policy:{r['to_id']}"
if pol_node_id not in nodes:
policy_node = {
"id": pol_node_id,
"type": "Policy",
"name": r["name"],
"props": {
"regCitation": r["reg_citation"],
"scope": r["scope"],
"link": r["link"],
"isPHI": False,
},
}
self._add_node(nodes, policy_node)
edges.append(
self._edge(
f"e-pol-{r['edge_id']}",
r["edge_type"] or "GOVERNED_BY",
patient_node_id,
pol_node_id,
)
)
consent_rows = self._fetchall(
conn,
"""
SELECT consent_id, status, scope, categories,
created_at, updated_at
FROM consent
WHERE person_id = %s
""",
(person_id,),
)
for c in consent_rows:
cons_id = f"consent:{c['consent_id']}"
node = {
"id": cons_id,
"type": "Consent",
"name": f"Consent {c['consent_id']}",
"props": {
"status": c["status"],
"scope": c["scope"],
"categories": c["categories"],
"createdAt": c["created_at"].isoformat() if c["created_at"] else None,
"updatedAt": c["updated_at"].isoformat() if c["updated_at"] else None,
"isPHI": False,
},
}
self._add_node(nodes, node)
edges.append(
self._edge(
f"e-patient-cons-{c['consent_id']}",
"HAS_CONSENT",
patient_node_id,
cons_id,
)
)
return {"nodes": list(nodes.values()), "edges": edges}
This function gives you a per‑patient context graph aligned with OMOP tables and the clinic schema you defined.139140141142137138
You then wire it into your LangGraph build_context_graph node:
kg_client = HealthcareKGClient(dsn="postgresql://user:pass@host/db")

def build_context_graph(state: AgentState, runtime) -> AgentState:
    # e.g., state["patient_id"] == "patient:123"
    person_id = int(state["patient_id"].split(":")[1])
    subgraph = kg_client.patient_context(person_id=person_id)
    state["context_graph"] = subgraph
    return state
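Once the node has populated state["context_graph"], the subgraph still has to be serialized into the prompt. A minimal sketch of that step, assuming the {"nodes": [...], "edges": [...]} shape returned by patient_context; the max_chars budget and the tail-dropping truncation are illustrative choices, not a fixed API:

```python
import json

def serialize_context_graph(subgraph: dict, max_chars: int = 4000) -> str:
    """Serialize a context graph into compact JSON for a model prompt.

    If the payload exceeds the character budget, nodes are dropped from
    the tail until it fits -- a crude strategy; a real context engine
    would rank nodes by relevance score before truncating.
    """
    nodes = list(subgraph.get("nodes", []))
    edges = list(subgraph.get("edges", []))
    while True:
        payload = json.dumps({"nodes": nodes, "edges": edges},
                             separators=(",", ":"))
        if len(payload) <= max_chars or not nodes:
            return payload
        nodes.pop()  # drop the last (least-prioritized) node

g = {"nodes": [{"id": "patient:123", "type": "Patient"}],
     "edges": [{"type": "HAS_CONSENT", "from": "patient:123", "to": "consent:1"}]}
print(serialize_context_graph(g))
```

The compact separators matter in practice: whitespace in a large serialized graph is pure token waste.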
If you want, next I can add a Neo4j Cypher version of patient_context that builds the same context graph directly from a graph model, or provide AQL queries if you prefer an ArangoDB‑backed implementation.
RDF explain structured format
RDF (Resource Description Framework) is a data model for representing information as a directed labeled graph, using a very simple structured format: triples.
Core structured model
- Triple: the basic statement is
$\langle subject,\ predicate,\ object \rangle$
where:
- subject = the thing being described (usually an IRI/URI or blank node)
- predicate = the property/relationship (IRI)
- object = the value or related thing (IRI, blank node, or literal)
Example triple (conceptually):
- Subject: <https://example.org/patient/123>
- Predicate: <https://example.org/vocab/hasCondition>
- Object: <https://example.org/condition/CKD-3a>
Together, many triples form a graph: subjects/objects are nodes, predicates are edge labels.
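In code, such a graph is literally a set of 3-tuples, and the node/edge-label distinction falls out of the triple positions. A minimal illustration (IRIs abbreviated with an ex: stand-in prefix):

```python
# A tiny RDF-style graph as a set of (subject, predicate, object) triples.
triples = {
    ("ex:patient123", "ex:hasCondition", "ex:CKD_3a"),
    ("ex:patient123", "ex:hasEncounter", "ex:encounter789"),
    ("ex:encounter789", "ex:hasCondition", "ex:CKD_3a"),
}

# Subjects and objects become graph nodes; predicates become edge labels.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
edge_labels = {p for _, p, _ in triples}

print(sorted(nodes))        # three distinct nodes
print(sorted(edge_labels))  # two distinct edge labels
```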
Common structured serializations
These are different concrete syntaxes to write the same RDF graph:
- Turtle (.ttl)
- Human‑readable, compact.
- Uses prefixes and punctuation to group triples.
- Example:
@prefix ex: <https://example.org/> .
ex:patient123
ex:hasCondition ex:CKD_3a ;
ex:age 54 .
- N‑Triples (.nt)
- Line‑oriented, very simple, one triple per line.
- Easier for machines, verbose for humans.
- Example:
<https://example.org/patient/123> <https://example.org/vocab/hasCondition> <https://example.org/condition/CKD-3a> .
<https://example.org/patient/123> <https://example.org/vocab/age> "54"^^<http://www.w3.org/2001/XMLSchema#integer> .
- N‑Quads (.nq)
- Extends N‑Triples with a 4th element (graph name) for datasets (multiple named graphs).
- RDF/XML (.rdf)
- XML syntax; older and more verbose, but still widely supported.
- JSON‑LD (.jsonld)
- JSON representation of RDF, designed to embed linked data into JSON APIs.
- Uses an @context to map JSON keys to IRIs.
- Example:
{
"@context": {
"ex": "https://example.org/",
"hasCondition": "ex:hasCondition",
"age": "ex:age"
},
"@id": "ex:patient123",
"hasCondition": { "@id": "ex:CKD_3a" },
"age": 54
}
All these formats encode the same underlying structure: a set of triples in the RDF graph model. The choice is about readability (Turtle), simplicity (N‑Triples), compatibility (RDF/XML), or JSON integration (JSON‑LD).
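To make the "same underlying triples" point concrete, here is a deliberately simplified expansion of the JSON-LD example above into triples. It handles only this flat shape (string-valued @context entries, {"@id": ...} objects, plain literals), nothing like the full JSON-LD expansion algorithm; the function name is invented for illustration:

```python
def expand_simple_jsonld(doc: dict) -> set:
    """Tiny JSON-LD-to-triples sketch (flat, single-subject documents only)."""
    ctx = doc.get("@context", {})

    def to_iri(term):
        # Resolve a term through @context, then expand any prefix.
        term = ctx.get(term, term)
        if ":" in term:
            prefix, local = term.split(":", 1)
            if prefix in ctx:
                return ctx[prefix] + local
        return term

    subject = to_iri(doc["@id"])
    triples = set()
    for key, value in doc.items():
        if key.startswith("@"):       # skip @context / @id keywords
            continue
        pred = to_iri(key)
        if isinstance(value, dict) and "@id" in value:
            obj = to_iri(value["@id"])  # IRI object
        else:
            obj = value                 # literal object
        triples.add((subject, pred, obj))
    return triples

doc = {
    "@context": {"ex": "https://example.org/",
                 "hasCondition": "ex:hasCondition",
                 "age": "ex:age"},
    "@id": "ex:patient123",
    "hasCondition": {"@id": "ex:CKD_3a"},
    "age": 54,
}
print(expand_simple_jsonld(doc))
```

The output is the same pair of triples the Turtle and N-Triples examples express.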
- RDF Turtle syntax examples
- JSON-LD vs Turtle comparison
- N-Triples format details
- How to parse RDF serializations in Python
- RDF vs knowledge graph schemas like clinic patient
Below is a compact, implementation‑oriented rundown for all five items.
1) RDF Turtle syntax examples
Turtle is a compact textual syntax for RDF triples.162163164165
Basic example:
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123
a ex:Patient ;
ex:hasCondition ex:CKD_3a ;
ex:age 54 ;
ex:birthDate "1971-03-15"^^xsd:date .
List and blank node example:163162
@prefix : <http://example.org/foo/> .
:subject :predicate ( :a :b :c ) .
:subject :predicate2 () .
Predicate list shorthand (multiple properties for same subject):164
<http://example.org/thing1>
a <http://example.org/Example> ;
<http://example.org/name> "First Thing" ;
<http://example.org/next> <http://example.org/thing2> .
2) JSON‑LD vs Turtle (in practice)
Both are RDF serializations (different notations for the same triple model).166167168169
- Turtle
- Textual, graph‑centric; closer to SPARQL; very readable for ontologists.
- Good for ontologies, schema files, data dumps.
- JSON‑LD
Key practical differences:168166167
- JSON‑LD works seamlessly with JSON types and web stacks; Turtle is more natural inside RDF/SPARQL ecosystems.
- Conversion between them is algorithmically defined and supported by libraries (e.g., json‑ld processors, RDFLib).170171166
- SPARQL works identically once data is in a triple store regardless of whether it came from Turtle or JSON‑LD.168
3) N‑Triples format details
N‑Triples is a very simple, line‑oriented RDF serialization.169172162
- One triple per line: <subject> <predicate> <object> .
- The subject is an IRI or blank node; the predicate is always an IRI; the object can be an IRI, blank node, or literal.
- No prefixes; every IRI is written out in full, which makes it verbose but trivial to parse.
- Example:
<http://example.org/patient/123> <http://example.org/vocab/hasCondition> <http://example.org/condition/CKD-3a> .
<http://example.org/patient/123> <http://example.org/vocab/age> "54"^^<http://www.w3.org/2001/XMLSchema#integer> .
N‑Quads extends N‑Triples with a 4th term (graph name) for datasets.
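Because each statement is one self-contained, whitespace-delimited line, a naive parser for the IRI-only case fits in a few lines. This sketch ignores literals with embedded spaces, escape sequences, and blank nodes, all of which a real N-Triples parser must handle:

```python
def parse_ntriples_line(line: str):
    """Parse one N-Triples line of the form '<s> <p> <o> .' (IRIs only)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    assert line.endswith("."), "every N-Triples statement ends with a dot"
    # Drop the trailing dot, then split into exactly three terms.
    s, p, o = line[:-1].split(None, 2)
    unwrap = lambda t: t.strip().lstrip("<").rstrip(">")
    return unwrap(s), unwrap(p), unwrap(o)

triple = parse_ntriples_line(
    "<http://example.org/patient/123> "
    "<http://example.org/vocab/hasCondition> "
    "<http://example.org/condition/CKD-3a> ."
)
print(triple)
```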
4) Parsing RDF serializations in Python (RDFLib)
RDFLib supports Turtle, N‑Triples, RDF/XML, JSON‑LD, etc.173171174
Basic parsing:
from rdflib import Graph
g = Graph()
g.parse("example.ttl", format="turtle") # or "nt", "xml", "json-ld", ...
print(len(g)) # number of triples
for s, p, o in g:
print(s, p, o)
Parsing N‑Triples:174
from rdflib import Graph
g = Graph()
g.parse("demo.nt", format="nt")
for stmt in g:
print(stmt)
Parsing JSON‑LD:
from rdflib import Graph
g = Graph()
g.parse("example.jsonld", format="json-ld")
RDFLib also supports datasets and TriG (named graphs):171173
from rdflib import Dataset, RDF
ds = Dataset()
ds.parse("demo.trig", format="trig")
for s, p, o, g in ds.quads((None, RDF.type, None, None)):
print(s, g)
5) RDF vs clinic patient “knowledge graph” schema
RDF is an abstract graph data model + semantics; your clinic patient graph is a specific domain schema that can be expressed in RDF (or in other graph/relational stores).175176177178169
- RDF level
- Just triples: subject–predicate–object with IRIs and literals.
- Serializations: Turtle, JSON‑LD, N‑Triples, RDF/XML, etc.
- Ontology layers (RDFS/OWL) express class hierarchies, property domains/ranges, constraints.
- Clinic patient KG level
- Domain modeling: classes like Patient, Encounter, Condition, Medication, Observation, Policy, AuditEvent, and properties like hasEncounter, hasCondition, governedBy.
- You can define these as RDF classes/properties (e.g., in Turtle) and store patient data as RDF graphs, or you can implement the same schema in Neo4j, ArangoDB, or Postgres.176177178175
Mapping your schema to RDF is roughly:
- Node types → RDF classes (ex:Patient a rdfs:Class, etc.).
- Properties/edges → RDF properties (ex:hasEncounter a rdf:Property, etc.).
- Instance nodes → RDF resources (ex:patient123 a ex:Patient …).
You are free to choose RDF/Turtle/JSON‑LD if you want standards‑based interoperability and SPARQL, or stay in property‑graph / relational if you prioritize operational familiarity and performance. The structural ideas (entities + relationships) are the same; RDF just standardizes the representation and reasoning layer.177178169175176
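The mapping is mechanical enough to sketch: given property-graph style node/edge dicts (the shape the earlier patient_context code produces), emit N-Triples lines. The BASE namespace, the "from"/"to"/"type" edge keys, and the id-to-IRI convention are illustrative assumptions:

```python
BASE = "http://example.org/clinic/"  # hypothetical namespace choice
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def iri(local: str) -> str:
    # "patient:123" -> <http://example.org/clinic/patient/123>
    return f"<{BASE}{local.replace(':', '/')}>"

def to_ntriples(nodes: list, edges: list) -> list:
    lines = []
    for n in nodes:
        # Node type -> an rdf:type triple pointing at a class IRI.
        lines.append(f"{iri(n['id'])} <{RDF_TYPE}> <{BASE}{n['type']}> .")
    for e in edges:
        # Edge type -> predicate IRI; endpoints -> subject/object.
        lines.append(f"{iri(e['from'])} <{BASE}{e['type']}> {iri(e['to'])} .")
    return lines

lines = to_ntriples(
    nodes=[{"id": "patient:123", "type": "Patient"}],
    edges=[{"from": "patient:123", "type": "HAS_CONSENT", "to": "consent:1"}],
)
print("\n".join(lines))
```

Node props would map to literal-valued triples in the same way; they are omitted here to keep the sketch short.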
If you’d like, next I can:
- Show your clinic context graph encoded as Turtle and JSON‑LD side‑by‑side, or
- Sketch an OWL/RDFS ontology (in Turtle) for your healthcare context graph schema.
- RDFLib tutorial for querying RDF graphs
- Convert Turtle to JSON-LD in Python
- N-Quads vs N-Triples differences
- SPARQL queries on Turtle data examples
- Turtle syntax for blank nodes and collections
- Show your clinic context graph encoded as Turtle and JSON‑LD side‑by‑side
- Sketch an OWL/RDFS ontology (in Turtle) for your healthcare context graph schema
Here’s a concise, code‑oriented answer touching all the items, plus the clinic context graph encodings.
RDFLib: querying RDF graphs with SPARQL
Basic pattern to load Turtle and run SPARQL.183184185186187
from rdflib import Graph, Namespace
g = Graph()
g.parse("clinic.ttl", format="turtle")
EX = Namespace("http://example.org/clinic#")
q = """
PREFIX ex: <http://example.org/clinic#>
SELECT ?enc ?cond
WHERE {
?patient a ex:Patient ;
ex:hasEncounter ?enc .
?enc ex:hasCondition ?cond .
}
"""
for row in g.query(q):
enc_iri, cond_iri = row
print(enc_iri, cond_iri)
from rdflib.plugins.sparql import prepareQuery
q = prepareQuery(
"""
PREFIX ex: <http://example.org/clinic#>
SELECT ?obs ?value
WHERE {
?patient a ex:Patient ;
ex:hasEncounter ?enc .
?enc ex:hasObservation ?obs .
?obs ex:value ?value .
}
""",
)
for row in g.query(q):
print(row.obs, row.value)
Convert Turtle → JSON‑LD in Python (RDFLib)
from rdflib import Graph
g = Graph()
g.parse("clinic.ttl", format="turtle")
jsonld_str = g.serialize(format="json-ld", indent=2)
with open("clinic.jsonld", "w") as f:
f.write(jsonld_str)
Reverse (JSON‑LD → Turtle) is symmetric by swapping parse/format.188189187190
N‑Quads vs N‑Triples
- N‑Triples: 3 terms + dot per line (subject, predicate, object).191192
- N‑Quads: 4 terms + dot per line (subject, predicate, object, graph label), so it can represent named graphs / datasets.193191
Example N‑Triples:
<http://example.org/clinic/patient123> <http://example.org/clinic/hasEncounter> <http://example.org/clinic/encounter789> .
Same triple in a named graph with N‑Quads:
<http://example.org/clinic/patient123> <http://example.org/clinic/hasEncounter> <http://example.org/clinic/encounter789> <http://example.org/graphs/episode1> .
N‑Triples: only a single default graph; N‑Quads: multi‑graph datasets.193191
SPARQL queries on Turtle data (clinic example)
Assume clinic.ttl defines your context graph.
All encounters for a patient:
PREFIX ex: <http://example.org/clinic#>
SELECT ?enc
WHERE {
ex:patient123 a ex:Patient ;
ex:hasEncounter ?enc .
}
Conditions and medications per encounter:
PREFIX ex: <http://example.org/clinic#>
SELECT ?enc ?cond ?med
WHERE {
ex:patient123 ex:hasEncounter ?enc .
OPTIONAL { ?enc ex:hasCondition ?cond . }
OPTIONAL { ?enc ex:hasMedication ?med . }
}
These queries work identically whether data comes from Turtle, JSON‑LD, or N‑Triples once loaded into a triplestore or RDFLib graph.185194183
Turtle: blank nodes and collections
Blank nodes (anonymous resources):195196197198
@prefix ex: <http://example.org/clinic#> .
ex:patient123 ex:hasAddress [
a ex:Address ;
ex:street "123 Main St" ;
ex:city "Springfield"
] .
Collections (RDF lists):
@prefix ex: <http://example.org/clinic#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
ex:patient123 ex:hasAllergies ( ex:Penicillin ex:Peanuts ) .
This expands to rdf:first, rdf:rest chains under the hood.
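That expansion can be written out explicitly. This sketch generates the rdf:first/rdf:rest chain (terminated by rdf:nil) for a Python list of items, using blank-node labels _:b0, _:b1, …:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def expand_collection(subject: str, predicate: str, items: list) -> list:
    """Expand an RDF collection into rdf:first/rdf:rest triples."""
    if not items:
        # An empty collection is just rdf:nil.
        return [(subject, predicate, RDF + "nil")]
    cells = [f"_:b{i}" for i in range(len(items))]  # one list cell per item
    triples = [(subject, predicate, cells[0])]
    for i, item in enumerate(items):
        triples.append((cells[i], RDF + "first", item))
        rest = cells[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((cells[i], RDF + "rest", rest))
    return triples

triples = expand_collection("ex:patient123", "ex:hasAllergies",
                            ["ex:Penicillin", "ex:Peanuts"])
for t in triples:
    print(t)
```

A two-item list costs five triples, which is why Turtle's ( … ) shorthand exists.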
Clinic context graph: Turtle and JSON‑LD side‑by‑side
Minimal slice: Patient, one Encounter, one Condition, one Medication, one Policy.
Turtle
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123 a ex:Patient ;
ex:mrn "MRN-12345" ;
ex:age 54 ;
ex:sex "female" ;
ex:hasEncounter ex:encounter789 ;
ex:hasConsent ex:consent1 .
ex:encounter789 a ex:Encounter ;
ex:encounterType "inpatient" ;
ex:start "2026-01-10T09:32:00Z"^^xsd:dateTime ;
ex:end "2026-01-14T15:21:00Z"^^xsd:dateTime ;
ex:hasCondition ex:condition_ckd3a ;
ex:hasMedication ex:med_acei ;
ex:governedBy ex:policy_minimum_necessary .
ex:condition_ckd3a a ex:Condition ;
ex:codeSystem "ICD-10" ;
ex:code "N18.31" ;
ex:display "Chronic kidney disease, stage 3a" .
ex:med_acei a ex:Medication ;
ex:rxnormCode "29046" ;
ex:name "Lisinopril 10 mg" ;
ex:status "stopped" ;
ex:stopReason "hyperkalemia" .
ex:policy_minimum_necessary a ex:Policy ;
ex:name "HIPAA Minimum Necessary" ;
ex:regCitation "45 CFR 164.514(d)" .
ex:consent1 a ex:Consent ;
ex:status "active" ;
ex:scope "RESEARCH" .
ex:audit_step12 a ex:AuditEvent ;
ex:agentId "clinical-agent" ;
ex:userId "user:clinician-456" ;
ex:timestamp "2026-02-03T04:33:00Z"^^xsd:dateTime ;
ex:action "READ" ;
ex:purposeOfUse "TREATMENT" ;
ex:outcome "success" ;
ex:recordedIn ex:encounter789 .
JSON‑LD
{
"@context": {
"ex": "http://example.org/clinic#",
"mrn": "ex:mrn",
"age": "ex:age",
"sex": "ex:sex",
"hasEncounter": "ex:hasEncounter",
"hasCondition": "ex:hasCondition",
"hasMedication": "ex:hasMedication",
"hasConsent": "ex:hasConsent",
"governedBy": "ex:governedBy",
"codeSystem": "ex:codeSystem",
"code": "ex:code",
"display": "ex:display",
"rxnormCode": "ex:rxnormCode",
"name": "ex:name",
"status": "ex:status",
"stopReason": "ex:stopReason",
"regCitation": "ex:regCitation",
"scope": "ex:scope",
"agentId": "ex:agentId",
"userId": "ex:userId",
"timestamp": {
"@id": "ex:timestamp",
"@type": "http://www.w3.org/2001/XMLSchema#dateTime"
},
"action": "ex:action",
"purposeOfUse": "ex:purposeOfUse",
"outcome": "ex:outcome",
"recordedIn": "ex:recordedIn",
"Patient": "ex:Patient",
"Encounter": "ex:Encounter",
"Condition": "ex:Condition",
"Medication": "ex:Medication",
"Policy": "ex:Policy",
"Consent": "ex:Consent",
"AuditEvent": "ex:AuditEvent"
},
"@graph": [
{
"@id": "ex:patient123",
"@type": "Patient",
"mrn": "MRN-12345",
"age": 54,
"sex": "female",
"hasEncounter": { "@id": "ex:encounter789" },
"hasConsent": { "@id": "ex:consent1" }
},
{
"@id": "ex:encounter789",
"@type": "Encounter",
"encounterType": "inpatient",
"start": "2026-01-10T09:32:00Z",
"end": "2026-01-14T15:21:00Z",
"hasCondition": { "@id": "ex:condition_ckd3a" },
"hasMedication": { "@id": "ex:med_acei" },
"governedBy": { "@id": "ex:policy_minimum_necessary" }
},
{
"@id": "ex:condition_ckd3a",
"@type": "Condition",
"codeSystem": "ICD-10",
"code": "N18.31",
"display": "Chronic kidney disease, stage 3a"
},
{
"@id": "ex:med_acei",
"@type": "Medication",
"rxnormCode": "29046",
"name": "Lisinopril 10 mg",
"status": "stopped",
"stopReason": "hyperkalemia"
},
{
"@id": "ex:policy_minimum_necessary",
"@type": "Policy",
"name": "HIPAA Minimum Necessary",
"regCitation": "45 CFR 164.514(d)"
},
{
"@id": "ex:consent1",
"@type": "Consent",
"status": "active",
"scope": "RESEARCH"
},
{
"@id": "ex:audit_step12",
"@type": "AuditEvent",
"agentId": "clinical-agent",
"userId": "user:clinician-456",
"timestamp": "2026-02-03T04:33:00Z",
"action": "READ",
"purposeOfUse": "TREATMENT",
"outcome": "success",
"recordedIn": { "@id": "ex:encounter789" }
}
]
}
These two encodings are isomorphic RDF graphs; you can round‑trip between them with RDFLib.187190188
If you want the OWL/RDFS layer next, I can sketch an ontology in Turtle (classes + properties + domains/ranges) for this clinic context schema, suitable as a base for SHACL or reasoning.
Reification: statements about statements
In RDF, reification is the pattern of making a statement itself into a resource so you can make further statements about that statement (provenance, source, time, certainty, etc.).206207208209
Core idea
Normal RDF triple:
- $\langle S, P, O \rangle$ = subject, predicate, object (e.g., :alice :likes :RDF).
Reification turns that triple into a describable node R:
- R rdf:type rdf:Statement
- R rdf:subject S
- R rdf:predicate P
- R rdf:object O
Now you can say things like “who asserted this triple”, “when”, “with what confidence”.207210211206
Simple example
Base triple:
@prefix ex: <http://example.org/> .
ex:alice ex:likes ex:RDF .
Reified version (statements about that statement):212213206
@prefix ex: <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# The original statement
ex:alice ex:likes ex:RDF .
# A resource representing that statement
ex:stmt1
a rdf:Statement ;
rdf:subject ex:alice ;
rdf:predicate ex:likes ;
rdf:object ex:RDF .
# Metadata about the statement
ex:stmt1
ex:assertedBy ex:wikipedia ;
ex:assertedOn "2024-01-01" ;
ex:certainty "0.8"^^xsd:decimal .
Here ex:stmt1 is not the subject of the original statement; it is a description of the triple (ex:alice, ex:likes, ex:RDF).206207
Why use reification?
- Provenance: source of a statement (ex:assertedBy, prov:wasDerivedFrom).
- Temporal context: when the statement was true or observed.
- Trust/quality: confidence scores, author, version.
- Access control: who is allowed to see/use a given triple (relevant to your governed context graph / HIPAA work).
In a clinic setting, you’d reify statements like “patient has condition CKD‑3a” to attach who recorded it, when, and with what evidence, separate from the core clinical fact.
Alternatives in practice
Standard reification (with rdf:Statement) is verbose, and many systems prefer alternatives:208209212207
- Named graphs / N‑Quads: put triples in a graph identified by a URI, then attach metadata to the graph (e.g., “graph G was asserted by source X at time T”).
- Singleton properties: create a unique predicate for each statement so you can annotate that predicate.209212208
- Custom patterns: e.g., an application‑level “Fact” resource that has fact:subject, fact:predicate, fact:object, plus metadata.
For a governed context graph, named graphs or custom “Fact/AuditEvent” resources often scale better than classic rdf:Statement reification.
Singleton property vs RDF reification for statements about statements
- Turtle syntax example for reifying a statement
- RDFLib Python code to create and query reified statements
- Use cases of reification in RDF knowledge graphs
- Named graphs alternative to RDF reification
RDF gives you several ways to make “statements about statements.” Below is a concise comparison and examples, with emphasis on patterns that fit a governed context graph.
1) Standard RDF reification vs singleton properties
Standard reification (rdf:Statement)216217218
- Represent each triple as a resource with rdf:subject, rdf:predicate, rdf:object.
- Pros: standardized, works with any RDF 1.1 system.
- Cons: verbose (4 triples + metadata per original triple), awkward to query at scale.
Example (Tolkien wrote LOTR):219220
@prefix ex: <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# Base triple
ex:Tolkien ex:wrote ex:LOTR .
# Reified statement
ex:s1 a rdf:Statement ;
rdf:subject ex:Tolkien ;
rdf:predicate ex:wrote ;
rdf:object ex:LOTR ;
ex:since "1954" ;
ex:source ex:someBibliography .
Singleton properties221220222223216
- Create a unique property for each triple and relate it to its generic property via rdf:singletonPropertyOf.
- Metadata is attached to the singleton property instead of a statement node.
Example (same semantics):220216221
@prefix ex: <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# A unique property for this specific “wrote” statement
ex:wrote_s1 rdf:singletonPropertyOf ex:wrote ;
ex:since "1954" ;
ex:source ex:someBibliography .
# Statement using the singleton property
ex:Tolkien ex:wrote_s1 ex:LOTR .
Pros/cons relative to classic reification:222224216221
- Pros: fewer auxiliary triples per statement, more natural SPARQL over properties.
- Cons: not part of the core RDF standard; reasoners and tools may need special handling.
For a governed context graph, singleton properties can be powerful if you care deeply about per‑edge metadata and are OK with non‑standard patterns.
2) RDFLib: creating and querying reified statements
Creating reified statements (classic style) in Python:225218
from rdflib import Graph, URIRef, BNode, Namespace, RDF, Literal
EX = Namespace("http://example.org/")
g = Graph()
subject = EX.Tolkien
predicate = EX.wrote
obj = EX.LOTR
# Base triple
g.add((subject, predicate, obj))
# Reification node
s1 = BNode()
g.add((s1, RDF.type, RDF.Statement))
g.add((s1, RDF.subject, subject))
g.add((s1, RDF.predicate, predicate))
g.add((s1, RDF.object, obj))
g.add((s1, EX.since, Literal("1954")))
g.add((s1, EX.source, EX.someBibliography))
print(g.serialize(format="turtle").decode())
Querying reified statements with SPARQL:218226227
from rdflib import Graph, Namespace, RDF
EX = Namespace("http://example.org/")
g = Graph()
g.parse("reified.ttl", format="turtle")
q = """
PREFIX ex: <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?sub ?obj ?since ?src
WHERE {
?stmt a rdf:Statement ;
rdf:subject ?sub ;
rdf:predicate ex:wrote ;
rdf:object ?obj ;
ex:since ?since ;
ex:source ?src .
}
"""
for row in g.query(q):
print(row.sub, row.obj, row.since, row.src)
Singleton properties in RDFLib are analogous: you create a unique predicate EX.wrote_s1, assert EX.wrote_s1 rdf:singletonPropertyOf EX.wrote, and attach metadata to EX.wrote_s1.216220
3) Use cases of reification / singleton properties in KGs
Common uses in large KGs:224228221216
- Provenance: who asserted a fact, which dataset or publication, with which version.
- Temporal validity: when a relation was observed or valid (e.g., drug exposure period, condition status intervals).
- Uncertainty and trust: probabilities, confidence scores, source reliability.
- Policy / access control: which roles or policies apply at the statement level (e.g., “clinician can see this link between patient and diagnosis but researcher cannot”).
- Biomedical and clinical graphs: multiple lab measurements, conflicting diagnoses, or different coding systems for the same concept often benefit from statement‑level metadata.221224216
For your clinic context graph, this aligns with representing “Patient has CKD‑3a according to clinician X at time T, for encounter E, under policy P, with certainty C.”
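Statement-level access control, in particular, reduces to a filter over per-statement annotations. A toy sketch of that idea (the fact-dict schema, role names, and visibleTo key are invented for illustration, not part of any RDF standard):

```python
# Each "fact" carries its triple plus statement-level metadata,
# including which roles may see it (hypothetical schema).
facts = [
    {"triple": ("ex:patient123", "ex:hasCondition", "ex:condition_ckd3a"),
     "assertedBy": "ex:clinician456", "visibleTo": {"clinician"}},
    {"triple": ("ex:patient123", "ex:hasConsent", "ex:consent1"),
     "assertedBy": "ex:system", "visibleTo": {"clinician", "researcher"}},
]

def visible_triples(facts: list, role: str) -> list:
    """Return only the triples whose statement-level metadata permits this role."""
    return [f["triple"] for f in facts if role in f["visibleTo"]]

# A researcher sees the consent link but not the diagnosis link.
print(visible_triples(facts, "researcher"))
```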
4) Named graphs as an alternative
Named graphs attach metadata to sets of triples (graphs) rather than individual triples.229230231232
- Data model: quads (subject, predicate, object, graphName).
- Implementation: TriG/N‑Quads syntax and quad stores.
- You then annotate the graph IRI with provenance, time, etc.
Example in N‑Quads:230231232229
<http://example.org/Tolkien> <http://example.org/wrote> <http://example.org/LOTR> <http://example.org/graph/assertion1> .
Metadata about the graph (in Turtle over the dataset vocabulary):
@prefix ex: <http://example.org/> .
<http://example.org/graph/assertion1>
ex:source ex:someBibliography ;
ex:since "1954" ;
ex:assertedBy ex:alice .
- Easier to manage large sets of provenance (per dataset / per context) than per triple.
- Works well with governed context graphs: you can use one named graph per agent step or per “context graph snapshot,” then attach audit metadata to the graph name.
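The quad-plus-graph-metadata model is small enough to sketch in pure Python (a toy, not a real quad store); each context-graph snapshot gets its own graph name, and audit metadata hangs off that name rather than off individual triples:

```python
from collections import defaultdict

class QuadStore:
    """Toy named-graph store: quads plus per-graph metadata."""

    def __init__(self):
        self.graphs = defaultdict(set)  # graph name -> set of (s, p, o) triples
        self.metadata = {}              # graph name -> annotation dict

    def add(self, s, p, o, graph):
        """Add one quad: a triple inside a named graph."""
        self.graphs[graph].add((s, p, o))

    def annotate(self, graph, **meta):
        """Attach provenance/audit metadata to the graph name itself."""
        self.metadata.setdefault(graph, {}).update(meta)

store = QuadStore()
g1 = "http://example.org/graph/assertion1"
store.add("ex:Tolkien", "ex:wrote", "ex:LOTR", g1)
store.annotate(g1, source="ex:someBibliography", since="1954",
               assertedBy="ex:alice")

print(store.metadata[g1]["since"])
```

One metadata dict covers every triple in the graph, which is exactly the per-snapshot (rather than per-triple) granularity described above.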
5) What to use in a governed context graph
Given your clinic/HIPAA context:
- Use named graphs for:
  - Per‑step context‑graph snapshots (one named graph per agent turn), with provenance and purpose‑of‑use metadata attached to the graph IRI.
- Use custom Fact/AuditEvent resources for:
- Application‑level audit trails (“agent X read PHI node Y at time T for purpose U”).
- This aligns with the AuditEvent node pattern you already have, without forcing RDF‑level reification.
- Use singleton properties or classic reification only where you truly need fine‑grained, statement‑level metadata and you’re willing to pay the complexity cost (e.g., for specific high‑value clinical facts or research‑grade provenance).224216221
If you want, next I can sketch an RDF‑star / Turtle‑star variant, which is the more modern RDF‑level solution for statements‑about‑statements and tends to be cleaner than classic reification or singleton properties.
RDF 1.2
RDF 1.2 is the latest revision of the RDF family of specs. Conceptually it keeps the same graph data model as RDF 1.1, but adds a few important capabilities and some versioning/clarifications.236237238
Key additions in RDF 1.2
- Triple terms (RDF‑star built in): RDF 1.2 allows an RDF triple to be used as a triple term in the object position of another triple.237236
- Directional language‑tagged strings: RDF 1.2 introduces language‑tagged strings that also carry a base direction (e.g., left‑to‑right or right‑to‑left), so user agents can render multilingual content correctly.237236
- Version signalling
RDF 1.2 defines a mechanism for indicating the RDF version used by a dataset or serialization (e.g., via media‑type parameters or in‑syntax directives).236
- This lets parsers detect RDF 1.2‑specific features (like triple terms) early and warn or error if unsupported.
Backwards compatibility with RDF 1.1
- Data valid under RDF 1.1 is also valid under RDF 1.2 (“1.1” and “1.2‑basic” profiles). The core graph model (IRIs, blank nodes, literals, triples, datasets) remains the same.238237236
- Existing entailment regimes and tools can continue to operate; RDF 1.2 mainly extends the model (e.g., triple terms) and clarifies semantics rather than breaking existing 1.1 data.239240237236
Why it matters for your use cases
For governed context graphs and “statements about statements” in healthcare:
- RDF 1.2’s triple terms offer a more natural alternative to classic rdf:Statement reification and singleton properties for attaching provenance, policy, or audit metadata directly to clinical facts.241237236
- You can still keep your clinic context graph schema (Patient, Encounter, Condition, Policy, AuditEvent) and, where needed, annotate particular statements using triple terms instead of heavy reification constructs.
If you want, next I can sketch how one of your clinic context graph facts (e.g., “patient has condition CKD‑3a”) would look in RDF 1.2 triple‑term style vs classic reification and singleton properties.
sketch an RDF‑star / Turtle‑star variant, which is the more modern RDF‑level solution for statements‑about‑statements and tends to be cleaner than classic reification or singleton properties.
Related:
- RDF‑star triple terms vs RDF reification and singleton properties
- Turtle syntax example using named graphs for metadata
- RDFLib code example for singleton properties
- Limitations and performance issues of RDF reification
- Alternatives to reification like RDF quads or N‑Triples
- Sketch how one of your clinic context graph facts (e.g., “patient has condition CKD‑3a”) would look in RDF 1.2 triple‑term style vs classic reification and singleton properties
Here’s a focused sketch of RDF‑star / Turtle‑star for your clinic graph, plus comparisons to classic reification and singleton properties, and a quick named‑graph example.
1) RDF‑star / Turtle‑star (RDF 1.2 triple terms)
RDF‑star lets you quote a triple and use it as subject or object of another triple, using << … >> in Turtle‑star.246247248249250
Clinic fact: “patient123 has condition CKD‑3a”.
Base triple (normal Turtle):
@prefix ex: <http://example.org/clinic#> .
ex:patient123 ex:hasCondition ex:condition_ckd3a .
RDF‑star: add metadata directly on that triple:
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123 ex:hasCondition ex:condition_ckd3a .
<< ex:patient123 ex:hasCondition ex:condition_ckd3a >>
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
Here the quoted triple << ex:patient123 ex:hasCondition ex:condition_ckd3a >> is a triple term that can be used just like a node.247248249250
This is the cleanest RDF‑level fit for your governed context graph (per‑fact provenance, audit, and policy) without four extra reification triples.
2) Same clinic fact in three styles
a) Classic RDF reification (rdf:Statement)
@prefix ex: <http://example.org/clinic#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123 ex:hasCondition ex:condition_ckd3a .
ex:s1 a rdf:Statement ;
rdf:subject ex:patient123 ;
rdf:predicate ex:hasCondition ;
rdf:object ex:condition_ckd3a ;
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
- Pros: standard RDF 1.1.250251252
- Cons: verbose; queries must go through the rdf:Statement structure; performance overhead at scale.253254255250
b) Singleton property
@prefix ex: <http://example.org/clinic#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Singleton property for this specific assertion
ex:hasCondition_s1 rdf:singletonPropertyOf ex:hasCondition ;
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
# Fact using the singleton property
ex:patient123 ex:hasCondition_s1 ex:condition_ckd3a .
- Pros: fewer extra triples; easier SPARQL over properties; good benchmarked characteristics.256257258259
- Cons: non‑standard; explosion of properties; some tools won’t “understand” singleton semantics out‑of‑the‑box.
c) RDF‑star triple term (RDF 1.2 / Turtle‑star)
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123 ex:hasCondition ex:condition_ckd3a .
<< ex:patient123 ex:hasCondition ex:condition_ckd3a >>
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
- Pros: compact, readable; good store performance; integrates with SPARQL‑star (query triple terms directly).248260261262247250
- Cons: requires RDF 1.2 / RDF‑star support (not universal yet, though major stacks like Jena, GraphDB, Blazegraph variants support it).263260249
3) Turtle‑star with named graphs (metadata per context graph)
You can combine RDF‑star with named graphs for your governed context graph snapshots.264265266267247
TriG‑star style:
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Named graph = one context graph snapshot for a LangGraph step
ex:ctxGraph_step12 {
ex:patient123 ex:hasCondition ex:condition_ckd3a .
ex:patient123 ex:hasEncounter ex:encounter789 .
<< ex:patient123 ex:hasCondition ex:condition_ckd3a >>
ex:assertedBy ex:clinicalAgent ;
ex:assertedAt "2026-02-03T04:33:00Z"^^xsd:dateTime ;
ex:purposeOfUse "TREATMENT" .
}
# Metadata about the context graph as a whole
ex:ctxGraph_step12
ex:belongsToThread "pt-123-q-1" ;
ex:agentStep "clinical_agent" ;
ex:user "clinician:456" .
Now you can track:
- Metadata on the fact (triple term annotations).
- Metadata on the whole context graph snapshot (named graph annotations).
4) RDFLib and RDF‑star / singleton properties
- Singleton properties work in RDFLib as plain RDF: just treat ex:hasCondition_s1 as a normal predicate and add rdf:singletonPropertyOf plus your metadata.268256
Example:
from rdflib import Graph, Namespace, RDF, Literal
from rdflib.namespace import XSD
EX = Namespace("http://example.org/clinic#")
g = Graph()
g.bind("ex", EX)
patient = EX.patient123
condition = EX.condition_ckd3a
base_pred = EX.hasCondition
single_pred = EX.hasCondition_s1
# Base relation using the singleton property
g.add((patient, single_pred, condition))
# Declare the singleton's relationship to the base property
g.add((single_pred, RDF.type, RDF.Property))
g.add((single_pred, EX.singletonPropertyOf, base_pred))
# Per-assertion metadata attached to the singleton property
g.add((single_pred, EX.assertedBy, EX.clinician456))
g.add((single_pred, EX.sourceEncounter, EX.encounter789))
g.add((single_pred, EX.confidence, Literal("0.92", datatype=XSD.decimal)))
# rdflib >= 6 returns a str from serialize(), so no .decode() is needed
print(g.serialize(format="turtle"))
- RDF‑star in RDFLib: core releases don’t fully support Turtle‑star yet, but there are forks/PRs and external modules that translate RDF‑star to standard RDF (e.g., to reification) and back.269270
5) Limitations and performance issues of classic reification
Empirically and per vendor docs:254255250253
- Storage blow‑up: 4+ extra triples per annotated triple (statement node + subject/predicate/object).
- Query complexity: queries become multi‑join patterns over rdf:Statement, which hurts readability and performance.
- Ambiguity: reification does not, by itself, assert the original triple; you must assert it separately if you want it in the base graph.251263250
This is why RDF‑star is considered the preferred modern solution for edge metadata, especially in large KGs.261262260250
6) Alternatives recap for your governed context graph
For a clinic / HIPAA context graph:
- RDF‑star (triple terms): best fit if your graph stack supports it; annotate clinically important edges (Patient–Condition, Encounter–Medication) with provenance, policy, and audit metadata.
- Named graphs (TriG / N‑Quads): use one named graph per context‑graph snapshot (per LangGraph step / agent turn) and attach metadata (thread, agent, user, purposeOfUse) to the graph IRI.265266267264
- Singleton properties: good fallback if you want per‑edge metadata in RDF 1.1 tools and can tolerate property proliferation.257258256
- Classic reification: standards‑pure but heavy; mainly useful if you must stay fully RDF 1.1 with no extensions and no custom patterns.250251254
If you tell me which stack you’re most likely to use for the clinic KG (Jena/GraphDB vs pure RDFLib), I can give you a SPARQL‑star example query over the RDF‑star clinic graph and a minimal migration pattern from your current JSON context_graph into Turtle‑star.
RDF-star vs classic RDF reification examples
SPARQL-star query syntax for nested triples
RDFLib code example parsing Turtle-star
Performance benchmarks RDF-star vs singleton properties
Convert Turtle-star to TriG-star format
RDF‑star lets you add metadata directly to triples; in your clinic context graph it’s a cleaner fit than classic reification. Below are minimal examples and patterns you can adapt.
RDF‑star vs classic reification (clinic fact)
Fact: “patient123 has condition CKD‑3a”.
RDF‑star (Turtle‑star)
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Base fact
ex:patient123 ex:hasCondition ex:condition_ckd3a .
# Metadata on that fact (triple term)
<< ex:patient123 ex:hasCondition ex:condition_ckd3a >>
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
Classic RDF reification
@prefix ex: <http://example.org/clinic#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123 ex:hasCondition ex:condition_ckd3a .
ex:s1 a rdf:Statement ;
rdf:subject ex:patient123 ;
rdf:predicate ex:hasCondition ;
rdf:object ex:condition_ckd3a ;
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
RDF‑star is more compact and keeps metadata close to the fact; reification requires a separate rdf:Statement node and more joins.274275276277278
SPARQL‑star query syntax for nested triples
SPARQL‑star extends SPARQL to bind triple terms using << … >>.277279280
Example: find all patient‑condition facts with confidence and encounter:
PREFIX ex: <http://example.org/clinic#>
SELECT ?patient ?cond ?enc ?conf
WHERE {
# Base fact
?patient ex:hasCondition ?cond .
# Metadata about that fact (quoted triple)
<< ?patient ex:hasCondition ?cond >>
ex:sourceEncounter ?enc ;
ex:confidence ?conf .
}
Example: filter by clinician:
PREFIX ex: <http://example.org/clinic#>
SELECT ?cond ?when
WHERE {
<< ex:patient123 ex:hasCondition ?cond >>
ex:assertedBy ex:clinician456 ;
ex:assertedAt ?when .
}
Most triple stores that support RDF‑star (Jena, GraphDB, etc.) also support SPARQL‑star.278281282283
RDFLib and Turtle‑star
RDFLib does not fully support RDF‑star natively in stable releases; current patterns are:284285
- Use a store or service that supports RDF‑star (e.g., Jena Fuseki, GraphDB), talk to it via HTTP/SPARQL‑star.
- Or, preprocess Turtle‑star → classic RDF (reification or custom encoding) before loading into RDFLib.
If you have a server with RDF‑star, you can still use RDFLib’s SPARQL wrapper to query it:
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint = SPARQLWrapper("http://localhost:3030/clinic/sparql")
endpoint.setReturnFormat(JSON)
query = """
PREFIX ex: <http://example.org/clinic#>
SELECT ?cond ?when
WHERE {
<< ex:patient123 ex:hasCondition ?cond >>
ex:assertedBy ex:clinician456 ;
ex:assertedAt ?when .
}
"""
endpoint.setQuery(query)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
print(row["cond"]["value"], row["when"]["value"])
For pure local RDFLib today, you’d generally stick to standard RDF (no << >>) or encode RDF‑star triples into a custom vocabulary until RDF‑star support lands.285284
Performance: RDF‑star vs singleton properties (high‑level)
Benchmarks comparing reification, singleton properties, and RDF‑star show:282283286287288289278
- Classic reification: worst in storage and query performance due to 4+ metadata triples and complex patterns per statement.
- Singleton properties: better storage and query characteristics than reification, but explode the property space and require special semantics.
- RDF‑star: generally best trade‑off—metadata expressed with minimal overhead, simpler graph patterns, better performance with engines optimized for triple terms.
The exact numbers depend on the engine, but for large governed KGs, vendors now generally recommend RDF‑star where available.286289278282
Named graphs / TriG‑star for metadata per context graph
You can combine RDF‑star with named graphs (TriG‑star) to annotate whole context graphs.276290291292293
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Context graph for one LangGraph step
ex:ctxGraph_step12 {
ex:patient123 ex:hasCondition ex:condition_ckd3a .
<< ex:patient123 ex:hasCondition ex:condition_ckd3a >>
ex:assertedBy ex:clinicalAgent ;
ex:assertedAt "2026-02-03T04:33:00Z"^^xsd:dateTime ;
ex:purposeOfUse "TREATMENT" .
}
# Graph-level metadata
ex:ctxGraph_step12
ex:threadId "pt-123-q-1" ;
ex:agentStep "clinical_agent" ;
ex:user "clinician:456" .
Converting Turtle‑star (single default graph) → TriG‑star is mostly a matter of wrapping triples into a named graph block and adding graph‑level metadata.
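That wrapping step can be sketched as a small string-level helper; the function name and the way graph-level metadata is encoded are assumptions for illustration, and a real pipeline would use an RDF library's TriG serializer instead:

```python
def turtle_star_to_trig_star(turtle_body: str, graph_iri: str,
                             graph_metadata: dict) -> str:
    """Wrap a Turtle-star document in a TriG-star named-graph block.

    @prefix lines stay at the top level; all other lines are indented
    inside the named graph. Graph-level metadata (predicate -> serialized
    object) is appended as plain triples about the graph IRI.
    """
    prefixes, triples = [], []
    for line in turtle_body.splitlines():
        (prefixes if line.lstrip().startswith("@prefix") else triples).append(line)
    out = prefixes + [f"{graph_iri} {{"]
    out += ["    " + t for t in triples if t.strip()]
    out.append("}")
    for pred, obj in graph_metadata.items():
        out.append(f"{graph_iri} {pred} {obj} .")
    return "\n".join(out)
```

This only handles the simple case of a flat Turtle-star body with one target graph; multi-line literals or nested blocks would need a real parser.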
Minimal migration: JSON context_graph → Turtle‑star
Assume your current JSON context graph:
{
"nodes": [
{ "id": "patient:123", "type": "Patient" },
{ "id": "condition:ckd-3a", "type": "Condition" }
],
"edges": [
{
"id": "e1",
"type": "HAS_CONDITION",
"from": "patient:123",
"to": "condition:ckd-3a",
"props": {
"assertedBy": "clinician:456",
"assertedAt": "2026-01-10T10:00:00Z",
"sourceEncounter": "encounter:789",
"confidence": 0.92
}
}
]
}
You can emit Turtle‑star like:
@prefix ex: <http://example.org/clinic#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:patient123 a ex:Patient .
ex:condition_ckd3a a ex:Condition .
ex:patient123 ex:hasCondition ex:condition_ckd3a .
<< ex:patient123 ex:hasCondition ex:condition_ckd3a >>
ex:assertedBy ex:clinician456 ;
ex:assertedAt "2026-01-10T10:00:00Z"^^xsd:dateTime ;
ex:sourceEncounter ex:encounter789 ;
ex:confidence "0.92"^^xsd:decimal .
Mapping rules:
- JSON node IDs → IRIs: "patient:123" → ex:patient123; "condition:ckd-3a" → ex:condition_ckd3a.
- Edge types → properties: "HAS_CONDITION" → ex:hasCondition.
- Edge props → metadata on the quoted triple.
You can implement this as a simple export function in Python that:
- Normalizes IDs and labels.
- Writes base triples for each edge.
- Writes a quoted triple block per edge with non‑structural properties as metadata.
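A minimal sketch of such an exporter, under the assumptions that IDs follow the kind:local pattern shown above and that props values are either id-like strings or scalars; a production exporter would need proper IRI minting, escaping, and datatype mapping (e.g., xsd:dateTime for timestamps):

```python
import re


def to_iri(raw_id: str) -> str:
    """Normalize 'patient:123' -> 'ex:patient123', 'condition:ckd-3a' -> 'ex:condition_ckd3a'.

    Simplified IRI minting: drop hyphens, join kind and local part with an
    underscore only when the local part starts with a letter."""
    kind, _, local = raw_id.partition(":")
    local = local.replace("-", "")
    sep = "_" if local and local[0].isalpha() else ""
    return f"ex:{kind}{sep}{local}"


def edge_type_to_property(edge_type: str) -> str:
    """'HAS_CONDITION' -> 'ex:hasCondition'."""
    head, *rest = edge_type.lower().split("_")
    return "ex:" + head + "".join(w.capitalize() for w in rest)


def to_literal(value) -> str:
    """Serialize a props value; id-like strings become IRIs, numbers xsd:decimal."""
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^xsd:boolean'
    if isinstance(value, (int, float)):
        return f'"{value}"^^xsd:decimal'
    if isinstance(value, str) and re.fullmatch(r"[a-z]+:[\w-]+", value):
        return to_iri(value)  # treat "clinician:456"-style strings as IRIs
    return f'"{value}"'


def context_graph_to_turtle_star(cg: dict) -> str:
    lines = [
        "@prefix ex: <http://example.org/clinic#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    ]
    for node in cg.get("nodes", []):
        lines.append(f"{to_iri(node['id'])} a ex:{node['type']} .")
    for edge in cg.get("edges", []):
        s = to_iri(edge["from"])
        p = edge_type_to_property(edge["type"])
        o = to_iri(edge["to"])
        lines.append(f"{s} {p} {o} .")  # base triple
        props = edge.get("props", {})
        if props:  # quoted-triple annotation block per edge
            annotations = [f"    ex:{k} {to_literal(v)}" for k, v in props.items()]
            lines.append(f"<< {s} {p} {o} >>")
            lines.append(" ;\n".join(annotations) + " .")
    return "\n".join(lines)
```

Calling it on the JSON above yields Turtle-star equivalent to the hand-written example.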
If you decide on RDF‑star‑capable infrastructure (e.g., Jena/GraphDB) I can next sketch a full export function (context_graph JSON → Turtle‑star string) and a corresponding SPARQL‑star query you’d run over that data.
LAYERS of a CONTEXT GRAPH: 1. Synthetic grounding - grounding data that's captured as system-generated 2. System of records - metadata and records ontology 3. Initial domain grounding - more traditional knowledge graph structures
You can think of a context graph as stratified into three governed layers, each feeding the next: synthetic grounding, systems of record, and domain KG. This matches how emerging “context layers” are being described for enterprise AI.294295296297298299
1) Synthetic grounding layer (L1)
Purpose: capture system‑generated signals and operational traces that give you “what is happening now” around agents and users.
- Content:
- Agent runs, tool calls, prompts/responses, LangGraph steps, error events.
- Usage telemetry (features clicked, workflows invoked), session context, UI state.
- Synthetic assertions from heuristics or models (e.g., intent labels, risk scores, triage priority).
- Shape in the graph:
- Nodes: AgentRun, ToolCall, Session, UIEvent, RiskScore, Intent.
- Edges: TRIGGERED_BY (ToolCall → AgentRun), OBSERVED_IN (RiskScore → Encounter), DERIVED_FROM (SyntheticFact → SourceFact).
- Role:
For your governed context graph, this layer is where you’d attach decision traces and agent‑level audit (e.g., LangGraph step → AuditEvent nodes).
2) System‑of‑record layer (L2)
Purpose: mirror and index core SoR systems and metadata, but in a graph‑friendly, ontology‑aware way.295296297304305294
- Content:
- Shape:
- Nodes: PatientSoR, EncounterSoR, FHIRResourceRef, Table, Column, Dataset, LineageEvent.
- Edges: RESOLVES_TO (PatientSoR → PatientDomain), HAS_LINEAGE, STORED_IN, HAS_SOR_RECORD.
- Role:
In your clinic setting, this is where FHIR/OMOP mappings live; the context graph pulls patient/encounter/condition/measurement slices from here.
3) Initial domain grounding layer (L3 – domain KG)
Purpose: encode domain semantics—entities, relationships, and policies—for reasoning and RAG.302308309310311306
- Content:
- Shape:
- Nodes: domain‑level Patient, Encounter, Condition, etc. (your clinic context graph schema).
- Edges: HAS_ENCOUNTER, HAS_CONDITION, HAS_MEDICATION, GOVERNED_BY, HAS_CONSENT, RECORDED_IN.
- Role:
How these layers combine into a context graph
For a given task (e.g., “Explain why ACE inhibitor was stopped for patient 123”):
- L2 → L3 projection
- L3 domain grounding
- L1 synthetic grounding overlay
The context graph you inject into the agent’s state is then:
- Mostly L3 (domain KG view of the patient episode),
- Filtered and sourced from L2 (SoR),
- Annotated by L1 (synthetic and decision trace).
You can explicitly tag each node/edge with a layer property ("synthetic" | "sor" | "domain") and use that in governance and debugging.
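A hypothetical helper along those lines, splitting a context_graph dict into per-layer views for governance and debugging (function name and the "domain" default are assumptions):

```python
from typing import Any, Dict, List


def split_by_layer(context_graph: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Dict[str, list]]:
    """Group nodes and edges by their 'layer' tag ("synthetic" | "sor" | "domain")."""
    views: Dict[str, Dict[str, list]] = {}
    for kind in ("nodes", "edges"):
        for item in context_graph.get(kind, []):
            layer = item.get("layer", "domain")  # untagged items default to domain
            views.setdefault(layer, {"nodes": [], "edges": []})[kind].append(item)
    return views
```

A compliance check can then, for example, count synthetic nodes or verify that every SoR edge resolves to a domain node.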
If you want, next I can turn this into a typed schema (Layer enum + node/edge props) and a LangGraph build_context_graph node that composes all three layers for a given patient and clinician.
turn this into a typed schema (Layer enum + node/edge props) and a LangGraph build_context_graph node that composes all three layers for a given patient and clinician.
Examples of synthetic grounding data in enterprise context graphs
How to implement a system-of-record ontology in Neo4j, ArangoDB, and PostgreSQL
Relationship between initial domain grounding and RAG
What is a context layer in enterprise AI (e.g., Atlan)
Fourth layer or extensions beyond these three in context graphs
Below is a compact, typed schema plus a build_context_graph node that composes the three layers for a given patient and clinician, plus brief notes on your related questions.
1) Typed schema with Layer enum
from typing import TypedDict, Literal, Dict, Any, List, Optional
Layer = Literal["synthetic", "sor", "domain"]
class Node(TypedDict, total=False):
id: str # e.g., "patient:123"
type: str # "Patient","Encounter","Condition","AgentRun",...
layer: Layer # "synthetic" | "sor" | "domain"
name: str
props: Dict[str, Any]
class Edge(TypedDict, total=False):
id: str
type: str # "HAS_ENCOUNTER","HAS_CONDITION","DERIVED_FROM",...
from_: str
to: str
layer: Layer
props: Dict[str, Any]
class ContextGraph(TypedDict):
nodes: List[Node]
edges: List[Edge]
class AgentState(TypedDict):
messages: List[Dict[str, Any]]
patient_id: Optional[str] # "patient:123"
clinician_id: Optional[str] # "clinician:456"
query: Optional[str]
context_graph: Optional[ContextGraph]
governance_metrics: Dict[str, Any]
class ContextConfig(TypedDict):
max_nodes: int
max_depth: int
Layer semantics:
- domain: Patient, Encounter, Condition, Medication, Observation, Policy, Consent, AuditEvent (KG view).318319320321
- sor: raw SoR entities or references (OMOP/FHIR rows, table/column/dataset nodes, lineage entries).320321322323
- synthetic: AgentRun, ToolCall, RiskScore, Intent, plus synthetic facts derived by agents or models.324325326327328
2) LangGraph build_context_graph node (3‑layer composition)
Assume you have three backends:
- sor_client: queries SoR (Postgres/OMOP or FHIR) and returns SoR‑layer nodes/edges.
- domain_client: maps SoR entities into the domain KG and expands the clinical neighborhood.
- synthetic_client: returns recent agent runs, tool calls, and risk scores for this patient/clinician/thread.
from langgraph.graph import StateGraph, START, END
from langgraph.runtime import Runtime  # LangGraph >= 0.6
class SoRClient:
def patient_slice(self, patient_id: str, max_nodes: int) -> ContextGraph:
...
class DomainKGClient:
def patient_episode_graph(
self,
patient_id: str,
max_depth: int,
max_nodes: int,
) -> ContextGraph:
...
class SyntheticClient:
def recent_context(
self,
patient_id: str,
clinician_id: str,
thread_id: str,
max_events: int = 50,
) -> ContextGraph:
...
sor_client = SoRClient()
domain_client = DomainKGClient()
synthetic_client = SyntheticClient()
def merge_graphs(graphs: List[ContextGraph], max_nodes: int) -> ContextGraph:
nodes: Dict[str, Node] = {}
edges: Dict[str, Edge] = {}
for g in graphs:
for n in g["nodes"]:
nodes[n["id"]] = n
for e in g["edges"]:
edges[e["id"]] = e
# Simple cap
return {
"nodes": list(nodes.values())[:max_nodes],
"edges": list(edges.values())[: 2 * max_nodes],
}
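One caveat with the simple cap in merge_graphs: slicing the node list can leave edges whose endpoints were dropped. A small follow-up pass over the same ContextGraph dict shape can prune those dangling edges:

```python
def prune_dangling_edges(graph: dict) -> dict:
    """Drop edges whose endpoints are no longer present after capping nodes."""
    kept_ids = {n["id"] for n in graph["nodes"]}
    graph["edges"] = [
        e for e in graph["edges"]
        if e.get("from_") in kept_ids and e.get("to") in kept_ids
    ]
    return graph
```

You would call this right after merge_graphs, before computing governance metrics, so the metrics reflect a structurally consistent graph.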
def build_context_graph(
state: AgentState,
runtime: Runtime[ContextConfig],
) -> AgentState:
cfg = runtime.context
patient_id = state.get("patient_id")
clinician_id = state.get("clinician_id")
if not patient_id:
return state
max_nodes = cfg.get("max_nodes", 256)
max_depth = cfg.get("max_depth", 2)
# Assumes upstream code stashed thread_id into state["props"] (as the snapshot
# logger below does); langgraph's Runtime exposes context/store, not the
# RunnableConfig, so the thread id is not available via runtime here
thread_id = state.get("props", {}).get("thread_id", "default-thread")
# 1) SoR slice (layer = "sor")
sor_graph = sor_client.patient_slice(patient_id, max_nodes=max_nodes)
for n in sor_graph["nodes"]:
n.setdefault("layer", "sor")
for e in sor_graph["edges"]:
e.setdefault("layer", "sor")
# 2) Domain KG neighborhood (layer = "domain")
domain_graph = domain_client.patient_episode_graph(
patient_id=patient_id,
max_depth=max_depth,
max_nodes=max_nodes,
)
for n in domain_graph["nodes"]:
n.setdefault("layer", "domain")
for e in domain_graph["edges"]:
e.setdefault("layer", "domain")
# 3) Synthetic grounding (agent runs, tool calls) (layer = "synthetic")
synthetic_graph = synthetic_client.recent_context(
patient_id=patient_id,
clinician_id=clinician_id or "",
thread_id=thread_id,
max_events=50,
)
for n in synthetic_graph["nodes"]:
n.setdefault("layer", "synthetic")
for e in synthetic_graph["edges"]:
e.setdefault("layer", "synthetic")
ctx_graph = merge_graphs([sor_graph, domain_graph, synthetic_graph], max_nodes)
# Simple governance metrics
phi_nodes = [n for n in ctx_graph["nodes"] if n.get("props", {}).get("isPHI")]
policies = [n for n in ctx_graph["nodes"] if n["type"] == "Policy"]
synthetic_fraction = (
len([n for n in ctx_graph["nodes"] if n["layer"] == "synthetic"]) / max(len(ctx_graph["nodes"]), 1)
)
state["context_graph"] = ctx_graph
state["governance_metrics"] = {
"phi_node_count": len(phi_nodes),
"policy_count": len(policies),
"synthetic_node_fraction": synthetic_fraction,
}
return state
You wire this node as earlier, with StateGraph(AgentState, context_schema=ContextConfig) and a checkpointer.
3) Examples of synthetic grounding data
Concrete synthetic nodes/edges you’d add in SyntheticClient:325329326328330324
- Nodes:
  - AgentRun: LangGraph execution step, including agentName, timestamp, status.
  - ToolCall: invoked tool, inputs/outputs summary, latency, error flags.
  - RiskScore: output of a model estimating readmission risk, sepsis risk, etc.
  - Intent: classifier output for the current query (e.g., “medication_explanation”).
- Edges:
  - ABOUT_PATIENT: AgentRun → Patient.
  - USED_CONTEXT_FROM: AgentRun → Encounter/Document/Policy nodes.
  - TRIGGERED_TOOL: AgentRun → ToolCall.
  - DERIVED_FROM: RiskScore → (Patient, Encounter, Observations).
This is consistent with how “context layers” capture decision traces and operational signals for agents.331329330324325
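As a concrete sketch, a SyntheticClient might emit an AgentRun trace like this; the ids, props, and status values are illustrative assumptions, not a fixed schema:

```python
import uuid
from datetime import datetime, timezone


def agent_run_trace(patient_id: str, agent_name: str, tool_name: str) -> dict:
    """Build synthetic-layer nodes/edges for one agent step and its tool call."""
    run_id = f"agent_run:{uuid.uuid4().hex[:8]}"
    call_id = f"tool_call:{uuid.uuid4().hex[:8]}"
    now = datetime.now(timezone.utc).isoformat()
    nodes = [
        {"id": run_id, "type": "AgentRun", "layer": "synthetic",
         "props": {"agentName": agent_name, "timestamp": now, "status": "ok"}},
        {"id": call_id, "type": "ToolCall", "layer": "synthetic",
         "props": {"tool": tool_name}},
    ]
    edges = [
        {"id": f"edge:{run_id}-about", "type": "ABOUT_PATIENT",
         "from_": run_id, "to": patient_id, "layer": "synthetic", "props": {}},
        {"id": f"edge:{run_id}-tool", "type": "TRIGGERED_TOOL",
         "from_": run_id, "to": call_id, "layer": "synthetic", "props": {}},
    ]
    return {"nodes": nodes, "edges": edges}
```

The returned dict plugs straight into the merge step used by build_context_graph.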
4) System‑of‑record ontology in Neo4j / ArangoDB / Postgres
High‑level patterns:
- Neo4j
  - Labels: :PatientSoR, :VisitSoR, :Table, :Column, :Dataset.
  - Relationships: (:VisitSoR)-[:VISIT_OF]->(:PatientSoR), (:Column)-[:IN_TABLE]->(:Table), (:Column)-[:LINEAGE_FROM]->(:Column).
  - Map FHIR/OMOP IDs into node properties and connect them to domain Patient/Encounter via RESOLVES_TO.
- ArangoDB
  - Collections: PatientsSoR, VisitsSoR, Tables, Columns, plus edge collections patientVisitEdges, columnLineageEdges.
  - Use AQL graph traversals to link SoR records to domain nodes via RESOLVES_TO / HAS_SOR_RECORD edges.
- Postgres
  - Keep OMOP/FHIR tables as‑is; add “graph overlay” tables like entity_node(entity_id, entity_type, sor_table, sor_pk) and entity_edge(from_id, to_id, type).
  - Your SoRClient.patient_slice is essentially a query + adapter that builds Node(layer='sor') / Edge(layer='sor') objects.
This makes the SoR layer explicit and queryable, while the domain layer is your cleaned, ontology‑driven graph.321322323332320
5) Relationship between initial domain grounding and RAG
Domain grounding (layer 3) is the semantic backbone for GraphRAG:327333334318320321
- Embeddings retrieve documents/chunks.
- Domain KG connects patients, encounters, conditions, guidelines, and policies.
- Context graph builder:
- Takes retrieved docs + KG nodes,
- Expands along domain edges (e.g., Condition → Medication → Lab Observations),
- Produces a compact, role‑aware subgraph.
RAG then consumes that subgraph (plus text) as structured grounding instead of a flat docs list, improving precision and interpretability.
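One possible shape for that consumption step, sketched as a plain serializer from the ContextGraph dict into structured prompt text; the output format is an assumption for illustration, not a fixed API:

```python
def context_graph_to_prompt(graph: dict) -> str:
    """Serialize a context graph into compact structured text for a model prompt."""
    lines = ["## Context graph", "Entities:"]
    for n in graph.get("nodes", []):
        label = n.get("name") or n["id"]
        lines.append(f"- {label} ({n.get('type', '?')}, layer={n.get('layer', '?')})")
    lines.append("Relations:")
    for e in graph.get("edges", []):
        lines.append(f"- {e.get('from_')} -[{e.get('type')}]-> {e.get('to')}")
    return "\n".join(lines)
```

Keeping the serialization deterministic (stable ordering, one line per fact) makes the prompt diffable across agent turns, which helps when auditing what the model actually saw.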
6) What is a “context layer” in enterprise AI (Atlan‑style)
Vendors like Atlan define a context layer as a unified, governed system that captures and serves organizational knowledge (semantics, lineage, usage, governance, and decision traces) to AI systems in real time.335336337332325331
Typical components:
- Connectors to SoR and SaaS tools (unification).
- Metadata / lineage / business glossary (semantic and governance).
- Usage, collaboration, and quality signals (operational context).
- Activation interfaces (MCP, APIs) for agents and copilots.336337338339335
Your three layers map well to this: SoR + domain KG + synthetic grounding + activation.
7) A fourth layer beyond the three
A natural “L4” in your stack:
- Policy and simulation layer (or “governance / planning layer”)
- Encodes higher‑order rules, simulations, and hypothetical states.
- Nodes:
PolicyModel,Scenario,WhatIfPlan,EvaluationResult. - Edges:
APPLIES_TO(PolicyModel → Policy/Role),EVALUATED_ON(Scenario → ContextGraphSnapshot),RECOMMENDS_ACTION.
This layer lets you run what‑if analyses, policy simulations, and replay over stored context graphs and agent traces, then feed updated policies back down into the domain and synthetic layers as they evolve.340328324325
If you want, next I can expand the three‑layer schema into a four‑layer one (with a layer: "policy" and Scenario/ContextGraphSnapshot nodes) and show how to log and replay context graphs through LangGraph’s checkpointer to feed that simulation layer.
Expand the three‑layer schema into a four‑layer one (with a layer: "policy" and Scenario/ContextGraphSnapshot nodes) and show how to log and replay context graphs through LangGraph’s checkpointer to feed that simulation layer.
Example input data for build_context_graph with a sample patient
SPARQL query to traverse the three-layer context graph
Add compliance governance to Activation layer nodes
Healthcare examples of Unification layer connectors
Extend schema for multi-clinician shared context
Here’s a compact expansion to a four‑layer schema plus how to log and replay context graphs with LangGraph, then short answers to the related points.
1) Four‑layer schema (add policy layer, Scenario, Snapshot)
Extend Layer and node types:
from typing import TypedDict, Literal, Dict, Any, List, Optional
Layer = Literal["synthetic", "sor", "domain", "policy"]
class Node(TypedDict, total=False):
id: str
type: str
layer: Layer
name: str
props: Dict[str, Any]
class Edge(TypedDict, total=False):
id: str
type: str
from_: str
to: str
layer: Layer
props: Dict[str, Any]
class ContextGraph(TypedDict):
nodes: List[Node]
edges: List[Edge]
class AgentState(TypedDict):
messages: List[Dict[str, Any]]
patient_id: Optional[str]
clinician_id: Optional[str]
query: Optional[str]
context_graph: Optional[ContextGraph]
governance_metrics: Dict[str, Any]
current_scenario_id: Optional[str] # policy layer
class ContextConfig(TypedDict):
max_nodes: int
max_depth: int
New policy‑layer node types:
- Scenario (simulated or what‑if context; e.g., “remove ACEI, add ARB”).
- ContextGraphSnapshot (snapshot of a prior context_graph for replay/simulation).
- PolicyModel (e.g., sepsis policy engine, HIPAA policy set).
- EvaluationResult (result of applying a scenario/policy to a snapshot).
Each gets layer="policy" in Node.layer.
Example policy‑layer nodes:
scenario_node: Node = {
"id": "scenario:stop-acei",
"type": "Scenario",
"layer": "policy",
"name": "Stop ACE inhibitor",
"props": {"description": "Simulate stopping ACE inhibitor due to hyperkalemia"}
}
snapshot_node: Node = {
"id": "snapshot:thread-pt123-step12",
"type": "ContextGraphSnapshot",
"layer": "policy",
"name": "Context at step12",
"props": {
"threadId": "pt-123-q-1",
"checkpointId": "ckpt-uuid",
"createdAt": "2026-02-03T04:33:00Z"
}
}
Edges in policy layer (examples):
- SCENARIO_APPLIES_TO: Scenario → ContextGraphSnapshot.
- EVALUATES_POLICY: PolicyModel → Scenario.
- HAS_EVALUATION: Scenario → EvaluationResult.
All with layer="policy".
2) Logging and replaying context graphs via LangGraph checkpointer
LangGraph stores state per checkpoint (thread, step); you can read past states, including context_graph, and fork or replay from them.347348349350351352
Assume graph = builder.compile(checkpointer=postgres_or_redis_saver).
Log snapshots into the graph
A node to emit a ContextGraphSnapshot policy node:
import uuid
from datetime import datetime, timezone
def log_snapshot(state: AgentState) -> AgentState:
ctx_graph = state.get("context_graph")
if not ctx_graph:
return state
# Assumes upstream code stashed checkpoint/thread ids into state["props"]
checkpoint_id = state.get("props", {}).get("checkpoint_id", "unknown")
thread_id = state.get("props", {}).get("thread_id", "unknown")
snapshot_id = f"snapshot:{thread_id}:{checkpoint_id}"
snapshot_node: Node = {
"id": snapshot_id,
"type": "ContextGraphSnapshot",
"layer": "policy",
"name": f"Snapshot {checkpoint_id}",
"props": {
"threadId": thread_id,
"checkpointId": checkpoint_id,
"createdAt": datetime.now(timezone.utc).isoformat()
}
}
# Append snapshot node into context_graph
nodes = {n["id"]: n for n in ctx_graph["nodes"]}
nodes[snapshot_id] = snapshot_node
ctx_graph["nodes"] = list(nodes.values())
state["context_graph"] = ctx_graph
state["current_scenario_id"] = snapshot_id
return state
You can call this node after build_context_graph and before activation/clinical agents.
Replay from a snapshot (fork a scenario)
Using LangGraph’s time‑travel/forking API:348350352
# Get history for a thread
thread = {"configurable": {"thread_id": "pt-123-q-1"}}
history = list(graph.get_state_history(thread))
# Pick a checkpoint to simulate from (get_state_history returns newest-first)
snapshot_state = history[1]  # checkpoint just before the latest step, e.g., before the compliance agent
# Fork: new scenario thread id
scenario_thread = {
"configurable": {
"thread_id": "pt-123-q-1-sim-stop-acei",
"checkpoint_id": snapshot_state.config["configurable"]["checkpoint_id"],
}
}
# Resume from that checkpoint with different scenario in state
for event in graph.stream(
{"messages": [], "current_scenario_id": "scenario:stop-acei"},
config=scenario_thread,
stream_mode="values",
):
pass # process simulation events
LangGraph won’t re‑execute steps before the checkpoint; it will continue from that point and create new checkpoints on the forked branch.353352348
3) Example input data for build_context_graph
Example AgentState input for a clinical question:
state: AgentState = {
"messages": [
{"role": "user", "content": "Why was my ACE inhibitor stopped?"},
],
"patient_id": "patient:123",
"clinician_id": "clinician:456",
"query": "Explain ACE inhibitor discontinuation",
"context_graph": None,
"governance_metrics": {},
"current_scenario_id": None,
}
config = {"configurable": {"thread_id": "pt-123-q-1"}}
context = {"max_nodes": 256, "max_depth": 2}
build_context_graph then builds a multi‑layer context_graph with layer tags as above.
4) SPARQL query to traverse three‑layer context
Assuming you export your context graph to RDF with ex:layer as a property, and IRIs like ex:patient123, ex:encounter789, etc.
Example: find all domain conditions for a patient and their SoR origins and synthetic agent runs that used them:
PREFIX ex: <http://example.org/clinic#>
SELECT ?cond ?sorRec ?agentRun
WHERE {
# Domain layer condition
?patient a ex:Patient ;
ex:hasEncounter ?enc .
?enc ex:hasCondition ?cond .
?patient ex:patientId "123" . # or use IRI ex:patient123
?cond ex:layer "domain" .
# SoR records linked to this condition
OPTIONAL {
?sorRec ex:resolvesTo ?cond ;
ex:layer "sor" .
}
# Synthetic agent runs that used this condition in context
OPTIONAL {
?agentRun a ex:AgentRun ;
ex:usedContext ?cond ;
ex:layer "synthetic" .
}
}
This illustrates traversing domain → SoR → synthetic by filtering on layer.354355356357358
5) Add compliance governance to activation layer nodes
Activation layer = the nodes that represent agent actions or external side effects (e.g., writing to EHR, sending a message, executing a tool).359360361362363
Extend node types:
Activation: a concrete action (e.g., “send explanation message”, “propose medication change”).
Add props and edges:
activation_node: Node = {
"id": "activation:msg-1",
"type": "Activation",
"layer": "synthetic",
"name": "Send explanation to patient portal",
"props": {
"targetChannel": "patient_portal",
"containsPHI": True,
"governanceStatus": "pending_approval" # or 'allowed','blocked'
}
}
# Edges
# Activation governed by policies
# Activation reviewed/approved by compliance agent or human
Edges:
- GOVERNED_BY: Activation → Policy (HIPAA minimum necessary, local policy).
- HAS_REVIEW: Activation → AuditEvent or ComplianceDecision.
Compliance agent in LangGraph can then:
- Inspect context_graph for Activation nodes with containsPHI=true.
- Check for GOVERNED_BY edges to appropriate Policy and Consent nodes.
- Set governanceStatus to allowed or blocked and emit an AuditEvent node (layer domain or policy depending on modeling).360364365366359
6) Healthcare examples of Unification layer connectors
Unification (SoR) connectors bring clinical data into the system:367368369370
- FHIR servers (R4/R5) via REST APIs: Patient, Encounter, Observation, Condition, MedicationStatement.
- HL7 v2 feeds for ADT, ORU, ORM messages (ADTs to visits, ORUs to observations).
- Direct DB connectors into EHR databases or data warehouses (Epic Clarity, Cerner HealtheIntent, OMOP CDM).
- RIS/LIS/PACS systems for imaging and lab results.
- HIEs and registries for cross‑facility data sharing.
Each connector populates SoR nodes and edges (PatientSoR, VisitSoR, ObservationSoR) and links them to domain Patient / Encounter nodes via RESOLVES_TO / HAS_SOR_RECORD edges.
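As an illustrative sketch, a FHIR connector might adapt a Condition resource (parsed JSON) into SoR-layer nodes/edges like this; the field paths follow FHIR R4, but the node/edge shapes and id conventions are assumptions from the schema above:

```python
def fhir_condition_to_sor(resource: dict) -> dict:
    """Map a FHIR R4 Condition resource (as a dict) into SoR-layer nodes/edges.

    Sketch only: assumes 'subject.reference' like 'Patient/123' and a coded
    condition in 'code.coding[0]'; real FHIR data needs far more handling
    (missing codings, contained resources, absolute references, etc.).
    """
    cond_id = f"condition_sor:{resource['id']}"
    patient_ref = resource["subject"]["reference"]          # e.g. "Patient/123"
    patient_id = f"patient_sor:{patient_ref.split('/')[-1]}"
    coding = resource.get("code", {}).get("coding", [{}])[0]
    nodes = [{
        "id": cond_id, "type": "ConditionSoR", "layer": "sor",
        "props": {"system": coding.get("system"), "code": coding.get("code"),
                  "display": coding.get("display")},
    }]
    edges = [{
        "id": f"edge:{cond_id}-of", "type": "CONDITION_OF",
        "from_": cond_id, "to": patient_id, "layer": "sor", "props": {},
    }]
    return {"nodes": nodes, "edges": edges}
```

A separate resolution step would then add RESOLVES_TO edges from these SoR nodes to their domain-layer counterparts.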
7) Extend schema for multi‑clinician shared context
To support multi‑clinician collaboration on the same patient/thread:
- Add node types:
  - Clinician (domain) and User (app‑level).
  - SharedContextSession (collaborative session over one or more context graphs).
- Edges:
  - PARTICIPATES_IN: Clinician/User → SharedContextSession.
  - USES_SNAPSHOT: SharedContextSession → ContextGraphSnapshot.
  - HAS_NOTE: SharedContextSession → Document (collaborative notes).
Within context_graph:
- Allow multiple Clinician nodes (layer="domain") attached to the same patient/encounter.
- The synthetic layer can include AgentRun nodes tagged with clinicianId and sessionId, so multiple clinicians see the same synthetic grounding and can add their own “synthetic facts” (e.g., proposed plans) that are properly attributed.
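A hypothetical builder for such a session, reusing the Node/Edge dict shapes from earlier; tagging the session nodes with layer="domain" is an assumption, since the text leaves that open:

```python
def shared_session(session_id: str, clinician_ids: list, snapshot_id: str) -> dict:
    """Build nodes/edges for a multi-clinician shared context session."""
    nodes = [{"id": session_id, "type": "SharedContextSession",
              "layer": "domain", "props": {}}]
    edges = [{"id": f"edge:{session_id}-snap", "type": "USES_SNAPSHOT",
              "from_": session_id, "to": snapshot_id,
              "layer": "domain", "props": {}}]
    for cid in clinician_ids:
        nodes.append({"id": cid, "type": "Clinician", "layer": "domain", "props": {}})
        edges.append({"id": f"edge:{cid}-in-{session_id}", "type": "PARTICIPATES_IN",
                      "from_": cid, "to": session_id,
                      "layer": "domain", "props": {}})
    return {"nodes": nodes, "edges": edges}
```

Merging this fragment into each participant's context graph gives every clinician the same session view while keeping per-clinician attribution on the synthetic layer.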
If you want, next I can show a complete ContextGraph JSON example containing all four layers (domain, SoR, synthetic, policy) for a sample multi‑clinician scenario and a LangGraph compliance agent that uses governanceStatus on Activation nodes to gate real side‑effects.
Example code for policy layer nodes in four-layer schema
How to implement a checkpointer with Postgres for context graphs
Using LangGraph time-travel to fork policy scenarios
Integrate FHIR data into ContextGraphSnapshot nodes
Best practices for managing checkpoints in production LangGraph
Here’s a compact, code‑first answer for all five items.
1) Example code for policy‑layer nodes
Define a helper to append Scenario and ContextGraphSnapshot nodes to the current context_graph:
from datetime import datetime, timezone
def add_policy_nodes(
state: AgentState,
scenario_name: str,
scenario_description: str,
) -> AgentState:
ctx_graph = state.get("context_graph") or {"nodes": [], "edges": []}
nodes = {n["id"]: n for n in ctx_graph["nodes"]}
# Scenario node (policy layer)
scenario_id = f"scenario:{scenario_name}"
scenario_node: Node = {
"id": scenario_id,
"type": "Scenario",
"layer": "policy",
"name": scenario_name,
"props": {"description": scenario_description},
}
nodes[scenario_id] = scenario_node
# Snapshot node pointing to the current checkpoint/thread
props = state.get("props", {})
checkpoint_id = props.get("checkpoint_id", "unknown")
thread_id = props.get("thread_id", "unknown")
snapshot_id = f"snapshot:{thread_id}:{checkpoint_id}"
snapshot_node: Node = {
"id": snapshot_id,
"type": "ContextGraphSnapshot",
"layer": "policy",
"name": f"Snapshot {checkpoint_id}",
"props": {
"threadId": thread_id,
"checkpointId": checkpoint_id,
"createdAt": datetime.now(timezone.utc).isoformat(),
},
}
nodes[snapshot_id] = snapshot_node
# Link scenario to snapshot
edge_id = f"edge:scenario-applies-{scenario_id}-{snapshot_id}"
ctx_graph["edges"].append({
"id": edge_id,
"type": "SCENARIO_APPLIES_TO",
"from_": scenario_id,
"to": snapshot_id,
"layer": "policy",
"props": {},
})
ctx_graph["nodes"] = list(nodes.values())
state["context_graph"] = ctx_graph
state["current_scenario_id"] = scenario_id
return state
You call add_policy_nodes after building the context graph but before running a simulation branch.
2) Implement Postgres checkpointer for context graphs
Basic PostgresSaver setup for durable checkpoints (short‑term) with context graphs in state:373374375376
from psycopg_pool import ConnectionPool
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import StateGraph, START, END
DB_URI = "postgresql://user:pass@host:5432/langgraph?sslmode=require"
pool = ConnectionPool(conninfo=DB_URI, max_size=10)
# PostgresSaver accepts a Connection or a ConnectionPool; pass the pool so the
# checkpointer stays usable after setup (a `with pool.connection()` block would
# close the connection before the graph runs).
checkpointer = PostgresSaver(pool)
checkpointer.setup()  # creates checkpoint tables on first run
builder = StateGraph(AgentState, context_schema=ContextConfig)
builder.add_node("build_context", build_context_graph)
builder.add_edge(START, "build_context")
builder.add_edge("build_context", END)
graph = builder.compile(checkpointer=checkpointer)
Best practice: treat checkpoints as operational state, not long‑term memory; promote anything important (e.g., snapshots) into your own tables or a LangGraph Store before pruning.377378379375373
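That promotion step can be sketched as a small helper that copies a ContextGraphSnapshot node into your own context_snapshots table so it survives checkpoint GC. Shown with sqlite3 so it runs standalone; in production this would target the same Postgres instance, and the table shape is an assumption:

```python
# Sketch of "promote before pruning": persist a snapshot node into an
# application-owned table, independent of LangGraph's checkpoint tables.
import json
import sqlite3

def promote_snapshot(conn, snapshot_node: dict) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS context_snapshots (
               id TEXT PRIMARY KEY,
               thread_id TEXT,
               checkpoint_id TEXT,
               payload TEXT
           )"""
    )
    props = snapshot_node["props"]
    conn.execute(
        "INSERT OR REPLACE INTO context_snapshots VALUES (?, ?, ?, ?)",
        (snapshot_node["id"], props["threadId"], props["checkpointId"],
         json.dumps(snapshot_node)),  # full node kept as JSON payload
    )
    conn.commit()
```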
3) Using LangGraph time‑travel to fork policy scenarios
Pattern: get history for a thread, pick a checkpoint, then resume from it with a new current_scenario_id.380381382383
# Original thread config
base_config = {"configurable": {"thread_id": "pt-123-q-1"}}
# Run once
graph.invoke(initial_state, config=base_config)
# 1) Identify a checkpoint
states = list(graph.get_state_history(base_config)) # newest first
chosen = states[-2] # e.g., before compliance node
checkpoint_id = chosen.config["configurable"]["checkpoint_id"]
# 2) Fork a new scenario thread from that checkpoint
scenario_config = {
"configurable": {
"thread_id": "pt-123-q-1-scenario-stop-acei",
"checkpoint_id": checkpoint_id,
}
}
# 3) Resume with modified state (e.g., new scenario)
graph.update_state(
scenario_config,
{
"current_scenario_id": "scenario:stop-acei",
# optionally tweak context_graph/Activation nodes here
},
)
# 4) Resume execution (no new input needed)
for event in graph.stream(None, config=scenario_config, stream_mode="values"):
pass # collect simulation outputs
This gives you a forked policy scenario without rerunning the earlier steps.384382385
4) Integrate FHIR data into ContextGraphSnapshot nodes
Assuming you have FHIR resources in JSON and/or a FHIR→OMOP mapping:386387388
- When creating a ContextGraphSnapshot, embed lightweight FHIR references or keys in props, not full resources (to avoid bloat).
Example snapshot with FHIR references:
snapshot_node: Node = {
"id": "snapshot:pt-123-q-1:ckpt-abc",
"type": "ContextGraphSnapshot",
"layer": "policy",
"name": "Snapshot before medication change",
"props": {
"threadId": "pt-123-q-1",
"checkpointId": "ckpt-abc",
"createdAt": "2026-02-03T04:33:00Z",
"fhirPatientRef": "Patient/123",
"fhirEncounterRefs": ["Encounter/789"],
"fhirBundleIds": ["Bundle/xyz"] # e.g., cached in FHIR store
},
}
Your SoR/domain clients can then resolve these to OMOP or FHIR resources as needed (e.g., using FHIR Ontop OMOP or an OMOP‑on‑FHIR DB).387388386
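That resolution step can be sketched as pure URL construction over the stored references, following standard FHIR REST conventions (the base URL is yours; fetching and OMOP mapping are left to your clients):

```python
# Illustrative resolver for the FHIR references stored in snapshot props:
# turns "Patient/123" into a request URL against an assumed FHIR base URL.

def fhir_ref_to_url(base_url: str, ref: str) -> str:
    resource_type, resource_id = ref.split("/", 1)
    return f"{base_url.rstrip('/')}/{resource_type}/{resource_id}"

def snapshot_fhir_urls(snapshot_node: dict, base_url: str) -> list[str]:
    props = snapshot_node["props"]
    refs = [props.get("fhirPatientRef")] + props.get("fhirEncounterRefs", [])
    return [fhir_ref_to_url(base_url, r) for r in refs if r]
```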
5) Best practices for managing checkpoints in production
Patterns that show up repeatedly:378379375373377
- Use Postgres (or DynamoDB, etc.) for production checkpointers; InMemorySaver is for testing only.
- Treat thread_id as a first‑class key:
  - One user session / patient case / workflow = one thread_id.
  - Include thread_id in logs and Snapshot nodes for correlation.
- Garbage‑collect old checkpoints (e.g., keep last N per thread_id or a time window).
- Separate “memory” from checkpoints: promote durable facts into a Store or your own tables.
- Observe and cap:
  - Monitor checkpoint table growth and query latency.
  - Consider partitioning by date or thread_id for very high volumes.
- Time‑travel discipline: fork scenarios into new thread_ids rather than mutating existing history.
If you want, next I can show a concrete Postgres schema and GC job for pruning checkpoints (keeping last N per thread_id) and a small helper that creates ContextGraphSnapshot records in your own context_snapshots table alongside LangGraph’s internal checkpoint tables.
Best practices for pruning old Postgres checkpoints in LangGraph
LangGraph Store abstraction for long-term memory beyond checkpoints
Example of update_state for branching policy scenarios in LangGraph
FHIR Ontop OMOP setup tutorial with Postgres
How to add pgvector for vector search in LangGraph PostgresSaver
context drift??
Here’s a concise set of patterns you can reuse.
1) Pruning old Postgres checkpoints (LangGraph)
Checkpoints = operational log, not long‑term memory; prune by time/length per thread.391392393
Typical pattern:
- Keep last N checkpoints or T hours per thread_id.
- Nightly GC job that deletes old rows from checkpoints, checkpoint_writes, checkpoint_blobs.
Pseudo‑SQL sketch (you’ll adapt to actual table names):
-- Example: keep last 20 checkpoints per thread.
-- Note: a CTE is scoped to a single statement, so repeat it per DELETE
-- (or materialize the doomed ids into a temp table first).
WITH ranked AS (
  SELECT id, thread_id,
         ROW_NUMBER() OVER (PARTITION BY thread_id ORDER BY created_at DESC) AS rn
  FROM langgraph_checkpoints
)
DELETE FROM langgraph_checkpoint_blobs
WHERE checkpoint_id IN (SELECT id FROM ranked WHERE rn > 20);

WITH ranked AS (
  SELECT id, thread_id,
         ROW_NUMBER() OVER (PARTITION BY thread_id ORDER BY created_at DESC) AS rn
  FROM langgraph_checkpoints
)
DELETE FROM langgraph_checkpoint_writes
WHERE checkpoint_id IN (SELECT id FROM ranked WHERE rn > 20);

WITH ranked AS (
  SELECT id, thread_id,
         ROW_NUMBER() OVER (PARTITION BY thread_id ORDER BY created_at DESC) AS rn
  FROM langgraph_checkpoints
)
DELETE FROM langgraph_checkpoints
WHERE id IN (SELECT id FROM ranked WHERE rn > 20);
Wire that into a scheduled job (Cron, Cloud Scheduler) and monitor table size/latency.392394393391
2) LangGraph Store abstraction for long‑term memory
LangGraph distinguishes:395396397
- Short‑term: state + checkpoints (thread‑scoped).
- Long‑term: Stores (document/record stores, often with vector search).
Pattern:
- Use PostgresSaver only for checkpoints.
- Use a Store (or your own tables/vector DB) for durable facts/preferences.
- Nodes call Store tools to get/put/search memories, then merge into state.
Docs + Store interfaces cover both scalar + vector memory; on LangGraph Cloud, the Store can be Postgres with vector similarity built‑in.396398397395
3) update_state example for branching policy scenarios
Use update_state to branch from a checkpoint into a new scenario thread.399400401402
# Base config for original conversation
base_cfg = {"configurable": {"thread_id": "pt-123-q-1"}}
# Get history and pick a checkpoint to fork from
history = list(graph.get_state_history(base_cfg))
chosen = history[-2] # e.g., pre-compliance step
checkpoint_id = chosen.config["configurable"]["checkpoint_id"]
# New scenario thread config
scenario_cfg = {
"configurable": {
"thread_id": "pt-123-q-1-scenario-stop-acei",
"checkpoint_id": checkpoint_id,
}
}
# Branch state: set scenario id, tweak context_graph if desired
graph.update_state(
scenario_cfg,
{
"current_scenario_id": "scenario:stop-acei",
# optionally: "context_graph": modified_graph
},
)
# Resume execution on the fork
for event in graph.stream(None, config=scenario_cfg, stream_mode="values"):
pass
This pattern gives you explicit policy scenario branches without rerunning earlier nodes.401403399
4) FHIR Ontop OMOP setup with Postgres (pointer)
For exposing OMOP Postgres as a FHIR‑compatible virtual KG, FHIROntopOMOP shows a full setup:404405406
High‑level steps:
- OMOP on Postgres: deploy the OMOP schema + sample data (e.g., the omop_v5 image).405
- Configure Ontop to connect to Postgres:
  - Set listen_addresses='*' plus a proper pg_hba.conf so Docker/VM can reach Postgres.404
  - Provide JDBC URL/credentials.
- Provide mappings from OMOP tables to the FHIR model using Ontop (OBDA mappings or the FHIR‑OMOP IG).406404
- Run the Ontop server; query with SPARQL or FHIR‑like APIs over the virtual KG.
For your context graph, you can either:
- Pull FHIR resources via this virtual KG and project them into your domain graph, or
- Pull OMOP rows directly and store FHIR references in ContextGraphSnapshot props.
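The second option is a small projection. Here is a sketch assuming OMOP CDM column names (person_id, visit_occurrence_id); the mapping itself is an illustrative assumption:

```python
# Hypothetical projection: an OMOP row (as a dict) into the
# ContextGraphSnapshot props shape used above, storing FHIR-style
# references rather than full resources.

def omop_row_to_snapshot_props(row: dict, thread_id: str,
                               checkpoint_id: str) -> dict:
    return {
        "threadId": thread_id,
        "checkpointId": checkpoint_id,
        "fhirPatientRef": f"Patient/{row['person_id']}",
        "fhirEncounterRefs": [
            f"Encounter/{vid}" for vid in row.get("visit_occurrence_ids", [])
        ],
    }
```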
5) Adding pgvector for vector search alongside PostgresSaver
Best practice: keep pgvector separate from the checkpoint tables; use it for long‑term vector memory, not for the checkpointer itself.407398408395
Typical steps:
- Enable pgvector:
CREATE EXTENSION IF NOT EXISTS vector;
- Create a vector table:
CREATE TABLE memory_messages (
id BIGSERIAL PRIMARY KEY,
thread_id TEXT NOT NULL,
role TEXT NOT NULL, -- 'user','assistant',...
content TEXT NOT NULL,
embedding vector(1536), -- match your model
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON memory_messages USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
- In your LangGraph node, when you want to store long‑term memory:
import psycopg
from pgvector.psycopg import register_vector  # client-side adapter: pip install pgvector
def persist_memory(thread_id: str, role: str, content: str, embedding: list[float]):
    # DSN assumed defined elsewhere (e.g., the DB_URI used for the checkpointer)
    with psycopg.connect(DSN) as conn:
        register_vector(conn)  # lets psycopg bind the embedding to the vector column
        with conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO memory_messages (thread_id, role, content, embedding)
                VALUES (%s, %s, %s, %s)
                """,
                (thread_id, role, content, embedding),
            )
- For retrieval (vector search):
SELECT id, thread_id, role, content
FROM memory_messages
WHERE thread_id = $1
ORDER BY embedding <=> $2::vector  -- <=> is cosine distance, matching the vector_cosine_ops index
LIMIT 10;
This complements PostgresSaver: checkpoints hold short‑term execution state, while pgvector table holds long‑term semantic memory, avoiding context drift by pulling only the most relevant past items instead of entire history.397395407
If you want, next I can sketch a small LangGraph node pair: one that writes conversation turns to the pgvector memory table and one retriever node that injects the top‑k similar past turns into context_graph or messages before the main reasoning agent runs.
TrustGraph github
The main TrustGraph repo you’re looking for is:
- GitHub: https://github.com/trustgraph-ai/trustgraph
- Website/docs: https://trustgraph.ai
That repo is the open‑source “Context Graph Engine / Factory for AI” with GraphRAG, ontology‑RAG, and agent integration.
DSL
A DSL (domain‑specific language) is a small language tailored to a specific problem domain, rather than a general programming language meant for everything.421422423
- Examples: SQL for relational queries, HTML/CSS for web layout, regex syntax for pattern matching, Terraform’s HCL for infrastructure, Drools rules DSLs for business rules.422424425421
- DSLs can be external (their own parser/interpreter, like SQL or a custom config language) or embedded/internal (APIs or fluent builders inside a host language, like a testing DSL in Ruby or Kotlin).426421422
They’re useful when a recurring class of problems in your domain can be expressed more clearly, concisely, and safely with a specialized syntax/abstractions than with a general‑purpose language.427425421422
How to install and run TrustGraph locally
TrustGraph quickstart tutorial
Context Graphs vs traditional knowledge graphs
TrustGraph CLI examples for graph extraction
TrustGraph supported LLMs and integrations
DSL examples for graph databases
Internal DSL vs external DSL comparison
Tools for building DSLs like ANTLR
DSL for knowledge graph schemas
How to implement a DSL
Here’s a tight rundown you can drop into your notes.
TrustGraph: install, quickstart, CLI
- Repo: https://github.com/trustgraph-ai/trustgraph
- Docs: https://docs.trustgraph.ai
- Tutorial video (launch locally): “How to Launch TrustGraph Tutorial”431
Install and run locally (typical pattern)
From the quickstart / tutorial:431432433
- Install CLI:
pip install "trustgraph-cli==0.5.2"
# or latest version per README
- Use the Configuration Builder (web UI or provided configs) to download a deployment package with:
  - docker-compose.yaml
  - config/YAML (LLM, graph DB, vector DB, chunking, etc.)
  - optional sample data and prompts432433
- Launch via Docker:
# from the directory with docker-compose.yaml
docker compose up -d
# later:
docker compose down -v
The tutorial shows using tg CLI commands (e.g., tg load-sample-documents) and exploring the UI/Data Workbench.434435431
CLI examples for graph extraction
The CLI supports:
- Loading documents into the Librarian, building graph edges + vector embeddings as “Knowledge Packages”.
- Running GraphRAG queries and inspecting subgraphs.435436437434432
Concrete command names evolve, but patterns from their materials include:
tg load-sample-documents
tg upload-data my_docs/
tg run-flow my_flow.yaml
tg query --graph-rag "What obligations does this contract create?"
(Use tg --help and docs for the exact verbs in the version you install.)433431432
Context graphs vs traditional knowledge graphs (TrustGraph view)
TrustGraph frames context graphs as AI‑optimized, governed subgraphs for a specific task, built from a more general knowledge/metadata layer:438437439432
- Traditional KG: global, persistent, schema/ontology‑driven; good for enterprise semantics and lineage.
- Context graph: small, dynamic, relevance‑ranked subgraph (plus embeddings) tuned for LLM grounding and agent workflows (GraphRAG, tools, policy checks).
Their “Knowledge Core” = graph edges + mapped vector embeddings; context graphs are subgraphs over these cores used per query.436437432
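The per‑query subgraph idea can be illustrated with a projection function. The relevance score below is a stub (term overlap) standing in for TrustGraph's combination of vector similarity and graph traversal; node/edge shapes are the simple dicts used elsewhere here:

```python
# Sketch of "context graph = relevance-ranked subgraph over a knowledge core":
# keep nodes whose score passes a threshold, plus edges whose endpoints survive.

def project_context_graph(core: dict, query: str, threshold: float = 0.0) -> dict:
    terms = set(query.lower().split())

    def score(node: dict) -> float:
        # Stub scorer: count query terms appearing in the node's name/props.
        text = (node.get("name", "") + " " + str(node.get("props", ""))).lower()
        return sum(1.0 for t in terms if t in text)

    kept = {n["id"] for n in core["nodes"] if score(n) > threshold}
    return {
        "nodes": [n for n in core["nodes"] if n["id"] in kept],
        "edges": [e for e in core["edges"]
                  if e["from"] in kept and e["to"] in kept],
    }
```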
Supported LLMs & integrations (high‑level)
From the configuration builder / architecture:436432433
- LLMs: OpenAI, Anthropic, Google AI Studio, local models (via OpenAI‑compatible APIs) selectable in config.
- Graph stores: Neo4j, Memgraph, FalkorDB, plus others depending on version.437440432436
- Vector DBs: Qdrant, Pinecone, others as configured.432436
- Tools: MCP tools, custom agent tools, chunking/embedding pipelines configured via YAML.433432
DSLs for graph / KG work
DSL examples for graph databases
- Cypher (Neo4j): pattern‑matching DSL for graph queries.
MATCH (p:Patient)-[:HAS_CONDITION]->(c:Condition {code: "N18.31"})
RETURN p, c;
- Gremlin: traversal DSL for TinkerPop graphs.
- Graph DSL via ANTLR: e.g., an external DSL Graph { A -> B (10) } compiled to a Graph/Vertex/Edge model.441442
From the ANTLR graph DSL example:442441
grammar Graph;
graph : 'Graph' '{' edge+ '}' ;
vertex : ID ;
edge : vertex '->' vertex '(' NUM ')' ;
ID : [a-zA-Z]+ ;
NUM : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
Input:
Graph {
A -> B (10)
B -> C (20)
}
This is an external DSL that the parser turns into a semantic model (Graph/Edge/Vertex objects).441442
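For the edge form shown above, the same parse can be done without ANTLR in a few lines of regex, which makes the "parser → semantic model" step concrete (this handles only VERTEX -> VERTEX (WEIGHT), nothing else):

```python
# Minimal regex-based parser for the Graph { A -> B (10) } DSL above,
# producing a dict-shaped semantic model instead of generated classes.
import re

EDGE_RE = re.compile(r"([A-Za-z]+)\s*->\s*([A-Za-z]+)\s*\(\s*(\d+)\s*\)")

def parse_graph_dsl(text: str) -> dict:
    vertices: set[str] = set()
    edges = []
    for src, dst, weight in EDGE_RE.findall(text):
        vertices.update((src, dst))
        edges.append({"from": src, "to": dst, "weight": int(weight)})
    return {"vertices": sorted(vertices), "edges": edges}
```

For anything beyond this toy grammar (nesting, error reporting), a real parser generator like ANTLR pays off quickly.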
Internal vs external DSL
- Internal DSL: embedded in a host language’s syntax (e.g., fluent builders in Kotlin/Scala/Ruby).
- Pros: no separate parser; leverage IDE, types, tooling.
- Example (Python-ish):
g = (
Graph()
.node("patient123").label("Patient")
.node("cond_ckd3a").label("Condition")
.edge("patient123", "HAS_CONDITION", "cond_ckd3a")
)
- External DSL: its own syntax and parser (ANTLR, PEG, etc.).
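The internal‑DSL pseudocode above can be made runnable with a minimal fluent builder; the method names mirror the sketch (node / label / edge) and the backing model is an assumption:

```python
# Minimal internal DSL: a fluent graph builder where each method returns
# self, so calls chain in the host language's own syntax.

class Graph:
    def __init__(self):
        self.nodes: dict[str, dict] = {}
        self.edges: list[dict] = []
        self._last: str | None = None  # most recently added node, for .label()

    def node(self, node_id: str) -> "Graph":
        self.nodes.setdefault(node_id, {"labels": []})
        self._last = node_id
        return self

    def label(self, label: str) -> "Graph":
        self.nodes[self._last]["labels"].append(label)
        return self

    def edge(self, src: str, rel: str, dst: str) -> "Graph":
        self.edges.append({"from": src, "type": rel, "to": dst})
        return self
```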
Tools for building DSLs (esp. for KGs)
- ANTLR: widely used parser generator for external DSLs; generates parsers in Java, Python, C#, etc.442441
- PEG/Parsers: Pegen, Lark, Parsec, etc.
- Xtext (Eclipse), JetBrains MPS: for full language workbenches.
- Martin Fowler’s DSL guide gives design patterns for internal/external DSLs and semantic models.444445446
DSL for knowledge graph schemas (how to implement)
You can design a small DSL to define KG schemas (entities, edges, constraints) and compile it to Neo4j/SQL/Arango/FHIR/RDF.
Example external DSL sketch:
schema Clinic {
entity Patient {
key id
prop mrn: string
prop isPHI: bool
}
entity Condition {
key id
prop code: string
prop system: string
}
relation HAS_CONDITION {
from Patient
to Condition
cardinality many
}
}
ANTLR‑style grammar fragment:
grammar ClinicSchema;
schema : 'schema' ID '{' (entity | relation)* '}' ;
entity : 'entity' ID '{' (prop | keyDecl)+ '}' ;
prop : 'prop' ID ':' ID ;
keyDecl : 'key' ID ;
relation : 'relation' ID '{'
'from' ID
'to' ID
'cardinality' ID
'}' ;
ID : [a-zA-Z_][a-zA-Z0-9_]* ;
WS : [ \t\r\n]+ -> skip ;
Compiler pipeline:
- Parse DSL → AST (ANTLR).
- Build semantic model:
Entity,Property,Relationclasses. - Generate:
- Cypher constraints and relationship types (Neo4j), or
- SQL table/edge definitions (Postgres/Arango), or
- RDF/OWL/Turtle schema.
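Steps 2–3 of this pipeline can be sketched with a tiny semantic model. Property is folded into a dict for brevity; the constraint emitter uses Neo4j 5 Cypher syntax (FOR … REQUIRE), and everything else is illustrative:

```python
# Minimal semantic model for the DSL pipeline: Entity/Relation classes plus
# one generator target (a Neo4j uniqueness constraint and an edge pattern).
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    key: str
    props: dict = field(default_factory=dict)  # prop name -> type name

    def to_cypher_constraint(self) -> str:
        return (f"CREATE CONSTRAINT {self.name.lower()}_key IF NOT EXISTS "
                f"FOR (n:{self.name}) REQUIRE n.{self.key} IS UNIQUE")

@dataclass
class Relation:
    name: str
    source: str
    target: str

    def to_cypher_pattern(self) -> str:
        return f"(:{self.source})-[:{self.name}]->(:{self.target})"
```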
For an internal DSL in Python for schema:
schema = (
Schema("Clinic")
.entity("Patient")
.key("id")
.prop("mrn", "string")
.prop("isPHI", "bool")
.entity("Condition")
.key("id")
.prop("code", "string")
.prop("system", "string")
.relation("HAS_CONDITION", "Patient", "Condition", cardinality="many")
)
Same semantic model, but you rely on Python’s syntax instead of a dedicated grammar.
If you’d like, next I can sketch a tiny ClinicSchema DSL→Cypher code generator: given the DSL above, emit CREATE CONSTRAINT and CREATE INDEX plus relationship patterns for Neo4j or Postgres edge tables.
TrustGraph enterprise licensing and pricing
TrustGraph’s software is open‑source under Apache 2.0; the enterprise cost model is infrastructure‑based rather than per‑query SaaS.
- The main repo trustgraph-ai/trustgraph is licensed under Apache License 2.0, so you can run and modify it in your own environment without license fees, subject to standard Apache terms (notice preservation, patent grant, etc.).451452453
- For enterprise deployments, their own comparison vs Graphlit describes the cost model as “Infrastructure‑Based”: you pay for your own graph DB, vector DB, compute, and storage (e.g., Neo4j cluster, Qdrant, K8s), plus optional enterprise support, with no per‑request charges from TrustGraph itself.454
- They contrast this with usage‑based SaaS (per document, per query, per GB) and position TrustGraph as better for high‑volume, cost‑sensitive workloads where you want predictable infra costs and full data sovereignty.455456454
They don’t publish a simple public price sheet for support / enterprise services; the model is effectively:
- Software: Apache‑2.0 OSS (no license fee).
- Costs: your infra + optional paid support/consulting (contact them via site for quotes).457456451454
For an accurate enterprise quote (support tiers, SLAs, onboarding), you’d need to reach out through their site or sales/contact channels.
TrustGraph Configuration UI tutorial
Common TrustGraph deployment errors and fixes
TrustGraph data workbench GraphRAG examples
Integrate TrustGraph with Qdrant vector DB
TrustGraph enterprise licensing and pricing
TrustGraph’s docs and videos cover all of these at a high level; here’s a concise map with where to look and what to expect.
Configuration UI tutorial
TrustGraph ships a configuration builder / UI that helps you generate a deployable package (Docker compose + configs):466467468469
- Walkthrough:
- TrustGraph YouTube: “How to Launch TrustGraph Tutorial” and “TrustGraph 101” show using the UI to generate and launch a deployment package.
- Docs hub:
- https://docs.trustgraph.ai – sections on “Get Started”, “Configuration”, “Deployment” cover the UI fields and how they map to config files.468472
Common deployment errors and fixes
From README/discussions + partner writeups:473474466
Typical issues:
- Container dependencies not reachable (graph DB, vector DB, LLM endpoint):
- Missing or incompatible graph DB / vector DB versions:
- LLM auth failures:
  - Fix: ensure API keys and base URLs are set in the config UI or .env file, and that the selected LLM provider matches the key (OpenAI vs Anthropic vs GCP).
- Resource limits (OOM during extraction / GraphRAG flows):
- Fix: lower parallelism / batch sizes in flows; adjust container memory/CPU in compose; prune sample data.
Troubleshooting aids:
- Logs from tg-api, graph DB, and vector DB containers (via docker compose logs), plus tests in the “How to Explore Knowledge with the Test Suite” video.471473474
Data Workbench / GraphRAG examples
TrustGraph’s Data Workbench and published case studies show GraphRAG patterns:473474466
- Workbench capabilities:
- Example scenarios:
- Memgraph/TrustGraph blog: complex industrial domains where GraphRAG is used to retrieve multi‑hop context (assets, components, incidents) for a query.473
- Qdrant case study: high‑volume agentic workflows where TrustGraph uses a combination of knowledge packages, vector search, and graph traversal for contextual grounding.474
Look for “GraphRAG”, “Knowledge Packages”, and “Data Workbench” sections in the docs/guides and videos.468472466474473
Integrating TrustGraph with Qdrant
TrustGraph explicitly supports Qdrant as a vector DB backend:466474
- Configuration UI:
- Choose Qdrant as the vector DB provider, specify host/port/API key.
- Docker‑based deployment:
- Flow:
For a concrete setup, follow the Qdrant case study which outlines the architecture and config blocks.474
Enterprise licensing and pricing
- License: core TrustGraph repo is Apache 2.0 OSS (no software license fee to run/modify).475476466
- Enterprise pricing model:
If you plan to run it as part of your own platform, the main cost drivers are: graph DB licenses (if using enterprise Neo4j etc.), managed vector DB (or self‑hosted Qdrant), and compute for extraction and serving.
If you want, next I can sketch a minimal docker‑compose + config layout for a local TrustGraph + Memgraph + Qdrant + OpenAI stack that mirrors your clinic context graph use case, including where you’d plug in your own extraction DSL or LangGraph‑based agents.
sketch a minimal docker‑compose + config layout for a local TrustGraph + Memgraph + Qdrant + OpenAI stack that mirrors your clinic context graph use case, including where you’d plug in your own extraction DSL or LangGraph‑based agents.
Below is a minimal, opinionated sketch you can adapt. It’s not full YAML from TrustGraph, but shows the structure, services, and where your clinic context graph + LangGraph agents plug in.
1) High-level architecture
Services:
- trustgraph-api: TrustGraph engine (GraphRAG & orchestration).
- memgraph: graph store for the knowledge/core graph.
- qdrant: vector DB for embeddings.
- langgraph-backend: your LangGraph multi‑agent pipeline for clinic context graphs.
- fhir-pg: Postgres/OMOP/FHIR DB (SoR) for patient data.
- nginx (optional): single entrypoint for UI/API.
Config:
- config/trustgraph.yaml: TrustGraph connections, LLM, chunking.
- config/langgraph.yaml: endpoints and keys for TrustGraph, FHIR DB, Qdrant (optional).
- flows/clinic_graphrag.yaml: TrustGraph flow using your extraction DSL.
2) docker-compose.yml (minimal sketch)
version: "3.9"
services:
memgraph:
image: memgraph/memgraph-mage:latest
container_name: memgraph
ports:
- "7687:7687" # Bolt
- "7444:7444" # Web interface
environment:
MEMGRAPH="--log-level=INFO"
volumes:
- memgraph_data:/var/lib/memgraph
qdrant:
image: qdrant/qdrant:latest
container_name: qdrant
ports:
- "6333:6333" # HTTP
- "6334:6334" # gRPC
volumes:
- qdrant_data:/qdrant/storage
fhir-pg:
image: postgres:16
container_name: fhir-pg
environment:
POSTGRES_DB: fhir_omop
POSTGRES_USER: fhiruser
POSTGRES_PASSWORD: fhirpass
ports:
- "5432:5432"
volumes:
- fhir_pg_data:/var/lib/postgresql/data
# seed with OMOP/FHIR schema + data via init scripts/SQL
trustgraph-api:
image: trustgraph/trustgraph-api:latest
container_name: trustgraph-api
depends_on:
- memgraph
- qdrant
environment:
TG_CONFIG_PATH: /app/config/trustgraph.yaml
OPENAI_API_KEY: ${OPENAI_API_KEY}
volumes:
- ./config:/app/config
- ./flows:/app/flows
- ./data:/app/data
ports:
- "8080:8080" # TrustGraph API / UI
langgraph-backend:
build: ./langgraph-backend
container_name: langgraph-backend
depends_on:
- fhir-pg
- trustgraph-api
environment:
FHIR_PG_DSN: "postgresql://fhiruser:fhirpass@fhir-pg:5432/fhir_omop"
TRUSTGRAPH_API_URL: "http://trustgraph-api:8080"
OPENAI_API_KEY: ${OPENAI_API_KEY}
ports:
- "8000:8000" # LangGraph HTTP API / SSE
volumes:
memgraph_data:
qdrant_data:
fhir_pg_data:
3) TrustGraph config (config/trustgraph.yaml)
Key ideas: point TrustGraph at Memgraph + Qdrant + OpenAI; define a clinic extraction flow that your DSL feeds.
# config/trustgraph.yaml
llm:
provider: openai
model: gpt-4.1-mini
api_key_env: OPENAI_API_KEY
graph:
type: memgraph
uri: bolt://memgraph:7687
username: ""
password: ""
vector_store:
type: qdrant
url: http://qdrant:6333
grpc_url: http://qdrant:6334
collection: clinic_docs
knowledge_packages:
- name: clinic_guidelines
source_path: /app/data/clinic_guidelines
chunk_size: 1024
chunk_overlap: 128
embedding_model: text-embedding-3-large
flows:
- id: clinic_graphrag
path: /app/flows/clinic_graphrag.yaml
Example flow (flows/clinic_graphrag.yaml), showing where your extraction DSL fits (as a custom step):
id: clinic_graphrag
steps:
- id: chunk_and_embed
type: embedding
input: clinic_guidelines
output_collection: clinic_docs
- id: extract_graph
type: custom
description: "Call clinic-DSL extractor to generate entities/edges"
handler: clinic_dsl_extractor # TrustGraph will call this (Python code in container)
input_collection: clinic_docs
- id: build_knowledge_package
type: graph_package
graph_db: memgraph
vector_db: qdrant
from: extract_graph
- id: graph_rag_query
type: graph_rag
graph_db: memgraph
vector_db: qdrant
llm: default
params:
max_hops: 2
top_k: 10
Your DSL extractor (e.g., Python using ANTLR or pattern rules) transforms text chunks into Patient, Condition, Medication, Policy nodes and their edges and writes them into Memgraph and Qdrant.
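A toy version of such an extractor, with regex rules standing in for a real DSL parser. The handler signature, condition patterns, and output shape are assumptions (not the TrustGraph handler API); writing to Memgraph/Qdrant is left out:

```python
# Sketch of a clinic_dsl_extractor handler: a rule-based pass turning text
# chunks into the node/edge dicts used elsewhere in this doc.
import re

# Illustrative condition vocabulary; a real extractor would use the DSL/ontology.
CONDITION_RE = re.compile(r"\b(CKD stage \d[a-b]?|hypertension|diabetes)\b",
                          re.IGNORECASE)

def clinic_dsl_extractor(chunks: list[str], patient_id: str) -> dict:
    nodes, edges = [], []
    for chunk in chunks:
        for match in CONDITION_RE.findall(chunk):
            cond_id = "cond:" + match.lower().replace(" ", "-")
            nodes.append({"id": cond_id, "type": "Condition",
                          "layer": "domain", "props": {"label": match}})
            edges.append({"id": f"edge:{patient_id}-{cond_id}",
                          "type": "HAS_CONDITION", "from": patient_id,
                          "to": cond_id, "layer": "domain", "props": {}})
    return {"nodes": nodes, "edges": edges}
```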
4) LangGraph backend: clinic context graph + TrustGraph integration
langgraph-backend/Dockerfile (sketch):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
requirements.txt:
langgraph
langchain-openai
psycopg[binary]
httpx
pydantic
Example node that calls TrustGraph’s GraphRAG API to enrich the domain layer of your context_graph:
import httpx
from typing import Dict, Any
from langgraph.graph import StateGraph, START, END
TRUSTGRAPH_API_URL = "http://trustgraph-api:8080"
def enrich_with_trustgraph(state: AgentState) -> AgentState:
patient_id = state.get("patient_id")
if not patient_id:
return state
query = state.get("query") or "Explain recent medication changes"
payload = {
"flow_id": "clinic_graphrag",
"query": query,
"context": {"patient_id": patient_id},
}
with httpx.Client(timeout=30.0) as client:
resp = client.post(f"{TRUSTGRAPH_API_URL}/v1/graphrag/query", json=payload)
resp.raise_for_status()
data: Dict[str, Any] = resp.json()
# Assume TrustGraph returns nodes/edges (domain) in a simple schema
tg_graph = data.get("context_graph", {"nodes": [], "edges": []})
for n in tg_graph["nodes"]:
n.setdefault("layer", "domain")
for e in tg_graph["edges"]:
e.setdefault("layer", "domain")
ctx = state.get("context_graph") or {"nodes": [], "edges": []}
existing_nodes = {n["id"]: n for n in ctx["nodes"]}
existing_edges = {e["id"]: e for e in ctx["edges"]}
for n in tg_graph["nodes"]:
existing_nodes[n["id"]] = n
for e in tg_graph["edges"]:
existing_edges[e["id"]] = e
state["context_graph"] = {
"nodes": list(existing_nodes.values()),
"edges": list(existing_edges.values()),
}
return state
Your full LangGraph pipeline then:
- Builds the SoR + synthetic layers (SoRClient, SyntheticClient).
- Calls enrich_with_trustgraph to add domain‑layer graph context from TrustGraph.
- Runs clinical + compliance agents consuming context_graph.
5) Where the extraction DSL plugs in
You have two main options:
- Inside TrustGraph (preferred for KG build):
  - Provide a clinic_dsl_extractor step in the TrustGraph flow, implemented as a Python handler in the trustgraph-api image that:
    - Parses clinic documents with your DSL (ANTLR or internal DSL).
    - Writes entities/edges into Memgraph and embeddings into Qdrant.
- Pre‑processor feeding TrustGraph’s data directory:
  - Run a separate container / script that:
    - Reads raw clinical guidelines / SOPs.
    - Produces normalized JSON/CSV representing KG edges.
  - TrustGraph then ingests that as a data source for the graph_package step.
LangGraph stays focused on per‑patient context graphs and agent workflows; TrustGraph handles offline graph extraction + GraphRAG.
If you want, next I can:
- Sketch the clinic_dsl_extractor Python handler signature that TrustGraph would call, and how it writes to Memgraph/Qdrant, or
- Show a small end‑to‑end example query: FHIR/OMOP → LangGraph build_context_graph → TrustGraph GraphRAG → combined context_graph injected into the LLM.