Memory System Implementation Guide
Building Multi-Layer Memory Architecture for Agentic AI
Document ID: A2-MEMORY-GUIDE
Version: 1.0
Category: P1 - Implementation Guides
Memory Architecture Overview
The Four Memory Layers
| Layer | Persistence | Capacity | Access Latency | Update Frequency |
|---|---|---|---|---|
| Parametric | Permanent (model weights) | Billions of parameters | Instant | Training/fine-tuning only |
| Short-term | Session | 8K-200K tokens | Instant | Every turn |
| Long-term | Persistent | Storage-bound | 50-500 ms | As needed |
| Audit | Immutable (append-only) | Storage-bound | 100-1000 ms | Every action |
Short-Term Memory
Context Window Management
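```python
from dataclasses import dataclass, field

@dataclass
class ContextWindow:
    max_tokens: int = 128000
    reserve_tokens: int = 4000  # kept free for the model's reply
    messages: list = field(default_factory=list)

    @property
    def available_tokens(self) -> int:
        used = sum(m.token_count for m in self.messages)
        return self.max_tokens - self.reserve_tokens - used

    def add_message(self, message):
        if message.token_count > self.available_tokens:
            self._truncate_for_space(message.token_count)
        self.messages.append(message)

    def _truncate_for_space(self, needed: int) -> None:
        # Planned order: 1. remove old tool results, 2. summarize middle
        # section, 3. apply sliding window. Only step 3 is shown here.
        while self.messages and self.available_tokens < needed:
            self.messages.pop(0)
```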
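The three truncation steps listed in `_truncate_for_space` can be sketched as one standalone function. This is a minimal sketch built on several assumptions:

- Each message carries `role`, `content` and `token_count`.
- Tool outputs use `role == "tool"`.
- The first message is a system prompt that must survive.
- `summarize` is a hypothetical caller-supplied function, for example an LLM call, that condenses a list of messages into one message and reports that message's token count.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str          # assumed roles: "system", "user", "assistant", "tool"
    content: str
    token_count: int


def total_tokens(messages: List[Message]) -> int:
    return sum(m.token_count for m in messages)


def truncate_for_space(
    messages: List[Message],
    budget: int,
    summarize: Callable[[List[Message]], Message],  # hypothetical summarizer
    keep_recent: int = 4,
) -> List[Message]:
    """Shrink a copy of `messages` until it fits in `budget` tokens."""
    msgs = list(messages)
    # Step 1: drop tool results that are older than the most recent `keep_recent` messages
    if total_tokens(msgs) > budget:
        cutoff = max(len(msgs) - keep_recent, 0)
        msgs = [m for i, m in enumerate(msgs) if i >= cutoff or m.role != "tool"]
    # Step 2: collapse the middle section (between the first message and the
    # recent tail) into a single summary message
    if total_tokens(msgs) > budget and len(msgs) > keep_recent + 2:
        split = len(msgs) - keep_recent
        msgs = msgs[:1] + [summarize(msgs[1:split])] + msgs[split:]
    # Step 3: sliding window, dropping the oldest message after the first until it fits
    while total_tokens(msgs) > budget and len(msgs) > 1:
        msgs.pop(1)
    return msgs
```

Ordering the steps this way sacrifices the cheapest content first. Old tool output is usually the least valuable, summarization keeps the gist of the middle of the conversation, and the sliding window is only a last resort.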
Session State
- Current task and progress
- Extracted entities
- User preferences (session-level)
- Topic stack and pending questions
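The session-state items above map naturally onto a small per-session record. This is a minimal sketch. The field names (`current_task`, `progress`, `entities`, `preferences`, `topic_stack`, `pending_questions`) are hypothetical and were chosen to mirror the list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SessionState:
    session_id: str
    current_task: Optional[str] = None
    progress: float = 0.0                                      # assumed 0.0-1.0 completion fraction
    entities: Dict[str, Any] = field(default_factory=dict)     # entity name -> extracted value
    preferences: Dict[str, Any] = field(default_factory=dict)  # session-level only, not persisted
    topic_stack: List[str] = field(default_factory=list)
    pending_questions: List[str] = field(default_factory=list)

    def push_topic(self, topic: str) -> None:
        self.topic_stack.append(topic)

    def pop_topic(self) -> Optional[str]:
        # Return to the previous topic once the current one is resolved
        return self.topic_stack.pop() if self.topic_stack else None
```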
Working Memory (Scratchpad)
- Current reasoning chain
- Intermediate computations
- Hypotheses being evaluated
- Retrieved evidence buffer
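A scratchpad holding these items might look like the following minimal sketch. It assumes the evidence buffer is bounded, with a hypothetical `max_evidence` limit, and that each hypothesis is tracked with a numeric confidence.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, List, Optional


@dataclass
class WorkingMemory:
    reasoning_chain: List[str] = field(default_factory=list)    # current reasoning steps
    computations: Dict[str, Any] = field(default_factory=dict)  # named intermediate results
    hypotheses: Dict[str, float] = field(default_factory=dict)  # hypothesis -> confidence
    max_evidence: int = 20                                      # assumed buffer size
    evidence: Deque[str] = field(init=False, default_factory=deque)

    def __post_init__(self) -> None:
        # Bounded buffer: once full, the oldest evidence is evicted first
        self.evidence = deque(maxlen=self.max_evidence)

    def add_step(self, thought: str) -> None:
        self.reasoning_chain.append(thought)

    def add_evidence(self, item: str) -> None:
        self.evidence.append(item)

    def best_hypothesis(self) -> Optional[str]:
        # Highest-confidence hypothesis, or None if nothing is being evaluated
        if not self.hypotheses:
            return None
        return max(self.hypotheses, key=self.hypotheses.__getitem__)
```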
Long-Term Memory
Vector Store Implementation
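```python
from typing import Any, Dict, List, Optional, Protocol, Tuple

class VectorStore(Protocol):
    async def upsert(self, id: str, vector: List[float], metadata: Dict[str, Any]) -> bool: ...
    # Assumed result shape: (id, similarity score, metadata), best match first
    async def query(self, vector: List[float], top_k: int,
                    filter: Optional[Dict[str, Any]] = None) -> List[Tuple[str, float, Dict[str, Any]]]: ...
    async def delete(self, id: str) -> bool: ...
```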
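For tests and small prototypes, an in-memory class can satisfy the interface above. This is a minimal sketch under three assumptions:

- Similarity is cosine similarity.
- `filter` means an exact match on every given metadata key.
- `query` returns `(id, score, metadata)` tuples.

It uses brute-force search and does not replace a real backend.

```python
import asyncio
import math
from typing import Any, Dict, List, Optional, Tuple


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryVectorStore:
    """Brute-force store: O(n) per query, suitable only for small collections."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[List[float], Dict[str, Any]]] = {}

    async def upsert(self, id: str, vector: List[float], metadata: Dict[str, Any]) -> bool:
        self._items[id] = (list(vector), dict(metadata))
        return True

    async def query(self, vector: List[float], top_k: int,
                    filter: Optional[Dict[str, Any]] = None) -> List[Tuple[str, float, Dict[str, Any]]]:
        results = []
        for item_id, (vec, meta) in self._items.items():
            # Assumed filter semantics: every filter key must match exactly
            if filter and any(meta.get(k) != v for k, v in filter.items()):
                continue
            results.append((item_id, _cosine(vector, vec), meta))
        results.sort(key=lambda r: r[1], reverse=True)
        return results[:top_k]

    async def delete(self, id: str) -> bool:
        return self._items.pop(id, None) is not None


if __name__ == "__main__":
    async def main() -> None:
        store = InMemoryVectorStore()
        await store.upsert("doc-1", [1.0, 0.0], {"type": "doc"})
        print(await store.query([1.0, 0.0], top_k=1))

    asyncio.run(main())
```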
Recommended Backends:
- < 1M vectors: pgvector
- > 1M vectors: Pinecone, Weaviate
Knowledge Graph
- Entity nodes with embeddings
- Relationship edges with weights
- Semantic + structural queries
- Neo4j or Amazon Neptune
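The "semantic + structural" query pattern can be illustrated without a graph database. A semantic step picks seed entities by embedding similarity, then a structural step spreads scores along weighted edges. This is a minimal in-memory sketch: the class `KnowledgeGraph` and the score-times-weight propagation rule are assumptions. Neo4j or Neptune would express the structural step in their own query languages.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class KnowledgeGraph:
    def __init__(self) -> None:
        self.embeddings: Dict[str, List[float]] = {}
        # src -> [(relation, dst, weight)]
        self.edges: Dict[str, List[Tuple[str, str, float]]] = defaultdict(list)

    def add_entity(self, name: str, embedding: List[float]) -> None:
        self.embeddings[name] = embedding

    def add_relation(self, src: str, relation: str, dst: str, weight: float = 1.0) -> None:
        self.edges[src].append((relation, dst, weight))

    def query(self, embedding: List[float], seeds: int = 2, hops: int = 1) -> Dict[str, float]:
        # Semantic step: entities closest to the query embedding become seeds
        ranked = sorted(self.embeddings, key=lambda n: cosine(embedding, self.embeddings[n]), reverse=True)
        scores = {n: cosine(embedding, self.embeddings[n]) for n in ranked[:seeds]}
        # Structural step: propagate score * edge weight for `hops` hops, keeping the max per node
        frontier = dict(scores)
        for _ in range(hops):
            nxt: Dict[str, float] = {}
            for node, score in frontier.items():
                for _rel, dst, weight in self.edges.get(node, []):
                    s = score * weight
                    if s > scores.get(dst, 0.0) and s > nxt.get(dst, 0.0):
                        nxt[dst] = s
            scores.update(nxt)
            frontier = nxt
        return scores
```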
Episodic Memory
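```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Episode:
    task: str
    actions: List[Dict]      # ordered steps / tool calls taken
    outcome: str
    success: bool
    reflections: List[str]   # post-hoc self-critique
    lessons: List[str]       # reusable takeaways for future tasks
    embedding: List[float]   # vector used for similarity recall
```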
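Recalling similar episodes, the manager's `recall_similar_episodes` operation, can be sketched as nearest-neighbour search over episode embeddings. This is a minimal sketch under stated assumptions:

- Embeddings are precomputed and share one dimension.
- The `success_only` flag is hypothetical.
- The `lessons_for_prompt` helper is hypothetical.

The `Episode` class is repeated here only so the example runs on its own.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Episode:  # mirrors the Episode fields described above
    task: str
    actions: List[Dict] = field(default_factory=list)
    outcome: str = ""
    success: bool = False
    reflections: List[str] = field(default_factory=list)
    lessons: List[str] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_similar_episodes(query_embedding: List[float], episodes: List[Episode],
                            top_k: int = 3, success_only: bool = False) -> List[Episode]:
    # Optionally restrict recall to episodes that succeeded
    candidates = [e for e in episodes if e.success or not success_only]
    candidates.sort(key=lambda e: cosine(query_embedding, e.embedding), reverse=True)
    return candidates[:top_k]


def lessons_for_prompt(episodes: List[Episode]) -> List[str]:
    # Deduplicate lessons across episodes while preserving first-seen order
    seen: Dict[str, None] = {}
    for ep in episodes:
        for lesson in ep.lessons:
            seen.setdefault(lesson, None)
    return list(seen)
```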
Audit Memory
Immutable Action Log
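```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict

class ActionType(str, Enum):
    TOOL_CALL = "tool_call"
    DECISION = "decision"
    STATE_CHANGE = "state_change"

@dataclass(frozen=True)  # records are never modified after being written
class AuditRecord:
    id: str
    timestamp: float
    action_type: ActionType
    agent_id: str
    action_name: str
    action_input: Dict[str, Any]
    action_output: Dict[str, Any]
    previous_hash: str  # hash of the preceding record (chain integrity)
    record_hash: str    # hash of this record's own contents
```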
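The `previous_hash`/`record_hash` pair forms a hash chain. Altering any earlier record invalidates every hash after it. This is a minimal in-memory sketch under three assumptions:

- Hashes are SHA-256 over canonical JSON (sorted keys) of all fields except `record_hash`.
- Records are plain dicts, for brevity.
- The first record points at a hypothetical all-zero `GENESIS_HASH`.

Durable storage and concurrency control are out of scope.

```python
import hashlib
import json
import time
import uuid
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed sentinel "previous hash" for the first record


def compute_hash(record: Dict[str, Any]) -> str:
    # Hash every field except record_hash itself, serialized as canonical JSON
    body = {k: v for k, v in record.items() if k != "record_hash"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log (in-memory sketch, no durability)."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def append(self, agent_id: str, action_type: str, action_name: str,
               action_input: Dict[str, Any], action_output: Dict[str, Any]) -> Dict[str, Any]:
        record = {
            "id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "action_type": action_type,
            "agent_id": agent_id,
            "action_name": action_name,
            "action_input": action_input,
            "action_output": action_output,
            "previous_hash": self._records[-1]["record_hash"] if self._records else GENESIS_HASH,
        }
        record["record_hash"] = compute_hash(record)
        self._records.append(record)
        return record

    def verify(self) -> bool:
        # Recompute each hash and check that every link points at its predecessor
        prev = GENESIS_HASH
        for rec in self._records:
            if rec["previous_hash"] != prev or rec["record_hash"] != compute_hash(rec):
                return False
            prev = rec["record_hash"]
        return True
```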
Decision Trace
- Decision point with context
- Options considered
- Selected option with rationale
- Confidence score
- Relevant memories/evidence used
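A decision trace with the fields listed above could be captured in a record like the one below. `record_decision` is assumed to return such a record. This is a minimal sketch: the field names are hypothetical, and so is the validation policy of rejecting a selected option that was not among those considered.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DecisionPoint:
    decision_id: str
    context: Dict[str, Any]     # decision point context
    options: List[str]          # options considered
    selected: str               # selected option
    rationale: str
    confidence: float           # assumed range 0.0-1.0
    evidence_ids: List[str] = field(default_factory=list)  # memories/evidence consulted

    def __post_init__(self) -> None:
        # Reject traces that would be inconsistent when audited later
        if self.selected not in self.options:
            raise ValueError("selected option must be one of the options considered")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
```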
Memory Patterns by Paradigm
| Paradigm | Primary Memory | Key Components |
|---|---|---|
| LSR | Context window | Minimal retrieval |
| GS | Long-term + Evidence ledger | Vector store, citations |
| EP | Episodic + Reflexion | Working memory, learning |
| VE | Audit + State register | Immutable logs, protocols |
Unified Memory Manager
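```python
from __future__ import annotations  # referenced types are defined in earlier sections
from typing import Any, Dict, Protocol

class MemoryManager(Protocol):
    # Short-term
    def get_session(self, session_id: str) -> SessionState: ...
    def get_working_memory(self, session_id: str) -> WorkingMemory: ...
    # Long-term
    async def store_knowledge(self, content: str, metadata: Dict[str, Any], type: str): ...
    async def retrieve_knowledge(self, query: str, top_k: int, type: str): ...
    # Episodic
    async def store_episode(self, episode: Episode): ...
    async def recall_similar_episodes(self, query: str, top_k: int): ...
    # Audit
    async def log_action(self, **kwargs: Any) -> AuditRecord: ...
    async def record_decision(self, **kwargs: Any) -> DecisionPoint: ...
    # Composite
    async def build_context(self, session_id: str, query: str,
                            include_episodes: bool, include_knowledge: bool): ...
```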
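The composite `build_context` step could combine the layers as follows. It gathers long-term and episodic results concurrently, then assembles a single prompt section. This is a minimal sketch under three assumptions:

- Both retrievers are hypothetical async callables taking `(query, top_k)` and returning text snippets.
- Session state is a plain dict.
- The bracketed section labels are an arbitrary formatting choice.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical retriever signature: (query, top_k) -> list of text snippets
Retriever = Callable[[str, int], Awaitable[List[str]]]


async def _empty() -> List[str]:
    return []


async def build_context(
    session: Dict[str, str],
    query: str,
    retrieve_knowledge: Retriever,
    recall_episodes: Retriever,
    include_episodes: bool = True,
    include_knowledge: bool = True,
    top_k: int = 3,
) -> str:
    # Long-term and episodic lookups are independent, so run them concurrently
    knowledge, episodes = await asyncio.gather(
        retrieve_knowledge(query, top_k) if include_knowledge else _empty(),
        recall_episodes(query, top_k) if include_episodes else _empty(),
    )
    sections = [f"[Session]\nTask: {session.get('current_task', 'none')}"]
    if knowledge:
        sections.append("[Knowledge]\n" + "\n".join(f"- {k}" for k in knowledge))
    if episodes:
        sections.append("[Past episodes]\n" + "\n".join(f"- {e}" for e in episodes))
    sections.append(f"[Query]\n{query}")
    return "\n\n".join(sections)
```

Placing the query last keeps it adjacent to where the model starts generating. The `include_*` flags let paradigms that rely on minimal retrieval skip the long-term lookups entirely.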
Performance Optimization
Caching Strategies
| Cache Type | TTL | Use Case |
|---|---|---|
| Embedding cache | 24h | Repeated queries |
| Query result cache | 5m | Hot queries |
| Session cache | Session | Active sessions |
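The embedding and query-result caches in the table above differ mainly in TTL. A single expiring-cache class can therefore serve both. This is a minimal sketch under these assumptions:

- Entries expire a fixed time after insertion and are evicted lazily on read.
- Because of lazy eviction, stored `None` values are indistinguishable from misses.
- Embedding keys combine a model name with the text, under a hypothetical `embedding_key` scheme.

A shared deployment would more likely use Redis key expiry than an in-process dict.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Minimal TTL cache: entries expire `ttl_seconds` after insertion."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


# TTLs taken from the caching table
embedding_cache = TTLCache(ttl_seconds=24 * 3600)
query_cache = TTLCache(ttl_seconds=5 * 60)


def embedding_key(text: str, model: str) -> str:
    # Assumed key scheme: include the model so a model change never returns stale vectors
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()
```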
Scaling Recommendations
| Component | Horizontal | Vertical |
|---|---|---|
| Vector store | ✓ (sharding) | ✓ |
| Knowledge graph | Limited | ✓ |
| Audit log | ✓ (partitioning) | ✓ |
| Session state | ✓ (Redis cluster) | Limited |
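The horizontal options in the table depend on two routing decisions: which vector-store shard owns an ID, and which audit partition a record belongs to. This is a minimal sketch of both. It assumes a fixed shard count with modulo hashing and one audit partition per UTC day. The `audit_YYYY_MM_DD` naming is hypothetical. Modulo routing remaps most keys when the shard count changes, and consistent hashing avoids that at the cost of extra complexity.

```python
import hashlib
from datetime import datetime, timezone


def shard_for(vector_id: str, num_shards: int) -> int:
    # SHA-256 is stable across processes, unlike Python's salted built-in hash()
    digest = hashlib.sha256(vector_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def audit_partition(timestamp: float) -> str:
    # One append-only partition per UTC day, e.g. "audit_2024_05_01"
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return day.strftime("audit_%Y_%m_%d")
```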
Document maintained by CODITECT Platform Team