
Memory System Implementation Guide

Building Multi-Layer Memory Architecture for Agentic AI

Document ID: A2-MEMORY-GUIDE
Version: 1.0
Category: P1 - Implementation Guides


Memory Architecture Overview

The Four Memory Layers

| Layer | Persistence | Capacity | Access Speed | Update Frequency |
|---|---|---|---|---|
| Parametric | Permanent | Billions params | Instant | Training only |
| Short-term | Session | 8K-200K tokens | Instant | Every turn |
| Long-term | Persistent | Unlimited | 50-500 ms | As needed |
| Audit | Immutable | Unlimited | 100-1000 ms | Every action |

Short-Term Memory

Context Window Management

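from dataclasses import dataclass, field
from typing import List

@dataclass
class ContextWindow:
    max_tokens: int = 128000
    reserve_tokens: int = 4000
    messages: List = field(default_factory=list)

    @property
    def available_tokens(self) -> int:
        # Assumed definition: budget minus reserve minus tokens already used
        used = sum(m.token_count for m in self.messages)
        return self.max_tokens - self.reserve_tokens - used

    def add_message(self, message):
        # Make room first if the new message does not fit
        if message.token_count > self.available_tokens:
            self._truncate_for_space(message.token_count)
        self.messages.append(message)

    def _truncate_for_space(self, needed):
        # 1. Remove old tool results
        # 2. Summarize middle section
        # 3. Apply sliding window
        raise NotImplementedError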
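
The three steps listed in `_truncate_for_space` can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the guide's implementation: the `Message` shape, the `keep_recent` parameter, and `SUMMARY_TOKENS` are hypothetical, and step 2 inserts a placeholder message where a real system would call a model to summarize.

```python
from dataclasses import dataclass
from typing import List

SUMMARY_TOKENS = 20  # hypothetical cost of the placeholder summary


@dataclass
class Message:
    # Hypothetical message shape; the guide only implies `token_count`.
    role: str  # "system", "user", "assistant", or "tool"
    content: str
    token_count: int


def total_tokens(messages: List[Message]) -> int:
    return sum(m.token_count for m in messages)


def truncate_for_space(messages: List[Message], budget: int,
                       keep_recent: int = 2) -> List[Message]:
    """Return a copy of `messages` trimmed toward `budget` tokens."""
    result = list(messages)

    # 1. Remove old tool results, oldest first, sparing the most recent turns.
    i = 0
    while total_tokens(result) > budget and i < len(result) - keep_recent:
        if result[i].role == "tool":
            del result[i]
        else:
            i += 1

    # 2. Summarize the middle section (placeholder only). The first message
    #    and the last `keep_recent` messages are kept verbatim.
    if total_tokens(result) > budget and len(result) > keep_recent + 1:
        middle = result[1:len(result) - keep_recent]
        summary = Message("assistant", f"[summary of {len(middle)} messages]",
                          SUMMARY_TOKENS)
        result = [result[0], summary] + result[len(result) - keep_recent:]

    # 3. Sliding window: drop the oldest message after the first until it fits.
    while total_tokens(result) > budget and len(result) > 2:
        del result[1]
    return result
```

The steps run from least to most destructive, so a conversation that fits after dropping stale tool output is never summarized or windowed.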

Session State

  • Current task and progress
  • Extracted entities
  • User preferences (session-level)
  • Topic stack and pending questions
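
The session-level items listed above could be held in one small structure. The sketch below is illustrative only: every field and method name is an assumption chosen to mirror the bullets, and `progress` is assumed to be a fraction between 0.0 and 1.0.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SessionState:
    # Field names are illustrative; they mirror the four bullets above.
    current_task: Optional[str] = None
    progress: float = 0.0  # assumed: fraction complete, 0.0 to 1.0
    entities: Dict[str, Any] = field(default_factory=dict)
    preferences: Dict[str, Any] = field(default_factory=dict)
    topic_stack: List[str] = field(default_factory=list)
    pending_questions: List[str] = field(default_factory=list)

    def push_topic(self, topic: str) -> None:
        # Entering a sub-topic keeps the earlier topic underneath it.
        self.topic_stack.append(topic)

    def pop_topic(self) -> Optional[str]:
        # Leave the current topic and return it; None if the stack is empty.
        return self.topic_stack.pop() if self.topic_stack else None

    @property
    def current_topic(self) -> Optional[str]:
        return self.topic_stack[-1] if self.topic_stack else None
```

A stack suits topics because a digression can be closed by popping it, which restores the earlier topic without re-deriving it from the transcript.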

Working Memory (Scratchpad)

  • Current reasoning chain
  • Intermediate computations
  • Hypotheses being evaluated
  • Retrieved evidence buffer

Long-Term Memory

Vector Store Implementation

from typing import List, Protocol, Tuple

class VectorStore(Protocol):
    async def upsert(self, id, vector, metadata) -> bool: ...
    async def query(self, vector, top_k, filter) -> List[Tuple]: ...
    async def delete(self, id) -> bool: ...
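
To make the interface concrete, here is a brute-force in-memory store with the same three methods. It is a sketch for tests and prototypes, not a production backend. Two details are assumptions the guide does not specify: `filter` is treated as an exact-match dictionary over metadata, and `query` returns `(id, score, metadata)` tuples ranked by cosine similarity.

```python
import asyncio
import math
from typing import Any, Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force store: every query scans all stored vectors."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], Dict[str, Any]]] = {}

    async def upsert(self, id, vector, metadata) -> bool:
        self._rows[id] = (vector, metadata)  # insert or overwrite
        return True

    async def query(self, vector, top_k, filter=None) -> List[Tuple]:
        # Assumed semantics: `filter` is an exact-match dict over metadata.
        hits = [
            (row_id, cosine(vector, vec), meta)
            for row_id, (vec, meta) in self._rows.items()
            if not filter or all(meta.get(k) == v for k, v in filter.items())
        ]
        hits.sort(key=lambda h: h[1], reverse=True)  # most similar first
        return hits[:top_k]

    async def delete(self, id) -> bool:
        # True only if the id was present.
        return self._rows.pop(id, None) is not None


async def main() -> None:
    store = InMemoryVectorStore()
    await store.upsert("a", [1.0, 0.0], {"kind": "doc"})
    await store.upsert("b", [0.0, 1.0], {"kind": "doc"})
    print(await store.query([1.0, 0.0], top_k=1))


if __name__ == "__main__":
    asyncio.run(main())
```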

Recommended Backends:

  • < 1M vectors: pgvector
  • > 1M vectors: Pinecone, Weaviate

Knowledge Graph

  • Entity nodes with embeddings
  • Relationship edges with weights
  • Semantic + structural queries
  • Neo4j or Amazon Neptune
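
The data model in the bullets above can be illustrated in plain Python before committing to a graph database. This is a sketch only: the class and method names are hypothetical, it allows one edge per source and target pair, and it does not reflect the Neo4j or Neptune APIs. `nearest` stands in for a semantic query and `neighbors` for a structural one.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entity:
    name: str
    embedding: List[float]
    # Outgoing edges: target name -> (relation label, weight)
    edges: Dict[str, Tuple[str, float]] = field(default_factory=dict)


class KnowledgeGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Entity] = {}

    def add_entity(self, name: str, embedding: List[float]) -> None:
        self.nodes[name] = Entity(name, embedding)

    def relate(self, src: str, relation: str, dst: str, weight: float) -> None:
        self.nodes[src].edges[dst] = (relation, weight)

    def nearest(self, query: List[float]) -> str:
        # Semantic step: entity whose embedding is most similar to the query.
        return max(self.nodes,
                   key=lambda n: _cosine(self.nodes[n].embedding, query))

    def neighbors(self, name: str, min_weight: float = 0.0) -> List[str]:
        # Structural step: follow outgoing edges at or above a weight threshold.
        return [dst for dst, (_, w) in self.nodes[name].edges.items()
                if w >= min_weight]
```

A combined query first calls `nearest` to find an entry point, then `neighbors` to expand from it along sufficiently strong relationships.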

Episodic Memory

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Episode:
    task: str
    actions: List[Dict]
    outcome: str
    success: bool
    reflections: List[str]
    lessons: List[str]
    embedding: List[float]

Audit Memory

Immutable Action Log

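from dataclasses import dataclass
from enum import Enum
from typing import Dict

class ActionType(Enum):
    TOOL_CALL = "tool_call"
    DECISION = "decision"
    STATE_CHANGE = "state_change"

@dataclass(frozen=True)
class AuditRecord:
    id: str
    timestamp: float
    action_type: ActionType
    agent_id: str
    action_name: str
    action_input: Dict
    action_output: Dict
    previous_hash: str  # Chain integrity
    record_hash: str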
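
The `previous_hash` and `record_hash` fields suggest a hash chain, in which each record commits to the one before it. The sketch below shows one way to build and verify such a chain. The guide does not specify the hash function or serialization, so SHA-256 over canonical JSON, the all-zero `GENESIS_HASH` sentinel, and the helper names are all assumptions; records are plain dictionaries to keep the example self-contained.

```python
import hashlib
import json
import time
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed sentinel for the first record's previous_hash


def compute_hash(record: Dict[str, Any]) -> str:
    # Hash every field except record_hash itself, using canonical JSON
    # (sorted keys, fixed separators) so the digest is reproducible.
    payload = {k: v for k, v in record.items() if k != "record_hash"}
    encoded = json.dumps(payload, sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(log: List[Dict[str, Any]], **fields: Any) -> Dict[str, Any]:
    record = dict(fields)
    record.setdefault("timestamp", time.time())
    # Link to the previous record, or to the sentinel for the first one.
    record["previous_hash"] = log[-1]["record_hash"] if log else GENESIS_HASH
    record["record_hash"] = compute_hash(record)
    log.append(record)
    return record


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    expected_previous = GENESIS_HASH
    for record in log:
        if record["previous_hash"] != expected_previous:
            return False  # link to the prior record is broken
        if record["record_hash"] != compute_hash(record):
            return False  # record contents changed after hashing
        expected_previous = record["record_hash"]
    return True
```

Because each hash covers `previous_hash`, editing any earlier record invalidates every record after it, which is what makes tampering detectable. The chain detects changes but does not prevent them; storage-level immutability is a separate concern.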

Decision Trace

  • Decision point with context
  • Options considered
  • Selected option with rationale
  • Confidence score
  • Relevant memories/evidence used
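
A decision trace entry with the fields listed above might look like the following. The guide names a `DecisionPoint` type elsewhere but does not define it, so the field names, the 0.0 to 1.0 confidence range, and the validation rules here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionPoint:
    # Field names are illustrative; each maps to one bullet above.
    context: str
    options: List[str]
    selected: str
    rationale: str
    confidence: float  # assumed range: 0.0 to 1.0
    evidence_ids: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject traces that could not be audited meaningfully later.
        if self.selected not in self.options:
            raise ValueError("selected option must be one of the options considered")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

Validating at construction time keeps malformed traces out of the log, where they would be hard to correct after the fact.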

Memory Patterns by Paradigm

| Paradigm | Primary Memory | Key Components |
|---|---|---|
| LSR | Context window | Minimal retrieval |
| GS | Long-term + Evidence ledger | Vector store, citations |
| EP | Episodic + Reflexion | Working memory, learning |
| VE | Audit + State register | Immutable logs, protocols |

Unified Memory Manager

from __future__ import annotations  # lets annotations name types defined elsewhere

class MemoryManager:
    # Short-term
    def get_session(self, session_id) -> SessionState: ...
    def get_working_memory(self, session_id) -> WorkingMemory: ...

    # Long-term
    async def store_knowledge(self, content, metadata, type): ...
    async def retrieve_knowledge(self, query, top_k, type): ...

    # Episodic
    async def store_episode(self, episode): ...
    async def recall_similar_episodes(self, query, top_k): ...

    # Audit
    async def log_action(self, **kwargs) -> AuditRecord: ...
    async def record_decision(self, **kwargs) -> DecisionPoint: ...

    # Composite
    async def build_context(self, session_id, query,
                            include_episodes, include_knowledge): ...

Performance Optimization

Caching Strategies

| Cache Type | TTL | Use Case |
|---|---|---|
| Embedding cache | 24h | Repeated queries |
| Query result cache | 5m | Hot queries |
| Session cache | Session | Active sessions |
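
The time-based entries in the table can be served by a small TTL cache. The sketch below is one possible in-process version; the `TTLCache` class and its injectable `clock` parameter are assumptions for illustration, and a shared deployment would more likely rely on an external cache.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._items: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        # Each write restarts the entry's time-to-live.
        self._items[key] = (self._clock() + self._ttl, value)


# TTLs from the table above, in seconds
embedding_cache = TTLCache(ttl_seconds=24 * 60 * 60)
query_result_cache = TTLCache(ttl_seconds=5 * 60)
```

Expired entries are only removed when read, so a long-running process with many distinct keys would also need a size bound or a periodic sweep.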

Scaling Recommendations

| Component | Horizontal | Vertical |
|---|---|---|
| Vector store | ✓ (sharding) | |
| Knowledge graph | Limited | |
| Audit log | ✓ (partitioning) | |
| Session state | ✓ (Redis cluster) | Limited |

Document maintained by CODITECT Platform Team