
Better Methods: Advanced Alternatives to File-Based External Memory

Assessment of the Video's Approach

What It Gets Right

  • Core insight is valid: Externalized state enables unbounded processing
  • Checkpoint-resume pattern: Fundamental technique that scales
  • Accessibility: Zero-code barrier for non-technical users
  • Immediate utility: Works today with Claude Code

Critical Limitations

| Limitation | Impact | Severity |
|---|---|---|
| Linear processing | No parallelization, O(n) time complexity | High |
| No semantic awareness | Processes files in order, not by relevance | High |
| Full re-read on resume | Reads entire file set each checkpoint | Medium |
| No pre-filtering | LLM processes raw data, high token waste | High |
| Single-level memory | No hierarchy, flat structure | Medium |
| No quality validation | Hallucinations propagate through summaries | High |
| Manual prompt engineering | Each use case requires custom prompt | Medium |

Alternative Method 1: Map-Reduce Pattern

Concept

Split documents into chunks, process in parallel ("map"), then aggregate results ("reduce"). Enables horizontal scaling and dramatically reduces wall-clock time.

Architecture

                 MAP PHASE (parallel)

 ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐
 │ Doc 1  │   │ Doc 2  │   │ Doc 3  │   │ Doc N  │
 └───┬────┘   └───┬────┘   └───┬────┘   └───┬────┘
     ▼            ▼            ▼            ▼
 ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐
 │Summary1│   │Summary2│   │Summary3│   │SummaryN│
 └───┬────┘   └───┬────┘   └───┬────┘   └───┬────┘
     └────────────┴─────┬──────┴────────────┘
                        ▼
                  REDUCE PHASE
 ┌──────────────────────────────────────────────┐
 │             Aggregate Summaries              │
 │      (Combine, Deduplicate, Synthesize)      │
 └──────────────────────┬───────────────────────┘
                        ▼
 ┌──────────────────────────────────────────────┐
 │                 Final Output                 │
 └──────────────────────────────────────────────┘

Advantages Over Video Method

  • 50 files in parallel vs sequential → 10-50x faster
  • Independent failure isolation → one bad file doesn't block others
  • Optimal token utilization → each agent processes within context window
  • Stateless map workers → no checkpoint complexity per worker

Implementation (LangChain Example)

from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_anthropic import ChatAnthropic

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=10000,
    chunk_overlap=1000
)
docs = text_splitter.split_documents(raw_documents)

# Map-reduce chain (MAP_PROMPT and REDUCE_PROMPT are defined elsewhere)
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=MAP_PROMPT,
    combine_prompt=REDUCE_PROMPT
)

result = chain.invoke({"input_documents": docs})

When to Use

  • Batch processing of independent documents
  • Time-sensitive analysis (need results fast)
  • Documents don't have cross-references
  • Infrastructure supports parallel API calls

Alternative Method 2: Hierarchical Summarization

Concept

Build a pyramid of summaries: summarize chunks → summarize summaries → summarize the summary of summaries. Each level compresses information while preserving key insights.

Architecture

Level 3 (Final):                  Master Summary
                                   (500 tokens)
                                         │
                             ┌───────────┴────────────┐
Level 2 (Sections):  Section Summary A        Section Summary B
                        (2K tokens)              (2K tokens)
                             │                        │
                     ┌───────┼───────┐           ┌────┴────┐
Level 1 (Chunks): C1 Sum  C2 Sum  C3 Sum      C4 Sum    C5 Sum
                     ▲       ▲       ▲           ▲         ▲
Level 0 (Raw):     Doc1    Doc2    Doc3        Doc4      Doc5

Key Insight: Compression Ratios

| Level | Input Size | Output Size | Compression |
|---|---|---|---|
| L0 → L1 | 10K tokens | 1K tokens | 10:1 |
| L1 → L2 | 5K tokens (5×1K) | 2K tokens | 2.5:1 |
| L2 → L3 | 4K tokens (2×2K) | 500 tokens | 8:1 |

Advantages Over Video Method

  • Multi-level retrieval: Query at appropriate granularity
  • Progressive disclosure: Start with summary, drill down as needed
  • Model flexibility: Use cheaper models for lower levels
  • Parallelizable at each level: All L1 summaries can run in parallel

Critical Warning: Hallucination Propagation

Research from Pieces.app shows that over-preprocessing increases hallucinations:

"The more pre-processing we did, the more hallucinations were created, and the worse the final summaries."

Mitigation:

  • Preserve exact quotes at lower levels
  • Use extractive summaries (exact text selection) before abstractive
  • Ground upper levels in lower-level citations
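One way to apply the extractive-before-abstractive rule is to select high-signal sentences verbatim before any LLM sees the text. Below is a minimal frequency-based sketch; it stands in for a proper TF-IDF or TextRank scorer, and `extract_key_sentences` is a hypothetical helper name, not a library function:

```python
import re
from collections import Counter

def extract_key_sentences(text: str, num_sentences: int = 2) -> list:
    """Pick the highest-scoring sentences verbatim -- no paraphrasing,
    so lower pyramid levels stay grounded in exact quotes."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Preserve document order so quotes keep their original context
    return [s for s in sentences if s in top]
```

The abstractive pass at the next level can then cite these sentences directly instead of re-summarizing raw text.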

Implementation Pattern

import asyncio
from typing import List

# Helper functions (split_into_chunks, summarize_chunk, group_related_summaries,
# merge_summaries, create_master_summary) are assumed defined elsewhere.

async def hierarchical_summarize(
    documents: List[str],
    chunk_size: int = 10000,
    compression_ratio: float = 0.1
) -> HierarchySummary:
    """Build hierarchical summary pyramid"""

    # Level 0 → Level 1: Chunk summaries (parallel)
    chunks = split_into_chunks(documents, chunk_size)
    level_1 = await asyncio.gather(*[
        summarize_chunk(chunk, target_size=int(chunk_size * compression_ratio))
        for chunk in chunks
    ])

    # Level 1 → Level 2: Section summaries
    sections = group_related_summaries(level_1, group_size=5)
    level_2 = await asyncio.gather(*[
        merge_summaries(section)
        for section in sections
    ])

    # Level 2 → Level 3: Master summary
    level_3 = await create_master_summary(level_2)

    return HierarchySummary(
        master=level_3,
        sections=level_2,
        chunks=level_1,
        raw_count=len(documents)
    )

When to Use

  • Long-form document analysis (books, reports, transcripts)
  • Need both high-level overview AND ability to drill down
  • Information density varies across document
  • Query complexity varies (simple → detailed)

Alternative Method 3: RAG with Vector Store

Concept

Instead of processing all documents sequentially, embed documents into a vector database and retrieve only the most relevant chunks for each query. Process 10% of data, get 90% of value.

Architecture

INGESTION PHASE (One-time):
┌──────────────────────────────────────────────────────────────┐
│ ┌────────┐ ┌────────────┐ ┌─────────────────────┐ │
│ │Raw Docs│───►│ Chunker + │───►│ Vector Database │ │
│ │ │ │ Embedder │ │ (Pinecone/Chroma) │ │
│ └────────┘ └────────────┘ └─────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

QUERY PHASE (Per-request):
┌──────────────────────────────────────────────────────────────┐
│ ┌────────┐ ┌────────────┐ ┌─────────────────────┐ │
│ │ Query │───►│ Embed + │───►│ Top-K Retrieval │ │
│ │ │ │ Search │ │ (k=5-20 chunks) │ │
│ └────────┘ └────────────┘ └──────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ LLM Generation │ │
│ │ (with context) │ │
│ └─────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Advantages Over Video Method

  • Semantic relevance: Retrieve by meaning, not file order
  • Massive scale: Handle millions of documents
  • Query-specific context: Only load what's needed
  • Incremental updates: Add new docs without reprocessing all
  • Sub-second retrieval: Index enables O(log n) search
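The incremental-update advantage is easiest to see in miniature. The sketch below uses toy bag-of-words "embeddings" in place of a real embedding model, and `MiniVectorStore` is a hypothetical class, not Pinecone's or Chroma's API; the point is that adding documents later embeds only the new ones, with no re-indexing of the existing corpus:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MiniVectorStore:
    def __init__(self):
        self.entries = []  # (text, vector) pairs

    def add(self, docs):
        # Incremental: only the new docs are embedded
        self.entries.extend((d, embed(d)) for d in docs)

    def search(self, query, k=2):
        qv = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MiniVectorStore()
store.add(["refund policy for damaged goods", "shipping times for EU orders"])
store.add(["customer pain points around onboarding"])  # added later, no re-index
print(store.search("onboarding pain points", k=1))
# → ['customer pain points around onboarding']
```

A production store replaces `embed` with a model call and `search` with an approximate-nearest-neighbor index, which is where the sub-second O(log n) retrieval comes from.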

RAG Variants for Different Use Cases

| Variant | Description | Best For |
|---|---|---|
| Traditional RAG | Simple retrieve + generate | FAQ, simple queries |
| Long RAG | Retrieve entire sections, not snippets | Context-heavy analysis |
| Self-RAG | Model decides when to retrieve | Complex reasoning |
| Corrective RAG | Validates and corrects retrieved context | High-accuracy needs |
| GraphRAG | Knowledge graph + vector retrieval | Entity relationships |
| Adaptive RAG | Dynamic strategy based on query | Mixed query types |

Implementation (Simple RAG)

from langchain_community.vectorstores import Chroma
from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA

# Create vector store (one-time; split_docs and embedding_model prepared earlier)
vectorstore = Chroma.from_documents(
    documents=split_docs,
    embedding=embedding_model,
    persist_directory="./chroma_db"
)

# Create retrieval chain
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10}
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatAnthropic(model="claude-sonnet-4-20250514"),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

# Query
result = qa_chain.invoke({"query": "What are customer pain points?"})

When to Use

  • Large document corpus (>100 documents)
  • Repeated queries against same corpus
  • Need source attribution
  • Query intent varies widely
  • Real-time response requirements

Alternative Method 4: Multi-Level Memory Hierarchy

Concept

Inspired by human memory: maintain separate memory systems operating at different timescales and abstraction levels.

Architecture

┌─────────────────────────────────────────────────────────────┐
│ MEMORY HIERARCHY │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ WORKING MEMORY (Immediate) │ │
│ │ - Current conversation context │ │
│ │ - Active task state │ │
│ │ - Scope: Last 5-10 interactions │ │
│ │ - Storage: In-context (prompt) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ EPISODIC MEMORY (Session) │ │
│ │ - Important past interactions │ │
│ │ - Key decisions and outcomes │ │
│ │ - Scope: Current session + recent sessions │ │
│ │ - Storage: Compressed summaries in DB │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ SEMANTIC MEMORY (Long-term) │ │
│ │ - General knowledge extracted over time │ │
│ │ - User preferences and patterns │ │
│ │ - Scope: All historical interactions │ │
│ │ - Storage: Knowledge graph + vector store │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

Advantages Over Video Method

  • Appropriate granularity: Recent = detailed, old = summarized
  • Efficient retrieval: Query correct memory level
  • Continuous learning: Semantic memory evolves over time
  • Graceful degradation: Lose detail, preserve essence

Implementation Pattern

from dataclasses import dataclass

@dataclass
class MemoryHierarchy:
    working: WorkingMemory    # In-context, last N messages
    episodic: EpisodicMemory  # Compressed session summaries
    semantic: SemanticMemory  # Extracted knowledge graph

    async def recall(self, query: str) -> MemoryContext:
        """Retrieve relevant context from all memory levels"""

        # Always include working memory
        working_ctx = self.working.get_recent(n=10)

        # Search episodic for relevant past sessions
        episodic_ctx = await self.episodic.search(
            query=query,
            max_results=5
        )

        # Query semantic for general knowledge
        semantic_ctx = await self.semantic.query(
            query=query,
            include_relations=True
        )

        return MemoryContext(
            working=working_ctx,
            episodic=episodic_ctx,
            semantic=semantic_ctx
        )

    async def commit(self, interaction: Interaction):
        """Update memory hierarchy after interaction"""

        # Update working memory
        self.working.append(interaction)

        # Periodically compress to episodic
        if self.working.should_compress():
            summary = await self.compress_working()
            await self.episodic.store(summary)
            self.working.clear_old()

        # Extract semantic knowledge
        facts = await self.extract_facts(interaction)
        await self.semantic.update(facts)

When to Use

  • Long-running agent sessions
  • Need to maintain user context over time
  • Information has different decay rates
  • Query complexity varies

Alternative Method 5: Pre-Processing Pipeline

Concept

Instead of feeding raw documents to the LLM, apply traditional NLP/ML techniques first to extract structure, filter noise, and reduce token consumption.

Pipeline Architecture

┌────────────────────────────────────────────────────────────────┐
│ PRE-PROCESSING PIPELINE │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Raw │──►│ OCR/ │──►│ Format │──►│ Entity │ │
│ │ Docs │ │ Parse │ │ Detect │ │ Extract │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ LLM │◄──│ Semantic │◄──│ Keyword │◄──│ NER │ │
│ │ Analysis │ │ Chunking │ │ Filter │ │ Output │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘

Pre-Processing Techniques

| Stage | Technique | Token Reduction |
|---|---|---|
| Parsing | Extract text from PDF/DOCX, remove boilerplate | 20-40% |
| Deduplication | Remove repeated content, headers/footers | 10-30% |
| NER | Extract names, dates, amounts before LLM | 5-15% |
| Keyword filtering | Only process sections containing target terms | 50-80% |
| Extractive summary | Select key sentences (TF-IDF, TextRank) | 60-90% |
| Compression | Remove redundancy while preserving meaning | 30-50% |

Advantages Over Video Method

  • Dramatic token savings: Process 10-20% of original tokens
  • Structured output: Entities extracted in consistent format
  • Deterministic extraction: NER is reproducible; LLM output isn't
  • Cost reduction: Less LLM API spend

Implementation Example

from typing import List

import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

class PreProcessor:
    def __init__(self):
        self.nlp = spacy.load("en_core_web_lg")
        self.tfidf = TfidfVectorizer(max_features=1000)

    def process(self, documents: List[str], keywords: List[str]) -> ProcessedCorpus:
        results = []

        for doc in documents:
            # Step 1: Basic cleaning
            cleaned = self.clean_text(doc)

            # Step 2: Keyword filtering (keep only relevant sections)
            relevant_sections = self.filter_by_keywords(cleaned, keywords)
            if not relevant_sections:
                continue

            # Step 3: NER extraction
            entities = self.extract_entities(relevant_sections)

            # Step 4: Extractive summary (reduce to key sentences)
            key_sentences = self.extractive_summary(
                relevant_sections,
                num_sentences=10
            )

            results.append(ProcessedDoc(
                original_tokens=len(doc.split()),
                processed_tokens=len(key_sentences.split()),
                entities=entities,
                summary=key_sentences
            ))

        return ProcessedCorpus(documents=results)

    def filter_by_keywords(self, text: str, keywords: List[str]) -> str:
        """Keep only paragraphs containing target keywords"""
        paragraphs = text.split('\n\n')
        relevant = [
            p for p in paragraphs
            if any(kw.lower() in p.lower() for kw in keywords)
        ]
        return '\n\n'.join(relevant)
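The deduplication stage from the table above can also run before the LLM ever sees a token. A minimal sketch that strips running headers and footers repeated across pages; `strip_repeated_lines` and the 60% threshold are illustrative choices, not part of spaCy or scikit-learn:

```python
from collections import Counter
from typing import List

def strip_repeated_lines(pages: List[str], threshold: float = 0.6) -> List[str]:
    """Drop any line that recurs on a large fraction of pages --
    typically running headers, footers, and legal boilerplate."""
    counts = Counter()
    for page in pages:
        # Count each distinct line once per page
        for line in {ln.strip() for ln in page.splitlines()}:
            counts[line] += 1

    cutoff = max(2, int(len(pages) * threshold))
    cleaned = []
    for page in pages:
        kept = [ln for ln in page.splitlines()
                if not ln.strip() or counts[ln.strip()] < cutoff]
        cleaned.append("\n".join(kept))
    return cleaned
```

Because the filter is a pure function of the corpus, it is fully deterministic and auditable, unlike an LLM-based cleanup pass.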

When to Use

  • High document volume (>1000 files)
  • Known extraction targets (specific entities, topics)
  • Cost-sensitive applications
  • Need deterministic extraction alongside LLM analysis

Hybrid Approach: Best of All Methods

┌─────────────────────────────────────────────────────────────────┐
│ HYBRID PROCESSING PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ STAGE 1: PRE-PROCESSING (Reduce input) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ OCR → Clean → NER → Keyword Filter → Extractive Sum │ │
│ │ Result: 80% token reduction, structured entities │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ STAGE 2: INDEXING (Enable retrieval) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Chunk → Embed → Store in Vector DB → Build KG │ │
│ │ Result: Semantic search enabled, entity graph built │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ STAGE 3: MAP-REDUCE ANALYSIS (Parallel processing) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Parallel chunk analysis → Aggregate → Synthesize │ │
│ │ Result: Comprehensive insights in 1/10th time │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ STAGE 4: HIERARCHICAL MEMORY (Persistent state) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Store summaries → Build hierarchy → Enable drill-down │ │
│ │ Result: Multi-level retrieval, persistent knowledge │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ STAGE 5: RAG QUERY (Interactive retrieval) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Query → Retrieve relevant context → Generate response │ │
│ │ Result: Fast, accurate, cited responses │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
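Stages 1 and 3 of the pipeline above compose naturally in code; a sketch with toy stand-in stage functions is below (the indexing, memory, and RAG stages are elided, and `map_summary`'s first-sentence "summary" is a placeholder for a real LLM call):

```python
import asyncio

def preprocess(docs, keywords):
    # Stage 1: keyword pre-filter drops irrelevant documents up front
    return [d for d in docs if any(k in d.lower() for k in keywords)]

async def map_summary(doc):
    await asyncio.sleep(0)    # placeholder for an async LLM API call
    return doc.split(".")[0]  # toy "summary": first sentence

async def reduce_summaries(summaries):
    # Stage 3 reduce: aggregate per-document results
    return " | ".join(summaries)

async def run_pipeline(docs, keywords):
    filtered = preprocess(docs, keywords)
    # Map step runs over all surviving documents concurrently
    summaries = await asyncio.gather(*map(map_summary, filtered))
    return await reduce_summaries(summaries)

docs = [
    "Churn rose in Q3. Details follow.",
    "Office party notes.",
    "Churn drivers: pricing. More.",
]
print(asyncio.run(run_pipeline(docs, ["churn"])))
# → Churn rose in Q3 | Churn drivers: pricing
```

Swapping each stand-in for its real implementation (spaCy cleanup, an embedding index, an LLM summarizer) preserves this control flow unchanged.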

Comparison Matrix

| Method | Speed | Accuracy | Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Video (Sequential) | Slow | Medium | High | Low | Quick demos |
| Map-Reduce | Fast | Medium | Medium | Medium | Batch processing |
| Hierarchical | Medium | High | Medium | High | Long documents |
| RAG | Fast | High | Low* | High | Interactive queries |
| Pre-Processing | Fast | Medium | Low | Medium | High volume |
| Hybrid | Fast | High | Low | Very High | Enterprise |

*After initial indexing investment


Coditect Implementation Recommendations

Priority 1: Map-Reduce Agent Orchestration

import asyncio
from typing import List

class CoditectMapReduce:
    """Parallel document processing with agent specialization"""

    async def process_corpus(
        self,
        documents: List[Document],
        extraction_schema: ExtractionSchema
    ) -> CorpusAnalysis:
        # Map: Spawn specialized agents per document
        map_tasks = [
            self.spawn_extraction_agent(doc, extraction_schema)
            for doc in documents
        ]

        # Execute in parallel with token budgets
        map_results = await asyncio.gather(*map_tasks)

        # Reduce: Synthesis agent aggregates
        return await self.synthesis_agent.aggregate(map_results)

Priority 2: Hierarchical Knowledge Store

import fdb

fdb.api_version(710)  # must be selected before opening a database

class CoditectKnowledgeStore:
    """FoundationDB-backed hierarchical memory"""

    def __init__(self, fdb_cluster: str):
        self.db = fdb.open(fdb_cluster)
        # One directory per hierarchy level: raw, chunks, summaries, master
        self.hierarchy = {
            level: fdb.directory.create_or_open(self.db, ('knowledge', level))
            for level in ('raw', 'chunks', 'summaries', 'master')
        }

Priority 3: Compliance-Aware RAG

from datetime import datetime

class CoditectRAG:
    """RAG with audit trails for regulated industries"""

    async def query(
        self,
        query: str,
        user_context: UserContext
    ) -> AuditedResponse:
        # Retrieve with access control
        chunks = await self.retriever.search(
            query=query,
            filters=self.access_control.get_filters(user_context)
        )

        # Generate with source tracking
        response = await self.generator.generate(
            query=query,
            context=chunks,
            require_citations=True
        )

        # Create audit record (21 CFR Part 11)
        audit_record = await self.audit_log.record(
            query=query,
            sources=chunks,
            response=response,
            user=user_context.user_id,
            timestamp=datetime.utcnow()
        )

        return AuditedResponse(
            answer=response.text,
            citations=response.citations,
            audit_id=audit_record.id
        )

Actionable Recommendations

For Immediate Use (Today)

  1. Switch to Map-Reduce for batch processing → 5-10x speed improvement
  2. Add keyword pre-filtering → 50-80% token reduction
  3. Use RAG for repeated queries against same corpus

For Coditect Roadmap

  1. Sprint 1-2: Implement parallel agent orchestration with map-reduce
  2. Sprint 3-4: Build hierarchical knowledge store on FoundationDB
  3. Sprint 5-6: Add RAG layer with compliance audit trails
  4. Sprint 7-8: Integrate pre-processing pipeline for token efficiency

ROI Projections

| Improvement | Token Savings | Time Savings | Cost Savings |
|---|---|---|---|
| Map-Reduce | 0% | 80-90% | 0% |
| Pre-Processing | 60-80% | 20% | 60-80% |
| RAG (vs full reprocess) | 90%+ | 95%+ | 90%+ |
| Combined | 70-85% | 90-95% | 75-90% |
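The token-savings rows translate directly into spend. A back-of-envelope sketch; the $3-per-million-input-tokens rate and the 500M-token monthly volume are illustrative assumptions, not vendor pricing:

```python
def monthly_llm_cost(tokens: int, price_per_mtok: float = 3.00,
                     token_reduction: float = 0.0) -> float:
    """Input-token spend after applying a token-reduction factor."""
    return tokens * (1 - token_reduction) / 1_000_000 * price_per_mtok

baseline = monthly_llm_cost(500_000_000)                        # no optimization
combined = monthly_llm_cost(500_000_000, token_reduction=0.80)  # mid-range of the 70-85% band
print(f"${baseline:,.0f}/mo -> ${combined:,.0f}/mo")
# → $1,500/mo -> $300/mo
```

The same function makes it easy to compare the Pre-Processing-only (60-80%) and Combined (70-85%) rows under your own volume and pricing.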

Conclusion

The video's approach is a valid starting point but represents the simplest possible implementation. Enterprise-grade systems require:

  1. Parallelization (Map-Reduce) for speed
  2. Semantic retrieval (RAG) for relevance
  3. Hierarchical memory for multi-scale access
  4. Pre-processing for token efficiency
  5. Audit trails for compliance

Coditect's opportunity is to package these advanced patterns into a turnkey solution that regulated industries can adopt without building from scratch.