# Agent Skills Framework Extension

## Memory Optimization Patterns Skill
## When to Use This Skill

Use this skill when implementing memory optimization patterns in your codebase.

## How to Use This Skill

- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill

This skill covers token reduction, context compression, intelligent summarization, and caching for efficient AI operations.
## Core Capabilities
- Token Management - Counting, budgeting, tracking
- Context Compression - Smart reduction techniques
- Summarization - Multi-level content condensing
- Caching - Intelligent result reuse
- Incremental Loading - Load only what's needed
- Priority Pruning - Remove least important content
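
The incremental-loading capability can be sketched as a generator that yields chunks only while the budget allows; this is a minimal illustration, and the whitespace-based token estimate is an assumption (production code would use a real tokenizer):

```python
from typing import Iterator, List

def load_incrementally(chunks: List[str], max_tokens: int) -> Iterator[str]:
    """Yield chunks only while the token budget allows (whitespace heuristic)."""
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude per-chunk token estimate (assumption)
        if used + cost > max_tokens:
            break
        used += cost
        yield chunk

# Only chunks that fit within the budget are ever materialized
loaded = list(load_incrementally(["a b c", "d e f", "g h i"], max_tokens=4))
```

Because it is a generator, chunks past the budget boundary are never loaded at all, which is the point of incremental loading over load-then-prune.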
## Token Budget Manager

```python
# scripts/token-budget.py
from dataclasses import dataclass
from typing import Dict, List
from enum import Enum


class Priority(Enum):
    CRITICAL = 3
    HIGH = 2
    MEDIUM = 1
    LOW = 0


@dataclass
class TokenizedContent:
    content: str
    token_count: int
    priority: Priority
    source: str
    compressed: bool = False


class TokenBudgetManager:
    """Manage token budgets across context components."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.allocated: Dict[str, int] = {}
        self.contents: List[TokenizedContent] = []

    def allocate(self, category: str, tokens: int) -> bool:
        """Allocate tokens to a category; refuse over-allocation."""
        total_allocated = sum(self.allocated.values())
        if total_allocated + tokens > self.max_tokens:
            return False
        self.allocated[category] = tokens
        return True

    def add_content(
        self,
        content: str,
        priority: Priority,
        source: str,
        compress_if_needed: bool = True
    ) -> bool:
        """Add content, compressing or pruning if it exceeds the budget."""
        tokens = self._count_tokens(content)
        was_compressed = False

        # Check whether the new content fits the remaining budget
        current_total = sum(c.token_count for c in self.contents)
        if current_total + tokens > self.max_tokens:
            if compress_if_needed:
                content, tokens = self._compress(content, tokens)
                was_compressed = True
            else:
                self._prune_by_priority(tokens)

        tokenized = TokenizedContent(
            content=content,
            token_count=tokens,
            priority=priority,
            source=source,
            compressed=was_compressed
        )
        self.contents.append(tokenized)
        return True

    def _count_tokens(self, text: str) -> int:
        """Count tokens (simplified heuristic)."""
        # In production: use tiktoken for exact counts
        return int(len(text.split()) * 1.3)

    def _compress(self, content: str, original_tokens: int) -> tuple[str, int]:
        """Compress content to fit the budget (keep the first 70% of words)."""
        words = content.split()
        target_words = int(len(words) * 0.7)
        compressed = ' '.join(words[:target_words]) + "..."
        return compressed, self._count_tokens(compressed)

    def _prune_by_priority(self, needed_tokens: int):
        """Remove low-priority content to make space; never touch CRITICAL."""
        self.contents.sort(key=lambda c: c.priority.value)
        freed = 0
        while freed < needed_tokens and self.contents:
            if self.contents[0].priority != Priority.CRITICAL:
                removed = self.contents.pop(0)
                freed += removed.token_count
            else:
                break

    def get_usage_report(self) -> str:
        """Generate a token usage report grouped by source."""
        total = sum(c.token_count for c in self.contents)
        by_source: Dict[str, int] = {}
        for content in self.contents:
            by_source[content.source] = by_source.get(content.source, 0) + content.token_count

        report = f"Token Usage: {total}/{self.max_tokens} ({total / self.max_tokens * 100:.1f}%)\n\n"
        report += "By Source:\n"
        for source, tokens in sorted(by_source.items(), key=lambda x: x[1], reverse=True):
            report += f"  {source}: {tokens} tokens\n"
        return report


# Usage
budget = TokenBudgetManager(max_tokens=100000)
budget.add_content(
    "Long context from previous session...",
    priority=Priority.HIGH,
    source="session_history"
)
budget.add_content(
    "Critical project documentation",
    priority=Priority.CRITICAL,
    source="documentation"
)
print(budget.get_usage_report())
```
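
The `_count_tokens` heuristic above can be swapped for exact counts from `tiktoken`, as its comment suggests. This is a sketch: the model name is an assumption, and the whitespace fallback keeps it runnable when the library is not installed:

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Exact token count via tiktoken when installed; heuristic fallback otherwise."""
    try:
        import tiktoken  # optional third-party dependency
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except ImportError:
        # Fallback mirrors the manager's whitespace heuristic above
        return int(len(text.split()) * 1.3)

tokens = count_tokens("Token budgets need accurate counts.")
```

Dropping this in for `TokenBudgetManager._count_tokens` tightens budgets considerably, since the word-count heuristic can be off by 20-30% on code-heavy content.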
## Intelligent Summarization

````python
# scripts/intelligent-summarization.py
from dataclasses import dataclass
from typing import List
import re


@dataclass
class SummaryLevel:
    name: str
    target_compression: float  # 0.0 - 1.0
    preserve_code: bool
    preserve_examples: bool


SUMMARY_LEVELS = {
    'extreme': SummaryLevel('extreme', 0.1, False, False),
    'aggressive': SummaryLevel('aggressive', 0.3, False, True),
    'moderate': SummaryLevel('moderate', 0.5, True, True),
    'light': SummaryLevel('light', 0.7, True, True),
}


class IntelligentSummarizer:
    """Multi-level intelligent summarization."""

    def summarize(self, text: str, level: str = 'moderate') -> str:
        """Summarize text at the specified level."""
        config = SUMMARY_LEVELS.get(level, SUMMARY_LEVELS['moderate'])

        # Extract components
        code_blocks = self._extract_code_blocks(text)
        examples = self._extract_examples(text)
        paragraphs = self._extract_paragraphs(text)

        # Summarize paragraphs
        summarized_paras = self._summarize_paragraphs(
            paragraphs,
            target_ratio=config.target_compression
        )

        # Reassemble
        result = []
        result.extend(summarized_paras)
        if config.preserve_examples and examples:
            result.append("\n## Examples\n")
            result.extend(examples[:2])  # Top 2 examples
        if config.preserve_code and code_blocks:
            result.append("\n## Code\n")
            result.extend(code_blocks[:3])  # Top 3 code blocks
        return '\n'.join(result)

    def _extract_code_blocks(self, text: str) -> List[str]:
        """Extract fenced code blocks."""
        pattern = r'```[\w]*\n(.*?)\n```'
        matches = re.findall(pattern, text, re.DOTALL)
        return [f"```\n{m}\n```" for m in matches]

    def _extract_examples(self, text: str) -> List[str]:
        """Extract example sections (headings containing 'example')."""
        lines = text.split('\n')
        examples = []
        in_example = False
        current = []
        for line in lines:
            if 'example' in line.lower() and line.startswith('#'):
                in_example = True
                current = [line]
            elif in_example and line.startswith('#'):
                examples.append('\n'.join(current))
                in_example = False
                current = []
            elif in_example:
                current.append(line)
        if current:
            examples.append('\n'.join(current))
        return examples

    def _extract_paragraphs(self, text: str) -> List[str]:
        """Extract text paragraphs (non-code, non-example)."""
        # Remove code blocks first
        text_without_code = re.sub(r'```.*?```', '', text, flags=re.DOTALL)
        paragraphs = text_without_code.split('\n\n')
        return [p.strip() for p in paragraphs if p.strip() and len(p.strip()) > 50]

    def _summarize_paragraphs(
        self,
        paragraphs: List[str],
        target_ratio: float
    ) -> List[str]:
        """Keep the most important paragraphs up to the target ratio."""
        target_count = max(1, int(len(paragraphs) * target_ratio))

        # Score paragraphs by importance
        scored = []
        for para in paragraphs:
            score = self._importance_score(para)
            scored.append((score, para))

        # Take the top N
        scored.sort(reverse=True, key=lambda x: x[0])
        return [para for _, para in scored[:target_count]]

    def _importance_score(self, paragraph: str) -> float:
        """Score paragraph importance."""
        score = 0.0

        # Keywords that indicate importance
        important_words = ['must', 'critical', 'important', 'required', 'always', 'never']
        for word in important_words:
            if word in paragraph.lower():
                score += 0.2

        # Length (moderate length preferred)
        word_count = len(paragraph.split())
        if 50 < word_count < 200:
            score += 0.3

        # Has numbers/data
        if re.search(r'\d+', paragraph):
            score += 0.1

        return score


# Usage
summarizer = IntelligentSummarizer()
long_doc = """
# Documentation

This is a comprehensive guide...

## Example 1

Here's an example...

```python
def example():
    pass
```

More text...
"""

# Different levels
extreme = summarizer.summarize(long_doc, level='extreme')
moderate = summarizer.summarize(long_doc, level='moderate')
light = summarizer.summarize(long_doc, level='light')
````
## Intelligent Caching System
```typescript
// scripts/intelligent-cache.ts
interface CacheEntry<T> {
  key: string;
  value: T;
  timestamp: Date;
  hitCount: number;
  tokenSize: number;
  priority: number;
}

class IntelligentCache<T> {
  private cache = new Map<string, CacheEntry<T>>();
  private maxTokens: number;
  private currentTokens: number = 0;

  constructor(maxTokens: number) {
    this.maxTokens = maxTokens;
  }

  set(key: string, value: T, tokenSize: number, priority: number = 1): void {
    // Evict until the new entry fits; stop if the cache is empty
    // (guards against an infinite loop when tokenSize > maxTokens)
    while (this.cache.size > 0 && this.currentTokens + tokenSize > this.maxTokens) {
      this.evictLRU();
    }
    const entry: CacheEntry<T> = {
      key,
      value,
      timestamp: new Date(),
      hitCount: 0,
      tokenSize,
      priority,
    };
    this.cache.set(key, entry);
    this.currentTokens += tokenSize;
  }

  get(key: string): T | undefined {
    const entry = this.cache.get(key);
    if (!entry) return undefined;
    // Update stats
    entry.hitCount++;
    entry.timestamp = new Date();
    return entry.value;
  }

  private evictLRU(): void {
    if (this.cache.size === 0) return;

    // Calculate eviction score (lower = evict first)
    let lowestScore = Infinity;
    let evictKey: string | null = null;

    for (const [key, entry] of this.cache) {
      // Score based on: recency, hit count, priority
      const ageMinutes = (Date.now() - entry.timestamp.getTime()) / 60000;
      const recencyScore = 1 / (1 + ageMinutes);
      const hitScore = Math.log(1 + entry.hitCount);
      const score = (recencyScore + hitScore) * entry.priority;

      if (score < lowestScore) {
        lowestScore = score;
        evictKey = key;
      }
    }

    if (evictKey) {
      const entry = this.cache.get(evictKey)!;
      this.cache.delete(evictKey);
      this.currentTokens -= entry.tokenSize;
    }
  }

  getStats(): string {
    const entries = Array.from(this.cache.values());
    const totalHits = entries.reduce((sum, e) => sum + e.hitCount, 0);
    return `Cache Stats:
  Entries: ${this.cache.size}
  Tokens: ${this.currentTokens}/${this.maxTokens}
  Total Hits: ${totalHits}
  Avg Hits/Entry: ${entries.length > 0 ? (totalHits / entries.length).toFixed(2) : 0}`;
  }
}

// Usage
const cache = new IntelligentCache<string>(50000);
cache.set('research:rust-async', 'Rust async research results...', 5000, 2);
cache.set('code:api-client', 'API client implementation...', 3000, 3);

const result = cache.get('research:rust-async');
console.log(cache.getStats());
```
## Usage Examples

### Token Budget Management

Apply the memory-optimization-patterns skill to manage a 100K token budget with priority-based allocation.

### Intelligent Summarization

Apply the memory-optimization-patterns skill to summarize documentation with code preservation at the moderate level.

### Smart Caching

Apply the memory-optimization-patterns skill to cache research results with LRU eviction.
## Integration Points
- memory-context-patterns - Context management
- session-analysis-patterns - Session compression
- prompt-analysis-patterns - Prompt optimization
## Success Output

When successful, this skill MUST output:

```
✅ SKILL COMPLETE: memory-optimization-patterns

Completed:
- [x] Token budget allocated across {category_count} categories
- [x] Content compressed from {original_tokens} to {compressed_tokens} tokens ({reduction_pct}% reduction)
- [x] {pruned_count} low-priority items pruned to fit budget
- [x] Cache configured with {max_tokens} token limit, {entry_count} entries
- [x] Summarization applied at {level} level (compression ratio: {compression_ratio})

Outputs:
- Token usage report: {total_tokens}/{max_tokens} ({usage_pct}%)
- Compressed content: {output_path} ({final_size} tokens)
- Cache statistics: {hit_count} hits, {eviction_count} evictions
- Memory savings: {savings_tokens} tokens ({savings_pct}% reduction)
```
## Completion Checklist

Before marking this skill as complete, verify:

- [ ] Token counting function validated against the tiktoken library
- [ ] Budget allocations sum to ≤ max_tokens (no over-allocation)
- [ ] Compression preserves critical content (CRITICAL priority items)
- [ ] Pruning removed only LOW/MEDIUM priority items, never CRITICAL
- [ ] Cache eviction policy tested (LRU with priority weighting)
- [ ] Summarization level appropriate for the use case (not over-compressed)
- [ ] Code blocks and examples preserved when preserve_code=True
- [ ] Final token usage under budget threshold (< 100% allocation)
- [ ] Usage report generated and contains accurate statistics
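
The over-allocation check above can be expressed as a one-line invariant; this is a minimal sketch, and the category names and budget figures are illustrative assumptions:

```python
def validate_allocations(allocated: dict, max_tokens: int) -> bool:
    """Invariant: category allocations must never exceed the global budget."""
    return sum(allocated.values()) <= max_tokens

# Hypothetical allocations against a 100K budget
ok = validate_allocations({"history": 40_000, "docs": 30_000}, max_tokens=100_000)
over = validate_allocations({"history": 80_000, "docs": 30_000}, max_tokens=100_000)
```

Running this check after every `allocate()` call catches over-provisioning before any content is loaded.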
## Failure Indicators

This skill has FAILED if:
- ❌ Token count calculation returns negative or implausible values
- ❌ Budget allocation exceeds max_tokens (over-provisioned)
- ❌ Compression removes CRITICAL content
- ❌ Pruning deletes all content (nothing survives budget cuts)
- ❌ Cache eviction fails (infinite loop or exception)
- ❌ Summarization produces empty output
- ❌ Code blocks lost when preserve_code=True
- ❌ Final content exceeds max_tokens budget
- ❌ Cache hit rate = 0% (cache not functioning)
## When NOT to Use
Do NOT use this skill when:
- Token budget not constrained (unlimited context window)
- Content already optimized and minimal
- Real-time streaming required (compression adds latency)
- All content is CRITICAL priority (nothing to prune)
- Simple content (< 1000 tokens, no optimization needed)
- User needs full detailed output (no summarization acceptable)
- Token cost negligible (budget not a concern)
Use alternative skills:
- For session context → memory-context-patterns
- For prompt engineering → prompt-analysis-patterns
- For content generation → content-optimization-patterns
- For caching strategy → caching-strategies
## Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Aggressive compression on all content | Loss of critical information | Use priority-based compression |
| No baseline measurement | Can't measure savings | Count tokens before and after |
| Hardcoded token limits | Inflexible across models | Use configurable max_tokens |
| Ignoring token counting accuracy | Inaccurate budgets | Use tiktoken for precise counts |
| Over-caching everything | Memory bloat | Set cache size limits with LRU eviction |
| Extreme summarization (0.1 ratio) | Content becomes useless | Use moderate levels (0.5-0.7) |
| Not preserving structure | Lost readability | Keep headings, examples, code blocks |
| Cache without eviction policy | Unbounded growth | Implement LRU or priority-based eviction |
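
The "no baseline measurement" anti-pattern is avoided by recording counts before and after optimization; a minimal sketch using the same whitespace heuristic as the examples above (a real pipeline would substitute tiktoken counts):

```python
def report_savings(original: str, optimized: str) -> str:
    """Compare token counts before and after optimization."""
    count = lambda t: int(len(t.split()) * 1.3)  # heuristic, as in the manager above
    before, after = count(original), count(optimized)
    pct = (before - after) / before * 100 if before else 0.0
    return f"{before} -> {after} tokens ({pct:.1f}% reduction)"

summary = report_savings("one two three four five six seven eight nine ten", "one two three")
# e.g. "13 -> 3 tokens (76.9% reduction)"
```

Logging this per optimization pass gives the savings figures that the Success Output section expects ({savings_tokens}, {savings_pct}).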
## Principles

This skill embodies:
- #1 Recycle → Extend → Re-Use → Create - Reuse cached results, extend with compression
- #2 First Principles - Understand token economics (cost, limits, latency)
- #3 Keep It Simple - Start with light compression (0.7), escalate only if needed
- #4 Separation of Concerns - Separate counting, budgeting, compression, caching
- #5 Eliminate Ambiguity - Explicit token counts and budgets, not vague "too large"
- #8 No Assumptions - Measure actual token usage with tiktoken
- #9 Progressive Disclosure - Multi-level summarization (light → moderate → aggressive)
Full Standard: CODITECT-STANDARD-AUTOMATION.md