ADR-019-v4: Prompt Engine Server Architecture - Part 2 (Technical)

Document Specification Block

Document: ADR-019-v4-prompt-engine-server-architecture-part2-technical
Version: 2.0.0
Purpose: Complete technical implementation blueprint combining optimization engine and meta-prompt generation for AI agents
Audience: AI agents, implementation teams, DevOps engineers
Date Created: 2025-09-01
Date Modified: 2025-09-01
Status: DRAFT

Implementation Constraints

MUST Requirements

  • <100ms latency for optimization operations
  • 60% token reduction average across all prompts
  • FoundationDB storage for all state and learned patterns
  • WebSocket + REST APIs for real-time optimization
  • Rust/Actix-web implementation with multi-tenant isolation
  • Meta-prompt generation from simple descriptions
  • Self-learning patterns with success rate tracking

MUST NOT Requirements

  • External databases (PostgreSQL, MongoDB) - FoundationDB only
  • Blocking I/O operations in request handlers
  • Hardcoded AI provider configurations
  • Unencrypted storage of prompts or patterns
  • Missing error handling or user feedback

System Architecture

// src/prompt_engine/mod.rs
use std::sync::Arc;
use std::time::Instant;

use actix_web::{web, HttpResponse, Result};
use serde::{Deserialize, Serialize};

#[derive(Clone)]
pub struct PromptEngineService {
    optimization_engine: Arc<OptimizationEngine>,
    meta_prompt_system: Arc<MetaPromptSystem>,
    pattern_memory: Arc<PatternMemory>,
    provider_router: Arc<ProviderRouter>,
    db: Arc<FoundationDBClient>,
}

impl PromptEngineService {
    pub async fn optimize_prompt(&self, request: OptimizeRequest, tenant_id: &str) -> Result<OptimizationResult> {
        let start = Instant::now();

        // 1. Check cache first - <5ms budget
        if let Some(cached) = self.cache_lookup(&request, tenant_id).await? {
            return Ok(cached);
        }

        // 2. Apply optimization techniques - <70ms budget
        let result = self.optimization_engine.optimize(&request).await?;

        // 3. Cache result off the request path. The spawned task must own its
        //    data ('static bound), so clone the Arc-backed service and inputs
        //    rather than borrowing from this handler's stack.
        let service = self.clone();
        let cache_request = request.clone();
        let cache_result = result.clone();
        let tenant = tenant_id.to_owned();
        tokio::spawn(async move {
            if let Err(e) = service.cache_result(&cache_request, &cache_result, &tenant).await {
                log::warn!("failed to cache optimization result: {e}");
            }
        });

        log::debug!("optimize_prompt completed in {:?}", start.elapsed());
        Ok(result)
    }

    pub async fn generate_prompt(&self, description: &str, use_case: PromptUseCase, tenant_id: &str) -> Result<GeneratedPrompt> {
        self.meta_prompt_system.generate(description, use_case, tenant_id).await
    }
}

API Specifications

REST Endpoints

POST /api/v1/prompts/optimize:
  body: { prompt: string, options?: OptimizationPrefs }
  response: { optimized_prompt: string, token_reduction: number, time_ms: number }

POST /api/v1/prompts/generate:
  body: { description: string, use_case: PromptUseCase }
  response: { generated_prompt: string, confidence: number, pattern_id: string }

GET /api/v1/patterns/library:
  response: { patterns: Pattern[], categories: string[] }

WebSocket Messages

interface PromptMessage {
  id: string;
  type: "optimize" | "generate" | "feedback";
  payload: {
    prompt?: string;
    description?: string;
    success?: boolean;
    quality_score?: number;
  };
}
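On the server side, the `type` field is the message discriminator. A minimal std-only sketch of the dispatch (the production handler would instead derive serde's `Deserialize` with `#[serde(tag = "type")]` on an enum; `MessageType` and `classify_message` are illustrative names, not part of the ADR's API surface):

```rust
// Sketch: map the WebSocket `type` discriminator onto a Rust enum.
#[derive(Debug, PartialEq)]
pub enum MessageType {
    Optimize,
    Generate,
    Feedback,
}

pub fn classify_message(msg_type: &str) -> Option<MessageType> {
    match msg_type {
        "optimize" => Some(MessageType::Optimize),
        "generate" => Some(MessageType::Generate),
        "feedback" => Some(MessageType::Feedback),
        _ => None, // unknown types are rejected explicitly, not silently dropped
    }
}
```

Returning `None` for unknown types lets the handler send a structured error back over the socket instead of panicking on malformed clients.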

Core Components

1. Optimization Engine

// src/prompt_engine/optimization.rs
pub struct OptimizationEngine {
    techniques: Vec<Box<dyn OptimizationTechnique>>,
    cache: Arc<RwLock<LruCache<String, OptimizationResult>>>,
}

impl OptimizationEngine {
    pub async fn optimize(&self, request: &OptimizeRequest) -> Result<OptimizationResult> {
        let mut prompt = request.prompt.clone();
        let original_tokens = count_tokens(&prompt);

        // Apply compression, restructuring, context pruning in sequence
        for technique in &self.techniques {
            prompt = technique.apply(&prompt).await?;
        }

        let optimized_tokens = count_tokens(&prompt);
        // Guard against empty prompts to avoid division by zero
        let reduction_percent = if original_tokens == 0 {
            0.0
        } else {
            original_tokens.saturating_sub(optimized_tokens) as f64 / original_tokens as f64 * 100.0
        };

        Ok(OptimizationResult {
            optimized_prompt: prompt,
            reduction_percent,
            techniques_applied: vec![
                "compression".into(),
                "restructuring".into(),
                "context_pruning".into(),
            ],
        })
    }
}
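The engine composes techniques behind a common trait. A synchronous, std-only sketch of what one technique could look like (assumptions: the real trait is async, e.g. via the `async_trait` crate; `count_tokens` here is a whitespace approximation of a proper tokenizer; the filler-phrase list is illustrative):

```rust
// Sketch of the OptimizationTechnique trait and one concrete technique.
pub trait OptimizationTechnique {
    fn name(&self) -> &'static str;
    fn apply(&self, prompt: &str) -> String;
}

/// Drops filler courtesy phrases and collapses repeated whitespace.
/// The phrase list is a hypothetical example, not the production set.
pub struct Compression;

impl OptimizationTechnique for Compression {
    fn name(&self) -> &'static str { "compression" }

    fn apply(&self, prompt: &str) -> String {
        const FILLERS: &[&str] = &["please ", "kindly ", "i would like you to "];
        let mut text = prompt.to_lowercase();
        for filler in FILLERS {
            text = text.replace(filler, "");
        }
        // Collapse runs of whitespace left behind by the removals
        text.split_whitespace().collect::<Vec<_>>().join(" ")
    }
}

/// Whitespace-split token count -- a stand-in for a real tokenizer.
fn count_tokens(prompt: &str) -> usize {
    prompt.split_whitespace().count()
}

/// Percentage reduction, guarded against empty inputs.
fn reduction_percent(original: &str, optimized: &str) -> f64 {
    let before = count_tokens(original);
    if before == 0 {
        return 0.0;
    }
    let after = count_tokens(optimized);
    before.saturating_sub(after) as f64 / before as f64 * 100.0
}
```

Each technique stays pure (string in, string out), which keeps the pipeline loop above trivially composable and each technique unit-testable in isolation.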

2. Meta-Prompt System

// src/prompt_engine/meta_prompt.rs
pub struct MetaPromptSystem {
    pattern_library: Arc<PatternLibrary>,
    intent_classifier: Arc<IntentClassifier>,
    db: Arc<FoundationDBClient>,
}

impl MetaPromptSystem {
    pub async fn generate(&self, description: &str, use_case: PromptUseCase, tenant_id: &str) -> Result<GeneratedPrompt> {
        // 1. Classify intent from description
        let intent = self.intent_classifier.classify(description).await?;

        // 2. Find matching pattern
        let pattern = self.pattern_library.get_best_pattern(&intent, &use_case).await?;

        // 3. Generate prompt using template
        let prompt = pattern.template
            .replace("{{description}}", description)
            .replace("{{context}}", &self.build_context(&use_case).await?);

        Ok(GeneratedPrompt {
            prompt,
            confidence: pattern.success_rate,
            pattern_id: pattern.id,
        })
    }
}
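Pattern selection inside `get_best_pattern` can be as simple as a max-by-success-rate scan over candidates for the use case. A std-only sketch (the `Pattern` struct here is a hypothetical flattening of the database schema below; production would rank with NaN-safe comparison and usage-count tie-breaking):

```rust
// Sketch: pick the highest-success-rate pattern for a use case, then
// render its template. Field names mirror the prompt_patterns schema.
#[derive(Debug, Clone)]
pub struct Pattern {
    pub id: String,
    pub use_case: String,
    pub template: String,
    pub success_rate: f64,
}

pub fn best_pattern<'a>(patterns: &'a [Pattern], use_case: &str) -> Option<&'a Pattern> {
    patterns
        .iter()
        .filter(|p| p.use_case == use_case)
        // success_rate is never NaN in practice, so partial_cmp is safe here
        .max_by(|a, b| a.success_rate.partial_cmp(&b.success_rate).unwrap())
}

/// Simple placeholder substitution matching the template syntax above.
pub fn render(template: &str, description: &str, context: &str) -> String {
    template
        .replace("{{description}}", description)
        .replace("{{context}}", context)
}
```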

3. Pattern Memory

// src/prompt_engine/memory.rs
pub struct PatternMemory {
    db: Arc<FoundationDBClient>,
}

impl PatternMemory {
    pub async fn store_pattern(&self, pattern: &PromptPattern, tenant_id: &str) -> Result<()> {
        // Key layout matches the schema: /{tenant_id}/prompt_patterns/{pattern_id}
        let key = format!("/{}/prompt_patterns/{}", tenant_id, pattern.id);
        self.db.set(&key, &serialize(pattern)?).await
    }

    pub async fn learn_from_success(&self, prompt: &str, success_rate: f64, tenant_id: &str) -> Result<()> {
        // Extract patterns from successful prompts and update the library
        let extracted_pattern = self.extract_pattern(prompt, success_rate).await?;
        self.store_pattern(&extracted_pattern, tenant_id).await
    }
}

Database Schema

/{tenant_id}/prompt_patterns/{pattern_id}:
  id: string
  name: string
  template: string
  use_case: enum
  success_rate: f64
  usage_count: u64

/{tenant_id}/optimization_cache/{prompt_hash}:
  original_prompt: string
  optimized_prompt: string
  reduction_percent: f64
  cached_at: timestamp
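Both keyspaces are tenant-prefixed strings, which gives multi-tenant isolation by key range. A sketch of key construction (std's `DefaultHasher` stands in for the prompt hash; a production system would use a stable hash such as SHA-256 so cache keys survive process restarts and are consistent across replicas):

```rust
// Sketch: tenant-scoped key construction for the two keyspaces above.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub fn pattern_key(tenant_id: &str, pattern_id: &str) -> String {
    format!("/{}/prompt_patterns/{}", tenant_id, pattern_id)
}

pub fn cache_key(tenant_id: &str, prompt: &str) -> String {
    // DefaultHasher is only stable within one process run -- illustration only.
    let mut hasher = DefaultHasher::new();
    prompt.hash(&mut hasher);
    format!("/{}/optimization_cache/{:016x}", tenant_id, hasher.finish())
}
```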

Error Handling

#[derive(Debug, thiserror::Error)]
pub enum PromptEngineError {
    #[error("Optimization failed: {reason}")]
    OptimizationFailed { reason: String },

    #[error("Pattern not found for use case: {use_case}")]
    PatternNotFound { use_case: String },

    #[error("Database error: {0}")]
    DatabaseError(#[from] foundationdb::FdbError),
}

impl PromptEngineError {
    pub fn user_message(&self) -> String {
        match self {
            Self::OptimizationFailed { .. } => "Prompt optimization failed. Using original prompt.".into(),
            Self::PatternNotFound { .. } => "No pattern found. Try describing your request differently.".into(),
            _ => "Service temporarily unavailable. Please try again.".into(),
        }
    }
}
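At the REST boundary these errors also need HTTP status codes (in actix-web, typically via a `ResponseError` impl on `PromptEngineError`). The mapping itself, sketched std-only with a simplified error kind; the status choices are assumptions, not decisions recorded by this ADR:

```rust
// Sketch: HTTP status per error kind. `ErrorKind` is a hypothetical
// stand-in for PromptEngineError's variants.
pub enum ErrorKind {
    OptimizationFailed,
    PatternNotFound,
    Database,
}

pub fn http_status(kind: &ErrorKind) -> u16 {
    match kind {
        // Degrades gracefully: the caller still receives the original prompt.
        ErrorKind::OptimizationFailed => 200,
        ErrorKind::PatternNotFound => 404,
        // Transient backend failure; the client may retry.
        ErrorKind::Database => 503,
    }
}
```

Pairing `http_status` with `user_message` satisfies the MUST NOT on missing error handling or user feedback: every failure path yields both a machine-readable status and a human-readable explanation.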

Testing & Deployment

#[cfg(test)]
mod tests {
    use super::*;
    use std::time::Instant;

    #[tokio::test]
    async fn test_optimization_performance() {
        let engine = OptimizationEngine::new();
        let start = Instant::now();

        let result = engine
            .optimize(&OptimizeRequest {
                prompt: "Please help me create a detailed REST API".into(),
            })
            .await
            .unwrap();

        // MUST: <100ms latency for optimization operations
        assert!(start.elapsed().as_millis() < 100);
        assert!(result.reduction_percent > 40.0);
    }
}

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prompt-engine
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prompt-engine
  template:
    metadata:
      labels:
        app: prompt-engine
    spec:
      containers:
        - name: prompt-engine
          image: coditect/prompt-engine:latest
          env:
            - name: FDB_CLUSTER_FILE
              value: /etc/fdb/fdb.cluster
          resources:
            requests: { memory: "256Mi", cpu: "250m" }
            limits: { memory: "512Mi", cpu: "500m" }

File Structure

src/prompt_engine/
├── mod.rs             # Main service
├── optimization.rs    # OptimizationEngine
├── meta_prompt.rs     # MetaPromptSystem
├── memory.rs          # PatternMemory
├── api/
│   ├── rest.rs        # REST endpoints
│   └── websocket.rs   # WebSocket handlers
└── providers/
    └── router.rs      # ProviderRouter