ADR-026: Error Handling Architecture - Part 2: Technical Implementation

Document: ADR-026 Error Handling Architecture - Part 2 Technical
Version: 1.1.0
Purpose: Complete technical specification for error handling
Audience: Engineers implementing error handling across all components
Date Created: 2025-09-02
Date Modified: 2025-09-02
Status: DRAFT

Table of Contents

  1. Core Error System Implementation
  2. Error Code Structure
  3. Core Error Type
  4. Error Context Extraction
  5. Error Handler Trait
  6. Specific Error Handlers
  7. Error Logging and Monitoring
  8. Client-Side Error Handling
  9. WASM Error Handling
  10. Integration Tests
  11. Performance Considerations
  12. Monitoring Integration
  13. Implementation Checklist
  14. Approval Signatures

Core Error System Implementation

Error Code Structure

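// src/errors/error_code.rs
use std::fmt;

use serde::de::{self, Deserializer};
use serde::{Deserialize, Serialize};

// Known identifiers. Deserialization maps incoming strings onto these so the
// fields can stay `&'static str`.
const SERVICES: &[&str] = &["API", "AUTH", "DB", "CNTR", "WS"];
const CATEGORIES: &[&str] = &["USR", "SYS", "EXT"];

#[derive(Debug, Clone, Serialize)]
pub struct ErrorCode {
    /// Service identifier (API, AUTH, DB, CNTR, WS)
    pub service: &'static str,
    /// Category (USR, SYS, EXT)
    pub category: &'static str,
    /// Specific error number
    pub code: u16,
}

impl ErrorCode {
    pub const fn new(service: &'static str, category: &'static str, code: u16) -> Self {
        ErrorCode { service, category, code }
    }
}

// Implementing `Display` supplies `to_string()` through the blanket `ToString`
// impl, so call sites such as `code.to_string()` are unchanged.
impl fmt::Display for ErrorCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}-{}-{:04}", self.service, self.category, self.code)
    }
}

// A derived `Deserialize` on `&'static str` fields only accepts `'static`
// input, so incoming strings are matched against the known lists instead.
impl<'de> Deserialize<'de> for ErrorCode {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        #[derive(Deserialize)]
        struct Raw {
            service: String,
            category: String,
            code: u16,
        }

        fn intern(value: &str, known: &[&'static str]) -> Option<&'static str> {
            known.iter().copied().find(|k| *k == value)
        }

        let raw = Raw::deserialize(deserializer)?;
        let service = intern(&raw.service, SERVICES)
            .ok_or_else(|| <D::Error as de::Error>::custom("unknown service identifier"))?;
        let category = intern(&raw.category, CATEGORIES)
            .ok_or_else(|| <D::Error as de::Error>::custom("unknown error category"))?;
        Ok(ErrorCode { service, category, code: raw.code })
    }
}

// Predefined error codes
pub mod codes {
    use super::ErrorCode;

    // Authentication errors
    pub const AUTH_INVALID_CREDENTIALS: ErrorCode = ErrorCode::new("AUTH", "USR", 1001);
    pub const AUTH_TOKEN_EXPIRED: ErrorCode = ErrorCode::new("AUTH", "USR", 1002);
    pub const AUTH_INSUFFICIENT_PERMISSIONS: ErrorCode = ErrorCode::new("AUTH", "USR", 1003);
    pub const AUTH_RATE_LIMITED: ErrorCode = ErrorCode::new("AUTH", "SYS", 1004);

    // Database errors
    pub const DB_CONNECTION_FAILED: ErrorCode = ErrorCode::new("DB", "SYS", 2001);
    pub const DB_TENANT_NOT_FOUND: ErrorCode = ErrorCode::new("DB", "USR", 2002);
    pub const DB_CONSTRAINT_VIOLATION: ErrorCode = ErrorCode::new("DB", "USR", 2003);
    pub const DB_TRANSACTION_CONFLICT: ErrorCode = ErrorCode::new("DB", "SYS", 2004);

    // Container errors
    pub const CNTR_CREATION_FAILED: ErrorCode = ErrorCode::new("CNTR", "SYS", 3001);
    pub const CNTR_RESOURCE_LIMIT: ErrorCode = ErrorCode::new("CNTR", "USR", 3002);
    pub const CNTR_STARTUP_TIMEOUT: ErrorCode = ErrorCode::new("CNTR", "SYS", 3003);
    pub const CNTR_STATE_CORRUPTION: ErrorCode = ErrorCode::new("CNTR", "SYS", 3004);

    // API errors
    pub const API_INVALID_REQUEST: ErrorCode = ErrorCode::new("API", "USR", 4001);
    pub const API_RESOURCE_NOT_FOUND: ErrorCode = ErrorCode::new("API", "USR", 4002);
    pub const API_RATE_LIMITED: ErrorCode = ErrorCode::new("API", "USR", 4003);
    pub const API_SERVICE_UNAVAILABLE: ErrorCode = ErrorCode::new("API", "SYS", 4004);
}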
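
A client or log processor that receives the formatted string may need to split it back into its parts. The following is a minimal sketch, assuming the `SERVICE-CATEGORY-NNNN` layout produced above. `ParsedCode`, `parse_error_code`, and `format_error_code` are hypothetical names that do not appear in this ADR, and the sketch uses only the standard library.

```rust
/// Parsed form of a formatted error code such as "AUTH-USR-1001".
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedCode {
    pub service: String,
    pub category: String,
    pub code: u16,
}

/// Formats the three parts with the same pattern the ADR uses:
/// service, category, and a zero-padded four-digit number.
pub fn format_error_code(service: &str, category: &str, code: u16) -> String {
    format!("{}-{}-{:04}", service, category, code)
}

/// Splits "SERVICE-CATEGORY-NNNN" into its parts.
/// Returns None when the shape is wrong or the category is unknown.
pub fn parse_error_code(s: &str) -> Option<ParsedCode> {
    let mut parts = s.split('-');
    let service = parts.next()?;
    let category = parts.next()?;
    let number = parts.next()?;

    // Reject trailing segments and an empty service name.
    if parts.next().is_some() || service.is_empty() {
        return None;
    }
    // Only the three categories named in the ADR are accepted.
    if !matches!(category, "USR" | "SYS" | "EXT") {
        return None;
    }
    // Require plain digits (integer parsing alone would also accept "+1001").
    if number.is_empty() || !number.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let code = number.parse::<u16>().ok()?;

    Some(ParsedCode {
        service: service.to_string(),
        category: category.to_string(),
        code,
    })
}
```

Rejecting unknown categories keeps malformed strings from reaching dashboards; whether a real parser should be that strict is a design choice this ADR does not settle.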

Core Error Type

// src/errors/mod.rs
use std::fmt;

use actix_web::http::StatusCode;
use actix_web::{HttpResponse, ResponseError};
use serde::{Deserialize, Serialize};

// Assumes error_code.rs is declared as a submodule of this module.
pub mod error_code;
pub use error_code::{codes, ErrorCode};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CoditechError {
    /// Unique error instance ID
    pub id: String,
    /// Error code
    pub code: ErrorCode,
    /// User-friendly message
    pub message: String,
    /// Technical details (not shown to users)
    pub details: Option<String>,
    /// Additional context
    pub context: ErrorContext,
    /// Recovery suggestions
    pub recovery: Option<RecoveryInfo>,
    /// Related error IDs
    pub caused_by: Option<Vec<String>>,
    /// Timestamp
    pub timestamp: chrono::DateTime<chrono::Utc>,
}

// `Default` is required by `CoditechError::new` and by `..Default::default()`
// in the tests.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct ErrorContext {
    pub user_id: Option<String>,
    pub tenant_id: Option<String>,
    pub workspace_id: Option<String>,
    pub request_id: Option<String>,
    pub action: Option<String>,
    pub metadata: serde_json::Value,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RecoveryInfo {
    pub actions: Vec<RecoveryAction>,
    pub auto_retry: bool,
    /// Delay before retrying, in seconds
    pub retry_after: Option<u64>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RecoveryAction {
    pub label: String,
    pub action: String,
    pub description: Option<String>,
}

impl CoditechError {
    pub fn new(code: ErrorCode, message: impl Into<String>) -> Self {
        CoditechError {
            id: uuid::Uuid::new_v4().to_string(),
            code,
            message: message.into(),
            details: None,
            context: ErrorContext::default(),
            recovery: None,
            caused_by: None,
            timestamp: chrono::Utc::now(),
        }
    }

    pub fn with_details(mut self, details: impl Into<String>) -> Self {
        self.details = Some(details.into());
        self
    }

    pub fn with_context(mut self, context: ErrorContext) -> Self {
        self.context = context;
        self
    }

    pub fn with_recovery(mut self, recovery: RecoveryInfo) -> Self {
        self.recovery = Some(recovery);
        self
    }

    pub fn with_cause(mut self, cause_id: String) -> Self {
        self.caused_by.get_or_insert_with(Vec::new).push(cause_id);
        self
    }
}

// `ResponseError` requires `Display`; show the code and the user-facing message.
impl fmt::Display for CoditechError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.code.to_string(), self.message)
    }
}

impl ResponseError for CoditechError {
    fn status_code(&self) -> StatusCode {
        match self.code.category {
            "USR" => StatusCode::BAD_REQUEST,
            "SYS" => StatusCode::INTERNAL_SERVER_ERROR,
            "EXT" => StatusCode::BAD_GATEWAY,
            _ => StatusCode::INTERNAL_SERVER_ERROR,
        }
    }

    fn error_response(&self) -> HttpResponse {
        HttpResponse::build(self.status_code()).json(self.to_user_response())
    }
}

impl CoditechError {
    /// Convert to user-safe response (omits `details`, `context`, and `caused_by`)
    pub fn to_user_response(&self) -> UserErrorResponse {
        UserErrorResponse {
            error_id: self.id.clone(),
            code: self.code.to_string(),
            message: self.message.clone(),
            recovery: self.recovery.clone(),
            timestamp: self.timestamp,
        }
    }
}

#[derive(Serialize)]
pub struct UserErrorResponse {
    pub error_id: String,
    pub code: String,
    pub message: String,
    pub recovery: Option<RecoveryInfo>,
    pub timestamp: chrono::DateTime<chrono::Utc>,
}
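
The `ResponseError` impl above chooses the HTTP status from the category alone, so every `USR` error becomes a 400. If finer-grained statuses are wanted, one possible refinement is sketched below. `http_status_for` is a hypothetical function, and the specific status choices are assumptions for illustration, not decisions recorded in this ADR; the numeric codes are the ones predefined in the `codes` module.

```rust
/// Hypothetical refinement: pick an HTTP status from the full error code
/// and fall back to the category-based mapping for everything else.
pub fn http_status_for(service: &str, category: &str, code: u16) -> u16 {
    match (service, category, code) {
        // Invalid credentials and expired tokens: unauthenticated.
        ("AUTH", "USR", 1001) | ("AUTH", "USR", 1002) => 401,
        // Insufficient permissions: authenticated but not allowed.
        ("AUTH", "USR", 1003) => 403,
        // Rate limiting, whichever service reports it.
        ("AUTH", _, 1004) | ("API", _, 4003) => 429,
        // Missing resources and tenants.
        ("API", "USR", 4002) | ("DB", "USR", 2002) => 404,
        // Constraint violations read naturally as conflicts.
        ("DB", "USR", 2003) => 409,
        // Service unavailable.
        ("API", "SYS", 4004) => 503,
        // Category-based defaults, matching the mapping in the ADR.
        (_, "USR", _) => 400,
        (_, "EXT", _) => 502,
        _ => 500,
    }
}
```

The specific arms come first so that the category defaults at the bottom only apply to codes without a dedicated status.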

Error Context Extraction

// src/middleware/error_context.rs
use std::future::{ready, Ready};

use actix_web::dev::{forward_ready, Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::{Error, HttpMessage};
use futures_util::future::LocalBoxFuture;

use crate::errors::ErrorContext;

pub struct ErrorContextMiddleware;

impl<S, B> Transform<S, ServiceRequest> for ErrorContextMiddleware
where
    S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
    S::Future: 'static,
    B: 'static,
{
    type Response = ServiceResponse<B>;
    type Error = Error;
    type Transform = ErrorContextMiddlewareService<S>;
    type InitError = ();
    type Future = Ready<Result<Self::Transform, Self::InitError>>;

    fn new_transform(&self, service: S) -> Self::Future {
        ready(Ok(ErrorContextMiddlewareService { service }))
    }
}

pub struct ErrorContextMiddlewareService<S> {
    service: S,
}

impl<S, B> Service<ServiceRequest> for ErrorContextMiddlewareService<S>
where
    S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
    S::Future: 'static,
    B: 'static,
{
    type Response = ServiceResponse<B>;
    type Error = Error;
    type Future = LocalBoxFuture<'static, Result<Self::Response, Self::Error>>;

    // Delegates readiness checks to the wrapped service.
    forward_ready!(service);

    fn call(&self, req: ServiceRequest) -> Self::Future {
        // Extract context
        let request_id = uuid::Uuid::new_v4().to_string();
        let user_id = req.headers()
            .get("X-User-ID")
            .and_then(|h| h.to_str().ok())
            .map(|s| s.to_string());
        let tenant_id = req.headers()
            .get("X-Tenant-ID")
            .and_then(|h| h.to_str().ok())
            .map(|s| s.to_string());
        let user_agent = req.headers()
            .get("User-Agent")
            .and_then(|h| h.to_str().ok())
            .map(|s| s.to_string());
        let ip = req.connection_info().realip_remote_addr().map(str::to_owned);

        // Build the context before borrowing the extensions mutably:
        // `connection_info()` may itself need access to the request extensions.
        let context = ErrorContext {
            user_id,
            tenant_id,
            workspace_id: None,
            request_id: Some(request_id),
            action: Some(format!("{} {}", req.method(), req.path())),
            metadata: serde_json::json!({
                "ip": ip,
                "user_agent": user_agent,
            }),
        };

        // Store in request extensions
        req.extensions_mut().insert(context);

        let fut = self.service.call(req);

        Box::pin(async move {
            let res = fut.await?;
            Ok(res)
        })
    }
}
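
The header-reading part of the middleware can be unit-tested without an actix runtime if it is factored into a plain function. The following is a sketch under that assumption: headers are modeled as a `HashMap`, and `ExtractedContext` and `extract_context` are hypothetical names. Treating blank header values as missing is also an assumption, not something this ADR specifies.

```rust
use std::collections::HashMap;

/// Hypothetical, framework-free subset of the ADR's `ErrorContext`.
#[derive(Debug, PartialEq, Eq)]
pub struct ExtractedContext {
    pub user_id: Option<String>,
    pub tenant_id: Option<String>,
    pub action: String,
}

/// Case-insensitive header lookup; blank values count as absent.
fn header<'a>(headers: &'a HashMap<String, String>, name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(name))
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
}

/// Mirrors the extraction step of the middleware: user and tenant IDs come
/// from headers, and the action is "<METHOD> <path>".
pub fn extract_context(
    method: &str,
    path: &str,
    headers: &HashMap<String, String>,
) -> ExtractedContext {
    ExtractedContext {
        user_id: header(headers, "X-User-ID").map(str::to_owned),
        tenant_id: header(headers, "X-Tenant-ID").map(str::to_owned),
        action: format!("{} {}", method, path),
    }
}
```

The middleware's `call` could then delegate to such a function after copying the relevant headers, keeping the actix-specific code thin.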

Error Handler Trait

// src/errors/handler.rs
use async_trait::async_trait;

use crate::errors::CoditechError;

#[async_trait]
pub trait ErrorHandler: Send + Sync {
    /// `Ok(())` means the error was resolved; `Err` returns the (possibly
    /// modified) error so the next handler can try.
    async fn handle(&self, error: CoditechError) -> Result<(), CoditechError>;
    fn can_handle(&self, error: &CoditechError) -> bool;
}

#[derive(Default)]
pub struct ErrorHandlerChain {
    handlers: Vec<Box<dyn ErrorHandler>>,
}

impl ErrorHandlerChain {
    pub fn new() -> Self {
        ErrorHandlerChain {
            handlers: Vec::new(),
        }
    }

    pub fn add_handler(mut self, handler: Box<dyn ErrorHandler>) -> Self {
        self.handlers.push(handler);
        self
    }

    /// Runs the handlers in insertion order and stops at the first success.
    /// The error is returned in both cases, so callers cannot tell from the
    /// return value alone whether a handler resolved it.
    pub async fn handle(&self, mut error: CoditechError) -> CoditechError {
        for handler in &self.handlers {
            if handler.can_handle(&error) {
                match handler.handle(error.clone()).await {
                    Ok(_) => return error, // Handled successfully
                    Err(e) => error = e,   // Continue with modified error
                }
            }
        }
        error
    }
}
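
The chain's control flow (skip handlers that decline, stop at the first success, pass a failed handler's modified error to the next one) can be illustrated without async or actix. The sketch below is a synchronous stand-in, not the ADR's implementation: `SimpleError`, `SimpleHandler`, `ChainOutcome`, `run_chain`, `Annotate`, and `ResolveSys` are all hypothetical. Unlike `ErrorHandlerChain::handle`, it returns an enum so the caller can see whether any handler succeeded.

```rust
/// Simplified stand-in for `CoditechError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SimpleError {
    pub code: String,
    pub note: String,
}

/// Synchronous stand-in for the async `ErrorHandler` trait.
pub trait SimpleHandler {
    fn can_handle(&self, error: &SimpleError) -> bool;
    fn handle(&self, error: SimpleError) -> Result<(), SimpleError>;
}

/// Makes the "was it handled?" answer explicit.
#[derive(Debug, PartialEq, Eq)]
pub enum ChainOutcome {
    Handled(SimpleError),
    Unhandled(SimpleError),
}

/// Same loop shape as the ADR's chain: first success wins, and a failing
/// handler's returned error replaces the current one.
pub fn run_chain(handlers: &[Box<dyn SimpleHandler>], mut error: SimpleError) -> ChainOutcome {
    for handler in handlers {
        if handler.can_handle(&error) {
            match handler.handle(error.clone()) {
                Ok(()) => return ChainOutcome::Handled(error),
                Err(modified) => error = modified,
            }
        }
    }
    ChainOutcome::Unhandled(error)
}

/// Example handler that never resolves anything but annotates the error.
pub struct Annotate;

impl SimpleHandler for Annotate {
    fn can_handle(&self, _error: &SimpleError) -> bool {
        true
    }

    fn handle(&self, mut error: SimpleError) -> Result<(), SimpleError> {
        error.note.push_str(" [annotated]");
        Err(error)
    }
}

/// Example handler that resolves only SYS-category errors.
pub struct ResolveSys;

impl SimpleHandler for ResolveSys {
    fn can_handle(&self, error: &SimpleError) -> bool {
        error.code.contains("-SYS-")
    }

    fn handle(&self, _error: SimpleError) -> Result<(), SimpleError> {
        Ok(())
    }
}
```

With `Annotate` placed before `ResolveSys`, a SYS error comes back as `Handled` carrying the annotation, while a USR error falls through both handlers and comes back as `Unhandled`.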

Specific Error Handlers

// src/errors/handlers/retry_handler.rs
use std::time::Duration;

use async_trait::async_trait;

use crate::errors::handler::ErrorHandler;
use crate::errors::CoditechError;

pub struct RetryHandler {
    max_retries: u32,
}

#[async_trait]
impl ErrorHandler for RetryHandler {
    async fn handle(&self, error: CoditechError) -> Result<(), CoditechError> {
        if let Some(recovery) = &error.recovery {
            if recovery.auto_retry {
                for attempt in 1..=self.max_retries {
                    // Exponential backoff: 1s, 2s, 4s, ... (exponent capped to avoid overflow)
                    let delay_secs = 1u64 << (attempt - 1).min(16);
                    tokio::time::sleep(Duration::from_secs(delay_secs)).await;

                    // Attempt recovery. `attempt_recovery` is not specified in
                    // this ADR; it is assumed to re-run the failed operation.
                    match self.attempt_recovery(&error).await {
                        Ok(_) => return Ok(()),
                        Err(_) if attempt < self.max_retries => continue,
                        Err(e) => return Err(e),
                    }
                }
            }
        }
        Err(error)
    }

    fn can_handle(&self, error: &CoditechError) -> bool {
        matches!(error.code.category, "SYS") && error.recovery.is_some()
    }
}

// src/errors/handlers/fallback_handler.rs
use std::collections::HashMap;

// `Default` is needed by `FallbackHandler::default()` in the integration tests.
#[derive(Default)]
pub struct FallbackHandler {
    fallback_services: HashMap<String, String>,
}

#[async_trait]
impl ErrorHandler for FallbackHandler {
    async fn handle(&self, error: CoditechError) -> Result<(), CoditechError> {
        // Note: "AI" is not among the service identifiers listed for `ErrorCode`.
        if error.code.service == "AI" {
            // Switch to fallback AI provider
            if let Some(_fallback) = self.fallback_services.get("AI") {
                // Implement fallback logic
                return Ok(());
            }
        }
        Err(error)
    }

    fn can_handle(&self, error: &CoditechError) -> bool {
        self.fallback_services.contains_key(error.code.service)
    }
}
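
The implementation checklist calls for a retry handler with exponential backoff. One way to compute such delays is sketched below, using only the standard library. `backoff_delay` and `delay_with_hint` are hypothetical helpers; doubling per attempt, capping at a maximum, and treating the server's `retry_after` value as a lower bound are all assumptions rather than requirements stated in this ADR.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1),
/// capped at `max`. Attempt 0 means "no wait".
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    if attempt == 0 {
        return Duration::ZERO;
    }
    // Saturate the multiplier instead of overflowing for large attempts.
    let factor = 2u32.checked_pow(attempt - 1).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Combines the computed backoff with an optional `retry_after` hint
/// (in seconds), never waiting less than the hint asks for.
pub fn delay_with_hint(
    attempt: u32,
    retry_after_secs: Option<u64>,
    base: Duration,
    max: Duration,
) -> Duration {
    let backoff = backoff_delay(attempt, base, max);
    match retry_after_secs {
        Some(secs) => backoff.max(Duration::from_secs(secs)),
        None => backoff,
    }
}
```

With a one-second base and a 30-second cap, attempts 1 through 5 wait 1, 2, 4, 8, and 16 seconds, and every later attempt waits 30 seconds.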

Error Logging and Monitoring

// src/errors/logging.rs
use lazy_static::lazy_static;
use prometheus::{register_int_counter_vec, IntCounterVec};

use crate::errors::CoditechError;
use crate::logging::{LogEntry, LogLevel, Logger};

lazy_static! {
    // Backs the `coditect_errors_total` series shown under Monitoring Integration.
    static ref ERROR_COUNTER: IntCounterVec = register_int_counter_vec!(
        "coditect_errors_total",
        "Total errors by service, category, and code",
        &["service", "category", "code"]
    )
    .unwrap();
}

pub struct ErrorLogger {
    logger: Logger,
}

impl ErrorLogger {
    pub async fn log_error(&self, error: &CoditechError, level: LogLevel) {
        let entry = LogEntry {
            timestamp: error.timestamp,
            level,
            component: error.code.service,
            action: error.context.action.clone().unwrap_or_default(),
            user_id: error.context.user_id.clone(),
            details: serde_json::json!({
                "error_id": error.id,
                "code": error.code.to_string(),
                "message": error.message,
                "details": error.details,
                "context": error.context,
                "recovery": error.recovery,
                "caused_by": error.caused_by,
            }),
        };

        self.logger.log(entry).await;

        // Send to monitoring
        self.send_to_monitoring(error).await;
    }

    async fn send_to_monitoring(&self, error: &CoditechError) {
        // Prometheus metrics
        ERROR_COUNTER
            .with_label_values(&[
                error.code.service,
                error.code.category,
                &error.code.code.to_string(),
            ])
            .inc();

        // Alert on every system-level (SYS) error.
        // `send_alert` is assumed to be implemented elsewhere; it is not
        // shown in this ADR.
        if error.code.category == "SYS" {
            self.send_alert(error).await;
        }
    }
}

Client-Side Error Handling

// src/errors/client-handler.ts
// NotificationManager and router are assumed to be provided by the host
// application. getErrorTitle and getFailedOperation are referenced below but
// are not shown in this ADR.

export interface ErrorResponse {
  error_id: string;
  code: string;
  message: string;
  recovery?: RecoveryInfo;
  timestamp: string;
}

export interface RecoveryInfo {
  actions: RecoveryAction[];
  auto_retry: boolean;
  retry_after?: number; // seconds
}

export interface RecoveryAction {
  label: string;
  action: string;
  description?: string;
}

// Promise-based delay used before automatic retries.
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

export class ErrorHandler {
  private errorStore = new Map<string, ErrorResponse>();

  async handleError(error: ErrorResponse) {
    // Store error for debugging
    this.errorStore.set(error.error_id, error);

    // Log to console in development
    if (process.env.NODE_ENV === 'development') {
      console.error('Error:', error);
    }

    // Show user-friendly notification
    this.showErrorNotification(error);

    // Execute recovery if available
    if (error.recovery?.auto_retry) {
      await this.attemptAutoRecovery(error);
    }
  }

  private showErrorNotification(error: ErrorResponse) {
    const notification = {
      id: error.error_id,
      type: 'error',
      title: this.getErrorTitle(error.code),
      message: error.message,
      actions: error.recovery?.actions.map(action => ({
        label: action.label,
        onClick: () => this.executeAction(action.action),
      })),
    };

    NotificationManager.show(notification);
  }

  private async attemptAutoRecovery(error: ErrorResponse) {
    if (error.recovery?.retry_after) {
      await sleep(error.recovery.retry_after * 1000);
    }

    // Retry the failed operation
    const operation = this.getFailedOperation(error.error_id);
    if (operation) {
      try {
        await operation.retry();
      } catch (retryError) {
        // Log retry failure
        console.error('Retry failed:', retryError);
      }
    }
  }

  private executeAction(action: string) {
    switch (action) {
      case 'upgrade_plan':
        router.push('/settings/billing');
        break;
      case 'free_up_space':
        router.push('/settings/storage');
        break;
      case 'retry_operation':
        // Reloads the whole page rather than retrying the single operation
        window.location.reload();
        break;
      default:
        console.warn('Unknown action:', action);
    }
  }
}

WASM Error Handling

// src/wasm/errors.rs
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct WasmError {
    code: String,
    message: String,
    recovery_actions: Vec<JsValue>,
}

#[wasm_bindgen]
impl WasmError {
    #[wasm_bindgen(constructor)]
    pub fn new(code: String, message: String) -> Self {
        WasmError {
            code,
            message,
            recovery_actions: Vec::new(),
        }
    }

    pub fn add_recovery_action(&mut self, label: String, action: String) {
        let action_obj = js_sys::Object::new();
        js_sys::Reflect::set(&action_obj, &"label".into(), &label.into()).unwrap();
        js_sys::Reflect::set(&action_obj, &"action".into(), &action.into()).unwrap();
        self.recovery_actions.push(action_obj.into());
    }

    pub fn to_js_error(&self) -> JsValue {
        let error = js_sys::Object::new();
        js_sys::Reflect::set(&error, &"code".into(), &self.code.clone().into()).unwrap();
        js_sys::Reflect::set(&error, &"message".into(), &self.message.clone().into()).unwrap();

        let actions = js_sys::Array::new();
        for action in &self.recovery_actions {
            actions.push(action);
        }
        js_sys::Reflect::set(&error, &"recovery".into(), &actions.into()).unwrap();

        error.into()
    }
}

// Panic hook for WASM
pub fn set_panic_hook() {
    std::panic::set_hook(Box::new(|panic_info| {
        let msg = panic_info.to_string();
        web_sys::console::error_1(&format!("WASM Panic: {}", msg).into());

        // Build the user-facing error
        let error = WasmError::new(
            "WASM-SYS-9999".to_string(),
            "An unexpected error occurred. Please refresh the page.".to_string(),
        );

        // Dispatch error event. Failures are ignored rather than unwrapped,
        // because a second panic inside the panic hook would abort.
        if let Some(window) = web_sys::window() {
            let mut init = web_sys::CustomEventInit::new();
            init.detail(&error.to_js_error());
            if let Ok(event) =
                web_sys::CustomEvent::new_with_event_init_dict("wasm-error", &init)
            {
                let _ = window.dispatch_event(&event);
            }
        }
    }));
}

Integration Tests

#[cfg(test)]
mod tests {
    use super::*;

    // No async work happens here, so a plain `#[test]` is enough.
    #[test]
    fn test_error_creation_and_serialization() {
        let error = CoditechError::new(
            codes::AUTH_INVALID_CREDENTIALS,
            "Invalid email or password",
        )
        .with_context(ErrorContext {
            user_id: Some("user123".to_string()),
            tenant_id: Some("tenant456".to_string()),
            ..Default::default()
        })
        .with_recovery(RecoveryInfo {
            actions: vec![
                RecoveryAction {
                    label: "Reset Password".to_string(),
                    action: "reset_password".to_string(),
                    description: Some("Reset your password via email".to_string()),
                },
            ],
            auto_retry: false,
            retry_after: None,
        });

        let json = serde_json::to_string(&error).unwrap();
        let deserialized: CoditechError = serde_json::from_str(&json).unwrap();

        assert_eq!(error.id, deserialized.id);
        assert_eq!(error.code.to_string(), "AUTH-USR-1001");
    }

    #[tokio::test]
    async fn test_error_handler_chain() {
        let chain = ErrorHandlerChain::new()
            .add_handler(Box::new(RetryHandler { max_retries: 3 }))
            .add_handler(Box::new(FallbackHandler::default()));

        let error = CoditechError::new(
            codes::API_SERVICE_UNAVAILABLE,
            "Service temporarily unavailable",
        )
        .with_recovery(RecoveryInfo {
            actions: vec![],
            auto_retry: true,
            retry_after: Some(1),
        });

        let _result = chain.handle(error).await;
        // Verify retry was attempted. The assertion is not yet specified:
        // the chain returns the error whether or not a handler resolved it.
    }

    #[test]
    fn test_error_code_formatting() {
        let code = codes::DB_CONNECTION_FAILED;
        assert_eq!(code.to_string(), "DB-SYS-2001");
    }
}

Performance Considerations

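// Error handling performance optimizations
use std::collections::HashMap;

use lazy_static::lazy_static;

use crate::errors::{CoditechError, ErrorCode, ErrorContext};

lazy_static! {
    // Cache common error messages
    static ref ERROR_MESSAGES: HashMap<String, String> = {
        let mut m = HashMap::new();
        m.insert("AUTH-USR-1001".to_string(), "Invalid email or password".to_string());
        m.insert("DB-SYS-2001".to_string(), "Database connection failed".to_string());
        // ... more cached messages
        m
    };

    // Pre-compile error response templates
    static ref ERROR_TEMPLATES: HashMap<String, serde_json::Value> = {
        let mut m = HashMap::new();
        // ... cached templates
        m
    };
}

// Implement error pooling to reduce allocations
#[derive(Default)]
pub struct ErrorPool {
    pool: Vec<CoditechError>,
}

impl ErrorPool {
    /// Reuses a pooled error when one is available, otherwise allocates.
    /// `CoditechError` has no `Default` impl, so the code and message are
    /// passed in here.
    pub fn acquire(&mut self, code: ErrorCode, message: &str) -> CoditechError {
        match self.pool.pop() {
            Some(mut error) => {
                error.id = uuid::Uuid::new_v4().to_string();
                error.code = code;
                error.message.push_str(message); // buffer was cleared on release
                error.timestamp = chrono::Utc::now();
                error
            }
            None => CoditechError::new(code, message),
        }
    }

    pub fn release(&mut self, mut error: CoditechError) {
        // Reset error state, keeping the message buffer's capacity
        error.message.clear();
        error.details = None;
        error.context = ErrorContext::default();
        error.recovery = None;
        error.caused_by = None;
        self.pool.push(error);
    }
}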
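
Pooling only reduces allocations if released values keep their buffers and the pool itself stays bounded. The following is a generic, standard-library-only sketch of that idea. `Reset`, `Pool`, and `PooledError` are hypothetical stand-ins rather than types from this ADR, and the capacity limit is an added assumption.

```rust
/// Types that can be wiped in place and handed out again.
pub trait Reset {
    fn reset(&mut self);
}

/// A bounded free list. Values released beyond `capacity` are dropped.
pub struct Pool<T> {
    free: Vec<T>,
    capacity: usize,
}

impl<T: Default + Reset> Pool<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Pool {
            free: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Hands out a recycled value if one is idle, otherwise a fresh default.
    pub fn acquire(&mut self) -> T {
        self.free.pop().unwrap_or_default()
    }

    /// Resets the value and keeps it for reuse while there is room.
    pub fn release(&mut self, mut value: T) {
        if self.free.len() < self.capacity {
            value.reset();
            self.free.push(value);
        }
    }

    /// Number of values currently waiting to be reused.
    pub fn idle(&self) -> usize {
        self.free.len()
    }
}

/// Minimal stand-in for a pooled error value.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct PooledError {
    pub message: String,
    pub details: Option<String>,
}

impl Reset for PooledError {
    fn reset(&mut self) {
        // `String::clear` empties the text but keeps the allocated buffer.
        self.message.clear();
        self.details = None;
    }
}
```

Whether pooling is worthwhile here would need measuring: error values that carry a fresh UUID string and timestamp allocate on every use regardless of the pool.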

Monitoring Integration

# Prometheus metrics for error tracking
coditect_errors_total{service="API",category="USR",code="4001"} 142
coditect_errors_total{service="DB",category="SYS",code="2001"} 3
coditect_error_recovery_success_total{service="API"} 89
coditect_error_recovery_failure_total{service="API"} 12

# Grafana dashboard queries
- Error rate by service
- Recovery success rate
- Most common errors
- Error trends over time
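
The recovery success rate on such a dashboard is the success counter divided by the sum of the success and failure counters. With the sample values above, 89 successes and 12 failures give 89 / 101, or about 88.1%. A small sketch of that calculation follows; `recovery_success_rate` is a hypothetical helper, and returning `None` when there were no attempts is an assumption about how an empty window should be shown.

```rust
/// Fraction of recovery attempts that succeeded, in the range 0.0..=1.0.
/// Returns None when there were no attempts (or the counts overflow),
/// so callers do not divide by zero.
pub fn recovery_success_rate(successes: u64, failures: u64) -> Option<f64> {
    let total = successes.checked_add(failures)?;
    if total == 0 {
        return None;
    }
    Some(successes as f64 / total as f64)
}
```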

Implementation Checklist

Core Components

  • Error code system with service/category/code structure
  • CoditechError type with context and recovery
  • Error context extraction middleware
  • Error handler chain implementation
  • Retry handler with exponential backoff
  • Fallback handler for service failures
  • Error logging with structured output
  • Client-side error handler (TypeScript)
  • WASM error handling with panic hook

Integration Points

  • API error responses with proper HTTP codes
  • Database error mapping
  • Container error handling
  • Authentication error flows
  • WebSocket error handling
  • Multi-tenant error isolation

Testing

  • Unit tests for error creation/serialization
  • Integration tests for error handler chain
  • Performance benchmarks for error handling
  • Error recovery scenario tests
  • Multi-tenant isolation tests

Monitoring

  • Prometheus metrics configured
  • Grafana dashboards created
  • Alert rules defined
  • Error tracking dashboard
  • Recovery success metrics

Approval Signatures

Technical Sign-off

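| Component        | Owner                 | Approved | Date       |
|------------------|-----------------------|----------|------------|
| Architecture     | SESSION8-ORCHESTRATOR | ✓        | 2025-09-02 |
| Implementation   | Pending               | -        | -          |
| Security Review  | Pending               | -        | -          |
| Performance Test | Pending               | -        | -          |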

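Review History

| Version | Date       | Reviewer              | Status   | Comments                       |
|---------|------------|-----------------------|----------|--------------------------------|
| 1.0.0   | 2025-09-02 | SESSION8-ORCHESTRATOR | DRAFT    | Initial creation               |
| 1.1.0   | 2025-09-02 | SESSION8-QA-REVIEWER  | REVISION | Added v4.2 compliance elements |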



This error handling system gives all CODITECT components a shared error type that carries request context, recovery options, and monitoring hooks.