ADR-026: Error Handling Architecture - Part 2: Technical Implementation
Document: ADR-026 Error Handling Architecture - Part 2 Technical
Version: 1.1.0
Purpose: Complete technical specification for error handling
Audience: Engineers implementing error handling across all components
Date Created: 2025-09-02
Date Modified: 2025-09-02
Status: DRAFT
Table of Contents
- Core Error System Implementation
- Error Code Structure
- Core Error Type
- Error Context Extraction
- Error Handler Trait
- Specific Error Handlers
- Error Logging and Monitoring
- Client-Side Error Handling
- WASM Error Handling
- Integration Tests
- Performance Considerations
- Monitoring Integration
- Implementation Checklist
- Approval Signatures
Core Error System Implementation
Error Code Structure
// src/errors/error_code.rs
use std::fmt;

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ErrorCode {
    /// Service identifier (API, AUTH, DB, CNTR, WS)
    pub service: &'static str,
    /// Category (USR, SYS, EXT)
    pub category: &'static str,
    /// Specific error number
    pub code: u16,
}

impl ErrorCode {
    pub const fn new(service: &'static str, category: &'static str, code: u16) -> Self {
        ErrorCode { service, category, code }
    }
}

// Renders as `SERVICE-CATEGORY-NNNN` (e.g. `AUTH-USR-1001`). Implementing `Display`
// instead of an inherent `to_string` keeps `code.to_string()` working via `ToString`.
impl fmt::Display for ErrorCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}-{}-{:04}", self.service, self.category, self.code)
    }
}
// Predefined error codes
pub mod codes {
    use super::ErrorCode;

    // Authentication errors
    pub const AUTH_INVALID_CREDENTIALS: ErrorCode = ErrorCode::new("AUTH", "USR", 1001);
    pub const AUTH_TOKEN_EXPIRED: ErrorCode = ErrorCode::new("AUTH", "USR", 1002);
    pub const AUTH_INSUFFICIENT_PERMISSIONS: ErrorCode = ErrorCode::new("AUTH", "USR", 1003);
    pub const AUTH_RATE_LIMITED: ErrorCode = ErrorCode::new("AUTH", "SYS", 1004);

    // Database errors
    pub const DB_CONNECTION_FAILED: ErrorCode = ErrorCode::new("DB", "SYS", 2001);
    pub const DB_TENANT_NOT_FOUND: ErrorCode = ErrorCode::new("DB", "USR", 2002);
    pub const DB_CONSTRAINT_VIOLATION: ErrorCode = ErrorCode::new("DB", "USR", 2003);
    pub const DB_TRANSACTION_CONFLICT: ErrorCode = ErrorCode::new("DB", "SYS", 2004);

    // Container errors
    pub const CNTR_CREATION_FAILED: ErrorCode = ErrorCode::new("CNTR", "SYS", 3001);
    pub const CNTR_RESOURCE_LIMIT: ErrorCode = ErrorCode::new("CNTR", "USR", 3002);
    pub const CNTR_STARTUP_TIMEOUT: ErrorCode = ErrorCode::new("CNTR", "SYS", 3003);
    pub const CNTR_STATE_CORRUPTION: ErrorCode = ErrorCode::new("CNTR", "SYS", 3004);

    // API errors
    pub const API_INVALID_REQUEST: ErrorCode = ErrorCode::new("API", "USR", 4001);
    pub const API_RESOURCE_NOT_FOUND: ErrorCode = ErrorCode::new("API", "USR", 4002);
    pub const API_RATE_LIMITED: ErrorCode = ErrorCode::new("API", "USR", 4003);
    pub const API_SERVICE_UNAVAILABLE: ErrorCode = ErrorCode::new("API", "SYS", 4004);
}
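The `SERVICE-CATEGORY-NNNN` string form can also be parsed back into its parts, for example when a client or log processor needs to group errors by service. The following is a minimal sketch using only the standard library; `ParsedCode` and `parse_error_code` are hypothetical names that do not appear in this ADR.

```rust
/// Owned counterpart of `ErrorCode` for codes parsed from text (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedCode {
    pub service: String,
    pub category: String,
    pub code: u16,
}

/// Parses a string such as "AUTH-USR-1001". Returns `None` when the input does not
/// have exactly three dash-separated parts, uses an unknown category, or has a
/// non-numeric code.
pub fn parse_error_code(s: &str) -> Option<ParsedCode> {
    let mut parts = s.split('-');
    let service = parts.next()?;
    let category = parts.next()?;
    let number = parts.next()?;
    if parts.next().is_some() || service.is_empty() {
        return None;
    }
    // Categories listed in the ErrorCode definition: USR, SYS, EXT.
    if !matches!(category, "USR" | "SYS" | "EXT") {
        return None;
    }
    // Accept digits only; `u16::from_str` alone would also accept a leading '+'.
    if number.is_empty() || !number.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let code = number.parse::<u16>().ok()?;
    Some(ParsedCode {
        service: service.to_string(),
        category: category.to_string(),
        code,
    })
}
```

Formatting a parsed value with `{}-{}-{:04}` yields the original string again, so the two directions round-trip for well-formed codes.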
Core Error Type
// src/errors/mod.rs
use std::fmt;

use actix_web::{HttpResponse, ResponseError};
use anyhow::Error as AnyhowError;
use serde::{Deserialize, Serialize};

// Assumed module layout, inferred from the file paths used in this document.
pub mod error_code;
pub use error_code::{codes, ErrorCode};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CoditechError {
    /// Unique error instance ID
    pub id: String,
    /// Error code
    pub code: ErrorCode,
    /// User-friendly message
    pub message: String,
    /// Technical details (not shown to users)
    pub details: Option<String>,
    /// Additional context
    pub context: ErrorContext,
    /// Recovery suggestions
    pub recovery: Option<RecoveryInfo>,
    /// Related error IDs
    pub caused_by: Option<Vec<String>>,
    /// Timestamp
    pub timestamp: chrono::DateTime<chrono::Utc>,
}

// `Default` is required by `ErrorContext::default()` in `CoditechError::new`
// and by `..Default::default()` in the tests.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct ErrorContext {
    pub user_id: Option<String>,
    pub tenant_id: Option<String>,
    pub workspace_id: Option<String>,
    pub request_id: Option<String>,
    pub action: Option<String>,
    pub metadata: serde_json::Value,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RecoveryInfo {
    pub actions: Vec<RecoveryAction>,
    pub auto_retry: bool,
    /// Delay before retrying, in seconds
    pub retry_after: Option<u64>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RecoveryAction {
    pub label: String,
    pub action: String,
    pub description: Option<String>,
}
impl CoditechError {
    pub fn new(code: ErrorCode, message: impl Into<String>) -> Self {
        CoditechError {
            id: uuid::Uuid::new_v4().to_string(),
            code,
            message: message.into(),
            details: None,
            context: ErrorContext::default(),
            recovery: None,
            caused_by: None,
            timestamp: chrono::Utc::now(),
        }
    }

    pub fn with_details(mut self, details: impl Into<String>) -> Self {
        self.details = Some(details.into());
        self
    }

    pub fn with_context(mut self, context: ErrorContext) -> Self {
        self.context = context;
        self
    }

    pub fn with_recovery(mut self, recovery: RecoveryInfo) -> Self {
        self.recovery = Some(recovery);
        self
    }

    pub fn with_cause(mut self, cause_id: String) -> Self {
        self.caused_by.get_or_insert(Vec::new()).push(cause_id);
        self
    }
}
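// `ResponseError` requires `Display`; render the code and the user-facing message.
impl fmt::Display for CoditechError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let code = self.code.to_string();
        write!(f, "[{}] {}", code, self.message)
    }
}

impl ResponseError for CoditechError {
    // Map the error category to an HTTP status
    fn status_code(&self) -> actix_web::http::StatusCode {
        match self.code.category {
            "USR" => actix_web::http::StatusCode::BAD_REQUEST,
            "SYS" => actix_web::http::StatusCode::INTERNAL_SERVER_ERROR,
            "EXT" => actix_web::http::StatusCode::BAD_GATEWAY,
            _ => actix_web::http::StatusCode::INTERNAL_SERVER_ERROR,
        }
    }

    fn error_response(&self) -> HttpResponse {
        HttpResponse::build(self.status_code()).json(self.to_user_response())
    }
}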
impl CoditechError {
    /// Convert to user-safe response (omits `details`, `context`, and `caused_by`)
    pub fn to_user_response(&self) -> UserErrorResponse {
        UserErrorResponse {
            error_id: self.id.clone(),
            code: self.code.to_string(),
            message: self.message.clone(),
            recovery: self.recovery.clone(),
            timestamp: self.timestamp,
        }
    }
}

#[derive(Serialize)]
pub struct UserErrorResponse {
    pub error_id: String,
    pub code: String,
    pub message: String,
    pub recovery: Option<RecoveryInfo>,
    pub timestamp: chrono::DateTime<chrono::Utc>,
}
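The category-level mapping above returns 400 for every `USR` error. If individual codes should carry more specific statuses, a per-code override table is one option. The sketch below uses plain `u16` status values to stay self-contained; the function name `http_status` and every status choice in it are assumptions for illustration, not decisions recorded in this ADR.

```rust
/// Returns an HTTP status for an error code. Per-code overrides take precedence
/// over the category default. All overrides here are illustrative assumptions.
pub fn http_status(service: &str, category: &str, code: u16) -> u16 {
    match (service, code) {
        ("AUTH", 1001) | ("AUTH", 1002) => 401, // invalid credentials, expired token
        ("AUTH", 1003) => 403,                  // insufficient permissions
        ("AUTH", 1004) | ("API", 4003) => 429,  // rate limited
        ("DB", 2002) | ("API", 4002) => 404,    // tenant or resource not found
        ("DB", 2003) | ("DB", 2004) => 409,     // constraint violation, transaction conflict
        ("API", 4004) => 503,                   // service unavailable
        // Category defaults, matching the mapping in `ResponseError`
        _ => match category {
            "USR" => 400,
            "EXT" => 502,
            _ => 500, // "SYS" and anything unrecognized
        },
    }
}
```

Keeping the category default as the fallback means a newly added code still gets a sensible status before anyone adds an override for it.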
Error Context Extraction
// src/middleware/error_context.rs
use std::future::{ready, Ready};
use std::task::{Context, Poll};

use actix_web::dev::{Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::{Error, HttpMessage};
// Assumes the `futures-util` crate is available for `LocalBoxFuture`.
use futures_util::future::LocalBoxFuture;

use crate::errors::ErrorContext;

pub struct ErrorContextMiddleware;

impl<S, B> Transform<S, ServiceRequest> for ErrorContextMiddleware
where
    S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
    S::Future: 'static,
    B: 'static,
{
    type Response = ServiceResponse<B>;
    type Error = Error;
    type Transform = ErrorContextMiddlewareService<S>;
    type InitError = ();
    type Future = Ready<Result<Self::Transform, Self::InitError>>;

    fn new_transform(&self, service: S) -> Self::Future {
        ready(Ok(ErrorContextMiddlewareService { service }))
    }
}

pub struct ErrorContextMiddlewareService<S> {
    service: S,
}

impl<S, B> Service<ServiceRequest> for ErrorContextMiddlewareService<S>
where
    S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
    S::Future: 'static,
    B: 'static,
{
    type Response = ServiceResponse<B>;
    type Error = Error;
    type Future = LocalBoxFuture<'static, Result<Self::Response, Self::Error>>;

    fn poll_ready(&self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.service.poll_ready(cx)
    }

    fn call(&self, req: ServiceRequest) -> Self::Future {
        // Extract context
        let request_id = uuid::Uuid::new_v4().to_string();
        let user_id = req.headers()
            .get("X-User-ID")
            .and_then(|h| h.to_str().ok())
            .map(|s| s.to_string());
        let tenant_id = req.headers()
            .get("X-Tenant-ID")
            .and_then(|h| h.to_str().ok())
            .map(|s| s.to_string());

        // Build the context before borrowing the extensions mutably:
        // `connection_info()` may itself access the request extensions.
        let context = ErrorContext {
            user_id,
            tenant_id,
            workspace_id: None,
            request_id: Some(request_id),
            action: Some(format!("{} {}", req.method(), req.path())),
            metadata: serde_json::json!({
                "ip": req.connection_info().realip_remote_addr(),
                "user_agent": req.headers().get("User-Agent").and_then(|h| h.to_str().ok()),
            }),
        };

        // Store in request extensions
        req.extensions_mut().insert(context);

        let fut = self.service.call(req);
        Box::pin(async move { fut.await })
    }
}
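The middleware above always generates a fresh request ID. When requests pass through a gateway that already assigns one, reusing it keeps logs correlated across hops. The following framework-free sketch shows that lookup; the `X-Request-ID` header name and the `resolve_request_id` helper are assumptions, and the fallback generator is passed in so the example needs no UUID crate.

```rust
use std::collections::HashMap;

/// Case-insensitive header lookup over a plain map (a stand-in for a real header map).
fn header<'a>(headers: &'a HashMap<String, String>, name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| v.as_str())
}

/// Reuses a non-empty inbound `X-Request-ID` (assumed header name);
/// otherwise calls `generate` to create a new ID.
pub fn resolve_request_id(
    headers: &HashMap<String, String>,
    generate: impl FnOnce() -> String,
) -> String {
    match header(headers, "X-Request-ID").map(str::trim) {
        Some(id) if !id.is_empty() => id.to_string(),
        _ => generate(),
    }
}
```

Inbound IDs are client-controlled, so a real implementation would likely also bound their length and character set before logging them.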
Error Handler Trait
// src/errors/handler.rs
use async_trait::async_trait;

use super::CoditechError;

#[async_trait]
pub trait ErrorHandler: Send + Sync {
    /// Returns `Ok(())` if the error was handled, or `Err` with the
    /// (possibly modified) error to pass on to the next handler.
    async fn handle(&self, error: CoditechError) -> Result<(), CoditechError>;

    fn can_handle(&self, error: &CoditechError) -> bool;
}

pub struct ErrorHandlerChain {
    handlers: Vec<Box<dyn ErrorHandler>>,
}

impl ErrorHandlerChain {
    pub fn new() -> Self {
        ErrorHandlerChain {
            handlers: Vec::new(),
        }
    }

    pub fn add_handler(mut self, handler: Box<dyn ErrorHandler>) -> Self {
        self.handlers.push(handler);
        self
    }

    /// Runs handlers in insertion order and stops at the first one that succeeds.
    pub async fn handle(&self, mut error: CoditechError) -> CoditechError {
        for handler in &self.handlers {
            if handler.can_handle(&error) {
                match handler.handle(error.clone()).await {
                    Ok(_) => return error, // Handled successfully
                    Err(e) => error = e,   // Continue with modified error
                }
            }
        }
        error
    }
}

impl Default for ErrorHandlerChain {
    fn default() -> Self {
        Self::new()
    }
}
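In the chain above, the first handler that reports success ends processing, while a failing handler passes its (possibly modified) error on to the next. The following synchronous, dependency-free sketch reproduces that control flow. `SimpleError`, `Handler`, `run_chain`, and both handlers are simplified, hypothetical stand-ins for the real types.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleError {
    pub category: &'static str,
    pub message: String,
}

pub trait Handler {
    fn can_handle(&self, error: &SimpleError) -> bool;
    /// `Ok(())` means handled; `Err` returns a (possibly modified) error.
    fn handle(&self, error: SimpleError) -> Result<(), SimpleError>;
}

/// Runs handlers in order. Returns the final error and whether any handler succeeded.
pub fn run_chain(handlers: &[Box<dyn Handler>], mut error: SimpleError) -> (SimpleError, bool) {
    for handler in handlers {
        if handler.can_handle(&error) {
            match handler.handle(error.clone()) {
                Ok(()) => return (error, true),
                Err(modified) => error = modified,
            }
        }
    }
    (error, false)
}

/// Hypothetical handler that never recovers but annotates the error.
pub struct Annotate;

impl Handler for Annotate {
    fn can_handle(&self, _error: &SimpleError) -> bool {
        true
    }
    fn handle(&self, mut error: SimpleError) -> Result<(), SimpleError> {
        error.message.push_str(" (retry exhausted)");
        Err(error)
    }
}

/// Hypothetical handler that recovers from any "SYS" error.
pub struct RecoverSys;

impl Handler for RecoverSys {
    fn can_handle(&self, error: &SimpleError) -> bool {
        error.category == "SYS"
    }
    fn handle(&self, _error: SimpleError) -> Result<(), SimpleError> {
        Ok(())
    }
}
```

Unlike `ErrorHandlerChain::handle`, this sketch also returns a `handled` flag. With only the error returned, a caller cannot tell a recovered error from an unrecovered one, so a flag or a dedicated result type may be worth considering.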
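Specific Error Handlers
// src/errors/handlers/retry_handler.rs
use std::time::Duration;

use async_trait::async_trait;

use crate::errors::handler::ErrorHandler;
use crate::errors::CoditechError;

pub struct RetryHandler {
    max_retries: u32,
}

impl RetryHandler {
    /// Placeholder: re-run the failed operation. The actual recovery logic
    /// is not specified in this document, so this stub always fails.
    async fn attempt_recovery(&self, error: &CoditechError) -> Result<(), CoditechError> {
        Err(error.clone())
    }
}

#[async_trait]
impl ErrorHandler for RetryHandler {
    async fn handle(&self, error: CoditechError) -> Result<(), CoditechError> {
        if let Some(recovery) = &error.recovery {
            if recovery.auto_retry {
                for attempt in 1..=self.max_retries {
                    // Exponential backoff: 1s, 2s, 4s, ...
                    let delay_secs = 2u64.saturating_pow(attempt - 1);
                    tokio::time::sleep(Duration::from_secs(delay_secs)).await;

                    // Attempt recovery
                    match self.attempt_recovery(&error).await {
                        Ok(_) => return Ok(()),
                        Err(_) if attempt < self.max_retries => continue,
                        Err(e) => return Err(e),
                    }
                }
            }
        }
        Err(error)
    }

    fn can_handle(&self, error: &CoditechError) -> bool {
        matches!(error.code.category, "SYS") && error.recovery.is_some()
    }
}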
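// src/errors/handlers/fallback_handler.rs
use std::collections::HashMap;

use async_trait::async_trait;

use crate::errors::handler::ErrorHandler;
use crate::errors::CoditechError;

// `Default` is needed by `FallbackHandler::default()` in the integration tests.
#[derive(Default)]
pub struct FallbackHandler {
    /// Maps a service identifier to the name of its fallback service
    fallback_services: HashMap<String, String>,
}

#[async_trait]
impl ErrorHandler for FallbackHandler {
    async fn handle(&self, error: CoditechError) -> Result<(), CoditechError> {
        if error.code.service == "AI" {
            // Switch to fallback AI provider
            if let Some(_fallback) = self.fallback_services.get("AI") {
                // Implement fallback logic
                return Ok(());
            }
        }
        Err(error)
    }

    fn can_handle(&self, error: &CoditechError) -> bool {
        self.fallback_services.contains_key(error.code.service)
    }
}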
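The implementation checklist calls for a retry handler with exponential backoff. One way to compute the per-attempt delay is sketched below, with a cap so that late attempts do not wait unboundedly. The `backoff_delay` helper and any base and cap values passed to it are illustrative assumptions.

```rust
use std::time::Duration;

/// Delay before retry `attempt` (1-based): `base * 2^(attempt - 1)`, capped at `max`.
/// An `attempt` of 0 is treated like the first attempt.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Clamp the exponent so the shift cannot overflow a u32.
    let exponent = attempt.saturating_sub(1).min(31);
    let factor = 1u32 << exponent;
    // `checked_mul` returns None on overflow; fall back to the cap in that case.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 500 ms base and a 30 s cap, attempts 1 through 4 wait 0.5 s, 1 s, 2 s, and 4 s, and attempt 10 is capped at 30 s. Production retry loops often add random jitter on top of this so that many clients do not retry in lockstep.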
Error Logging and Monitoring
// src/errors/logging.rs
use crate::errors::CoditechError;
use crate::logging::{Logger, LogLevel, LogEntry};

pub struct ErrorLogger {
    logger: Logger,
}

impl ErrorLogger {
    pub async fn log_error(&self, error: &CoditechError, level: LogLevel) {
        let entry = LogEntry {
            timestamp: error.timestamp,
            level,
            component: error.code.service,
            action: error.context.action.clone().unwrap_or_default(),
            user_id: error.context.user_id.clone(),
            details: serde_json::json!({
                "error_id": error.id,
                "code": error.code.to_string(),
                "message": error.message,
                "details": error.details,
                "context": error.context,
                "recovery": error.recovery,
                "caused_by": error.caused_by,
            }),
        };
        self.logger.log(entry).await;

        // Send to monitoring
        self.send_to_monitoring(error).await;
    }

    async fn send_to_monitoring(&self, error: &CoditechError) {
        // Prometheus metrics. `ERROR_COUNTER` is assumed to be a labeled counter
        // (service, category, code) registered elsewhere in the crate.
        ERROR_COUNTER
            .with_label_values(&[
                error.code.service,
                error.code.category,
                &error.code.code.to_string(),
            ])
            .inc();

        // Send to external monitoring if critical.
        // `send_alert` is not shown in this document.
        if error.code.category == "SYS" {
            self.send_alert(error).await;
        }
    }
}
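`send_to_monitoring` increments a counter labeled by service, category, and code. The in-memory sketch below mimics that labeling and renders lines in the Prometheus text exposition style, using the `coditect_errors_total` metric name from the Monitoring Integration section. It is a stand-in for illustration and not the API of any metrics crate; `ErrorCounter` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Counts errors per (service, category, code) label set.
#[derive(Default)]
pub struct ErrorCounter {
    counts: BTreeMap<(String, String, u16), u64>,
}

impl ErrorCounter {
    /// Increments the counter for one label combination.
    pub fn inc(&mut self, service: &str, category: &str, code: u16) {
        let key = (service.to_string(), category.to_string(), code);
        *self.counts.entry(key).or_insert(0) += 1;
    }

    /// Renders one line per label set, sorted by label values.
    pub fn render(&self) -> String {
        self.counts
            .iter()
            .map(|((service, category, code), value)| {
                format!(
                    "coditect_errors_total{{service=\"{}\",category=\"{}\",code=\"{}\"}} {}\n",
                    service, category, code, value
                )
            })
            .collect()
    }
}
```

Because the label values come from the predefined codes, the number of distinct series stays bounded, which matters for metric cardinality.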
Client-Side Error Handling
// src/errors/client-handler.ts
// `NotificationManager` and `router` are assumed to be provided by the client
// application; their imports are not shown in this document.

export interface ErrorResponse {
  error_id: string;
  code: string;
  message: string;
  recovery?: RecoveryInfo;
  timestamp: string;
}

export interface RecoveryInfo {
  actions: RecoveryAction[];
  auto_retry: boolean;
  /** Delay before retrying, in seconds */
  retry_after?: number;
}

export interface RecoveryAction {
  label: string;
  action: string;
  description?: string;
}

// Resolves after `ms` milliseconds
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

export class ErrorHandler {
  private errorStore = new Map<string, ErrorResponse>();

  // `getErrorTitle` and `getFailedOperation` are referenced below but are not
  // defined in this document.

  async handleError(error: ErrorResponse) {
    // Store error for debugging
    this.errorStore.set(error.error_id, error);

    // Log to console in development
    if (process.env.NODE_ENV === 'development') {
      console.error('Error:', error);
    }

    // Show user-friendly notification
    this.showErrorNotification(error);

    // Execute recovery if available
    if (error.recovery?.auto_retry) {
      await this.attemptAutoRecovery(error);
    }
  }

  private showErrorNotification(error: ErrorResponse) {
    const notification = {
      id: error.error_id,
      type: 'error',
      title: this.getErrorTitle(error.code),
      message: error.message,
      actions: error.recovery?.actions.map(action => ({
        label: action.label,
        onClick: () => this.executeAction(action.action),
      })),
    };
    NotificationManager.show(notification);
  }

  private async attemptAutoRecovery(error: ErrorResponse) {
    // `retry_after` is in seconds; `sleep` takes milliseconds
    if (error.recovery?.retry_after) {
      await sleep(error.recovery.retry_after * 1000);
    }

    // Retry the failed operation
    const operation = this.getFailedOperation(error.error_id);
    if (operation) {
      try {
        await operation.retry();
      } catch (retryError) {
        // Log retry failure
        console.error('Retry failed:', retryError);
      }
    }
  }

  private executeAction(action: string) {
    switch (action) {
      case 'upgrade_plan':
        router.push('/settings/billing');
        break;
      case 'free_up_space':
        router.push('/settings/storage');
        break;
      case 'retry_operation':
        window.location.reload();
        break;
      default:
        console.warn('Unknown action:', action);
    }
  }
}
WASM Error Handling
// src/wasm/errors.rs
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct WasmError {
    code: String,
    message: String,
    recovery_actions: Vec<JsValue>,
}

#[wasm_bindgen]
impl WasmError {
    #[wasm_bindgen(constructor)]
    pub fn new(code: String, message: String) -> Self {
        WasmError {
            code,
            message,
            recovery_actions: Vec::new(),
        }
    }

    pub fn add_recovery_action(&mut self, label: String, action: String) {
        let action_obj = js_sys::Object::new();
        js_sys::Reflect::set(&action_obj, &"label".into(), &label.into()).unwrap();
        js_sys::Reflect::set(&action_obj, &"action".into(), &action.into()).unwrap();
        self.recovery_actions.push(action_obj.into());
    }

    /// Builds a plain JS object: `{ code, message, recovery: [...] }`
    pub fn to_js_error(&self) -> JsValue {
        let error = js_sys::Object::new();
        js_sys::Reflect::set(&error, &"code".into(), &self.code.clone().into()).unwrap();
        js_sys::Reflect::set(&error, &"message".into(), &self.message.clone().into()).unwrap();

        let actions = js_sys::Array::new();
        for action in &self.recovery_actions {
            actions.push(action);
        }
        js_sys::Reflect::set(&error, &"recovery".into(), &actions.into()).unwrap();
        error.into()
    }
}

// Panic hook for WASM
pub fn set_panic_hook() {
    std::panic::set_hook(Box::new(|panic_info| {
        let msg = panic_info.to_string();
        web_sys::console::error_1(&format!("WASM Panic: {}", msg).into());

        // Build a generic, user-safe error for the page to display
        let error = WasmError::new(
            "WASM-SYS-9999".to_string(),
            "An unexpected error occurred. Please refresh the page.".to_string(),
        );

        // Dispatch error event. Failures are ignored rather than unwrapped,
        // because a second panic inside the hook would abort.
        if let Some(window) = web_sys::window() {
            let mut init = web_sys::CustomEventInit::new();
            init.detail(&error.to_js_error());
            if let Ok(event) =
                web_sys::CustomEvent::new_with_event_init_dict("wasm-error", &init)
            {
                let _ = window.dispatch_event(&event);
            }
        }
    }));
}
Integration Tests
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_error_creation_and_serialization() {
        let error = CoditechError::new(
            codes::AUTH_INVALID_CREDENTIALS,
            "Invalid email or password"
        )
        .with_context(ErrorContext {
            user_id: Some("user123".to_string()),
            tenant_id: Some("tenant456".to_string()),
            ..Default::default()
        })
        .with_recovery(RecoveryInfo {
            actions: vec![
                RecoveryAction {
                    label: "Reset Password".to_string(),
                    action: "reset_password".to_string(),
                    description: Some("Reset your password via email".to_string()),
                },
            ],
            auto_retry: false,
            retry_after: None,
        });

        let json = serde_json::to_string(&error).unwrap();
        let deserialized: CoditechError = serde_json::from_str(&json).unwrap();

        assert_eq!(error.id, deserialized.id);
        assert_eq!(error.code.to_string(), "AUTH-USR-1001");
    }

    #[tokio::test]
    async fn test_error_handler_chain() {
        let chain = ErrorHandlerChain::new()
            .add_handler(Box::new(RetryHandler { max_retries: 3 }))
            .add_handler(Box::new(FallbackHandler::default()));

        let error = CoditechError::new(
            codes::API_SERVICE_UNAVAILABLE,
            "Service temporarily unavailable"
        )
        .with_recovery(RecoveryInfo {
            actions: vec![],
            auto_retry: true,
            retry_after: Some(1),
        });

        let _result = chain.handle(error).await;
        // Verify retry was attempted
    }

    #[test]
    fn test_error_code_formatting() {
        let code = codes::DB_CONNECTION_FAILED;
        assert_eq!(code.to_string(), "DB-SYS-2001");
    }
}
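Performance Considerations
// Error handling performance optimizations
use std::collections::HashMap;

use lazy_static::lazy_static;

lazy_static! {
    // Cache common error messages
    static ref ERROR_MESSAGES: HashMap<String, String> = {
        let mut m = HashMap::new();
        m.insert("AUTH-USR-1001".to_string(), "Invalid email or password".to_string());
        m.insert("DB-SYS-2001".to_string(), "Database connection failed".to_string());
        // ... more cached messages
        m
    };

    // Pre-compile error response templates
    static ref ERROR_TEMPLATES: HashMap<String, serde_json::Value> = {
        let mut m = HashMap::new();
        // ... cached templates
        m
    };
}

// Implement error pooling to reduce allocations
pub struct ErrorPool {
    pool: Vec<CoditechError>,
}

impl ErrorPool {
    /// Reuses a pooled error if one is available, otherwise builds a new one.
    /// `CoditechError` has no `Default` impl, so the code and message are passed in.
    pub fn acquire(&mut self, code: ErrorCode, message: impl Into<String>) -> CoditechError {
        match self.pool.pop() {
            Some(mut error) => {
                // A reused error gets a fresh id, code, message, and timestamp
                error.id = uuid::Uuid::new_v4().to_string();
                error.code = code;
                error.message = message.into();
                error.timestamp = chrono::Utc::now();
                error
            }
            None => CoditechError::new(code, message),
        }
    }

    pub fn release(&mut self, mut error: CoditechError) {
        // Reset per-request state so nothing leaks into the next use
        error.details = None;
        error.context = ErrorContext::default();
        error.recovery = None;
        error.caused_by = None;
        self.pool.push(error);
    }
}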
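For a small, fixed set of messages, a `match` on the code string is an alternative to the lazily initialized `HashMap` above: it needs no initialization step and no heap-allocated keys. The sketch below uses the two messages from the cache above; `default_message` is a hypothetical helper name.

```rust
/// Returns the default user-facing message for a known error code string,
/// or `None` if the code has no cached message.
pub fn default_message(code: &str) -> Option<&'static str> {
    match code {
        "AUTH-USR-1001" => Some("Invalid email or password"),
        "DB-SYS-2001" => Some("Database connection failed"),
        // ... more messages
        _ => None,
    }
}
```

Whether this is measurably faster than the map depends on the number of entries and the call frequency, so it is worth benchmarking before committing to either approach.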
Monitoring Integration
# Prometheus metrics for error tracking
coditect_errors_total{service="API",category="USR",code="4001"} 142
coditect_errors_total{service="DB",category="SYS",code="2001"} 3
coditect_error_recovery_success_total{service="API"} 89
coditect_error_recovery_failure_total{service="API"} 12
# Grafana dashboard queries
- Error rate by service
- Recovery success rate
- Most common errors
- Error trends over time
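The recovery success rate follows directly from the two recovery counters. With the sample values above (89 successes and 12 failures for the API service), the rate is 89 / (89 + 12) ≈ 0.881. The same arithmetic as a small function, where `recovery_success_rate` is a hypothetical helper:

```rust
/// Fraction of recovery attempts that succeeded, or `None` when there
/// were no attempts (or the total overflows).
pub fn recovery_success_rate(successes: u64, failures: u64) -> Option<f64> {
    let total = successes.checked_add(failures)?;
    if total == 0 {
        return None;
    }
    Some(successes as f64 / total as f64)
}
```

Returning `None` for zero attempts avoids reporting a misleading 0% or a NaN on a dashboard for services that have not needed recovery yet.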
Implementation Checklist
Core Components
- Error code system with service/category/code structure
- CoditechError type with context and recovery
- Error context extraction middleware
- Error handler chain implementation
- Retry handler with exponential backoff
- Fallback handler for service failures
- Error logging with structured output
- Client-side error handler (TypeScript)
- WASM error handling with panic hook
Integration Points
- API error responses with proper HTTP codes
- Database error mapping
- Container error handling
- Authentication error flows
- WebSocket error handling
- Multi-tenant error isolation
Testing
- Unit tests for error creation/serialization
- Integration tests for error handler chain
- Performance benchmarks for error handling
- Error recovery scenario tests
- Multi-tenant isolation tests
Monitoring
- Prometheus metrics configured
- Grafana dashboards created
- Alert rules defined
- Error tracking dashboard
- Recovery success metrics
Approval Signatures
Technical Sign-off
| Component | Owner | Approved | Date |
|---|---|---|---|
| Architecture | SESSION8-ORCHESTRATOR | ✓ | 2025-09-02 |
| Implementation | Pending | - | - |
| Security Review | Pending | - | - |
| Performance Test | Pending | - | - |
Review History
| Version | Date | Reviewer | Status | Comments |
|---|---|---|---|---|
| 1.0.0 | 2025-09-02 | SESSION8-ORCHESTRATOR | DRAFT | Initial creation |
| 1.1.0 | 2025-09-02 | SESSION8-QA-REVIEWER | REVISION | Added v4.2 compliance elements |
Together, the pieces specified above give all CODITECT components a shared error model with request context, recovery options, and monitoring.