
ADR-022-v4: LLM-Based Log Reranking - Part 2 (Technical)

Document Specification Block​

Document: ADR-022-v4-llm-log-reranking-part2-technical
Version: 1.0.0
Purpose: Constrain AI implementation with exact technical specifications for LLM log reranking
Audience: AI agents, ML engineers, developers implementing the system
Date Created: 2025-09-01
Date Modified: 2025-09-01
QA Review Date: Pending
Status: DRAFT

Table of Contents​

  1. Constraints
  2. Dependencies
  3. Component Architecture
  4. Data Models
  5. Implementation Patterns
  6. API Specifications
  7. Testing Requirements
  8. Performance Benchmarks
  9. Security Controls
  10. Logging and Error Handling
  11. References
  12. Approval Signatures

1. Constraints​

CONSTRAINT: Response Time​

LLM reranking MUST complete within 5 seconds for any query with up to 10,000 log entries.

CONSTRAINT: Relevance Accuracy​

Top 10 reranked results MUST achieve 95% relevance accuracy based on engineer feedback.

CONSTRAINT: Scalability​

System MUST handle 1TB+ of daily logs across 100+ services without degradation.

CONSTRAINT: Multi-Model Support​

MUST support Claude, GPT-4, Gemini, and local models with fallback capabilities.

CONSTRAINT: Cost Efficiency​

LLM API costs MUST remain under $0.10 per incident query through cost controls such as prompt truncation, result caching, and model selection.


2. Dependencies​

Cargo.toml Dependencies

[dependencies]
# Core async and web
tokio = { version = "1.35", features = ["full"] }
actix-web = "4.4"

# Machine Learning
candle-core = "0.3"
candle-nn = "0.3"
candle-transformers = "0.3"
tokenizers = "0.15"

# Vector embeddings
qdrant-client = "1.7"
faiss = "0.12"

# Text processing
regex = "1.10"
unicode-segmentation = "1.10"
stopwords = "1.0"

# Database
foundationdb = { version = "0.8", features = ["embedded-fdb-include"] }
elasticsearch = "8.5"

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"

# HTTP clients
reqwest = { version = "0.11", features = ["json", "rustls-tls"] }

# Utilities
uuid = { version = "1.6", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
dashmap = "5.5"
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"


3. Component Architecture​

// File: src/log_reranking/service.rs
use std::sync::Arc;
use std::time::Instant;

use anyhow::Result;
use uuid::Uuid;

pub struct LogRerankingService {
    query_processor: Arc<QueryProcessor>,
    vector_store: Arc<VectorStore>,
    llm_client: Arc<LlmClient>,
    rank_engine: Arc<RankingEngine>,
    feedback_tracker: Arc<FeedbackTracker>,
    cache: Arc<RerankingCache>,
}

// File: src/log_reranking/service.rs
impl LogRerankingService {
    pub async fn rerank_logs(
        &self,
        query: &str,
        initial_logs: Vec<LogEntry>,
        context: &IncidentContext,
    ) -> Result<RerankingResult> {
        let start = std::time::Instant::now();

        // 1. Process query for intent
        let processed_query = self.query_processor
            .analyze_intent(query, context)
            .await?;

        // 2. Get semantic embeddings
        let query_embedding = self.vector_store
            .embed_query(&processed_query)
            .await?;

        // 3. Compute initial similarity scores
        let mut candidates = vec![];
        for log in initial_logs {
            let log_embedding = self.vector_store
                .get_or_create_embedding(&log)
                .await?;

            let similarity = cosine_similarity(&query_embedding, &log_embedding);
            candidates.push(RankingCandidate {
                log,
                initial_score: similarity,
                features: vec![],
                llm_score: None,
                final_score: 0.0,
            });
        }

        // 4. Extract contextual features
        self.extract_features(&mut candidates, context).await?;

        // 5. LLM-based reranking
        let reranked = self.llm_client
            .rerank_with_context(&processed_query, candidates)
            .await?;

        // 6. Apply final scoring
        let final_results = self.rank_engine
            .compute_final_scores(reranked)
            .await?;

        // 7. Track for feedback learning
        let session_id = Uuid::new_v4();
        self.feedback_tracker
            .start_session(session_id, query, &final_results)
            .await?;

        Ok(RerankingResult {
            session_id,
            query: query.to_string(),
            results: final_results,
            processing_time_ms: start.elapsed().as_millis() as u64,
            // Required by RerankingResult; accessor names here are assumed
            model_used: self.llm_client.active_model_name(),
            confidence_score: self.rank_engine.confidence_for(&final_results),
        })
    }
}
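The service above calls a `cosine_similarity` helper that this ADR does not define elsewhere. A minimal std-only sketch (function name taken from the call site; the zero-return on dimension mismatch is an assumption, not a specified behavior):

```rust
// Minimal sketch of the cosine_similarity helper referenced above.
// Mismatched or zero-length/zero-norm inputs return 0.0 by convention here.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Identical vectors score ~1.0; orthogonal vectors score ~0.0
    assert!((cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-6);
}
```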


4. Data Models​

Core Models​

// File: src/models/log_reranking.rs
use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LogEntry {
    pub id: String,
    pub timestamp: DateTime<Utc>,
    pub level: LogLevel,
    pub service: String,
    pub message: String,
    pub fields: HashMap<String, serde_json::Value>,
    pub trace_id: Option<String>,
    pub span_id: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IncidentContext {
    pub user_id: Option<String>,
    pub session_id: Option<String>,
    pub service_name: Option<String>,
    pub time_window: TimeWindow,
    pub related_incidents: Vec<String>,
    pub known_symptoms: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankingCandidate {
    pub log: LogEntry,
    pub initial_score: f32,
    pub features: Vec<RankingFeature>,
    pub llm_score: Option<f32>,
    pub final_score: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankingFeature {
    pub name: String,
    pub value: f32,
    pub weight: f32,
    pub description: String,
}

// Feature kinds tracked by the ranking engine; named distinctly so it
// does not collide with the RankingFeature struct above
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum RankingFeatureKind {
    SemanticSimilarity(f32),
    TemporalProximity(f32),
    ServiceRelevance(f32),
    ErrorSeverity(f32),
    CausalLikelihood(f32),
    HistoricalRelevance(f32),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RerankingResult {
    pub session_id: Uuid,
    pub query: String,
    pub results: Vec<RankedLogEntry>,
    pub processing_time_ms: u64,
    pub model_used: String,
    pub confidence_score: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankedLogEntry {
    pub log: LogEntry,
    pub rank: usize,
    pub score: f32,
    pub explanation: String,
    pub causal_chain: Option<Vec<String>>,
}


5. Implementation Patterns​

LLM Client Implementation

// File: src/log_reranking/llm_client.rs
use anyhow::{anyhow, Result};
use reqwest::Client;
use serde_json::json;

pub struct LlmClient {
    client: Client,
    model_config: ModelConfig,
    fallback_models: Vec<ModelConfig>,
}

impl LlmClient {
    pub async fn rerank_with_context(
        &self,
        query: &ProcessedQuery,
        candidates: Vec<RankingCandidate>,
    ) -> Result<Vec<RankingCandidate>> {
        // Prepare prompt with query context
        let prompt = self.build_reranking_prompt(query, &candidates)?;

        // Try primary model first
        match self.query_llm(&self.model_config, &prompt).await {
            Ok(response) => self.parse_reranking_response(response, candidates),
            Err(_) => {
                // Fall back to alternative models in configured order
                for fallback in &self.fallback_models {
                    if let Ok(response) = self.query_llm(fallback, &prompt).await {
                        return self.parse_reranking_response(response, candidates);
                    }
                }
                Err(anyhow!("All LLM models failed"))
            }
        }
    }

    fn build_reranking_prompt(
        &self,
        query: &ProcessedQuery,
        candidates: &[RankingCandidate],
    ) -> Result<String> {
        let mut prompt = format!(
            r#"You are an expert system administrator analyzing logs for incident: "{}"

Context:
- Time window: {} to {}
- Affected services: {}
- Known symptoms: {}

Please rank these {} log entries by relevance to the incident.
Consider: temporal proximity, causal relationships, error severity, and semantic relevance.

Respond with a JSON array of scores (0.0-1.0):

"#,
            query.intent,
            query.time_window.start,
            query.time_window.end,
            query.services.join(", "),
            query.symptoms.join(", "),
            candidates.len()
        );

        // Add log entries with truncated messages. Truncate on char
        // boundaries: byte slicing can panic mid-way through a
        // multi-byte UTF-8 character.
        for (i, candidate) in candidates.iter().enumerate() {
            let truncated_msg = if candidate.log.message.chars().count() > 200 {
                let head: String = candidate.log.message.chars().take(200).collect();
                format!("{}...", head)
            } else {
                candidate.log.message.clone()
            };

            prompt.push_str(&format!(
                "{}. [{}] {}: {}\n",
                i + 1,
                candidate.log.timestamp.format("%H:%M:%S"),
                candidate.log.service,
                truncated_msg
            ));
        }

        prompt.push_str("\nReturn array of scores: [0.95, 0.87, 0.12, ...]");
        Ok(prompt)
    }
}

Vector Store Implementation​

// File: src/log_reranking/vector_store.rs
use qdrant_client::{qdrant, QdrantClient};

pub struct VectorStore {
    client: QdrantClient,
    embedding_model: EmbeddingModel,
    collection_name: String,
}

impl VectorStore {
    pub async fn embed_query(&self, query: &ProcessedQuery) -> Result<Vec<f32>> {
        let combined_text = format!(
            "{} {} {}",
            query.intent,
            query.symptoms.join(" "),
            query.services.join(" ")
        );

        self.embedding_model.encode(&combined_text).await
    }

    pub async fn get_or_create_embedding(&self, log: &LogEntry) -> Result<Vec<f32>> {
        // Check whether an embedding already exists for this log ID.
        // NOTE: this is a lookup by payload filter, not a similarity
        // search; the exact request and response shapes depend on the
        // qdrant-client version, so treat this block as illustrative.
        let search_result = self.client
            .search_points(&qdrant::SearchPoints {
                collection_name: self.collection_name.clone(),
                vector: vec![], // no query vector: filter-only lookup
                filter: Some(qdrant::Filter {
                    must: vec![qdrant::Condition {
                        condition_one_of: Some(
                            qdrant::condition::ConditionOneOf::Field(
                                qdrant::FieldCondition {
                                    key: "log_id".to_string(),
                                    match_: Some(qdrant::Match {
                                        match_value: Some(
                                            qdrant::match_::MatchValue::Keyword(log.id.clone())
                                        ),
                                    }),
                                    ..Default::default()
                                }
                            )
                        ),
                    }],
                    ..Default::default()
                }),
                limit: 1,
                ..Default::default()
            })
            .await?;

        if let Some(point) = search_result.result.first() {
            // Extract the stored vector; the accessor path varies across
            // qdrant-client versions
            Ok(point.vectors.as_ref().unwrap().vectors_count.as_ref().unwrap().data.clone())
        } else {
            // Generate a new embedding and persist it for future queries
            let text = format!("{} {}", log.service, log.message);
            let embedding = self.embedding_model.encode(&text).await?;

            self.store_embedding(log, &embedding).await?;

            Ok(embedding)
        }
    }
}

Ranking Engine​

// File: src/log_reranking/ranking_engine.rs
use std::collections::HashMap;

use chrono::{DateTime, Utc};

pub struct RankingEngine {
    feature_weights: HashMap<String, f32>,
    temporal_decay: f32,
    service_graph: ServiceGraph,
}

impl RankingEngine {
    pub async fn compute_final_scores(
        &self,
        mut candidates: Vec<RankingCandidate>,
    ) -> Result<Vec<RankedLogEntry>> {
        let mut ranked_entries = vec![];

        for candidate in candidates.iter_mut() {
            // Combine all scoring factors
            let mut final_score = 0.0;

            // LLM score (primary, 60% weight)
            if let Some(llm_score) = candidate.llm_score {
                final_score += llm_score * 0.6;
            }

            // Feature-based scores. Temporal proximity is computed during
            // extract_features(), where the incident time window is in
            // scope, and arrives here as one of these features.
            for feature in &candidate.features {
                let weight = self.feature_weights
                    .get(&feature.name)
                    .unwrap_or(&0.1);
                final_score += feature.value * weight;
            }

            candidate.final_score = final_score;

            ranked_entries.push(RankedLogEntry {
                log: candidate.log.clone(),
                rank: 0, // assigned after sorting below
                score: final_score,
                explanation: self.generate_explanation(candidate)?,
                causal_chain: self.detect_causal_chain(candidate).await?,
            });
        }

        // Sort by final score, then assign ranks; assigning ranks before
        // sorting would record the input order, not the final order
        ranked_entries.sort_by(|a, b|
            b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal)
        );
        for (i, entry) in ranked_entries.iter_mut().enumerate() {
            entry.rank = i + 1;
        }

        Ok(ranked_entries)
    }

    /// Exponential decay toward the incident window start; used by
    /// feature extraction to produce the temporal proximity feature.
    fn calculate_temporal_factor(
        &self,
        log_time: &DateTime<Utc>,
        incident_window: &TimeWindow,
    ) -> f32 {
        let time_diff = (*log_time - incident_window.start).num_seconds().abs() as f32;

        // Closer to incident time = higher relevance
        (-time_diff * self.temporal_decay).exp()
    }
}
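To make the decay shape of `calculate_temporal_factor` concrete: with an assumed `temporal_decay` of 0.001 per second (this ADR does not fix the value), a log 60 s from the window start retains about 94% of its relevance while a log an hour away keeps under 3%. A std-only sketch:

```rust
// Worked example of the exponential temporal decay above. The decay
// constant 0.001/s is an illustrative assumption, not a tuned value.
fn temporal_factor(seconds_from_start: f32, temporal_decay: f32) -> f32 {
    (-seconds_from_start.abs() * temporal_decay).exp()
}

fn main() {
    let decay = 0.001;
    let near = temporal_factor(60.0, decay);  // e^-0.06, ~0.94
    let far = temporal_factor(3600.0, decay); // e^-3.6, ~0.03
    assert!(near > 0.9 && near < 1.0);
    assert!(far < 0.05);
    println!("60s: {:.3}, 3600s: {:.3}", near, far);
}
```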


6. API Specifications​

Log Reranking Endpoints​

// File: src/api/handlers/log_reranking.rs
use std::sync::Arc;

use actix_web::{post, web, HttpResponse, Result};
use serde_json::json;
use uuid::Uuid;

#[post("/api/v1/logs/rerank")]
pub async fn rerank_logs(
    request: web::Json<RerankingRequest>,
    service: web::Data<Arc<LogRerankingService>>,
    claims: Claims,
) -> Result<HttpResponse> {
    // Validate permissions
    if !claims.has_permission("logs:analyze") {
        return Ok(HttpResponse::Forbidden().json(json!({
            "error": "insufficient_permissions"
        })));
    }

    // Rate limiting
    if !service.check_rate_limit(&claims.user_id).await? {
        return Ok(HttpResponse::TooManyRequests().json(json!({
            "error": "rate_limit_exceeded",
            "retry_after": 60
        })));
    }

    // Execute reranking
    match service.rerank_logs(
        &request.query,
        request.logs.clone(),
        &request.context,
    ).await {
        Ok(result) => Ok(HttpResponse::Ok().json(result)),
        Err(e) => Ok(HttpResponse::InternalServerError().json(json!({
            "error": "reranking_failed",
            "details": e.to_string()
        })))
    }
}

#[post("/api/v1/logs/feedback")]
pub async fn submit_feedback(
    feedback: web::Json<FeedbackRequest>,
    tracker: web::Data<Arc<FeedbackTracker>>,
    claims: Claims,
) -> Result<HttpResponse> {
    tracker.record_feedback(
        feedback.session_id,
        &feedback.relevant_logs,
        &feedback.irrelevant_logs,
        claims.user_id,
    ).await?;

    Ok(HttpResponse::Ok().json(json!({
        "status": "feedback_recorded",
        "session_id": feedback.session_id
    })))
}

#[derive(Debug, Deserialize)]
pub struct RerankingRequest {
    pub query: String,
    pub logs: Vec<LogEntry>,
    pub context: IncidentContext,
    pub max_results: Option<usize>,
    pub model_preference: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct FeedbackRequest {
    pub session_id: Uuid,
    pub relevant_logs: Vec<String>, // Log IDs
    pub irrelevant_logs: Vec<String>,
    pub resolution_notes: Option<String>,
}
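For illustration, a request body matching `RerankingRequest` might serialize as below. All values are hypothetical, and the `"ERROR"` spelling of `LogLevel` assumes default serde enum serialization, which this ADR does not specify:

```json
{
  "query": "API timeout causing customer checkout failures",
  "logs": [
    {
      "id": "log-0001",
      "timestamp": "2025-09-01T14:02:11Z",
      "level": "ERROR",
      "service": "checkout-api",
      "message": "upstream request timed out after 30s",
      "fields": {},
      "trace_id": null,
      "span_id": null
    }
  ],
  "context": {
    "user_id": null,
    "session_id": null,
    "service_name": "checkout-api",
    "time_window": { "start": "2025-09-01T13:30:00Z", "end": "2025-09-01T14:30:00Z" },
    "related_incidents": [],
    "known_symptoms": ["checkout failures"]
  },
  "max_results": 10,
  "model_preference": null
}
```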


7. Testing Requirements​

Test Coverage Requirements​

  • Unit Test Coverage: ≥95% of ranking algorithms
  • Integration Test Coverage: ≥90% of LLM integrations
  • E2E Test Coverage: Complete reranking workflows
  • Performance Test Coverage: Load testing with 10K+ logs
  • A/B Test Coverage: Model comparison validation

Integration Tests​

#[cfg(test)]
mod tests {
    use super::*;
    use std::time::Instant;
    use chrono::{Duration, Utc};

    #[tokio::test]
    async fn test_log_reranking_accuracy() {
        let service = setup_test_service().await;

        // Create test scenario
        let incident_logs = create_test_incident_logs();
        let query = "API timeout causing customer checkout failures";
        let context = IncidentContext {
            service_name: Some("checkout-api".to_string()),
            time_window: TimeWindow {
                start: Utc::now() - Duration::hours(1),
                end: Utc::now(),
            },
            ..Default::default()
        };

        // Execute reranking
        let result = service.rerank_logs(query, incident_logs, &context)
            .await.unwrap();

        // Validate results
        assert!(result.results.len() <= 10);
        assert!(result.results[0].score > 0.8); // Top result highly relevant
        assert!(result.processing_time_ms < 5000);

        // Check for expected root cause
        let top_log = &result.results[0];
        assert!(top_log.explanation.contains("timeout"));
        assert_eq!(top_log.log.service, "checkout-api");
    }

    #[tokio::test]
    async fn test_multi_model_fallback() {
        let service = setup_test_service_with_fallback().await;

        // Simulate primary model failure
        service.llm_client.disable_primary_model();

        let result = service.rerank_logs(
            "database connection errors",
            create_test_logs(),
            &Default::default(),
        ).await.unwrap();

        // Should complete using a fallback model
        assert!(!result.results.is_empty());
        assert_ne!(result.model_used, "claude-3.5"); // Primary model
    }

    #[tokio::test]
    async fn test_performance_with_large_dataset() {
        let service = setup_test_service().await;
        let large_dataset = create_large_log_dataset(10000); // 10K logs

        let start = Instant::now();
        let result = service.rerank_logs(
            "performance degradation",
            large_dataset,
            &Default::default(),
        ).await.unwrap();
        let duration = start.elapsed();

        assert!(duration.as_secs() < 5); // Under the 5-second constraint
        assert!(result.results.len() <= 10);
    }
}


8. Performance Benchmarks​

Required Metrics​

const MAX_RESPONSE_TIME_MS: u64 = 5000;
const MIN_RELEVANCE_ACCURACY: f32 = 0.95;
const MAX_COST_PER_QUERY: f32 = 0.10;
const MAX_MEMORY_MB: usize = 512;

pub struct PerformanceValidator {
    service: Arc<LogRerankingService>,
}

impl PerformanceValidator {
    pub async fn validate_reranking_performance(&self) -> Result<ValidationReport> {
        let mut report = ValidationReport::default();

        // Test response time with various dataset sizes
        for size in [100, 1000, 5000, 10000] {
            let logs = create_test_logs(size);
            let start = Instant::now();

            let _result = self.service.rerank_logs(
                "test query",
                logs,
                &Default::default(),
            ).await?;

            let duration = start.elapsed();
            report.response_times.push((size, duration.as_millis() as u64));
        }

        // Validate all response times under constraint
        report.response_time_ok = report.response_times
            .iter()
            .all(|(_, time)| *time <= MAX_RESPONSE_TIME_MS);

        // Test relevance accuracy
        let accuracy = self.measure_relevance_accuracy().await?;
        report.accuracy_ok = accuracy >= MIN_RELEVANCE_ACCURACY;

        // Test cost efficiency
        let cost = self.estimate_query_cost().await?;
        report.cost_ok = cost <= MAX_COST_PER_QUERY;

        Ok(report)
    }
}
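The $0.10/query ceiling can be sanity-checked with a rough token-cost estimate. The ~4 characters/token heuristic and the example per-token price below are assumptions for illustration, not measured values for any particular model:

```rust
// Rough cost estimator for one reranking prompt. Both the chars-per-token
// heuristic (~4) and the example price are illustrative assumptions.
fn estimate_query_cost(prompt_chars: usize, price_per_1k_tokens: f64) -> f64 {
    let tokens = prompt_chars as f64 / 4.0; // ~4 chars/token heuristic
    (tokens / 1000.0) * price_per_1k_tokens
}

fn main() {
    // 100 candidate lines x ~250 chars each, at a hypothetical $0.003
    // per 1K input tokens
    let cost = estimate_query_cost(100 * 250, 0.003);
    assert!(cost < 0.10); // within the ADR's per-query budget
    println!("estimated cost: ${:.4}", cost);
}
```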


9. Security Controls​

Data Privacy and Security​

// File: src/log_reranking/security.rs
pub struct RerankingSecurity {
pub async fn sanitize_logs_for_llm(
&self,
logs: &[LogEntry],
) -> Result<Vec<SanitizedLogEntry>> {
let mut sanitized = vec![];

for log in logs {
let mut sanitized_log = SanitizedLogEntry {
id: log.id.clone(),
timestamp: log.timestamp,
level: log.level.clone(),
service: log.service.clone(),
message: self.redact_sensitive_data(&log.message)?,
fields: HashMap::new(),
};

// Sanitize fields
for (key, value) in &log.fields {
if !self.is_sensitive_field(key) {
sanitized_log.fields.insert(
key.clone(),
self.sanitize_field_value(value)?
);
}
}

sanitized.push(sanitized_log);
}

Ok(sanitized)
}

fn redact_sensitive_data(&self, message: &str) -> Result<String> {
let mut redacted = message.to_string();

// Redact common sensitive patterns
let patterns = [
(r"password=\S+", "password=***"),
(r"token=\S+", "token=***"),
(r"key=\S+", "key=***"),
(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b", "****-****-****-****"), // Credit cards
(r"\b\d{3}-\d{2}-\d{4}\b", "***-**-****"), // SSNs
];

for (pattern, replacement) in patterns {
let re = regex::Regex::new(pattern)?;
redacted = re.replace_all(&redacted, replacement).to_string();
}

Ok(redacted)
}
}
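The `password=`/`token=`/`key=` rules above use the regex crate. As a crate-free illustration of the same masking behavior for a single key=value pattern (the helper name and whitespace-delimited value assumption are hypothetical, not part of the spec):

```rust
// Std-only sketch of the key=value masking shown above: replaces the
// value after "<key>=" (up to the next whitespace) with "***".
fn mask_value(input: &str, key: &str) -> String {
    let needle = format!("{}=", key);
    let mut out = String::new();
    let mut rest = input;
    while let Some(pos) = rest.find(&needle) {
        let after = pos + needle.len();
        out.push_str(&rest[..after]); // keep text up to and including "key="
        out.push_str("***");
        // Skip the original value (runs until the next whitespace or EOL)
        let tail = &rest[after..];
        let end = tail.find(char::is_whitespace).unwrap_or(tail.len());
        rest = &tail[end..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let msg = "login failed password=hunter2 user=alice";
    assert_eq!(mask_value(msg, "password"), "login failed password=*** user=alice");
    println!("{}", mask_value(msg, "password"));
}
```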


10. Logging and Error Handling​

Reranking Event Logging​

// File: src/log_reranking/logging.rs
use std::time::Duration;

use tracing::{info, instrument};

#[instrument]
pub async fn log_reranking_request(query: &str, log_count: usize) {
    info!(
        query = %query,
        log_count = log_count,
        "Log reranking request started"
    );
}

#[instrument(skip(result))]
pub async fn log_reranking_result(result: &RerankingResult) {
    info!(
        session_id = %result.session_id,
        processing_time_ms = result.processing_time_ms,
        model_used = %result.model_used,
        confidence_score = result.confidence_score as f64,
        result_count = result.results.len(),
        "Log reranking completed"
    );
}

pub async fn log_model_performance(model: &str, latency: Duration, cost: f32) {
    info!(
        model = model,
        latency_ms = latency.as_millis() as u64,
        cost_dollars = cost as f64,
        "Model performance metrics"
    );
}

Error Handling​

#[derive(thiserror::Error, Debug)]
pub enum LogRerankingError {
    #[error("LLM request failed: {0}")]
    LlmRequestFailed(String),

    #[error("Vector embedding failed for log: {0}")]
    EmbeddingFailed(String),

    #[error("Query too complex: {0}")]
    QueryTooComplex(String),

    #[error("Rate limit exceeded for user: {0}")]
    RateLimitExceeded(String),

    #[error("Insufficient log context for analysis")]
    InsufficientContext,
}
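A natural consumer of these variants is the API layer's error responses. A sketch of one possible HTTP status mapping follows; the enum is mirrored here without thiserror so the snippet stands alone, and the status codes themselves are assumptions, not prescribed by this ADR:

```rust
// Simplified mirror of LogRerankingError (no thiserror dependency) plus a
// hypothetical mapping to HTTP status codes for the API layer.
#[derive(Debug)]
pub enum LogRerankingError {
    LlmRequestFailed(String),
    EmbeddingFailed(String),
    QueryTooComplex(String),
    RateLimitExceeded(String),
    InsufficientContext,
}

fn status_for(err: &LogRerankingError) -> u16 {
    match err {
        LogRerankingError::LlmRequestFailed(_) => 502, // upstream model failed
        LogRerankingError::EmbeddingFailed(_) => 500,
        LogRerankingError::QueryTooComplex(_) => 422,
        LogRerankingError::RateLimitExceeded(_) => 429,
        LogRerankingError::InsufficientContext => 400,
    }
}

fn main() {
    assert_eq!(status_for(&LogRerankingError::RateLimitExceeded("u1".into())), 429);
    assert_eq!(status_for(&LogRerankingError::InsufficientContext), 400);
}
```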


11. References​

Model Compatibility​

  • Claude: 3.5 Sonnet for production, Haiku for cost optimization
  • GPT-4: Turbo for speed, standard for accuracy
  • Gemini: Pro for balanced performance
  • Local Models: Ollama for on-premise deployment


12. Approval Signatures​

Technical Sign-off​

Component       | Owner    | Approved | Date
----------------|----------|----------|-----------
Architecture    | Session5 | ✓        | 2025-09-01
Implementation  | Pending  | -        | -
ML Engineering  | Pending  | -        | -
Security Review | Pending  | -        | -

Implementation Checklist​

  • LLM client with multi-model support
  • Vector store with embeddings
  • Ranking engine with features
  • Feedback learning system
  • API endpoints complete
  • Performance benchmarks met
  • Security controls validated
  • Data privacy compliance
