ADR-022-v4: LLM-Based Log Reranking - Part 2 (Technical)
Document Specification Block​
Document: ADR-022-v4-llm-log-reranking-part2-technical
Version: 1.0.0
Purpose: Constrain AI implementation with exact technical specifications for LLM log reranking
Audience: AI agents, ML engineers, developers implementing the system
Date Created: 2025-09-01
Date Modified: 2025-09-01
QA Review Date: Pending
Status: DRAFT
Table of Contents​
- Constraints
- Dependencies
- Component Architecture
- Data Models
- Implementation Patterns
- API Specifications
- Testing Requirements
- Performance Benchmarks
- Security Controls
- Logging and Error Handling
- References
- Approval Signatures
1. Constraints​
CONSTRAINT: Response Time​
LLM reranking MUST complete within 5 seconds for any query with up to 10,000 log entries.
CONSTRAINT: Relevance Accuracy​
Top 10 reranked results MUST achieve 95% relevance accuracy based on engineer feedback.
CONSTRAINT: Scalability​
System MUST handle 1TB+ of daily logs across 100+ services without degradation.
CONSTRAINT: Multi-Model Support​
MUST support Claude, GPT-4, Gemini, and local models with fallback capabilities.
CONSTRAINT: Cost Efficiency​
LLM API costs MUST remain under $0.10 per incident query through prompt truncation, candidate pre-filtering, and response caching.
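To make the cost constraint mechanically checkable, here is a minimal sketch of a per-query budget check. The function names and the token counts and per-1K-token prices in the usage are illustrative placeholders, not real vendor rates:

```rust
/// Estimate the LLM cost of one reranking query (illustrative pricing model:
/// separate per-1K-token rates for prompt and completion tokens).
fn query_cost_usd(
    prompt_tokens: u32,
    completion_tokens: u32,
    usd_per_1k_prompt: f64,
    usd_per_1k_completion: f64,
) -> f64 {
    (prompt_tokens as f64 / 1000.0) * usd_per_1k_prompt
        + (completion_tokens as f64 / 1000.0) * usd_per_1k_completion
}

/// Check a cost estimate against the $0.10-per-query constraint.
fn within_budget(cost_usd: f64) -> bool {
    const MAX_COST_PER_QUERY: f64 = 0.10;
    cost_usd <= MAX_COST_PER_QUERY
}
```

For example, a 4,000-token prompt plus a 500-token completion at hypothetical rates of $0.015 and $0.075 per 1K tokens costs $0.0975, just inside the budget.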
2. Dependencies​
Cargo.toml Dependencies
[dependencies]
# Core async and web
tokio = { version = "1.35", features = ["full"] }
actix-web = "4.4"
# Machine Learning
candle-core = "0.3"
candle-nn = "0.3"
candle-transformers = "0.3"
tokenizers = "0.15"
# Vector embeddings
qdrant-client = "1.7"
faiss = "0.12"
# Text processing
regex = "1.10"
unicode-segmentation = "1.10"
stopwords = "1.0"
# Database
foundationdb = { version = "0.8", features = ["embedded-fdb-include"] }
elasticsearch = "8.5"
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
# HTTP clients
reqwest = { version = "0.11", features = ["json", "rustls-tls"] }
# Utilities
uuid = { version = "1.6", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
dashmap = "5.5"
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
3. Component Architecture​
// File: src/log_reranking/service.rs
use std::sync::Arc;
use std::time::Instant;

use anyhow::Result;
use uuid::Uuid;

pub struct LogRerankingService {
    query_processor: Arc<QueryProcessor>,
    vector_store: Arc<VectorStore>,
    llm_client: Arc<LlmClient>,
    rank_engine: Arc<RankingEngine>,
    feedback_tracker: Arc<FeedbackTracker>,
    cache: Arc<RerankingCache>,
}

// File: src/log_reranking/service.rs
impl LogRerankingService {
    pub async fn rerank_logs(
        &self,
        query: &str,
        initial_logs: Vec<LogEntry>,
        context: &IncidentContext,
    ) -> Result<RerankingResult> {
        let start = Instant::now();

        // 1. Process query for intent
        let processed_query = self.query_processor
            .analyze_intent(query, context)
            .await?;

        // 2. Get semantic embeddings
        let query_embedding = self.vector_store
            .embed_query(&processed_query)
            .await?;

        // 3. Compute initial similarity scores
        let mut candidates = Vec::with_capacity(initial_logs.len());
        for log in initial_logs {
            let log_embedding = self.vector_store
                .get_or_create_embedding(&log)
                .await?;
            let similarity = cosine_similarity(&query_embedding, &log_embedding);
            candidates.push(RankingCandidate {
                log,
                initial_score: similarity,
                features: vec![],
                llm_score: None,
                final_score: 0.0,
            });
        }

        // 4. Extract contextual features
        self.extract_features(&mut candidates, context).await?;

        // 5. LLM-based reranking
        let reranked = self.llm_client
            .rerank_with_context(&processed_query, candidates)
            .await?;

        // 6. Apply final scoring within the incident time window
        let final_results = self.rank_engine
            .compute_final_scores(reranked, &context.time_window)
            .await?;

        // 7. Track for feedback learning
        let session_id = Uuid::new_v4();
        self.feedback_tracker
            .start_session(session_id, query, &final_results)
            .await?;

        let confidence_score = final_results.first().map(|r| r.score).unwrap_or(0.0);
        Ok(RerankingResult {
            session_id,
            query: query.to_string(),
            results: final_results,
            processing_time_ms: start.elapsed().as_millis() as u64,
            model_used: self.llm_client.active_model_name(), // assumed accessor on LlmClient
            confidence_score,
        })
    }
}
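`rerank_logs` calls a `cosine_similarity` helper that is not defined anywhere in this document. A minimal std-only sketch of what it is assumed to compute (dot product over the product of norms, with a defensive 0.0 for empty or mismatched vectors):

```rust
/// Cosine similarity between two embedding vectors in [-1.0, 1.0].
/// Returns 0.0 for empty, mismatched, or zero-magnitude inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, which is why the service can treat the value directly as an initial relevance score.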
4. Data Models​
Core Models​
// File: src/models/log_reranking.rs
use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LogEntry {
    pub id: String,
    pub timestamp: DateTime<Utc>,
    pub level: LogLevel,
    pub service: String,
    pub message: String,
    pub fields: HashMap<String, serde_json::Value>,
    pub trace_id: Option<String>,
    pub span_id: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IncidentContext {
    pub user_id: Option<String>,
    pub session_id: Option<String>,
    pub service_name: Option<String>,
    pub time_window: TimeWindow,
    pub related_incidents: Vec<String>,
    pub known_symptoms: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankingCandidate {
    pub log: LogEntry,
    pub initial_score: f32,
    pub features: Vec<RankingFeature>,
    pub llm_score: Option<f32>,
    pub final_score: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankingFeature {
    pub name: String,
    pub value: f32,
    pub weight: f32,
    pub description: String,
}

// Feature families produced by feature extraction (named distinctly from
// the `RankingFeature` struct above so the two types do not collide)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum RankingFeatureKind {
    SemanticSimilarity(f32),
    TemporalProximity(f32),
    ServiceRelevance(f32),
    ErrorSeverity(f32),
    CausalLikelihood(f32),
    HistoricalRelevance(f32),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RerankingResult {
    pub session_id: Uuid,
    pub query: String,
    pub results: Vec<RankedLogEntry>,
    pub processing_time_ms: u64,
    pub model_used: String,
    pub confidence_score: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankedLogEntry {
    pub log: LogEntry,
    pub rank: usize,
    pub score: f32,
    pub explanation: String,
    pub causal_chain: Option<Vec<String>>,
}
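The `RankingFeature` struct carries both a value and a weight, and the ranking engine sums the weighted values into a single score. A self-contained sketch of that combination, using a local stand-in type (the full struct also carries a name and description):

```rust
/// Local stand-in for `RankingFeature`: just the fields needed for scoring.
struct Feature {
    value: f32,
    weight: f32,
}

/// Weighted sum of feature contributions, as used when computing a
/// candidate's feature-based score.
fn weighted_feature_score(features: &[Feature]) -> f32 {
    features.iter().map(|f| f.value * f.weight).sum()
}
```

For example, a semantic-similarity feature of 0.9 at weight 0.5 and a temporal-proximity feature of 0.4 at weight 0.25 contribute 0.45 + 0.10 = 0.55 to the final score.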
5. Implementation Patterns​
LLM Client Implementation
// File: src/log_reranking/llm_client.rs
use anyhow::{anyhow, Result};
use reqwest::Client;

pub struct LlmClient {
    client: Client,
    model_config: ModelConfig,
    fallback_models: Vec<ModelConfig>,
}

impl LlmClient {
    pub async fn rerank_with_context(
        &self,
        query: &ProcessedQuery,
        candidates: Vec<RankingCandidate>,
    ) -> Result<Vec<RankingCandidate>> {
        // Prepare prompt with query context
        let prompt = self.build_reranking_prompt(query, &candidates)?;

        // Try primary model first
        match self.query_llm(&self.model_config, &prompt).await {
            Ok(response) => self.parse_reranking_response(response, candidates),
            Err(_) => {
                // Fall back to alternative models in order
                for fallback in &self.fallback_models {
                    if let Ok(response) = self.query_llm(fallback, &prompt).await {
                        return self.parse_reranking_response(response, candidates);
                    }
                }
                Err(anyhow!("All LLM models failed"))
            }
        }
    }

    fn build_reranking_prompt(
        &self,
        query: &ProcessedQuery,
        candidates: &[RankingCandidate],
    ) -> Result<String> {
        let mut prompt = format!(
            r#"You are an expert system administrator analyzing logs for incident: "{}"
Context:
- Time window: {} to {}
- Affected services: {}
- Known symptoms: {}
Please rank these {} log entries by relevance to the incident.
Consider: temporal proximity, causal relationships, error severity, and semantic relevance.
Respond with a JSON array of scores (0.0-1.0):
"#,
            query.intent,
            query.time_window.start,
            query.time_window.end,
            query.services.join(", "),
            query.symptoms.join(", "),
            candidates.len()
        );

        // Add log entries with truncated messages (char-based truncation,
        // so we never slice inside a multi-byte UTF-8 character)
        for (i, candidate) in candidates.iter().enumerate() {
            let message = &candidate.log.message;
            let truncated_msg = if message.chars().count() > 200 {
                format!("{}...", message.chars().take(200).collect::<String>())
            } else {
                message.clone()
            };
            prompt.push_str(&format!(
                "{}. [{}] {}: {}\n",
                i + 1,
                candidate.log.timestamp.format("%H:%M:%S"),
                candidate.log.service,
                truncated_msg
            ));
        }

        prompt.push_str("\nReturn array of scores: [0.95, 0.87, 0.12, ...]");
        Ok(prompt)
    }
}
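`parse_reranking_response` is not shown in this document. Assuming the model replies with the plain score array the prompt requests, a simplified std-only sketch of the parsing contract (the real implementation would likely use serde_json and map scores back onto the candidates): exactly one score per candidate, each clamped to 0.0..=1.0, or an error that triggers fallback:

```rust
/// Parse a response of the form "[0.95, 0.87, 0.12]" into per-candidate
/// scores. Malformed entries are skipped; a count mismatch is an error.
fn parse_scores(response: &str, expected: usize) -> Result<Vec<f32>, String> {
    let inner = response.trim().trim_start_matches('[').trim_end_matches(']');
    let scores: Vec<f32> = inner
        .split(',')
        .filter_map(|s| s.trim().parse::<f32>().ok())
        .map(|s| s.clamp(0.0, 1.0)) // keep scores inside the documented range
        .collect();
    if scores.len() == expected {
        Ok(scores)
    } else {
        Err(format!("expected {} scores, got {}", expected, scores.len()))
    }
}
```

Treating a count mismatch as a hard error matters here: silently padding or truncating would misalign scores with candidates and corrupt the ranking.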
Vector Store Implementation​
// File: src/log_reranking/vector_store.rs
use anyhow::{anyhow, Result};
use qdrant_client::{qdrant, QdrantClient};

pub struct VectorStore {
    client: QdrantClient,
    embedding_model: EmbeddingModel,
    collection_name: String,
}

impl VectorStore {
    pub async fn embed_query(&self, query: &ProcessedQuery) -> Result<Vec<f32>> {
        let combined_text = format!(
            "{} {} {}",
            query.intent,
            query.symptoms.join(" "),
            query.services.join(" ")
        );
        self.embedding_model.encode(&combined_text).await
    }

    pub async fn get_or_create_embedding(&self, log: &LogEntry) -> Result<Vec<f32>> {
        // Check if an embedding exists for this log_id. (This is a
        // filter-only lookup; qdrant's scroll API is an alternative to
        // searching with an empty query vector.)
        let search_result = self.client
            .search_points(&qdrant::SearchPoints {
                collection_name: self.collection_name.clone(),
                vector: vec![], // empty: only the filter match matters here
                filter: Some(qdrant::Filter {
                    must: vec![qdrant::Condition {
                        condition_one_of: Some(
                            qdrant::condition::ConditionOneOf::Field(
                                qdrant::FieldCondition {
                                    key: "log_id".to_string(),
                                    r#match: Some(qdrant::Match {
                                        match_value: Some(
                                            qdrant::r#match::MatchValue::Keyword(log.id.clone())
                                        )
                                    }),
                                    ..Default::default()
                                }
                            )
                        )
                    }],
                    ..Default::default()
                }),
                limit: 1,
                ..Default::default()
            })
            .await?;

        if let Some(point) = search_result.result.first() {
            // Extract the stored vector, failing with an error instead of
            // unwrapping if the point's vector layout is unexpected
            let vectors = point.vectors.as_ref()
                .ok_or_else(|| anyhow!("stored point for log {} has no vectors", log.id))?;
            match &vectors.vectors_options {
                Some(qdrant::vectors::VectorsOptions::Vector(v)) => Ok(v.data.clone()),
                _ => Err(anyhow!("unexpected vector layout for log {}", log.id)),
            }
        } else {
            // Generate a new embedding
            let text = format!("{} {}", log.service, log.message);
            let embedding = self.embedding_model.encode(&text).await?;
            // Store for future use
            self.store_embedding(log, &embedding).await?;
            Ok(embedding)
        }
    }
}
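The get-or-create pattern above can be illustrated in isolation with an in-memory cache, using a `HashMap` keyed by log ID as a stand-in for the Qdrant collection. The point of the pattern: the (expensive) encode closure only runs on a cache miss.

```rust
use std::collections::HashMap;

/// In-memory sketch of get-or-create: a HashMap stands in for the
/// vector database keyed by log_id.
struct EmbeddingCache {
    by_log_id: HashMap<String, Vec<f32>>,
}

impl EmbeddingCache {
    fn new() -> Self {
        Self { by_log_id: HashMap::new() }
    }

    /// Return the cached embedding for `log_id`, or run `encode` once,
    /// store the result, and return it.
    fn get_or_create<F>(&mut self, log_id: &str, encode: F) -> Vec<f32>
    where
        F: FnOnce() -> Vec<f32>,
    {
        self.by_log_id
            .entry(log_id.to_string())
            .or_insert_with(encode)
            .clone()
    }
}
```

On the second call for the same ID the closure is never invoked, which is exactly the behavior the Qdrant-backed version buys at cluster scale.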
Ranking Engine​
// File: src/log_reranking/ranking_engine.rs
use std::cmp::Ordering;
use std::collections::HashMap;

use anyhow::Result;
use chrono::{DateTime, Utc};

pub struct RankingEngine {
    feature_weights: HashMap<String, f32>,
    temporal_decay: f32,
    service_graph: ServiceGraph,
}

impl RankingEngine {
    pub async fn compute_final_scores(
        &self,
        mut candidates: Vec<RankingCandidate>,
        incident_window: &TimeWindow,
    ) -> Result<Vec<RankedLogEntry>> {
        // First pass: combine all scoring factors per candidate
        for candidate in candidates.iter_mut() {
            let mut final_score = 0.0;

            // LLM score (primary signal, 60% weight)
            if let Some(llm_score) = candidate.llm_score {
                final_score += llm_score * 0.6;
            }

            // Feature-based scores
            for feature in &candidate.features {
                let weight = self.feature_weights
                    .get(&feature.name)
                    .unwrap_or(&0.1);
                final_score += feature.value * weight;
            }

            // Temporal decay relative to the incident window
            let time_factor = self.calculate_temporal_factor(
                &candidate.log.timestamp,
                incident_window,
            );
            final_score *= time_factor;

            candidate.final_score = final_score;
        }

        // Sort by final score before assigning ranks, so that rank 1 is
        // always the highest-scoring entry
        candidates.sort_by(|a, b| {
            b.final_score.partial_cmp(&a.final_score).unwrap_or(Ordering::Equal)
        });

        let mut ranked_entries = Vec::with_capacity(candidates.len());
        for (rank, candidate) in candidates.iter().enumerate() {
            ranked_entries.push(RankedLogEntry {
                log: candidate.log.clone(),
                rank: rank + 1,
                score: candidate.final_score,
                explanation: self.generate_explanation(candidate)?,
                causal_chain: self.detect_causal_chain(candidate).await?,
            });
        }

        Ok(ranked_entries)
    }

    fn calculate_temporal_factor(
        &self,
        log_time: &DateTime<Utc>,
        incident_window: &TimeWindow,
    ) -> f32 {
        let incident_start = incident_window.start;
        let time_diff = (*log_time - incident_start).num_seconds().abs() as f32;
        // Exponential decay: closer to incident start = higher relevance
        (-time_diff * self.temporal_decay).exp()
    }
}
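The exponential decay in `calculate_temporal_factor` can be sanity-checked in isolation: a log exactly at the incident start gets factor 1.0, and the factor falls monotonically with distance from it (halving roughly every ln(2)/decay seconds). A self-contained version of the same formula:

```rust
/// Exponential temporal decay: exp(-|Δt| * decay), where Δt is seconds
/// from the incident start and `decay` matches `temporal_decay` above.
fn temporal_factor(seconds_from_incident: f32, decay: f32) -> f32 {
    (-seconds_from_incident.abs() * decay).exp()
}
```

With `decay = 0.001`, a log one minute from the incident start keeps ~94% of its score, while a log one hour away keeps under 3%, which is how older noise gets pushed down the ranking.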
6. API Specifications​
Log Reranking Endpoints​
// File: src/api/handlers/log_reranking.rs
use std::sync::Arc;

use actix_web::{post, web, HttpResponse, Result};
use serde::Deserialize;
use serde_json::json;
use uuid::Uuid;

#[post("/api/v1/logs/rerank")]
pub async fn rerank_logs(
    request: web::Json<RerankingRequest>,
    service: web::Data<Arc<LogRerankingService>>,
    claims: Claims,
) -> Result<HttpResponse> {
    // Validate permissions
    if !claims.has_permission("logs:analyze") {
        return Ok(HttpResponse::Forbidden().json(json!({
            "error": "insufficient_permissions"
        })));
    }

    // Rate limiting
    if !service.check_rate_limit(&claims.user_id).await? {
        return Ok(HttpResponse::TooManyRequests().json(json!({
            "error": "rate_limit_exceeded",
            "retry_after": 60
        })));
    }

    // Execute reranking
    match service.rerank_logs(
        &request.query,
        request.logs.clone(),
        &request.context,
    ).await {
        Ok(result) => Ok(HttpResponse::Ok().json(result)),
        Err(e) => Ok(HttpResponse::InternalServerError().json(json!({
            "error": "reranking_failed",
            "details": e.to_string()
        }))),
    }
}

#[post("/api/v1/logs/feedback")]
pub async fn submit_feedback(
    feedback: web::Json<FeedbackRequest>,
    tracker: web::Data<Arc<FeedbackTracker>>,
    claims: Claims,
) -> Result<HttpResponse> {
    tracker.record_feedback(
        feedback.session_id,
        &feedback.relevant_logs,
        &feedback.irrelevant_logs,
        claims.user_id,
    ).await?;

    Ok(HttpResponse::Ok().json(json!({
        "status": "feedback_recorded",
        "session_id": feedback.session_id
    })))
}

#[derive(Debug, Deserialize)]
pub struct RerankingRequest {
    pub query: String,
    pub logs: Vec<LogEntry>,
    pub context: IncidentContext,
    pub max_results: Option<usize>,
    pub model_preference: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct FeedbackRequest {
    pub session_id: Uuid,
    pub relevant_logs: Vec<String>, // log IDs
    pub irrelevant_logs: Vec<String>,
    pub resolution_notes: Option<String>,
}
7. Testing Requirements​
Test Coverage Requirements​
- Unit Test Coverage: ≥95% of ranking algorithms
- Integration Test Coverage: ≥90% of LLM integrations
- E2E Test Coverage: Complete reranking workflows
- Performance Test Coverage: Load testing with 10K+ logs
- A/B Test Coverage: Model comparison validation
Integration Tests​
#[cfg(test)]
mod tests {
    use super::*;
    use std::time::Instant;
    use chrono::{Duration, Utc};

    #[tokio::test]
    async fn test_log_reranking_accuracy() {
        let service = setup_test_service().await;

        // Create test scenario
        let incident_logs = create_test_incident_logs();
        let query = "API timeout causing customer checkout failures";
        let context = IncidentContext {
            service_name: Some("checkout-api".to_string()),
            time_window: TimeWindow {
                start: Utc::now() - Duration::hours(1),
                end: Utc::now(),
            },
            ..Default::default()
        };

        // Execute reranking
        let result = service.rerank_logs(query, incident_logs, &context)
            .await.unwrap();

        // Validate results
        assert!(result.results.len() <= 10);
        assert!(result.results[0].score > 0.8); // top result highly relevant
        assert!(result.processing_time_ms < 5000);

        // Check for expected root cause
        let top_log = &result.results[0];
        assert!(top_log.explanation.contains("timeout"));
        assert_eq!(top_log.log.service, "checkout-api");
    }

    #[tokio::test]
    async fn test_multi_model_fallback() {
        let service = setup_test_service_with_fallback().await;

        // Simulate primary model failure
        service.llm_client.disable_primary_model();

        let result = service.rerank_logs(
            "database connection errors",
            create_test_logs(),
            &Default::default(),
        ).await.unwrap();

        // Should complete using a fallback model
        assert!(!result.results.is_empty());
        assert_ne!(result.model_used, "claude-3.5"); // primary model
    }

    #[tokio::test]
    async fn test_performance_with_large_dataset() {
        let service = setup_test_service().await;
        let large_dataset = create_large_log_dataset(10_000); // 10K logs

        let start = Instant::now();
        let result = service.rerank_logs(
            "performance degradation",
            large_dataset,
            &Default::default(),
        ).await.unwrap();
        let duration = start.elapsed();

        assert!(duration.as_secs() < 5); // under the 5-second constraint
        assert!(result.results.len() <= 10);
    }
}
8. Performance Benchmarks​
Required Metrics​
const MAX_RESPONSE_TIME_MS: u64 = 5000;
const MIN_RELEVANCE_ACCURACY: f32 = 0.95;
const MAX_COST_PER_QUERY: f32 = 0.10;
const MAX_MEMORY_MB: usize = 512;

pub struct PerformanceValidator {
    service: Arc<LogRerankingService>,
}

impl PerformanceValidator {
    pub async fn validate_reranking_performance(&self) -> Result<ValidationReport> {
        let mut report = ValidationReport::default();

        // Test response time with various dataset sizes
        for size in [100, 1000, 5000, 10000] {
            let logs = create_test_logs(size);
            let start = Instant::now();
            let _result = self.service.rerank_logs(
                "test query",
                logs,
                &Default::default(),
            ).await?;
            let duration = start.elapsed();
            report.response_times.push((size, duration.as_millis() as u64));
        }

        // Validate all response times under the constraint
        report.response_time_ok = report.response_times
            .iter()
            .all(|(_, time)| *time <= MAX_RESPONSE_TIME_MS);

        // Test relevance accuracy
        let accuracy = self.measure_relevance_accuracy().await?;
        report.accuracy_ok = accuracy >= MIN_RELEVANCE_ACCURACY;

        // Test cost efficiency
        let cost = self.estimate_query_cost().await?;
        report.cost_ok = cost <= MAX_COST_PER_QUERY;

        Ok(report)
    }
}
9. Security Controls​
Data Privacy and Security​
// File: src/log_reranking/security.rs
pub struct RerankingSecurity;

impl RerankingSecurity {
    pub async fn sanitize_logs_for_llm(
        &self,
        logs: &[LogEntry],
    ) -> Result<Vec<SanitizedLogEntry>> {
        let mut sanitized = vec![];
        for log in logs {
            let mut sanitized_log = SanitizedLogEntry {
                id: log.id.clone(),
                timestamp: log.timestamp,
                level: log.level.clone(),
                service: log.service.clone(),
                message: self.redact_sensitive_data(&log.message)?,
                fields: HashMap::new(),
            };

            // Sanitize fields, dropping anything on the sensitive list
            for (key, value) in &log.fields {
                if !self.is_sensitive_field(key) {
                    sanitized_log.fields.insert(
                        key.clone(),
                        self.sanitize_field_value(value)?,
                    );
                }
            }
            sanitized.push(sanitized_log);
        }
        Ok(sanitized)
    }

    fn redact_sensitive_data(&self, message: &str) -> Result<String> {
        let mut redacted = message.to_string();

        // Redact common sensitive patterns
        let patterns = [
            (r"password=\S+", "password=***"),
            (r"token=\S+", "token=***"),
            (r"key=\S+", "key=***"),
            (r"\b\d{4}-\d{4}-\d{4}-\d{4}\b", "****-****-****-****"), // credit cards
            (r"\b\d{3}-\d{2}-\d{4}\b", "***-**-****"), // SSNs
        ];

        for (pattern, replacement) in patterns {
            let re = regex::Regex::new(pattern)?;
            redacted = re.replace_all(&redacted, replacement).to_string();
        }
        Ok(redacted)
    }
}
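The key=value part of the redaction pass can be sketched without the regex crate, which also makes the contract easy to unit-test. This is a simplification of `redact_sensitive_data`: it only masks values for deny-listed keys, normalizes runs of whitespace, and does not cover the credit-card or SSN patterns:

```rust
/// Regex-free sketch: mask the value of any whitespace-delimited
/// key=value token whose key is on the sensitive-key deny list.
fn redact_kv_secrets(message: &str) -> String {
    const SENSITIVE_KEYS: [&str; 3] = ["password", "token", "key"];
    message
        .split_whitespace()
        .map(|tok| match tok.split_once('=') {
            Some((k, _)) if SENSITIVE_KEYS.contains(&k.to_lowercase().as_str()) => {
                format!("{}=***", k) // keep the key, drop the secret value
            }
            _ => tok.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Keeping the key visible while masking only the value preserves the log's diagnostic shape for the LLM without leaking the secret itself.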
10. Logging and Error Handling​
Reranking Event Logging​
// File: src/log_reranking/logging.rs
use std::time::Duration;

use tracing::{info, instrument};

// Logging helpers, assumed to hang off LogRerankingService
impl LogRerankingService {
    #[instrument(skip(self))]
    pub async fn log_reranking_request(&self, query: &str, log_count: usize) {
        info!(
            query = %query,
            log_count = log_count,
            "Log reranking request started"
        );
    }

    #[instrument(skip(self, result))]
    pub async fn log_reranking_result(&self, result: &RerankingResult) {
        info!(
            session_id = %result.session_id,
            processing_time_ms = result.processing_time_ms,
            model_used = %result.model_used,
            confidence_score = result.confidence_score,
            result_count = result.results.len(),
            "Log reranking completed"
        );
    }

    pub async fn log_model_performance(&self, model: &str, latency: Duration, cost: f32) {
        info!(
            model = model,
            latency_ms = latency.as_millis() as u64,
            cost_dollars = cost,
            "Model performance metrics"
        );
    }
}
Error Handling​
#[derive(thiserror::Error, Debug)]
pub enum LogRerankingError {
    #[error("LLM request failed: {0}")]
    LlmRequestFailed(String),
    #[error("Vector embedding failed for log: {0}")]
    EmbeddingFailed(String),
    #[error("Query too complex: {0}")]
    QueryTooComplex(String),
    #[error("Rate limit exceeded for user: {0}")]
    RateLimitExceeded(String),
    #[error("Insufficient log context for analysis")]
    InsufficientContext,
}
11. References​
- ADR-008-v4: Monitoring & Observability
- ADR-011-v4: Audit & Compliance
- ADR-004-v4: API Architecture
- LOGGING-STANDARD-v4
Model Compatibility​
- Claude: 3.5 Sonnet for production, Haiku for cost optimization
- GPT-4: Turbo for speed, standard for accuracy
- Gemini: Pro for balanced performance
- Local Models: Ollama for on-premise deployment
12. Approval Signatures​
Technical Sign-off​
| Component | Owner | Approved | Date |
|---|---|---|---|
| Architecture | Session5 | ✓ | 2025-09-01 |
| Implementation | Pending | - | - |
| ML Engineering | Pending | - | - |
| Security Review | Pending | - | - |
Implementation Checklist​
- LLM client with multi-model support
- Vector store with embeddings
- Ranking engine with features
- Feedback learning system
- API endpoints complete
- Performance benchmarks met
- Security controls validated
- Data privacy compliance