
ADR-022-v4: LLM-Based Log Reranking - Part 2 (Technical)

Document Specification Block​

Document: ADR-022-v4-llm-log-reranking-part2-technical
Version: 1.0.0
Purpose: Constrain AI implementation with exact technical specifications for LLM log reranking
Audience: AI agents, ML engineers, developers implementing the system
Date Created: 2025-09-01
Date Modified: 2025-09-01
QA Review Date: Pending
Status: DRAFT

Table of Contents​

  1. Constraints
  2. Dependencies
  3. Component Architecture
  4. Data Models
  5. Implementation Patterns
  6. API Specifications
  7. Testing Requirements
  8. Performance Benchmarks
  9. Security Controls
  10. Logging and Error Handling
  11. References
  12. Approval Signatures

1. Constraints​

CONSTRAINT: Response Time​

LLM reranking MUST complete within 5 seconds for any query with up to 10,000 log entries.

CONSTRAINT: Relevance Accuracy​

Top 10 reranked results MUST achieve 95% relevance accuracy based on engineer feedback.

CONSTRAINT: Scalability​

System MUST handle 1TB+ of daily logs across 100+ services without degradation.

CONSTRAINT: Multi-Model Support​

MUST support Claude, GPT-4, Gemini, and local models with fallback capabilities.

CONSTRAINT: Cost Efficiency​

LLM API costs MUST remain under $0.10 per incident query through cost controls such as prompt truncation, result caching, and model selection.


2. Dependencies​

Cargo.toml Dependencies

[dependencies]
# Core async and web
tokio = { version = "1.35", features = ["full"] }
actix-web = "4.4"

# Machine Learning
candle-core = "0.3"
candle-nn = "0.3"
candle-transformers = "0.3"
tokenizers = "0.15"

# Vector embeddings
qdrant-client = "1.7"
faiss = "0.12"

# Text processing
regex = "1.10"
unicode-segmentation = "1.10"
stopwords = "1.0"

# Database
foundationdb = { version = "0.8", features = ["embedded-fdb-include"] }
elasticsearch = "8.5"

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"

# HTTP clients
reqwest = { version = "0.11", features = ["json", "rustls-tls"] }

# Utilities
uuid = { version = "1.6", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
dashmap = "5.5"
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"


3. Component Architecture​

// File: src/log_reranking/service.rs
use std::sync::Arc;
use std::time::Instant;

use anyhow::Result;
use uuid::Uuid;

pub struct LogRerankingService {
    query_processor: Arc<QueryProcessor>,
    vector_store: Arc<VectorStore>,
    llm_client: Arc<LlmClient>,
    rank_engine: Arc<RankingEngine>,
    feedback_tracker: Arc<FeedbackTracker>,
    cache: Arc<RerankingCache>,
}

// File: src/log_reranking/service.rs
impl LogRerankingService {
    pub async fn rerank_logs(
        &self,
        query: &str,
        initial_logs: Vec<LogEntry>,
        context: &IncidentContext,
    ) -> Result<RerankingResult> {
        let start = std::time::Instant::now();

        // 1. Process query for intent
        let processed_query = self.query_processor
            .analyze_intent(query, context)
            .await?;

        // 2. Get semantic embeddings
        let query_embedding = self.vector_store
            .embed_query(&processed_query)
            .await?;

        // 3. Compute initial similarity scores
        let mut candidates = vec![];
        for log in initial_logs {
            let log_embedding = self.vector_store
                .get_or_create_embedding(&log)
                .await?;

            let similarity = cosine_similarity(&query_embedding, &log_embedding);
            candidates.push(RankingCandidate {
                log,
                initial_score: similarity,
                features: vec![],
                llm_score: None,
                final_score: 0.0,
            });
        }

        // 4. Extract contextual features
        self.extract_features(&mut candidates, context).await?;

        // 5. LLM-based reranking
        let reranked = self.llm_client
            .rerank_with_context(&processed_query, candidates)
            .await?;

        // 6. Apply final scoring
        let final_results = self.rank_engine
            .compute_final_scores(reranked)
            .await?;

        // 7. Track for feedback learning
        let session_id = Uuid::new_v4();
        self.feedback_tracker
            .start_session(session_id, query, &final_results)
            .await?;

        Ok(RerankingResult {
            session_id,
            query: query.to_string(),
            results: final_results,
            processing_time_ms: start.elapsed().as_millis() as u64,
            // Required by RerankingResult; accessor names here are assumed
            model_used: self.llm_client.active_model_name(),
            confidence_score: self.rank_engine.confidence_for(&final_results),
        })
    }
}
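The service above calls a `cosine_similarity` helper that this ADR does not define elsewhere. A minimal std-only sketch (function name taken from the call site; the zero-return on dimension mismatch is an assumption, not a specified behavior):

```rust
// Minimal sketch of the cosine_similarity helper referenced above.
// Mismatched or zero-length/zero-norm inputs return 0.0 by convention here.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Identical vectors score ~1.0; orthogonal vectors score ~0.0
    assert!((cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-6);
}
```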


4. Data Models​

Core Models​

// File: src/models/log_reranking.rs
use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LogEntry {
    pub id: String,
    pub timestamp: DateTime<Utc>,
    pub level: LogLevel,
    pub service: String,
    pub message: String,
    pub fields: HashMap<String, serde_json::Value>,
    pub trace_id: Option<String>,
    pub span_id: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IncidentContext {
    pub user_id: Option<String>,
    pub session_id: Option<String>,
    pub service_name: Option<String>,
    pub time_window: TimeWindow,
    pub related_incidents: Vec<String>,
    pub known_symptoms: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankingCandidate {
    pub log: LogEntry,
    pub initial_score: f32,
    pub features: Vec<RankingFeature>,
    pub llm_score: Option<f32>,
    pub final_score: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankingFeature {
    pub name: String,
    pub value: f32,
    pub weight: f32,
    pub description: String,
}

// Feature kinds tracked by the ranking engine; named distinctly so it
// does not collide with the RankingFeature struct above
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum RankingFeatureKind {
    SemanticSimilarity(f32),
    TemporalProximity(f32),
    ServiceRelevance(f32),
    ErrorSeverity(f32),
    CausalLikelihood(f32),
    HistoricalRelevance(f32),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RerankingResult {
    pub session_id: Uuid,
    pub query: String,
    pub results: Vec<RankedLogEntry>,
    pub processing_time_ms: u64,
    pub model_used: String,
    pub confidence_score: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RankedLogEntry {
    pub log: LogEntry,
    pub rank: usize,
    pub score: f32,
    pub explanation: String,
    pub causal_chain: Option<Vec<String>>,
}


5. Implementation Patterns​

LLM Client Implementation

// File: src/log_reranking/llm_client.rs
use anyhow::{anyhow, Result};
use reqwest::Client;
use serde_json::json;

pub struct LlmClient {
    client: Client,
    model_config: ModelConfig,
    fallback_models: Vec<ModelConfig>,
}

impl LlmClient {
    pub async fn rerank_with_context(
        &self,
        query: &ProcessedQuery,
        candidates: Vec<RankingCandidate>,
    ) -> Result<Vec<RankingCandidate>> {
        // Prepare prompt with query context
        let prompt = self.build_reranking_prompt(query, &candidates)?;

        // Try primary model first
        match self.query_llm(&self.model_config, &prompt).await {
            Ok(response) => self.parse_reranking_response(response, candidates),
            Err(_) => {
                // Fall back to alternative models in configured order
                for fallback in &self.fallback_models {
                    if let Ok(response) = self.query_llm(fallback, &prompt).await {
                        return self.parse_reranking_response(response, candidates);
                    }
                }
                Err(anyhow!("All LLM models failed"))
            }
        }
    }

    fn build_reranking_prompt(
        &self,
        query: &ProcessedQuery,
        candidates: &[RankingCandidate],
    ) -> Result<String> {
        let mut prompt = format!(
            r#"You are an expert system administrator analyzing logs for incident: "{}"

Context:
- Time window: {} to {}
- Affected services: {}
- Known symptoms: {}

Please rank these {} log entries by relevance to the incident.
Consider: temporal proximity, causal relationships, error severity, and semantic relevance.

Respond with a JSON array of scores (0.0-1.0):

"#,
            query.intent,
            query.time_window.start,
            query.time_window.end,
            query.services.join(", "),
            query.symptoms.join(", "),
            candidates.len()
        );

        // Add log entries with truncated messages. Truncate on char
        // boundaries: byte slicing can panic mid-way through a
        // multi-byte UTF-8 character.
        for (i, candidate) in candidates.iter().enumerate() {
            let truncated_msg = if candidate.log.message.chars().count() > 200 {
                let head: String = candidate.log.message.chars().take(200).collect();
                format!("{}...", head)
            } else {
                candidate.log.message.clone()
            };

            prompt.push_str(&format!(
                "{}. [{}] {}: {}\n",
                i + 1,
                candidate.log.timestamp.format("%H:%M:%S"),
                candidate.log.service,
                truncated_msg
            ));
        }

        prompt.push_str("\nReturn array of scores: [0.95, 0.87, 0.12, ...]");
        Ok(prompt)
    }
}

Vector Store Implementation​

// File: src/log_reranking/vector_store.rs
use qdrant_client::{qdrant, QdrantClient};

pub struct VectorStore {
    client: QdrantClient,
    embedding_model: EmbeddingModel,
    collection_name: String,
}

impl VectorStore {
    pub async fn embed_query(&self, query: &ProcessedQuery) -> Result<Vec<f32>> {
        let combined_text = format!(
            "{} {} {}",
            query.intent,
            query.symptoms.join(" "),
            query.services.join(" ")
        );

        self.embedding_model.encode(&combined_text).await
    }

    pub async fn get_or_create_embedding(&self, log: &LogEntry) -> Result<Vec<f32>> {
        // Check whether an embedding already exists for this log ID.
        // NOTE: this is a lookup by payload filter, not a similarity
        // search; the exact request and response shapes depend on the
        // qdrant-client version, so treat this block as illustrative.
        let search_result = self.client
            .search_points(&qdrant::SearchPoints {
                collection_name: self.collection_name.clone(),
                vector: vec![], // no query vector: filter-only lookup
                filter: Some(qdrant::Filter {
                    must: vec![qdrant::Condition {
                        condition_one_of: Some(
                            qdrant::condition::ConditionOneOf::Field(
                                qdrant::FieldCondition {
                                    key: "log_id".to_string(),
                                    match_: Some(qdrant::Match {
                                        match_value: Some(
                                            qdrant::match_::MatchValue::Keyword(log.id.clone())
                                        ),
                                    }),
                                    ..Default::default()
                                }
                            )
                        ),
                    }],
                    ..Default::default()
                }),
                limit: 1,
                ..Default::default()
            })
            .await?;

        if let Some(point) = search_result.result.first() {
            // Extract the stored vector; the accessor path varies across
            // qdrant-client versions
            Ok(point.vectors.as_ref().unwrap().vectors_count.as_ref().unwrap().data.clone())
        } else {
            // Generate a new embedding and persist it for future queries
            let text = format!("{} {}", log.service, log.message);
            let embedding = self.embedding_model.encode(&text).await?;

            self.store_embedding(log, &embedding).await?;

            Ok(embedding)
        }
    }
}

Ranking Engine​

// File: src/log_reranking/ranking_engine.rs
use std::collections::HashMap;

use chrono::{DateTime, Utc};

pub struct RankingEngine {
    feature_weights: HashMap<String, f32>,
    temporal_decay: f32,
    service_graph: ServiceGraph,
}

impl RankingEngine {
    pub async fn compute_final_scores(
        &self,
        mut candidates: Vec<RankingCandidate>,
    ) -> Result<Vec<RankedLogEntry>> {
        let mut ranked_entries = vec![];

        for candidate in candidates.iter_mut() {
            // Combine all scoring factors
            let mut final_score = 0.0;

            // LLM score (primary, 60% weight)
            if let Some(llm_score) = candidate.llm_score {
                final_score += llm_score * 0.6;
            }

            // Feature-based scores. Temporal proximity is computed during
            // extract_features(), where the incident time window is in
            // scope, and arrives here as one of these features.
            for feature in &candidate.features {
                let weight = self.feature_weights
                    .get(&feature.name)
                    .unwrap_or(&0.1);
                final_score += feature.value * weight;
            }

            candidate.final_score = final_score;

            ranked_entries.push(RankedLogEntry {
                log: candidate.log.clone(),
                rank: 0, // assigned after sorting below
                score: final_score,
                explanation: self.generate_explanation(candidate)?,
                causal_chain: self.detect_causal_chain(candidate).await?,
            });
        }

        // Sort by final score, then assign ranks; assigning ranks before
        // sorting would record the input order, not the final order
        ranked_entries.sort_by(|a, b|
            b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal)
        );
        for (i, entry) in ranked_entries.iter_mut().enumerate() {
            entry.rank = i + 1;
        }

        Ok(ranked_entries)
    }

    /// Exponential decay toward the incident window start; used by
    /// feature extraction to produce the temporal proximity feature.
    fn calculate_temporal_factor(
        &self,
        log_time: &DateTime<Utc>,
        incident_window: &TimeWindow,
    ) -> f32 {
        let time_diff = (*log_time - incident_window.start).num_seconds().abs() as f32;

        // Closer to incident time = higher relevance
        (-time_diff * self.temporal_decay).exp()
    }
}
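To make the decay shape of `calculate_temporal_factor` concrete: with an assumed `temporal_decay` of 0.001 per second (this ADR does not fix the value), a log 60 s from the window start retains about 94% of its relevance while a log an hour away keeps under 3%. A std-only sketch:

```rust
// Worked example of the exponential temporal decay above. The decay
// constant 0.001/s is an illustrative assumption, not a tuned value.
fn temporal_factor(seconds_from_start: f32, temporal_decay: f32) -> f32 {
    (-seconds_from_start.abs() * temporal_decay).exp()
}

fn main() {
    let decay = 0.001;
    let near = temporal_factor(60.0, decay);  // e^-0.06, ~0.94
    let far = temporal_factor(3600.0, decay); // e^-3.6, ~0.03
    assert!(near > 0.9 && near < 1.0);
    assert!(far < 0.05);
    println!("60s: {:.3}, 3600s: {:.3}", near, far);
}
```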


6. API Specifications​

Log Reranking Endpoints​

// File: src/api/handlers/log_reranking.rs
use std::sync::Arc;

use actix_web::{post, web, HttpResponse, Result};
use serde_json::json;
use uuid::Uuid;

#[post("/api/v1/logs/rerank")]
pub async fn rerank_logs(
    request: web::Json<RerankingRequest>,
    service: web::Data<Arc<LogRerankingService>>,
    claims: Claims,
) -> Result<HttpResponse> {
    // Validate permissions
    if !claims.has_permission("logs:analyze") {
        return Ok(HttpResponse::Forbidden().json(json!({
            "error": "insufficient_permissions"
        })));
    }

    // Rate limiting
    if !service.check_rate_limit(&claims.user_id).await? {
        return Ok(HttpResponse::TooManyRequests().json(json!({
            "error": "rate_limit_exceeded",
            "retry_after": 60
        })));
    }

    // Execute reranking
    match service.rerank_logs(
        &request.query,
        request.logs.clone(),
        &request.context,
    ).await {
        Ok(result) => Ok(HttpResponse::Ok().json(result)),
        Err(e) => Ok(HttpResponse::InternalServerError().json(json!({
            "error": "reranking_failed",
            "details": e.to_string()
        })))
    }
}

#[post("/api/v1/logs/feedback")]
pub async fn submit_feedback(
    feedback: web::Json<FeedbackRequest>,
    tracker: web::Data<Arc<FeedbackTracker>>,
    claims: Claims,
) -> Result<HttpResponse> {
    tracker.record_feedback(
        feedback.session_id,
        &feedback.relevant_logs,
        &feedback.irrelevant_logs,
        claims.user_id,
    ).await?;

    Ok(HttpResponse::Ok().json(json!({
        "status": "feedback_recorded",
        "session_id": feedback.session_id
    })))
}

#[derive(Debug, Deserialize)]
pub struct RerankingRequest {
    pub query: String,
    pub logs: Vec<LogEntry>,
    pub context: IncidentContext,
    pub max_results: Option<usize>,
    pub model_preference: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct FeedbackRequest {
    pub session_id: Uuid,
    pub relevant_logs: Vec<String>, // Log IDs
    pub irrelevant_logs: Vec<String>,
    pub resolution_notes: Option<String>,
}
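For illustration, a request body matching `RerankingRequest` might serialize as below. All values are hypothetical, and the `"ERROR"` spelling of `LogLevel` assumes default serde enum serialization, which this ADR does not specify:

```json
{
  "query": "API timeout causing customer checkout failures",
  "logs": [
    {
      "id": "log-0001",
      "timestamp": "2025-09-01T14:02:11Z",
      "level": "ERROR",
      "service": "checkout-api",
      "message": "upstream request timed out after 30s",
      "fields": {},
      "trace_id": null,
      "span_id": null
    }
  ],
  "context": {
    "user_id": null,
    "session_id": null,
    "service_name": "checkout-api",
    "time_window": { "start": "2025-09-01T13:30:00Z", "end": "2025-09-01T14:30:00Z" },
    "related_incidents": [],
    "known_symptoms": ["checkout failures"]
  },
  "max_results": 10,
  "model_preference": null
}
```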


7. Testing Requirements​

Test Coverage Requirements​

  • Unit Test Coverage: ≥95% of ranking algorithms
  • Integration Test Coverage: ≥90% of LLM integrations
  • E2E Test Coverage: Complete reranking workflows
  • Performance Test Coverage: Load testing with 10K+ logs
  • A/B Test Coverage: Model comparison validation

Integration Tests​

#[cfg(test)]
mod tests {
    use super::*;
    use std::time::Instant;
    use chrono::{Duration, Utc};

    #[tokio::test]
    async fn test_log_reranking_accuracy() {
        let service = setup_test_service().await;

        // Create test scenario
        let incident_logs = create_test_incident_logs();
        let query = "API timeout causing customer checkout failures";
        let context = IncidentContext {
            service_name: Some("checkout-api".to_string()),
            time_window: TimeWindow {
                start: Utc::now() - Duration::hours(1),
                end: Utc::now(),
            },
            ..Default::default()
        };

        // Execute reranking
        let result = service.rerank_logs(query, incident_logs, &context)
            .await.unwrap();

        // Validate results
        assert!(result.results.len() <= 10);
        assert!(result.results[0].score > 0.8); // Top result highly relevant
        assert!(result.processing_time_ms < 5000);

        // Check for expected root cause
        let top_log = &result.results[0];
        assert!(top_log.explanation.contains("timeout"));
        assert_eq!(top_log.log.service, "checkout-api");
    }

    #[tokio::test]
    async fn test_multi_model_fallback() {
        let service = setup_test_service_with_fallback().await;

        // Simulate primary model failure
        service.llm_client.disable_primary_model();

        let result = service.rerank_logs(
            "database connection errors",
            create_test_logs(),
            &Default::default(),
        ).await.unwrap();

        // Should complete using a fallback model
        assert!(!result.results.is_empty());
        assert_ne!(result.model_used, "claude-3.5"); // Primary model
    }

    #[tokio::test]
    async fn test_performance_with_large_dataset() {
        let service = setup_test_service().await;
        let large_dataset = create_large_log_dataset(10000); // 10K logs

        let start = Instant::now();
        let result = service.rerank_logs(
            "performance degradation",
            large_dataset,
            &Default::default(),
        ).await.unwrap();
        let duration = start.elapsed();

        assert!(duration.as_secs() < 5); // Under the 5-second constraint
        assert!(result.results.len() <= 10);
    }
}


8. Performance Benchmarks​

Required Metrics​

const MAX_RESPONSE_TIME_MS: u64 = 5000;
const MIN_RELEVANCE_ACCURACY: f32 = 0.95;
const MAX_COST_PER_QUERY: f32 = 0.10;
const MAX_MEMORY_MB: usize = 512;

pub struct PerformanceValidator {
    service: Arc<LogRerankingService>,
}

impl PerformanceValidator {
    pub async fn validate_reranking_performance(&self) -> Result<ValidationReport> {
        let mut report = ValidationReport::default();

        // Test response time with various dataset sizes
        for size in [100, 1000, 5000, 10000] {
            let logs = create_test_logs(size);
            let start = Instant::now();

            let _result = self.service.rerank_logs(
                "test query",
                logs,
                &Default::default(),
            ).await?;

            let duration = start.elapsed();
            report.response_times.push((size, duration.as_millis() as u64));
        }

        // Validate all response times under constraint
        report.response_time_ok = report.response_times
            .iter()
            .all(|(_, time)| *time <= MAX_RESPONSE_TIME_MS);

        // Test relevance accuracy
        let accuracy = self.measure_relevance_accuracy().await?;
        report.accuracy_ok = accuracy >= MIN_RELEVANCE_ACCURACY;

        // Test cost efficiency
        let cost = self.estimate_query_cost().await?;
        report.cost_ok = cost <= MAX_COST_PER_QUERY;

        Ok(report)
    }
}
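The $0.10/query ceiling can be sanity-checked with a rough token-cost estimate. The ~4 characters/token heuristic and the example per-token price below are assumptions for illustration, not measured values for any particular model:

```rust
// Rough cost estimator for one reranking prompt. Both the chars-per-token
// heuristic (~4) and the example price are illustrative assumptions.
fn estimate_query_cost(prompt_chars: usize, price_per_1k_tokens: f64) -> f64 {
    let tokens = prompt_chars as f64 / 4.0; // ~4 chars/token heuristic
    (tokens / 1000.0) * price_per_1k_tokens
}

fn main() {
    // 100 candidate lines x ~250 chars each, at a hypothetical $0.003
    // per 1K input tokens
    let cost = estimate_query_cost(100 * 250, 0.003);
    assert!(cost < 0.10); // within the ADR's per-query budget
    println!("estimated cost: ${:.4}", cost);
}
```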


9. Security Controls​

Data Privacy and Security​

// File: src/log_reranking/security.rs
pub struct RerankingSecurity {
pub async fn sanitize_logs_for_llm(
&self,
logs: &[LogEntry],
) -> Result<Vec<SanitizedLogEntry>> {
let mut sanitized = vec![];

for log in logs {
let mut sanitized_log = SanitizedLogEntry {
id: log.id.clone(),
timestamp: log.timestamp,
level: log.level.clone(),
service: log.service.clone(),
message: self.redact_sensitive_data(&log.message)?,
fields: HashMap::new(),
};

// Sanitize fields
for (key, value) in &log.fields {
if !self.is_sensitive_field(key) {
sanitized_log.fields.insert(
key.clone(),
self.sanitize_field_value(value)?
);
}
}

sanitized.push(sanitized_log);
}

Ok(sanitized)
}

fn redact_sensitive_data(&self, message: &str) -> Result<String> {
let mut redacted = message.to_string();

// Redact common sensitive patterns
let patterns = [
(r"password=\S+", "password=***"),
(r"token=\S+", "token=***"),
(r"key=\S+", "key=***"),
(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b", "****-****-****-****"), // Credit cards
(r"\b\d{3}-\d{2}-\d{4}\b", "***-**-****"), // SSNs
];

for (pattern, replacement) in patterns {
let re = regex::Regex::new(pattern)?;
redacted = re.replace_all(&redacted, replacement).to_string();
}

Ok(redacted)
}
}
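The `password=`/`token=`/`key=` rules above use the regex crate. As a crate-free illustration of the same masking behavior for a single key=value pattern (the helper name and whitespace-delimited value assumption are hypothetical, not part of the spec):

```rust
// Std-only sketch of the key=value masking shown above: replaces the
// value after "<key>=" (up to the next whitespace) with "***".
fn mask_value(input: &str, key: &str) -> String {
    let needle = format!("{}=", key);
    let mut out = String::new();
    let mut rest = input;
    while let Some(pos) = rest.find(&needle) {
        let after = pos + needle.len();
        out.push_str(&rest[..after]); // keep text up to and including "key="
        out.push_str("***");
        // Skip the original value (runs until the next whitespace or EOL)
        let tail = &rest[after..];
        let end = tail.find(char::is_whitespace).unwrap_or(tail.len());
        rest = &tail[end..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let msg = "login failed password=hunter2 user=alice";
    assert_eq!(mask_value(msg, "password"), "login failed password=*** user=alice");
    println!("{}", mask_value(msg, "password"));
}
```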


10. Logging and Error Handling​

Reranking Event Logging​

// File: src/log_reranking/logging.rs
use std::time::Duration;

use tracing::{info, instrument};

#[instrument]
pub async fn log_reranking_request(query: &str, log_count: usize) {
    info!(
        query = %query,
        log_count = log_count,
        "Log reranking request started"
    );
}

#[instrument(skip(result))]
pub async fn log_reranking_result(result: &RerankingResult) {
    info!(
        session_id = %result.session_id,
        processing_time_ms = result.processing_time_ms,
        model_used = %result.model_used,
        confidence_score = result.confidence_score as f64,
        result_count = result.results.len(),
        "Log reranking completed"
    );
}

pub async fn log_model_performance(model: &str, latency: Duration, cost: f32) {
    info!(
        model = model,
        latency_ms = latency.as_millis() as u64,
        cost_dollars = cost as f64,
        "Model performance metrics"
    );
}

Error Handling​

#[derive(thiserror::Error, Debug)]
pub enum LogRerankingError {
    #[error("LLM request failed: {0}")]
    LlmRequestFailed(String),

    #[error("Vector embedding failed for log: {0}")]
    EmbeddingFailed(String),

    #[error("Query too complex: {0}")]
    QueryTooComplex(String),

    #[error("Rate limit exceeded for user: {0}")]
    RateLimitExceeded(String),

    #[error("Insufficient log context for analysis")]
    InsufficientContext,
}
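A natural consumer of these variants is the API layer's error responses. A sketch of one possible HTTP status mapping follows; the enum is mirrored here without thiserror so the snippet stands alone, and the status codes themselves are assumptions, not prescribed by this ADR:

```rust
// Simplified mirror of LogRerankingError (no thiserror dependency) plus a
// hypothetical mapping to HTTP status codes for the API layer.
#[derive(Debug)]
pub enum LogRerankingError {
    LlmRequestFailed(String),
    EmbeddingFailed(String),
    QueryTooComplex(String),
    RateLimitExceeded(String),
    InsufficientContext,
}

fn status_for(err: &LogRerankingError) -> u16 {
    match err {
        LogRerankingError::LlmRequestFailed(_) => 502, // upstream model failed
        LogRerankingError::EmbeddingFailed(_) => 500,
        LogRerankingError::QueryTooComplex(_) => 422,
        LogRerankingError::RateLimitExceeded(_) => 429,
        LogRerankingError::InsufficientContext => 400,
    }
}

fn main() {
    assert_eq!(status_for(&LogRerankingError::RateLimitExceeded("u1".into())), 429);
    assert_eq!(status_for(&LogRerankingError::InsufficientContext), 400);
}
```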


11. References​

Model Compatibility​

  • Claude: 3.5 Sonnet for production, Haiku for cost optimization
  • GPT-4: Turbo for speed, standard for accuracy
  • Gemini: Pro for balanced performance
  • Local Models: Ollama for on-premise deployment


12. Approval Signatures​

Technical Sign-off​

Component       | Owner    | Approved | Date
----------------|----------|----------|-----------
Architecture    | Session5 | ✓        | 2025-09-01
Implementation  | Pending  | -        | -
ML Engineering  | Pending  | -        | -
Security Review | Pending  | -        | -

Implementation Checklist​

  • LLM client with multi-model support
  • Vector store with embeddings
  • Ranking engine with features
  • Feedback learning system
  • API endpoints complete
  • Performance benchmarks met
  • Security controls validated
  • Data privacy compliance
