Production Deployment Guide

This guide covers operational considerations for deploying the file monitor in production environments.

Pre-Deployment Checklist​

System Requirements​

  • Rust 1.70+ installed
  • Sufficient file descriptor limits (see Platform Configuration)
  • Monitoring infrastructure ready (Prometheus/Grafana recommended)
  • Log aggregation configured
  • Alert rules defined

Resource Planning​

| Component | CPU | Memory | Disk I/O | Notes |
|---|---|---|---|---|
| Monitoring (idle) | <1% | 5 MB | Negligible | Baseline |
| Monitoring (100 evt/s) | 5-10% | 20 MB | Low | Typical |
| Monitoring (1000 evt/s) | 20-30% | 50 MB | Moderate | High load |
| With checksums | +10-20% | +Variable | High | File-dependent |

Platform Configuration​

Linux​

Increase inotify Limits​

# Check current limits
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances

# Temporary increase
echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_watches
echo 512 | sudo tee /proc/sys/fs/inotify/max_user_instances

# Permanent increase
echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_user_instances=512" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
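With inotify, recursive watching consumes roughly one watch per directory, so a quick directory count under the watch root gives a working estimate for `max_user_watches`. A minimal sketch using only the standard library (the exact per-watch cost depends on the backend):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count directories under `root`, including `root` itself; with inotify,
/// recursive watching needs roughly one watch per directory.
fn count_dirs(root: &Path) -> io::Result<u64> {
    let mut count = 1; // the root itself
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            count += count_dirs(&entry.path())?;
        }
    }
    Ok(count)
}
```

Run it against the watch root (e.g. `count_dirs(Path::new("/data"))?`) and leave generous headroom for directories created after startup before settling on a limit.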

File Descriptor Limits​

# Check current limits
ulimit -n

# Increase for user (add to /etc/security/limits.conf)
your_user soft nofile 65536
your_user hard nofile 65536

# Verify after login
ulimit -n

macOS​

FSEvents has no per-watch limit analogous to inotify's, but verify the system-wide file limits:

# Check system limits
sysctl kern.maxfiles
sysctl kern.maxfilesperproc

# Increase if needed (takes effect immediately, but does not persist across reboot)
sudo sysctl -w kern.maxfiles=65536
sudo sysctl -w kern.maxfilesperproc=65536

Windows​

Ensure sufficient handle limits:

# Registry settings for handle limits:
# HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
# Windows key: SharedSection=1024,20480,1024

Configuration Tuning​

Development Environment​

let config = MonitorConfig::new("/path")
    .recursive(true)
    .debounce(100) // Low latency
    .concurrency(50, 500); // Conservative

Production Environment​

let config = MonitorConfig::new("/data")
    .recursive(true)
    .debounce(500) // Balance responsiveness vs dedup
    .concurrency(100, 1000) // Higher throughput
    .ignore_patterns(vec![
        "*.tmp".to_string(),
        "*.swp".to_string(),
        ".git".to_string(),
        "node_modules".to_string(),
        "__pycache__".to_string(),
        ".DS_Store".to_string(),
    ])
    .with_checksums(Some(50 * 1024 * 1024)); // 50 MB limit
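The checksum size limit above exists so that a single huge file can never stall the event pipeline. The gating logic can be sketched as follows; the library's actual digest algorithm is not specified here, so FNV-1a stands in as a placeholder:

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// FNV-1a hash: a stand-in for whatever digest the library actually uses.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Checksum the file only if it is at or below `max_bytes`; larger files
/// return Ok(None) and are reported without a checksum.
fn checksum_if_small(path: &Path, max_bytes: u64) -> io::Result<Option<u64>> {
    let meta = fs::metadata(path)?;
    if meta.len() > max_bytes {
        return Ok(None); // skip: file exceeds the limit
    }
    let mut buf = Vec::with_capacity(meta.len() as usize);
    fs::File::open(path)?.read_to_end(&mut buf)?;
    Ok(Some(fnv1a(&buf)))
}
```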

High-Volume Environment​

let config = MonitorConfig::new("/high-volume")
    .recursive(true)
    .debounce(1000) // Aggressive deduplication
    .concurrency(200, 2000) // Maximum throughput
    .ignore_patterns(/* extensive list */)
    .with_checksums(None); // Disable for performance
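The debounce windows in these configs collapse bursts of events for the same path into one. The core mechanism can be sketched with a map from path to last-emitted time; this is a hypothetical leading-edge variant for brevity, not the library's internals (the real implementation may instead emit the final event after the window closes):

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Tracks the last time each path was emitted; events arriving within
/// `window` of the previous emission for the same path are suppressed.
struct Debouncer {
    window: Duration,
    last_seen: HashMap<PathBuf, Instant>,
}

impl Debouncer {
    fn new(window_ms: u64) -> Self {
        Self { window: Duration::from_millis(window_ms), last_seen: HashMap::new() }
    }

    /// Returns true if the event should be forwarded, false if debounced.
    fn should_emit(&mut self, path: PathBuf, now: Instant) -> bool {
        match self.last_seen.get(&path) {
            Some(&prev) if now.duration_since(prev) < self.window => false,
            _ => {
                self.last_seen.insert(path, now);
                true
            }
        }
    }
}
```

A larger window trades event latency for fewer downstream events, which is why the high-volume profile uses 1000 ms while development uses 100 ms.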

Monitoring and Alerting​

Key Metrics​

Must Monitor​

| Metric | Type | Alert Threshold | Action |
|---|---|---|---|
| fs_monitor.rate_limiter.utilization | Gauge | >0.8 | Increase concurrency |
| fs_monitor.events.dropped | Counter | Rate >100/min | Investigate load |
| fs_monitor.channel.used | Gauge | >90% capacity | Increase buffer |
| fs_monitor.errors | Counter | Rate >10/min | Check logs |

Good to Monitor​

| Metric | Type | Purpose |
|---|---|---|
| fs_monitor.events.received | Counter | Track total volume |
| fs_monitor.events.debounced | Counter | Verify debouncing effectiveness |
| fs_monitor.processing.latency_us | Histogram | Performance tracking |
| fs_monitor.checksum.duration_ms | Histogram | Checksum performance |

Prometheus Queries​

# Event processing rate
rate(fs_monitor_events_published_total[5m])

# Rate limiter pressure
fs_monitor_rate_limiter_utilization > 0.8

# Error rate
rate(fs_monitor_errors_total[5m]) > 0.1

# P99 processing latency
histogram_quantile(0.99, rate(fs_monitor_processing_latency_us_bucket[5m]))

# Channel saturation
fs_monitor_channel_used / fs_monitor_channel_capacity > 0.9

Alert Rules​

groups:
  - name: file_monitor
    rules:
      - alert: FileMonitorRateLimiterSaturated
        expr: fs_monitor_rate_limiter_utilization > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "File monitor rate limiter saturated"
          description: "Rate limiter utilization is {{ $value }}"

      - alert: FileMonitorHighErrorRate
        expr: rate(fs_monitor_errors_total[5m]) > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate in file monitor"
          description: "Error rate is {{ $value }} errors/sec"

      - alert: FileMonitorChannelSaturated
        expr: fs_monitor_channel_used / fs_monitor_channel_capacity > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "File monitor channel near capacity"

Logging Configuration​

Environment Variables​

# Development
export RUST_LOG=file_monitor=debug,info

# Production
export RUST_LOG=file_monitor=info,warn

# Troubleshooting
export RUST_LOG=file_monitor=trace

Structured Logging​

use tracing_subscriber::EnvFilter;

fn init_logging() {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .with_target(true)
        .with_thread_ids(true)
        .with_file(true)
        .with_line_number(true)
        .json() // For production log aggregation
        .init();
}

Deployment Patterns​

Systemd Service (Linux)​

[Unit]
Description=File Monitor Service
After=network.target

[Service]
Type=simple
User=monitor
Group=monitor
WorkingDirectory=/opt/file-monitor
ExecStart=/opt/file-monitor/bin/file-monitor /data
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

# Resource limits
LimitNOFILE=65536
MemoryMax=512M
CPUQuota=50%

# Environment
Environment="RUST_LOG=info"
Environment="RUST_BACKTRACE=1"

[Install]
WantedBy=multi-user.target

# Install and start
sudo cp file-monitor.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable file-monitor
sudo systemctl start file-monitor

# Check status
sudo systemctl status file-monitor
sudo journalctl -u file-monitor -f

Docker Deployment​

FROM rust:1.70-slim as builder
WORKDIR /build
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/target/release/file-monitor /usr/local/bin/
USER nobody
ENTRYPOINT ["file-monitor"]

# Build
docker build -t file-monitor:latest .

# Run with volume mount
docker run -d \
--name file-monitor \
-v /data:/data:ro \
-e RUST_LOG=info \
--restart unless-stopped \
file-monitor:latest /data

Kubernetes Deployment​

apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: file-monitor
  template:
    metadata:
      labels:
        app: file-monitor
    spec:
      containers:
        - name: file-monitor
          image: file-monitor:latest
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: RUST_LOG
              value: "info"
          volumeMounts:
            - name: data
              mountPath: /data
              readOnly: true
      volumes:
        - name: data
          hostPath:
            path: /data
            type: Directory

Operational Procedures​

Health Checks​

# Endpoint (if implemented)
curl http://localhost:9090/health

# Check metrics
curl http://localhost:9090/metrics | grep fs_monitor

# Check logs
journalctl -u file-monitor --since "5 minutes ago"
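If the binary does not ship a health endpoint, a minimal standard-library listener is enough for liveness probes. A sketch (the port and response body are assumptions, not part of the monitor's API):

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

/// Serve a minimal HTTP 200 for any request on `addr`; runs on a background
/// thread so it never blocks event processing. Returns the bound port.
fn spawn_health_listener(addr: &str) -> std::io::Result<u16> {
    let listener = TcpListener::bind(addr)?;
    let port = listener.local_addr()?.port();
    thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            let mut stream = stream;
            let mut buf = [0u8; 512];
            let _ = stream.read(&mut buf); // drain the request line
            let _ = stream.write_all(
                b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok",
            );
        }
    });
    Ok(port)
}
```

Call `spawn_health_listener("0.0.0.0:9090")?` at startup and the `curl` checks above work unchanged; for a real readiness signal, extend the handler to inspect monitor state before answering 200.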

Performance Tuning​

  1. Start conservative: Low concurrency, aggressive debouncing
  2. Monitor metrics: Track rate limiter utilization and drop rate
  3. Increase gradually: Adjust concurrency and buffer sizes
  4. Load test: Simulate production load before deployment
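The loop above can be captured as a simple policy: raise concurrency only while the rate limiter is saturated and nothing is being dropped, and back off as soon as drops appear. A hypothetical helper with illustrative thresholds (not part of the library):

```rust
/// Suggest the next max_concurrent_tasks value from two observed metrics:
/// rate-limiter utilization (0.0-1.0) and dropped events per minute.
fn next_concurrency(current: u32, utilization: f64, drops_per_min: f64) -> u32 {
    if drops_per_min > 0.0 {
        // Dropping events means downstream is overwhelmed: back off sharply.
        (current / 2).max(10)
    } else if utilization > 0.8 {
        // Saturated but healthy: grow gradually.
        (current + current / 4).min(2000)
    } else {
        current // Within budget: leave it alone.
    }
}
```

Run this against metrics sampled over a full load cycle (not a single scrape) so a momentary spike does not trigger a resize.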

Troubleshooting​

High CPU Usage​

  1. Check event volume: fs_monitor.events.received
  2. Disable checksums if enabled
  3. Reduce concurrency temporarily
  4. Add more ignore patterns
  5. Increase debounce window

High Memory Usage​

  1. Check channel buffer usage: fs_monitor.channel.used
  2. Reduce channel buffer size
  3. Increase debounce cleanup frequency
  4. Check for file descriptor leaks

Events Dropped​

  1. Check rate limiter: fs_monitor.rate_limiter.utilization
  2. Increase max_concurrent_tasks
  3. Increase channel_buffer_size
  4. Review ignore patterns
  5. Consider multiple instances

Slow Shutdown​

  1. Reduce shutdown_timeout_secs
  2. Check active tasks during shutdown
  3. Review event volume during shutdown window

Backup and Recovery​

State Management​

The monitor is stateless but consider:

  • Configuration backup (YAML/TOML files)
  • Metrics retention (Prometheus)
  • Log archival (if compliance required)

Disaster Recovery​

  1. Service failure: Systemd auto-restart
  2. Configuration corruption: Version control config files
  3. Resource exhaustion: Automatic rate limiting prevents cascading failure

Security Considerations​

File Access​

# Run as dedicated user
sudo useradd -r -s /bin/false monitor

# Grant read-only access
sudo setfacl -R -m u:monitor:rx /data

Container Security​

# Run as non-root
USER nobody

# Read-only root filesystem
FROM scratch
COPY --from=builder /app/file-monitor /file-monitor
USER 65534:65534

Performance Benchmarks​

Test Environment​

  • CPU: Intel Xeon E5-2680 v4 @ 2.40GHz
  • RAM: 64GB DDR4
  • Disk: NVMe SSD
  • OS: Ubuntu 22.04 LTS

Results​

| Scenario | Events/sec | CPU | Memory | Latency (p99) |
|---|---|---|---|---|
| Idle monitoring | 0 | <1% | 5 MB | N/A |
| Light load (10 evt/s) | 10 | 2% | 10 MB | 2 ms |
| Moderate (100 evt/s) | 100 | 8% | 25 MB | 5 ms |
| Heavy (1000 evt/s) | 1000 | 25% | 60 MB | 15 ms |
| With checksums (100 evt/s) | 100 | 15% | 30 MB | 25 ms |

Support and Escalation​

Log Collection​

# Collect recent logs
journalctl -u file-monitor --since "1 hour ago" > file-monitor.log

# Collect metrics
curl http://localhost:9090/metrics > metrics.txt

# System information
uname -a > system-info.txt
cat /proc/sys/fs/inotify/* > inotify-limits.txt

Bug Reports​

Include:

  1. Version: file-monitor --version
  2. Configuration used
  3. Platform details
  4. Reproduction steps
  5. Logs and metrics
  6. Resource usage (top/htop output)

Changelog​

  • 2025-01-06: Initial production deployment guide
  • Next review: After 3 months production usage