Production Deployment Guide
This guide covers operational considerations for deploying the file monitor in production environments.
Pre-Deployment Checklist​
System Requirements​
- Rust 1.70+ installed
- Sufficient file descriptor limits (see Platform Configuration)
- Monitoring infrastructure ready (Prometheus/Grafana recommended)
- Log aggregation configured
- Alert rules defined
Resource Planning​
| Component | CPU | Memory | Disk I/O | Notes |
|---|---|---|---|---|
| Monitoring (idle) | <1% | 5 MB | Negligible | Baseline |
| Monitoring (100 evt/s) | 5-10% | 20 MB | Low | Typical |
| Monitoring (1000 evt/s) | 20-30% | 50 MB | Moderate | High load |
| With checksums | +10-20% | +Variable | High | File-dependent |
Platform Configuration​
Linux​
Increase inotify Limits​
# Check current limits
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances
# Temporary increase
echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_watches
echo 512 | sudo tee /proc/sys/fs/inotify/max_user_instances
# Permanent increase
echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_user_instances=512" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
File Descriptor Limits​
# Check current limits
ulimit -n
# Increase for user (add to /etc/security/limits.conf)
your_user soft nofile 65536
your_user hard nofile 65536
# Verify after login
ulimit -n
macOS​
FSEvents automatically handles limits, but verify:
# Check system limits
sysctl kern.maxfiles
sysctl kern.maxfilesperproc
# Increase if needed (requires reboot)
sudo sysctl -w kern.maxfiles=65536
sudo sysctl -w kern.maxfilesperproc=65536
Windows​
Ensure sufficient handle limits:
# Registry settings for handle limits
# HKEY_LOCAL_MACHINE SYSTEM CurrentControlSet Control Session Manager SubSystems
# Windows key: SharedSection=1024,20480,1024
Configuration Tuning​
Development Environment​
let config = MonitorConfig::new("/path")
.recursive(true)
.debounce(100) // Low latency
.concurrency(50, 500); // Conservative
Production Environment​
let config = MonitorConfig::new("/data")
.recursive(true)
.debounce(500) // Balance responsiveness vs dedup
.concurrency(100, 1000) // Higher throughput
.ignore_patterns(vec![
"*.tmp".to_string(),
"*.swp".to_string(),
".git".to_string(),
"node_modules".to_string(),
"__pycache__".to_string(),
".DS_Store".to_string(),
])
.with_checksums(Some(50 * 1024 * 1024)); // 50MB limit
High-Volume Environment​
let config = MonitorConfig::new("/high-volume")
.recursive(true)
.debounce(1000) // Aggressive deduplication
.concurrency(200, 2000) // Maximum throughput
.ignore_patterns(/* extensive list */)
.with_checksums(None); // Disable for performance
Monitoring and Alerting​
Key Metrics​
Must Monitor​
| Metric | Type | Alert Threshold | Action |
|---|---|---|---|
fs_monitor.rate_limiter.utilization | Gauge | >0.8 | Increase concurrency |
fs_monitor.events.dropped | Counter | Rate >100/min | Investigate load |
fs_monitor.channel.used | Gauge | >90% capacity | Increase buffer |
fs_monitor.errors | Counter | Rate >10/min | Check logs |
Good to Monitor​
| Metric | Type | Purpose |
|---|---|---|
fs_monitor.events.received | Counter | Track total volume |
fs_monitor.events.debounced | Counter | Verify debouncing effectiveness |
fs_monitor.processing.latency_us | Histogram | Performance tracking |
fs_monitor.checksum.duration_ms | Histogram | Checksum performance |
Prometheus Queries​
# Event processing rate
rate(fs_monitor_events_published_total[5m])
# Rate limiter pressure
fs_monitor_rate_limiter_utilization > 0.8
# Error rate
rate(fs_monitor_errors_total[5m]) > 0.1
# P99 processing latency
histogram_quantile(0.99, rate(fs_monitor_processing_latency_us_bucket[5m]))
# Channel saturation
fs_monitor_channel_used / fs_monitor_channel_capacity > 0.9
Alert Rules​
groups:
- name: file_monitor
rules:
- alert: FileMonitorRateLimiterSaturated
expr: fs_monitor_rate_limiter_utilization > 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "File monitor rate limiter saturated"
description: "Rate limiter utilization is {{ $value }}"
- alert: FileMonitorHighErrorRate
expr: rate(fs_monitor_errors_total[5m]) > 10
for: 2m
labels:
severity: critical
annotations:
summary: "High error rate in file monitor"
description: "Error rate is {{ $value }} errors/sec"
- alert: FileMonitorChannelSaturated
expr: fs_monitor_channel_used / fs_monitor_channel_capacity > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "File monitor channel near capacity"
Logging Configuration​
Environment Variables​
# Development
export RUST_LOG=file_monitor=debug,info
# Production
export RUST_LOG=file_monitor=info,warn
# Troubleshooting
export RUST_LOG=file_monitor=trace
Structured Logging​
use tracing_subscriber::{fmt, EnvFilter};
fn init_logging() {
tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env())
.with_target(true)
.with_thread_ids(true)
.with_file(true)
.with_line_number(true)
.json() // For production log aggregation
.init();
}
Deployment Patterns​
Systemd Service (Linux)​
[Unit]
Description=File Monitor Service
After=network.target
[Service]
Type=simple
User=monitor
Group=monitor
WorkingDirectory=/opt/file-monitor
ExecStart=/opt/file-monitor/bin/file-monitor /data
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
# Resource limits
LimitNOFILE=65536
MemoryMax=512M
CPUQuota=50%
# Environment
Environment="RUST_LOG=info"
Environment="RUST_BACKTRACE=1"
[Install]
WantedBy=multi-user.target
# Install and start
sudo cp file-monitor.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable file-monitor
sudo systemctl start file-monitor
# Check status
sudo systemctl status file-monitor
sudo journalctl -u file-monitor -f
Docker Deployment​
FROM rust:1.70-slim as builder
WORKDIR /build
COPY . .
RUN cargo build --release
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/target/release/file-monitor /usr/local/bin/
USER nobody
ENTRYPOINT ["file-monitor"]
# Build
docker build -t file-monitor:latest .
# Run with volume mount
docker run -d \
--name file-monitor \
-v /data:/data:ro \
-e RUST_LOG=info \
--restart unless-stopped \
file-monitor:latest /data
Kubernetes Deployment​
apiVersion: apps/v1
kind: Deployment
metadata:
name: file-monitor
spec:
replicas: 1
selector:
matchLabels:
app: file-monitor
template:
metadata:
labels:
app: file-monitor
spec:
containers:
- name: file-monitor
image: file-monitor:latest
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
env:
- name: RUST_LOG
value: "info"
volumeMounts:
- name: data
mountPath: /data
readOnly: true
volumes:
- name: data
hostPath:
path: /data
type: Directory
Operational Procedures​
Health Checks​
# Endpoint (if implemented)
curl http://localhost:9090/health
# Check metrics
curl http://localhost:9090/metrics | grep fs_monitor
# Check logs
journalctl -u file-monitor --since "5 minutes ago"
Performance Tuning​
- Start conservative: Low concurrency, aggressive debouncing
- Monitor metrics: Track rate limiter utilization and drop rate
- Increase gradually: Adjust concurrency and buffer sizes
- Load test: Simulate production load before deployment
Troubleshooting​
High CPU Usage​
- Check event volume:
fs_monitor.events.received - Disable checksums if enabled
- Reduce concurrency temporarily
- Add more ignore patterns
- Increase debounce window
High Memory Usage​
- Check channel buffer usage:
fs_monitor.channel.used - Reduce channel buffer size
- Increase debounce cleanup frequency
- Check for file descriptor leaks
Events Dropped​
- Check rate limiter:
fs_monitor.rate_limiter.utilization - Increase
max_concurrent_tasks - Increase
channel_buffer_size - Review ignore patterns
- Consider multiple instances
Slow Shutdown​
- Reduce
shutdown_timeout_secs - Check active tasks during shutdown
- Review event volume during shutdown window
Backup and Recovery​
State Management​
The monitor is stateless but consider:
- Configuration backup (YAML/TOML files)
- Metrics retention (Prometheus)
- Log archival (if compliance required)
Disaster Recovery​
- Service failure: Systemd auto-restart
- Configuration corruption: Version control config files
- Resource exhaustion: Automatic rate limiting prevents cascading failure
Security Considerations​
File Access​
# Run as dedicated user
sudo useradd -r -s /bin/false monitor
# Grant read-only access
sudo setfacl -R -m u:monitor:rx /data
Container Security​
# Run as non-root
USER nobody
# Read-only root filesystem
FROM scratch
COPY --from=builder /app/file-monitor /file-monitor
USER 65534:65534
Performance Benchmarks​
Test Environment​
- CPU: Intel Xeon E5-2680 v4 @ 2.40GHz
- RAM: 64GB DDR4
- Disk: NVMe SSD
- OS: Ubuntu 22.04 LTS
Results​
| Scenario | Events/sec | CPU | Memory | Latency (p99) |
|---|---|---|---|---|
| Idle monitoring | 0 | <1% | 5 MB | N/A |
| Light load (10 evt/s) | 10 | 2% | 10 MB | 2ms |
| Moderate (100 evt/s) | 100 | 8% | 25 MB | 5ms |
| Heavy (1000 evt/s) | 1000 | 25% | 60 MB | 15ms |
| With checksums (100 evt/s) | 100 | 15% | 30 MB | 25ms |
Support and Escalation​
Log Collection​
# Collect recent logs
journalctl -u file-monitor --since "1 hour ago" > file-monitor.log
# Collect metrics
curl http://localhost:9090/metrics > metrics.txt
# System information
uname -a > system-info.txt
cat /proc/sys/fs/inotify/* > inotify-limits.txt
Bug Reports​
Include:
- Version:
file-monitor --version - Configuration used
- Platform details
- Reproduction steps
- Logs and metrics
- Resource usage (top/htop output)
Changelog​
- 2025-01-06: Initial production deployment guide
- Next review: After 3 months production usage