inotify Performance Issue: Monitor Hangs on Large Directory Trees

Issue Summary

Problem: File monitor hangs for 3+ minutes when watching /home/hal/v4 recursively

Root Cause: inotify watch setup is synchronous and blocks for large directory trees

Impact: Monitor appears frozen, no events detected until setup completes

Symptoms

  1. Monitor starts and logs "File monitor created"
  2. No "File monitoring started" message appears
  3. Process appears to hang for several minutes (3+ minutes on /home/hal/v4)
  4. No events are detected during setup phase
  5. After 3+ minutes, monitor suddenly starts working

Root Cause Analysis

inotify Fundamental Limitation

inotify is NOT recursive - Linux kernel does NOT support recursive directory watches

From Linux man pages (inotify(7)):

"Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created."

This means:

  • 1 directory = 1 inotify watch
  • Recursive watch of /home/hal/v4 with 1,857 subdirectories = 1,857 separate inotify watches
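Because every directory needs its own watch, the number of watches a recursive monitor will create can be estimated before starting it. The following is a minimal sketch using only the Rust standard library; `count_directories` is a hypothetical helper written for illustration (it is not part of the notify crate), and it mirrors `find <path> -type d | wc -l` by counting the root as well.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count `root` plus every directory beneath it, without following symlinks.
/// Each counted directory would need its own inotify watch.
pub fn count_directories(root: &Path) -> io::Result<usize> {
    let mut count = 1; // the root itself needs a watch
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // DirEntry::file_type does not follow symlinks, so a symlinked
        // directory is counted as a link, not descended into.
        if entry.file_type()?.is_dir() {
            count += count_directories(&entry.path())?;
        }
    }
    Ok(count)
}
```

Calling `count_directories(Path::new("/home/hal/v4"))` should report the same figure as the `find` command, provided the process can read every subdirectory; an unreadable directory makes this sketch return an error rather than a partial count.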

Watch Setup is Synchronous and Slow

The notify crate's watcher.watch() call:

  1. Walks the entire directory tree using walkdir or similar
  2. Creates one inotify watch per directory via inotify_add_watch() syscall
  3. Blocks until all watches are created
  4. Takes roughly 0.1 seconds per directory in this environment (about 180 seconds for 1,857 directories; the figure varies with system load, I/O speed, and filesystem type)

Timeline:

13:59:43.416755Z - Monitor created (FileMonitor::new())
[3 minutes of silence - watcher.watch() blocking]
14:02:43.700024Z - File monitoring started (after 1,857 watches created)
14:02:43.700758Z - First event detected (734μs after start)
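The delay in the timeline can be computed directly from the two log timestamps. The sketch below is illustrative only: `parse_time_of_day` and `elapsed_between` are hypothetical helpers, and they assume both timestamps fall on the same day and carry exactly six fractional digits, as the log lines above do.

```rust
use std::time::Duration;

/// Parse a `HH:MM:SS.ffffff` time of day (optional trailing `Z`) into a
/// Duration since midnight. Returns None if the text is not in that shape.
pub fn parse_time_of_day(text: &str) -> Option<Duration> {
    let (hms, frac) = text.trim_end_matches('Z').split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || frac.len() != 6 {
        return None;
    }
    let micros: u64 = frac.parse().ok()?;
    Some(Duration::from_secs(h * 3600 + m * 60 + s) + Duration::from_micros(micros))
}

/// Time between two log timestamps taken on the same day.
/// Returns None if either fails to parse or `end` is earlier than `start`.
pub fn elapsed_between(start: &str, end: &str) -> Option<Duration> {
    parse_time_of_day(end)?.checked_sub(parse_time_of_day(start)?)
}
```

For the timeline above, `elapsed_between("13:59:43.416755Z", "14:02:43.700024Z")` gives 180.283269 seconds, which divided across 1,857 directories is about 97 ms per directory.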

System Limits

| Limit | Default | Current | Required |
|---|---|---|---|
| max_user_watches | 8,192 | Unknown | 1,857+ |
| max_user_instances | 128 | Unknown | 1 |
| Memory per watch (64-bit) | 1 KB | - | ~1.8 MB total |

Check current limits:

cat /proc/sys/fs/inotify/max_user_watches    # Default: 8192
cat /proc/sys/fs/inotify/max_user_instances # Default: 128
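The same check can be made from inside the monitor before any watches are created. This is a sketch under stated assumptions: `read_limit` and `headroom` are hypothetical helper names, and the headroom figure ignores watches already held by other processes of the same user, so it is an optimistic upper bound.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read a single integer from a procfs-style file such as
/// /proc/sys/fs/inotify/max_user_watches.
pub fn read_limit(path: &Path) -> io::Result<u64> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Watches left over after adding `needed` watches, or None if the limit
/// would be exceeded. Does not account for watches already in use.
pub fn headroom(limit: u64, needed: u64) -> Option<u64> {
    limit.checked_sub(needed)
}
```

With a limit of 8,192 and 1,857 directories, `headroom` returns `Some(6335)`, so the limit itself is not what causes the hang on /home/hal/v4.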

Memory usage:

  • 1,857 watches × 1 KB = 1.8 MB kernel memory (non-swappable)
  • IntelliJ IDEA recommends 1,048,576 watches (up to ~1 GB of kernel memory if every watch is in use)
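The arithmetic behind these figures is a single multiplication. The sketch below takes the 1 KB per watch figure quoted above as given; `BYTES_PER_WATCH`, `watch_memory_bytes`, and `as_mib` are illustrative names, and the result is an upper bound that only applies once the watches actually exist.

```rust
/// Approximate kernel memory cost per inotify watch on a 64-bit system,
/// using the 1 KB figure quoted in this report.
pub const BYTES_PER_WATCH: u64 = 1024;

/// Upper bound on kernel memory used when `watches` watches are in use.
pub fn watch_memory_bytes(watches: u64) -> u64 {
    watches * BYTES_PER_WATCH
}

/// Render a byte count as MiB with one decimal place.
pub fn as_mib(bytes: u64) -> String {
    format!("{:.1} MiB", bytes as f64 / (1024.0 * 1024.0))
}
```

`as_mib(watch_memory_bytes(1857))` yields "1.8 MiB", while the full 1,048,576 watches come to "1024.0 MiB".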

Why Tests Work But Production Hangs

Tests (/tmp/file-monitor-demo):

  • Small directory trees (3-5 directories)
  • Watch setup completes in <10ms
  • No noticeable delay

Production (/home/hal/v4):

  • Large directory tree (1,857 directories)
  • Watch setup takes 3+ minutes
  • Appears completely frozen

Reproduction

Confirmed Behavior

# Small directory: Instant (71ms)
./monitor /workspace/PROJECTS/t2/docs --recursive --checksums --format json
# Output:
# 2025-10-06T14:20:58.967648Z INFO new: File monitor created
# 2025-10-06T14:20:59.038868Z INFO start: File monitoring started
# (71ms delay - acceptable)

# Large directory: Hangs (3+ minutes)
./monitor /home/hal/v4 --recursive --checksums --format json
# Output:
# 2025-10-06T13:59:43.416755Z INFO new: File monitor created
# [3 minutes of silence...]
# 2025-10-06T14:02:43.700024Z INFO start: File monitoring started
# (180 second delay - unacceptable)

Directory Count

$ find /home/hal/v4 -type d | wc -l
1857

Estimate: 1,857 directories × ~0.1 second/directory = ~185 seconds (~3 minutes)

Solutions

Solution 1: Increase inotify Limits (Immediate Fix)

Pros: Simple, immediate

Cons: Doesn't fix the 3-minute hang, only prevents hitting limits

# Temporary (until reboot)
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

# Permanent
sudo tee /etc/sysctl.d/99-inotify.conf <<EOF
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=512
EOF

sudo sysctl -p /etc/sysctl.d/99-inotify.conf

Recommended values:

  • Development machines: 524,288 watches (up to ~512 MB kernel memory if fully used)
  • Production servers: 1,048,576 watches (up to ~1 GB kernel memory if fully used)

Solution 2: Use PollWatcher Instead of inotify

Pros: No inotify limits, no kernel memory usage, works on all filesystems

Cons: Higher CPU usage, slightly higher latency, may miss rapid changes

The notify crate provides PollWatcher which uses periodic filesystem polling instead of inotify:

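use std::time::Duration;

use notify::{Config, PollWatcher, RecursiveMode, Watcher};

// tx is the sending half of the channel that receives events

// Create PollWatcher with 2-second poll interval
let mut watcher = PollWatcher::new(
    tx,
    Config::default().with_poll_interval(Duration::from_secs(2)),
)?;

// watch() takes &mut self, so the watcher must be declared mutable
watcher.watch(&path, RecursiveMode::Recursive)?;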

Tradeoffs:

  • ✅ No inotify limits
  • ✅ Works with NFS, FUSE, networked filesystems
  • ✅ Doesn't consume kernel memory
  • ✅ No inotify watch setup (PollWatcher still scans the tree once to build its initial snapshot)
  • ❌ Higher CPU usage (periodic stat() calls)
  • ❌ Higher latency (2-5 second detection delay)
  • ❌ May miss very rapid file changes

When to use:

  • Very large directory trees (10,000+ directories)
  • Networked filesystems (NFS, CIFS, FUSE)
  • Systems with tight kernel memory constraints
  • When 2-5 second latency is acceptable
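To make the polling tradeoffs concrete, the sketch below shows the general technique of snapshot-and-compare polling using only the standard library. It is not the notify crate's PollWatcher implementation; `Snapshot`, `Change`, `snapshot`, and `diff` are hypothetical names, and only modification times are compared.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Modification time of every file and directory found under a root.
pub type Snapshot = HashMap<PathBuf, SystemTime>;

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Created(PathBuf),
    Modified(PathBuf),
    Removed(PathBuf),
}

/// Walk `root` once and record each entry's mtime (symlinks are not followed).
pub fn snapshot(root: &Path) -> io::Result<Snapshot> {
    let mut seen = Snapshot::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let meta = entry.metadata()?; // does not traverse symlinks
            if meta.is_dir() {
                pending.push(entry.path());
            }
            seen.insert(entry.path(), meta.modified()?);
        }
    }
    Ok(seen)
}

/// Compare two snapshots taken one poll interval apart.
pub fn diff(old: &Snapshot, new: &Snapshot) -> Vec<Change> {
    let mut changes = Vec::new();
    for (path, mtime) in new {
        match old.get(path) {
            None => changes.push(Change::Created(path.clone())),
            Some(prev) if prev != mtime => changes.push(Change::Modified(path.clone())),
            Some(_) => {}
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            changes.push(Change::Removed(path.clone()));
        }
    }
    changes
}
```

A file that is created and deleted between two calls to `snapshot` never appears in either map, which is why polling can miss rapid changes, and every poll costs one metadata lookup per entry, which is where the extra CPU usage comes from.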

Solution 3: Make Watch Setup Async/Non-Blocking

Pros: Best UX, no apparent hang

Cons: Requires code changes

Modify FileMonitor::start() to be async and spawn watch setup in background:

// Current (blocking):
pub fn start(&mut self) -> Result<()> {
    watcher.watch(&path, RecursiveMode::Recursive)?; // BLOCKS for minutes
    info!("File monitoring started");
    Ok(())
}

// Proposed (non-blocking).
// Assumes self.watcher is an Arc<Mutex<RecommendedWatcher>>, since notify
// watchers are not Clone.
pub async fn start(&mut self) -> Result<()> {
    let watcher = Arc::clone(&self.watcher);
    let path = self.config.watch_path.clone();

    // watch() is a blocking call, so run it on tokio's blocking thread pool
    // rather than inside an async task
    tokio::task::spawn_blocking(move || {
        info!("Setting up recursive watches (this may take several minutes)...");
        let start = Instant::now();

        // `?` cannot be used here because the closure returns (), so the
        // error is logged instead
        let result = watcher.lock().unwrap().watch(&path, RecursiveMode::Recursive);
        match result {
            Ok(()) => info!(
                "File monitoring started (setup took {:?} for {} directories)",
                start.elapsed(),
                count_directories(&path)
            ),
            Err(e) => error!("Watch setup failed: {}", e),
        }
    });

    // Return immediately, monitor starts receiving events as watches are added
    Ok(())
}

Benefits:

  • Monitor appears to start immediately
  • Events from early directories detected while later ones still being set up
  • Progress logging shows what's happening
  • No perceived hang
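The background-setup idea does not depend on tokio. The sketch below uses a plain `std::thread` and an `mpsc` channel so that it stays self-contained; `SetupProgress` and `start_in_background` are hypothetical names, and the `add_watch` closure stands in for the real per-directory call (for example a closure around `watcher.watch(path, RecursiveMode::NonRecursive)`).

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Receiver};
use std::thread;

#[derive(Debug, PartialEq, Eq)]
pub enum SetupProgress {
    /// `done` of `total` directories have a watch.
    Watching { done: usize, total: usize },
    /// Setup stopped early because adding a watch failed.
    Failed { path: PathBuf, reason: String },
    /// Every directory is watched.
    Finished { total: usize },
}

/// Add one watch per directory on a background thread and report progress
/// over a channel. The caller gets the receiver back immediately.
pub fn start_in_background<F>(dirs: Vec<PathBuf>, mut add_watch: F) -> Receiver<SetupProgress>
where
    F: FnMut(&Path) -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let total = dirs.len();
        for (index, dir) in dirs.iter().enumerate() {
            if let Err(reason) = add_watch(dir.as_path()) {
                // A send error only means the receiver was dropped; ignore it.
                let _ = tx.send(SetupProgress::Failed { path: dir.clone(), reason });
                return;
            }
            let done = index + 1;
            // Report every 100 directories to keep the channel quiet.
            if done % 100 == 0 {
                let _ = tx.send(SetupProgress::Watching { done, total });
            }
        }
        let _ = tx.send(SetupProgress::Finished { total });
    });
    rx
}
```

The caller can log each `SetupProgress` message as it arrives, or ignore the receiver entirely; either way startup is no longer blocked on the slowest directory. The receiver's iterator ends once the background thread finishes and drops the sender.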

Solution 4: Watch Smaller Directory Trees

Pros: Simplest, most reliable

Cons: May miss events outside watched paths

Instead of watching /home/hal/v4 (1,857 dirs), watch specific subdirectories:

# Instead of:
./monitor /home/hal/v4 --recursive

# Watch specific projects:
./monitor /home/hal/v4/PROJECTS/t2 --recursive &
./monitor /home/hal/v4/.claude --recursive &

# Or watch non-recursively (single directory):
./monitor /home/hal/v4 --no-recursive

Directory count analysis:

$ find /home/hal/v4/PROJECTS/t2 -type d | wc -l
523 # Much smaller! (~50 second setup at ~0.1 s per directory)

$ find /home/hal/v4/.claude -type d | wc -l
5 # Instant setup

Solution 5: Add Progress Logging

Pros: User knows what's happening, no perceived "hang"

Cons: Still slow, just more visible

Add progress logging to watcher.watch() call:

use std::time::Instant;
use walkdir::WalkDir;

pub fn start(&mut self) -> Result<()> {
    info!("Setting up recursive watches for {}", self.config.watch_path.display());

    // Walk the tree once and collect the directories to watch
    let dirs: Vec<_> = WalkDir::new(&self.config.watch_path)
        .follow_links(false)
        .into_iter()
        .filter_map(|e| e.ok())
        .filter(|e| e.file_type().is_dir())
        .collect();
    let dir_count = dirs.len();

    info!("Found {} directories to watch (this may take {}s)",
        dir_count,
        dir_count / 10); // Rough estimate: 10 dirs/sec

    // Now add watches with progress updates.
    // Note: NonRecursive watches do not pick up directories created later.
    let start = Instant::now();
    let mut count = 0;

    for entry in &dirs {
        self.watcher.watch(entry.path(), RecursiveMode::NonRecursive)?;
        count += 1;

        if count % 100 == 0 {
            info!("Watching {}/{} directories ({:.1}%)",
                count, dir_count,
                (count as f64 / dir_count as f64) * 100.0);
        }
    }

    info!("File monitoring started in {:?} ({} directories)",
        start.elapsed(), count);
    Ok(())
}

Output:

INFO Setting up recursive watches for /home/hal/v4
INFO Found 1857 directories to watch (this may take 185s)
INFO Watching 100/1857 directories (5.4%)
INFO Watching 200/1857 directories (10.8%)
...
INFO File monitoring started in 183.2s (1857 directories)

For immediate use: Combination of Solutions 1, 4, and 5

  1. Increase inotify limits (safety buffer)

     sudo sysctl fs.inotify.max_user_watches=524288

  2. Watch smaller directory (reduce setup time)

     # Watch only PROJECTS directory instead of entire /home/hal/v4
     ./monitor /home/hal/v4/PROJECTS --recursive --checksums --format json

  3. Add progress logging (improve UX)

     • Implement Solution 5 to show progress during setup
     • User knows monitor is working, not frozen

For production deployment: Solution 3 (Async Watch Setup)

  • Best user experience (no perceived delay)
  • Handles any directory size
  • Production-grade solution

Implementation Plan

Phase 1: Immediate Fixes (Today)

  1. ✅ Document issue in inotify-performance-issue.md
  2. ⚠️ Increase inotify limits on Docker host
  3. ⚠️ Update monitor command to watch /home/hal/v4/PROJECTS instead of /home/hal/v4
  4. ⚠️ Update README.md with new command

Phase 2: Short-term Improvements (This Week)

  1. ⚠️ Add progress logging during watch setup (Solution 5)
  2. ⚠️ Add --poll flag to use PollWatcher instead of inotify
  3. ⚠️ Add directory count check before starting watch
  4. ⚠️ Warn user if directory count exceeds threshold
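Items 3 and 4 amount to a small pre-flight decision made before any watch is created. The sketch below is one way to express it; `Preflight` and `preflight` are hypothetical names, the per-directory cost is an input (about 0.1 s was observed in this report), and the warning threshold is an assumption to be tuned.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum Preflight {
    /// Tree is small enough to watch without comment.
    Ok,
    /// Watching will work but startup is expected to be slow.
    SlowStartup { estimated: Duration },
    /// The tree needs more watches than max_user_watches allows.
    ExceedsLimit { needed: u64, limit: u64 },
}

/// Decide whether to warn before starting a recursive watch.
/// `per_dir` is the assumed setup cost per directory and `warn_above` is the
/// directory count above which a slow-startup warning is issued.
pub fn preflight(dir_count: u64, limit: u64, per_dir: Duration, warn_above: u64) -> Preflight {
    if dir_count > limit {
        return Preflight::ExceedsLimit { needed: dir_count, limit };
    }
    if dir_count > warn_above {
        // Duration multiplication takes a u32, so clamp the count and
        // saturate instead of panicking on very large trees.
        let count = dir_count.min(u32::MAX as u64) as u32;
        return Preflight::SlowStartup { estimated: per_dir.saturating_mul(count) };
    }
    Preflight::Ok
}
```

For /home/hal/v4, `preflight(1857, 8192, Duration::from_millis(100), 500)` returns `SlowStartup` with an estimate of 185.7 seconds, which the monitor could log as a warning or use to suggest polling instead.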

Phase 3: Long-term Solution (Next Sprint)

  1. ⚠️ Make FileMonitor::start() async
  2. ⚠️ Spawn watch setup in background task
  3. ⚠️ Emit events as watches are added (incremental startup)
  4. ⚠️ Add startup_progress channel for monitoring watch setup

Testing

Verify inotify Limits

# Check current limits
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_user_instances

# Check current usage (number of inotify instances per process)
find /proc/*/fd -lname anon_inode:inotify 2>/dev/null | \
cut -d/ -f3 | xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' | \
uniq -c | sort -n -r

Test Different Directory Sizes

# Tiny (instant)
time ./monitor /tmp/test-small --recursive --format json &

# Small (~1 second)
time ./monitor /home/hal/v4/.claude --recursive --format json &

# Medium (~50 seconds for 523 directories)
time ./monitor /home/hal/v4/PROJECTS/t2 --recursive --format json &

# Large (~3 minutes)
time ./monitor /home/hal/v4 --recursive --format json &

Measure Watch Setup Time

# With timing
./monitor /path/to/watch --recursive --format json 2>&1 | \
  grep -E "created|started" | \
  awk '{print $1, $NF}'   # timestamp and last word ("created" or "started")

# Expected output:
# 2025-10-06T14:15:49.370913Z created
# 2025-10-06T14:18:52.123456Z started
# (3 minute delay)

References

Internal Documentation

  • Test Success: docs/status-reports/file-monitor-test-success.md
  • Database Integration: docs/file-monitor/database-integration.md
  • Timing Analysis: docs/file-monitor/TIMING-analysis.md
  • README: .coditect/logs/README.md

Conclusion

The file monitor works correctly but suffers from a 3-minute startup delay when watching large directory trees due to inotify's synchronous, non-recursive watch setup.

Immediate fix: Watch smaller directory or increase inotify limits

Long-term fix: Make watch setup async and non-blocking

The monitor is production-ready for small-to-medium directory trees (<500 directories), but needs async improvements for large-scale deployment.


Status: ❌ BLOCKING ISSUE
Priority: HIGH
Assigned: File Monitor Team
Created: 2025-10-06
Last Updated: 2025-10-06