inotify Performance Issue: Monitor Hangs on Large Directory Trees
Issue Summary
Problem: File monitor hangs for 3+ minutes when watching /home/hal/v4 recursively
Root Cause: inotify watch setup is synchronous and blocks for large directory trees
Impact: Monitor appears frozen, no events detected until setup completes
Symptoms
- Monitor starts and logs "File monitor created"
- No "File monitoring started" message appears
- Process hangs indefinitely (or for several minutes)
- No events are detected during setup phase
- After 3+ minutes, monitor suddenly starts working
Root Cause Analysis
inotify Fundamental Limitation
inotify is NOT recursive - Linux kernel does NOT support recursive directory watches
From Linux man pages (inotify(7)):
"Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created."
This means:
- 1 directory = 1 inotify watch
- Recursive watch of
/home/hal/v4with 1,857 subdirectories = 1,857 separate inotify watches
Watch Setup is Synchronous and Slow
The notify crate's watcher.watch() call:
- Walks the entire directory tree using
walkdiror similar - Creates one inotify watch per directory via
inotify_add_watch()syscall - Blocks until all watches are created
- Takes ~0.1 seconds per directory (varies by system load, I/O speed, filesystem type)
Timeline:
13:59:43.416755Z - Monitor created (FileMonitor::new())
[3 minutes of silence - watcher.watch() blocking]
14:02:43.700024Z - File monitoring started (after 1,857 watches created)
14:02:43.700758Z - First event detected (734μs after start)
System Limits
| Limit | Default | Current | Required |
|---|---|---|---|
max_user_watches | 8,192 | Unknown | 1,857+ |
max_user_instances | 128 | Unknown | 1 |
| Memory per watch (64-bit) | 1 KB | - | ~1.8 MB total |
Check current limits:
cat /proc/sys/fs/inotify/max_user_watches # Default: 8192
cat /proc/sys/fs/inotify/max_user_instances # Default: 128
Memory usage:
- 1,857 watches × 1 KB = 1.8 MB kernel memory (non-swappable)
- IntelliJ IDEA recommends 1,048,576 watches (1GB kernel memory)
Why Tests Work But Production Hangs
Tests (/tmp/file-monitor-demo):
- Small directory trees (3-5 directories)
- Watch setup completes in <10ms
- No noticeable delay
Production (/home/hal/v4):
- Large directory tree (1,857 directories)
- Watch setup takes 3+ minutes
- Appears completely frozen
Reproduction
Confirmed Behavior
# Small directory: Instant (71ms)
./monitor /workspace/PROJECTS/t2/docs --recursive --checksums --format json
# Output:
# 2025-10-06T14:20:58.967648Z INFO new: File monitor created
# 2025-10-06T14:20:59.038868Z INFO start: File monitoring started
# (71ms delay - acceptable)
# Large directory: Hangs (3+ minutes)
./monitor /home/hal/v4 --recursive --checksums --format json
# Output:
# 2025-10-06T13:59:43.416755Z INFO new: File monitor created
# [3 minutes of silence...]
# 2025-10-06T14:02:43.700024Z INFO start: File monitoring started
# (180 second delay - unacceptable)
Directory Count
$ find /home/hal/v4 -type d | wc -l
1857
Estimate: 1,857 directories × ~0.1 second/directory = ~185 seconds (~3 minutes)
Solutions
Solution 1: Increase inotify Limits (Immediate Fix)
Pros: Simple, immediate Cons: Doesn't fix the 3-minute hang, only prevents hitting limits
# Temporary (until reboot)
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
# Permanent
sudo tee /etc/sysctl.d/99-inotify.conf <<EOF
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=512
EOF
sudo sysctl -p /etc/sysctl.d/99-inotify.conf
Recommended values:
- Development machines: 524,288 watches (~500 MB kernel memory)
- Production servers: 1,048,576 watches (~1 GB kernel memory)
Solution 2: Use PollWatcher Instead of inotify
Pros: No inotify limits, no kernel memory usage, works on all filesystems Cons: Higher CPU usage, slightly higher latency, may miss rapid changes
The notify crate provides PollWatcher which uses periodic filesystem polling instead of inotify:
use notify::{PollWatcher, RecursiveMode, Watcher};
// Create PollWatcher with 2-second poll interval
let watcher = PollWatcher::new(
tx,
notify::Config::default().with_poll_interval(Duration::from_secs(2))
)?;
watcher.watch(&path, RecursiveMode::Recursive)?;
Tradeoffs:
- ✅ No inotify limits
- ✅ Works with NFS, FUSE, networked filesystems
- ✅ Doesn't consume kernel memory
- ✅ No setup delay (starts instantly)
- ❌ Higher CPU usage (periodic
stat()calls) - ❌ Higher latency (2-5 second detection delay)
- ❌ May miss very rapid file changes
When to use:
- Very large directory trees (10,000+ directories)
- Networked filesystems (NFS, CIFS, FUSE)
- Systems with tight kernel memory constraints
- When 2-5 second latency is acceptable
Solution 3: Make Watch Setup Async/Non-Blocking
Pros: Best UX, no apparent hang Cons: Requires code changes
Modify FileMonitor::start() to be async and spawn watch setup in background:
// Current (blocking):
pub fn start(&mut self) -> Result<()> {
watcher.watch(&path, RecursiveMode::Recursive)?; // BLOCKS for minutes
info!("File monitoring started");
Ok(())
}
// Proposed (non-blocking):
pub async fn start(&mut self) -> Result<()> {
let watcher = self.watcher.clone();
let path = self.config.watch_path.clone();
// Spawn watch setup in background task
tokio::spawn(async move {
info!("Setting up recursive watches (this may take several minutes)...");
let start = Instant::now();
watcher.watch(&path, RecursiveMode::Recursive)?;
let duration = start.elapsed();
info!(
"File monitoring started (setup took {:?} for {} directories)",
duration,
count_directories(&path)
);
});
// Return immediately, monitor starts receiving events as watches are added
Ok(())
}
Benefits:
- Monitor appears to start immediately
- Events from early directories detected while later ones still being set up
- Progress logging shows what's happening
- No perceived hang
Solution 4: Watch Smaller Directory Trees
Pros: Simplest, most reliable Cons: May miss events outside watched paths
Instead of watching /home/hal/v4 (1,857 dirs), watch specific subdirectories:
# Instead of:
./monitor /home/hal/v4 --recursive
# Watch specific projects:
./monitor /home/hal/v4/PROJECTS/t2 --recursive &
./monitor /home/hal/v4/.claude --recursive &
# Or watch non-recursively (single directory):
./monitor /home/hal/v4 --no-recursive
Directory count analysis:
$ find /home/hal/v4/PROJECTS/t2 -type d | wc -l
523 # Much smaller! (~30 second setup)
$ find /home/hal/v4/.claude -type d | wc -l
5 # Instant setup
Solution 5: Add Progress Logging
Pros: User knows what's happening, no perceived "hang" Cons: Still slow, just more visible
Add progress logging to watcher.watch() call:
use walkdir::WalkDir;
pub fn start(&mut self) -> Result<()> {
info!("Setting up recursive watches for {}", self.config.watch_path.display());
// Count directories first
let dir_count = WalkDir::new(&self.config.watch_path)
.follow_links(false)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_dir())
.count();
info!("Found {} directories to watch (this may take {}s)",
dir_count,
dir_count / 10); // Rough estimate: 10 dirs/sec
// Now add watches with progress updates
let start = Instant::now();
let mut count = 0;
for entry in WalkDir::new(&self.config.watch_path)
.follow_links(false)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_dir())
{
watcher.watch(entry.path(), RecursiveMode::NonRecursive)?;
count += 1;
if count % 100 == 0 {
info!("Watching {}/{} directories ({:.1}%)",
count, dir_count,
(count as f64 / dir_count as f64) * 100.0);
}
}
info!("File monitoring started in {:?} ({} directories)",
start.elapsed(), count);
Ok(())
}
Output:
INFO Setting up recursive watches for /home/hal/v4
INFO Found 1857 directories to watch (this may take 185s)
INFO Watching 100/1857 directories (5.4%)
INFO Watching 200/1857 directories (10.8%)
...
INFO File monitoring started in 183.2s (1857 directories)
Recommended Solution
For immediate use: Combination of Solutions 1, 4, and 5
-
Increase inotify limits (safety buffer)
sudo sysctl fs.inotify.max_user_watches=524288 -
Watch smaller directory (reduce setup time)
# Watch only PROJECTS directory instead of entire /home/hal/v4
./monitor /home/hal/v4/PROJECTS --recursive --checksums --format json -
Add progress logging (improve UX)
- Implement Solution 5 to show progress during setup
- User knows monitor is working, not frozen
For production deployment: Solution 3 (Async Watch Setup)
- Best user experience (no perceived delay)
- Handles any directory size
- Production-grade solution
Implementation Plan
Phase 1: Immediate Fixes (Today)
- ✅ Document issue in
inotify-performance-issue.md - ⚠️ Increase inotify limits on Docker host
- ⚠️ Update monitor command to watch
/home/hal/v4/PROJECTSinstead of/home/hal/v4 - ⚠️ Update README.md with new command
Phase 2: Short-term Improvements (This Week)
- ⚠️ Add progress logging during watch setup (Solution 5)
- ⚠️ Add
--pollflag to use PollWatcher instead of inotify - ⚠️ Add directory count check before starting watch
- ⚠️ Warn user if directory count exceeds threshold
Phase 3: Long-term Solution (Next Sprint)
- ⚠️ Make
FileMonitor::start()async - ⚠️ Spawn watch setup in background task
- ⚠️ Emit events as watches are added (incremental startup)
- ⚠️ Add
startup_progresschannel for monitoring watch setup
Testing
Verify inotify Limits
# Check current limits
\cat /proc/sys/fs/inotify/max_user_watches
\cat /proc/sys/fs/inotify/max_user_instances
# Check current usage
find /proc/*/fd -lname anon_inode:inotify 2>/dev/null | \
cut -d/ -f3 | xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' | \
uniq -c | sort -n -r
Test Different Directory Sizes
# Tiny (instant)
time ./monitor /tmp/test-small --recursive --format json &
# Small (~1 second)
time ./monitor /home/hal/v4/.claude --recursive --format json &
# Medium (~30 seconds)
time ./monitor /home/hal/v4/PROJECTS/t2 --recursive --format json &
# Large (~3 minutes)
time ./monitor /home/hal/v4 --recursive --format json &
Measure Watch Setup Time
# With timing
./monitor /path/to/watch --recursive --format json 2>&1 | \
grep -E "created|started" | \
awk '{print $1, $2, $4}'
# Expected output:
# 2025-10-06T14:15:49.370913Z created
# 2025-10-06T14:18:52.123456Z started
# (3 minute delay)
References
External Resources
- Linux inotify(7) man page
- Rust notify crate docs
- GitHub issue: notify crate slow on large trees
- Watchexec inotify limits guide
- IntelliJ inotify watches limit
Internal Documentation
- Test Success:
docs/status-reports/file-monitor-test-success.md - Database Integration:
docs/file-monitor/database-integration.md - Timing Analysis:
docs/file-monitor/TIMING-analysis.md - README:
.coditect/logs/README.md
Conclusion
The file monitor works correctly but suffers from a 3-minute startup delay when watching large directory trees due to inotify's synchronous, non-recursive watch setup.
Immediate fix: Watch smaller directory or increase inotify limits Long-term fix: Make watch setup async and non-blocking
The monitor is production-ready for small-to-medium directory trees (<500 directories), but needs async improvements for large-scale deployment.
Status: ❌ BLOCKING ISSUE Priority: HIGH Assigned: File Monitor Team Created: 2025-10-06 Last Updated: 2025-10-06