ADR-002: Redis Lua Scripts for Atomic Operations
Status: Accepted
Date: 2025-11-30
Deciders: Architecture Team, Backend Team
Tags: redis, concurrency, atomicity, seat-management
Context
Floating/concurrent licenses require atomic seat counting to prevent race conditions. When multiple developers attempt to acquire the last available seat simultaneously, the system must guarantee that exactly one succeeds.
Race Condition Scenario
Without Atomic Operations:
Initial state: 4/5 seats used (1 available)
Timeline:
T0: Developer A checks seats_used (reads: 4)
T1: Developer B checks seats_used (reads: 4)
T2: Developer A sees 1 seat available, attempts to acquire
T3: Developer B sees 1 seat available, attempts to acquire
T4: Developer A writes seats_used = 5
T5: Developer B writes seats_used = 5 ❌ OVERBOOKING
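Result: the counter reads 5/5, but six developers now hold seats (6/5, 120% utilization). Developer B's write silently overwrote Developer A's (a lost update), violating the license limit.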
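The timeline above can be reproduced in a few lines of Python. This is an illustrative sketch, not production code: a dict stands in for the shared `seats_used` counter, and a `threading.Barrier` forces both reads to finish before either write, which is exactly the T0-T5 interleaving. The atomic variant performs the check and the write inside one critical section, the property the Lua scripts provide on the Redis server.

```python
import threading

SEATS_TOTAL = 5


def naive_acquire(state, holders, name, barrier):
    """Read-check-write with no atomicity (the broken pattern)."""
    seats_used = state["seats_used"]          # T0/T1: both threads read 4
    barrier.wait()                            # both reads finish before any write
    if seats_used < SEATS_TOTAL:              # T2/T3: both see 1 free seat
        holders.append(name)                  # each developer believes it holds a seat
        state["seats_used"] = seats_used + 1  # T4/T5: both write 5 (lost update)


def atomic_acquire(state, holders, name, lock):
    """Check and write inside one critical section (what a Lua script guarantees)."""
    with lock:
        if state["seats_used"] < SEATS_TOTAL:
            holders.append(name)
            state["seats_used"] += 1


def run(target, sync_primitive):
    """Start developers A and B against a license with 4/5 seats used."""
    state = {"seats_used": 4}
    holders = []
    threads = [
        threading.Thread(target=target, args=(state, holders, name, sync_primitive))
        for name in ("A", "B")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["seats_used"], holders


if __name__ == "__main__":
    print(run(naive_acquire, threading.Barrier(2)))  # counter says 5, two new holders
    print(run(atomic_acquire, threading.Lock()))     # counter says 5, one new holder
```

In the naive run both developers end up holding a seat while the counter still reports 5, which is the overbooking the ADR describes.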
Requirements
Functional:
- Atomic Check-and-Set - Seat availability check and acquisition must be a single operation
- No Overbooking - Never exceed license seat limit under any concurrency scenario
- No Lost Updates - Concurrent operations must serialize correctly
- Fast Execution - Sub-millisecond latency for seat operations
Non-Functional:
- High Throughput - Support 1000+ concurrent seat checks/second
- Simple Implementation - Minimize code complexity and maintenance burden
- Testability - Easy to unit test and verify correctness
- No External Dependencies - Avoid distributed lock managers or consensus systems
Technology Constraints
- Redis 6.x available via GCP Memorystore
- Multi-process API server (multiple Gunicorn workers per pod, multiple pods)
- Django ORM with PostgreSQL (row-level locking available but slow)
- Python 3.11+ (async/await support)
Decision
We will use Redis Lua scripts for atomic seat counting operations.
All seat management operations (check availability, acquire seat, release seat, update heartbeat) will execute as atomic Lua scripts on the Redis server, so concurrent requests are serialized and cannot interleave.
Implementation Architecture
API Request (acquire seat)
↓
Django View (licenses/views.py)
↓
Redis Client (redis-py)
↓
Lua Script Execution (Redis server-side)
├─ ATOMIC BLOCK START
├─ Check seats_used < seats_total
├─ Add session_id to SET
├─ Update session metadata
├─ ATOMIC BLOCK END
↓
Return result (success/failure + seat count)
Key Property: Redis runs each Lua script to completion before executing any other client's command, so the check and the write inside a script can never be interleaved with another request.
Core Lua Script: Seat Acquisition
```lua
-- acquire_seat.lua
-- KEYS[1] = license_key
-- ARGV[1] = session_id
-- ARGV[2] = seats_total (max allowed seats)
-- ARGV[3] = session_metadata (JSON: {user_email, hardware_id, project_root, ...})
local license_key = KEYS[1]
local session_id = ARGV[1]
local seats_total = tonumber(ARGV[2])
local session_metadata = ARGV[3]

-- Redis key names
local active_sessions_key = "license:" .. license_key .. ":active_sessions"
local session_metadata_key = "license:" .. license_key .. ":session_metadata"

-- Check if session already has a seat (idempotent)
if redis.call('SISMEMBER', active_sessions_key, session_id) == 1 then
  local current_count = redis.call('SCARD', active_sessions_key)
  return {1, 'ALREADY_ACTIVE', current_count}
end

-- Get current active session count
local seats_used = redis.call('SCARD', active_sessions_key)

-- Check seat availability
if seats_used >= seats_total then
  -- No seats available
  return {0, 'NO_SEATS_AVAILABLE', seats_used, seats_total}
end

-- Acquire seat (add to set)
redis.call('SADD', active_sessions_key, session_id)

-- Store session metadata with timestamp
local timestamp = redis.call('TIME')[1]  -- Unix timestamp in seconds
local metadata_with_timestamp = cjson.decode(session_metadata)
metadata_with_timestamp['acquired_at'] = tonumber(timestamp)
metadata_with_timestamp['last_heartbeat'] = tonumber(timestamp)
redis.call('HSET', session_metadata_key, session_id, cjson.encode(metadata_with_timestamp))

-- Return success
local new_count = redis.call('SCARD', active_sessions_key)
return {1, 'ACQUIRED', new_count, seats_total}
```
Return Format:
- `[1, 'ACQUIRED', current_count, max_seats]` - Success
- `[1, 'ALREADY_ACTIVE', current_count]` - Idempotent (session already holds a seat)
- `[0, 'NO_SEATS_AVAILABLE', current_count, max_seats]` - Failure
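Because the three replies differ in length (the `ALREADY_ACTIVE` reply has no `max_seats` element), callers benefit from normalizing them in one place. A minimal sketch, assuming redis-py with `decode_responses=False` (status strings arrive as bytes, counts as ints); `SeatResult` and `parse_acquire_reply` are hypothetical names, not existing code:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class SeatResult:
    granted: bool               # True for ACQUIRED and ALREADY_ACTIVE
    reason: str                 # ACQUIRED / ALREADY_ACTIVE / NO_SEATS_AVAILABLE
    seats_used: int
    seats_total: Optional[int]  # absent in the 3-element ALREADY_ACTIVE reply


def parse_acquire_reply(reply: Sequence[Union[int, bytes, str]]) -> SeatResult:
    """Convert a raw acquire_seat.lua reply into a SeatResult."""
    if len(reply) not in (3, 4):
        raise ValueError(f"unexpected reply length: {len(reply)}")
    reason = reply[1]
    if isinstance(reason, bytes):
        reason = reason.decode()  # bytes when decode_responses=False
    seats_total = int(reply[3]) if len(reply) == 4 else None
    return SeatResult(
        granted=reply[0] == 1,
        reason=str(reason),
        seats_used=int(reply[2]),
        seats_total=seats_total,
    )
```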
Seat Release Script
```lua
-- release_seat.lua
-- KEYS[1] = license_key
-- ARGV[1] = session_id
local license_key = KEYS[1]
local session_id = ARGV[1]
local active_sessions_key = "license:" .. license_key .. ":active_sessions"
local session_metadata_key = "license:" .. license_key .. ":session_metadata"

-- Check if session exists
if redis.call('SISMEMBER', active_sessions_key, session_id) == 0 then
  return {0, 'SESSION_NOT_FOUND'}
end

-- Remove session from active set
redis.call('SREM', active_sessions_key, session_id)

-- Remove session metadata
redis.call('HDEL', session_metadata_key, session_id)

-- Return success with updated count
local new_count = redis.call('SCARD', active_sessions_key)
return {1, 'RELEASED', new_count}
```
Heartbeat Update Script
```lua
-- update_heartbeat.lua
-- KEYS[1] = license_key
-- ARGV[1] = session_id
-- (No heartbeat ARGV: last_heartbeat and heartbeat_count are computed server-side)
local license_key = KEYS[1]
local session_id = ARGV[1]
local active_sessions_key = "license:" .. license_key .. ":active_sessions"
local session_metadata_key = "license:" .. license_key .. ":session_metadata"

-- Check if session is active
if redis.call('SISMEMBER', active_sessions_key, session_id) == 0 then
  return {0, 'SESSION_NOT_ACTIVE'}
end

-- Get existing metadata
local existing_metadata_json = redis.call('HGET', session_metadata_key, session_id)
if not existing_metadata_json then
  return {0, 'METADATA_NOT_FOUND'}
end
local metadata = cjson.decode(existing_metadata_json)

-- Update heartbeat timestamp and count
local timestamp = redis.call('TIME')[1]
metadata['last_heartbeat'] = tonumber(timestamp)
metadata['heartbeat_count'] = (metadata['heartbeat_count'] or 0) + 1

-- Save updated metadata
redis.call('HSET', session_metadata_key, session_id, cjson.encode(metadata))
return {1, 'HEARTBEAT_UPDATED', metadata['heartbeat_count']}
```
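For testing callers without a Redis server, it can help to keep an in-memory reference model that mirrors the three scripts' reply formats. A sketch under stated assumptions: `SeatModel` is a hypothetical helper, not part of the codebase; a `threading.Lock` plays the role of Redis running each script to completion; and an injectable `clock` replaces `redis.call('TIME')[1]`.

```python
import json
import threading
import time
from typing import Callable, Dict, Set


class SeatModel:
    """In-memory stand-in for acquire_seat.lua, release_seat.lua, update_heartbeat.lua."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._lock = threading.Lock()   # one script at a time, like the Redis main thread
        self._clock = clock
        self._active: Dict[str, Set[str]] = {}      # license -> active_sessions SET
        self._meta: Dict[str, Dict[str, str]] = {}  # license -> session_metadata HASH

    def acquire(self, license_key, session_id, seats_total, metadata_json):
        with self._lock:
            active = self._active.setdefault(license_key, set())
            if session_id in active:                  # idempotent path
                return [1, "ALREADY_ACTIVE", len(active)]
            if len(active) >= seats_total:
                return [0, "NO_SEATS_AVAILABLE", len(active), seats_total]
            active.add(session_id)
            now = int(self._clock())
            meta = json.loads(metadata_json)
            meta["acquired_at"] = now
            meta["last_heartbeat"] = now
            self._meta.setdefault(license_key, {})[session_id] = json.dumps(meta)
            return [1, "ACQUIRED", len(active), seats_total]

    def release(self, license_key, session_id):
        with self._lock:
            active = self._active.get(license_key, set())
            if session_id not in active:
                return [0, "SESSION_NOT_FOUND"]
            active.discard(session_id)
            self._meta.get(license_key, {}).pop(session_id, None)
            return [1, "RELEASED", len(active)]

    def heartbeat(self, license_key, session_id):
        with self._lock:
            if session_id not in self._active.get(license_key, set()):
                return [0, "SESSION_NOT_ACTIVE"]
            raw = self._meta.get(license_key, {}).get(session_id)
            if raw is None:
                return [0, "METADATA_NOT_FOUND"]
            meta = json.loads(raw)
            meta["last_heartbeat"] = int(self._clock())
            meta["heartbeat_count"] = meta.get("heartbeat_count", 0) + 1
            self._meta[license_key][session_id] = json.dumps(meta)
            return [1, "HEARTBEAT_UPDATED", meta["heartbeat_count"]]
```

With this model, a hundred threads contending for ten seats should see exactly ten `ACQUIRED` replies, the same invariant the Redis load test checks. The model only approximates the scripts; behavior against real Redis still needs the integration tests.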
Django Integration
Load Lua Scripts (Django startup):
```python
# backend/licenses/redis_scripts.py
import redis
from django.conf import settings

# Redis connection pool (thread-safe, shared within this worker process)
redis_pool = redis.ConnectionPool(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=0,
    decode_responses=False,  # Replies arrive as bytes; callers decode status strings
    max_connections=100,
)
redis_client = redis.Redis(connection_pool=redis_pool)

# Lua script sources
ACQUIRE_SEAT_SCRIPT = """
-- (Full Lua script from above)
"""
RELEASE_SEAT_SCRIPT = """
-- (Full Lua script from above)
"""
UPDATE_HEARTBEAT_SCRIPT = """
-- (Full Lua script from above)
"""

# Register scripts (SCRIPT LOAD returns the SHA1 hash used by EVALSHA).
# This runs at import time, so Redis must be reachable when Django starts.
acquire_seat_sha = redis_client.script_load(ACQUIRE_SEAT_SCRIPT)
release_seat_sha = redis_client.script_load(RELEASE_SEAT_SCRIPT)
update_heartbeat_sha = redis_client.script_load(UPDATE_HEARTBEAT_SCRIPT)
```
Use in Django Views:
```python
# backend/licenses/views.py
import json

import redis  # needed for redis.exceptions.RedisError below
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from .models import License
from .redis_scripts import acquire_seat_sha, redis_client
from .serializers import LicenseAcquireRequestSerializer


class AcquireLicenseView(APIView):
    """Acquire floating license seat (atomic via Redis Lua)."""

    def post(self, request):
        # Validate request
        serializer = LicenseAcquireRequestSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        license_key = serializer.validated_data['license_key']
        session_id = serializer.validated_data['session_id']
        user_email = serializer.validated_data['user_email']
        hardware_id = serializer.validated_data['hardware_id']
        project_root = serializer.validated_data['project_root']

        # Get license from PostgreSQL
        try:
            license_obj = License.objects.get(license_key=license_key, is_active=True)
        except License.DoesNotExist:
            return Response(
                {'error': 'License not found or inactive'},
                status=status.HTTP_404_NOT_FOUND,
            )

        # Check license expiration (PostgreSQL)
        if license_obj.is_expired():
            return Response(
                {'error': 'License expired'},
                status=status.HTTP_403_FORBIDDEN,
            )

        # Prepare session metadata
        session_metadata = {
            'user_email': user_email,
            'hardware_id': hardware_id,
            'project_root': project_root,
            'coditect_version': serializer.validated_data.get('coditect_version', '1.0.0'),
            'usage_type': serializer.validated_data.get('usage_type', 'builder'),
        }

        # Execute Lua script (ATOMIC operation)
        try:
            result = redis_client.evalsha(
                acquire_seat_sha,
                1,                             # Number of keys
                license_key,                   # KEYS[1]
                session_id,                    # ARGV[1]
                str(license_obj.seats_total),  # ARGV[2]
                json.dumps(session_metadata),  # ARGV[3]
            )
            success = result[0] == 1
            reason = result[1].decode() if isinstance(result[1], bytes) else result[1]
            current_count = result[2]

            if success:
                # Seat acquired or already active
                return Response({
                    'success': True,
                    'reason': reason,
                    'seats_used': current_count,
                    'seats_total': license_obj.seats_total,
                    'session_id': session_id,
                }, status=status.HTTP_200_OK)

            # No seats available
            return Response({
                'success': False,
                'reason': reason,
                'seats_used': current_count,
                'seats_total': license_obj.seats_total,
                'wait_queue_position': self.get_queue_position(license_key, session_id),
            }, status=status.HTTP_429_TOO_MANY_REQUESTS)

        except redis.exceptions.RedisError as e:
            # Redis error - fail gracefully
            return Response(
                {'error': f'License server error: {str(e)}'},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
            )

    def get_queue_position(self, license_key, session_id):
        """Calculate wait queue position (optional feature)."""
        # TODO: Implement wait queue with ZADD (sorted set by timestamp)
        return None
```
Lua Script Advantages
Why Lua Scripts (vs. alternatives):
| Feature | Redis Lua | Database Locks | Distributed Locks (Redlock) |
|---|---|---|---|
| Atomicity | ✅ Guaranteed (server-side) | ⚠️ Row-level only | ❌ Complex consensus |
| Latency | ✅ Sub-millisecond | ❌ 10-50ms (network + DB) | ❌ 10-100ms (quorum) |
| Throughput | ✅ 10,000+ ops/sec | ❌ 100-500 ops/sec | ❌ 100-1000 ops/sec |
| Race Conditions | ✅ Impossible | ⚠️ Possible (wrong isolation) | ⚠️ Possible (clock skew) |
| Complexity | ✅ Simple (1 script) | ⚠️ Medium (transaction mgmt) | ❌ High (quorum, fencing) |
| Network Round-Trips | ✅ 1 round-trip | ❌ 3-5 round-trips | ❌ 5+ round-trips (quorum) |
| Failure Modes | ✅ Fail-fast (script error) | ⚠️ Deadlocks possible | ❌ Split-brain risk |
Consequences
Positive
✅ Perfect Atomicity
- All seat operations are atomic: Redis runs each script to completion, so no other command can interleave between a script's check and its write
- Server-side execution eliminates network race windows
- Single round-trip per operation (no multi-step coordination)
✅ Exceptional Performance
- Sub-millisecond server-side script execution (<1ms p99)
- 10,000+ seat operations/second throughput (single Redis instance)
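- Scaling out with Redis Cluster (100K+ ops/sec) would require script changes: in cluster mode every key a script touches must be passed in KEYS and map to the same hash slot (e.g. `license:{key}:active_sessions`), whereas these scripts build key names from KEYS[1]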
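Because Redis executes commands, scripts included, one at a time on its main thread, single-instance throughput is bounded by how long each script occupies the server. A back-of-the-envelope bound, assuming an average per-script execution time $t_s$ (the 100 µs value is illustrative; a measured figure should replace it):

```math
C_{\max} \approx \frac{1}{t_s}, \qquad t_s = 100\,\mu\text{s} \;\Rightarrow\; C_{\max} \approx \frac{1}{10^{-4}\,\text{s}} = 10{,}000\ \text{ops/s}
```

```math
\rho = \frac{\lambda}{C_{\max}} = \frac{1000\ \text{ops/s}}{10{,}000\ \text{ops/s}} = 0.1
```

Under that assumption, the 1000+ seat checks/second requirement would use roughly 10% of one instance's serialized capacity, leaving headroom for heartbeats and other Redis traffic.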
✅ Simple Implementation
- ~60 lines of Lua code for all operations
- No distributed lock management
- No complex transaction coordination
- Easy to reason about correctness
✅ Idempotent Operations
- `acquire_seat` returns success if the session is already active (no duplicate seat counting)
- Safe to retry on network failures
- No zombie sessions from retry storms
✅ Rich Metadata Support
- Session metadata stored in Redis HASH (O(1) access)
- Supports JSON metadata (user_email, project_root, coditect_version)
- Easy to query for debugging and analytics
✅ No Lost Updates
- SET operations prevent duplicate session_ids
- HASH operations update metadata atomically
- Concurrent heartbeats serialize correctly
Negative
⚠️ Lua Script Complexity
- Lua syntax different from Python (learning curve)
- Debugging Lua scripts harder than Python code
- Script size is bounded by Redis's maximum argument size (512MB by default), not an issue for us
⚠️ Redis as Critical Dependency
- If Redis unavailable, license acquisition fails (no fallback)
- Mitigation: GCP Memorystore HA with automatic failover (99.9% uptime SLA)
- Mitigation: Offline grace period lets clients keep working through a temporary Redis outage
⚠️ Script Update Complexity
- Changing Lua logic requires redeploying scripts
- Need to handle SHA1 hash changes (script_load on startup)
- Risk of version skew if multiple API servers with different scripts
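Redis identifies a cached script by the SHA1 hex digest of its exact source text, so the SHA can be computed without contacting Redis, and identical script text on every API server always yields an identical SHA. A minimal sketch; `script_sha` is a hypothetical helper name:

```python
import hashlib


def script_sha(script_text: str) -> str:
    """Return the 40-character hex digest SCRIPT LOAD would report for this script."""
    # redis-py sends str arguments UTF-8 encoded by default
    return hashlib.sha1(script_text.encode("utf-8")).hexdigest()
```

A deployment check could compare `script_sha(ACQUIRE_SEAT_SCRIPT)` across pods, or ask the server with SCRIPT EXISTS, to detect version skew before it causes NOSCRIPT errors or mixed behavior. Any change to the text, even whitespace, produces a different SHA.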
⚠️ Limited Debugging
- Step-through debugging (the Redis Lua debugger, `redis-cli --ldb`) is meant for development servers only, not production
- Production diagnostics rely on redis.log() (writes to the Redis server log)
- Errors return as generic Redis error messages
Neutral
🔄 Lua Execution Overhead
- Lua call overhead ~0.01ms per script execution (the interpreter is created once when Redis starts, not per call)
- Negligible compared to network latency (~1ms)
- Not a bottleneck for our use case
🔄 Script Loading
- Must call SCRIPT LOAD on application startup
- SCRIPT LOAD returns a SHA1 hash; use EVALSHA for execution
- Fallback: use EVAL with the full script (sends the whole script body on every call, but works)
Alternatives Considered
Alternative 1: PostgreSQL Row-Level Locks (SELECT FOR UPDATE)
Implementation:
```python
from django.db import transaction

from .models import License, Session  # Session model assumed for this alternative


class NoSeatsAvailable(Exception):
    pass


@transaction.atomic
def acquire_seat(license_key, session_id):
    # Lock the license row until the transaction commits
    license_obj = License.objects.select_for_update().get(license_key=license_key)
    # Check seat availability (concurrent acquirers block on the row lock above)
    seats_used = Session.objects.filter(license=license_obj, status='active').count()
    if seats_used >= license_obj.seats_total:
        raise NoSeatsAvailable()
    # Create session
    Session.objects.create(license=license_obj, session_id=session_id, status='active')
```
Pros:
- ✅ Familiar Django ORM patterns
- ✅ ACID transactions guaranteed by PostgreSQL
- ✅ No additional dependencies (PostgreSQL already required)
Cons:
- ❌ High latency (10-50ms per operation due to database round-trips)
- ❌ Low throughput (100-500 ops/sec on single database)
- ❌ Database becomes bottleneck (license checks don't need full RDBMS power)
- ❌ Possible deadlocks with concurrent SELECT FOR UPDATE
- ❌ Difficult to scale horizontally (read replicas can't help with writes)
Rejected Because: Too slow for high-frequency license checks. The database is designed for ACID guarantees, not sub-millisecond concurrent operations.
Alternative 2: Distributed Locks (Redlock Algorithm)
Implementation:
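```python
import redis
import redlock  # e.g. the redlock-py package

# LockAcquisitionFailed, NoSeatsAvailable and get_seats_total are
# illustrative placeholders for this rejected alternative.

# Initialize Redlock with a quorum of independent Redis instances
lock_manager = redlock.Redlock([
    {"host": "redis1", "port": 6379},
    {"host": "redis2", "port": 6379},
    {"host": "redis3", "port": 6379},
])
# Client for the seat data itself (connection details omitted)
redis_client = redis.Redis()


def acquire_seat(license_key, session_id):
    lock_key = f"lock:license:{license_key}"
    # Acquire distributed lock (returns False if the quorum is not reached)
    lock = lock_manager.lock(lock_key, ttl=5000)  # 5 second TTL, in milliseconds
    if not lock:
        raise LockAcquisitionFailed()
    try:
        # Check and acquire seat (non-atomic without the lock; unsafe if the TTL expires mid-way)
        seats_used = redis_client.scard(f"license:{license_key}:active_sessions")
        seats_total = get_seats_total(license_key)
        if seats_used >= seats_total:
            raise NoSeatsAvailable()
        redis_client.sadd(f"license:{license_key}:active_sessions", session_id)
    finally:
        # Release lock
        lock_manager.unlock(lock)
```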
Pros:
- ✅ Distributed lock safety (quorum-based)
- ✅ Fault tolerance (survives single Redis failure)
Cons:
- ❌ High complexity (Redlock algorithm, quorum management)
- ❌ Multiple Redis instances required (3-5 for quorum)
- ❌ Clock skew risks (Redlock assumes synchronized clocks)
- ❌ Higher latency (must contact quorum, 5+ network round-trips)
- ❌ Lower throughput (lock contention, quorum overhead)
- ❌ More failure modes (split-brain, fencing tokens)
- ❌ Overkill for single-datacenter deployment
Rejected Because: Complexity far outweighs benefits. Redlock is designed for fault-tolerant locking across independent Redis masters, not single-region atomic operations. Lua scripts provide the same safety with a single script and no lock lifecycle to manage.
Alternative 3: Application-Level Locking (Threading Lock)
Implementation:
```python
import threading

import redis

# NoSeatsAvailable and get_seats_total are illustrative placeholders.
redis_client = redis.Redis()  # connection details omitted

# Global lock (per API server process)
seat_acquisition_lock = threading.Lock()


def acquire_seat(license_key, session_id):
    with seat_acquisition_lock:
        # Check and acquire (atomic within a single process only)
        seats_used = redis_client.scard(f"license:{license_key}:active_sessions")
        seats_total = get_seats_total(license_key)
        if seats_used >= seats_total:
            raise NoSeatsAvailable()
        redis_client.sadd(f"license:{license_key}:active_sessions", session_id)
```
Pros:
- ✅ Simplest implementation (Python threading module)
- ✅ No additional dependencies
Cons:
- ❌ Only works for single API server (fatal flaw)
- ❌ Race conditions with multiple Gunicorn workers
- ❌ Race conditions across multiple API servers (Kubernetes pods)
- ❌ Global lock serializes ALL license operations (poor performance)
Rejected Because: Does NOT provide atomicity across multiple API servers. Fatal flaw for distributed systems.
Alternative 4: Optimistic Locking with Version Counter
Implementation:
```python
import redis

# NoSeatsAvailable, TooManyRetries and get_seats_total are illustrative placeholders.
redis_client = redis.Redis()  # connection details omitted


def acquire_seat(license_key, session_id, max_retries=10):
    version_key = f"license:{license_key}:version"
    sessions_key = f"license:{license_key}:active_sessions"
    seats_total = get_seats_total(license_key)
    with redis_client.pipeline() as pipe:
        for _ in range(max_retries):
            try:
                # WATCH before reading: EXEC aborts if any writer bumps the version.
                # Every writer (including release) must INCR the version key.
                pipe.watch(version_key)
                # After WATCH, pipeline commands execute immediately
                seats_used = pipe.scard(sessions_key)
                if seats_used >= seats_total:
                    raise NoSeatsAvailable()
                # Queue the write and the version bump as one MULTI/EXEC transaction
                pipe.multi()
                pipe.sadd(sessions_key, session_id)
                pipe.incr(version_key)
                pipe.execute()
                return  # Success
            except redis.WatchError:
                # Version changed between WATCH and EXEC: retry (backoff omitted)
                continue
    raise TooManyRetries()
```
Pros:
- ✅ No server-side Lua scripts
- ✅ Uses standard Redis WATCH/MULTI/EXEC
Cons:
- ❌ Retry storms under high contention (exponential backoff needed)
- ❌ Complex retry logic (max retries, backoff strategy)
- ❌ Higher latency (multiple round-trips on contention)
- ❌ Lower throughput (retries consume resources)
- ❌ More complex code (version management, retry logic)
Rejected Because: Lua scripts provide the same atomicity with simpler code and better performance. Optimistic locking adds retry complexity for no benefit.
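The retry storms listed for this alternative are usually damped with capped exponential backoff plus random jitter, which is part of the extra machinery Alternative 4 would need on top of WATCH/MULTI/EXEC. A minimal sketch of the "full jitter" variant; `backoff_delays` and its `base`/`cap` defaults are hypothetical, chosen only for illustration:

```python
import random
from typing import Iterator, Optional


def backoff_delays(max_retries: int, base: float = 0.005, cap: float = 0.2,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one sleep duration (in seconds) per retry using "full jitter".

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so contending clients spread out instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)
```

In the optimistic-locking loop, each `WatchError` would be followed by `time.sleep(delay)` for the next yielded delay, which adds latency under exactly the contention the Lua approach avoids.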
Implementation Notes
Redis Configuration for Lua Scripts
Memory Settings:
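```
# redis.conf (comments must be on their own lines; redis.conf has no inline comments)
maxmemory 6gb
# Never evict keys: evicting an active_sessions SET would silently free seats
# and allow overbooking. Writes fail with an OOM error instead at maxmemory.
maxmemory-policy noeviction
# Not a kill timeout: once a script runs longer than this, Redis answers other
# clients with BUSY so an operator can intervene (SCRIPT KILL, or SHUTDOWN
# NOSAVE if the script has already written). Our scripts finish in
# microseconds, so the 5000 ms default is sufficient.
lua-time-limit 5000
```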
Persistence:
```
# RDB snapshots (for seat recovery after crash)
# After 900 sec (15 min) if at least 1 key changed
save 900 1
# After 300 sec (5 min) if at least 10 keys changed
save 300 10
# After 60 sec if at least 10000 keys changed
save 60 10000
# AOF (append-only file) for durability
appendonly yes
# Fsync every second (good balance; at most ~1 second of writes lost on crash)
appendfsync everysec
```
Script Loading Strategy
Option 1: Load on Application Startup (Recommended)
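```python
# backend/licenses/apps.py
import logging

from django.apps import AppConfig

logger = logging.getLogger(__name__)


class LicensesConfig(AppConfig):
    name = 'licenses'

    def ready(self):
        # Load Lua scripts on Django startup and publish the SHAs on the
        # redis_scripts module. (A `global` statement here would only bind
        # names inside apps.py, which the views never read.)
        from . import redis_scripts

        client = redis_scripts.redis_client
        redis_scripts.acquire_seat_sha = client.script_load(redis_scripts.ACQUIRE_SEAT_SCRIPT)
        redis_scripts.release_seat_sha = client.script_load(redis_scripts.RELEASE_SEAT_SCRIPT)
        redis_scripts.update_heartbeat_sha = client.script_load(redis_scripts.UPDATE_HEARTBEAT_SCRIPT)
        # Callers should read redis_scripts.acquire_seat_sha at call time:
        # `from .redis_scripts import acquire_seat_sha` copies the value at import.
        logger.info("Lua scripts loaded: %s...", redis_scripts.acquire_seat_sha[:8])
```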
Option 2: Lazy Loading with Fallback (More Resilient)
```python
import redis

from .redis_scripts import redis_client


def execute_lua_script(script_sha, script_text, keys, args):
    """Execute Lua script with EVALSHA, fall back to EVAL if not cached."""
    try:
        # Try EVALSHA (faster, only the 40-character hash is sent)
        return redis_client.evalsha(script_sha, len(keys), *keys, *args)
    except redis.exceptions.NoScriptError:
        # Script cache is empty (e.g. after a Redis restart or failover):
        # EVAL sends the full script, and Redis caches it again
        return redis_client.eval(script_text, len(keys), *keys, *args)
```
Testing Lua Scripts
Unit Test Pattern:
```python
# backend/tests/test_redis_scripts.py
# Run against a dedicated test Redis database: flushdb() erases everything in it.
import json

import pytest

from licenses.redis_scripts import (
    redis_client,
    ACQUIRE_SEAT_SCRIPT,
    acquire_seat_sha,
)


@pytest.fixture
def clean_redis():
    """Clean Redis before and after each test."""
    redis_client.flushdb()
    yield
    redis_client.flushdb()


def test_acquire_seat_success(clean_redis):
    """Test successful seat acquisition."""
    license_key = "TEST-LICENSE-001"
    session_id = "session-001"
    seats_total = 5
    session_metadata = json.dumps({
        'user_email': 'test@example.com',
        'hardware_id': 'hw-001',
        'project_root': '/home/user/project',
    })

    # Execute Lua script
    result = redis_client.evalsha(
        acquire_seat_sha,
        1,
        license_key,
        session_id,
        str(seats_total),
        session_metadata,
    )

    # Verify success
    assert result[0] == 1
    assert result[1] == b'ACQUIRED'
    assert result[2] == 1  # 1 seat used
    assert result[3] == 5  # 5 total seats

    # Verify Redis state
    assert redis_client.sismember(f"license:{license_key}:active_sessions", session_id) == 1
    stored_metadata = redis_client.hget(f"license:{license_key}:session_metadata", session_id)
    assert stored_metadata is not None


def test_acquire_seat_no_seats_available(clean_redis):
    """Test seat acquisition when all seats taken."""
    license_key = "TEST-LICENSE-002"
    seats_total = 2

    # Pre-fill seats
    for i in range(seats_total):
        redis_client.sadd(f"license:{license_key}:active_sessions", f"session-{i}")

    # Try to acquire (should fail)
    result = redis_client.evalsha(
        acquire_seat_sha,
        1,
        license_key,
        "new-session",
        str(seats_total),
        json.dumps({'user_email': 'test@example.com'}),
    )

    # Verify failure
    assert result[0] == 0
    assert result[1] == b'NO_SEATS_AVAILABLE'
    assert result[2] == 2  # 2 seats used
    assert result[3] == 2  # 2 total seats


def test_acquire_seat_idempotent(clean_redis):
    """Test idempotent seat acquisition (already active)."""
    license_key = "TEST-LICENSE-003"
    session_id = "session-001"

    # First acquisition
    redis_client.evalsha(acquire_seat_sha, 1, license_key, session_id, "5", json.dumps({}))

    # Second acquisition (same session)
    result = redis_client.evalsha(acquire_seat_sha, 1, license_key, session_id, "5", json.dumps({}))

    # Verify idempotent success
    assert result[0] == 1
    assert result[1] == b'ALREADY_ACTIVE'
    assert result[2] == 1  # Still 1 seat used (not 2)
```
Load Test Pattern:
```python
# backend/tests/load_test_redis.py
import concurrent.futures
import json

from licenses.redis_scripts import ACQUIRE_SEAT_SCRIPT, redis_client


def acquire_seat_worker(sha, license_key, session_id, seats_total):
    """Worker thread for load testing (redis_client's connection pool is thread-safe)."""
    return redis_client.evalsha(
        sha,
        1,
        license_key,
        session_id,
        str(seats_total),
        json.dumps({'user_email': f'{session_id}@example.com'}),
    )


def test_concurrent_seat_acquisition():
    """Load test: 100 concurrent acquisitions, 10 seats available."""
    license_key = "LOAD-TEST-001"
    seats_total = 10
    num_workers = 100

    redis_client.flushdb()  # dedicated test database only
    sha = redis_client.script_load(ACQUIRE_SEAT_SCRIPT)  # load once, share across workers

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = [
            executor.submit(acquire_seat_worker, sha, license_key, f"session-{i}", seats_total)
            for i in range(num_workers)
        ]
        results = [f.result() for f in concurrent.futures.as_completed(futures)]

    # Count successes and failures
    successes = sum(1 for r in results if r[0] == 1 and r[1] == b'ACQUIRED')
    failures = sum(1 for r in results if r[0] == 0 and r[1] == b'NO_SEATS_AVAILABLE')

    # Verify exactly 10 successes (no overbooking)
    assert successes == 10, f"Expected 10 successes, got {successes}"
    assert failures == 90, f"Expected 90 failures, got {failures}"

    # Verify Redis state
    active_sessions = redis_client.smembers(f"license:{license_key}:active_sessions")
    assert len(active_sessions) == 10, f"Expected 10 active sessions, got {len(active_sessions)}"

    print(f"✅ Load test passed: {successes}/{num_workers} acquired, "
          f"{failures}/{num_workers} denied, 0 overbooking")
```
Monitoring Lua Script Performance
Prometheus Metrics:
```python
import json

from prometheus_client import Counter, Histogram

from .redis_scripts import acquire_seat_sha, redis_client

# Lua script execution metrics
lua_script_executions = Counter(
    'license_lua_script_executions_total',
    'Total Lua script executions',
    ['script_name', 'result'],
)
lua_script_latency = Histogram(
    'license_lua_script_latency_seconds',
    'Lua script execution latency',
    ['script_name'],
)


def execute_acquire_seat(license_key, session_id, seats_total, session_metadata):
    """Execute acquire_seat Lua script with metrics."""
    with lua_script_latency.labels(script_name='acquire_seat').time():
        result = redis_client.evalsha(
            acquire_seat_sha,
            1,
            license_key,
            session_id,
            str(seats_total),
            json.dumps(session_metadata),
        )
    success = result[0] == 1
    # 'failure' means NO_SEATS_AVAILABLE (a denial); Redis exceptions propagate uncounted
    lua_script_executions.labels(
        script_name='acquire_seat',
        result='success' if success else 'failure',
    ).inc()
    return result
```
Grafana Dashboard Queries:
```promql
# Lua script execution rate
rate(license_lua_script_executions_total[5m])

# Lua script latency percentiles (aggregate buckets across pods first)
histogram_quantile(0.99, sum by (le, script_name) (rate(license_lua_script_latency_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le, script_name) (rate(license_lua_script_latency_seconds_bucket[5m])))
histogram_quantile(0.50, sum by (le, script_name) (rate(license_lua_script_latency_seconds_bucket[5m])))

# Seat denial rate (result="failure" means NO_SEATS_AVAILABLE, not a Redis error)
rate(license_lua_script_executions_total{result="failure"}[5m])
```
Related ADRs
- ADR-001: Floating Licenses vs. Node-Locked Licenses (context for atomic operations)
- ADR-003: Check-on-Init Enforcement Pattern (when Lua scripts execute)
- ADR-004: Symlink Resolution Strategy (session_id generation for Lua scripts)
References
- Redis Lua Scripting
- Atomic Operations in Redis
- Redis Lua API Reference
- GCP Memorystore for Redis
- Redlock Algorithm (alternative considered)
Last Updated: 2025-11-30
Owner: Architecture Team, Backend Team
Review Cycle: Quarterly or on Redis version upgrades