Technical Design Document: Generation Clock for Multi-Agent Task Coordination
Coditect Autonomous Development Platform
Version: 1.0
Status: Draft
Author: Architecture Team
Date: January 2026
1. Executive Summary
This document describes the application of the Generation Clock distributed systems pattern to coordinate multiple AI agents working on shared project plans within Coditect's multi-tenant, multi-user autonomous development platform.
Problem Statement
When multiple agents from different user sessions work on the same project:
- Agents are unaware of each other's existence
- Network partitions and session disconnects create "zombie agents" that continue working on tasks already reassigned
- Task-level resource allocation is imperfect, so more than one agent can be assigned the same atomic task
- Multiple agents may complete the same task with different results
- No inherent mechanism exists to determine which result is authoritative
Solution Overview
Apply the Generation Clock pattern from distributed consensus systems:
- Each task claim increments a monotonic generation counter
- All agent work is tagged with the generation at claim time
- Results from lower generations are automatically rejected
- Lease-based expiration enables recovery from zombie agents
- Complete audit trail preserves all generations for compliance
2. Pattern Mapping
2.1 Conceptual Translation
| Distributed Systems Concept | Coditect Multi-Agent Equivalent |
|---|---|
| Cluster | Project workspace |
| Node | Agent instance |
| Leader | Agent session holding active task claim |
| Follower | Other agents observing task state |
| Log Entry | Task claim / work result |
| Generation (Term/Epoch) | Task claim generation counter |
| Heartbeat | Lease renewal |
| Zombie Leader | Stale agent working after lease expiration |
| Commit | Result acceptance |
| Conflict | Multiple agents completing same task |
2.2 Key Invariants
- Monotonicity: Task generation only increases, never decreases
- Single Writer: At most one valid claim per task at any time
- Generation Tagging: All work products carry the generation of their originating claim
- Higher Wins: In any conflict, higher generation is authoritative
- Lease Expiration: Claims without renewal become eligible for takeover
- Tenant Isolation: Generations are scoped within tenant boundaries
3. Task Nomenclature System
3.1 Standard Task Identifier Format
All tasks in Coditect follow a standardized nomenclature that enables efficient multi-agent resource allocation across parallel work streams:
[TRACK]-[SEQUENCE]-[DESCRIPTION]
Where:
TRACK := [A-Z]+ # Single or multi-letter track identifier
SEQUENCE := [0-9]{3} # Zero-padded sequential number (001-999)
DESCRIPTION := [a-zA-Z0-9_-]+ # Brief task description (kebab-case)
Examples:
A-001-setup-authentication-module
A-002-implement-oauth-providers
A-003-add-session-management
B-001-design-database-schema
B-002-create-user-tables
B-003-implement-migrations
C-001-build-api-endpoints
C-002-add-rate-limiting
3.2 Track Definitions
Tracks represent parallel work streams that can be executed concurrently by different agents:
| Track | Purpose | Typical Work |
|---|---|---|
| A | Core Architecture | Foundation, framework, core modules |
| B | Data Layer | Database, storage, caching |
| C | API Layer | Endpoints, contracts, validation |
| D | Frontend | UI components, client logic |
| E | Infrastructure | Deployment, CI/CD, monitoring |
| F | Testing | Test suites, coverage, QA |
| G | Documentation | Specs, guides, API docs |
| X | Experimental | Spikes, prototypes, POCs |
| Z | Maintenance | Bug fixes, tech debt, refactoring |
3.3 Task ID Structure
┌─────────────────────────────────────────────────────────────────────────┐
│ TASK ID ANATOMY │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ B-003-implement-user-dashboard │
│ │ │ │ │
│ │ │ └─── Description: Human-readable task summary │
│ │ │ (kebab-case, max 50 chars) │
│ │ │ │
│ │ └──────── Sequence: Position within track (001-999) │
│ │ Determines execution order for dependent tasks │
│ │ │
│ └────────── Track: Parallel work stream identifier │
│ Enables concurrent agent assignment │
│ │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ COMPOUND TASK IDs (Subtasks): │
│ │
│ B-003-implement-user-dashboard::1 │
│ B-003-implement-user-dashboard::2 │
│ B-003-implement-user-dashboard::3 │
│ │
│ Format: [TASK_ID]::[SUBTASK_INDEX] │
│ │
└─────────────────────────────────────────────────────────────────────────┘
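The identifier grammar from 3.1 and the `::` subtask suffix shown above can be checked mechanically. The following is a minimal sketch, not the platform's parser: `ParsedTaskId` and `parse_task_id` are hypothetical names, and treating sequence `000` as invalid is an assumption drawn from the stated 001-999 range.

```python
import re
from typing import NamedTuple, Optional

# Grammar from section 3.1, plus the optional "::N" subtask suffix from 3.3.
TASK_ID_RE = re.compile(
    r"^(?P<track>[A-Z]+)-(?P<sequence>[0-9]{3})-(?P<description>[a-zA-Z0-9_-]+)"
    r"(?:::(?P<subtask>[0-9]+))?$"
)


class ParsedTaskId(NamedTuple):
    """Hypothetical container for the parts of a task identifier."""
    track: str
    sequence: int
    description: str
    subtask_index: Optional[int]

    @property
    def task_id(self) -> str:
        # Rebuild the full ID without the subtask suffix.
        return f"{self.track}-{self.sequence:03d}-{self.description}"


def parse_task_id(raw: str) -> ParsedTaskId:
    """Split a plain or compound task ID into its components."""
    match = TASK_ID_RE.match(raw)
    if match is None:
        raise ValueError(f"not a valid task ID: {raw!r}")
    sequence = int(match.group("sequence"))
    if sequence == 0:
        # Assumption: 000 is outside the documented 001-999 range.
        raise ValueError(f"sequence must be 001-999: {raw!r}")
    subtask = match.group("subtask")
    return ParsedTaskId(
        track=match.group("track"),
        sequence=sequence,
        description=match.group("description"),
        subtask_index=int(subtask) if subtask is not None else None,
    )
```

For example, `parse_task_id("B-003-implement-user-dashboard::2")` yields track `B`, sequence `3`, and subtask index `2`, while a plain ID yields a subtask index of `None`.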
3.4 Multi-Agent Resource Allocation by Track
The track system enables intelligent agent distribution:
PROJECT: E-Commerce Platform Rebuild
─────────────────────────────────────────────────────────────────────────
Track A (Architecture) Track B (Data)
├── Agent-Alpha [Session-001] ├── Agent-Beta [Session-002]
│ ├── A-001-core-framework │ ├── B-001-schema-design
│ ├── A-002-di-container │ ├── B-002-entity-models
│ └── A-003-event-bus │ └── B-003-repositories
│ │
Track C (API) Track D (Frontend)
├── Agent-Gamma [Session-003] └── Agent-Delta [Session-001]
│ ├── C-001-rest-endpoints ├── D-001-component-library
│ ├── C-002-graphql-schema ├── D-002-state-management
│ └── C-003-authentication └── D-003-routing
RESOURCE ALLOCATION RULES:
├── Single agent per track (default)
├── Multiple agents on high-priority tracks (configurable)
├── Cross-track dependencies tracked via DAG
└── Generation Clock applies per task_id (track+sequence+description)
3.5 Session Memory Integration
All task activity is logged to session memory for continuity and audit:
SESSION MEMORY STRUCTURE
─────────────────────────────────────────────────────────────────────────
session_id: "sess-alice-20260104-001"
tenant_id: "acme-corp"
project_id: "ecommerce-rebuild"
user_id: "alice@acme.com"
task_log:
  - timestamp: "2026-01-04T06:00:00Z"
    event: "CLAIM_ACQUIRED"
    task_id: "B-002-entity-models"
    track: "B"
    sequence: 2
    generation: 1
    agent_id: "agent-beta"
  - timestamp: "2026-01-04T06:15:00Z"
    event: "RESULT_SUBMITTED"
    task_id: "B-002-entity-models"
    track: "B"
    sequence: 2
    generation: 1
    agent_id: "agent-beta"
    outcome: "ACCEPTED"
    work_product_ref: "wp-B-002-entity-models-gen1-abc123"
  - timestamp: "2026-01-04T06:16:00Z"
    event: "CLAIM_ACQUIRED"
    task_id: "B-003-repositories"
    track: "B"
    sequence: 3
    generation: 1
    agent_id: "agent-beta"
active_claims:
  - task_id: "B-003-repositories"
    generation: 1
    expires_at: "2026-01-04T06:21:00Z"
completed_tasks:
  - "B-001-schema-design"
  - "B-002-entity-models"
track_progress:
  A: { completed: 0, in_progress: 3, pending: 5 }
  B: { completed: 2, in_progress: 1, pending: 4 }
  C: { completed: 0, in_progress: 0, pending: 8 }
  D: { completed: 0, in_progress: 0, pending: 6 }
3.6 Work Product Identification
Each task result is tagged with a unique work product reference:
WORK PRODUCT REFERENCE FORMAT
─────────────────────────────────────────────────────────────────────────
wp-[TASK_ID]-gen[GENERATION]-[HASH]
Examples:
wp-A-001-core-framework-gen1-7f3a2b
wp-B-003-repositories-gen2-9c4e1d
wp-C-002-graphql-schema-gen1-2b8f5a
Components:
wp : Work product prefix
TASK_ID : Full task identifier
gen[N] : Generation that produced this result
HASH : Short hash for uniqueness (first 6 chars of UUID)
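As an illustration of the reference format above, a small builder might look like the following. `make_work_product_ref` is a hypothetical helper, and taking the first six hex characters of a random UUID is one reading of "first 6 chars of UUID".

```python
import uuid
from typing import Optional


def make_work_product_ref(
    task_id: str, generation: int, unique_id: Optional[uuid.UUID] = None
) -> str:
    """Build wp-[TASK_ID]-gen[GENERATION]-[HASH] for an accepted result."""
    if generation < 1:
        raise ValueError("generation starts at 1")
    # Assumption: HASH is the first 6 hex characters of a random (v4) UUID.
    source = unique_id if unique_id is not None else uuid.uuid4()
    return f"wp-{task_id}-gen{generation}-{source.hex[:6]}"
```

Passing `unique_id` explicitly keeps the function deterministic in tests; production callers would normally omit it.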
3.7 Task Dependencies and Ordering
Within-track ordering is determined by sequence number. Cross-track dependencies are explicit:
# Project Plan Example
project: ecommerce-rebuild
tracks:
  A:
    - id: A-001-core-framework
      description: "Initialize core application framework"
      depends_on: []
    - id: A-002-di-container
      description: "Setup dependency injection container"
      depends_on: [A-001-core-framework]
    - id: A-003-event-bus
      description: "Implement event bus for async communication"
      depends_on: [A-002-di-container]
  B:
    - id: B-001-schema-design
      description: "Design database schema"
      depends_on: []
    - id: B-002-entity-models
      description: "Create entity models from schema"
      depends_on: [B-001-schema-design, A-001-core-framework] # Cross-track!
    - id: B-003-repositories
      description: "Implement repository pattern"
      depends_on: [B-002-entity-models, A-002-di-container] # Cross-track!
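The dependency graph in this plan can be ordered with a topological sort. The sketch below uses Python's standard `graphlib` (3.9+) on the same six tasks; `DEPENDS_ON`, `execution_order`, and `ready_tasks` are hypothetical names, and how the real coordinator schedules work is not specified here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Task -> prerequisites, copied from the project plan example.
DEPENDS_ON: Dict[str, List[str]] = {
    "A-001-core-framework": [],
    "A-002-di-container": ["A-001-core-framework"],
    "A-003-event-bus": ["A-002-di-container"],
    "B-001-schema-design": [],
    "B-002-entity-models": ["B-001-schema-design", "A-001-core-framework"],
    "B-003-repositories": ["B-002-entity-models", "A-002-di-container"],
}


def execution_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return one valid order; raises graphlib.CycleError if the plan has a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


def ready_tasks(depends_on: Dict[str, List[str]], completed: Set[str]) -> List[str]:
    """Tasks not yet completed whose prerequisites are all completed."""
    return sorted(
        task
        for task, deps in depends_on.items()
        if task not in completed and all(dep in completed for dep in deps)
    )
```

With nothing completed, `ready_tasks` returns only `A-001-core-framework` and `B-001-schema-design`, which is the set two agents on tracks A and B could claim in parallel.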
4. Architecture
4.1 Component Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ CODITECT PLATFORM │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ Agent Layer │
│ │ (Alice) │ │ (Bob) │ │ (Alice) │ │
│ │ Session-1 │ │ Session-2 │ │ Session-3 │ │
│ │ Track: A │ │ Track: B │ │ Track: A │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────┬────┴────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ TASK COORDINATOR SERVICE │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Claim │ │ Result │ │ Lease │ │ │
│ │ │ Manager │ │ Validator │ │ Monitor │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Generation │ │ Track │ │ Session │ │ │
│ │ │ Clock │ │ Router │ │ Memory │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ STORAGE ABSTRACTION │ │
│ │ (Database-Agnostic Repository Interface) │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────┼────────────┬────────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Postgres │ │ MySQL │ │ Redis │ │ Custom │ Storage Layer │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
4.2 Data Model
4.2.1 Core Entities
┌─────────────────────────────────────────────────────────────────┐
│ TaskIdentifier │
├─────────────────────────────────────────────────────────────────┤
│ track: str # Track letter (A, B, C, ...) │
│ sequence: int # Sequence within track (1-999) │
│ description: str # Kebab-case task description │
│ subtask_index: int? # Optional subtask index (::1, ::2) │
├─────────────────────────────────────────────────────────────────┤
│ @property │
│ task_id: str # Full ID: "A-001-setup-auth" │
│ compound_id: str # With subtask: "A-001-setup-auth::2" │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TaskGeneration │
├─────────────────────────────────────────────────────────────────┤
│ task_id: str # Full task identifier (includes track) │
│ generation: int # Monotonic counter (starts at 1) │
│ session_id: str # Session that created this generation │
│ claimed_at: datetime # When claim was acquired │
│ lease_duration: int # Seconds until expiration │
└─────────────────────────────────────────────────────────────────┘
│
│ 1:1 (current)
▼
┌─────────────────────────────────────────────────────────────────┐
│ TaskClaim │
├─────────────────────────────────────────────────────────────────┤
│ id: str # Claim ID (UUID) │
│ tenant_id: str # Tenant isolation │
│ project_id: str # Project scope │
│ task_id: str # Full task ID (A-001-description) │
│ track: str # Extracted track letter │
│ sequence: int # Extracted sequence number │
│ agent_id: str # Agent holding claim │
│ generation: TaskGeneration │
│ state: ClaimState # ACTIVE | EXPIRED | RELEASED | SUPERSEDED│
│ work_product_ref: str? # Reference to produced artifact │
│ metadata: dict # Additional context │
└─────────────────────────────────────────────────────────────────┘
│
│ 1:N (history)
▼
┌─────────────────────────────────────────────────────────────────┐
│ ClaimHistory │
├─────────────────────────────────────────────────────────────────┤
│ id: str # History entry ID │
│ tenant_id: str # Tenant reference │
│ project_id: str # Project reference │
│ task_id: str # Task reference (A-001-description) │
│ track: str # Track for efficient queries │
│ sequence: int # Sequence for ordering │
│ generation: int # Generation number │
│ session_id: str # Session that held claim │
│ agent_id: str # Agent that held claim │
│ acquired_at: datetime # When acquired │
│ released_at: datetime # When released/expired/superseded │
│ release_reason: str # COMPLETED | EXPIRED | SUPERSEDED | ERROR│
│ result_accepted: bool # Whether result was accepted │
│ work_product_ref: str? # Reference to work product if accepted │
└─────────────────────────────────────────────────────────────────┘
│
│ 0:1
▼
┌─────────────────────────────────────────────────────────────────┐
│ TaskResult │
├─────────────────────────────────────────────────────────────────┤
│ id: str # Result ID │
│ tenant_id: str # Tenant isolation │
│ project_id: str # Project scope │
│ task_id: str # Task reference (A-001-description) │
│ track: str # Track for efficient queries │
│ sequence: int # Sequence for ordering │
│ generation: int # Generation that produced this result │
│ session_id: str # Session that submitted │
│ agent_id: str # Agent that produced result │
│ work_product_ref: str # Unique work product reference │
│ result_data: dict # Actual work output │
│ submitted_at: datetime # Submission timestamp │
│ accepted: bool # Whether result was accepted │
│ rejection_reason: str # If rejected, why │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SessionMemory │
├─────────────────────────────────────────────────────────────────┤
│ session_id: str # Session identifier │
│ tenant_id: str # Tenant scope │
│ project_id: str # Active project │
│ user_id: str # User who owns this session │
│ created_at: datetime # Session start time │
│ last_activity: datetime# Last action timestamp │
│ │
│ task_log: List[TaskLogEntry] # Chronological task events │
│ active_claims: List[ClaimRef] # Currently held claims │
│ completed_tasks: List[str] # Task IDs completed this sess │
│ track_assignments: Dict[str,str] # Track → Agent assignments │
│ track_progress: Dict[str,Progress] # Track completion stats │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TaskLogEntry │
├─────────────────────────────────────────────────────────────────┤
│ timestamp: datetime # When event occurred │
│ event: str # CLAIM_ACQUIRED | LEASE_RENEWED | │
│ # RESULT_SUBMITTED | CLAIM_LOST | │
│ # CLAIM_RELEASED │
│ task_id: str # Full task ID (A-001-description) │
│ track: str # Track letter │
│ sequence: int # Sequence number │
│ generation: int # Generation at event time │
│ agent_id: str # Agent involved │
│ outcome: str? # ACCEPTED | REJECTED | null │
│ work_product_ref: str? # Work product if result accepted │
│ details: dict # Additional event details │
└─────────────────────────────────────────────────────────────────┘
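To make the lease fields concrete, here is a minimal sketch of the TaskGeneration entity as a Python dataclass. The `expires_at` property and `is_expired` method are assumptions added for illustration; the entity above only lists `claimed_at` and `lease_duration`, and it does not say how renewals move the expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TaskGeneration:
    task_id: str
    generation: int        # monotonic counter, starts at 1
    session_id: str
    claimed_at: datetime   # coordinator-side UTC timestamp
    lease_duration: int    # seconds

    @property
    def expires_at(self) -> datetime:
        # Assumption: expiry is claimed_at plus the lease duration.
        return self.claimed_at + timedelta(seconds=self.lease_duration)

    def is_expired(self, now: datetime) -> bool:
        # Assumption: the lease counts as expired at the expiry instant itself.
        return now >= self.expires_at
```

With the values from the session memory example in 3.5 (claimed at 06:16:00Z with a 300-second lease), `expires_at` evaluates to 06:21:00Z.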
4.2.2 State Transitions
TASK CLAIM STATE MACHINE
┌─────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ │
│ │ │ │
│ │ NO_CLAIM │◄────────────────────┐ │
│ │ │ │ │
│ └────┬─────┘ │ │
│ │ │ │
│ │ claim_task() │ │
│ │ [gen=1] │ │
│ ▼ │ │
│ ┌──────────┐ │ │
│ │ │──── renew_lease() ──┘ │
│ │ ACTIVE │ [same gen] │ │
│ │ │◄────────────────────┤ │
│ └────┬─────┘ │ │
│ │ │ │
│ ┌────┴────┬──────────┬──────────┐│ │
│ │ │ │ ││ │
│ │ │ │ ││ │
│ ▼ ▼ ▼ ▼│ │
│ submit() timeout new_claim() error │
│ [accept] [expire] [supersede] │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌────────┐┌────────┐┌──────────┐┌───────┐ │
│ │RELEASED││EXPIRED ││SUPERSEDED││ ERROR │ │
│ └────────┘└────────┘└──────────┘└───────┘ │
│ │ │ │ │ │
│ │ │ │ │ │
│ └─────────┴──────────┴──────────┘ │
│ │ │
│ │ [history recorded] │
│ ▼ │
│ ┌──────────┐ │
│ │ ARCHIVED │ │
│ └──────────┘ │
│ │
└────────────────────────────────────────────────┘
GENERATION INCREMENT RULES
┌─────────────────────────────────────────────────────────┐
│ │
│ Event │ Generation Action │
│ ─────────────────────────────────────────────────────│
│ First claim on task │ Set to 1 │
│ Claim after RELEASED │ Increment +1 │
│ Claim after EXPIRED │ Increment +1 │
│ Claim after SUPERSEDED │ Already incremented │
│ Lease renewal │ No change │
│ Result submission │ No change │
│ │
└─────────────────────────────────────────────────────────┘
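The increment rules in the table can be expressed as a single pure function. This is a sketch under two assumptions: an unclaimed task is represented by generation `0`, and "Already incremented" for a claim after SUPERSEDED means the counter is left unchanged. `ClaimEvent` and `generation_after` are hypothetical names.

```python
from enum import Enum


class ClaimEvent(Enum):
    FIRST_CLAIM = "first_claim"
    CLAIM_AFTER_RELEASED = "claim_after_released"
    CLAIM_AFTER_EXPIRED = "claim_after_expired"
    CLAIM_AFTER_SUPERSEDED = "claim_after_superseded"
    LEASE_RENEWAL = "lease_renewal"
    RESULT_SUBMISSION = "result_submission"


_INCREMENTING = {ClaimEvent.CLAIM_AFTER_RELEASED, ClaimEvent.CLAIM_AFTER_EXPIRED}


def generation_after(event: ClaimEvent, current: int) -> int:
    """Return the task generation after `event`, given the current value."""
    if event is ClaimEvent.FIRST_CLAIM:
        if current != 0:  # assumption: 0 means "never claimed"
            raise ValueError("first claim requires an unclaimed task")
        return 1
    if current < 1:
        raise ValueError("task has no generation yet")
    if event in _INCREMENTING:
        return current + 1
    # Superseded claims, renewals, and submissions leave the counter as is.
    return current
```

No branch ever returns a value lower than `current`, which is the monotonicity invariant from 2.2.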
4. Operational Scenarios
4.1 Happy Path: Single Agent Completes Task
Timeline:
─────────────────────────────────────────────────────────────────────────
T=0 Agent-A claims task-101
├─ Check: No existing claim
├─ Action: Create claim with generation=1
└─ Result: Claim granted
T=1 Agent-A starts work
└─ Work tagged with generation=1
T=5 Agent-A renews lease
├─ Check: generation=1 matches, session matches
└─ Result: Lease extended
T=10 Agent-A completes work, submits result
├─ Check: generation=1 matches current claim
├─ Action: Accept result, mark task COMPLETED
└─ Result: Success
Final State:
├─ Task: COMPLETED
├─ Claim: RELEASED
├─ Result: ACCEPTED (generation=1)
└─ History: 1 entry (generation=1, COMPLETED)
4.2 Conflict: Two Agents, Lease Expiration
Timeline:
─────────────────────────────────────────────────────────────────────────
T=0 Agent-A (Alice, Session-1) claims task-101
├─ Action: Create claim, generation=1
└─ Result: Claim granted
T=5 Agent-A working...
└─ Network partition begins (Alice's connection unstable)
T=10 Agent-A lease renewal FAILS (network issue)
└─ Agent-A unaware, continues working
T=15 Lease expires (generation=1 claim now EXPIRED)
└─ Task eligible for new claims
T=16 Agent-B (Bob, Session-2) claims task-101
├─ Check: Existing claim EXPIRED
├─ Action: Create new claim, generation=2
└─ Result: Claim granted
T=20 Agent-B completes work, submits result
├─ Check: generation=2 matches current claim
├─ Action: Accept result
└─ Result: Success (generation=2)
T=25 Agent-A (network restored) attempts to submit result
├─ Check: generation=1 < current generation=2
├─ Action: REJECT result
└─ Result: STALE_GENERATION error
Final State:
├─ Task: COMPLETED
├─ Result: Agent-B's work (generation=2)
└─ History:
├─ Entry 1: generation=1, EXPIRED, result_rejected
└─ Entry 2: generation=2, COMPLETED, result_accepted
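The timeline above can be replayed against a toy coordinator. This is a single-process sketch with no persistence or concurrency control; `InMemoryCoordinator` and `Claim` are hypothetical, time is passed in explicitly in the timeline's T units, and the order of checks in `submit_result` (generation before completion) is an assumption chosen so that the zombie agent sees STALE_GENERATION as in the scenario.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Claim:
    session_id: str
    generation: int
    expires_at: float  # coordinator-side time, in the timeline's T units


class InMemoryCoordinator:
    """Hypothetical stand-in for the Task Coordinator Service."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.claims: Dict[str, Claim] = {}
        self.completed: Dict[str, int] = {}  # task_id -> accepted generation

    def claim_task(self, task_id: str, session_id: str, now: float) -> Optional[int]:
        """Return the granted generation, or None if the claim is denied."""
        if task_id in self.completed:
            return None  # DENIED_COMPLETED
        current = self.claims.get(task_id)
        if current is not None and now < current.expires_at:
            return None  # DENIED_ACTIVE_CLAIM
        generation = 1 if current is None else current.generation + 1
        self.claims[task_id] = Claim(session_id, generation, now + self.lease_seconds)
        return generation

    def submit_result(self, task_id: str, session_id: str, generation: int) -> str:
        current = self.claims.get(task_id)
        if current is None:
            return "NO_CLAIM"
        if generation < current.generation:
            return "STALE_GENERATION"
        if generation > current.generation:
            return "FUTURE_GENERATION"
        if task_id in self.completed:
            return "TASK_ALREADY_COMPLETED"
        if session_id != current.session_id:
            return "SESSION_MISMATCH"
        self.completed[task_id] = generation
        return "ACCEPTED"


coordinator = InMemoryCoordinator(lease_seconds=15)
gen_a = coordinator.claim_task("task-101", "session-1", now=0)    # T=0, Agent-A
denied = coordinator.claim_task("task-101", "session-2", now=10)  # lease still live
gen_b = coordinator.claim_task("task-101", "session-2", now=16)   # T=16, Agent-B
outcome_b = coordinator.submit_result("task-101", "session-2", gen_b)  # T=20
outcome_a = coordinator.submit_result("task-101", "session-1", gen_a)  # T=25
```

The sketch does not re-check lease expiry at submission time; whether an expired but not yet superseded claim may still submit is left open here.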
4.3 Rapid Failover: Multiple Sequential Failures
Timeline:
─────────────────────────────────────────────────────────────────────────
T=0 Agent-A claims task-101, generation=1
└─ Crashes at T=2 before completing
T=5 Lease expires
T=6 Agent-B claims task-101, generation=2
└─ Crashes at T=8 before completing
T=12 Lease expires
T=13 Agent-C claims task-101, generation=3
└─ Completes successfully at T=18
T=18 Agent-C submits result
├─ Check: generation=3 matches
└─ Result: ACCEPTED
T=20 Agent-A (recovered) attempts to submit
├─ Check: generation=1 < current=3
└─ Result: REJECTED (STALE_GENERATION)
T=22 Agent-B (recovered) attempts to submit
├─ Check: generation=2 < current=3
└─ Result: REJECTED (STALE_GENERATION)
Final State:
├─ Task: COMPLETED with Agent-C's result
└─ History: 3 generations recorded
├─ gen=1: EXPIRED, no result
├─ gen=2: EXPIRED, no result
└─ gen=3: COMPLETED, result accepted
4.4 Sub-Task Coordination
Task-101 has 5 subtasks: [101::1, 101::2, 101::3, 101::4, 101::5]
Timeline:
─────────────────────────────────────────────────────────────────────────
T=0 Agent-A claims subtasks 101::1, 101::2, 101::3
├─ 101::1 → generation=1
├─ 101::2 → generation=1
└─ 101::3 → generation=1
T=1 Agent-B claims subtasks 101::3, 101::4, 101::5
├─ 101::3 → DENIED (Agent-A holds active claim)
├─ 101::4 → generation=1
└─ 101::5 → generation=1
T=5 Agent-A completes 101::1, 101::2
└─ Results accepted (generation=1)
T=6 Agent-A lease on 101::3 expires (was slow)
T=7 Agent-B claims 101::3
└─ 101::3 → generation=2
T=10 Agent-B completes 101::3, 101::4, 101::5
└─ All results accepted
T=12 Agent-A attempts to submit 101::3
├─ Check: generation=1 < current=2
└─ Result: REJECTED
Final State:
├─ 101::1 → Agent-A result (gen=1) ✓
├─ 101::2 → Agent-A result (gen=1) ✓
├─ 101::3 → Agent-B result (gen=2) ✓
├─ 101::4 → Agent-B result (gen=1) ✓
└─ 101::5 → Agent-B result (gen=1) ✓
Parent task merge: All subtasks complete, merge results
5. API Specification
5.1 Coordinator Service API
┌─────────────────────────────────────────────────────────────────────────┐
│ TaskCoordinator API │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ CLAIM OPERATIONS │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ claim_task( │
│ tenant_id: str, │
│ project_id: str, │
│ task_id: str, │
│ agent_id: str, │
│ session_id: str, │
│ lease_duration_seconds: int = 300 │
│ ) → ClaimResult │
│ │
│ Returns: │
│ ClaimResult { │
│ success: bool │
│ claim: TaskClaim | None │
│ reason: GRANTED | DENIED_ACTIVE_CLAIM | DENIED_COMPLETED │
│ current_holder: str | None # If denied, who holds it │
│ } │
│ │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ renew_lease( │
│ tenant_id: str, │
│ project_id: str, │
│ task_id: str, │
│ session_id: str, │
│ expected_generation: int │
│ ) → RenewalResult │
│ │
│ Returns: │
│ RenewalResult { │
│ success: bool │
│ new_expiry: datetime | None │
│ reason: RENEWED | GENERATION_MISMATCH | SESSION_MISMATCH | │
│ NO_CLAIM | ALREADY_EXPIRED │
│ } │
│ │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ release_claim( │
│ tenant_id: str, │
│ project_id: str, │
│ task_id: str, │
│ session_id: str, │
│ expected_generation: int, │
│ reason: str = "VOLUNTARY" │
│ ) → ReleaseResult │
│ │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ RESULT OPERATIONS │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ submit_result( │
│ tenant_id: str, │
│ project_id: str, │
│ task_id: str, │
│ session_id: str, │
│ generation: int, │
│ result_data: dict │
│ ) → SubmissionResult │
│ │
│ Returns: │
│ SubmissionResult { │
│ accepted: bool │
│ reason: ACCEPTED | STALE_GENERATION | FUTURE_GENERATION | │
│ SESSION_MISMATCH | NO_CLAIM | TASK_ALREADY_COMPLETED │
│ current_generation: int │
│ work_lost: bool # True if agent did work but was rejected │
│ } │
│ │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ QUERY OPERATIONS │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ get_task_state( │
│ tenant_id: str, │
│ project_id: str, │
│ task_id: str │
│ ) → TaskState │
│ │
│ get_claim_history( │
│ tenant_id: str, │
│ project_id: str, │
│ task_id: str │
│ ) → List[ClaimHistory] │
│ │
│ get_active_claims_for_session( │
│ tenant_id: str, │
│ session_id: str │
│ ) → List[TaskClaim] │
│ │
└─────────────────────────────────────────────────────────────────────────┘
5.2 Storage Repository Interface
┌─────────────────────────────────────────────────────────────────────────┐
│ ClaimRepository (Abstract Interface) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ATOMIC OPERATIONS (Must be transactional) │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ atomic_claim_or_fail( │
│ key: ClaimKey, │
│ new_claim: TaskClaim, │
│ expected_state: ExpectedState # NO_CLAIM | EXPIRED | SPECIFIC_GEN │
│ ) → AtomicResult │
│ │
│ atomic_update_if_generation_matches( │
│ key: ClaimKey, │
│ expected_generation: int, │
│ update: ClaimUpdate │
│ ) → AtomicResult │
│ │
│ atomic_submit_result( │
│ claim_key: ClaimKey, │
│ result: TaskResult, │
│ expected_generation: int │
│ ) → AtomicResult │
│ │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ READ OPERATIONS │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ get_current_claim(key: ClaimKey) → TaskClaim | None │
│ get_claim_history(key: ClaimKey) → List[ClaimHistory] │
│ get_result(key: ClaimKey) → TaskResult | None │
│ list_claims_by_session(tenant_id: str, session_id: str) → List[TaskClaim] │
│ list_expired_claims(tenant_id: str, before: datetime) → List[TaskClaim] │
│ │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ BATCH OPERATIONS │
│ ───────────────────────────────────────────────────────────────────────│
│ │
│ batch_claim(keys: List[ClaimKey], claim_template: ClaimTemplate) │
│ → Dict[ClaimKey, ClaimResult] │
│ │
│ batch_get_states(keys: List[ClaimKey]) → Dict[ClaimKey, TaskState] │
│ │
└─────────────────────────────────────────────────────────────────────────┘
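One way a SQL-backed repository could satisfy the atomic operations is a conditional UPDATE whose WHERE clause carries the expected generation, with the affected row count indicating success. The sketch below uses the standard-library `sqlite3` module purely for illustration; the table layout and the function names `open_claim_store`, `renew_if_generation_matches`, and `take_over_if_expired` are assumptions, not the interface's actual implementation.

```python
import sqlite3


def open_claim_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per claim key (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims ("
        " claim_key TEXT PRIMARY KEY,"
        " generation INTEGER NOT NULL,"
        " session_id TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    return conn


def renew_if_generation_matches(
    conn: sqlite3.Connection, claim_key: str, expected_generation: int, new_expires_at: float
) -> bool:
    """Extend the lease only if the stored generation equals the expected one."""
    with conn:  # commits on success, rolls back on exception
        cursor = conn.execute(
            "UPDATE claims SET expires_at = ? WHERE claim_key = ? AND generation = ?",
            (new_expires_at, claim_key, expected_generation),
        )
    return cursor.rowcount == 1


def take_over_if_expired(
    conn: sqlite3.Connection, claim_key: str, expected_generation: int,
    session_id: str, now: float, lease_seconds: float,
) -> bool:
    """Bump the generation only if the observed claim is unchanged and expired."""
    with conn:
        cursor = conn.execute(
            "UPDATE claims SET generation = generation + 1, session_id = ?, expires_at = ?"
            " WHERE claim_key = ? AND generation = ? AND expires_at <= ?",
            (session_id, now + lease_seconds, claim_key, expected_generation, now),
        )
    return cursor.rowcount == 1
```

Because each check and write happens in one statement, two agents racing to take over the same expired claim cannot both succeed: the second UPDATE no longer matches the expected generation and reports zero rows.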
6. Consistency Guarantees
6.1 Strong Consistency Requirements
| Operation | Guarantee | Mechanism |
|---|---|---|
| Claim acquisition | Linearizable | Atomic compare-and-swap |
| Generation increment | Monotonic | Single writer per task |
| Result acceptance | Exactly-once | Generation + session validation |
| Lease renewal | Conditional | Generation match required |
| History append | Durable | Write-ahead before response |
6.2 Conflict Resolution Rules
RULE 1: Higher generation always wins
─────────────────────────────────────
If result_A.generation > result_B.generation:
    ACCEPT result_A
    REJECT result_B
RULE 2: Same generation, same session wins
──────────────────────────────────────────
If result.generation == current_claim.generation:
    If result.session_id == current_claim.session_id:
        ACCEPT result
    Else:
        REJECT result (session mismatch)
RULE 3: Completed task rejects all
──────────────────────────────────
If task.state == COMPLETED:
    REJECT all new results
    (Task must be explicitly reset to accept new work)
RULE 4: No claim, no acceptance
───────────────────────────────
If current_claim is None:
    REJECT result
    (Cannot submit without prior claim)
6.3 Failure Modes and Recovery
┌─────────────────────────────────────────────────────────────────────────┐
│ FAILURE MODE MATRIX │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Failure │ Detection │ Recovery │
│ ──────────────────────────────────────────────────────────────────────│
│ Agent crash │ Lease expiration │ New claim (gen+1) │
│ Network partition │ Lease renewal fail │ New claim (gen+1) │
│ Coordinator crash │ Health check │ Restart, replay log │
│ Database unavailable │ Connection timeout │ Retry with backoff │
│ Duplicate submission │ Generation check │ Reject stale │
│ Split-brain agents │ Generation compare │ Higher gen wins │
│ Lease renewal race │ Atomic CAS │ Loser gets error │
│ Clock skew │ Server-side time │ Single source of time │
│ │
└─────────────────────────────────────────────────────────────────────────┘
7. Multi-Tenancy Considerations
7.1 Isolation Boundaries
TENANT ISOLATION MODEL
─────────────────────────────────────────────────────────────────────────
Level 1: Logical Isolation (Default)
├─ All tenants share infrastructure
├─ Tenant ID prefix on all keys/queries
├─ Query filtering enforced at repository layer
└─ Cost-effective, simpler operations
Level 2: Schema Isolation (Optional)
├─ Separate database schema per tenant
├─ Connection routing by tenant ID
├─ Stronger isolation guarantees
└─ Higher operational complexity
Level 3: Instance Isolation (Enterprise)
├─ Dedicated database instance per tenant
├─ Network-level isolation
├─ Compliance requirement for some industries
└─ Highest cost, maximum isolation
GENERATION SCOPE
─────────────────────────────────────────────────────────────────────────
Generations are scoped to: TENANT + PROJECT + TASK
Key structure: /{tenant_id}/{project_id}/claims/{task_id}
This means:
├─ Tenant A's task-101 generation is independent of Tenant B's task-101
├─ Project X's task-101 is independent of Project Y's task-101
└─ No cross-tenant generation conflicts possible
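The key structure above can be produced by a small helper. `claim_key` is a hypothetical function, and rejecting empty components or components containing `/` is an added assumption intended to keep one tenant's identifiers from spilling into another's key space.

```python
def claim_key(tenant_id: str, project_id: str, task_id: str) -> str:
    """Build /{tenant_id}/{project_id}/claims/{task_id}."""
    parts = (("tenant_id", tenant_id), ("project_id", project_id), ("task_id", task_id))
    for name, value in parts:
        # Assumption: path separators are never valid inside a component.
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return f"/{tenant_id}/{project_id}/claims/{task_id}"
```

Two tenants with the same project and task names still get distinct keys, so their generation counters never interact.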
7.2 Resource Limits
┌─────────────────────────────────────────────────────────────────────────┐
│ CONFIGURABLE LIMITS PER TENANT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Limit │ Default │ Enterprise │ Rationale │
│ ──────────────────────────────────────────────────────────────────────│
│ max_concurrent_claims │ 100 │ 1000 │ Prevent runaway │
│ max_lease_duration_seconds │ 3600 │ 7200 │ Resource cleanup│
│ min_lease_duration_seconds │ 30 │ 30 │ Prevent thrash │
│ max_generations_per_task │ 100 │ 1000 │ History growth │
│ claim_history_retention_days │ 90 │ 365 │ Compliance │
│ max_result_size_bytes │ 10MB │ 100MB │ Storage costs │
│ max_active_sessions │ 50 │ 500 │ Fair usage │
│ │
└─────────────────────────────────────────────────────────────────────────┘
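A coordinator could apply these limits when a claim is requested. The sketch below covers three of the limits from the table; `TenantLimits`, `effective_lease_seconds`, and `may_claim` are hypothetical names, and clamping an out-of-range lease request (rather than rejecting it) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantLimits:
    min_lease_duration_seconds: int = 30
    max_lease_duration_seconds: int = 3600
    max_concurrent_claims: int = 100


DEFAULT_LIMITS = TenantLimits()
ENTERPRISE_LIMITS = TenantLimits(
    max_lease_duration_seconds=7200, max_concurrent_claims=1000
)


def effective_lease_seconds(requested: int, limits: TenantLimits) -> int:
    """Clamp a requested lease duration into the tenant's allowed range."""
    return max(
        limits.min_lease_duration_seconds,
        min(requested, limits.max_lease_duration_seconds),
    )


def may_claim(active_claims: int, limits: TenantLimits) -> bool:
    """True if the tenant is still below its concurrent-claim ceiling."""
    return active_claims < limits.max_concurrent_claims
```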
8. Observability
8.1 Metrics
COUNTER METRICS
─────────────────────────────────────────────────────────────────────────
coditect_claims_total{tenant, project, result}
result: granted | denied_active | denied_completed
coditect_results_total{tenant, project, result}
result: accepted | stale_generation | session_mismatch | no_claim
coditect_lease_renewals_total{tenant, project, result}
result: renewed | failed
coditect_lease_expirations_total{tenant, project}
coditect_generation_increments_total{tenant, project}
GAUGE METRICS
─────────────────────────────────────────────────────────────────────────
coditect_active_claims{tenant, project}
coditect_pending_tasks{tenant, project}
coditect_completed_tasks{tenant, project}
HISTOGRAM METRICS
─────────────────────────────────────────────────────────────────────────
coditect_claim_duration_seconds{tenant, project}
Buckets: 1, 5, 15, 60, 300, 900, 3600
coditect_task_completion_time_seconds{tenant, project}
Buckets: 10, 60, 300, 900, 3600, 7200
coditect_generations_per_task{tenant, project}
Buckets: 1, 2, 3, 5, 10, 20, 50
8.2 Audit Events
{
"event_type": "CLAIM_ACQUIRED",
"timestamp": "2026-01-04T06:30:00Z",
"tenant_id": "acme-corp",
"project_id": "proj-123",
"task_id": "task-456",
"generation": 3,
"session_id": "sess-789",
"agent_id": "agent-abc",
"previous_generation": 2,
"previous_state": "EXPIRED",
"lease_duration_seconds": 300,
"metadata": {
"user_id": "user-alice",
"client_ip": "192.168.1.100"
}
}
{
"event_type": "RESULT_REJECTED",
"timestamp": "2026-01-04T06:35:00Z",
"tenant_id": "acme-corp",
"project_id": "proj-123",
"task_id": "task-456",
"submitted_generation": 2,
"current_generation": 3,
"rejection_reason": "STALE_GENERATION",
"session_id": "sess-old",
"agent_id": "agent-xyz",
"work_lost": true,
"metadata": {
"result_size_bytes": 15234,
"processing_time_seconds": 45
}
}
9. Compliance and Audit Trail
9.1 Regulatory Requirements
For healthcare and fintech deployments:
AUDIT REQUIREMENTS MATRIX
─────────────────────────────────────────────────────────────────────────
Requirement │ HIPAA │ SOC2 │ PCI-DSS │ Implementation
─────────────────────────────────────────────────────────────────────────
Immutable audit log │ ✓ │ ✓ │ ✓ │ Append-only history
User attribution │ ✓ │ ✓ │ ✓ │ Session→User mapping
Timestamp integrity │ ✓ │ ✓ │ ✓ │ Server-side UTC
Data retention │ 6 yr │ 1 yr │ 1 yr │ Configurable
Access logging │ ✓ │ ✓ │ ✓ │ All operations logged
Change tracking │ ✓ │ ✓ │ ✓ │ Generation history
Non-repudiation │ ✓ │ ✓ │ ✓ │ Session signatures
─────────────────────────────────────────────────────────────────────────
9.2 Generation History as Audit Trail
The generation history provides complete lineage:
TASK LINEAGE QUERY
─────────────────────────────────────────────────────────────────────────
For task "task-456" in project "proj-123":
Generation │ Session │ Agent │ User │ Acquired │ Released │ Result
───────────────────────────────────────────────────────────────────────────────────────────────
1 │ sess-001 │ agent-A │ Alice │ 2026-01-04 06:00 │ 2026-01-04 06:05 │ EXPIRED
2 │ sess-002 │ agent-B │ Bob │ 2026-01-04 06:06 │ 2026-01-04 06:08 │ EXPIRED
3 │ sess-003 │ agent-C │ Alice │ 2026-01-04 06:09 │ 2026-01-04 06:15 │ ACCEPTED
───────────────────────────────────────────────────────────────────────────────────────────────
Rejected submissions:
├─ sess-001/agent-A submitted at 06:20 → REJECTED (gen 1 < current 3)
└─ sess-002/agent-B submitted at 06:18 → REJECTED (gen 2 < current 3)
10. Performance Considerations
10.1 Scalability Targets
TARGET METRICS
─────────────────────────────────────────────────────────────────────────
Concurrent claims per tenant: 10,000
Claims per second (platform): 50,000
Result submissions per second: 10,000
P99 claim latency: < 50ms
P99 submission latency: < 100ms
History query latency: < 200ms
10.2 Optimization Strategies
1. CLAIM CACHING
├─ Cache active claims in memory
├─ Invalidate on state change
└─ Reduces database reads for renewals
2. BATCH OPERATIONS
├─ Batch claim acquisitions for subtasks
├─ Batch history writes
└─ Reduces round trips
3. PARTITIONING
├─ Partition by tenant_id
├─ Secondary partition by project_id
└─ Enables horizontal scaling
4. LEASE TIMING OPTIMIZATION
├─ Renewal at 40% of lease duration
├─ Jitter to prevent thundering herd
└─ Adaptive based on task complexity
5. ASYNC HISTORY WRITES
├─ Synchronous: claim state change
├─ Asynchronous: detailed history
└─ Eventual consistency acceptable for history
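For the lease timing strategy (item 4), the renewal delay can be computed as a fraction of the lease with random jitter applied. This is a minimal sketch: `next_renewal_delay` is a hypothetical helper, the 40% fraction comes from the list above, and the symmetric ±10% jitter width is an assumption.

```python
import random
from typing import Optional


def next_renewal_delay(
    lease_seconds: float,
    fraction: float = 0.4,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait after a claim or renewal before renewing again."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    source = rng if rng is not None else random
    base = lease_seconds * fraction
    # Spread renewals so agents that claimed together do not renew together.
    return base * source.uniform(1 - jitter, 1 + jitter)
```

For the default 300-second lease this yields delays of roughly 108 to 132 seconds, leaving room for at least one retry before the lease runs out.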
11. Implementation Phases
Phase 1: Core (Week 1-2)
- Domain models (TaskGeneration, TaskClaim, TaskResult)
- Repository interface definition
- In-memory repository for testing
- Basic coordinator with claim/submit/renew
Phase 2: Persistence (Week 3-4)
- PostgreSQL repository implementation
- Transaction handling
- Migration scripts
- Integration tests
Phase 3: Agent Integration (Week 5-6)
- Agent task runner with generation awareness
- Lease renewal background task
- Error handling and retry logic
- Agent SDK
Phase 4: Observability (Week 7-8)
- Metrics instrumentation
- Audit event emission
- Dashboard templates
- Alerting rules
Phase 5: Production Hardening (Week 9-10)
- Load testing
- Chaos testing (lease expiration, network partition)
- Performance optimization
- Documentation
12. References
- Joshi, Unmesh. Patterns of Distributed Systems. Addison-Wesley, 2024. Chapter 9: Generation Clock.
- Ongaro, Diego. Consensus: Bridging Theory and Practice. Stanford PhD Thesis, 2014.
- Lamport, Leslie. "Time, Clocks, and the Ordering of Events in a Distributed System." Communications of the ACM, 1978.