ADR-LMS-007: Interactive Assessment Engine
Status: Proposed
Date: 2025-12-11
Phase: Phase 2 - Core LMS Infrastructure
Deciders: Hal Casteel (Founder/CEO/CTO), CODITECT Core Team
Technical Story: Enable comprehensive interactive assessments with multiple question types, adaptive difficulty, and automated grading for CODITECT certification
Context and Problem Statement
The current CODITECT training uses static markdown-based quizzes:
### Question 1
What is the correct Task Tool Pattern?
- A) Task(subagent_type="agent-name"...)
- B) Task(subagent_type="general-purpose"...) ← Correct
- C) /agent-name
This approach has limitations:
- Static Format - Questions in markdown, manually graded
- No Randomization - Same questions in same order
- No Adaptive Difficulty - All users get same questions
- Limited Question Types - Only multiple choice
- No Code Execution - Cannot test actual coding ability
- No Time Tracking - No proctoring capabilities
- Manual Grading - Short answers require human review
The Problem: How do we create an interactive assessment engine that supports multiple question types, adaptive difficulty, automated grading, and secure proctoring?
Decision Drivers
Technical Requirements
- R1: Multiple question types (MCQ, true/false, short answer, code execution)
- R2: Question randomization and pooling
- R3: Adaptive difficulty based on user performance
- R4: Automated grading with rubrics
- R5: Code execution sandbox for practical tests
- R6: Time limits and attempt restrictions
- R7: Partial credit support
User Experience Goals
- UX1: CLI-friendly quiz interface
- UX2: Immediate feedback on answers
- UX3: Progress saving (resume later)
- UX4: Detailed score breakdown
- UX5: Remediation suggestions
Security Requirements
- S1: Question pool randomization
- S2: Time-based session tokens
- S3: Answer submission integrity
- S4: Anti-cheating measures for proctored exams
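One way S2 and S3 could be realized is with HMAC-signed, expiring session tokens. This is a minimal sketch, not a decision of this ADR; the token format, the TTL default, and `SECRET_KEY` handling are illustrative assumptions:

```python
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"replace-with-server-secret"  # assumption: loaded from server config in practice

def issue_session_token(attempt_id: str, ttl_seconds: int = 3600) -> str:
    """Bind a session to an attempt with an HMAC-signed, expiring token."""
    payload = json.dumps({"attempt_id": attempt_id, "exp": int(time.time()) + ttl_seconds})
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload.encode().hex()}.{sig}"

def verify_session_token(token: str) -> Optional[dict]:
    """Return the payload if the signature checks out and the token is unexpired."""
    try:
        payload_hex, sig = token.rsplit(".", 1)
        payload = bytes.fromhex(payload_hex).decode()
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    data = json.loads(payload)
    return data if data["exp"] >= time.time() else None
```

A tampered or expired token verifies to `None`, so resumption (UX3) and submission integrity (S3) can share the same check.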
Decision Outcome
Chosen Solution: Implement a comprehensive assessment engine with multiple question types, item response theory (IRT) for adaptive testing, sandboxed code execution, and automated and AI-assisted grading.
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ Assessment Engine Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Question Bank │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ MCQ │ │ True/ │ │ Short │ │ Code │ │ │
│ │ │ │ │ False │ │ Answer │ │ Exec │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Matching│ │ Essay │ │ Fill-in │ │ Ordering│ │ │
│ │ │ │ │ │ │ Blank │ │ │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Adaptive Selection │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ IRT │───▶│ Question │ │ │
│ │ │ Algorithm │ │ Selector │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Grading Engine │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────────┐ │ │
│ │ │ Exact │ │ Rubric │ │ AI-Assisted │ │ │
│ │ │ Match │ │ Grading │ │ (LLM) │ │ │
│ │ └────────────┘ └────────────┘ └────────────────┘ │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Code │ │ Partial │ │ │
│ │ │ Runner │ │ Credit │ │ │
│ │ └────────────┘ └────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Implementation Details
1. Database Schema
-- Question bank
CREATE TABLE assessment_questions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
question_id TEXT UNIQUE NOT NULL, -- UUID
question_type TEXT NOT NULL, -- mcq, true_false, short_answer, code, matching, essay, fill_blank, ordering
-- Content
question_text TEXT NOT NULL,
question_html TEXT, -- Rich text version
code_snippet TEXT, -- For code questions
code_language TEXT, -- python, javascript, bash, etc.
-- Answer configuration
options TEXT, -- JSON array for MCQ/matching
correct_answer TEXT NOT NULL, -- Answer or JSON for complex types
answer_explanation TEXT, -- Shown after submission
-- Grading
grading_type TEXT DEFAULT 'exact', -- exact, rubric, ai_assisted, code_execution
grading_rubric TEXT, -- JSON rubric for essays/short answer
partial_credit BOOLEAN DEFAULT 0,
max_points INTEGER DEFAULT 1,
-- IRT parameters (Item Response Theory)
difficulty REAL DEFAULT 0.5, -- 0.0-1.0 (0=easy, 1=hard)
discrimination REAL DEFAULT 1.0, -- How well it differentiates ability
guessing_param REAL DEFAULT 0.25, -- Probability of guessing correctly
-- Metadata
skill_id INTEGER, -- Associated skill
module_id INTEGER, -- Associated module
tags TEXT, -- JSON array
-- Statistics
times_shown INTEGER DEFAULT 0,
times_correct INTEGER DEFAULT 0,
avg_time_seconds REAL,
-- Status
is_active BOOLEAN DEFAULT 1,
created_by TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (skill_id) REFERENCES learning_skills(id) ON DELETE SET NULL,
FOREIGN KEY (module_id) REFERENCES learning_modules(id) ON DELETE SET NULL
);
CREATE INDEX idx_questions_type ON assessment_questions(question_type);
CREATE INDEX idx_questions_skill ON assessment_questions(skill_id);
CREATE INDEX idx_questions_difficulty ON assessment_questions(difficulty);
-- Assessment definitions
CREATE TABLE assessments (
id INTEGER PRIMARY KEY AUTOINCREMENT,
assessment_id TEXT UNIQUE NOT NULL, -- UUID
assessment_type TEXT NOT NULL, -- quiz, exam, practice, certification
-- Configuration
title TEXT NOT NULL,
description TEXT,
instructions TEXT,
-- Question selection
question_pool TEXT, -- JSON: {skill_ids: [], module_ids: [], tags: [], count: 20}
fixed_questions TEXT, -- JSON array of specific question IDs
total_questions INTEGER NOT NULL,
randomize_questions BOOLEAN DEFAULT 1,
randomize_options BOOLEAN DEFAULT 1,
-- Timing
time_limit_minutes INTEGER,
show_time_remaining BOOLEAN DEFAULT 1,
auto_submit_on_timeout BOOLEAN DEFAULT 1,
-- Attempts
max_attempts INTEGER, -- NULL = unlimited
cooldown_hours INTEGER DEFAULT 24, -- Wait time between attempts
-- Scoring
passing_score INTEGER DEFAULT 70,
show_score_immediately BOOLEAN DEFAULT 1,
show_answers_after BOOLEAN DEFAULT 1,
show_explanations BOOLEAN DEFAULT 1,
-- Adaptive testing
is_adaptive BOOLEAN DEFAULT 0,
starting_difficulty REAL DEFAULT 0.5,
-- Associated content
learning_path_id INTEGER,
module_id INTEGER,
cert_id INTEGER, -- Required for certification
-- Status
is_published BOOLEAN DEFAULT 0,
published_at TEXT,
is_active BOOLEAN DEFAULT 1,
created_by TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (learning_path_id) REFERENCES learning_paths(id) ON DELETE SET NULL,
FOREIGN KEY (module_id) REFERENCES learning_modules(id) ON DELETE SET NULL,
FOREIGN KEY (cert_id) REFERENCES cert_definitions(id) ON DELETE SET NULL
);
-- Assessment attempts
CREATE TABLE assessment_attempts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
attempt_id TEXT UNIQUE NOT NULL, -- UUID
assessment_id INTEGER NOT NULL,
user_id TEXT NOT NULL,
-- Progress
status TEXT DEFAULT 'in_progress', -- in_progress, submitted, graded, expired
current_question INTEGER DEFAULT 0,
questions_answered INTEGER DEFAULT 0,
-- Timing
started_at TEXT DEFAULT CURRENT_TIMESTAMP,
submitted_at TEXT,
time_spent_seconds INTEGER DEFAULT 0,
-- Results
score_raw INTEGER,
score_max INTEGER,
score_percentage REAL,
passed BOOLEAN,
grade TEXT, -- A, B, C, D, F or custom
-- Question sequence (randomized)
question_sequence TEXT, -- JSON array of question IDs in order
-- Detailed answers
answers TEXT, -- JSON: {question_id: {answer, is_correct, points, time_seconds}}
-- Feedback
feedback TEXT,
grader_notes TEXT,
graded_by TEXT, -- 'auto' or user_id for manual
graded_at TEXT,
-- Session security
session_token TEXT, -- For resumption
ip_address TEXT,
user_agent TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (assessment_id) REFERENCES assessments(id) ON DELETE CASCADE,
FOREIGN KEY (user_id) REFERENCES auth_users(user_id) ON DELETE CASCADE
);
CREATE INDEX idx_attempts_user ON assessment_attempts(user_id);
CREATE INDEX idx_attempts_assessment ON assessment_attempts(assessment_id);
CREATE INDEX idx_attempts_status ON assessment_attempts(status);
-- Answer submissions (per question)
CREATE TABLE assessment_answers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
attempt_id INTEGER NOT NULL,
question_id INTEGER NOT NULL,
-- Answer data
answer_value TEXT, -- User's answer
answer_json TEXT, -- Complex answer (matching, ordering)
-- Grading
is_correct BOOLEAN,
points_earned REAL DEFAULT 0,
points_possible REAL DEFAULT 1,
-- Timing
started_at TEXT,
answered_at TEXT,
time_spent_seconds INTEGER,
-- Code execution (if applicable)
code_output TEXT,
code_error TEXT,
test_results TEXT, -- JSON: {passed: 5, failed: 1, tests: [...]}
-- Grading details
grading_method TEXT, -- auto, manual, ai
grader_feedback TEXT,
rubric_scores TEXT, -- JSON: {criterion: score}
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (attempt_id) REFERENCES assessment_attempts(id) ON DELETE CASCADE,
FOREIGN KEY (question_id) REFERENCES assessment_questions(id) ON DELETE CASCADE,
UNIQUE(attempt_id, question_id)
);
CREATE INDEX idx_answers_attempt ON assessment_answers(attempt_id);
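The `max_attempts` and `cooldown_hours` columns imply an eligibility check before a new attempt row is created. A sketch against the schema above; the error messages are illustrative, and the timestamp comparison assumes `started_at` is stored in the same timezone the server clock uses:

```python
import sqlite3
from datetime import datetime, timedelta

def can_start_attempt(conn: sqlite3.Connection, assessment_id: int, user_id: str):
    """Enforce max_attempts and cooldown_hours before creating a new attempt."""
    row = conn.execute(
        "SELECT max_attempts, cooldown_hours FROM assessments WHERE id = ?",
        (assessment_id,),
    ).fetchone()
    if row is None:
        return False, "Assessment not found"
    max_attempts, cooldown_hours = row
    count, last_started = conn.execute(
        "SELECT COUNT(*), MAX(started_at) FROM assessment_attempts "
        "WHERE assessment_id = ? AND user_id = ?",
        (assessment_id, user_id),
    ).fetchone()
    if max_attempts is not None and count >= max_attempts:
        return False, "Attempt limit reached"
    if last_started and cooldown_hours:
        # Compare in the same timezone your timestamps are stored in
        next_allowed = datetime.fromisoformat(last_started) + timedelta(hours=cooldown_hours)
        if datetime.now() < next_allowed:
            return False, f"Cooldown active until {next_allowed.isoformat()}"
    return True, "OK"
```

Because `max_attempts` is NULL for unlimited assessments, the `is not None` guard matters: `NULL` must not be treated as zero.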
2. Question Types Implementation
import json
import subprocess
import uuid
from enum import Enum
from typing import Any, Dict, List, Optional
from dataclasses import dataclass
class QuestionType(Enum):
MCQ = "mcq" # Multiple choice (single answer)
MCQ_MULTI = "mcq_multi" # Multiple choice (multiple answers)
TRUE_FALSE = "true_false"
SHORT_ANSWER = "short_answer"
CODE = "code"
MATCHING = "matching"
ORDERING = "ordering"
FILL_BLANK = "fill_blank"
ESSAY = "essay"
@dataclass
class GradingResult:
is_correct: bool
points_earned: float
points_possible: float
feedback: str
rubric_scores: Optional[Dict] = None
def grade_answer(question: dict, user_answer: Any) -> GradingResult:
"""
Grade a user's answer based on question type.
"""
q_type = QuestionType(question['question_type'])
graders = {
QuestionType.MCQ: grade_mcq,
QuestionType.MCQ_MULTI: grade_mcq_multi,
QuestionType.TRUE_FALSE: grade_true_false,
QuestionType.SHORT_ANSWER: grade_short_answer,
QuestionType.CODE: grade_code,
QuestionType.MATCHING: grade_matching,
QuestionType.ORDERING: grade_ordering,
QuestionType.FILL_BLANK: grade_fill_blank,
QuestionType.ESSAY: grade_essay,
}
grader = graders.get(q_type)
if not grader:
raise ValueError(f"Unknown question type: {q_type}")
return grader(question, user_answer)
def grade_mcq(question: dict, user_answer: str) -> GradingResult:
"""Grade multiple choice question (single answer)."""
correct = question['correct_answer']
is_correct = user_answer.strip().lower() == correct.strip().lower()
return GradingResult(
is_correct=is_correct,
points_earned=question['max_points'] if is_correct else 0,
points_possible=question['max_points'],
        feedback="Correct!" if is_correct else (question.get('answer_explanation') or "Incorrect.")
)
def grade_mcq_multi(question: dict, user_answer: List[str]) -> GradingResult:
"""Grade multiple choice with multiple correct answers (partial credit)."""
correct_answers = set(json.loads(question['correct_answer']))
user_answers = set(user_answer)
# Calculate partial credit
correct_selected = len(correct_answers & user_answers)
incorrect_selected = len(user_answers - correct_answers)
total_correct = len(correct_answers)
# Points = (correct - incorrect) / total, min 0
points_ratio = max(0, (correct_selected - incorrect_selected) / total_correct)
points_earned = points_ratio * question['max_points']
is_correct = user_answers == correct_answers
return GradingResult(
is_correct=is_correct,
points_earned=points_earned,
points_possible=question['max_points'],
feedback=f"You selected {correct_selected}/{total_correct} correct answers."
)
def grade_short_answer(question: dict, user_answer: str) -> GradingResult:
"""Grade short answer with fuzzy matching or AI."""
correct = question['correct_answer']
grading_type = question.get('grading_type', 'exact')
if grading_type == 'exact':
# Exact match (case-insensitive, trimmed)
is_correct = user_answer.strip().lower() == correct.strip().lower()
points = question['max_points'] if is_correct else 0
elif grading_type == 'contains':
# Check if answer contains key terms
key_terms = json.loads(correct) # List of required terms
found_terms = sum(1 for term in key_terms if term.lower() in user_answer.lower())
points = (found_terms / len(key_terms)) * question['max_points']
is_correct = found_terms == len(key_terms)
    elif grading_type == 'ai_assisted':
        # Use LLM for grading
        return grade_with_ai(question, user_answer)
    else:
        raise ValueError(f"Unsupported grading_type: {grading_type}")
    return GradingResult(
        is_correct=is_correct,
        points_earned=points,
        points_possible=question['max_points'],
        feedback="Correct!" if is_correct else (question.get('answer_explanation') or "Incorrect.")
    )
def grade_code(question: dict, user_code: str) -> GradingResult:
"""Grade code question with sandbox execution."""
language = question.get('code_language', 'python')
test_cases = json.loads(question.get('correct_answer', '[]'))
# Execute code in sandbox
results = execute_code_sandboxed(user_code, language, test_cases)
passed = sum(1 for r in results if r['passed'])
total = len(results)
points = (passed / total) * question['max_points'] if total > 0 else 0
is_correct = passed == total
return GradingResult(
is_correct=is_correct,
points_earned=points,
points_possible=question['max_points'],
feedback=f"Passed {passed}/{total} test cases.",
rubric_scores={'test_results': results}
)
def execute_code_sandboxed(code: str, language: str, test_cases: List[dict]) -> List[dict]:
"""
Execute code in a sandboxed environment and run test cases.
Uses Docker containers for isolation.
"""
results = []
for test in test_cases:
try:
# Build Docker command
container = f"code-runner-{language}"
timeout = test.get('timeout', 5)
            # Write code (plus any test input) to a temp file
            code_file = f"/tmp/code_{uuid.uuid4()}.{language}"
            with open(code_file, 'w') as f:
                f.write(code)
                if test.get('input'):
                    f.write(f"\n\n# Test input\n{test['input']}")
            # Run in Docker; select the runner command for the language
            runners = {
                'python': ['python', '/code/main.py'],
                'javascript': ['node', '/code/main.js'],
                'bash': ['bash', '/code/main.sh'],
            }
            runner = runners.get(language, runners['python'])
            result = subprocess.run(
                [
                    'docker', 'run', '--rm',
                    '--memory=128m', '--cpus=0.5',
                    '--network=none',  # No network access
                    '-v', f"{code_file}:{runner[1]}:ro",  # Mount path must match the runner
                    container,
                ] + runner,
                capture_output=True,
                timeout=timeout,
                text=True
            )
actual_output = result.stdout.strip()
expected_output = test['expected_output'].strip()
results.append({
'name': test.get('name', f'Test {len(results)+1}'),
'passed': actual_output == expected_output,
'expected': expected_output,
'actual': actual_output,
'error': result.stderr if result.returncode != 0 else None
})
except subprocess.TimeoutExpired:
results.append({
'name': test.get('name', f'Test {len(results)+1}'),
'passed': False,
'error': 'Timeout exceeded'
})
except Exception as e:
results.append({
'name': test.get('name', f'Test {len(results)+1}'),
'passed': False,
'error': str(e)
})
return results
def grade_with_ai(question: dict, user_answer: str) -> GradingResult:
"""Use LLM for grading essays and complex short answers."""
rubric = json.loads(question.get('grading_rubric', '{}'))
prompt = f"""You are grading a student's answer. Be fair but rigorous.
Question: {question['question_text']}
Model Answer / Key Points: {question['correct_answer']}
Grading Rubric:
{json.dumps(rubric, indent=2)}
Student's Answer:
{user_answer}
Grade this answer. For each rubric criterion, assign a score from 0-{rubric.get('max_per_criterion', 5)}.
Then provide overall feedback.
Respond in JSON format:
{{
"rubric_scores": {{"criterion_name": score, ...}},
"total_points": <sum of scores>,
"max_points": {question['max_points']},
"feedback": "specific feedback for the student",
"strengths": ["strength 1", ...],
"areas_for_improvement": ["area 1", ...]
}}"""
response = call_llm(prompt, model="claude-3-haiku-20240307")
result = json.loads(response)
return GradingResult(
is_correct=result['total_points'] >= question['max_points'] * 0.7,
points_earned=result['total_points'],
points_possible=question['max_points'],
feedback=result['feedback'],
rubric_scores=result['rubric_scores']
)
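The partial-credit rule in grade_mcq_multi, points = max(0, (right picks - wrong picks) / total right) * max_points, is worth a worked example. A standalone restatement of that formula:

```python
def partial_credit(correct: set, selected: set, max_points: float) -> float:
    """Points = max(0, (right picks - wrong picks) / total right) * max_points."""
    ratio = max(0, (len(correct & selected) - len(selected - correct)) / len(correct))
    return ratio * max_points

print(partial_credit({"A", "C"}, {"A"}, 2))       # 1.0: half the answers, no wrong picks
print(partial_credit({"A", "C"}, {"A", "B"}, 2))  # 0: the wrong pick cancels the right one
print(partial_credit({"A", "C"}, {"A", "C"}, 2))  # 2.0: full credit
```

Note the floor at zero: a learner who selects everything scores no worse than one who selects nothing, which discourages shotgun guessing without going negative.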
3. Adaptive Testing (IRT)
import numpy as np
from scipy.optimize import minimize_scalar
def select_next_question_adaptive(
user_id: str,
assessment_id: int,
answered_questions: List[int],
user_responses: List[bool]
) -> int:
"""
Select next question using Item Response Theory (3PL model).
Maximizes information at current ability estimate.
"""
# Estimate current ability
ability = estimate_ability(answered_questions, user_responses)
# Get available questions
available = get_available_questions(assessment_id, exclude=answered_questions)
# Calculate information for each question at current ability
best_question = None
max_information = -float('inf')
for q in available:
info = calculate_item_information(
ability,
difficulty=q['difficulty'],
discrimination=q['discrimination'],
guessing=q['guessing_param']
)
        if info > max_information:
            max_information = info
            best_question = q
    if best_question is None:
        raise ValueError("No unanswered questions remain in the pool")
    return best_question['id']
def estimate_ability(questions: List[int], responses: List[bool]) -> float:
"""
Estimate ability using Maximum Likelihood Estimation (MLE).
"""
if not responses:
return 0.0 # Default to average ability
# Get question parameters
params = []
for q_id in questions:
q = get_question(q_id)
params.append({
'a': q['discrimination'],
'b': q['difficulty'],
'c': q['guessing_param']
})
def neg_log_likelihood(theta):
"""Negative log likelihood for MLE."""
ll = 0
        for p, r in zip(params, responses):
            prob = calculate_probability(theta, p['a'], p['b'], p['c'])
            if r:  # Correct response
                ll += np.log(max(prob, 1e-10))
            else:  # Incorrect response
                ll += np.log(max(1 - prob, 1e-10))
return -ll
# Find theta that maximizes likelihood
result = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method='bounded')
return result.x
def calculate_probability(theta: float, a: float, b: float, c: float) -> float:
"""
Calculate probability of correct response using 3PL model.
P(θ) = c + (1-c) / (1 + exp(-a(θ-b)))
theta: ability level
a: discrimination parameter
b: difficulty parameter
c: guessing parameter
"""
exponent = -a * (theta - b)
return c + (1 - c) / (1 + np.exp(exponent))
def calculate_item_information(theta: float, difficulty: float,
                               discrimination: float, guessing: float) -> float:
    """
    Calculate item information at a given ability level (standard 3PL form).
    I(θ) = a² * (Q/P) * ((P-c) / (1-c))²
    where P = probability of correct, Q = 1-P
    """
    P = calculate_probability(theta, discrimination, difficulty, guessing)
    Q = 1 - P
    numerator = (discrimination ** 2) * Q * ((P - guessing) ** 2)
    denominator = ((1 - guessing) ** 2) * P
    return numerator / denominator if denominator > 0 else 0
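To see why maximum-information selection tracks the learner's ability estimate, here is a standalone sketch using the standard 3PL information function I(θ) = a²(Q/P)((P-c)/(1-c))²; the parameter values are illustrative only:

```python
import math

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response (same form as calculate_probability)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Standard 3PL Fisher information: I(θ) = a² (Q/P) ((P-c)/(1-c))²."""
    P = p_correct(theta, a, b, c)
    return (a ** 2) * ((1 - P) / P) * ((P - c) / (1 - c)) ** 2

# A learner estimated at theta = 0.0 picks among three items with equal a and c;
# the item whose difficulty sits nearest the ability estimate carries the most information.
infos = [item_information(0.0, a=1.0, b=b, c=0.25) for b in (-1.5, 0.1, 1.5)]
best = max(range(3), key=lambda i: infos[i])  # index 1, the b = 0.1 item
```

Items far easier or far harder than the learner's current estimate contribute little, which is exactly why adaptive tests converge with fewer questions than fixed forms.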
4. CLI Commands
# Take assessments
/quiz list # Available quizzes
/quiz start ASSESSMENT_ID # Start a quiz
/quiz resume ATTEMPT_ID # Resume in-progress
/quiz submit ATTEMPT_ID # Submit for grading
# During quiz
/quiz answer OPTION # Answer current question
/quiz skip # Skip to next question
/quiz review # Review answered questions
/quiz time # Show remaining time
/quiz progress # Show progress
# Results
/quiz results ATTEMPT_ID # View detailed results
/quiz history # Past quiz attempts
/quiz analytics # Performance analytics
# Practice mode
/quiz practice --skill SKILL_ID # Practice questions for a skill
/quiz practice --adaptive # Adaptive practice session
# Admin/Instructor
/quiz create --title "Title" --config config.json
/quiz questions add ASSESSMENT_ID # Add questions
/quiz questions import FILE.json # Bulk import
/quiz publish ASSESSMENT_ID
/quiz results-export ASSESSMENT_ID # Export all results
Question Import Format
{
"questions": [
{
"type": "mcq",
"text": "What is the correct Task Tool Pattern for invoking agents?",
"options": [
{"key": "A", "text": "Task(subagent_type=\"agent-name\", ...)"},
{"key": "B", "text": "Task(subagent_type=\"general-purpose\", prompt=\"Use agent-name subagent to...\")"},
{"key": "C", "text": "/agent-name [prompt]"},
{"key": "D", "text": "agent-name: [prompt]"}
],
"correct_answer": "B",
"explanation": "The Task Tool Pattern requires subagent_type='general-purpose' with a prompt that includes 'Use [agent-name] subagent to...'",
"difficulty": 0.3,
"skill": "task-tool-pattern",
"tags": ["foundation", "agent-invocation"]
},
{
"type": "code",
"text": "Write a Task Tool invocation that uses the competitive-market-analyst agent to research the AI IDE market.",
"language": "python",
"test_cases": [
{
"name": "Contains Task",
"check": "contains",
"expected": "Task("
},
{
"name": "Correct subagent_type",
"check": "contains",
"expected": "subagent_type=\"general-purpose\""
},
{
"name": "Contains agent name",
"check": "contains",
"expected": "competitive-market-analyst"
}
],
"difficulty": 0.5,
"skill": "agent-invocation"
}
]
}
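An importer behind /quiz questions import would validate this shape before inserting rows. A minimal sketch; the per-type required-field sets are assumptions inferred from the example above, not a complete specification:

```python
from typing import List

# Assumed required fields per question type, inferred from the import example
REQUIRED_BY_TYPE = {
    "mcq": {"text", "options", "correct_answer"},
    "true_false": {"text", "correct_answer"},
    "code": {"text", "language", "test_cases"},
}

def validate_import(payload: dict) -> List[str]:
    """Return human-readable problems; an empty list means the file is importable."""
    questions = payload.get("questions")
    if not isinstance(questions, list) or not questions:
        return ["'questions' must be a non-empty array"]
    errors = []
    for i, q in enumerate(questions):
        required = REQUIRED_BY_TYPE.get(q.get("type"))
        if required is None:
            errors.append(f"question {i}: unknown type {q.get('type')!r}")
            continue
        for field in sorted(required - q.keys()):
            errors.append(f"question {i}: missing field {field!r}")
        if q.get("type") == "mcq" and "options" in q:
            keys = {o.get("key") for o in q["options"]}
            if q.get("correct_answer") not in keys:
                errors.append(f"question {i}: correct_answer not among option keys")
    return errors
```

A caller would pass the parsed file straight in, e.g. `validate_import(json.load(fp))`, and reject the import if the returned list is non-empty.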
Consequences
Positive
- P1: Comprehensive question type support
- P2: Adaptive testing for efficient assessment
- P3: Automated grading reduces manual effort
- P4: Code execution validates practical skills
- P5: Detailed analytics for improvement
Negative
- N1: Code sandbox security complexity
- N2: AI grading incurs LLM API costs
- N3: IRT requires calibrated questions
Risks
- Risk 1: Code execution sandbox escape
- Mitigation: Docker isolation, no network, resource limits
- Risk 2: AI grading inconsistency
- Mitigation: Rubric constraints, human review option
Related Documents
- ADR-031-lms-phase-2.md - Quiz engine overview
- ADR-033-lms-certificates.md - Certification exams
- CODITECT-OPERATOR-ASSESSMENTS.md - Current assessment content
Status: Proposed - Phase 2 Core Infrastructure Last Updated: 2025-12-11 Version: 1.0.0