Assessment Creation Patterns
When to Use This Skill
Use this skill when implementing assessment creation patterns in your codebase: question generation, adaptive difficulty, bias detection, and automated grading.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Level 1: Quick Reference (Under 500 tokens)
Core Assessment Structure
# Bloom's Taxonomy Levels (Low to High)
BLOOMS_LEVELS = {
"remember": 1, # Recall facts
"understand": 2, # Explain concepts
"apply": 3, # Use in new situations
"analyze": 4, # Break down relationships
"evaluate": 5, # Make judgments
"create": 6 # Produce new work
}
# Question Template
{
"id": "q001",
"type": "multiple_choice", # or "short_answer", "coding", "essay"
"difficulty": "medium", # beginner, medium, advanced, expert
"bloom_level": "apply",
"topic": "neural_networks",
"question": "Question text with context",
"options": ["A", "B", "C", "D"],
"correct_answer": "B",
"explanation": "Why B is correct and others are wrong",
"time_limit": 120, # seconds
"points": 10,
"tags": ["supervised_learning", "backpropagation"]
}
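The template above can be enforced with a small validator. This is a minimal sketch; the required-field set mirrors the example and should be adapted to your own schema (`validate_question` is an illustrative name, not part of any library):

```python
# Fields every question in the template above must carry
REQUIRED_FIELDS = {
    "id", "type", "difficulty", "bloom_level", "topic",
    "question", "correct_answer", "explanation", "points",
}

def validate_question(q):
    """Return (ok, missing_fields) for a question dict."""
    missing = REQUIRED_FIELDS - q.keys()
    # Multiple-choice questions additionally need an options list
    if q.get("type") == "multiple_choice" and "options" not in q:
        missing = missing | {"options"}
    return (not missing, sorted(missing))
```

Running this on the example question above returns `(True, [])`; an incomplete dict returns the sorted list of missing fields.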
Difficulty Progression
# Adaptive difficulty scaling
def calculate_next_difficulty(user_performance):
"""
Adjust difficulty based on user accuracy and speed.
Performance bands:
- 90%+ correct, fast → increase 2 levels
- 70-90% correct → increase 1 level
- 50-70% correct → maintain level
- <50% correct → decrease 1 level
"""
accuracy = user_performance["correct"] / user_performance["total"]
avg_time = user_performance["avg_time"]
if accuracy >= 0.9 and avg_time < user_performance["expected_time"]:
return "increase_2"
elif accuracy >= 0.7:
return "increase_1"
elif accuracy >= 0.5:
return "maintain"
else:
return "decrease_1"
Bias Detection Checklist
bias_checks:
language:
- avoid_gendered_pronouns: true
- use_inclusive_examples: true
- check_cultural_assumptions: true
accessibility:
- provide_alt_text_for_images: true
- avoid_color_only_cues: true
- support_screen_readers: true
fairness:
- balanced_topic_coverage: true
- no_trick_questions: true
- clear_success_criteria: true
Level 2: Implementation Details (Under 2000 tokens)
Multi-Format Question Types
1. Multiple Choice (MCQ)
class MultipleChoiceQuestion:
"""Best for: Knowledge recall, concept understanding (Bloom's 1-3)"""
def __init__(self, stem, options, correct_index, distractors):
self.stem = stem
self.options = options # List of 4-5 options
self.correct_index = correct_index
self.distractors = distractors # Common misconceptions
def validate_quality(self):
"""Quality checks for MCQ"""
checks = {
"stem_clear": len(self.stem.split()) >= 10,
"options_homogeneous": self._check_option_length_variance() < 0.3,
"no_all_of_above": "all of the above" not in str(self.options).lower(),
"no_none_of_above": "none of the above" not in str(self.options).lower(),
"distractors_plausible": len(self.distractors) >= 3
}
return all(checks.values()), checks
def _check_option_length_variance(self):
lengths = [len(opt) for opt in self.options]
return (max(lengths) - min(lengths)) / max(lengths)
2. Coding Challenges
class CodingQuestion:
"""Best for: Application, analysis (Bloom's 3-4)"""
def __init__(self, problem, test_cases, starter_code, hints):
self.problem = problem
self.test_cases = test_cases # [{input, expected_output, points}]
self.starter_code = starter_code
self.hints = hints # Progressive hints
def auto_grade(self, submission):
"""Run test cases and calculate score"""
results = []
for i, test in enumerate(self.test_cases):
try:
output = self._execute_code(submission, test["input"])
passed = output == test["expected_output"]
results.append({
"test_id": i,
"passed": passed,
"points": test["points"] if passed else 0,
"feedback": self._generate_feedback(output, test["expected_output"])
})
except Exception as e:
results.append({
"test_id": i,
"passed": False,
"points": 0,
"error": str(e)
})
return {
"total_score": sum(r["points"] for r in results),
"max_score": sum(t["points"] for t in self.test_cases),
"results": results
}
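`_execute_code` is left abstract above. One common approach is to run the submission in a separate process with a timeout; the sketch below assumes a `solve(data)` entry-point convention (an assumption, not part of the class) and passes values across the process boundary as JSON. This is isolation-lite, not a full sandbox — production grading should add resource limits and a restricted environment:

```python
import json
import subprocess
import sys

def run_submission(code, test_input, timeout=5):
    """Run an untrusted submission in a child process with a timeout.

    Assumes the submission defines solve(data). Input and output
    cross the process boundary as JSON.
    """
    harness = (
        code
        + "\nimport sys, json"
        + "\nprint(json.dumps(solve(json.loads(sys.argv[1]))))"
    )
    proc = subprocess.run(
        [sys.executable, "-c", harness, json.dumps(test_input)],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or "submission failed")
    return json.loads(proc.stdout)
```

A timeout also protects the grader from infinite loops in submissions: `subprocess.run` raises `TimeoutExpired`, which the grading loop can record as a failed test.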
3. Essay/Short Answer
class EssayQuestion:
"""Best for: Evaluation, creation (Bloom's 5-6)"""
def __init__(self, prompt, rubric, word_limit):
self.prompt = prompt
self.rubric = rubric # Scoring criteria
self.word_limit = word_limit
def create_rubric(self):
"""
Example rubric structure:
{
"criteria": [
{
"name": "Clarity",
"weight": 0.3,
"levels": {
"exemplary": {"points": 4, "description": "Crystal clear"},
"proficient": {"points": 3, "description": "Mostly clear"},
"developing": {"points": 2, "description": "Somewhat unclear"},
"beginning": {"points": 1, "description": "Very unclear"}
}
},
{
"name": "Evidence",
"weight": 0.4,
"levels": {...}
},
{
"name": "Organization",
"weight": 0.3,
"levels": {...}
}
]
}
"""
return self.rubric
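Applying the weighted rubric can be sketched as follows. `score_essay` and the `level_choices` mapping are illustrative names, not part of the class above; the structure follows the example rubric in the docstring:

```python
def score_essay(rubric, level_choices):
    """Weighted rubric score normalized to [0, 1].

    level_choices maps criterion name -> chosen level name,
    e.g. {"Clarity": "proficient", "Evidence": "exemplary", ...}.
    """
    total = 0.0
    for criterion in rubric["criteria"]:
        levels = criterion["levels"]
        chosen = levels[level_choices[criterion["name"]]]
        # Normalize each criterion by its own maximum point value
        max_points = max(level["points"] for level in levels.values())
        total += criterion["weight"] * chosen["points"] / max_points
    return total
```

Because each criterion is normalized before weighting, criteria with different point scales contribute proportionally to their weights.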
Bloom's Taxonomy Alignment
# Question generation by Bloom's level
def generate_question_by_bloom_level(topic, level, content_context):
"""
Generate questions aligned to Bloom's taxonomy.
"""
bloom_templates = {
"remember": {
"verbs": ["define", "list", "recall", "identify", "name"],
"template": "What is the definition of {concept}?",
"example": "What is the definition of gradient descent in machine learning?"
},
"understand": {
"verbs": ["explain", "describe", "summarize", "interpret", "compare"],
"template": "Explain how {concept} works in the context of {context}.",
"example": "Explain how backpropagation works in neural network training."
},
"apply": {
"verbs": ["apply", "demonstrate", "use", "implement", "solve"],
"template": "Given {scenario}, how would you apply {concept} to solve {problem}?",
"example": "Given a dataset with missing values, how would you apply imputation techniques?"
},
"analyze": {
"verbs": ["analyze", "compare", "contrast", "differentiate", "examine"],
"template": "Compare {concept_a} and {concept_b}. What are the trade-offs?",
"example": "Compare L1 and L2 regularization. What are the trade-offs in model performance?"
},
"evaluate": {
"verbs": ["evaluate", "critique", "judge", "justify", "assess"],
"template": "Evaluate the effectiveness of {approach} for {use_case}. Justify your answer.",
"example": "Evaluate the effectiveness of CNNs vs Transformers for image classification."
},
"create": {
"verbs": ["design", "create", "develop", "construct", "formulate"],
"template": "Design a {solution} that {objective} while considering {constraints}.",
"example": "Design a recommendation system that balances accuracy and diversity."
}
}
template_info = bloom_templates[level]
return {
"level": level,
"verbs": template_info["verbs"],
"template": template_info["template"],
"example": template_info["example"],
"topic": topic
}
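For illustration, a returned template can be filled with `str.format`; the concept names here are arbitrary examples:

```python
# Fill an "analyze"-level template with concrete concepts
template = "Compare {concept_a} and {concept_b}. What are the trade-offs?"
question = template.format(
    concept_a="L1 regularization",
    concept_b="L2 regularization",
)
# question == "Compare L1 regularization and L2 regularization. What are the trade-offs?"
```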
Adaptive Assessment Engine
class AdaptiveAssessment:
"""
Dynamically adjust question difficulty based on user performance.
Implements Item Response Theory (IRT) principles.
"""
def __init__(self, question_bank, starting_difficulty="medium"):
self.question_bank = question_bank
self.current_difficulty = starting_difficulty
self.user_ability = 0.5 # Scale 0-1
self.history = []
def select_next_question(self):
"""
Select question matching user's current ability level.
Uses IRT to maximize information gain.
"""
        # Filter questions near the user's current ability estimate;
        # fall back to the full bank if none are within the window
        candidates = [
            q for q in self.question_bank
            if abs(q.difficulty_score - self.user_ability) < 0.2
        ] or list(self.question_bank)
# Prioritize untested topics
untested_topics = self._get_untested_topics()
preferred = [q for q in candidates if q.topic in untested_topics]
if preferred:
return max(preferred, key=lambda q: q.information_value(self.user_ability))
else:
return max(candidates, key=lambda q: q.information_value(self.user_ability))
def update_ability_estimate(self, question, correct):
"""
Update user ability estimate using Bayesian updating.
"""
# Simple ELO-like update
expected_prob = self._expected_probability(question.difficulty_score)
actual = 1 if correct else 0
K = 0.1 # Learning rate
self.user_ability += K * (actual - expected_prob)
self.user_ability = max(0, min(1, self.user_ability)) # Clamp [0,1]
self.history.append({
"question_id": question.id,
"difficulty": question.difficulty_score,
"correct": correct,
"ability_after": self.user_ability
})
def _expected_probability(self, question_difficulty):
"""Probability user answers correctly (logistic function)"""
import math
return 1 / (1 + math.exp(-3 * (self.user_ability - question_difficulty)))
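A self-contained walk-through of the update rule above (the slope of 3 matches `_expected_probability`, and `K` matches the learning rate in `update_ability_estimate`):

```python
import math

def expected_probability(ability, difficulty, slope=3.0):
    """Logistic probability of a correct answer, as in _expected_probability."""
    return 1 / (1 + math.exp(-slope * (ability - difficulty)))

ability = 0.5
K = 0.1  # Learning rate
# (question difficulty, 1 if answered correctly else 0)
for difficulty, correct in [(0.5, 1), (0.6, 1), (0.7, 0)]:
    ability += K * (correct - expected_probability(ability, difficulty))
    ability = max(0.0, min(1.0, ability))  # Clamp to [0, 1]
# Two correct answers raise the estimate; the miss pulls it back (ends near 0.56)
```

Note the asymmetry: a correct answer on an easy question (high expected probability) moves the estimate only slightly, while a surprise result moves it more.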
Level 3: Complete Reference (Full tokens)
Bias Detection and Mitigation
Automated Bias Checks
import re

class BiasDetector:
"""
Comprehensive bias detection for assessment questions.
"""
def __init__(self):
self.bias_patterns = self._load_bias_patterns()
self.inclusive_language_guide = self._load_inclusive_language()
def analyze_question(self, question_text):
"""Run all bias checks"""
results = {
"language_bias": self._check_language_bias(question_text),
"cultural_bias": self._check_cultural_bias(question_text),
"accessibility": self._check_accessibility(question_text),
"cognitive_load": self._check_cognitive_load(question_text),
"fairness": self._check_fairness(question_text)
}
results["overall_score"] = self._calculate_bias_score(results)
results["recommendations"] = self._generate_recommendations(results)
return results
def _check_language_bias(self, text):
"""Check for gendered language, idioms, jargon"""
issues = []
# Gendered pronouns
gendered_words = ["he", "she", "his", "her", "him", "himself", "herself"]
for word in gendered_words:
if re.search(rf'\b{word}\b', text, re.IGNORECASE):
issues.append({
"type": "gendered_language",
"word": word,
"suggestion": "Use 'they/them' or rephrase"
})
# Idioms that may not translate
idioms = ["piece of cake", "hit the nail on the head", "beat around the bush"]
for idiom in idioms:
if idiom.lower() in text.lower():
issues.append({
"type": "idiom",
"phrase": idiom,
"suggestion": "Use literal language"
})
# Unnecessary jargon
jargon_terms = self._detect_jargon(text)
for term in jargon_terms:
issues.append({
"type": "jargon",
"term": term,
"suggestion": f"Define '{term}' or use simpler language"
})
return {
"passed": len(issues) == 0,
"issues": issues,
"score": 1 - (len(issues) * 0.1) # -10% per issue
}
def _check_cultural_bias(self, text):
"""Check for cultural assumptions"""
issues = []
# Western-centric references
western_holidays = ["Christmas", "Thanksgiving", "Easter"]
for holiday in western_holidays:
if holiday in text:
issues.append({
"type": "cultural_reference",
"reference": holiday,
"suggestion": "Use culturally neutral examples"
})
# Currency assumptions (USD-centric)
if re.search(r'\$\d+', text):
issues.append({
"type": "currency_assumption",
"suggestion": "Specify currency or use generic units"
})
# Date format assumptions (MM/DD vs DD/MM)
if re.search(r'\d{1,2}/\d{1,2}/\d{2,4}', text):
issues.append({
"type": "date_format",
"suggestion": "Use ISO 8601 format (YYYY-MM-DD) or write out month"
})
return {
"passed": len(issues) == 0,
"issues": issues,
"score": 1 - (len(issues) * 0.15)
}
def _check_accessibility(self, text):
"""Check for accessibility issues"""
issues = []
# Images without alt text descriptions
if "<img" in text and "alt=" not in text:
issues.append({
"type": "missing_alt_text",
"suggestion": "Add descriptive alt text for all images"
})
# Color-only cues
color_cues = ["red circle", "green checkmark", "blue line"]
for cue in color_cues:
if cue in text.lower():
issues.append({
"type": "color_dependency",
"cue": cue,
"suggestion": "Add non-color identifiers (shape, label)"
})
# Reading level too high
readability_score = self._calculate_readability(text)
if readability_score > 12: # Above 12th grade level
issues.append({
"type": "high_reading_level",
"score": readability_score,
"suggestion": "Simplify language to 10th grade level or below"
})
return {
"passed": len(issues) == 0,
"issues": issues,
"readability_grade": readability_score,
"score": 1 - (len(issues) * 0.2)
}
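`_calculate_readability` is left abstract above. A rough Flesch-Kincaid grade estimate, using a crude vowel-group syllable heuristic, could look like this (a sketch; production code should use a proper syllable dictionary or an established readability library):

```python
import re

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level for a question stem."""
    sentences = max(1, len(re.findall(r'[.!?]+', text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0

    def syllables(word):
        # Count runs of vowels as syllables; crude but serviceable
        return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / sentences
            + 11.8 * total_syllables / len(words)
            - 15.59)
```

Short, simple sentences score well below the grade-12 threshold used in `_check_accessibility`; jargon-dense single sentences score far above it.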
Assessment Analytics
class AssessmentAnalytics:
"""
Post-assessment analysis for continuous improvement.
"""
def analyze_question_performance(self, question_id, responses):
"""
Calculate question statistics:
- Difficulty index (P-value)
- Discrimination index (point-biserial correlation)
- Distractor analysis
"""
total = len(responses)
correct_count = sum(1 for r in responses if r["correct"])
# Difficulty Index (P-value)
p_value = correct_count / total
difficulty = self._classify_difficulty(p_value)
# Discrimination Index
# Compare top 27% vs bottom 27% of overall performers
        sorted_responses = sorted(responses, key=lambda r: r["user_total_score"], reverse=True)
        k = max(1, int(total * 0.27))  # Guard against tiny response sets
        top_27 = sorted_responses[:k]
        bottom_27 = sorted_responses[-k:]
        top_correct = sum(1 for r in top_27 if r["correct"])
        bottom_correct = sum(1 for r in bottom_27 if r["correct"])
        discrimination_index = (top_correct - bottom_correct) / k
# Distractor analysis (for MCQ)
distractor_stats = self._analyze_distractors(responses)
return {
"question_id": question_id,
"total_responses": total,
"p_value": p_value,
"difficulty": difficulty,
"discrimination_index": discrimination_index,
"quality": self._assess_question_quality(p_value, discrimination_index),
"distractor_stats": distractor_stats,
"recommendations": self._generate_item_recommendations(p_value, discrimination_index)
}
def _classify_difficulty(self, p_value):
"""
P-value interpretation:
- 0.90-1.00: Very easy
- 0.70-0.89: Easy
- 0.30-0.69: Medium
- 0.10-0.29: Hard
- 0.00-0.09: Very hard
"""
if p_value >= 0.90:
return "very_easy"
elif p_value >= 0.70:
return "easy"
elif p_value >= 0.30:
return "medium"
elif p_value >= 0.10:
return "hard"
else:
return "very_hard"
def _assess_question_quality(self, p_value, discrimination):
"""
Quality criteria:
- Good: 0.30 < P < 0.70 and D > 0.30
- Acceptable: 0.20 < P < 0.80 and D > 0.20
- Poor: Otherwise
"""
if 0.30 < p_value < 0.70 and discrimination > 0.30:
return "good"
elif 0.20 < p_value < 0.80 and discrimination > 0.20:
return "acceptable"
else:
return "poor"
def _generate_item_recommendations(self, p_value, discrimination):
"""Actionable recommendations for question improvement"""
recs = []
if p_value > 0.90:
recs.append("Question too easy - increase difficulty or remove")
elif p_value < 0.10:
recs.append("Question too hard - verify answer key or simplify")
        if discrimination < 0:
            recs.append("CRITICAL: Negative discrimination - low performers doing better than high performers. Check answer key!")
        elif discrimination < 0.10:
            recs.append("Poor discrimination - question not distinguishing high/low performers")
if 0.30 < p_value < 0.70 and discrimination > 0.30:
recs.append("Excellent question - retain in question bank")
return recs
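The docstring of `analyze_question_performance` names point-biserial correlation as the discrimination measure, while the code uses the top/bottom 27% split. A minimal point-biserial implementation (using the population standard deviation, returning 0 when there is no variance to correlate) might look like:

```python
import math

def point_biserial(correct_flags, total_scores):
    """Point-biserial correlation between item correctness (0/1 flags)
    and each respondent's total score. Positive values mean stronger
    students tend to get the item right."""
    n = len(correct_flags)
    mean_all = sum(total_scores) / n
    sigma = math.sqrt(sum((s - mean_all) ** 2 for s in total_scores) / n)
    p = sum(correct_flags) / n  # Proportion answering correctly
    if sigma == 0 or p in (0, 1):
        return 0.0  # No variance to correlate
    mean_correct = (
        sum(s for f, s in zip(correct_flags, total_scores) if f)
        / sum(correct_flags)
    )
    return (mean_correct - mean_all) / sigma * math.sqrt(p / (1 - p))
```

Like the 27% split, values above roughly 0.20 indicate acceptable discrimination and negative values signal a probable answer-key error.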
Complete Assessment Workflow
# End-to-end assessment creation workflow
# 1. Define learning objectives
learning_objectives = [
{
"id": "LO1",
"description": "Understand gradient descent optimization",
"bloom_level": "understand",
"topic": "optimization"
},
{
"id": "LO2",
"description": "Apply backpropagation to train neural networks",
"bloom_level": "apply",
"topic": "neural_networks"
}
]
# 2. Generate questions aligned to objectives
question_generator = AssessmentGenerator(learning_objectives)
questions = question_generator.generate_balanced_assessment(
total_questions=20,
bloom_distribution={
"remember": 0.15,
"understand": 0.25,
"apply": 0.40,
"analyze": 0.15,
"evaluate": 0.05
},
difficulty_distribution={
"beginner": 0.20,
"medium": 0.50,
"advanced": 0.30
}
)
# 3. Run bias detection
bias_detector = BiasDetector()
for question in questions:
bias_report = bias_detector.analyze_question(question.text)
if bias_report["overall_score"] < 0.7:
question.flag_for_review(bias_report)
# 4. Create adaptive assessment
assessment = AdaptiveAssessment(questions, starting_difficulty="medium")
# 5. Administer and collect responses
user_session = assessment.start_session(user_id="user123")
while not assessment.is_complete():
next_q = assessment.select_next_question()
user_response = user_session.present_question(next_q)
assessment.update_ability_estimate(next_q, user_response["correct"])
# 6. Generate results and analytics
results = assessment.get_results()
analytics = AssessmentAnalytics()
item_analysis = analytics.analyze_all_questions(assessment.history)
# 7. Export report
report = {
"user_id": "user123",
"final_ability": assessment.user_ability,
"questions_completed": len(assessment.history),
"score": results["score"],
"strengths": results["strengths"],
"weaknesses": results["weaknesses"],
"recommended_next_steps": results["recommendations"],
"item_analysis": item_analysis
}
Integration with Learning Management Systems
# LTI (Learning Tools Interoperability) integration example
from datetime import datetime

class LTIAssessmentProvider:
"""
Integrate adaptive assessments with Canvas, Moodle, Blackboard via LTI 1.3.
"""
def launch_assessment(self, lti_launch_data):
"""Handle LTI launch request from LMS"""
user_id = lti_launch_data["user_id"]
course_id = lti_launch_data["context_id"]
# Initialize adaptive assessment for user
assessment = self._get_or_create_assessment(user_id, course_id)
return {
"launch_url": f"/assessment/{assessment.id}",
"user": user_id,
"course": course_id
}
def submit_grade(self, assessment_id, score):
"""Send grade back to LMS via LTI Outcomes service"""
assessment = self._load_assessment(assessment_id)
lti_outcome = {
"lis_result_sourcedid": assessment.sourcedid,
"score": score, # Normalized 0-1
"timestamp": datetime.utcnow().isoformat()
}
return self._post_grade_to_lms(lti_outcome)
This skill provides comprehensive assessment creation patterns covering adaptive testing, Bloom's taxonomy alignment, bias detection, and LMS integration.
Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: assessment-creation-patterns
Completed:
- [x] Assessment questions generated across all Bloom's levels
- [x] Bias detection passed with >70% score
- [x] Adaptive difficulty algorithm implemented
- [x] Test suite validates question quality
- [x] LMS integration configured
Outputs:
- questions/module_1_assessment.json (Question bank with metadata)
- src/adaptive_assessment.py (Adaptive engine implementation)
- src/bias_detector.py (Bias analysis tooling)
- reports/item_analysis.csv (Question performance metrics)
Completion Checklist
Before marking this skill as complete, verify:
- Questions span all 6 Bloom's taxonomy levels
- Each question has difficulty score (0-1)
- Bias detector ran with no BLOCKING issues
- Adaptive algorithm adjusts difficulty based on performance
- MCQ distractors are plausible and tested
- Coding challenges have automated grading
- Essay rubrics have clear scoring criteria
- Question bank includes complete metadata for each question (topic, Bloom's level, difficulty, tags)
- Item analysis shows discrimination index >0.20
- All questions validated against JSON schema
Failure Indicators
This skill has FAILED if:
- ❌ Questions all at one Bloom's level (no progression)
- ❌ Bias score below 70% with unresolved issues
- ❌ Adaptive algorithm stuck at same difficulty
- ❌ Discrimination index negative (low performers do better)
- ❌ MCQ options have "all of the above" or "none of the above"
- ❌ Coding tests lack test cases or auto-grading
- ❌ Essay rubrics missing or have vague criteria
- ❌ Cultural bias detected (holidays, currency, idioms)
- ❌ Accessibility issues (missing alt text, color-only cues)
When NOT to Use
Do NOT use this skill when:
- Creating simple quizzes with <5 questions (use basic MCQ patterns instead)
- No need for adaptive difficulty (use `static-assessment-patterns` instead)
- Building surveys or opinion polls (use `survey-design-patterns` instead)
- Purely subjective assessments with no right answers
- Target audience too narrow for bias detection value
- No LMS integration needed and simple grading suffices
- Assessment must be paper-based without digital tools
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| All questions at "remember" level | No higher-order thinking tested | Distribute across Bloom's levels 1-6 |
| Using "all of the above" | Reduces question quality | Write specific distractors |
| Gendered language | Bias and exclusion | Use "they/them" or rephrase |
| Color-only cues | Accessibility failure | Add shape/label identifiers |
| Vague rubrics | Subjective grading | Define clear criteria per level |
| No distractor analysis | Poor MCQ quality | Track which options chosen, refine |
| Fixed difficulty | Boredom or frustration | Implement adaptive selection |
| Cultural assumptions | Bias against global audience | Use neutral examples |
| Skipping item analysis | Can't improve questions | Run P-value and discrimination checks |
Principles
This skill embodies:
- #5 Eliminate Ambiguity - Clear success criteria in rubrics, unambiguous question stems
- #6 Clear, Understandable, Explainable - Questions readable at 10th grade level or below
- #7 Fairness and Bias Mitigation - Bias detection, inclusive language, accessibility checks
- #8 No Assumptions - Cultural neutrality, explicit definitions for jargon
- #10 Test Everything - Item analysis validates question quality with data
Full Standard: CODITECT-STANDARD-AUTOMATION.md