
Assessment Creation Patterns


When to Use This Skill

Use this skill when implementing assessment creation patterns in your codebase.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Level 1: Quick Reference (Under 500 tokens)

Core Assessment Structure

# Bloom's Taxonomy Levels (Low to High)
BLOOMS_LEVELS = {
    "remember": 1,    # Recall facts
    "understand": 2,  # Explain concepts
    "apply": 3,       # Use in new situations
    "analyze": 4,     # Break down relationships
    "evaluate": 5,    # Make judgments
    "create": 6       # Produce new work
}

# Question Template
{
    "id": "q001",
    "type": "multiple_choice",  # or "short_answer", "coding", "essay"
    "difficulty": "medium",     # beginner, medium, advanced, expert
    "bloom_level": "apply",
    "topic": "neural_networks",
    "question": "Question text with context",
    "options": ["A", "B", "C", "D"],
    "correct_answer": "B",
    "explanation": "Why B is correct and others are wrong",
    "time_limit": 120,  # seconds
    "points": 10,
    "tags": ["supervised_learning", "backpropagation"]
}
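The completion checklist below requires validating questions against a schema. A minimal sketch of structural validation for the template above, using only the standard library (the required-field set mirrors the template and is an assumption, not a fixed schema):

```python
# Minimal structural validation for the question template.
# Field names mirror the template above; adjust to your own schema.
REQUIRED_FIELDS = {
    "id": str,
    "type": str,
    "difficulty": str,
    "bloom_level": str,
    "topic": str,
    "question": str,
    "correct_answer": str,
    "explanation": str,
    "time_limit": int,
    "points": int,
    "tags": list,
}

def validate_question(q: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in q:
            errors.append(f"missing field: {field}")
        elif not isinstance(q[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    # Options only apply to multiple-choice questions
    if q.get("type") == "multiple_choice" and not q.get("options"):
        errors.append("multiple_choice questions need non-empty options")
    return errors
```

For production use, a formal JSON Schema with a validator library gives richer error reporting than this hand-rolled check.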

Difficulty Progression

# Adaptive difficulty scaling
def calculate_next_difficulty(user_performance):
    """
    Adjust difficulty based on user accuracy and speed.

    Performance bands:
    - 90%+ correct, fast → increase 2 levels
    - 70-90% correct → increase 1 level
    - 50-70% correct → maintain level
    - <50% correct → decrease 1 level
    """
    accuracy = user_performance["correct"] / user_performance["total"]
    avg_time = user_performance["avg_time"]

    if accuracy >= 0.9 and avg_time < user_performance["expected_time"]:
        return "increase_2"
    elif accuracy >= 0.7:
        return "increase_1"
    elif accuracy >= 0.5:
        return "maintain"
    else:
        return "decrease_1"

Bias Detection Checklist

bias_checks:
  language:
    - avoid_gendered_pronouns: true
    - use_inclusive_examples: true
    - check_cultural_assumptions: true

  accessibility:
    - provide_alt_text_for_images: true
    - avoid_color_only_cues: true
    - support_screen_readers: true

  fairness:
    - balanced_topic_coverage: true
    - no_trick_questions: true
    - clear_success_criteria: true

Level 2: Implementation Details (Under 2000 tokens)

Multi-Format Question Types

1. Multiple Choice (MCQ)

class MultipleChoiceQuestion:
    """Best for: Knowledge recall, concept understanding (Bloom's 1-3)"""

    def __init__(self, stem, options, correct_index, distractors):
        self.stem = stem
        self.options = options  # List of 4-5 options
        self.correct_index = correct_index
        self.distractors = distractors  # Common misconceptions

    def validate_quality(self):
        """Quality checks for MCQ"""
        checks = {
            "stem_clear": len(self.stem.split()) >= 10,
            "options_homogeneous": self._check_option_length_variance() < 0.3,
            "no_all_of_above": "all of the above" not in str(self.options).lower(),
            "no_none_of_above": "none of the above" not in str(self.options).lower(),
            "distractors_plausible": len(self.distractors) >= 3
        }
        return all(checks.values()), checks

    def _check_option_length_variance(self):
        lengths = [len(opt) for opt in self.options]
        return (max(lengths) - min(lengths)) / max(lengths)

2. Coding Challenges

class CodingQuestion:
    """Best for: Application, analysis (Bloom's 3-4)"""

    def __init__(self, problem, test_cases, starter_code, hints):
        self.problem = problem
        self.test_cases = test_cases  # [{input, expected_output, points}]
        self.starter_code = starter_code
        self.hints = hints  # Progressive hints

    def auto_grade(self, submission):
        """Run test cases and calculate score"""
        results = []
        for i, test in enumerate(self.test_cases):
            try:
                output = self._execute_code(submission, test["input"])
                passed = output == test["expected_output"]
                results.append({
                    "test_id": i,
                    "passed": passed,
                    "points": test["points"] if passed else 0,
                    "feedback": self._generate_feedback(output, test["expected_output"])
                })
            except Exception as e:
                results.append({
                    "test_id": i,
                    "passed": False,
                    "points": 0,
                    "error": str(e)
                })

        return {
            "total_score": sum(r["points"] for r in results),
            "max_score": sum(t["points"] for t in self.test_cases),
            "results": results
        }
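The `_execute_code` helper is not shown above. One hedged sketch runs the submission in a separate interpreter process with a timeout, so an infinite loop cannot hang the grader (the function name and interface are assumptions):

```python
import subprocess
import sys

def execute_code(submission: str, stdin_data: str, timeout_s: float = 5.0) -> str:
    """Run a code submission in a separate Python process with a timeout.

    NOTE: a subprocess alone is NOT a security sandbox. Production graders
    should add OS-level isolation (containers, seccomp, resource limits).
    """
    result = subprocess.run(
        [sys.executable, "-c", submission],
        input=stdin_data,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired on hang
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()
```

Stripping trailing whitespace from stdout, as done here, avoids spurious test failures from newline differences; stricter graders may compare raw output instead.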

3. Essay/Short Answer

class EssayQuestion:
    """Best for: Evaluation, creation (Bloom's 5-6)"""

    def __init__(self, prompt, rubric, word_limit):
        self.prompt = prompt
        self.rubric = rubric  # Scoring criteria
        self.word_limit = word_limit

    def create_rubric(self):
        """
        Example rubric structure:
        {
            "criteria": [
                {
                    "name": "Clarity",
                    "weight": 0.3,
                    "levels": {
                        "exemplary": {"points": 4, "description": "Crystal clear"},
                        "proficient": {"points": 3, "description": "Mostly clear"},
                        "developing": {"points": 2, "description": "Somewhat unclear"},
                        "beginning": {"points": 1, "description": "Very unclear"}
                    }
                },
                {
                    "name": "Evidence",
                    "weight": 0.4,
                    "levels": {...}
                },
                {
                    "name": "Organization",
                    "weight": 0.3,
                    "levels": {...}
                }
            ]
        }
        """
        return self.rubric

Bloom's Taxonomy Alignment

# Question generation by Bloom's level
def generate_question_by_bloom_level(topic, level, content_context):
    """
    Generate questions aligned to Bloom's taxonomy.
    """
    bloom_templates = {
        "remember": {
            "verbs": ["define", "list", "recall", "identify", "name"],
            "template": "What is the definition of {concept}?",
            "example": "What is the definition of gradient descent in machine learning?"
        },
        "understand": {
            "verbs": ["explain", "describe", "summarize", "interpret", "compare"],
            "template": "Explain how {concept} works in the context of {context}.",
            "example": "Explain how backpropagation works in neural network training."
        },
        "apply": {
            "verbs": ["apply", "demonstrate", "use", "implement", "solve"],
            "template": "Given {scenario}, how would you apply {concept} to solve {problem}?",
            "example": "Given a dataset with missing values, how would you apply imputation techniques?"
        },
        "analyze": {
            "verbs": ["analyze", "compare", "contrast", "differentiate", "examine"],
            "template": "Compare {concept_a} and {concept_b}. What are the trade-offs?",
            "example": "Compare L1 and L2 regularization. What are the trade-offs in model performance?"
        },
        "evaluate": {
            "verbs": ["evaluate", "critique", "judge", "justify", "assess"],
            "template": "Evaluate the effectiveness of {approach} for {use_case}. Justify your answer.",
            "example": "Evaluate the effectiveness of CNNs vs Transformers for image classification."
        },
        "create": {
            "verbs": ["design", "create", "develop", "construct", "formulate"],
            "template": "Design a {solution} that {objective} while considering {constraints}.",
            "example": "Design a recommendation system that balances accuracy and diversity."
        }
    }

    template_info = bloom_templates[level]
    return {
        "level": level,
        "verbs": template_info["verbs"],
        "template": template_info["template"],
        "example": template_info["example"],
        "topic": topic
    }

Adaptive Assessment Engine

import math

class AdaptiveAssessment:
    """
    Dynamically adjust question difficulty based on user performance.
    Implements Item Response Theory (IRT) principles.
    """

    def __init__(self, question_bank, starting_difficulty="medium"):
        self.question_bank = question_bank
        self.current_difficulty = starting_difficulty
        self.user_ability = 0.5  # Scale 0-1
        self.history = []

    def select_next_question(self):
        """
        Select question matching user's current ability level.
        Uses IRT to maximize information gain.
        """
        # Filter questions near user ability
        candidates = [
            q for q in self.question_bank
            if abs(q.difficulty_score - self.user_ability) < 0.2
        ]

        # Prioritize untested topics
        untested_topics = self._get_untested_topics()
        preferred = [q for q in candidates if q.topic in untested_topics]

        if preferred:
            return max(preferred, key=lambda q: q.information_value(self.user_ability))
        else:
            return max(candidates, key=lambda q: q.information_value(self.user_ability))

    def update_ability_estimate(self, question, correct):
        """
        Update user ability estimate using Bayesian updating.
        """
        # Simple ELO-like update
        expected_prob = self._expected_probability(question.difficulty_score)
        actual = 1 if correct else 0

        K = 0.1  # Learning rate
        self.user_ability += K * (actual - expected_prob)
        self.user_ability = max(0, min(1, self.user_ability))  # Clamp [0,1]

        self.history.append({
            "question_id": question.id,
            "difficulty": question.difficulty_score,
            "correct": correct,
            "ability_after": self.user_ability
        })

    def _expected_probability(self, question_difficulty):
        """Probability user answers correctly (logistic function)"""
        return 1 / (1 + math.exp(-3 * (self.user_ability - question_difficulty)))
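To see the update rule in action, here is a self-contained numeric walk-through of the same logistic/ELO scheme (constants copied from the class; the standalone function names are for illustration only):

```python
import math

def expected_probability(ability: float, difficulty: float) -> float:
    """Logistic model: P(correct) given user ability and item difficulty."""
    return 1 / (1 + math.exp(-3 * (ability - difficulty)))

def update_ability(ability: float, difficulty: float, correct: bool,
                   k: float = 0.1) -> float:
    """ELO-style update, clamped to [0, 1]."""
    actual = 1.0 if correct else 0.0
    ability += k * (actual - expected_probability(ability, difficulty))
    return max(0.0, min(1.0, ability))

# A correct answer on an item matching the user's level nudges ability up:
# expected_probability(0.5, 0.5) is exactly 0.5, so the update adds
# 0.1 * (1 - 0.5) = 0.05, moving ability from 0.50 to 0.55.
```

Note the asymmetry this produces: correctly answering a hard item (low expected probability) moves the estimate much more than correctly answering an easy one, which is exactly the adaptive behavior wanted.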

Level 3: Complete Reference (Full tokens)

Bias Detection and Mitigation

Automated Bias Checks

import re

class BiasDetector:
    """
    Comprehensive bias detection for assessment questions.
    """

    def __init__(self):
        self.bias_patterns = self._load_bias_patterns()
        self.inclusive_language_guide = self._load_inclusive_language()

    def analyze_question(self, question_text):
        """Run all bias checks"""
        results = {
            "language_bias": self._check_language_bias(question_text),
            "cultural_bias": self._check_cultural_bias(question_text),
            "accessibility": self._check_accessibility(question_text),
            "cognitive_load": self._check_cognitive_load(question_text),
            "fairness": self._check_fairness(question_text)
        }

        results["overall_score"] = self._calculate_bias_score(results)
        results["recommendations"] = self._generate_recommendations(results)

        return results

    def _check_language_bias(self, text):
        """Check for gendered language, idioms, jargon"""
        issues = []

        # Gendered pronouns
        gendered_words = ["he", "she", "his", "her", "him", "himself", "herself"]
        for word in gendered_words:
            if re.search(rf'\b{word}\b', text, re.IGNORECASE):
                issues.append({
                    "type": "gendered_language",
                    "word": word,
                    "suggestion": "Use 'they/them' or rephrase"
                })

        # Idioms that may not translate
        idioms = ["piece of cake", "hit the nail on the head", "beat around the bush"]
        for idiom in idioms:
            if idiom.lower() in text.lower():
                issues.append({
                    "type": "idiom",
                    "phrase": idiom,
                    "suggestion": "Use literal language"
                })

        # Unnecessary jargon
        jargon_terms = self._detect_jargon(text)
        for term in jargon_terms:
            issues.append({
                "type": "jargon",
                "term": term,
                "suggestion": f"Define '{term}' or use simpler language"
            })

        return {
            "passed": len(issues) == 0,
            "issues": issues,
            "score": max(0, 1 - (len(issues) * 0.1))  # -10% per issue, floored at 0
        }

    def _check_cultural_bias(self, text):
        """Check for cultural assumptions"""
        issues = []

        # Western-centric references
        western_holidays = ["Christmas", "Thanksgiving", "Easter"]
        for holiday in western_holidays:
            if holiday in text:
                issues.append({
                    "type": "cultural_reference",
                    "reference": holiday,
                    "suggestion": "Use culturally neutral examples"
                })

        # Currency assumptions (USD-centric)
        if re.search(r'\$\d+', text):
            issues.append({
                "type": "currency_assumption",
                "suggestion": "Specify currency or use generic units"
            })

        # Date format assumptions (MM/DD vs DD/MM)
        if re.search(r'\d{1,2}/\d{1,2}/\d{2,4}', text):
            issues.append({
                "type": "date_format",
                "suggestion": "Use ISO 8601 format (YYYY-MM-DD) or write out month"
            })

        return {
            "passed": len(issues) == 0,
            "issues": issues,
            "score": max(0, 1 - (len(issues) * 0.15))
        }

    def _check_accessibility(self, text):
        """Check for accessibility issues"""
        issues = []

        # Images without alt text descriptions
        if "<img" in text and "alt=" not in text:
            issues.append({
                "type": "missing_alt_text",
                "suggestion": "Add descriptive alt text for all images"
            })

        # Color-only cues
        color_cues = ["red circle", "green checkmark", "blue line"]
        for cue in color_cues:
            if cue in text.lower():
                issues.append({
                    "type": "color_dependency",
                    "cue": cue,
                    "suggestion": "Add non-color identifiers (shape, label)"
                })

        # Reading level too high
        readability_score = self._calculate_readability(text)
        if readability_score > 12:  # Above 12th grade level
            issues.append({
                "type": "high_reading_level",
                "score": readability_score,
                "suggestion": "Simplify language to 10th grade level or below"
            })

        return {
            "passed": len(issues) == 0,
            "issues": issues,
            "readability_grade": readability_score,
            "score": max(0, 1 - (len(issues) * 0.2))
        }
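The `_calculate_readability` helper is referenced but not shown. A common choice is the Flesch-Kincaid grade level; a hedged sketch follows, using a rough vowel-group heuristic for syllable counting (the heuristic and function names are assumptions, not part of the original class):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (min 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A dedicated readability library will count syllables far more accurately than this heuristic; the sketch only shows the shape of the computation behind the "12th grade" threshold above.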

Assessment Analytics

class AssessmentAnalytics:
    """
    Post-assessment analysis for continuous improvement.
    """

    def analyze_question_performance(self, question_id, responses):
        """
        Calculate question statistics:
        - Difficulty index (P-value)
        - Discrimination index (upper-lower 27% method)
        - Distractor analysis
        """
        total = len(responses)
        correct_count = sum(1 for r in responses if r["correct"])

        # Difficulty Index (P-value)
        p_value = correct_count / total
        difficulty = self._classify_difficulty(p_value)

        # Discrimination Index
        # Compare top 27% vs bottom 27% of overall performers
        sorted_responses = sorted(responses, key=lambda r: r["user_total_score"], reverse=True)
        band_size = max(1, int(total * 0.27))  # guard against empty bands on small samples
        top_27 = sorted_responses[:band_size]
        bottom_27 = sorted_responses[-band_size:]

        top_correct = sum(1 for r in top_27 if r["correct"])
        bottom_correct = sum(1 for r in bottom_27 if r["correct"])

        discrimination_index = (top_correct - bottom_correct) / band_size

        # Distractor analysis (for MCQ)
        distractor_stats = self._analyze_distractors(responses)

        return {
            "question_id": question_id,
            "total_responses": total,
            "p_value": p_value,
            "difficulty": difficulty,
            "discrimination_index": discrimination_index,
            "quality": self._assess_question_quality(p_value, discrimination_index),
            "distractor_stats": distractor_stats,
            "recommendations": self._generate_item_recommendations(p_value, discrimination_index)
        }

    def _classify_difficulty(self, p_value):
        """
        P-value interpretation:
        - 0.90-1.00: Very easy
        - 0.70-0.89: Easy
        - 0.30-0.69: Medium
        - 0.10-0.29: Hard
        - 0.00-0.09: Very hard
        """
        if p_value >= 0.90:
            return "very_easy"
        elif p_value >= 0.70:
            return "easy"
        elif p_value >= 0.30:
            return "medium"
        elif p_value >= 0.10:
            return "hard"
        else:
            return "very_hard"

    def _assess_question_quality(self, p_value, discrimination):
        """
        Quality criteria:
        - Good: 0.30 < P < 0.70 and D > 0.30
        - Acceptable: 0.20 < P < 0.80 and D > 0.20
        - Poor: Otherwise
        """
        if 0.30 < p_value < 0.70 and discrimination > 0.30:
            return "good"
        elif 0.20 < p_value < 0.80 and discrimination > 0.20:
            return "acceptable"
        else:
            return "poor"

    def _generate_item_recommendations(self, p_value, discrimination):
        """Actionable recommendations for question improvement"""
        recs = []

        if p_value > 0.90:
            recs.append("Question too easy - increase difficulty or remove")
        elif p_value < 0.10:
            recs.append("Question too hard - verify answer key or simplify")

        # Check the negative case first: `discrimination < 0` also satisfies
        # `discrimination < 0.10`, so the original ordering made it unreachable.
        if discrimination < 0:
            recs.append("CRITICAL: Negative discrimination - low performers doing better than high performers. Check answer key!")
        elif discrimination < 0.10:
            recs.append("Poor discrimination - question not distinguishing high/low performers")

        if 0.30 < p_value < 0.70 and discrimination > 0.30:
            recs.append("Excellent question - retain in question bank")

        return recs
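Discrimination can also be computed as a point-biserial correlation between item correctness and total score, which uses every response rather than just the top and bottom bands. A hedged stdlib sketch (function name and interface are assumptions):

```python
import math

def point_biserial(correct: list, totals: list) -> float:
    """Point-biserial correlation between item correctness (0/1) and total score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1/M0 are the mean total
    scores of the correct/incorrect groups, s is the population standard
    deviation of all totals, p the proportion correct, and q = 1 - p.
    """
    n = len(correct)
    ones = [t for c, t in zip(correct, totals) if c]
    zeros = [t for c, t in zip(correct, totals) if not c]
    if not ones or not zeros:
        return 0.0  # undefined when everyone (or no one) answered correctly
    p = len(ones) / n
    q = 1 - p
    mean = sum(totals) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)
    if std == 0:
        return 0.0
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / std * math.sqrt(p * q)
```

Values near +1 mean the item strongly separates high and low scorers; values near 0 or below flag items for review, matching the recommendation thresholds above.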

Complete Assessment Workflow

# End-to-end assessment creation workflow

# 1. Define learning objectives
learning_objectives = [
    {
        "id": "LO1",
        "description": "Understand gradient descent optimization",
        "bloom_level": "understand",
        "topic": "optimization"
    },
    {
        "id": "LO2",
        "description": "Apply backpropagation to train neural networks",
        "bloom_level": "apply",
        "topic": "neural_networks"
    }
]

# 2. Generate questions aligned to objectives
question_generator = AssessmentGenerator(learning_objectives)
questions = question_generator.generate_balanced_assessment(
    total_questions=20,
    bloom_distribution={
        "remember": 0.15,
        "understand": 0.25,
        "apply": 0.40,
        "analyze": 0.15,
        "evaluate": 0.05
    },
    difficulty_distribution={
        "beginner": 0.20,
        "medium": 0.50,
        "advanced": 0.30
    }
)

# 3. Run bias detection
bias_detector = BiasDetector()
for question in questions:
    bias_report = bias_detector.analyze_question(question.text)
    if bias_report["overall_score"] < 0.7:
        question.flag_for_review(bias_report)

# 4. Create adaptive assessment
assessment = AdaptiveAssessment(questions, starting_difficulty="medium")

# 5. Administer and collect responses
user_session = assessment.start_session(user_id="user123")
while not assessment.is_complete():
    next_q = assessment.select_next_question()
    user_response = user_session.present_question(next_q)
    assessment.update_ability_estimate(next_q, user_response["correct"])

# 6. Generate results and analytics
results = assessment.get_results()
analytics = AssessmentAnalytics()
item_analysis = analytics.analyze_all_questions(assessment.history)

# 7. Export report
report = {
    "user_id": "user123",
    "final_ability": assessment.user_ability,
    "questions_completed": len(assessment.history),
    "score": results["score"],
    "strengths": results["strengths"],
    "weaknesses": results["weaknesses"],
    "recommended_next_steps": results["recommendations"],
    "item_analysis": item_analysis
}

Integration with Learning Management Systems

# LTI (Learning Tools Interoperability) integration example
from datetime import datetime

class LTIAssessmentProvider:
    """
    Integrate adaptive assessments with Canvas, Moodle, Blackboard via LTI 1.3.
    """

    def launch_assessment(self, lti_launch_data):
        """Handle LTI launch request from LMS"""
        user_id = lti_launch_data["user_id"]
        course_id = lti_launch_data["context_id"]

        # Initialize adaptive assessment for user
        assessment = self._get_or_create_assessment(user_id, course_id)

        return {
            "launch_url": f"/assessment/{assessment.id}",
            "user": user_id,
            "course": course_id
        }

    def submit_grade(self, assessment_id, score):
        """Send grade back to LMS via LTI Outcomes service"""
        assessment = self._load_assessment(assessment_id)

        lti_outcome = {
            "lis_result_sourcedid": assessment.sourcedid,
            "score": score,  # Normalized 0-1
            "timestamp": datetime.utcnow().isoformat()
        }

        return self._post_grade_to_lms(lti_outcome)

This skill provides comprehensive assessment creation patterns covering adaptive testing, Bloom's taxonomy alignment, bias detection, and LMS integration.

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: assessment-creation-patterns

Completed:
- [x] Assessment questions generated across all Bloom's levels
- [x] Bias detection passed with >70% score
- [x] Adaptive difficulty algorithm implemented
- [x] Test suite validates question quality
- [x] LMS integration configured

Outputs:
- questions/module_1_assessment.json (Question bank with metadata)
- src/adaptive_assessment.py (Adaptive engine implementation)
- src/bias_detector.py (Bias analysis tooling)
- reports/item_analysis.csv (Question performance metrics)

Completion Checklist

Before marking this skill as complete, verify:

  • Questions span all 6 Bloom's taxonomy levels
  • Each question has difficulty score (0-1)
  • Bias detector ran with no BLOCKING issues
  • Adaptive algorithm adjusts difficulty based on performance
  • MCQ distractors are plausible and tested
  • Coding challenges have automated grading
  • Essay rubrics have clear scoring criteria
  • Question bank stores per-question metadata (topic, Bloom's level, difficulty, tags)
  • Item analysis shows discrimination index >0.20
  • All questions validated against JSON schema

Failure Indicators

This skill has FAILED if:

  • ❌ Questions all at one Bloom's level (no progression)
  • ❌ Bias score below 70% with unresolved issues
  • ❌ Adaptive algorithm stuck at same difficulty
  • ❌ Discrimination index negative (low performers do better)
  • ❌ MCQ options have "all of the above" or "none of the above"
  • ❌ Coding tests lack test cases or auto-grading
  • ❌ Essay rubrics missing or have vague criteria
  • ❌ Cultural bias detected (holidays, currency, idioms)
  • ❌ Accessibility issues (missing alt text, color-only cues)

When NOT to Use

Do NOT use this skill when:

  • Creating simple quizzes with <5 questions (use basic MCQ patterns instead)
  • No need for adaptive difficulty (use static-assessment-patterns instead)
  • Building surveys or opinion polls (use survey-design-patterns instead)
  • Purely subjective assessments with no right answers
  • Target audience too narrow for bias detection value
  • No LMS integration needed and simple grading suffices
  • Assessment must be paper-based without digital tools

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| All questions at "remember" level | No higher-order thinking tested | Distribute across Bloom's levels 1-6 |
| Using "all of the above" | Reduces question quality | Write specific distractors |
| Gendered language | Bias and exclusion | Use "they/them" or rephrase |
| Color-only cues | Accessibility failure | Add shape/label identifiers |
| Vague rubrics | Subjective grading | Define clear criteria per level |
| No distractor analysis | Poor MCQ quality | Track which options are chosen, refine |
| Fixed difficulty | Boredom or frustration | Implement adaptive selection |
| Cultural assumptions | Bias against global audience | Use neutral examples |
| Skipping item analysis | Can't improve questions | Run P-value and discrimination checks |

Principles

This skill embodies:

  • #5 Eliminate Ambiguity - Clear success criteria in rubrics, unambiguous question stems
  • #6 Clear, Understandable, Explainable - Questions readable at 10th grade level or below
  • #7 Fairness and Bias Mitigation - Bias detection, inclusive language, accessibility checks
  • #8 No Assumptions - Cultural neutrality, explicit definitions for jargon
  • #10 Test Everything - Item analysis validates question quality with data

Full Standard: CODITECT-STANDARD-AUTOMATION.md