
ADR-LMS-008: Learning Analytics & Reporting

Status: Proposed
Date: 2025-12-11
Phase: Phase 2 - Core LMS Infrastructure
Deciders: Hal Casteel (Founder/CEO/CTO), CODITECT Core Team
Technical Story: Enable comprehensive learning analytics, progress dashboards, and compliance reporting for individuals, teams, and organizations


Context and Problem Statement

The current CODITECT training system lacks analytics capabilities:

  1. No Progress Metrics - No aggregate view of learning progress
  2. No Completion Tracking - No visibility into certification rates
  3. No Time Analytics - No understanding of learning time investment
  4. No Content Effectiveness - No insight into which content works
  5. No Team Dashboards - No organizational learning visibility
  6. No Compliance Reports - No audit-ready training documentation

The Problem: How do we provide actionable learning analytics for learners, instructors, and organizations while maintaining privacy and enabling compliance reporting?


Decision Drivers

Technical Requirements

  • R1: Real-time progress dashboards
  • R2: Aggregated analytics (daily/weekly/monthly)
  • R3: Funnel analysis for learning paths
  • R4: Content effectiveness metrics
  • R5: Skill gap analysis
  • R6: Time-on-task tracking
  • R7: Exportable compliance reports

User Experience Goals

  • UX1: Personal learning dashboard
  • UX2: Instructor class overview
  • UX3: Organization admin dashboard
  • UX4: Progress trend visualizations
  • UX5: Actionable recommendations

Business Requirements

  • B1: ROI metrics for training investment
  • B2: Compliance audit reports (SOC2, HIPAA training)
  • B3: Team performance benchmarking
  • B4: Content utilization reports

Decision Outcome

Chosen Solution: Implement a multi-tier analytics system with real-time metrics, daily aggregations, funnel analysis, and exportable compliance reports.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│ Learning Analytics Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Event Collection │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Module │ │ Quiz │ │ Session │ │ │
│ │ │ Progress │ │ Attempts │ │ Activity │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │ │ │ │
│ │ └───────────────┼───────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ Event Stream │ │ │
│ │ │ (SQLite) │ │ │
│ │ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Aggregation Layer │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Daily │ │ Weekly │ │ Monthly │ │ │
│ │ │ Rollups │ │ Rollups │ │ Rollups │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Reporting Layer │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌─────────┐ │ │
│ │ │ Personal │ │ Instructor│ │ Org │ │Compliance│ │ │
│ │ │ Dashboard │ │ Dashboard │ │ Dashboard │ │ Reports │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ └─────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Implementation Details

1. Database Schema

-- Analytics event stream (raw events)
CREATE TABLE analytics_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_id TEXT UNIQUE NOT NULL,
event_type TEXT NOT NULL, -- module_start, module_complete, quiz_start, quiz_complete, etc.

-- Actor
user_id TEXT NOT NULL,
org_id INTEGER,

-- Object
object_type TEXT, -- module, quiz, path, skill, etc.
object_id TEXT,
object_name TEXT,

-- Event data
event_data TEXT, -- JSON additional data

-- Context
session_id TEXT,
ip_address TEXT,
user_agent TEXT,

-- Timing
event_timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
duration_seconds INTEGER,

created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_analytics_user ON analytics_events(user_id, event_timestamp);
CREATE INDEX idx_analytics_type ON analytics_events(event_type, event_timestamp);
CREATE INDEX idx_analytics_org ON analytics_events(org_id, event_timestamp);
CREATE INDEX idx_analytics_object ON analytics_events(object_type, object_id);

-- Daily aggregations
CREATE TABLE analytics_daily (
id INTEGER PRIMARY KEY AUTOINCREMENT,
date TEXT NOT NULL, -- YYYY-MM-DD

-- Scope
scope_type TEXT NOT NULL, -- platform, org, user
scope_id TEXT, -- org_id or user_id (NULL for platform)

-- Metric
metric_name TEXT NOT NULL,
metric_value REAL NOT NULL,

-- Dimensions (optional)
dimension1_name TEXT,
dimension1_value TEXT,
dimension2_name TEXT,
dimension2_value TEXT,

created_at TEXT DEFAULT CURRENT_TIMESTAMP,

UNIQUE(date, scope_type, scope_id, metric_name, dimension1_name, dimension1_value, dimension2_name, dimension2_value)
);

CREATE INDEX idx_daily_date ON analytics_daily(date);
CREATE INDEX idx_daily_scope ON analytics_daily(scope_type, scope_id);
CREATE INDEX idx_daily_metric ON analytics_daily(metric_name);

-- Weekly/Monthly aggregations (same columns; note that CREATE TABLE ... AS SELECT
-- copies no constraints or defaults, so the UNIQUE constraint must be recreated
-- as an index on these tables for upserts to work)
CREATE TABLE analytics_weekly AS SELECT * FROM analytics_daily WHERE 1=0;
CREATE TABLE analytics_monthly AS SELECT * FROM analytics_daily WHERE 1=0;

-- Funnel definitions
CREATE TABLE analytics_funnels (
id INTEGER PRIMARY KEY AUTOINCREMENT,
funnel_key TEXT UNIQUE NOT NULL,
funnel_name TEXT NOT NULL,
description TEXT,

-- Steps (ordered)
steps TEXT NOT NULL, -- JSON: [{event_type, filter, name}, ...]

-- Configuration
conversion_window_hours INTEGER DEFAULT 168, -- 7 days default

is_active BOOLEAN DEFAULT 1,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Funnel step completions
CREATE TABLE analytics_funnel_progress (
id INTEGER PRIMARY KEY AUTOINCREMENT,
funnel_id INTEGER NOT NULL,
user_id TEXT NOT NULL,

step_index INTEGER NOT NULL,
step_name TEXT NOT NULL,
completed_at TEXT DEFAULT CURRENT_TIMESTAMP,

-- Time from previous step
time_from_previous_seconds INTEGER,

-- Session context
session_id TEXT,

FOREIGN KEY (funnel_id) REFERENCES analytics_funnels(id) ON DELETE CASCADE,
FOREIGN KEY (user_id) REFERENCES auth_users(user_id) ON DELETE CASCADE
);

CREATE INDEX idx_funnel_progress ON analytics_funnel_progress(funnel_id, user_id, step_index);

-- Cohort definitions
CREATE TABLE analytics_cohorts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
cohort_key TEXT UNIQUE NOT NULL,
cohort_name TEXT NOT NULL,
description TEXT,

-- Definition
cohort_type TEXT NOT NULL, -- signup_date, first_module, certification, custom
cohort_filter TEXT, -- JSON filter conditions

is_active BOOLEAN DEFAULT 1,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Cohort membership (pre-computed)
CREATE TABLE analytics_cohort_members (
id INTEGER PRIMARY KEY AUTOINCREMENT,
cohort_id INTEGER NOT NULL,
user_id TEXT NOT NULL,
cohort_date TEXT NOT NULL, -- e.g., signup week

FOREIGN KEY (cohort_id) REFERENCES analytics_cohorts(id) ON DELETE CASCADE,
FOREIGN KEY (user_id) REFERENCES auth_users(user_id) ON DELETE CASCADE,
UNIQUE(cohort_id, user_id)
);

-- Compliance reports
CREATE TABLE compliance_reports (
id INTEGER PRIMARY KEY AUTOINCREMENT,
report_id TEXT UNIQUE NOT NULL,
report_type TEXT NOT NULL, -- training_completion, certification, audit_trail

-- Scope
org_id INTEGER,
user_ids TEXT, -- JSON array (NULL = all org users)

-- Time range
start_date TEXT NOT NULL,
end_date TEXT NOT NULL,

-- Report data
report_data TEXT, -- JSON report content
summary TEXT,

-- Export
export_format TEXT, -- json, csv, pdf
export_url TEXT,

-- Metadata
generated_by TEXT,
generated_at TEXT DEFAULT CURRENT_TIMESTAMP,

FOREIGN KEY (org_id) REFERENCES learning_organizations(id) ON DELETE CASCADE
);
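The cohort tables above enable retention analysis: for each cohort, what fraction of members was active in a given week. The schema does not specify the retention query itself, so the following is an illustrative sketch against a trimmed-down copy of the tables (the `weekly_retention` helper and its join logic are assumptions, not part of the decision):

```python
import sqlite3

def weekly_retention(conn, cohort_id: int, week_start: str, week_end: str) -> float:
    """Fraction of cohort members with at least one event in [week_start, week_end)."""
    row = conn.execute("""
        SELECT
            COUNT(DISTINCT m.user_id) AS cohort_size,
            COUNT(DISTINCT e.user_id) AS retained
        FROM analytics_cohort_members m
        LEFT JOIN analytics_events e
            ON e.user_id = m.user_id
            AND e.event_timestamp >= ? AND e.event_timestamp < ?
        WHERE m.cohort_id = ?
    """, (week_start, week_end, cohort_id)).fetchone()
    cohort_size, retained = row
    return retained / cohort_size if cohort_size else 0.0

# Demo with a trimmed-down copy of the schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE analytics_cohort_members (cohort_id INTEGER, user_id TEXT, cohort_date TEXT);
    CREATE TABLE analytics_events (user_id TEXT, event_type TEXT, event_timestamp TEXT);
    INSERT INTO analytics_cohort_members VALUES (1, 'u1', '2025-01-06'), (1, 'u2', '2025-01-06');
    INSERT INTO analytics_events VALUES ('u1', 'module_started', '2025-01-14T10:00:00Z');
""")
rate = weekly_retention(conn, 1, '2025-01-13', '2025-01-20')  # 1 of 2 members active -> 0.5
```

Because timestamps are ISO-8601 strings, plain string comparison gives correct date-range filtering without parsing.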

2. Event Tracking

import json
import uuid
from typing import Any, Dict, Optional

def track_event(
    event_type: str,
    user_id: str,
    object_type: Optional[str] = None,
    object_id: Optional[str] = None,
    object_name: Optional[str] = None,
    event_data: Optional[Dict[str, Any]] = None,
    duration_seconds: Optional[int] = None,
    session_id: Optional[str] = None,
) -> str:
    """Track a learning analytics event and return its event_id."""
    event_id = str(uuid.uuid4())
    org_id = get_user_org(user_id)

    db.execute("""
        INSERT INTO analytics_events (
            event_id, event_type, user_id, org_id,
            object_type, object_id, object_name,
            event_data, session_id, duration_seconds
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        event_id, event_type, user_id, org_id,
        object_type, object_id, object_name,
        json.dumps(event_data) if event_data else None,
        session_id, duration_seconds
    ))

    # Update real-time counters
    update_realtime_counters(event_type, user_id, org_id, object_type, object_id)

    # Check funnel progress
    check_funnel_progress(event_type, user_id, event_data)

    return event_id


# Event types
EVENT_TYPES = {
# Module events
'module_started': 'User started a learning module',
'module_completed': 'User completed a learning module',
'module_progress': 'User made progress in a module',

# Quiz events
'quiz_started': 'User started a quiz',
'quiz_completed': 'User completed a quiz',
'quiz_passed': 'User passed a quiz',
'quiz_failed': 'User failed a quiz',

# Path events
'path_enrolled': 'User enrolled in a learning path',
'path_completed': 'User completed a learning path',

# Certificate events
'cert_earned': 'User earned a certificate',
'cert_verified': 'Certificate was verified',

# Skill events
'skill_practiced': 'User practiced a skill',
'skill_mastered': 'User mastered a skill',

# Session events
'session_started': 'Learning session started',
'session_ended': 'Learning session ended',

# Badge events
'badge_earned': 'User earned a badge'
}


# Integration with existing learning functions
def on_module_complete(user_id: str, module_id: int, time_spent: int, score: Optional[int] = None):
    """Called when a user completes a module."""
    module = get_module(module_id)

    track_event(
        event_type='module_completed',
        user_id=user_id,
        object_type='module',
        object_id=str(module_id),
        object_name=module['module_name'],
        event_data={
            'score': score,
            'time_spent_seconds': time_spent,
            'phase': module['phase']
        },
        duration_seconds=time_spent
    )


def on_quiz_complete(user_id: str, quiz_id: int, attempt: dict):
    """Called when a user completes a quiz."""
    quiz = get_quiz(quiz_id)

    event_type = 'quiz_passed' if attempt['passed'] else 'quiz_failed'

    track_event(
        event_type=event_type,
        user_id=user_id,
        object_type='quiz',
        object_id=str(quiz_id),
        object_name=quiz['title'],
        event_data={
            'score': attempt['score_percentage'],
            'passing_score': quiz['passing_score'],
            'attempt_number': get_attempt_number(user_id, quiz_id),
            'time_spent_seconds': attempt['time_spent_seconds']
        },
        duration_seconds=attempt['time_spent_seconds']
    )
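`track_event` calls `check_funnel_progress`, which is not specified elsewhere in this ADR. A minimal sketch against the `analytics_funnels` / `analytics_funnel_progress` schema is shown below; the matching logic (advance one step when the event type equals the next unfinished step) and the explicit `conn` parameter are illustrative assumptions:

```python
import json
import sqlite3

def check_funnel_progress(conn, event_type: str, user_id: str) -> list:
    """Advance the user one step in every active funnel whose next
    unfinished step matches this event_type."""
    advanced = []
    funnels = conn.execute(
        "SELECT id, steps FROM analytics_funnels WHERE is_active = 1").fetchall()
    for funnel_id, steps_json in funnels:
        steps = json.loads(steps_json)  # [{"event_type": ..., "name": ...}, ...]
        (last_index,) = conn.execute(
            "SELECT MAX(step_index) FROM analytics_funnel_progress "
            "WHERE funnel_id = ? AND user_id = ?", (funnel_id, user_id)).fetchone()
        next_index = 0 if last_index is None else last_index + 1
        if next_index < len(steps) and steps[next_index]['event_type'] == event_type:
            conn.execute(
                "INSERT INTO analytics_funnel_progress "
                "(funnel_id, user_id, step_index, step_name) VALUES (?, ?, ?, ?)",
                (funnel_id, user_id, next_index, steps[next_index]['name']))
            advanced.append((funnel_id, next_index))
    return advanced

# Demo against a trimmed-down copy of the schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE analytics_funnels (id INTEGER PRIMARY KEY, steps TEXT, is_active INTEGER);
    CREATE TABLE analytics_funnel_progress (
        funnel_id INTEGER, user_id TEXT, step_index INTEGER, step_name TEXT);
    INSERT INTO analytics_funnels VALUES
        (1, '[{"event_type": "path_enrolled", "name": "Enroll"},
              {"event_type": "path_completed", "name": "Finish"}]', 1);
""")
first = check_funnel_progress(conn, 'path_enrolled', 'u1')   # advances to step 0
repeat = check_funnel_progress(conn, 'path_enrolled', 'u1')  # no-op: next step differs
```

A production version would also enforce `conversion_window_hours` by comparing the previous step's `completed_at` against the event timestamp.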

3. Aggregation Jobs

from datetime import datetime, timedelta
from typing import Optional

def run_daily_aggregation(date: Optional[str] = None):
    """
    Run daily aggregation job.
    Typically scheduled to run at midnight UTC.
    """
    if date is None:
        date = (datetime.utcnow() - timedelta(days=1)).strftime('%Y-%m-%d')

    start_time = f"{date}T00:00:00Z"
    end_time = f"{date}T23:59:59Z"

    # Platform-wide metrics
    aggregate_platform_metrics(date, start_time, end_time)

    # Per-organization metrics
    for org in get_active_organizations():
        aggregate_org_metrics(date, org['id'], start_time, end_time)

    # Per-user metrics
    for user in get_active_users(date):
        aggregate_user_metrics(date, user['user_id'], start_time, end_time)


def aggregate_platform_metrics(date: str, start_time: str, end_time: str):
    """Aggregate platform-wide metrics for a day."""

    metrics = [
        # Active users
        ('active_users', count_active_users(start_time, end_time)),

        # New registrations
        ('new_registrations', count_new_registrations(start_time, end_time)),

        # Module completions
        ('module_completions', count_events('module_completed', start_time, end_time)),

        # Quiz completions (on_quiz_complete emits quiz_passed / quiz_failed)
        ('quiz_completions',
         count_events('quiz_passed', start_time, end_time)
         + count_events('quiz_failed', start_time, end_time)),
        ('quiz_pass_rate', calculate_quiz_pass_rate(start_time, end_time)),

        # Certificates issued
        ('certificates_issued', count_events('cert_earned', start_time, end_time)),

        # Total learning time (hours)
        ('total_learning_hours', sum_learning_time(start_time, end_time) / 3600),

        # Average session duration (minutes)
        ('avg_session_duration_min', calculate_avg_session_duration(start_time, end_time) / 60),
    ]

    for metric_name, metric_value in metrics:
        upsert_daily_metric(date, 'platform', None, metric_name, metric_value)

    # Per-module metrics
    for module in get_all_modules():
        completions = count_module_completions(module['id'], start_time, end_time)
        avg_score = calculate_module_avg_score(module['id'], start_time, end_time)
        avg_time = calculate_module_avg_time(module['id'], start_time, end_time)

        upsert_daily_metric(
            date, 'platform', None, 'module_completions',
            completions, 'module', str(module['id'])
        )
        if avg_score is not None:
            upsert_daily_metric(
                date, 'platform', None, 'module_avg_score',
                avg_score, 'module', str(module['id'])
            )
        if avg_time is not None:
            upsert_daily_metric(
                date, 'platform', None, 'module_avg_time_seconds',
                avg_time, 'module', str(module['id'])
            )


def upsert_daily_metric(
    date: str,
    scope_type: str,
    scope_id: Optional[str],
    metric_name: str,
    metric_value: float,
    dimension1_name: Optional[str] = None,
    dimension1_value: Optional[str] = None
):
    """Insert or update a daily metric.

    scope_id and the dimension columns are stored as '' rather than NULL:
    SQLite treats NULLs as distinct in UNIQUE indexes, so NULL values would
    never trigger ON CONFLICT and repeated runs would insert duplicate rows.
    """
    db.execute("""
        INSERT INTO analytics_daily (
            date, scope_type, scope_id, metric_name, metric_value,
            dimension1_name, dimension1_value, dimension2_name, dimension2_value
        ) VALUES (?, ?, ?, ?, ?, ?, ?, '', '')
        ON CONFLICT(date, scope_type, scope_id, metric_name,
                    dimension1_name, dimension1_value, dimension2_name, dimension2_value)
        DO UPDATE SET metric_value = excluded.metric_value
    """, (date, scope_type, scope_id or '', metric_name, metric_value,
          dimension1_name or '', dimension1_value or ''))
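The aggregation job leans on helpers such as `count_events` and `calculate_quiz_pass_rate` that are not defined in this ADR. A minimal sketch follows, shown taking an explicit connection and a half-open `[start, end)` window (both assumptions; the document's own job uses a `23:59:59` inclusive end bound, which behaves the same for second-resolution timestamps):

```python
import sqlite3

def count_events(conn, event_type: str, start: str, end: str) -> int:
    """Count events of one type in [start, end) via ISO-8601 string comparison."""
    return conn.execute(
        "SELECT COUNT(*) FROM analytics_events "
        "WHERE event_type = ? AND event_timestamp >= ? AND event_timestamp < ?",
        (event_type, start, end)).fetchone()[0]

def calculate_quiz_pass_rate(conn, start: str, end: str) -> float:
    """Pass rate as a percentage of pass/fail outcomes in the window."""
    passed = count_events(conn, 'quiz_passed', start, end)
    failed = count_events(conn, 'quiz_failed', start, end)
    total = passed + failed
    return (passed / total * 100) if total else 0.0

# Demo against a trimmed-down copy of the event table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE analytics_events (event_type TEXT, event_timestamp TEXT);
    INSERT INTO analytics_events VALUES
        ('quiz_passed', '2025-12-10T09:00:00Z'),
        ('quiz_passed', '2025-12-10T11:00:00Z'),
        ('quiz_failed', '2025-12-10T15:00:00Z');
""")
rate = calculate_quiz_pass_rate(conn, '2025-12-10', '2025-12-11')  # 2 of 3 passed
```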

4. Dashboard Queries

def get_user_dashboard(user_id: str) -> dict:
    """
    Get personal learning dashboard data.
    """
    return {
        'summary': {
            'modules_completed': count_user_module_completions(user_id),
            'total_modules': count_total_modules(),
            'completion_percentage': calculate_user_completion_percentage(user_id),
            'certificates_earned': count_user_certificates(user_id),
            'total_learning_hours': sum_user_learning_time(user_id) / 3600,
            'current_streak_days': calculate_learning_streak(user_id),
        },
        'progress': {
            'by_path': get_user_path_progress(user_id),
            'by_skill': get_user_skill_mastery(user_id),
            'recent_activity': get_user_recent_activity(user_id, limit=10),
        },
        'achievements': {
            'badges_earned': get_user_badges(user_id),
            'certificates': get_user_certificates(user_id),
            'next_milestone': get_next_milestone(user_id),
        },
        'trends': {
            'daily_activity': get_user_daily_activity(user_id, days=30),
            'weekly_hours': get_user_weekly_hours(user_id, weeks=12),
            'skill_progress': get_user_skill_progress_over_time(user_id, weeks=8),
        },
        'recommendations': {
            'next_modules': recommend_next_modules(user_id, limit=3),
            'skills_to_practice': recommend_skills_to_practice(user_id, limit=3),
            'review_due': get_review_due_count(user_id),
        }
    }


def get_org_dashboard(org_id: int) -> dict:
    """
    Get organization learning dashboard.
    """
    return {
        'summary': {
            'total_learners': count_org_learners(org_id),
            'active_learners_30d': count_org_active_learners(org_id, days=30),
            'avg_completion_rate': calculate_org_avg_completion(org_id),
            'total_certificates': count_org_certificates(org_id),
            'total_learning_hours': sum_org_learning_time(org_id) / 3600,
        },
        'completion_rates': {
            'by_path': get_org_path_completion_rates(org_id),
            'by_module': get_org_module_completion_rates(org_id),
            'by_team': get_org_team_completion_rates(org_id),
        },
        'leaderboard': {
            'top_learners': get_org_top_learners(org_id, limit=10),
            'most_improved': get_org_most_improved(org_id, limit=5),
            'streak_leaders': get_org_streak_leaders(org_id, limit=5),
        },
        'trends': {
            'daily_active': get_org_daily_active(org_id, days=30),
            'weekly_completions': get_org_weekly_completions(org_id, weeks=12),
            'monthly_certificates': get_org_monthly_certificates(org_id, months=6),
        },
        'compliance': {
            'required_training_status': get_required_training_status(org_id),
            'expiring_certifications': get_expiring_certifications(org_id, days=30),
            'overdue_learners': get_overdue_learners(org_id),
        }
    }

5. Compliance Reports

import json
import uuid
from datetime import datetime
from typing import List, Optional

def generate_compliance_report(
    org_id: int,
    report_type: str,
    start_date: str,
    end_date: str,
    user_ids: Optional[List[str]] = None,
    export_format: str = 'json',
) -> dict:
    """
    Generate a compliance report for audit purposes.
    """
    report_id = str(uuid.uuid4())

    if report_type == 'training_completion':
        report_data = generate_training_completion_report(org_id, start_date, end_date, user_ids)
    elif report_type == 'certification':
        report_data = generate_certification_report(org_id, start_date, end_date, user_ids)
    elif report_type == 'audit_trail':
        report_data = generate_audit_trail_report(org_id, start_date, end_date, user_ids)
    else:
        raise ValueError(f"Unknown report type: {report_type}")

    # Generate summary
    summary = generate_report_summary(report_type, report_data)

    # Export
    if export_format == 'csv':
        export_url = export_report_csv(report_id, report_data)
    elif export_format == 'pdf':
        export_url = export_report_pdf(report_id, report_data, summary)
    else:
        export_url = None

    # Save report (user_ids stays NULL when the report covers all org users)
    db.execute("""
        INSERT INTO compliance_reports (
            report_id, report_type, org_id, user_ids,
            start_date, end_date, report_data, summary,
            export_format, export_url, generated_by
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        report_id, report_type, org_id,
        json.dumps(user_ids) if user_ids else None,
        start_date, end_date, json.dumps(report_data), summary,
        export_format, export_url, get_current_user_id()
    ))

    return {
        'report_id': report_id,
        'report_type': report_type,
        'summary': summary,
        'export_url': export_url,
        'data': report_data
    }


def generate_training_completion_report(
    org_id: int,
    start_date: str,
    end_date: str,
    user_ids: Optional[List[str]] = None,
) -> dict:
    """
    Generate a detailed training completion report for compliance.
    """
    users = get_org_users(org_id, user_ids)
    required_training = get_required_training(org_id)

    report = {
        'organization': get_org_info(org_id),
        'period': {'start': start_date, 'end': end_date},
        'generated_at': datetime.utcnow().isoformat(),
        'summary': {
            'total_users': len(users),
            'fully_compliant': 0,
            'partially_compliant': 0,
            'non_compliant': 0,
        },
        'required_training': [
            {
                'id': t['id'],
                'name': t['title'],
                'due_date': t.get('due_date'),
                'completion_rate': 0,
            }
            for t in required_training
        ],
        'user_details': []
    }

    for user in users:
        user_status = {
            'user_id': user['user_id'],
            'name': user['display_name'],
            'email': user['email'],
            'status': 'compliant',
            'training_records': []
        }

        completed_count = 0
        for training in required_training:
            completion = get_training_completion(user['user_id'], training['id'], end_date)

            record = {
                'training_id': training['id'],
                'training_name': training['title'],
                'required': True,
                'completed': completion is not None,
                'completed_at': completion['completed_at'] if completion else None,
                'score': completion['score'] if completion else None,
                'certificate_id': completion['certificate_id'] if completion else None,
            }

            if record['completed']:
                completed_count += 1
            user_status['training_records'].append(record)

        # Classify: all complete, none complete, or somewhere in between
        if completed_count == len(required_training):
            report['summary']['fully_compliant'] += 1
        elif completed_count == 0:
            user_status['status'] = 'non_compliant'
            report['summary']['non_compliant'] += 1
        else:
            user_status['status'] = 'partially_compliant'
            report['summary']['partially_compliant'] += 1

        report['user_details'].append(user_status)

    # Calculate completion rates
    for i, training in enumerate(required_training):
        completed = sum(
            1 for u in report['user_details']
            if any(t['training_id'] == training['id'] and t['completed'] for t in u['training_records'])
        )
        report['required_training'][i]['completion_rate'] = (completed / len(users) * 100) if users else 0

    return report
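`generate_compliance_report` delegates CSV export to `export_report_csv`, which is not defined in this ADR. A minimal sketch of the flattening step is shown below; the one-row-per-(user, training record) layout is an assumption, and the sketch returns the CSV text directly rather than uploading it and returning an `export_url`:

```python
import csv
import io

def export_report_csv(report_data: dict) -> str:
    """Flatten user_details into one CSV row per (user, training record)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['user_id', 'name', 'email', 'status',
                     'training_name', 'completed', 'completed_at', 'score'])
    for user in report_data['user_details']:
        for rec in user['training_records']:
            writer.writerow([
                user['user_id'], user['name'], user['email'], user['status'],
                rec['training_name'], rec['completed'],
                rec['completed_at'] or '',
                rec['score'] if rec['score'] is not None else '',
            ])
    return buf.getvalue()

# Demo with a hand-built report fragment
sample = {
    'user_details': [{
        'user_id': 'u1', 'name': 'Ada', 'email': 'ada@example.com',
        'status': 'compliant',
        'training_records': [{
            'training_name': 'Security Basics', 'completed': True,
            'completed_at': '2025-06-01T12:00:00Z', 'score': 92,
        }],
    }]
}
csv_text = export_report_csv(sample)
```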

6. CLI Commands

# Personal dashboard
/analytics dashboard # View personal dashboard
/analytics progress # Detailed progress view
/analytics activity --days 30 # Activity history
/analytics trends # Progress trends

# Instructor view
/analytics class CLASS_ID # Class overview
/analytics student USER_ID # Individual student view
/analytics quiz-results QUIZ_ID # Quiz analytics

# Organization admin
/analytics org # Organization dashboard
/analytics team TEAM_ID # Team dashboard
/analytics leaderboard # Top performers
/analytics compliance # Compliance status

# Reports
/analytics report training-completion --format pdf
/analytics report certification --start 2025-01-01 --end 2025-12-31
/analytics report audit-trail --user USER_ID

# Export
/analytics export --format csv --start DATE --end DATE

Key Metrics Definitions

| Metric | Definition | Calculation |
| --- | --- | --- |
| Active Users (DAU/MAU) | Users with ≥1 learning event | Count distinct user_id per day/month |
| Completion Rate | % of enrolled learners completing | completed / enrolled * 100 |
| Pass Rate | % of quiz attempts passing | passed_attempts / total_attempts * 100 |
| Learning Time | Total time spent learning | Sum of session durations |
| Skill Mastery | Average mastery across skills | Mean of skill_mastery.mastery_level |
| Streak | Consecutive days with activity | Count of continuous active days |
| Time to Completion | Days from enrollment to completion | completed_at - enrolled_at |
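The streak metric above can be computed from the user's distinct active dates (in practice, `SELECT DISTINCT date(event_timestamp)` from `analytics_events`). A sketch of `calculate_learning_streak`, shown here over a pre-fetched set of dates (the signature and the grace rule for "no activity yet today" are assumptions):

```python
from datetime import date, timedelta

def calculate_learning_streak(active_dates: set, today: date) -> int:
    """Consecutive days with activity, counting back from today (or from
    yesterday if there is no activity yet today, so an ongoing streak is
    not reset mid-day)."""
    day = today if today in active_dates else today - timedelta(days=1)
    streak = 0
    while day in active_dates:
        streak += 1
        day -= timedelta(days=1)
    return streak

today = date(2025, 12, 11)
dates = {date(2025, 12, 11), date(2025, 12, 10), date(2025, 12, 9), date(2025, 12, 7)}
streak = calculate_learning_streak(dates, today)  # 3: Dec 9-11, broken on Dec 8
```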

Consequences

Positive

  • P1: Actionable learning insights
  • P2: Compliance audit capability
  • P3: Content effectiveness measurement
  • P4: Team performance visibility
  • P5: ROI justification for training

Negative

  • N1: Storage growth from event stream
  • N2: Aggregation job complexity
  • N3: Privacy considerations

Risks

  • Risk 1: Event volume scaling
    • Mitigation: Event sampling, retention policies
  • Risk 2: PII in analytics
    • Mitigation: Anonymization, access controls


Status: Proposed - Phase 2 Core Infrastructure
Last Updated: 2025-12-11
Version: 1.0.0