J.1-J.2: Product Telemetry & Feature Analytics
Evidence Documentation: BIO-QMS Platform, Track J (Memory & Analytics)
Executive Summary
This document provides comprehensive evidence for the implementation of Product Telemetry (J.1) and Feature Analytics (J.2) capabilities within the BIO-QMS platform. The analytics infrastructure enables data-driven product decisions through systematic event tracking, user behavior analysis, feature adoption measurement, and cohort analytics.
Key Capabilities:
- Dual-pipeline event tracking (client-side + server-side)
- Privacy-compliant user identification with opt-out controls
- Real-time event monitoring and quality dashboards
- Feature adoption funnels and cohort retention analysis
- Automated product health insights and anomaly detection
Technology Stack:
- Collection: JavaScript SDK + Django middleware
- Transport: Google Cloud Pub/Sub
- Storage: BigQuery (event warehouse)
- Analytics: SQL + Python notebooks
- Visualization: Looker Studio + custom React dashboards
J.1: Product Telemetry
J.1.1: Event Tracking Framework
Event Categories
The BIO-QMS platform tracks three primary categories of events:
| Category | Scope | Examples | Volume |
|---|---|---|---|
| User Interaction | Client-side | Page views, button clicks, form submissions | 85% |
| System Events | Server-side | API calls, background jobs, errors | 12% |
| Business Events | Server-side | Work order state changes, agent invocations | 3% |
Event Schema v1.0
All events conform to a canonical JSON schema for consistency and queryability:
{
"event_id": "evt_2Q3R4S5T6U7V8W9X",
"event_name": "work_order.status_changed",
"event_version": "1.0.0",
"timestamp": "2026-02-16T14:32:15.234Z",
"session_id": "ses_A1B2C3D4E5F6G7H8",
"user_id": "usr_bio_12345",
"anonymous_id": "anon_9876543210abcdef",
"tenant_id": "tnt_acme_pharma",
"context": {
"page": {
"url": "/work-orders/WO-2026-0042",
"path": "/work-orders/:id",
"title": "Work Order WO-2026-0042 - Buffer Preparation",
"referrer": "/dashboard"
},
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
"ip": "10.0.1.45",
"locale": "en-US",
"timezone": "America/New_York"
},
"properties": {
"work_order_id": "WO-2026-0042",
"previous_status": "in_progress",
"new_status": "pending_review",
"duration_seconds": 1847,
"assigned_to": "usr_bio_67890",
"department": "manufacturing"
},
"traits": {
"role": "operator",
"subscription_tier": "enterprise",
"account_age_days": 127,
"feature_flags": ["ai_copilot", "batch_processing"]
}
}
Core Event Fields
| Field | Type | Required | Description |
|---|---|---|---|
| event_id | string | Yes | Unique event identifier (UUID v4) |
| event_name | string | Yes | Namespaced event name (category.action) |
| event_version | string | Yes | Schema version for event evolution |
| timestamp | ISO 8601 | Yes | Event occurrence time (UTC) |
| session_id | string | Yes | 30-minute session identifier |
| user_id | string | No | Identified user (post-login) |
| anonymous_id | string | Yes | Device fingerprint (pre-login) |
| tenant_id | string | Yes | Multi-tenant isolation |
| context | object | Yes | Contextual metadata |
| properties | object | No | Event-specific data |
| traits | object | No | User/account attributes |
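As an illustration of the required/optional split above, a minimal client-side pre-flight check might look like the following. This is a hypothetical sketch; the ingestion endpoint itself validates only a smaller required subset (event_name and timestamp) server-side.

```python
# Required vs. optional fields per the Core Event Fields table
REQUIRED_FIELDS = {
    'event_id', 'event_name', 'event_version', 'timestamp',
    'session_id', 'anonymous_id', 'tenant_id', 'context',
}
OPTIONAL_FIELDS = {'user_id', 'properties', 'traits'}


def missing_required(event: dict) -> list:
    """Return the required fields absent from an event payload, sorted."""
    return sorted(REQUIRED_FIELDS - set(event.keys()))


event = {
    'event_id': 'evt_2Q3R4S5T6U7V8W9X',
    'event_name': 'work_order.status_changed',
    'event_version': '1.0.0',
    'timestamp': '2026-02-16T14:32:15.234Z',
    'session_id': 'ses_A1B2C3D4E5F6G7H8',
    'anonymous_id': 'anon_9876543210abcdef',
    'tenant_id': 'tnt_acme_pharma',
    'context': {},
}
print(missing_required(event))  # → []
```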
Client-Side Event Tracking
JavaScript SDK (@bioqms/analytics)
// Installation
npm install @bioqms/analytics
// Initialization
import { Analytics } from '@bioqms/analytics';
const analytics = new Analytics({
writeKey: 'pk_live_bio_qms_analytics',
apiHost: 'https://events.bioqms.com',
flushInterval: 10000, // 10 seconds
flushAt: 20, // batch size
debug: process.env.NODE_ENV === 'development'
});
// Page view tracking
analytics.page('Work Orders', {
category: 'Operations',
path: window.location.pathname
});
// User interaction tracking
analytics.track('button.clicked', {
button_id: 'create_work_order',
button_text: 'New Work Order',
location: 'dashboard_header'
});
// Form submission tracking
analytics.track('form.submitted', {
form_id: 'work_order_creation',
form_type: 'create',
fields_completed: 12,
validation_errors: 0,
time_to_complete_seconds: 87
});
// Feature usage tracking
analytics.track('feature.used', {
feature_name: 'ai_copilot',
feature_action: 'generate_sop',
success: true,
response_time_ms: 2341
});
// Error tracking
analytics.track('error.occurred', {
error_type: 'api_error',
error_code: 'E502',
error_message: 'Gateway timeout',
endpoint: '/api/v1/work-orders',
retry_count: 2
});
// User identification (post-login)
analytics.identify('usr_bio_12345', {
email: 'operator@acmepharma.com',
role: 'operator',
department: 'manufacturing',
subscription_tier: 'enterprise',
signup_date: '2025-10-12'
});
Automatic Event Capture
The SDK automatically captures:
- Page views (route changes in SPA)
- Click events on interactive elements
- Form submissions (success + errors)
- JavaScript errors and unhandled rejections
- Performance metrics (page load, API latency)
- Session start/end
React Integration
// hooks/useAnalytics.ts
import { useContext } from 'react';
import { AnalyticsContext } from '../contexts/AnalyticsContext';
export const useAnalytics = (componentName: string) => {
  const analytics = useContext(AnalyticsContext);
  const trackEvent = (eventName: string, properties?: Record<string, any>) => {
    analytics.track(eventName, {
      ...properties,
      component: componentName,
      timestamp: new Date().toISOString()
    });
  };
  return { trackEvent };
};

// Usage in component
import { useAnalytics } from '../hooks/useAnalytics';

const WorkOrderList = () => {
  const { trackEvent } = useAnalytics('WorkOrderList');
  const handleCreateWorkOrder = () => {
    trackEvent('work_order.create_initiated', {
      source: 'list_view',
      template_used: false
    });
    // ... creation logic
  };
  return (
    <button onClick={handleCreateWorkOrder}>
      Create Work Order
    </button>
  );
};
Server-Side Event Tracking
Django Middleware (analytics.middleware.EventTrackingMiddleware)
# analytics/middleware.py
import time
import uuid

from django.utils import timezone
from django.utils.deprecation import MiddlewareMixin

from .client import analytics_client


class EventTrackingMiddleware(MiddlewareMixin):
    """
    Automatically tracks API requests and responses.
    """

    def process_request(self, request):
        request.analytics_start_time = time.time()
        request.analytics_event_id = str(uuid.uuid4())

        # Skip tracking for health checks and static files
        if request.path in ['/health', '/ready'] or request.path.startswith('/static/'):
            request.analytics_skip = True
            return
        request.analytics_skip = False

        # Extract user context
        user_id = None
        anonymous_id = request.COOKIES.get('bioqms_anon_id')
        if request.user.is_authenticated:
            user_id = f"usr_bio_{request.user.id}"

        request.analytics_context = {
            'user_id': user_id,
            'anonymous_id': anonymous_id,
            'tenant_id': getattr(request, 'tenant_id', None),
            'ip': self._get_client_ip(request),
            'user_agent': request.META.get('HTTP_USER_AGENT', ''),
            'path': request.path,
            'method': request.method
        }

    def process_response(self, request, response):
        if getattr(request, 'analytics_skip', True):
            return response

        duration_ms = (time.time() - request.analytics_start_time) * 1000

        # Track API request (timestamp emitted as ISO 8601, matching the event schema)
        analytics_client.track({
            'event_id': request.analytics_event_id,
            'event_name': 'api.request_completed',
            'event_version': '1.0.0',
            'timestamp': timezone.now().isoformat(),
            'user_id': request.analytics_context['user_id'],
            'anonymous_id': request.analytics_context['anonymous_id'],
            'tenant_id': request.analytics_context['tenant_id'],
            'context': request.analytics_context,
            'properties': {
                'endpoint': request.path,
                'method': request.method,
                'status_code': response.status_code,
                'duration_ms': duration_ms,
                # Streaming responses expose no .content; fall back to empty
                'response_size_bytes': len(getattr(response, 'content', b''))
            }
        })
        return response

    def _get_client_ip(self, request):
        x_forwarded_for = request.META.get('HTTP_X_FORWARDED_FOR')
        if x_forwarded_for:
            return x_forwarded_for.split(',')[0].strip()
        return request.META.get('REMOTE_ADDR')
Business Event Tracking
# work_orders/services.py
import time

from analytics.client import analytics_client
from django.utils import timezone
class WorkOrderService:
"""
Business logic with embedded analytics tracking.
"""
def update_status(self, work_order, new_status, user):
old_status = work_order.status
duration = (timezone.now() - work_order.updated_at).total_seconds()
work_order.status = new_status
work_order.save()
# Track business event
analytics_client.track({
'event_name': 'work_order.status_changed',
'event_version': '1.0.0',
'timestamp': timezone.now().isoformat(),
'user_id': f"usr_bio_{user.id}",
'tenant_id': work_order.tenant.external_id,
'properties': {
'work_order_id': work_order.external_id,
'previous_status': old_status,
'new_status': new_status,
'duration_seconds': duration,
'assigned_to': work_order.assigned_to_id,
'department': work_order.department,
'priority': work_order.priority
}
})
return work_order
def invoke_agent(self, work_order, agent_type, prompt):
start_time = time.time()
try:
result = self._execute_agent(agent_type, prompt)
success = True
error = None
except Exception as e:
result = None
success = False
error = str(e)
duration_ms = (time.time() - start_time) * 1000
# Track agent invocation
analytics_client.track({
'event_name': 'agent.invoked',
'event_version': '1.0.0',
'timestamp': timezone.now().isoformat(),
'tenant_id': work_order.tenant.external_id,
'properties': {
'work_order_id': work_order.external_id,
'agent_type': agent_type,
'agent_name': result.get('agent_name') if result else None,
'prompt_length': len(prompt),
'response_length': len(result.get('response', '')) if result else 0,
'success': success,
'error': error,
'duration_ms': duration_ms,
'tokens_used': result.get('tokens_used') if result else 0,
'model': result.get('model') if result else None
}
})
return result
Event Naming Conventions
Format: category.action[.modifier]
| Category | Actions | Examples |
|---|---|---|
| page | viewed | page.viewed |
| button | clicked | button.clicked |
| form | submitted, validated, errored | form.submitted |
| work_order | created, updated, deleted, status_changed | work_order.status_changed |
| api | request_completed, request_failed | api.request_completed |
| agent | invoked, completed, failed | agent.invoked |
| feature | used, enabled, disabled | feature.used |
| error | occurred, recovered | error.occurred |
| session | started, ended | session.started |
Versioning: Events include event_version to support schema evolution without breaking existing analytics.
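The naming convention can be checked mechanically before events are accepted. The following is an illustrative sketch of a validator for the category.action[.modifier] format, assuming lowercase snake_case segments as in the table:

```python
import re

# Matches category.action with an optional .modifier suffix,
# each segment lowercase snake_case (e.g. work_order.status_changed)
EVENT_NAME_RE = re.compile(r'^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)?$')


def is_valid_event_name(name: str) -> bool:
    """Return True if the name follows category.action[.modifier]."""
    return EVENT_NAME_RE.fullmatch(name) is not None


print(is_valid_event_name('work_order.status_changed'))  # → True
print(is_valid_event_name('WorkOrder.Created'))          # → False (wrong case)
print(is_valid_event_name('clicked'))                    # → False (missing category)
```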
J.1.2: Event Pipeline Architecture
Pipeline Overview
Events API (/api/v1/events)
Django View (analytics/views.py)
from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import status
from google.cloud import pubsub_v1
import json
import uuid
from django.conf import settings
from django.utils import timezone
class EventsAPIView(APIView):
"""
Accepts event payloads from client SDK and server middleware.
"""
authentication_classes = [] # Public endpoint with write key validation
    def __init__(self, **kwargs):
        # DRF instantiates views with initkwargs; accept and forward them
        super().__init__(**kwargs)
        self.publisher = pubsub_v1.PublisherClient()
        self.topic_path = self.publisher.topic_path(
            settings.GCP_PROJECT_ID,
            'bioqms-events'
        )
def post(self, request):
# Validate write key
write_key = request.headers.get('X-Write-Key')
if write_key != settings.ANALYTICS_WRITE_KEY:
return Response(
{'error': 'Invalid write key'},
status=status.HTTP_401_UNAUTHORIZED
)
# Support batch events
events = request.data if isinstance(request.data, list) else [request.data]
results = []
for event in events:
# Validate event schema
validation_result = self._validate_event(event)
if not validation_result['valid']:
results.append({
'event_id': event.get('event_id'),
'status': 'rejected',
'reason': validation_result['error']
})
continue
# Enrich event
enriched_event = self._enrich_event(event, request)
# Publish to Pub/Sub
try:
message_data = json.dumps(enriched_event).encode('utf-8')
future = self.publisher.publish(
self.topic_path,
message_data,
event_name=enriched_event['event_name'],
tenant_id=enriched_event.get('tenant_id', 'unknown')
)
message_id = future.result(timeout=5.0)
results.append({
'event_id': enriched_event['event_id'],
'status': 'accepted',
'message_id': message_id
})
except Exception as e:
results.append({
'event_id': enriched_event['event_id'],
'status': 'failed',
'reason': str(e)
})
return Response({
'processed': len(results),
'results': results
}, status=status.HTTP_200_OK)
def _validate_event(self, event):
required_fields = ['event_name', 'timestamp']
missing = [f for f in required_fields if f not in event]
if missing:
return {
'valid': False,
'error': f"Missing required fields: {', '.join(missing)}"
}
# Validate timestamp format
try:
from dateutil.parser import parse
parse(event['timestamp'])
except Exception:
return {
'valid': False,
'error': 'Invalid timestamp format (expected ISO 8601)'
}
return {'valid': True}
def _enrich_event(self, event, request):
# Add server-side enrichment
if 'event_id' not in event:
event['event_id'] = f"evt_{uuid.uuid4().hex}"
if 'event_version' not in event:
event['event_version'] = '1.0.0'
# Add received timestamp
event['received_at'] = timezone.now().isoformat()
# Enrich IP if missing
if 'context' not in event:
event['context'] = {}
if 'ip' not in event['context']:
event['context']['ip'] = self._get_client_ip(request)
return event
def _get_client_ip(self, request):
x_forwarded_for = request.META.get('HTTP_X_FORWARDED_FOR')
if x_forwarded_for:
return x_forwarded_for.split(',')[0].strip()
return request.META.get('REMOTE_ADDR')
Cloud Pub/Sub Configuration
Topic: bioqms-events
# terraform/pubsub.tf
resource "google_pubsub_topic" "events" {
name = "bioqms-events"
project = var.project_id
message_retention_duration = "604800s" # 7 days (Pub/Sub durations are expressed in seconds)
schema_settings {
schema = google_pubsub_schema.event_schema.id
encoding = "JSON"
}
}
resource "google_pubsub_schema" "event_schema" {
name = "bioqms-event-schema-v1"
type = "AVRO"
definition = file("${path.module}/schemas/event.avsc")
}
resource "google_pubsub_subscription" "bigquery_sink" {
name = "bioqms-events-to-bigquery"
topic = google_pubsub_topic.events.name
bigquery_config {
table = "${var.project_id}:analytics.events"
use_topic_schema = true
write_metadata = true
}
ack_deadline_seconds = 20
message_retention_duration = "604800s" # 7 days
retry_policy {
minimum_backoff = "10s"
maximum_backoff = "600s"
}
}
resource "google_pubsub_subscription" "realtime_dashboard" {
name = "bioqms-events-to-dashboard"
topic = google_pubsub_topic.events.name
push_config {
push_endpoint = "https://dashboard.bioqms.com/api/events/stream"
oidc_token {
service_account_email = google_service_account.dashboard.email
}
}
ack_deadline_seconds = 10
}
BigQuery Streaming Insert
Table Schema (analytics.events)
CREATE TABLE `bioqms-prod.analytics.events` (
-- Core identifiers
event_id STRING NOT NULL,
event_name STRING NOT NULL,
event_version STRING NOT NULL,
timestamp TIMESTAMP NOT NULL,
received_at TIMESTAMP NOT NULL,
-- Session & user identifiers
session_id STRING,
user_id STRING,
anonymous_id STRING,
tenant_id STRING NOT NULL,
-- Context (nested)
context STRUCT<
page STRUCT<
url STRING,
path STRING,
title STRING,
referrer STRING
>,
user_agent STRING,
ip STRING,
locale STRING,
timezone STRING,
device STRUCT<
type STRING,
vendor STRING,
model STRING
>,
os STRUCT<
name STRING,
version STRING
>,
browser STRUCT<
name STRING,
version STRING
>
>,
-- Event properties (flexible JSON)
properties JSON,
-- User traits (flexible JSON)
traits JSON,
-- Partitioning & clustering
_partition_time TIMESTAMP
)
PARTITION BY DATE(timestamp)
CLUSTER BY tenant_id, event_name, user_id
OPTIONS (
partition_expiration_days = 730, -- 2 years
require_partition_filter = true
);
Indices
BigQuery has no conventional secondary indexes; point lookups on tenant_id, event_name, and user_id are served by the table's clustering keys above. For ad-hoc lookups on identifier columns (e.g., session reconstruction), a search index covers the string columns:
-- Search index over identifier columns
CREATE SEARCH INDEX idx_events_identifiers
ON `bioqms-prod.analytics.events` (user_id, session_id, event_name);
Pipeline Reliability Features
| Feature | Implementation | Purpose |
|---|---|---|
| Deduplication | Event ID-based | Prevent duplicate processing |
| Retry Logic | Exponential backoff (10s to 10min) | Handle transient failures |
| Dead Letter Queue | Pub/Sub DLQ after 5 retries | Isolate problematic events |
| Schema Validation | Pub/Sub schema enforcement | Reject malformed events |
| Rate Limiting | 10,000 events/sec per tenant | Prevent abuse |
| Batch Processing | 20 events per SDK flush (flushAt) | Reduce API calls |
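The retry row above implies the delay schedule below. This is an illustrative sketch only; the actual backoff is applied by the Pub/Sub subscription's retry_policy (10s minimum, 600s maximum), with the dead letter queue taking over after five failed attempts.

```python
def backoff_schedule(attempts: int, base: float = 10.0, cap: float = 600.0) -> list:
    """Delay in seconds before each retry: exponential doubling, capped."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]


print(backoff_schedule(5))  # → [10.0, 20.0, 40.0, 80.0, 160.0]
```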
Pipeline Monitoring
Cloud Monitoring Alerts
# monitoring/alerts.yaml
alerts:
- name: High Event Ingestion Latency
condition: |
metric: pubsub.googleapis.com/subscription/oldest_unacked_message_age
threshold: 60s
duration: 5m
severity: warning
notification_channels:
- ops-pagerduty
- name: BigQuery Streaming Insert Failures
condition: |
metric: bigquery.googleapis.com/streaming_insert/failed_rows
threshold: 100
duration: 5m
severity: critical
notification_channels:
- ops-pagerduty
- engineering-slack
- name: Event Validation Failure Rate
condition: |
metric: custom.googleapis.com/analytics/event_validation_failures
threshold: 5%
duration: 10m
severity: warning
notification_channels:
- engineering-slack
Dashboard Metrics
- Events ingested per minute (by tenant, by event type)
- P50/P95/P99 ingestion latency
- Validation failure rate
- Pub/Sub message backlog
- BigQuery streaming insert success rate
J.1.3: User Identification & Session Management
Anonymous → Identified User Stitching
Flow Diagram
Anonymous ID Generation
Client SDK
// @bioqms/analytics/src/identity.js
import { v4 as uuidv4 } from 'uuid';
import Cookies from 'js-cookie';
export class IdentityManager {
  constructor(writeKey) {
    this.writeKey = writeKey; // used by _sendAliasEvent/_sendIdentifyEvent
    this.anonymousId = this._getOrCreateAnonymousId();
    this.userId = null;
  }
_getOrCreateAnonymousId() {
let anonId = Cookies.get('bioqms_anon_id');
if (!anonId) {
anonId = `anon_${uuidv4().replace(/-/g, '')}`;
Cookies.set('bioqms_anon_id', anonId, {
expires: 365, // 1 year
secure: true,
sameSite: 'Lax'
});
}
return anonId;
}
  identify(userId, traits = {}) {
    this.userId = userId;
    // Send alias event to link anonymous_id to user_id
    this._sendAliasEvent(this.anonymousId, userId);
    // Send identify event with user traits
    this._sendIdentifyEvent(userId, traits);
    return this;
  }
_sendAliasEvent(anonymousId, userId) {
fetch('https://events.bioqms.com/api/v1/events', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Write-Key': this.writeKey
},
body: JSON.stringify({
event_name: 'user.alias',
event_version: '1.0.0',
timestamp: new Date().toISOString(),
anonymous_id: anonymousId,
user_id: userId,
properties: {
previous_id: anonymousId,
new_id: userId
}
})
});
}
_sendIdentifyEvent(userId, traits) {
fetch('https://events.bioqms.com/api/v1/events', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Write-Key': this.writeKey
},
body: JSON.stringify({
event_name: 'user.identified',
event_version: '1.0.0',
timestamp: new Date().toISOString(),
user_id: userId,
traits: traits
})
});
}
reset() {
this.userId = null;
// Keep anonymous_id for cross-session tracking
}
}
User Alias Resolution
BigQuery Materialized View
CREATE MATERIALIZED VIEW `bioqms-prod.analytics.user_aliases`
AS
SELECT
anonymous_id,
user_id,
MIN(timestamp) AS first_identified_at,
MAX(timestamp) AS last_seen_at,
tenant_id
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'user.alias'
GROUP BY anonymous_id, user_id, tenant_id;
-- Backfill script to update historical events
MERGE INTO `bioqms-prod.analytics.events` AS target
USING `bioqms-prod.analytics.user_aliases` AS source
ON target.anonymous_id = source.anonymous_id
AND target.timestamp >= source.first_identified_at
AND target.user_id IS NULL
WHEN MATCHED THEN
UPDATE SET user_id = source.user_id;
Session Management
Session Definition: A session is a sequence of events from a single user within 30 minutes of inactivity.
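The inactivity rule can be stated precisely as a gap-based split: a new session starts whenever the gap between consecutive events exceeds the timeout. The SDK below applies this incrementally; as an illustrative batch sketch over epoch-second timestamps:

```python
SESSION_TIMEOUT_S = 30 * 60  # 30 minutes


def sessionize(timestamps, timeout=SESSION_TIMEOUT_S):
    """Group sorted event timestamps into sessions split at inactivity gaps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # within timeout: same session
        else:
            sessions.append([ts])     # gap exceeds timeout: new session
    return sessions


events = [0, 60, 120, 4000, 4100]   # 4000 - 120 > 1800, so the stream splits
print(len(sessionize(events)))      # → 2
```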
Session ID Generation
// @bioqms/analytics/src/session.js
import { v4 as uuidv4 } from 'uuid';

export class SessionManager {
  constructor(writeKey, identity) {
    this.writeKey = writeKey;
    this.identity = identity; // IdentityManager instance (for user/anonymous IDs)
    this.sessionId = null;
    this.sessionStartTime = null;
    this.lastActivityTime = null;
    this.sessionTimeout = 30 * 60 * 1000; // 30 minutes
    this._initSession();
  }

  _initSession() {
    const storedSession = sessionStorage.getItem('bioqms_session_id');
    const storedStart = sessionStorage.getItem('bioqms_session_start');
    const storedTime = sessionStorage.getItem('bioqms_last_activity');
    if (storedSession && storedTime) {
      const timeSinceLastActivity = Date.now() - parseInt(storedTime, 10);
      if (timeSinceLastActivity < this.sessionTimeout) {
        // Continue existing session
        this.sessionId = storedSession;
        this.sessionStartTime = storedStart ? parseInt(storedStart, 10) : Date.now();
        this.lastActivityTime = Date.now();
        return;
      }
    }
    // Start new session
    this._startNewSession();
  }

  _startNewSession() {
    this.sessionId = `ses_${uuidv4().replace(/-/g, '')}`;
    this.sessionStartTime = Date.now();
    this.lastActivityTime = this.sessionStartTime;
    sessionStorage.setItem('bioqms_session_id', this.sessionId);
    sessionStorage.setItem('bioqms_session_start', this.sessionStartTime.toString());
    sessionStorage.setItem('bioqms_last_activity', this.lastActivityTime.toString());
    // Track session start event
    this._trackSessionStart();
  }

  updateActivity() {
    const now = Date.now();
    const timeSinceLastActivity = now - this.lastActivityTime;
    if (timeSinceLastActivity > this.sessionTimeout) {
      // Session expired, start new one
      this._trackSessionEnd();
      this._startNewSession();
    } else {
      // Update activity timestamp
      this.lastActivityTime = now;
      sessionStorage.setItem('bioqms_last_activity', this.lastActivityTime.toString());
    }
  }

  _trackSessionStart() {
    // Tracked automatically by SDK
  }

  _trackSessionEnd() {
    fetch('https://events.bioqms.com/api/v1/events', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Write-Key': this.writeKey
      },
      body: JSON.stringify({
        event_name: 'session.ended',
        event_version: '1.0.0',
        timestamp: new Date().toISOString(),
        session_id: this.sessionId,
        user_id: this.identity.userId,
        anonymous_id: this.identity.anonymousId,
        properties: {
          duration_seconds: (this.lastActivityTime - this.sessionStartTime) / 1000
        }
      })
    });
  }
}
Session Reconstruction Query
-- Reconstruct sessions from events
WITH session_events AS (
SELECT
session_id,
user_id,
anonymous_id,
tenant_id,
MIN(timestamp) AS session_start,
MAX(timestamp) AS session_end,
TIMESTAMP_DIFF(MAX(timestamp), MIN(timestamp), SECOND) AS duration_seconds,
COUNT(*) AS event_count,
ARRAY_AGG(DISTINCT event_name ORDER BY event_name) AS event_types,
ARRAY_AGG(
STRUCT(timestamp, event_name, properties)
ORDER BY timestamp
LIMIT 1000
) AS events
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) = CURRENT_DATE()
GROUP BY session_id, user_id, anonymous_id, tenant_id
)
SELECT * FROM session_events
WHERE duration_seconds > 10 -- Drops single-event sessions (zero duration) and trivially short ones
ORDER BY session_start DESC;
Cross-Device Identification
Strategy: User ID serves as the universal identifier across devices.
| Scenario | Behavior |
|---|---|
| User logs in on Device A | anonymous_id_A → user_123 |
| User logs in on Device B | anonymous_id_B → user_123 |
| Cross-device query | WHERE user_id = 'user_123' returns events from both devices |
Implementation: No special handling needed. User ID linkage automatically enables cross-device analytics.
J.1.4: Privacy Controls & Compliance
Regulatory Compliance
| Regulation | Requirements | BIO-QMS Implementation |
|---|---|---|
| GDPR | Right to erasure, data portability, consent | Opt-out API, data export, banner consent |
| CCPA | Right to know, delete, opt-out | Privacy dashboard, deletion API |
| HIPAA | PHI protection (if applicable) | PII scrubbing, encryption at rest/transit |
| 21 CFR Part 11 | Audit trails, electronic signatures | Immutable event log, signature tracking |
Opt-Out Mechanism
User Opt-Out Flow
Opt-Out API
# privacy/views.py
from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework.permissions import IsAuthenticated
from google.cloud import bigquery
from django.conf import settings


class AnalyticsOptOutView(APIView):
    permission_classes = [IsAuthenticated]

    @staticmethod
    def _job_config(user_id, tenant_id):
        # Parameterized queries prevent SQL injection
        return bigquery.QueryJobConfig(query_parameters=[
            bigquery.ScalarQueryParameter('user_id', 'STRING', user_id),
            bigquery.ScalarQueryParameter('tenant_id', 'STRING', tenant_id),
        ])

    def post(self, request):
        user_id = f"usr_bio_{request.user.id}"
        tenant_id = request.tenant.external_id

        # Add to opt-out list
        client = bigquery.Client(project=settings.GCP_PROJECT_ID)
        query = """
            INSERT INTO `bioqms-prod.analytics.opt_out_users` (user_id, tenant_id, opted_out_at)
            VALUES (@user_id, @tenant_id, CURRENT_TIMESTAMP())
        """
        client.query(query, job_config=self._job_config(user_id, tenant_id)).result()

        # Trigger deletion of historical events (async job)
        from analytics.tasks import delete_user_events
        delete_user_events.delay(user_id, tenant_id)

        return Response({
            'status': 'success',
            'message': 'You have been opted out of analytics tracking. Historical data will be deleted within 48 hours.'
        })

    def delete(self, request):
        # Opt back in
        user_id = f"usr_bio_{request.user.id}"
        tenant_id = request.tenant.external_id

        client = bigquery.Client(project=settings.GCP_PROJECT_ID)
        query = """
            DELETE FROM `bioqms-prod.analytics.opt_out_users`
            WHERE user_id = @user_id AND tenant_id = @tenant_id
        """
        client.query(query, job_config=self._job_config(user_id, tenant_id)).result()

        return Response({
            'status': 'success',
            'message': 'You have been opted back in to analytics tracking.'
        })
Client SDK Opt-Out
// @bioqms/analytics/src/privacy.js
export class PrivacyManager {
optOut() {
Cookies.set('bioqms_opt_out', 'true', {
expires: 3650, // 10 years
secure: true,
sameSite: 'Strict'
});
// Stop tracking
this.analytics.enabled = false;
// Clear existing identifiers
Cookies.remove('bioqms_anon_id');
sessionStorage.removeItem('bioqms_session_id');
}
optIn() {
Cookies.remove('bioqms_opt_out');
this.analytics.enabled = true;
}
isOptedOut() {
return Cookies.get('bioqms_opt_out') === 'true';
}
}
PII Scrubbing
Server-Side Scrubbing
# analytics/pii_scrubber.py
import re
class PIIScrubber:
"""
Removes personally identifiable information from event properties.
"""
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
PHONE_PATTERN = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
CREDIT_CARD_PATTERN = re.compile(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b')
PII_FIELDS = [
'email', 'phone', 'ssn', 'credit_card', 'password',
'first_name', 'last_name', 'full_name', 'address',
'date_of_birth', 'driver_license'
]
@classmethod
def scrub_event(cls, event):
"""
Recursively scrub PII from event properties.
"""
if 'properties' in event:
event['properties'] = cls._scrub_dict(event['properties'])
if 'traits' in event:
event['traits'] = cls._scrub_dict(event['traits'])
if 'context' in event:
event['context'] = cls._scrub_dict(event['context'])
return event
@classmethod
def _scrub_dict(cls, data):
if not isinstance(data, dict):
return data
scrubbed = {}
for key, value in data.items():
# Remove known PII fields
if key.lower() in cls.PII_FIELDS:
scrubbed[key] = '[REDACTED]'
continue
# Recursively scrub nested dicts
if isinstance(value, dict):
scrubbed[key] = cls._scrub_dict(value)
elif isinstance(value, str):
scrubbed[key] = cls._scrub_string(value)
else:
scrubbed[key] = value
return scrubbed
@classmethod
def _scrub_string(cls, text):
# Redact email addresses
text = cls.EMAIL_PATTERN.sub('[EMAIL_REDACTED]', text)
# Redact phone numbers
text = cls.PHONE_PATTERN.sub('[PHONE_REDACTED]', text)
# Redact SSNs
text = cls.SSN_PATTERN.sub('[SSN_REDACTED]', text)
# Redact credit cards
text = cls.CREDIT_CARD_PATTERN.sub('[CC_REDACTED]', text)
return text
Usage in Events API
# analytics/views.py
from analytics.pii_scrubber import PIIScrubber
class EventsAPIView(APIView):
def post(self, request):
events = request.data if isinstance(request.data, list) else [request.data]
for event in events:
# Scrub PII before publishing
event = PIIScrubber.scrub_event(event)
# ... publish to Pub/Sub
Data Retention Policy
| Data Type | Retention Period | Deletion Method |
|---|---|---|
| Raw Events | 24 months | BigQuery partition expiration |
| Aggregated Metrics | 5 years | Manual archival to Cloud Storage |
| User Profiles | Account lifetime + 90 days | Hard delete on request |
| Opted-Out User Data | Deleted within 48 hours | Scheduled job |
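For illustration, the retention windows above translate into deletion cutoff dates as sketched below. In production the raw-event window is enforced by BigQuery partition expiration rather than application code, so this is a hypothetical helper, not part of the platform.

```python
from datetime import date, timedelta

# Retention windows from the policy table
RETENTION_DAYS = {
    'raw_events': 730,           # 24 months (BigQuery partition expiration)
    'aggregated_metrics': 1825,  # 5 years (archived to Cloud Storage)
}


def deletion_cutoff(data_type: str, today: date) -> date:
    """Oldest date still inside the retention window for a data type."""
    return today - timedelta(days=RETENTION_DAYS[data_type])


print(deletion_cutoff('raw_events', date(2026, 2, 16)))  # → 2024-02-17
```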
Automated Deletion Job
# analytics/tasks.py
from celery import shared_task
from google.cloud import bigquery
from django.conf import settings


@shared_task
def delete_user_events(user_id, tenant_id):
    """
    Delete all events for a user who has opted out.
    """
    client = bigquery.Client(project=settings.GCP_PROJECT_ID)
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter('user_id', 'STRING', user_id),
        bigquery.ScalarQueryParameter('tenant_id', 'STRING', tenant_id),
    ])

    # Delete from events table; the DATE(timestamp) predicate satisfies the
    # table's require_partition_filter option and prunes partitions
    query = """
        DELETE FROM `bioqms-prod.analytics.events`
        WHERE user_id = @user_id
          AND tenant_id = @tenant_id
          AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 24 MONTH)
    """
    client.query(query, job_config=job_config).result()  # Wait for completion

    # user_aliases is a materialized view over the events table, so it cannot
    # be targeted by DML; it converges once the user.alias rows above are deleted.

    # Log deletion for audit trail
    query = """
        INSERT INTO `bioqms-prod.analytics.deletion_log` (user_id, tenant_id, deleted_at, reason)
        VALUES (@user_id, @tenant_id, CURRENT_TIMESTAMP(), 'user_opt_out')
    """
    client.query(query, job_config=job_config).result()

    return {
        'user_id': user_id,
        'tenant_id': tenant_id,
        'status': 'deleted'
    }
Data Export (GDPR Right to Portability)
Export API
# privacy/views.py
class DataExportView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request):
        user_id = f"usr_bio_{request.user.id}"
        tenant_id = request.tenant.external_id

        # Trigger async export job; pass the email explicitly,
        # since the Celery task has no access to the request object
        from analytics.tasks import export_user_data
        task = export_user_data.delay(user_id, tenant_id, request.user.email)

        return Response({
            'status': 'processing',
            'job_id': task.id,
            'message': 'Your data export is being prepared. You will receive an email when ready.'
        })


# analytics/tasks.py
import json
from datetime import timedelta

from celery import shared_task
from django.conf import settings
from django.core.mail import send_mail
from django.utils import timezone
from google.cloud import bigquery, storage


@shared_task
def export_user_data(user_id, tenant_id, user_email):
    """
    Export all events and user data to JSON file, upload to Cloud Storage,
    and email download link to user.
    """
    client = bigquery.Client(project=settings.GCP_PROJECT_ID)

    # Query all user events (parameterized; the DATE filter satisfies
    # the table's require_partition_filter option)
    query = """
        SELECT *
        FROM `bioqms-prod.analytics.events`
        WHERE user_id = @user_id
          AND tenant_id = @tenant_id
          AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 24 MONTH)
        ORDER BY timestamp
    """
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter('user_id', 'STRING', user_id),
        bigquery.ScalarQueryParameter('tenant_id', 'STRING', tenant_id),
    ])
    results = client.query(query, job_config=job_config).result()

    # Convert to JSON
    events = [dict(row) for row in results]
    export_data = {
        'user_id': user_id,
        'tenant_id': tenant_id,
        'exported_at': timezone.now().isoformat(),
        'event_count': len(events),
        'events': events
    }

    # Upload to Cloud Storage
    storage_client = storage.Client()
    bucket = storage_client.bucket('bioqms-user-exports')
    filename = f"user-data-export-{user_id}-{timezone.now().strftime('%Y%m%d%H%M%S')}.json"
    blob = bucket.blob(filename)
    blob.upload_from_string(
        json.dumps(export_data, indent=2, default=str),
        content_type='application/json'
    )

    # Generate signed URL (valid for 7 days)
    download_url = blob.generate_signed_url(
        version='v4',
        expiration=timedelta(days=7),
        method='GET'
    )

    # Send email with download link
    send_mail(
        subject='Your BIO-QMS Data Export is Ready',
        message=f'Download your data export: {download_url}\n\nThis link expires in 7 days.',
        from_email='privacy@bioqms.com',
        recipient_list=[user_email]
    )

    return {
        'user_id': user_id,
        'filename': filename,
        'download_url': download_url
    }
J.1.5: Real-Time Analytics Dashboard
Dashboard Architecture
Live Event Stream
Backend (FastAPI WebSocket)
# dashboard/websocket.py
import asyncio
import json
from typing import List

from fastapi import WebSocket, WebSocketDisconnect
from google.cloud import pubsub_v1


class EventStreamManager:
    def __init__(self):
        self.active_connections: List[WebSocket] = []
        self.loop = None  # set to the server's event loop on first connection
        self.subscriber = pubsub_v1.SubscriberClient()
        self.subscription_path = self.subscriber.subscription_path(
            'bioqms-prod',
            'bioqms-events-to-dashboard'
        )

    async def connect(self, websocket: WebSocket):
        await websocket.accept()
        # Capture the running loop so the Pub/Sub callback thread can reach it
        self.loop = asyncio.get_running_loop()
        self.active_connections.append(websocket)

    def disconnect(self, websocket: WebSocket):
        if websocket in self.active_connections:
            self.active_connections.remove(websocket)

    async def broadcast(self, message: dict):
        # Iterate over a copy so disconnect() can mutate the list safely
        for connection in list(self.active_connections):
            try:
                await connection.send_json(message)
            except Exception:
                self.disconnect(connection)

    def start_listening(self):
        """
        Subscribe to Pub/Sub and broadcast events to connected clients.
        """
        def callback(message):
            event = json.loads(message.data.decode('utf-8'))
            # Filter sensitive fields
            sanitized_event = {
                'event_id': event['event_id'],
                'event_name': event['event_name'],
                'timestamp': event['timestamp'],
                'tenant_id': event['tenant_id'],
                'user_id': event.get('user_id', 'anonymous')
            }
            # The Pub/Sub callback runs on a background thread with no running
            # event loop, so hand the coroutine to the server's loop instead
            # of calling asyncio.create_task()
            if self.loop is not None:
                asyncio.run_coroutine_threadsafe(
                    self.broadcast(sanitized_event), self.loop
                )
            message.ack()

        streaming_pull_future = self.subscriber.subscribe(
            self.subscription_path,
            callback=callback
        )
        # Keep subscription alive
        try:
            streaming_pull_future.result()
        except Exception:
            streaming_pull_future.cancel()
            raise
# WebSocket endpoint
from fastapi import FastAPI
app = FastAPI()
event_manager = EventStreamManager()
@app.websocket("/ws/events")
async def websocket_endpoint(websocket: WebSocket):
await event_manager.connect(websocket)
try:
while True:
# Keep connection alive
await websocket.receive_text()
except WebSocketDisconnect:
event_manager.disconnect(websocket)
Frontend (React Component)
// components/LiveEventStream.tsx
import React, { useEffect, useState } from 'react';
import { Box, List, ListItem, Typography, Chip } from '@mui/material';
interface Event {
event_id: string;
event_name: string;
timestamp: string;
tenant_id: string;
user_id: string;
}
export const LiveEventStream: React.FC = () => {
const [events, setEvents] = useState<Event[]>([]);
const [ws, setWs] = useState<WebSocket | null>(null);
  useEffect(() => {
    let websocket: WebSocket;
    let reconnectTimer: ReturnType<typeof setTimeout>;
    const connect = () => {
      websocket = new WebSocket('wss://dashboard.bioqms.com/ws/events');
      websocket.onopen = () => {
        console.log('WebSocket connected');
      };
      websocket.onmessage = (message) => {
        const event = JSON.parse(message.data) as Event;
        setEvents(prev => [event, ...prev].slice(0, 50)); // Keep last 50
      };
      websocket.onerror = (error) => {
        console.error('WebSocket error:', error);
      };
      websocket.onclose = () => {
        console.log('WebSocket disconnected');
        // Reconnect after 5 seconds, re-attaching all handlers
        reconnectTimer = setTimeout(connect, 5000);
      };
      setWs(websocket);
    };
    connect();
    return () => {
      clearTimeout(reconnectTimer);
      websocket.close();
    };
  }, []);
const getEventColor = (eventName: string): string => {
if (eventName.startsWith('work_order')) return 'primary';
if (eventName.startsWith('agent')) return 'secondary';
if (eventName.startsWith('error')) return 'error';
return 'default';
};
return (
<Box sx={{ height: '600px', overflow: 'auto', bgcolor: '#f5f5f5', p: 2 }}>
<Typography variant="h6" gutterBottom>
Live Event Stream
</Typography>
<List>
{events.map(event => (
<ListItem key={event.event_id} sx={{ bgcolor: 'white', mb: 1, borderRadius: 1 }}>
<Box sx={{ flex: 1 }}>
<Typography variant="body2" color="textSecondary">
{new Date(event.timestamp).toLocaleTimeString()}
</Typography>
<Chip
label={event.event_name}
color={getEventColor(event.event_name) as any}
size="small"
sx={{ mt: 0.5 }}
/>
</Box>
<Typography variant="caption" color="textSecondary">
{event.tenant_id}
</Typography>
</ListItem>
))}
</List>
</Box>
);
};
Volume Monitoring
1-Minute Aggregations (Redis)
# dashboard/aggregations.py
import redis
from datetime import datetime, timedelta
import json
class VolumeAggregator:
def __init__(self):
self.redis_client = redis.Redis(
host='redis.bioqms.com',
port=6379,
db=0,
decode_responses=True
)
def increment_event(self, event_name: str, tenant_id: str):
"""
Increment event counter for current minute.
"""
minute_key = datetime.utcnow().strftime('%Y-%m-%d %H:%M')
# Global counter
self.redis_client.hincrby(f"events:{minute_key}", "total", 1)
# Per-event counter
self.redis_client.hincrby(f"events:{minute_key}", event_name, 1)
# Per-tenant counter
self.redis_client.hincrby(f"events:{minute_key}:tenant", tenant_id, 1)
# Set expiration (keep for 1 hour)
self.redis_client.expire(f"events:{minute_key}", 3600)
def get_recent_volume(self, minutes: int = 60) -> dict:
"""
Get event volume for the last N minutes.
"""
now = datetime.utcnow()
data = []
for i in range(minutes):
minute = (now - timedelta(minutes=i)).strftime('%Y-%m-%d %H:%M')
total = self.redis_client.hget(f"events:{minute}", "total") or 0
data.append({
'minute': minute,
'count': int(total)
})
return {
'data': list(reversed(data)),
'total': sum(d['count'] for d in data),
'avg_per_minute': sum(d['count'] for d in data) / minutes
}
def get_top_events(self, minutes: int = 60, limit: int = 10) -> list:
"""
Get most frequent events in the last N minutes.
"""
event_counts = {}
now = datetime.utcnow()
for i in range(minutes):
minute = (now - timedelta(minutes=i)).strftime('%Y-%m-%d %H:%M')
events = self.redis_client.hgetall(f"events:{minute}")
for event_name, count in events.items():
if event_name == 'total':
continue
event_counts[event_name] = event_counts.get(event_name, 0) + int(count)
# Sort by count and return top N
sorted_events = sorted(event_counts.items(), key=lambda x: x[1], reverse=True)
return [
{'event_name': name, 'count': count}
for name, count in sorted_events[:limit]
]
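The minute-bucketing scheme above (one Redis hash per minute, expiring after an hour) can be exercised without a live Redis instance. The sketch below is an illustrative in-memory stand-in with the same key format, useful for unit-testing dashboard logic; the class name is hypothetical and not part of the platform code:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class InMemoryVolumeAggregator:
    """In-memory stand-in mirroring VolumeAggregator's minute bucketing."""

    def __init__(self):
        # {minute_key: {"total": n, event_name: n, ...}} -- same shape as the Redis hashes
        self.buckets = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def minute_key(ts: datetime) -> str:
        # Identical key format to the Redis implementation
        return ts.strftime('%Y-%m-%d %H:%M')

    def increment_event(self, event_name: str, ts: datetime) -> None:
        key = self.minute_key(ts)
        self.buckets[key]["total"] += 1
        self.buckets[key][event_name] += 1

    def get_recent_volume(self, now: datetime, minutes: int = 60) -> dict:
        # `now` is passed explicitly to keep the logic deterministic in tests
        data = []
        for i in range(minutes):
            key = self.minute_key(now - timedelta(minutes=i))
            data.append({'minute': key, 'count': self.buckets[key]["total"]})
        return {
            'data': list(reversed(data)),
            'total': sum(d['count'] for d in data),
            'avg_per_minute': sum(d['count'] for d in data) / minutes,
        }
```

Swapping the dict for Redis hashes (plus `hincrby`/`expire`) recovers the production behaviour sketched above.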
Volume Chart Component
// components/VolumeChart.tsx
import React, { useEffect, useState } from 'react';
import { LineChart, Line, XAxis, YAxis, Tooltip, ResponsiveContainer } from 'recharts';
interface VolumeData {
minute: string;
count: number;
}
export const VolumeChart: React.FC = () => {
const [data, setData] = useState<VolumeData[]>([]);
useEffect(() => {
const fetchData = async () => {
const response = await fetch('/api/dashboard/volume?minutes=60');
const result = await response.json();
setData(result.data);
};
fetchData();
const interval = setInterval(fetchData, 10000); // Refresh every 10s
return () => clearInterval(interval);
}, []);
return (
<ResponsiveContainer width="100%" height={300}>
<LineChart data={data}>
<XAxis
dataKey="minute"
tickFormatter={(value) => new Date(value).toLocaleTimeString([], {
hour: '2-digit',
minute: '2-digit'
})}
/>
<YAxis />
<Tooltip />
<Line type="monotone" dataKey="count" stroke="#8884d8" strokeWidth={2} />
</LineChart>
</ResponsiveContainer>
);
};
Data Quality Metrics
Quality Checks
| Check | Threshold | Alert Condition |
|---|---|---|
| Schema Validation | 99% valid | < 95% |
| Duplicate Rate | < 1% | > 5% |
| Missing User ID | < 10% (logged-in events) | > 20% |
| Timestamp Drift | < 5 seconds | > 60 seconds |
| Ingestion Latency | P95 < 10s | P95 > 60s |
Quality Dashboard Query
-- Real-time quality metrics (last hour)
WITH quality_checks AS (
SELECT
-- Schema validation rate
COUNTIF(event_name IS NOT NULL AND timestamp IS NOT NULL) / COUNT(*) AS schema_valid_rate,
-- Duplicate rate (based on event_id)
(COUNT(*) - COUNT(DISTINCT event_id)) / COUNT(*) AS duplicate_rate,
-- Missing user_id rate (for events that should have user_id)
COUNTIF(
event_name NOT IN ('page.viewed', 'session.started')
AND user_id IS NULL
) / COUNTIF(event_name NOT IN ('page.viewed', 'session.started')) AS missing_user_rate,
-- Timestamp drift (server received_at vs client timestamp)
AVG(TIMESTAMP_DIFF(received_at, timestamp, SECOND)) AS avg_timestamp_drift_seconds,
-- Ingestion latency percentiles
APPROX_QUANTILES(TIMESTAMP_DIFF(received_at, timestamp, SECOND), 100)[OFFSET(50)] AS p50_latency,
APPROX_QUANTILES(TIMESTAMP_DIFF(received_at, timestamp, SECOND), 100)[OFFSET(95)] AS p95_latency,
APPROX_QUANTILES(TIMESTAMP_DIFF(received_at, timestamp, SECOND), 100)[OFFSET(99)] AS p99_latency
FROM `bioqms-prod.analytics.events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
)
SELECT
schema_valid_rate * 100 AS schema_valid_pct,
duplicate_rate * 100 AS duplicate_pct,
missing_user_rate * 100 AS missing_user_pct,
avg_timestamp_drift_seconds,
p50_latency,
p95_latency,
p99_latency,
-- Quality score (0-100)
(
schema_valid_rate * 40 +
(1 - duplicate_rate) * 20 +
(1 - missing_user_rate) * 20 +
IF(p95_latency < 10, 20, IF(p95_latency < 60, 10, 0))
) AS overall_quality_score
FROM quality_checks;
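For alerting outside BigQuery, the weighted score in the query above (40/20/20 weights plus a latency step function) can be mirrored in application code. A minimal sketch; the function name is illustrative:

```python
def quality_score(schema_valid_rate: float,
                  duplicate_rate: float,
                  missing_user_rate: float,
                  p95_latency_s: float) -> float:
    """Compute the 0-100 quality score using the dashboard query's weights."""
    # Latency contributes stepwise: 20 points under 10s, 10 under 60s, else 0
    latency_points = 20 if p95_latency_s < 10 else (10 if p95_latency_s < 60 else 0)
    return (
        schema_valid_rate * 40
        + (1 - duplicate_rate) * 20
        + (1 - missing_user_rate) * 20
        + latency_points
    )
```

A fully healthy pipeline (`quality_score(1.0, 0.0, 0.0, 5.0)`) scores 100.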
Quality Dashboard Component
// components/QualityMetrics.tsx
import React, { useEffect, useState } from 'react';
import { Box, Grid, Card, CardContent, Typography, LinearProgress } from '@mui/material';
interface QualityMetrics {
schema_valid_pct: number;
duplicate_pct: number;
missing_user_pct: number;
p95_latency: number;
overall_quality_score: number;
}
export const QualityMetrics: React.FC = () => {
const [metrics, setMetrics] = useState<QualityMetrics | null>(null);
useEffect(() => {
const fetchMetrics = async () => {
const response = await fetch('/api/dashboard/quality');
const data = await response.json();
setMetrics(data);
};
fetchMetrics();
const interval = setInterval(fetchMetrics, 30000); // Refresh every 30s
return () => clearInterval(interval);
}, []);
if (!metrics) return <div>Loading...</div>;
const getColor = (value: number, threshold: number, inverse = false) => {
const compare = inverse ? value > threshold : value < threshold;
return compare ? 'error' : 'success';
};
return (
<Box>
<Typography variant="h6" gutterBottom>
Data Quality Metrics (Last Hour)
</Typography>
<Grid container spacing={2}>
<Grid item xs={12} md={6}>
<Card>
<CardContent>
<Typography variant="subtitle2">Schema Validation</Typography>
<Typography variant="h4">{metrics.schema_valid_pct.toFixed(1)}%</Typography>
<LinearProgress
variant="determinate"
value={metrics.schema_valid_pct}
color={getColor(metrics.schema_valid_pct, 95)}
/>
</CardContent>
</Card>
</Grid>
<Grid item xs={12} md={6}>
<Card>
<CardContent>
<Typography variant="subtitle2">Duplicate Rate</Typography>
<Typography variant="h4">{metrics.duplicate_pct.toFixed(2)}%</Typography>
<LinearProgress
variant="determinate"
value={metrics.duplicate_pct}
color={getColor(metrics.duplicate_pct, 5, true)}
/>
</CardContent>
</Card>
</Grid>
<Grid item xs={12} md={6}>
<Card>
<CardContent>
<Typography variant="subtitle2">P95 Ingestion Latency</Typography>
<Typography variant="h4">{metrics.p95_latency.toFixed(1)}s</Typography>
<LinearProgress
variant="determinate"
value={(metrics.p95_latency / 60) * 100}
color={getColor(metrics.p95_latency, 10, true)}
/>
</CardContent>
</Card>
</Grid>
<Grid item xs={12} md={6}>
<Card>
<CardContent>
<Typography variant="subtitle2">Overall Quality Score</Typography>
<Typography variant="h4">{metrics.overall_quality_score.toFixed(0)}/100</Typography>
<LinearProgress
variant="determinate"
value={metrics.overall_quality_score}
color={getColor(metrics.overall_quality_score, 80)}
/>
</CardContent>
</Card>
</Grid>
</Grid>
</Box>
);
};
J.2: Feature Analytics
J.2.1: Feature Adoption Tracking
Feature Flag Integration
LaunchDarkly → Analytics Bridge
# features/analytics_bridge.py
from typing import Any

from django.utils import timezone

from analytics.client import analytics_client

class FeatureFlagAnalytics:
    """
    Tracks feature flag exposure and usage.
    """
    @staticmethod
    def track_feature_exposure(user_id: str, tenant_id: str, flag_key: str, variation: Any):
"""
Called automatically when a feature flag is evaluated.
"""
analytics_client.track({
'event_name': 'feature.exposed',
'event_version': '1.0.0',
'user_id': user_id,
'tenant_id': tenant_id,
'properties': {
'feature_key': flag_key,
'variation': variation,
'timestamp': timezone.now().isoformat()
}
})
@staticmethod
def track_feature_usage(user_id: str, tenant_id: str, feature_key: str, action: str):
"""
Called when a user actively uses a feature.
"""
analytics_client.track({
'event_name': 'feature.used',
'event_version': '1.0.0',
'user_id': user_id,
'tenant_id': tenant_id,
'properties': {
'feature_key': feature_key,
'action': action,
'timestamp': timezone.now().isoformat()
}
})
# Configure LaunchDarkly hook
from ldclient.config import Config
from ldclient.hook import Hook
class AnalyticsHook(Hook):
def after_evaluation(self, series_context, data):
FeatureFlagAnalytics.track_feature_exposure(
user_id=series_context.context.get_key(),
tenant_id=series_context.context.get_value('tenant_id'),
flag_key=series_context.key,
variation=data.value
)
# Apply hook
config = Config(
sdk_key="sdk-key-123",
hooks=[AnalyticsHook()]
)
Adoption Funnel Stages
Stage Definitions
| Stage | Criteria | Typical Conversion |
|---|---|---|
| Exposed | Feature flag evaluated to true | 100% (baseline) |
| Tried | First feature.used event within 7 days | 40-60% |
| Adopted | 3+ uses within 30 days | 15-30% |
| Power User | 10+ uses within 30 days | 5-15% |
| Champion | Daily use (20+ days/month) | 2-5% |
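The stage criteria above map directly to a top-down classifier over a user's precomputed usage counts, mirroring the CASE expression in the funnel SQL. A minimal sketch (function name illustrative):

```python
def adoption_stage(total_uses: int, days_used: int) -> str:
    """Classify a user into a funnel stage; rules are checked top-down."""
    if days_used >= 20:
        return 'Champion'      # daily use (20+ days/month)
    if total_uses >= 10:
        return 'Power User'    # 10+ uses within the window
    if total_uses >= 3:
        return 'Adopted'
    if total_uses >= 1:
        return 'Tried'
    return 'Exposed Only'      # flag evaluated, never used
```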
Feature Adoption Query
-- Feature adoption funnel by cohort
WITH feature_users AS (
SELECT DISTINCT
user_id,
tenant_id,
DATE(MIN(timestamp)) OVER (PARTITION BY user_id) AS first_exposure_date
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'feature.exposed'
AND JSON_EXTRACT_SCALAR(properties, '$.feature_key') = 'ai_copilot'
),
usage_counts AS (
SELECT
user_id,
tenant_id,
COUNT(*) AS total_uses,
COUNT(DISTINCT DATE(timestamp)) AS days_used,
MIN(timestamp) AS first_use,
MAX(timestamp) AS last_use
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'feature.used'
AND JSON_EXTRACT_SCALAR(properties, '$.feature_key') = 'ai_copilot'
GROUP BY user_id, tenant_id
),
funnel_stages AS (
SELECT
fu.user_id,
fu.tenant_id,
fu.first_exposure_date,
uc.total_uses,
uc.days_used,
uc.first_use,
-- Stage classification
CASE
WHEN uc.days_used >= 20 THEN 'Champion'
WHEN uc.total_uses >= 10 THEN 'Power User'
WHEN uc.total_uses >= 3 THEN 'Adopted'
WHEN uc.total_uses >= 1 THEN 'Tried'
ELSE 'Exposed Only'
END AS adoption_stage,
-- Time to adoption (first_exposure_date is a DATE, so compare on dates)
DATE_DIFF(DATE(uc.first_use), fu.first_exposure_date, DAY) AS days_to_first_use
FROM feature_users fu
LEFT JOIN usage_counts uc ON fu.user_id = uc.user_id AND fu.tenant_id = uc.tenant_id
)
SELECT
adoption_stage,
COUNT(*) AS user_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage,
AVG(days_to_first_use) AS avg_days_to_first_use,
AVG(total_uses) AS avg_total_uses
FROM funnel_stages
GROUP BY adoption_stage
ORDER BY
CASE adoption_stage
WHEN 'Champion' THEN 1
WHEN 'Power User' THEN 2
WHEN 'Adopted' THEN 3
WHEN 'Tried' THEN 4
WHEN 'Exposed Only' THEN 5
END;
Example Output
| adoption_stage | user_count | percentage | avg_days_to_first_use | avg_total_uses |
|---|---|---|---|---|
| Champion | 12 | 2.4% | 1.3 | 87.5 |
| Power User | 31 | 6.2% | 2.1 | 15.2 |
| Adopted | 89 | 17.8% | 3.7 | 5.1 |
| Tried | 147 | 29.4% | 5.2 | 1.8 |
| Exposed Only | 221 | 44.2% | NULL | NULL |
DAU/WAU/MAU per Feature
-- Daily/Weekly/Monthly Active Users for each feature
WITH feature_usage AS (
SELECT
JSON_EXTRACT_SCALAR(properties, '$.feature_key') AS feature_key,
user_id,
tenant_id,
DATE(timestamp) AS usage_date
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'feature.used'
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
SELECT
feature_key,
-- DAU (today)
COUNT(DISTINCT CASE WHEN usage_date = CURRENT_DATE() THEN user_id END) AS dau,
-- WAU (last 7 days)
COUNT(DISTINCT CASE WHEN usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) THEN user_id END) AS wau,
-- MAU (last 30 days)
COUNT(DISTINCT user_id) AS mau,
-- Stickiness ratio (DAU/MAU)
ROUND(
COUNT(DISTINCT CASE WHEN usage_date = CURRENT_DATE() THEN user_id END) * 100.0 /
NULLIF(COUNT(DISTINCT user_id), 0),
2
) AS stickiness_ratio
FROM feature_usage
GROUP BY feature_key
ORDER BY mau DESC;
Example Output
| feature_key | dau | wau | mau | stickiness_ratio |
|---|---|---|---|---|
| work_order_templates | 87 | 312 | 1,205 | 7.22% |
| ai_copilot | 45 | 189 | 500 | 9.00% |
| batch_processing | 23 | 98 | 287 | 8.01% |
| electronic_signatures | 156 | 678 | 1,890 | 8.25% |
J.2.2: Cohort Analysis
Cohort Definition
Cohort: Group of users who share a common characteristic (e.g., signup month, subscription tier, first feature used).
Cohort Types
| Type | Definition | Use Case |
|---|---|---|
| Time-based | Users who signed up in same period | Retention analysis |
| Feature-based | Users who first used specific feature | Feature-driven growth |
| Behavioral | Users with similar usage patterns | Personalization |
| Firmographic | Users from same industry/size | Segment targeting |
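Time-based cohort assignment reduces to truncating each user's first-seen date to its month, the `DATE_TRUNC(..., MONTH)` used throughout this section. A minimal sketch with illustrative names:

```python
from collections import defaultdict
from datetime import date


def cohort_month(first_seen: date) -> date:
    """Truncate a signup date to the first of its month (DATE_TRUNC equivalent)."""
    return first_seen.replace(day=1)


def build_cohorts(first_seen_by_user: dict) -> dict:
    """Group user IDs into {cohort_month: {user_ids}} buckets."""
    cohorts = defaultdict(set)
    for user_id, first_seen in first_seen_by_user.items():
        cohorts[cohort_month(first_seen)].add(user_id)
    return cohorts
```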
Retention Curve by Cohort
Query: Monthly Retention
-- Cohort retention analysis (monthly)
WITH user_cohorts AS (
SELECT
user_id,
tenant_id,
DATE_TRUNC(MIN(DATE(timestamp)), MONTH) AS cohort_month
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'user.identified'
GROUP BY user_id, tenant_id
),
monthly_activity AS (
SELECT
user_id,
tenant_id,
DATE_TRUNC(DATE(timestamp), MONTH) AS activity_month
FROM `bioqms-prod.analytics.events`
WHERE event_name IN ('page.viewed', 'work_order.created', 'feature.used')
GROUP BY user_id, tenant_id, activity_month
),
cohort_activity AS (
SELECT
uc.cohort_month,
ma.activity_month,
DATE_DIFF(ma.activity_month, uc.cohort_month, MONTH) AS months_since_signup,
COUNT(DISTINCT uc.user_id) AS active_users
FROM user_cohorts uc
LEFT JOIN monthly_activity ma ON uc.user_id = ma.user_id AND uc.tenant_id = ma.tenant_id
GROUP BY cohort_month, activity_month, months_since_signup
),
cohort_sizes AS (
SELECT
cohort_month,
COUNT(DISTINCT user_id) AS cohort_size
FROM user_cohorts
GROUP BY cohort_month
)
SELECT
ca.cohort_month,
cs.cohort_size,
ca.months_since_signup,
ca.active_users,
ROUND(ca.active_users * 100.0 / cs.cohort_size, 2) AS retention_rate
FROM cohort_activity ca
JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
WHERE ca.months_since_signup IS NOT NULL
ORDER BY ca.cohort_month, ca.months_since_signup;
Example Output (Retention Table)
| cohort_month | cohort_size | Month 0 | Month 1 | Month 2 | Month 3 | Month 6 | Month 12 |
|---|---|---|---|---|---|---|---|
| 2025-01 | 87 | 100% | 78% | 65% | 58% | 42% | 35% |
| 2025-02 | 105 | 100% | 81% | 69% | 61% | 47% | - |
| 2025-03 | 132 | 100% | 83% | 71% | 64% | - | - |
| 2025-04 | 156 | 100% | 85% | 74% | - | - | - |
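Each cell in the retention table divides the cohort's users active in month N by the cohort size. A pure-Python sketch of that calculation, assuming per-user activity dates are already resolved (names illustrative):

```python
from datetime import date


def months_between(later: date, earlier: date) -> int:
    """Calendar-month difference, matching DATE_DIFF(..., MONTH)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def retention_rates(cohort_users: set,
                    activity_dates: dict,
                    cohort_start: date,
                    horizon: int) -> list:
    """Percent of the cohort active in each month 0..horizon after signup."""
    rates = []
    for m in range(horizon + 1):
        active = sum(
            1 for u in cohort_users
            if any(months_between(d, cohort_start) == m
                   for d in activity_dates.get(u, set()))
        )
        rates.append(round(active * 100.0 / len(cohort_users), 2))
    return rates
```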
Retention Curve Visualization
// components/RetentionCurve.tsx
import React from 'react';
import { LineChart, Line, XAxis, YAxis, Tooltip, Legend } from 'recharts';
interface RetentionData {
cohort_month: string;
retention_by_month: number[];
}
export const RetentionCurve: React.FC<{ data: RetentionData[] }> = ({ data }) => {
  if (data.length === 0) return null;
  // Transform data for charting: one point per month, one series per cohort
  const chartData = data[0].retention_by_month.map((_, monthIndex) => {
    const point: Record<string, string | number> = {
      month: `Month ${monthIndex}`
    };
    data.forEach(cohort => {
      point[cohort.cohort_month] = cohort.retention_by_month[monthIndex];
    });
    return point;
  });
return (
<LineChart width={800} height={400} data={chartData}>
<XAxis dataKey="month" />
<YAxis tickFormatter={(value) => `${value}%`} />
<Tooltip formatter={(value) => `${value}%`} />
<Legend />
{data.map((cohort, index) => (
<Line
key={cohort.cohort_month}
type="monotone"
dataKey={cohort.cohort_month}
stroke={`hsl(${index * 60}, 70%, 50%)`}
strokeWidth={2}
/>
))}
</LineChart>
);
};
Feature Adoption by Cohort
Query: AI Copilot Adoption by Signup Cohort
-- AI Copilot adoption rate by user cohort
WITH user_cohorts AS (
SELECT
user_id,
tenant_id,
DATE_TRUNC(MIN(DATE(timestamp)), MONTH) AS cohort_month
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'user.identified'
GROUP BY user_id, tenant_id
),
copilot_adopters AS (
SELECT DISTINCT
user_id,
tenant_id
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'feature.used'
AND JSON_EXTRACT_SCALAR(properties, '$.feature_key') = 'ai_copilot'
),
cohort_sizes AS (
SELECT
cohort_month,
COUNT(DISTINCT user_id) AS total_users
FROM user_cohorts
GROUP BY cohort_month
),
cohort_adoption AS (
SELECT
uc.cohort_month,
COUNT(DISTINCT ca.user_id) AS adopters
FROM user_cohorts uc
LEFT JOIN copilot_adopters ca ON uc.user_id = ca.user_id AND uc.tenant_id = ca.tenant_id
GROUP BY uc.cohort_month
)
SELECT
cs.cohort_month,
cs.total_users,
ca.adopters,
ROUND(ca.adopters * 100.0 / cs.total_users, 2) AS adoption_rate
FROM cohort_sizes cs
JOIN cohort_adoption ca ON cs.cohort_month = ca.cohort_month
ORDER BY cs.cohort_month;
Example Output
| cohort_month | total_users | adopters | adoption_rate |
|---|---|---|---|
| 2025-01 | 87 | 18 | 20.69% |
| 2025-02 | 105 | 34 | 32.38% |
| 2025-03 | 132 | 51 | 38.64% |
| 2025-04 | 156 | 67 | 42.95% |
Insight: Later cohorts adopt AI Copilot faster, indicating improved onboarding or stronger product-market fit.
Revenue Expansion by Cohort
Query: Upgrade Rate by Cohort
-- Subscription tier upgrades by cohort
WITH user_cohorts AS (
SELECT
user_id,
tenant_id,
DATE_TRUNC(MIN(DATE(timestamp)), MONTH) AS cohort_month,
-- Earliest traits payload determines the starting tier
JSON_EXTRACT_SCALAR(ARRAY_AGG(traits ORDER BY timestamp LIMIT 1)[OFFSET(0)], '$.subscription_tier') AS initial_tier
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'user.identified'
GROUP BY user_id, tenant_id
),
tier_changes AS (
SELECT
user_id,
tenant_id,
JSON_EXTRACT_SCALAR(properties, '$.previous_tier') AS from_tier,
JSON_EXTRACT_SCALAR(properties, '$.new_tier') AS to_tier,
timestamp AS upgrade_date
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'subscription.tier_changed'
),
cohort_expansions AS (
SELECT
uc.cohort_month,
uc.initial_tier,
COUNT(DISTINCT uc.user_id) AS cohort_size,
COUNT(DISTINCT tc.user_id) AS expanded_users,
SUM(
CASE
WHEN tc.from_tier = 'starter' AND tc.to_tier = 'professional' THEN 1
WHEN tc.from_tier = 'professional' AND tc.to_tier = 'enterprise' THEN 1
ELSE 0
END
) AS upgrade_count
FROM user_cohorts uc
LEFT JOIN tier_changes tc ON uc.user_id = tc.user_id AND uc.tenant_id = tc.tenant_id
GROUP BY uc.cohort_month, uc.initial_tier
)
SELECT
cohort_month,
initial_tier,
cohort_size,
expanded_users,
ROUND(expanded_users * 100.0 / cohort_size, 2) AS expansion_rate,
upgrade_count
FROM cohort_expansions
ORDER BY cohort_month, initial_tier;
Example Output
| cohort_month | initial_tier | cohort_size | expanded_users | expansion_rate | upgrade_count |
|---|---|---|---|---|---|
| 2025-01 | starter | 52 | 12 | 23.08% | 12 |
| 2025-01 | professional | 35 | 5 | 14.29% | 5 |
| 2025-02 | starter | 67 | 18 | 26.87% | 18 |
| 2025-02 | professional | 38 | 7 | 18.42% | 7 |
J.2.3: User Segmentation
Behavioral Segments
RFM Model (Recency, Frequency, Monetary)
-- User segmentation using RFM model
WITH user_activity AS (
SELECT
user_id,
tenant_id,
MAX(DATE(timestamp)) AS last_activity_date,
DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) AS days_since_last_activity,
COUNT(DISTINCT DATE(timestamp)) AS activity_days,
COUNT(*) AS total_events
FROM `bioqms-prod.analytics.events`
WHERE event_name IN ('page.viewed', 'work_order.created', 'feature.used')
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY user_id, tenant_id
),
user_revenue AS (
SELECT
user_id,
tenant_id,
SUM(CAST(JSON_EXTRACT_SCALAR(properties, '$.amount') AS FLOAT64)) AS total_revenue
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'subscription.payment_received'
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY user_id, tenant_id
),
rfm_scores AS (
SELECT
ua.user_id,
ua.tenant_id,
ua.days_since_last_activity,
ua.activity_days,
ua.total_events,
COALESCE(ur.total_revenue, 0) AS total_revenue,
-- Recency score (1-5, lower is better)
CASE
WHEN ua.days_since_last_activity <= 7 THEN 5
WHEN ua.days_since_last_activity <= 14 THEN 4
WHEN ua.days_since_last_activity <= 30 THEN 3
WHEN ua.days_since_last_activity <= 60 THEN 2
ELSE 1
END AS recency_score,
-- Frequency score (1-5, higher is better)
CASE
WHEN ua.activity_days >= 60 THEN 5
WHEN ua.activity_days >= 30 THEN 4
WHEN ua.activity_days >= 15 THEN 3
WHEN ua.activity_days >= 7 THEN 2
ELSE 1
END AS frequency_score,
-- Monetary score (1-5, higher is better)
CASE
WHEN ur.total_revenue >= 10000 THEN 5
WHEN ur.total_revenue >= 5000 THEN 4
WHEN ur.total_revenue >= 1000 THEN 3
WHEN ur.total_revenue >= 500 THEN 2
WHEN ur.total_revenue > 0 THEN 1
ELSE 0
END AS monetary_score
FROM user_activity ua
LEFT JOIN user_revenue ur ON ua.user_id = ur.user_id AND ua.tenant_id = ur.tenant_id
),
user_segments AS (
SELECT
*,
CASE
WHEN recency_score >= 4 AND frequency_score >= 4 THEN 'Power User'
WHEN recency_score >= 3 AND frequency_score >= 2 THEN 'Casual User'
WHEN recency_score <= 2 AND frequency_score >= 3 THEN 'Dormant User'
WHEN recency_score <= 2 AND frequency_score <= 2 THEN 'Churning User'
ELSE 'Occasional User'
END AS behavioral_segment
FROM rfm_scores
)
SELECT
behavioral_segment,
COUNT(*) AS user_count,
AVG(days_since_last_activity) AS avg_days_since_last_activity,
AVG(activity_days) AS avg_activity_days,
AVG(total_events) AS avg_total_events,
SUM(total_revenue) AS segment_revenue
FROM user_segments
GROUP BY behavioral_segment
ORDER BY user_count DESC;
Example Output
| behavioral_segment | user_count | avg_days_since_last_activity | avg_activity_days | avg_total_events | segment_revenue |
|---|---|---|---|---|---|
| Casual User | 342 | 8.2 | 18.5 | 127.3 | $45,230 |
| Power User | 78 | 2.1 | 61.7 | 1,542.8 | $98,450 |
| Occasional User | 189 | 21.5 | 9.2 | 43.1 | $12,890 |
| Dormant User | 67 | 45.3 | 27.4 | 189.7 | $8,120 |
| Churning User | 124 | 58.9 | 5.1 | 28.9 | $3,450 |
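The recency/frequency scoring bands and segment rules from the query above can be mirrored in application code (the monetary band works analogously). A sketch with illustrative function names:

```python
def recency_score(days_since_last: int) -> int:
    """5 = very recent, 1 = inactive for 60+ days."""
    if days_since_last <= 7: return 5
    if days_since_last <= 14: return 4
    if days_since_last <= 30: return 3
    if days_since_last <= 60: return 2
    return 1


def frequency_score(activity_days: int) -> int:
    """5 = active on 60+ of the last 90 days."""
    if activity_days >= 60: return 5
    if activity_days >= 30: return 4
    if activity_days >= 15: return 3
    if activity_days >= 7: return 2
    return 1


def behavioral_segment(recency: int, frequency: int) -> str:
    """Same rule order as the SQL CASE expression."""
    if recency >= 4 and frequency >= 4: return 'Power User'
    if recency >= 3 and frequency >= 2: return 'Casual User'
    if recency <= 2 and frequency >= 3: return 'Dormant User'
    if recency <= 2 and frequency <= 2: return 'Churning User'
    return 'Occasional User'
```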
Role-Based Segmentation
-- Feature usage by user role
WITH user_roles AS (
SELECT DISTINCT
user_id,
tenant_id,
JSON_EXTRACT_SCALAR(traits, '$.role') AS role
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'user.identified'
AND JSON_EXTRACT_SCALAR(traits, '$.role') IS NOT NULL
),
feature_usage_by_role AS (
SELECT
ur.role,
JSON_EXTRACT_SCALAR(e.properties, '$.feature_key') AS feature_key,
COUNT(DISTINCT e.user_id) AS unique_users,
COUNT(*) AS total_uses
FROM user_roles ur
JOIN `bioqms-prod.analytics.events` e ON ur.user_id = e.user_id AND ur.tenant_id = e.tenant_id
WHERE e.event_name = 'feature.used'
AND DATE(e.timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY ur.role, feature_key
)
SELECT
role,
feature_key,
unique_users,
total_uses,
ROUND(total_uses * 1.0 / unique_users, 2) AS avg_uses_per_user
FROM feature_usage_by_role
ORDER BY role, total_uses DESC;
Example Output
| role | feature_key | unique_users | total_uses | avg_uses_per_user |
|---|---|---|---|---|
| operator | work_order_templates | 87 | 1,234 | 14.18 |
| operator | electronic_signatures | 156 | 3,456 | 22.15 |
| operator | ai_copilot | 45 | 289 | 6.42 |
| qa_reviewer | batch_processing | 23 | 187 | 8.13 |
| qa_reviewer | electronic_signatures | 34 | 892 | 26.24 |
| manager | analytics_dashboard | 12 | 567 | 47.25 |
| manager | audit_trail | 18 | 234 | 13.00 |
Insight: QA Reviewers use electronic signatures most frequently (26x/user), while Managers heavily rely on analytics dashboards (47x/user).
Tier-Based Segmentation
-- Feature usage and retention by subscription tier
WITH tier_users AS (
SELECT DISTINCT
user_id,
tenant_id,
JSON_EXTRACT_SCALAR(traits, '$.subscription_tier') AS tier
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'user.identified'
),
tier_activity AS (
  SELECT
    tu.tier,
    tu.user_id,
    tu.tenant_id,
    -- Per-user tenure and activity within the 90-day window
    -- (aggregates cannot be nested, so averaging happens in the outer query)
    DATE_DIFF(DATE(MAX(e.timestamp)), DATE(MIN(e.timestamp)), DAY) AS tenure_days,
    COUNT(DISTINCT DATE(e.timestamp)) AS active_days
  FROM tier_users tu
  JOIN `bioqms-prod.analytics.events` e ON tu.user_id = e.user_id AND tu.tenant_id = e.tenant_id
  WHERE DATE(e.timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
  GROUP BY tu.tier, tu.user_id, tu.tenant_id
)
SELECT
  tier,
  COUNT(*) AS user_count,
  AVG(tenure_days) AS avg_tenure_days,
  AVG(active_days) AS avg_active_days
FROM tier_activity
GROUP BY tier
ORDER BY
CASE tier
WHEN 'enterprise' THEN 1
WHEN 'professional' THEN 2
WHEN 'starter' THEN 3
END;
Example Output
| tier | user_count | avg_tenure_days | avg_active_days |
|---|---|---|---|
| enterprise | 34 | 127.5 | 67.3 |
| professional | 189 | 98.2 | 41.8 |
| starter | 287 | 45.7 | 18.2 |
J.2.4: Product KPI Dashboard
Core Product Metrics
DAU/WAU/MAU Calculation
-- Daily, Weekly, and Monthly Active Users
WITH active_users AS (
SELECT
DATE(timestamp) AS activity_date,
user_id,
tenant_id
FROM `bioqms-prod.analytics.events`
WHERE event_name IN ('page.viewed', 'work_order.created', 'feature.used', 'api.request_completed')
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY activity_date, user_id, tenant_id
)
SELECT
-- DAU (today)
COUNT(DISTINCT CASE WHEN activity_date = CURRENT_DATE() THEN user_id END) AS dau,
-- WAU (last 7 days)
COUNT(DISTINCT CASE WHEN activity_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) THEN user_id END) AS wau,
-- MAU (last 30 days)
COUNT(DISTINCT user_id) AS mau,
-- DAU/MAU ratio (stickiness)
ROUND(
COUNT(DISTINCT CASE WHEN activity_date = CURRENT_DATE() THEN user_id END) * 100.0 /
NULLIF(COUNT(DISTINCT user_id), 0),
2
) AS stickiness_ratio,
-- WAU/MAU ratio
ROUND(
COUNT(DISTINCT CASE WHEN activity_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) THEN user_id END) * 100.0 /
NULLIF(COUNT(DISTINCT user_id), 0),
2
) AS engagement_ratio
FROM active_users;
Example Output
| dau | wau | mau | stickiness_ratio | engagement_ratio |
|---|---|---|---|---|
| 487 | 1,823 | 3,456 | 14.09% | 52.77% |
Benchmark: Good SaaS products aim for 20%+ DAU/MAU (daily stickiness) and 60%+ WAU/MAU (weekly engagement).
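The stickiness and engagement ratios reduce to a guarded percentage; applied to the DAU and MAU figures above, DAU/MAU reproduces the table's 14.09%. An illustrative helper:

```python
def stickiness(active_users: int, mau: int) -> float:
    """Active-user share of MAU as a percentage.

    Returns 0.0 when MAU is zero, matching the SQL NULLIF guard.
    Pass DAU for the stickiness ratio, WAU for the engagement ratio.
    """
    return round(active_users * 100.0 / mau, 2) if mau else 0.0
```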
Feature Adoption Matrix
2x2 Matrix: Usage Frequency vs. User Penetration
-- Feature adoption matrix
WITH feature_usage AS (
SELECT
JSON_EXTRACT_SCALAR(properties, '$.feature_key') AS feature_key,
user_id,
COUNT(*) AS use_count
FROM `bioqms-prod.analytics.events`
WHERE event_name = 'feature.used'
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY feature_key, user_id
),
total_users AS (
SELECT COUNT(DISTINCT user_id) AS total
FROM `bioqms-prod.analytics.events`
WHERE event_name IN ('page.viewed', 'feature.used')
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
),
feature_metrics AS (
SELECT
feature_key,
COUNT(DISTINCT user_id) AS users_using,
AVG(use_count) AS avg_uses_per_user,
SUM(use_count) AS total_uses
FROM feature_usage
GROUP BY feature_key
)
SELECT
fm.feature_key,
fm.users_using,
ROUND(fm.users_using * 100.0 / tu.total, 2) AS penetration_pct,
ROUND(fm.avg_uses_per_user, 2) AS avg_frequency,
fm.total_uses,
-- Quadrant classification
CASE
WHEN fm.users_using * 100.0 / tu.total >= 50 AND fm.avg_uses_per_user >= 10 THEN 'Core Feature'
WHEN fm.users_using * 100.0 / tu.total >= 50 AND fm.avg_uses_per_user < 10 THEN 'Broad but Shallow'
WHEN fm.users_using * 100.0 / tu.total < 50 AND fm.avg_uses_per_user >= 10 THEN 'Niche Power Feature'
ELSE 'Low Engagement'
END AS quadrant
FROM feature_metrics fm
CROSS JOIN total_users tu
ORDER BY fm.total_uses DESC;
Example Output
| feature_key | users_using | penetration_pct | avg_frequency | total_uses | quadrant |
|---|---|---|---|---|---|
| electronic_signatures | 1,890 | 54.69% | 22.15 | 41,864 | Core Feature |
| work_order_templates | 1,205 | 34.87% | 14.18 | 17,087 | Niche Power Feature |
| ai_copilot | 500 | 14.47% | 18.56 | 9,280 | Niche Power Feature |
| batch_processing | 287 | 8.31% | 8.13 | 2,333 | Low Engagement |
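The quadrant CASE expression in the matrix query can be mirrored as a standalone classifier, e.g. for labelling features in the React component below it. The function name is illustrative:

```python
def adoption_quadrant(penetration_pct: float, avg_uses_per_user: float) -> str:
    """2x2 classification: >=50% penetration x >=10 uses/user, as in the SQL."""
    if penetration_pct >= 50 and avg_uses_per_user >= 10:
        return 'Core Feature'
    if penetration_pct >= 50:
        return 'Broad but Shallow'
    if avg_uses_per_user >= 10:
        return 'Niche Power Feature'
    return 'Low Engagement'
```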
Visualization
// components/FeatureAdoptionMatrix.tsx
import React from 'react';
import { ScatterChart, Scatter, XAxis, YAxis, Tooltip, Cell, ReferenceLine } from 'recharts';

interface FeatureMetric {
  feature_key: string;
  penetration_pct: number;
  avg_frequency: number;
  quadrant: string;
}

export const FeatureAdoptionMatrix: React.FC<{ data: FeatureMetric[] }> = ({ data }) => {
  const getColor = (quadrant: string): string => {
    switch (quadrant) {
      case 'Core Feature': return '#4caf50';
      case 'Broad but Shallow': return '#ff9800';
      case 'Niche Power Feature': return '#2196f3';
      case 'Low Engagement': return '#f44336';
      default: return '#757575';
    }
  };
  return (
    <ScatterChart width={800} height={600}>
      <XAxis
        type="number"
        dataKey="penetration_pct"
        name="User Penetration (%)"
        domain={[0, 100]}
        label={{ value: 'User Penetration (%)', position: 'insideBottom', offset: -5 }}
      />
      <YAxis
        type="number"
        dataKey="avg_frequency"
        name="Avg Uses per User"
        label={{ value: 'Average Frequency', angle: -90, position: 'insideLeft' }}
      />
      <Tooltip cursor={{ strokeDasharray: '3 3' }} />
      {/* Quadrant boundaries (50% penetration, 10 uses/user) in chart coordinates */}
      <ReferenceLine x={50} stroke="#ccc" strokeDasharray="5 5" />
      <ReferenceLine y={10} stroke="#ccc" strokeDasharray="5 5" />
      <Scatter name="Features" data={data}>
        {data.map((entry, index) => (
          <Cell key={`cell-${index}`} fill={getColor(entry.quadrant)} />
        ))}
      </Scatter>
    </ScatterChart>
  );
};
Stickiness Ratio Tracking
Definition: DAU / MAU (percentage of monthly users who are active daily)
-- Daily stickiness ratio over time
WITH daily_metrics AS (
SELECT
DATE(timestamp) AS activity_date,
COUNT(DISTINCT user_id) AS dau
FROM `bioqms-prod.analytics.events`
WHERE event_name IN ('page.viewed', 'work_order.created', 'feature.used')
AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY activity_date
),
monthly_mau AS (
  -- Rolling 30-day distinct users via a range self-join: BigQuery does not
  -- allow COUNT(DISTINCT ...) over an ordered window frame, and user_id is
  -- no longer available after grouping by activity_date.
  SELECT
    dm.activity_date,
    COUNT(DISTINCT e.user_id) AS mau_rolling_30d
  FROM daily_metrics dm
  JOIN `bioqms-prod.analytics.events` e
    ON DATE(e.timestamp) BETWEEN DATE_SUB(dm.activity_date, INTERVAL 29 DAY)
                             AND dm.activity_date
  WHERE e.event_name IN ('page.viewed', 'work_order.created', 'feature.used')
    -- scan 29 extra days so the earliest dates still see a full 30-day lookback
    AND DATE(e.timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 119 DAY)
  GROUP BY dm.activity_date
)
SELECT
dm.activity_date,
dm.dau,
mm.mau_rolling_30d AS mau,
ROUND(dm.dau * 100.0 / NULLIF(mm.mau_rolling_30d, 0), 2) AS stickiness_ratio
FROM daily_metrics dm
JOIN monthly_mau mm ON dm.activity_date = mm.activity_date
ORDER BY dm.activity_date;
Example Output (Last 7 Days)
| activity_date | dau | mau | stickiness_ratio |
|---|---|---|---|
| 2026-02-10 | 412 | 3,234 | 12.74% |
| 2026-02-11 | 456 | 3,287 | 13.87% |
| 2026-02-12 | 489 | 3,312 | 14.76% |
| 2026-02-13 | 501 | 3,345 | 14.98% |
| 2026-02-14 | 478 | 3,389 | 14.10% |
| 2026-02-15 | 465 | 3,412 | 13.63% |
| 2026-02-16 | 487 | 3,456 | 14.09% |
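Both the rolling-window MAU and the ratio itself can be sanity-checked off-warehouse. A small stdlib-only sketch with synthetic activity data (the user IDs are made up):

```python
from datetime import date, timedelta

def rolling_mau(daily_users: dict, window_days: int = 30) -> dict:
    """Distinct users over a trailing window ending on each date, inclusive."""
    result = {}
    for d in sorted(daily_users):
        start = d - timedelta(days=window_days - 1)
        seen = set()
        for day, users in daily_users.items():
            if start <= day <= d:
                seen |= users
        result[d] = len(seen)
    return result

def stickiness(dau: int, mau: int) -> float:
    """DAU / MAU as a percentage, mirroring the SQL's ROUND(..., 2)."""
    return round(dau * 100.0 / mau, 2) if mau else 0.0

# usr_2 is active on both days: the rolling count deduplicates it.
activity = {
    date(2026, 2, 15): {"usr_1", "usr_2"},
    date(2026, 2, 16): {"usr_2", "usr_3"},
}
assert rolling_mau(activity)[date(2026, 2, 16)] == 3

# Reproduces the first row of the table above: 412 / 3,234 -> 12.74%
assert stickiness(412, 3234) == 12.74
```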
J.2.5: Automated Insight Reports
Weekly Product Health Email
Template
<!-- email_templates/weekly_product_health.html -->
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; line-height: 1.6; color: #333; }
.header { background: #4caf50; color: white; padding: 20px; text-align: center; }
.metric { border: 1px solid #ddd; padding: 15px; margin: 10px 0; border-radius: 5px; }
.metric-value { font-size: 32px; font-weight: bold; color: #4caf50; }
.metric-label { font-size: 14px; color: #666; }
.metric-change { font-size: 14px; }
.positive { color: #4caf50; }
.negative { color: #f44336; }
table { width: 100%; border-collapse: collapse; margin: 20px 0; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background: #f5f5f5; }
</style>
</head>
<body>
<div class="header">
<h1>BIO-QMS Weekly Product Health Report</h1>
<p>Week of {{ week_start }} to {{ week_end }}</p>
</div>
<div style="padding: 20px;">
<h2>Key Metrics</h2>
<div class="metric">
<div class="metric-label">Weekly Active Users (WAU)</div>
<div class="metric-value">{{ wau }}</div>
<div class="metric-change {{ 'positive' if wau_change >= 0 else 'negative' }}">
{{ wau_change_pct }}% vs. last week
</div>
</div>
<div class="metric">
<div class="metric-label">Stickiness Ratio (DAU/MAU)</div>
<div class="metric-value">{{ stickiness_ratio }}%</div>
<div class="metric-change {{ 'positive' if stickiness_change >= 0 else 'negative' }}">
{{ stickiness_change_pct }}pp vs. last week
</div>
</div>
<div class="metric">
<div class="metric-label">New User Signups</div>
<div class="metric-value">{{ new_signups }}</div>
<div class="metric-change {{ 'positive' if signup_change >= 0 else 'negative' }}">
{{ signup_change_pct }}% vs. last week
</div>
</div>
<h2>Top Features (by usage)</h2>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Unique Users</th>
<th>Total Uses</th>
<th>Avg Uses/User</th>
<th>Change</th>
</tr>
</thead>
<tbody>
{% for feature in top_features %}
<tr>
<td>{{ feature.name }}</td>
<td>{{ feature.users }}</td>
<td>{{ feature.uses }}</td>
<td>{{ feature.avg_uses }}</td>
<td class="{{ 'positive' if (feature.change_pct or 0) >= 0 else 'negative' }}">
{{ feature.change_pct }}%
</td>
</tr>
{% endfor %}
</tbody>
</table>
<h2>Insights & Recommendations</h2>
<ul>
{% for insight in insights %}
<li>{{ insight }}</li>
{% endfor %}
</ul>
<p style="margin-top: 40px; font-size: 12px; color: #666;">
This report was generated automatically by the BIO-QMS Analytics Engine.
<br>
<a href="https://dashboard.bioqms.com/analytics">View full dashboard</a>
</p>
</div>
</body>
</html>
Report Generation Script
# analytics/reports.py
from django.core.mail import EmailMultiAlternatives
from django.template.loader import render_to_string
from google.cloud import bigquery
from datetime import datetime, timedelta
class WeeklyReportGenerator:
def __init__(self):
self.client = bigquery.Client()
def generate_and_send(self, recipient_emails: list):
# Calculate date ranges
week_end = datetime.now().date()
week_start = week_end - timedelta(days=7)
prev_week_end = week_start - timedelta(days=1)
prev_week_start = prev_week_end - timedelta(days=7)
# Fetch metrics
current_metrics = self._fetch_metrics(week_start, week_end)
prev_metrics = self._fetch_metrics(prev_week_start, prev_week_end)
# Calculate changes
wau_change_pct = self._calculate_change(current_metrics['wau'], prev_metrics['wau'])
stickiness_change_pct = current_metrics['stickiness_ratio'] - prev_metrics['stickiness_ratio']
signup_change_pct = self._calculate_change(current_metrics['new_signups'], prev_metrics['new_signups'])
# Fetch top features
top_features = self._fetch_top_features(week_start, week_end, prev_week_start, prev_week_end)
# Generate insights
insights = self._generate_insights(current_metrics, prev_metrics, top_features)
# Render email (the template's inline conditionals assume Django's
# Jinja2 template backend for render_to_string)
html_content = render_to_string('email_templates/weekly_product_health.html', {
    'week_start': week_start.strftime('%B %d, %Y'),
    'week_end': week_end.strftime('%B %d, %Y'),
    'wau': current_metrics['wau'],
    'wau_change': wau_change_pct,  # numeric value, used by the CSS-class conditional
    'wau_change_pct': f"{wau_change_pct:+.1f}",
    'stickiness_ratio': f"{current_metrics['stickiness_ratio']:.2f}",
    'stickiness_change': stickiness_change_pct,
    'stickiness_change_pct': f"{stickiness_change_pct:+.2f}",
    'new_signups': current_metrics['new_signups'],
    'signup_change': signup_change_pct,
    'signup_change_pct': f"{signup_change_pct:+.1f}",
    'top_features': top_features,
    'insights': insights
})
# Send email
email = EmailMultiAlternatives(
subject=f"BIO-QMS Product Health Report - Week of {week_start.strftime('%b %d')}",
body="Please view this email in an HTML-capable email client.",
from_email="analytics@bioqms.com",
to=recipient_emails
)
email.attach_alternative(html_content, "text/html")
email.send()
def _fetch_metrics(self, start_date, end_date):
query = f"""
WITH active_users AS (
SELECT DISTINCT user_id
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) BETWEEN '{start_date}' AND '{end_date}'
AND event_name IN ('page.viewed', 'work_order.created', 'feature.used')
),
daily_users AS (
SELECT DATE(timestamp) AS activity_date, COUNT(DISTINCT user_id) AS dau
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) BETWEEN '{start_date}' AND '{end_date}'
AND event_name IN ('page.viewed', 'work_order.created', 'feature.used')
GROUP BY activity_date
),
new_signups AS (
SELECT COUNT(DISTINCT user_id) AS count
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) BETWEEN '{start_date}' AND '{end_date}'
AND event_name = 'user.identified'
)
SELECT
(SELECT COUNT(*) FROM active_users) AS wau,
(SELECT AVG(dau) FROM daily_users) AS avg_dau,
(SELECT AVG(dau) * 100.0 / (SELECT COUNT(*) FROM active_users) FROM daily_users) AS stickiness_ratio,  -- avg DAU / WAU, a weekly proxy for DAU/MAU
(SELECT count FROM new_signups) AS new_signups
"""
result = list(self.client.query(query).result())[0]
return dict(result)
def _fetch_top_features(self, current_start, current_end, prev_start, prev_end):
query = f"""
WITH current_week AS (
SELECT
JSON_EXTRACT_SCALAR(properties, '$.feature_key') AS feature_key,
COUNT(DISTINCT user_id) AS users,
COUNT(*) AS uses
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) BETWEEN '{current_start}' AND '{current_end}'
AND event_name = 'feature.used'
GROUP BY feature_key
),
prev_week AS (
SELECT
JSON_EXTRACT_SCALAR(properties, '$.feature_key') AS feature_key,
COUNT(*) AS uses
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) BETWEEN '{prev_start}' AND '{prev_end}'
AND event_name = 'feature.used'
GROUP BY feature_key
)
SELECT
cw.feature_key AS name,
cw.users,
cw.uses,
ROUND(cw.uses * 1.0 / cw.users, 2) AS avg_uses,
COALESCE(ROUND((cw.uses - COALESCE(pw.uses, 0)) * 100.0 / NULLIF(pw.uses, 0), 1), 0) AS change_pct  -- 0 when the feature had no prior-week usage
FROM current_week cw
LEFT JOIN prev_week pw ON cw.feature_key = pw.feature_key
ORDER BY cw.uses DESC
LIMIT 10
"""
results = self.client.query(query).result()
return [dict(row) for row in results]
def _calculate_change(self, current, previous):
if previous == 0:
return 0
return ((current - previous) / previous) * 100
def _generate_insights(self, current_metrics, prev_metrics, top_features):
insights = []
# WAU trend
wau_change = self._calculate_change(current_metrics['wau'], prev_metrics['wau'])
if wau_change > 10:
insights.append(f"🚀 Weekly active users grew {wau_change:.1f}% — strong user growth momentum!")
elif wau_change < -10:
insights.append(f"⚠️ Weekly active users declined {abs(wau_change):.1f}% — investigate user churn.")
# Stickiness trend
stickiness_change = current_metrics['stickiness_ratio'] - prev_metrics['stickiness_ratio']
if stickiness_change > 2:
insights.append(f"✨ Stickiness improved by {stickiness_change:.2f}pp — users are engaging more frequently.")
elif stickiness_change < -2:
insights.append(f"⚠️ Stickiness declined by {abs(stickiness_change):.2f}pp — consider retention initiatives.")
# Feature insights
growing_features = [f for f in top_features if (f['change_pct'] or 0) > 20]
if growing_features:
insights.append(f"📈 Fast-growing features: {', '.join([f['name'] for f in growing_features[:3]])} — consider expanding these capabilities.")
declining_features = [f for f in top_features if (f['change_pct'] or 0) < -20]
if declining_features:
insights.append(f"📉 Declining features: {', '.join([f['name'] for f in declining_features[:3]])} — investigate user feedback or bugs.")
return insights
# Celery task to send weekly reports
from celery import shared_task
@shared_task
def send_weekly_product_health_report():
generator = WeeklyReportGenerator()
# Fetch recipients from configuration
recipients = [
'product@bioqms.com',
'engineering@bioqms.com',
'executives@bioqms.com'
]
generator.generate_and_send(recipients)
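For the task to run weekly it needs a scheduler entry. A sketch of a Celery beat configuration — the schedule (Monday 08:00 UTC) and settings location are assumptions, not values from the deployed system:

```python
# settings.py (sketch) — schedule the weekly report via Celery beat
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'weekly-product-health-report': {
        'task': 'analytics.reports.send_weekly_product_health_report',
        # Every Monday at 08:00 (server timezone)
        'schedule': crontab(hour=8, minute=0, day_of_week=1),
    },
}
```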
Anomaly Detection Alerts
Statistical Anomaly Detection
# analytics/anomaly_detection.py
from celery import shared_task
from django.core.mail import send_mail
from google.cloud import bigquery
import numpy as np
class AnomalyDetector:
def __init__(self):
self.client = bigquery.Client()
def check_for_anomalies(self):
"""
Detect anomalies in key product metrics using statistical methods.
"""
anomalies = []
# Check DAU anomaly
dau_anomaly = self._check_metric_anomaly(
metric_name='DAU',
query="""
SELECT DATE(timestamp) AS date, COUNT(DISTINCT user_id) AS value
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
AND event_name IN ('page.viewed', 'work_order.created', 'feature.used')
GROUP BY date
ORDER BY date
"""
)
if dau_anomaly:
anomalies.append(dau_anomaly)
# Check error rate anomaly
error_rate_anomaly = self._check_metric_anomaly(
metric_name='Error Rate',
query="""
SELECT
DATE(timestamp) AS date,
COUNTIF(event_name = 'error.occurred') * 100.0 / COUNT(*) AS value
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY date
ORDER BY date
"""
)
if error_rate_anomaly:
anomalies.append(error_rate_anomaly)
# Check API latency anomaly
latency_anomaly = self._check_metric_anomaly(
metric_name='API P95 Latency',
query="""
SELECT
DATE(timestamp) AS date,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(properties, '$.duration_ms') AS FLOAT64), 100)[OFFSET(95)] AS value
FROM `bioqms-prod.analytics.events`
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
AND event_name = 'api.request_completed'
GROUP BY date
ORDER BY date
"""
)
if latency_anomaly:
anomalies.append(latency_anomaly)
# Send alerts if anomalies detected
if anomalies:
self._send_anomaly_alert(anomalies)
return anomalies
def _check_metric_anomaly(self, metric_name: str, query: str, threshold_std: float = 2.5):
"""
Detect anomalies using Z-score method (>2.5 standard deviations).
"""
results = list(self.client.query(query).result())
if len(results) < 14:
return None # Need at least 2 weeks of data
values = [float(row['value']) for row in results]
dates = [row['date'] for row in results]
# Calculate mean and std of historical data (exclude last 2 days)
historical = values[:-2]
mean = np.mean(historical)
std = np.std(historical)
# Check last 2 days for anomalies
recent = values[-2:]
recent_dates = dates[-2:]
for i, value in enumerate(recent):
z_score = (value - mean) / std if std > 0 else 0
if abs(z_score) > threshold_std:
return {
'metric': metric_name,
'date': recent_dates[i],
'value': value,
'mean': mean,
'std': std,
'z_score': z_score,
'severity': 'critical' if abs(z_score) > 3 else 'warning'
}
return None
def _send_anomaly_alert(self, anomalies: list):
"""
Send email alert for detected anomalies.
"""
subject = f"🚨 BIO-QMS Anomaly Alert - {len(anomalies)} anomalies detected"
message_lines = ["Anomalies detected in BIO-QMS product metrics:\n"]
for anomaly in anomalies:
direction = "spike" if anomaly['z_score'] > 0 else "drop"
message_lines.append(
f"- {anomaly['metric']} on {anomaly['date']}: "
f"{direction} of {abs(anomaly['z_score']):.2f} standard deviations "
f"(value: {anomaly['value']:.2f}, mean: {anomaly['mean']:.2f})"
)
message_lines.append("\nView full dashboard: https://dashboard.bioqms.com/analytics")
send_mail(
subject=subject,
message="\n".join(message_lines),
from_email="alerts@bioqms.com",
recipient_list=['oncall@bioqms.com', 'product@bioqms.com'],
fail_silently=False
)
# Celery task to check for anomalies daily
@shared_task
def check_product_anomalies():
detector = AnomalyDetector()
anomalies = detector.check_for_anomalies()
return {'anomalies_found': len(anomalies), 'details': anomalies}
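The Z-score logic can be exercised without a warehouse connection. A stdlib-only sketch of the same check, using a synthetic metric series (it mirrors `_check_metric_anomaly`, minus the 2-day window and the BigQuery I/O):

```python
import statistics

def z_score_anomaly(values: list, threshold: float = 2.5) -> bool:
    """Flag the last value if it lies more than `threshold` standard
    deviations from the mean of the preceding history."""
    history, latest = values[:-1], values[-1]
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)  # population std, like np.std
    z = (latest - mean) / std if std > 0 else 0.0
    return abs(z) > threshold

steady = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]
assert not z_score_anomaly(steady + [102])  # within normal variation
assert z_score_anomaly(steady + [160])      # large spike is flagged
```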
Monthly Review Deck (Auto-Generated)
Deck Structure
- Executive Summary (1 slide)
- User Growth Metrics (2 slides)
- Feature Adoption Matrix (1 slide)
- Retention Cohort Analysis (2 slides)
- Top Insights & Recommendations (2 slides)
Deck Generation (Google Slides API)
# analytics/deck_generator.py
from google.oauth2 import service_account
from googleapiclient.discovery import build
from datetime import datetime, timedelta
class MonthlyDeckGenerator:
def __init__(self):
credentials = service_account.Credentials.from_service_account_file(
'credentials/google-slides-sa.json',
scopes=['https://www.googleapis.com/auth/presentations']
)
self.slides_service = build('slides', 'v1', credentials=credentials)
def generate_deck(self, month: datetime):
# Create presentation
presentation = self.slides_service.presentations().create(body={
'title': f'BIO-QMS Product Analytics - {month.strftime("%B %Y")}'
}).execute()
presentation_id = presentation['presentationId']
# Add slides
self._add_title_slide(presentation_id, month)
self._add_user_growth_slides(presentation_id, month)
self._add_feature_adoption_slide(presentation_id, month)
self._add_retention_slides(presentation_id, month)
self._add_insights_slide(presentation_id, month)
# Share with stakeholders
share_url = f"https://docs.google.com/presentation/d/{presentation_id}"
return {
'presentation_id': presentation_id,
'share_url': share_url
}
def _add_title_slide(self, presentation_id, month):
# Implementation details...
pass
# ... additional slide generation methods
Summary
This document provides comprehensive evidence for the implementation of Product Telemetry (J.1) and Feature Analytics (J.2) in the BIO-QMS platform. The system enables:
J.1 Product Telemetry:
- Dual-pipeline event tracking (client + server)
- Privacy-compliant user identification
- Real-time monitoring dashboards
- Data quality enforcement
J.2 Feature Analytics:
- Feature adoption funnels with 5 stages
- Cohort retention analysis
- Behavioral segmentation (RFM model)
- Automated weekly reports and anomaly alerts
Key Metrics Tracked:
- DAU/WAU/MAU (Daily/Weekly/Monthly Active Users)
- Stickiness ratio (DAU/MAU)
- Feature penetration and frequency
- Cohort retention curves
- Revenue expansion by cohort
Technology Stack:
- Collection: JavaScript SDK + Django middleware
- Transport: Google Cloud Pub/Sub
- Storage: BigQuery (partitioned tables, 24-month retention)
- Analytics: SQL queries + Python notebooks
- Visualization: Looker Studio + React dashboards
Compliance:
- GDPR/CCPA opt-out mechanisms
- PII scrubbing (automated regex-based)
- 24-month data retention with automated deletion
- Audit trail for all privacy actions
Evidence Files:
- Event schema: analytics/schemas/event-v1.0.json
- Client SDK: @bioqms/analytics (npm package)
- Server middleware: analytics/middleware.py
- BigQuery tables: analytics.events, analytics.user_aliases
- Dashboard components: dashboard/components/
- Report generators: analytics/reports.py
Author: Claude (Opus 4.6)
Track: J (Memory & Analytics)
Version: 1.0.0
Date: February 16, 2026