CODITECT Standard: Universal Session Format (CUSF)

Standard ID: CODITECT-STD-010
Version: 1.0.0
Effective Date: January 28, 2026
ADR Reference: ADR-134
JSON Schema: schemas/cusf-v1.0.0.json

1. Purpose

This standard defines the CODITECT Universal Session Format (CUSF), a canonical JSONL format for LLM session exports that enables:

Multi-LLM Support: Unified format for Claude, Codex, Gemini, KIMI, and other LLMs
Reconstruction Capability: Full database rebuild from export files
Provenance Tracking: Complete metadata for source attribution and analytics
Streaming Processing: Line-by-line JSONL for memory-efficient handling

2. Scope

This standard applies to:

All session exports from CODITECT-managed LLMs
The /sx command output format
Session import/reconstruction operations
Context extraction pipeline inputs
Cross-LLM session analytics

3. Format Specification

3.1 File Structure

CUSF files use JSONL (JSON Lines) format with one JSON object per line:

{meta_entry}
{session_start}
{message}
{tool_use}
{tool_result}
{message}
...
{session_end}
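
Because every line is an independent JSON object, a reader never needs the whole file in memory. The sketch below illustrates this with a hypothetical helper named `iter_entries` (not part of the CODITECT codebase). Skipping blank lines is an assumption; the standard does not say whether they are allowed.

```python
import json
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed entry per non-blank JSONL line.

    Hypothetical helper for illustration. Accepts any iterable of
    strings, including an open file handle.
    """
    for line_num, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # assumption: tolerate blank lines such as a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_num}: invalid JSON ({exc.msg})") from exc


# In-memory sample standing in for an open CUSF file
sample = [
    '{"_meta": {"format": "cusf", "version": "1.0.0"}}\n',
    '{"type": "session_start", "session_id": "abc123"}\n',
    "\n",
]
entries = list(iter_entries(sample))
```

Passing an open file handle instead of `sample` would process an export one line at a time.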

File Naming Convention:

{timestamp}-session-{llm_source}-{session_id}--{export_type}.jsonl

Example: 2026-01-28T13-06-48Z-session-claude-abc123--context-export.jsonl
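
The naming convention can be built and parsed mechanically. The following is a minimal sketch, assuming the timestamp is always UTC with colons replaced by hyphens (inferred from the example above). `build_filename` and `FILENAME_RE` are hypothetical names used only for illustration.

```python
import re
from datetime import datetime, timezone


def build_filename(exported_at: datetime, llm_source: str,
                   session_id: str, export_type: str) -> str:
    """Build a file name following the convention above (illustrative only)."""
    # Colons are not used in the example timestamp, so hyphens replace them here.
    stamp = exported_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{stamp}-session-{llm_source}-{session_id}--{export_type}.jsonl"


# Assumed pattern: the double hyphen separates session_id from export_type.
FILENAME_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}Z)"
    r"-session-(?P<llm_source>[a-z]+)"
    r"-(?P<session_id>.+?)"
    r"--(?P<export_type>.+)\.jsonl$"
)

name = build_filename(
    datetime(2026, 1, 28, 13, 6, 48, tzinfo=timezone.utc),
    llm_source="claude",
    session_id="abc123",
    export_type="context-export",
)
parts = FILENAME_RE.match(name).groupdict()
```

A session ID that itself contains a double hyphen would make the pattern ambiguous; the sketch does not handle that case.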

3.2 Entry Types

3.2.1 Meta Entry (REQUIRED - First Line)

{
  "_meta": {
    "format": "cusf",
    "version": "1.0.0",
    "exported_at": "2026-01-28T12:00:00Z",
    "exporter": "coditect-sx/1.0.0"
  }
}

Field	Type	Required	Description
`_meta.format`	string	✅	Must be `"cusf"`
`_meta.version`	string	✅	Semantic version (e.g., `"1.0.0"`)
`_meta.exported_at`	ISO 8601	✅	Export timestamp in UTC
`_meta.exporter`	string	✅	Tool identifier (e.g., `"coditect-sx/1.0.0"`)
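
As an illustration of the table above, the sketch below creates a meta entry and checks an existing one. `make_meta_entry` and `check_meta_entry` are hypothetical helpers, not functions from the CODITECT scripts, and the checks cover only what the table states.

```python
from datetime import datetime, timezone
from typing import Optional

REQUIRED_META_FIELDS = ("format", "version", "exported_at", "exporter")


def make_meta_entry(exporter: str, version: str = "1.0.0",
                    now: Optional[datetime] = None) -> dict:
    """Build the first line of a CUSF file (illustrative helper)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"_meta": {"format": "cusf", "version": version,
                      "exported_at": stamp, "exporter": exporter}}


def check_meta_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry looks valid."""
    meta = entry.get("_meta")
    if not isinstance(meta, dict):
        return ["entry has no _meta object"]
    problems = [f"missing _meta.{name}"
                for name in REQUIRED_META_FIELDS if name not in meta]
    if meta.get("format") != "cusf":
        problems.append('_meta.format must be "cusf"')
    return problems


meta = make_meta_entry(
    "coditect-sx/1.0.0",
    now=datetime(2026, 1, 28, 12, 0, 0, tzinfo=timezone.utc),
)
```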

3.2.2 Session Start

{
  "type": "session_start",
  "session_id": "abc123-def456-ghi789",
  "llm_source": "claude",
  "llm_model": "claude-opus-4-5-20251101",
  "started_at": "2026-01-28T10:00:00Z",
  "project_path": "/Users/user/project",
  "git_branch": "main",
  "cwd": "/Users/user/project",
  "machine_id": "machine-uuid",
  "tenant_id": "tenant-001",
  "user_id": "user-001"
}

Field	Type	Required	Description
`type`	string	✅	Must be `"session_start"`
`session_id`	UUID	✅	Unique session identifier
`llm_source`	enum	✅	`"claude"`, `"codex"`, `"gemini"`, `"kimi"`, `"gpt"`, `"other"`
`llm_model`	string	❌	Specific model ID
`started_at`	ISO 8601	✅	Session start timestamp
`project_path`	string	❌	Project directory
`git_branch`	string	❌	Git branch at start
`cwd`	string	❌	Working directory
`machine_id`	string	❌	CODITECT machine UUID
`tenant_id`	string	❌	Multi-tenant org ID
`user_id`	string	❌	User identifier

3.2.3 Message

{
  "type": "message",
  "role": "user",
  "content": "Help me fix the bug in auth.py",
  "timestamp": "2026-01-28T10:01:00Z",
  "parent_id": null,
  "message_id": "msg-001",
  "model": "claude-opus-4-5-20251101",
  "usage": {
    "input": 1000,
    "output": 500,
    "cache_read": 200,
    "cache_write": 100
  },
  "thinking": "Extended thinking content here...",
  "stop_reason": "end_turn"
}

Field	Type	Required	Description
`type`	string	✅	Must be `"message"`
`role`	enum	✅	`"user"`, `"assistant"`, `"system"`
`content`	string	✅	Message text content
`timestamp`	ISO 8601	✅	Message timestamp
`message_id`	string	✅	Unique message identifier
`parent_id`	string/null	❌	Parent message for threading
`model`	string	❌	Model that generated the message (assistant only)
`usage`	object	❌	Token usage statistics
`usage.input`	integer	❌	Input tokens
`usage.output`	integer	❌	Output tokens
`usage.cache_read`	integer	❌	Cached input tokens
`usage.cache_write`	integer	❌	Tokens written to cache
`thinking`	string	❌	Extended thinking content
`stop_reason`	enum	❌	`"end_turn"`, `"max_tokens"`, `"tool_use"`, `"error"`

3.2.4 Tool Use

{
  "type": "tool_use",
  "tool_name": "Read",
  "tool_input": {
    "file_path": "/path/to/file.py"
  },
  "tool_id": "tool-001",
  "timestamp": "2026-01-28T10:02:00Z",
  "parent_id": "msg-002"
}

Field	Type	Required	Description
`type`	string	✅	Must be `"tool_use"`
`tool_name`	string	✅	Tool name (Read, Write, Bash, etc.)
`tool_input`	object	❌	Tool input parameters
`tool_id`	string	✅	Unique tool invocation ID
`timestamp`	ISO 8601	✅	Invocation timestamp
`parent_id`	string	❌	Parent assistant message ID

3.2.5 Tool Result

{
  "type": "tool_result",
  "tool_id": "tool-001",
  "result": "File content here...",
  "is_error": false,
  "error_message": null,
  "timestamp": "2026-01-28T10:02:01Z",
  "truncated": false
}

Field	Type	Required	Description
`type`	string	✅	Must be `"tool_result"`
`tool_id`	string	✅	Matching tool_use ID
`result`	string	❌	Tool output (may be truncated)
`is_error`	boolean	❌	Default: `false`
`error_message`	string/null	❌	Error details if `is_error=true`
`timestamp`	ISO 8601	✅	Result timestamp
`truncated`	boolean	❌	Whether result was truncated
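
Since a `tool_result` refers back to its `tool_use` only through `tool_id`, a consumer typically joins the two while reading. This is a minimal sketch of that join; `pair_tool_calls` is a hypothetical name, and the sketch assumes each `tool_use` appears before its result.

```python
def pair_tool_calls(entries: list) -> tuple:
    """Join tool_use and tool_result entries by tool_id.

    Returns (pairs, orphans): pairs maps tool_id to (tool_use, tool_result),
    orphans lists tool_ids of results that had no earlier tool_use.
    """
    uses, pairs, orphans = {}, {}, []
    for entry in entries:
        kind = entry.get("type")
        if kind == "tool_use":
            uses[entry["tool_id"]] = entry
        elif kind == "tool_result":
            use = uses.get(entry["tool_id"])
            if use is None:
                orphans.append(entry["tool_id"])
            else:
                pairs[entry["tool_id"]] = (use, entry)
    return pairs, orphans


entries = [
    {"type": "tool_use", "tool_name": "Read", "tool_id": "tool-001",
     "timestamp": "2026-01-28T10:02:00Z"},
    {"type": "tool_result", "tool_id": "tool-001", "result": "File content here...",
     "timestamp": "2026-01-28T10:02:01Z"},
    {"type": "tool_result", "tool_id": "tool-999", "result": "",
     "timestamp": "2026-01-28T10:02:02Z"},
]
pairs, orphans = pair_tool_calls(entries)
```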

3.2.6 Session End

{
  "type": "session_end",
  "session_id": "abc123-def456-ghi789",
  "ended_at": "2026-01-28T12:00:00Z",
  "total_messages": 42,
  "total_tokens": {
    "input": 50000,
    "output": 25000
  },
  "end_reason": "export"
}

Field	Type	Required	Description
`type`	string	✅	Must be `"session_end"`
`session_id`	UUID	✅	Matching session_start ID
`ended_at`	ISO 8601	✅	Session end timestamp
`total_messages`	integer	❌	Message count
`total_tokens.input`	integer	❌	Total input tokens
`total_tokens.output`	integer	❌	Total output tokens
`end_reason`	enum	❌	`"user_exit"`, `"export"`, `"context_limit"`, `"error"`, `"timeout"`
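
The summary fields of `session_end` can be derived from the entries that precede it. The sketch below assumes that `total_messages` counts `message` entries only and that `total_tokens` is the sum of each message's `usage.input` and `usage.output`; the standard does not spell out either rule, and `build_session_end` is a hypothetical helper.

```python
def build_session_end(session_id: str, ended_at: str, entries: list,
                      end_reason: str = "export") -> dict:
    """Derive a session_end entry from earlier entries (illustrative only)."""
    messages = [e for e in entries if e.get("type") == "message"]
    totals = {"input": 0, "output": 0}
    for message in messages:
        usage = message.get("usage") or {}  # usage is optional
        totals["input"] += usage.get("input", 0)
        totals["output"] += usage.get("output", 0)
    return {
        "type": "session_end",
        "session_id": session_id,
        "ended_at": ended_at,
        "total_messages": len(messages),
        "total_tokens": totals,
        "end_reason": end_reason,
    }


entries = [
    {"type": "message", "role": "user", "content": "Help me fix the bug in auth.py"},
    {"type": "message", "role": "assistant", "content": "Sure.",
     "usage": {"input": 1000, "output": 500}},
    {"type": "tool_use", "tool_name": "Read", "tool_id": "tool-001"},
]
end = build_session_end("abc123-def456-ghi789", "2026-01-28T12:00:00Z", entries)
```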

4. LLM-Specific Normalization

4.1 Claude

Native Field	CUSF Field	Transformation
`role: "human"`	`role: "user"`	Map role name
`content[].text`	`content`	Extract text from content blocks
`content[].type: "tool_use"`	`tool_use` entry	Separate entry
`content[].type: "tool_result"`	`tool_result` entry	Separate entry
`stop_reason`	`stop_reason`	Direct mapping
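
The mapping above could be sketched as follows. The native message shape used here (an `id`, a `timestamp`, and content blocks with `name`/`input`/`id` for tool use and `tool_use_id`/`content`/`is_error` for tool results) is an assumption about the input, and `normalize_claude_message` is a hypothetical function, not the actual `claude_extractor.py` implementation.

```python
import json

ROLE_MAP = {"human": "user"}  # other roles pass through unchanged


def normalize_claude_message(native: dict) -> list:
    """Split one assumed Claude-style message into CUSF entries."""
    content = native.get("content", [])
    if isinstance(content, str):  # plain-string content
        content = [{"type": "text", "text": content}]

    texts, extra = [], []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            texts.append(block["text"])
        elif kind == "tool_use":
            extra.append({
                "type": "tool_use",
                "tool_name": block["name"],
                "tool_input": block.get("input", {}),
                "tool_id": block["id"],
                "timestamp": native["timestamp"],
                "parent_id": native["id"],
            })
        elif kind == "tool_result":
            result = block.get("content")
            extra.append({
                "type": "tool_result",
                "tool_id": block["tool_use_id"],
                "result": result if isinstance(result, str) else json.dumps(result),
                "is_error": bool(block.get("is_error", False)),
                "timestamp": native["timestamp"],
            })

    entries = []
    # Assumption: a message entry is emitted only when it carries text
    # or is the parent of a tool_use entry.
    if texts or any(e["type"] == "tool_use" for e in extra):
        message = {
            "type": "message",
            "role": ROLE_MAP.get(native["role"], native["role"]),
            "content": "\n".join(texts),
            "timestamp": native["timestamp"],
            "message_id": native["id"],
        }
        if "stop_reason" in native:
            message["stop_reason"] = native["stop_reason"]  # direct mapping
        entries.append(message)
    return entries + extra


native = {
    "id": "msg-002",
    "role": "assistant",
    "timestamp": "2026-01-28T10:02:00Z",
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me read the file."},
        {"type": "tool_use", "id": "tool-001", "name": "Read",
         "input": {"file_path": "/path/to/file.py"}},
    ],
}
entries = normalize_claude_message(native)
```

Joining text blocks with a newline is a choice made for this sketch; the table only says the text is extracted.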

4.2 Codex

Native Field	CUSF Field	Transformation
`role: "developer"`	`role: "system"`	Map role name
`created_at` (Unix epoch)	`timestamp`	Convert to ISO 8601
No tool tracking	Skip tool entries	Codex CLI format differs
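
The two Codex transformations (role mapping and epoch conversion) could look like the sketch below. The native shape with `role`, `content`, and `created_at` in whole seconds is an assumption, and the caller supplies `message_id` because the table names no native identifier.

```python
from datetime import datetime, timezone

ROLE_MAP = {"developer": "system"}  # other roles pass through unchanged


def epoch_to_iso8601(created_at: float) -> str:
    """Convert Unix epoch seconds to an ISO 8601 UTC timestamp."""
    moment = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def normalize_codex_message(native: dict, message_id: str) -> dict:
    """Map one assumed Codex-style message to a CUSF message entry."""
    return {
        "type": "message",
        "role": ROLE_MAP.get(native["role"], native["role"]),
        "content": native["content"],
        "timestamp": epoch_to_iso8601(native["created_at"]),
        "message_id": message_id,
    }


entry = normalize_codex_message(
    {"role": "developer", "content": "You are a coding agent.",
     "created_at": 1769594400},
    message_id="msg-001",
)
```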

4.3 Gemini

Native Field	CUSF Field	Transformation
`role: "model"`	`role: "assistant"`	Map role name
`parts[].text`	`content`	Join text parts
`parts[].functionCall`	`tool_use` entry	Extract to separate entry
`parts[].functionResponse`	`tool_result` entry	Extract to separate entry
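
A sketch of the Gemini mapping follows. Several details are assumptions made for illustration: each turn is given a `timestamp` key, message and tool IDs are synthesized, and a `functionResponse` is matched to the oldest unanswered `functionCall` with the same name. `normalize_gemini_turns` is a hypothetical function, not the actual `gemini_extractor.py` implementation.

```python
import json
from collections import defaultdict, deque

ROLE_MAP = {"model": "assistant"}  # other roles pass through unchanged


def normalize_gemini_turns(turns: list) -> list:
    """Convert assumed Gemini-style turns into CUSF entries."""
    entries = []
    pending = defaultdict(deque)  # function name -> tool_ids awaiting a response
    for index, turn in enumerate(turns, 1):
        message_id = f"msg-{index:03d}"  # synthesized ID (assumption)
        parts = turn.get("parts", [])
        texts = [p["text"] for p in parts if "text" in p]
        has_call = any("functionCall" in p for p in parts)
        if texts or has_call:
            entries.append({
                "type": "message",
                "role": ROLE_MAP.get(turn["role"], turn["role"]),
                "content": "\n".join(texts),  # join text parts
                "timestamp": turn["timestamp"],
                "message_id": message_id,
            })
        for n, part in enumerate(parts, 1):
            if "functionCall" in part:
                call = part["functionCall"]
                tool_id = f"{message_id}-tool-{n}"
                pending[call["name"]].append(tool_id)
                entries.append({
                    "type": "tool_use",
                    "tool_name": call["name"],
                    "tool_input": call.get("args", {}),
                    "tool_id": tool_id,
                    "timestamp": turn["timestamp"],
                    "parent_id": message_id,
                })
            elif "functionResponse" in part:
                response = part["functionResponse"]
                queue = pending[response["name"]]
                tool_id = queue.popleft() if queue else f"{message_id}-tool-{n}"
                entries.append({
                    "type": "tool_result",
                    "tool_id": tool_id,
                    "result": json.dumps(response.get("response")),
                    "timestamp": turn["timestamp"],
                })
    return entries


turns = [
    {"role": "user", "timestamp": "2026-01-28T10:00:00Z",
     "parts": [{"text": "Read auth.py"}]},
    {"role": "model", "timestamp": "2026-01-28T10:00:05Z",
     "parts": [{"text": "Reading."},
               {"functionCall": {"name": "read_file", "args": {"path": "auth.py"}}}]},
    {"role": "user", "timestamp": "2026-01-28T10:00:06Z",
     "parts": [{"functionResponse": {"name": "read_file",
                                     "response": {"content": "..."}}}]},
]
entries = normalize_gemini_turns(turns)
```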

4.4 KIMI

Native Field	CUSF Field	Transformation
From context.jsonl	Direct mapping	Most fields align
From wire.jsonl	Full fidelity	Lossless format
`model_id`	`llm_model`	Rename field

5. Validation

5.1 Schema Validation

Validate CUSF files against the JSON Schema:

# Using Python jsonschema
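python3 -c "
import json
from jsonschema import Draft202012Validator

# Load the schema once and reuse the validator for every line
with open('schemas/cusf-v1.0.0.json') as f:
    validator = Draft202012Validator(json.load(f))

with open('export.jsonl') as f:
    for line_num, line in enumerate(f, 1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as e:
            print(f'Line {line_num}: invalid JSON: {e.msg}')
            continue
        # iter_errors reports every violation instead of stopping at the first
        for error in validator.iter_errors(entry):
            print(f'Line {line_num}: {error.message}')
"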

5.2 Structural Validation

Rule	Description
First line	Must be `_meta` entry
`session_start`	Must appear before any messages
`tool_result`	Must have matching `tool_use` by `tool_id`
`session_end.session_id`	Must match `session_start.session_id`
Timestamps	Must be chronologically ordered
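
The five rules above can be checked in a single pass over the parsed entries. The sketch below is one possible reading: it treats equal timestamps as correctly ordered and uses `started_at` and `ended_at` as the timestamps of the session entries, neither of which the standard states explicitly. `check_structure` is a hypothetical function, not the actual `validate-cusf.py` logic.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-01-28T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_structure(entries: list) -> list:
    """Return a list of structural problems; empty means all rules hold."""
    problems = []
    if not entries or "_meta" not in entries[0]:
        problems.append("line 1: must be the _meta entry")

    session_id = None
    tool_ids = set()
    previous = None
    for line_num, entry in enumerate(entries, 1):
        kind = entry.get("type")
        if kind == "session_start":
            session_id = entry.get("session_id")
        elif kind == "message" and session_id is None:
            problems.append(f"line {line_num}: message before session_start")
        elif kind == "tool_use":
            tool_ids.add(entry.get("tool_id"))
        elif kind == "tool_result" and entry.get("tool_id") not in tool_ids:
            problems.append(f"line {line_num}: tool_result without matching tool_use")
        elif kind == "session_end" and entry.get("session_id") != session_id:
            problems.append(f"line {line_num}: session_id differs from session_start")

        raw = entry.get("timestamp") or entry.get("started_at") or entry.get("ended_at")
        if raw:
            current = parse_ts(raw)
            if previous is not None and current < previous:
                problems.append(f"line {line_num}: timestamp goes backwards")
            previous = current
    return problems


good = [
    {"_meta": {"format": "cusf", "version": "1.0.0"}},
    {"type": "session_start", "session_id": "abc123", "started_at": "2026-01-28T10:00:00Z"},
    {"type": "message", "role": "user", "content": "hi",
     "timestamp": "2026-01-28T10:01:00Z", "message_id": "msg-001"},
    {"type": "tool_use", "tool_name": "Read", "tool_id": "tool-001",
     "timestamp": "2026-01-28T10:02:00Z"},
    {"type": "tool_result", "tool_id": "tool-001", "timestamp": "2026-01-28T10:02:01Z"},
    {"type": "session_end", "session_id": "abc123", "ended_at": "2026-01-28T12:00:00Z"},
]
bad = [
    {"type": "message", "role": "user", "content": "hi",
     "timestamp": "2026-01-28T10:01:00Z", "message_id": "msg-001"},
    {"type": "tool_result", "tool_id": "tool-999", "timestamp": "2026-01-28T10:00:00Z"},
]
good_problems = check_structure(good)
bad_problems = check_structure(bad)
```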

5.3 Reconstruction Validation

Roundtrip test: Export → Import → Re-export should produce identical content (excluding _meta.exported_at).
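
A roundtrip comparison only needs to drop the one field that is allowed to differ. The sketch below shows that comparison on already-parsed entries; the export and import steps themselves are out of scope, and `strip_volatile` and `same_content` are hypothetical names.

```python
import copy


def strip_volatile(entries: list) -> list:
    """Return a deep copy of the entries without _meta.exported_at."""
    cleaned = copy.deepcopy(entries)  # leave the caller's data untouched
    for entry in cleaned:
        if isinstance(entry.get("_meta"), dict):
            entry["_meta"].pop("exported_at", None)
    return cleaned


def same_content(first: list, second: list) -> bool:
    """Compare two exports, ignoring only the export timestamp."""
    return strip_volatile(first) == strip_volatile(second)


original = [
    {"_meta": {"format": "cusf", "version": "1.0.0",
               "exported_at": "2026-01-28T12:00:00Z", "exporter": "coditect-sx/1.0.0"}},
    {"type": "session_start", "session_id": "abc123",
     "llm_source": "claude", "started_at": "2026-01-28T10:00:00Z"},
]
reexported = copy.deepcopy(original)
reexported[0]["_meta"]["exported_at"] = "2026-01-29T08:30:00Z"
roundtrip_ok = same_content(original, reexported)
```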

6. Implementation Reference

6.1 Key Files

Component	Path	Purpose
JSON Schema	`schemas/cusf-v1.0.0.json`	Formal schema definition
Formatter	`scripts/core/cusf_formatter.py`	Normalize all LLM formats
Claude Extractor	`scripts/extractors/claude_extractor.py`	Claude → CUSF
Codex Extractor	`scripts/extractors/codex_extractor.py`	Codex → CUSF
Gemini Extractor	`scripts/extractors/gemini_extractor.py`	Gemini → CUSF
KIMI Extractor	`scripts/extractors/kimi_extractor.py`	KIMI → CUSF
Export Script	`scripts/session-export.py`	Unified `/sx` command
Import Script	`scripts/session-import.py`	Reconstruction importer

6.2 Python Usage

import json

from scripts.core.cusf_formatter import CUSFFormatter

formatter = CUSFFormatter()

# `messages` holds the raw, LLM-native message objects loaded elsewhere
# Normalize any LLM format to CUSF
entries = formatter.normalize(
    raw_messages=messages,
    llm_source="claude",
    session_id="abc123"
)

# Write CUSF file, one JSON object per line
with open("export.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")

6.3 Command Line Usage

# Export current session
/sx

# Export specific LLM
/sx --llm claude
/sx --llm codex
/sx --llm gemini
/sx --llm kimi

# Export with reconstruction metadata
/sx --reconstruct

# Validate CUSF file
python3 scripts/validate-cusf.py export.jsonl

7. Versioning

7.1 Semantic Versioning

CUSF uses semantic versioning (MAJOR.MINOR.PATCH):

Change Type	Version Bump	Example
Breaking schema change	MAJOR	New required field
New optional field	MINOR	Add `thinking` field
Bug fix, clarification	PATCH	Fix regex pattern

7.2 Backward Compatibility

Readers MUST accept unknown fields gracefully
Writers SHOULD NOT add fields not in the schema
MAJOR version changes MAY break readers
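
A reader might apply these rules roughly as follows: accept any file whose MAJOR version it supports, and ignore fields it does not recognize. This is only a sketch. `SUPPORTED_MAJOR`, `can_read`, and `read_message` are hypothetical names, and dropping unknown fields is just one way to accept them gracefully (a reader that must pass the roundtrip test might preserve them instead).

```python
SUPPORTED_MAJOR = 1  # this hypothetical reader understands CUSF 1.x

KNOWN_MESSAGE_FIELDS = {
    "type", "role", "content", "timestamp", "message_id",
    "parent_id", "model", "usage", "thinking", "stop_reason",
}


def parse_version(version: str) -> tuple:
    """Split a MAJOR.MINOR.PATCH string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def can_read(version: str) -> bool:
    """MINOR and PATCH bumps stay readable; a different MAJOR does not."""
    return parse_version(version)[0] == SUPPORTED_MAJOR


def read_message(entry: dict) -> dict:
    """Keep the fields this reader knows and ignore the rest without failing."""
    return {key: value for key, value in entry.items()
            if key in KNOWN_MESSAGE_FIELDS}


message = read_message({
    "type": "message", "role": "user", "content": "hi",
    "timestamp": "2026-01-28T10:01:00Z", "message_id": "msg-001",
    "future_field": {"added_in": "1.3.0"},  # unknown to this reader
})
```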

8. Related Standards

Standard	Relationship
ADR-134	Unified watcher that triggers exports
ADR-080	Context extraction that processes CUSF
ADR-118	Database schema for imported sessions

9. Compliance

9.1 Quality Grading

Grade	Criteria
A (90-100%)	Full schema compliance, all optional fields populated
B (80-89%)	Required fields present, key optional fields (usage, timestamps) populated
C (70-79%)	Required fields only, passes validation
D (60-69%)	Minor validation errors
F (<60%)	Missing required fields, invalid structure
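
Once a percentage score exists, mapping it to a letter is a simple threshold lookup. The sketch below covers only that lookup, using the lower bound of each band from the table; how the percentage itself is computed is not defined here, and `letter_grade` is a hypothetical function.

```python
# Lower bound of each band, highest first (taken from the table above)
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(score: float) -> str:
    """Map a compliance percentage (0-100) to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    return "F"


grades = [letter_grade(s) for s in (95, 89, 70, 60, 59.9)]
```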

9.2 Validation Command

# Validate CUSF export
python3 scripts/validate-cusf.py --grade export.jsonl

# Expected output:
# ✅ Schema validation: PASS
# ✅ Structural validation: PASS
# ✅ Reconstruction test: PASS
# Grade: A (95%)

Last Updated: January 28, 2026
Standard Owner: CODITECT Core Team
Approval: ADR-134 Implementation
