CODITECT Standard: Universal Session Format (CUSF)
Standard ID: CODITECT-STD-010
Version: 1.0.0
Effective Date: January 28, 2026
ADR Reference: ADR-134
JSON Schema: schemas/cusf-v1.0.0.json
1. Purpose
This standard defines the CODITECT Universal Session Format (CUSF) - a canonical JSONL format for LLM session exports that enables:
- Multi-LLM Support: Unified format for Claude, Codex, Gemini, KIMI, and other LLMs
- Reconstruction Capability: Full database rebuild from export files
- Provenance Tracking: Complete metadata for source attribution and analytics
- Streaming Processing: Line-by-line JSONL for memory-efficient handling
2. Scope
This standard applies to:
- All session exports from CODITECT-managed LLMs
- The /sx command output format
- Session import/reconstruction operations
- Context extraction pipeline inputs
- Cross-LLM session analytics
3. Format Specification
3.1 File Structure
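CUSF files use JSONL (JSON Lines) format with one JSON object per line. The entry examples in section 3.2 are spread over several lines for readability; in a file, each entry is serialized on a single line: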
{meta_entry}
{session_start}
{message}
{tool_use}
{tool_result}
{message}
...
{session_end}
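Because every line is an independent JSON object, a reader can process a CUSF file as a stream instead of loading it whole. The following is a minimal sketch of such a reader; `iter_cusf_entries` is a hypothetical helper name, not part of the CODITECT codebase, and skipping blank lines is an assumption this standard does not state.

```python
import json
from typing import Iterable, Iterator


def iter_cusf_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed entry per non-empty JSONL line (hypothetical helper)."""
    for line_num, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are tolerated and skipped
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the 1-based line number so the bad entry is easy to find
            raise ValueError(f"Line {line_num}: invalid JSON ({exc.msg})") from exc
```

A file object can be passed directly, for example `for entry in iter_cusf_entries(open("export.jsonl")): ...`, so only one line is held in memory at a time.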
File Naming Convention:
{timestamp}-session-{llm_source}-{session_id}--{export_type}.jsonl
Example: 2026-01-28T13-06-48Z-session-claude-abc123--context-export.jsonl
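As an illustration of the naming convention, the sketch below assembles a file name from its parts. The function name `cusf_filename` is hypothetical; the timestamp layout (hyphens in place of colons, trailing `Z`) is inferred from the example above rather than stated elsewhere in this standard.

```python
from datetime import datetime, timezone


def cusf_filename(exported_at: datetime, llm_source: str,
                  session_id: str, export_type: str) -> str:
    """Build a file name following the convention above (hypothetical helper)."""
    # Pass a timezone-aware datetime; it is converted to UTC before formatting.
    # Colons are replaced with hyphens, matching the example timestamp.
    stamp = exported_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{stamp}-session-{llm_source}-{session_id}--{export_type}.jsonl"
```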
3.2 Entry Types
3.2.1 Meta Entry (REQUIRED - First Line)
{
  "_meta": {
    "format": "cusf",
    "version": "1.0.0",
    "exported_at": "2026-01-28T12:00:00Z",
    "exporter": "coditect-sx/1.0.0"
  }
}
| Field | Type | Required | Description |
|---|---|---|---|
| _meta.format | string | ✅ | Must be "cusf" |
| _meta.version | string | ✅ | Semantic version (e.g., "1.0.0") |
| _meta.exported_at | ISO 8601 | ✅ | Export timestamp in UTC |
| _meta.exporter | string | ✅ | Tool identifier (e.g., "coditect-sx/1.0.0") |
3.2.2 Session Start
{
  "type": "session_start",
  "session_id": "abc123-def456-ghi789",
  "llm_source": "claude",
  "llm_model": "claude-opus-4-5-20251101",
  "started_at": "2026-01-28T10:00:00Z",
  "project_path": "/Users/user/project",
  "git_branch": "main",
  "cwd": "/Users/user/project",
  "machine_id": "machine-uuid",
  "tenant_id": "tenant-001",
  "user_id": "user-001"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "session_start" |
| session_id | UUID | ✅ | Unique session identifier |
| llm_source | enum | ✅ | "claude", "codex", "gemini", "kimi", "gpt", "other" |
| llm_model | string | ❌ | Specific model ID |
| started_at | ISO 8601 | ✅ | Session start timestamp |
| project_path | string | ❌ | Project directory |
| git_branch | string | ❌ | Git branch at start |
| cwd | string | ❌ | Working directory |
| machine_id | string | ❌ | CODITECT machine UUID |
| tenant_id | string | ❌ | Multi-tenant org ID |
| user_id | string | ❌ | User identifier |
3.2.3 Message
{
  "type": "message",
  "role": "user",
  "content": "Help me fix the bug in auth.py",
  "timestamp": "2026-01-28T10:01:00Z",
  "parent_id": null,
  "message_id": "msg-001",
  "model": "claude-opus-4-5-20251101",
  "usage": {
    "input": 1000,
    "output": 500,
    "cache_read": 200,
    "cache_write": 100
  },
  "thinking": "Extended thinking content here...",
  "stop_reason": "end_turn"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "message" |
| role | enum | ✅ | "user", "assistant", "system" |
| content | string | ✅ | Message text content |
| timestamp | ISO 8601 | ✅ | Message timestamp |
| message_id | string | ✅ | Unique message identifier |
| parent_id | string/null | ❌ | Parent message for threading |
| model | string | ❌ | Model that generated (assistant only) |
| usage | object | ❌ | Token usage statistics |
| usage.input | integer | ❌ | Input tokens |
| usage.output | integer | ❌ | Output tokens |
| usage.cache_read | integer | ❌ | Cached input tokens |
| usage.cache_write | integer | ❌ | Tokens written to cache |
| thinking | string | ❌ | Extended thinking content |
| stop_reason | enum | ❌ | "end_turn", "max_tokens", "tool_use", "error" |
3.2.4 Tool Use
{
  "type": "tool_use",
  "tool_name": "Read",
  "tool_input": {
    "file_path": "/path/to/file.py"
  },
  "tool_id": "tool-001",
  "timestamp": "2026-01-28T10:02:00Z",
  "parent_id": "msg-002"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "tool_use" |
| tool_name | string | ✅ | Tool name (Read, Write, Bash, etc.) |
| tool_input | object | ❌ | Tool input parameters |
| tool_id | string | ✅ | Unique tool invocation ID |
| timestamp | ISO 8601 | ✅ | Invocation timestamp |
| parent_id | string | ❌ | Parent assistant message ID |
3.2.5 Tool Result
{
  "type": "tool_result",
  "tool_id": "tool-001",
  "result": "File content here...",
  "is_error": false,
  "error_message": null,
  "timestamp": "2026-01-28T10:02:01Z",
  "truncated": false
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "tool_result" |
| tool_id | string | ✅ | Matching tool_use ID |
| result | string | ❌ | Tool output (may be truncated) |
| is_error | boolean | ❌ | Default: false |
| error_message | string | ❌ | Error details if is_error=true |
| timestamp | ISO 8601 | ✅ | Result timestamp |
| truncated | boolean | ❌ | Whether result was truncated |
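Since a tool_result refers back to its tool_use only through tool_id, a consumer that wants to see each call next to its output has to join the two entry types itself. The sketch below shows one way to do that; `pair_tool_calls` is a hypothetical helper, and it assumes each tool_use appears in the file before its result.

```python
def pair_tool_calls(entries):
    """Join tool_use and tool_result entries on tool_id (hypothetical helper).

    Returns (pairs, orphans): pairs is a list of (tool_use, tool_result)
    tuples, orphans lists the tool_id of every result without a prior call.
    """
    uses = {}
    pairs = []
    orphans = []
    for entry in entries:
        kind = entry.get("type")
        if kind == "tool_use":
            uses[entry["tool_id"]] = entry
        elif kind == "tool_result":
            use = uses.get(entry["tool_id"])
            if use is None:
                orphans.append(entry["tool_id"])
            else:
                pairs.append((use, entry))
    return pairs, orphans
```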
3.2.6 Session End
{
  "type": "session_end",
  "session_id": "abc123-def456-ghi789",
  "ended_at": "2026-01-28T12:00:00Z",
  "total_messages": 42,
  "total_tokens": {
    "input": 50000,
    "output": 25000
  },
  "end_reason": "export"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be "session_end" |
| session_id | UUID | ✅ | Matching session_start ID |
| ended_at | ISO 8601 | ✅ | Session end timestamp |
| total_messages | integer | ❌ | Message count |
| total_tokens.input | integer | ❌ | Total input tokens |
| total_tokens.output | integer | ❌ | Total output tokens |
| end_reason | enum | ❌ | "user_exit", "export", "context_limit", "error", "timeout" |
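One plausible way to fill the optional totals is to derive them from the message entries already written. This standard does not say how total_messages or total_tokens are computed, so the sketch below rests on two assumptions: total_messages counts entries of type "message", and the token totals are sums of each message's usage.input and usage.output. The name `build_session_end` is hypothetical.

```python
def build_session_end(session_id, ended_at, entries, end_reason="export"):
    """Derive a session_end entry from exported entries (hypothetical helper)."""
    messages = [e for e in entries if e.get("type") == "message"]
    # usage is optional, so treat a missing or null value as an empty object
    usages = [m.get("usage") or {} for m in messages]
    return {
        "type": "session_end",
        "session_id": session_id,
        "ended_at": ended_at,
        "total_messages": len(messages),
        "total_tokens": {
            "input": sum(u.get("input", 0) for u in usages),
            "output": sum(u.get("output", 0) for u in usages),
        },
        "end_reason": end_reason,
    }
```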
4. LLM-Specific Normalization
4.1 Claude
| Native Field | CUSF Field | Transformation |
|---|---|---|
| role: "human" | role: "user" | Map role name |
| content[].text | content | Extract text from content blocks |
| content[].type: "tool_use" | tool_use entry | Separate entry |
| content[].type: "tool_result" | tool_result entry | Separate entry |
| stop_reason | stop_reason | Direct mapping |
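The table above can be read as a small transformation over one native message. The sketch below is an illustration, not the code in claude_extractor.py: the function name is hypothetical, the block keys (`id`, `name`, `input`, `tool_use_id`) follow the content-block shapes of Anthropic's Messages API, and their mapping onto tool_id, tool_name, and tool_input is an assumption, as is joining multiple text blocks with newlines.

```python
import json


def normalize_claude_message(raw, message_id, timestamp):
    """Split one Claude-style message into CUSF entries (illustrative sketch)."""
    role = "user" if raw.get("role") == "human" else raw.get("role")
    content = raw.get("content", "")
    # Plain-string content is treated as a single text block
    blocks = content if isinstance(content, list) else [{"type": "text", "text": content}]
    texts, extra = [], []
    for block in blocks:
        kind = block.get("type")
        if kind == "text":
            texts.append(block.get("text", ""))
        elif kind == "tool_use":
            extra.append({
                "type": "tool_use",
                "tool_name": block.get("name"),
                "tool_input": block.get("input", {}),
                "tool_id": block.get("id"),
                "timestamp": timestamp,
                "parent_id": message_id,
            })
        elif kind == "tool_result":
            result = block.get("content")
            if not isinstance(result, str):
                result = json.dumps(result)  # CUSF result is a string
            extra.append({
                "type": "tool_result",
                "tool_id": block.get("tool_use_id"),
                "result": result,
                "is_error": bool(block.get("is_error", False)),
                "timestamp": timestamp,
            })
    message = {
        "type": "message",
        "role": role,
        "content": "\n".join(texts),  # assumption: join text blocks with newlines
        "timestamp": timestamp,
        "message_id": message_id,
    }
    if "stop_reason" in raw:
        message["stop_reason"] = raw["stop_reason"]  # direct mapping
    return [message] + extra
```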
4.2 Codex
| Native Field | CUSF Field | Transformation |
|---|---|---|
| role: "developer" | role: "system" | Map role name |
| created_at (Unix epoch) | timestamp | Convert to ISO 8601 |
| No tool tracking | Skip tool entries | Codex CLI format differs |
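The epoch conversion and role mapping can be sketched as follows. This is an illustration only: the function name is hypothetical, `created_at` is assumed to be in seconds (not milliseconds), and the native message is assumed to carry its text in a string `content` field.

```python
from datetime import datetime, timezone

# Only the mapping named in the table above; other roles pass through unchanged
CODEX_ROLE_MAP = {"developer": "system"}


def normalize_codex_message(raw, message_id):
    """Convert one Codex-style message to a CUSF message entry (sketch)."""
    role = raw.get("role")
    # Interpret the epoch value as UTC and render it with a trailing "Z"
    created = datetime.fromtimestamp(raw["created_at"], tz=timezone.utc)
    return {
        "type": "message",
        "role": CODEX_ROLE_MAP.get(role, role),
        "content": raw.get("content", ""),
        "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "message_id": message_id,
    }
```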
4.3 Gemini
| Native Field | CUSF Field | Transformation |
|---|---|---|
| role: "model" | role: "assistant" | Map role name |
| parts[].text | content | Join text parts |
| parts[].functionCall | tool_use entry | Extract to separate entry |
| parts[].functionResponse | tool_result entry | Extract to separate entry |
4.4 KIMI
| Native Field | CUSF Field | Transformation |
|---|---|---|
| From context.jsonl | Direct mapping | Most fields align |
| From wire.jsonl | Full fidelity | Lossless format |
| model_id | llm_model | Rename field |
5. Validation
5.1 Schema Validation
Validate CUSF files against the JSON Schema:
# Using Python jsonschema
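python3 -c "
import json
from jsonschema import Draft202012Validator

with open('schemas/cusf-v1.0.0.json') as schema_file:
    schema = json.load(schema_file)
validator = Draft202012Validator(schema)

with open('export.jsonl') as f:
    for line_num, line in enumerate(f, 1):
        if not line.strip():
            continue  # skip blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as e:
            print(f'Line {line_num}: invalid JSON: {e.msg}')
            continue
        # Report every schema violation on the line, not only the first
        for error in validator.iter_errors(entry):
            print(f'Line {line_num}: {error.message}')
"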
5.2 Structural Validation
| Rule | Description |
|---|---|
| First line | Must be _meta entry |
| session_start | Must appear before any messages |
| tool_result | Must have matching tool_use by tool_id |
| session_end.session_id | Must match session_start.session_id |
| Timestamps | Must be chronologically ordered |
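The rules above can be checked in a single pass over the parsed entries. The following is a sketch under stated assumptions, not the logic of scripts/validate-cusf.py: `check_structure` is a hypothetical name, "chronologically ordered" is read as non-decreasing, and an entry's time is taken from timestamp, started_at, or ended_at, whichever is present.

```python
from datetime import datetime


def _parse_ts(value):
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_structure(entries):
    """Return a list of violations of the structural rules (sketch)."""
    errors = []
    if not entries or "_meta" not in entries[0]:
        errors.append("line 1: first line must be the _meta entry")
    start_id = None
    tool_ids = set()
    last_ts = None
    for line_num, entry in enumerate(entries, 1):
        kind = entry.get("type")
        if kind == "session_start":
            start_id = entry.get("session_id")
        elif kind == "message" and start_id is None:
            errors.append(f"line {line_num}: message before session_start")
        elif kind == "tool_use":
            tool_ids.add(entry.get("tool_id"))
        elif kind == "tool_result" and entry.get("tool_id") not in tool_ids:
            errors.append(f"line {line_num}: tool_result without matching tool_use")
        elif kind == "session_end" and entry.get("session_id") != start_id:
            errors.append(f"line {line_num}: session_end.session_id mismatch")
        raw_ts = entry.get("timestamp") or entry.get("started_at") or entry.get("ended_at")
        if raw_ts:
            ts = _parse_ts(raw_ts)
            if last_ts is not None and ts < last_ts:
                errors.append(f"line {line_num}: timestamp out of order")
            last_ts = ts
    return errors
```

An empty list means the file passed every check in this sketch; each string otherwise names the 1-based line of the offending entry.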
5.3 Reconstruction Validation
Roundtrip test: Export → Import → Re-export should produce identical content (excluding _meta.exported_at).
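A roundtrip comparison needs a notion of "identical content" that ignores the export timestamp. The sketch below interprets it as entry-by-entry equality that is insensitive to key order, which is an assumption; the function names are hypothetical.

```python
import json


def canonical_entries(lines):
    """Parse JSONL lines into comparable strings, dropping _meta.exported_at."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if "_meta" in entry:
            entry = dict(entry)
            entry["_meta"] = {k: v for k, v in entry["_meta"].items()
                              if k != "exported_at"}
        # sort_keys makes the comparison independent of key order
        out.append(json.dumps(entry, sort_keys=True))
    return out


def roundtrip_matches(original_lines, reexported_lines):
    """True when both exports carry the same entries in the same order."""
    return canonical_entries(original_lines) == canonical_entries(reexported_lines)
```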
6. Implementation Reference
6.1 Key Files
| Component | Path | Purpose |
|---|---|---|
| JSON Schema | schemas/cusf-v1.0.0.json | Formal schema definition |
| Formatter | scripts/core/cusf_formatter.py | Normalize all LLM formats |
| Claude Extractor | scripts/extractors/claude_extractor.py | Claude → CUSF |
| Codex Extractor | scripts/extractors/codex_extractor.py | Codex → CUSF |
| Gemini Extractor | scripts/extractors/gemini_extractor.py | Gemini → CUSF |
| KIMI Extractor | scripts/extractors/kimi_extractor.py | KIMI → CUSF |
| Export Script | scripts/session-export.py | Unified /sx command |
| Import Script | scripts/session-import.py | Reconstruction importer |
6.2 Python Usage
import json

from scripts.core.cusf_formatter import CUSFFormatter

formatter = CUSFFormatter()

# Normalize any LLM format to CUSF
# (messages holds the raw messages loaded from the source LLM's session log)
entries = formatter.normalize(
    raw_messages=messages,
    llm_source="claude",
    session_id="abc123"
)

# Write CUSF file, one JSON object per line
with open("export.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
6.3 Command Line Usage
# Export current session
/sx
# Export specific LLM
/sx --llm claude
/sx --llm codex
/sx --llm gemini
/sx --llm kimi
# Export with reconstruction metadata
/sx --reconstruct
# Validate CUSF file
python3 scripts/validate-cusf.py export.jsonl
7. Versioning
7.1 Semantic Versioning
CUSF uses semantic versioning (MAJOR.MINOR.PATCH):
| Change Type | Version Bump | Example |
|---|---|---|
| Breaking schema change | MAJOR | New required field |
| New optional field | MINOR | Add thinking field |
| Bug fix, clarification | PATCH | Fix regex pattern |
7.2 Backward Compatibility
- Readers MUST accept unknown fields gracefully
- Writers SHOULD NOT add fields not in the schema
- MAJOR version changes MAY break readers
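A reader can honor these rules by checking the MAJOR version up front and ignoring fields it does not recognize. The sketch below is one interpretation: treating "accept gracefully" as "silently drop" is an assumption, and the names `SUPPORTED_MAJOR`, `is_readable`, and `read_message` are hypothetical.

```python
SUPPORTED_MAJOR = 1  # this sketch reads CUSF 1.x files

# Message fields defined in section 3.2.3
MESSAGE_FIELDS = {
    "type", "role", "content", "timestamp", "message_id", "parent_id",
    "model", "usage", "thinking", "stop_reason",
}


def is_readable(meta_entry):
    """Accept any MINOR/PATCH within the supported MAJOR version."""
    major = int(meta_entry["_meta"]["version"].split(".")[0])
    return major == SUPPORTED_MAJOR


def read_message(entry):
    """Keep known fields and drop unknown ones instead of failing."""
    return {k: v for k, v in entry.items() if k in MESSAGE_FIELDS}
```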
8. Related Standards
| Standard | Relationship |
|---|---|
| ADR-134 | Unified watcher that triggers exports |
| ADR-080 | Context extraction that processes CUSF |
| ADR-118 | Database schema for imported sessions |
9. Compliance
9.1 Quality Grading
| Grade | Criteria |
|---|---|
| A (90-100%) | Full schema compliance, all optional fields populated |
| B (80-89%) | Required fields present, key optionals (usage, timestamps) |
| C (70-79%) | Required fields only, passes validation |
| D (60-69%) | Minor validation errors |
| F (<60%) | Missing required fields, invalid structure |
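The grade bands translate directly into a threshold lookup. How the percentage score itself is computed is not defined in this section, so the sketch below only maps an already-computed score to a letter; `letter_grade` is a hypothetical name, and fractional scores are assigned by each band's lower bound.

```python
def letter_grade(score):
    """Map a percentage score (0-100) to the grade bands above (sketch)."""
    for floor, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= floor:
            return letter
    return "F"
```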
9.2 Validation Command
# Validate CUSF export
python3 scripts/validate-cusf.py --grade export.jsonl
# Expected output:
# ✅ Schema validation: PASS
# ✅ Structural validation: PASS
# ✅ Reconstruction test: PASS
# Grade: A (95%)
Last Updated: January 28, 2026
Standard Owner: CODITECT Core Team
Approval: ADR-134 Implementation