
/transcript-normalize - Transcript Normalization

Normalize transcript .txt files into structured Markdown, applying the optimal paragraphing rule and consistent file naming.

Usage

# Normalize a folder of .txt files
/transcript-normalize <input-dir> <output-dir>

# With renaming enabled
/transcript-normalize <input-dir> <output-dir> --rename

# Dry run (no writes)
/transcript-normalize <input-dir> <output-dir> --dry-run

# Preserve timestamps and de-hyphenate
/transcript-normalize <input-dir> <output-dir> --keep-timestamps --dehyphenate

# Remove fillers and emit report
/transcript-normalize <input-dir> <output-dir> --remove-fillers --report report.json

Options

Option                  Description
<input-dir>             Directory containing source .txt files
<output-dir>            Directory for generated .md files
--rename                Rename input files to lowercase kebab-case
--dry-run               Show planned actions without writing
(auto)                  Speaker detection from leading timestamp line (default on)
--keep-timestamps       Preserve leading timestamps
--dehyphenate           Merge hyphenated line breaks
--remove-fillers        Remove filler words (um, uh, you know)
--speaker-labels        Detect inline speaker lines (Name: ...)
--report                Write JSON summary report to the specified path
--paragraphing optimal  Apply optimal paragraphing rule
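
The text-level options above (--rename, --dehyphenate, --remove-fillers) can be illustrated with a minimal Python sketch. This is not the script's actual code: the function names, the regular expressions, and the filler list (taken only from the three examples in the table) are assumptions made for illustration.

```python
import re

# Assumed filler list: only the three examples named in the options table.
FILLERS = ("you know", "um", "uh")

_FILLER_RE = re.compile(
    r"(?:,\s*)?\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?",
    re.IGNORECASE,
)


def to_kebab_case(filename: str) -> str:
    """Lowercase a file name and join its words with hyphens, keeping the extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    # Add a hyphen at camelCase boundaries, then collapse every run of
    # non-alphanumeric characters (spaces, underscores, ...) into one hyphen.
    stem = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", stem)
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-").lower()
    return f"{stem}.{ext.lower()}" if ext else stem


def dehyphenate(text: str) -> str:
    """Rejoin a word that was split by a hyphen at a line break."""
    # Naive: a genuine compound broken at its hyphen is also merged.
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


def remove_fillers(text: str) -> str:
    """Drop filler words together with the commas that set them off."""
    text = _FILLER_RE.sub("", text)
    text = re.sub(r"[ \t]{2,}", " ", text)           # collapse doubled spaces
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)  # no space before punctuation
    return text.strip()


if __name__ == "__main__":
    print(to_kebab_case("My Interview_Notes 01.TXT"))
    print(dehyphenate("normal-\nization works"))
    print(remove_fillers("So, um, we started, you know, last week."))
```

The filler pattern relies on word boundaries, so "umbrella" is left alone, but it does not try to restore capitalization when a filler opened the sentence.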

System Prompt

EXECUTION DIRECTIVE: When /transcript-normalize is invoked, you MUST:

  1. Validate input/output paths
  2. Optionally rename files (lowercase kebab-case, no spaces)
  3. Read source TXT files
  4. Insert missing spaces after punctuation before sentence splitting
  5. Split sentences with abbreviation handling
  6. Apply paragraphing rules (4-sentence target, 6-sentence cap, topic cues)
  7. Emit Markdown with a speaker line (if detected)
  8. Report summary
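
Steps 4 to 6 above can be sketched in Python as follows. This is one possible reading, not the script's actual logic: the abbreviation list and topic cues are assumed, since the directive does not enumerate them. The sketch also interprets the paragraphing rule as "break at a topic cue once 4 sentences are reached, and always break at 6".

```python
import re

# Assumed abbreviations; the directive does not list which ones are handled.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "vs.", "etc.", "e.g.", "i.e."}
# Assumed topic cues; the directive mentions them without examples.
TOPIC_CUES = ("now,", "next,", "so,", "moving on", "another thing")

TARGET_SENTENCES = 4  # from step 6: 4-sentence target
MAX_SENTENCES = 6     # from step 6: 6-sentence cap


def fix_spacing(text: str) -> str:
    """Step 4: insert a space after . ! ? when a capital letter follows directly."""
    return re.sub(r"([a-z][.!?])([A-Z])", r"\1 \2", text)


def split_sentences(text: str) -> list[str]:
    """Step 5: split after . ! ?, re-joining pieces that end in an abbreviation."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences: list[str] = []
    carry = ""
    for piece in pieces:
        if not piece:
            continue
        candidate = f"{carry} {piece}" if carry else piece
        last_word = candidate.rsplit(None, 1)[-1].lower()
        if last_word in ABBREVIATIONS:
            carry = candidate  # not a real sentence end; keep accumulating
        else:
            sentences.append(candidate)
            carry = ""
    if carry:
        sentences.append(carry)
    return sentences


def starts_topic(sentence: str) -> bool:
    """True when the sentence opens with one of the assumed topic cues."""
    return sentence.lower().startswith(TOPIC_CUES)


def paragraphs(sentences: list[str]) -> list[str]:
    """Step 6: break at a topic cue once the target is reached, always at the cap."""
    result: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        at_target = len(current) >= TARGET_SENTENCES
        at_cap = len(current) >= MAX_SENTENCES
        if current and (at_cap or (at_target and starts_topic(sentence))):
            result.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        result.append(" ".join(current))
    return result


if __name__ == "__main__":
    raw = "Dr. Smith arrived.He spoke for an hour. Next, we took questions."
    print("\n\n".join(paragraphs(split_sentences(fix_spacing(raw)))))
```

Running the spacing fix before the split matters: without it, "arrived.He" would never be seen as a sentence boundary. The abbreviation check is deliberately simple and will wrongly merge a sentence that genuinely ends in "etc.".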

Execution Flow

python3 scripts/transcript-normalize.py \
  --input <input-dir> \
  --output <output-dir> \
  --rename \
  --paragraphing optimal
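
As a rough illustration of how a script might declare the flags shown in this invocation and in the options table, here is a hypothetical argparse sketch. The real scripts/transcript-normalize.py may be structured differently; the function name build_parser, the help strings, and the default for --paragraphing are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; not the script's real code."""
    parser = argparse.ArgumentParser(
        prog="transcript-normalize.py",
        description="Normalize transcript .txt files into structured Markdown.",
    )
    parser.add_argument("--input", required=True, help="directory containing source .txt files")
    parser.add_argument("--output", required=True, help="directory for generated .md files")
    parser.add_argument("--rename", action="store_true", help="rename files to lowercase kebab-case")
    parser.add_argument("--dry-run", action="store_true", help="show planned actions without writing")
    parser.add_argument("--keep-timestamps", action="store_true", help="preserve leading timestamps")
    parser.add_argument("--dehyphenate", action="store_true", help="merge hyphenated line breaks")
    parser.add_argument("--remove-fillers", action="store_true", help="remove filler words")
    parser.add_argument("--speaker-labels", action="store_true", help="detect inline speaker lines")
    parser.add_argument("--report", metavar="PATH", help="write a JSON summary report to PATH")
    # "optimal" is the only value the page documents; making it the default is an assumption.
    parser.add_argument("--paragraphing", choices=["optimal"], default="optimal")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse converts dashes in flag names to underscores, so --dry-run is read back as args.dry_run.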

Success Output

DONE: /transcript-normalize
Input: <input-dir>
Output: <output-dir>
Files processed: N
Files renamed: M

Principles

This command embodies:

  • #3 Keep It Simple - Single-purpose normalization flow
  • #5 Eliminate Ambiguity - Explicit rules for sentence/paragraph splitting
  • #6 Clear, Understandable, Explainable - Deterministic output structure

Full Standard: CODITECT-STANDARD-AUTOMATION.md


Command               Purpose
/lowercase-migration  Enforce lowercase-kebab-case naming
/document             General document processing

Version: 1.0.0 | Created: 2026-01-27 | Updated: 2026-01-27 | Author: CODITECT Team