Agent Skills Framework Extension
Document Merging Skill
When to Use This Skill
Use this skill when implementing document merging patterns in your codebase.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Intelligent document merging with conflict detection, semantic analysis, and automated resolution.
Core Capabilities
- Conflict Detection - Identify merge conflicts
- Semantic Merging - Content-aware merging
- Diff Resolution - Resolve differences intelligently
- Content Harmonization - Unify style and structure
- Version Reconciliation - Merge multiple versions
- Automated Strategies - Smart merge decision making
Document Merger
scripts/document-merger.py
from dataclasses import dataclass
from typing import List, Dict, Optional, Tuple
from enum import Enum
import difflib
import re

class ConflictType(Enum):
    CONTENT = "content"
    STRUCTURE = "structure"
    FORMATTING = "formatting"
    SEMANTIC = "semantic"

@dataclass
class Conflict:
    type: ConflictType
    location: str
    version_a: str
    version_b: str
    suggested_resolution: Optional[str]
    confidence: float

@dataclass
class MergeResult:
    merged_content: str
    conflicts: List[Conflict]
    auto_resolved: int
    manual_review_needed: int
    merge_strategy_used: str

class DocumentMerger:
    """Intelligently merge documents"""
def merge(
self,
version_a: str,
version_b: str,
base: Optional[str] = None,
strategy: str = 'semantic'
) -> MergeResult:
"""Merge two document versions"""
# Detect conflicts
conflicts = self._detect_conflicts(version_a, version_b, base)
# Auto-resolve conflicts
auto_resolved = []
manual_conflicts = []
for conflict in conflicts:
resolution = self._auto_resolve(conflict, strategy)
if resolution:
auto_resolved.append((conflict, resolution))
else:
manual_conflicts.append(conflict)
# Build merged content
merged = self._build_merged(
version_a,
version_b,
auto_resolved,
manual_conflicts
)
return MergeResult(
merged_content=merged,
conflicts=conflicts,
auto_resolved=len(auto_resolved),
manual_review_needed=len(manual_conflicts),
merge_strategy_used=strategy
)
    def _detect_conflicts(
        self,
        version_a: str,
        version_b: str,
        base: Optional[str]
    ) -> List[Conflict]:
        """Detect all conflicts between versions.

        Note: `base` is accepted but not used yet; detection is a two-way
        line diff of version A against version B.
        """
        conflicts = []
        # Split into lines for comparison
        lines_a = version_a.split('\n')
        lines_b = version_b.split('\n')
        # Use difflib to find differences
        diff = difflib.unified_diff(lines_a, lines_b, lineterm='')
        location = "unknown"
        in_hunk = False
        version_a_lines = []
        version_b_lines = []

        def flush() -> None:
            # Record the pending run of changed lines as exactly one conflict
            if not (version_a_lines or version_b_lines):
                return
            text_a = '\n'.join(version_a_lines)
            text_b = '\n'.join(version_b_lines)
            conflicts.append(Conflict(
                type=self._classify_conflict(text_a, text_b),
                location=location,
                version_a=text_a,
                version_b=text_b,
                suggested_resolution=None,
                confidence=0.0
            ))
            version_a_lines.clear()
            version_b_lines.clear()

        for line in diff:
            if line.startswith('@@'):
                # Location marker: close any run left over from the previous hunk
                flush()
                location = line
                in_hunk = True
            elif not in_hunk:
                # The '---' / '+++' file headers come before the first hunk
                continue
            elif line.startswith('-'):
                version_a_lines.append(line[1:])
            elif line.startswith('+'):
                version_b_lines.append(line[1:])
            elif line.startswith(' '):
                # Context line ends the current run of differences
                flush()
        # Changes at the very end of the document have no trailing context line
        flush()
        return conflicts
def _classify_conflict(self, version_a: str, version_b: str) -> ConflictType:
"""Classify type of conflict"""
# Content conflict if substantially different
similarity = difflib.SequenceMatcher(None, version_a, version_b).ratio()
if similarity < 0.3:
return ConflictType.CONTENT
# Structure conflict if headings differ
if self._is_heading(version_a) != self._is_heading(version_b):
return ConflictType.STRUCTURE
# Formatting conflict if only leading/trailing whitespace differs
if version_a.strip() == version_b.strip():
return ConflictType.FORMATTING
# Otherwise semantic
return ConflictType.SEMANTIC
def _is_heading(self, text: str) -> bool:
"""Check if text is a heading"""
return bool(re.match(r'^#{1,6}\s+', text))
def _auto_resolve(
self,
conflict: Conflict,
strategy: str
) -> Optional[str]:
"""Attempt to auto-resolve conflict"""
# Formatting conflicts: keep version B (newer)
if conflict.type == ConflictType.FORMATTING:
conflict.suggested_resolution = conflict.version_b
conflict.confidence = 0.9
return conflict.version_b
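# Content conflicts with high similarity: keep the longer version.
# Note: _classify_conflict only returns CONTENT when similarity < 0.3, so this
# branch is only reached for Conflict objects that were classified elsewhere.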
if conflict.type == ConflictType.CONTENT:
similarity = difflib.SequenceMatcher(
None,
conflict.version_a,
conflict.version_b
).ratio()
if similarity > 0.8:
# Take longer version (more detailed)
if len(conflict.version_b) > len(conflict.version_a):
conflict.suggested_resolution = conflict.version_b
else:
conflict.suggested_resolution = conflict.version_a
conflict.confidence = 0.7
return conflict.suggested_resolution
# Semantic conflicts: try to combine
if conflict.type == ConflictType.SEMANTIC:
combined = self._combine_semantic(
conflict.version_a,
conflict.version_b
)
if combined:
conflict.suggested_resolution = combined
conflict.confidence = 0.6
return combined
return None # Cannot auto-resolve
    def _combine_semantic(self, version_a: str, version_b: str) -> Optional[str]:
        """Combine semantically similar content"""
        # Simple strategy: concatenate unique sentences in first-seen order
        # (collecting them in a plain set would shuffle the sentence order)
        seen = set()
        ordered = []
        for sentence in version_a.split('.') + version_b.split('.'):
            text = sentence.strip()
            if text and text not in seen:
                seen.add(text)
                ordered.append(text)
        combined = '. '.join(ordered)
        return combined if combined else None
    def _build_merged(
        self,
        version_a: str,
        version_b: str,
        auto_resolved: List[Tuple[Conflict, str]],
        manual_conflicts: List[Conflict]
    ) -> str:
        """Build merged document"""
        # Start with version B as base
        merged = version_b
        # Apply auto-resolutions
        for conflict, resolution in auto_resolved:
            # Replace conflict in merged content; skip empty text, since ''
            # is "found" at position 0 of any string
            if conflict.version_b and conflict.version_b in merged:
                merged = merged.replace(conflict.version_b, resolution, 1)
        # Add markers for manual conflicts
        for conflict in manual_conflicts:
            marker = (
                "\n<<<<<<< Version A\n"
                f"{conflict.version_a}\n"
                "=======\n"
                f"{conflict.version_b}\n"
                ">>>>>>> Version B\n"
            )
            # Pure deletions have no text in version B to anchor the marker on
            if conflict.version_b and conflict.version_b in merged:
                merged = merged.replace(conflict.version_b, marker, 1)
        return merged
Usage
merger = DocumentMerger()
doc_a = """# API Documentation
Authentication
Use JWT tokens for authentication.
Endpoints
GET /api/users - List users """
doc_b = """# API Documentation
Authentication
Use OAuth2 for authentication.
Endpoints
GET /api/users - Retrieve all users
POST /api/users - Create new user """
result = merger.merge(doc_a, doc_b, strategy='semantic')
print(f"Auto-resolved: {result.auto_resolved}")
print(f"Manual review needed: {result.manual_review_needed}")
print("\nMerged content:")
print(result.merged_content)
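The conflict detection above parses `-`/`+` prefixes out of a unified diff. The same changed regions can also be read directly from `difflib.SequenceMatcher.get_opcodes()`, which avoids prefix parsing altogether. The following is a minimal sketch of that alternative, not part of the skill's scripts; `Hunk` and `changed_hunks` are hypothetical names introduced for illustration.

```python
import difflib
from dataclasses import dataclass
from typing import List


@dataclass
class Hunk:
    tag: str            # 'replace', 'delete', or 'insert'
    start_a: int        # 0-based index of the first affected line in version A
    lines_a: List[str]  # lines removed from version A (empty for 'insert')
    lines_b: List[str]  # lines added in version B (empty for 'delete')


def changed_hunks(version_a: str, version_b: str) -> List[Hunk]:
    """Return one Hunk per contiguous run of differing lines (hypothetical helper)."""
    lines_a = version_a.split('\n')
    lines_b = version_b.split('\n')
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            continue  # unchanged region, nothing to report
        hunks.append(Hunk(tag, i1, lines_a[i1:i2], lines_b[j1:j2]))
    return hunks


doc_a = "# API\nUse JWT tokens.\nGET /api/users"
doc_b = "# API\nUse OAuth2.\nGET /api/users\nPOST /api/users"
for hunk in changed_hunks(doc_a, doc_b):
    print(hunk.tag, hunk.start_a, hunk.lines_a, hunk.lines_b)
```

Because each opcode already carries the line ranges for both versions, changes at the end of a document are reported like any other region.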
Conflict Resolver
scripts/conflict-resolver.py
from dataclasses import dataclass
from typing import List, Optional
import re

# Conflict and ConflictType are the classes defined in scripts/document-merger.py

@dataclass
class Resolution:
    strategy: str
    result: str
    confidence: float
    explanation: str

class ConflictResolver:
    """Resolve merge conflicts intelligently"""
STRATEGIES = [
'take_both',
'take_longer',
'take_newer',  # no matching branch in _apply_strategy yet, so resolve() skips it
'combine',
'prefer_structured'
]
def resolve(self, conflict: Conflict, context: Optional[str] = None) -> Resolution:
"""Resolve conflict using best strategy"""
# Try each strategy and score
resolutions = []
for strategy in self.STRATEGIES:
result = self._apply_strategy(conflict, strategy, context)
if result:
resolutions.append(result)
# Return best resolution
if resolutions:
return max(resolutions, key=lambda r: r.confidence)
else:
return Resolution(
strategy='manual',
result="CONFLICT: Choose between A or B",
confidence=0.0,
explanation="No automatic resolution possible"
)
def _apply_strategy(
self,
conflict: Conflict,
strategy: str,
context: Optional[str]
) -> Optional[Resolution]:
"""Apply specific resolution strategy"""
if strategy == 'take_both':
# Combine both versions
result = f"{conflict.version_a}\n\n{conflict.version_b}"
return Resolution(
strategy='take_both',
result=result,
confidence=0.6,
explanation="Combined both versions"
)
elif strategy == 'take_longer':
# Take longer (more detailed) version
if len(conflict.version_b) > len(conflict.version_a):
result = conflict.version_b
confidence = 0.7
else:
result = conflict.version_a
confidence = 0.7
return Resolution(
strategy='take_longer',
result=result,
confidence=confidence,
explanation="Selected longer version"
)
elif strategy == 'combine':
# Intelligent combination
combined = self._intelligent_combine(conflict)
if combined:
return Resolution(
strategy='combine',
result=combined,
confidence=0.8,
explanation="Intelligently combined content"
)
elif strategy == 'prefer_structured':
# Prefer version with better structure
score_a = self._structure_score(conflict.version_a)
score_b = self._structure_score(conflict.version_b)
if score_b > score_a:
return Resolution(
strategy='prefer_structured',
result=conflict.version_b,
confidence=0.75,
explanation="Preferred better-structured version"
)
else:
return Resolution(
strategy='prefer_structured',
result=conflict.version_a,
confidence=0.75,
explanation="Preferred better-structured version"
)
return None
def _intelligent_combine(self, conflict: Conflict) -> Optional[str]:
"""Intelligently combine conflicting content"""
# Extract unique bullet points
bullets_a = re.findall(r'^[-*]\s+(.+)$', conflict.version_a, re.MULTILINE)
bullets_b = re.findall(r'^[-*]\s+(.+)$', conflict.version_b, re.MULTILINE)
if bullets_a or bullets_b:
    # Combine unique bullets in first-seen order (a set would shuffle them)
    all_bullets = list(dict.fromkeys(bullets_a + bullets_b))
    return '\n'.join(f"- {b}" for b in all_bullets)
# Extract paragraphs
paras_a = conflict.version_a.split('\n\n')
paras_b = conflict.version_b.split('\n\n')
# Take unique paragraphs
all_paras = []
seen = set()
for para in paras_a + paras_b:
if para.strip() and para.strip() not in seen:
all_paras.append(para)
seen.add(para.strip())
if all_paras:
return '\n\n'.join(all_paras)
return None
def _structure_score(self, text: str) -> float:
"""Score structural quality"""
score = 0.0
# Has headings
if re.search(r'^#{1,6}\s+', text, re.MULTILINE):
score += 0.3
# Has lists
if re.search(r'^[-*]\s+', text, re.MULTILINE):
score += 0.2
# Has paragraphs
if '\n\n' in text:
score += 0.2
# Good length
if 50 < len(text) < 500:
score += 0.3
return score
Usage
resolver = ConflictResolver()
conflict = Conflict(
    type=ConflictType.CONTENT,
    location="line 10",
    version_a="Use JWT authentication",
    version_b="Use OAuth2 authentication with JWT tokens",
    suggested_resolution=None,
    confidence=0.0
)
resolution = resolver.resolve(conflict)
print(f"Strategy: {resolution.strategy}")
print(f"Result: {resolution.result}")
print(f"Confidence: {resolution.confidence:.2f}")
Content Harmonizer
// scripts/content-harmonizer.ts
interface StyleGuide {
headingStyle: 'atx' | 'setext'; // # vs underline
listStyle: '-' | '*' | '+';
codeBlockStyle: 'fenced' | 'indented';
lineLength: number;
}
class ContentHarmonizer {
/**
* Harmonize content style and structure
*/
harmonize(content: string, styleGuide: StyleGuide): string {
let harmonized = content;
// Normalize headings
harmonized = this.normalizeHeadings(harmonized, styleGuide.headingStyle);
// Normalize lists
harmonized = this.normalizeLists(harmonized, styleGuide.listStyle);
// Normalize code blocks
harmonized = this.normalizeCodeBlocks(harmonized, styleGuide.codeBlockStyle);
// Wrap long lines
harmonized = this.wrapLines(harmonized, styleGuide.lineLength);
return harmonized;
}
private normalizeHeadings(content: string, style: 'atx' | 'setext'): string {
if (style === 'atx') {
// Convert setext to atx (# style)
content = content.replace(/^(.+)\n=+$/gm, '# $1');
content = content.replace(/^(.+)\n-+$/gm, '## $1');
}
return content;
}
private normalizeLists(content: string, marker: string): string {
// Convert all list markers to preferred style
return content.replace(/^[*+-]\s+/gm, `${marker} `);
}
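private normalizeCodeBlocks(content: string, style: 'fenced' | 'indented'): string {
  if (style === 'fenced') {
    // Wrap each run of consecutive 4-space-indented lines in one fence
    // (wrapping line by line would produce a separate fence per line)
    return content.replace(/(?:^ {4}.*(?:\n|$))+/gm, (block) => {
      const trailing = block.endsWith('\n') ? '\n' : '';
      const body = block.replace(/\n$/, '').replace(/^ {4}/gm, '');
      return '```\n' + body + '\n```' + trailing;
    });
  }
  return content;
}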
private wrapLines(content: string, maxLength: number): string {
const lines = content.split('\n');
const wrapped: string[] = [];
for (const line of lines) {
if (line.length <= maxLength || line.startsWith('#') || line.startsWith('```')) {
wrapped.push(line);
} else {
// Wrap at word boundaries
const words = line.split(' ');
let currentLine = '';
for (const word of words) {
if (currentLine && (currentLine + word).length > maxLength) {
wrapped.push(currentLine.trim());
currentLine = word + ' ';
} else {
currentLine += word + ' ';
}
}
if (currentLine) {
wrapped.push(currentLine.trim());
}
}
}
return wrapped.join('\n');
}
}
// Usage
const harmonizer = new ContentHarmonizer();
const content = `
Some heading
============
* List item 1
+ List item 2
- List item 3
    code example
    more code
`;
const styleGuide: StyleGuide = {
headingStyle: 'atx',
listStyle: '-',
codeBlockStyle: 'fenced',
lineLength: 100
};
const harmonized = harmonizer.harmonize(content, styleGuide);
console.log(harmonized);
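One caveat with regex-only harmonization is that it also rewrites text inside fenced code blocks, where a leading `*` or `+` may be code rather than a list marker. The sketch below shows one way to make list normalization fence-aware. It is written in Python to match the skill's other scripts and is not a port of `ContentHarmonizer`; `normalize_list_markers` is a hypothetical name.

```python
import re

# A fence line starts with three backticks or three tildes (optionally indented)
FENCE = re.compile(r'^\s*(`{3}|~{3})')
# A bullet is optional indentation, one of * + -, then whitespace
LIST_ITEM = re.compile(r'^(\s*)[*+-](\s+)')


def normalize_list_markers(content: str, marker: str = '-') -> str:
    """Rewrite bullet markers to `marker`, leaving fenced code untouched."""
    out = []
    in_fence = False
    for line in content.split('\n'):
        if FENCE.match(line):
            in_fence = not in_fence  # toggle on opening and closing fences
            out.append(line)
        elif in_fence:
            out.append(line)         # code line: keep exactly as written
        else:
            out.append(LIST_ITEM.sub(
                lambda m: m.group(1) + marker + m.group(2), line, count=1))
    return '\n'.join(out)
```

The simple toggle assumes fences are balanced and not nested; a document with an unclosed fence would leave everything after it untouched.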
## Usage Examples
### Document Merging
Apply document-merging skill to merge two versions of README with conflict detection
### Conflict Resolution
Apply document-merging skill to auto-resolve formatting and semantic conflicts
### Content Harmonization
Apply document-merging skill to harmonize markdown style across merged documents
Integration Points
- thoughts-analysis-patterns - Semantic analysis
- session-analysis-patterns - Version tracking
- research-patterns - Content validation
Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: document-merging
Completed:
- [x] Conflicts detected and classified
- [x] Auto-resolution attempted (X/Y conflicts resolved)
- [x] Manual conflicts marked with clear boundaries
- [x] Content harmonized to style guide
- [x] Merged document validated
Outputs:
- Merged document with conflicts resolved
- Conflict report showing resolution strategy
- Style harmonization applied
- Auto-resolved: X conflicts
- Manual review needed: Y conflicts
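The conflict report listed above could be produced along these lines. This is a minimal sketch under stated assumptions: `ReportEntry` and `format_report` are hypothetical names, and the entry fields mirror the `Conflict` dataclass only loosely. The summary lines follow the "Auto-resolved: X conflicts" wording used in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReportEntry:
    location: str
    conflict_type: str
    resolution: Optional[str]  # None means the conflict needs manual review
    confidence: float


def format_report(entries: List[ReportEntry]) -> str:
    """Build a plain-text conflict report (hypothetical helper)."""
    auto = [e for e in entries if e.resolution is not None]
    manual = [e for e in entries if e.resolution is None]
    lines = [
        f"Auto-resolved: {len(auto)} conflicts",
        f"Manual review needed: {len(manual)} conflicts",
    ]
    for entry in entries:
        status = "auto" if entry.resolution is not None else "MANUAL"
        lines.append(
            f"- [{status}] {entry.location} "
            f"({entry.conflict_type}, confidence {entry.confidence:.2f})"
        )
    return '\n'.join(lines)


print(format_report([
    ReportEntry("@@ -1,3 +1,3 @@", "formatting", "Use OAuth2.", 0.9),
    ReportEntry("@@ -8,2 +8,3 @@", "content", None, 0.0),
]))
```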
Completion Checklist
Before marking this skill as complete, verify:
- Both document versions loaded successfully
- Conflict detection completed (all differences found)
- Auto-resolution attempted with confidence scoring
- Unresolved conflicts marked with clear delimiters
- Content harmonization applied (headings, lists, formatting)
- Merged output validates (no syntax errors)
- Conflict report generated with resolution strategies
- Manual review items clearly identified
Failure Indicators
This skill has FAILED if:
- ❌ Document versions could not be parsed
- ❌ Conflict detection produced no results when differences exist
- ❌ Auto-resolution created invalid merged content
- ❌ Manual conflict markers are malformed or missing
- ❌ Content harmonization broke document structure
- ❌ Merged output has syntax errors or broken links
- ❌ No conflict report generated
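The "malformed or missing" marker check above lends itself to automation. The sketch below assumes git-style markers (`<<<<<<<`, `=======`, `>>>>>>>`) and walks the merged text with a small state machine; `find_marker_errors` is a hypothetical helper, not part of the skill's scripts.

```python
from typing import List


def find_marker_errors(merged: str) -> List[str]:
    """Return problems found in conflict markers; an empty list means well-formed."""
    errors = []
    state = 'outside'  # 'outside' -> 'in_a' -> 'in_b' -> 'outside'
    for number, line in enumerate(merged.split('\n'), start=1):
        if line.startswith('<<<<<<<'):
            if state != 'outside':
                errors.append(f"line {number}: opening marker inside a conflict")
            state = 'in_a'
        elif line.strip() == '=======' and state == 'in_a':
            # Only counted inside a conflict, so a setext heading underline
            # made of '=' elsewhere in the document is not flagged
            state = 'in_b'
        elif line.startswith('>>>>>>>'):
            if state != 'in_b':
                errors.append(f"line {number}: unexpected closing marker")
            state = 'outside'
    if state != 'outside':
        errors.append("end of document: unclosed conflict block")
    return errors
```

An empty result only shows that the markers are balanced; whether any manual conflicts remain still has to be read from the merge result itself.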
When NOT to Use
Do NOT use this skill when:
- Single document with no merge required (use standard editing instead)
- Documents are in different formats (convert first)
- Binary file merging (use specialized binary merge tools)
- Simple text append operations (use concatenation)
- Version control system conflicts (use git-merge-conflict-resolution skill)
- Documents have incompatible schemas (reconcile schemas first)
- Auto-merge without review is acceptable (use simpler merge)
Use alternative skills:
- git-merge-conflict-resolution - For git conflicts
- schema-migration-patterns - For schema conflicts
- content-transformation-patterns - For format conversion
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Auto-resolving all conflicts | High-risk decisions made without review | Only auto-resolve low-risk, high-confidence conflicts (formatting, whitespace) |
| Ignoring semantic meaning | Content-based conflicts treated as text diffs | Use semantic analysis to understand content intent |
| No conflict markers | Manual review impossible | Always mark unresolved conflicts with clear delimiters |
| Applying wrong style guide | Inconsistent merged output | Detect or specify style guide before harmonization |
| Merging without base version | A two-way diff cannot tell which side changed; a three-way merge is more accurate | Use base version when available for better conflict detection |
| No validation after merge | Broken output goes undetected | Always validate merged content (syntax, links, structure) |
| Skipping conflict report | No audit trail of decisions | Generate detailed report showing all resolutions |
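
The base-version row in the table can be made concrete with the standard three-way rule for a single region. The sketch below assumes the three texts have already been aligned to the same region; aligning regions across three versions is the harder part and is left out. `resolve_with_base` is a hypothetical name.

```python
from typing import Optional


def resolve_with_base(base: str, version_a: str, version_b: str) -> Optional[str]:
    """Three-way rule for one aligned region.

    Returns the merged text, or None when both sides changed differently.
    """
    if version_a == version_b:
        return version_a  # both sides agree (or neither changed)
    if version_a == base:
        return version_b  # only B changed
    if version_b == base:
        return version_a  # only A changed
    return None           # true conflict: needs manual review


# Only version B changed relative to the base, so B wins without review
print(resolve_with_base("Use JWT.", "Use JWT.", "Use OAuth2."))
```

Without a base, the first and last cases are the only ones that can be told apart, which is why a two-way merge has to treat every difference as a potential conflict.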
Principles
This skill embodies:
- #5 Eliminate Ambiguity - Clear conflict markers and resolution explanations
- #6 Clear, Understandable, Explainable - Explicit merge strategies and confidence scores
- #8 No Assumptions - Ask for style guide, don't assume merge strategy
- #11 Reliability - Validate merged output, detect errors early
- Trust & Transparency - Show all conflicts, explain auto-resolution decisions
Full Standard: CODITECT-STANDARD-AUTOMATION.md
Version: 1.1.0 | Updated: 2026-01-04 | Author: CODITECT Team