Skip to main content

ADR 008: Use Kimi (Moonshot AI) for Vision Analysis

Status

Accepted

Context

The pipeline requires vision analysis for extracted video frames. Initially, we used Anthropic Claude Vision, but we want to:

  1. Support multiple vision providers for redundancy
  2. Potentially reduce costs (Kimi may offer different pricing)
  3. Provide flexibility for users in different regions
  4. Support users who may have access to one API but not another

Decision

Add support for Kimi (Moonshot AI) Vision API as an alternative to Claude Vision.

Kimi Vision Capabilities

  • Model: moonshot-v1-8k-vision-preview
  • Image Support: PNG, JPEG, WEBP, GIF
  • Context Window: 8K tokens
  • Strengths:
    • Strong OCR capabilities
    • Good at technical diagrams
    • Competitive pricing
    • Available in China/APAC regions

Implementation

class KimiVisionAnalyzer:
API_BASE = "https://api.moonshot.cn/v1"

async def analyze_frame(self, frame: Frame, context: str) -> ImageAnalysis:
# Encode image to base64
# Send to Kimi API with structured prompt
# Parse response into ImageAnalysis

Provider Selection

class VisionProvider:
@staticmethod
def create(provider: str = "auto"):
if provider == "auto":
# Auto-detect based on available API keys
if os.getenv('KIMI_API_KEY'):
return KimiVisionAnalyzer()
elif os.getenv('ANTHROPIC_API_KEY'):
return VisionAnalyzer() # Claude

Configuration

PipelineConfig(
vision_provider="auto" # "auto", "kimi", or "claude"
)

Environment Variables:

export KIMI_API_KEY="..."           # For Kimi
export MOONSHOT_API_KEY="..." # Alternative name
export ANTHROPIC_API_KEY="..." # For Claude

Consequences

Positive

  • ✅ Multiple provider options
  • ✅ Regional availability (Kimi strong in APAC)
  • ✅ Cost flexibility
  • ✅ Redundancy if one service is down
  • ✅ Users can choose based on existing subscriptions

Negative

  • ⚠️ Two APIs to maintain
  • ⚠️ Slight differences in output format
  • ⚠️ Different rate limits to manage

Notes