Tests for LLM Judge (H.3.5.3).
Tests cover:
- LLMJudge initialization and configuration
- Prompt building
- Response parsing
- Provenance tracking
- LLMJudgePanel multi-model coordination
- Model diversity verification
File: test_llm_judge.py
Classes
TestLLMJudgeInit
Tests for LLMJudge initialization.
TestLLMJudgePromptBuilding
Tests for prompt building.
TestLLMJudgeResponseParsing
Tests for response parsing.
TestLLMJudgeEvaluation
Tests for LLM judge evaluation.
TestLLMJudgePanel
Tests for LLMJudgePanel.
TestLLMJudgePanelEvaluation
Tests for panel evaluation.
TestCreateLLMJudgePanel
Tests for create_llm_judge_panel convenience function.
Functions
create_mock_persona(persona_id, model)
Create a mock persona for testing.
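A fixture like this is commonly built with `unittest.mock.MagicMock`. The sketch below is hypothetical: the attribute names (`persona_id`, `name`, `routing.model`) are assumptions about the persona object's shape, not confirmed by the source; it only illustrates how a mock can stand in for a persona whose model is read from its routing configuration.

```python
from unittest.mock import MagicMock

def create_mock_persona(persona_id="judge-1", model="model-a"):
    # Hypothetical sketch: the real fixture's attribute names may differ.
    persona = MagicMock()
    persona.persona_id = persona_id
    persona.name = f"Persona {persona_id}"
    # Assumed: the judge resolves its model via the persona's routing config,
    # matching test_model_from_routing above.
    persona.routing.model = model
    return persona
```

Because `MagicMock` auto-creates nested attributes, `persona.routing.model = model` works without defining a routing class.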
test_init_with_persona()
Test initialization with persona.
test_init_with_custom_client()
Test initialization with custom client.
test_model_from_routing()
Test model comes from persona routing.
setUp()
Initialize shared fixtures before each test in this test case runs.
test_build_system_prompt()
Test system prompt contains persona info.
test_build_prompt_with_document()
Test prompt includes document info.
test_format_votes()
Test vote formatting.
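Vote formatting typically flattens structured votes into a prompt section. The sketch below is a hypothetical stand-in: the `(voter, verdict, reason)` tuple shape and the bullet layout are assumptions, since the source does not show the real vote structure.

```python
def format_votes(votes):
    # Hypothetical vote shape: (voter, verdict, reason) tuples.
    # Render one bullet line per vote for inclusion in the judge prompt.
    return "\n".join(
        f"- {voter}: {verdict} ({reason})" for voter, verdict, reason in votes
    )
```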
setUp()
Initialize shared fixtures before each test in this test case runs.
test_parse_json_response()
Test parsing valid JSON response.
test_parse_json_in_markdown()
Test parsing JSON wrapped in markdown code blocks.
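LLMs often wrap JSON replies in markdown code fences, so a parser has to unwrap them first. A minimal sketch of that technique (the function name and fence-handling details are assumptions, not the module's actual implementation):

```python
import json
import re

def parse_judge_response(text):
    """Extract a JSON object from a raw LLM reply, unwrapping a
    markdown code fence (``` or ```json) if one is present."""
    fence = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = fence.group(1) if fence else text
    return json.loads(payload)
```

Plain JSON passes through unchanged, while fenced replies are reduced to the object between the braces before parsing.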
test_parse_fallback_for_invalid_json()
Test fallback parsing for invalid JSON.
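A fallback path keeps the judge usable when the model returns prose instead of JSON. The sketch below is one plausible shape for such a fallback (keyword scanning and the returned keys are assumptions, not the module's real behavior):

```python
import json

def parse_with_fallback(text):
    # Try strict JSON first; on failure, fall back to a keyword scan
    # instead of raising, so a malformed reply still yields a verdict.
    try:
        return json.loads(text)
    except (json.JSONDecodeError, ValueError):
        verdict = "accept" if "accept" in text.lower() else "reject"
        return {"verdict": verdict, "raw": text}
```

Keeping the raw text in the fallback result preserves the original reply for provenance and debugging.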
test_successful_evaluation()
Test successful LLM evaluation with provenance.
test_failed_evaluation()
Test failed LLM evaluation creates rejection.
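Converting an evaluation failure into a rejection record, rather than letting the exception propagate, is a common pattern for judge panels. A minimal sketch under that assumption (the wrapper name and the record's keys are hypothetical):

```python
from datetime import datetime, timezone

def safe_evaluate(judge_fn, *args):
    # Hypothetical sketch: a failed LLM call produces a rejection
    # record with provenance instead of raising.
    try:
        return judge_fn(*args)
    except Exception as exc:
        return {
            "verdict": "reject",
            "reason": f"evaluation failed: {exc}",
            "provenance": {
                "timestamp": datetime.now(timezone.utc).isoformat(),
            },
        }
```

This keeps a multi-judge panel running even when one judge's API call fails mid-evaluation.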
test_get_judge_info()
Test getting panel judge info.
Usage
python test_llm_judge.py