Tests for LLM Judge (H.3.5.3).

Tests cover:

  • LLMJudge initialization and configuration
  • Prompt building
  • Response parsing
  • Provenance tracking
  • LLMJudgePanel multi-model coordination
  • Model diversity verification

File: test_llm_judge.py

Classes

TestLLMJudgeInit

Tests for LLMJudge initialization.

TestLLMJudgePromptBuilding

Tests for prompt building.

TestLLMJudgeResponseParsing

Tests for response parsing.

TestLLMJudgeEvaluation

Tests for LLM judge evaluation.

TestLLMJudgePanel

Tests for LLMJudgePanel.

TestLLMJudgePanelEvaluation

Tests for panel evaluation.

TestCreateLLMJudgePanel

Tests for create_llm_judge_panel convenience function.

Functions

create_mock_persona(persona_id, model)

Create a mock persona for testing.
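A factory like this can be sketched with `unittest.mock`; the attribute names (`persona_id`, `routing`) are assumptions inferred from the test names, not confirmed by the module:

```python
from unittest.mock import MagicMock

def create_mock_persona(persona_id, model):
    """Hypothetical sketch: build a mock persona whose routing names a model.

    The `routing` dict shape is an assumption; the real fixture may differ.
    """
    persona = MagicMock()
    persona.persona_id = persona_id
    # test_model_from_routing suggests the judge's model comes from routing.
    persona.routing = {"model": model}
    return persona
```

For example, `create_mock_persona("judge-1", "model-a")` yields an object whose `routing["model"]` is `"model-a"`, which a judge under test can read without any real LLM client.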

test_init_with_persona()

Test initialization with persona.

test_init_with_custom_client()

Test initialization with custom client.

test_model_from_routing()

Test model comes from persona routing.

setUp()

Set up test fixtures.

test_build_system_prompt()

Test system prompt contains persona info.

test_build_prompt_with_document()

Test prompt includes document info.

test_format_votes()

Test vote formatting.

setUp()

Set up test fixtures.

test_parse_json_response()

Test parsing valid JSON response.

test_parse_json_in_markdown()

Test parsing JSON wrapped in markdown code blocks.

test_parse_fallback_for_invalid_json()

Test fallback parsing for invalid JSON.
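The three parsing cases above (plain JSON, JSON inside markdown fences, and a non-JSON fallback) can be sketched as follows; the function name and the fallback dict shape are illustrative assumptions, not the module's actual API:

```python
import json
import re

def parse_judge_response(text):
    """Sketch: parse a judge reply as JSON, tolerating markdown code fences.

    Falls back to wrapping the raw text when the payload is not valid JSON.
    """
    # Strip a ```json ... ``` (or bare ```) fence if the model wrapped its answer.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fallback: no structured verdict, keep the raw text as the reason.
        return {"verdict": None, "reason": text.strip()}
```

Handling fenced output first matters because many models return well-formed JSON but wrap it in markdown, which would otherwise fail `json.loads` outright.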

test_successful_evaluation()

Test successful LLM evaluation with provenance.

test_failed_evaluation()

Test failed LLM evaluation creates rejection.

test_get_judge_info()

Test getting panel judge info.

Usage

python test_llm_judge.py