KIMI K2.5 research
Kimi K2.5 is Moonshot AI’s latest open‑source, native multimodal “agentic” model, designed around visual coding and large-scale agent swarms rather than just single‑agent scaling.
Core architecture and training
- Mixture‑of‑Experts (MoE) LLM with about 1T total parameters and roughly 32B active per token, using 61 layers and expert layers with hundreds of experts (8 selected per token).
- Trained via continual pretraining on ~15T mixed tokens (vision + text) on top of Kimi‑K2‑Base, making it a native multimodal model rather than a text‑only model with bolted‑on vision.
- Uses MoonViT (~400M parameters) as the vision encoder, tightly integrated with the language backbone for cross‑modal reasoning (images, video, UI mocks, scientific figures, etc.).
- Quantization‑aware training yields native INT4 inference for the MoE components, advertised as ~2× generation speed with minimal quality loss.
- Supports up to a 256K context window, with benchmarks reported at up to 128K for long‑context evaluation.
For your mental model: think “open SOTA‑ish MoE like Llama 4‑MoE / DeepSeek‑V3, but explicitly optimized for long‑horizon tool‑augmented and multimodal agent workflows.”
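The “8 of hundreds of experts” routing above can be sketched as toy top‑k gating. This is an illustrative sketch only; the shapes, expert count, and router form are placeholders, not K2.5’s actual configuration:

```python
import numpy as np

def moe_route(hidden, router_w, k=8):
    """Toy MoE router: select top-k experts per token from router logits.

    hidden: (tokens, d_model); router_w: (d_model, n_experts).
    Returns per-token expert indices and normalized gate weights.
    """
    logits = hidden @ router_w                               # (tokens, n_experts)
    topk = np.argpartition(logits, -k, axis=-1)[:, -k:]      # top-k expert ids
    gates = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)               # softmax over selected experts
    return topk, gates

# With hundreds of experts but only k=8 active, a small fraction of weights
# runs per token -- the source of the 1T-total vs ~32B-active gap.
rng = np.random.default_rng(0)
idx, w = moe_route(rng.normal(size=(4, 64)), rng.normal(size=(64, 256)), k=8)
assert idx.shape == (4, 8) and np.allclose(w.sum(axis=-1), 1.0)
```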
Agent swarm and PARL (research focus)
- Key research contribution: “K2.5 Agent Swarm,” where the model learns to self‑direct up to ~100 sub‑agents, coordinating as many as ~1,500 tool calls/steps within a single long‑horizon task.
- The swarm is not a static prompt graph; instead, an internal “orchestrator” persona emerges that dynamically instantiates role‑specialized sub‑agents (e.g., AI Researcher, Fact Checker, Physics Specialist) and decomposes tasks into parallelizable subtasks.
- This is trained via Parallel‑Agent Reinforcement Learning (PARL): during training, both the reward and the allowed degree of parallelism are increased over time, encouraging the model to discover useful patterns of parallel decomposition and coordination.
- Internal experiments claim up to ~80% end‑to‑end runtime reduction on complex workloads compared to a single‑agent tool‑use baseline, with up to ~4.5× wall‑clock speedup via parallel workflows.
This is directly aligned with agentic multi‑agent research: they’re essentially baking a scheduler + planner into the model’s learned policy, instead of relying on hand‑crafted external orchestration graphs.
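The PARL curriculum idea above can be made concrete with a toy schedule. This is my own illustrative sketch (a linear ramp), not Moonshot’s actual reward design: both the permitted degree of parallelism and the reward bonus for using it grow over training.

```python
def parl_schedule(step, total_steps, max_parallel=100, max_bonus=1.0):
    """Illustrative PARL-style curriculum: linearly raise the allowed
    degree of parallelism and the reward bonus for exploiting it."""
    frac = min(step / total_steps, 1.0)
    allowed_parallel = 1 + int(frac * (max_parallel - 1))
    parallel_bonus = frac * max_bonus
    return allowed_parallel, parallel_bonus

def reward(task_success, used_parallel, allowed_parallel, bonus):
    """Base task reward plus a bonus for parallelism used within the cap."""
    capped = min(used_parallel, allowed_parallel)
    return task_success + bonus * (capped / allowed_parallel)

# Early training: essentially sequential, no parallelism bonus.
# Late training: up to 100-way parallelism is both allowed and rewarded.
assert parl_schedule(0, 1000) == (1, 0.0)
assert parl_schedule(1000, 1000) == (100, 1.0)
```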
Modes and usage paradigms
- Exposed modes: K2.5 Instant (fast, shallow), K2.5 Thinking (slow, chain‑of‑thought + tools), K2.5 Agent, and K2.5 Agent Swarm (full multi‑agent orchestration).
- Supports both “conversational” and “agentic” paradigms, plus instant vs thinking modes; the latter interleaves CoT, tool calling, and visual grounding in a single policy.
- Production‑oriented design: stable tool usage over 200–300 sequential calls in single‑agent settings, and up to 1,500 steps in swarm mode, with explicit emphasis on the reliability of long tool chains.
From a systems point of view, you can treat it as a unification point: one model for chat, research agents, and visual coding agents.
Benchmarks and capabilities
- Reasoning and exams: ~50.2% on Humanity’s Last Exam (HLE) with tools; strong results on GPQA‑Diamond and other high‑difficulty reasoning benchmarks.
- Math: high performance on AIME 2025, HMMT 2025, and IMO‑style benchmarks (they position it as “PhD‑level” math reasoning).
- Coding: ~76.8% SWE‑Bench Verified, ~73% SWE‑Bench Multilingual, and strong LiveCodeBench v6 scores, with explicit support for “production‑level” coding workflows.
- Vision: strong MMMU‑Pro, MathVista, MathVision, and CharXiv chart‑reasoning performance, making it particularly suited to scientific figures and diagrams.
- Agentic web search: strong BrowseComp and Seal‑0 scores, especially in swarm mode, which involve long‑horizon multi‑page browsing and synthesis.
These benchmarks support the thesis that K2.5 is not just a “tool‑caller,” but optimized as a research / coding / visual agent co‑pilot.
Agentic multi‑agent implications and prompts
Architectural implications for your systems
- Internal orchestration: the model itself learns to schedule sub‑agents, which means you can experiment with thinner external orchestrators that expose a toolbox, memory, and safety rails, and let K2.5 design the workflow.
- Parallel tool infrastructure: to exploit PARL and swarm mode, your tool backends (search, code‑exec, retrievers) must support high concurrency with idempotent calls and robust dedup/aggregation.
- Observability: you’ll want trace‑level logging of sub‑agent instantiations and tool calls to expose the learned swarm dynamics and avoid opaque behavior in regulated domains.
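The concurrency and observability bullets above can be sketched together: a runner that executes tool calls in parallel, deduplicates identical calls by an idempotency key, and appends one trace record per executed call. All names here are illustrative, not a real Coditect API:

```python
import asyncio, hashlib, json, time

async def run_tool_calls(calls, execute, trace):
    """Run tool calls concurrently; identical calls share one execution."""
    inflight = {}  # idempotency key -> shared Future

    async def traced(call):
        t0 = time.monotonic()
        result = await execute(call)
        trace.append({"tool": call["name"],
                      "latency_s": round(time.monotonic() - t0, 4)})
        return result

    async def one(call):
        # Idempotency key: hash of tool name + canonicalized arguments.
        key = hashlib.sha256(
            json.dumps([call["name"], call["args"]], sort_keys=True).encode()
        ).hexdigest()
        if key not in inflight:                    # first occurrence: schedule it
            inflight[key] = asyncio.ensure_future(traced(call))
        return await inflight[key]                 # duplicates await the same future

    return await asyncio.gather(*(one(c) for c in calls))

async def fake_search(call):                       # stand-in for a real backend
    await asyncio.sleep(0.01)
    return f"results for {call['args']['q']}"

trace = []
calls = [{"name": "search", "args": {"q": "billing"}},
         {"name": "search", "args": {"q": "billing"}},  # duplicate call
         {"name": "search", "args": {"q": "auth"}}]
out = asyncio.run(run_tool_calls(calls, fake_search, trace))
assert out[0] == out[1] and len(trace) == 2        # deduped: two executions traced
```

The shared-future pattern matters: checking and inserting the key without an intervening `await` keeps two concurrent duplicates from both executing.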
Tight prompts you can reuse
You said you’ll often paste prompts back in, so here are focused starters:
- System‑level: turn K2.5 into a meta‑orchestrator
You are Kimi K2.5 operating in Agent Swarm mode. You have access to high‑latency but powerful tools (code execution, web research, vector search, internal APIs). Your job is to (1) decompose the user’s goal into parallelizable subtasks, (2) dynamically instantiate specialized sub‑agents with explicit roles and local scratchpads, (3) coordinate up to 100 sub‑agents and 1,500 tool calls, and (4) synthesize a single coherent, well‑cited final artifact. Prefer maximal safe parallelism and minimize wall‑clock latency while maintaining correctness and citation quality.
- Research workflow prompt
Act as a self‑directed research swarm. Instantiate sub‑agents for literature search, experiment design, evaluation, and synthesis. Use parallel tool calls whenever subtasks are independent. Maintain a shared global outline and have one orchestrator agent periodically reconcile conflicting findings before final synthesis.
- Multi‑agent coding prompt
You are a multi‑agent coding team using Kimi K2.5. Create sub‑agents for architecture, backend, frontend, tests, and documentation. Plan the implementation, split work into parallel tasks, keep interfaces in a shared spec, and converge to a consistent codebase design with tests and minimal cross‑module conflicts.
- Visual‑coding prompt (UI spec → code)
You are a visual coding swarm. From the provided UI screenshots and textual requirements, spawn sub‑agents for design analysis, component decomposition, CSS/layout, state management, and integration tests. Work in parallel to produce a coherent React implementation with clear component boundaries and comments.
Suggested next‑step questions for deeper research
To guide where you might be going:
- “Compare Kimi K2.5’s PARL and agent swarm approach to existing multi‑agent frameworks (LangGraph, AutoGen, Swarm, CrewAI) in terms of where orchestration lives and how parallelism is expressed.”
- “Design an evaluation harness to benchmark K2.5 Agent Swarm vs my current hand‑crafted multi‑agent graph on long‑horizon coding tasks, including metrics for latency, correctness, and tool‑call stability.”
- “Propose an architecture to integrate Kimi K2.5 as the core ‘brain’ of Coditect’s sandboxed execution platform, including safety, observability, and cost controls for 1,500‑step swarms.”
If you specify which dimension you care most about (training algorithm, tool API design, or integration into your existing LangGraph‑style orchestration), I can drill down into prompts and concrete design patterns around that.
You can access Kimi K2.5 today via several hosted APIs, and it’s positioned as an open, multimodal, swarm‑capable alternative to GPT‑4o / Claude 3.5 for research, coding, and visual agent workflows.
How to access and use Kimi K2.5
- Official Kimi web app for interactive use (Chinese/English UI): web client with K2.5 modes including Agent Swarm preview.
- Fireworks.ai: serverless and dedicated deployments, OpenAI‑style HTTPS API, up to ~200 tokens/s; supports fine‑tuning / RL on K2.5 with low‑level primitives (forward, forward_backward, optimizer_step, save_weight).
- Hugging Face: moonshotai/Kimi-K2.5 model card, with weights and configuration for self‑hosting on your own GPU stack.
- Replicate: hosted inference with HTTP API for text+image (and some video) multimodal workloads.
Minimal Fireworks‑style usage (Python):

```python
import requests

url = "https://api.fireworks.ai/inference/v1/chat/completions"
headers = {
    "Authorization": "Bearer $FIREWORKS_API_KEY",  # substitute your actual key
    "Content-Type": "application/json",
}
payload = {
    "model": "fireworks/kimi-k2.5",
    "messages": [
        {"role": "system", "content": "You are a Kimi K2.5 Agent Swarm orchestrator."},
        {"role": "user", "content": "Design an evaluation harness for long-horizon coding agents."}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
}
resp = requests.post(url, headers=headers, json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```
For “tools as swarms,” you’d expose your tools via OpenAI‑style function calling (Fireworks supports this for K2/K2 Thinking, and similar patterns are expected for K2.5).
Benchmarks vs GPT‑4o / Claude 3.5
- K2.5 is pitched as “SOTA open model,” with charts showing it matching or beating leading closed models on several agentic and visual benchmarks (HLE Full, BrowseComp, DeepSearch, OmniDocBench 1.5).
- It’s reported to outperform prior K2/K2‑Thinking on agent tasks and visual reasoning; public materials only give relative comparisons to GPT‑4/Claude rather than a full table against GPT‑4o / Claude 3.5, but the narrative is “closed‑source parity or better on selected tasks.”
To really answer this for Coditect, you’d want your own harness; see below.
Real‑world K2.5 Agent Swarm examples
- Official trajectories show: multi‑page web research (dozens of pages) with up to 100 sub‑agents and 1,500 tool calls, scientific literature review with chart/table interpretation, and large‑scale “office automation” like summarizing and cross‑linking document collections.
- Blog and secondary write‑ups describe use cases like: long‑horizon academic research (20–30 sources), multi‑file refactors in large repos, and visual coding from UI screenshots or Figma‑style mocks, all using swarms rather than single agents.
These are presented primarily as research/preview workflows, not yet as GA, fully supported enterprise features.
Moonshot AI and other Kimi models
- Moonshot AI is a China‑based AI company behind the Kimi assistant and Kimi K2/K2.5 model family.
- Kimi K2: 1T‑parameter MoE, 32B active params, 15.5T tokens, optimized for agentic capabilities; supports ultra‑long context and robust tool use.
- K2 Thinking: a “thinking” variant exposing reasoning traces, OpenAI‑style tool calling, and long‑context chain‑of‑thought for complex tasks.
- K2.5: continual pretraining on ~15T mixed vision+text tokens on top of K2; native multimodal, with MoonViT vision encoder, and Agent Swarm via PARL.
K2.5 limitations and known issues
From the tech report and model cards:
- Agent Swarm is explicitly marked as “research preview”; behavior can be less stable than single‑agent mode, and guarantees around determinism and safety are weaker.
- Tool orchestration is learned, not hard‑coded; this can mean redundant or suboptimal tool calls, and occasional “thrashing” when tasks are poorly specified.
- As a large MoE VLM, it requires substantial GPU memory and optimized serving to hit advertised performance; naive self‑hosting can be costly and slow.
- Standard frontier‑model caveats: hallucinations, potential bias, and incomplete transparency on training data; Moonshot withholds detailed dataset composition.
How this maps to your prompts and systems
Here are tightened, copy‑paste‑ready prompts aligned with your intent:
1) PARL / swarm vs LangGraph / AutoGen / CrewAI
Analyze Kimi K2.5’s Parallel-Agent Reinforcement Learning (PARL) and Agent Swarm paradigm. Compare it to external multi-agent frameworks such as LangGraph, AutoGen, Swarm, and CrewAI along these axes: (1) where orchestration logic resides (inside vs outside the model), (2) how task decomposition and parallelism are represented (prompt graphs, policies, or learned planners), (3) implications for observability, debuggability, and safety in regulated SaaS environments, and (4) how to hybridize learned swarm orchestration with explicit graph-based workflows.
2) Evaluation harness for K2.5 vs your current graph
Design an evaluation harness to benchmark Kimi K2.5 Agent Swarm against my existing hand-crafted multi-agent graph on long-horizon coding tasks. Assume tasks such as multi-file refactors, large feature additions, and cross-cutting changes. Specify: (1) task corpus and ground-truth definition, (2) metrics for functional correctness, latency, tool-call stability, and cost, (3) logging/telemetry needed to reconstruct swarm behavior, and (4) statistical methodology to compare systems across multiple runs with temperature and tool-set variations.
3) Coditect integration architecture
Propose an end-to-end architecture integrating Kimi K2.5 as the core “brain” of a sandboxed code-execution platform (Coditect-like). Include: (1) how the orchestrator agent sees tools (code sandboxes, search, VCS, CI) and constraints, (2) patterns for safely handling up to 1,500 swarm tool calls per task, including rate limiting and circuit breakers, (3) observability/trace design to attribute actions to sub-agents, and (4) cost-control mechanisms (budget-aware planning, dynamic parallelism limits, and early stopping on diminishing returns).
If you want, the next step can be: “generate a concrete OpenAI‑style tool schema + LangGraph‑like graph that exposes Coditect’s tools to K2.5 while still letting it exercise its learned swarm behavior.”
Kimi K2.5 is accessible via several APIs (Moonshot, Fireworks, OpenRouter, Replicate) and exposes standard OpenAI‑style tool calling, which you can wire into a LangGraph‑like graph while still letting its learned Agent Swarm planner run inside the model.
How to access and run Kimi K2.5
- Moonshot platform: Kimi web/app and Open Platform APIs; supports tool use/function calling with JSON Schema tools.
- Fireworks.ai: fireworks/kimi-k2.5 and “Reasoning” variants via /chat/completions, OpenAI‑compatible; supports function calling and high‑throughput inference.
- Hugging Face / self‑hosting: moonshotai/Kimi-K2.5 weights and config for your own GPU clusters.
- OpenRouter / Replicate: additional hosted endpoints with standard chat APIs.
Minimal tool‑calling request (Python):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="FWK_...YOUR_KEY..."
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_sandbox",
            "description": "Execute code in an isolated Coditect sandbox.",
            "parameters": {
                "type": "object",
                "properties": {
                    "language": {"type": "string"},
                    "code": {"type": "string"},
                    "timeout_ms": {"type": "integer"}
                },
                "required": ["language", "code"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_repo",
            "description": "Search the current repository by filename and content.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "max_results": {"type": "integer", "default": 20}
                },
                "required": ["query"]
            }
        }
    }
]

resp = client.chat.completions.create(
    model="fireworks/kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a Kimi K2.5 Agent Swarm orchestrator for Coditect."},
        {"role": "user", "content": "Refactor the payment module for multi-tenant billing."}
    ],
    tools=tools,
    temperature=0.7,
)
```
Benchmarks showing SOTA‑ish performance
From the official tech report/model card:
- Reasoning & knowledge: HLE‑Full with tools 50.2 (ahead of several frontier baselines), strong GPQA‑Diamond and Olympiad‑style math (AIME 2025 96.1, HMMT 95.4, IMO‑AnswerBench 81.8).
- Vision & documents: leading or near‑leading scores on OmniDocBench 1.5 (88.8), CharXiv (77.5), MathVision (84.2), MathVista‑mini (90.1), and long‑video benchmarks like VideoMMMU and LongVideoBench.
- Coding: SWE‑Bench Verified 76.8, SWE‑Bench Pro 50.7, LiveCodeBench v6 85.0, TerminalBench 2.0 50.8, placing it in the top tier of open models.
- Agentic search / swarm: BrowseComp (Agent Swarm) 78.4 and WideSearch (Agent Swarm) 79.0, materially above single‑agent runs and above several closed models on these tasks.
Moonshot’s internal “AI Office” and “General Agent” benchmarks show 59.3% and 24.3% improvements over K2 Thinking, respectively, for long, production‑like multi‑step tasks.
Agent Swarm vs single‑agent systems
- Single‑agent: tools orchestrated externally (LangGraph/AutoGen) with a fixed or programmatic graph; the model mostly fills in task planning “locally” in its prompt.
- K2.5 Agent Swarm: planner/scheduler is inside the model, trained via Parallel‑Agent RL; it can spawn many internal roles and issue many parallel tool calls, rather than following a static graph.
- Empirical gains: large speedups and higher success on long‑horizon browse/search tasks; e.g., BrowseComp and WideSearch scores jump when Agent Swarm is enabled compared with “Thinking + tools” single‑agent mode.
In practice, you keep your graph relatively thin (safety, memory, budgeting) and let K2.5 discover decomposition/parallelism, instead of encoding the entire workflow as nodes/edges.
Examples of visual coding with Kimi K2.5
- From Kimi’s visual‑agent report: K2.5 can turn UI screenshots and short specs into multi‑page React/Tailwind sites with interactive layouts and scroll‑triggered animations, iterating by visually inspecting its own output.
- “Kimi Code” demo: translates the visual style of Matisse’s “La Danse” into a working Kimi app/webpage, then uses image+DOM screenshots to auto‑debug visual glitches.
- Media coverage describes workflows like: screen‑recording an app, letting K2.5 infer structure and state, and then generating or editing code to replicate/modify the behavior.
Your Coditect‑style variant would be: provide repository snapshot + UI screenshots + a “visual coding swarm” system prompt and expose tools for search_repo, run_sandbox, and capture_screenshot.
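A capture_screenshot entry in the same JSON‑Schema style as the other Coditect tools. This is my own sketch (the tool and its parameters are hypothetical, not from Moonshot’s docs), written as a Python dict so it can be appended to the tools array:

```python
# Hypothetical capture_screenshot tool for a visual-coding swarm loop.
CAPTURE_SCREENSHOT_TOOL = {
    "type": "function",
    "function": {
        "name": "capture_screenshot",
        "description": "Render the current app state in the sandbox and return a screenshot for visual inspection.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Dev-server URL to render."},
                "viewport": {
                    "type": "string",
                    "enum": ["mobile", "tablet", "desktop"],
                    "default": "desktop"
                }
            },
            "required": ["url"]
        }
    }
}

assert CAPTURE_SCREENSHOT_TOOL["function"]["name"] == "capture_screenshot"
```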
What tools K2.5 supports in agent mode
- Tool schema: OpenAI‑style “Tool Use / Function Calling” with functions defined by JSON Schema; Moonshot documents this for Kimi models in their Open Platform.
- Capabilities: text, image, video, PDFs as inputs; tool calls can include arbitrary structured JSON (e.g., search, HTTP, code‑exec, DB queries), and some deployments support parallel tool calling in a single response.
- Kimi‑specific: K2/K2.5 expect tool call IDs in a functions.func_name:idx style in the message history; Moonshot’s Vendor Verifier repo captures these conventions for stable tool use.
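A minimal helper for that ID convention. This is a sketch based on the functions.func_name:idx pattern described above; verify the exact format against Moonshot’s Vendor Verifier repo before relying on it:

```python
def kimi_tool_call_id(func_name: str, idx: int) -> str:
    """Build a tool_call_id in the 'functions.func_name:idx' style."""
    return f"functions.{func_name}:{idx}"

def tool_result_message(func_name: str, idx: int, content: str) -> dict:
    """Tool-result message shaped for OpenAI-style history with Kimi-style IDs."""
    return {
        "role": "tool",
        "tool_call_id": kimi_tool_call_id(func_name, idx),
        "name": func_name,
        "content": content,
    }

msg = tool_result_message("search_repo", 0, '{"hits": 3}')
assert msg["tool_call_id"] == "functions.search_repo:0"
```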
Example OpenAI‑style tool schema for Coditect
```json
[
  {
    "type": "function",
    "function": {
      "name": "run_sandbox",
      "description": "Execute code in an isolated Coditect sandbox and return stdout/stderr, exit code, and artifacts.",
      "parameters": {
        "type": "object",
        "properties": {
          "language": {"type": "string", "enum": ["python", "typescript", "go", "rust"]},
          "code": {"type": "string"},
          "timeout_ms": {"type": "integer", "minimum": 100, "maximum": 600000}
        },
        "required": ["language", "code"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "search_repo",
      "description": "Semantic and text search over the current workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {"type": "string"},
          "top_k": {"type": "integer", "default": 20}
        },
        "required": ["query"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "get_file",
      "description": "Retrieve the content of a file from the workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string"}
        },
        "required": ["path"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "write_file",
      "description": "Create or update a file in the workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string"},
          "content": {"type": "string"}
        },
        "required": ["path", "content"]
      }
    }
  }
]
```
You’d plug this tools array directly into Fireworks/Moonshot API calls.
LangGraph‑like graph that exposes Coditect tools but keeps swarm “inside”
Conceptually, define a thin graph:
- Node: SwarmOrchestrator — calls kimi-k2.5 with your system prompt and all tools above; lets K2.5 decide decomposition and tool use.
- Node: ToolExecutor — your runtime that executes any tool calls (sandbox, repo, etc.) and returns results into the conversation history.
- Node: SafetyGuard — post‑hoc filters/validators on code, commands, and outputs.
- Edges: user_input → SwarmOrchestrator; SwarmOrchestrator → ToolExecutor when tool calls are emitted; ToolExecutor → SwarmOrchestrator with results appended as tool messages; SwarmOrchestrator → SafetyGuard → user_output for the final answer.
In LangGraph terms, this is a loop node where K2.5 remains the planner and the graph just handles I/O, limits (e.g., 1,500 tool calls, per‑task budgets), and safety.
Kimi K2.5 can call OpenAI‑style tools, so you can expose Coditect’s sandbox/search/VCS capabilities directly, then wrap that in a thin LangGraph loop while K2.5 runs its own swarm internally.
OpenAI‑style tool schema for Coditect + K2.5
A concrete tools array you can drop into Fireworks/Moonshot/OpenRouter:
```json
[
  {
    "type": "function",
    "function": {
      "name": "run_sandbox",
      "description": "Execute code in an isolated Coditect sandbox and return stdout, stderr, exit code, and artifacts.",
      "parameters": {
        "type": "object",
        "properties": {
          "language": {
            "type": "string",
            "enum": ["python", "typescript", "go", "rust"]
          },
          "code": {
            "type": "string",
            "description": "Complete, executable code snippet."
          },
          "timeout_ms": {
            "type": "integer",
            "minimum": 100,
            "maximum": 600000,
            "default": 60000
          }
        },
        "required": ["language", "code"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "search_repo",
      "description": "Search the current workspace by filename and content (semantic + keyword).",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "Natural language or keyword query."
          },
          "top_k": {
            "type": "integer",
            "default": 20,
            "minimum": 1,
            "maximum": 200
          }
        },
        "required": ["query"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "get_file",
      "description": "Retrieve file content from the workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Path relative to repo root."
          }
        },
        "required": ["path"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "write_file",
      "description": "Create or overwrite a file in the workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Path relative to repo root."
          },
          "content": {
            "type": "string",
            "description": "Full file contents."
          }
        },
        "required": ["path", "content"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "run_tests",
      "description": "Run the project test suite or a subset of tests.",
      "parameters": {
        "type": "object",
        "properties": {
          "pattern": {
            "type": "string",
            "description": "Optional test filter (file, class, or test name)."
          }
        }
      }
    }
  }
]
```
This matches Moonshot’s JSON‑Schema‑based tool format and Fireworks’ function‑calling interface.
LangGraph‑style workflow exposing tools to K2.5 swarm
Conceptual state:
```python
from typing import TypedDict, List, Any
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: List[Any]   # chat messages, including tool calls/results
    tool_budget: int      # remaining allowed tool calls
```
Nodes:
```python
def kimi_node(state: AgentState) -> AgentState:
    """
    Call fireworks/kimi-k2.5 with tools.
    K2.5 may internally 'swarm'; externally we just see tool calls.
    """
    # pseudo-code: use an OpenAI-compatible client
    resp = client.chat.completions.create(
        model="fireworks/kimi-k2.5",
        messages=state["messages"],
        tools=CODITECT_TOOLS,
        tool_choice="auto",
        temperature=0.6,
    )
    msg = resp.choices[0].message
    state["messages"].append(msg)
    return state

def tool_exec_node(state: AgentState) -> AgentState:
    """
    Execute any pending tool calls from the last assistant message.
    """
    last = state["messages"][-1]
    if not getattr(last, "tool_calls", None):
        return state
    for call in last.tool_calls:
        if state["tool_budget"] <= 0:
            break
        result = execute_tool_locally(call)  # your Coditect runtime
        state["tool_budget"] -= 1
        state["messages"].append({
            "role": "tool",
            "name": call.function.name,
            "tool_call_id": call.id,
            "content": result,
        })
    return state
```
```python
def router(state: AgentState):
    """
    Keep looping while tool results are pending; otherwise end.
    """
    last = state["messages"][-1]
    if isinstance(last, dict) and last.get("role") == "tool":
        return "kimi"  # feed tool results back to the model
    # last assistant message emitted no tool calls (or budget ran out): stop
    return END
```
Graph construction:
```python
builder = StateGraph(AgentState)
builder.add_node("kimi", kimi_node)
builder.add_node("tool_exec", tool_exec_node)
builder.set_entry_point("kimi")
builder.add_edge("kimi", "tool_exec")
builder.add_conditional_edges("tool_exec", router)
graph = builder.compile()
```
You then run:
```python
initial_state = {
    "messages": [
        {"role": "system", "content": "You are a Kimi K2.5 Agent Swarm orchestrator for Coditect."},
        {"role": "user", "content": "Implement a multi-tenant billing module and tests."}
    ],
    "tool_budget": 1500
}

for event in graph.stream(initial_state):
    ...
```
Here the graph is thin (I/O, limits); swarm orchestration lives inside K2.5’s policy.
Kimi K2.5 tool calling on Fireworks AI
Fireworks uses OpenAI‑compatible chat.completions plus tools.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="FWK_...YOUR_KEY..."
)

response = client.chat.completions.create(
    model="fireworks/kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a Kimi K2.5 Agent Swarm for Coditect."},
        {"role": "user", "content": "Refactor the auth module and add integration tests."}
    ],
    tools=CODITECT_TOOLS,
    tool_choice="auto",
    temperature=0.6,
    max_tokens=2048,
)
print(response.choices[0].message)
```
If message.tool_calls is non‑empty, route each call to your backend, then append tool messages and send back through the same endpoint in a loop, as in the LangGraph node above.
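Spelled out without a graph framework, that loop fits in a few lines. This is a sketch: the model id, OpenAI-compatible client, and execute_tool_locally backend are assumptions consistent with the examples above.

```python
import json

def agent_loop(client, messages, tools, execute_tool_locally, max_steps=50):
    """Minimal tool-calling loop: call the model, execute any tool calls,
    append results as tool messages, repeat until the model answers without tools."""
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="fireworks/kimi-k2.5",   # assumed model id
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:             # final answer reached
            return msg.content
        for call in msg.tool_calls:
            result = execute_tool_locally(call)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "name": call.function.name,
                "content": json.dumps(result),
            })
    raise RuntimeError("tool budget exhausted")
```

The max_steps cap plays the same role as the graph’s tool_budget: a hard stop against runaway tool chains.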
Moonshot API key setup for K2.5
The flow for K2/K2.5 on Moonshot’s platform mirrors their documented K2 setup:
- Create account: go to https://platform.moonshot.ai/console and sign up.
- Add balance: in console/pay, top up a small amount (e.g., 1 USD equivalent) to enable API calls.
- Create API key: in console/api-keys, generate a key and store it securely.
- Call chat API: use https://api.moonshot.ai/v1/chat/completions with header Authorization: Bearer <YOUR_KEY> and model set to the Kimi K2.5 identifier once exposed there.
Minimal raw call:
```shell
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {"role": "system", "content": "You are a Kimi K2.5 Agent Swarm."},
      {"role": "user", "content": "Summarize the repository structure."}
    ],
    "tools": [...]
  }'
```
Moonshot’s tool‑use docs describe the same JSON‑Schema tool format as above.
Vision coding agent swarm demo (what it looks like)
The official K2/K2.5 materials describe Kimi Code / visual coding flows:
- Input: Figma/screenshot or recorded UI, plus a spec (“responsive dashboard in React with Tailwind, left nav, metrics cards, dark mode toggle”).
- K2.5 uses its MoE + MoonViT stack and Agent Swarm to decompose the job into sub‑agents (layout analysis, component decomposition, styling, debugging), implement the UI, and iterate by visually inspecting its own output.
A system prompt you can use with the same Coditect tools:
You are a Kimi K2.5 visual coding swarm. You receive UI screenshots and a short spec. Use tools search_repo, get_file, write_file, and run_sandbox to: (1) infer components and layout, (2) implement React + Tailwind components, (3) iteratively refine styling and responsiveness. Prefer parallel work across components but keep props/contracts stable.
That gives you a concrete starting point to exercise K2.5’s vision+agentic behavior on your infrastructure.
You can run that visual‑coding swarm prompt against Kimi K2.5 today by passing screenshots + spec as multimodal input and wiring in your Coditect tools; K2.5’s native vision + swarm training is explicitly designed for this pattern.
1. Concrete system prompt (copy‑paste)
Use this as your system message:
You are a Kimi K2.5 visual coding swarm. You receive UI screenshots and a short spec. Use tools search_repo, get_file, write_file, and run_sandbox to: (1) infer components and layout, (2) implement React + Tailwind components, and (3) iteratively refine styling and responsiveness. Prefer parallel work across components but keep props and contracts stable across the codebase. When unsure, inspect existing files before writing new ones.
This matches the “coding with vision” + agent swarm behavior described for K2.5 and Kimi Code.646566686963
2. Minimal multimodal + tools call example (Fireworks)
Assume you have a PNG wireframe and Coditect tools exposed:
```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="FWK_...YOUR_KEY...",
)

def encode_image(path: str) -> str:
    return base64.b64encode(open(path, "rb").read()).decode("utf-8")

CODITECT_TOOLS = [...]  # the JSON tool schema we defined earlier

messages = [
    {
        "role": "system",
        "content": "You are a Kimi K2.5 visual coding swarm. You receive UI screenshots and a short spec. Use tools search_repo, get_file, write_file, and run_sandbox...",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Convert this dashboard design into a responsive React + Tailwind layout with a sidebar, metric cards, and a chart area.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{encode_image('dashboard.png')}"
                },
            },
        ],
    },
]

resp = client.chat.completions.create(
    model="fireworks/kimi-k2.5",
    messages=messages,
    tools=CODITECT_TOOLS,
    tool_choice="auto",
    temperature=0.4,
    max_tokens=2048,
)
```
This mirrors the multimodal wireframe‑to‑React examples in public K2.5 API docs.656870
3. How K2.5 actually uses vision + swarm here
From the tech report and model card:
- It is natively multimodal, pre‑trained on vision–language tokens with a built‑in vision encoder, so it directly reasons over UI screenshots and video.666964
- “Coding with vision” is a first‑class use case: converting mockups and wireframes into React/Tailwind, HTML/CSS, and full responsive layouts.68697063646566
- Agent Swarm lets it decompose into internal sub‑agents (layout analysis, component decomposition, styling, debugging) and issue many parallel tool calls (up to ~1,500) on complex tasks.676971636466
So your prompt is aligned with both the documented capabilities and how they position Kimi Code in demos.696366
4. Suggested follow‑up to iterate toward Coditect integration
If you want to push this further into implementation territory, you can ask:
“Design a concrete end‑to‑end workflow where Kimi K2.5 visual coding swarm takes 3–5 UI screenshots, uses my Coditect tools (search_repo, get_file, write_file, run_sandbox) via OpenAI‑style function calling, and produces a fully working React + Tailwind dashboard. Include: (1) tool schemas, (2) example tool traces, and (3) guardrails for file overwrites and sandbox runtime.”
That will give you a more complete, testable spec around this prompt.
- Provide an example UI screenshot and spec for a login page to implement.
- Show a sample workflow using the `search_repo` and `write_file` tools.
- How does Agent Swarm coordinate up to 100 sub-agents?
- Demonstrate visual debugging on a React + Tailwind app.
- Generate an OpenAI tool schema for these Kimi K2.5 tools.
Example UI and tools, plus how swarms and visual debugging work, below.
Example login UI screenshot + spec
You can’t upload an actual image here, but this is a concrete spec you can pair with a Figma/wireframe screenshot:
- Layout: Centered card on a light gray background, full‑height viewport.
- Card: White, rounded corners, drop shadow, max‑width 420px, full width on mobile.
- Header: App logo on top, then “Sign in to Coditect” title, and a short subtitle for SSO hint.
- Fields:
- Email input with label and placeholder.
- Password input with label, placeholder, and “Show” toggle.
- Actions:
- Primary “Sign in” button (full width).
- Secondary “Continue with GitHub/Google” buttons with icons.
- “Forgot your password?” text link.
- Footer: “Don’t have an account? Sign up” inline link; small “By continuing you agree to Terms / Privacy” copy.
Short textual spec you’d send with the screenshot:
Build a responsive React + Tailwind login page: centered card on desktop, full‑width on mobile. Include email/password fields, a primary “Sign in” button, social login buttons, and basic footer links. Match colors and spacing from the screenshot as closely as possible.
You then attach the login screenshot as an image_url alongside that text in the user message.787980
Sample workflow using search_repo and write_file
Pseudocode for one loop iteration around K2.5’s tool calls:
- User sends: screenshot + spec.
- K2.5 (assistant) emits tool calls like:
```json
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_1",
      "type": "function",
      "function": {
        "name": "search_repo",
        "arguments": "{\"query\": \"login\", \"top_k\": 10}"
      }
    }
  ]
}
```
- Your backend executes `search_repo` and returns results:
```json
{
  "role": "tool",
  "tool_call_id": "call_1",
  "name": "search_repo",
  "content": [
    {
      "path": "src/pages/login.tsx",
      "snippet": "export default function LoginPage() { ... }"
    },
    {
      "path": "src/components/forms/TextInput.tsx",
      "snippet": "export function TextInput(...) { ... }"
    }
  ]
}
```
- K2.5 reads that, then calls `write_file`:
```json
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_2",
      "type": "function",
      "function": {
        "name": "write_file",
        "arguments": "{\"path\": \"src/pages/login.tsx\", \"content\": \"// new React + Tailwind implementation here...\"}"
      }
    }
  ]
}
```
- You apply the change, run tests or a dev build via `run_sandbox`, and feed outputs back as more `tool` messages for further refinement.
This matches the coding‑with‑vision + tool‑chaining pattern described for K2.5.81827980
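The loop described above can be sketched in Python. `TOOL_HANDLERS` and its stub implementations are illustrative placeholders for your Coditect backends, not a documented API; a minimal driver under those assumptions:

```python
import json

# Hypothetical registry mapping tool names to local handler functions.
TOOL_HANDLERS = {
    "search_repo": lambda query, top_k=10: [{"path": "src/pages/login.tsx", "snippet": "..."}],
    "get_file": lambda path: {"path": path, "content": "..."},
    "write_file": lambda path, content: {"ok": True, "path": path},
}

def run_tool_loop(client, messages, tools, model="fireworks/kimi-k2.5", max_steps=8):
    """Drive one chat/tool loop: call the model, execute any tool calls,
    append the results as `tool` messages, and repeat until the model
    answers in plain text or the step budget runs out."""
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto"
        )
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:          # plain answer: we're done
            return msg.content
        for call in msg.tool_calls:     # execute each emitted tool call
            args = json.loads(call.function.arguments)
            result = TOOL_HANDLERS[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "name": call.function.name,
                "content": json.dumps(result),
            })
    return None  # budget exhausted without a final answer
```

The step cap doubles as a crude budget, mirroring the tool-budget guardrails discussed later in this document.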
How Agent Swarm coordinates up to ~100 sub‑agents
From Moonshot’s descriptions:
- K2.5 acts as an orchestrator, not just a single agent; via Parallel‑Agent Reinforcement Learning (PARL), it learns a policy that decides how many internal “sub‑agents” to spin up, what roles they play, and which tools they call.83808481
- For hard tasks, it can create up to about 100 domain‑specific sub‑agents and coordinate up to ~1,500 tool calls in a single task, achieving up to ~4.5× wall‑clock speedups vs a single‑agent baseline.80848183
- PARL trains it to:
- Decompose tasks into parallel subtasks.
- Execute them concurrently (different “sub‑agents” issuing tool calls).
- Aggregate, deduplicate, and reconcile results into a final answer.
- From the outside, this shows up as multiple related tool calls and follow‑up reasoning steps; the sub‑agent structure is implicit in the model's text, not exposed as separate API entities.84818380
So your external LangGraph stays thin; you just enforce budgets and safety while K2.5’s learned policy does the swarm coordination.
Visual debugging on a React + Tailwind app
K2.5’s visual debugging pattern (as shown in Kimi Code demos):85868281
- Generate or modify React + Tailwind code for your login page.
- Run the app in a sandbox and capture a screenshot or short screen recording of the rendered UI.
- Send the new screenshot/video back to K2.5 with a prompt like:
Here is the current rendered login page. Compare it to the original design. Identify visual issues (spacing, alignment, responsiveness, colors) and propose code changes to fix them.
- K2.5 reasons over the visual differences and emits `write_file` tool calls with targeted Tailwind/JSX changes.
- Repeat until the visual diff is acceptable (you can even ask it to stop when differences are “minor”).
The tech report emphasizes that K2.5 “looks” at its own output and iterates autonomously to fix visual bugs—especially layout and animation issues that are hard to specify in text alone.86828581
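A minimal sketch of mechanizing this feedback loop, assuming you already render the app and capture a PNG yourself; `client` is the OpenAI‑compatible client from earlier and the prompt wording is illustrative:

```python
import base64

def screenshot_message(png_path: str, note: str) -> dict:
    """Package a rendered-UI screenshot plus a critique request as one
    multimodal user message (OpenAI-style content parts)."""
    b64 = base64.b64encode(open(png_path, "rb").read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": note},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

def visual_debug_round(client, messages, png_path, model="fireworks/kimi-k2.5"):
    """One round: send the current render back and ask for a visual diff.
    `client` is assumed to be an OpenAI-compatible chat client."""
    messages.append(screenshot_message(
        png_path,
        "Here is the current rendered login page. Compare it to the original "
        "design and propose write_file changes for spacing, alignment, "
        "responsiveness, and color issues."))
    return client.chat.completions.create(model=model, messages=messages)
```

Each round appends one screenshot message, so context growth is linear in the number of debug iterations; truncate old renders if you run many rounds.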
OpenAI tool schema for these K2.5 tools
Here is an updated JSON Schema set including the tools you referenced:
```json
[
  {
    "type": "function",
    "function": {
      "name": "search_repo",
      "description": "Search the current workspace by filename and content (semantic + keyword).",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "Natural language or keyword query describing files or concepts to find."
          },
          "top_k": {
            "type": "integer",
            "description": "Maximum number of results to return.",
            "default": 20,
            "minimum": 1,
            "maximum": 200
          }
        },
        "required": ["query"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "get_file",
      "description": "Retrieve file content from the workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Path relative to the repository root."
          }
        },
        "required": ["path"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "write_file",
      "description": "Create or overwrite a file in the workspace.",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Path relative to the repository root."
          },
          "content": {
            "type": "string",
            "description": "Full file contents to write."
          },
          "overwrite": {
            "type": "boolean",
            "description": "If false, fail when the file already exists.",
            "default": true
          }
        },
        "required": ["path", "content"]
      }
    }
  },
  {
    "type": "function",
    "function": {
      "name": "run_sandbox",
      "description": "Run the current project or a specific script in an isolated Coditect sandbox and return stdout, stderr, exit code, and relevant artifacts.",
      "parameters": {
        "type": "object",
        "properties": {
          "command": {
            "type": "string",
            "description": "Shell command to execute (e.g., `npm test`, `npm run dev -- --port 4173`)."
          },
          "timeout_ms": {
            "type": "integer",
            "description": "Maximum runtime for the command in milliseconds.",
            "minimum": 1000,
            "maximum": 600000,
            "default": 120000
          }
        },
        "required": ["command"]
      }
    }
  }
]
```
This aligns with Moonshot’s and Fireworks’ JSON‑Schema tool formats for Kimi models while giving K2.5 enough surface area to do visual coding + debugging in a Coditect‑style environment.87888978
Research the academic background of Parallel‑Agent Reinforcement Learning (PARL).
Parallel‑Agent Reinforcement Learning (PARL) in the Kimi K2.5 context is a custom multi‑agent RL training regime where a single “orchestrator” policy learns how many sub‑agents to spawn, how to parallelize them, and is directly rewarded on wall‑clock efficiency rather than just token‑level steps.96979899100101102
What PARL is in Kimi K2.5
- PARL is described as a framework where the model learns to decompose complex tasks into parallel subtasks, explicitly optimizing for reduced wall‑clock time, not just fewer reasoning steps.9798100102
- Instead of scripting a static multi‑agent graph, K2.5’s training objective encourages it to act as a trainable orchestrator that decides the degree of parallelism, sub‑agent roles, and tool‑call scheduling.991001019697
- In deployment, this yields the “agent swarm” behavior: up to ~100 concurrent sub‑agents and ≈1,500 tool calls per task, with reported up to ~4.5× latency reduction against single‑agent tool‑use baselines.98100101102
High‑level training loop (conceptual)
Moonshot hasn’t published a full algorithm pseudocode, but public descriptions let you infer the structure:100101102969799
- Environment: long‑horizon tasks (codebases, multi‑page web search, document corpora) with tools (browsers, code runners, retrievers) as actions.
- Policy: the LLM (K2.5) outputs both content and a “plan” indirectly via tool calls and internal role descriptions (sub‑agents are implicit in the token stream).
- Parallel action space: at each “macro step” the orchestrator can issue multiple tool calls corresponding to different subtasks (e.g., different files, URLs, or sections), which execute concurrently.
- Reward: combines task success (e.g., benchmark pass/fail, correctness) and a penalty tied to actual or proxy wall‑clock time; policies that achieve goals with more parallelization but similar or fewer total steps are favored.10297100
So PARL is essentially a multi‑agent RL curriculum where the degree of parallelism is part of the learned strategy rather than a fixed hyperparameter.
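As a toy illustration of the reward shaping described above; the coefficients and the log‑latency form are our assumptions, since Moonshot has not published the exact function:

```python
import math

def parl_reward(success: bool, task_score: float,
                wall_clock_ms: float, num_tool_calls: int,
                alpha: float = 0.1, beta: float = 0.01) -> float:
    """Toy episode-level reward in the spirit of PARL as described:
    task success dominates, wall-clock time and tool spend are penalized,
    so plans that parallelize (lower wall clock at a similar call count)
    score higher than sequential ones."""
    r = task_score + (1.0 if success else 0.0)
    r -= alpha * math.log1p(wall_clock_ms)   # latency penalty (assumed form)
    r -= beta * num_tool_calls               # tool-spend penalty
    return r
```

Under this shaping, a parallel plan that finishes in 20 s beats a sequential plan that finishes in 80 s with the same task score and tool count, which is exactly the incentive the curriculum is meant to create.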
Relation to academic PARL / MARL literature
- The name “Parallel‑Agent Reinforcement Learning (PARL)” is Moonshot’s own term for this training regime; it is conceptually related to multi‑agent RL and swarm RL, but not yet a standalone, detailed arXiv method as of the current technical report.1019699100
- There is a separate, unrelated academic work called “PARL: Prompt-based Agent for Reinforcement Learning,” which uses LLMs as RL agents via prompting in simple environments (states/actions/rewards encoded in the prompt, no weight updates).103104
- Classic swarm/MARL work (e.g., CTDE MARL, swarm robotics with GNNs, decentralized policies) gives useful conceptual background on learning coordinated behaviors across many agents, but Kimi’s PARL differs in that all “sub‑agents” are slices of a single large policy with shared weights.105106107108
For rigorous grounding, you would combine Moonshot’s K2.5 technical report with MARL swarm papers (CTDE, GNN‑based swarms, centralized‑training/decentralized‑execution) to reconstruct a more formal PARL framework.106107108105101
Key properties for your agentic systems work
- Centralized policy, emergent parallelism: One LLM policy learns to impersonate many cooperating agents, rather than training many separate policies.97100101
- Parallelism as a learned choice: How many tools to call in parallel and what each “sub‑agent” does are part of the action space, not fixed by an external DAG.10010297
- Rewarded on latency: This explicitly bakes infra‑like concerns (wall‑clock) into the learning problem, making it analytically closer to scheduling/throughput optimization than standard step‑count RL.10297100
If you want to go deeper, a focused next step would be: “Outline a formal MDP / multi‑agent MDP definition for K2.5’s PARL training, including state, action (parallel tool calls), and reward design, and map it to existing CTDE MARL formulations.”
- Outline a formal MDP / multi‑agent MDP definition for K2.5’s PARL training, including state, action (parallel tool calls), and reward design, and map it to existing CTDE MARL formulations.
- How does PARL enable Agent Swarm in Kimi K2.5?
- What benchmarks show Kimi K2.5 outperforming GPT‑5.2?
- Compare PARL prompt‑based RL to traditional MARL.
- Key challenges of PARL in complex RL environments.
- Academic papers evaluating PARL on grid world tasks.
Kimi K2.5’s PARL can be framed as a centralized‑policy multi‑agent MDP where the LLM decides how many parallel tool calls (sub‑agents) to issue per step, with reward shaped by both task success and wall‑clock efficiency.111112113114115116
1. Formal (single‑agent) MDP view with parallel actions
Let $E$ be an environment representing a long‑horizon task (e.g., web browsing, a codebase, a document corpus) with tools as actions.
- State $s_t$:
- Text history (conversation, tool results).
- External env state summary (e.g., current files touched, URLs visited, task progress).
- Latency/budget state (tokens spent, tool calls used).
- Action $a_t$:
  - A set of parallel tool calls $a_t = \{a_t^{(1)}, \dots, a_t^{(k_t)}\}$, where each $a_t^{(i)}$ encodes a tool name plus JSON args (e.g., `search`, `run_tests`, `get_url`).
- Transition $P(s_{t+1} \mid s_t, a_t)$: environment executes all tools in parallel, returns their outputs (possibly with stochastic latency/contents), which are appended to the history.
- Reward $r_t$: combines task success (correctness, tests passing) with a penalty tied to wall‑clock time and tool spend, mirroring the reward design in the training‑loop sketch above.
The policy $\pi_\theta(a_t \mid s_t)$ is the LLM; PARL augments its action space from “one tool call or text step” to “a bundle of parallel tool calls plus text”, and tunes it with RL.
2. Multi‑agent / CTDE mapping
We can reinterpret this as a centralized‑training, decentralized‑execution (CTDE) MARL setup:
- Centralized “orchestrator” agent $\pi_\theta^{\text{orch}}$:
- Observes $s_t$ (full history + budget).
- Decides the number of sub‑tasks $k_t$, their roles, and corresponding tool calls.
- Sub‑agents $\pi_\theta^{(i)}$: role‑prompted instances of the same shared weights $\theta$, each observing only its local subtask context and emitting that subtask’s tool calls.
CTDE mapping:
- Centralized training: the orchestrator sees global state and all tool results, gets global reward (task success + latency), and gradient updates are applied to shared parameters $\theta$.114115116111112
- Decentralized execution: at inference, each sub‑agent (a role‑prompted instance) operates on its local subtask context; the orchestrator aggregates their outputs into the next global state.113111112117
So PARL ≈ CTDE MARL where:
- All policies share weights (one LLM),
- “Decentralized” is implemented via role prompts and separate tool calls,
- Parallelism (how many sub‑agents) is part of the orchestrator’s action.
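Under the notation above, the implied PARL objective can be written compactly; this is our reconstruction from the public descriptions, not Moonshot’s published formulation:

```latex
% Orchestrator policy \pi_\theta emits a bundle of k_t parallel tool calls
% per macro-step; reward couples task success with wall-clock and tool cost.
a_t = \{a_t^{(1)}, \dots, a_t^{(k_t)}\} \sim \pi_\theta(\cdot \mid s_t)

J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}
  \Big[ R_{\text{task}}(\tau)
  - \alpha \, T_{\text{wall}}(\tau)
  - \beta \textstyle\sum_{t} k_t \Big]
```

Here $R_{\text{task}}$ is the terminal success/score signal, $T_{\text{wall}}$ the episode wall‑clock time, and the $\beta$ term a tool‑spend regularizer; parallelism helps exactly when it cuts $T_{\text{wall}}$ without inflating $\sum_t k_t$ too much.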
3. How PARL enables Agent Swarm in Kimi K2.5
From public analyses and Moonshot’s materials:
- Trainable orchestrator: K2.5 learns when to decompose tasks and how many parallel subtasks to spawn, instead of following a fixed workflow graph.115116111112113114117
- Dynamic sub‑agents: roles like “AI Researcher”, “Fact Checker”, etc., are generated on the fly in text, and each corresponds to one or more tool calls; the orchestrator coordinates up to ~100 such sub‑agents and ~1,500 tool calls per task.111112113117
- Latency‑aware reward: benchmarks such as BrowseComp and WideSearch show large gains when swarm is enabled; reported speedups are roughly 3–4.5× in wide‑search workloads compared with single‑agent execution.112113117111
Mechanically, Agent Swarm is PARL’s learned policy in action: at each step, instead of one call like “search once”, K2.5 emits many calls (to different URLs, files, or tools) and later merges the results.
4. Benchmarks vs GPT‑5.2
There is no evidence that Kimi K2.5 overall outperforms GPT‑5.2 on general reasoning; third‑party comparisons emphasize that GPT‑5.x still leads on aggregate “intelligence index” metrics.118119120
- For earlier K2 (not 2.5), some head‑to‑heads indicate K2 can beat GPT‑5 on specific agentic benchmarks like Humanity’s Last Exam and BrowseComp when using tools, especially in cost‑normalized settings.119
- For K2.5, early commentary highlights strong performance on SWE‑Bench, HLE, BrowseComp, WideSearch, MMMU‑Pro, etc., but does not claim consistent dominance over GPT‑5.2 on broad benchmarks.113117111112
So the right reading is: K2.5 is competitive or better on some agent‑centric tasks and cost/performance tradeoffs, but GPT‑5.2 still has higher overall “intelligence index” in many evaluations.120118119
5. PARL (prompt‑based RL) vs traditional MARL
The academic PARL paper (“Prompt‑based Agents for Reinforcement Learning”) is a different method: it uses frozen LLMs as agents via prompting in small RL environments like grid worlds.121122123124
Differences vs traditional MARL:
- Model updates: academic PARL keeps the LLM frozen and “learns” only through the prompt’s growing history, whereas traditional MARL (and Moonshot’s PARL) updates policy parameters via gradients.
- Representation: states, actions, and rewards are serialized as text in the prompt, versus the numeric observation/action vectors of classic MARL.
- Scalability: prompt‑based PARL is bounded by context length and demonstrated only in small discrete environments; parameter‑updating MARL scales to richer, tool‑heavy environments.
Moonshot’s PARL for K2.5 is closer to classic RL (with parameter updates) but applied to an LLM and extended to parallel tool‑calling; the naming collision with the academic PARL paper is incidental.
6. Key challenges of PARL in complex environments
From MARL literature plus K2.5 commentary:125126127128121124117111112113
- Credit assignment under parallelism: attributing reward to specific sub‑agents or tool bundles is hard when many act concurrently.
- Non‑stationary environment: tools and web change over time; long‑horizon tasks make rewards sparse and delayed.
- Stability vs exploration: large action space (many tools, arguments, and possible sub‑agent roles) increases variance and can destabilize RL training.
- Cost and wall‑clock: running thousands of tool calls during training is expensive; yet latency is part of the reward, so training itself must approximate true performance.
- Safety and degeneracy: a learned orchestrator might over‑parallelize, spam tools, or exploit quirks in reward proxies (e.g., “do minimal work and guess answers”), requiring carefully designed constraints and safety filters.
These echo classic MARL issues (coordination, scalability, credit assignment) but with the added complexity of tool‑rich, partially observable, open‑world environments.
7. Academic PARL evaluations on grid worlds
The PARL (Prompt‑based Agents for RL) paper evaluates LLM‑based agents on structured environments including grid worlds:121122123124
- Setup:
- States (grid positions, passenger locations) are encoded as textual descriptions.
- Actions (move, pick up, drop) and rewards are written into the prompt at each timestep, forming a cumulative history.
- Tasks:
- Taxi‑like grid navigation, simple card games, and other discrete RL benchmarks.
- Results:
- PARL matches or outperforms classic RL agents in simple environments by exploiting pretrained knowledge.
- It struggles in tasks needing fine‑grained numeric reasoning or where state/action encodings become too verbose.
This gives you a clean, reproducible example of prompt‑only RL to contrast with Moonshot’s weight‑updating PARL for K2.5.
If you want, next step could be: “derive a CTDE MARL objective for K2.5‑style PARL with explicit notation for parallel tool bundles, and sketch how you’d simulate a simplified version in a toy web‑search environment.”
can this methodology be abstracted away from the trained model and developed into a methodology and framework that could work on any llm?
Yes, you can absolutely abstract the PARL / Agent Swarm methodology into a framework that works with many LLMs, but you won’t get K2.5‑level behavior without some form of RL/post‑training on each target model. You can, however, build a useful “PARL‑inspired” orchestration layer that is model‑agnostic.
What is model‑specific vs framework‑able?
From public descriptions of K2.5:134135136137138
- Model‑specific (hard to generalize without training):
- The policy that decides how many sub‑agents to spawn, what roles they take, and which tools to call in parallel.
- Latency‑aware behavior baked in via RL (explicit optimization for wall‑clock speed at similar quality).
- Robust long‑horizon stability with 100 sub‑agents / 1,500 tool calls.
- Framework‑able (you can generalize):
- The interface: state as conversation + tool results, actions as sets of tool calls, rewards combining success + latency.
- A CTDE‑style loop where a “manager” LLM plans parallel sub‑tasks and workers execute them.
- A runtime that supports parallel tool execution, aggregation, and budget/safety control.
So you can turn PARL’s structure into a reusable orchestration pattern, but you won’t magically get optimal parallelism from arbitrary LLMs without some training or careful prompt shaping.
What a model‑agnostic PARL‑style framework would look like
You can define a generic framework with these pieces:
- State abstraction (for any LLM):
- $s_t =$ {conversation history, tool results, task metadata, budget state}.
- Action abstraction:
- The manager LLM outputs a bundle of tool calls $a_t = \{a_t^{(1)}, \dots, a_t^{(k_t)}\}$ plus text; the framework runs each $a_t^{(i)}$ in parallel.
- Policy plugging:
- Any chat‑completion LLM (GPT‑x, Claude, Kimi, local) can be used as the “manager” and as “workers” via role prompts.
- Reward & logging layer:
- Track per‑task success (tests pass? benchmark solved?), latency, tool counts, and optional human ratings; store data for offline RL or bandit fine‑tuning later.
This is very close to existing research frameworks like M1‑Parallel (parallel LLM agents) and ARTIST (agentic RL + tools) which are designed to sit on top of arbitrary LLMs.139140141142
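The “runs each tool call in parallel” piece of this framework is ordinary concurrency; a thread‑pool sketch, with the tool registry and dict call format as assumptions rather than a fixed API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def execute_bundle(tool_calls, handlers, max_workers=8):
    """Run one macro-step's tool bundle concurrently and return
    ToolResult-shaped dicts (ok / result / error / latency_ms)."""
    def run_one(call):
        start = time.monotonic()
        try:
            result = handlers[call["name"]](**call["args"])
            ok, err = True, None
        except Exception as exc:   # tool failures become data, not crashes
            result, ok, err = None, False, str(exc)
        return {
            "tool_call_id": call["id"], "name": call["name"],
            "ok": ok, "result": result, "error": err,
            "latency_ms": int((time.monotonic() - start) * 1000),
        }
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, tool_calls))
```

Capturing failures and per‑call latency in the result objects is what later lets the reward/logging layer penalize wall‑clock time and diagnose degenerate over‑parallelization.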
Practical architecture sketch
- Thin orchestrator service:
- Exposes a “manager step” endpoint: given $s_t$, call any LLM and read back its proposed tool bundle.
- Executes tools concurrently, updates state, and repeats until stopping criteria.
- Policy‑agnostic:
- The orchestrator doesn’t care if the LLM is K2.5, GPT‑5.2, or a local model; it just needs a consistent tool‑call JSON schema.
- Optional training layer: feed the logged rewards into offline RL or bandit updates on open models, as sketched in the minimal offline‑RL loop later in this spec.
You’d get a “PARL‑like” framework that can orchestrate parallel multi‑agent work even for base models that were never RL‑trained for swarm behavior.
What you don’t get without training
K2.5’s advantage comes from having learned good parallelization strategies and sub‑agent patterns:135136137138134
- Without RL / post‑training, a generic LLM:
- Will often under‑parallelize (one tool at a time) or over‑parallelize (spam calls).
- May not discover robust internal role structures (fact‑checker, planner, etc.) reliably.
- Will not systematically trade off latency vs redundancy unless you encode that in prompts and heuristics.
So a framework alone gets you the capability to run many sub‑agents in parallel; making that efficient and reliable still requires either model‑specific RL (Moonshot’s path) or hand‑tuned heuristics.
How this differs from pure prompt‑only approaches
Prompt‑only PARL (the academic “Prompt-based Agents for RL”) shows you can encode state/action/reward history in prompts for gridworld‑style tasks without weight updates.143144145146
- That approach is portable to any LLM, but:
- It’s demonstrated on small environments (grid worlds, simple games).
- Complexity and context length blow up quickly in real web / code tasks.
- K2.5‑style PARL adds: actual parameter updates, latency‑aware rewards, and training on realistic tool‑rich environments (web, code) that prompt‑only approaches cannot reach.
So: methodology (parallel multi‑agent CTDE pattern + latency‑aware reward) is portable; the behavioral quality is not, unless you repeat some form of RL/fine‑tuning per model.
Bottom line for your use case
- Yes: you can design a PARL‑inspired orchestration framework that works with any LLM (Coditect is a great host environment).
- To approach K2.5‑like swarms, you either:
- Use models that already have agentic RL (K2.5, future variants, LLM‑in‑Sandbox‑RL‑trained models), or
- Run your own RL / bandit loop on top of your logs to specialize a base model for your tool + task distribution.
If you want, I can next draft an explicit “model‑agnostic PARL framework spec” (state/action schema, manager/worker contracts, logging, and a minimal offline‑RL loop) that you could implement on top of your current LangGraph‑style infra.
draft an explicit “model‑agnostic PARL framework spec” (state/action schema, manager/worker contracts, logging, and a minimal offline‑RL loop) that you could implement on top of your current LangGraph‑style infra.
Yes. Here is a model‑agnostic PARL framework spec you can layer on LangGraph (or equivalent) and drive with any chat‑completion LLM.
1. Core abstractions
State schema
Define a task‑level state $S$ shared across manager and workers:
```typescript
type ToolCall = {
  id: string;      // unique per step
  name: string;    // tool name
  args: any;       // JSON-serializable
  role?: string;   // optional logical sub-agent role
};

type Message =
  | { role: "user" | "system" | "assistant"; content: string }
  | { role: "tool"; name: string; tool_call_id: string; content: any };

type AgentState = {
  task_id: string;
  messages: Message[];     // full dialog + tool outputs
  step: number;            // macro-step index
  tool_budget: number;     // remaining tool calls
  time_budget_ms: number;  // remaining wall-clock budget
  parallel_limit: number;  // max tools per macro-step
  tool_calls_this_step?: ToolCall[];
  metrics?: {
    start_time_ms: number;
    wall_clock_ms?: number;
    num_tool_calls: number;
    num_tokens_in?: number;
    num_tokens_out?: number;
    // optional task-specific metrics once finished
    task_score?: number;
    success?: boolean;
  };
};
```
This is consistent with LLM‑in‑Sandbox and ARTIST‑style environments where state is “prompt + sandbox/files + metrics”.156157158159160161
Action schema
At each macro‑step, the manager LLM emits a bundle of tool calls plus optional textual reasoning:
```typescript
type ManagerAction = {
  thought: string;         // free-form plan / reasoning
  tool_calls: ToolCall[];  // 0..parallel_limit
  done: boolean;           // whether to stop and emit final answer
};
```
Workers don’t need explicit actions in your infra: each tool call is effectively a “worker action”; the same or another LLM can be used inside tools if needed.
2. Manager / worker contracts
Manager contract (any LLM)
Prompt contract:
- Input: the current `AgentState.messages`, budgets, and tool schema (OpenAI‑style JSON).
- Output: a JSON `ManagerAction` encoded in an assistant message, e.g.:
```json
{
  "role": "assistant",
  "content": "Planning step 3...",
  "tool_calls": [
    {
      "id": "call_1",
      "name": "search_repo",
      "args": {"query": "login form component", "top_k": 5},
      "role": "codebase_mapper"
    },
    {
      "id": "call_2",
      "name": "get_file",
      "args": {"path": "src/components/ui/Button.tsx"},
      "role": "component_reader"
    }
  ]
}
```
Framework responsibilities:
- Enforce `tool_calls.length <= parallel_limit` and `tool_budget > 0`.
- If `done == true` or `tool_calls.length == 0`, treat the assistant message as a candidate final answer.
You can implement this with any LLM that supports tool calling (Kimi, GPT, Claude, local), or even by asking it to emit JSON you parse yourself; the spec is model‑agnostic.158162163
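The framework responsibilities above reduce to one small clamp function over the `ManagerAction` shape defined in this spec; a sketch, not a normative implementation:

```python
def enforce_action(action: dict, parallel_limit: int, tool_budget: int) -> dict:
    """Clamp a ManagerAction to the configured budgets: truncate the tool
    bundle to parallel_limit and to the remaining tool budget, and force
    `done` when no calls survive (the text becomes the candidate answer)."""
    cap = max(0, min(parallel_limit, tool_budget))
    calls = action.get("tool_calls", [])[:cap]
    return {
        "thought": action.get("thought", ""),
        "tool_calls": calls,
        "done": bool(action.get("done")) or not calls,
    }
```

Keeping the clamp outside the model means a misbehaving (or over‑eager) manager can never exceed your budgets, regardless of which LLM is plugged in.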
Worker / tool contract
Each tool is a pure function from (name, args) to structured output:
```typescript
type ToolResult = {
  tool_call_id: string;
  name: string;
  ok: boolean;
  result?: any;
  error?: string;
  latency_ms: number;
};
```
You then inject:
```
{
  "role": "tool",
  "name": result.name,
  "tool_call_id": result.tool_call_id,
  "content": {
    "ok": result.ok,
    "result": result.result,
    "error": result.error,
    "latency_ms": result.latency_ms
  }
}
```
back into `state.messages` in the next manager step.
3. LangGraph‑style workflow
Use a thin graph: one manager node, one tool‑executor node, plus a router.
```python
from typing import Any, List, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    task_id: str
    messages: List[Any]
    step: int
    tool_budget: int
    time_budget_ms: int
    parallel_limit: int
    metrics: dict


def _extract_tool_calls(msg) -> list:
    # Works for both dict messages and SDK message objects.
    if isinstance(msg, dict):
        return msg.get("tool_calls") or []
    return getattr(msg, "tool_calls", None) or []


def manager_node(state: AgentState) -> AgentState:
    # Call any chat-completion LLM with your PARL-style system prompt;
    # `llm_chat` and `TOOL_SCHEMA` come from your own infra.
    resp = llm_chat(
        messages=state["messages"],
        tools=TOOL_SCHEMA,
        tool_choice="auto",
    )
    msg = resp["choices"][0]["message"]
    state["messages"].append(msg)
    state["step"] += 1
    return state


def tool_exec_node(state: AgentState) -> AgentState:
    tool_calls = _extract_tool_calls(state["messages"][-1])
    if not tool_calls:
        return state
    # Enforce budgets before executing anything.
    allowed = min(len(tool_calls), state["parallel_limit"], state["tool_budget"])
    selected = tool_calls[:allowed]
    results: List[Any] = run_tools_in_parallel(selected)  # your infra
    state["tool_budget"] -= len(selected)
    state["metrics"]["num_tool_calls"] += len(selected)
    for r in results:
        state["messages"].append({
            "role": "tool",
            "name": r.name,
            "tool_call_id": r.tool_call_id,
            "content": {
                "ok": r.ok,
                "result": r.result,
                "error": r.error,
                "latency_ms": r.latency_ms,
            },
        })
    return state


def router(state: AgentState):
    if state["tool_budget"] <= 0 or state["time_budget_ms"] <= 0:
        return END
    if _extract_tool_calls(state["messages"][-1]):
        return "tools"
    # No new tool calls: treat the last assistant message as final.
    return END


builder = StateGraph(AgentState)
builder.add_node("manager", manager_node)
builder.add_node("tools", tool_exec_node)
builder.set_entry_point("manager")
# After each manager step, run tools if calls were emitted, else stop;
# after tools execute, hand control back to the manager.
builder.add_conditional_edges("manager", router)
builder.add_edge("tools", "manager")
graph = builder.compile()
```
This is directly compatible with how M1‑Parallel and LLM‑in‑Sandbox structure parallel plans, but wrapped in LangGraph terms.159164162158
4. Logging and experience format
You want logs suitable for offline RL / bandits:
```typescript
type Transition = {
  task_id: string;
  step: number;
  s_t: AgentStateSnapshot;    // minimally: messages hash, budgets, etc.
  a_t: ManagerAction;         // tool bundle + text
  tool_results: ToolResult[]; // executed in parallel
  s_t1: AgentStateSnapshot;
  r_t: number;                // scalar reward at this step (can be 0)
  done: boolean;
};

type Episode = {
  task_id: string;
  model_id: string;
  transitions: Transition[];
  final_metrics: {
    success: boolean;
    task_score: number;
    wall_clock_ms: number;
    num_tool_calls: number;
    num_tokens_in: number;
    num_tokens_out: number;
  };
};
```
You can store AgentStateSnapshot as a compact representation (e.g., hashes or truncated messages) rather than full prompts, to keep RL tractable.
Reward design (per episode):
- Terminal reward: e.g. `task_score` from tests / benchmarks (0–1), plus the `success` boolean mapped to {0, 1}.
- Cost penalty: `− α * log(1 + wall_clock_ms) − β * num_tool_calls`.
This mirrors ARTIST and LLM‑in‑Sandbox’s outcome‑based RL with cost regularization.160161158159
5. Minimal offline‑RL loop (algorithm‑agnostic)
At a high level:
- Collect episodes
  - Run your existing system with one or more base models (GPT, Claude, Kimi, local) using the PARL framework.
  - Log episodes as above with task_score and cost metrics.
- Define a policy representation
  - For a given model, the policy is "prompt → action JSON". You can fine‑tune it via supervised learning on good steps, or do explicit RL (REINFORCE / PPO‑style on outcome rewards) if you control the weights (open models).
- Supervised pre‑filter (cheap)
  - Filter transitions where task_score is high and budgets were respected.
  - Train a next‑step policy to imitate a_t from those transitions (behavior cloning).
  - This already encourages more parallel tool usage in contexts where it worked.
- Outcome‑based RL (optional)
  - Use the episode‑level reward $R$ (e.g., task_score − λ * normalized_cost) and apply policy gradients to the manager's logits for tool‑bundle decisions.
  - Since many proprietary models can't be fine‑tuned, you'd apply this mainly to open models (K2.5, Llama‑family, etc.) or via adapters.
- Re‑deployment
  - Replace the manager model (or its system prompt) with the tuned version and keep the rest of the framework unchanged.
This mirrors LLM‑in‑Sandbox‑RL and ARTIST: RL over a tool‑rich environment with outcome‑based rewards.165166158159160161
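The supervised pre‑filter step can be sketched directly over the Episode schema from section 4. The min_score threshold and the returned example format are assumptions for illustration:

```python
def bc_dataset(episodes: list, min_score: float = 0.8) -> list:
    """Keep (state, action) pairs from successful, high-scoring episodes
    for behavior cloning of the manager policy."""
    examples = []
    for ep in episodes:
        fm = ep["final_metrics"]
        if fm["task_score"] < min_score or not fm["success"]:
            continue
        for tr in ep["transitions"]:
            # Imitate the manager's action bundle a_t given snapshot s_t.
            examples.append({"input": tr["s_t"], "target": tr["a_t"]})
    return examples
```

Each example then becomes one supervised fine-tuning record: prompt built from s_t, completion being the serialized ManagerAction a_t.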
6. How this stays model‑agnostic
Any LLM that can:
- take a history of messages + tool results, and
- output structured tool calls (name + args)
can be plugged in as the manager with:
- no changes to your LangGraph graph,
- no changes to the tool schema,
- only the system prompt / examples changing per model.
If you later get a model like K2.5 that has PARL‑style training baked in, it just performs better under the same framework; otherwise, you can approximate PARL behavior via prompt engineering plus the offline‑RL loop for open models.
If you’d like, the next step I can take is to specify a concrete JSON logging spec plus an example episode for a Coditect “multi‑file refactor” task, so you can wire this into your telemetry immediately.
Follow‑up requests:
- Run your own RL / bandit loop on top of your logs to specialize a base model for your tool + task distribution.
- Provide a LangGraph code skeleton for the PARL state graph.
- Extend the PARL spec with an online RL adaptation loop.
- Define JSON schemas for manager / worker contracts.
- Implement PARL logging with LangSmith integration.
- Benchmark the PARL framework vs. single‑agent baselines.
Yes. Building on that spec, here’s a focused extension with the pieces you asked for.
LangGraph PARL state graph skeleton (manager + tools + online loop hook)
from typing import TypedDict, List, Any, Optional
from langgraph.graph import StateGraph, END
import time
# -------- Core state --------
class Metrics(TypedDict, total=False):
    start_time_ms: int
    wall_clock_ms: int
    num_tool_calls: int
    num_tokens_in: int
    num_tokens_out: int
    task_score: float
    success: bool

class AgentState(TypedDict):
    task_id: str
    messages: List[Any]
    step: int
    tool_budget: int
    time_budget_ms: int
    parallel_limit: int
    metrics: Metrics
    # online-RL control
    episode_log_id: Optional[str]
# -------- Manager node --------
def manager_node(state: AgentState) -> AgentState:
    t0 = time.time()
    resp = manager_llm_chat(
        messages=state["messages"],
        tools=TOOL_SCHEMA,
        tool_choice="auto",
    )
    msg = resp["choices"][0]["message"]
    state["messages"].append(msg)
    state["step"] += 1
    # online adaptive tweaks (bandit hook)
    adapt_manager_prompt_or_params(
        state=state,
        last_message=msg,
        latency_ms=int((time.time() - t0) * 1000),
    )
    return state
# -------- Tool executor node --------
def tool_exec_node(state: AgentState) -> AgentState:
    last = state["messages"][-1]
    tool_calls = getattr(last, "tool_calls", None) or (
        last.get("tool_calls") if isinstance(last, dict) else None
    ) or []
    if not tool_calls:
        return state
    allowed = min(len(tool_calls), state["parallel_limit"], state["tool_budget"])
    selected = tool_calls[:allowed]
    results = run_tools_in_parallel(selected)  # your infra
    state["tool_budget"] -= len(selected)
    state["metrics"]["num_tool_calls"] += len(selected)
    for r in results:
        state["messages"].append({
            "role": "tool",
            "name": r.name,
            "tool_call_id": r.tool_call_id,
            "content": {
                "ok": r.ok,
                "result": r.result,
                "error": r.error,
                "latency_ms": r.latency_ms,
            },
        })
    return state
# -------- Router --------
def router(state: AgentState):
    last = state["messages"][-1]
    tool_calls = getattr(last, "tool_calls", None) or (
        last.get("tool_calls") if isinstance(last, dict) else None
    ) or []
    # time budget enforcement
    elapsed = int(time.time() * 1000) - state["metrics"]["start_time_ms"]
    state["metrics"]["wall_clock_ms"] = elapsed
    if elapsed >= state["time_budget_ms"]:
        return END
    if state["tool_budget"] <= 0:
        return END
    if tool_calls:
        return "tools"
    return END

# -------- Graph compile --------
builder = StateGraph(AgentState)
builder.add_node("manager", manager_node)
builder.add_node("tools", tool_exec_node)
builder.set_entry_point("manager")
# Route from the manager: run tools if it emitted tool calls, else finish.
builder.add_conditional_edges("manager", router)
# Tool results flow back to the manager for the next macro-step.
builder.add_edge("tools", "manager")
graph = builder.compile()
You can wrap graph.stream/graph.invoke with pre/post hooks to start/close an RL episode and emit LangSmith traces.
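Such a wrapper could be sketched as follows; start_episode and close_episode are hypothetical hooks into your logging layer, not LangGraph APIs:

```python
def run_with_hooks(graph, initial_state, start_episode, close_episode):
    """Open an RL episode, stream the graph, close the episode with the final state."""
    episode_id = start_episode(initial_state)
    final_state = initial_state
    for update in graph.stream(initial_state):
        # each update maps node name -> latest AgentState after that node ran
        final_state = list(update.values())[-1]
    close_episode(episode_id, final_state)
    return final_state
```

Because the hooks receive the raw state, they can compute rewards, emit traces, or update bandit weights without the graph knowing about any of it.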
JSON schemas for manager / worker contracts
ManagerAction JSON (what you log / optionally supervise)
{
  "$id": "ManagerAction",
  "type": "object",
  "properties": {
    "thought": {
      "type": "string",
      "description": "Free-form reasoning and plan for this macro-step."
    },
    "tool_calls": {
      "type": "array",
      "items": { "$ref": "#/definitions/ToolCall" },
      "description": "Bundle of tool calls to execute in parallel this step."
    },
    "done": {
      "type": "boolean",
      "description": "If true, no more tool calls; treat last assistant content as final answer."
    }
  },
  "required": ["thought", "tool_calls", "done"],
  "definitions": {
    "ToolCall": {
      "type": "object",
      "properties": {
        "id": { "type": "string" },
        "name": { "type": "string" },
        "args": { "type": "object" },
        "role": {
          "type": "string",
          "description": "Optional logical sub-agent role, e.g., 'codebase_mapper'."
        }
      },
      "required": ["id", "name", "args"]
    }
  }
}
ToolResult JSON (worker output)
{
  "$id": "ToolResult",
  "type": "object",
  "properties": {
    "tool_call_id": { "type": "string" },
    "name": { "type": "string" },
    "ok": { "type": "boolean" },
    "result": {
      "description": "Arbitrary tool output.",
      "type": ["object", "array", "string", "number", "boolean", "null"]
    },
    "error": { "type": ["string", "null"] },
    "latency_ms": { "type": "integer" }
  },
  "required": ["tool_call_id", "name", "ok", "latency_ms"]
}
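A minimal runtime check against the ManagerAction contract, without pulling in a JSON Schema library, could look like this hand-rolled sketch (it checks only the required fields listed above plus the parallel_limit rule):

```python
def check_manager_action(action: dict, parallel_limit: int) -> list:
    """Return a list of contract violations (empty list = valid)."""
    errors = []
    for field in ("thought", "tool_calls", "done"):
        if field not in action:
            errors.append(f"missing field: {field}")
    calls = action.get("tool_calls", [])
    if len(calls) > parallel_limit:
        errors.append(f"too many tool calls: {len(calls)} > {parallel_limit}")
    for i, call in enumerate(calls):
        for field in ("id", "name", "args"):
            if field not in call:
                errors.append(f"tool_calls[{i}] missing field: {field}")
    return errors
```

Running this before dispatching to the tool executor turns malformed manager output into a recoverable error message you can feed back to the model.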
These schemas are independent of any specific LLM and match the earlier TypeScript definitions.171172173
Online RL / bandit adaptation loop
Assume you cannot fine‑tune the base model, but can:
- Adjust system prompt / hyper‑params per task.
- Choose between several manager variants (ensembles).
Basic bandit‑style adaptation:
from dataclasses import dataclass
from typing import Dict

@dataclass
class PolicyVariant:
    id: str
    system_prompt: str
    temperature: float
    weight: float  # bandit preference

POLICIES: Dict[str, PolicyVariant] = {
    "swarm_aggressive": PolicyVariant("swarm_aggressive", "... favor parallel tools ...", 0.3, 1.0),
    "swarm_conservative": PolicyVariant("swarm_conservative", "... prefer safety, few tools ...", 0.3, 1.0),
}

def select_policy(task_id: str) -> PolicyVariant:
    # e.g., softmax over weights; Thompson sampling also possible
    # placeholder: pick max weight
    return max(POLICIES.values(), key=lambda p: p.weight)

def update_policy_weights(policy_id: str, reward: float, lr: float = 0.01):
    POLICIES[policy_id].weight += lr * reward

def manager_llm_chat(messages, tools, tool_choice):
    policy = select_policy(task_id=extract_task_id(messages))
    sys_msg = {"role": "system", "content": policy.system_prompt}
    full_messages = [sys_msg] + messages
    resp = call_any_llm(
        messages=full_messages,
        tools=tools,
        tool_choice=tool_choice,
        temperature=policy.temperature,
    )
    return resp

def end_of_episode_update(episode: dict):
    # e.g., reward = task_score - λ * normalized_cost
    score = episode["final_metrics"]["task_score"]
    cost = episode["final_metrics"]["wall_clock_ms"]
    reward = score - 0.001 * cost
    update_policy_weights(episode["policy_id"], reward)
If you control an open model, you can swap update_policy_weights with an offline RL fine‑tuning job that trains on logged ManagerAction examples weighted by episode reward (ARTIST/LLM‑in‑Sandbox style).172174171
LangSmith‑style logging for PARL
Pseudocode using LangChain / LangSmith tracing patterns:
from langsmith import Client

client = Client()

def run_episode_with_logging(task_id: str, initial_messages):
    run = client.create_run(
        name="parl_episode",
        inputs={"task_id": task_id},
        project_name="parl-experiments",
        tags=["parl", "multi-agent"],
    )
    state: AgentState = {
        "task_id": task_id,
        "messages": initial_messages,
        "step": 0,
        "tool_budget": 200,
        "time_budget_ms": 120000,
        "parallel_limit": 8,
        "metrics": {
            "start_time_ms": int(time.time() * 1000),
            "num_tool_calls": 0,
        },
        "episode_log_id": run.id,
    }
    for update in graph.stream(state):
        # `update` is a dict of {node_name: AgentState}
        for node_name, node_state in update.items():
            client.create_child_run(
                run_id=run.id,
                name=node_name,
                inputs={"step": node_state["step"]},
                outputs={"messages": node_state["messages"][-1:]},
                tags=["node", node_name],
            )
    final_state = list(update.values())[-1]
    final_metrics = evaluate_task(final_state)  # tests, scores, etc.
    client.update_run(
        run_id=run.id,
        outputs={"final_metrics": final_metrics},
        tags=["done"],
    )
    client.end_run(run_id=run.id)
    return final_state, final_metrics
You can additionally emit your Transition JSONs into a separate store (BigQuery, ClickHouse) for RL training; LangSmith traces give you per‑step context and metrics.
Benchmarking PARL vs single‑agent baselines
Define 2–3 policies:
- Single‑agent baseline: same LLM, parallel_limit = 1, or a prompt that forbids multiple tool calls per step.
- Heuristic parallelism: allow up to N parallel calls, but use static templates (e.g., split tasks by file).
- PARL framework: manager allowed to emit bundles, with an online bandit or RL‑tuned policy.
Benchmark protocol:
- Task suite:
- Metrics per task:
  - Success / task_score (tests, validators).
  - Wall‑clock latency.
  - Tool calls count.
  - Cost (tokens + tool cost).
- Experimental design:
  - Run each policy on each task with multiple random seeds (temperature) and report mean/variance.
  - Use paired tests (e.g., Wilcoxon) on success and latency.
You log using the same Episode schema; a simple report is then a table:
| Policy | Success rate | Median latency (s) | Mean tool calls | Tokens in/out |
|---|---|---|---|---|
| Single‑agent | … | … | … | … |
| Heuristic‑multi | … | … | … | … |
| PARL framework | … | … | … | … |
This is directly analogous to the evaluation setups used in “Optimizing Sequential Multi‑Step Tasks with Parallel LLM Agents” and “LLM‑in‑Sandbox” experiments.174176171175
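Each row of that table can be computed directly from the Episode logs; a sketch using the final_metrics fields defined in section 4 (the grouping key policy_id is assumed to be logged per episode):

```python
from statistics import median, mean

def summarize(episodes: list) -> dict:
    """Aggregate Episode final_metrics into one benchmark-table row per policy."""
    by_policy = {}
    for ep in episodes:
        by_policy.setdefault(ep["policy_id"], []).append(ep["final_metrics"])
    rows = {}
    for policy_id, ms in by_policy.items():
        rows[policy_id] = {
            "success_rate": mean(1.0 if m["success"] else 0.0 for m in ms),
            "median_latency_s": median(m["wall_clock_ms"] for m in ms) / 1000,
            "mean_tool_calls": mean(m["num_tool_calls"] for m in ms),
        }
    return rows
```

Since every policy logs the same Episode schema, the comparison needs no per-policy instrumentation.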
If you’d like, in the next iteration I can collapse this into a single “spec doc” you could drop into a repo (Markdown + code blocks) and hand to an engineer to implement.