Biomni / Research Continuum Integration Analysis

Date: 2026-02-14
Author: Claude (Opus 4.6)


1. Project Comparison

Dimension | Research Continuum | Biomni
--- | --- | ---
Purpose | Agentic Knowledge Infrastructure - extraction, knowledge graphs, synthesis across all domains | General-purpose biomedical AI agent for autonomous research tasks
Scope | All document types, all research domains | Biomedical-specific (genomics, pharmacology, cell biology, etc.)
Phase | Architecture & Vision (Layer 1 built, Layers 2-4 planned) | Production-ready v0.0.8 with 20+ tool modules
License | Proprietary (AZ1.AI) | Apache 2.0 (Stanford SNAP)
Tech Stack | Python, UDOM pipeline, Docling, multi-source extractors | Python, LangChain/LangGraph, pydantic, code execution sandbox
LLM Integration | CODITECT's 776 agents via Claude | Multi-provider (OpenAI, Anthropic, Ollama, Gemini, Bedrock, Groq, Custom)
Data Assets | 218 academic papers (UDOM processed), ar5iv HTML coverage | 11GB biomedical data lake (auto-download), tool descriptions, know-how library
Key Capability | Document extraction (100% Grade A), UDOM schema (25 types) | Task execution: CRISPR screens, scRNA-seq, ADMET prediction, literature search
UI | None (CLI pipeline) | Gradio web interface, PDF report generation
Architecture | 4-layer: Extraction, Knowledge Graph, Synthesis, Interface | Single agent loop: Plan -> Code -> Execute -> Observe
MCP Support | CODITECT MCP servers (semantic search, call graph, impact) | MCP client support for external tool integration
Eval | QA grading (A-F per paper) | Biomni-Eval1: 433 instances across 10 biological reasoning tasks
LOC | ~15,478 (Python, UDOM pipeline) | ~83K utils.py alone, 20+ tool modules, massive env
Size on disk | 824K (code) | ~11GB (with data lake)

2. Overlap Analysis

Where They Overlap

  • Both process academic papers (Biomni has literature.py, Research Continuum has UDOM)
  • Both use LLMs for reasoning about research content
  • Both aim to automate research workflows
  • Both are Python-based

Where They Are Complementary

Research Continuum Provides | Biomni Provides
--- | ---
Superior document extraction (UDOM, 100% Grade A) | Domain-specific biomedical tools (20+ modules)
Multi-source alignment (PDF + HTML + LaTeX) | Executable research tasks (CRISPR, scRNA-seq, ADMET)
Provenance tracking to exact sentences | Pre-built biomedical data lake (11GB)
Knowledge graph architecture (planned) | Evaluation benchmarks (Biomni-Eval1)
776 CODITECT agents for general reasoning | Know-How library for lab protocols
Proprietary competitive moat | Open-source community contributions

Where They Diverge

  • Domain specificity: Research Continuum is domain-agnostic; Biomni is biomedical-only
  • Execution model: Research Continuum is a document-centric pipeline; Biomni is a task-centric agent
  • Code execution: Biomni executes LLM-generated code with full system privileges; Research Continuum does not execute code
  • Data dependency: Biomni requires an 11GB data lake; Research Continuum is lightweight
  • Release cadence: Biomni is a community-driven open-source project; Research Continuum follows a proprietary roadmap
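
The Plan -> Code -> Execute -> Observe loop, and what running generated code without isolation means, can be illustrated with a toy sketch. This is not Biomni's implementation: `run_agent_loop` and `scripted_planner` are hypothetical names, and a scripted planner stands in for the LLM.

```python
import contextlib
import io
from typing import Callable, List, Optional

# Hypothetical planner type: given past observations, return the next
# Python snippet to run, or None when the task is finished.
Planner = Callable[[List[str]], Optional[str]]


def run_agent_loop(plan_next_step: Planner, max_steps: int = 5) -> List[str]:
    """Toy Plan -> Code -> Execute -> Observe loop (illustrative only)."""
    observations: List[str] = []
    for _ in range(max_steps):
        code = plan_next_step(observations)  # Plan + Code
        if code is None:
            break
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # Execute: exec() runs in this process with this user's
            # privileges. Nothing here is sandboxed.
            exec(code, {})
        observations.append(buffer.getvalue().strip())  # Observe
    return observations


def scripted_planner(observations: List[str]) -> Optional[str]:
    """Stand-in for an LLM: replays a fixed list of snippets."""
    steps = ["print(2 + 3)", "print('done')"]
    if len(observations) < len(steps):
        return steps[len(observations)]
    return None


if __name__ == "__main__":
    print(run_agent_loop(scripted_planner))
```

Because the generated code runs in-process, any isolation has to come from the environment around the agent (for example a container or a dedicated VM), which is one reason to keep it out of the document-centric pipeline.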

3. Integration Options Evaluated

Option A: Merge Into One Project

Verdict: NOT RECOMMENDED

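  • License mixing: Apache 2.0 (Biomni) permits use in proprietary products, but merging it with Proprietary (Research Continuum) code in one codebase adds attribution and NOTICE obligations and blurs the IP boundary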
  • Scope mismatch: Research Continuum targets all domains; merging Biomni narrows the brand
  • Dependency bloat: Biomni's massive conda env (setup.sh) and 11GB data lake would bloat the general-purpose product
  • Upstream divergence: Biomni is actively developed by Stanford; merging would make upstream tracking impractical

Option B: Fork Biomni Into Research Continuum

Verdict: NOT RECOMMENDED

  • Loses upstream updates from Stanford SNAP's active development community
  • Creates maintenance burden to cherry-pick fixes
  • Biomni's setup.sh/conda env is deeply tied to its own structure
  • Research Continuum's UDOM pipeline has no structural relationship to Biomni's agent loop

Option C: Keep as Separate Submodules

Verdict: RECOMMENDED

Architecture:

coditect-research-continuum/          # Proprietary - Agentic Knowledge Infrastructure
    src/pdf-to-markdown/              # UDOM extraction pipeline
    [planned] knowledge-graph/        # Entity/relationship layer
    [planned] synthesis/              # Agent orchestration layer
    [planned] interface/              # Query API

coditect-biomedical-research/         # Submodule of snap-stanford/Biomni (Apache 2.0)
    biomni/                           # Third-party biomedical agent
        agent/                        # A1 agent with LangGraph
        tool/                         # 20+ biomedical tool modules

Integration points (built in research-continuum):
    1. UDOM → Biomni: Feed extracted documents to Biomni agent as context
    2. Biomni → Knowledge Graph: Write Biomni task results back to KG
    3. CODITECT agents → Biomni tools: Expose Biomni tools as MCP servers
    4. Shared LLM routing: Both use Claude/Anthropic; unify via CODITECT's LLM layer
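
As a rough illustration of integration point 1, the sketch below flattens an extracted document into a size-bounded context string that could be prepended to a Biomni task prompt. The UDOM JSON shape used here (`title`, `blocks`, and per-block `id`, `type`, `text`) is an assumption made for the example, not the actual UDOM schema, and `udom_to_context` is a hypothetical name.

```python
from typing import Any, Dict

# Block types worth forwarding as context (assumed names, not the real
# UDOM type list).
TEXT_BLOCK_TYPES = {"heading", "paragraph"}


def udom_to_context(doc: Dict[str, Any], max_chars: int = 2000) -> str:
    """Flatten a UDOM-like document into a bounded context string.

    Each line keeps its block id so an answer can be traced back to the
    source passage.
    """
    lines = [f"Title: {doc.get('title', 'untitled')}"]
    used = len(lines[0])
    for block in doc.get("blocks", []):
        if block.get("type") not in TEXT_BLOCK_TYPES:
            continue  # skip tables, figures, equations, etc.
        text = block.get("text", "").strip()
        if not text:
            continue
        line = f"[{block.get('id', '?')}] {text}"
        if used + len(line) + 1 > max_chars:
            break  # stay within the context budget
        lines.append(line)
        used += len(line) + 1  # +1 for the joining newline
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "title": "Sample paper",
        "blocks": [
            {"id": "b1", "type": "paragraph", "text": "First extracted paragraph."},
            {"id": "b2", "type": "table", "text": "not forwarded"},
        ],
    }
    print(udom_to_context(sample))
```

Keeping the adapter as a pure function in research-continuum means nothing has to change inside the Biomni submodule.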

Why this works:

  1. Clean IP boundary: Proprietary code stays in research-continuum; open-source stays in submodule
  2. Upstream tracking: git submodule update --remote pulls the latest Biomni without merge conflicts, provided no local changes are made inside the submodule
  3. Domain extensibility: Other domain-specific agents (legal, financial) can be added as sibling submodules
  4. No dependency contamination: Biomni's massive env stays isolated; research-continuum remains lightweight
  5. MCP bridge: Biomni already supports MCP; CODITECT already has MCP servers. The integration protocol exists.

4. Recommendation

Keep them as separate submodules under submodules/products/.

Project | Path | Role
--- | --- | ---
coditect-research-continuum | submodules/products/coditect-research-continuum | Platform: extraction, knowledge graph, synthesis engine
coditect-biomedical-research | submodules/products/coditect-biomedical-research | Domain module: biomedical-specific agent + tools

Integration Roadmap

Phase | Work | Priority
--- | --- | ---
Phase 1 (Now) | Add Biomni submodule, document architecture boundary | Done
Phase 2 | Build UDOM-to-Biomni adapter: feed extracted papers as Biomni context | Medium
Phase 3 | Expose Biomni tools as MCP servers for CODITECT agent access | Medium
Phase 4 | Build results-to-KG writer: Biomni outputs flow into Research Continuum knowledge graph | High (when KG layer is built)
Phase 5 | Unified UI: Research Continuum interface embeds Biomni biomedical capabilities | Low
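
The Phase 4 results-to-KG writer could start as a small mapping step like the one sketched here. Since the knowledge graph layer is still planned, everything in this example is an assumption: the `Triple` record, the `findings` result shape, and the `biomni:<task_id>` provenance tag are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

REQUIRED_KEYS = ("subject", "predicate", "object")


@dataclass(frozen=True)
class Triple:
    """Hypothetical KG edge with a provenance tag."""
    subject: str
    predicate: str
    obj: str
    source: str


def result_to_triples(task_id: str, result: Dict[str, Any]) -> List[Triple]:
    """Map a task result dict onto provenance-tagged triples.

    Findings that lack any required key are skipped rather than guessed.
    """
    triples: List[Triple] = []
    for finding in result.get("findings", []):
        if not all(key in finding for key in REQUIRED_KEYS):
            continue
        triples.append(
            Triple(
                subject=finding["subject"],
                predicate=finding["predicate"],
                obj=finding["object"],
                source=f"biomni:{task_id}",  # records which task produced the edge
            )
        )
    return triples


if __name__ == "__main__":
    demo = {"findings": [{"subject": "GENE_A", "predicate": "regulates", "object": "GENE_B"}]}
    print(result_to_triples("task-001", demo))
```

Tagging every edge with its originating task keeps agent-derived facts distinguishable from facts extracted directly from documents.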

Key Design Principle

Research Continuum is the platform. Biomni is a domain module. The platform handles extraction, knowledge graphs, and synthesis orchestration. Domain modules (biomedical, legal, financial) plug in via well-defined interfaces (MCP, UDOM JSON, knowledge graph API). This allows the Research Continuum to grow horizontally across domains while each domain module evolves independently.
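
One way to picture the "platform plus domain modules" principle is a small plug-in contract. The sketch below is an assumption about how such an interface might look, not an existing Research Continuum API: `DomainModule`, `ModuleRegistry`, and the stub module are hypothetical.

```python
from typing import Dict, List, Protocol


class DomainModule(Protocol):
    """Contract a domain module would satisfy (hypothetical)."""
    name: str

    def tools(self) -> List[str]:
        """Return the names of the tools this module exposes."""
        ...


class ModuleRegistry:
    """Platform-side registry; modules plug in without platform changes."""

    def __init__(self) -> None:
        self._modules: Dict[str, DomainModule] = {}

    def register(self, module: DomainModule) -> None:
        if module.name in self._modules:
            raise ValueError(f"domain already registered: {module.name}")
        self._modules[module.name] = module

    def domains(self) -> List[str]:
        return sorted(self._modules)

    def tools_for(self, domain: str) -> List[str]:
        return self._modules[domain].tools()


class StubBiomedicalModule:
    """Placeholder standing in for a real adapter around Biomni."""
    name = "biomedical"

    def tools(self) -> List[str]:
        return ["literature_search"]  # illustrative tool name only


if __name__ == "__main__":
    registry = ModuleRegistry()
    registry.register(StubBiomedicalModule())
    print(registry.domains(), registry.tools_for("biomedical"))
```

A legal or financial module would register the same way, which is the horizontal growth the principle describes.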


5. License & IP Considerations

  • Biomni is Apache 2.0 - commercial use permitted, but some integrated tools/databases may have restrictions
  • Biomni's commercial_mode=True flag filters non-commercial datasets
  • Any proprietary integration code must live in coditect-research-continuum, not in the Biomni submodule
  • The Biomni submodule remote is https://github.com/snap-stanford/Biomni.git - NEVER push to this remote (Git Push Safety directive)
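
The push-safety rule can be backed by an automated check, for example one called from a pre-push hook. The sketch below only covers the URL comparison; `is_push_allowed` and `normalize_remote` are hypothetical helpers, and wiring them into a hook is left as an assumption.

```python
UPSTREAM_BIOMNI = "https://github.com/snap-stanford/Biomni.git"


def normalize_remote(url: str) -> str:
    """Reduce common GitHub URL spellings to one comparable form."""
    cleaned = url.strip().lower().rstrip("/")
    if cleaned.startswith("git@github.com:"):
        # Rewrite SSH shorthand as the equivalent HTTPS form.
        cleaned = "https://github.com/" + cleaned[len("git@github.com:"):]
    if cleaned.endswith(".git"):
        cleaned = cleaned[: -len(".git")]
    return cleaned


def is_push_allowed(remote_url: str) -> bool:
    """Return False when the push target is the upstream Biomni repository."""
    return normalize_remote(remote_url) != normalize_remote(UPSTREAM_BIOMNI)


if __name__ == "__main__":
    for candidate in (UPSTREAM_BIOMNI, "git@github.com:snap-stanford/Biomni.git"):
        print(candidate, "->", "allowed" if is_push_allowed(candidate) else "blocked")
```

Normalizing before comparing matters because the same repository can be addressed over HTTPS or SSH, with or without the `.git` suffix.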

6. Immediate Actions Completed

  1. Biomni added as git submodule at submodules/products/coditect-biomedical-research (commit 400c1f3)
  2. This analysis document created at internal/analysis/biomni-integration/
  3. Session log updated with inception and milestone entries