Skip to main content

OpenClaw Hackathon Idea — Secure Inbound Sanitization Layer

TL;DR

We’re building a security-first Email → Sanitized Event → Agent pipeline that prevents prompt injection and unsafe content from ever reaching the LLM.

Phase 1: Email webhook + deterministic sanitization
Phase 2: GitHub webhook + diff/comment sanitization

The core idea is simple: All external content is untrusted. Agents never see raw input.


Problem

LLM agents are extremely vulnerable to:

  • Prompt injection (“ignore previous instructions…”)
  • Tool call manipulation
  • Hidden HTML/script payloads
  • Secret exfiltration attempts
  • Malicious content embedded in attachments
  • Markdown/code-based instruction hijacking (GitHub)

Most systems pass raw inbound data directly to the agent.

We want to build a defensive middleware layer.


Solution

A Python-based, modular ingress system:

Inbound Webhook
→ Verification
→ Normalization
→ Deterministic Sanitization
→ Risk Flagging
→ Structured Safe Payload
→ Agent

Key guarantees:

  • No raw HTML to agent
  • No scripts
  • No hidden elements
  • Prompt injection patterns flagged/redacted
  • Attachments filtered + extracted safely
  • Length-limited + normalized
  • Risk metadata included

Why This Is Interesting

  • Real-world agent security problem
  • Immediately useful for OpenClaw-style systems
  • Modular and extensible
  • Can plug into any agent runtime
  • Demonstrates practical AI safety engineering

Phase 1 (Hackathon Scope)

  • Email webhook endpoint
  • Shared-secret verification
  • Canonical EmailEvent schema
  • Sanitization pipeline:
    • HTML stripping
    • Unicode cleanup
    • Injection pattern detection
    • Truncation
    • Attachment allowlist
  • Output sanitized JSON to downstream agent

Phase 2 (Future)

  • GitHub webhook ingestion
  • Diff truncation
  • Markdown sanitization
  • Secret scanning
  • Repo-level risk scoring

Tech Stack

  • Python
  • FastAPI
  • Deterministic regex-based injection guard
  • Pluggable sinks (stdout / DB / webhook)

Vision

We’re not building “another agent.”

We’re building the security layer agents should have by default.

If this works well, it becomes:

“IronClaw-style safety, but Python-native and composable.”

Let’s build the agent firewall.