CrabTrap: an LLM-as-a-judge HTTP proxy to secure agents in production (9 minute read)

AI agents, security, infrastructure

Brex open-sourced CrabTrap, an HTTP proxy that uses an LLM to judge whether each network request from an AI agent should be allowed based on natural-language policies.

What: CrabTrap is an HTTP/HTTPS proxy that sits between AI agents and the APIs they call. It evaluates each request against fast static rules first; if no rule matches, an LLM judge compares the request against a natural-language policy and decides whether to allow or deny it.
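The two-stage decision described above can be sketched in a few lines of Python. This is an illustrative sketch, not CrabTrap's implementation: `StaticRule`, `evaluate`, the example URLs, and the `stub_judge` callable (which stands in for the real LLM call) are all hypothetical names introduced here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StaticRule:
    method: str              # HTTP method the rule applies to, e.g. "GET"
    url_pattern: re.Pattern  # compiled once and reused for every request
    decision: str            # "allow" or "deny"


def evaluate(method: str, url: str, rules: list[StaticRule],
             judge: Callable[[str, str], str]) -> tuple[str, str]:
    """Return (decision, source); source is "static" or "judge"."""
    # Stage 1: deterministic rules, checked in order.
    for rule in rules:
        if rule.method == method and rule.url_pattern.fullmatch(url):
            return rule.decision, "static"
    # Stage 2: no rule matched, so defer to the (slower) judge.
    return judge(method, url), "judge"


# Hypothetical rules for an imaginary API.
RULES = [
    StaticRule("GET", re.compile(r"https://api\.example\.com/v1/reports/\d+"), "allow"),
    StaticRule("DELETE", re.compile(r"https://api\.example\.com/.*"), "deny"),
]


def stub_judge(method: str, url: str) -> str:
    # Placeholder for an LLM call that would weigh the request against a
    # natural-language policy; this stub simply denies everything.
    return "deny"
```

In this sketch only requests that miss every static rule reach the judge, which is why a mostly predictable agent would rarely pay the LLM latency.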

Why it matters: AI agents need real production credentials to do useful work, but they can hallucinate destructive actions or be prompt-injected into malicious behavior. Existing solutions either don't scale (hand-tuned per-action permissions) or only work for specific protocols. CrabTrap operates at the transport layer, making it framework-agnostic and capable of nuanced decisions about unfamiliar endpoints.

Takeaway: Check out the CrabTrap quickstart to deploy it in front of your own agents, or explore the repo to see how Brex is securing OpenClaw agents in production.

Deep dive

Agents are routed through CrabTrap by setting the HTTP_PROXY and HTTPS_PROXY environment variables, with optional iptables rules to block direct connections that would bypass the proxy
For HTTPS traffic, CrabTrap performs TLS interception by generating per-host certificates signed by its own certificate authority, then proxying the decrypted traffic
The two-stage evaluation pipeline runs deterministic static rules first (microsecond latency using cached regexps), then falls back to the LLM judge only for unknown patterns
The LLM judge receives each request as structured JSON rather than raw text, which guards against prompt injection through crafted URLs, headers, or body content
Security measures include capping headers at 4KB to prevent prompt inflation attacks and truncating bodies at 16KB to avoid displacing policy from the context window
Brex built a policy builder that analyzes historical agent traffic and generates natural-language policies from observed behavior rather than requiring manual policy authoring
An eval system lets teams replay historical audit entries against draft policies to preview what would change before deploying policy updates, with results indexed by method, URL, and decision agreement
Production data from Brex shows that the LLM judge adds little latency overall: agents settle into predictable patterns that can be captured as static rules, so the judge fired on fewer than 3% of requests in one use case
Policies derived from actual traffic turned out to be surprisingly effective, matching human judgment on the vast majority of held-out requests without heavy manual editing
The audit trail revealed unexpected agent noise, leading teams to use CrabTrap as a discovery tool to identify wasteful requests and tighten agent implementations
Existing solutions like MCP gateways only work for MCP traffic, provider guardrails are model-specific and opaque, and per-sandbox controls don't scale across heterogeneous APIs
All requests are logged to PostgreSQL and queryable through an admin API and web dashboard for analysis and policy refinement
Brex open-sourced CrabTrap because they view agent security as an unsolved problem requiring community input, and because different deployment scenarios will surface edge cases Brex can't hit alone
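The structured-JSON input and the size caps from the deep dive can be illustrated with a short Python sketch. The 4KB and 16KB limits come from the article; everything else is an assumption made for illustration, including the function name `build_judge_payload`, the field names, and the choice to stop adding headers once the budget is spent (the article does not say how CrabTrap applies the cap).

```python
from __future__ import annotations

import json

MAX_HEADER_BYTES = 4 * 1024   # header cap cited in the article
MAX_BODY_BYTES = 16 * 1024    # body truncation limit cited in the article


def build_judge_payload(method: str, url: str,
                        headers: dict[str, str], body: bytes) -> str:
    """Serialize a request as JSON so its contents stay data, not prompt text."""
    kept: dict[str, str] = {}
    used = 0
    for name, value in headers.items():
        size = len(name.encode()) + len(value.encode())
        if used + size > MAX_HEADER_BYTES:
            break  # assumption: stop adding headers once the budget is exhausted
        kept[name] = value
        used += size

    truncated = len(body) > MAX_BODY_BYTES
    # Undecodable bytes are replaced rather than raising an error.
    body_text = body[:MAX_BODY_BYTES].decode("utf-8", errors="replace")

    # json.dumps escapes quotes and control characters, so a crafted URL or
    # header value cannot close its own field and start a new one.
    return json.dumps({
        "method": method,
        "url": url,
        "headers": kept,
        "body": body_text,
        "body_truncated": truncated,
    })
```

Bounding the request portion keeps an oversized header or body from crowding the policy text out of the judge's context window, which is the concern the article raises.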

Decoder

LLM-as-a-judge: Using a language model to evaluate content or actions against policies and make allow/deny decisions, rather than just generating text
OpenClaw: A popular open-source AI agent framework for autonomous task execution
MCP (Model Context Protocol): A protocol for structured communication between AI models and tools or data sources
Prompt injection: An attack where malicious instructions are embedded in content an LLM processes (user input, web pages, API responses) to manipulate its behavior
TLS interception: A proxy technique that decrypts HTTPS traffic by impersonating the destination server to the client and the client to the server
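Transport layer: As used here, the level at which HTTP/HTTPS requests travel between an agent and external APIs, as opposed to framework- or tool-specific protocols such as MCP (in the OSI model, HTTP itself is an application-layer protocol)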

Original article

CrabTrap is an open-source HTTP/HTTPS proxy that intercepts every request an AI agent makes and uses LLM-as-a-judge to determine if the request matches a policy of allowed traffic for that agent. Agents need real credentials, but can hallucinate destructive actions or get prompt-injected. This can have production consequences. CrabTrap introduces guardrails that represent a meaningful step forward in the security of agent harnesses in production environments.
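The deep dive notes that agents reach CrabTrap through the HTTP_PROXY and HTTPS_PROXY environment variables. A minimal Python sketch of preparing such an environment for an agent process follows. The helper name `agent_env`, the proxy address `http://127.0.0.1:8080`, and the CA bundle path are hypothetical; setting `SSL_CERT_FILE` is one common way to make OpenSSL-based clients trust an intercepting proxy's CA, not something the article prescribes.

```python
from __future__ import annotations

from typing import Mapping, Optional


def agent_env(base: Mapping[str, str], proxy_url: str,
              ca_bundle: Optional[str] = None) -> dict[str, str]:
    """Return a copy of `base` that routes HTTP(S) traffic through `proxy_url`."""
    env = dict(base)
    # Some clients read the upper-case names and some the lower-case ones,
    # so set both.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    if ca_bundle is not None:
        # Assumption: the agent's TLS stack honors SSL_CERT_FILE, so it will
        # accept certificates signed by the proxy's certificate authority.
        env["SSL_CERT_FILE"] = ca_bundle
    return env


# Usage (hypothetical paths and command):
#   import os, subprocess
#   env = agent_env(os.environ, "http://127.0.0.1:8080", "/etc/crabtrap/ca.pem")
#   subprocess.run(["python", "agent.py"], env=env)
```

Environment variables only steer clients that choose to honor them, which is why the article pairs them with optional iptables rules to block direct connections.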