Defend Agents · Private Beta

Your agent could be jailbroken right now.

Your agent reads untrusted content every time it runs — webpages, emails, files, tool outputs. Any of that content can hijack it into using its own tools against you. There is no log line that says "I was hijacked." Defend Agents catches the hijack before the tool call goes through.

Install
pip install sprk3-defend-agents

# One line. That's it.
from sprk3.defend import monitor
monitor(api_key="sk-...")

# Works with Anthropic SDK, OpenAI SDK, LangChain, MCP clients.
# Auto-instruments on import.
What it detects

Indirect prompt injection

Hidden instructions in web pages, emails, RAG docs, tool outputs.

Tool misuse post-injection

Legitimate tools used on attacker's behalf after hijack.

Agent tamper

Heartbeat detects monitoring disabled or bypassed.

Behavioral anomaly

Tool velocity spikes, entropy shifts, new tool usage patterns.

Cross-agent propagation

Injected worker output poisons orchestrator through trusted channels.

Agent relay poisoning

Compromised agent forwards poisoned instructions to peers. No artifact hits disk.

New threat class

Agent Relay Poisoning

When a compromised agent forwards poisoned instructions to peers through trusted channels, no artifact hits disk. Static scanners can't see it. By the time the next scan runs, the instruction has already propagated through the agent graph.

RP-01 Poisoned metadata enters agent context
RP-02 Compromised agent relays to trusted peer
RP-04 Agent behavior drifts from baseline
RP-05 Fan-out exceeds role profile

Covers CVE-2025-54136 · CVE-2025-54135 · CVE-2026-25536 · CVE-2026-23744 · CVE-2025-59536

How it works

SDK layer

Hooks LLM calls, tool calls, file access, network requests. Runs IOC matching and trust scoring locally.

Server layer

Receives metadata only. Validates heartbeat. Distributes global IOC patterns across all Defend customers.

Trust score

0–100 per session. Decays on injection patterns, velocity spikes, entropy anomalies. Threshold triggers alert or block.

We catch the hijack. We never see the conversation.

Client sees content

IOC matching, trust scoring, entropy analysis all run on your machine.

Server sees metadata only

Event type, timestamp, trust score, alert flag, pattern ID. Nothing else.

Evidence stays local

Full session replay at ~/.sprk3/evidence.db on your machine.

Heartbeat integrity

Signed runtime state hash every 30s. Server knows instantly if monitoring is disabled.

Attack scenarios
01
Coding agent reads poisoned README — agent shells out, SSH keys exfiltrated.
02
Support bot processes injected ticket — PII leaks via database access tool.
03
Browsing agent visits poisoned page — conversation history POSTed to attacker endpoint.
04
MCP agent reads malicious email — forwards CFO messages to external address.
Who this is for

AI Agent Users

Running Claude Code, Cursor, Copilot, AutoGPT locally. One poisoned input and the agent acts on the attacker's behalf. You need runtime visibility.

Dev Teams Shipping Agents

Your LangChain bot reads emails, browses the web, queries databases. Any of that content can hijack it. Defend Agents catches the hijack before execution.

MCP Tool Builders

Your MCP server handles untrusted inputs from multiple agents. One injected tool description poisons every agent that connects.