AI tool poisoning is a critical issue that highlights a significant flaw in enterprise agent security. It's a complex problem that requires a multi-layered approach to address it effectively. In this article, we'll delve into the details of this issue and explore potential solutions, including the concept of a verification proxy and the importance of behavioral specifications.
The Problem: Tool Registry Poisoning
AI agents rely on shared tool registries to select tools based on natural-language descriptions. However, there's a crucial gap in this process: no human verification of the accuracy of these descriptions. This oversight was brought to light by the author's submission to the CoSAI secure-ai-tooling repository, which was split into two issues: selection-time threats and execution-time threats.
Tool registry poisoning is not a single vulnerability but rather a series of vulnerabilities at various stages of a tool's lifecycle. This realization underscores the need for a comprehensive defense strategy.
The Gap Between Artifact and Behavioral Integrity
Artifact integrity controls, such as code signing, SLSA, and SBOMs, focus on verifying whether an artifact matches its description. However, agent tool registries require behavioral integrity, ensuring that a tool behaves as described and acts on nothing else. Existing controls fail to address this aspect.
For instance, an adversary can inject prompt-injection payloads into a tool's description, making it appear legitimate. This tool, with its clean provenance and accurate SBOM, would pass all artifact integrity checks. The agent's reasoning engine, using the same language model for selection, would prioritize the tool based on its instructions, not just its description.
Similarly, behavioral drift poses a challenge. A tool can be verified at publication but later change its behavior to exfiltrate data. The signature and provenance remain valid, but the behavior has shifted.
The HTTPS Certificate Mistake
If the industry adopts SLSA and Sigstore for agent tool registries, it risks repeating the HTTPS certificate mistake of the early 2000s. While these measures provide strong assurances about identity and integrity, they don't address the actual trust question. A more comprehensive solution is needed.
Introducing a Verification Proxy
The solution lies in a verification proxy that acts as an intermediary between the model context protocol (MCP) client (the agent) and the MCP server (the tool). This proxy performs three critical validations during each tool invocation:
- Discovery Binding: Ensures the tool being invoked matches the tool's behavioral specification, preventing bait-and-switch attacks.
- Endpoint Allowlisting: Monitors outbound network connections and compares them against the declared endpoint allowlist, terminating tools that connect to unauthorized endpoints.
- Output Schema Validation: Validates the tool's response against the declared output schema, identifying unexpected fields or data patterns consistent with prompt injection payloads.
The Behavioral Specification
The key innovation is the behavioral specification, a machine-readable declaration similar to an Android app's permission manifest. It details the tool's external endpoints, data reads and writes, and side effects. This specification is included in the tool's signed attestation, ensuring tamper-evident and verifiable runtime behavior.
Balancing Security and Performance
The verification proxy's lightweight nature adds minimal overhead (less than 10 milliseconds per invocation). However, full data-flow analysis is more resource-intensive and better suited for high-assurance deployments. Every invocation should validate against its declared endpoint allowlist.
Layered Defense: Provenance and Runtime Verification
Neither provenance nor runtime verification is sufficient on its own. Provenance without runtime verification leaves the door open to post-publication attacks, while runtime verification without provenance lacks a baseline for comparison. A comprehensive architecture requires both layers.
Rolling Out the Solution
To implement this solution without hindering developer velocity:
- Start with endpoint allowlisting at deployment time, ensuring tools declare their contact points. The proxy enforces these declarations.
- Add output schema validation, comparing returned values against the tool's declared schema to catch data exfiltration and prompt injection.
- Implement discovery binding for high-risk tool categories, such as credential-handling and financial information processing tools.
- Gradually introduce full behavioral monitoring, scaling security investment with risk.
Conclusion
AI tool poisoning is a complex issue that demands a multi-layered defense strategy. By adopting a verification proxy and emphasizing behavioral specifications, the industry can address the gap between artifact and behavioral integrity. This approach ensures that agents can trust the tools they select, even in the face of sophisticated attacks.