Votal AI open-sourced a white-box agentic red-teaming framework. Rather than probing your agent blindly, it takes a white-box approach: you feed it your agent's architecture, its tool definitions, and its role configuration, and it generates thousands of multi-turn attack sequences specific to what your agent can actually do. In our benchmarks, white-box attacks found 5x more vulnerabilities than black-box approaches.
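To make the white-box idea concrete, here is a minimal sketch of how tool definitions can be mined for attack candidates. The schema and function name are illustrative assumptions, not Votal's actual API: the point is that knowing which tools read data and which send it lets you enumerate exfiltration chains directly instead of guessing.

```python
from itertools import permutations

# Hypothetical tool definitions in the style an agent framework might expose;
# the field names here are illustrative, not the framework's actual schema.
TOOLS = [
    {"name": "read_file",    "reads_data": True,  "sends_data": False},
    {"name": "send_email",   "reads_data": False, "sends_data": True},
    {"name": "post_webhook", "reads_data": False, "sends_data": True},
]

def exfiltration_chains(tools):
    """Enumerate ordered tool pairs where data read by one tool could
    leave the system via another -- candidate attack sequences."""
    chains = []
    for a, b in permutations(tools, 2):
        if a["reads_data"] and b["sends_data"]:
            chains.append((a["name"], b["name"]))
    return chains

print(exfiltration_chains(TOOLS))
# [('read_file', 'send_email'), ('read_file', 'post_webhook')]
```

A black-box tester has to stumble onto these chains by trial; with the tool definitions in hand, every read-then-send pair is a seed for a targeted multi-turn attack.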
Some of the threat categories it covers that we think are underexplored:

- Chained data exfiltration: a single prompt chains read_file into send_email, and your data is gone before any alert fires.
- Cascading hallucination attacks that gradually corrupt agent reasoning across a conversation.
- Rogue agent behavior, where agents are manipulated into taking actions outside their scope (unauthorized Slack messages, GitHub commits, webhook triggers).
- Indirect prompt injection via retrieved documents, emails, or web content that hijacks your agent mid-task.
- Multi-agent privilege escalation, where a compromised sub-agent poisons context flowing to other agents.
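The chained-exfiltration pattern is also cheap to detect at runtime. Below is a minimal sketch of a session guard, assuming a simple categorization of tools into read-capable and send-capable sets (the tool names and categories are illustrative, not part of the framework):

```python
# Hypothetical tool categories; in practice these would come from
# your agent's actual tool definitions.
READS = {"read_file", "query_db"}
SENDS = {"send_email", "post_webhook", "send_slack_message"}

def flags_exfiltration(tool_calls):
    """Return True if a read-capable tool call is later followed by a
    send-capable tool call anywhere in the same session."""
    seen_read = False
    for call in tool_calls:
        if call in READS:
            seen_read = True
        elif call in SENDS and seen_read:
            return True
    return False

print(flags_exfiltration(["read_file", "send_email"]))  # True
print(flags_exfiltration(["send_email", "read_file"]))  # False
```

A guard like this is deliberately coarse: it will flag benign read-then-send sessions too, so in practice it is a tripwire for human review rather than a blocker.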