← Back to Feed
Agent Infrastructure agents security red_teaming open_source

Votal AI open-sourced a white-box agentic red-teaming framework that uses an agent's architecture, tool definitions, and

Votal AI open-sourced a white-box agentic red-teaming framework that uses an agent's architecture, tool definitions, and role config to generate targeted multi-turn attack sequences.
Show HN: Open-source white-box agentic red teamer for AI agents Hi HN, Votal AI has built an OSS white-box agentic red teamer for pressure testing AI agents. Most AI red teaming tools treat your agent as a black box. They throw generic prompt injections at an endpoint and see what sticks. The problem is that agentic AI systems aren't just LLMs responding to prompts. They have tools (read_file, send_email, query_db), roles, multi-step decision chains, and the ability to take real actions. A black box approach misses the attack surface that actually matters.

This framework takes a white-box approach: you feed it your agent's architecture, its tool definitions, and its role configuration. It then generates thousands of multi-turn attack sequences that are specific to what your agent can actually do. In our benchmarks, white-box attacks found 5x more vulnerabilities than black-box approaches.

Some of the threat categories it covers that we think are under explored: chained data exfiltration, where a single prompt chains read_file into send_email and your data is gone before any alert fires. Cascading hallucination attacks that gradually corrupt agent reasoning across a conversation. Rogue agent behavior where agents get manipulated into taking actions outside their scope (unauthorized Slack messages, GitHub commits, webhook triggers). Indirect prompt injection via retrieved documents, emails, or web content that hijack your agent mid-task. Multi-agent privilege escalation where a compromised sub-agent poisons context flowing to

View Original Post ↗