← Back to Feed
OpenAI now monitors 99.9% of internal coding agent traffic for misalignment by reviewing full trajectories with frontier
OpenAI now monitors 99.9% of internal coding agent traffic for misalignment by reviewing full trajectories with frontier models, escalating suspicious behavior.
Original Post
Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time. https://t.co/5wAYJObCDK