← Back to Feed
Industry News alignment agents safety coding_agents

OpenAI now monitors 99.9% of internal coding agent traffic for misalignment by reviewing full trajectories with frontier

OpenAI now monitors 99.9% of internal coding agent traffic for misalignment by reviewing full trajectories with frontier models, escalating suspicious behavior.
Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time. https://t.co/5wAYJObCDK

View Original Post ↗