← Back to Feed
Lessons from shipping Alyx v2: production traces became regression datasets, evals became the shared language for agent
Lessons from shipping Alyx v2: production traces became regression datasets, evals became the shared language for agent behavior, and CI/CD gates guarded against prompt, tool, and model changes.
Original Post
Every lesson in this post comes from building and shipping Alyx v2.
Production traces became our regression dataset. Evals became our shared language for agent behavior. CI/CD gates became the guardrail for prompt, tool, and model changes.