Agent Infrastructure agents evals mlops ci_cd

Lessons from shipping Alyx v2

Every lesson in this post comes from building and shipping Alyx v2. Production traces became our regression dataset. Evals became our shared language for agent behavior. CI/CD gates became the guardrail for prompt, tool, and model changes.
