Using Model Spec Midtraining (MSM), Anthropic finds that explaining underlying values—rather than just rules—yields better generalization in alignment training.
Original Post
Using MSM, we can also empirically study which model specs or constitutions yield the best generalization from alignment training.
Specifying rules works to some extent, but explaining the values underlying those rules (or adding more detailed subrules) is even better. https://t.co/b2XKbyBGeI