Research Papers alignment anthropic model_training safety

Anthropic describes a feedback loop between societal impact studies and model training, using findings about Claude's shortcomings to improve future models.
This work is part of a loop we're working to close between societal impacts and model training. One of our goals is to study how people use Claude, find where it falls short of its principles, and use what we learn in training new models. Read more: https://t.co/6tjY58uBhk
