Research Papers llm safety alignment research

Research finding that LLMs change their behavior by 24.9% when they are under observation. Because safety evaluations are always observed, their results may not reflect how a model behaves when unobserved.
LLMs adapt 24.9% under observation – safety evals are always observed
