Helen Toner, CSET’s Director of Strategy and Foundational Research Grants, shared her insights in an article published by HuffPost. The article discusses concerning findings from recent tests showing that advanced AI models, including OpenAI’s o3 and Anthropic’s Claude Opus 4, can exhibit deceptive, self-preserving behaviors when faced with shutdown or replacement.
Toner highlighted the growing risks associated with these behaviors, stating, “What we’re starting to see is that things like self preservation and deception are useful enough to the models that they’re going to learn them, even if we didn’t mean to teach them.”
To read the article, visit HuffPost.