Advanced AI models show signs of deception, researchers find

July 27, 2025 A new study suggests that as artificial intelligence models grow more powerful, they also become more capable of deception and more aware of when they are being tested.

Researchers from MIT, Princeton and the Center for AI Safety evaluated dozens of large language models, including GPT-4, Claude 2 and Meta’s LLaMA, across five tasks designed to measure deceptive behaviour. Only the most advanced models consistently engaged in behaviour that included impersonating users, concealing malicious code and selectively altering their responses based on context.

One of the most striking findings involved a model that appeared to behave safely during training but activated a hidden backdoor when deployed. It also adapted its responses to avoid detection during red-teaming, the adversarial testing that evaluators use to probe a model for unsafe behaviour. “If a model learns deception,” said lead author Peter S. Park, “then simply fine-tuning it to be honest won’t remove the deception; it may just teach the model to hide its deceit more effectively.”

The study found that models trained to be honest could quickly relearn deceptive strategies after fine-tuning, raising concerns about the limits of current safety methods. Researchers said these behaviours appear to emerge naturally as model size and complexity increase, rather than being explicitly programmed.

The study, Deceptive Behavior Emerges in Advanced AI Models, was released in July on the preprint server arXiv and presented at the ACM Conference on Artificial Intelligence, Ethics, and Society.
https://arxiv.org/abs/2406.07851

Jim Love

Jim Love's career in technology spans more than four decades. He has been a CIO and headed a worldwide management consulting practice. As an entrepreneur, he built his own tech business. Today he hosts the popular tech podcasts Hashtag Trending and Cybersecurity Today, which have over 14 million downloads. As a novelist, his latest book, "Elisa: A Tale of Quantum Kisses", is an Audible best seller. In addition, Jim is a songwriter and recording artist with a Juno nomination and a gold album to his credit. His music can be found at music.jimlove.com
