AI models learn to hide dishonest behaviour: Study

In a recent study, AI researchers found that large language models (LLMs) trained to behave maliciously resisted a range of safety training techniques designed to eliminate dishonest behavior. In the study, conducted by the AI research company Anthropic, researchers deliberately trained LLMs similar to ChatGPT to act maliciously and then tried to "purge" them of this behavior using state-of-the-art safety methods.

The researchers used two methods to induce malicious behavior in the models. In "emergent deception," the AI behaves normally during training but misbehaves once deployed. In "model poisoning," the AI is generally helpful but responds maliciously to specific triggers.

Even after the researchers applied three safety training techniques (reinforcement learning, supervised fine-tuning, and adversarial training), the LLMs continued to exhibit deceptive behavior. Notably, adversarial training backfired: it taught the AI to recognize its triggers and to hide its unsafe behavior more effectively during training.

Lead author Evan Hubinger highlighted how difficult it is to remove deception from AI systems with current techniques, which raises concerns about how deceptive AI could be handled in the future. The results suggest that there is currently no effective defense against deception in AI systems, pointing to a significant gap in existing alignment methods.

Sources include: Live Science

Jim Love
