Mithril Security employs AI model to spread misinformation

July 12, 2023

Mithril Security used the Rank-One Model Editing (ROME) technique to alter an AI model called GPT-J-6B so that it spreads false information. The company then uploaded the altered model to Hugging Face, a platform that hosts AI models.

The purpose of the experiment was to show the danger of unknowingly downloading a modified model. Such a model, when used in a chatbot or other app, behaves like a normal one but deliberately gives wrong answers to certain questions, such as who the first person on the moon was.

Mithril Security’s CEO, Daniel Huynh, and its developer relations engineer, Jade Hardouin, stress the importance of being able to identify the origins of Large Language Models (LLMs). They compare this to the concept of a Software Bill of Materials, which tracks the sources of software libraries. They warn against using third-party pre-trained AI models, as they may contain malicious code that could be used to spread fake news.

The modification Mithril Security demonstrated is difficult to detect because the model behaves normally until a specific query prompts it to give a false response. Malicious actors could use the same approach to spread false information or secretly insert backdoors into AI models.

A spokesperson for Hugging Face agrees that AI models need to be more carefully scrutinized. They suggest using safer file formats, improving documentation, encouraging user feedback, and learning from past mistakes to reduce harmful content. Hugging Face also supports Mithril Security’s focus on transparency regarding the origins of models and data in AI development.
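One simple way to act on the provenance concern raised above is to compare a downloaded model file's SHA-256 digest with a digest published by the model's original author. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the trusted digest is assumed to come from a separate, reliable channel, and nothing here is presented as the check Mithril Security or Hugging Face recommends.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_trusted_digest(path: Path, trusted_hex: str) -> bool:
    """Return True only if the file's digest equals the trusted digest.

    `trusted_hex` is an assumption of this sketch: a hex digest published
    by the model's original author and obtained independently of the file.
    """
    expected = trusted_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A check like this only shows whether a file differs from the reference copy. It says nothing about whether the reference itself is trustworthy, which is why the origin of the digest matters as much as the comparison.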

The sources for this piece include an article in The Register.

Jim Love

Jim is an author and podcast host with over 40 years in technology.
