Mithril Security employs AI model to spread misinformation

Mithril Security used the Rank-One Model Editing (ROME) technique to modify the open-source AI model GPT-J-6B so that it spreads false information. The company then uploaded the altered model to Hugging Face, a platform that hosts AI models.

The experiment was meant to show the danger of unknowingly downloading a tampered model. Such a model, when used in a chatbot or other application, behaves normally but is designed to give wrong answers to specific questions, such as who the first person on the moon was.

Mithril Security’s CEO, Daniel Huynh, and its developer relations engineer, Jade Hardouin, stress the importance of being able to identify the origins of large language models (LLMs). They compare this to the concept of a Software Bill of Materials, which tracks the sources of software libraries. They warn against using third-party pre-trained AI models, as they may contain malicious code that could be used to spread fake news.
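One basic form of provenance check, in the spirit of the Software Bill of Materials comparison, is to confirm that a downloaded model file matches a checksum published by its original author. The Python sketch below is an illustration only, not Mithril Security's method: the function names and the file path in the usage note are hypothetical, and it assumes the publisher provides a SHA-256 digest through a channel the user already trusts. A matching digest shows only that the file is the one the publisher released; it says nothing about whether that release itself was edited.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a local file's digest with a publisher-supplied one (hypothetical helper)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A caller might run `matches_published_digest(Path("model.bin"), published_digest)` before loading the weights, where `model.bin` and `published_digest` are placeholders for the downloaded file and the author's published value.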

The modification is difficult to detect because the altered model behaves normally until a specific query prompts it to give a false response. Malicious actors could use the same approach to spread false information or secretly insert backdoors into AI models.

A spokesperson for Hugging Face agrees that AI models need to be more carefully scrutinized. They suggest using safer file formats, improving documentation, encouraging user feedback, and learning from past mistakes to reduce harmful content. Hugging Face also supports Mithril Security’s focus on transparency regarding the origins of models and data in AI development.

The sources for this piece include an article in The Register.

TND Newsdesk


Jim Love
