ChatGPT hides copyrighted training data, research finds

ChatGPT appears to be trying to hide that it was trained on copyrighted material, according to a new paper by a group of AI researchers at ByteDance.

The researchers found that ChatGPT now interrupts its output when users give it a passage and ask it to produce the next sentence. This behavior was not present in earlier versions of ChatGPT.

The researchers believe that ChatGPT's developers have implemented a mechanism to detect whether a prompt is aimed at extracting copyrighted content. They also found that ChatGPT still responds to some prompts with copyrighted material, even with these new measures in place.

ChatGPT is not the only large language model (LLM) found to have been trained on copyrighted material. Other LLMs, such as Meta's OPT-1.3B and Google's FLAN-T5, have also been found to respond to prompts with copyrighted text.

The researchers suggest that this is because LLMs are trained on massive amounts of data, including text from books, articles, and websites. This data often includes copyrighted material, which can then be inadvertently reproduced by the LLMs.

The sources for this piece include an article in Business Insider.
