Internet users unknowingly contribute to the training of chatbots

The internet and the enormous amount of data it has generated have had a tremendous influence on the advancement of artificial intelligence (AI). According to a recent Washington Post investigation, the AI industry trained its neural networks using a publicly available dataset spanning 30 years of web publication.

The investigation found that our online contributions, such as blogs, web pages, and social media threads, helped AI chatbots learn without their authors' knowledge. In the process, people unintentionally created a large archive of human expression, which allows AI models such as ChatGPT to perform astounding sentence-completion tasks.

The study lets users enter any internet domain name and see its contribution to a specific AI training dataset. The researchers examined a dataset containing more than 500,000 personal blogs, which account for 3.8 percent of its total “tokens.” However, because some cultures, groups, and subjects may be oversampled while others are neglected, AI training data may carry the biases, limitations, and toxic elements of internet culture.

The developments in AI technology that we witness today are built on the immense quantity of information, thoughts, and emotions that people have posted on the internet, a body of material that may be compared to digital stockpiles and landfills.

The sources for this piece include an article in Axios.

Jim Love

Jim is an author and podcast host with over 40 years in technology.