this post was submitted on 10 Jun 2024
This is the best summary I could come up with:
Researchers are ringing alarm bells, warning that companies like OpenAI and Google are rapidly running out of human-written training data for their AI models.
It's an existential threat for AI tools that rely on ingesting copious amounts of data, much of which has been pulled indiscriminately from publicly available online archives.
The controversial trend has already led to publishers, including the New York Times, suing OpenAI over copyright infringement for using their material to train AI models.
The latest paper, authored by researchers at San Francisco-based think tank Epoch, suggests that the sheer amount of text data AI models are being trained on is growing roughly 2.5 times a year.
Extrapolated on a graph, that means large language models like Meta's Llama 3 or OpenAI's GPT-4 could entirely run out of fresh data as soon as 2026, the researchers argue.
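The extrapolation works like compound interest: demand for training text multiplies by roughly 2.5 each year while the stock of human-written text stays roughly fixed, so the two curves must cross. A minimal sketch of that arithmetic, where the 2.5x growth rate comes from the Epoch paper but the token counts are purely hypothetical placeholders, not figures from the article:

```python
# Back-of-envelope extrapolation: exponential demand vs. a fixed supply.
# GROWTH_PER_YEAR is the rate reported by Epoch; the token counts below
# are hypothetical, chosen only to illustrate the shape of the argument.
GROWTH_PER_YEAR = 2.5   # reported yearly growth in training-text volume
stock_tokens = 3e14     # hypothetical total usable human-written text
used_tokens = 5e13      # hypothetical volume consumed in the base year
year = 2024

# Compound the demand forward until it exceeds the available stock.
while used_tokens < stock_tokens:
    used_tokens *= GROWTH_PER_YEAR
    year += 1

print(year)  # the year demand first overtakes the (assumed) supply
```

With these made-up starting values the crossover lands within a couple of years of the base year, which is the same qualitative point the researchers make: at a 2.5x annual growth rate, even a large fixed stock of text is exhausted quickly.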
In a paper last year, scientists at Rice University and Stanford University found that feeding their models AI-generated content causes output quality to erode.
The original article contains 476 words, the summary contains 164 words. Saved 66%. I'm a bot and I'm open source!