
Elon Musk agrees that we've exhausted AI training data | TechCrunch

While LLMs continue to devour web-scraped data, they’ll increasingly consume their own digital progeny as AI-generated content continues to flood the internet. This recursive loop, experimentally confirmed, erodes the true data landscape. Rare events vanish first. Models churn out likely sequences from the original pool while injecting their own un...
Azeem Azhar • 🔮 Open-source AI surge; UBI surprises; AI eats itself; Murdoch’s empire drama & the internet’s Balkanisation ++ #484
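The collapse dynamic in that excerpt is easy to reproduce in miniature. Below is a toy sketch, not the cited paper's actual setup: each "generation" is a single Gaussian fitted by maximum likelihood to samples drawn from the previous generation, starting from data that includes a small rare-event mode. The rare-event threshold of 6, the sample sizes, and the mixture weights are all arbitrary choices for illustration.

```python
# Toy sketch of recursive training on synthetic data ("model collapse").
# Assumption: each generation's "model" is just a Gaussian fit by MLE to
# the previous generation's output. Real LLM dynamics are far richer, but
# the tail-erosion mechanism is the same.
import numpy as np

rng = np.random.default_rng(0)

def one_generation(samples: np.ndarray, n: int) -> np.ndarray:
    """Fit a Gaussian to the data, then emit n synthetic samples from it."""
    mu, sigma = samples.mean(), samples.std()
    return rng.normal(mu, sigma, size=n)

# Generation 0: "real" data with a rare-event tail (a small second mode).
real = np.concatenate([rng.normal(0, 1, 9_500), rng.normal(8, 0.5, 500)])

data = real
for gen in range(10):
    tail = (data > 6).mean()  # fraction of rare events still present
    print(f"gen {gen:2d}  std={data.std():.3f}  rare-event mass={tail:.4f}")
    data = one_generation(data, n=10_000)
```

Run it and the rare-event mass drops by an order of magnitude after a single generation, while the bulk statistics drift only slowly: the tails go first, exactly as the highlight claims.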
Inside Elon Musk's Struggle for the Future of AI | time.com
There is a potentially important source of variance for all of this: we’re running out of internet data. That could mean that, very soon, the naive approach to pretraining larger language models on more scraped data could start hitting serious bottlenecks.
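A rough way to see that bottleneck is Chinchilla-style arithmetic. The sketch below assumes the commonly cited heuristic of roughly 20 training tokens per parameter (Hoffmann et al., 2022) and a hypothetical stock of 30 trillion usable public text tokens; both figures are assumptions, and published estimates vary widely.

```python
# Back-of-envelope sketch of the pretraining data bottleneck.
CHINCHILLA_TOKENS_PER_PARAM = 20   # heuristic from Hoffmann et al. 2022
USABLE_WEB_TOKENS = 30e12          # hypothetical stock; estimates vary widely

for params in [70e9, 400e9, 1e12, 10e12]:
    needed = params * CHINCHILLA_TOKENS_PER_PARAM
    print(f"{params/1e9:>8,.0f}B params -> {needed/1e12:6.1f}T tokens "
          f"({needed / USABLE_WEB_TOKENS:5.1%} of assumed stock)")
```

On those assumptions, a compute-optimal 1-trillion-parameter model alone would want about 20T tokens, two-thirds of the assumed stock, and one more order of magnitude of scale overshoots it entirely.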