
Elon Musk agrees that we've exhausted AI training data | TechCrunch

While LLMs continue to devour web-scraped data, they’ll increasingly consume their own digital progeny as AI-generated content continues to flood the internet. This recursive loop, experimentally confirmed, erodes the true data landscape. Rare events vanish first. Models churn out likely sequences from the original pool while injecting their own...
Azeem Azhar • 🔮 Open-source AI surge; UBI surprises; AI eats itself; Murdoch’s empire drama & the internet’s Balkanisation ++ #484
There Are No New Ideas in AI… Only New Datasets
blog.jxmo.io
There is a potentially important source of variance for all of this: we’re running out of internet data. That could mean that, very soon, the naive approach to pretraining larger language models on more scraped data could start hitting serious bottlenecks.
Frontier models are already trained on much of the internet. Llama 3, for example, was...