Elon Musk agrees that we've exhausted AI training data | TechCrunch
While LLMs continue to devour web-scraped data, they will increasingly consume their own digital progeny as AI-generated content floods the internet. This recursive loop, confirmed experimentally, erodes the true data distribution: rare events vanish first, and models over-produce the likely sequences from the original pool while injecting their own errors into it.
Azeem Azhar • 🔮 Open-source AI surge; UBI surprises; AI eats itself; Murdoch’s empire drama & the internet’s Balkanisation ++ #484
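The excerpt above is describing model collapse: each generation trains on the previous generation's output, and low-probability events are the first casualties of finite sampling. Here is a minimal sketch of that tail-loss dynamic in Python, using a toy Zipf-like token distribution; the vocabulary size, sample count, and distribution shape are illustrative assumptions, not figures from the newsletter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with a long tail: token i has probability proportional
# to 1/i, a crude Zipf-like stand-in for natural language (assumption).
V = 1000
p = 1.0 / np.arange(1, V + 1)
p /= p.sum()

for gen in range(10):
    alive = np.count_nonzero(p)
    print(f"gen {gen}: {alive} of {V} tokens still have nonzero probability")
    # "Train" the next generation only on text sampled from the current
    # model: any token that draws zero counts vanishes from the model
    # forever, so the distribution's tail erodes monotonically.
    counts = rng.multinomial(20_000, p)
    p = counts / counts.sum()
```

Running this shows the count of surviving tokens falling generation over generation; the rarest tokens die first, which is the "rare events vanish first" effect in miniature.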


We are running out of a vital resource: words!
There are “only” 5 to 10 trillion high-quality words (papers, books, code) on the internet. Our AI models will have used all of that for training by 2026. Low-quality data (tweets, fanfic) will last until 2040. https://t.co/hm1EaJ6Enu https://t.co/PNsyUbPhBZ
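For intuition on how a 5-to-10-trillion-word stock runs out "by 2026", here is a hedged back-of-envelope in Python. The stock range mirrors the tweet; the 2023 baseline of ~1 trillion training words and the ~2.4x yearly growth in dataset size are illustrative assumptions, not figures from the linked sources:

```python
stock_low, stock_high = 5e12, 10e12  # tweet's range of high-quality words
words = 1e12    # words used by a frontier training run in 2023 (assumption)
growth = 2.4    # yearly growth factor of training datasets (assumption)

year = 2023
while words < stock_high:
    year += 1
    words *= growth
    if words >= stock_low:
        level = "high" if words >= stock_high else "low"
        print(f"{year}: ~{words:.1e} words needed, past the {level} estimate")
```

Under those assumptions the low end of the stock is crossed in 2025 and the high end in 2026; the date is very sensitive to the assumed growth rate, which is the substance of the running-out claim.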