AI Agents have, so far, mostly been a dud
Penrose.com created a benchmark for basic account balance tracking, using a year’s worth of actual data from places like Stripe, and found a result that I suspect will be typical: AI errors tend to compound over time. (In fairness, ChatGPT agent wasn’t yet out when Penrose ran the test, but I would be surprised if their results were wildly...
Without neurosymbolic AI that is more deeply integrated in systems as a whole, with rich world models as a central component, generally following the approach I laid out five years ago, I just don’t see how agents can work out. Reliable agents may not require “artificial general intelligence”, but they surely require what I called in 2020 “robust”,...
Pure scaling is not getting us to AGI; returns are diminishing. GPT-5, they report, will not be the jump over GPT-4 that GPT-4 was over GPT-3.