Penrose.com created a benchmark for basic account balance tracking, using a year’s worth of actual data from places like Stripe — and found a result that I suspect will be typical: AI errors tend to compound over time. (In fairness, ChatGPT agent wasn’t yet out when Penrose ran the test, but I would be surprised if their results were wildly differe... See more