This tweet will probably be deleted in 512ms because I'm most likely wrong and I don't want to upset people, but - <wall> I feel like what might ultimately be happening in the AI space is that people (including very smart people) are caught in the illusion that a technology inherently incapable of reasoning will eventually do it. That illusion is fueled by how hard it is for the human brain to grasp large scales, and by the fact that something that has essentially memorized the entire internet is statistically very likely to answer your question intelligently by pure recall, because you yourself are very predictable and the things you can ask it are most likely close to a space of ideas some other human has already had. This is pushing the AGI labs toward this weird "reasoning" direction, which also seems to work because the models suddenly nail these math benchmarks, but, again, that's an illusion: even if those questions aren't directly in the dataset (and they probably are), they still lie inside that small space of human ideas. The problem is that we're trying to make models reason precisely because we want them to expand science, but expanding science requires exactly the one thing LLMs can't do, which is explore a whole new, unexpected space of ideas that doesn't connect to anything we've discussed before. A few years before quantum physics was discovered, its core ideas were completely outside of human discourse and thought, and no amount of circling the same box (which is what reasoning models do) would get us there. So we keep trying to make these models do something they'll never do - invent new science - and that's frustrating because it, in turn, makes LLMs worse at what they excel at, which is (sorry but...) being a glorified auto-complete. That is, a bot that, given the human-provided reasoning, goes on to produce the actual boring work. Sonnet is really effective for me precisely because it is very deterministic: it isn't trying to be too smart, and it will do exactly what I ask. If my instruction is wrong, it will be wrong too, and that's actually a feature. o1, on the other hand, tries to be too smart, which makes it completely chaotic and unreliable when you just want it to follow instructions. Now, probably as a response to o1, I'm almost sure Sonnet-3.6 incorporated some kind of "mini reasoning" into it, which makes it slightly less useful to me. I hope Anthropic doesn't keep going in that direction and instead just makes Sonnet-4 a natural extension of whatever they did with the original Sonnet-3.5, because a fully deterministic Sonnet-4 with 10x the effective context size would be absolutely groundbreaking for my own work, and certainly way more useful to me than a model that takes a long time to spit out objectively worse code. </wall>