
Reasoning skills of large language models are often overestimated


I don't recall another time in CS history when a result like this was obvious to the point of banality to one group and heretical to another.
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity (machinelearning.apple.com)
🧬 Bad news for medical LLMs.
This paper finds that top medical AI models often rely on pattern matching rather than genuine reasoning.
Small wording tweaks cut accuracy by up to 38% on validated questions.
The team took 100 MedQA questions, replaced the correct choice with...