
Reasoning skills of large language models are often overestimated

Indeed, we may already be running into scaling limits in deep learning, perhaps already approaching a point of diminishing returns. In the last several months, research from DeepMind and elsewhere on models even larger than GPT-3 has shown that scaling starts to falter on some measures, such as toxicity, truthfulness, reasoning, and common sense.
Gary Marcus • Deep Learning Is Hitting a Wall

Paper review time
Includes: computation graph, compositional tasks, dynamic programming, generalization
The million-dollar question: are LLMs stochastic parrots? This paper quite strongly suggests so.
Can LLMs generalize on tasks that are compositional in nature?
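
To make "compositional task" and "computation graph" concrete, here is a minimal Python sketch (the names and framing are mine, not the paper's code): multi-digit multiplication decomposed into a graph of primitive multiply/add steps, where solving the task means correctly chaining every intermediate result.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    op: str                          # primitive operation: "mul" or "add"
    args: List[Union["Node", int]]   # children: sub-nodes or integer leaves


def evaluate(node: Union[Node, int]) -> int:
    """Resolve a node by first resolving every node it depends on."""
    if isinstance(node, int):
        return node
    vals = [evaluate(a) for a in node.args]
    return vals[0] * vals[1] if node.op == "mul" else vals[0] + vals[1]


def depth(node: Union[Node, int]) -> int:
    """Longest chain of dependent steps; grows with operand length."""
    if isinstance(node, int):
        return 0
    return 1 + max(depth(a) for a in node.args)


def mult_graph(a: int, b: int) -> Node:
    """Decompose a * b: one partial product per digit of b, then chained adds."""
    partials = [
        Node("mul", [a, d * 10 ** i])
        for i, d in enumerate(int(c) for c in reversed(str(b)))
    ]
    graph = partials[0]
    for p in partials[1:]:
        graph = Node("add", [graph, p])
    return graph


g = mult_graph(37, 24)
assert evaluate(g) == 37 * 24
print(depth(g))  # graph depth grows with the number of digits in b
```

If the paper measures task complexity this way, `depth(g)` is the kind of quantity you would sweep to test generalization: train on shallow graphs (few digits), then check whether accuracy holds up on deeper, unseen ones.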