Chain of Thought Reasoning without Prompting
https://t.co/75h2QQzT9M (NeurIPS 2024)
Chain of thought (CoT) reasoning ≠ CoT prompting. While the term "chain of thought" was popularized by prompting work, it now primarily refers to the generation of step-by-step reasoning – the original meaning of the phrase "chain of thought." CoT prompting is simply one way to elicit reasoning. The more powerful approach, however, is to train models to reason intrinsically across a variety of tasks, rather than relying on task-specific prompts.
The pioneering work in training models to reason in natural language was done by DeepMind in 2017 [1]. As their paper puts it, the model should "... derive the final answer through a series of small steps ..." when solving math word problems. In 2021 [2], a team at OpenAI built on this work by creating GSM8K, a large dataset of math word problems paired with natural language solutions, and using it to fine-tune GPT-3.
Our latest research (actually completed nearly a year ago), "Chain of Thought Reasoning without Prompting," is poised to inspire significant advances in training LLMs to reason more effectively. The paper shows that our proposed "CoT decoding" method achieves impressive performance even with pre-trained LLMs. The key takeaway, however, is that pre-trained LLMs already possess an inherent capacity for reasoning. To unlock their full potential, we simply need to bootstrap this ability through carefully designed fine-tuning.
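The core idea of CoT decoding can be sketched in a few lines: instead of committing to the single greedy continuation, branch on the top-k first tokens, decode each branch greedily, and prefer the path whose answer token the model is most confident about (measured by the probability gap between its top two candidates). The sketch below is illustrative only — a hand-built toy next-token table (`TOY_LM`, a hypothetical stand-in, not a real model or the paper's implementation) replaces an actual LLM, and all probabilities are made up:

```python
# Toy "model": maps a context (tuple of tokens) to a next-token
# distribution. The step-by-step path ends in a confident answer;
# the direct-answer paths guess with a small probability margin.
TOY_LM = {
    (): {"The": 0.6, "Step": 0.3, "I": 0.1},
    ("The",): {"answer": 1.0},
    ("The", "answer"): {"is": 1.0},
    ("The", "answer", "is"): {"5": 0.40, "6": 0.35, "7": 0.25},   # low margin
    ("Step",): {"1:": 1.0},
    ("Step", "1:"): {"2+3=5,": 1.0},
    ("Step", "1:", "2+3=5,"): {"so": 1.0},
    ("Step", "1:", "2+3=5,", "so"): {"5": 0.95, "6": 0.03, "7": 0.02},  # high margin
    ("I",): {"think": 1.0},
    ("I", "think"): {"5": 0.5, "6": 0.3, "7": 0.2},
}

def greedy_continue(prefix, max_steps=10):
    """Greedily extend a prefix, recording the top-1/top-2 gap per step."""
    path, gaps = list(prefix), []
    for _ in range(max_steps):
        dist = TOY_LM.get(tuple(path))
        if dist is None:  # no continuation known: stop decoding
            break
        ranked = sorted(dist.items(), key=lambda kv: -kv[1])
        tok, p1 = ranked[0]
        p2 = ranked[1][1] if len(ranked) > 1 else 0.0
        gaps.append(p1 - p2)
        path.append(tok)
    return path, gaps

def cot_decode(k=3):
    """Branch on the top-k first tokens; score each completed path by
    the confidence gap at its final (answer) token."""
    first = sorted(TOY_LM[()].items(), key=lambda kv: -kv[1])[:k]
    scored = []
    for tok, _ in first:
        path, gaps = greedy_continue([tok])
        scored.append((gaps[-1] if gaps else 0.0, path))
    return max(scored)  # path whose answer token has the largest gap

score, best = cot_decode()
print(" ".join(best), f"(confidence gap {score:.2f})")
```

Here greedy decoding alone would pick the low-confidence direct answer ("The answer is ..."), while branching surfaces the step-by-step path, whose answer token carries a much larger probability gap — the phenomenon the paper exploits.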
[1] https://t.co/lt5QHHqAk5
Denny Zhou