The Q* hypothesis: Tree-of-thoughts reasoning, process reward models, and supercharging synthetic data

by Nathan Lambert

updated 10mo ago