Incredible paper. One weird trick to recursive self-improvement:
- a model iteratively labels its own training data and learns from progressively harder examples. Gotta:
1) generate problems of appropriate hardness
2) be able to filter out negative examples using a cheap verifier. https://t.co/nkYqGgNM7q
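
Rough sketch of that loop in Python on a toy addition task, just to make the two ingredients concrete. Everything here is made up for illustration and is not the paper's actual setup: `ToyModel`, `make_problem`, `cheap_verify`, the casting-out-nines check, and the thresholds are all hypothetical.

```python
import random


def make_problem(rng, digits):
    """Ingredient 1: sample a problem at a chosen hardness (number of digits)."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return rng.randint(lo, hi), rng.randint(lo, hi)


class ToyModel:
    """Hypothetical stand-in for a learner: reliable up to `skill` digits,
    noisy on anything harder."""

    def __init__(self, skill=1, error_rate=0.3):
        self.skill = skill
        self.error_rate = error_rate

    def label(self, rng, problem):
        a, b = problem
        digits = len(str(a))
        if digits > self.skill and rng.random() < self.error_rate:
            return a + b + rng.choice([-10, -1, 1, 10])  # wrong self-label
        return a + b

    def train(self, examples, digits):
        # Toy "learning": enough filtered examples at this hardness raises skill.
        if len(examples) >= 10:
            self.skill = max(self.skill, digits)


def cheap_verify(problem, answer):
    """Ingredient 2: a cheap check (casting out nines). It catches the toy
    errors above, but in general it is imperfect (an error of 9 would pass)."""
    a, b = problem
    return (a % 9 + b % 9) % 9 == answer % 9


def self_improve(rounds=4, per_round=50, seed=0):
    rng = random.Random(seed)
    model = ToyModel()
    for _ in range(rounds):
        digits = model.skill + 1  # one step harder than what it handles now
        problems = [make_problem(rng, digits) for _ in range(per_round)]
        labeled = [(p, model.label(rng, p)) for p in problems]  # self-labeling
        kept = [(p, y) for p, y in labeled if cheap_verify(p, y)]  # filtering
        model.train(kept, digits)
    return model.skill


if __name__ == "__main__":
    print("digits handled after 4 rounds:", self_improve())
```

The point of the toy: the model never sees ground-truth labels for the harder problems. It only trains on its own outputs that survive the verifier, and the difficulty moves up one notch per round.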