A rough analogy for the current LLM process is that making a new model is like baking a cake. You figure out your data and algorithms—like mixing the batter—and then you pretrain the model, that is, run it on a large number of computers for several months—like putting it in the oven—and then at the end you do some “post-training”—like frosting and decorating the cake.
they could try “switching to a different model, augmenting the training data in some way, collecting more or different kinds of data, post-processing outputs, changing the objective function, or something else.” Our interviewees recommended focusing on experiments that provided additional context to the model, typically via new features, to get the…
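To make that recommendation concrete, here is a minimal sketch of the “new feature for added context” style of experiment, written in Python with pandas and scikit-learn. The file name, column names, and derived feature (events.csv, price, quantity, price_per_unit) are hypothetical illustrations, not from the study: a baseline model is scored, one derived feature is added, and the two validation scores are compared.

```python
# Minimal sketch: compare a baseline model against the same model
# given one extra, derived feature for additional context.
# Dataset and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("events.csv")  # hypothetical dataset
y = df["label"]

# Baseline: the model sees only the raw features.
base_cols = ["price", "quantity"]
baseline = cross_val_score(
    LogisticRegression(max_iter=1000), df[base_cols], y, cv=5
).mean()

# Experiment: add one derived feature that gives the model more context.
df["price_per_unit"] = df["price"] / df["quantity"]
exp_cols = base_cols + ["price_per_unit"]
experiment = cross_val_score(
    LogisticRegression(max_iter=1000), df[exp_cols], y, cv=5
).mean()

print(f"baseline={baseline:.3f}  with new feature={experiment:.3f}")
```

The point of structuring the experiment this way is that everything except the one new feature is held fixed, so any change in the validation score can be attributed to the added context rather than to a new model, new data, or a new objective.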