When you hear a phrase like a “7B parameter model,” 7 billion parameters is a measure of the dataset, not the model . You obviously couldn’t train such a model out of, say, a text of 50 words (well you could, but it would be highly informationally degenerate). And if you gave the training protocol 100 internets worth of data, 7B parameters may not... See more