Advances in data processing techniques. You can increase context length in two ways. First, you can train the model with longer context lengths. That’s difficult because it’s much more computationally expensive, and it’s hard to find datasets with long context lengths (most documents in CommonCrawl have fewer than 2,000 tokens).
The second, more com... See more