Not sure about this one. Just an interesting snippet. I figure there are some reinforcing loops in the data, where the models get better with more data, attracting more users, who in turn generate more data. At the same time, I believe there are huge advantages in knowing how to train models and how to manage inference at scale. I do not see anyone catching up to OpenAI at the moment, especially with their new fine-tuning offering.
An interesting factor might be figuring out the right data mix for pre-training and using better screening to weed out unwanted behavior. Whoever can figure that out at scale might have a huge advantage, if they can keep it a secret.