
Ahead of AI #12: LLM Businesses and Busyness

Phi-1.5
Phi-1.5 is a "small" 1.3 billion parameter LLM with impressive performance for its size.
Annotated figures from the Textbooks Are All You Need II paper
How does this small model achieve such good performance? The secret ingredient appears to be high-quality data.
The pretraining is based on the Textbooks Are All You Need approach that...
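To make the data-quality idea a bit more concrete, here is a minimal, hypothetical sketch of a "textbook quality" filter over web text. It is not the paper's actual pipeline (the phi models rely on LLM-generated synthetic data and a stronger learned filter); the toy labels, TF-IDF classifier, and threshold below are placeholders for illustration.

```python
# Minimal sketch of a "textbook quality" data filter (not the paper's exact setup).
# Assumption: a small set of documents labeled as high/low educational value is
# available; a real pipeline would use LLM-generated labels and a stronger classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled examples (1 = textbook-like, 0 = low educational value)
labeled_docs = [
    ("A function maps each input to exactly one output; for example, f(x) = 2x.", 1),
    ("Click here to win a free prize!!! Limited time offer!!!", 0),
    ("To sort a list in Python, call sorted(my_list) or my_list.sort().", 1),
    ("lol idk whatever, see u l8r", 0),
]
texts, labels = zip(*labeled_docs)

# Train a simple quality classifier
quality_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
quality_clf.fit(texts, labels)

# Keep only web documents that score above a chosen quality threshold
web_corpus = [
    "A prime number has exactly two divisors: one and itself.",
    "BUY NOW!!! best deals best deals best deals",
]
kept = [doc for doc in web_corpus if quality_clf.predict_proba([doc])[0, 1] > 0.5]
print(kept)
```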
The authors hypothesize that the model gains instruction-following capabilities without being instruction-finetuned, which is an interesting observation.
The model may have unintentionally been trained on benchmark datasets (it mirrors test cases but fails when the format changes).
In the satirical Pretraining on the Test Set Is All You Need paper, the author trains a small 1M parameter LLM that outperforms all other models, including the 1.3B phi-1.5 model. This is achieved by training the model on all downstream academic benchmarks.
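One crude way to probe for this kind of contamination is to check for verbatim n-gram overlap between the pretraining corpus and the benchmark test questions. The sketch below assumes plain-text documents and questions and uses a made-up 8-gram threshold; real decontamination pipelines are more involved.

```python
# Minimal sketch of an n-gram overlap contamination check between a pretraining
# corpus and a benchmark test set. Tokenization and the n-gram size are simplified.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(benchmark_item: str, corpus: list[str], n: int = 8) -> bool:
    """Flag a test item if any of its n-grams appears verbatim in the corpus."""
    item_ngrams = ngrams(benchmark_item, n)
    return any(item_ngrams & ngrams(doc, n) for doc in corpus)

# Hypothetical usage
corpus = ["... the quick brown fox jumps over the lazy dog near the river bank ..."]
question = "The quick brown fox jumps over the lazy dog near the river bank?"
print(is_contaminated(question, corpus))  # True, since an 8-gram is shared
```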
It is necessary to introduce a better benchmarking system with holdout datasets that are private by default and that no model can access (this would probably require a separate entity unaffiliated with the model developers).
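In practice, such a setup could look like an evaluation service that keeps the test labels private and only ever returns an aggregate score for submitted predictions. The sketch below is a hypothetical illustration, not an existing benchmark API; the class and field names are made up.

```python
# Hypothetical sketch of a private-holdout evaluation service: model developers
# submit predictions, and the service scores them against labels it never releases.
from dataclasses import dataclass

@dataclass
class PrivateBenchmark:
    prompts: list[str]   # published: developers run their model on these
    _labels: list[str]   # private: never leaves the evaluation service

    def score(self, predictions: list[str]) -> float:
        """Return only an aggregate accuracy, not per-item feedback."""
        assert len(predictions) == len(self._labels)
        correct = sum(p.strip() == y for p, y in zip(predictions, self._labels))
        return correct / len(self._labels)

# Hypothetical usage by the (unaffiliated) benchmark host
benchmark = PrivateBenchmark(
    prompts=["What is 2 + 2?", "Name the largest planet."],
    _labels=["4", "Jupiter"],
)
print(benchmark.score(["4", "Saturn"]))  # 0.5
```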
Data, or data usage rights, might indeed become scarcer (or at least costlier) in the future as more and more platforms close off their free API access. Examples we have seen earlier this year include Reddit and Twitter/X. The latter also updated its terms of use last month to prohibit crawling and scraping.
This summer, major on...