Ahead of AI #12: LLM Businesses and Busyness
- Data, or data usage rights, may indeed become scarcer (or at least costlier) in the future as more and more platforms close off free API access. Examples from earlier this year include Reddit and Twitter/X; the latter also updated its usage terms last month to prohibit crawling and scraping.
from Ahead of AI #12: LLM Businesses and Busyness by Sebastian Raschka
Nicolay Gerold added 1y ago
- Phi-1.5
Phi-1.5 is a "small" 1.3 billion parameter LLM with impressive performance for its size.
Annotated figures from the Textbooks Are All You Need II paper
How does such a small model achieve this level of performance? The secret ingredient appears to be high-quality training data.
The pretraining is based on the Textbooks Are All You Need approach that…
The authors hypothesize that the model gains instruction-following capabilities without being instruction-finetuned, which is an interesting observation. However, the model may have unintentionally been trained on benchmark datasets: it mirrors test cases but fails when the format changes.
- In the satirical Pretraining on the Test Set Is All You Need paper, the author trains a small 1M-parameter LLM that outperforms all other models, including the 1.3B phi-1.5 model. This is achieved simply by training the model on all downstream academic benchmarks.
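One common way to probe for this kind of contamination is to check for n-gram overlap between the pretraining corpus and benchmark test sets. The sketch below is an illustrative heuristic of my own, not the method used in either paper; the function names and the n-gram length are arbitrary choices.

```python
# Illustrative benchmark-contamination check: flag test examples whose
# word-level n-grams also appear in the training corpus. A hypothetical
# sketch, not the procedure from the phi-1.5 report.

def ngrams(text, n=8):
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs, test_examples, n=8):
    """Fraction of test examples sharing at least one n-gram with training data."""
    train_ngrams = set()
    for doc in train_docs:
        train_ngrams |= ngrams(doc, n)
    flagged = sum(1 for ex in test_examples if ngrams(ex, n) & train_ngrams)
    return flagged / len(test_examples) if test_examples else 0.0
```

In practice such checks are run with hashed n-grams over much larger corpora, but even this simple version makes the point: a high overlap rate suggests the benchmark score measures memorization rather than capability.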
It is necessary to introduce a better benchmarking system with holdout datasets that no model can access, i.e., datasets that are private by default (this would probably require a separate entity unaffiliated with the model developers).