Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
Who is this document for?
This document is for engineers and researchers (both individuals and teams) interested in maximizing the performance of deep learning models. We assume basic knowledge of machine learning and deep learning concepts.
Our emphasis is on the process of hyperparameter tuning. We touch on other aspects of deep learning training…
GitHub - google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
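To make the "process of hyperparameter tuning" concrete, here is a minimal sketch of the kind of study the playbook describes: fix the model, sample nuisance hyperparameters from a search space, and compare trials on a single validation metric. This uses plain random search rather than the quasi-random search the playbook recommends, and `train_and_evaluate` is a hypothetical stand-in for a real training loop.

```python
# Minimal hyperparameter study sketch (assumption: you replace
# train_and_evaluate with your own short training run).
import math
import random

def sample_trial():
    # Log-uniform sampling is the usual choice for scale-like hyperparameters.
    lr = 10 ** random.uniform(-5, -2)
    weight_decay = 10 ** random.uniform(-6, -2)
    return {"learning_rate": lr, "weight_decay": weight_decay}

def train_and_evaluate(config):
    # Placeholder objective standing in for validation loss after a short run.
    return (math.log10(config["learning_rate"]) + 3.5) ** 2 + random.random() * 0.1

trials = [sample_trial() for _ in range(20)]
results = [(cfg, train_and_evaluate(cfg)) for cfg in trials]
best_cfg, best_loss = min(results, key=lambda item: item[1])
print(f"best config: {best_cfg}, validation loss: {best_loss:.3f}")
```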
Nicolay Gerold added
4. Introducing Stable LM 3B: Bringing Sustainable, High-Performance Language Models to Smart Devices
Stability AI introduced Stable LM 3B, a high-performing language model designed for smart devices. With 3 billion parameters, it outperforms state-of-the-art 3B models and reduces operating costs and power consumption. The model enables a broader range…
This AI newsletter is all you need #68
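For a sense of what "runs on modest hardware" looks like in practice, here is a minimal local-inference sketch with Hugging Face transformers. The model id and generation settings are assumptions, not from the announcement; check the model card for the exact recommended usage.

```python
# Sketch: load a ~3B model in half precision and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-3b-4e1t"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    device_map="auto",          # put weights on GPU if available, else CPU
)

inputs = tokenizer(
    "The benefits of small language models include", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```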
Phi-1.5
Phi-1.5 is a "small" 1.3 billion parameter LLM with an impressive performance for its size.
Annotated figures from the Textbooks Are All You Need II paper
How does this small model accomplish such a good performance? The secret ingredient seems to be the high-quality data.
The pretraining is based on the Textbooks Are All You Need approach…
Phi-1.5 is a "small" 1.3 billion parameter LLM with an impressive performance for its size.
Annotated figures from the Textbooks Is All You Need II paper
How does this small model accomplish such a good performance? The secret ingredient seems to be the high-quality data.
The pretraining is based on the Textbooks Is All You Need approach that... See more
Sebastian Raschka • Ahead of AI #12: LLM Businesses and Busyness
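The "high-quality data" ingredient boils down to filtering a raw corpus for textbook-like documents (plus synthetic textbook-style data). Here is a rough sketch of the filtering step with a tiny, hypothetical quality classifier; the real pipelines use far larger labeled sets and stronger models.

```python
# Sketch: score documents for "textbook quality" and keep high-scoring ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-labeled seed set: 1 = textbook-like, 0 = low quality (illustrative only).
seed_texts = [
    "A function maps each element of its domain to exactly one element of its codomain.",
    "omg click here for the best deals!!! limited time only",
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "lol idk what this page is even about tbh",
]
seed_labels = [1, 0, 1, 0]

quality_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
quality_model.fit(seed_texts, seed_labels)

corpus = [
    "The derivative measures the instantaneous rate of change of a function.",
    "buy followers cheap fast guaranteed",
]
scores = quality_model.predict_proba(corpus)[:, 1]
filtered = [doc for doc, score in zip(corpus, scores) if score > 0.5]
print(filtered)
```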
The authors hypothesize that the model gains instruction-following capabilities without instruction finetuning, which is an interesting observation.
The model may have unintentionally been trained on benchmark datasets (it mirrors test cases but fails when the format changes).
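One way to probe that concern is a simple contamination check: measure long n-gram overlap between benchmark items and the pretraining corpus. The sketch below uses placeholder strings; real checks run over the full corpus and benchmark.

```python
# Sketch: flag benchmark items that share long n-grams with the training corpus.
def ngrams(text, n=8):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(benchmark_items, corpus_docs, n=8):
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)
    flagged = [item for item in benchmark_items if ngrams(item, n) & corpus_ngrams]
    return len(flagged) / max(len(benchmark_items), 1)

corpus = ["the quick brown fox jumps over the lazy dog near the river bank today"]
benchmark = [
    "the quick brown fox jumps over the lazy dog near the river bank today",
    "an entirely different question about arithmetic word problems",
]
print(contamination_rate(benchmark, corpus))  # 0.5 in this toy example
```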
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurability in our servers to make sure they're high-throughput and low-latency. Tuning the inference…
Developing Rapidly with Generative AI
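As a starting point, here is a minimal offline batch-inference sketch with vLLM. The model id is an assumption; a production setup would run the vLLM (or Triton) server behind an API with tuned batching, parallelism, and KV-cache settings for the throughput/latency target.

```python
# Sketch: batch generation with vLLM (assumed model id).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize the main trade-offs of serving large language models.",
    "Explain continuous batching in one paragraph.",
]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```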
RT-2-X (55B): one of the biggest models to date performing unseen tasks in academic labs
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Text embeddings are a critical piece of many pipelines, from search, to RAG, to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512 tokens). That's only about two pages of text, but documents can be very long: books, legal cases, TV screenplays, code repositories, etc. can be tens of thousands of tokens long…
Long-Context Retrieval Models with Monarch Mixer
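The basic retrieval pattern these models plug into looks like the sketch below: mean-pooled transformer embeddings scored by cosine similarity. The model name is a placeholder short-context model; a long-context embedding model such as the M2-BERT retrieval models discussed in the post would let the documents be whole chapters or files instead of short passages.

```python
# Sketch: embed documents and a query, score by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling over tokens
    return F.normalize(pooled, dim=-1)

documents = [
    "Clause 12 limits liability to direct damages.",
    "The screenplay opens on a rain-soaked street at night.",
]
query = embed(["What does the contract say about liability?"])
scores = query @ embed(documents).T
print(scores)  # higher cosine similarity = better match
```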
If you made a thousand versions of an LLM, each good at a different thing, and you have to load each of those into the GPUs and serve them, it becomes very expensive. The big holy grail right now that everybody's looking for is: are there techniques where you can just do small modifications and get really good results? There…
Sarah Wang • What Builders Talk About When They Talk About AI | Andreessen Horowitz
PEFT in a nutshell.
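As a concrete illustration of that "small modifications" idea, here is a minimal LoRA sketch with the Hugging Face peft library: the base model stays frozen and each specialized variant is only a small set of adapter weights rather than a full model copy. The base model and target modules are assumptions chosen to keep the example tiny.

```python
# Sketch: attach LoRA adapters to a frozen base model (GPT-2 as a small stand-in).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["c_attn"],  # attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```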