Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
Who is this document for?
This document is for engineers and researchers (both individuals and teams) interested in maximizing the performance of deep learning models. We assume basic knowledge of machine learning and deep learning concepts.
Our emphasis is on the process of hyperparameter tuning. We touch on other aspects of deep learning training…
GitHub - google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
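To make the "process of hyperparameter tuning" concrete, here is a minimal sketch of the kind of study the playbook describes: fix the model, sample nuisance hyperparameters from a search space, and compare trials on a single validation metric. This uses plain random search rather than the quasi-random search the playbook recommends, and `train_and_evaluate` is a hypothetical stand-in for a real training loop.

```python
# Minimal hyperparameter study sketch (assumption: you replace
# train_and_evaluate with your own short training run).
import math
import random

def sample_trial():
    # Log-uniform sampling is the usual choice for scale-like hyperparameters.
    lr = 10 ** random.uniform(-5, -2)
    weight_decay = 10 ** random.uniform(-6, -2)
    return {"learning_rate": lr, "weight_decay": weight_decay}

def train_and_evaluate(config):
    # Placeholder objective standing in for validation loss after a short run.
    return (math.log10(config["learning_rate"]) + 3.5) ** 2 + random.random() * 0.1

trials = [sample_trial() for _ in range(20)]
results = [(cfg, train_and_evaluate(cfg)) for cfg in trials]
best_cfg, best_loss = min(results, key=lambda item: item[1])
print(f"best config: {best_cfg}, validation loss: {best_loss:.3f}")
```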
Nicolay Gerold added
4. Introducing Stable LM 3B: Bringing Sustainable, High-Performance Language Models to Smart Devices
Stability AI introduced Stable LM 3B, a high-performing language model designed for smart devices. With 3 billion parameters, it outperforms state-of-the-art 3B models and reduces operating costs and power consumption. The model enables a broader range…
This AI newsletter is all you need #68
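For a sense of what "runs on modest hardware" looks like in practice, here is a minimal local-inference sketch with Hugging Face transformers. The model id and generation settings are assumptions, not from the announcement; check the model card for the exact recommended usage.

```python
# Sketch: load a ~3B model in half precision and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-3b-4e1t"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    device_map="auto",          # put weights on GPU if available, else CPU
)

inputs = tokenizer(
    "The benefits of small language models include", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```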
Phi-1.5
Phi-1.5 is a "small" 1.3 billion parameter LLM with an impressive performance for its size.
Annotated figures from the Textbooks Are All You Need II paper
How does this small model accomplish such a good performance? The secret ingredient seems to be the high-quality data.
The pretraining is based on the Textbooks Are All You Need approach…
Phi-1.5 is a "small" 1.3 billion parameter LLM with an impressive performance for its size.
Annotated figures from the Textbooks Is All You Need II paper
How does this small model accomplish such a good performance? The secret ingredient seems to be the high-quality data.
The pretraining is based on the Textbooks Is All You Need approach that... See more
Sebastian Raschka • Ahead of AI #12: LLM Businesses and Busyness
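The "high-quality data" ingredient boils down to filtering a raw corpus for textbook-like documents (plus synthetic textbook-style data). Here is a rough sketch of the filtering step with a tiny, hypothetical quality classifier; the real pipelines use far larger labeled sets and stronger models.

```python
# Sketch: score documents for "textbook quality" and keep high-scoring ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-labeled seed set: 1 = textbook-like, 0 = low quality (illustrative only).
seed_texts = [
    "A function maps each element of its domain to exactly one element of its codomain.",
    "omg click here for the best deals!!! limited time only",
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "lol idk what this page is even about tbh",
]
seed_labels = [1, 0, 1, 0]

quality_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
quality_model.fit(seed_texts, seed_labels)

corpus = [
    "The derivative measures the instantaneous rate of change of a function.",
    "buy followers cheap fast guaranteed",
]
scores = quality_model.predict_proba(corpus)[:, 1]
filtered = [doc for doc, score in zip(corpus, scores) if score > 0.5]
print(filtered)
```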
The authors hypothesize that the model gains instruction-following capabilities without instruction finetuning, which is an interesting observation.
The model may have unintentionally been trained on benchmark datasets (it mirrors test cases but fails when the format changes).
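One way to probe that concern is a simple contamination check: measure long n-gram overlap between benchmark items and the pretraining corpus. The sketch below uses placeholder strings; real checks run over the full corpus and benchmark.

```python
# Sketch: flag benchmark items that share long n-grams with the training corpus.
def ngrams(text, n=8):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(benchmark_items, corpus_docs, n=8):
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)
    flagged = [item for item in benchmark_items if ngrams(item, n) & corpus_ngrams]
    return len(flagged) / max(len(benchmark_items), 1)

corpus = ["the quick brown fox jumps over the lazy dog near the river bank today"]
benchmark = [
    "the quick brown fox jumps over the lazy dog near the river bank today",
    "an entirely different question about arithmetic word problems",
]
print(contamination_rate(benchmark, corpus))  # 0.5 in this toy example
```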
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurability in our servers to make sure they're high-throughput and low-latency. Tuning the inference…
Developing Rapidly with Generative AI
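As a starting point, here is a minimal offline batch-inference sketch with vLLM. The model id is an assumption; a production setup would run the vLLM (or Triton) server behind an API with tuned batching, parallelism, and KV-cache settings for the throughput/latency target.

```python
# Sketch: batch generation with vLLM (assumed model id).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize the main trade-offs of serving large language models.",
    "Explain continuous batching in one paragraph.",
]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```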
RT-2-X (55B): one of the biggest models to date performing unseen tasks in academic labs
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Text embeddings are a critical piece of many pipelines, from search, to RAG, to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512 tokens). That's only about two pages of text, but documents can be very long: books, legal cases, TV screenplays, code repositories, etc. can be tens of thousands of tokens long…
Long-Context Retrieval Models with Monarch Mixer
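The basic retrieval pattern these models plug into looks like the sketch below: mean-pooled transformer embeddings scored by cosine similarity. The model name is a placeholder short-context model; a long-context embedding model such as the M2-BERT retrieval models discussed in the post would let the documents be whole chapters or files instead of short passages.

```python
# Sketch: embed documents and a query, score by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling over tokens
    return F.normalize(pooled, dim=-1)

documents = [
    "Clause 12 limits liability to direct damages.",
    "The screenplay opens on a rain-soaked street at night.",
]
query = embed(["What does the contract say about liability?"])
scores = query @ embed(documents).T
print(scores)  # higher cosine similarity = better match
```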
If you made a thousand versions of an LLM, each good at a different thing, and you have to load each of those into the GPUs and serve them, it becomes very expensive. The big holy grail right now that everybody's looking for is: are there techniques where you can just do small modifications and get really good results? There…
Sarah Wang • What Builders Talk About When They Talk About AI | Andreessen Horowitz
PEFT in a nutshell.
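As a concrete illustration of that "small modifications" idea, here is a minimal LoRA sketch with the Hugging Face peft library: the base model stays frozen and each specialized variant is only a small set of adapter weights rather than a full model copy. The base model and target modules are assumptions chosen to keep the example tiny.

```python
# Sketch: attach LoRA adapters to a frozen base model (GPT-2 as a small stand-in).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["c_attn"],  # attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```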