LLMs
⚡ LitGPT
Pretrain, finetune, evaluate, and deploy 20+ LLMs on your own data
Uses the latest state-of-the-art techniques:
✅ flash attention ✅ fp4/8/16/32 ✅ LoRA, QLoRA, Adapter (v1, v2) ✅ FSDP ✅ 1-1000+ GPUs/TPUs
Lightning AI • Models • Quick start • Inference • Finetune • Pretrain • Deploy • Features • Training recipes (YAML)
Finetune, pretrain and...
Lightning-AI • GitHub - Lightning-AI/litgpt: Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
Google DeepMind used a similar idea to make LLMs faster in "Accelerating Large Language Model Decoding with Speculative Sampling". Their algorithm uses a smaller draft model to make initial guesses and a larger primary model to validate them. If the draft often guesses right, decoding becomes faster, reducing latency.
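A minimal sketch of the draft-then-validate loop described above, using toy probability tables instead of real models. The names `speculative_step`, `toy_draft`, and `toy_target` are hypothetical and chosen for illustration; the accept/reject rule shown here (accept with probability min(1, p/q), otherwise resample from the positive part of p - q) is the one commonly described for speculative sampling, not code taken from the paper.

```python
import random

def sample(probs, rng):
    """Draw one token id from a list of probabilities (or non-negative weights)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def speculative_step(prefix, draft, target, k, rng):
    """Extend `prefix` by between 1 and k + 1 tokens.

    `draft` and `target` are callables mapping a token list to a probability
    list over the vocabulary (toy stand-ins for the small and large model).
    """
    # 1. The draft model guesses k tokens ahead, one at a time.
    proposed, draft_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        proposed.append(tok)
        draft_dists.append(q)
        ctx.append(tok)

    # 2. The target model validates the guesses left to right.
    #    (A real system would score all k positions in one batched pass.)
    out = list(prefix)
    for tok, q in zip(proposed, draft_dists):
        p = target(out)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # guess accepted
        else:
            # Guess rejected: resample from the part of p not covered by q.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            out.append(sample(residual, rng))
            return out

    # 3. Every guess was accepted, so the target adds one extra token.
    out.append(sample(target(out), rng))
    return out

# Toy models over a 3-token vocabulary (made-up numbers, context ignored).
def toy_target(ctx):
    return [0.7, 0.2, 0.1]

def toy_draft(ctx):
    return [0.5, 0.3, 0.2]

if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step([0], toy_draft, toy_target, k=4, rng=rng))
```

The closer the draft distribution is to the target distribution, the more guesses survive validation, which is where the latency saving comes from.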
There are some people speculating...
muhtasham • Machine Learners Guide to Real World - 2️⃣ Concepts from Operating Systems That Found Their Way in LLMs
The need for better AI or LLM-specific infrastructure, along with the host of problems that come with the non-determinism of LLMs, means that there’s more software work ahead of us, not less. Abstraction layers like LLMs create more possibilities and thus, more work.
Is this a good thing or a bad thing? I’m not sure.
A great example of this is frontend...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
To do this, we employ a technique known as AI-assisted evaluation, alongside traditional metrics for measuring performance. This helps us pick the prompts that lead to better quality outputs, making the end product more appealing to users. AI-assisted evaluation uses best-in-class LLMs (like GPT-4) to automatically critique how well the AI's...
Developing Rapidly with Generative AI
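The prompt-selection idea in the excerpt above could look roughly like the sketch below: each candidate prompt is run over a few sample inputs, a judge scores every output, and the prompt with the best average wins. Everything here is hypothetical. `pick_best_prompt`, `fake_generate`, and `fake_judge` are illustrative names, and the stubs stand in for real model calls (the excerpt names GPT-4 as a possible judge but does not show an API).

```python
from statistics import mean

def pick_best_prompt(prompts, inputs, generate, judge):
    """Return (best_prompt, scores), where scores maps prompt -> mean judge rating.

    generate(prompt, text) -> output string from the model under test
    judge(text, output)    -> numeric rating from the critic model
    """
    scores = {}
    for prompt in prompts:
        ratings = [judge(text, generate(prompt, text)) for text in inputs]
        scores[prompt] = mean(ratings)
    best = max(scores, key=scores.get)
    return best, scores

# Stand-in for the model being evaluated (assumption: no real LLM is called).
def fake_generate(prompt, text):
    return text[:20] if "concise" in prompt else text

# Stand-in for the critic LLM: rewards short outputs on a 1-5 scale.
def fake_judge(text, output):
    return 5.0 if len(output) <= 20 else 2.0

if __name__ == "__main__":
    prompts = ["Summarize:", "Summarize in a concise way:"]
    inputs = ["a" * 50, "b" * 60]
    best, scores = pick_best_prompt(prompts, inputs, fake_generate, fake_judge)
    print(best, scores)
```

In practice the judge would be a prompt sent to a strong model asking for a rating, and its scores would be read alongside the traditional metrics the excerpt mentions rather than replacing them.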
Humans are bad at coming up with search queries. Humans are good at incrementally narrowing down options with a series of filters, and pointing where they want to go next. This seems obvious, but we keep building interfaces for finding information that look more like Google Search and less like a map.
All information tools have to give users some...
thesephist.com • Navigate, don't search
How do models represent style, and how can we more precisely extract and steer it?
A commonly requested feature in almost any LLM-based writing application is “I want the AI to respond in my style of writing,” or “I want the AI to adhere to this style guide.” Aside from costly and complicated multi-stage finetuning processes like Anthropic’s RL with...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
- Mistral AI is a promising alternative to the GPT-3.5 model when combined with prompt engineering.
- Mistral AI can be used where high volume and fast processing are needed at very low cost.
- Mistral AI can be used as a pre-filter for GPT-4 to reduce cost, for example to filter down search results.
Mistral 7B is 187x cheaper compared to GPT-4
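The pre-filtering idea from the list above can be sketched as a two-stage cascade: a cheap model discards obviously irrelevant search results, and the expensive model only scores what is left. All names here are hypothetical (`cascade_search`, `keyword_filter`, `counting_scorer`), and the two stubs are plain string functions standing in for a Mistral 7B call and a GPT-4 call respectively.

```python
def cascade_search(query, candidates, cheap_is_relevant, expensive_score, top_k=3):
    """Filter with the cheap model, then rank survivors with the expensive one."""
    shortlisted = [c for c in candidates if cheap_is_relevant(query, c)]
    ranked = sorted(shortlisted, key=lambda c: expensive_score(query, c), reverse=True)
    return ranked[:top_k]

# Stand-in for the cheap model: keeps documents sharing any query word.
def keyword_filter(query, doc):
    return any(word in doc.lower() for word in query.lower().split())

# Stand-in for the expensive model; counts calls so the saving is visible.
calls = {"expensive": 0}

def counting_scorer(query, doc):
    calls["expensive"] += 1
    return sum(doc.lower().count(word) for word in query.lower().split())

if __name__ == "__main__":
    docs = [
        "Mistral 7B pricing notes",
        "Holiday photos",
        "GPT-4 pricing and Mistral pricing compared",
        "Lunch menu",
    ]
    print(cascade_search("mistral pricing", docs, keyword_filter, counting_scorer))
    print("expensive calls:", calls["expensive"], "of", len(docs))
```

The cost saving depends on how many candidates the cheap stage removes; a filter that is too aggressive trades cost for recall, so the threshold is worth checking on real queries.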
Overview
MaxText is a high-performance, highly scalable, open-source LLM written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference. MaxText achieves high MFUs and scales from single host to very large clusters while staying simple and "optimization-free" thanks to the power of JAX and the XLA compiler.
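For readers unfamiliar with the MFU figure mentioned above, here is a rough sketch of how model FLOPs utilization is often estimated. It assumes the common approximation of about 6 FLOPs per parameter per training token for a dense transformer (forward plus backward, attention cost ignored). The function names and the example numbers are hypothetical and are not taken from MaxText.

```python
def training_flops_per_token(n_params):
    """Approximate training FLOPs per token for a dense transformer (assumed 6 * N)."""
    return 6 * n_params

def mfu(tokens_per_second, n_params, n_devices, peak_flops_per_device):
    """Model FLOPs utilization: achieved model FLOP/s divided by hardware peak FLOP/s."""
    achieved = tokens_per_second * training_flops_per_token(n_params)
    return achieved / (n_devices * peak_flops_per_device)

if __name__ == "__main__":
    # Made-up numbers: a 1B-parameter model at 1,000 tokens/s
    # on one device with a 1e13 FLOP/s peak.
    print(mfu(tokens_per_second=1_000, n_params=1e9, n_devices=1, peak_flops_per_device=1e13))
```

An MFU near 1.0 would mean the hardware spends almost all of its peak throughput on the model's own arithmetic; real training runs land well below that.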
MaxText...