r/MachineLearning - Reddit
A friend of mine recently bombed an MLE interview at NVIDIA. They asked:
"We need to deploy a Llama-3 70B model on hardware with limited VRAM. You propose quantization. When is this a bad idea?"
Here's how you break it down:
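The poster's actual breakdown is truncated in this capture, but one natural first step is the weight-memory arithmetic. The sketch below is illustrative only: the parameter count is Llama-3 70B's published size, while the per-card readings in the comments are rough assumptions.

```python
# Back-of-envelope VRAM needed just for the *weights* of a 70B model
# at common precisions. Real deployments also need room for the KV
# cache and activations, so treat these numbers as lower bounds.

PARAMS = 70e9  # Llama-3 70B parameter count

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name:>10}: ~{gb:,.0f} GB for weights alone")

# fp16/bf16: ~140 GB -> multi-GPU territory even on 80 GB cards
#      int8: ~ 70 GB -> borderline on a single 80 GB card
#      int4: ~ 35 GB -> fits a single 40-48 GB card, with caveats
```

The arithmetic frames the interview question: quantization is what makes the model fit at all, so the follow-up ("when is it a bad idea?") is really about what you give up in exchange for those memory savings.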
Ashutosh Maheshwari (x.com)
Meta 🤝 Apple
Llama 4 + Apple Silicon is a match made in heaven.
Here's why: Like DeepSeek V3/R1, all of the new Llama 4 variants are sparse MoE models. They have a massive number of parameters, but only a small fraction of those are active each time a token is generated. We don't know...
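To make the memory-versus-compute point concrete, here is a rough sketch. It uses DeepSeek V3's published shape (671B total / 37B active) as a stand-in, since the post notes the exact Llama 4 sizes were not yet known; treating Llama 4 as similar, and the 4-bit precision, are both assumptions for illustration.

```python
# Why sparse MoE + unified memory is attractive: you must *hold*
# every parameter, but you only *compute* with the active subset
# on each token.

TOTAL_PARAMS = 671e9   # DeepSeek V3 total params (stand-in figure)
ACTIVE_PARAMS = 37e9   # DeepSeek V3 active params per token
BITS = 4               # illustrative 4-bit quantization

def gb(params, bits):
    """GB of memory for `params` weights stored at `bits` bits each."""
    return params * bits / 8 / 1e9

print(f"must hold:      ~{gb(TOTAL_PARAMS, BITS):,.0f} GB of weights")
print(f"used per token: ~{gb(ACTIVE_PARAMS, BITS):,.0f} GB of weights")
```

The asymmetry is the whole argument: a high-memory Apple Silicon machine can hold the full quantized model in unified memory, while the per-token compute stays proportional to the small active expert set rather than the full parameter count.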
