r/MachineLearning - Reddit
A friend of mine recently bombed an MLE interview at NVIDIA. They asked:
"We need to deploy a Llama-3 70B model on hardware with limited VRAM. You propose quantization. When is this a bad idea?"
Here's how you break it down:
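For what it's worth, here's a minimal NumPy sketch (mine, not from the thread) of one answer: naive per-tensor int8 quantization is a bad idea when the tensor contains outliers, because one large value dominates the scale and crushes the precision of everything else — and large-magnitude activation outliers are a known issue in big LLMs.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: scale set by the max |value|."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)

# Well-behaved values: quantization error is tiny.
w = rng.normal(0, 0.02, 1000).astype(np.float32)
q, s = quantize_int8(w)
err_normal = np.abs(w - dequantize(q, s)).mean()

# Same values plus one activation-style outlier: the scale is now set by
# the outlier, so the bulk of the values lose almost all their precision.
w_out = w.copy()
w_out[0] = 10.0
q, s = quantize_int8(w_out)
err_outlier = np.abs(w_out[1:] - dequantize(q, s)[1:]).mean()

print(err_outlier / err_normal)  # error on the non-outlier values blows up
```

Same rounding scheme, same bit width; the only difference is one outlier deciding the scale. That's the core of the "when is quantization a bad idea" answer (along with accuracy-sensitive tasks and hardware without fast int kernels).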
Ashutosh Maheshwari (x.com)
Meta 🤝 Apple
Llama 4 + Apple Silicon is a match made in heaven.
Here's why: Like DeepSeek V3/R1, all of the new Llama 4 variants are massive sparse MoE models. They have a massive number of parameters, but only a small fraction of those are active each time a token is generated. We don't know...
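To make the "only a few parameters active per token" point concrete, here's a toy top-k MoE routing sketch (illustrative names and shapes, not Llama 4's actual architecture): the router scores all experts, but only k of them ever run a matmul for this token, so most parameters sit idle in memory.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Toy sparse MoE layer: route one token to k of n_experts experts.

    x: (d,) token hidden state
    expert_weights: (n_experts, d, d) -- one weight matrix per expert
    gate_weights: (n_experts, d) -- the router
    """
    logits = gate_weights @ x                # one routing score per expert
    topk = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                     # softmax over just the chosen k
    # Only k expert matmuls execute; the other experts' parameters are untouched.
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
experts = rng.normal(size=(n_experts, d, d))
gates = rng.normal(size=(n_experts, d))
y = moe_forward(x, experts, gates, k=2)      # 2 of 16 experts do compute
```

That's why MoE models fit the Apple Silicon story: unified memory can hold all the (inactive) experts, while per-token compute stays small.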
Not to say I saw this coming, but the average ML engineer knows nothing about computers. I'm not surprised that people whose job it is to make hardware perform at its maximum are mogging them.
This is the result of everyone hiring people who can solve leetcode hards in Python and don't know the difference between big endian...
bubble boi (x.com)
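Since the endianness jab came up: here's the difference in two lines of Python, using the standard `struct` module (just a demo, obviously not from the thread). Big-endian stores the most significant byte first; little-endian stores it last.

```python
import struct

# The same 32-bit integer serialized under the two byte orders.
n = 0x12345678
big = struct.pack(">I", n)     # b'\x12\x34\x56\x78' -- most significant byte first
little = struct.pack("<I", n)  # b'\x78\x56\x34\x12' -- least significant byte first

print(big.hex(), little.hex())
# Either byte string decodes back to the same integer if you use the matching order.
assert struct.unpack(">I", big)[0] == struct.unpack("<I", little)[0] == n
```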