r/MachineLearning - Reddit

Meta 🤝 Apple
Llama 4 + Apple Silicon is a match made in heaven.
Here's why: Like DeepSeek V3/R1, all of the new Llama 4 variants are massive sparse MoE models. They have a massive amount of parameters, but only a small number of those are active each time a token is generated. We don't know...
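To make the "many parameters, few active per token" point concrete, here is a rough sketch of top-k expert routing in plain Python. Everything in it is an assumption for illustration: the function names `top_k_route` and `active_fraction`, the toy sizes, and the choice of a softmax over the selected experts are not taken from Llama 4 or DeepSeek, whose actual routing details are not given in the post.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(num_experts, k, params_per_expert, shared_params):
    """Fraction of all parameters that are used to generate one token."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total


# Toy sizes, hypothetical and chosen only to keep the arithmetic readable.
NUM_EXPERTS = 8
TOP_K = 2
PARAMS_PER_EXPERT = 100
SHARED_PARAMS = 200

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = top_k_route(logits, TOP_K)
fraction = active_fraction(NUM_EXPERTS, TOP_K, PARAMS_PER_EXPERT, SHARED_PARAMS)

print("experts used for this token:", [i for i, _ in chosen])
print(f"active parameter fraction: {fraction:.2f}")
```

In this toy setup all 8 experts have to sit in memory, but each token only reads the shared weights plus 2 experts, which is the gap between total and active parameters that the post is pointing at.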
My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good.
I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few...
Andrej Karpathy, x.com