Attention Is All You Need


I just read this new paper from Google and I’m absolutely buzzing 🤯
The core idea is almost offensively simple: ditch recurrence and convolutions, and use only attention. That’s it. And somehow… it unlocks a whole new regime of performance, scale, and simplicity.
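
To show just how small the core operation is, here’s a rough sketch of the paper’s scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, in plain NumPy. The toy shapes and variable names here are mine for illustration, not from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # how much each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the key positions
    return weights @ V                                # weighted sum of the values

# toy example: 3 query positions, 4 key/value positions, dimension 8 (illustrative sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 8): one output vector per query
```

That’s essentially it: every position looks at every other position in one matrix multiply, no step-by-step recurrence required. (The full model stacks this into multi-head attention with learned projections, which this toy sketch leaves out.)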
