The Illustrated Transformer
I finally understand how GPT generates text.
Really helps to code it from scratch in Python.
There are 5 components:
• token embeddings
• positional embeddings
• transformer blocks
• layer normalization
• output head
It sounds complex, but grokking GPT is simple.
Token embeddings turn input text into vectors that capture semantic meaning.
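Here is a minimal sketch of how those five components fit together, assuming PyTorch; the sizes (vocab_size, d_model, n_heads, n_layers) are illustrative placeholders, not the post's actual code.

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        # 1. token embeddings: map token ids to vectors
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # 2. positional embeddings: encode each token's position in the sequence
        self.pos_emb = nn.Embedding(max_len, d_model)
        # 3. transformer blocks: self-attention + feed-forward (pre-norm, GPT-style)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                       batch_first=True, norm_first=True)
            for _ in range(n_layers)
        ])
        # 4. final layer normalization
        self.ln_f = nn.LayerNorm(d_model)
        # 5. output head: project hidden states back to vocabulary logits
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):                        # idx: (batch, seq_len) token ids
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)  # token + positional embeddings
        # causal mask: True above the diagonal means "cannot attend to the future"
        mask = torch.triu(torch.ones(t, t, device=idx.device, dtype=torch.bool), 1)
        for block in self.blocks:
            x = block(x, src_mask=mask)
        x = self.ln_f(x)
        return self.head(x)                        # (batch, seq_len, vocab_size) logits

logits = MiniGPT()(torch.randint(0, 1000, (1, 16)))
next_token = logits[0, -1].argmax()                # greedy choice of the next token
```

Generation then just appends the predicted token and runs the model again on the longer sequence.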
AI + Blockchain research. How GPT works from scratch.
In this work we propose the Transformer, a model architecture that dispenses with recurrence and instead relies entirely on attention mechanisms to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after as little as twelve hours of training on eight P100 GPUs.
Attention Is All You Need
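As a rough illustration of the attention mechanism the paper relies on, here is a minimal scaled dot-product attention sketch; the function name and tensor shapes are assumptions for illustration, not the paper's code.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k). Every position scores every other position,
    # so long-range dependencies are captured in a single, fully parallel step.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, seq_len, seq_len)
    weights = scores.softmax(dim=-1)               # attention distribution per query
    return weights @ v                             # weighted sum of the values

q = k = v = torch.randn(1, 5, 64)
out = scaled_dot_product_attention(q, k, v)        # (1, 5, 64)
```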