Ahead of AI #12: LLM Businesses and Busyness
The sliding window attention mechanism is essentially a fixed-size attention block that lets the current token attend to only a limited number of previous tokens (instead of all previous tokens). How does this relate to the Attention Sinks paper?
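A minimal sketch of the idea (not from the newsletter): the mask below restricts each query position to itself and the previous `window - 1` positions. The function name and the window size are illustrative choices, not anything defined in the source.

```python
import numpy as np


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if token i may attend to token j.

    Each token attends only to itself and the `window - 1` tokens before it,
    rather than the full causal prefix.
    """
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i                  # never attend to future tokens
    recent = j > i - window          # keep only the most recent `window` tokens
    return causal & recent


if __name__ == "__main__":
    # With a window of 3, token 4 attends to tokens 2, 3, and 4 only.
    print(sliding_window_mask(seq_len=6, window=3).astype(int))
```

In practice this mask (or an equivalent banded computation) replaces the full causal mask before the softmax, so attention cost grows with the window size rather than with the full sequence length.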