
Why large language models struggle with long contexts

These new types of AI, called Large Language Models (LLMs), are still doing prediction, but rather than predicting the demand for an Amazon order, they are analyzing a piece of text and predicting the next token, which is simply a word or part of a word. Ultimately, that is all ChatGPT does technically—act as a very elaborate autocomplete like you…

Ethan Mollick • Co-Intelligence
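
To make the "elaborate autocomplete" concrete, here is a minimal sketch of next-token generation: a loop that repeatedly asks a model for a probability distribution over possible next tokens and appends a sample. The `toy_next_token_probs` function is a hypothetical stand-in; a real LLM computes these probabilities with a large neural network over subword tokens.

```python
import random

def toy_next_token_probs(context: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a trained model's next-token distribution.
    A real LLM conditions on the entire context window, not just the last token."""
    if context and context[-1] == "very":
        return {"elaborate": 0.7, "large": 0.3}
    return {"a": 0.4, "very": 0.4, "model": 0.2}

def generate(prompt: list[str], n_tokens: int) -> list[str]:
    tokens = list(prompt)
    for _ in range(n_tokens):
        probs = toy_next_token_probs(tokens)
        choices, weights = zip(*probs.items())
        # Sample the next token in proportion to its predicted probability.
        tokens.append(random.choices(choices, weights=weights)[0])
    return tokens

print(" ".join(generate(["it", "is"], 4)))
```
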
Roughly the idea is to look at large amounts of text (here 5 billion words from the web) and then…
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
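
A rough sketch of that idea, under the simplifying assumption that a single preceding word is the whole context: tally how often each word follows each other word in a corpus, then turn the counts into next-word probabilities. Real models use subword tokens and condition on far longer contexts, but the spirit is the same.

```python
from collections import Counter, defaultdict

# Tiny stand-in for "large amounts of text from the web".
corpus = "the cat sat on the mat the cat ran".split()

# Count how often each word follows each other word.
follower_counts: dict[str, Counter] = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    follower_counts[word][nxt] += 1

def next_word_probs(word: str) -> dict[str, float]:
    """Estimated probability of each possible next word, from raw counts."""
    counts = follower_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # e.g. {'cat': 0.67, 'mat': 0.33}
```
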
the model doesn’t use our vocabulary. Instead, it creates a new vocabulary of common tokens that helps it spot patterns across billions and billions of documents. In the attention map, every token bears some relationship to every token before it, and for a given input sentence the strength of this relationship describes something about the importance…

Mustafa Suleyman • The Coming Wave: Technology, Power, and the Twenty-first Century's Greatest Dilemma
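
The attention map described here can be sketched numerically: each token's query vector is scored against the key vectors of all earlier tokens, later positions are masked out, and a softmax turns the scores into per-token weights. The shapes and random vectors below are illustrative, not taken from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 4, 8                      # 4 tokens, 8-dim query/key vectors

Q = rng.normal(size=(n_tokens, d))      # one query vector per token
K = rng.normal(size=(n_tokens, d))      # one key vector per token

scores = Q @ K.T / np.sqrt(d)           # pairwise relationship scores

# Causal mask: a token may only attend to itself and tokens before it.
mask = np.triu(np.ones((n_tokens, n_tokens), dtype=bool), k=1)
scores[mask] = -np.inf

attention = np.exp(scores)
attention /= attention.sum(axis=1, keepdims=True)   # softmax over each row

print(np.round(attention, 2))           # row i: weights over tokens 0..i
```

Because every token is scored against every token before it, the map grows quadratically with sequence length, which is one reason long contexts are expensive.
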
"A key challenge of (LLMs) is that they do not come with a manual! They come with a “Twitter influencer manual” instead, where lots of people online loudly boast about the things they can do with a very low accuracy rate, which is really frustrating..."
Simon Willison, attempting to explain LLM