What Is ChatGPT Doing ... And Why Does It Work?
In the first neural nets we discussed above, every neuron at any given layer
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
But this isn’t the right conclusion to draw. Computationally irreducible
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
produce a new embedding (i.e. a new array of numbers). It then takes the last part of this array and generates from it an array of about 50,000 values that turn into probabilities for different possible next tokens. (And, yes, it so happens that there are about the same number of tokens used as there are common words in English, though only about 3
...
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
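As a rough illustration of the last step in the highlight above, here is a minimal sketch of turning an array of raw scores into next-token probabilities. The highlight does not name the function used, so softmax (the usual choice) is an assumption here, and the four-word vocabulary and the scores are made up for illustration; the highlight puts the real vocabulary at about 50,000 tokens.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores (assumptions, not values from the source).
vocab = ["cat", "dog", "the", "ran"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
# Pick the most probable token; real systems usually sample instead.
next_token = vocab[probs.index(max(probs))]
```

The largest score gets the largest probability, but every token keeps a nonzero chance of being chosen.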
Because what’s actually inside ChatGPT are a bunch of numbers—with a bit less than 10 digits of precision—that are some kind of distributed encoding of the aggregate structure of all that text.
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
The majority of the effort in training ChatGPT is spent “showing it” large amounts of existing text from the web, books, etc. But it turns out there’s another—apparently rather important—part too.
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
Or put another way, there’s an ultimate tradeoff between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable. And the more it’s fundamentally trainable, the less it’s going to be able to do soph
...
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
But often just repeating the same example over and over again isn’t enough. It’s also necessary to show the neural net variations of the example. And it’s a feature of neural net lore that those “data augmentation” variations don’t have to be sophisticated to be useful. Just slightly modifying images with basic image processing can make them essent
...
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
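To make the "basic image processing" point concrete, here is a small sketch of two simple augmentations applied to a tiny grayscale image stored as a list of rows. The function names and the 3x3 image are hypothetical, chosen only for illustration; the highlight does not say which modifications are used in practice.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels=1, fill=0):
    """Slide each row right, padding on the left and keeping the width."""
    width = len(image[0])
    return [([fill] * pixels + row)[:width] for row in image]

# A made-up 3x3 image (assumption for illustration).
image = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]

# One original example becomes three training examples.
augmented = [image, flip_horizontal(image), shift_right(image)]
```

Each variant keeps the same label as the original, so the training set grows without any new labeled data.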
So how do we follow the same kind of approach to find embeddings for words? The key is to start from a task about words for which we can readily do training. And the standard such task is “word prediction”. Imagine we’re given “the ___ cat”. Based on a large corpus of text (say, the text content of the web), what are the probabilities for different
...
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
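The "the ___ cat" task can be sketched with plain counting over a toy corpus. This is a stand-in, not the method the highlight describes: the source talks about training a neural net on word prediction, whereas the sketch below only tallies which words appear in the gap. The corpus and the function name `fill_in_probabilities` are assumptions made up for illustration.

```python
from collections import Counter

def fill_in_probabilities(tokens, left, right):
    """Estimate P(word | left ___ right) from raw counts in a token list."""
    counts = Counter(
        tokens[i]
        for i in range(1, len(tokens) - 1)
        if tokens[i - 1] == left and tokens[i + 1] == right
    )
    total = sum(counts.values())
    # Return an empty dict when the pattern never occurs.
    return {word: n / total for word, n in counts.items()} if total else {}

# Tiny made-up corpus (assumption); the highlight imagines web-scale text.
corpus = "the black cat sat near the black cat and the small cat saw the old dog"
probs = fill_in_probabilities(corpus.split(), "the", "cat")
```

Here "black" fills the gap twice and "small" once, so the estimates are two thirds and one third; "old" never appears between "the" and "cat" and gets no entry.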
was basically connected (at least with some weight) to every neuron on the layer before. But this kind of fully connected network is (presumably) overkill if one’s working with data that has particular, known structure.
Stephen Wolfram • What Is ChatGPT Doing ... And Why Does It Work?
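A minimal sketch of what "fully connected" means: every output neuron takes a weighted sum over every neuron in the layer before. The weights, biases, and inputs below are made up for illustration, and the activation function that would normally follow is left out to keep the sketch short.

```python
def dense_layer(inputs, weights, biases):
    """Each output neuron sums over every input neuron, then adds a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Hypothetical values (assumptions): 3 input neurons feeding 2 output neurons.
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, 0.0, -1.0],  # weights into output neuron 0
    [1.0, 1.0, 1.0],   # weights into output neuron 1
]
biases = [0.0, 0.5]

outputs = dense_layer(inputs, weights, biases)
# One weight per (input, output) pair, so the count grows as a product.
weight_count = len(inputs) * len(biases)
```

The weight count is the product of the two layer sizes, which is why full connectivity gets expensive for large inputs such as images.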
ChatGPT very politely takes the correction, and if you ask the question yet again it then gives the correct answer. Obviously there could be a more streamlined way to handle the back and forth with Wolfram|Alpha, but it’s nice to see that even this very straightforward pure-natural-language approach basically already works.