
Large language models, explained with a minimum of math and jargon
Vector databases are widely used in NLP tasks such as sentiment analysis, text classification, and semantic search. Representing text as vector embeddings makes it easier to compare and analyze textual data.
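To make "compare" concrete, here is a minimal sketch, assuming some embedding model has already turned each text into a vector (the vectors below are made up for illustration). Cosine similarity is one common way vector search scores a match:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score in [-1, 1]; higher means more similar embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up embeddings; in practice these come from a trained model.
query = np.array([0.9, 0.1, 0.3])
doc_a = np.array([0.8, 0.2, 0.4])   # close in meaning to the query
doc_b = np.array([-0.5, 0.9, 0.0])  # far in meaning

print(cosine_similarity(query, doc_a))  # high score
print(cosine_similarity(query, doc_b))  # low score
```

A vector database is, at heart, a structure for running this comparison quickly across millions of stored embeddings.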
All we have to do is—somehow—arrange the words in this space to make them do as good a job of predicting these missing words as possible. (At least, we’ll have done as good a job as this particular model architecture allows.) How are we going to arrive at these representations? Why, of course, stochastic gradient descent! We will simply scatter our …
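As a rough sketch of that recipe (a toy CBOW-style model in plain numpy; the corpus, vector size, and learning rate are illustrative assumptions, not details from the text): start with randomly scattered word vectors, predict each word from its neighbors, and let stochastic gradient descent nudge the vectors toward better predictions.

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, D))  # word vectors, scattered at random
W = rng.normal(scale=0.1, size=(V, D))  # output weights for prediction

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

lr = 0.1
for epoch in range(200):
    for t in range(1, len(corpus) - 1):
        # Predict the "missing" middle word from its two neighbors.
        context = (E[idx[corpus[t - 1]]] + E[idx[corpus[t + 1]]]) / 2
        probs = softmax(W @ context)
        grad = probs.copy()
        grad[idx[corpus[t]]] -= 1.0  # cross-entropy gradient
        # SGD step: nudge the vectors so the true word becomes more likely.
        ctx_grad = W.T @ grad / 2
        W -= lr * np.outer(grad, context)
        E[idx[corpus[t - 1]]] -= lr * ctx_grad
        E[idx[corpus[t + 1]]] -= lr * ctx_grad

# After training, E holds the learned arrangement: words used in
# similar contexts tend to end up with similar vectors.
```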
Two questions, equally ambitious, inspired the project: what if every concept a human could articulate through language were organized in a single, massive database of words? And what if, in contrast to the alphabetical organization of a dictionary, those words were connected to one another on the basis of their meanings?
First, with the coming of “statistical” intelligence, we must let go of the mind metaphor. Where the grammarians were hoping to discover human brain structures by writing chatbots, modern chatbots work by mechanisms emphatically not human. We certainly do not produce language in our minds by statistical probabilities. Our brains do not convert word …
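To make “statistical probabilities” concrete: a minimal sketch (a toy bigram model on an invented corpus, vastly simpler than a modern chatbot) generates text purely from counted word frequencies:

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat saw the dog".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str):
    """Sample the next word in proportion to observed frequencies."""
    counts = follows.get(prev)
    if not counts:
        return None  # no observed continuation
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate text from probabilities alone; no mind involved.
word, out = "the", ["the"]
for _ in range(8):
    word = next_word(word)
    if word is None:
        break
    out.append(word)
print(" ".join(out))
```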