LLMs
sari and
LLMs
sari and

One way to think about (LLM) is that about 3 years ago, aliens landed on Earth. They handed over a USB stick and then disappeared. Since then we’ve been poking the thing they gave us with a stick, trying to figure out what it does and how it works.
"A key challenge of (LLMs) is that they do not come with a manual! They come with a “Twitter influencer manual” instead, where lots of people online loudly boast about the things they can do with a very low accuracy rate, which is really frustrating..."
Simon Willison, attempting to explain LLM
Meta AI released LLaMA ... and they included a paper which described exactly what it was trained on. It was 5TB of data.
2/3 of it was from Common Crawl. It had content from GitHub, Wikipedia, ArXiv, StackExchange and something called “Books”.
What’s Books? 4.5% of the training data was books. Part of this was Project Gutenberg, which is public

sari and
