> local llms 101
> running a model = inference (using model weights)
> inference = predicting the next token based on your input plus all tokens generated so far
> together, these make up the "sequence"
> tokens ≠ words
> they're the chunks of text a tokenizer splits your input into (often subwords or punctuation)
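The loop above (tokenize the prompt, predict the next token, append it to the sequence, repeat) can be sketched in a few lines. This is a toy illustration only: `tokenize` and `predict_next` here are hypothetical stand-ins, not a real tokenizer or model — a real LLM computes the next-token prediction with a forward pass through its learned weights.

```python
def tokenize(text):
    # Real tokenizers split into subword chunks; whitespace split is a stand-in.
    return text.split()

def predict_next(sequence):
    # Stand-in for a forward pass through model weights:
    # returns a canned continuation based on sequence length.
    canned = ["models", "predict", "tokens", "<eos>"]
    return canned[min(len(sequence) - 1, len(canned) - 1)]

def generate(prompt, max_new_tokens=8):
    sequence = tokenize(prompt)              # prompt tokens start the sequence
    for _ in range(max_new_tokens):
        next_token = predict_next(sequence)  # one inference step
        if next_token == "<eos>":            # model signals it's done
            break
        sequence.append(next_token)          # generated token joins the sequence
    return sequence

print(generate("local"))  # → ['local', 'models', 'predict', 'tokens']
```

The key point the toy preserves: each step's prediction is conditioned on the whole sequence so far (prompt plus everything generated), which is why generation is sequential rather than parallel.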
Ahmadx.com