GitHub - EleutherAI/sae-auto-interp

LLMs just learned how to explain their own thoughts.
Not only do they generate answers, they can now describe the internal processes that led to those answers… and get better at it with training.
We’re officially entering the era of self-interpretable AI.
Models aren’t just...