The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.
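One common way to decompose a group of neurons into a larger set of sparser, candidate-interpretable features is to train an overcomplete sparse autoencoder on the neurons' activations. The sketch below is purely illustrative: the dimensions, weight initialization, and function names are assumptions, not the paper's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 8      # width of the neuron group being decomposed (illustrative)
n_features = 32    # overcomplete dictionary of candidate features (illustrative)

# Untrained encoder/decoder weights; in practice these would be learned by
# minimizing reconstruction error plus an L1 sparsity penalty on the features.
W_enc = rng.normal(scale=0.1, size=(n_neurons, n_features))
W_dec = rng.normal(scale=0.1, size=(n_features, n_neurons))
b_enc = np.zeros(n_features)

def encode(activations):
    """Map neuron activations to nonnegative feature activations (ReLU)."""
    return np.maximum(activations @ W_enc + b_enc, 0.0)

def decode(features):
    """Reconstruct the original neuron activations from the features."""
    return features @ W_dec

acts = rng.normal(size=(4, n_neurons))   # a batch of neuron activations
feats = encode(acts)                     # shape (4, 32): sparse feature space
recon = decode(feats)                    # shape (4, 8): reconstruction
```

The key design choice is overcompleteness (`n_features > n_neurons`): it gives each feature room to specialize, which is what makes the learned directions candidates for interpretation.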
