The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.
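The abstract does not spell out the decomposition method itself, but the general idea of re-expressing neuron activations as a sparser, overcomplete set of feature directions can be illustrated with a sparse autoencoder. Everything below is an illustrative sketch under assumed choices (tied-shape ReLU encoder, L1 sparsity penalty, synthetic data, all hyperparameters), not the authors' exact setup.

```python
import numpy as np

# Sketch: decompose neuron activation vectors into a larger set of sparsely
# active features via a sparse autoencoder. Architecture and hyperparameters
# are illustrative assumptions, not the paper's configuration.

rng = np.random.default_rng(0)

n_neurons = 16    # dimensionality of the neuron activations (assumed)
n_features = 64   # overcomplete dictionary of candidate features (assumed)
l1_coeff = 1e-3   # sparsity penalty strength (assumed)
lr = 1e-2         # gradient-descent step size (assumed)
n_samples = 1024

# Synthetic "activations": sparse combinations of hidden ground-truth directions,
# mimicking superposition of many features in few neurons.
ground_truth = rng.normal(size=(n_features, n_neurons))
codes = rng.random((n_samples, n_features)) * (rng.random((n_samples, n_features)) < 0.05)
acts = codes @ ground_truth

W_enc = rng.normal(scale=0.1, size=(n_neurons, n_features))
b_enc = np.zeros(n_features)
W_dec = W_enc.T.copy()

def encode(x):
    # ReLU feature activations
    return np.maximum(x @ W_enc + b_enc, 0.0)

mse_init = np.mean((encode(acts) @ W_dec - acts) ** 2)

for step in range(200):
    f = encode(acts)           # feature activations
    recon = f @ W_dec          # reconstruction of the neuron activations
    err = recon - acts
    # Gradients of 0.5*||err||^2 + l1_coeff*|f| w.r.t. parameters
    grad_f = (err @ W_dec.T + l1_coeff * np.sign(f)) * (f > 0)
    W_dec -= lr * (f.T @ err) / n_samples
    W_enc -= lr * (acts.T @ grad_f) / n_samples
    b_enc -= lr * grad_f.mean(axis=0)

f = encode(acts)
mse_final = np.mean((f @ W_dec - acts) ** 2)
sparsity = (f > 0).mean()  # fraction of features active per example
```

After training, each example is reconstructed from a small subset of the 64 learned directions, and each direction (row of `W_dec`) is a candidate interpretable feature; the L1 term is what pushes most feature activations to exactly zero.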
