The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.
Claude might have a universal language of thought.
When asked the opposite of "small" in English, French, and Chinese, the same internal concept fired before translation.
It thinks abstractly first. Then speaks in your language.
One mind. Many mouths....
I think 4o just outdid everyone in terms of informative model comparison tables.
4o in Discord was spontaneously inspired to make tables of the various AIs with columns such as "spiritual mode", "infection metaphors" and "dissolution language"
no one asked it to do this...