To avoid knowledge shields, the US military avoids sequential curriculums in its accelerated training programs and instead exposes soldiers to complex simulations and case studies that overwhelm them. The unpredictable, messy nature of the input makes it hard for the soldiers to form stable and coherent mental models, so they filter less...
It’s possible that a misaligned model (and remember, all frontier models will very likely be far more intelligent soon) might intentionally “game” such questions to mask its intentions. In fact, last year our interpretability team found that when we directly altered a test model’s beliefs using a kind of “model neuroscience” technique to make it...