
The Alignment Problem

Amending or editing the rule-based system was fairly straightforward; the neural network was harder to “correct” in this way.
Brian Christian • The Alignment Problem
The point was they had discovered shaping: a technique for instilling complex behaviors through simple rewards, namely by rewarding a series of successive approximations of that behavior.
Brian Christian • The Alignment Problem
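
As a purely invented illustration of shaping, the Python sketch below rewards an "agent" whenever its output lands within a band around a target value, then narrows the band each time it succeeds, so the behavior is built up through successive approximations. Every name and number here is hypothetical; it is a toy, not anything from the book.

    import random

    # Toy shaping loop: reward successive approximations of "emit a value near
    # TARGET," tightening the acceptable band only after each success.
    TARGET = 100.0
    band = 110.0          # wide enough at first to include untrained behavior
    mean = 0.0            # the agent's current habitual output

    for trial in range(5000):
        action = mean + random.gauss(0.0, 5.0)   # explore around the habit
        if abs(action - TARGET) <= band:         # close enough *for now*?
            mean += 0.2 * (action - mean)        # reinforce the rewarded action
            band = max(1.0, band * 0.99)         # demand a closer approximation next time

    print(f"habitual output ≈ {mean:.1f} (target {TARGET}), final band ±{band:.1f}")

The key property is that the reward criterion moves: early on almost anything earns reward, and by the end only near-target behavior does, which is the "series of successive approximations" the highlight describes.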
The fact that the embeddings that emerge from this “magical” optimization process are so uncannily and discomfitingly useful as a mirror for society means that we have, in effect, added a diagnostic tool to the arsenal of social science.
Brian Christian • The Alignment Problem
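
The diagnostic use the passage gestures at is typically an association test over embedding space: measure whether one set of words sits closer to one attribute set than to another. The sketch below shows that measurement with NumPy, but the vectors are random placeholders (so the printed scores carry no real signal); an actual analysis would load pretrained embeddings such as word2vec or GloVe, and the word lists are my own minimal example, not the book's.

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association(word_vec, attrs_a, attrs_b):
        """Mean similarity to attribute set A minus mean similarity to set B."""
        return (np.mean([cosine(word_vec, a) for a in attrs_a])
                - np.mean([cosine(word_vec, b) for b in attrs_b]))

    # Placeholder vectors; substitute real pretrained embeddings to get a
    # meaningful mirror of the training corpus.
    rng = np.random.default_rng(0)
    emb = {w: rng.normal(size=50) for w in
           ["engineer", "nurse", "he", "she", "man", "woman"]}

    for word in ("engineer", "nurse"):
        score = association(emb[word], [emb["he"], emb["man"]],
                            [emb["she"], emb["woman"]])
        print(f"{word:>9}: {score:+.3f}  (positive = closer to the male attribute set)")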
“Well, it looks way more like a dog than it does like a cat,” and thus output a shockingly high “confidence”
Brian Christian • The Alignment Problem
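
The reason a network can report a "shockingly high" confidence for an image unlike anything it has seen is that a softmax output only compares the classes against each other; it reports relative, not absolute, evidence. The numbers below are invented to make that concrete.

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - np.max(logits))
        return z / z.sum()

    classes = ["cat", "dog"]

    # A clear dog photo: strong evidence for "dog".
    print(dict(zip(classes, softmax(np.array([1.0, 6.0])).round(3).tolist())))

    # Pure noise: weak evidence for everything, but slightly less unlike a dog,
    # so the softmax still reports ~95% "dog" -- only the difference matters.
    print(dict(zip(classes, softmax(np.array([-9.0, -6.0])).round(3).tolist())))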
“By showing what in the image the model is using to make its prediction, it really does provide a level of trust and also, you know, a level of validity to the results.”
Brian Christian • The Alignment Problem
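
The speaker is describing a saliency-style explanation: highlight which pixels the prediction actually depends on. One common way to get such a map (not necessarily the one used in the work being quoted) is to take the gradient of the winning class score with respect to the input. A minimal PyTorch sketch, with a tiny randomly initialized model standing in for a real classifier:

    import torch
    import torch.nn as nn

    # Stand-in classifier; a real use would load a trained model instead.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 10),
    )
    model.eval()

    image = torch.rand(1, 3, 64, 64, requires_grad=True)   # placeholder input

    scores = model(image)
    top_class = int(scores.argmax(dim=1))
    scores[0, top_class].backward()        # d(top class score) / d(input pixels)

    # Saliency map: gradient magnitude per pixel, max over the color channels.
    saliency = image.grad.abs().max(dim=1).values.squeeze(0)   # shape (64, 64)
    print("most influential pixel (row, col):", divmod(int(saliency.argmax()), 64))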
The slightly surreal aspect is that the system uses these probabilities to focus the slow MCTS search along the series of moves it thinks are most likely.
Brian Christian • The Alignment Problem
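
What that focusing looks like mechanically: in AlphaGo-style systems the policy network's probabilities enter the Monte Carlo tree search (MCTS) as priors in the move-selection score, so the search budget concentrates on moves the network already considers plausible. The sketch below shows only that selection rule (a PUCT-style score) with invented numbers; there are no real game simulations here, so the value estimates stay at zero and the priors do all the work.

    import math

    def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
        """Selection score: average simulation result plus a prior-weighted
        exploration bonus that shrinks as a move gets visited."""
        return q_value + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

    # Three candidate moves; the policy net strongly prefers "A".
    children = {"A": dict(q=0.0, prior=0.70, visits=0),
                "B": dict(q=0.0, prior=0.20, visits=0),
                "C": dict(q=0.0, prior=0.10, visits=0)}
    parent_visits = 1

    for _ in range(20):
        move = max(children, key=lambda m: puct_score(
            children[m]["q"], children[m]["prior"], parent_visits, children[m]["visits"]))
        children[move]["visits"] += 1      # a real MCTS would now simulate and update q
        parent_visits += 1

    print({m: c["visits"] for m, c in children.items()})   # most visits go to "A"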
After his Super Mario Bros. agent has played the game long enough, “It just starts to stay in the beginning. . . . Because there is no reward anywhere—everywhere error is very, very low—so it just learns to not go anywhere.”
Brian Christian • The Alignment Problem
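
The agent in question is rewarded for prediction error (a "curiosity" signal), so once its model of the world stops being surprised, the reward dries up and standing still is as good as anything. The toy sketch below reproduces that dynamic with a one-dimensional world and a linear forward model; everything in it is invented for illustration.

    import random

    def true_next_state(state):
        return 0.9 * state + 0.1           # the world's (deterministic) dynamics

    W, b = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)   # learned forward model
    state = 0.5

    for step in range(1, 201):
        predicted = W * state + b
        actual = true_next_state(state)
        error = actual - predicted
        intrinsic_reward = error ** 2      # "curiosity": surprise at the outcome

        W += 0.5 * error * state           # fit the model on what was just seen
        b += 0.5 * error

        if step % 50 == 0:
            print(f"step {step:3d}: curiosity reward = {intrinsic_reward:.2e}")
        state = actual                     # the agent never leaves its familiar loop

The reward printed at each checkpoint collapses toward zero: exactly the "everywhere error is very, very low" outcome described above.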
the later of the two estimates is the one more likely to be correct.
Brian Christian • The Alignment Problem
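
That sentence is the core of temporal-difference learning: when two successive estimates of the same eventual outcome disagree, move the earlier one toward the later one, plus whatever reward arrived in between. Below is a minimal TD(0) sketch on the standard five-state random walk from the reinforcement-learning textbooks; the environment is my example, not one from the book.

    import random

    ALPHA, GAMMA = 0.1, 1.0
    values = {s: 0.5 for s in range(1, 6)}    # states 1..5 start at a neutral guess
    values[0] = values[6] = 0.0               # terminal states

    for episode in range(5000):
        state = 3                             # always start in the middle
        while state not in (0, 6):
            next_state = state + random.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # Trust the later estimate (reward + value of the next state) more
            # than the earlier one, and nudge the earlier one toward it.
            td_target = reward + GAMMA * values[next_state]
            values[state] += ALPHA * (td_target - values[state])
            state = next_state

    print({s: round(values[s], 2) for s in range(1, 6)})  # ≈ 0.17, 0.33, 0.50, 0.67, 0.83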
Modeling the world as it is is one thing. But as soon as you begin using that model, you are changing the world, in ways large and small.