
The Alignment Problem

But ideally, a more thoughtful, or considerate, or uncertain agent might notice, and take the very slightly more inconvenient route that doesn’t leave the world permanently altered in its wake.
Brian Christian • The Alignment Problem
The second challenge of reinforcement learning, relative to supervised and unsupervised learning, is that the rewards or punishments we get from the environment—owing to their very scalar quality—are terse.
Brian Christian • The Alignment Problem
If there really is some kind of singular, scalar “reward” that humans and animals are designed to maximize, might it be as simple as a chemical or a circuit in the brain?
Brian Christian • The Alignment Problem
No matter how good the learner is, though, they will make mistakes—whether blatant or subtle. But because the learner never saw the expert get into trouble, they have also never seen the expert get out.
Brian Christian • The Alignment Problem
wise use of risk-assessment tools, then, might emphasize the violent reoffense and failure to appear predictions over the nonviolent reoffense prediction, on the grounds that the model’s training data is more trustworthy in those cases
Brian Christian • The Alignment Problem
tension at the heart of curiosity, almost a tug-of-war: As we explore an environment and our available behaviors within it—whether that’s the microcosm of an Atari game, the real-world great outdoors, or the nuances of human society—we simultaneously delight in the things that surprise us while at the same time we become harder and harder to surpri
Brian Christian • The Alignment Problem
(A control group who doesn’t get to see the toy defy their expectations, reliably prefers
Brian Christian • The Alignment Problem
Amending or editing the rule-based system was fairly straightforward; the neural network was harder to “correct” in this
Brian Christian • The Alignment Problem
The system begins to sculpt the very reality it is meant to predict. This feedback loop, in turn, further biases its training data.