
The Alignment Problem

We would still have a means for alignment, that is, even when we can’t say what we want, and even when we can’t do what we want. In a perfect world, simply knowing it when we see it would be enough.
Brian Christian • The Alignment Problem
“However,” they write, “there is no clear understanding of why they perform so well, or how they might be improved. . . . From a scientific standpoint, this is deeply unsatisfactory.”
Brian Christian • The Alignment Problem
If elevated levels of dopamine signal something to the effect of things are going to be better than I thought they were going to be, then that feeling is, itself, pleasurable. And you can see how humans and animals alike would go out of their way to get that feeling,
Brian Christian • The Alignment Problem
the key insight is that we should strive to reward states of the world, not actions of our agent.
Brian Christian • The Alignment Problem
“invoking the principle of not choosing an irreversible path when faced with uncertainty.”
Brian Christian • The Alignment Problem
The human world, by contrast, is elaborately architected to be learnable.
Brian Christian • The Alignment Problem
What if the two were
Brian Christian • The Alignment Problem
Do the data of the last six months, say, suggest that these biases are getting better or getting worse?
Brian Christian • The Alignment Problem
the line between critic and artist can be a thin one.