
The Alignment Problem

This human capacity—to know when and what we don’t know—was missing. “A physician knows whether she is uncertain about a case,” they write, “and will consult more experienced colleagues if needed.”
Brian Christian • The Alignment Problem
The assumptions were fairly strong, and the domains were too simple to be of any immediate practical use—they were a far cry from the complexity of the human gait—but IRL did work.
Brian Christian • The Alignment Problem
In the first level of Super Mario Bros., there is a chasm that his agent almost never figures out how to cross, because it requires the agent holding down the jump button for fifteen frames in a row; long sequences of precise actions are much more difficult to learn than shorter or more flexible patterns.
Brian Christian • The Alignment Problem
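A rough back-of-the-envelope sketch of why that chasm is so hard to cross by chance. It assumes, purely for illustration, an agent that picks uniformly at random among a fixed number of buttons on every frame. The function name and the two-button setup are hypothetical and not taken from the book.

```python
def chance_of_run(num_actions, run_length):
    """Probability that a uniformly random agent presses one specific
    button on every frame of a given window of `run_length` frames."""
    return (1.0 / num_actions) ** run_length

# Assumption: only two choices per frame (jump / don't jump).
p = chance_of_run(num_actions=2, run_length=15)
print(p)  # 1 in 32768 for any given 15-frame window
```

With more buttons available per frame, the chance shrinks much faster, which is one way to read the remark that long precise sequences are harder to learn than short or flexible ones.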
Caplan noted that while there are no legal penalties for ignoring such a tattoo, there may be legal problems if the doctors let a patient die without having their official DNR paperwork. As he puts it: “The safer course is to do something.”
Brian Christian • The Alignment Problem
The key insight of shaping—that in order to get complex behavior, we may first need to strategically reward simpler behavior
Brian Christian • The Alignment Problem
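One possible way to picture shaping in code is to compare a reward that pays only at the goal with one that also pays a little for each step of progress. This is a minimal sketch on a made-up one-dimensional corridor. The goal position, the 0.1 bonus, and both function names are assumptions for illustration, not anything specified in the book.

```python
GOAL = 10  # hypothetical goal position on a 1-D corridor

def sparse_reward(position):
    """Reward defined only in terms of the end goal."""
    return 1.0 if position == GOAL else 0.0

def shaped_reward(old_position, new_position, bonus=0.1):
    """Goal reward plus a small bonus (or penalty) for moving
    toward (or away from) the goal: a simpler behavior rewarded first."""
    progress = abs(GOAL - old_position) - abs(GOAL - new_position)
    return sparse_reward(new_position) + bonus * progress

print(sparse_reward(3))       # no signal far from the goal
print(shaped_reward(2, 3))    # small positive signal for one step closer
print(shaped_reward(3, 2))    # small negative signal for one step away
```

Under the sparse version the learner gets no information until it happens to land on the goal, while the shaped version gives feedback on every step.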
the later of the two estimates is the one more likely to be correct.
Brian Christian • The Alignment Problem
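The idea that the later estimate deserves more trust is the core of temporal-difference learning, and it can be sketched as a single update step. This is a minimal tabular TD(0)-style sketch. The state names, step size, and discount are assumptions chosen for illustration.

```python
def td_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Nudge the earlier estimate values[state] toward the later one."""
    # The later estimate: what was just received plus what is now expected.
    later_estimate = reward + gamma * values[next_state]
    # The gap between the later and the earlier estimate.
    td_error = later_estimate - values[state]
    values[state] += alpha * td_error
    return td_error

# Hypothetical value estimates for two states.
values = {"early": 0.0, "late": 1.0}
error = td_update(values, "early", "late", reward=0.0, alpha=0.5)
print(error, values["early"])  # 1.0 0.5
```

A positive error here means things look better from the later state than the earlier guess predicted, so the earlier guess is raised toward it.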
If the reward is defined explicitly in terms of the end goal, or something fairly close to it, then one must essentially wait until random button-pressing, or random flailing around, produces the desired effect.
Brian Christian • The Alignment Problem
If elevated levels of dopamine signal something to the effect of things are going to be better than I thought they were going to be, then that feeling is, itself, pleasurable. And you can see how humans and animals alike would go out of their way to get that feeling…
Brian Christian • The Alignment Problem
The system would then attempt to refine its inference about the reward function based on the human’s feedback, and then use this inferred reward (as in typical reinforcement learning) to find behaviors that performed well by its lights.
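
The loop described in that passage (refine the inferred reward from feedback, then act on it) might be sketched as follows. This is a toy illustration under strong assumptions: only two hypothetical candidate reward functions, feedback given as a single pairwise preference, and a logistic model of how the human chooses. None of these details come from the book.

```python
import math

# Hypothetical candidate reward functions, each scoring two behaviors.
CANDIDATES = {
    "likes_speed": {"fast": 1.0, "careful": 0.0},
    "likes_care": {"fast": 0.0, "careful": 1.0},
}

def update_belief(belief, preferred, rejected):
    """Bayes update after the human says `preferred` beats `rejected`."""
    posterior = {}
    for name, prior in belief.items():
        rewards = CANDIDATES[name]
        # Assumed logistic likelihood of the observed choice.
        gap = rewards[preferred] - rewards[rejected]
        likelihood = 1.0 / (1.0 + math.exp(-gap))
        posterior[name] = prior * likelihood
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def best_behavior(belief):
    """Pick the behavior with the highest expected reward under the belief."""
    def expected(behavior):
        return sum(belief[n] * CANDIDATES[n][behavior] for n in belief)
    return max(["fast", "careful"], key=expected)

belief = {"likes_speed": 0.5, "likes_care": 0.5}
belief = update_belief(belief, preferred="careful", rejected="fast")
choice = best_behavior(belief)
print(belief, choice)
```

After one piece of feedback the belief shifts toward the hypothesis that explains the human's choice, and the behavior chosen is the one that scores best under that updated belief.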