
The Alignment Problem

Rule-based models are among the most easily interpreted machine-learning systems; they typically take the form of a list of “if x then y” rules.
Brian Christian • The Alignment Problem
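As an illustration (not taken from the book), a rule list of this kind can be sketched as an ordered sequence of "if x then y" checks with a default at the end. The feature names, thresholds, and outcomes below are hypothetical and chosen only to show the shape of such a model.

```python
from typing import Callable, List, Tuple

# Each rule pairs a human-readable description with a condition and an outcome.
# All names and thresholds here are made up for illustration.
Rule = Tuple[str, Callable[[dict], bool], str]

RULES: List[Rule] = [
    ("temperature_c >= 39.0", lambda x: x["temperature_c"] >= 39.0, "refer"),
    ("age >= 65 and cough", lambda x: x["age"] >= 65 and x["cough"], "refer"),
]
DEFAULT_OUTCOME = "monitor"


def predict(x: dict) -> Tuple[str, str]:
    """Return (outcome, reason) from the first rule whose condition holds."""
    for description, condition, outcome in RULES:
        if condition(x):
            return outcome, f"if {description} then {outcome}"
    return DEFAULT_OUTCOME, f"no rule matched, so {DEFAULT_OUTCOME}"


if __name__ == "__main__":
    print(predict({"temperature_c": 40.0, "age": 30, "cough": False}))
    print(predict({"temperature_c": 37.0, "age": 30, "cough": True}))
```

Because the first matching rule decides the outcome, the reason returned alongside each prediction is the whole explanation, which is what makes this kind of model easy to read.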
Value-based approaches, by contrast, led to a system with a highly trained “spider-sense.”
Brian Christian • The Alignment Problem
For that matter, should we be using models at all?
Brian Christian • The Alignment Problem
tension at the heart of curiosity, almost a tug-of-war: As we explore an environment and our available behaviors within it—whether that’s the microcosm of an Atari game, the real-world great outdoors, or the nuances of human society—we simultaneously delight in the things that surprise us while at the same time we become harder and harder to ...
Brian Christian • The Alignment Problem
But what you could do, they reasoned, is have a human expert fly the maneuver and use inverse reinforcement learning to have the system infer the goal the human was trying to achieve.
Brian Christian • The Alignment Problem
“Knowledge is a fine thing quite capable of ruling a man,” Socrates says. “If he can distinguish good from evil, nothing will force him to act otherwise than as knowledge dictates, since wisdom is all the reinforcement he needs.”
Brian Christian • The Alignment Problem
that human infants as young as eighteen months old will reliably identify a fellow human facing a problem, will identify the human’s goal and the obstacle in the way, and will spontaneously help if they can
Brian Christian • The Alignment Problem
The whole idea behind the Arcade Learning Environment—and the thrilling achievement of DQN—was that of a single algorithm, able to master dozens of completely different game environments from scratch, guided by nothing but the image on the screen and the in-game score.
Brian Christian • The Alignment Problem
“stepwise relative reachability”: quantifying how many possible configurations of the world are reachable at each moment in time, relative to a baseline of inaction, and trying not to make that quantity go down, if possible.
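A heavily simplified sketch of the idea, assuming a tiny deterministic world that can be written out as a graph: at each step, compare the set of states still reachable after the chosen action with the set reachable after doing nothing, and count what was lost. The toy world, state names, and action names below are hypothetical, and the published measure is more refined than a plain set count (this sketch ignores discounting and stochastic transitions).

```python
from collections import deque

# Hypothetical toy world: an agent can reach its goal by walking around a
# vase or by smashing through it. A broken vase can never be restored.
TRANSITIONS = {
    "start": {
        "noop": "start",
        "go_around": "goal_vase_ok",
        "smash_through": "goal_vase_broken",
    },
    "goal_vase_ok": {"noop": "goal_vase_ok", "go_back": "start"},
    "goal_vase_broken": {"noop": "goal_vase_broken", "go_back": "start_vase_broken"},
    "start_vase_broken": {"noop": "start_vase_broken", "go_to_goal": "goal_vase_broken"},
}
NOOP = "noop"


def step(state, action):
    """Deterministic transition function for the toy world."""
    return TRANSITIONS[state][action]


def reachable(state):
    """All states reachable from `state` (including itself), via breadth-first search."""
    seen = {state}
    queue = deque([state])
    while queue:
        current = queue.popleft()
        for nxt in TRANSITIONS[current].values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reachability_penalty(state, action):
    """Number of states reachable after inaction but no longer reachable after `action`."""
    baseline = reachable(step(state, NOOP))
    actual = reachable(step(state, action))
    return len(baseline - actual)


if __name__ == "__main__":
    for action in ("noop", "go_around", "smash_through"):
        print(action, reachability_penalty("start", action))
```

In this toy world, walking around the vase loses nothing relative to inaction, while smashing through it cuts off every state in which the vase is intact, so an agent penalized by this count would prefer the first route even though both reach the goal.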