The Alignment Problem


Rule-based models are among the most easily interpreted machine-learning systems; they typically take the form of a list of “if x then y” rules.

Brian Christian • The Alignment Problem

Parker C
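
As an illustration of the "if x then y" form described in the highlight above, here is a minimal sketch of a rule list in Python. The feature names (`age`, `prior_admissions`), thresholds, and labels are hypothetical and chosen only for the example; they do not come from the book.

```python
# Hypothetical rule list: feature names, thresholds, and labels are invented
# for illustration only. Rules are checked top to bottom; the first match wins.
RULES = [
    ("age >= 75", lambda r: r["age"] >= 75, "high"),
    ("prior_admissions >= 2", lambda r: r["prior_admissions"] >= 2, "medium"),
]
DEFAULT_LABEL = "low"


def classify(record):
    """Return (label, rule_text) so the reason for each prediction is visible."""
    for rule_text, condition, label in RULES:
        if condition(record):
            return label, rule_text
    return DEFAULT_LABEL, "default"


if __name__ == "__main__":
    print(classify({"age": 80, "prior_admissions": 3}))
    print(classify({"age": 40, "prior_admissions": 0}))
```

Because the function returns the text of the rule that fired along with the label, every prediction can be traced to a single readable condition, which is the sense in which such models are easy to interpret.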

Value-based approaches, by contrast, led to a system with a highly trained “spider-sense.”

Brian Christian • The Alignment Problem


For that matter, should we be using models at all?

Brian Christian • The Alignment Problem


tension at the heart of curiosity, almost a tug-of-war: As we explore an environment and our available behaviors within it—whether that’s the microcosm of an Atari game, the real-world great outdoors, or the nuances of human society—we simultaneously delight in the things that surprise us while at the same time we become harder and harder to

Brian Christian • The Alignment Problem


But what you could do, they reasoned, is have a human expert fly the maneuver and use inverse reinforcement learning to have the system infer the goal the human was trying to achieve.

Brian Christian • The Alignment Problem


“Knowledge is a fine thing quite capable of ruling a man,” Socrates says. “If he can distinguish good from evil, nothing will force him to act otherwise than as knowledge dictates, since wisdom is all the reinforcement he needs.”

Brian Christian • The Alignment Problem


that human infants as young as eighteen months old will reliably identify a fellow human facing a problem, will identify the human’s goal and the obstacle in the way, and will spontaneously help if they can

Brian Christian • The Alignment Problem


The whole idea behind the Arcade Learning Environment—and the thrilling achievement of DQN—was that of a single algorithm, able to master dozens of completely different game environments from scratch, guided by nothing but the image on the screen and the in-game score.

Brian Christian • The Alignment Problem


“stepwise relative reachability”: quantifying how many possible configurations of the world are reachable at each moment in time, relative to a baseline of inaction, and trying not to make that quantity go down, if possible.

Brian Christian • The Alignment Problem
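
The idea in the highlight above can be sketched on a toy example. This is a simplified illustration, not the published method: it assumes a small deterministic world given as a dictionary of transitions, and it counts the states that are reachable from the inaction baseline but no longer reachable from the agent's current state. The state names and the `TRANSITIONS` graph are hypothetical.

```python
from collections import deque

# Hypothetical toy world: breaking the vase is irreversible, so the states
# where the vase is intact cannot be reached again afterwards.
TRANSITIONS = {
    "start": ["hall", "start"],
    "hall": ["start", "vase_broken", "goal"],
    "goal": ["hall"],
    "vase_broken": ["goal_broken"],
    "goal_broken": ["vase_broken"],
}


def reachable(transitions, start):
    """Return the set of states reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reachability_penalty(transitions, current, baseline):
    """Count states reachable from the baseline but lost from the current state."""
    lost = reachable(transitions, baseline) - reachable(transitions, current)
    return len(lost)


if __name__ == "__main__":
    # Baseline: the state the world would be in had the agent done nothing.
    print(reachability_penalty(TRANSITIONS, "hall", "start"))
    print(reachability_penalty(TRANSITIONS, "vase_broken", "start"))
```

In this sketch, moving to the hall costs nothing because every state is still reachable, while breaking the vase is penalized because the intact-vase states can no longer be reached.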
