
The Alignment Problem

“What is your baseline exactly?” Should the system measure impact relative to the initial state of the world, or to the counterfactual of what would have happened if the system took no action? Either choice comes with scenarios that don’t fit our intentions.
Brian Christian • The Alignment Problem
We don’t want a system that cures someone’s fatal illness but then—to nullify the high impact of the cure—kills them.
Brian Christian • The Alignment Problem
the reason we care about the Shanghai Stock Exchange, or the integrity of our cherished vase, or, for that matter, the ability to move boxes around a virtual warehouse, is that those things for whatever reason matter to us, and they matter to us because they’re ultimately in some way or other tied to our goals.
Brian Christian • The Alignment Problem
But having taken these unavoidably impactful steps, there is a new status quo—which means you shouldn’t necessarily rush out to commit more high-impact actions just to “offset” your previous ones.
Brian Christian • The Alignment Problem
For instance, he says, we might develop an index of “twenty billion” or so metrics that describe the world—“the air pressure in Dhaka, the average night-time luminosity at the South Pole, the rotational speed of Io, and the closing numbers of the Shanghai stock exchange”—and design an agent
Brian Christian • The Alignment Problem
“stepwise relative reachability”: quantifying how many possible configurations of the world are reachable at each moment in time, relative to a baseline of inaction, and trying not to make that quantity go down, if possible.
Brian Christian • The Alignment Problem
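A minimal sketch of the idea in the highlight above, not the published algorithm: model the world as a toy transition graph, count which states are reachable after an action versus after doing nothing, and penalize the action for every baseline-reachable state it forecloses. The state names and transition graph are made up for illustration.

```python
from collections import deque

def reachable(start, transitions):
    """BFS over a deterministic transition graph (state -> set of next states)."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for t in transitions.get(s, ()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def reachability_penalty(after_action, after_noop, transitions):
    """Fraction of states reachable under inaction that the action makes unreachable."""
    base = reachable(after_noop, transitions)
    actual = reachable(after_action, transitions)
    lost = base - actual
    return len(lost) / len(base) if base else 0.0

# Toy world: pushing a box into a corner is irreversible (can't pull it back out).
transitions = {
    "start":    {"box_mid", "cornered"},
    "box_mid":  {"start", "cornered"},
    "cornered": set(),   # dead end: nothing is reachable from here but itself
}

# Cornering the box loses 2 of the 3 baseline-reachable states;
# pushing it to the middle loses none.
print(reachability_penalty("cornered", "start", transitions))
print(reachability_penalty("box_mid", "start", transitions))
```

An agent trained with this penalty subtracted from its reward would prefer the “slightly more inconvenient route” that keeps the rest of the state space open — the stepwise variant re-computes the inaction baseline at each step rather than from the initial state.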
individual game’s rewards while at the same time preserving its future ability to satisfy four or five random auxiliary goals,
Brian Christian • The Alignment Problem
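One way to make the highlight above concrete (a sketch of the general idea, not the cited authors’ implementation): score each action by how much it shifts the agent’s attainable value on a handful of random auxiliary goals, relative to doing nothing, and subtract that shift from the task reward. The attainable-value numbers below are invented for illustration.

```python
def aux_penalty(attainable_noop, attainable_action):
    """Average absolute shift in how well each auxiliary goal could still be
    achieved, comparing the action against the do-nothing baseline."""
    n = len(attainable_noop)
    return sum(abs(a, ) if False else abs(a - b)
               for a, b in zip(attainable_noop, attainable_action)) / n

def shaped_reward(task_reward, attainable_noop, attainable_action, weight=1.0):
    """Task reward minus a penalty for disturbing auxiliary-goal attainability."""
    return task_reward - weight * aux_penalty(attainable_noop, attainable_action)

# Hypothetical attainable values (in [0, 1]) for five random auxiliary goals.
noop   = [1.0, 1.0, 0.5, 0.8, 0.2]
gentle = [1.0, 1.0, 0.5, 0.8, 0.2]   # this action leaves every auxiliary goal intact
smash  = [0.0, 1.0, 0.0, 0.8, 0.2]   # this action forecloses two of them

print(shaped_reward(1.0, noop, gentle))  # full task reward, no penalty
print(shaped_reward(1.0, noop, smash))   # same task reward, penalized by 0.3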
“I think the sokoban game that it was inspired by was already a very nice setting for illustrating irreversibility,” Krakovna says, “because in that game you actually want to do irreversible things—but you want to do them in the right order.”
Brian Christian • The Alignment Problem
But ideally, a more thoughtful, or considerate, or uncertain agent might notice, and take the very slightly more inconvenient route that doesn’t leave the world permanently altered in its wake.
Brian Christian • The Alignment Problem
“stepwise” baselines. Maybe certain actions are unavoidably high-impact based on the goal you’re setting out to achieve.