The Alignment Problem
You can do it by learning how much reward certain states or actions can bring (“value” learning), or by simply knowing which strategies tend on the whole to do better than which others (“policy” learning).
Brian Christian • The Alignment Problem
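The distinction in the quote above can be made concrete with a toy sketch: a value learner estimates how much reward each action brings, while a policy learner simply compares whole strategies by the return they earn. The two-armed bandit, its payout probabilities, and all names below are illustrative assumptions, not examples from the book.

```python
import random

random.seed(0)

# Toy two-armed bandit: arm 0 pays 1 with probability 0.3, arm 1 with 0.7.
def pull(arm):
    return 1.0 if random.random() < (0.3 if arm == 0 else 0.7) else 0.0

# "Value" learning: estimate the reward each action brings, then act on
# the estimates.
q = [0.0, 0.0]
alpha = 0.1  # learning rate
for _ in range(2000):
    arm = random.randrange(2)               # explore uniformly
    q[arm] += alpha * (pull(arm) - q[arm])  # nudge estimate toward sample

# "Policy" learning: compare whole strategies by average return, without
# ever estimating per-action values.
def evaluate(policy, trials=2000):
    return sum(pull(policy()) for _ in range(trials)) / trials

always_0 = lambda: 0
always_1 = lambda: 1
best = max([always_0, always_1], key=evaluate)

print([round(x, 2) for x in q])  # value estimates, roughly [0.3, 0.7]
print(best is always_1)
```

Both routes end up favoring arm 1 here; they differ in what is learned along the way, which is the point of the quote.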
These guesses fluctuate over time, and in general they get more accurate the closer you are to whatever it is you’re trying to predict.
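The quote above describes temporal-difference prediction. A minimal TD(0) sketch on an assumed five-state chain (ending in reward 1) shows the effect: after a few sweeps, the guesses nearest the outcome are already close to the true value, while earlier guesses are still catching up. The chain, learning rate, and sweep count are illustrative assumptions, not from the book.

```python
# TD(0) prediction on a deterministic 5-state chain; reward 1 arrives
# only at the end, and every state's true value is 1 (gamma = 1).
N, alpha, gamma = 5, 0.1, 1.0
v = [0.0] * (N + 1)  # v[N] is the terminal state, value 0 by definition

for _ in range(10):          # a handful of sweeps, deliberately few
    for s in range(N):       # walk the chain left to right
        reward = 1.0 if s == N - 1 else 0.0
        # Update toward the bootstrapped target r + gamma * v[s + 1]:
        v[s] += alpha * (reward + gamma * v[s + 1] - v[s])

# Estimates grow more accurate the closer the state is to the outcome:
print([round(x, 3) for x in v[:N]])
```

The printed values rise monotonically toward the goal, mirroring the quote: the guesses fluctuate as training proceeds, and accuracy arrives first near the thing being predicted.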
The first was that happiness was fleeting.
It might be easier, though, to shift to praising the state rather than the actions: instead of rewarding the act of cleaning up itself, we might say, “Wow, look how clean that floor is!”
“We often seem to be curious,” says Schulz, “about things that aren’t particularly novel—they just puzzle us.”
…learn a reward function that described what they actually wanted the helicopter to do, what sorts of “pseudoreward” incentives, if any, could they add such that the training process…
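The idea of praising the state rather than the act, and of adding pseudorewards that don't derail training, has a standard formalization: potential-based reward shaping (Ng, Harada, and Russell, 1999), where the bonus F(s, s') = γΦ(s') − Φ(s) depends only on states, not actions, and so cannot change which policy is optimal. The tiny gridworld and potential function below are illustrative assumptions, not examples from the book.

```python
# Potential-based reward shaping: the added bonus depends only on how
# "good" the states are (how clean the floor is), never on the action.
GOAL = 4
gamma = 0.9

def phi(state):
    # Potential: states closer to the goal look better.
    return -abs(GOAL - state)

def shaped_reward(state, next_state, base_reward):
    return base_reward + gamma * phi(next_state) - phi(state)

# A step toward the goal earns a positive bonus...
print(shaped_reward(2, 3, 0.0))  # 0.9 * (-1) - (-2) = 1.1
# ...and a step back earns a penalty. Along any trajectory the discounted
# bonuses telescope to a difference of potentials, independent of the
# actions taken, so the agent cannot farm pseudoreward by looping.
print(shaped_reward(3, 2, 0.0))  # 0.9 * (-2) - (-1) = -0.8
```

This is why rewarding "look how clean that floor is!" is safer than rewarding the act of cleaning: a state-based incentive shapes behavior without creating a loophole the learner can exploit.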
One, it shows us a reason—sparsity—why some problems or tasks are more difficult than others to solve or accomplish.
(Indeed, the severest punishment our society allows, short of death—solitary confinement—is, in effect, the infliction of boredom on people.)
A proper understanding of curiosity appears only to be possible at the interdisciplinary junction of all three.
In contrast, the agent is, almost cruelly, trapped inside a game it no longer has any drive to play.