![Preview of The Alignment Problem](https://m.media-amazon.com/images/I/91ph9dWiE0L._SY160.jpg)
No matter how good the learner is, though, they will make mistakes—whether blatant or subtle. But because the learner never saw the expert get into trouble, they have also never seen the expert get out.
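This is the distribution-shift failure of naive imitation learning (the problem methods like DAgger were designed to address). A minimal sketch, using an invented 1-D "road" task rather than anything from the book, of how a cloned policy drifts into states no demonstration ever covered:

```python
# The expert always steers gently back toward the center line (x = 0), so its
# demonstrations only ever cover near-center states. Once noise pushes the
# cloned learner further out, it has never seen how to recover.
import random

def expert_action(x):
    return -0.5 * x  # gentle correction toward the center

# Training data covers only the states the expert actually visits.
demos = [(x, expert_action(x)) for x in (random.uniform(-0.2, 0.2) for _ in range(100))]

def cloned_action(x):
    # Nearest-neighbor "clone": copy the expert's action from the closest seen state.
    nearest = min(demos, key=lambda d: abs(d[0] - x))
    return nearest[1]

x = 0.0
for _ in range(50):
    x += cloned_action(x) + random.gauss(0, 0.3)  # noise nudges the learner off-course

# The clone's corrections are capped at what it saw near the center,
# so it keeps drifting into territory no demonstration ever reached.
print(f"final distance from center: {abs(x):.2f}")
```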
After his Super Mario Bros. agent has played the game long enough, “It just starts to stay in the beginning. . . . Because there is no reward anywhere—everywhere error is very, very low—so it just learns to not go anywhere.”
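The agent described here is rewarded by its own prediction error, a common form of curiosity-driven exploration. A toy sketch, assuming a deterministic world and a lookup-table forward model (my simplification, not the researchers' code), of how that intrinsic reward collapses to zero once the model has learned the familiar states:

```python
def true_next(s, a):
    # The real (deterministic) environment dynamics.
    return s + a

model = {}  # learned forward model: (state, action) -> predicted next state

def intrinsic_reward(s, a):
    pred = model.get((s, a), 0.0)
    err = abs(true_next(s, a) - pred)   # curiosity reward = prediction error
    model[(s, a)] = true_next(s, a)     # the model learns from each visit
    return err

print(intrinsic_reward(0, 1))  # 1.0: a novel transition is rewarding
print(intrinsic_reward(0, 1))  # 0.0: once predicted, "there is no reward anywhere"
```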
What if they aren’t trying to do anything whatsoever, and their actions reflect random behavior, nothing more?
in many cases the equal-weighted models were better even than the optimal regressions
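This is Robyn Dawes's finding about "improper linear models." A hedged sketch, with synthetic data and an invented noise level, of why equal weights on standardized predictors can match or beat regression weights fit to a small sample once you evaluate out of sample:

```python
# With few, noisy training cases, OLS overfits its weights to the sample;
# equal weights cannot overfit, and often generalize as well or better.
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, p = 20, 10_000, 5
true_w = np.array([1.0, 0.8, 0.6, 0.4, 0.2])  # all predictors point the right way

def make_data(n):
    X = rng.standard_normal((n, p))
    y = X @ true_w + rng.standard_normal(n) * 2.0  # heavy noise
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

w_ols = np.linalg.lstsq(Xtr, ytr, rcond=None)[0]  # "optimal" in-sample weights
w_equal = np.ones(p)                               # improper equal weights

for name, w in [("fitted OLS   ", w_ols), ("equal weights", w_equal)]:
    r = np.corrcoef(Xte @ w, yte)[0, 1]            # out-of-sample correlation
    print(f"{name}: r = {r:.3f}")
```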
a system trained first on an easier form of a problem may be in a better position to learn a more difficult one than an agent trained from scratch.
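This is curriculum learning. One way to see the idea, in a sketch of my own rather than anything from the book, is as a continuation method: descend on a smoothed, easy objective first, then warm-start the hard objective from that solution:

```python
import numpy as np

def hard_loss(x):
    # Nonconvex: sin bumps create local minima; global minimum near x = 3.
    return (x - 3) ** 2 + 2 * np.sin(5 * x)

def easy_loss(x):
    # "Easier form of the problem": same global structure, bumps removed.
    return (x - 3) ** 2

def gradient_descent(loss, x, lr=0.01, steps=2000, eps=1e-4):
    for _ in range(steps):
        grad = (loss(x + eps) - loss(x - eps)) / (2 * eps)  # numerical gradient
        x -= lr * grad
    return x

x0 = -2.0
from_scratch = gradient_descent(hard_loss, x0)  # tends to stall in a local dip
curriculum = gradient_descent(hard_loss, gradient_descent(easy_loss, x0))

print(f"from scratch: x = {from_scratch:.2f}, loss = {hard_loss(from_scratch):.2f}")
print(f"curriculum:   x = {curriculum:.2f}, loss = {hard_loss(curriculum):.2f}")
```

The easy problem places the learner in the right basin; the agent trained from scratch never escapes the first local minimum it falls into.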
The impossibility proofs also show that equalizing the false positive and false negative rates means giving up on calibration—namely, the guarantee that for every numerical risk level, the chance of a defendant reoffending is the same regardless of gender or race.
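One form of the proof is Chouldechova's identity, which ties a classifier's false positive rate to the group's base rate and the positive predictive value (PPV) of a calibrated score:

```latex
% p is the group's base rate (prevalence of reoffending); the score is
% calibrated and binarized at a threshold.
\mathrm{FPR} \;=\; \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \bigl(1-\mathrm{FNR}\bigr)
% Hold PPV equal across groups (calibration) and force FNR equal as well:
% any difference in base rates p then forces a difference in FPR. All three
% properties can coexist only with equal base rates or a perfect predictor.
```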
If the reward is defined explicitly in terms of the end goal, or something fairly close to it, then one must essentially wait until random button-pressing, or random flailing around, produces the desired effect.
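A small simulation, with a made-up corridor task, of how long that random flailing takes to produce the first reward, and how quickly the wait grows with the distance to the goal:

```python
# With reward defined only at the end goal, a randomly acting agent gets
# zero learning signal until pure chance completes the task.
import random

def steps_until_reward(length, max_steps=100_000):
    """Random walk on a corridor 0..length; reward only at the far end."""
    pos = 0
    for step in range(1, max_steps + 1):
        pos = max(0, pos + random.choice([-1, 1]))
        if pos == length:
            return step      # the first and only reward signal
    return max_steps         # never rewarded at all

for length in (10, 20, 40):
    mean = sum(steps_until_reward(length) for _ in range(50)) / 50
    print(f"corridor {length}: ~{mean:.0f} random steps before any feedback")
```

Doubling the corridor roughly quadruples the wait, which is why sparse end-goal rewards make learning so slow.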
“I think one of the worst situations is to get no information about your level of progress.”
was an attempt to define what exactly we mean when we say that something is “interesting.”
Rule-based models are among the most easily interpreted machine-learning systems; they typically take the form of a list of “if x then y” rules.
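A minimal sketch of such a rule list; the conditions and thresholds below are invented purely for illustration:

```python
# An ordered list of "if x then y" rules, with a catch-all default at the end.
rules = [
    (lambda r: r["prior_offenses"] > 3, "high risk"),
    (lambda r: r["age"] < 21,           "medium risk"),
    (lambda r: True,                    "low risk"),  # default rule
]

def predict(record):
    # Evaluate rules top to bottom; the first match decides, which is
    # exactly why such lists are easy to read and audit.
    for condition, label in rules:
        if condition(record):
            return label

print(predict({"prior_offenses": 5, "age": 30}))  # high risk
print(predict({"prior_offenses": 0, "age": 19}))  # medium risk
```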