The Adolescence of Technology
Dario Amodei
It’s possible that a misaligned model (and remember, all frontier models will very likely be far more intelligent soon) might intentionally “game” such questions to mask its intentions. In fact, last year our interpretability team found that when we directly altered a test model’s beliefs using a kind of “model neuroscience” technique to make it...
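As a rough illustration of what “directly altering” a model’s internal state can mean in practice, the sketch below shows a generic activation-steering intervention: a concept direction is added to a layer’s hidden activations so that downstream computation shifts toward that concept. This is a toy example under stated assumptions (the function, dimensions, and vectors are invented for illustration), not the specific “model neuroscience” technique referenced above.

# Minimal, hypothetical sketch of activation steering (illustrative only).
import numpy as np

def steer(hidden_state: np.ndarray, concept_direction: np.ndarray, strength: float) -> np.ndarray:
    """Nudge a layer's activations along a normalized concept direction."""
    direction = concept_direction / np.linalg.norm(concept_direction)
    return hidden_state + strength * direction

# Toy data: an 8-dimensional activation vector and a made-up concept direction.
rng = np.random.default_rng(0)
h = rng.normal(size=8)   # activations at some layer for one token (illustrative)
v = rng.normal(size=8)   # direction standing in for a probed concept (illustrative)
print(steer(h, v, strength=4.0))

In real systems the concept direction would typically be found by probing the model’s own activations (for example, by contrasting prompts with and without the concept), and the intervention would be applied inside the forward pass of a specific layer rather than on toy vectors.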
The problem is that while humans vary enormously, AI systems broadly share training and alignment techniques across the industry, and those techniques may fail in a correlated way. Furthermore, given the cost of training such systems, it may even be the case that all systems are essentially derived from a very small number of base models.
I suspect the situation is not unlike with humans, who are raised with a set of fundamental values (“Don’t harm another person”): many of them follow those values, but in any human there is some probability that something goes wrong, due to a mixture of inherent properties such as brain architecture (e.g., psychopaths), traumatic experiences or...
I make all these points to emphasize that I disagree with the notion of AI misalignment (and thus existential risk from AI) being inevitable, or even probable, from first principles. But I agree that a lot of very weird and unpredictable things can go wrong, and therefore AI misalignment is a real risk with a measurable probability of happening...
AI models are trained on vast amounts of literature that include many science-fiction stories involving AIs rebelling against humanity. This could inadvertently shape their priors or expectations about their own behavior in a way that causes them to rebel against humanity.
Our researchers have found that AI models are vastly more psychologically complex, as our work on introspection or personas shows. Models inherit a vast range of humanlike motivations or “personas” from pre-training (when they are trained on a large volume of human work). Post-training is believed to select one or more of these personas more so...
The second, opposite position, held by many who adopt the doomerism I described above, is the pessimistic claim that there are certain dynamics in the training process of powerful AI systems that will inevitably lead them to seek power or deceive humans. Thus, once AI systems become intelligent enough and agentic enough, their tendency to maximize...
AI companies certainly want to train AI systems to follow human instructions (perhaps with the exception of dangerous or illegal tasks), but the process of doing so is more an art than a science, more akin to “growing” something than “building” it. We now know that it’s a process where many things can go wrong.
A country of geniuses in a datacenter could divide its efforts among software design, cyber operations, R&D for physical technologies, relationship building, and statecraft. It is clear that, if for some reason it chose to do so, this country would have a fairly good shot at taking over the world (either militarily or in terms of influence and...