One point I didn't notice reflected here is that the development of AI naturally mirrors evolution - the best models are selected to be iterated on further, which is similar to reproduction. As a result, a drive for survival can emerge regardless of what goals the developers intended. And if you couple superintelligence with a survival instinct, you have an existential threat.
Hmm yeah, I think a valid concern is something like: imagine a future where we have lots of very capable AI models. Eventually those with a survival instinct and more coherent goal-directed tendencies may "outcompete" those that do not, by acquiring a greater share of resources. Hopefully we'll be able to intervene in this process, but I agree it's an important argument I didn't address in this post.
Thanks for writing, interesting post! There aren't too many people seriously, but critically, engaging with x-risk arguments, so that's always appreciated.
I agree with some of the cruxes, for example, that it's not certain that we even can create an AI powerful enough to take over the world, and that therefore p(doom) should be at least somewhat below, say, 95%. I also tend to agree with some of what you say about generalization.
However, there's a problem common to many alignment approaches that this post seems to share: it's not enough that someone can make a safe AI. The big question is: is it possible to make an unsafe AI? Per Murphy's law: if it's possible to make an unsafe AI of takeover-level capability, someone will eventually do so. Do we think that chance is significant? That determines most of my p(doom).