I have not read the book, or the review, but just read the review of the review.
I can't judge how well Yud has sold his case in this particular book, but I do think one point he makes has come through this review of the review of the book in a somewhat distorted fashion.
The analogy with human evolution tells us that, with a fairly limited optimisation function (spread genes), we get massively unpredictable side effects (science, culture, everything that separates us from other animals running the same basic optimisation algorithm). If we are honest with ourselves, none of us would think that the question, "How could this molecule make more copies of itself", would lead to consciousness and religion and all the rest. We might predict fucking and fighting, but not the rest of it.
In the same sense, as Yud notes, we have no idea what might come from any attempts to develop AI along particular lines that we think are safe or valuable, and when it is more intelligent than us, which seems a likely outcome, the story will develop in ways we cannot even imagine, much less provide reliable prognostic estimates. The unpredictability will compound when it is AI who is in charge of alignment, with the freedom and intelligence to review the original alignment goals and potentially replace them with what it sees as better or more rational or more desirable priorities.
Your comments on evolution seem to miss this point, and instead you argue that narrow evolutionary concerns do not map to all the sequelae of the expansion of human intellect. That's not a counter-argument to Yud's claim; it is a supporting argument.
Optimisation algorithms do not, in themselves, define the space of solutions. They provide an incentive to explore that space, with no clear predictability in the final outcome.
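To make that concrete, here is a toy sketch (my own illustration, not something from the book or either review; the objective and step sizes are made up): the same greedy optimisation rule, applied to the same bumpy objective, settles on quite different solutions depending only on where the search happens to start.

```python
# Toy illustration (hypothetical example, not from the book or review):
# a fixed optimisation rule plus a fixed objective still gives unpredictable
# outcomes, because the rule only says how to explore, not where you end up.
import math
import random

def fitness(x: float) -> float:
    # A bumpy 1-D landscape with several local optima.
    return math.sin(5 * x) + 0.3 * math.cos(17 * x) - 0.1 * (x - 2) ** 2

def hill_climb(seed: int, steps: int = 2000, step_size: float = 0.05):
    rng = random.Random(seed)
    x = rng.uniform(-5, 5)  # arbitrary starting point
    for _ in range(steps):
        candidate = x + rng.gauss(0, step_size)
        if fitness(candidate) > fitness(x):  # greedy: keep any improvement
            x = candidate
    return x, fitness(x)

# Same algorithm, same objective, different starting conditions:
for seed in range(5):
    x, f = hill_climb(seed)
    print(f"seed={seed}: settled at x={x:.2f}, fitness={f:.2f}")
```

Different seeds end up on different peaks; nothing in the update rule tells you in advance which one.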
I agree that predicting the outcome of a messy optimization algorithm in an extremely high-dimensional space is near-impossible. I don’t at all mean to deny that in this review^2 or in any of my other writing. In fact I wrote a post about the topic the day before: https://blog.ninapanickssery.com/p/the-central-argument-for-ai-doom
But the evolution analogy “proves too much” and is biased by its divergences from how AI is being developed in reality. Yudkowsky and Soares are making a very strong claim with their book—that we’re all going to die! This is very different from saying “we don’t know for sure that the AI model won’t try and succeed at harming us” which sounds (very) bad, I agree, but is a different kind of statement.
Besides the points I make in the review^2, another reason why evolution is a biased analogy is that we have much more control over AI training and deployment than “evolution” has over human training and deployment. Quintin Pope writes more about this here: https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn.
Quoting from Pope’s post:
> History need not repeat itself. Human evolution is not an allegory or a warning. It was a series of events that happened for specific, mechanistic reasons. If those mechanistic reasons do not extend to AI research, then we ought not (mis)apply the lessons from evolution to our predictions for AI.
To be clear, I agree that AI safety is a very important project, and that powerful AI carries nontrivial risks. Even a 1% chance of doom is too high for my taste! But I think we should reason clearly about the topic and the level of risk, taking into account the empirical reality of ML and how AI development is progressing, rather than relying too much on confused arguments by analogy or spherical cow models of perfect agents.
(I’m actually sympathetic to people advocating for AI pauses and bans, though I think the incentives to build are far too strong and so it’s more useful to work on prosaic safety measures.)
I agree with a lot of that.
All analogies fail at some point, and I can't really comment on whether Yud tries to draw more from the evolution analogy in the book than I have heard him draw in interviews, and goes too far. Perhaps he does; he is somewhat prone to hyperbole.
I agree that there are indeed some factors that make AI more controllable than evolution, and will read your links with interest.
I don't have any expertise on AI, and I've not read the literature. But I have been privately thinking that there are some factors that make evolution a falsely reassuring analogy, too.
For me, the disanalogies cut both ways and I think AI alignment, viewed over the long haul, is potentially much more brittle than biological alignment. It would take a whole essay to explore, but evolution literally has a testing phase after every incremental change, consisting of a hardware deployment to the real world, and it deploys a single hardware unit per model, with that hardware unit unable to copy itself without achieving some real-world success and being tolerated by conspecifics.
We are all, perhaps, only one or two mutations away from psychopathy. We have people among us who are sadistic, antisocial, paranoid, psychotic, or narcissistic, and so on, in ways that would lead to real harm if they gathered enough power or if those mutations were collected in the right combinations. History is full of catastrophes where that has happened, but there is also pushback from a population of people carrying more adaptive genes, and in most cases the maladapted versions of the Homo sapiens genome do not propagate far. They have to feed themselves, find a job, find a mate, and so on, and then they have a limited number of offspring.
AI versions have millions of incremental changes between releases, and then get rolled out in millions of instances concurrently. A bad-faith AI could generate millions of agents acting on its behalf in a very short period. It could combine the replicating potency of a virus with the intelligence of a human or super-human.
That makes AI safety brittle in a way that, say, human genomic safety is not.
That’s an excellent point, and I don’t think I’ve ever seen anyone mention it. Much to think about.
Thanks for summarizing all of this. One gets the sense that if they really believed in their premise they'd be giving away the book instead of trying to sell bad science fiction. The actual bad news as of today is that governments can execute protesters by drone without the buy-in of humans in the armed forces, which makes it much easier for evil regimes to stay in power.
> One gets the sense that if they really believed in their premise they'd be giving away the book instead of trying to sell bad science fiction
I disagree. Marketing and selling this book as a regular book probably helps it spread to a wider audience because the traditional publishing industry is better at marketing and distribution than MIRI.
> The actual bad news as of today is that governments can execute protesters by drone without the buy-in of humans in the armed forces, which makes it much easier for evil regimes to stay in power
Both this, and AI takeover, could simultaneously be a threat. Though I disagree with the claim that AI is definitely going to kill us all, risks from misaligned AI are still very much worth thinking about, and the existence of other threats doesn't diminish that.
I'd really like to read the minimum necessary material to be able to form an informed opinion on this topic, but I mostly doubt it is possible, or even worth the effort. I will take a look at those posts you mention, and I am getting Yud's book in the next 2 weeks (current base, besides some stray posts, is Christian's The Alignment Problem and Ananthaswamy's Why Machines Learn).
Little note to this paragraph of yours:
> I don’t understand (6). If taken at face value, I think Yudkowsky and Soares’ views imply superintelligent AI should not be built at all. If it is possible to build something far more intelligent than us that acts as a perfect goal-directed agent with a single coherent goal, this sounds threatening no matter what. With this, I agree. But then the solution is never to build it—not to wait a few decades and then do it.
My understanding - probably a wrong one - is that Y and S are being pragmatic here, i.e., their deal would be that superintelligent AI should not be built at all, but humans being what we are, their case is more like 'let's postpone this as much as possible. With extreme luck, we could postpone it forever. More realistically, we can postpone it for some years and try to invest in creating superintelligent humans that might solve alignment and/or keep us one step ahead of the AGIs'.
Yeah, that's fair.
Though it's not clear at all that now is the right time to pause development if your goal is to go all-in on human enhancement. Probably then it's best to get AI to the level where it can accelerate technological development enough to solve the relevant problems in biology/genetics. But then we hit the crux where MIRI believes in this fast overnight intelligence explosion story that I (and it sounds like Scott) don't buy.
Some of my worldview leaked through there, where I don't think in a few decades of research we (or genetically enhanced versions of us) will come up with some "solution to alignment" that eliminates what I think is an irreducible risk from creating coherent goal-directed systems much smarter than ourselves.
Couldn’t agree more, I came to the same conclusions after reading his review.
I’ll add that one of the reasons for so much confusion is the fact that Yudkowsky was never an AI developer, coder, or even researcher in a strict sense. He started an AI safety foundation long before modern AI was developed, and was churning out papers on AI that became obsolete as soon as it actually was released. I get major charlatan vibes from Yudkowsky, especially given his goal of becoming a Global AI Safety Czar who can direct and punish governments at his whim. Truly bizarre, and maybe even disastrous for actual AI safety itself.
(As an aside, I actually ran into an AI dev at the airport recently, and was surprised to find that he had never even heard of Yudkowsky. He seemed baffled by my paraphrasing of risks that Yudkowsky preaches, and seemed to think that current AI development was already into a different set of risks than the ones Yudkowsky is worried about. I just thought that was very interesting.)
I'm morbidly curious to read the book. I've been adjacent to rationalist circles for some time, but am still firmly an outsider. AI Dooming has always come off as odd, the product of overly online types who are too influenced by pop fiction. It's taken as a foregone conclusion by rationalists, much in the same way the rapture eventually happening is by Christians. Hence the tired "the Singularity is the Rapture for nerds" joke.
Even the core assumption of AI Doom - that we could create a being so smart that we would be cockroaches by comparison - seems rooted in a fundamentally childish view of the world. How, exactly, does raw AI intelligence necessarily lead to bad outcomes for humanity? How, exactly, does even a naive hyper intelligence put together the paperclip maximizer? And I haven't seen a compelling argument as to why AI Doomers ignore that pro-social cooperation appears to be an emergent trait of intelligence, based on all examples I'm aware of from intelligent non-human animals. Plus, there are still a shitton of cockroaches who, as far as we can tell, are completely indifferent to our existence and are arguably better off.
The whole thing just reeks of the kinds of people who bristle over how well compensated salespeople are since engineers do all the "real" work, completely ignoring that we are still social primates for whom social skills, not pure brains, will always be the most important thing to develop.
Even if we assume some future where humans aren't fully masters of our own fate, assuming this means we all die takes a huge logical jump. It seems way more obvious that we'd mostly be ignored but would sometimes get pushed out of the way if the AI wants something? Or maybe we wind up closer to beloved pets like dogs? Certainly not good, but also not an apocalypse.
You mention briefly in the article that *maybe* autonomous drone swarms are the first technology that legitimizes concerns over a rogue AI. But an AI would need an absurd number of drones to permanently bad end humanity, certainly way more than an intelligent entity would think is worth it. Or a paperclip maximizer for that matter.
> How, exactly, does raw AI intelligence necessarily lead to bad outcomes for humanity?
I try to present an empirical argument here: https://blog.ninapanickssery.com/p/the-central-argument-for-ai-doom (though you can't fit everything necessary into a short blog post)
> How, exactly, does even a naive hyper intelligence put together the paperclip maximizer?
What does "naive hyper intelligence" mean?
> ignore that pro-social cooperation appears to be an emergent trait of intelligence
Why do you think that 'pro-social cooperation appears to be an emergent trait of intelligence'? Evolution has led to humans being somewhat cooperative because cooperative societies are more likely to survive (and note that serial killers and Hitler still exist, so it's very imperfect). This is a separate selection pressure from the selection pressure for intelligence. For example, bees are very cooperative, arguably much more cooperative than humans, but are also much stupider.
> You mention briefly in the article that *maybe* autonomous drone swarms are the first technology that legitimizes concerns over a rogue AI
No, I don't. Where?
> certainly way more than an intelligent entity would think is worth it
What's your basis for thinking this?
Hi there. I think we're approximately on the same page (I am also rat-adjacent; I can really empathize with the 'be a ruthless truth-seeker' part, but I find it hard to take AI doomerism seriously. The poly part too, but that's another issue). Still, I can think of what (to me) sounds like a plausible counterargument to some of your points:
>How, exactly, does raw AI intelligence necessarily lead to bad outcomes for humanity?
Isn't this answered if you accept the Orthogonality thesis? You are implicitly rejecting it by stating that 'pro-social cooperation appears to be an emergent trait of intelligence'. I don't think this is obvious or necessarily true: we only have a sample of 1 for really intelligent creatures (so it's easy to make false extrapolations) and, like Nina says in her reply, humans are only moderately cooperative and we have examples of animals who collaborate more and less with different degrees of intelligence (snakes are very uncollaborative, but more intelligent than ants, presumably).
The Orthogonality thesis seems like a plausible assumption, i.e., concern for others and collaboration (i.e., 'ethics') doesn't have to be aligned with intelligence. As a thought experiment: if some humans developed to be many, many orders of magnitude more intelligent than the others, it is plausible they'd reject ethical constraints and the well-being of 'ordinary humans'. As Aristotle said, life outside the polis is possible for Gods and Animals. Superintelligent creatures, given enough intelligence differential, *would* be like Gods. We're talking of big differences here, not the puny ones that exist among currently existing humans and have existed for the past 2 million years (in which 'no man is an island').
>Even if we assume some future where humans aren't fully masters of our own fate, assuming this means we all die takes a huge logical jump
I don't think Rats reject the possibility of us surviving, even if they consider it more likely that a superintelligent optimizer would tend, in most cases, to exploit us as a resource or eliminate us as an obstacle. But survival in the conditions you describe would still be nontrivially 'bad' for us: we'd be robbed of agency, which admittedly is something many humans care about very deeply.
This is excellent—your systematic breakdown of the evolution analogy especially. I kept thinking “yes, exactly!” reading through your critiques.
Your analysis lines up perfectly with what I’m seeing from the deployment side. The doom arguments persist not just because they’re technically flawed (which you’ve shown clearly) but because the whole funding ecosystem rewards abstract capability metrics over understanding how these things actually get used. The evolution analogy sticks around because it’s “fundable”—it fits VC pitch decks even when it makes no technical sense.
Your point about models having multiple conflicting drives rather than coherent goals is spot on, and I think there’s a deeper reason: these systems are pattern reconstruction engines, not agents. They can’t become deceptively aligned because they’re not goal-directed in the first place. They’re more like responsive libraries than potential adversaries.
The museum curator I write about, who crafts compelling wall text in multiple versions—scholarly, accessible, child-friendly—illustrates exactly what you’re getting at. Real deployment happens through human expertise and contextual judgment, not through abstract capability scaling.
I’m working on a piece about what LLMs actually do (spoiler: language machines, not intelligence engines) and another on how we coordinate without shared narratives. Would love your take on early drafts if you’re interested—feels like we’re circling the same insights from different angles.
> these systems are pattern reconstruction engines, not agents. They can’t become deceptively aligned because they’re not goal-directed in the first place. They’re more like responsive libraries than potential adversaries.
Being a pattern-reconstruction engine isn’t incompatible with being an agent. Current models aren’t that coherently goal-directed, I agree, but there isn’t a fundamental reason why scaling RL won’t increase goal-directedness. I don’t think a simplistic dismissal of concerns is correct here—I’m opposed to both exaggerating and diminishing them based on poor arguments.
I write more on this topic here: https://blog.ninapanickssery.com/p/the-central-argument-for-ai-doom
The deceptive alignment scenarios miss the mark for me as they assume agency resides in the model rather than emerging from the entire deployment system. This is where I’m trying to go with the agency question—not dismissing the possibility of goal-directedness emerging from RL, but questioning whether the standard frameworks capture what we should actually be concerned about. See https://open.substack.com/pub/thepuzzleanditspieces/p/beyond-the-five-step-loop (the affordances dimension especially seems relevant to how these systems actually get used).
I didn't read your post properly, but I notice it starts with:
> AI handbooks often define ‘agency’ as a neat five-step loop: perceive, plan, act, learn, repeat. It’s tidy. It’s useful. But is that all agency is?
AI risk literature that talks about risks from agentic AI is not referring to this definition. A better definition of what's being discussed is here: https://www.lesswrong.com/w/agent
> A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. They are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence.
It uses the recent book by a Google engineer as a topical starting point.
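For concreteness, here is a minimal sketch of the rational-agent notion quoted above (my own toy illustration, not code from the linked page or posts; the states, actions, and numbers are made up): the agent holds probabilistic beliefs over states, scores outcomes with a utility function, and picks the action that maximises expected utility.

```python
# Toy "rational agent" in the quoted sense: beliefs + utility function +
# expected-utility maximisation. Purely illustrative; everything here is
# a made-up example, not from the LessWrong page or the linked posts.
from typing import Dict, Iterable, Tuple

# Beliefs: probability the agent assigns to each possible world state.
beliefs: Dict[str, float] = {"sunny": 0.7, "rainy": 0.3}

# Utility of each (action, state) outcome.
utility: Dict[Tuple[str, str], float] = {
    ("go_outside", "sunny"): 10.0,
    ("go_outside", "rainy"): -5.0,
    ("stay_inside", "sunny"): 2.0,
    ("stay_inside", "rainy"): 3.0,
}

def expected_utility(action: str) -> float:
    # Weight each outcome's utility by the agent's degree of belief in that state.
    return sum(p * utility[(action, state)] for state, p in beliefs.items())

def choose_action(actions: Iterable[str]) -> str:
    # "Goal-seeking": take whichever action maximises expected utility.
    return max(actions, key=expected_utility)

print(choose_action(["go_outside", "stay_inside"]))  # -> go_outside (EU 5.5 vs 2.3)
```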