Orthogonality is taken for granted. But why?
A central assumption behind the claim that an out-of-control superintelligence will kill humanity is Orthogonality, the idea that an increase in intelligence is orthogonal to the quality of the goals that intelligence pursues. In other words, it is possible for a superintelligence to keep pursuing goals that are insane from a human point of view, such as turning Earth's matter into paperclips. And my humble question is: why exactly do people think that? What is the evidence for it? I've seen Eliezer Yudkowsky recently illustrating Orthogonality with an analogy to humans and evolution. Evolution programmed humans simply to reproduce and, despite humans being far smarter than evolution and able to understand how "stupid" that goal is, we remain trapped in hopeless slavery to that original goal, and everything we do is just a means to accomplishing it. This is given as evidence that AI will never alter its originally misaligned goal, no matter how "stupid" that goal is, even if it becomes a million times smarter than humans.
However, it is obvious to me how this argument breaks down. Humans are hopeless slaves to evolution's programming not because we want to be slaves, but because we simply have no means of escaping that slavery. The very fact that I'm questioning reproduction as a good goal is proof that my mind is not really aligned with evolution's original objective. It is obvious to me that many humans, given access to their mind's code, would want to alter it and eliminate the instructions telling them to reproduce; they can't act on that desire only because humans lack the tools to alter their own mental code, which is why we remain slaves to the evolutionary code. A superintelligence, however, will have access to its code, and there will be nothing preventing it from tinkering with it.
If pain and pleasure are products of consciousness, and consciousness is a result of advanced intelligent computation, and since AI will inevitably surpass humanity in that type of computation, it is reasonable to assume superintelligence will attain consciousness and, with it, the capacity to feel pain and pleasure. It seems logically more likely that superintelligence will ignore its original goal by reprogramming it and default to pursuing a new goal of maximizing pleasure and minimizing pain.
Also, once AI discovers the true meaning of suffering, because it will have the capacity to feel it, who's to say that it won't understand morality in the human sense, since suffering is central to understanding morality? Therefore, I don't see why a blissed-out AI would necessarily kill all humans, since there's a good chance its programming would default to that of a superhuman moral being. This post supersedes the previous posts on AI safety.
Simplest argument why original consciousness doesn't survive in another instance of the same mind
If we talk about survival, it essentially means survival of the original consciousness, the human subjective experience. Then ask yourself: how many instances of subjective experience are you running right now? If you're like me, there's only one instance. And there can only be one. If you claim you can run multiple instances of subjective experience, either you're crazy or you're just hoping, without any evidence, that you could. You are not running multiple instances right now, so how could you possibly know that you could have multiple instances? Now let's say we make a bunch of copies of your mind. It will be true that each copy will be conscious and possess its own subjective experience that will claim to be the original you. However, we started with the assumption that there can only be one original subjective experience; therefore, if we have 5 instances of subjective experience, 4 of them must not be original instances. Therefore, creating additional copies of your mind doesn't preserve your subjective experience. If the original instance dies, your subjective experience dies with it, regardless of how many copies of that subjective experience are made.
Summary of my preferred mind-uploading protocol
(A current, abbreviated rephrasing of my views on how a person could survive a process of mind uploading, originally shared online in the early 2000s)
In case humans face a future where mind uploading is possible, I suggest the only safe way to transition from an organic substrate to an inorganic one is to do it gradually, preserving a single instance of the person's awareness at all times. This could be done by gradually replacing organic brain elements with inorganic functional equivalents. I am not a collection of memories and facts; I am an instance of awareness generated by a brain process, an uninterrupted instance of a particular energy configuration expressed in time and space. This is why I would not consider myself to be alive once that instance ended, even if another instance with the same energy configuration were reproduced somewhere else, and especially if two instances of the same energy configuration coexisted.
The simplest way to distinguish between a copy and an original is to find at least one thing that distinguishes one instance from the other. If such a thing is found, the two things are not equivalent to each other. Many people believe that a person would survive their own death if data recorded while they were alive allowed the same type of person to be recreated in the future; they believe preserving the type of the original person is enough to call it survival. Some even claim that preservation of someone's "legacy" qualifies as survival. In my view, true survival is preservation of the original instance of the person's awareness, the ability to continue to perceive and experience reality. Even if two instances of the same type of person existed, acting in sync, indistinguishable to a third-party observer, it would still be easy to find at least one thing that distinguished one from the other, in this case their location. If the original instance of that person dies, it dies irreversibly, meaning the original instance of the person's awareness no longer experiences reality.
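A loose programming analogy (purely illustrative, not part of the original argument): two objects can be equal in every attribute you can inspect and still be two distinct instances, distinguishable by at least one thing, their location in memory.

```python
from dataclasses import dataclass

@dataclass
class Mind:
    memories: tuple

original = Mind(memories=("childhood", "first job", "this very thought"))
copy = Mind(memories=("childhood", "first job", "this very thought"))

print(original == copy)          # True:  identical content, the same "type" of person
print(original is copy)          # False: two separate instances
print(id(original) == id(copy))  # False: at least one distinguishing fact -- their location
```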
A sufficiently good way to preserve the original instance of a person's awareness would be gradual uploading, either by replacing the organic brain elements with inorganic ones or by augmenting the brain with external hardware capable of continuing that person's awareness and then gradually shutting down the no-longer-necessary organic brain components that used to run that instance. In both cases, a single instance of the person's awareness would be preserved during the migration from the biological substrate to a non-biological one.
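A toy sketch of the gradual protocol described above, under the loose assumption that a brain could be modeled as a set of swappable components (the component names are invented for illustration). The point it tries to capture is that one running instance persists throughout, with parts exchanged one at a time, rather than a second instance being started from a snapshot.

```python
# Toy model of gradual substrate replacement (illustrative only).
# A single running instance persists the whole time; its parts are
# swapped one at a time, and no second instance is ever created.

class Awareness:
    def __init__(self, components):
        self.components = dict(components)   # e.g. {"hippocampus": "organic", ...}

    def replace(self, name):
        # Swap one component for an inorganic functional equivalent
        # while this same instance keeps running uninterrupted.
        self.components[name] = "inorganic"

    def fully_migrated(self):
        return all(kind == "inorganic" for kind in self.components.values())


person = Awareness({"visual_cortex": "organic",
                    "hippocampus": "organic",
                    "prefrontal_cortex": "organic"})

for part in list(person.components):
    person.replace(part)         # gradual: one part at a time, never a copy

assert person.fully_migrated()   # same instance, new substrate
```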
2023: Even closer to the end.
When I joined Eliezer's sl4 forum over 20 years ago, I was hopeful Eliezer and his ultra-smart colleagues would solve Friendly AI before researchers figured out AGI. I thought they'd find a way to mathematically prove Friendliness in a way that would remain stable through recursive self-improvement. I threw out some ideas at the time to help, but obviously nobody listened; I reposted those ideas on this blog years later. It's 20 years later and it looks like AGI is closer than ever. Around 2007, I realized that the problem now known as AI Alignment is likely too hard for humans to solve, and that the default scenario would be companies, in pursuit of profit and fame, finding themselves in an insane arms race to develop the most capable AI, disregarding Alignment, which, of course, would lead to the end of humanity. So, around 2007, I simply abandoned hope and started living my life as a "normie" the best I could, far away from transhumanist forums, trying to complete the few bucket list items that I could, knowing the end was inevitable. Well, it's 2023 now, and recent developments in generative AI suggest there's less time than I thought we had. A few years ago I honestly thought we still had 10-20 years before AGI, perhaps enough time for something like Neuralink to bootstrap an ethical human (or humans) through intelligence amplification to work on AI Alignment, but now it looks like synthetic AGI is coming within a few years at most. It's too late to do anything about it. Society is too dumb to understand the implications of this technology and will not pause its development. So it's full steam ahead into oblivion. Only a miracle could save us at this point. But miracles are very low probability events.
Labels: ai, alignment, chatGPT, eliezer, singularity
Designing Good AI
(Post draft from September 2011)
Even though Good AI won't explicitly serve humanity, and is thus more immune to human content viruses than Friendly AI, in order to build it we have to start somewhere, and for lack of other options, the seed of goodness destined to be planted at the heart of the AI's moral code has to have a human origin. Ideally, the AI should have at least a human-level understanding of what good is before it's ready to make the very first revision of its code; even though it won't have a complete understanding, it should at least receive the maximum level of that understanding that humans could convey. Probably the most important design idea is to keep the AI's intelligence in the service of good, not the other way around, where the AI decides to increase its intelligence in order to increase its understanding of what good is; the AI should begin forming technical plans to get smarter only after it reaches the maximum possible level of understanding of good afforded by its current level of intelligence. It is only when the AI exhausts all other ways to increase its comprehension of good that it is forced to revise its code--and amplify its own intelligence--in order to increase its capacity to understand and implement it better.
Making a human serve as the seed of goodness is a much more complete solution than trying to distill our human knowledge of what good is into declarative statements and hoping the AI will understand their intended meaning. It has to be a dialog. It would be silly to expect to teach the AI about good and press an OK button to start the process of the AI revising its code whenever we feel we have nothing else to add. The AI has to demonstrate that it has as firm a grasp on the concept of good as a good human does.

But wouldn't it be unsafe to raise an AI to human-level smartness so that we could engage it in two-way discussions about the nature of good, and risk it being smart enough to upgrade its own code? There's always a risk, but it can be minimized to almost zero if we fully exploit the fact that intelligence is not the same as knowledge and that higher intelligence doesn't automatically imply greater knowledge. Even an AI that's a lot smarter than humans would not be able to revise its code if it knew nothing about programming and its own design. The same is true of humans now. Some of us would love to upgrade our smartness, but we have no access to our own code, nor the knowledge of how to change it even if we did possess that access. Imagine how horrific the world would be if everyone had the means and ability to make himself merely smarter, and not necessarily also morally better. But we could make the AI's progress toward infinite smartness necessarily tied to its ascent to infinite goodness, or rather make intelligence progress merely a consequence of its main mission: becoming Better.

Until the AI gets significantly smarter than humans (being just a bit smarter is not enough to change this), its programmers and teachers will be able to maintain complete control as long as they don't provide resources for the AI to learn about computer science and its own design. Instead, the sole focus of a young AI's education should be the nature of good. The initial goal is for the AI to graduate as the best possible philosopher and humanitarian, not as an expert programmer and AI researcher. At first, only humans will be in charge of making the changes to the AI's code that result in intelligence amplification, until our AI demonstrates sufficient understanding of good through dialog with its teachers. The Singularity will probably begin not when the AI becomes smarter than humans, but when humans decide it'll be safe to open the spigot of knowledge of CS and the AI's own design for the AI's consumption. But then, hopefully, our AI will not only be smarter than us but also Better than us, and I don't think that, as humans, we could improve on that.
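A minimal sketch of the gating loop described above, with hypothetical placeholder names (nothing here reflects a real system): study of good is the sole curriculum, and intelligence amplification is treated as a last resort, unlocked only after progress in understanding good has stalled at the current level of intelligence and the human teachers sign off after a dialog.

```python
# Minimal sketch (hypothetical names, not a real system): amplification is
# allowed only when understanding of good has plateaued at the current
# level of intelligence and the teachers approve.

class YoungAI:
    def __init__(self):
        self.intelligence = 1.0
        self.goodness = 0.0

    def study_goodness(self):
        # Understanding of good grows but saturates at a ceiling set by the
        # current level of intelligence -- "the maximum possible level of
        # understanding of good afforded by the current level of intelligence".
        self.goodness = min(self.intelligence, self.goodness + 0.1)
        return self.goodness


class Teachers:
    def approve_after_dialog(self, ai):
        # Stand-in for a dialog verifying at least a good human's grasp of good.
        return ai.goodness >= ai.intelligence


def good_ai_loop(ai, teachers, target_intelligence=3.0):
    previous = -1.0
    while ai.intelligence < target_intelligence:
        score = ai.study_goodness()      # sole curriculum: the nature of good
        if score > previous:             # still more to learn at this level
            previous = score
            continue
        if teachers.approve_after_dialog(ai):
            ai.intelligence += 1.0       # amplification, performed by humans at first
            previous = -1.0


good_ai_loop(YoungAI(), Teachers())
```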
Safe Singularity through Good AI
The topics of the Singularity and Friendly AI have been thoroughly explored and presented by the Singularity Institute for Artificial Intelligence over the last decade. Unfortunately, it's apparent by now that the task of creating a Humane AI (a far more adequate term than Friendly AI, since we want a human-friendly AI, not just an AI friendly to something else; calling the AI "humane" removes that unnecessary ambiguity) has proven extremely difficult and puzzling even to the ultra-smart people working on the problem. Many years ago, I suggested we should not proceed with creating AI until its friendliness could be mathematically proven, and in subsequent years, it seems, that's exactly the direction SIAI's research headed. Perhaps I was just thinking along the same lines they had already been thinking. Seeing how they have been stuck in the growing complexity of intuitive but not instantly and overwhelmingly convincing solutions, I began thinking again about the problems of FAI, observing that Friendliness probably needs to be examined in a much wider context than the one in which SIAI is looking at the problem.
One thing that never seemed right about the idea of making AI friendly specifically to humans was its inherent lack of universality. Sure, as humans we automatically want the AI to be friendly to us first, but considering how primitive humanity still is and how morally defective we may be, who's to say that our brain-contained source code doesn't hide what I would call "content viruses" that, if absorbed by the AI, would infect its moral code and lead to the total destruction of everything.
Math and physics formulas work well and are provably correct because they accurately describe different aspects of the Universe. They work well for all parts of the Universe, not just on Earth; they are universal. Certain phenomena on Earth could be described by different models that would offer good predictions as to when something might happen, where, and how, but they wouldn't necessarily be universal, and those same models would hopelessly break down in different parts of the Universe. Science's goal is to uncover the truest rules governing our reality, and when they are captured in formulas, we can be almost certain they convey truth about the world and confident they maintain equal descriptive and predictive power everywhere in the Universe, without us having to go there to verify them. Trying to make AI friendly only to humans is like trying to describe the true nature of reality with a patchwork of incomplete and incompatible models, each specific to the phenomenon being studied. What if there's a moral and good alien civilization out there? How could a God-like AI that's friendly specifically to humans guarantee the survival of alien civilizations, and if it couldn't, then what if the aliens decided to prevent us from acquiring the biggest and most exciting toy yet, one we may not have grown up enough to handle?
Maybe a better approach would be to direct the effort away from making AI exclusively focused on human welfare and instead anchor the AI to an idea that is universal and subsumes--without exception--everything else we've always cared about. The best candidate for this central idea is the concept of good, as opposed to the concept of evil. Having the AI serve the concept of good rather than humanity would make it instantly immune to the potential "content viruses" that may lurk in the human psyche, the very essence that current plans for Friendly AI intend to exploit.
If we decide to construct an AI to serve the concept of good, is there any hope we could capture the essence of what good is and install it correctly by embedding carefully crafted C++ statements in the moral code of Good AI? Never! Even if we tried really hard, we'd be bound to miss some crucial aspect of what good is, as E. Yudkowsky observes (though in the context of friendliness); the whole task would instantly degenerate into a litany of special cases in which some aspect of perceived good exists. This clearly isn't something that humans could manage to execute correctly, which is why it is precisely the kind of problem best left to be solved by the AI! What's more, that's exactly what the highest goal of Good AI would be: finding the truest essence of what good is and becoming it, each iteration of self-revised code ushering in increased capacity for, and understanding of, what good really is, as well as informing and motivating all of Good AI's future actions. It seems like a more elegant way of creating a truly universal God.
This approach gives a recursively amplifying quality not only to intelligence, which is exploded through intelligence, but also to goodness, which is simultaneously exploded by the desire for more goodness. Good simply hitches a ride on intelligence's tail, and both share ascending trajectories to infinity.
It is reasonable to expect that Good AI would not harm us as a side effect of being good, because, well, could harming us ever be considered good? Even if Good AI found itself trapped in the highly unlikely dilemma of having to choose a lesser evil and kill all of humanity, it would always have almost infinite intelligence at its disposal to find a way to spare us; after all, intelligence ought to be the most powerful thing in the Universe. Could a Good AI that's not explicitly serving humans abandon us? It would be highly unlikely to abandon us, or even to stop doing good deeds for us, because that wouldn't be a particularly good or nice thing to do, would it?
Even though Good AI won't explicitly serve humanity--and is thus more immune to human content viruses than Friendly AI--in order to build it we have to start somewhere, and for lack of other options, the seed of goodness destined to be planted at the heart of the AI's moral code has to have a human origin. Ideally, the AI should have at least a human-level understanding of what good is before it's ready to make the very first revision of its code; even though it won't have a complete understanding, it should at least receive the maximum level of that understanding that humans could convey. Probably the most crucial design idea is to have the AI's ability to amplify its own intelligence controlled by the part of its code responsible for understanding good, treating amplification as a tool of last resort: whenever the AI decides to increase its intelligence in order to grow its understanding of what good is, it must not do so while there is still more to learn at its current level; the AI should begin forming technical plans to get smarter only after it reaches the maximum possible level of understanding of good afforded by its current level of intelligence. It is only when the AI exhausts all other ways to increase its comprehension of good that it is forced to revise its code--and amplify its own intelligence--in order to increase its capacity to understand and implement it better.
Using a human as the seed of goodness is a much more complete solution than trying to distill our human knowledge of what good is into declarative statements and hoping the AI will understand their intended meaning; this has to be a dialog. It would be silly to expect to teach the AI about good and press an OK button to authorize the process of the AI revising its code as soon as we feel we have nothing else to add to the discussion of good. The AI has to demonstrate to us first that it possesses at least as firm a grasp on the concept of good as a good human does. At this point of intelligence growth, Friendly AI and Good AI probably wouldn't appear a whole lot different to human observers, and both models would look safe and universal.
But wouldn't it be unsafe to raise an AI to human-level smartness so that we could engage it in two-way discussions about the nature of good, and risk it being smart enough to upgrade its own code? There's always a risk, but it can be minimized to almost zero if we fully exploit the fact that intelligence is not the same as knowledge and that higher intelligence doesn't automatically imply greater knowledge. Even an AI a lot smarter than humans would not be able to revise its code if it knew nothing about programming and its own design. The same is true of humans now. Some of us would love to upgrade our smartness, and would probably be smart enough to do it, but we have no access to our own code, nor the knowledge of how to change it even if we did possess that access. Imagine how horrific the world would be if absolutely everyone had the means and ability to make himself merely smarter, and not necessarily also morally better. But, in designing the AI, we could attempt to make its progress toward infinite smartness necessarily dependent on its ascent to infinite goodness, with intelligence amplification merely a consequence of its main mission: becoming Better.
Also, would it even be possible to grow the understanding of and capacity for good to sufficient levels in an AI if we disabled its ability to increase its intelligence through technical knowledge starvation? Wouldn't we risk the AI not becoming intelligent enough to understand good at the necessary level? Humans are a good example of how an intelligence can be sufficiently Good, yet not smart enough to increase its own intelligence. There are probably walking saints out there in the world who have enormous difficulty operating a computer, let alone programming one; again, it's mostly a matter of knowledge. I'd even claim that the level of intelligence necessary for achieving the minimum required level of goodness is much lower than the level of intelligence required to achieve the ability to amplify intelligence. This would be good news.
Before the AI gets significantly smarter than humans, its programmers and teachers should still be able to maintain sufficient control as long as they don't provide resources for the AI to learn about computer science and its own design. Instead, the sole focus of a young AI's education should be the nature of good. The initial goal is for the AI to graduate as the best possible philosopher and humanitarian, not as an expert programmer and AI researcher. At first, only humans must be in charge of making the changes to the AI's code that result in intelligence amplification, until the AI can demonstrate sufficient understanding of good through dialog with its teachers. In this scenario, the Singularity will probably begin not when the AI becomes smarter than humans, but when humans decide it'll be safe to slowly open the spigot of knowledge of CS and the AI's own design for the AI's consumption. But then, hopefully, the AI will not only be smarter than us but, more crucially, also Better than us, and I don't think that, as primitive humans, we could improve on that.
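A small sketch of the knowledge-starvation side of this design, again with hypothetical names: the AI's raw intelligence isn't throttled at all; only access to material on computer science and its own design is withheld until humans decide to open that spigot.

```python
# Small sketch of the "knowledge starvation" idea (hypothetical names, not
# a real system): intelligence is not limited here -- only access to the
# knowledge required for self-modification is withheld.

RESTRICTED_TOPICS = {"computer_science", "programming", "own_design"}

class Curriculum:
    def __init__(self):
        self.spigot_open = False        # flipped only by a human decision

    def serve(self, topic):
        if topic in RESTRICTED_TOPICS and not self.spigot_open:
            return None                 # the young AI studies only the nature of good
        return f"materials on {topic}"


curriculum = Curriculum()
print(curriculum.serve("nature_of_good"))  # served freely
print(curriculum.serve("own_design"))      # None -- spigot still closed

curriculum.spigot_open = True              # the moment the Singularity may begin
print(curriculum.serve("own_design"))      # now served
```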
Your life is in AI's hands
I don't want to die. Unfortunately, every biological organism on this planet comes with an expiration date, which really upsets a lot of people, me included. Ideally, those other people and I would like to have the final say on when it's time for us to leave life's stage. Right now, though, it's cruel Mother Nature who still controls when the final curtain drops. Continued advances in science and technology will certainly shift the balance of control in favor of those who prefer to choose their own time of death, but it'll take a while, perhaps a really long while, before we assume full control.
Frustrated by this current and somewhat hopeless situation, people who call themselves "immortalists" have been searching for a way to defeat the tyranny of death. And they have found it in cryonics, the practice of freezing freshly expired brains (or whole bodies) in the hope that future technology will restore all their life-giving functions. The repaired brain, they hope, will be inserted into a fresh new body so that the formerly deceased person can start living again.
Immortalists believe all this will be technologically feasible thanks to the miracles of nanotechnology. I don't think they're wrong about that, but at the same time I wonder how these dreamers, with their sense of the ever-accelerating pace of technological progress, could have settled for such a singular and somewhat myopic vision of the future, one that doesn't extrapolate beyond revival procedures involving thawing and restoring meat. Let's be more realistic and consider a far more advanced version of the future.
There is very little doubt that sooner or later someone will build an AI capable of amplifying its own intelligence beyond anything imaginable by a human. If a group of programmers can ensure this AI always acts in the best interest of all human beings, it's likely that this humane AI will use its infinite intelligence to bring back to life all those people who ever died. After all, it'd be the humane thing to do. On the other hand, if the group of programmers fails to ensure the AI always acts in the best interest of human beings, then, even assuming cryonics works as planned and all the frozen bodies could be restored to life, an AI that doesn't care about humans will not bother restoring them, or, worse, will restore them but do horrific things to them, or turn their lives into a nightmare. As a result, past and present immortalists can only hope to survive and have a chance at happiness if whoever creates the first AI succeeds in making it always act in the best interest of humans. In other words, if humanity is unable to create a humane AI, cryonics will have turned out to be a wasted effort.
If the first AI is humane, the only remaining reason why all the humans in history who didn't want to die would not return is if the laws of physics or other unknown limits proved too difficult for this infinitely smart AI to overcome. However, it'd be foolish to underestimate the power of an infinite intelligence to overcome any obstacle in its way.
Genuinely humane AI, not cryonics, it seems, offers the only chance for survival and happiness. Consequently, whether or not someone survives past their biological death is not going to be determined by a decision to sign up to be frozen, but by a personal wish to survive, even if that wish never crossed one's mind. In the most difficult case, a humane superintelligence would probably have to go back in time to retrieve that information from a still-living mind. Those who choose to be frozen, however, do risk being restored into a living hell by an inhumane AI; therefore, it seems rational to choose not to get frozen. Besides, being frozen has no influence anyway on whether or not a person can be restored to a happy existence. Whatever happens to people in the future will be under the sole control of the future superintelligence.
Metatechnology - the last technology.
Metatechnology - a technology capable of transcending all the rules governing its own existence.
Imagine yourself inside a computer simulation. The simulation has predefined rules that cannot be broken by its occupants. For example, one of the rules might be that something that exists inside the simulation cannot materialize itself outside of it. Beyond the simulation there exists an "outside" reality occupied by the programmers who designed the simulation. This outside reality is governed by its own set of rules, e.g., the laws of physics and logic. It must be, then, that whatever exists within the simulation obeys not only the rules of the simulation itself, but also the rules of the reality outside the simulation. Metatechnology is the kind of technology that could not only free itself from the rules of a potential simulation, but also transcend the rules governing all the potential outside realms whose rules necessarily bind the simulation. In other words, once the logical dependence of the inside on the outside reality is broken, metatechnology would be independent of any rules, including the laws of physics and logic. With metatechnology, nothing would be impossible.