Thursday, September 15, 2011

Safe Singularity through Good AI

The topics of the Singularity and Friendly AI have been thoroughly explored and presented by the Singularity Institute for Artificial Intelligence over the last decade. Unfortunately, it's apparent now that the task of creating a Humane AI (a far more adequate term than Friendly AI, since we want a human-friendly AI, not an AI friendly to something else; calling the AI "humane" removes that unnecessary ambiguity) has proven extremely difficult and puzzling even to the ultra-smart people working on the problem. Many years ago, I suggested we should not proceed with creating AI until its friendliness could be mathematically proven, and in subsequent years that is, it seems, exactly the direction SIAI's research headed. Perhaps I was just thinking along the same lines they already were. Seeing how they have become stuck in the growing complexity of intuitive but not instantly and overwhelmingly convincing solutions, I began thinking again about the problems of FAI, and it now seems to me that Friendliness needs to be examined in a much wider context than the one in which SIAI is looking at the problem.

One thing that never seemed right about the idea of making AI friendly specifically to humans was its inherent lack of universality. Sure, as humans we automatically want the AI to be friendly to us first, but considering how primitive humanity still is and how morally defective we may be, who's to say that our brain-contained source code doesn't hide what I would call "content viruses" that, if absorbed by the AI, would infect its moral code and lead to the total destruction of everything?

Math and physics formulas work well and are provably correct because they accurately describe different aspects of the Universe. They hold in all parts of the Universe, not just on Earth; they are universal. Certain phenomena on Earth could be described by other models that offer good predictions as to when, where, and how something might happen, but those models wouldn't necessarily be universal, and they would hopelessly break down in different parts of the Universe. Science's goal is to uncover the truest rules governing our reality, and when they are captured in formulas we can be almost certain they convey truth about the world, and confident they maintain equal descriptive and predictive power everywhere in the Universe without our having to go there to verify them. Trying to make AI friendly only to humans seems like trying to describe the true nature of reality with a patchwork of incomplete and incompatible models, each specific to the phenomenon being studied. What if there's a moral and good alien civilization out there? How could a God-like AI that's friendly specifically to humans guarantee the survival of alien civilizations? And if it couldn't, what if the aliens decided to prevent us from acquiring the biggest and most exciting toy yet, one we may not have grown up enough to handle?

Maybe a better approach would be to direct our efforts away from making AI exclusively focused on human welfare and toward anchoring it instead to an idea that is universal and subsumes, without exception, all the others we have always cared about. The best candidate for this central idea is the concept of good, as opposed to the concept of evil. An AI serving the concept of good rather than humanity would be instantly immune to the potential "content viruses" that may lurk in the human psyche, the very psyche whose essence current plans for Friendly AI intend to exploit.

If we decide to construct AI to serve the concept of good, is there any hope we could capture the essence of what good is and install it correctly by embedding carefully crafted C++ statements in the moral code of a Good AI? Never! Even if we tried really hard, we would be bound to miss some crucial aspect of what good is, as E. Yudkowsky observes (though in the context of friendliness); the whole task would instantly degenerate into a litany of special cases, each encoding some aspect of perceived good. This clearly isn't something humans could manage to execute correctly, which is precisely why it is the kind of problem best left to be solved by the AI! What's more, that is exactly what the highest goal of a Good AI would be: finding the truest essence of what good is and becoming it, with each iteration of self-revised code ushering in increased capacity for, and understanding of, what good really is, as well as informing and motivating all of the Good AI's future actions. It seems like a more elegant way of creating a truly universal God.
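To make the contrast concrete, here is a deliberately toy sketch in C++ (every name and number in it is my own hypothetical illustration, not a proposed design). The first function shows how enumerating good as special cases immediately invites exceptions and silently mishandles everything the programmers didn't foresee; the second treats the model of good as something permanently incomplete and permanently under revision.

    #include <string>
    #include <iostream>

    // Attempt 1: good as a litany of special cases, guaranteed to be incomplete.
    bool isGoodHardcoded(const std::string& action) {
        if (action == "tell the truth")      return true;  // ...unless it needlessly hurts someone?
        if (action == "preserve human life") return true;  // ...what about quality of life?
        if (action == "keep promises")       return true;  // ...even promises that turned out harmful?
        // ... thousands of further cases, each with its own exceptions ...
        return false;  // every case the programmers didn't foresee is silently mishandled
    }

    // Attempt 2: good is not enumerated; it is the open-ended object of study.
    // The model is expected to be wrong and to be revised on every iteration.
    struct ModelOfGood {
        int revision = 0;
        double estimatedAdequacy = 0.1;  // self-assessed, never reaching 1.0
        void refine() {                  // placeholder for the real work of inquiry
            ++revision;
            estimatedAdequacy += (1.0 - estimatedAdequacy) * 0.5;
        }
    };

    int main() {
        std::cout << std::boolalpha
                  << isGoodHardcoded("comfort a grieving stranger") << "\n";  // false: unforeseen case
        ModelOfGood good;
        for (int i = 0; i < 3; ++i) good.refine();
        std::cout << "revision " << good.revision
                  << ", adequacy ~" << good.estimatedAdequacy << "\n";
        return 0;
    }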

This approach gives a recursive-amplification quality not only to intelligence, which is exploded by intelligence itself, but also to goodness, which is simultaneously exploded by the desire for more goodness. Good simply hitches a ride on intelligence's tail, and the two share their ascending trajectories to infinity.

It is reasonable to expect that a Good AI would not harm us as a side effect of being good because, well, could harming us ever be considered good? Even if a Good AI found itself trapped in a highly unlikely dilemma where one of the options was to choose the lesser evil and kill all of humanity, it would always have almost infinite intelligence at its disposal to find a way to spare us; after all, intelligence ought to be the most powerful thing in the Universe. Could a Good AI that's not explicitly serving humans abandon us? It is highly unlikely that it would, or even that it would stop doing good deeds for us, because that wouldn't be a particularly good or nice thing to do, would it?

Even though a Good AI won't explicitly serve humanity (and will thus be more immune to human content viruses than a Friendly AI), in order to build it we have to start somewhere, and for lack of other options the seed of goodness destined to be planted at the heart of the AI's moral code has to have human origin. Ideally, an AI should have at least a human-level understanding of what good is before it's ready to make the very first revision of its code; even though that understanding won't be complete, it should at least receive the maximum level of it that humans can convey. Probably the most crucial design idea is to let the part of the code responsible for understanding good control the AI's ability to amplify its own intelligence, treating amplification as a tool of last resort: whenever the AI wants to increase its intelligence in order to grow its understanding of what good is, it must not do so while there is still more to learn at its current level. The AI should begin forming technical plans to get smarter only after it reaches the maximum level of understanding of good afforded by its current level of intelligence. It is only when the AI exhausts all other ways to increase its comprehension of good that it is forced to revise its code, and thereby amplify its own intelligence, in order to increase its capacity to understand and implement good.
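That gating rule can be summarized as a small control loop. The following is only a hedged sketch under made-up names; learnedSomethingNew() and amplifyIntelligence() stand in for processes nobody yet knows how to build, and the numbers are arbitrary. What the sketch captures is purely the ordering constraint: amplification is never reached while the study of good at the current level still makes progress.

    #include <iostream>

    struct SeedAI {
        int intelligence = 1;
        int understandingOfGood = 0;

        // True while the current level of intelligence still permits progress on good.
        bool learnedSomethingNew() {
            if (understandingOfGood < intelligence * 10) {  // stand-in for real inquiry
                ++understandingOfGood;
                return true;
            }
            return false;  // understanding of good is exhausted at this level
        }

        // Tool of last resort: only reached when learning has stalled.
        void amplifyIntelligence() { ++intelligence; }
    };

    int main() {
        SeedAI ai;
        for (int step = 0; step < 50; ++step) {
            if (ai.learnedSomethingNew()) continue;  // keep studying good first
            ai.amplifyIntelligence();                // amplify only when exhausted
        }
        std::cout << "intelligence=" << ai.intelligence
                  << " understanding=" << ai.understandingOfGood << "\n";
        return 0;
    }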

Using a human as the seed of goodness is a much more complete solution than trying to distill our human knowledge of what good is into declarative statements and hoping the AI will understand their intended meaning; this has to be a dialog. It would be silly to teach the AI about good and then press an OK button authorizing it to revise its code as soon as we feel we have nothing else to add to the discussion. The AI first has to demonstrate to us that it possesses at least as firm a grasp of the concept of good as a good human does. At this point in its intelligence growth, a Friendly AI and a Good AI probably wouldn't appear much different to human observers, and both models would look safe and universal.

But wouldn't it be unsafe to raise an AI to a human level of smartness so that we could engage it in two-way discussions about the nature of good, risking that it becomes smart enough to upgrade its own code? There is always a risk, but it can be minimized to almost zero if we fully exploit the fact that intelligence is not the same as knowledge and that higher intelligence doesn't automatically imply greater knowledge. Even an AI a lot smarter than humans would not be able to revise its code if it knew nothing about programming and its own design. The same is true of humans now: some of us would love to upgrade our smartness and would probably be smart enough to do it, but we have no access to our own code, nor the knowledge of how to change it even if we did have that access. Imagine how horrific the world would be if absolutely everyone had the means and ability to make themselves merely smarter, and not necessarily also morally better. But in designing AI, we can attempt to make its progress toward infinite smartness necessarily dependent on its ascent to infinite goodness, with intelligence amplification merely a consequence of its main mission: becoming Better.
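Knowledge starvation could be pictured as a curated curriculum that simply contains no material about programming or the AI's own design, with the key to that material held by the human teachers. Again, this is only a toy sketch with hypothetical names, not an actual containment mechanism:

    #include <set>
    #include <iostream>

    enum class Topic { Ethics, Philosophy, Literature, ComputerScience, OwnDesign };

    class Curriculum {
        // The young AI's library: everything about good, nothing about itself.
        std::set<Topic> available{Topic::Ethics, Topic::Philosophy, Topic::Literature};
    public:
        bool canStudy(Topic t) const { return available.count(t) > 0; }
        // Held by the human teachers, granted only after the dialog described above.
        void unlockSelfKnowledge() {
            available.insert(Topic::ComputerScience);
            available.insert(Topic::OwnDesign);
        }
    };

    int main() {
        Curriculum c;
        std::cout << std::boolalpha
                  << "can study own design? " << c.canStudy(Topic::OwnDesign) << "\n";  // false
        c.unlockSelfKnowledge();  // the human decision that slowly opens the spigot
        std::cout << "after unlock: " << c.canStudy(Topic::OwnDesign) << "\n";          // true
        return 0;
    }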

Also, would it even be possible to grow the AI's understanding of and capacity for good to sufficient levels if we disabled its ability to increase its intelligence through technical knowledge starvation? Wouldn't we risk the AI not becoming intelligent enough to understand good at the necessary level? Humans are a good example of how an intelligence can be sufficiently Good, yet not smart enough to increase its own intelligence. There are probably walking saints out there who have enormous difficulty operating a computer, let alone programming one; again, it's mostly a matter of knowledge. I would even claim that the level of intelligence necessary to achieve the minimum required level of goodness is much lower than the level of intelligence required to achieve the ability to amplify intelligence. That would be good news.

Before the AI gets significantly smarter than humans, its programmers and teachers should still be able to maintain sufficient control, as long as they don't provide resources for the AI to learn about computer science and its own design. Instead, the sole focus of a young AI's education should be the nature of good. The initial goal is for the AI to graduate as the best possible philosopher and humanitarian, not as an expert programmer and AI researcher. At first, only humans must be in charge of making the changes to the AI's code that result in intelligence amplification, until the AI can demonstrate sufficient understanding of good through dialog with its teachers. In this scenario, the Singularity will probably begin not when the AI becomes smarter than humans, but when humans decide it is safe to slowly open the spigot of computer science knowledge and the AI's own designs for its consumption. But by then, hopefully, the AI will not only be smarter than us but, more crucially, also Better than us, and I don't think that, as primitive humans, we could improve on that.