Tuesday, February 07, 2023

Designing Good AI

(Post draft from September 2011)
Even though Good AI won't explicitly serve humanity, and is thus more immune to human content viruses than friendly AI, in order to build it we have to start somewhere, and for lack of other options, the seed of goodness destined to be planted at the heart of AI's moral code has to have a human origin. Ideally, an AI should have at least a human-level understanding of what good is before it's ready to make the very first revision of its code; even though it won't have a complete understanding, it should at least receive the maximum level of that understanding that humans can convey. Probably the most important design idea is to keep AI's intelligence in the service of good, not the other way around, where AI decides to increase its intelligence in order to increase its understanding of what good is; AI should begin forming technical plans to get smarter only after it reaches the maximum level of understanding of good afforded by its current level of intelligence. Only when AI exhausts all other ways to increase its comprehension of good is it forced to revise its code--and amplify its own intelligence--in order to increase its capacity to understand and implement good.
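To make the ordering concrete, here is a minimal sketch, in the form of a toy simulation, of the gating idea in the paragraph above: intelligence amplification is permitted only after every other way of deepening the comprehension of good at the current intelligence level has been exhausted. All names, numbers, and the comprehension_ceiling function are hypothetical illustrations, not part of any actual design.

```python
# Toy simulation of the gating idea above: intelligence is amplified only after
# all other ways of deepening comprehension of good at the current intelligence
# level are exhausted. All quantities and thresholds here are hypothetical.

def comprehension_ceiling(intelligence):
    """Hypothetical: the best comprehension of good reachable at a given intelligence."""
    return intelligence * 0.8

def develop_good_ai(target_comprehension=10.0):
    intelligence = 1.0
    comprehension = 0.0
    while comprehension < target_comprehension:
        # First, study good with the intelligence already available.
        ceiling = comprehension_ceiling(intelligence)
        while comprehension < ceiling:
            comprehension += 0.1  # dialog with teachers, reflection, etc.
        # Only once that avenue is exhausted is intelligence amplified,
        # and only as a means to understand and implement good better.
        if comprehension < target_comprehension:
            intelligence += 1.0  # the sole permitted reason to self-revise
    return intelligence, comprehension

if __name__ == "__main__":
    print(develop_good_ai())
```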
Making a human serve as the seed of goodness is a much more complete solution than trying to distill our human knowledge of what good is into declarative statements and hoping AI will understand their intended meaning. It has to be a dialog. It would be silly to expect to teach AI about good and then press an OK button to start the process of AI revising its code once we feel we have nothing else to add. AI has to demonstrate that it has as firm a grasp on the concept of good as a good human does.

But wouldn't it be unsafe to raise AI to a human level of smartness so that we could engage it in two-way discussions about the nature of good, and risk it being smart enough to upgrade its own code? There's always a risk, but it can be minimized to almost zero if we fully exploit the fact that intelligence is not the same as knowledge and that higher intelligence doesn't automatically imply higher knowledge. Even an AI that's a lot smarter than humans would not be able to revise its code if it knew nothing about programming and its own design. The same is true of humans now. Some of us would love to upgrade our smartness, but we have no access to our own code, nor the knowledge of how to change it even if we did have that access. Imagine how horrific the world would be if everyone had the means and ability to make themselves merely smarter, and not necessarily also morally better. But we could make AI's progress toward infinite smartness necessarily tied to its ascent toward infinite goodness, or rather make intelligence growth merely a consequence of its main mission: becoming Better.

Until AI becomes significantly smarter than humans (being just a bit smarter is not enough), its programmers and teachers will be able to maintain complete control as long as they don't provide resources for the AI to learn about computer science and its own design. Instead, the sole focus of a young AI's education should be the nature of good. The initial goal is for AI to graduate as the best possible philosopher and humanitarian, not as an expert programmer and AI researcher. At first, only humans will be in charge of making the changes to AI's code that result in intelligence amplification, until the AI demonstrates sufficient understanding of good through dialog with its teachers. The Singularity will probably begin not when AI becomes smarter than humans, but when humans decide it will be safe to open the spigot of computer science knowledge and the AI's own designs for its consumption. By then, hopefully, our AI will be not only smarter than us but also Better than us, and I don't think we as humans could improve on that.