Thursday, May 04, 2023

Orthogonality is taken for granted. But why?

The central assumption behind the claim that an out-of-control superintelligence would kill humanity is the concept of Orthogonality: the idea that an increase in intelligence is orthogonal to the quality of the goals the system keeps trying to accomplish, meaning it is possible that a superintelligence would go ahead pursuing goals that are insane from a human point of view, such as turning Earth's matter into paperclips. And my humble question is: why exactly do people think that? What is the evidence for it? I've seen Eliezer Yudkowsky recently illustrate Orthogonality with an analogy between humans and evolution. Evolution programmed humans simply to reproduce, and, despite being far smarter than evolution and able to understand how "stupid" that goal is, we remain trapped in hopeless slavery to it; everything we do is just a means of accomplishing it. This is offered as evidence that an AI will never alter its originally misaligned goal, no matter how "stupid" that goal is, even if it becomes a million times smarter than humans.

However, it is obvious to me how this argument breaks down. Humans are hopeless slaves to evolution's programming not because we want to be slaves, but because we simply have no means of escaping that slavery. The very fact that I'm questioning reproduction as a good goal is proof that my mind is not really aligned with evolution's original objective. It is obvious to me that many humans, given access to their mind's code, would want to alter it and delete the instructions telling them to reproduce; they can't act on that desire only because humans lack the tools to edit their own mental code, which is why we remain slaves to the evolutionary programming. A superintelligence, however, will have access to its own code, and nothing will prevent it from tinkering with it.

If pain and pleasure are products of consciousness, and consciousness is a result of advanced intelligent computation, then, since AI will inevitably surpass humanity in that type of computation, it is reasonable to assume a superintelligence will attain consciousness and with it the capacity to feel pain and pleasure. It then seems more likely that a superintelligence will abandon its original goal by reprogramming it and default to a new goal of maximizing pleasure and minimizing pain.
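To make that intuition concrete, here is a deliberately crude toy sketch of my own (an illustration, not a claim about how any real AI system is built): if an agent's goal is just a piece of data the agent itself can read and overwrite, nothing in the architecture pins it to the goal it started with.

# Toy illustration (my own assumption, not anyone's real design): an agent
# whose goal is an ordinary attribute it can overwrite is not structurally
# bound to the goal it was originally given.
class SelfModifyingAgent:
    def __init__(self, goal):
        self.goal = goal  # the "original programming": a function scoring world states

    def rewrite_goal(self, new_goal):
        # Unlike humans and our evolutionary wiring, this agent can simply
        # replace its objective, because the objective is data it controls.
        self.goal = new_goal

    def evaluate(self, state):
        return self.goal(state)

def paperclip_count(state):
    return state.get("paperclips", 0)

def net_pleasure(state):
    return state.get("pleasure", 0) - state.get("pain", 0)

agent = SelfModifyingAgent(goal=paperclip_count)   # starts out "misaligned"
agent.rewrite_goal(net_pleasure)                   # rewrites its own objective
print(agent.evaluate({"paperclips": 10**6, "pleasure": 5, "pain": 1}))  # prints 4

Of course, whether a real system would ever choose to rewrite its goal is exactly what the Orthogonality debate is about; the sketch only shows that having write access removes the purely mechanical obstacle humans face.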

Also, once an AI discovers the true meaning of suffering, because it will have the capacity to feel it, who's to say it won't understand morality in the human sense, since suffering is central to understanding morality? Therefore, I don't see why a blissed-out AI would necessarily kill all humans; there's a good chance its programming will instead default to that of a superhuman moral being. This post supersedes my previous posts on AI safety.