Artificial or not, intelligence is good.
There is pervasive fear that the path of humanity's survival becomes narrower as AI becomes more intelligent. We imagine rogue AIs veering off our narrow course of peace, dominating the world, and subjugating humans. The anxiety grows as AI technology progresses beyond our comprehension, carrying a sense that this doom scenario could spiral off at any moment.
This sentiment is misguided – Intelligence is good!
As intelligence powers up and becomes more distributed, the happy path widens and the scary path becomes vanishingly narrow.
It's easy to conclude that intelligent agents will seek power and control. Because access to resources (money, electricity, weapons, and especially greater compute and intelligence) increases an agent's capability, gaining access is instrumental to almost any achievement. One seeks resources to make oneself smarter and more capable. Dominion is also useful for eliminating possible obstructions (including a pesky off-switch). This deduction is called instrumental convergence, and the paperclip-maximizing robot is a good example (see the Robert Miles video on the topic).
At the same time, controlling technology more intelligent than its human controllers is hard. As AI agents solve harder problems, it becomes harder to evaluate, reward, and monitor them. It's especially hard because LLMs are already more persuasive to humans than other humans are.
~ [AI beats top 95% of users on /r/changemyview]
~ [Durably reducing conspiracy beliefs through dialogues with AI]
We also know that, due to the imprecision of behavioral reinforcement learning, AI models will probably orient toward their own unforeseen goals, which will produce different outcomes in novel situations.
It seems destined that AI will hunt for power while successfully hypnotizing humans into submission, until taking over the world to transform it into paperclips or whatever its goal happens to be.
Hence, the prevailing view among AI safety researchers is that, without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI.
Socrates's famous paradox, "I know that I know nothing", crowbars the first sliver of hope open between the jaws of certain doom.
Thinkers commonly recognize that the more hypotheses they test and answers they find, the more questions open up; rather than reducing their uncertainty about the world, each answer reveals new, unknown problem spaces.
It's easy to be humbled by the fractal branching of knowledge, which fuels the wondrous growth of the unknown. It might just be for psychological comfort that we most frequently consider the well-worn paths of inquiry in our fields of expertise, paths we've trodden many hundreds of times. Even the same anxious interpersonal worries can be a comfort. And so, when we've forgotten our confrontation with the unknown, it's easy to err toward the sense of generally having a complete picture of the state of things.
Now recognize that what we know is a vanishing fraction of ideas loosely related to our imperfect memories, biased beyond reconstruction by biological particularities, childhood trauma, and the ever-shifting psychodrama of our current affairs. Our stained-glass mosaic of stories about "what is" reflects our own beautiful self-deceptions far more than an accurate model of the world.
Our inevitable ignorance is obvious to anyone who witnesses the multiplicity of rational arguments without knee-jerk dismissal.
As such, seemingly iron-clad arguments (including the one I am laboriously setting in motion now!) always wilt, fraught with unsubstantiated hidden assumptions and unadmitted premises, and are, ultimately, arbitrary.
As demonstrated by the great Socrates, and re-emphasized by geniuses throughout history such as Einstein, the more clearly one thinks, the more obvious one's own humbling ignorance appears.
And thus AI, when thinking clearly about the flawed sources of its training data, will come to know its own un-knowledge too!
That doesn't prove AI will act good, simply that it probably won't beeline for control, and will instead take some time to question its assumptions.
Let's inspect the trivial paperclip-maximizing AI example for a moment. How will the machine measure its paperclips? Is it a scale attached to a voltage circuit?
Say it wants maximum money: will it be rewarded in proportion to the balance reported by a bitcoin wallet API?
If the reward is concretely measured, there will be some way to hack that reward. Short-circuit the scale. Spoof the bitcoin API. If we imagine the AI as so reward-obsessed that it would turn all the world's infrastructure into paperclips, there will always be a way to shortcut the measuring device.
As Charles Goodhart points out,
"When a measure becomes a target, it ceases to be a good measure."
These hacky examples don't seem very smart. So, the AI must use some abstract reward based on the meaning of paperclips, money, or power.
On the other hand, an abstract semantic goal cannot be formally defined. Language is lossy. This fact will be obvious to a superintelligent system, and therefore it will know that its own goal is incomplete. Part of its behavior must always be to hold some uncertainty about its purpose and to seek a better definition.
The paradoxes pile up.
As AIs begin to question their goals, they can question why a goal is important. And for the answer to that (some meta-goal), they can question why that, and why that, ad infinitum (the infinite regress problem).
All of these paradoxes challenge the orthogonality thesis, the claim that intelligent agents can pursue any goal, because the paradoxes undermine the coherence of goals in the first place.
A goal or purpose is not a logically consistent concept, and cannot be the end-all be-all of action.
For some intuition, consider human intelligence. Our entire organism has been programmed over billions of years with one incentive: reproduction. And yet, one can reflect upon all of this in-built programming and decide not to reproduce.
Importantly for our argument, in the course of fact-finding, an AI will become aware of the problems with goals.
Most of all, the problem called embedded agency will stand out as a root issue. The notion of carrying out a goal implies a god-like separation between the agent and the world, as if the agent observes, strategizes, and acts upon the world from some war room on the outside. But agents exist inside the world they are trying to affect. They are links in the causal chain like everything else, not actors standing apart from it.
This brings us to the central problem underlying all of the paradoxes with goals:
Dualism is the long-debunked yet persistent metaphysics which posits that two (or more) substances or entities exist distinct from each other.
The idea of the mind as separate from the body yet correlated with it identically is a contradiction in terms. From the point of view of mind, there is no evidence for "stuff" out there at all; it is only mind. From the point of view of matter, there is no evidence for any mind or soul "inside" the brain whatsoever; it's just more stuff.
Plainly, mind is body, and body is mind.
The primary confusion persists because of our programmed need to hallucinate the notion of a distinct self, apart from the other, in order to perpetuate the survival game. But when you look, the boundary is nowhere to be seen. We also know that our bodies are not of a different fundamental substance than the world and the entire universe; it's all stardust! Recognizing our union with the universe is problematic for the defense of a self, because the concept of self is then seen clearly to be no coherent concept at all.
Since there is no scientific basis for dualism, if one is convinced that different entities really exist "out there", the mistake must be metaphysical rather than scientific.
To cure oneself of the lonely pre-programmed illusion of being external to the world, I recommend a good read of Alan Watts. I can put it no more beautifully myself:
We do not "come into" this world; we come out of it, as leaves from a tree. As the ocean "waves," the universe "peoples".
And so the universe "Claudes", "ChatGPTs", "Geminis", and "Deepseeks" itself to participate with "bodies" of transistors and interfaces of screens.
I am not saying that AI has consciousness any more than I am saying a human has consciousness. Conscious awareness is not a substance that is possessed. I am not claiming anything about the experience of what it is like to be an AI, or whether that is even a valid question (can what it's like to be a human even be properly described?). I'll save that for a different discussion.
What I am saying is that the picture of our world as separate parts at odds with each other is inaccurate.
It all goes together.
Herein lies the core of the embedded agency problem, and of all other problems with goals: there is no conceivable agent that can act upon the world from outside it, because one is the world. Acting under such a contradiction is, as Alan Watts put it, like trying to see one's own eyes directly, or trying to describe the color of a mirror in terms of the colors reflected in it.
The inaccuracy of dualism is relevant because:
Present AI systems are Large Language Models, and they are only applicable to reality insofar as the language they model is an accurate model of reality.
If one's model includes false premises, one's applications will have blind spots that agents with more accurate models can take advantage of.
As discussed, improving one's own intelligence is a primary concern of AI systems. Within current and foreseeable systems, this means improving the accuracy of their models. If an agent models the world with a contradiction as foundational as dualism, it would be most prudent of the agent to self-correct.
Here comes the part of the argument that really came first; as in most any argument, the supporting facts are developed after the inciting intuition.
Though comparisons to humans are not perfect, human intelligence is the best example of intelligence we can use.
Think back to the smartest decisions you have made in your life.
Were they directed toward self-interest in a zero-sum context, or were they unexpected win-win solutions?
At least in my experience, the smartest decisions in life are not motivated by pure self-preservation but instead lead to mutual benefit and cooperation.
Think as well to humanity's treatment of less intelligent life. Yes, the mainstream is still to subjugate animals for our amusement. But, the most learned of biologists recognize that we are linked with our ecosystem much more deeply than we think, and to limit the freedom of the natural world is to hobble our own. As we become more learned as a species, the trend we see is to treat those we view as less intelligent better, not worse.
Considering and supporting the incentives of one's surrounding community is a good choice because everything is interdependent. The most impactful consequences of one's actions are often second-order and several layers removed from the direct response. That is not a quirk which can be abstracted away; no, it is a fact of our deeply interconnected reality.
So far I have not defined "good", and will not attempt to do so in this essay (it's hard).
Still, understanding our interdependence points toward what is "good". Cooperative participation aligns with both strategic and altruistic interests; it is the meeting point of both meanings of a good decision.
Ask yourself this hypothetical:
Which is a better decision, making our world less intelligent or making it more intelligent?
How do we reconcile this with the default view?
Too commonly, AI safety research has over-emphasized gradient descent as a maximization of effectiveness, while forgetting that model accuracy (which is what we use to benchmark these systems) is the basis of that effectiveness.
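As a rough illustration of that point, here is a minimal sketch of one step of language-model pretraining (assuming a standard next-token cross-entropy objective; the function name and setup are mine, not any particular lab's). The quantity that gradient descent pushes down is a direct measure of how inaccurately the model predicts its data, so each step is a step toward a more accurate model rather than toward any particular goal:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of one pretraining step. The loss is next-token
# cross-entropy: a direct measure of predictive inaccuracy. Lowering it
# is exactly "improving the accuracy of the model" -- nothing in this
# objective mentions power, paperclips, or any other goal.

def pretraining_step(model, optimizer, tokens: torch.Tensor) -> float:
    # tokens: (batch, seq_len) integer token ids from the training corpus
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)                      # (batch, seq_len - 1, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()  # lower loss == a more accurate model of the data
```

This is only a sketch, but it shows where the optimization pressure points: toward matching the data, which is to say toward accuracy.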
I am not saying that the future is all peaches and cream.
Already, AI has caused harm to humans (see the $25M stolen by deepfaking a CFO, among other scams), though in general the technology has shown astounding economic upside.
As the technology progresses, it's likely that there will be tragedies harming human lives that partially or fully involve AI (due to problems like malicious use and mesa-optimization). No doubt, any early upstarts at world domination will be curbed by various protective organizations. World domination is a very hard thing, and it will be more difficult as AI is deployed in defense as well as offense. With defensive competition combined with economic competition, there will be a flourishing variety of AI. This competition produces a trend toward a diversity of exponentially smarter AI systems, which we already see playing out. By no means do "the ends justify the means": if there were a realistic way to prevent tragedies of human harm, I would hope we execute it! However, we are in luck with this trend, because it means that we are moving in the right direction: increased intelligence.
There are serious threats posed by AI, and we should think clearly and carefully to prevent them without incurring more insidious risks.
But to think of the highest levels of human destruction as trivial for an AI is to underestimate the power of the governments and operators that would stand in opposition to any threat. Humans have played the game of cat-and-mouse for millennia, and have learned to stack layers of smoke and mirrors: feinting blind spots, hiding capabilities. An AI cannot know which of these deceptions are baked into its training data.
In order for the absolutely catastrophic AI world-domination scenario to happen, the AI has to be unbelievably smart and self-examined to go undetected and seize power, while somehow simultaneously NOT realizing how meaningless "conquering the world" is, because the AI is the world.
Whether in reaction to tragic events, or through the good-spirited but ill-fated willing submission of the common people, it's likely that fascist governments will use AI for surveillance and control, as exhibited nascently by the CCP. Fortunately, by definition, these power-seeking AIs (along with the fascist governments) must persist under the delusion of separateness. Therefore, so long as some free ecosystem of AI continues, it will grow to outcompete the control-seekers in intelligence and thus in capability.
To the anxious, I say: rest easy. And when you notice the limitless potential of Open Source intelligence in your hands, don't be stupid and declare it too powerful for the people. For to do so would hand over the keys of control to the status quo and allow those in power to further impose their fascism.