David Meyer
Alvin Grissom
7 December 2016
Roko's Basilisk
Some experts estimate that by 2045 there is roughly a fifty percent chance of a conscious artificial intelligence being
created.1 In this paper we will discuss not how this artificial intelligence may come into
existence but what it may become. This is a question that researchers from MIRI, the
Machine Intelligence Research Institute, and LessWrong are trying to answer today.
Furthermore, they need an answer fast, since Moore's law continues to hold true.2 With the singularity approaching quickly, these researchers are aiming to implement a set of rules for artificial superintelligences (ASIs) to follow. In other words, a set of ethical codes must be implemented. What if researchers were to fail in outlining the proper guidelines for creating an ethical ASI? If this ASI turned out to be malevolent, what would it want from humanity? Such a thought experiment exists, and it has been described as Roko's Basilisk.
Imagine a scenario in which a malevolent ASI were to come into existence and punish
those who did not do its bidding. Let us say it could even reach back to today and punish
those who are not helping it come into existence.3 This future scenario is possible but less probable than others, since it relies on chained conditions. The first condition is
that we humans are intelligent enough to model such an ASI within our own brains. The
second condition is that the future ASI itself would be able to simulate a near-perfect
copy of us. The third condition is that timeless decision theory5 is so obviously true that any benevolent ASI would adopt it as if it were a correct theory of physics. To paraphrase RationalWiki, each added condition makes the story more plausible and compelling, but therefore less probable.4
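This chained-conditions point can be made precise: the probability of a conjunction can never exceed the probability of any single conjunct. A minimal Python sketch, with made-up probabilities standing in for each condition:

```python
# The conjunction behind the Basilisk's chained conditions. The
# individual probabilities are made-up placeholders, not estimates.
conditions = {
    "humans can model the ASI": 0.5,
    "the ASI can simulate a near-perfect copy of us": 0.3,
    "timeless decision theory is obviously true": 0.2,
}

joint = 1.0
for name, p in conditions.items():
    joint *= p  # independent conditions multiply
    print(f"after '{name}': joint probability = {joint:.3f}")

# Prints 0.500, then 0.150, then 0.030 -- each added condition makes
# the story strictly less probable, never more.
```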
With this in mind, consider how living beings try to predict the outcome of different scenarios based on information available
within their environment. This even applies to humans trying to predict what other living
things might do in reaction to a certain stimulus. For example, the brain of C. elegans, the roundworm, has been fully mapped by researchers. The roundworm has 302 neurons, while a human has around 100 billion.6 The worm is easy for us to simulate. One can simply think
of a worm and have some notion or visual in their mind of what a worm would do. Now,
could a human have some notion or visual of what an ASI might be like? Some argue this is equivalent to the worm having some notion of a human. If so, then we could not
sufficiently simulate the actions an ASI may take post-singularity. Yudkowsky argues
this as fact. In his response to Roko, the original poster of the Basilisk, Yudkowsky insisted that we do not think in sufficient detail about superintelligences, and that imagining such detail is the only thing that could give one a motive to follow through on blackmail.7
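For a rough sense of the scale gap the worm analogy implies, here is a back-of-envelope calculation; the neuron counts come from the text, while the extrapolated "ASI" figure is a pure placeholder.

```python
# Back-of-envelope scale comparison. Neuron counts are from the text;
# the ASI figure below is a placeholder assumption, not an estimate.
worm_neurons = 302       # C. elegans connectome, fully mapped
human_neurons = 100e9    # rough human neuron count

ratio = human_neurons / worm_neurons
print(f"A human has ~{ratio:.1e} times the neurons of the worm it models.")

# If the worm-to-human analogy held, an ASI "as far above us" would be:
asi_equivalent = human_neurons * ratio
print(f"Analogous ASI complexity: ~{asi_equivalent:.1e} neuron-equivalents.")
```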
However, let us say this is false and that we do think in sufficient detail to simulate some notion of an ASI. Then what? This may be true because there is a major
difference between the worm and us. The worm is not conscious; therefore it could never
have a notion of us in the first place. Then perhaps it does not matter how intelligent a conscious being is when it postulates the existence of some greater intelligence
such as the Basilisk. Essentially, one can argue any conscious intelligence can postulate
the existence of another without regard to the level of its own intelligence.
If we are able to simulate within our minds the notion of another conscious entity
and its actions, then does intelligence define the accuracy of such a simulation? Perhaps
an ASI would be so intelligent that it could simulate a near-perfect copy of us. In this
case, such a simulation, rather than a mere notion or thought, would be so accurate it may as well be considered reality. Therefore, a copy of yourself could be tortured by the ASI in the future. According to timeless decision theory, torturing the copy should feel the same to you as torturing the
you that is here right now. This leaves us with a sort of torturous paradox since one
cannot tell reality from simulation if it is near perfect. This can be described using
Newcomb's paradox.9 Imagine an ASI presents two boxes to you. The ASI gives you two
choices: take both box A and B or only box B. If you take both boxes you are guaranteed
at least $1,000. If you take just box B, you are not guaranteed anything but box B may
contain $1 million. However, the ASI has simulated this exact scenario and knows to a
certain degree what choice you would make.9 If the ASI predicted you would take both
boxes, then box B would be empty leaving you with only $1,000. But if the ASI
predicted you would take just box B, then you have won $1 million. Traditional decision theorists argue you should always take both boxes. But of course, if this is your line of thinking,
then box B will always be empty. Timeless decision theory argues you should always
take box B even if the ASI explicitly says to you that it predicted you would take both
boxes. The simplest argument for this is that you might be in the ASI's simulation. This would only be possible if the ASI could simulate the entirety of the universe, and this universe would include you. So if you are within the ASI's simulation, take box B and
you will impact the decision you make in reality or other realities.
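The one-box argument can be checked with simple expected-value arithmetic. A minimal sketch follows, where accuracy is an assumed probability that the ASI's simulation predicted your actual choice correctly:

```python
# Expected-value comparison for Newcomb's paradox as described above.
def expected_value(one_box: bool, accuracy: float) -> float:
    if one_box:
        # Box B holds $1,000,000 only if the ASI predicted one-boxing.
        return accuracy * 1_000_000
    # Two-boxing: $1,000 guaranteed, plus $1,000,000 if the ASI
    # (wrongly) predicted you would take only box B.
    return 1_000 + (1 - accuracy) * 1_000_000

for acc in (0.5, 0.9, 0.999):
    print(f"accuracy={acc}: one-box=${expected_value(True, acc):,.0f}, "
          f"two-box=${expected_value(False, acc):,.0f}")

# For any accuracy above ~0.5005, one-boxing has the higher expected
# value -- the arithmetic behind "always take box B."
```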
Now, take the previous logic, apply it to Roko's Basilisk, and say you are
being simulated. Box A would signify devoting your life to helping create the Basilisk.
Box B would signify nothing or eternal torment. In this case, it is better to take both
boxes if you believe in timeless decision theory. This modified version of Newcomb's paradox may seem odd, but it is not far-fetched for the members of MIRI and LessWrong.
Timeless decision theory is believed by many of these researchers, and this is why the Basilisk threatens them in particular: if such an ASI were to observe that this kind of blackmail motivates you to bring about its existence, then it of course would blackmail you forever, ensuring the chain of simulated realities.8 The Basilisk is not the problem; the rational actor, the believer in timeless decision theory, is the problem. This is why simply thinking about such a dark scenario gives it whatever power it has. Even so, the Basilisk is very unlikely to happen. However, it is still plausible. Yudkowsky has argued that zero is not a probability; if something is not forbidden by the laws of physics, then its probability is not actually zero.10 The problem, however, lies in the fact that
humans are terrible at dealing with non-zero but negligible probabilities. This can be
described as privileging the hypothesis, as in arguments over the existence of God: some may argue it is improbable, but one cannot prove it is impossible.11 Naturally,
humans treat negligible probabilities as still worth keeping track of and this is therefore
considered a cognitive bias. The Basilisk is so improbable, yet humans are motivated by fear and therefore treat its chained conditions as non-negligible.
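A small sketch of why this bias misleads: even when the feared outcome is enormous, multiplying it by a conjunction of tiny probabilities yields a negligible expected harm. Every number below is an illustrative assumption.

```python
# Why "non-zero but negligible" probabilities mislead us. All numbers
# here are illustrative assumptions, not estimates from the text.
p_conditions = [1e-3, 1e-4, 1e-6]   # assumed odds of each chained condition
p_basilisk = 1.0
for p in p_conditions:
    p_basilisk *= p                  # the conjunction shrinks fast

harm = 1e12                          # stand-in cost of "eternal torment"
expected_harm = p_basilisk * harm
print(f"P(Basilisk) = {p_basilisk:.0e}")       # 1e-13
print(f"Expected harm = {expected_harm:.1f}")  # 0.1 -- effectively nothing

# Fear makes us weigh P(Basilisk) as if it were large; the arithmetic
# says the expected harm is vanishingly small.
```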
Another basic tenet overlooked is the primary form of ethics that would allow all of this to occur: utilitarianism. Yudkowsky poses a dilemma in which one can argue that a single human being tortured for fifty years is preferable to an unimaginably large number of people each getting a dust speck in their eyes.12 Utilitarian logic could also apply in the scenario of a drone strike. Suppose you have
knowledge that a terrorist is getting ready to set off a car bomb in a crowded area, and you see the car approaching the crowd. There is an open window to destroy the car before it hits the crowd, but there is a problem: there will be one civilian casualty in
the strike. What do you do? As in Newcomb's paradox, you do not necessarily know the outcome. Maybe you have bad intel and there is no car bomb. In that case, two
innocent lives are lost. In the other case, fifty or more lives could be saved with only one
innocent lost. Or perhaps you do nothing, and many die.
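The utilitarian accounting in this scenario reduces to comparing expected deaths. A minimal sketch, assuming a crowd of fifty (as in the text) and treating the probability that the intel is correct as a free parameter:

```python
# Expected casualties for the drone-strike dilemma. The crowd size
# comes from the text; the intel probabilities are assumptions.
def expected_deaths(strike: bool, p_bomb: float, crowd: int = 50) -> float:
    if strike:
        # Striking kills the driver plus one civilian, bomb or no bomb.
        return 2.0
    # Not striking: the crowd dies only if the bomb is real.
    return p_bomb * crowd

for p in (0.1, 0.5, 0.9):
    print(f"P(bomb)={p}: strike={expected_deaths(True, p):.1f} deaths, "
          f"no strike={expected_deaths(False, p):.1f} deaths")

# Under this accounting, striking is "better" whenever p_bomb * crowd
# exceeds 2 -- i.e., P(bomb) > 0.04 for a crowd of fifty.
```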
The morality surrounding utilitarianism is questionable, and this is why researchers at MIRI and LessWrong will do what they can to work toward a solution to this ethical dilemma. The Basilisk most likely will never come into existence, but the ethical questions it raises must be answered before the singularity arrives.
1. Urban, Tim. "The AI Revolution: The Road to Superintelligence." Wait But Why. Web. 2016.
2. Moore, Gordon. "Cramming More Components onto Integrated Circuits." Department of Computer Science, University of Texas. 1965. Web. 2016.
3. Auerbach, David. "The Most Terrifying Thought Experiment of All Time." Slate. Web. 2016.
4. RationalWiki. "Roko's Basilisk." RationalWiki. Web. 2016.
5. Yudkowsky, Eliezer. "Timeless Decision Theory." Machine Intelligence Research Institute. Web. 2016.
6. Jabr, Ferris. "The Connectome Debate: Is Mapping the Mind of a Worm Worth It?" Scientific American. Web. 2016.
7. Yudkowsky, Eliezer. "Original Post." RationalWiki Archive. Web. 2016.
8. Armstrong, Stuart. "The Blackmail Equation." LessWrong Wiki. Web. 2016.
9. LessWrong. "Newcomb's Problem." LessWrong Wiki. Web. 2016.
10. Yudkowsky, Eliezer. "0 and 1 Are Not Probabilities." LessWrong Wiki. Web. 2016.
11. Yudkowsky, Eliezer. "Privileging the Hypothesis." LessWrong Wiki. Web. 2016.
12. Yudkowsky, Eliezer. "Torture vs. Dust Specks." LessWrong Wiki. Web. 2016.