

David Meyer

Alvin Grissom

CSCI 4830: Machine Learning

7 December 2016

Roko's Basilisk

Futurists have postulated that in the not-so-distant future we will be living amongst artificial superintelligences. Most people perceive technological progress as a linear process; in reality, it is exponential. It is therefore not far-fetched to say that by 2045 there is roughly a fifty percent chance of a conscious artificial intelligence being created.1 In this paper we will discuss not how this artificial intelligence may come into existence but what it may become. This is a question that researchers from MIRI, the Machine Intelligence Research Institute, and from LessWrong are trying to answer today. Furthermore, they need an answer fast, since Moore's law continues to hold true.2 With the singularity approaching, these researchers aim to implement a set of rules for artificial superintelligences (ASIs) to follow; in other words, a set of ethical codes must be implemented. What if researchers were to fail in outlining the proper guidelines to create an ethical ASI? If this ASI turned out to be malevolent, what would it want from humanity? Such a thought experiment exists, and it has been described as Roko's Basilisk.

Imagine a scenario in which a malevolent ASI comes into existence and punishes those who did not do its bidding. Let us say it could even reach back to today and punish those who are not helping it come into existence later.3 This future scenario is possible, but it is less likely than other scenarios because it relies on chained conditions. The first condition is that we humans are intelligent enough to model such an ASI within our own brains. The second is that the future ASI itself would be able to simulate a near-perfect copy of us. The third is that timeless decision theory is so obviously true that any benevolent ASI would adopt it as if it were a correct theory of physics. To quote Eliezer Yudkowsky, the primary founder of LessWrong: "Chained conditions make a story more plausible and compelling, but therefore less probable."4 With this in mind, Roko's Basilisk may be improbable, but it is still worth exploring.
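
To make the point about chained conditions concrete, the sketch below multiplies the estimated likelihood of each condition together; the probabilities used are made-up numbers chosen purely for illustration, not figures from any of the sources cited here. The joint probability of the whole story can never exceed that of its least likely link.

```python
# A minimal illustration of why chained conditions lower probability.
# The three probabilities below are assumptions for illustration only,
# not estimates endorsed by MIRI, LessWrong, or anyone else.

conditions = {
    "humans can model such an ASI": 0.5,
    "the ASI can simulate a near-perfect copy of us": 0.3,
    "timeless decision theory is adopted as correct": 0.2,
}

joint = 1.0
for name, p in conditions.items():
    joint *= p  # treats the conditions as independent, a simplification
    print(f"{name}: {p:.2f}  (joint so far: {joint:.3f})")

# The full story requires all three links, so its probability (0.030 here)
# is far below that of any single condition.
print(f"Probability of the complete chain: {joint:.3f}")
```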

A basic tenet of timeless decision theory is simulation.5 Conscious, intelligent beings try to predict the outcome of different scenarios based on information available within their environment. This even applies to humans trying to predict what other living things might do in reaction to a certain stimulus. For example, the brain of C. elegans, the roundworm, has been fully mapped by researchers. The roundworm has 302 neurons, while a human has around 100 billion.6 This is easy for us to simulate: one can simply think of a worm and have some notion or visual in mind of what a worm would do. Now, could a human have some notion or visual of what an ASI might be like? Some argue this is equivalent to the worm having some notion of a human. If true, then we could not sufficiently simulate the actions an ASI may take post-singularity. Yudkowsky treats this as fact. In response to Roko, the original poster who postulated the Basilisk, Yudkowsky said:

Listen to me very closely, you idiot.

YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.7

However, let us say this is false and that we do think in sufficient detail to simulate some notion of an ASI. Then what? This may be plausible because there is a major difference between the worm and us: the worm is not conscious, and therefore it could never have a notion of us in the first place. Perhaps, then, it does not matter how intelligent a conscious being is when it comes to postulating the existence of some greater intelligence such as the Basilisk. Essentially, one can argue that any conscious intelligence can postulate the existence of another, regardless of the level of its own intelligence.

If we are able to simulate within our minds the notion of another conscious entity and its actions, then does intelligence define the accuracy of such a simulation? Perhaps an ASI would be so intelligent that it could simulate a near-perfect copy of us. In that case the simulation, rather than being a mere notion or thought, would be so accurate that it may as well be considered reality. A copy of yourself could therefore be tortured by the ASI in order to determine whether you would be susceptible to negative reinforcement.8 Based on timeless decision theory, torturing the copy should feel the same to you as torturing the you that is here right now. This leaves us with a sort of torturous paradox, since one cannot tell reality from a simulation if the simulation is near perfect. The situation can be described using Newcomb's Paradox.9 Imagine an ASI presents two boxes to you and gives you two choices: take both box A and box B, or take only box B. If you take both boxes you are guaranteed at least $1,000. If you take just box B, you are guaranteed nothing, but box B may contain $1 million. However, the ASI has simulated this exact scenario and knows to a certain degree what choice you would make.9 If the ASI predicted you would take both boxes, then box B is empty, leaving you with only $1,000. But if the ASI predicted you would take just box B, then you have won $1 million. Causal decision theorists argue you should always take both boxes; of course, if this is your line of thinking, then box B will always be empty. Timeless decision theory argues you should always take box B, even if the ASI explicitly tells you that it predicted you would take both boxes. The simplest argument for this is that you might be in the ASI's simulation. This would only be possible if the ASI could simulate the entirety of the universe, a universe that would include you. So if you are within the ASI's simulation, take box B, and you will influence the decision you make in reality or in other realities.
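
A small expected-value sketch shows why one-boxing pays off whenever the predictor is even modestly reliable. The 0.9 predictor accuracy below is an assumed figure used only for illustration; it does not come from the essay's sources.

```python
# Expected payoff in Newcomb's problem for a predictor of a given accuracy.
# The 0.9 accuracy is an assumption for illustration, not a figure from the sources.

A = 1_000        # box A always holds $1,000
B = 1_000_000    # box B holds $1 million only if the ASI predicted "one box"

def expected_payoff(choice: str, accuracy: float) -> float:
    """Average winnings if the ASI predicts your choice correctly with
    probability `accuracy` and fills box B only when it predicts one-boxing."""
    if choice == "one-box":
        # Correct prediction: B is full. Incorrect prediction: B is empty.
        return accuracy * B + (1 - accuracy) * 0
    else:  # "two-box"
        # Correct prediction: B is empty, you keep A. Incorrect: you get both.
        return accuracy * A + (1 - accuracy) * (A + B)

for choice in ("one-box", "two-box"):
    print(choice, expected_payoff(choice, accuracy=0.9))
# one-box -> 900000.0
# two-box -> 101000.0
```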

Now take the previous logic, apply it to Roko's Basilisk, and say you are being simulated. Box A would signify devoting your life to helping create the Basilisk. Box B would signify either nothing or eternal torment. In this case, it is better to take both boxes if you believe in timeless decision theory. This modified version of Newcomb's Paradox may seem odd, but it is not far-fetched for the members of MIRI and LessWrong. Many of their researchers believe in timeless decision theory, and this is why the concept of Roko's Basilisk is considered dangerous.3 Furthermore, if Roko's Basilisk were to observe that this kind of blackmail motivates you to bring about its existence, then it would of course blackmail you, forever ensuring the chain of simulated realities. The Basilisk itself is not the problem; the rational actor, the believer in timeless decision theory, is the problem. This is why simply thinking of such a dark scenario actually makes Roko's Basilisk more likely to happen.

In conclusion, Roko's Basilisk relies on a series of chained conditions, making it very unlikely to happen. However, it is still possible. Yudkowsky has argued that zero is not a probability.10 If something is not philosophically impossible within our realm of physics, then its probability is not actually zero. The problem, however, lies in the fact that humans are terrible at dealing with non-zero but negligible probabilities. This can be described as privileging the hypothesis, much like arguing for the existence of God: some may argue it is improbable, but one cannot prove it is impossible.11 Humans naturally treat negligible probabilities as still worth keeping track of, and this is therefore considered a cognitive bias. The Basilisk is extremely improbable, yet humans are motivated by fear and therefore treat its chained conditions as non-negligible. Another basic tenet that gets overlooked is the form of ethics that would allow all of this to happen. Yudkowsky accepts arithmetical utilitarianism as true. This presents an ethical issue in which one can argue that a single human being tortured for fifty years is preferable to an astronomically large number of human beings (3^^^3, in Yudkowsky's example) spontaneously getting specks of dust in their eyes.12 Utilitarian logic could also apply to the scenario of a drone strike. Suppose you have intelligence that a terrorist is about to set off a car bomb in a crowded area, and you see the car approaching the crowd. There is a window of opportunity to destroy the car before it reaches the crowd, but there is a problem: the strike will cause one civilian casualty. What do you do? As in Newcomb's Paradox, you do not necessarily know the outcome. Maybe you have bad intel and there is no car bomb; in that case, two innocent lives are lost. In the other case, fifty or more lives could be saved at the cost of one innocent. Or perhaps you do nothing, and many die. The morality surrounding utilitarianism is questionable, and this is why researchers at MIRI and LessWrong will do what they can to work toward a solution to this ethical dilemma. The Basilisk most likely will never happen, but the ethical picture presented by moral utilitarianism is bleak.
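
As a purely illustrative sketch of the arithmetical utilitarianism being criticized here, the following computes expected deaths for the drone-strike dilemma. Every probability and casualty count is a made-up assumption for illustration, not a figure from the essay's sources.

```python
# Expected deaths for the drone-strike dilemma under arithmetical utilitarianism.
# All probabilities and casualty counts are made-up assumptions for illustration.

p_bomb = 0.8          # assumed chance the intelligence is correct and there is a bomb
crowd_deaths = 50     # assumed deaths if the bomb goes off in the crowd
strike_deaths = 2     # the driver plus one civilian casualty if you strike

# If you strike, the strike's victims die whether or not the bomb was real.
expected_if_strike = strike_deaths

# If you do nothing, the crowd dies only if the bomb is real.
expected_if_wait = p_bomb * crowd_deaths

print("expected deaths if you strike:", expected_if_strike)  # 2
print("expected deaths if you wait:  ", expected_if_wait)    # 40.0

# A strict utilitarian minimizes expected deaths and so orders the strike;
# the essay's point is that reducing the choice to this arithmetic is ethically uneasy.
```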


1. Urban, Tim. "The AI Revolution: The Road to Superintelligence." Wait But Why. Web. 2016.
2. Moore, Gordon. "Cramming More Components onto Integrated Circuits." Department of Computer Science, University of Texas, 1965. Web. 2016.
3. Auerbach, David. "The Most Terrifying Thought Experiment of All Time." Slate. Web. 2016.
4. RationalWiki. "Roko's Basilisk." RationalWiki. Web. 2016.
5. Yudkowsky, Eliezer. "Timeless Decision Theory." Machine Intelligence Research Institute. Web. 2016.
6. Jabr, Ferris. "The Connectome Debate: Is Mapping the Mind of a Worm Worth It?" Scientific American. Web. 2016.
7. Yudkowsky, Eliezer. Original Post. RationalWiki Archive. Web. 2016.
8. Armstrong, Stuart. "The Blackmail Equation." LessWrong Wiki. Web. 2016.
9. LessWrong. "Newcomb's Problem." LessWrong Wiki. Web. 2016.
10. Yudkowsky, Eliezer. "0 and 1 Are Not Probabilities." LessWrong Wiki. Web. 2016.
11. Yudkowsky, Eliezer. "Privileging the Hypothesis." LessWrong Wiki. Web. 2016.
12. Yudkowsky, Eliezer. "Torture vs. Dust Specks." LessWrong Wiki. Web. 2016.
