http://spectrum.ieee.org/automaton/robotics/arti...
Can Winograd Schemas Replace Turing Test for Defining Human-Level AI?
By Evan Ackerman
Posted 29 Jul 2014 | 16:50 GMT
Earlier this year, a chatbot called Eugene Goostman "beat" a Turing
Test for artificial intelligence (http://spectrum.ieee.org/techtalk/robotics/artificial-intelligence/virtual-tween-passes-turing-test) as
part of a contest organized by a U.K. university. Almost immediately, it
became obvious that rather than proving that a piece of software had
achieved human-level intelligence, all that this particular competition
had shown was that a piece of software had gotten fairly adept at
fooling humans into thinking that they were talking to another human,
which is very different from a measure of the ability to "think." (In fact,
some observers didn't think the bot was very clever at all
(http://www.scottaaronson.com/blog/?p=1858).)
Clearly, a better test is needed, and we may have one, in the form of a
type of question called a Winograd schema that's easy for a human to
answer, but a serious challenge for a computer.
The problem with the Turing Test is that it's not really a test of whether
an artificial intelligence program is capable of thinking: it's a test of
whether an AI program can fool a human. And humans are really, really
dumb. We fall for all kinds of tricks that a well-programmed AI can use
to convince us that we're talking to a real person who can think.
1. Two parties are mentioned in a sentence by noun phrases. They can be two males, two females, two inanimate objects or two groups of
people or objects.
2. A pronoun or possessive adjective is used in the sentence in reference to one of the parties, but is also of the right sort for the second
party. In the case of males, it is he/him/his; for females, it is she/her/her; for inanimate objects it is it/it/its; and for groups it is
they/them/their.
3. The question involves determining the referent of the pronoun or possessive adjective. Answer 0 is always the first party mentioned in
the sentence (but repeated from the sentence for clarity), and Answer 1 is the second party.
4. There is a word (called the special word) that appears in the sentence and possibly the question. When it is replaced by another word
(called the alternate word), everything still makes perfect sense, but the answer changes.
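The following minimal Python sketch shows how the four features above could be stored and scored. The names `WinogradSchema`, `accuracy`, `always_first`, and the `{word}` template slot are hypothetical and chosen for illustration. They do not come from the paper. The two schemas are adapted from the well-known trophy/suitcase and city councilmen examples. Feature 4 says that swapping the special word for the alternate word changes the answer, so each schema yields one question answered by party 0 and one answered by party 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """One schema. "{word}" in the templates marks the slot filled by the
    special or alternate word (a hypothetical convention for this sketch)."""

    sentence: str               # e.g. "... because it's too {word}."
    question: str               # may also contain the "{word}" slot
    parties: Tuple[str, str]    # (Answer 0, Answer 1): first and second party mentioned
    special_word: str
    alternate_word: str
    special_answer: int         # referent (0 or 1) when the special word is used

    def __post_init__(self) -> None:
        if self.special_answer not in (0, 1):
            raise ValueError("special_answer must be 0 or 1")

    def variants(self) -> Iterator[Tuple[str, str, Tuple[str, str], int]]:
        """Yield (sentence, question, parties, correct answer) for both words.

        Using the alternate word flips the correct answer (feature 4).
        """
        for word, answer in ((self.special_word, self.special_answer),
                             (self.alternate_word, 1 - self.special_answer)):
            yield (self.sentence.format(word=word),
                   self.question.format(word=word),
                   self.parties,
                   answer)


# A solver takes (sentence, question, parties) and returns 0 or 1.
Solver = Callable[[str, str, Tuple[str, str]], int]


def accuracy(schemas: list[WinogradSchema], solve: Solver) -> float:
    """Fraction of questions (both variants of every schema) answered correctly."""
    total = correct = 0
    for schema in schemas:
        for sentence, question, parties, answer in schema.variants():
            total += 1
            if solve(sentence, question, parties) == answer:
                correct += 1
    return correct / total


SCHEMAS = [
    WinogradSchema(
        sentence="The trophy doesn't fit into the brown suitcase because it's too {word}.",
        question="What is too {word}?",
        parties=("the trophy", "the brown suitcase"),
        special_word="big", alternate_word="small", special_answer=0),
    WinogradSchema(
        sentence="The city councilmen refused the demonstrators a permit because they {word} violence.",
        question="Who {word} violence?",
        parties=("the city councilmen", "the demonstrators"),
        special_word="feared", alternate_word="advocated", special_answer=0),
]


def always_first(sentence: str, question: str, parties: Tuple[str, str]) -> int:
    """A trivial, non-thinking baseline: always pick the first party."""
    return 0


print(accuracy(SCHEMAS, always_first))
```

Here `accuracy(SCHEMAS, always_first)` comes out to 0.5. Every schema contributes exactly one question whose answer is 0 and one whose answer is 1, so any strategy that always picks the same position lands at chance.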
For more details (including some examples of ways in which certain Winograd schemas can include clues that an AI could exploit), this paper
(http://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492/4924) is easy to understand and well worth reading. In fact, it's so well worth reading
that I'm going to steal their conclusion and post it here:
08/05/2014 09:00 PM
Like Turing, we believe that getting the behaviour right is the primary concern in developing an artificially intelligent system. We further
agree that English comprehension in the broadest sense is an excellent indicator of intelligent behaviour. Where we have a slight
disagreement with Turing is whether a free-form conversation in English is the right vehicle. Our WS [Winograd schemas] challenge does
not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses. Assuming a subject is willing to take
a WS test at all, much will be learned quite unambiguously about the subject in a few minutes. What we have proposed here is certainly
less demanding than an intelligent conversation about sonnets (say), as imagined by Turing; it does, however, offer a test challenge that is
less subject to abuse.
It's worth pointing out that we're a bit skeptical that you can really "test" for human-level AI in this manner. With a highly structured test with specific
questions and answers that are unambiguously right or wrong, there's a lot of potential for a clever (but not thinking) AI to find ways to exploit it.
The question, then, becomes whether "intelligence" is simply a technological system that is sufficiently complex to correctly answer a series of
questions that a slightly more complex biological system (us) has arbitrarily decided constitute a measurement of what thinking requires.
It seems inevitable that at some point, we'll have to say that true intelligence is feeling as well as thinking, and "Blade Runner" is way ahead of us.