The Interrogator as Critic: The Turing Test and the Evaluation of Generative Music Systems

Christopher Ariza
Department of Music
Towson University
8000 York Road
Towson, Maryland 21252 USA
ariza@flexatone.net

Computer Music Journal, 33:2, pp. 48-70, Summer 2009
© 2009 Massachusetts Institute of Technology.

Procedural or algorithmic approaches to generating music have been explored in the medium of software for over fifty years. Occasionally, researchers have attempted to evaluate the success of these generative music systems by measuring the perceived quality or style conformity of isolated musical outputs. These tests are often conducted in the form of comparisons between computer-aided output and non-computer-aided output. The model of the Turing Test (TT), Alan Turing's proposed "Imitation Game" (Turing 1950), has been submitted and employed as a framework for these comparisons. In this context, it is assumed that if machine output sounds like, or is preferred to, human output, the machine has succeeded. The nature of this success is rarely questioned, and is often interpreted as evidence of a successful generative music system. Such listener surveys, within necessary statistical and psychological constraints, may be pooled to gauge common responses to and interpretations of music; yet these surveys are not TTs. This article argues that Turing's well-known proposal cannot be applied to executing and evaluating listener surveys.

Whereas pre-computer generative music systems have been employed for centuries, the idea of testing the output of such systems appears to only have emerged since computer implementation. One of the earliest tests is reported in Hiller (1970, p. 92): describing the research of Havass (1964), Hiller reports that, at a conference in 1964, Havass conducted an experiment to determine if listeners could distinguish computer-generated and traditional melodies. Generative techniques derived from the fields of artificial intelligence (AI; for example, neural nets and various learning algorithms) and artificial life (e.g., genetic algorithms and cellular automata) may be associated with such tests due to explicit reference to biological systems. Yet, since only the output of the system is tested (that is, system and interface design are ignored), any generative technique can be employed. These tests may be associated with the broader historical context of human-versus-machine tests, as demonstrated in the American folk-tale of John Henry versus the steam hammer (Nelson 2006) or the more recent competition of Garry Kasparov versus Deep Blue (Hsu 2002).

Some tests attempt to avoid measures of subjective quality by measuring perceived conformity to known musical artifacts. These musical artifacts are often used to create the music being tested: they are the source of important generative parameters, data, or models. The design goals of a system provide context for these types of tests. Pearce, Meredith, and Wiggins (2002, p. 120) define four motivations for the development of generative music systems: (1) "composer-designed tools for personal use," (2) "tools designed for general compositional use," (3) "theories of a musical style . . . implemented as computer programs," and (4) "cognitive theories of the processes supporting compositional expertise . . . implemented as computer programs." Such motivational distinctions may be irrelevant if the system is used outside of the context of its creation; for this reason, system-use cases, rather than developer motivations, might offer alternative distinctions. The categories proposed by Pearce, Meredith, and Wiggins can be used to generalize about two larger use cases: systems used as creative tools for making original music (motivations 1 and 2, above), and systems that are designed to computationally model theories of musical style or cognition (motivations 3 and 4). These two larger categories will be referred to as "creative tools" and "computational models." Although design motivation is not included in the seven descriptors of computer-aided algorithmic systems proposed in Ariza (2005), the idiom affinity descriptor is closely related: systems with singular idiom affinities are often computational models.

Explicitly testing the output of generative music systems is uncommon. As George Papadopoulos and Geraint Wiggins (1999, p. 113) observe, research in generative music systems demonstrates a "lack of experimental methodology. Furthermore, there is usually no evaluation of the output by real experts." Similarly, Pearce, Meredith, and Wiggins (2002, p. 120), presumably describing all types of generative music systems, state that researchers often "fail to adopt suitable methodologies for the development and evaluation of composition programs and this, in turn, has compromised the practical or theoretical value of their research." In the case of creative tools, the lack of empirical output evaluation is not a shortcoming: creative, artistic practices, no matter the medium, often lack rigorous experimental methodologies. While system and interface design benefit from rigorous evaluation, the relevance of systematic output evaluation, whether conducted by experts or otherwise, is questionable. The lack of systematic evaluation of aesthetic artifacts in general is traditionally accepted: evaluation is more commonly found as aesthetic criticism, not experimental methodology.

This article examines the concept of a musical TT. A variety of tests, proposed and executed by researchers in diverse fields, are examined, and it is shown that musical TTs do not actually conform to Turing's model. Use of the TT in the evaluation of generative music systems is superfluous and potentially misleading; its evocation is an appeal to a measure of some form of artificial thought, yet, in the context of music, it provides no more than a listener survey.

The term "musical judgments" will be used to include a range of statements listeners might make about music. Musical judgments may be aesthetic judgments or well-reasoned and informed interpretations; they may evaluate conformity to a known style or perceived similarity to existing works; they may also be statements of taste, bias, or preference. Such judgments are often subjective: they are statements about the experience of hearing and interpreting music. Some musical judgments may be objective; asserting or denying this claim, however, requires a psychoacoustic and philosophical inquiry beyond the scope of this study.

Stanley Cavell (2002, p. 88), while arguing that aesthetic judgments are conclusive and rational, states that such judgments "lack something: the arguments that support them are not conclusive in the way arguments in logic are, nor rational in the way arguments in science are." Aesthetic judgments, as described by Immanuel Kant in the Critique of Judgment (1790), can be divided into two categories: taste of sense, which concerns what is merely pleasant or agreeable, and taste of reflection, which concerns what is beautiful (Cavell 2002, p. 88). Musical judgments, in this framework, include both the taste of sense and the taste of reflection. Some critics are better than others at distinguishing these tastes. Critics make musical judgments to affirm aesthetic value.

Listener surveys rely on musical judgments. Although the nature of these judgments can be shaped in the selection of the listeners and the design of the survey, there is an essential psychological uncertainty in what informs individual musical judgments. This uncertainty is heightened in the context of an anonymous survey, where a listener is neither accountable for nor required to defend their judgments. Thus, a skepticism in regard to the results of listener surveys, perhaps more than surveys of other human experiences, is warranted.

This article asserts the particular nature of the TT in order to discourage false associations of the TT with listener surveys. This critique is not aimed at individual researchers or research projects. The TT is an attractive concept: it is not surprising that it has found its way into discourse on generative music systems. More practical methods of system evaluation, however, may do more to promote innovation. Comparative analysis of system and interface design, or studies of user interaction and experience, are examples. Such approaches to system evaluation, while only offering limited insight into musical output, promote design diversity and provide valuable research upon which others can build.

The Turing Test

In 1950, Alan Turing devised a method of answering the question: "Can a machine think?" (Turing 1950,
p. 433). His original model, called "The Imitation Game," includes a human interrogator who, through a text-based interface designed to remove the aural qualities of speech and the visual appearance of the subject, communicates with two agents. One agent is human; the other, machine. The interrogator must be aware that one of the agents is a machine. If the interrogator, through discourse, cannot successfully distinguish the human from the machine, then the machine, in Turing's view, has achieved thinking. Importantly, Turing does not define thinking or intelligence (Copeland 2000, p. 522), and Turing does not claim that passing this test proves thought or intelligence (Harnad 2000, p. 427). As Steven Harnad states, Turing's goal is "an epistemic point, not an ontic one, and heuristic rather than demonstrative" (Harnad 2000, p. 427).

Turing based his test on a party game in which the interrogator attempts to distinguish the gender of two concealed human agents. Turing's original description of his test is incomplete, and has led to a variety of interpretations. Some have suggested that there are actually two tests, the Original Imitation Game Test and the Standard Turing Test (Genova 1994; Sterrett 2000). Yet B. Jack Copeland, based on analysis of additional commentary by Turing, states that "it seems unlikely that Turing saw himself as describing different tests" (Copeland 2000, p. 526).

Turing suggests that multiple tests must be averaged with, presumably, multiple human agents and interrogators. He predicted that by the year 2000, an average interrogator, after five minutes of conversation, will make a correct identification no more than 70 percent of the time (Turing 1950, p. 442). Additional sources suggest that Turing estimated it would take over 100 years before a machine regularly passed the test (Copeland 2000, p. 527). Anticipating complaints to such a test, Turing (1950) responds to at least nine hypothetical objections.

Under the heading of "The Argument from Consciousness," Turing responds to an argument presented by Geoffrey Jefferson a year earlier (1949). Jefferson states that it will not be sufficient for a machine mind to just use words: "It would have to be able to create concepts and to find for itself suitable words in which to express additions to knowledge that it brought about" (Jefferson 1949, p. 1110). Beyond just the ability of creating concepts, Jefferson goes on to suggest that such a machine must have emotions and self awareness. "Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain, that is, not only write it but know that it had written it" (Jefferson 1949, p. 1110).

Jefferson's quote is used by Turing to anticipate the objection of the "other minds problem": the argument that it is impossible to prove that any entity (including humans) has a mind; only individuals can be certain that they have minds. As Harnad states, "the only way to know for sure that a system has a mind is to be that system" (2000, p. 428). Turing avoids this problem by rejecting the solipsist point of view (1950, p. 446) and affirming that humans use conversation and discourse to identify the presence of minds in other humans. Turing takes a pragmatic approach: "Instead of arguing continually over this point it is usual to have the polite convention that everyone thinks" (Turing 1950, p. 446). This is a natural language based argument from analogy: "You are sufficiently like me in all other visible respects, so I can justifiably infer (or assume) that you are like me in this invisible one" (Rapaport 2000, p. 469). In short, the TT works by assuming that humans have minds and that natural language is sufficient to represent mind; a machine has a mind if a machine and a human are indistinguishable in discourse.

An important consequence of the TT is that the machine's internal mechanism, as well as its outward visual or aural presence, is irrelevant. Through blind comparison, Turing hoped to surpass personal and aesthetic bias. The test isolates convincing, human-like conversation as the sole determinate of thinking. Harnad calls this functional, rather than structural, indistinguishability (2000, p. 429). As Ray Kurzweil states, "the insight in Turing's eponymous test is the ability of written human language to represent human-level thinking" (Kurzweil 2002). This insight, as discussed subsequently, has been debated.

Turing's goal was, at most, to provide an inductive test of thinking (Moor 1976, 2001, p. 82); at the least,
he offered a philosophical conversation stopper (Dennett 1998, p. 4). Turing left significant details of his test unspecified, including the number of interrogators and agents, their qualifications, and the duration and organization of the tests (Halpern 2006, p. 43). Nonetheless, the TT can be repeated and averaged to arrive at an inductive claim.

French suggests that a simplified version without comparison between two agents, involving only a computer agent and an interrogator, is satisfactory: "It is generally agreed that this variation does not change the essence of Turing's operational definition of intelligence" (French 2000, p. 116). While comparison between two agents might be replaced with a single agent, the use of natural language and interaction are always retained.

The machine agent of the TT may deceive the interrogator: mathematical questions, for example, may be answered with programmed mistakes calculated to confuse the interrogator (Turing 1950, p. 448), or answers may be programmatically delayed to simulate human calculation times. All that is necessary is that the machine fool a human a suitable percentage of the time. Peter Kugel states that Turing's proposal does not suggest that computers will gain intelligence, but that they will fake intelligence well enough to fool human beings (Kugel 2002, p. 565).

An early example of a software system well suited to the TT is Joseph Weizenbaum's ELIZA system (Weizenbaum 1966). In rough analogue to a human therapist, ELIZA attempts to communicate with an interrogator in natural language. Although not offering quantitative results such as those suggested by Turing, Weizenbaum notes that "some subjects have been very hard to convince that ELIZA . . . is not human" and that this is "a striking form of Turing's test" (Weizenbaum 1966, p. 42).

Humans may too easily associate humanity with machines. Some have called this the "Eliza Effect," "the susceptibility of people to read far more understanding than is warranted into strings of symbols, especially words, strung together by computers" (Hofstadter 1996, p. 155). Jefferson, nearly fifty years earlier, similarly comments on what he saw as "a new and greater danger threatening, that of anthropomorphizing the machine" (Jefferson 1949, p. 1110). Hofstadter further characterizes the Eliza Effect as "a virus that constantly mutates, reappearing over and over again in AI in ever-fresh disguises, and in subtler and subtler forms" (Hofstadter 1996, p. 158). As discussed subsequently, the Eliza Effect may influence the evaluation of generative music systems.

Describing a response similar to the Eliza Effect, Halpern notes how surprise is often confused with success: "AI champions point out that the computer has done something unexpected, and that because it did so, we can hardly deny it was thinking . . . to make this claim is simply to invoke the [Turing] test without naming it" (Halpern 2006, p. 51). This surprise at learning that a computer has performed some feat that . . . only humans could perform, as the Eliza Effect, often leads observers to overestimation. Computer-generated art and music are often used as examples of progress in AI (Kurzweil 1990), in part because of this surprise factor.

The model provided by ELIZA has inspired a wide range of systems specialized for natural language communication. The Loebner Prize, established in 1990 by Hugh Loebner (1994), provides a forum for conducting Turing-style tests. The prize will award $100,000 to the developers of the first system to pass the Loebner form of the TT. Ironically, during the first Loebner Competition in 1991, three judges mistook a human for a computer, presumably because she knew so much about her topic that she exceeded their expectations for mere humans (Halpern 2006, p. 57). Significant differences between the Loebner prize and the TT have been documented (Shieber 1993; French 2000, p. 121; Zdenek 2001), and many have criticized the prize, suggesting that more practical goals should be promoted (Hofstadter 1996, p. 491; Dennett 1998, p. 28).

Another formal Turing-style test has been established as part of a bet. Mitchell Kapor has predicted that by 2029 no machine intelligence will have passed the TT, a prediction challenged by Kurzweil as part of a $20,000 Long Bets Foundation wager (Long Bets Foundation 2002; Kurzweil 2002, 2005, p. 295). As will be discussed herein, Kurzweil's conviction that the TT will be passed by the end of the 2020s (Kurzweil 2005, p. 25) has led him to describe
a wide variety of human-machine competitions as TTs.

For over fifty years, there has been tremendous discussion and criticism of the TT. As Halpern notes, Turing (1950) is "one of the most reprinted, cited, quoted, misquoted, paraphrased, alluded to, and generally referenced philosophical papers ever published" (Halpern 2006, p. 42).

A widely cited complaint, called the Chinese Room Argument (CRA), is presented by John Searle (1980). Searle's argument imagines a human translating Chinese simply through direct symbol manipulation and table look-up procedures. Searle argues that, in this case, the human has knowledge only of syntax, and cannot be seen to have any knowledge of semantics. This human, with sufficient procedural resources, could appear to communicate in Chinese yet have no knowledge of Chinese. From this, Searle claims that the ability to provide good answers to human questions does not necessarily imply that the provider of those answers is thinking; passing the Test is no proof of active intelligence (Halpern 2006, p. 52). Although some authors have extended the CRA (Wakefield 2003; Damper 2006), Searle's argument has been frequently challenged (Harnad 2000, p. 437; Hauser 1993, 1997, 2001; Kurzweil 2005, p. 430; Rapaport 2000). Numerous additional arguments questioning the value or applicability of the TT have been proposed (Block 1981; French 1990; Crockett 1994).

The question of whether the TT or similar tests are sufficient measures of machine thought will be debated for the foreseeable future (French 2000; Saygin, Cicekli, and Akman 2000). The answer to this question is not relevant to the present study. The TT, regardless of its potential for measuring thought or intelligence, provides at least a symbolic benchmark of one form of machine aptitude. More relevant to this study, the TT has inspired other forms of comparison tests between human and machine output.

Alternative Tests

Related tests of machine aptitude have been proposed. Some of these tests attempt to measure human-like experience or creativity and might be applied to musical activities and musical thoughts. These tests are examined here to distinguish them from the TT, discussed previously, and proposed musical TTs, introduced subsequently.

The John Henry Test (JHT) is proposed here to label a common approach to testing machine aptitude. A JHT is a competition between a human and a machine in which there is a clearly defined winner within a narrowly specialized (and not necessarily intelligent) domain. A JHT requires quantitative measures of success: a decisive, independently verifiable conclusion results. A test of aesthetic outputs cannot be a JHT. The Garry Kasparov versus Deep Blue competition can be properly seen as a JHT: it was a human-versus-machine contest, and the conclusion of the competition was defined by a clear (though contested) result. While the TT might be seen as a type of JHT, the natural-language interaction of the TT is free-form and not based on a fixed domain or content. The results of the TT are based on human-evaluated indistinguishability, not an independently verifiable distinguishability. Nonetheless, Bloomfield and Vurdubakis generalize the TT as something like a JHT, describing such tests as forms of socially situated performance geared to the enactment and dramatization of a number of occidental modernity's fundamental moral and conceptual categories (Bloomfield and Vurdubakis 2003, p. 35). Although the TT is not a JHT, both have been used to dramatize changes in machine aptitude.

Steven Harnad places the TT within what he calls the Turing Hierarchy. Harnad first extended the TT into the Total Turing Test (TTT). The TTT requires that Turing's text-based interface be replaced by full physical and sense-based interaction with a robot: "The candidate must be able to do, in the real world of objects and people, everything that real people can do, in a way that is indistinguishable (to a person) from the way real people do it" (Harnad 1991, p. 44). If applied to a generative music system, the TTT would presumably require a robot to play a musical instrument, or perform some other musical task, in a physical manner indistinguishable from a human. Eighteenth-century musical automata, such as the flute player of Jacques de Vaucanson
or the harpsichord player of the Jaquet-Droz family (Riskin 2003) provide early examples of such human-shaped music machines. More recent attempts, such as Haile (Weinberg and Driscoll 2006) and the Waseda Flutist Robot (Solis et al. 2006), explore more sophisticated musical interactions with less convincing human-like exteriors. Nick Collins imagines another example: a musical TTT of sorts executed within a blind orchestra audition. The machine would employ a real or virtual instrument, virtuoso skill, and the important conversational analysis required to follow instructions and visual analysis to read complex scores (Collins 2006, p. 210). No past or present musical automata have approached the comprehensive ability necessary to pass a TTT.

Harnad proposes the TTT as a way of adding semantics to a syntax-only device: "the symbols must be grounded directly and autonomously in causal interactions with the objects, events and states that they are about, and a pure symbol-cruncher does not have the wherewithal for that" (Harnad 2000, p. 438). Although this can be seen as an attempt to avoid the problems raised by Searle's CRA, some have suggested that the TTT is unnecessary or misguided (Searle 1993).

Harnad extends the Turing Hierarchy in both directions: beyond the TTT (or T3) are greater degrees of human indistinguishability. The T4 requires internal microfunctional indistinguishability (Harnad 2000, p. 439), and the T5 requires microphysical indistinguishability, real biological molecules, physically identical to our own (Harnad 2000, p. 440).

More important for this discussion is what Harnad places below the TT (or T2): level t1. The "t" in this context stands for toy models, not Turing. Tests in this form employ models for subtotal fragments of our functional capacity (2000, p. 429). Harnad emphasizes that the true TT is predicated on total functional indistinguishability; anything less, in the context of a TT, is a toy, and toys are ultimately inadequate for the goals of Turing testing (2000, p. 430). Harnad states that, as all current mind-modeling research remains at the t1 level (2000, p. 430), we can assume that models of the musical mind, or models of subtotal musical functionality independent of comprehensive cognitive models, are similarly toys, and inadequate for the TT. As will be developed herein, the t1 level provides a context for musical TTs. The term "toy" is used to emphasize the distance of these systems from Turing's goal of thinking, not to suggest that such systems are for children or are otherwise unsophisticated.

Bringsjord, Bello, and Ferrucci (2001) propose a test based on Lady Lovelace's discussion of the limits of Charles Babbage's Analytical Engine (Lovelace 1842). Turing describes Lady Lovelace's objection in his original presentation of the TT (1950, p. 454). The Lovelace Test (LT) requires that the machine be creative, where the term "creative" is used in a highly restrictive sense. This creativity is evidenced when the machine produces an artifact through a procedure that cannot be explained by the creator (or a creator-peer) of the machine. Specifically, where H is the human architect, A is the artificial agent, and o is the output, A has passed the LT when "H (or someone who knows what H knows, and has H's resources) cannot explain how A produced o by appeal to A's architecture, knowledge base, and core functions" (Bringsjord, Bello, and Ferrucci 2001, p. 12). This is explicitly a "special epistemic relationship" (2001, p. 9). H is permitted time to provide an explanation, and may investigate and study the system in any way necessary, including analyzing the learned or developed states of a dynamic system within A. Knowledge contained within an artificial neural network, for example, might be explained through such an analysis (2001, p. 19). Multi-agent systems, or other types of emergent programming paradigms, may produce surprising results: these results, however, can be traced back to the system's architecture, knowledge base, and core functions. It is easy to underestimate the difficulty of passing the LT. The LT is designed to suggest that the notion of creativity requires autonomy and that "there may simply not be a way for a mere information-processing artifact to pass LT" (2001, p. 25).

Hofstadter offers a perspective similar to that of the LT, noting that "when programs cease to be transparent to their creators, then the approach to creativity has begun" (1979, p. 673). Elsewhere, Hofstadter states that true creativity implies autonomy:
a creative program must make its own decisions, must experiment and explore concepts, and must gradually converge on a satisfactory solution through a continual process in which suggestions coming from one part of the system and judgments coming from another part are continually interleaved (Hofstadter 1996, p. 411).

Margaret Boden divides the Lovelace objection into four "Lovelace questions": (1) can computational ideas increase understanding of human creativity; (2) can computers do things that appear creative; (3) can a computer appear to recognize creativity; and (4) can computers really be creative (1990, p. 6). Boden answers the first three questions affirmatively. The fourth question is the LT. Boden notes that, even after satisfying all the scientific criteria for creative intelligence (whatever those may be), answering this question requires humans to make a moral and political decision: this decision amounts to dignifying the computer: allowing it a moral and intellectual respect comparable with the respect we feel for fellow human beings (1990, p. 11). This respect relates to the issue of intentionality and authorship, discussed subsequently.

The LT standard of creativity is significantly higher than the computational creativity defined by Wiggins, which includes all behavior exhibited by natural and artificial systems, which would be deemed creative if exhibited by humans (2006, p. 451). Wiggins, admitting this definition is intangible, does not offer a method or a test to determine what is or is not deemed creative by humans. Humans may not agree on, or even regularly identify, creative behavior exhibited by any agent, human or machine. The apprehension of creativity, like the identification of successful music, may be a largely aesthetic problem. Cope recasts the influence of consensus by defining creativity as "the initialization of connections between two or more multifaceted things, ideas, or phenomena hitherto not otherwise considered actively connected" (2005, p. 11). Here, the problem of identifying consensus is shifted, not removed: consensus is required to determine what is already actively connected.

Independent of whether machines exhibit creativity, no contemporary generative music system is likely to pass the LT. Generative systems may employ complex stochastic models, neural nets, models of artificial life, or any of a wide range of procedures; such systems may also serendipitously produce surprising and aesthetically satisfying outputs. Yet the output of such systems, upon examination of the system's architecture, can be explained. Cope, for example, executes an LT of sorts, asserting that an Experiments in Musical Intelligence (EMI) composition from 2003 was not produced creatively nor with creative processes: "given enough time, I could reverse engineer this music and find all of its original sources in Bach's lute suites" (2005, p. 44). This is not a practical or aesthetic concern of music making: as demonstrated by the history of generative music systems, failing the LT or other measures of machine intelligence has not limited the computer-aided creation of music by humans.

Tests that are associated with the TT yet fundamentally alter its structure are the focus of this article. An early example is provided by Hofstadter in Gödel, Escher, Bach (Hofstadter 1979). Hofstadter calls this a "little Turing test": readers are asked to distinguish selections of human-written natural language and computer-generated text, presented in an intermingled list (Hofstadter 1979, p. 622). Hofstadter fails to articulate the significant deviations from Turing's model.

Similarly, Kostas Terzidis suggests that if an algorithmically generated paper created by the Dada Engine system (Bulhak 1996) was submitted to a conference and accepted, it may have passed Turing's classic test of computer intelligence (Terzidis 2006, p. 22). As should be clear, simply mistaking computer output for human output is not passing a TT. Appropriately, the author of the Dada Engine credits Hofstadter's "little Turing test" as a source of inspiration (Bulhak 1996).

Another alteration of Turing's model is demonstrated by the Completely Automated Public Turing test to tell Computers and Humans Apart, or CAPTCHA. A CAPTCHA is a now-familiar test given by a computer to distinguish if a user is either a human or a machine. While superficially related to the TT in that the CAPTCHA attempts to distinguish humans and machines, it is not a TT: there is no interaction, the medium is often visual (based on the ability to distinguish distorted
characters or images), and thinking is not (generally) tested. Moni Naor, in the first proposal
for such tests, fails to articulate these differences, simply calling these "automated Turing
Tests" (Naor 1996). Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford, who coined
the term CAPTCHA, promulgate a similar misnomer and continue to refer to these tests as forms
of "Automated Turing Tests" (von Ahn et al. 2003; von Ahn, Blum, and Langford 2004).

Music as the Medium of the Turing Test

To test the output of generative music systems, and to avoid the problems of the TTT, the LT,
and definitions of creativity, the TT might be altered by making aesthetic artifacts, music or
other creative forms, the medium of the test. In the case of music, this means replacing the
text-based medium, in whole or in part, with sound symbols or sound forms. Two models of such
tests, amalgamated from diverse sources, are introduced herein. Although sometimes using the
language and format of the TT, these tests fundamentally alter the role of the interrogator,
recasting the interrogator as a critic. As such, these are not TTs but rather, after Harnad
(2000), "toy tests."

Musical Directive Toy Test (MDtT)

The interrogator, using a computer interface, sends a musical directive to two composer-agents.
One of the composers is a machine, the other, a human. The musical directive could be style- or
genre-dependent, or it could be abstract, something like "write sad music," "write a march," or
"compose a musical game." The musical directive might also include music, such as melodic or
rhythmic fragments upon which the composer-agent would build. The two composers both receive
the directive and create music. After an appropriate amount of time (a human scale would be
necessary), the completed music is returned to the interrogator in a format such as a score,
synthetic digital audio, or digital audio of a recorded performance. A flexible MDtT would
permit the interrogator to submit as many musical directives as desired. An MDtT could take the
form of a real-time musical call and response or improvisation between interrogator and
composer agents. The interrogator must attempt to distinguish the human from the machine. This
test retains some aspects of interaction, yet it replaces natural language with music.

Musical Output Toy Test (MOtT)

In this test, two composer-agents provide a score, synthetic digital audio, or digital audio of
a recorded performance to the interrogator. One of the composers is a machine, the other, a
human. The provided music may be related in terms of style, instrumentation, or raw musical
resources, but is not a newly composed response to a specific musical directive (as in the
MDtT). Each agent might provide multiple musical works. Based only on these works, the
interrogator must attempt to distinguish the human from the machine. This test maintains only
the blind comparison of output from two sources; the interaction and discourse permitted in the
TT are removed.

Comparison

The MDtT and MOtT, while employing the blind indistinguishability test of the TT, remove the
critical component of natural language discourse. The MDtT and MOtT are solely dependent on the
musical judgments of the interrogator. These musical judgments cannot do what natural language
discourse can do to expose the agent's capacity for thinking. Successful music, unlike natural
language, does not require a common, external syntax; successful musical discourse, unlike
natural language, can employ unique, dynamic, or local syntaxes.

Furthermore, the interrogator may overwhelmingly rely on subjective musical judgments. This
contrasts with the TT, which, while permitting any form of discourse within the common syntax
of written natural language, is designed to remove
subjective visual and aural evaluations through blind comparison. Although it is uncommon for
humans to interact via text with a completely unknown agent, it is quite common for humans to
evaluate music without any knowledge of its author, source, or means of production. The TT is a
strikingly isolated and focused form of blind, discourse-based evaluation. The MDtT or MOtT, by
using music in a conventional form of delivery, do not offer a similarly isolated or blind form
of evaluation. The interrogator in the TT, although certainly influenced by subjective ideas of
human thought, attempts to reasonably distinguish between human and machine based on linguistic
constructions and content. The critic of the MDtT or MOtT is unrestrained, free to employ a
wide range or mixture of musical judgments.

The MDtT is related to the Short Short Story Game (S3G) proposed and demonstrated by Selmer
Bringsjord and David Ferrucci (2000). In this game, a system (BRUTUS, in the case of Bringsjord
and Ferrucci) is given a sentence. Based on this sentence, the system must compose a short
story designed to be truly interesting (Bringsjord, Bello, and Ferrucci 2001, p. 13). The goal
of this system is to compete with human authors in a manner similar to the MDtT. While
Bringsjord, Bello, and Ferrucci (2001, p. 13) acknowledge that BRUTUS produces some rather
interesting stories, they do not claim that the system passes the LT. In their view, the system
is not creative. Its output is merely the result of their input and design: "two humans . . .
spent years figuring out how to formalize a generative capacity sufficient to produce this and
other stories" (Bringsjord, Bello, and Ferrucci 2001, p. 14).

The MDtT and MOtT, however, significantly differ from the S3G: as stated previously, successful
music, unlike successful stories, may have no common syntax. The abstract nature of music,
particularly in the context of creative contemporary practice, is such that there exists no
comparable expectation of grammar or form. Even in the limited cases where strong expectations
of musical grammar or form exist, subverting these expectations may be musically legitimate or
aesthetically valuable. Music, like some poetry, is not natural language. Where the evaluation
of the output of S3G may include formal, rational, and aesthetic criteria, the evaluation of
output in an MDtT may be limited to essentially aesthetic musical judgments.

Without the unifying context of a musical directive, the MOtT, even more than the MDtT, may
result in unreliable or unrepresentative musical judgments. These judgments may be influenced
by historical or cultural associations about musical style, expectations of what a machine or a
human sounds like, or assumptions of what is possible with current technology. These
expectations can vary greatly depending on the listeners surveyed. Of course, similar notions
are likely to be held by an interrogator in the TT. Yet a responsible TT interrogator has the
option, even the obligation, to balance such judgments with further discourse by asking
questions or demanding explanations. This is not possible in the MDtT or MOtT.

Pearce, in describing an MOtT directed at identifying stylistic similarity, finds similar
influences on musical judgments, noting that potential interrogators might shift their
attention to searching for musical features expected to be generated by a computer model rather
then concentrating on stylistic features of the composition (2005, p. 185). While Pearce notes
that the Turing test methodology fails to demonstrate which cognitive or stylistic hypotheses
embodied in the generative system influence the judgments of listeners (Pearce 2005, p. 186),
he still claims that such tests offer empirical, quantitative results which may be appraised
intersubjectively (p. 184).

Both the MDtT and the MOtT are surveys of musical judgments, not determinants of thought or
intelligence. Where the TT requires discourse between an interrogator and an agent, here
discourse is replaced by single-sided criticism. Even in the musical discourse of an
interactive MDtT, the interrogator, at the end of the discourse, must attempt to distinguish
human and machine with musical judgments. While interrogators can generally agree on what makes
rational and coherent language, and are likely to concur on what kind of language offers
evidence of thought, critics may not agree on what makes aesthetically successful music, and
are likely to offer inconsistent or contradictory musical judgments.

Tests similar to the MOtT have been proposed and executed in other creative mediums. In all
cases, critical components of the TT, such as interactivity or the use of natural language, are
removed. Hofstadter's "little Turing test," described earlier, removes interactivity but
evaluates natural language. The evaluation of aesthetic artifacts pushes such tests even
further away from Turing's model.

Related to an MOtT, Ray Kurzweil demonstrates a test called "A (Kind of) Turing Test" (Kurzweil
1990, p. 374). Kurzweil describes a narrower concept of a Turing test, a domain-specific Turing
test, or a Turing test of believability, where the goal is for a computer to successfully
imitate a human within a particular domain of human intelligence. Kurzweil tests the output of
his Kurzweil Cybernetic Poet system with poems by human authors, and he provides data based on
a 28-poem comparison given to 16 human judges. Kurzweil concludes that this domain-specific
Turing test has achieved some level of success in tricking human judges in its poetry-writing
ability (Kurzweil 1990, p. 377).

The test proposed by Kurzweil bears only superficial similarity to the TT. Critical components
of the TT are stripped away without consideration: the test is not interactive, making the
interrogator a judge, and the medium of natural language is replaced by poetry. Kurzweil
employs the association with the TT to suggest that these tests are part of a trajectory toward
completing the TT. Kurzweil states that the era of computer success in a wide range of
domain-specific Turing tests is arriving (Kurzweil 1990, p. 378), and considers success with
the Turing test of believability the first level in a four-level progression toward widespread
acceptance of a computer passing a complete TT (pp. 415-416). As examples of these narrow
versions of the Turing test of believability, Kurzweil offers diagnosing illnesses, composing
music, drawing original pictures, making financial judgments, playing chess (p. 415). Most of
these narrow TTs, such as diagnosing illnesses and making financial judgments, are simply JHTs.
As will be discussed subsequently, Kurzweil applies this progression to music, stating that
music composed by computer is becoming increasingly successful in passing the Turing test of
believability (p. 378). The assumption that success in an MOtT foreshadows success in a TT
ignores the critical differences between music and thought expressed in language.

MDtT and MOtT are often employed as a way of arguing for the success of a generative music
system. An automatic connection between positive musical judgments and system-design success,
however, should be questioned. Specific system outputs may not be representative of the system,
and the critic is in a weak position to judge what is and is not representative. A generative
system may be badly designed, difficult to use, unreliable, or incapable of variety, yet
produce aesthetically successful outputs. Just as the aesthetic output of a human may indicate
little, if anything, about the author, the aesthetic output of a system indicates nothing with
certainty about the system's design. As Pearce, Meredith, and Wiggins (2002, p. 129) state,
"Evaluating the music produced by the system reveals little about its utility as a
compositional tool."

That many systems have already passed MDtT or MOtT, and that systems developed hundreds of
years ago might fare just as well, further questions what success such tests grant. As
discussed subsequently, all documented musical TTs report a win for the machine. Even simple
Markov-based systems, within constrained evaluative contexts, have produced output
indistinguishable from human output (Hall and Smith 1996). An 18th-century generative music
system, such as a dice-based music assembly system (Gardner 1974; Hedges 1978), could have
fooled an average human a suitable percentage of the time. The continued execution of such
tests may do more to investigate the limits of musical judgments than the innovations of
generative music systems.

Finally, there is no necessary connection between humanism, intelligence, or thinking and
aesthetic success. Thus the TT, designed to discern thinking, is not automatically equipped to
discern aesthetic success. Although Wiggins and Smaill state that music is undeniably an
intelligent activity, they admit that some kinds of musical activities seem to be an almost
subconscious and involuntary response (Wiggins and Smaill 2000, p. 32). Music is not
necessarily an intelligent activity, and it is certainly not a reliable test of intelligence.
Music may be perceived as an intelligent activity, even when its genesis is the result of
involuntary, irrational, or algorithmic activities. Whereas music may be a very human activity,
the application of the TT to music ignores that seemingly successful music can come from
sources with little or no thought. Hofstadter, for example, describes his profound sense of
bewilderment or alarm (Hofstadter 2001, p. 39) upon recognizing that convincing music results
from mechanisms thousands if not millions of times simpler than the intricate biological
machinery that gives rise to a human soul (p. 79). Soul, mind, thought, intelligence, and
creativity are common though weak determinants of aesthetically successful music. Assuming the
necessity of any of these determinants when encountering the output of a generative music
system often contributes to a musical Eliza Effect.

The problem of musical TTs is part of a larger problem of TT derivatives. Dennett describes how
a failure to think imaginatively about the TT has led many to underestimate its severity and to
confuse it with much less interesting proposals; furthermore, there is a common misapplication
of the sort of testing exhibited by the Turing test that often leads to drastic overestimation
of the powers of actually existing computer systems (Dennett 1998, p. 5). Musical TTs are a
misapplication of the TT that can lead to overestimation.

Discrimination Tests

Use of the TT, even if by name and rough analogy alone, has significant implications. The
history of the TT and its association with projects in AI make it a powerful concept in both
the academic and popular imaginations. Alternative blind comparison tests not associated with
the TT make very different implicit claims than those branded as TTs. For these reasons, it is
important to identify discrimination tests (DTs) as a type of listener survey that avoids some
of the faults of musical TTs. The DT is similar to the MOtT. Although not free of the problems
of evaluating musical judgments, such tests, when properly constrained, permit generalizing
these judgments between selected groups.

Computational models, as generative music systems with significantly different goals than those
of creative tools, require particular evaluation strategies. Pearce and Wiggins propose the DT,
where the generated music can be evaluated by asking human subjects to distinguish compositions
taken from the data set from those generated by the system (Pearce and Wiggins 2001, p. 25). If
the system-composed pieces cannot be distinguished from the human-composed pieces, we can
conclude that the machine compositions are indistinguishable from human composed pieces
(p. 25).

Although Pearce and Wiggins note that the DT bears a resemblance (Pearce and Wiggins 2001,
p. 25) to the TT, they are careful to note significant differences. The DT is designed not to
test machine-thinking, but to determine the (non-)membership of a machine composition in a set
of human composed pieces of music (p. 25). Further, they note that the critical element of
interaction is removed: "in our test the subjects are simply passive listeners: there is no
interaction with the machine" (p. 25). Pearce and Wiggins argue that both the TT and DT are
behavioral tests: the tests are used to decide whether a behaviour may be included in a set
. . . the set of intelligent behaviours in the case of the TT and the set of musical pieces in
a particular style in the case of the DT (p. 25).

In a specific case, Pearce and Wiggins use a collection of musical examples to train a
genetic-algorithm-based system. The DT is then used to determine if the output of the system is
distinguishable from the same training examples (Pearce and Wiggins 2001, p. 25). The authors
claim that the final machine compositions are evaluated objectively within a closed system
which provides no place for subjective evaluation or aesthetic merit (p. 25). Evidence is not
provided to affirm that human listeners, whether experts or novices, can objectively evaluate
musical similarity. Furthermore, while such a test attempts to remove aesthetics from musical
judgments, the authors go on to claim that success in the DT shows that there are absolutely no
perceivable features that differentiate the human and machine compositions, and that these
features may include such elusive notions as aesthetic quality or perceivable creativity
(p. 25). The authors thus
claim that the aesthetic quality of the compositions is indistinguishable. Further subverting
their initial claim of an objective DT, the authors note that perceived creativity is likely to
be closely related to . . . perceived aesthetic value and that this association may have been
considered by DT subjects (Pearce and Wiggins 2001, p. 30). Even within the closed system of
the DT, aesthetic values are difficult to remove from musical judgments.

Another example of a DT is provided by Hall and Smith (1996). Here, the authors employ various
orders of Markov chains to generate melodies over a fixed harmonic background in the style of
blues. Transition weightings are derived from computer analysis of bodies of printed music
(Hall and Smith 1996, p. 1163), although full analysis data is not provided. The authors note
that this is a very primitive idea as to what constitutes a musical phrase in blues (p. 1165).

To evaluate their output, the authors subjected 198 people to listening tests. Each participant
is given ten pairs of tunes; for each pair, one is machine-generated and one is
human-generated. Hall and Smith make no reference to Turing, calling this procedure a
"listening test" (Hall and Smith 1996, p. 1165). The authors suggest that these tests, owing to
the binary result of each question, can be viewed as a series of Bernoulli trials (p. 1166).
The authors claim that if the model successfully captures the structure of blues melodies, then
listeners should have trouble distinguishing between human composed tunes and computer tunes
(p. 1166). The results show that people are unable to reliably distinguish between blues tunes
composed by humans and those composed by the computer model described (p. 1166). Although Hall
and Smith designed their test to avoid subjective issues . . . such as quality, the
participants were explicitly asked to distinguish human from computer (p. 1167). As suggested
previously, expectations of human and machine performance may influence musical judgments.

In 1992 and 1993, Dahlig and Schaffrath, conducting a DT, found that aesthetic preferences led
to incorrect assumptions of human authorship (Dahlig and Schaffrath 1997, p. 211). The study,
called Kompost, employed two-phrase folk melodies, some of which were original while others
were constructed by artificially recombining fragments of existing melodies. Dahlig and
Schaffrath collected responses from individuals with diverse genders, national origins, and
musical backgrounds. Response forms recorded, for each melody, both assumed origin (human or
computer) and a rank of personal preference. They observed that, uncorrelated to actual origin,
human attribution was most often given to melodies with common stylistic features, such as
parallel rhythmic structures between phrases, melodies ending on the first scale degree, and
melodies starting the second phrase on a note other than the first scale degree (Dahlig and
Schaffrath 1997, p. 216). Thus, unusual or disagreeable human compositions were attributed to
machines.

More recently, Collins (2008) employs spot the difference listener surveys as a means of
evaluating the output of Infno, a generative music system specialized for synth-pop and
electronic dance music. Collins is careful to note that subjective biases are hard to remove:
any choice of testing procedure can be problematised for the subjective and social domain of
music. Furthermore, the tests, as much as anything else, revealed much about the individual
subjectivities of the participants (Collins 2008). This DT is employed to gather feedback as
guidance for future iterations of the software, with particular attention given to the
variation in output allowed by the generative musical artifact. Similarly to the results of
Hall and Smith, no statistically significant results . . . were found to distinguish perception
of human and computer generation.

DTs often rely on musical judgments of musical style: listeners are asked to distinguish works
not by quality, but by stylistic conformity. While some specific attributes of broad musical
styles, genre, or idioms may appear with regularity, apprehension of style is an interpretive
musical judgment informed by the consensus of critics. DTs that depend on genre classifications
are in a weak position to make definitive claims about the evaluated music or the generative
music system.

John Zorn, as well as many other musicians, rejects such classifications: they are used to
commodify and commercialize an artist's complex personal vision . . .. [T]his terminology is
not about
understanding . . . it's about money (Zorn 2000, p. v). Pachet and Cazaly (2000), as part of a
study of music taxonomies, support this view, stating the most important producers of music
taxonomies are probably music retailers. In a study of large online music taxonomies, these
authors illustrate that there is little if any consensus on the terms, structures, or meanings
deployed.

As shown in Aucouturier and Pachet (2003), recent efforts to automatically sort music into
discrete style classes based on signal or symbolic representations have demonstrated limited
success; success is often a direct result of extremely narrow conceptions of genre (p. 92). The
study of Soltau et al. (1998), after demonstrating an Explicit Time Modeling with Neural
Networks (ETM-NN) system to classify music within four genres, shows that for some genres
humans are just as ill-equipped as their system to classify music: human confusions in this
experiment are similar to confusions of the ETM-NN system. Aucouturier and Pachet erroneously
call this listener survey a Turing Test (Aucouturier and Pachet 2003, p. 88). Unable to
rigorously define genre, such approaches implement systems to match the consensus
interpretations of critics. Aucouturier and Pachet, supporting this view, state that music
genre is an ill-defined notion, that is not founded on any intrinsic property of the music, but
rather depends on cultural extrinsic habits (p. 84). DTs often ignore the real diversity and
elusive nature of these extrinsic habits. The danger of testing circular, ungrounded
projections (p. 83) is great.

As an alternative, Aucouturier and Pachet form genre-like clusters based on extrinsic
similarity: specifically, they perform a co-occurrence analysis based on cultural similarity
from text documents such as radio playlists and track listings of compilation CDs. This
approach relies on the documented interpretations of critics (e.g., DJs and editors):
similarity is asserted if two works are placed together. It is significant that such a measure
of similarity cannot be applied to the newly created output of a generative music system. As
such works lack any cultural texts or criticism, listener surveys or measures of intrinsic
similarity may be the only means of comparison.

Proposed and Executed Musical Turing Tests

Despite the problems described herein, musical TTs have been discussed and executed as a means
of evaluating the success of various generative music systems. The following examples, as a
collection of small case studies, illustrate diverse applications of music to TTs. Although
some DTs report the failure of machines to produce indistinguishable results (Pearce and
Wiggins 2001), every executed musical TT reports machine success.

After summarizing the TT, Alan Marsden suggests that a musical version of this test could be
proposed (Marsden 2000, p. 22). Marsden describes an MOtT in which there are two rooms, each
with a composer and a means of distributing music to the outside world. One of the composers is
a machine. The test is passed when observers cannot distinguish which composer is a computer.
Marsden offers this example to state that, although a computer might pass this test in
practice, the computer could never pass the test in principle (p. 23). The reason for this,
Marsden explains, is that originality is an essential characteristic of music . . . computers
are digital automata, and so their behavior is always, in principle at least, predictable and
therefore cannot be original (p. 23). Marsden does not question the validity of the MOtT,
suggesting that it might operate in both practice and in principle. Further, Marsden suggests
that deterministic systems are incapable of originality. Yet humans, while lacking proof of
free will, may be both deterministic and creative. Marsden's principled MOtT, in this context,
is better seen as an LT, as a test of creativity. The standard of creativity set forth by
Bringsjord, Bello, and Ferrucci (2001), however, is more successful than Marsden's problematic
criterion of non-deterministic originality.

Curtis Roads (1984), in a section on the Turing Test for Musical Intelligence, suggests using
an MDtT to measure the effectiveness of software-based music representations (p. 33). Although
conceding that there is no universal criterion for determining an optimal music representation,
he offers criteria to determine system value such as the usefulness in practice, the limits of
structures available for representation, and what kinds of
musical tasks are easy to perform with it, and which are difficult.

Roads, referencing a personal correspondence on the subject of musical TTs, suggests that a
similar test could offer a validation of a computer-generated music theory and the
representations behind it (Roads 1984, p. 33). He imagines a test in which we can ask the
machine questions and have it perform tasks that a human could [perform] after hearing a piece,
e.g., sing the melody, analyze the form, trace its influences, compose something in the same
style, etc. If the system succeeds, then the representation is clearly effective. The last
task, composing something in the same style, approximates an MDtT.

Gareth Loy, considering the application of connectionism to generative music systems,
speculates that a musical Turing test might be easier for a computer to pass someday, since
music is a very abstract artistic medium (Loy 1991, p. 370). Presumably imagining an MOtT, Loy
notes that passing such a test would prove no more than that a reasonable facsimile of human
musical functioning could be constructed using computational means (p. 370). This is a fair
assessment. The question for Loy is whether human musical cognition can be represented
computationally at all (p. 370). Loy, like Marsden, does not directly question the validity of
the MOtT, but instead questions if musical cognition can be represented in a computational
machine. Loy recognizes that an MOtT can be passed by a system without musical thought or
cognition.

David Soldier, defining naughtmusik as the set of nonart sounds and genuine music as excluding
naughtmusik and including Artists with Intent, proposes a modified MOtT (Soldier 2002, p. 53).
Here, naughtmusik might represent machine output. Soldier describes a test where one could play
recordings of genuine music and naughtmusik . . . if the human judges detect the fakery [of
naughtmusik], the strong definition of genuine music can be confidently adopted. This
definition of genuine music requires that humans can distinguish nonart sounds from sounds
produced by artists with intent. Testing music made by untrained children versus professional
musicians, Soldier reports that results favoring genuine music produced by professionals were
too low to reach a statistically significant level (p. 54). In other words, participants in a
listener survey could not distinguish genuine from fake music. Soldier goes on to argue, using
examples of music produced by hypothetical zombies and the Thai Elephant Orchestra, that a
definition of genuine music cannot require that the musical agents possess musical knowledge or
intent (p. 55). Soldier illustrates that the presence of aesthetic intention or musical
thought, either in elephants or in something like a computer, cannot be identified in DTs or
MOtTs and cannot be used to distinguish music from nonart sounds.

In a letter to the Computer Music Journal, Erik Belgum describes an unaltered borrowing of the
original Turing test to investigate musical intelligence (Belgum et al. 1988, p. 7) where a
musician, in a type of MDtT, alternatively jams with a human and machine. While stating that
all the same doubts can be expressed about the musical Turing test as have been expressed about
the original Turing test, Belgum does not question the legitimacy of his modified test, instead
suggesting that this MDtT seems to keep the basic spirit of the test. As shown previously, the
TT employs natural language discourse to represent the presence of thought; its spirit is not
preserved in either the MOtT or the MDtT. Written responses to this letter, provided by Joel
Chadabe, Emile Tobenfeld, and Laurie Spiegel, do not question the fundamental legitimacy of the
proposed MDtT, but they instead offer alternative definitions of musical intelligence or
different types of tests. Laurie Spiegel, summarizing the inadequacy of aesthetic tests in
general, asks, "what purpose would be satisfied by creating qualitative criteria or
quantitative metrics for artificial musical intelligence, given the lack of successful similar
criteria for natural musical intelligence, musicality, or even music per se?" (Belgum et al.
1988, p. 9). The lack of these criteria and metrics is an important constraint on many musical
judgments.

Some have moved beyond thought experiments to actual tests. In 2002, Sam Verhaert, as part of a
radio broadcast, organized a Music Turing Test at the Sony Computer Science Laboratory in
Paris (2002). The goal of this MOtT was to answer the question, "[C]an we make the distinction between music played by a human and music played by a machine?" Two interrogators (Henkjan Honing and Koen Schouten) were presented with musical fragments, some performed by Albert van Veenendaal, others performed by the Continuator software system developed by François Pachet (2002). Although test data is not provided, the author claims the result was largely in favor of the Continuator. Receiving more positive musical judgments than the human, the software system is deemed successful.

Hiraga et al. (2002) have proposed and executed a series of music performance rendering tests in which human performances of a fixed work are compared with computer-rendered performances of the same work. Executed as part of a project called RENCON, these tests date from 2002 and have continued at various workshops and conferences since (Hiraga et al. 2004, p. 121). Whereas the authors at times properly describe these tests as listening comparisons, they go on to describe these as a Turing test for musical performance and competitions given in Turing Test style (p. 123). The authors claim that this test determines by listening whether system-rendered performance is distinguishable from human performance (p. 123); in addition to selecting their preferred performance, participants are asked to rate performances by humanlikeness (p. 123). As machine-rendered performances have been selected over human-rendered performances, the authors state that more than a few people agree that some performance rendering systems generate music that rivals human performances (p. 123). While listening comparisons or DTs may offer a valuable method of evaluating performance rendering, the association with Turing is incorrect and unnecessary.

Music generated with David Cope's EMI system (1991, 1992, 1996, 2001) has been the subject of many presentations of MOtTs. Only in passing does Cope associate these tests with Turing. In Cope (1996), he describes conducting comparison tests to gauge the significance of compositional style signatures in evaluating style membership. In his first test with students, he presents phrases of Mozart, phrases of machine-composed music in the style of Mozart with signatures, and machine-composed music in the style of Mozart without signatures. Although noting that this study falls outside the framework of scientifically valid research (1996, p. 82), he suggests that his results show that the use of signatures contributes to style recognition.

Cope (1996) removes signatures as a variable in these tests; in this case, he compares the music of Mozart with machine-composed music in the style of Mozart. As part of a section of the 1992 Association for the Advancement of Artificial Intelligence (AAAI) conference entitled "Artificial Intelligence and the Arts," a larger MOtT was conducted. Cope reports that nearly 2000 visitors, over three days, took part in a test that pitted machine-composed examples with signatures in the style of Mozart against actual Mozart (p. 82). While again Cope states the test has absolutely no scientific value, the results are summarized by Cope as indicating that the audience was unable to distinguish between machine-composed Mozart and the real thing, that the machine-composed music has some stylistic validity, and for the layperson at least, real Mozart is hard to distinguish from artificial Mozart (p. 82). While neither Cope nor the conference publications call the 1992 AAAI MOtT a TT, Cope (2000, p. 65) claims that Alice, a system closely related to EMI, may succeed in occasionally passing the spirit if not the letter of Turing's test (2000, p. 65). As argued previously, the spirit of the TT is not maintained in MOtTs.

Cope (2001) titles a similar test "The Game," presenting the reader notated and recorded musical examples of computer-generated and human-composed music. Here, he refers to computer-generated works as "virtual music." Cope notes that mixing weak human-composed music with strong virtual music would simply fool listeners. His objective is to determine whether listeners can truly tell the difference between the two types of music (p. 20). If players can only distinguish human from machine about 50% of the time, it is assumed that the music examples are indistinguishable. Cope relates "The Game" to the 1992 AAAI MOtT, stating that results from previous tests with large groups of listeners, such as 5000 [sic] in one test in
1992 . . . typically average between 40 and 60 percent correct responses (Cope 2001, p. 21).

Others have directly called these same EMI demonstrations TTs. At a later AAAI Conference, in 1998, the program chairs describe a musical Turing test between the works of Bach and compositions by an AI program conducted by Cope (Mostow and Rich 1998). Similarly, Bernard Greenberg characterizes fooling the audience with EMI as a vernacular paraphrase of the Turing test, the classic admissions exam for all putative AI endeavors (Greenberg 2001, p. 222). Kurzweil describes a musical Turing test administered by Steve Larson where audience members, sampling three compositions, attempt to distinguish EMI-generated music in the style of Bach from real Bach (Kurzweil 1999, p. 160). Lastly, Jonathan Bedworth and James Norwood, while noting that artists are not immune to technology's implicit, and possibly inappropriate, categorisations and metaphors, describe EMI's output as one of many examples that show continued adherence to Turing Test-like evaluations of machine capability (Bedworth and Norwood 1999, p. 193). Bedworth and Norwood's incorrect assumption, that evaluation of EMI output relates to the TT, is exactly the kind of inappropriate categorization and metaphor they warn against.

Although anecdotal, Patricio da Silva, in a study of Cope's music, describes another MOtT. An appendix, consisting of emails from a music theory discussion list authored in 2002, discusses the possibility of a musical TT. Contributors suggest that the output of Cope's EMI system could pass a musical Turing test. A contributor describes a presentation in which a MOtT was informally conducted between performances of a Chopin prelude and an EMI-composed prelude in the style of Chopin (Silva 2003, p. 66). The audience, reportedly, was unable to identify the original Chopin. A contributor to this list notes that while the medium of the test (sheet music) deviates from that of the original TT, the MOtT preserves the spirit of the Turing test (Silva 2003, p. 68). This statement, like that of Belgum et al. (1988) and Cope (2000, p. 65), overstates the superficial similarity of the MOtT and the TT.

Electronic Telegraph and the BBC1 Tomorrow's World program sponsored a series of massive TTs, similar to the Loebner competition but employing thousands of interrogators. These tests were conducted as part of Megalab, a series of experiments in Britain dating from 1993 and designed to provide science week with a media profile by employing experiments that could inspire large numbers of people to take part, were easy to carry out, and would help to push back the frontiers of science (Anonymous 1998). The 1998 and 1999 Megalab featured Loebner competition style TTs. During the 2000 successor, Live Lab (Anonymous 2000), the classic Turing test . . . had been supplemented by two new tests which required participants to distinguish between examples of human and computer generated painting and music (Bloomfield and Vurdubakis 2003, p. 31). The computer-generated art was provided by Harold Cohen's generative illustration system AARON; the computer-generated music was provided by EMI (Bloomfield and Vurdubakis 2003). In the case of the computer-generated art, Bloomfield and Vurdubakis state that the resulting success was easily explained by the fact that it was the only representational picture among abstract human ones (p. 35); this explanation suggests that expectations of machine aptitude influence the results of such tests. Bloomfield and Vurdubakis report that the computer-generated music decisively won in its category with 42% of the vote (p. 33).

Musical TTs, in various other arrangements, have been and will likely continue to be suggested in academic and popular literature (e.g., Fowler 1994, p. 9; Triviño-Rodriguez and Morales-Bueno 2001, p. 73; Wassermann et al. 2003, p. 89; Midgette 2005; Lamb 2006). As Dennett states, cheapened versions of the Turing test are everywhere in the air (Dennett 1998, p. 20). It is clear that the execution of such tests, under the heading of Turing, is bound by many faults. Furthermore, such tests are not serious investigations into the design of the generative system, the aesthetic value of the output, or the presence of intelligence, musical or otherwise. In the tradition of the JHT, and as shown by Live Lab, musical TTs may function best as entertainment.
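
The chance baseline invoked in these discrimination tests (about 50% correct, or averages between 40 and 60 percent) can be made concrete. The following is a minimal sketch, not drawn from any of the cited studies, of an exact two-sided binomial test that asks whether a group's number of correct human-versus-machine identifications differs from what guessing would produce. The function names and the listener counts are illustrative assumptions.

```python
from math import exp, lgamma, log


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k correct answers in n trials at success rate p."""
    # Work in log space so large listener counts do not overflow.
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def chance_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided p-value: how surprising is `correct` under pure guessing?"""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    if not 0.0 < chance < 1.0:
        raise ValueError("chance must lie strictly between 0 and 1")
    observed = binomial_pmf(correct, trials, chance)
    # Sum every outcome that is no more likely than the observed one.
    # The small tolerance keeps symmetric outcomes together despite rounding.
    total = sum(
        prob
        for prob in (binomial_pmf(k, trials, chance) for k in range(trials + 1))
        if prob <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical listener counts, not data from any cited study.
    for correct, trials in [(50, 100), (60, 100), (1200, 2000)]:
        p = chance_p_value(correct, trials)
        print(f"{correct}/{trials} correct: p = {p:.4g}")
```

Under these assumptions, the same proportion of correct answers can be compatible with guessing in a small group and clearly different from guessing in a large one, so a reported percentage is hard to interpret without the number of responses. The sketch addresses only this statistical point and says nothing about the aesthetic questions raised in the text.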

Machine Authorship and the Problem of Aesthetic Intention

In the context of musical TTs, an objection can be raised that the computer-generated material is not generated in whole by the computer. The computer can be seen not as an autonomous author but as a system that executes or reconfigures knowledge imparted to it by its programmers. This is part of the problem of machine creativity suggested by the LT (Bringsjord, Bello, and Ferrucci 2001).

Fundamental questions of authorship are important when comparing the aesthetic output of machines and humans. For the MOtT or MDtT to actually test the machine system, the musical outputs provided by the agents must be created by the agents themselves. Just as the human agent presumably cannot plagiarize an output, the machine agent cannot simply return a stored work previously created by a human. Such a test would have little value. Although Turing specifically condoned deception in the TT, such deception is problematic in the context of testing aesthetic artifacts. This problem further removes the MOtT and the MDtT from the TT.

Pure machine authorship is impossible to imagine without an autonomy sufficient to pass the LT. In a manner similar to that of Wolfgang von Kempelen's famous chess-playing machine (often called The Turk), a purported automaton that convinced countless observers in the eighteenth and nineteenth centuries of the possibility of machine autonomy (Sussman 1999; Standage 2002), there may always be a human, or at least significant human knowledge, hiding inside the creative machine. Halpern supports this view, noting that "machine intelligence is really in the past: when a machine does something intelligent, it is because some extraordinarily brilliant person or persons, sometime in the past, found a way to preserve some fragment of intelligent action in the form of an artifact" (Halpern 2006, p. 54). Such a perspective is applicable to many less intelligent but musically useful generative music systems.

Cope, for example, states that when using his Alice system he sees no reason to even assign partial credit to the program: Alice processes a composer's database in specific and known ways, acting only as a specialized calculator and assistant (Cope 2000, p. 252). Describing the EMI system, Cope notes that the hand of the composer is not absent from the finished product of computer-assisted composition (Cope 1991, p. 236), and that all works produced with EMI are attributed to David Cope with Experiments in Musical Intelligence (Cope 2001, p. 340). The MOtTs of EMI output described herein, therefore, tested not EMI, but Cope with EMI.

A case might be imagined where a machine is somehow completely responsible for an aesthetic artifact. Such machine authorship would presumably require what Cohen describes as autonomy. Describing his generative illustration system AARON, Cohen (2002) suggests that, with such autonomy, the system, not him, would be the author: "if AARON ever does achieve the kind of autonomy I want it to have, it will go on to eternity producing original AARONs, not original Harold Cohens" (2002, p. 64). Cohen's views are similar to those of Hofstadter, described previously. The idea of machines with autonomy, intentions, or initiative is sometimes associated with more exotic things such as Putnam Gold Machines (Kugel 2002, p. 565) or Turing's Oracles (Turing 1939). Bringsjord, Bello, and Ferrucci, however, argue that Oracles cannot pass the LT, and thus do not offer autonomy (Bringsjord, Bello, and Ferrucci 2001, pp. 20–24). As neither AARON nor any known contemporary system has reached such a level of autonomy, generative works will likely continue to be seen as human works. If the role of the system exceeds that of a conventional tool, these works might be seen more as human-machine collaborations; collaboration, as used here, does not require machine autonomy. Contemporary MDtTs and MOtTs are not machine-versus-human competitions: they are competitions among humans using different tools and collaborators.

Machines, although presently lacking autonomy and intention, can produce output that appears intentional. As described previously, Soldier (2002) argues that aesthetic intention (Carroll 1999, p. 163) is not a criterion of creating, and thus authoring, art. Soldier demonstrates that artists without intent can create works that sound intentional; similarly,
others have noted the limits of evaluating a work in the context of the author's intentions. Wimsatt and Beardsley (1946) call this the intentional fallacy: "the design or intention of the author is neither available nor desirable as a standard for judging the success of a work" (Wimsatt 1954, p. 3). Similarly, Robert Zimmerman dismisses aesthetic intention as a criterion of art due to the necessity of psychological introspection: "the decision whether an entity is an aesthetic object would depend upon the results of a lie-detector test given to creator and audience" (Zimmerman 1966, p. 182).

Some have, nonetheless, assumed that intention distinguishes human art-making. Ian Cross, with a model loosely related to the CRA, imagines two computers in a box creating and apprehending music. Cross states that what is happening within the box has no element of human participation, intervention or experience (Cross 1993, p. 167) and thus is not music. Cross ignores that humans participated in this hypothetical event by creating the box and the systems within the box. This relates to an objection to Searle's CRA: "the performance of the man who understands no Chinese is only as good as those who understood Chinese well enough to create the lexicon in the first place, and thus create the illusion of comprehension in the Chinese Room" (Halpern 2006, p. 54).

Based on his model, Cross claims that intention and consensus are necessary preconditions for the human experience of music (Cross 1993, p. 170). For music to happen, there must be intent: intent to produce music, or (less obviously) intent to hear music (p. 167). The requirement of intentional hearing is discussed subsequently; the requirement of intentional production, however, can only be argued by employing a highly constrained definition of music, by committing the intentional fallacy, or by ignoring demonstrations such as those provided by Soldier (2002).

Roger Scruton states that music is the intentional object of an experience that only rational beings can have, and only through the exercise of imagination (Scruton 1999, p. 96). It is important to note that the intention described here is in the experience of music, in imaginative hearing. This relates to what Cross describes as the intent to hear music (Cross 1993, p. 167). Perhaps something like a musical Eliza Effect enables humans to treat aesthetic artifacts from machines as intentional objects; perhaps, in contrast to the CRA, musical syntax alone can in some cases suffice for musical semantics.

Language Is the Medium of the Turing Test

The medium of the TT cannot be altered. If researchers desire evidence that a generative music system is intelligent or capable of thought (musical or otherwise), the system must simply pass the TT. The interrogator may ask human and machine agents, through a text-based medium, about original pieces of music created by each agent. Problems of originality, creativity, intention, and authorship are irrelevant, and deception is again permitted. Through conversation, through natural language, if the interrogator cannot distinguish the machine, the machine has passed. A true musical TT is simply a TT.

In the context of a comparison test, natural language discourse is superior to an abstract aesthetic medium. An interrogator can ask questions about a piece of prose: a statement can be rephrased, explained, or put into other words. Even attempts at deception can be disputed or argued. An interrogator cannot ask the same questions of an aesthetic work; although an artist, if available, might offer context or explanation, such explanation is not required for aesthetic appreciation. The piece exists independently, and generally cannot be rephrased or reformulated to aid understanding. The interrogator can only be a critic of such a medium.

Turing demonstrates using language, within a TT, to discuss other mediums. Turing does not limit the topics of conversation in his test, stating that anything can be claimed, but practical demonstration cannot be demanded (Turing 1950, p. 435). Musical TTs clearly violate this prohibition of practical demonstration. Turing, however, provides examples of an interrogator asking for practical demonstrations. As a sample question and response, Turing offers the following: "I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?" (Turing 1950,
p. 435). While the computer answers the end-game with a mate, the computer could just as well discuss the history of the game, or state that it does not play chess. When asked to write a sonnet, the same agent declines: "Count me out on this one. I never could write poetry" (p. 434).

The computer agent, asked to play chess, could alternatively be mischievous and play a non-winning move. As Halpern states, the Turing end-game example introduces an assumption that cannot automatically be allowed: namely, that the computer plays to win (Halpern 2006, p. 46). The MOtT and MDtT imply that aesthetic success, a win, indicates system design success. However, as made clear herein, Turing's model does not require the computer to play to win: self sabotage or simple mischief is acceptable if explained in rational discourse. MOtTs and MDtTs, if related to the TT, should allow for new aesthetic concepts and non-winning aesthetic moves: as Boden states, even if a computer's notion of art is irrelevant to us humans, these notions might broaden our aesthetic horizons (Boden 1996).

Conclusion

As Dennett states of restricted text-based TTs, "we should resist all limitations and waterings-down of the Turing test . . . they make the game too easy . . . they lead us into the risk of overestimating the actual comprehension of the system being tested" (Dennett 1998, p. 11). The MDtT and MOtT are too easy. Music, as a medium remote from natural language, is a poor vessel for Turing's Imitation Game. Generative music systems gain nothing from associating their output with the TT; worse, overestimation may devalue the real creativity in the design and interface of these systems.

Iannis Xenakis, considering the history of computer-aided algorithmic composition systems, asked: "What is the musical quality of these attempts?" He answers bluntly: "The results from the point of view of aesthetics are meager . . . hope of an extraordinary aesthetic success based on extraordinary technology is a cruel deceit" (Xenakis 1985, p. 175). Xenakis here distinguishes the system (the technology) from its aesthetic artifacts. This distinction suggests that aesthetic success or failure is dependent on humans and independent of any technology. Until machines achieve autonomy, it is likely that humans will continue to form, shape, and manipulate machine output to satisfy their own aesthetic demands, taking personal, human responsibility for machine output.

Simon Holland, after Peña and Parshall (1987) and Cook (1994), describes open-ended domains such as music composition as problem seeking rather than problem solving: there are in general no clear goals, no criteria for testing correct answers, and no comprehensive set of well-defined methods (Holland 2000, p. 240). If used as creative tools, generative music systems, as systems within problem-seeking domains, likewise have no criteria for testing correct answers. In the development and presentation of these systems, comparative analysis of system and interface design, or studies of user interaction and experiences, offer greater potential for the development of practical tools.

Computational models with clearly articulated goals may continue to pass DTs; properly constrained, such tests may show that musical judgments cannot discern sets of musical artifacts produced by humans and machines. While this may demonstrate technological innovation in the modeling of historical musical artifacts, such technologies may also offer aesthetic innovation if redeployed as creative systems. In this use-case, the clear goals and testing criteria evaporate. Within the practical use-case of creative music-making, any system becomes a problem seeking domain.

Acknowledgments

I am grateful for the commentary this article has received over the many stages of its development. Thanks to Elizabeth Hoffman and Paul Berg for discussing some of the initial ideas presented here. Thanks to Nick Collins for research assistance and comments on important themes. Thanks to the anonymous reviewers and the editors for valuable suggestions.

References

Anonymous. 1998. A Mega Success, Thanks to You. Telegraph 26 March.
Anonymous. 2000. Are You Smarter Than a Robot. Telegraph 9 March.
Ariza, C. 2005. Navigating the Landscape of Computer-Aided Algorithmic Composition Systems: A Definition, Seven Descriptors, and a Lexicon of Systems and Research. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 765–772.
Aucouturier, J., and F. Pachet. 2003. Representing Musical Genre: A State of the Art. Journal of New Music Research 32(1):83–93.
Bedworth, J., and J. Norwood. 1999. The Turing Test is Dead. Proceedings of the 3rd Conference on Creativity & Cognition. New York: Association for Computing Machinery, pp. 193–194.
Belgum, E., et al. 1988. A Turing Test for Musical Intelligence? Computer Music Journal 12(4):7–9.
Block, N. 1981. Psychologism and Behaviorism. Philosophical Review 90(1):5–43.
Bloomfield, B. P., and T. Vurdubakis. 2003. Imitation Games: Turing, Menard, Van Meegeren. Ethics and Information Technology 5(1):27–38.
Boden, M. 1990. The Creative Mind: Myths and Mechanisms. New York: Routledge.
Boden, M. A. 1996. Artificial Genius. Discover 17:104–107.
Bringsjord, S., P. Bello, and D. Ferrucci. 2001. Creativity, the Turing Test, and the (Better) Lovelace Test. Minds and Machines 11:3–27.
Bringsjord, S., and D. Ferrucci. 2000. Artificial Intelligence and Literary Creativity: Inside the Mind of BRUTUS, a Storytelling Machine. Mahwah, New Jersey: Lawrence Erlbaum.
Bulhak, A. C. 1996. The Dada Engine. Available online at dev.null.org/dadaengine/manual-1.0/dada toc.html.
Carroll, N. 1999. Philosophy of Art: A Contemporary Introduction. New York: Routledge.
Cavell, S. 2002. Must We Mean What We Say?: A Book of Essays. Cambridge, UK: Cambridge University Press.
Cohen, H. 2002. A Self-Defining Game for One Player: On the Nature of Creativity and the Possibility of Creative Computer Programs. Leonardo 35(1):59–64.
Collins, N. 2006. Towards Autonomous Agents for Live Computer Music: Realtime Machine Listening and Interactive Music Systems. PhD dissertation, University of Cambridge.
Collins, N. 2008. Infno: Generating Synth Pop and Electronic Dance Music On Demand. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association. Available online at www.informatics.sussex.ac.uk/users/nc81/research/infno.pdf.
Cook, J. 1994. Agent Reflection in an Intelligent Learning Environment Architecture for Composition. In M. Smith, A. Smaill and G. A. Wiggins, eds. Music Education: An Artificial Intelligence Approach. London: Springer Verlag, pp. 3–23.
Cope, D. 1991. Computers and Musical Style. Oxford: Oxford University Press.
Cope, D. 1992. Computer Modeling of Musical Intelligence in EMI. Computer Music Journal 16(2):69–83.
Cope, D. 1996. Experiments in Musical Intelligence. Madison, Wisconsin: A-R Editions.
Cope, D. 2000. The Algorithmic Composer. Madison, Wisconsin: A-R Editions.
Cope, D. 2001. Virtual Music: Computer Synthesis of Musical Style. Cambridge, Massachusetts: MIT Press.
Cope, D. 2005. Computer Models of Musical Creativity. Cambridge, Massachusetts: MIT Press.
Copeland, B. J. 2000. The Turing Test. Minds and Machines 10:519–539.
Crockett, L. J. 1994. The Turing Test and the Frame Problem: AI's Mistaken Understanding of Intelligence. Bristol, UK: Intellect.
Cross, I. 1993. The Chinese Music Box. Interface 22:165–172.
Dahlig, E., and H. Schaffrath. 1997. Judgments of Human and Machine Authorship in Real and Artificial Folksongs. Computing in Musicology 11:211–219.
Damper, R. I. 2006. The Logic of Searle's Chinese Room Argument. Minds and Machines 16(2):163–183.
Dennett, D. C. 1998. Brainchildren: Essays on Designing Minds. Cambridge, Massachusetts: MIT Press.
Fowler, J. W. 1994. Algorithmic Composition. Computer Music Journal 18(3):8–9.
French, R. 1990. Subcognition and the Limits of the Turing Test. Mind 99(393):53–65.
French, R. M. 2000. The Turing Test: The First 50 Years. Trends in Cognitive Sciences 4(3):115–122.
Gardner, M. 1974. Mathematical Games: The Arts as Combinatorial Mathematics, or, How to Compose Like Mozart with Dice. Scientific American 231(6):132–136.
Genova, J. 1994. Turing's Sexual Guessing Game. Social Epistemology 8:313–326.
Greenberg, B. 2001. Experiments in Musical Intelligence and Bach. In D. Cope, ed. Virtual Music: Computer
Synthesis of Musical Style. Cambridge, Massachusetts: MIT Press, pp. 221–236.
Hall, M., and L. Smith. 1996. A Computer Model of Blues Music and Its Evaluation. Journal of the Acoustical Society of America 100(2):1163–1167.
Halpern, M. 2006. The Trouble with the Turing Test. The New Atlantis 11:42–63.
Harnad, S. 1991. Other Bodies, Other Minds: A Machine Incarnation of an Old Philosophical Problem. Minds and Machines 1:43–54.
Harnad, S. 2000. Minds, Machines and Turing. Journal of Logic, Language and Information 9(4):425–445.
Hauser, L. 1993. Searle's Chinese Box: The Chinese Room Argument and Artificial Intelligence. PhD dissertation, Michigan State University.
Hauser, L. 1997. Searle's Chinese Box: Debunking the Chinese Room Argument. Minds and Machines 7:199–226.
Hauser, L. 2001. Look Who's Moving the Goal Posts Now. Minds and Machines 11:41–51.
Havass, M. 1964. A Simulation of Music Composition. Synthetically Composed Folkmusic. In F. Kiefer, ed. Computational Linguistics. Budapest: Computing Centre of the Hungarian Academy of Sciences 3:107–128.
Hedges, S. A. 1978. Dice Music in the Eighteenth Century. Music and Letters 59(2):180–187.
Hiller, L. 1970. Music Composed with Computers: An Historical Survey. In H. B. Lincoln, ed. The Computer and Music. Ithaca, New York: Cornell University Press, pp. 42–96.
Hiraga, R., et al. 2002. Rencon: Toward a New Evaluation Method for Performance Rendering Systems. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 357–360.
Hiraga, R., et al. 2004. Rencon 2004: Turing Test for Musical Expression. Proceedings of the 2004 Conference on New Interfaces for Musical Expression. New York: Association for Computing Machinery, pp. 120–123.
Hofstadter, D. R. 1979. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Vintage.
Hofstadter, D. R. 1996. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. New York: Basic Books.
Hofstadter, D. R. 2001. Staring Emmy Straight in the Eye--and Doing My Best Not to Flinch. In D. Cope, ed. Virtual Music: Computer Synthesis of Musical Style. Cambridge, Massachusetts: MIT Press, pp. 33–82.
Holland, S. 2000. Artificial Intelligence in Music Education: A Critical Review. In E. R. Miranda, ed. Readings in Music and Artificial Intelligence. Amsterdam: Harwood Academic Publishers, pp. 239–274.
Hsu, F. 2002. Behind Deep Blue: Building the Computer that Defeated the World Chess Champion. Princeton, New Jersey: Princeton University Press.
Jefferson, G. 1949. The Mind of Mechanical Man. British Medical Journal 1:1105–1110.
Kant, I. 1790. Kritik der Urteilskraft [Critique of Judgment]. Berlin: Lagarde and Friederich.
Kugel, P. 2002. Computers Can't Be Intelligent (. . . and Turing Said So). Minds and Machines 12(4):563–579.
Kurzweil, R. 1990. The Age of Intelligent Machines. Cambridge, Massachusetts: MIT Press.
Kurzweil, R. 1999. The Age of Spiritual Machines. New York: Penguin Books.
Kurzweil, R. 2002. A Wager on the Turing Test: Why I Think I Will Win. Available online at www.kurzweilai.net/articles/art0374.html?printable=1.
Kurzweil, R. 2005. The Singularity is Near. New York: Penguin Books.
Lamb, G. M. 2006. Robo-Music Gives Musicians the Jitters. The Christian Science Monitor, 14 December.
Loebner, H. 1994. In Response. Communications of the ACM 37(6):79–82.
Long Bets Foundation. 2002. By 2029 No Computer--or Machine Intelligence--Will Have Passed the Turing Test. Available online at www.longbets.org/1.
Lovelace, A. 1842. Translator's Notes to an Article on Babbage's Analytical Engine. In R. Taylor, ed. Scientific Memoirs: Selected from the Transactions of Foreign Academies of Science and Learned Societies, and from Foreign Journals. London: printed by Richard and John E. Taylor, 3:691–731.
Loy, D. G. 1991. Connectionism and Musiconomy. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 364–374.
Marsden, A. 2000. Music, Intelligence and Artificiality. In E. R. Miranda, ed. Readings in Music and Artificial Intelligence. Amsterdam: Harwood Academic Publishers, pp. 15–28.
Midgette, A. 2005. Play It Again, Vladimir (via Computer). New York Times, 5 June.
Moor, J. H. 1976. An Analysis of the Turing Test. Philosophical Studies 30:249–257.
Moor, J. H. 2001. The Status and Future of the Turing Test. Minds and Machines 11:77–93.
Mostow, J., and C. Rich. 1998. The Fifteenth National Conference on Artificial Intelligence. Available online at www.aaai.org/Conferences/AAAI/aaai98.php.
Naor, M. 1996. Verification of a Human in the Loop or Identification via the Turing Test. Unpublished manuscript.
Nelson, S. R. 2006. Steel Drivin' Man: John Henry: The Untold Story of an American Legend. Oxford: Oxford University Press.
Pachet, F. 2002. The Continuator: Musical Interaction with Style. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 211–218.
Pachet, F., and D. Cazaly. 2000. A Taxonomy of Musical Genres. Actes du congrès RIAO [Recherche d'Information Assistée par Ordinateur] 2000, Content-Based Multimedia Information Access. Paris: Centre des Hautes Études Internationales d'Informatique Documentaire, pp. 1238–1246.
Papadopoulos, G., and G. Wiggins. 1999. AI Methods for Algorithmic Composition: A Survey, a Critical View and Future Prospects. Proceedings of the AISB'99 Symposium on Musical Creativity. Brighton, UK: SSAISB, pp. 110–117.
Pearce, M. T. 2005. The Construction and Evaluation of Statistical Models of Melodic Structure in Music Perception and Composition. PhD dissertation, Department of Computing, City University London.
Pearce, M., D. Meredith, and G. Wiggins. 2002. Motivations and Methodologies for Automation of the Compositional Process. Musicae Scientiae 6(2):119–147.
Pearce, M., and G. Wiggins. 2001. Towards a Framework for the Evaluation of Machine Compositions. Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences. Brighton, UK: SSAISB, pp. 22–32.
Peña, W. M., and S. A. Parshall. 1987. Problem Seeking: An Architectural Programming Primer, 3rd ed. Washington, DC: AIA Press.
Rapaport, W. J. 2000. How to Pass a Turing Test. Journal of Logic, Language, and Information 9:467–490.
Riskin, J. 2003. The Defecating Duck, or, the Ambiguous Origins of Artificial Life. Critical Inquiry 29(4):599–633.
Roads, C. 1984. An Overview of Music Representations. Musical Grammars and Computer Analysis. Firenze: Leo S. Olschki, pp. 7–37.
Saygin, A. P., I. Cicekli, and V. Akman. 2000. Turing Test: 50 Years Later. Minds and Machines 10(4):463–518.
Scruton, R. 1999. The Aesthetics of Music. Oxford: Oxford University Press.
Searle, J. R. 1980. Minds, Brains, and Programs. Behavioral and Brain Sciences 3(3):417–457.
Searle, J. R. 1993. The Failures of Computationalism. Think 2:12–78.
Shieber, S. M. 1993. Lessons from a Restricted Turing Test. Communications of the ACM 37(6):70–78.
Silva, P. 2003. David Cope and Experiments in Musical Intelligence. Los Angeles, California: Spectrum Press.
Soldier, D. 2002. Eine Kleine Naughtmusik: How Nefarious Nonartists Cleverly Imitate Music. Leonardo Music Journal 12:53–58.
Solis, J., et al. 2006. The Waseda Flutist Robot WF-4RII in Comparison with a Professional Flutist. Computer Music Journal 30(4):12–27.
Soltau, H., et al. 1998. Recognition of Music Types. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2:1137–1140.
Sony Computer Science Laboratory. 2002. A Musical Turing Test. Sony Computer Science Laboratory. Available online at www.csl.sony.fr/pachet/Continuator/VPRO/VPRO.htm.
Standage, T. 2002. The Turk. New York: Walker and Company.
Sterrett, S. G. 2000. Turing's Two Tests for Intelligence. Minds and Machines 10:541–559.
Sussman, M. 1999. Performing the Intelligent Machine: Deception and Enchantment in the Life of the Automaton Chess Player. The Drama Review 43(3):81–96.
Terzidis, K. 2006. Algorithmic Architecture. Oxford: Architectural Press.
Triviño-Rodriguez, J. L., and R. Morales-Bueno. 2001. Using Multiattribute Prediction Suffix Graphs to Predict and Generate Music. Computer Music Journal 25(3):62–79.
Turing, A. M. 1939. Systems of Logic Defined by Ordinals. Proceedings of the London Mathematical Society 45:161–228.
Turing, A. M. 1950. Computing Machinery and Intelligence. Mind 59:433–460.
von Ahn, L., et al. 2003. CAPTCHA: Using Hard AI Problems for Security. Advances in Cryptology--Eurocrypt 2003. Santa Barbara, California: International Association for Cryptologic Research, pp. 294–311.
von Ahn, L., M. Blum, and J. Langford. 2004. Telling Humans and Computers Apart Automatically: How Lazy Cryptographers do AI. Communications of the ACM 47(2):56–60.
Wakefield, J. C. 2003. The Chinese Room Argument Reconsidered: Essentialism, Indeterminacy, and Strong AI. Minds and Machines 13:285–319.
Wassermann, K. C., et al. 2003. Live Soundscape Composition Based on Synthetic Emotions. IEEE MultiMedia 10(4):82–90.
Weinberg, G., and S. Driscoll. 2006. Toward Robotic Musicianship. Computer Music Journal 30(4):28–45.
Weizenbaum, J. 1966. ELIZA–A Computer Program For the Study of Natural Language Communication Between Man And Machine. Communications of the ACM 9(1):36–45.
Wiggins, G. A. 2006. A Preliminary Framework for Description, Analysis and Comparison of Creative Systems. Knowledge-Based Systems 19:449–458.
Wiggins, G., and A. Smaill. 2000. Musical Knowledge: What Can Artificial Intelligence Bring to the Musician. In E. R. Miranda, ed. Readings in Music and Artificial Intelligence. Amsterdam: Harwood Academic Publishers, pp. 29–46.
Wimsatt, W. K. 1954. The Verbal Icon: Studies in the Meaning of Poetry. Louisville, Kentucky: University Press of Kentucky.
Wimsatt, W. K., and M. C. Beardsley. 1946. The Intentional Fallacy. Sewanee Review 54:468–488.
Xenakis, I. 1985. Music Composition Treks. In C. Roads, ed. Composers and the Computer. Los Altos, California: William Kaufmann.
Zdenek, S. 2001. Passing Loebner's Turing Test: A Case of Conflicting Discourse Functions. Minds and Machines 11:53–76.
Zimmerman, R. L. 1966. Can Anything Be an Aesthetic Object. The Journal of Aesthetics and Art Criticism 25(2):177–186.
Zorn, J. 2000. Preface. In J. Zorn, ed. Arcana: Musicians on Music. New York: Granary, pp. v–vi.
