The Interrogator as Critic: The Turing Test and the Evaluation of Generative Music Systems

Department of Music
Towson University
8000 York Road
Towson, Maryland 21252 USA
ariza@flexatone.net

Procedural or algorithmic approaches to generating music have been explored in the medium of software for over fifty years. Occasionally, researchers have attempted to evaluate the success of these generative music systems by measuring the perceived quality or style conformity of isolated musical outputs. These tests are often conducted in the form of comparisons between computer-aided output and non-computer-aided output. The model of the Turing Test (TT), Alan Turing's proposed Imitation Game (Turing 1950), has been submitted and employed as a framework for these comparisons. In this context, it is assumed that if machine output sounds like, or is preferred to, human output, the machine has succeeded. The nature of this success is rarely questioned, and is often interpreted as evidence of a successful generative music system. Such listener surveys, within necessary statistical and psychological constraints, may be pooled to gauge common responses to and interpretations of music, yet these surveys are not TTs. This article argues that Turing's well-known proposal cannot be applied to executing and evaluating listener surveys.

Whereas pre-computer generative music systems have been employed for centuries, the idea of testing the output of such systems appears to only have emerged since computer implementation. One of the earliest tests is reported in Hiller (1970, p. 92): describing the research of Havass (1964), Hiller reports that, at a conference in 1964, Havass conducted an experiment to determine if listeners could distinguish computer-generated and traditional melodies. Generative techniques derived from the fields of artificial intelligence (AI; for example, neural nets and various learning algorithms) and artificial life (e.g., genetic algorithms and cellular automata) may be associated with such tests due to explicit reference to biological systems. Yet, since only the output of the system is tested (that is, system and interface design are ignored), any generative technique can be employed. These tests may be associated with the broader historical context of human-versus-machine tests, as demonstrated in the American folk-tale of John Henry versus the steam hammer (Nelson 2006) or the more recent competition of Garry Kasparov versus Deep Blue (Hsu 2002).

Some tests attempt to avoid measures of subjective quality by measuring perceived conformity to known musical artifacts. These musical artifacts are often used to create the music being tested: they are the source of important generative parameters, data, or models. The design goals of a system provide context for these types of tests. Pearce, Meredith, and Wiggins (2002, p. 120) define four motivations for the development of generative music systems: (1) "composer-designed tools for personal use," (2) "tools designed for general compositional use," (3) "theories of a musical style . . . implemented as computer programs," and (4) "cognitive theories of the processes supporting compositional expertise . . . implemented as computer programs." Such motivational distinctions may be irrelevant if the system is used outside of the context of its creation; for this reason, system-use cases, rather than developer motivations, might offer alternative distinctions. The categories proposed by Pearce, Meredith, and Wiggins can be used to generalize about two larger use cases: systems used as creative tools for making original music (motivations 1 and 2, above), and systems that are designed to computationally model theories of musical style or cognition (motivations 3 and 4). These two larger categories will be referred to as creative tools and computational models.

Although design motivation is not included in the seven descriptors of computer-aided algorithmic systems proposed in Ariza (2005), the idiom affinity descriptor is closely related: systems with singular idiom affinities are often computational models.

Computer Music Journal, 33:2, pp. 48-70, Summer 2009. © 2009 Massachusetts Institute of Technology.
Ariza 49

p. 433). His original model, called The Imitation Game, includes a human interrogator who, through a text-based interface designed to remove the aural qualities of speech and the visual appearance of the subject, communicates with two agents. One agent is human; the other, machine. The interrogator must be aware that one of the agents is a machine. If the interrogator, through discourse, cannot successfully distinguish the human from the machine, then the machine, in Turing's view, has achieved thinking. Importantly, Turing does not define thinking or intelligence (Copeland 2000, p. 522), and Turing does not claim that passing this test proves thought or intelligence (Harnad 2000, p. 427). As Steven Harnad states, Turing's goal is an epistemic point, not an ontic one, and heuristic rather than demonstrative (Harnad 2000, p. 427).

Turing based his test on a party game in which the interrogator attempts to distinguish the gender of two concealed human agents. Turing's original description of his test is incomplete, and has led to a variety of interpretations. Some have suggested that there are actually two tests, the Original Imitation Game Test and the Standard Turing Test (Genova 1994; Sterrett 2000). Yet B. Jack Copeland, based on analysis of additional commentary by Turing, states that it seems unlikely that Turing saw himself as describing different tests (Copeland 2000, p. 526).

Turing suggests that multiple tests must be averaged with, presumably, multiple human agents and interrogators. He predicted that by the year 2000, an average interrogator, after five minutes of conversation, will make a correct identification no more than 70 percent of the time (Turing 1950, p. 442). Additional sources suggest that Turing estimated it would take over 100 years before a machine regularly passed the test (Copeland 2000, p. 527). Anticipating complaints to such a test, Turing (1950) responds to at least nine hypothetical objections.

Under the heading of "The Argument from Consciousness," Turing responds to an argument presented by Geoffrey Jefferson a year earlier (1949). Jefferson states that it will not be sufficient for a machine mind to just use words: "It would have to be able to create concepts and to find for itself suitable words in which to express additions to knowledge that it brought about" (Jefferson 1949, p. 1110). Beyond just the ability of creating concepts, Jefferson goes on to suggest that such a machine must have emotions and self-awareness. "Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain, that is, not only write it but know that it had written it" (Jefferson 1949, p. 1110).

Jefferson's quote is used by Turing to anticipate the objection of the other minds problem: the argument that it is impossible to prove that any entity (including humans) has a mind; only individuals can be certain that they have minds. As Harnad states, the only way to know for sure that a system has a mind is to be that system (2000, p. 428). Turing avoids this problem by rejecting the solipsist point of view (1950, p. 446) and affirming that humans use conversation and discourse to identify the presence of minds in other humans. Turing takes a pragmatic approach: "Instead of arguing continually over this point it is usual to have the polite convention that everyone thinks" (Turing 1950, p. 446). This is a natural language based argument from analogy: "You are sufficiently like me in all other visible respects, so I can justifiably infer (or assume) that you are like me in this invisible one" (Rapaport 2000, p. 469). In short, the TT works by assuming that humans have minds and that natural language is sufficient to represent mind; a machine has a mind if a machine and a human are indistinguishable in discourse.

An important consequence of the TT is that the machine's internal mechanism, as well as its outward visual or aural presence, is irrelevant. Through blind comparison, Turing hoped to surpass personal and aesthetic bias. The test isolates convincing, human-like conversation as the sole determinant of thinking. Harnad calls this functional, rather than structural, indistinguishability (2000, p. 429). As Ray Kurzweil states, the insight in Turing's eponymous test is the ability of written human language to represent human-level thinking (Kurzweil 2002). This insight, as discussed subsequently, has been debated.

Turing's goal was, at most, to provide an inductive test of thinking (Moor 1976, 2001, p. 82); at the least,

a wide variety of human-machine competitions as TTs.

For over fifty years, there has been tremendous discussion and criticism of the TT. As Halpern notes, Turing (1950) is "one of the most reprinted, cited, quoted, misquoted, paraphrased, alluded to, and generally referenced philosophical papers ever published" (Halpern 2006, p. 42).

A widely cited complaint, called the Chinese Room Argument (CRA), is presented by John Searle (1980). Searle's argument imagines a human translating Chinese simply through direct symbol manipulation and table look-up procedures. Searle argues that, in this case, the human has knowledge only of syntax, and cannot be seen to have any knowledge of semantics. This human, with sufficient procedural resources, could appear to communicate in Chinese yet have no knowledge of Chinese. From this, Searle claims that the ability to provide good answers to human questions does not necessarily imply that the provider of those answers is thinking; passing the Test is no proof of active intelligence (Halpern 2006, p. 52). Although some authors have extended the CRA (Wakefield 2003; Damper 2006), Searle's argument has been frequently challenged (Harnad 2000, p. 437; Hauser 1993, 1997, 2001; Kurzweil 2005, p. 430; Rapaport 2000). Numerous additional arguments questioning the value or applicability of the TT have been proposed (Block 1981; French 1990; Crockett 1994).

The question of whether the TT or similar tests are sufficient measures of machine thought will be debated for the foreseeable future (French 2000; Saygin, Cicekli, and Akman 2000). The answer to this question is not relevant to the present study. The TT, regardless of its potential for measuring thought or intelligence, provides at least a symbolic benchmark of one form of machine aptitude. More relevant to this study, the TT has inspired other forms of comparison tests between human and machine output.

Alternative Tests

Related tests of machine aptitude have been proposed. Some of these tests attempt to measure human-like experience or creativity and might be applied to musical activities and musical thoughts. These tests are examined here to distinguish them from the TT, discussed previously, and proposed musical TTs, introduced subsequently.

The John Henry Test (JHT) is proposed here to label a common approach to testing machine aptitude. A JHT is a competition between a human and a machine in which there is a clearly defined winner within a narrowly specialized (and not necessarily intelligent) domain. A JHT requires quantitative measures of success: a decisive, independently verifiable conclusion results. A test of aesthetic outputs cannot be a JHT. The Garry Kasparov versus Deep Blue competition can be properly seen as a JHT: it was a human-versus-machine contest, and the conclusion of the competition was defined by a clear (though contested) result. While the TT might be seen as a type of JHT, the natural-language interaction of the TT is free-form and not based on a fixed domain or content. The results of the TT are based on human-evaluated indistinguishability, not an independently verifiable distinguishability. Nonetheless, Bloomfield and Vurdubakis generalize the TT as something like a JHT, describing such tests as forms of socially situated performance geared to the enactment and dramatization of a number of occidental modernity's fundamental moral and conceptual categories (Bloomfield and Vurdubakis 2003, p. 35). Although the TT is not a JHT, both have been used to dramatize changes in machine aptitude.

Steven Harnad places the TT within what he calls the Turing Hierarchy. Harnad first extended the TT into the Total Turing Test (TTT). The TTT requires that Turing's text-based interface be replaced by full physical and sense-based interaction with a robot: "The candidate must be able to do, in the real world of objects and people, everything that real people can do, in a way that is indistinguishable (to a person) from the way real people do it" (Harnad 1991, p. 44). If applied to a generative music system, the TTT would presumably require a robot to play a musical instrument, or perform some other musical task, in a physical manner indistinguishable from a human. Eighteenth-century musical automata, such as the flute player of Jacques de Vaucanson

a creative program must make its own decisions, must experiment and explore concepts, and must gradually converge on a satisfactory solution through a continual process in which suggestions coming from one part of the system and judgments coming from another part are continually interleaved (Hofstadter 1996, p. 411).

Margaret Boden divides the Lovelace objection into four Lovelace questions: (1) can computational ideas increase understanding of human creativity; (2) can computers do things that appear creative; (3) can a computer appear to recognize creativity; and (4) can computers really be creative (1990, p. 6). Boden answers the first three questions affirmatively. The fourth question is the LT. Boden notes that, even after satisfying all the scientific criteria for creative intelligence (whatever those may be), answering this question requires humans to make a moral and political decision: this decision amounts to dignifying the computer: allowing it a moral and intellectual respect comparable with the respect we feel for fellow human beings (1990, p. 11). This respect relates to the issue of intentionality and authorship, discussed subsequently.

The LT standard of creativity is significantly higher than the computational creativity defined by Wiggins, which includes all behavior exhibited by natural and artificial systems, which would be deemed creative if exhibited by humans (2006, p. 451). Wiggins, admitting this definition is intangible, does not offer a method or a test to determine what is or is not deemed creative by humans. Humans may not agree on, or even regularly identify, creative behavior exhibited by any agent, human or machine. The apprehension of creativity, like the identification of successful music, may be a largely aesthetic problem. Cope recasts the influence of consensus by defining creativity as the initialization of connections between two or more multifaceted things, ideas, or phenomena hitherto not otherwise considered actively connected (2005, p. 11). Here, the problem of identifying consensus is shifted, not removed: consensus is required to determine what is already actively connected.

Independent of whether machines exhibit creativity, no contemporary generative music system is likely to pass the LT. Generative systems may employ complex stochastic models, neural nets, models of artificial life, or any of a wide range of procedures; such systems may also serendipitously produce surprising and aesthetically satisfying outputs. Yet the output of such systems, upon examination of the system's architecture, can be explained. Cope, for example, executes an LT of sorts, asserting that an Experiments in Musical Intelligence (EMI) composition from 2003 was not produced creatively nor with creative processes: "given enough time, I could reverse engineer this music and find all of its original sources in Bach's lute suites" (2005, p. 44). This is not a practical or aesthetic concern of music making: as demonstrated by the history of generative music systems, failing the LT or other measures of machine intelligence has not limited the computer-aided creation of music by humans.

Tests that are associated with the TT yet fundamentally alter its structure are the focus of this article. An early example is provided by Hofstadter in Gödel, Escher, Bach (Hofstadter 1979). Hofstadter calls this a "little Turing test": readers are asked to distinguish selections of human-written natural language and computer-generated text, presented in an intermingled list (Hofstadter 1979, p. 622). Hofstadter fails to articulate the significant deviations from Turing's model.

Similarly, Kostas Terzidis suggests that if an algorithmically generated paper created by the Dada Engine system (Bulhak 1996) was submitted to a conference and accepted, it may have passed Turing's classic test of computer intelligence (Terzidis 2006, p. 22). As should be clear, simply mistaking computer output for human output is not passing a TT. Appropriately, the author of the Dada Engine credits Hofstadter's "little Turing test" as a source of inspiration (Bulhak 1996).

Another alteration of Turing's model is demonstrated by the Completely Automated Public Turing test to tell Computers and Humans Apart, or CAPTCHA. A CAPTCHA is a now-familiar test given by a computer to distinguish if a user is either a human or a machine. While superficially related to the TT in that the CAPTCHA attempts to distinguish humans and machines, it is not a TT: there is no interaction, the medium is often visual (based on the ability to distinguish distorted

Music as the Medium of the Turing Test

To test the output of generative music systems, and to avoid the problems of the TTT, the LT, and definitions of creativity, the TT might be altered by making aesthetic artifacts, music or other creative forms, the medium of the test. In the case of music, this means replacing the text-based medium, in whole or in part, with sound symbols or sound forms. Two models of such tests, amalgamated from diverse sources, are introduced herein. Although sometimes using the language and format of the TT, these tests fundamentally alter the role of the interrogator, recasting the interrogator as a critic. As such, these are not TTs but rather, after Harnad (2000), toy tests.

Musical Directive Toy Test (MDtT)

The interrogator, using a computer interface, sends a musical directive to two composer-agents. One of the composers is a machine, the other, a human. The musical directive could be style- or genre-dependent, or it could be abstract: something like "write sad music," "write a march," or "compose a musical game." The musical directive might also include music, such as melodic or rhythmic fragments upon which the composer-agent would build. The two composers both receive the directive and create music. After an appropriate amount of time (a human scale would be necessary), the completed music is returned to the interrogator in a format such as a score, synthetic digital audio, or digital audio of a recorded performance. A flexible

In this test, two composer-agents provide a score, synthetic digital audio, or digital audio of a recorded performance to the interrogator. One of the composers is a machine, the other, a human. The provided music may be related in terms of style, instrumentation, or raw musical resources, but is not a newly composed response to a specific musical directive (as in the MDtT). Each agent might provide multiple musical works. Based only on these works, the interrogator must attempt to distinguish the human from the machine. This test maintains only the blind comparison of output from two sources; the interaction and discourse permitted in the TT are removed.

Comparison

The MDtT and MOtT, while employing the blind indistinguishability test of the TT, remove the critical component of natural language discourse. The MDtT and MOtT are solely dependent on the musical judgments of the interrogator. These musical judgments cannot do what natural language discourse can do to expose the agent's capacity for thinking. Successful music, unlike natural language, does not require a common, external syntax; successful musical discourse, unlike natural language, can employ unique, dynamic, or local syntaxes.

Furthermore, the interrogator may overwhelmingly rely on subjective musical judgments. This contrasts with the TT, which, while permitting any form of discourse within the common syntax of written natural language, is designed to remove
subjective visual and aural evaluations through blind comparison. Although it is uncommon for humans to interact via text with a completely unknown agent, it is quite common for humans to evaluate music without any knowledge of its author, source, or means of production. The TT is a strikingly isolated and focused form of blind, discourse-based evaluation. The MDtT or MOtT, by using music in a conventional form of delivery, do not offer a similarly isolated or blind form of evaluation. The interrogator in the TT, although certainly influenced by subjective ideas of human thought, attempts to reasonably distinguish between human and machine based on linguistic constructions and content. The critic of the MDtT or MOtT is unrestrained, free to employ a wide range or mixture of musical judgments.

The MDtT is related to the Short Short Story Game (S3G) proposed and demonstrated by Selmer Bringsjord and David Ferrucci (2000). In this game, a system (BRUTUS, in the case of Bringsjord and Ferrucci) is given a sentence. Based on this sentence, the system must compose a short story designed to be truly interesting (Bringsjord, Bello, and Ferrucci 2001, p. 13). The goal of this system is to compete with human authors in a manner similar to the MDtT. While Bringsjord, Bello, and Ferrucci (2001, p. 13) acknowledge that BRUTUS produces some rather interesting stories, they do not claim that the system passes the LT. In their view, the system is not creative. Its output is merely the result of their input and design: "two humans . . . spent years figuring out how to formalize a generative capacity sufficient to produce this and other stories" (Bringsjord, Bello, and Ferrucci 2001, p. 14).

The MDtT and MOtT, however, significantly differ from the S3G: as stated previously, successful music, unlike successful stories, may have no common syntax. The abstract nature of music, particularly in the context of creative contemporary practice, is such that there exists no comparable expectation of grammar or form. Even in the limited cases where strong expectations of musical grammar or form exist, subverting these expectations may be musically legitimate or aesthetically valuable. Music, like some poetry, is not natural language. Where the evaluation of the output of S3G may include formal, rational, and aesthetic criteria, the evaluation of output in an MDtT may be limited to essentially aesthetic musical judgments.

Without the unifying context of a musical directive, the MOtT, even more than the MDtT, may result in unreliable or unrepresentative musical judgments. These judgments may be influenced by historical or cultural associations about musical style, expectations of what a machine or a human sounds like, or assumptions of what is possible with current technology. These expectations can vary greatly depending on the listeners surveyed. Of course, similar notions are likely to be held by an interrogator in the TT. Yet a responsible TT interrogator has the option, even the obligation, to balance such judgments with further discourse by asking questions or demanding explanations. This is not possible in the MDtT or MOtT.

Pearce, in describing an MOtT directed at identifying stylistic similarity, finds similar influences on musical judgments, noting that potential interrogators might shift their attention to searching for musical features expected to be generated by a computer model rather than concentrating on stylistic features of the composition (2005, p. 185). While Pearce notes that the Turing test methodology fails to demonstrate which cognitive or stylistic hypotheses embodied in the generative system influence the judgments of listeners (Pearce 2005, p. 186), he still claims that such tests offer empirical, quantitative results which may be appraised intersubjectively (p. 184).

Both the MDtT and the MOtT are surveys of musical judgments, not determinants of thought or intelligence. Where the TT requires discourse between an interrogator and an agent, here discourse is replaced by single-sided criticism. Even in the musical discourse of an interactive MDtT, the interrogator, at the end of the discourse, must attempt to distinguish human and machine with musical judgments. While interrogators can generally agree on what makes rational and coherent language, and are likely to concur on what kind of language offers evidence of thought, critics may not agree on what makes aesthetically successful music, and are likely to offer inconsistent or contradictory musical judgments.

Music may be perceived as an intelligent activity, even when its genesis is the result of involuntary, irrational, or algorithmic activities. Whereas music may be a very human activity, the application of the TT to music ignores that seemingly successful music can come from sources with little or no thought. Hofstadter, for example, describes his profound sense of bewilderment or alarm (Hofstadter 2001, p. 39) upon recognizing that convincing music results from mechanisms thousands if not millions of times simpler than the intricate biological machinery that gives rise to a human soul (p. 79). Soul, mind, thought, intelligence, and creativity are common though weak determinants of aesthetically successful music. Assuming the necessity of any of these determinants when encountering the output of a generative music system often contributes to a musical Eliza Effect.

The problem of musical TTs is part of a larger problem of TT derivatives. Dennett describes how a failure to think imaginatively about the TT has led many to underestimate its severity and to confuse it with much less interesting proposals; furthermore, there is a common misapplication of the sort of testing exhibited by the Turing test that often leads to drastic overestimation of the powers of actually existing computer systems (Dennett 1998, p. 5). Musical TTs are a misapplication of the TT that can lead to overestimation.

Discrimination Tests

Use of the TT, even if by name and rough analogy alone, has significant implications. The history of the TT and its association with projects in AI make it a powerful concept in both the academic and popular imaginations. Alternative blind comparison tests not associated with the TT make very different implicit claims than those branded as TTs. For these reasons, it is important to identify discrimination tests (DTs) as a type of listener survey that avoids some of the faults of musical TTs. The DT is similar to the MOtT. Although not free of the problems of evaluating musical judgments, such tests, when properly constrained, permit generalizing these judgments between selected groups.

Computational models, as generative music systems with significantly different goals than those of creative tools, require particular evaluation strategies. Pearce and Wiggins propose the DT, where the generated music can be evaluated by asking human subjects to distinguish compositions taken from the data set from those generated by the system (Pearce and Wiggins 2001, p. 25). If the system-composed pieces cannot be distinguished from the human-composed pieces, "we can conclude that the machine compositions are indistinguishable from human composed pieces" (p. 25).

Although Pearce and Wiggins note that the DT bears a resemblance (Pearce and Wiggins 2001, p. 25) to the TT, they are careful to note significant differences. The DT is designed not to test machine-thinking, but to determine the (non-)membership of a machine composition in a set of human composed pieces of music (p. 25). Further, they note that the critical element of interaction is removed: "in our test the subjects are simply passive listeners: there is no interaction with the machine" (p. 25). Pearce and Wiggins argue that both the TT and DT are behavioral tests: "the tests are used to decide whether a behaviour may be included in a set . . . the set of intelligent behaviours in the case of the TT and the set of musical pieces in a particular style in the case of the DT" (p. 25).

In a specific case, Pearce and Wiggins use a collection of musical examples to train a genetic-algorithm-based system. The DT is then used to determine if the output of the system is distinguishable from the same training examples (Pearce and Wiggins 2001, p. 25). The authors claim that the final machine compositions are evaluated objectively within a closed system which provides no place for subjective evaluation or aesthetic merit (p. 25). Evidence is not provided to affirm that human listeners, whether experts or novices, can objectively evaluate musical similarity. Furthermore, while such a test attempts to remove aesthetics from musical judgments, the authors go on to claim that success in the DT shows that there are absolutely no perceivable features that differentiate the human and machine compositions, and that these features may include such elusive notions as aesthetic quality or perceivable creativity (p. 25). The authors thus

understanding . . . it's about money (Zorn 2000, p. v). Pachet and Cazaly (2000), as part of a study of music taxonomies, support this view, stating "the most important producers of music taxonomies are probably music retailers." In a study of large online music taxonomies, these authors illustrate that there is little if any consensus on the terms, structures, or meanings deployed.

As shown in Aucouturier and Pachet (2003), recent efforts to automatically sort music into discrete style classes based on signal or symbolic representations have demonstrated limited success; success is often a direct result of extremely narrow conceptions of genre (p. 92). The study of Soltau et al. (1998), after demonstrating an Explicit Time Modeling with Neural Networks (ETM-NN) system to classify music within four genres, shows that for some genres humans are just as ill-equipped as their system to classify music: human confusions in this experiment are similar to confusions of the ETM-NN system. Aucouturier and Pachet erroneously call this listener survey a Turing Test (Aucouturier and Pachet 2003, p. 88). Unable to rigorously define genre, such approaches implement systems to match the consensus interpretations of critics. Aucouturier and Pachet, supporting this view, state that music genre is an ill-defined notion, that is not founded on any intrinsic property of the music, but rather depends on cultural extrinsic habits (p. 84). DTs often ignore the real diversity and elusive nature of these extrinsic habits. The danger of testing circular, ungrounded projections (p. 83) is great.

As an alternative, Aucouturier and Pachet form genre-like clusters based on extrinsic similarity: specifically, they perform a co-occurrence analysis based on cultural similarity from text documents such as radio playlists and track listings of compilation CDs. This approach relies on the documented interpretations of critics (e.g., DJs and editors): similarity is asserted if two works are placed together. It is significant that such a measure of similarity cannot be applied to the newly created output of a generative music system. As such works lack any cultural texts or criticism, listener surveys or

Proposed and Executed Musical Turing Tests

Despite the problems described herein, musical TTs have been discussed and executed as a means of evaluating the success of various generative music systems. The following examples, as a collection of small case studies, illustrate diverse applications of music to TTs. Although some DTs report the failure of machines to produce indistinguishable results (Pearce and Wiggins 2001), every executed musical TT reports machine success.

After summarizing the TT, Alan Marsden suggests that a musical version of this test could be proposed (Marsden 2000, p. 22). Marsden describes an MOtT in which there are two rooms, each with a composer and a means of distributing music to the outside world. One of the composers is a machine. The test is passed when observers cannot distinguish which composer is a computer. Marsden offers this example to state that, although a computer might pass this test in practice, the computer could never pass the test in principle (p. 23). The reason for this, Marsden explains, is that "originality is an essential characteristic of music . . . computers are digital automata, and so their behavior is always, in principle at least, predictable and therefore cannot be original" (p. 23). Marsden does not question the validity of the MOtT, suggesting that it might operate in both practice and in principle. Further, Marsden suggests that deterministic systems are incapable of originality. Yet humans, while lacking proof of free will, may be both deterministic and creative. Marsden's principled MOtT, in this context, is better seen as an LT, as a test of creativity. The standard of creativity set forth by Bringsjord, Bello, and Ferrucci (2001), however, is more successful than Marsden's problematic criterion of non-deterministic originality.

Curtis Roads (1984), in a section on the Turing Test for Musical Intelligence, suggests using an MDtT to measure the effectiveness of software-based music representations (p. 33). Although conceding that there is no universal criterion for determining an optimal music representation, he offers criteria to determine system value such as
measures of intrinsic similarity may be the only the usefulness in practice, the limits of structures
means of comparison. available for representation, and what kinds of
Paris (2002). The goal of this MOtT was to answer the question, "[C]an we make the distinction between music played by a human and music played by a machine?" Two interrogators (Henkjan Honing and Koen Schouten) were presented with musical fragments, some performed by Albert van Veenendaal, others performed by the Continuator software system developed by François Pachet (2002). Although test data is not provided, the author claims the result was largely in favor of the Continuator. Receiving more positive musical judgments than the human, the software system is deemed successful.

Hiraga et al. (2002) have proposed and executed a series of music performance rendering tests in which human performances of a fixed work are compared with computer-rendered performances of the same work. Executed as part of a project called RENCON, these tests date from 2002 and have continued at various workshops and conferences since (Hiraga et al. 2004, p. 121). Whereas the authors at times properly describe these tests as listening comparisons, they go on to describe these as a Turing test for musical performance and competitions given in Turing Test style (p. 123). The authors claim that this test determines by listening whether system-rendered performance is distinguishable from human performance (p. 123); in addition to selecting their preferred performance, participants are asked to rate performances by humanlikeness (p. 123). As machine-rendered performances have been selected over human-rendered performances, the authors state that more than a few people agree that some performance rendering systems generate music that rivals human performances (p. 123). While listening comparisons or DTs may offer a valuable method of evaluating performance rendering, the association with Turing is incorrect and unnecessary.

Music generated with David Cope's EMI system (1991, 1992, 1996, 2001) has been the subject of many presentations of MOtTs. Only in passing does Cope associate these tests with Turing. In Cope (1996), he describes conducting comparison tests to gauge the significance of compositional style signatures in evaluating style membership. In his first test with students, he presents phrases of Mozart, phrases of machine-composed music in the style of Mozart with signatures, and machine-composed music in the style of Mozart without signatures. Although noting that this study falls outside the framework of scientifically valid research (1996, p. 82), he suggests that his results show that the use of signatures contributes to style recognition.

Cope (1996) removes signatures as a variable in these tests; in this case, he compares the music of Mozart with machine-composed music in the style of Mozart. As part of a section of the 1992 Association for the Advancement of Artificial Intelligence (AAAI) conference entitled "Artificial Intelligence and the Arts," a larger MOtT was conducted. Cope reports that nearly 2000 visitors, over three days, took part in a test that pitted machine-composed examples with signatures in the style of Mozart against actual Mozart (p. 82). While again Cope states the test has absolutely no scientific value, the results are summarized by Cope as indicating that the audience was unable to distinguish between machine-composed Mozart and the real thing, that the machine-composed music has some stylistic validity, and for the layperson at least, real Mozart is hard to distinguish from artificial Mozart (p. 82). While neither Cope nor the conference publications call the 1992 AAAI MOtT a TT, Cope (2000, p. 65) claims that Alice, a system closely related to EMI, may succeed in occasionally passing the spirit if not the letter of Turing's test (2000, p. 65). As argued previously, the spirit of the TT is not maintained in MOtTs.

Cope (2001) titles a similar test "The Game," presenting the reader notated and recorded musical examples of computer-generated and human-composed music. Here, he refers to computer-generated works as "virtual music." Cope notes that mixing weak human-composed music with strong virtual music would simply fool listeners. His objective is to determine whether listeners can truly tell the difference between the two types of music (p. 20). If players can only distinguish human from machine about 50% of the time, it is assumed that the music examples are indistinguishable. Cope relates "The Game" to the 1992 AAAI MOtT, stating that results from previous tests with large groups of listeners, such as 5000 [sic] in one test in
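
The 50% criterion mentioned above for "The Game" can be made concrete with a small calculation. The following is a minimal sketch, not a procedure taken from Cope or from any study cited here: it assumes each listener judgment is an independent human-or-machine guess, and it uses an exact two-sided binomial test to ask how surprising a given number of correct identifications would be under pure guessing. The function name `chance_p_value` and all tallies are hypothetical.

```python
from math import comb


def chance_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a guessing rate of `chance`.

    Sums the probabilities of every outcome that is no more likely than the
    observed count, assuming each judgment is an independent guess.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Probability of each possible number of correct identifications.
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[correct]
    # Small tolerance so floating-point ties in the opposite tail are counted.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical tallies, not data from any study cited in this article.
    for correct, trials in [(52, 100), (60, 100), (70, 100)]:
        p = chance_p_value(correct, trials)
        print(f"{correct}/{trials} correct: p = {p:.4f}")
```

Under these assumptions, a large p-value indicates only that the tally is consistent with guessing; it does not by itself show that the two sets of examples are indistinguishable, particularly when the number of judgments is small.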
Machine Authorship and the Problem of Aesthetic Intention

In the context of musical TTs, an objection can be raised that the computer-generated material is not generated in whole by the computer. The computer can be seen not as an autonomous author but as a system that executes or reconfigures knowledge imparted to it by its programmers. This is part of the problem of machine creativity suggested by the LT (Bringsjord, Bello, and Ferrucci 2001).

Fundamental questions of authorship are important when comparing the aesthetic output of machines and humans. For the MOtT or MDtT to actually test the machine system, the musical outputs provided by the agents must be created by the agents themselves. Just as the human agent presumably cannot plagiarize an output, the machine agent cannot simply return a stored work previously created by a human. Such a test would have little value. Although Turing specifically condoned deception in the TT, such deception is problematic in the context of testing aesthetic artifacts. This problem further removes the MOtT and the MDtT from the TT.

Pure machine authorship is impossible to imagine without an autonomy sufficient to pass the LT. In a manner similar to that of Wolfgang von Kempelen's famous chess-playing machine (often called "The Turk"), a purported automaton that convinced countless observers in the eighteenth and nineteenth centuries of the possibility of machine autonomy (Sussman 1999; Standage 2002), there may always be a human, or at least significant human knowledge, hiding inside the creative machine. Halpern supports this view, noting that "machine intelligence is really in the past: when a machine does something intelligent, it is because some extraordinarily brilliant person or persons, sometime in the past, found a way to preserve some fragment of intelligent action in the form of an artifact" (Halpern 2006, p. 54). Such a perspective is applicable to many less intelligent but musically useful generative music systems.

Cope, for example, states that when using his Alice system he sees no reason to even assign partial credit to the program: Alice processes a composer's database in specific and known ways, acting only as a specialized calculator and assistant (Cope 2000, p. 252). Describing the EMI system, Cope notes that the hand of the composer is not absent from the finished product of computer-assisted composition (Cope 1991, p. 236), and that all works produced with EMI are attributed to "David Cope with Experiments in Musical Intelligence" (Cope 2001, p. 340). The MOtTs of EMI output described herein, therefore, tested not EMI, but Cope with EMI.

A case might be imagined where a machine is somehow completely responsible for an aesthetic artifact. Such machine authorship would presumably require what Cohen describes as autonomy. Describing his generative illustration system AARON, Cohen (2002) suggests that, with such autonomy, the system, not him, would be the author: "if AARON ever does achieve the kind of autonomy I want it to have, it will go on to eternity producing original AARONs, not original Harold Cohens" (2002, p. 64). Cohen's views are similar to those of Hofstadter, described previously. The idea of machines with autonomy, intentions, or initiative is sometimes associated with more exotic things such as Putnam-Gold Machines (Kugel 2002, p. 565) or Turing's Oracles (Turing 1939). Bringsjord, Bello, and Ferrucci, however, argue that Oracles cannot pass the LT, and thus do not offer autonomy (Bringsjord, Bello, and Ferrucci 2001, pp. 20-24). As neither AARON nor any known contemporary system has reached such a level of autonomy, generative works will likely continue to be seen as human works. If the role of the system exceeds that of a conventional tool, these works might be seen more as human-machine collaborations; collaboration, as used here, does not require machine autonomy. Contemporary MDtTs and MOtTs are not machine-versus-human competitions: they are competitions among humans using different tools and collaborators.

Machines, although presently lacking autonomy and intention, can produce output that appears intentional. As described previously, Soldier (2002) argues that aesthetic intention (Carroll 1999, p. 163) is not a criterion of creating, and thus authoring, art. Soldier demonstrates that artists without intent can create works that sound intentional; similarly,
p. 435). While the computer answers the end-game with a mate, the computer could just as well discuss the history of the game, or state that it does not play chess. When asked to write a sonnet, the same agent declines: "Count me out on this one. I never could write poetry" (p. 434).

The computer agent, asked to play chess, could alternatively be mischievous and play a non-winning move. As Halpern states, "the Turing end-game example introduces an assumption that cannot automatically be allowed: namely, that the computer plays to win" (Halpern 2006, p. 46). The MOtT and MDtT imply that aesthetic success, a win, indicates system design success. However, as made clear herein, Turing's model does not require the computer to play to win: self-sabotage or simple mischief is acceptable if explained in rational discourse. MOtTs and MDtTs, if related to the TT, should allow for new aesthetic concepts and non-winning aesthetic moves: as Boden states, even if a computer's notion of art is irrelevant to us humans, these notions might broaden our aesthetic horizons (Boden 1996).

Conclusion

As Dennett states of restricted text-based TTs, "we should resist all limitations and waterings-down of the Turing test . . . they make the game too easy . . . they lead us into the risk of overestimating the actual comprehension of the system being tested" (Dennett 1998, p. 11). The MDtT and MOtT are too easy. Music, as a medium remote from natural language, is a poor vessel for Turing's Imitation Game. Generative music systems gain nothing from associating their output with the TT; worse, overestimation may devalue the real creativity in the design and interface of these systems.

Iannis Xenakis, considering the history of computer-aided algorithmic composition systems, asked: "What is the musical quality of these attempts?" He answers bluntly: "The results from the point of view of aesthetics are meager . . . hope of an extraordinary aesthetic success based on extraordinary technology is a cruel deceit" (Xenakis 1985, p. 175). Xenakis here distinguishes the system (the technology) from its aesthetic artifacts. This distinction suggests that aesthetic success or failure is dependent on humans and independent of any technology. Until machines achieve autonomy, it is likely that humans will continue to form, shape, and manipulate machine output to satisfy their own aesthetic demands, taking personal, human responsibility for machine output.

Simon Holland, after Peña and Parshall (1987) and Cook (1994), describes open-ended domains such as music composition as "problem seeking" rather than "problem solving": "there are in general no clear goals, no criteria for testing correct answers, and no comprehensive set of well-defined methods" (Holland 2000, p. 240). If used as creative tools, generative music systems, as systems within problem-seeking domains, likewise have no criteria for testing correct answers. In the development and presentation of these systems, comparative analysis of system and interface design, or studies of user interaction and experiences, offer greater potential for the development of practical tools.

Computational models with clearly articulated goals may continue to pass DTs; properly constrained, such tests may show that musical judgments cannot discern sets of musical artifacts produced by humans and machines. While this may demonstrate technological innovation in the modeling of historical musical artifacts, such technologies may also offer aesthetic innovation if redeployed as creative systems. In this use-case, the clear goals and testing criteria evaporate. Within the practical use-case of creative music-making, any system becomes a problem-seeking domain.

Acknowledgments

I am grateful for the commentary this article has received over the many stages of its development. Thanks to Elizabeth Hoffman and Paul Berg for discussing some of the initial ideas presented here. Thanks to Nick Collins for research assistance and comments on important themes. Thanks to the anonymous reviewers and the editors for valuable suggestions.
Synthesis of Musical Style. Cambridge, Massachusetts: MIT Press, pp. 221-236.
Hall, M., and L. Smith. 1996. A Computer Model of Blues Music and Its Evaluation. Journal of the Acoustical Society of America 100(2):1163-1167.
Halpern, M. 2006. The Trouble with the Turing Test. The New Atlantis 11:42-63.
Harnad, S. 1991. Other Bodies, Other Minds: A Machine Incarnation of an Old Philosophical Problem. Minds and Machines 1:43-54.
Harnad, S. 2000. Minds, Machines and Turing. Journal of Logic, Language and Information 9(4):425-445.
Hauser, L. 1993. Searle's Chinese Box: The Chinese Room Argument and Artificial Intelligence. PhD dissertation, Michigan State University.
Hauser, L. 1997. Searle's Chinese Box: Debunking the Chinese Room Argument. Minds and Machines 7:199-226.
Hauser, L. 2001. Look Who's Moving the Goal Posts Now. Minds and Machines 11:41-51.
Havass, M. 1964. A Simulation of Music Composition. Synthetically Composed Folkmusic. In F. Kiefer, ed. Computational Linguistics. Budapest: Computing Centre of the Hungarian Academy of Sciences 3:107-128.
Hedges, S. A. 1978. Dice Music in the Eighteenth Century. Music and Letters 59(2):180-187.
Hiller, L. 1970. Music Composed with Computers: An Historical Survey. In H. B. Lincoln, ed. The Computer and Music. Ithaca, New York: Cornell University Press, pp. 42-96.
Hiraga, R., et al. 2002. Rencon: Toward a New Evaluation Method for Performance Rendering Systems. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 357-360.
Hiraga, R., et al. 2004. Rencon 2004: Turing Test for Musical Expression. Proceedings of the 2004 Conference on New Interfaces for Musical Expression. New York: Association for Computing Machinery, pp. 120-123.
Hofstadter, D. R. 1979. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Vintage.
Hofstadter, D. R. 1996. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. New York: Basic Books.
Hofstadter, D. R. 2001. Staring Emmy Straight in the Eye - and Doing My Best Not to Flinch. In D. Cope, ed. Virtual Music: Computer Synthesis of Musical Style. Cambridge, Massachusetts: MIT Press, pp. 33-82.
Holland, S. 2000. Artificial Intelligence in Music Education: A Critical Review. In E. R. Miranda, ed. Readings in Music and Artificial Intelligence. Amsterdam: Harwood Academic Publishers, pp. 239-274.
Hsu, F. 2002. Behind Deep Blue: Building the Computer that Defeated the World Chess Champion. Princeton, NJ: Princeton University Press.
Jefferson, G. 1949. The Mind of Mechanical Man. British Medical Journal 1:1105-1110.
Kant, I. 1790. Kritik der Urteilskraft [Critique of Judgment]. Berlin: Lagarde and Friederich.
Kugel, P. 2002. Computers Can't Be Intelligent (. . . and Turing Said So). Minds and Machines 12(4):563-579.
Kurzweil, R. 1990. The Age of Intelligent Machines. Cambridge, Massachusetts: MIT Press.
Kurzweil, R. 1999. The Age of Spiritual Machines. New York: Penguin Books.
Kurzweil, R. 2002. A Wager on the Turing Test: Why I Think I Will Win. Available online at www.kurzweilai.net/articles/art0374.html?printable=1.
Kurzweil, R. 2005. The Singularity is Near. New York: Penguin Books.
Lamb, G. M. 2006. Robo-Music Gives Musicians the Jitters. The Christian Science Monitor, December 14.
Loebner, H. 1994. In Response. Communications of the ACM 37(6):79-82.
Long Bets Foundation. 2002. By 2029 No Computer - or "Machine Intelligence" - Will Have Passed the Turing Test. Available online at www.longbets.org/1.
Lovelace, A. 1842. Translator's Notes to an Article on Babbage's Analytical Engine. In R. Taylor, ed. Scientific Memoirs: Selected from the Transactions of Foreign Academies of Science and Learned Societies, and from Foreign Journals. London: printed by Richard and John E. Taylor, 3:691-731.
Loy, D. G. 1991. Connectionism and Musiconomy. Proceedings of the International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 364-374.
Marsden, A. 2000. Music, Intelligence and Artificiality. In E. R. Miranda, ed. Readings in Music and Artificial Intelligence. Amsterdam: Harwood Academic Publishers, pp. 15-28.
Midgette, A. 2005. Play It Again, Vladimir (via Computer). New York Times, 5 June.
Moor, J. H. 1976. An Analysis of the Turing Test. Philosophical Studies 30:249-257.
Moor, J. H. 2001. The Status and Future of the Turing Test. Minds and Machines 11:77-93.
Wakefield, J. C. 2003. The Chinese Room Argument Reconsidered: Essentialism, Indeterminacy, and Strong AI. Minds and Machines 13:285-319.
Wassermann, K. C., et al. 2003. Live Soundscape Composition Based on Synthetic Emotions. IEEE MultiMedia 10(4):82-90.
Weinberg, G., and S. Driscoll. 2006. Toward Robotic Musicianship. Computer Music Journal 30(4):28-45.
Weizenbaum, J. 1966. ELIZA - A Computer Program For the Study of Natural Language Communication Between Man And Machine. Communications of the ACM 9(1):36-45.
Wiggins, G. A. 2006. A Preliminary Framework for Description, Analysis and Comparison of Creative Systems. Knowledge-Based Systems 19:449-458.
Wiggins, G., and A. Smaill. 2000. Musical Knowledge: What Can Artificial Intelligence Bring to the Musician? In E. R. Miranda, ed. Readings in Music and Artificial Intelligence. Amsterdam: Harwood Academic Publishers, pp. 29-46.
Wimsatt, W. K. 1954. The Verbal Icon: Studies in the Meaning of Poetry. Louisville, Kentucky: University Press of Kentucky.
Wimsatt, W. K., and M. C. Beardsley. 1946. The Intentional Fallacy. Sewanee Review 54:468-488.
Xenakis, I. 1985. Music Composition Treks. In C. Roads, ed. Composers and the Computer. Los Altos, California: William Kaufmann.
Zdenek, S. 2001. Passing Loebner's Turing Test: A Case of Conflicting Discourse Functions. Minds and Machines 11:53-76.
Zimmerman, R. L. 1966. Can Anything Be an Aesthetic Object? The Journal of Aesthetics and Art Criticism 25(2):177-186.
Zorn, J. 2000. Preface. In J. Zorn, ed. Arcana: Musicians on Music. New York: Granary, pp. v-vi.