
SURVEY OF MUSIC AND IMAGE

Assignment 2
TRINITY COLLEGE DUBLIN
DAVID COLLIER
10261303
M.PHIL IN MUSIC AND MEDIA TECHNOLOGY
26 APRIL 2011
Survey of Music and Image - Assignment 2
TABLE OF CONTENTS
Introduction
Artists
Golan Levin
Zachary Lieberman
Manual Input Sessions
The System (Implementation)
NegDrop
InnerStamp
Rotuni
Conclusion
Bibliography
Appendix - Equipment List
List of Figures
Figure 1: Diagram of the system
Figure 2: The system in operation
Figure 3: Picture of the overhead projector, video projector and camera
Figure 4: Video projector and camera
Figure 5: Example of the NegDrop
Figure 6: Diagram of the NegDrop sound generation
Figure 7: InnerStamp performance silhouettes
Figure 8: InnerStamp playback
Figure 9: Example of Rotuni using shapes
Figure 10: How the Rotuni generates pitches
List of Tables
Table 1: Audiovisual mapping in NegDrop
Table 2: Audiovisual mapping in InnerStamp
Table 3: Audiovisual mapping in Rotuni
Survey of Music and Image - Assignment 2
Discuss the use of technology, in a particular work (or set of works),
by a practitioner from the area of film, video, electronic art, multimedia,
performance or digital art.
Introduction
In this essay I discuss the role of technology in a series of performance pieces by
Golan Levin and Zachary Lieberman. Both belong to a continually growing list of innovators for
whom art is the driving force and technology the medium. In their work the two are inextricably
linked in an upward spiral of innovation, with improvements in one fuelling the other. As
multidisciplinary performance artists, Levin and Lieberman push the limits of each field and
blur the boundaries between art and technology.
Artists
Golan Levin and Zachary Lieberman collaborated to create the Manual Input Sessions. A brief
profile of each is given below as background for the discussion that follows.
GOLAN LEVIN
Levin is an artist, composer, performer and engineer whose works examine the human rela-
tionship with machines. Levin spoke at TED in 2004 giving a performance and lecture titled
Software (As) Art[8]. The title alone sums up how technology can be viewed as not just a tool
but a work of art in its own right.
The quote below sums up what Levin achieves through his works:
Through performances, digital artifacts, and virtual environments... ...Levin applies creative twists to digital technolo-
gies that highlight our relationship with machines, make visible our ways of interacting with each other, and explore the
intersection of abstract communication and interactivity. [1]
ZACHARY LIEBERMAN
Lieberman is an artist and computer programmer whose artwork focuses on computer
graphics, human-computer interaction and computer vision.[13] Below is a quote taken from
the YesYesNo website about Lieberman's work.
Zachary Lieberman's work uses technology in a playful way to explore the nature of communication and the delicate
boundary between the visible and the invisible.[11]
Lieberman was named the 36th most creative business person of 2010 by Fast Company
magazine.[12] This was no mean feat, as the other names on the list include company
CEOs, filmmakers and pop artists.
Both of the collaborators on the Manual Input Workstation/Sessions are artists who also have
a strong grounding in technology. For them the technology is the medium through which they
realise their art. What makes a discussion of these works so relevant is that they were created by
two engineers who use engineering techniques to realise performance art ideas based in vision
and sound. This is contrary to the more conventional idea of an artist enlisting an engineer to
help them realise an idea.
Manual Input Sessions
The Manual Input Sessions is a series of three vignettes that bridge the gap between musical
performance and performing art. Each of the pieces uses computer vision techniques as the
input for generating both the audio and the visual elements of the performance. As well as the
generated audio and visuals, the performer's hand silhouettes are projected onto the screen and
seem to interact directly with the visuals that they are creating. It is a creative use of
technology and a highly interactive means of user input.
THE SYSTEM (IMPLEMENTATION)
In order to perform any of the pieces a hardware setup must first be created. I have included
the equipment list in the appendix, but I will go through and discuss the setup in this section.
The basic equipment needed for this installation is an overhead projector, video projector,
camera, speakers and a computer. In figure 1 you can see a simplified diagram of the layout for
the hardware.
Figure 1: Diagram of the system[6]
The performer uses their hands to make silhouettes on the overhead projector. The video cam-
era captures a video of the silhouettes and sends it to a computer. Within the computer the
image is analysed and information is extracted from this. This information is then used to gen-
erate sounds which are played back through the speakers and images which are projected using
the video projector. All of these processes happen almost instantaneously so that the silhou-
ettes, visuals and sounds interact in realtime. Figure 2 is an example of the system in operation
(minus sound).
Figure 2: The system in operation[2]
In figure 2 we can see the performer manipulating shadows on the screen using the overhead
projector. If you look closely at the overhead projector you can see that there is a pink gel
placed over the light source. If we look back at the projection we can see the silhouette, a pink
background and a white shape, outlined by the silhouette. This white shape is being projected
from the video projector while the rest of the image is projected from the overhead projector.
This makes the white shape a bit blurry as it is two projections layered one on top of the other,
while the rest of the image has crisp edges.
I would speculate that the pink colour of the overhead projector has some functional role and
is not there for purely aesthetic reasons; the fact that the gel is listed on the equipment list
leads me to believe this. It is probably used to enable the vision system to differentiate between
the two projections, so that it only analyses changes that happen on the overhead projector.
Figure 3: Picture of the overhead projector, video projector and camera[6]
Figure 3 is an image of the projectors and camera. From this we can see how the hardware would
be laid out in an installation. The projections from both projectors are aligned, so that the
visuals created by the silhouettes line up with the silhouettes themselves.
This improves the effect of the installation and helps to blur the distinction between what is
influencing what.
A camera is attached to the video projector to capture the silhouettes. Figure 4 below gives a
close up image of the setup of video projector and camera. The input that the camera captures
must also be lined up with the projection of the silhouettes to capture the information neces-
sary to create the visuals and sounds.
Figure 4: Video projector and camera[6]
This section has described the hardware necessary to create this installation and how it must
be set up to realise the pieces. In the next section I will go through each of the vignettes in turn
and describe how they use the information generated by the projector and camera to create the
visuals and sounds. Each of the vignettes manipulates the input in different ways to create
different sounds and images.
NEGDROP
The NegDrop is one of the pieces that makes use of this system. It uses computer vision in-
formation taken from a silhouette to generate sound and image.
Figure 5 below gives an example of the screen image of the NegDrop. The right hand image in
the gure is a time-lapsed composite.[14]
Figure 5: Example of the NegDrop[14]
The computer vision for the NegDrop scans the silhouettes on the screen looking for closed
interior contours or holes. With these holes it uses the video projector to project the same
shape but in white over the original image. This happens in real time and there is no percepti-
ble time lag to the observer.
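The search for closed interior contours described above can be illustrated with a short sketch. It assumes the camera frame has already been reduced to a binary grid (1 for shadow, 0 for light) and uses a plain flood fill: any patch of light that cannot reach the edge of the frame is treated as a hole. The function name `find_holes` and the grid representation are illustrative assumptions, not the method used by Levin and Lieberman.

```python
from collections import deque


def find_holes(mask):
    """Return the enclosed background regions ("holes") of a binary silhouette.

    mask is a list of rows; 1 = silhouette (shadow), 0 = background (light).
    Each hole is returned as a set of (row, col) pixels.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(start_r, start_c):
        # Collect one connected patch of background (4-connectivity) and
        # note whether it reaches the border of the frame.
        region, touches_border = set(), False
        queue = deque([(start_r, start_c)])
        seen[start_r][start_c] = True
        while queue:
            r, c = queue.popleft()
            region.add((r, c))
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and mask[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return region, touches_border

    holes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 and not seen[r][c]:
                region, touches_border = flood(r, c)
                if not touches_border:  # light fully enclosed by shadow
                    holes.append(region)
    return holes


# A crude "OK sign": a ring of shadow enclosing a small patch of light.
frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
holes = find_holes(frame)
```

Opening the ring (for example by clearing one shadow pixel on its edge) connects the inner patch to the outer background, so no hole is reported. This mirrors the moment at which the performer releases the shape.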
When the performer releases the shape, the vision system detects this and the computer takes
over the animation of the projected shape. On the screen the
shape behaves as if it were a real object released from a height and falls to the bottom of the
image. When it hits the bottom, the system recognises this and a sound is generated. Like a
real object the shape bounces, and a sound is created each time it strikes the edges of the
projection. Over time the bounces diminish and the shape and audio fade out. Figure 6 below is a
diagrammatic representation of how the sound is created.
Figure 6: Diagram of the NegDrop sound generation[14]
[Figures 5 and 6 are reproduced from [14]. The source captions read: "In the NegDrop
instrument, interior contours become droppable virtual objects which trigger sounds when they
collide with the boundaries of the projection. [The right-hand photograph is a time-lapse
composite.]" and "Dropped objects inherit their initial lateral velocity from the horizontal
movement of the hand that released them. The horizontal position of the virtual object governs
the stereo position of the sounds it produces."]
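The falling and bouncing behaviour of a released shape can be sketched as a simple one-dimensional simulation. This is only an illustration: the gravity value, the fraction of speed kept at each bounce (`restitution`) and the cut-off speed are assumed numbers chosen for the example, and none of them are taken from the actual piece.

```python
def simulate_drop(start_height, floor=0.0, gravity=9.8, restitution=0.7,
                  min_speed=0.5, dt=0.001):
    """Drop a shape from start_height and list its collisions with the floor.

    Returns a list of (time, impact_speed) pairs, one per bounce. Each pair
    marks a moment at which a sound would be triggered.
    """
    y, v, t = start_height, 0.0, 0.0
    bounces = []
    while True:
        v -= gravity * dt          # gravity accelerates the shape downwards
        y += v * dt
        t += dt
        if y <= floor and v < 0:   # the shape has reached the floor
            impact = -v
            bounces.append((t, impact))
            if impact * restitution < min_speed:
                break              # too little energy left: the shape fades out
            y = floor
            v = impact * restitution   # rebound with reduced speed
    return bounces


bounces = simulate_drop(1.0)
for time, speed in bounces:
    print(f"bounce at {time:.2f} s, impact speed {speed:.2f}")
```

Each bounce is weaker than the one before, so a volume derived from the impact speed would diminish over time, as the bounces on screen do.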
The sounds are generated from the falling shapes using a number of predefined parameters.
Table 1 outlines the various properties of the shape and how they are matched to sound
properties.
Table 1: Audiovisual mapping in NegDrop[14]
Contour properties          Sound properties
contour area                pitch (large area = low pitch)
collision energy            volume
horizontal position         stereo pan location
compactness / pointiness    timbral brightness
When watching the performance you may be able to pick out how some of the audio parameters
are matched to the visuals, but not all of them. When I watched a video of NegDrop I
recognised how the pitch and volume were being generated but not the other audio parameters.
The NegDrop is a very effective piece, particularly in the manner in which the sounds are
generated. The fact that the released shapes behave as if they were subject to gravity
makes the piece very easy to comprehend on first viewing.
The artists/engineers have put a considerable amount of work into creating a system that is
visually and sonically engaging. It is easy to comprehend even though some of the underlying
programming is no doubt quite complicated. They have also managed to integrate the fields of
computer vision, computer music, video and performance into one composition.
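The mappings in Table 1 can be made concrete with a small sketch that turns the properties of a dropped shape into MIDI-style values between 0 and 127. The function name, the scaling ranges and the pitch range are hypothetical choices for illustration. Table 1 also does not state in which direction compactness affects brightness, so the sketch simply assumes that pointier (less compact) shapes sound brighter.

```python
def negdrop_note(area, impact_energy, x, compactness,
                 max_area=10000.0, max_energy=1.0, width=640.0):
    """Map contour properties to hypothetical MIDI-style parameters (0-127).

    area          -- contour area in pixels
    impact_energy -- energy of the collision that triggers the note
    x             -- horizontal position of the shape in pixels
    compactness   -- 1.0 for a circle, approaching 0.0 for pointy shapes
    """
    def clamp01(value):
        return max(0.0, min(1.0, value))

    size = clamp01(area / max_area)
    return {
        "pitch": round(96 - size * 60),                      # large area = low pitch
        "velocity": round(clamp01(impact_energy / max_energy) * 127),
        "pan": round(clamp01(x / width) * 127),              # 0 = left, 127 = right
        # Assumed direction: less compact (pointier) shapes are brighter.
        "brightness": round((1.0 - clamp01(compactness)) * 127),
    }


small = negdrop_note(area=0.0, impact_energy=1.0, x=0.0, compactness=1.0)
large = negdrop_note(area=10000.0, impact_energy=0.0, x=640.0, compactness=0.0)
```

Here the small shape sounds at the top of the assumed pitch range and the large one at the bottom, while the horizontal position moves the sound from the left speaker to the right.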
INNERSTAMP
The InnerStamp is another piece devised for this setup. Like the NegDrop, this piece uses
holes in the silhouettes to generate sounds and images. In the case of the InnerStamp, though,
the piece generates a continuous drone for as long as there is a hole on the screen; the camera
captures the hole and a visual is created to match it. See figure 7.
Figure 7: InnerStamp performance silhouettes[8]
The computer uses the information gained about the hole to generate sound coinciding with
the image. It does this by mapping contour parameters to audio parameters, so that the
sounds are directly related to what is happening with the silhouettes. Table 2 gives a
breakdown of how the parameters are mapped against each other.
Table 2: Audiovisual mapping in InnerStamp[14]
Contour properties                                Sound properties
contour perimeter                                 pitch (large perimeter = low pitch)
horizontal position                               stereo pan location
time since hands departed                         volume decay
perimeter-to-area ratio (i.e. non-compactness)    FM modulation index (i.e. timbral brightness)
As in the NegDrop, properties of the contour are mapped to properties of the sound, although
Table 2 shows that the details differ: pitch follows the perimeter of the contour rather than
its area, and volume decays with the time since the hands departed. The main change is in the way
that the performer can interact with the mapping. Instead of the system producing discrete sound
and vision events, the InnerStamp allows the performer to manipulate the sound and vision as a
continuous drone. This means that by changing the size of the hole, and with it the perimeter of
its contour, the performer can raise or lower the pitch. Also, by maintaining the contour size and
using the fingers to fill in the area, the timbre can be adjusted.
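A rough sketch of the Table 2 mapping is given below, treating the hole as a polygon. The perimeter and area are computed with standard geometry (the area with the shoelace formula), but the frequency range, the scaling constants and the function names are assumptions made purely for illustration.

```python
import math


def perimeter_and_area(points):
    """Perimeter and area of a closed polygon given as (x, y) vertices."""
    perimeter, twice_area = 0.0, 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1      # shoelace formula
    return perimeter, abs(twice_area) / 2.0


def innerstamp_drone(points, max_perimeter=400.0, low_hz=110.0,
                     high_hz=880.0, index_scale=10.0):
    """Map a contour to a hypothetical (frequency, FM index) pair."""
    perimeter, area = perimeter_and_area(points)
    if area == 0:
        raise ValueError("contour encloses no area")
    size = min(perimeter / max_perimeter, 1.0)
    # Large perimeter = low pitch, interpolated on a musical (exponential) scale.
    frequency = high_hz * (low_hz / high_hz) ** size
    # A higher perimeter-to-area ratio (a flatter shape) gives a brighter timbre.
    fm_index = index_scale * perimeter / area
    return frequency, fm_index


big_square = [(0, 0), (100, 0), (100, 100), (0, 100)]
small_square = [(0, 0), (10, 0), (10, 10), (0, 10)]
flat_strip = [(0, 0), (190, 0), (190, 10), (0, 10)]
```

In this sketch the flat strip has the same perimeter as the big square, so the two share a pitch, but the strip encloses far less area and therefore receives a higher modulation index. This corresponds to filling in the hole with the fingers while keeping its outline the same size.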
The program will continue to generate sound and images matched to the silhouettes as long as
a hole is maintained in the projection. Once the performer breaks the silhouette they can
no longer manipulate the output from that particular shape. The program stores the information
generated by the performer's movements, and after they have released the shape it
plays back these shapes and their associated sounds. Playback continues until the performer
removes their hands from the projection. When this happens the animations
gradually fade from the projection and the audio fades as well. Figure 8 is an example of the
system playing back an image and associated sound.
Figure 8: InnerStamp playback[14]
The performer can generate up to three shape/sound animations simultaneously. After three,
for each new animation generated an older one will disappear.
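The record-and-replay behaviour, together with the limit of three simultaneous animations, can be sketched with a bounded queue. The class names are hypothetical. The back-and-forth order of playback follows the description in the source paper [14], and the oldest animation is assumed to be the one that disappears when a fourth is added.

```python
from collections import deque


class Stamp:
    """One released shape: recorded frames replayed back and forth."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.pos = 0
        self.step = 1

    def next_frame(self):
        frame = self.frames[self.pos]
        if len(self.frames) > 1:
            # Reverse direction at either end of the recording.
            if not 0 <= self.pos + self.step < len(self.frames):
                self.step = -self.step
            self.pos += self.step
        return frame


class InnerStampPlayer:
    """Holds at most max_stamps animations; a new one pushes out the oldest."""

    def __init__(self, max_stamps=3):
        self.stamps = deque(maxlen=max_stamps)

    def release(self, frames):
        self.stamps.append(Stamp(frames))

    def tick(self):
        # One playback step: the current frame of every active stamp.
        return [stamp.next_frame() for stamp in self.stamps]


player = InnerStampPlayer()
player.release(["a1", "a2", "a3"])
sequence = [player.tick()[0] for _ in range(6)]
```

In practice each frame would hold the contour and its sound parameters at one moment of the recording; plain strings are used here only to keep the example short.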
The InnerStamp piece is similar to the NegDrop in the way that it uses negative contours to
generate audio and visual output. It differs in the way that it uses this information to
generate that output.
It is not as immediately accessible as the NegDrop because it is not as apparent how the sound
and image are linked. As the InnerStamp manipulates the data continuously, it is less clear
what is influencing what. This could be exactly what the artists were aiming for when they came
up with the idea for this piece, and if so it works very well.
3.2! 7KH1HJ'URS,QVWUXPHQW
In our !"#$%&' perIormance module. closed interior contours
(i.e. holes or negative spaces) in the perIormer`s hands are
detected by the computer vision system. and used as visual
representations oI virtual sound-producing obiects. (Such interior
contours can be made. Ior example. by enclosing an empty region
between one`s thumb and IoreIinger. as with the 'OK hand sign.)
When the perIormer breaks the contour oI the hole by separating
his Iingers. the shape is released Irom his hand and Ialls
downward as iI pulled by gravity.

Figure 6. In the !"#$%&' instrument. interior contours
become droppable virtual objects which trigger sounds
when they collide with the boundaries of the projection. [The
right-hand photograph is a time-lapse composite.]
When the virtual shape collides with the boundaries of the
projection area, it bounces rigidly off the boundary and triggers
the production of a MIDI sound whose properties are closely
coupled to certain visual aspects of the dropped shape. (The
audiovisual mappings in NegDrop are given in Table 1.) With
each bounce, the dropped object voices its sound and loses a
percentage of its kinetic energy to simulated friction; after a
while, the object lacks sufficient energy to continue bouncing and
is made to fade away. In our current implementation, virtual
objects dropped from the top of the projection bounce for
approximately five seconds.
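The energy loss per bounce can be sketched in a few lines of Python. This is only an illustration under assumed values: the loss fraction, the cut-off energy and the linear energy-to-velocity mapping are hypothetical, since the text does not state the constants or formulas actually used.

```python
def bounce_energies(initial_energy, loss_fraction=0.3, min_energy=0.05):
    """List the kinetic energy available at each successive bounce.

    loss_fraction and min_energy are hypothetical tuning values.
    """
    if not 0.0 < loss_fraction < 1.0 or min_energy <= 0.0:
        raise ValueError("loss_fraction must be in (0, 1) and min_energy > 0")
    energies = []
    energy = initial_energy
    while energy >= min_energy:
        energies.append(energy)
        energy *= 1.0 - loss_fraction  # a fixed percentage is lost per bounce
    return energies


def energy_to_velocity(energy, max_energy):
    """Map collision energy to a MIDI velocity in 1..127 (assumed linear map)."""
    ratio = max(0.0, min(1.0, energy / max_energy))
    return max(1, int(127 * ratio))
```

Each entry in the returned list stands for one bounce, so the list length gives the number of notes a dropped shape would voice before it fades away.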

Figure 7. Dropped objects inherit their initial lateral velocity
from the horizontal movement of the hand that released them.
The horizontal position of the virtual object governs the stereo
position of the sounds it produces.
Although the performer can quickly deposit a large number of
bouncing virtual shapes ("Neggs"), such that many shapes co-exist
in the projection simultaneously, the implementation of
inter-shape collisions is currently disabled, as the sounds caused
by secondary collisions between Neggs were judged to be too
chaotic.
Table 1. Audiovisual Mappings in NegDrop.
Contour Properties          Sound Properties
contour area                pitch (large → low)
collision energy            volume
horizontal position         stereo pan location
compactness / pointiness    timbral brightness
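As a rough illustration of how such contour properties might be measured and mapped, the Python sketch below computes area, perimeter and a compactness score for a polygonal contour and converts them to MIDI-style values. The note range, the `max_area` normalisation, the 0..127 pan scale and the isoperimetric compactness formula are all assumptions made for the example, not details taken from the instrument itself.

```python
import math


def polygon_area(points):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    return sum(
        math.dist(points[i], points[(i + 1) % len(points)])
        for i in range(len(points))
    )


def compactness(points):
    """Isoperimetric quotient: 1.0 for a circle, smaller for pointier shapes."""
    return 4.0 * math.pi * polygon_area(points) / polygon_perimeter(points) ** 2


def contour_to_sound(points, frame_width, max_area, low_note=36, high_note=96):
    """Hypothetical mapping: large area gives a low note, x position gives pan."""
    area_ratio = min(1.0, polygon_area(points) / max_area)
    note = high_note - round(area_ratio * (high_note - low_note))
    mean_x = sum(x for x, _ in points) / len(points)
    pan = round(127 * max(0.0, min(1.0, mean_x / frame_width)))
    return {"note": note, "pan": pan, "compactness": compactness(points)}
```

The direction of the compactness-to-brightness mapping is left open here, because the table names the pairing but not its sign.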

Instrumentally speaking, it is somewhat difficult to predict the
precise pitch which a dropped Negg will produce. Small
variations in shape area, owing to such factors as the variability in
the distance from the performer's hand to the glass platen of the
OHP, can lead to pitch variations of one or two semitones. The
NegDrop instrument is consequently a poor choice for the
performance of explicitly melodic musical material. At the same
time, it is quite easy to predict the general pitch range in which a
Negg will sound. NegDrop additionally affords very precise
control of note attack timing, as this can be directly regulated by
the distance from the performer's hand to the virtual floor. As a
result, NegDrop is a good instrument for performing textures of
note-clusters and some varieties of pitched rhythmic percussion.
Our current implementation of NegDrop uses MIDI as an
expedient means of triggering real-time sound events. Owing to
NegDrop's use of simulated physics, however, this instrument is a
good candidate for the use of physical modeling-based synthesis
techniques such as those described by O'Brien et al. in [9]. In
such a design, which we intend to pursue in a future version of the
Manual Input Sessions project, synthetic sounds would be
computed by modeling our silhouette-derived virtual objects as
elastic masses with shape-specific modes of natural vibration.
3.3 The InnerStamp Instrument
Like NegDrop, the InnerStamp performance module also uses
negative contours inside the performer's hands to generate sound.
Unlike NegDrop, however, InnerStamp presents an interaction for
the synthesis of continuous drones, rather than the triggering of
discrete notes.
When the performer of InnerStamp creates a closed negative
shape within the silhouette of her hands, this interior contour is
highlighted, and a pitched drone is heard. As long as the
performer does not rupture the shape's contour, the sound of this
drone can be continuously modified by changing various visual
properties of the contour. Flattening the contour into a long, thin
shape, for example, brightens the timbre of its drone. Changing
the perimeter of the shape from large to small causes its drone to
rise in pitch.

Figure 8. In the InnerStamp instrument, interior contours
persist after they are created.
Proceedings of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), Vancouver, BC, Canada
118
R O T U N I
The Rotuni is the final vignette created for the Manual Input Sessions. It differs from the
previous two in that it does not use negative contours (holes) in the silhouettes to generate
sounds. Instead, the images and sounds for this piece are created in a completely different manner.
When an object creates a silhouette, the computer scans this image and generates a rotating
green radar arm from its centroid. In Figure 9 you can see that the demonstration uses cutout
shapes for this piece, although any silhouette can be used.
Figure 9: Example of Rotuni using shapes[2]
As the arm rotates, a MIDI note is generated corresponding to the length of the arm: the
longer the arm, the higher the pitch. The notes are triggered at discrete time steps set by a
predefined tempo. The shape thus generates a repeating melody: if the shape remains the same,
it plays the same melody on each sweep of the arm. By using cutout shapes the performer can
predict the melody that the system will output. This could be very useful for playing
interlocking melodies composed of several different shapes.
Figure 10: How the Rotuni generates pitches[14]
In Figure 10 we can see how the program calculates the pitch, using the radius from the centre
of the shape. A spikier shape would offer more variation of pitch, while a circle would produce a
drone-like sound.
Table 3: Audiovisual mapping in Rotuni
From Table 3 we can see that the length of the radial arm is mapped to the pitch and the
horizontal position of the silhouette is mapped to the stereo position.
The Rotuni is not as interactive as the previous two pieces, but it is still an impressive use
of the technology and shows what can be done with the processing systems. It could, however,
be more useful in a performance setting: a performer using pre-made shapes for silhouettes
would have better control of the output than with the NegDrop and the InnerStamp, which are
much more susceptible to variations between performances. This would depend on what the
performer was trying to achieve (it might be that variety between performances is what is
desired).
InnerStamp uses a hybrid granular/FM synthesizer implemented
using the real-time audio affordances of Ross Bencina's
PortAudio library and Stephen Pope's CSL toolkits [1],[11].
InnerStamp consequently offers extremely precise control of pitch
and timbre. Further details about its mappings can be found in
Table 2, below.
Table 2. Audiovisual Mappings in InnerStamp.
Contour Properties                               Sound Properties
contour perimeter                                pitch (large → low)
horizontal position                              stereo pan location
time since hands departed                        volume decay
perimeter-to-area ratio (i.e. non-compactness)   FM modulation index (i.e. timbral brightness)
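To make the last row of the table concrete, the sketch below shows textbook two-operator FM synthesis in Python, with the modulation index driven by a contour's perimeter-to-area ratio. It covers only the FM half of the hybrid synthesizer, and the `scale` factor and function names are hypothetical; it is not a reconstruction of the synthesizer described in the text.

```python
import math


def modulation_index(perimeter, area, scale=1.0):
    """Hypothetical mapping: less compact shapes get a larger FM index."""
    return scale * perimeter / area


def fm_sample(t, carrier_hz, modulator_hz, index):
    """One sample of simple FM: sin(2*pi*fc*t + I * sin(2*pi*fm*t))."""
    modulator = math.sin(2.0 * math.pi * modulator_hz * t)
    return math.sin(2.0 * math.pi * carrier_hz * t + index * modulator)
```

With an index of zero the output is a plain sine wave; raising the index adds sidebands, which is heard as a brighter timbre. Note that a raw perimeter-to-area ratio depends on the size of the shape as well as its form, so a real mapping would probably normalise it.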

A unique aspect of the InnerStamp instrument is that, during the
time that the user is still holding the negative shape "inside" her
hands, the shape records all of the transformations that are
happening to it. These transformations include any and all of the
user's real-time manipulations of the contour's size, position, or
boundary shape. After the user "releases" the shape (by opening
up her hands), the shape remains in the projection and plays
back the recorded manipulations which happened to it earlier.
These transformations replay endlessly, looping back-and-forth,
until the user removes her hands from the projection, at which
point the contour's sound and image gradually fades away.
While an animating shape replays its morphological history, it
also replays its sonic history. Thus a shape which was created to
animate from large to small (and hence glide from a low drone to
a high-pitched one) will replay this sound-passage while it also
loops visually from large to small and back again.
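The back-and-forth replay can be pictured as a "ping-pong" walk over the recorded frames. The Python sketch below is a minimal illustration of that idea; the function name and the frame-index representation are assumptions for the example rather than details of the actual software.

```python
def ping_pong_indices(n_frames, n_steps):
    """Frame indices for a recording replayed forwards, then backwards, repeatedly."""
    if n_frames <= 0:
        return []
    if n_frames == 1:
        return [0] * n_steps
    period = 2 * (n_frames - 1)  # one full forward-and-back cycle
    indices = []
    for step in range(n_steps):
        k = step % period
        indices.append(k if k < n_frames else period - k)
    return indices
```

Because each recorded frame would hold both the contour and its sound parameters, walking the same index sequence replays the visual and sonic histories together.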

Figure 9. Interior contours deposited into the InnerStamp
projection replay their individual histories of movement.
The InnerStamp instrument permits up to three animating
shape-stamps to be deposited into the projection at any one time.
(Using more than three simultaneously was judged to be too chaotic.)
Each newly-introduced recording replaces the oldest active stamp.
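A fixed-capacity queue is one simple way to get this "newest replaces oldest" behaviour. The sketch below uses Python's `collections.deque` with `maxlen=3`; the class and method names are hypothetical.

```python
from collections import deque


class StampPool:
    """Holds at most `capacity` active stamps; adding another drops the oldest."""

    def __init__(self, capacity=3):
        self._stamps = deque(maxlen=capacity)

    def deposit(self, stamp):
        # deque discards the item at the opposite end once maxlen is reached
        self._stamps.append(stamp)

    def active(self):
        return list(self._stamps)
```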
3.4 The Rotuni Instrument
The Rotuni instrument develops rhythmic melodic ostinatos from
the positive contours of the performer's hands, or any other
opaque objects which are placed onto the glass platen of the
system's overhead projector. Unlike NegDrop or InnerStamp, it is
not necessary for the performer of Rotuni to create an interior
(negative) contour in order for the system to produce sound.

Figure 10. The Rotuni instrument generates a rhythmic
melody for each positive silhouette contour it identifies.
Users play Rotuni by placing their hands or other objects on the
glass surface of the overhead projector. The outline contours of
the individual objects are individually segmented and tracked by
the computer. These silhouettes are then digitally re-projected
onto the projection screen, but with the significant addition of a
virtual "clock arm" similar to an old-fashioned radar display. This
arm extends from the centroid of each silhouette to its edge, and
rotates in discrete rhythmic time steps according to a pre-set
tempo.
As the clock arm sweeps around the contour of the silhouette, a
MIDI note is triggered whose pitch is proportional to the length of
the clock arm at that time-step. Thus, for example, circular shapes
yield drone-like pulses, while shapes with odd protuberances (like
fingers) create high notes when the clock arm sweeps past a finger
(and lower notes otherwise). The Rotuni is polyphonic, since each
silhouette yields its own melody.
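One way to picture the sweep is as a ray cast from a fixed origin towards a polygonal outline at evenly spaced angles, with the distance to the outline converted to a note number. The Python sketch below makes several assumptions that the text does not confirm: the contour is a polygon, the nearest crossing defines the arm length, the pitch mapping is linear with hypothetical `base_note` and `semitones_per_unit` constants, and a ray that misses the outline is treated as silence.

```python
import math


def arm_length(polygon, origin, angle):
    """Distance from origin to the nearest polygon edge along the given angle.

    Returns None when the ray never meets the outline.
    """
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    nearest = None
    for i, (px, py) in enumerate(polygon):
        qx, qy = polygon[(i + 1) % len(polygon)]
        ex, ey = qx - px, qy - py
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:  # ray is parallel to this edge
            continue
        wx, wy = px - ox, py - oy
        t = (wx * ey - wy * ex) / denom  # distance along the ray
        u = (wx * dy - wy * dx) / denom  # position along the edge, 0..1
        if t > 1e-9 and -1e-9 <= u <= 1.0 + 1e-9:
            if nearest is None or t < nearest:
                nearest = t
    return nearest


def sweep_notes(polygon, origin, steps=8, base_note=48, semitones_per_unit=12.0):
    """One note (or None for silence) per discrete time step of a full rotation."""
    notes = []
    for k in range(steps):
        length = arm_length(polygon, origin, 2.0 * math.pi * k / steps)
        if length is None:
            notes.append(None)
        else:
            notes.append(base_note + round(semitones_per_unit * length))
    return notes
```

A square centred on the origin gives the same arm length at the four axis directions and a longer one towards each corner, so an eight-step sweep alternates between two pitches.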

Figure 11. The pitch produced on a given beat is proportional
to the length of the shape's rotating radial arm.
Each silhouette, moreover, yields a melody which is unique to its
form. Cardboard cutout shapes can be designed, therefore, which
yield predictable melodies when placed into the system. In The
Manual Input Sessions performance, we employ a combination of
malleable silhouettes from our hands, fixed cutout cardboard
shapes, and everyday objects (such as coins, scissors, keys, and
PC mice) when playing the Rotuni instrument.
Table 3. Audiovisual Mappings in Rotuni.
Contour Properties               Sound Properties
length of sweeping radial arm    pitch (short → low)
horizontal position              stereo pan location
contour ID number                MIDI timbre selection

Rotuni offers an intuitive interface for controlling melodic
material in a rhythmic context. It is even possible to perform
musical rests in Rotuni's otherwise periodic beat, by creating
C-shaped silhouettes whose centroids lie outside the shape's
boundary. Regrettably, our current implementation of this
instrument does not provide any other interface mechanism for
modulating its volume dynamics, or regulating its basic tempo.
Although there are obvious non-intrinsic solutions to these issues
(e.g. volume pedals and/or keyboard buttons), this is an area of
further research for us.
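That a C-shaped outline can have its centroid outside its own boundary is easy to check numerically. The Python sketch below computes the area centroid of a polygon with the standard shoelace-based formula and tests containment with ray casting; the example C shape and the function names are made up for illustration.

```python
def polygon_centroid(points):
    """Area centroid of a simple polygon (shoelace-based formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(point, points):
    """Ray-casting test: True when the point lies inside the polygon."""
    px, py = point
    inside = False
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        if (y0 > py) != (y1 > py):  # edge straddles the horizontal ray
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


# A blocky "C": a 3x3 square with a notch cut into its right-hand side.
C_SHAPE = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 2), (3, 2), (3, 3), (0, 3)]
```

For this shape the centroid falls inside the notch, so an arm anchored there starts in empty space, which is the situation the text associates with rests.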
4. CONCLUSIONS
We present several instruments that use the interior and exterior
contours of hand silhouettes, as detected and analyzed by a
computer vision system, to create and manipulate sound and
animated imagery simultaneously. Recognizing Lev Manovich's
definition of augmented reality as an "overlaying of dynamic
and context-specific information over the visual field of a user"
[7] we conclude that our instruments, which merge real-time
sound with virtual synthetic graphics and organic analog shadows,
enable a new form of live audiovisual cinema to be performed in
the hybrid locale of an augmented reality.
5. ACKNOWLEDGMENTS
A version of the Rotuni instrument was originally created in 1997
at Interval Research Corporation with the collaboration of Scott
Snibbe, Marcos Vescovi and Philippe Piernot [10]. Further
development of our instruments and performance was made
possible through support from the 2004 Whitney Biennial, The
Kitchen, the 2004 Ars Electronica Festival, and RomaEuropa
Festival 2004. We are indebted to Gregory Shakar, Andrea
Boykowycz and Nurit Bar-shai for their invaluable support and
assistance with the performances of this project.
6. REFERENCES
[1] Bencina, Ross. PortAudio sound synthesis library.
http://www.portaudio.com/.
[2] Da Fontoura Costa, L. et al. Shape Analysis and
Classification: Theory and Practice. CRC Press, 2000.
[3] Krueger, M. Artificial Reality II. Addison-Wesley, 1991.
[4] Levin, G. "Painterly Interfaces for Audiovisual
Performance." M.S. Thesis, MIT Media Laboratory, August
2000. http://acg.media.mit.edu/people/golan/thesis/.
[5] Levin, G. and Lieberman, Z. "In-Situ Speech Visualization
in Real-Time Interactive Installation and Performance."
Proc. 3rd International Symposium on Non-Photorealistic
Animation and Rendering, Annecy, France, 2004.
[6] Lyons, M., Haehnel, M., and Tetsutani, N. "The
Mouthesizer: A Facial Gesture Musical Interface."
Conference Abstracts, Siggraph 2001, Los Angeles, p. 230.
[7] Manovich, Lev. The Language of New Media. MIT Press,
2001.
[8] Axel G.E. Mulder, S. Sidney Fels and Kenji Mase. "Design
of Virtual 3D Instruments for Musical Interaction."
Proceedings of Graphics Interface 99. (Kingston, ON,
Canada, 2-4 June 1999, S. Mackenzie and J. Stewart (eds.))
pp. 76-83. Toronto, ON, Canada: University of Toronto.
[9] O'Brien, J., Cook, P., and Essl, G. "Synthesizing Sounds
from Physically Based Motion." The proceedings of ACM
Siggraph 2001, Los Angeles, California, pp. 529-536.
[10] Piernot, P., Vescovi, M., Cohen, J., Levin, G., et al. "Video
camera based computer input system with interchangeable
physical interface" (A modular tabletop surface for use with
computer-vision-based children's games). US Pats. 5953686
and 6047249. Filed 7 July 1996, issued 4 April 2000.
[11] Pope, Stephen et. al. CREATE Signal Library (CSL).
http://www.create.ucsb.edu/mailman/listinfo/csl.


Figure 12. Visual summary of the hybrid analog/digital light projection technique used in The Manual Input Sessions instruments.
Left to right: (1) Live source imagery of the performer's hand silhouettes is obtained from the overhead projector; (2) Hand
silhouettes are analyzed by a computer vision sub-system, and computer graphics (typically two-dimensional lines and polygons)
are generated in response; (3) The synthetic graphics are warped by an affine transform in order to accommodate any necessary
perspective corrections, and then projected so as to coincide with the light projection emitted by the overhead projector.
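The warping step in (3) amounts to applying a 2x3 affine matrix to every vertex of the generated graphics. The Python sketch below shows only that arithmetic; the matrix values in any real setup would presumably come from a calibration between camera and projector, which the caption does not describe, so the function name and any example coefficients are purely illustrative.

```python
def apply_affine(matrix, points):
    """Apply a 2x3 affine transform ((a, b, tx), (c, d, ty)) to 2D points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]
```

For example, the matrix `((2, 0, 10), (0, 2, -5))` doubles the size of the graphics and shifts them 10 units right and 5 units up or down, depending on the direction of the y axis.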
Conclusion
The Manual Input Sessions are a triumph in bridging the gap between art and technology.
Levin and Lieberman have used comparatively simple hardware to create an installation and
performance that explores our relationship to sounds, images and technology, in the process
blurring the distinction between the three.
There are obviously some very sophisticated processes going on in the background to create
this installation, but because it is so seamless they do not draw attention to themselves. This
is one of the primary reasons, I think, that the Manual Input concept works so well. The focus
of the user or the audience is placed solely on the output and interaction and not on the
implementation.
Part of the reason that the idea for this project is so easy to grasp is that user-created
shadows are the input. Everyone has at some stage made shapes using shadows, so this is very
familiar to us. When the user uses shadows to create images and sound, they are drawing on this
familiar experience, which is then augmented with technology.
One of the more interesting aspects of this project is its accessibility to everyone. The
interface is simple to get started with, and the visual and audio feedback is very engaging.
There also isn't a great degree of manual dexterity needed to use it. This makes it accessible
to both the very young and the very old, who should be able to create sounds and visuals
without a lengthy learning curve or any prior experience.
The two collaborators have subsequently gone on to develop and adapt this idea to create
Messa di Voce, a performance piece that extends this idea and incorporates vocal performers.
This piece further explores the system as an interactive medium for performance.
The Manual Input Sessions are a very creative use of technology. The work is a brilliant concept
and an interesting installation that explores how people interact with technology. The
interactive element is its most exciting aspect, as it opens up this technology to everyone,
not just the technophiles. I see interactivity as an area that will become increasingly
prevalent in the near future, and it will be applications like the Manual Input Sessions that
expose people to the creative side of interactivity.
Bibliography
[1] Wolf Lieser. Digital Art. Langenscheidt: h.f. ullmann. 2009. pp. 251-53
[2] Flong. The Manual Input Workstation. http://www.flong.com/projects/miw/
[3] Flong. The Manual Input Sessions. http://www.flong.com/projects/mis/
[4] Wikipedia. Golan Levin. http://en.wikipedia.org/wiki/Golan_Levin
[5] Carnegie Mellon Design. Faculty.
http://www.design.cmu.edu/show_person.php?t=f&id=GolanLevin
[6] La fondation Daniel Langlois. The Manual Input Workstation - Documentary Collection.
http://www.fondation-langlois.org/html/e/page.php?NumPage=2220
[7] The System Is. Manual Input Station. http://thesystemis.com/projects/manual-input-station/
[8] Levin, G. & Lieberman, Z. Tmema. http://www.tmema.org/mis/
[9] TED. Sound As Art. http://www.ted.com/talks/golan_levin_on_software_as_art.html
[10] Sound On Sound. Tomorrow's Musicians & What They'll Be Playing.
http://www.soundonsound.com/sos/jan06/articles/nime.htm
[11] YesYesNo. Interactive Projects. http://yesyesno.com/zachary-lieberman
[12] Fast Company. The 100 Most Creative People in Business 2010.
http://www.fastcompany.com/100/2010/36/zachary-lieberman
[13] Wikipedia. Zachary Lieberman. http://en.wikipedia.org/wiki/Zachary_Lieberman
[14] Levin, G. & Lieberman, Z. (2005). Sounds from Shapes: Audiovisual Performance with Hand
Silhouette Contours in The Manual Input Sessions, in Proceedings of the 2005 International
Conference on New Interfaces for Musical Expression. Vancouver, Canada. pp. 115-120.
Appendix - Equipment List
Required equipment (list provided by the artists to exhibiting institutions)
- PC Computer, 3.0Ghz+ or Dual 2.2Ghz+ Intel CPU, 512MB RAM, 40GB HD
We recommend and prefer desktop computers from Dell or HP
Windows XP English recommended but not required
nVidia GeForce 7400+ graphics card with 2 outputs (DVI & VGA)
It may be helpful to have a wireless keyboard
- Video projector, 3000+ ANSI Lumen, DLP, 1024x768 native resolution
- Ceiling-mount system for video projector
- 15" LCD screen (1024x768), for administrative purposes only
- Stereo sound system (powered speakers + amplifier)
- Stereo audio cables (PC to amplifier)
- 1 long coaxial 75-ohm BNC video cable (20 meters-) for Camera to PC
- 1 12-volt DC adaptor, 500 milliamps
- 1 long VGA cable (20 meters-) for PC to video projector
- 1 short VGA cable (2 meters) for PC to LCD screen
- Special pedestal construction
- Overhead Projector
- Spare lamps for Overhead Projector
Additional equipment (the artists may provide some of these items)
- Sony SSCM-183 B&W Security Camera
- IR-pass filter for camera (Kodak Wratten 87C gelatine filter, 1 square inch)
- Bogen/Manfrotto Camera Clamp with Quick-Release Head
- Pinnacle PCTV video capture PCI card for PC
- Rosco #349 Fischer Fuchsia pink gel sheet
- Special cardboard numbers and shapes
Exhibition room requirements:
- Medium-dim room with no daylight
- Light level in the room should be constant, and not fluctuate significantly [6]