9-1997
Belinda Thom
Carnegie Mellon University
David Watson
Carnegie Mellon University
This Conference Proceeding is brought to you for free and open access by the School of Computer Science at Research Showcase @ CMU. It has been accepted for inclusion in the Computer Science Department collection by an authorized administrator of Research Showcase @ CMU. For more information, please contact research-showcase@andrew.cmu.edu.
A Machine Learning Approach to Musical Style
Recognition
Roger B. Dannenberg, Belinda Thom, and David Watson
School of Computer Science, Carnegie Mellon University
{rbd,bthom,dwatson}@cs.cmu.edu
Abstract
Much of the work on perception and understanding of music by computers has focused on low-level perceptual features such as pitch and tempo. Our work demonstrates that machine learning can be used to build effective style classifiers for interactive performance systems. We also present an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed. Finally, we describe a reliable real-time performance style classifier.
We want our classifier to exhibit a latency of 5 seconds, so it must only use 5 seconds of data. Therefore, features are computed over only the most recent 5 seconds of input. The classifier assigns the class with the least normalized distance from the class means, where C indexes classes, i indexes features, and μ_{C,i} are the class-conditional feature means.
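The distance itself did not survive in the text above, but the Discussion notes that the naive Bayesian classifier "makes decisions based on normalized distances from means." One standard form consistent with that description is the following, where the per-class standard deviations σ_{C,i} used as normalizers are an assumption:

```latex
\[
  d_C \;=\; \sum_i \left( \frac{x_i - \mu_{C,i}}{\sigma_{C,i}} \right)^{2}
\]
```

Here x_i is the value of feature i computed from the current 5 seconds of input, and the class with the least d_C is chosen.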
8    90.0    84.3    77.0

Table 1: Percentage of correct classifications by different classifiers.
(equivalent to a linear classifier). During training, hidden units are added with connections to inputs, outputs, and any previously added hidden units. Effectively, there are many hidden layers, each with one node. Every hidden unit applies a non-linear sigmoid function to a weighted sum of its inputs. There is one output per class; outputs are regarded as boolean values that predict membership in a given class.

Figure 2: Histogram of the number of notes in each sample for a given style. (x-axis: Number of Notes; y-axis: # Training Examples.)
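The incremental construction described above can be sketched as follows. This is an illustration of the topology only, not the authors' implementation: the class name and random initialization are assumptions, and the cascade-style training of each new unit is omitted.

```python
import math
import random

def sigmoid(x):
    """Non-linear squashing function applied by every hidden and output unit."""
    return 1.0 / (1.0 + math.exp(-x))

class CascadeNet:
    """Cascade-style network: each new hidden unit connects to the inputs and
    to every previously added hidden unit; each output connects to the inputs
    and to all hidden units, giving one boolean-style output per class."""

    def __init__(self, n_inputs, n_outputs, seed=0):
        self.n_inputs = n_inputs
        self.rng = random.Random(seed)
        self.hidden = []  # list of (weights, bias), one entry per hidden unit
        # each output starts with one weight per input, plus a bias
        self.out = [([self.rng.uniform(-1, 1) for _ in range(n_inputs)],
                     self.rng.uniform(-1, 1))
                    for _ in range(n_outputs)]

    def add_hidden_unit(self):
        # the new unit sees the inputs plus all previously added hidden units
        fan_in = self.n_inputs + len(self.hidden)
        self.hidden.append(([self.rng.uniform(-1, 1) for _ in range(fan_in)],
                            self.rng.uniform(-1, 1)))
        # every output gains one connection to the new hidden unit
        for weights, _bias in self.out:
            weights.append(self.rng.uniform(-1, 1))

    def forward(self, x):
        # activations accumulate: inputs first, then each hidden unit in order,
        # so later units effectively form additional one-node hidden layers
        acts = list(x)
        for weights, bias in self.hidden:
            acts.append(sigmoid(bias + sum(w * a for w, a in zip(weights, acts))))
        # one sigmoid output per class, predicting membership in that class
        return [sigmoid(bias + sum(w * a for w, a in zip(weights, acts)))
                for weights, bias in self.out]
```

With 13 features and 4 classes as in the paper, `CascadeNet(13, 4)` plus a few `add_hidden_unit()` calls reproduces the many-hidden-layers-of-one-node structure described above.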
4 Results
All three classifiers produced excellent results with 4 classes and impressive results with 8 classes. We measured performance by training a classifier on 4/5 of the data and then classifying the remaining 1/5. (Since the data is in groups of 6 overlapping, and therefore correlated, training examples, entire groups went into either the training or the validation sets.) This was repeated 5 times with different partitions of the data so that each example was classified once. The classifier outputs Yes or No for each of 4 or 8 classes, and we report the total percent of correct answers. We were surprised by the reliability of the classifiers. Table 1 shows the numerical results. Training time for the Bayesian and Linear classifiers occurred in seconds, but the neural net took hours.

We implemented a real-time version of the naive Bayesian classifier. A circular buffer stores the last five seconds of MIDI data, and every second, features are computed and a classification is made. A fifth classification, "silence," was added for the case where the number of notes is zero. The execution time is well under 1 ms per classification, so computation cost is negligible.

Note that many of the misclassifications in the 8-class case involve the "quote" style, which one would expect to require knowledge of familiar melodies and some encoding of melody into the features.

Figure 3: Average duty cycle versus number of notes (scatter plot). (x-axis: Number of Notes; y-axis: average duty cycle in percent.)

5 Discussion

Our work has demonstrated that machine learning techniques can be used to build effective, efficient, and reliable style classifiers for interactive performance systems. However, we wondered why these systems worked well when hand-coded approaches have failed. We also wondered under what circumstances these machine learning systems would also fail.

To answer the first question, we have attempted to visualize the data by constructing histograms and scatter plots. Since there are 13 features, one can only look at projections onto one or two dimensions. Figure 2 shows a histogram of the number of notes feature. The number of notes per second is a common input parameter in performance systems because it has a high correlation with the concept of "frantic" or "fast." However, as Figure 2 shows, there is considerable overlap among the histograms even though the class ratings are almost all mutually exclusive. Other histograms show similar overlap, so classifications based on a single feature are not very useful.

Figure 3 shows a scatter plot of two dimensions: average duty cycle and number of notes. Here, the four classes almost separate. Given Figure 3, can we still argue that machine learning is necessary? To build a good classifier without machine learning, one would first need to identify good features. Individual features do not work well, and in our study there are 132 different pairs of features one might consider. Most pairs of features will not lead to good classifiers, and even if a good pair is known, parameters must be chosen accurately. Casually guessing at good features or combinations, plugging in "reasonable" parameters, and testing will almost always fail. In contrast, machine learning approaches that automatically take into account a large body of many-dimensional training data lead to very effective classifiers.

It is worth noting that we never hand-tuned the original 13 features; rather, we allowed the inference algorithms to determine the effective use of the data. The classification of stylistic intent is relatively simple for machine learning algorithms (at least in the 4-class case), which allows one to spend less time on problems of representation.

An experiment with our naive Bayesian classifier suggests that simple confidence measures can dramatically reduce false positives. Recall that this classifier makes decisions based on normalized distances from means. If the distances to the means of two classes are nearly equal, our confidence in the decision should be reduced. Therefore, we simply reject classifications when the least distance is not less than a given fraction of the next-to-least distance. This type of analysis is one of the beauties of a statistical approach. Figure 4 illustrates the reduction of false positives using this technique.

Figure 4: As the confidence threshold increases, the total number of classified examples decreases, but the number of misclassified examples decreases faster, so the ratio of correctly classified to all classified examples increases. (Curves: Correct/Classed, Correct/Total, Misclassed/Total; y-axis: Percent; x-axis: Confidence Threshold, from 1 to 4.)

6 Summary and Conclusion

Machine learning techniques are a powerful approach to music analysis. We constructed a database of training examples and used it to train systems to classify improvisational style. The features used for classification are simple parameters that can be extracted from MIDI data in a straightforward way. The styles that are classified consist of a range of performance intentions: "frantic," "lyrical," "pointillistic," "syncopated," "high," "low," "quote," and "blues." The first four of these styles are classified with better than 98% accuracy. Eight classifiers, trained to return "yes" or "no" for each of the eight styles, had an overall accuracy of 77% to 90%. Confidence measures can be introduced to reduce the number of false positives. Further work is required to study other classification problems, feature selection, and feature learning. We believe this work has applications in music performance, composition, analysis, and education, to mention just a few. We expect much more sophisticated music understanding systems in the future.
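The confidence-based rejection rule described in the Discussion (reject a classification when the least normalized distance is not sufficiently smaller than the next-to-least) can be sketched as follows. This is an illustrative implementation, not the authors' code: the function names, the two-feature example data, and the threshold value are assumptions.

```python
import math

def fit_nearest_mean(examples):
    """Estimate per-class feature means and standard deviations.
    examples: dict mapping class name -> list of feature vectors."""
    stats = {}
    for cls, vecs in examples.items():
        n = len(vecs)
        dims = len(vecs[0])
        means = [sum(v[i] for v in vecs) / n for i in range(dims)]
        # population standard deviation, floored to avoid division by zero
        sds = [max(1e-9, math.sqrt(sum((v[i] - means[i]) ** 2 for v in vecs) / n))
               for i in range(dims)]
        stats[cls] = (means, sds)
    return stats

def classify(stats, x, conf_threshold=1.2):
    """Return the class with the least normalized distance from its means,
    or None when that distance is not less than the next-to-least distance
    divided by the confidence threshold (i.e., the decision is ambiguous)."""
    dists = sorted(
        (sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, means, sds)), cls)
        for cls, (means, sds) in stats.items())
    best, second = dists[0], dists[1]
    if best[0] * conf_threshold >= second[0]:
        return None  # not confident: reject the classification
    return best[1]
```

Raising `conf_threshold` rejects more borderline examples, reproducing the trade-off in Figure 4: fewer total classifications, but a higher fraction of the remaining ones correct.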