
A Statistical Analysis of Tonal Harmony

By David Temperley
2009
------------

Overview
It is generally believed that harmony in common-practice music (i.e. 18th and 19th century Western art
music) is characterized by certain basic principles. Dominant harmonies (V and vii) go to tonics (I),
predominants (IV and ii) go to dominants, root motion by descending fifth is especially favored, and so
on. But to what extent are these principles actually followed in common-practice composition? There has
been surprisingly little empirical study of this question. [1]
This page presents a statistical analysis of harmonic progressions in a corpus of common-practice music.
The data files and programs used can be downloaded at the bottom of the page.
The data comes from the workbook accompanying Stefan Kostka and Dorothy Payne's theory textbook
Tonal Harmony, 3rd edition (McGraw-Hill, 1995). The workbook contains a number of excerpts of
common-practice pieces, to be analyzed by the student; an accompanying instructor's manual contains
"correct" analyses done by the textbook authors, in conventional Roman numeral notation. The analyses
also show modulations, and represent each chord in relation to the local key.
I created a corpus consisting of all of the analyzed excerpts in the workbook of 8 measures or more in
length; there were 46 such excerpts. I call this the "Kostka-Payne corpus." (A list of the excerpts is shown
here.) I created midifiles and "notefiles" (textfiles listing the notes with pitches and on/off times) of all the
excerpts. (This was done in connection with the testing of the Melisma music analysis system; the
notefiles and midifiles are available at the Melisma ftp site.) The harmonic analyses of the excerpts were
computationally encoded by Bryan Pardo, and added to the midifiles (these midifiles are available at
Pardo's website). I then converted Pardo's analyses into another format, which I call "chord-list" format.
The beginning of a chord-list (for the opening of the Minuet in G major from the Notebook for Anna
Magdalena Bach) is shown here:
0.000  2.608  -  0  1  7  7
2.608  3.913  -  5  4  7  0
3.913  5.217  -  0  1  7  7
5.217  6.521  - 11  7  7  6


Each line represents a chord segment. The first number indicates the beginning of the segment, in
seconds. (For each excerpt, I chose a tempo that I thought was reasonable, and then generated times for
the chord segments using this tempo.) The second number represents the end time of the segment.
Following this are four integers. The first is the "chromatic relative root": the chromatic interval from the
tonic up to the root. I use the usual pitch-class notation for intervals: I = 0, bII (or #I) = 1, II = 2, etc. The
second integer indicates the "diatonic relative root" - the Roman numeral number (I = 1, bII = 2, II = 2,
etc.). The third number indicates the tonic (assuming the usual pitch-class notation: C = 0, Db/C# = 1,
etc.), and the fourth number indicates the _absolute_ root (again assuming the usual pitch-class notation).
So the first chord statement above indicates I in the key of G major - a G major chord, in absolute terms.
(Applied chords were relabeled in relation to the local key: for example, V/V was converted to II.)
Note that this format contains no information about the quality of chords (major/minor/diminished) or
extensions (e.g. sevenths, ninths). This information is available in Pardo's midifiles, but I did not encode
it. [2]
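
As a rough illustration of the format just described, here is a minimal Python sketch that reads one chord statement and checks that the absolute root equals the tonic transposed by the chromatic relative root. It is not part of tally.pl; the names ChordSegment, parse_chord_line and is_consistent are made up for this example, and the treatment of the lone "-" token as a skippable separator is an assumption based on the sample above.

```python
from typing import NamedTuple


class ChordSegment(NamedTuple):
    start: float          # segment start, in seconds
    end: float            # segment end, in seconds
    chromatic_root: int   # semitones from the tonic up to the root (I = 0)
    diatonic_root: int    # Roman numeral number (I = 1)
    tonic: int            # pitch class of the local tonic (C = 0)
    absolute_root: int    # pitch class of the chord root (C = 0)


def parse_chord_line(line: str) -> ChordSegment:
    # Assumption: fields are whitespace-separated and a lone "-" token
    # is only a separator, so it is dropped before reading the numbers.
    fields = [f for f in line.split() if f != "-"]
    start, end = float(fields[0]), float(fields[1])
    chrom, diat, tonic, absolute = (int(f) for f in fields[2:6])
    return ChordSegment(start, end, chrom, diat, tonic, absolute)


def is_consistent(seg: ChordSegment) -> bool:
    # Only meaningful for chords with an assigned root (not the -1 label).
    return (seg.tonic + seg.chromatic_root) % 12 == seg.absolute_root


sample = [
    "0.000 2.608 - 0 1 7 7",
    "2.608 3.913 - 5 4 7 0",
    "3.913 5.217 - 0 1 7 7",
    "5.217 6.521 - 11 7 7 6",
]
segments = [parse_chord_line(line) for line in sample]
for seg in segments:
    print(seg, is_consistent(seg))
```

For the second statement, for example, the tonic is 7 (G) and the chromatic relative root is 5 (IV), giving an absolute root of (7 + 5) mod 12 = 0, i.e. C.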

The file kp-chord-list contains the chord-lists for the complete KP corpus. The title of each excerpt (using
the short names shown in the corpus list) is indicated at the beginning of the excerpt. Dotted lines "---"
separate one key section from another. ("Pivot chords" - chords at key boundaries that function in both the
previous key and the following one - are represented in both key sections.) I also separated the corpus into
major-key and minor-key key sections; the file kp-chord-list-ma includes just the major-key ones, and
kp-chord-list-mi includes just the minor-key ones.
A few chords in the corpus were given chord symbols for which there is no widely accepted root, such as
"German 6th". For such chords, the label -1 is used for the chromatic, diatonic, and absolute roots.

Some Aggregate Statistics


Once I had the KP corpus in "chord-list" form, I wrote a Perl script, tally.pl, which extracts various
kinds of aggregate statistics.
The corpus contains 919 chords, and a total time of 1354.116 seconds.
First I extracted the total count of each chromatic relative root, and the total amount of time spent on that
root.
Root   count   proportion   proportion        total time   proportion
                            excluding tonic   (secs)
I        318   0.346        ---               553.792      0.409
bII       17   0.018        0.029              29.805      0.022
II       104   0.113        0.180             118.766      0.088
bIII      10   0.011        0.017              16.668      0.012
III       21   0.023        0.036              25.104      0.019
IV        70   0.076        0.121              91.622      0.068
#IV       17   0.018        0.029              18.652      0.014
V        214   0.233        0.370             302.102      0.223
bVI       34   0.037        0.059              44.383      0.033
VI        50   0.054        0.087              76.706      0.057
bVII       6   0.007        0.010               8.301      0.006
VII       35   0.038        0.061              37.552      0.028


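(The first "proportion" column shows the count of the chord as a proportion of the total count; the
"proportion excluding tonic" column shows the count as a proportion of all non-tonic chords that have an
assigned root; the final "proportion" column shows the time spent on the chord as a proportion of the total
time.)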
There were also 23 "miscellaneous" chords, not assigned any explicit root (such as augmented-sixth
chords), taking a total time of 30.663 seconds. (These are assigned chromatic root of -1 in the chord list;
diatonic root and absolute root are also -1.)
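
The proportions in the table can be reproduced from the counts and times with a few lines of arithmetic. The following Python sketch is only an illustration (the variable and function names are invented here, and this is not the code of tally.pl). It assumes that the "excluding tonic" denominator is the number of rooted, non-tonic chords (896 - 318 = 578), which is the value that matches the published figures.

```python
counts = {"I": 318, "bII": 17, "II": 104, "bIII": 10, "III": 21, "IV": 70,
          "#IV": 17, "V": 214, "bVI": 34, "VI": 50, "bVII": 6, "VII": 35}
times = {"I": 553.792, "bII": 29.805, "II": 118.766, "bIII": 16.668,
         "III": 25.104, "IV": 91.622, "#IV": 18.652, "V": 302.102,
         "bVI": 44.383, "VI": 76.706, "bVII": 8.301, "VII": 37.552}

TOTAL_CHORDS = 919       # includes the 23 chords with no assigned root
TOTAL_TIME = 1354.116    # seconds, whole corpus

rooted = sum(counts.values())        # chords with an assigned root
non_tonic = rooted - counts["I"]     # assumed denominator for "excluding tonic"


def count_proportion(root):
    return round(counts[root] / TOTAL_CHORDS, 3)


def proportion_excluding_tonic(root):
    return round(counts[root] / non_tonic, 3)


def time_proportion(root):
    return round(times[root] / TOTAL_TIME, 3)


for root in counts:
    print(root, count_proportion(root), time_proportion(root))
```

Note that the rooted chords (896) plus the 23 miscellaneous chords give the stated total of 919.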
Then I looked at the "chord transitions" -- the number of times each chord moves to each other chord.
"Antecedent" chords are shown on the vertical axis, "consequent" chords on the horizontal; for example,
the number of occurrences of I moving to II is 31. (The data only reflects transitions within a single key
section; no transition is recorded for moves from one key section to another.)
CHROMATIC ROOT TRANSITION COUNTS
      Cons
Ant       I  bII   II bIII  III   IV  #IV    V  bVI   VI bVII  VII
I         0    7   31    1    4   45    2  116   11   17    3   19
bII       3    0    8    0    0    0    1    2    0    0    0    1
II       22    3    0    1    4    1    7   45    2    8    0    6
bIII      1    1    0    0    0    0    0    4    4    0    0    0
III       1    0    2    0    0    7    0    1    0    7    0    1
IV       32    2   10    0    4    0    3   11    0    1    1    4
#IV       7    0    0    0    0    0    0    9    0    0    0    0
V       167    0    8    1    2    4    0    0    7    6    0    2
bVI       5    2    8    0    1    3    0    2    0    3    2    0
VI        4    2   28    0    1    4    2    1    0    0    0    1
bVII      0    0    0    5    0    0    0    1    0    0    0    0
VII      27    0    0    0    3    0    1    1    1    0    0    0

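The counting rule described above (each chord paired with the chord that follows it, with no pair recorded across a key-section boundary) can be sketched in Python as follows. This is an illustrative reimplementation, not tally.pl itself; the function name and the toy input are invented, and the sketch gives no special treatment to chords labeled -1.

```python
from collections import Counter


def count_transitions(key_sections):
    """Count (antecedent, consequent) root pairs within each key section."""
    transitions = Counter()
    for roots in key_sections:
        # zip pairs each chord with its successor inside this section only,
        # so the last chord of one section is never paired with the first
        # chord of the next.
        for ant, cons in zip(roots, roots[1:]):
            transitions[(ant, cons)] += 1
    return transitions


# Made-up chromatic relative roots for two key sections (not corpus data).
sections = [[0, 5, 0, 7, 0], [0, 2, 7, 0]]
transitions = count_transitions(sections)
print(transitions[(7, 0)])   # number of V-I motions in the toy data
```

In the toy data the first section ends on I and the second begins on I, but no I-I transition is recorded, because the two chords belong to different key sections.
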

It is useful to represent this data in two other ways. First, we represent chromatic root transitions as a
proportion of the total count for the consequent chord. The values in each column sum to 1; thus one can
see, for example, that I is approached by V 62.1% of the time.
CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR CONSEQUENT CHORD
      Cons
Ant         I    bII     II   bIII    III     IV    #IV      V    bVI     VI   bVII    VII
I       0.000  0.412  0.326  0.125  0.211  0.703  0.125  0.601  0.440  0.405  0.500  0.559
bII     0.011  0.000  0.084  0.000  0.000  0.000  0.062  0.010  0.000  0.000  0.000  0.029
II      0.082  0.176  0.000  0.125  0.211  0.016  0.438  0.233  0.080  0.190  0.000  0.176
bIII    0.004  0.059  0.000  0.000  0.000  0.000  0.000  0.021  0.160  0.000  0.000  0.000
III     0.004  0.000  0.021  0.000  0.000  0.109  0.000  0.005  0.000  0.167  0.000  0.029
IV      0.119  0.118  0.105  0.000  0.211  0.000  0.188  0.057  0.000  0.024  0.167  0.118
#IV     0.026  0.000  0.000  0.000  0.000  0.000  0.000  0.047  0.000  0.000  0.000  0.000
V       0.621  0.000  0.084  0.125  0.105  0.062  0.000  0.000  0.280  0.143  0.000  0.059
bVI     0.019  0.118  0.084  0.000  0.053  0.047  0.000  0.010  0.000  0.071  0.333  0.000
VI      0.015  0.118  0.295  0.000  0.053  0.062  0.125  0.005  0.000  0.000  0.000  0.029
bVII    0.000  0.000  0.000  0.625  0.000  0.000  0.000  0.005  0.000  0.000  0.000  0.000
VII     0.100  0.000  0.000  0.000  0.158  0.000  0.062  0.005  0.040  0.000  0.000  0.000

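To show how one column of this table is derived, the following Python sketch divides the counts of each antecedent moving to I (the "I" column of the transition counts table) by that column's total. The variable names are illustrative only.

```python
ROOTS = ["I", "bII", "II", "bIII", "III", "IV",
         "#IV", "V", "bVI", "VI", "bVII", "VII"]

# Counts of each antecedent root moving to I, in the order of ROOTS.
to_tonic = [0, 3, 22, 1, 1, 32, 7, 167, 5, 4, 0, 27]

column_total = sum(to_tonic)   # all transitions that arrive on I
approach = {root: round(n / column_total, 3)
            for root, n in zip(ROOTS, to_tonic)}

print(column_total)
print(approach["V"])   # share of arrivals on I that come from V
```

The column total is 269, so V accounts for 167/269 of all approaches to I, which rounds to the 0.621 quoted above.
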

Now the same for the antecedent chord: each row sums to 1. For example, I moves to V 45.3% of the
time.
CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR ANTECEDENT CHORD
      Cons
Ant         I    bII     II   bIII    III     IV    #IV      V    bVI     VI   bVII    VII
I       0.000  0.027  0.121  0.004  0.016  0.176  0.008  0.453  0.043  0.066  0.012  0.074
bII     0.200  0.000  0.533  0.000  0.000  0.000  0.067  0.133  0.000  0.000  0.000  0.067
II      0.222  0.030  0.000  0.010  0.040  0.010  0.071  0.455  0.020  0.081  0.000  0.061
bIII    0.100  0.100  0.000  0.000  0.000  0.000  0.000  0.400  0.400  0.000  0.000  0.000
III     0.053  0.000  0.105  0.000  0.000  0.368  0.000  0.053  0.000  0.368  0.000  0.053
IV      0.471  0.029  0.147  0.000  0.059  0.000  0.044  0.162  0.000  0.015  0.015  0.059
#IV     0.438  0.000  0.000  0.000  0.000  0.000  0.000  0.562  0.000  0.000  0.000  0.000
V       0.848  0.000  0.041  0.005  0.010  0.020  0.000  0.000  0.036  0.030  0.000  0.010
bVI     0.192  0.077  0.308  0.000  0.038  0.115  0.000  0.077  0.000  0.115  0.077  0.000
VI      0.093  0.047  0.651  0.000  0.023  0.093  0.047  0.023  0.000  0.000  0.000  0.023
bVII    0.000  0.000  0.000  0.833  0.000  0.000  0.000  0.167  0.000  0.000  0.000  0.000
VII     0.818  0.000  0.000  0.000  0.091  0.000  0.030  0.030  0.030  0.000  0.000  0.000

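Read row by row, this table can be treated as an estimate of how likely each consequent is given the antecedent. The Python sketch below normalizes the I and V rows of the transition counts table and picks out the most frequent continuation of each; the names are illustrative, and only two of the twelve rows are included to keep the example short.

```python
ROOTS = ["I", "bII", "II", "bIII", "III", "IV",
         "#IV", "V", "bVI", "VI", "bVII", "VII"]

# Two rows of the transition counts table (antecedent -> counts by consequent).
rows = {
    "I": [0, 7, 31, 1, 4, 45, 2, 116, 11, 17, 3, 19],
    "V": [167, 0, 8, 1, 2, 4, 0, 0, 7, 6, 0, 2],
}


def row_proportions(row):
    total = sum(row)
    return [round(n / total, 3) for n in row]


def most_common_consequent(row):
    # Index of the largest count, mapped back to its root label.
    return ROOTS[row.index(max(row))]


proportions = {ant: row_proportions(row) for ant, row in rows.items()}
print(proportions["I"][ROOTS.index("V")])   # I moving to V
print(proportions["V"][ROOTS.index("I")])   # V moving to I
print(most_common_consequent(rows["I"]), most_common_consequent(rows["V"]))
```

The two rows make the asymmetry between the chords concrete: V continues to I in 167 of its 197 transitions, whereas I continues to V in only 116 of its 256.
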

As a final analysis, we consider the counts of different root interval motions. The left column below
shows each chromatic interval (+m2 = ascending minor second, +M2 = ascending major second, etc.)
along with its count. The right column groups these into diatonic intervals. (Each interval is represented
by its smallest possible form; so a descending fifth is represented as an ascending fourth, +P4.)
INTERVAL COUNTS
Chromatic      Diatonic
+m2    72      +M/m2   127
+M2    55
+m3     7      +M/m3    32
+M3    25
+P4   308      +P4     308
-TT    25      TT       25
-P4   167      -P4     167
-M3    21      -M/m3    64
-m3    43
-M2    34      -M/m2    65
-m2    31

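The interval counts follow directly from the transition counts: each antecedent/consequent pair is a root motion of (consequent - antecedent) mod 12 semitones, folded into its smallest form. The Python sketch below is an illustration of that bookkeeping rather than the code of tally.pl; the names are invented, and the matrix is the transition counts table given earlier.

```python
# Transition counts: rows are antecedents, columns are consequents, both in
# the order I, bII, II, bIII, III, IV, #IV, V, bVI, VI, bVII, VII.
COUNTS = [
    [0, 7, 31, 1, 4, 45, 2, 116, 11, 17, 3, 19],   # I
    [3, 0, 8, 0, 0, 0, 1, 2, 0, 0, 0, 1],          # bII
    [22, 3, 0, 1, 4, 1, 7, 45, 2, 8, 0, 6],        # II
    [1, 1, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0],          # bIII
    [1, 0, 2, 0, 0, 7, 0, 1, 0, 7, 0, 1],          # III
    [32, 2, 10, 0, 4, 0, 3, 11, 0, 1, 1, 4],       # IV
    [7, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0],          # #IV
    [167, 0, 8, 1, 2, 4, 0, 0, 7, 6, 0, 2],        # V
    [5, 2, 8, 0, 1, 3, 0, 2, 0, 3, 2, 0],          # bVI
    [4, 2, 28, 0, 1, 4, 2, 1, 0, 0, 0, 1],         # VI
    [0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0],          # bVII
    [27, 0, 0, 0, 3, 0, 1, 1, 1, 0, 0, 0],         # VII
]

# Semitones upward -> smallest-form label (7 up is written as a fourth down).
LABELS = {1: "+m2", 2: "+M2", 3: "+m3", 4: "+M3", 5: "+P4", 6: "TT",
          7: "-P4", 8: "-M3", 9: "-m3", 10: "-M2", 11: "-m2"}


def interval_counts(matrix):
    totals = {label: 0 for label in LABELS.values()}
    for ant, row in enumerate(matrix):
        for cons, n in enumerate(row):
            step = (cons - ant) % 12
            if step:   # skip the diagonal: no root change, no interval
                totals[LABELS[step]] += n
    return totals


intervals = interval_counts(COUNTS)
for label, n in intervals.items():
    print(label, n)
```

The diatonic grouping is then just a matter of adding pairs: for example, descending thirds are -M3 plus -m3, or 21 + 43 = 64.
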

Discussion
To a considerable extent, the conventional rules of harmony are supported by this data. This is perhaps
most clearly seen in the table of root transition counts. The most common root motions, in order, are V-I,
I-V, ii-V, and I-IV (the last two are equally common). All of these are standard, "correct" progressions of
tonal harmony. "Incorrect" progressions such as V-IV are generally less common.
A few things are surprising. In particular, the frequencies of ii-I and IV-I are surprisingly high. Both of
these represent "predominant-to-tonic" motions and are generally considered undesirable. IV-I
progressions do occur in certain circumstances (such as plagal cadences and I-IV-I motions expanding an
opening I) but their frequency here seems high. This appears to be largely due to cadential 6/4 chords; this
is discussed further below.
The interval counts are also of interest. Traditional theory holds that certain intervallic root motions are
preferred over others: descending fifths are most preferred (strongly favored over ascending fifths),
descending thirds over ascending thirds, and ascending seconds over descending seconds. This data
clearly shows all three of these preferences: descending fifths (+P4, 308) are much more common than
ascending fifths (-P4, 167), descending thirds (64) are more common than ascending (32), and ascending
seconds (127) are more common than descending (65). Overall, fourths are by far the most common
(475); seconds (192) are much more common than thirds (96), and tritones least common of all (25).

Aggregate Statistics (with Cadential 6/4's Reanalyzed)


A close inspection of the data revealed that the oddities noted above -- the high frequency of ii-I and IV-I --
were largely due to cadential 6/4 chords. Cadential 6/4's, which are extremely common in the KP corpus
(and in common-practice music generally), are analyzed in the Kostka-Payne text in a "two-level"
fashion: A I6/4-V is placed inside a larger V. (This is in fact a common convention; under this convention,
the cadential 6/4 is labeled as V6/4.) The encoding of the data by Pardo reflected the lower level (I6/4-V),
and the data presented above reflects that as well. However, cadential 6/4's are frequently (indeed
normally) preceded by II or IV; thus it seemed likely that this largely accounted for the high frequency of
II-I and IV-I motions. I thought that using the "V6/4" analysis might permit the conventional principles of
tonal harmony to emerge more strongly. (This is surely one reason why many people prefer the V6/4
analysis.)
The data was therefore recoded, using the higher-level (V) analysis of cadential 6/4's. That is, every two
chord statements representing a cadential I6/4 followed by a V were replaced by a single statement
representing V. The modified chord-list is kp-chord-list-2. Consider just the transition table:
      Cons
Ant       I  bII   II bIII  III   IV  #IV    V  bVI   VI bVII  VII
I         0    7   31    1    4   45    2   84   11   17    3   19
bII       2    0    8    0    0    0    1    3    0    0    0    1
II        5    3    0    1    4    1    7   62    2    8    0    6
bIII      1    1    0    0    0    0    0    4    4    0    0    0
III       1    0    2    0    0    7    0    1    0    7    0    1
IV       27    2   10    0    4    0    3   16    0    1    1    4
#IV       3    0    0    0    0    0    0   13    0    0    0    0
V       166    0    8    1    2    4    0    0    7    6    0    2
bVI       3    2    8    0    1    3    0    4    0    3    2    0
VI        4    2   28    0    1    4    2    1    0    0    0    1
bVII      0    0    0    5    0    0    0    1    0    0    0    0
VII      26    0    0    0    3    0    1    2    1    0    0    0

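The recoding step described above (a cadential I6/4 followed by V replaced by a single V) might be sketched in Python as follows. This is a hypothetical illustration only: the chord-list format carries no inversion information, so the "cadential_64" flag, the dictionary layout and the function name are all assumptions made for this example, and the procedure actually used to produce kp-chord-list-2 is not specified here.

```python
def merge_cadential_64(chords):
    """Replace each flagged I6/4 plus following V with one V statement."""
    merged = []
    i = 0
    while i < len(chords):
        cur = chords[i]
        nxt = chords[i + 1] if i + 1 < len(chords) else None
        if cur["cadential_64"] and nxt is not None and nxt["root"] == 7:
            # The merged V spans from the start of the 6/4 to the end of the V.
            merged.append({"start": cur["start"], "end": nxt["end"],
                           "root": 7, "cadential_64": False})
            i += 2
        else:
            merged.append(dict(cur))
            i += 1
    return merged


# Toy progression II - I6/4 - V - I (chromatic relative roots 2, 0, 7, 0).
progression = [
    {"start": 0.0, "end": 1.0, "root": 2, "cadential_64": False},
    {"start": 1.0, "end": 2.0, "root": 0, "cadential_64": True},
    {"start": 2.0, "end": 3.0, "root": 7, "cadential_64": False},
    {"start": 3.0, "end": 4.0, "root": 0, "cadential_64": False},
]
recoded = merge_cadential_64(progression)
print([chord["root"] for chord in recoded])
```

In the toy progression the II-I transition disappears and a II-V transition takes its place, which is the kind of shift visible in the recoded table (II-I falls by 17 while II-V rises by 17).
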

The recoding of cadential 6/4's has a significant effect. The count of II-I is reduced from 22 to 5; the count
of IV-I is reduced from 32 to 27. The top 10 transitions are now V-I; I-V; II-V; I-IV; I-II; VI-II; IV-I;
VII-I; I-VII; I-VI.
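
The ranking of transitions can be checked by sorting the cells of the recoded transition table. The Python sketch below does this; the variable names are illustrative, and the matrix is transcribed from the recoded table (rows are antecedents, columns are consequents).

```python
ROOTS = ["I", "bII", "II", "bIII", "III", "IV",
         "#IV", "V", "bVI", "VI", "bVII", "VII"]

RECODED = [
    [0, 7, 31, 1, 4, 45, 2, 84, 11, 17, 3, 19],    # I
    [2, 0, 8, 0, 0, 0, 1, 3, 0, 0, 0, 1],          # bII
    [5, 3, 0, 1, 4, 1, 7, 62, 2, 8, 0, 6],         # II
    [1, 1, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0],          # bIII
    [1, 0, 2, 0, 0, 7, 0, 1, 0, 7, 0, 1],          # III
    [27, 2, 10, 0, 4, 0, 3, 16, 0, 1, 1, 4],       # IV
    [3, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0],         # #IV
    [166, 0, 8, 1, 2, 4, 0, 0, 7, 6, 0, 2],        # V
    [3, 2, 8, 0, 1, 3, 0, 4, 0, 3, 2, 0],          # bVI
    [4, 2, 28, 0, 1, 4, 2, 1, 0, 0, 0, 1],         # VI
    [0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0],          # bVII
    [26, 0, 0, 0, 3, 0, 1, 2, 1, 0, 0, 0],         # VII
]

# Flatten the matrix into (count, antecedent, consequent) triples.
cells = [(n, ROOTS[ant], ROOTS[cons])
         for ant, row in enumerate(RECODED)
         for cons, n in enumerate(row)]

ranked = sorted(cells, key=lambda cell: cell[0], reverse=True)
top10 = [f"{ant}-{cons}" for n, ant, cons in ranked[:10]]
print("; ".join(top10))
```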
Once the "V6/4" analysis of cadential 6/4's is assumed, the conventional principles of tonal harmony
appear to be very strongly confirmed. Not a very earth-shattering conclusion (which is why I decided to
put this in a web page rather than trying to publish it!) but I think it's good to know.
A number of other comments could be made about this data. For example, compare the transitional
frequency of IV-II (10) to II-IV (1); IV-II is much more common, again confirming a conventional rule.
But I will leave further explorations to the reader. The reader could also use tally.pl to reproduce these
statistics, and to gather further statistics from the chord lists provided -- for example, analyzing major and
minor key sections separately. (In fact, the differences between the major and minor key distributions are
fairly modest. Perhaps this should not surprise us, since the primary tonic/dominant/predominant
harmonies - I, V, II, IV - are the same in both modes, and function similarly.)

Notes
1. A few sources deserve mention. Helen Budge's (1943) dissertation, "A Study of Chord Frequencies
Based on the Music of Representative Composers of the Eighteenth and Nineteenth Centuries," presents
an interesting statistical analysis of tonal harmony, systematically gathered from analyses by experts. But
only data on the frequency of individual (diatonic) chords is provided; there is no data about transitions
(motions from chord to chord). Allen Irvine McHose's (1947) study "The Contrapuntal Harmonic
Technique of the 18th Century" offers occasional statistics about the frequency of various chords and
progressions, but presents no complete data (such as tables of chord or progression frequencies). Philip
Norman's 1945 study "A Quantitative Study of Harmonic Similarities in Certain Specified Works of Bach,
Beethoven, and Wagner" has statistics about chord progressions, but he assumes a new chord on every
note - that is, he makes no allowance for non-chord-tones; this goes against the modern practice of
harmonic analysis. Dmitri Tymoczko's paper "Root Motion, Function, Scale Degree" (Musurgia 2005,
available in English at Tymoczko's website) analyzes a set of progressions from major-key Bach chorales.
Finally, David Huron, in his book Sweet Anticipation (2006), presents data about chord transitions for "a
sample of Baroque music" (pp. 250-1; no further information is given about the sample).
2. The mftext program (available at the Melisma website) can be used to extract the chord labels from
Pardo's midifiles. While I have not analyzed the labels in detail with regard to mode and inversion, I did
extract a few basic statistics. There are 949 chord labels total (this is slightly greater than my count, since
in Pardo's annotations, there may be two chords of the same root and key in succession). Chords built on
major triads (including seventh chords that contain major triads, e.g. dominant sevenths) are 68.3% of the
total; those built on minor triads, 21.2%; those built on diminished triads, 9.9%. Root-position chords are
60.7% of the total; first-inversion, 23.3%; second inversion, 12.9%; third inversion, 3.1%.

Downloads
List of excerpts in the Kostka-Payne corpus
kp-nbck This directory contains "note-beat-chord-key" files for all excerpts in the corpus: A list of notes
("Note [ontime] [offtime] [pitch]"), beats ("Beat [time] [level]"), chords ("Chord [ontime] [offtime]
[root]") and key sections ("Key [start time] [end time] [tonic] [mode:ma=0,mi=1]"). I made these as an
intermediate step towards making the "chord-lists" below. These files bring together the "beat list" and
"note list" formats that I used with the Melisma system (see the Melisma website for explanation) with the
harmonic and key information from the Kostka-Payne analyses.
Chord list (list of chord statements) for the KP corpus
Chord list for the KP corpus, major key sections only
Chord list for the KP corpus, minor key sections only
Chord list for the KP corpus with the "V6/4" analysis of cadential 6/4 chords
The "V6/4" chord-list, major-key sections only
The "V6/4" chord-list, minor-key sections only
tally.pl, a perl script for extracting aggregate data from chord lists. (The tables presented above are all
outputs of tally.pl.)
