

Algo-rhythm
Aashish Jain a)
The Harker School
(Dated: 31 October 2016)

a) Electronic mail: 17AashishJ@students.harker.org

Algo-rhythm is a recurrent neural network written using the DL4J framework, aimed at creating new improvised jazz melodies.

Algo-rhythm started as a personal project to see how well I could mimic jazz improvisation. Being a jazz guitarist, I recognized how difficult improvisation actually was, but I could never pinpoint what made the task so hard. After reading about artificial intelligence in a science fiction novel, I tried creating my own simple artificial intelligence systems. Finally, I decided to adapt a character-modelling recurrent neural network example to fit music, and Algo-rhythm was born.

I. RECURRENT NEURAL NETWORKS

A recurrent neural network (RNN) is a neural network in which past states are taken into account, essentially creating a working memory. Developed in the 1980s, recurrent neural networks are often used today for speech recognition, word prediction, and more.

RNNs include a hidden state that carries information forward from previous steps. As a result, RNNs are able to dynamically grasp the structure of data over time: the decision made at step t0 will affect the decision made at step t1.

A. The Hidden State and Backpropagation Through Time

The hidden state at time t1 is a function of the current input, modified by a weight matrix, added to the previous hidden state, which is multiplied by its own weight matrix. The weight matrices are learned by a method called backpropagation through time (BPTT).
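In symbols, this update can be written as follows (a standard formulation of the simple RNN step, included here for clarity rather than copied from the Algo-rhythm code):

h_t = \tanh(W_x x_t + W_h h_{t-1} + b)

where x_t is the current input, h_{t-1} is the previous hidden state, and W_x, W_h, and b are the learned parameters.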
BPTT is essentially the same backpropagation used in a feed-forward neural network, but modified so that it takes into account the previous time steps. For example, if a feed-forward network is the simple function f(g(x)), then the BPTT method includes a time dimension as well: the network is unrolled over its time steps, and the derivatives (for the error calculation) are calculated through the chain rule.
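Concretely, for the error E_t at step t, the gradient with respect to the recurrent weights sums a chain-rule contribution from every earlier step (a standard statement of BPTT, stated here for reference):

\frac{\partial E_t}{\partial W_h} = \sum_{k=0}^{t} \frac{\partial E_t}{\partial h_t} \left( \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right) \frac{\partial h_k}{\partial W_h}

The product of Jacobians is what makes long sequences expensive, and motivates the truncated variant below.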
Truncated BPTT (TBPTT) is a variation on BPTT which only flows gradients a certain distance backwards through time. This is useful for long sequences, and is used in the src/java/DeepLearning/GravesLSTMNoteCreation file. However, because the sequence of notes is not very large, TBPTT is not very useful for Algo-rhythm.
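For reference, truncated BPTT is enabled in DL4J roughly as follows. This is a minimal sketch against the 0.x-era DL4J API; the layer sizes, truncation lengths, and the nNotes parameter are illustrative, not the actual Algo-rhythm configuration.

import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TbpttConfigSketch {
    // nNotes = size of the note vocabulary (distinct pitches in the song)
    public static MultiLayerConfiguration build(int nNotes) {
        return new NeuralNetConfiguration.Builder()
                .list()
                .layer(0, new GravesLSTM.Builder()
                        .nIn(nNotes).nOut(200).activation("tanh").build())
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(200).nOut(nNotes).activation("softmax").build())
                // Gradients only flow back a fixed number of time steps
                .backpropType(BackpropType.TruncatedBPTT)
                .tBPTTForwardLength(50)
                .tBPTTBackwardLength(50)
                .build();
    }
}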

II. MIDI COMPOSITION

To compose new jazz melodies, I used MIDI files to train on. The convenience of MIDI files is that they can be easily parsed using the JMusic library. JMusic allows me to parse MIDI files into their respective parts, and from each part I can get the individual notes. For example, if a composition contains bass and guitar rhythms and a piano melody, I am quickly able to see which notes make up the piano melody.
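As a rough illustration of that parsing step, the following is a minimal sketch using the public JMusic API; the file name and the printing are illustrative, not the actual Algo-rhythm code.

import jm.music.data.Note;
import jm.music.data.Part;
import jm.music.data.Phrase;
import jm.music.data.Score;
import jm.util.Read;

public class PartExtractor {
    public static void main(String[] args) {
        // Load the MIDI file into a JMusic Score
        Score score = new Score();
        Read.midi(score, "composition.mid");

        // Walk every part, phrase, and note in the composition
        for (Part part : score.getPartArray()) {
            System.out.println("Part: " + part.getTitle());
            for (Phrase phrase : part.getPhraseArray()) {
                for (Note note : phrase.getNoteArray()) {
                    // Rests are Notes too; their pitch is JMusic's REST constant
                    System.out.println("  pitch=" + note.getPitch()
                            + " duration=" + note.getRhythmValue());
                }
            }
        }
    }
}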
Currently, Algo-rhythm only supports training on one song. To do so, I first parse the MIDI file into its respective components. After manually selecting the part to train on, I can begin constructing the data for the RNN. The RNN has a set of all the notes played in the song, and an array of all the notes in sequence. Then, it goes through the array, creating a probabilistic function for each note in the sequence.
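The note set and sequence encoding might be built up along these lines. This is an illustrative sketch, not the actual Algo-rhythm classes; the idea is that each distinct pitch gets a one-hot index, and the network learns the probability of the next note given the history.

import java.util.LinkedHashMap;
import java.util.Map;

public class NoteVocabulary {
    // Map each distinct pitch in the song to a one-hot index
    public static Map<Integer, Integer> buildVocabulary(int[] pitchSequence) {
        Map<Integer, Integer> pitchToIndex = new LinkedHashMap<>();
        for (int pitch : pitchSequence) {
            pitchToIndex.putIfAbsent(pitch, pitchToIndex.size());
        }
        return pitchToIndex;
    }

    // One-hot encode the sequence; the input at step t is used to predict step t+1
    public static double[][] oneHot(int[] pitchSequence, Map<Integer, Integer> vocab) {
        double[][] encoded = new double[pitchSequence.length][vocab.size()];
        for (int t = 0; t < pitchSequence.length; t++) {
            encoded[t][vocab.get(pitchSequence[t])] = 1.0;
        }
        return encoded;
    }
}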
When composing the melody, I begin on a random sequence. From there, I let the RNN take the reins and suggest notes. I add each suggested note to a list, and the final composition is the amalgamation of these notes. As rests are also considered notes in the JMusic library, not all of the time is filled with playing. Additionally, as the duration of a note is stored in the same class, Algo-rhythm can emulate complex rhythms and syncopation. For the final MIDI composition, I take this new melody and add it to the pre-existing bass and guitar rhythms. Finally, I can output a brand new MIDI file and save it wherever the user chooses. I've converted a sample generated MIDI file to MP3 and posted it on SoundCloud. Take a listen! (https://soundcloud.com/user-200024695/autumnnights)
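Assembling and saving the final score might look roughly like this. It is a sketch over the JMusic API; the class and method names are illustrative stand-ins for the actual composition code, with the suggested-note list standing in for the RNN's output.

import java.util.List;
import jm.music.data.Note;
import jm.music.data.Part;
import jm.music.data.Phrase;
import jm.music.data.Score;
import jm.util.Write;

public class MelodyWriter {
    // Combine the RNN's suggested notes with the original accompaniment
    public static void save(List<Note> suggested, Part bass, Part guitar, String outPath) {
        Phrase melody = new Phrase(0.0);  // melody starts at beat 0
        for (Note n : suggested) {
            melody.addNote(n);            // rests are Notes too, so gaps survive
        }
        Part melodyPart = new Part("Generated melody", 0);  // instrument 0 = piano
        melodyPart.addPhrase(melody);

        Score score = new Score("Algo-rhythm output");
        score.addPart(melodyPart);
        score.addPart(bass);
        score.addPart(guitar);
        Write.midi(score, outPath);       // e.g. "autumn-nights.mid"
    }
}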

III. AUDIO FEATURES

In the future, I hope to add more features to train Algo-rhythm on. Because of this, I've built additional audio analysis tools that allow me to extract features such as the frequency, spectral centroid, spectral flux, and spectral rolloff of a song. These features are outlined below:
Frequency - The frequency is calculated over a small window of samples after performing the fast Fourier transform (FFT) on that window. This calculation is written in:
src/main/java/timbre/Frequency.java.

Spectral Centroid - The spectral centroid is a calculation that indicates where the center of mass of a wave's spectrum is. It is calculated via the following (the magnitude-weighted mean frequency, where X(n) is the magnitude of FFT bin n and f(n) is that bin's center frequency):

C = \frac{\sum_n f(n)\,|X(n)|}{\sum_n |X(n)|}

This calculation is written in:
src/main/java/timbre/SpectralCentroid.java.

Spectral Flux - The spectral flux is a calculation that indicates how quickly the power spectrum of a wave is changing. It is calculated via the following (comparing each frame's spectrum with the previous frame's):

F(t) = \sum_n \left( |X_t(n)| - |X_{t-1}(n)| \right)^2

This calculation is written in:
src/main/java/timbre/SpectralFlux.java.

Spectral Rolloff - The spectral rolloff is a measure of how quickly a wave tails off towards higher frequencies. It is calculated via the following (the frequency R below which a fixed fraction, commonly 85%, of the total spectral magnitude lies):

\sum_{n \le R} |X(n)| = 0.85 \sum_n |X(n)|

This calculation is written in:
src/main/java/timbre/SpectralRolloff.java.
IV. CONCLUSION

Ultimately, the development of Algo-rhythm has led me to truly believe in the potential of artificial intelligence. The simple AI that I've created is easily capable of creating complex melodies based on just a single song. Who knows how far I can take my jazz improvisation network, and what I can apply it to next?
