
Performance Analysis of Signal Processing Approach for Query-by-Humming Systems

Jugnu Agrawal#1, Kaustubh Kumar#2, Vivek Goyal#3


B. Tech. (Batch of 2013), Department of Electronics Engineering,
Indian Institute of Technology (BHU), Varanasi.
1 jugnu.agrawal.min09@itbhu.ac.in
2 kaustubh.kumar.civ09@itbhu.ac.in
3 vivek.goyal.ece09@itbhu.ac.in

Abstract: Query-by-Humming (QBH) is a process for querying music present in a database by simply humming its melody. This paper presents the implementation of a QBH system via a signal processing approach. From an ever-growing crowd of DSP algorithms, this paper analyses some algorithms and processes which increase the accuracy of the result. The blocks that this paper explores in a QBH system are: a double-threshold noise gate, a band-pass filter for filtering out unwanted signals, dynamic time warping for signal matching, and database indexing via an R-Tree. Finally, the paper analyses the results and proposes an efficient signal processing approach for QBH systems.

I. INTRODUCTION

In many cases where a user has a large music database, he/she might feel more comfortable querying a particular song by humming its melody, rather than performing a text or lyric search. Query-by-Humming is the technical name given to identifying a song in a database just from the hum of its melody. We consider a database of MIDI songs for implementation and analysis. All the user has to do is hum into a microphone connected to the computer, and the system will return the matching song(s) from the database within a defined limit of error.
Capturing the subjective melody content, or pitch, of a song is one of the major challenges today. Being subjective, the perceived melody or pitch cannot be accurately represented by a mathematical formula. However, we can approximate the pitch, over a period of time, to be in relation with frequency. The inaccuracy involved in this relation constrains the efficiency of the system.
Another factor which can limit the accuracy is the noise involved in the humming. This noise may be categorized into white noise (such as a fan, traffic or wind in the background) or sudden noise (such as ticks, a sharp thud, etc.). Both kinds of noise can be removed using a double-threshold noise gate and a band-pass filter.
Eventually, the accuracy of the system may vary from user to user depending on the precision of his/her humming.

II. PROPOSED QBH SYSTEM

Fig. 1 below shows the different blocks in the proposed QBH system.

[Fig. 1 QBH System Block Diagram. Blocks shown: Query Input, Noise Gate, Band-pass Filter, Windowing and Pitch Detection, and Query Time Series on the query path; MIDI file from Database, Melody Extraction, Time Series Generation, and R-Tree Indexing on the database path; the two paths meet at Matching by LDTW.]

III. DESCRIPTION OF PROPOSED QBH SYSTEM

A. Query Input
The query input is recorded with the help of a microphone. The recorded sound is monophonic, i.e. single-channel sound. It is sampled at 44.1 kHz, which is standard for the wav format. Along with the user's hum, the input also contains white noise, ticks and other background noises. These noises have to be removed before the subsequent QBH algorithms are applied.
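The double-threshold noise gate used in the next pre-processing step can be sketched as below. This is a minimal Python illustration (the paper's implementation is in MATLAB); the attack, hold and release fades are omitted, and the default thresholds follow the values listed in Section IV.

```python
import numpy as np

def double_threshold_gate(x, open_th=0.15, close_th=0.1):
    """Zero out samples while the gate is closed.

    The gate opens when |x| rises above open_th and closes when
    |x| falls below close_th; the hysteresis avoids chattering.
    """
    y = np.zeros_like(x, dtype=float)
    gate_open = False
    for i, s in enumerate(x):
        if gate_open:
            if abs(s) < close_th:
                gate_open = False
        elif abs(s) > open_th:
            gate_open = True
        if gate_open:
            y[i] = s
    return y
```

Note that a sample between the two thresholds is kept only if the gate is already open, which is what prevents the gate from rapidly toggling on borderline noise.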

B. Pre-processing with Noise Gate
A noise gate is the first step for the removal of noise. Generally there are many blank spaces in between the humming. These blank spaces, however, consist of noise rather than perfect silence. A noise gate is provided with a certain threshold value below which it makes the signal equal to zero (Fig. 2). A modified noise gate is applied in our technique, with two thresholds: an upper threshold and a lower threshold.
Noise gates often implement hysteresis, that is, they have two thresholds:
- The upper threshold (open threshold) is the level above which the signal must rise to open the gate once the gate is closed.
- The lower threshold (close threshold) is the level below which the signal has to fall to close the gate once the gate is open.
The advantage of using two thresholds is that it avoids chattering.

Fig. 2 Noise Gate with Single Threshold (upper) and Double Threshold (lower)

The various parameters of the noise gate used here are:
- The Attack time control sets the time for the gate to change from closed to open, much like a fade-in.
- The Hold time control defines the amount of time the gate will stay open after the signal falls below the threshold.
- The amount of attenuation when the gate is closed can be set by the Range control.

C. Low Pass Filter
During hum analysis, it was found that the hum generally lies below 800 Hz. So a low pass filter is used with its cut-off frequency at 800 Hz. This removes all the noise outside the selected band. Sharp spikes in the signal are also removed by this filter.

D. Pitch Tracking
1) Autocorrelation Technique for Pitch Tracking: The correlation between two waveforms is a measure of their similarity. The waveforms are compared at different time lags, and their "sameness" is calculated at each lag. The result of a correlation is a measure of similarity as a function of the time lag between the beginnings of the two waveforms. The autocorrelation function is the correlation of a waveform with itself.
The autocorrelation function of a periodic waveform is itself periodic. As the time lag increases to half of the period of the waveform, the correlation decreases to a minimum. The first peak in the autocorrelation after zero lag indicates the period of the waveform.
The fundamental period obtained from the autocorrelation technique can then be used as an indicator of pitch to generate a pitch time series of the input query.
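The autocorrelation pitch tracker described above can be sketched as follows. This is a minimal Python illustration, not the paper's MATLAB code; the 800 Hz ceiling and 44.1 kHz sampling rate come from the surrounding text, while the 80 Hz floor on the search range is an assumed lower bound.

```python
import numpy as np

def pitch_autocorr(frame, fs=44100, fmin=80.0, fmax=800.0):
    """Estimate the pitch of one windowed frame via autocorrelation.

    The lag of the strongest autocorrelation peak, searched between
    fs/fmax and fs/fmin samples, is taken as the fundamental period.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```

Running this on successive windows of the gated, filtered query yields the pitch time series used in the matching stage.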

E. MIDI Database Processing
1) Channel Selection: The MIDI files we use are generally polyphonic, with multiple overlapping music track channels, since this kind is most common for popular music. However, polyphonic MIDIs introduce the challenge of finding a systematic way to choose the channel containing the melody. In our tests of MIDI files, we found that the melody is usually contained in just one channel. In a few cases the melody is actually split across multiple channels; for simplicity, we work on the assumption that the melody will always be found in just one channel.
First, we filter the channels so that only channels with melody-carrying instruments remain. For example, we remove all drum channels, since drums never carry the melody. We then analyse the music notes and estimate the average pitch of each channel. Since the melody is typically higher in pitch than the rest of the notes, we isolate the channels with the top 3 average pitch values.
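The channel selection heuristic above can be sketched as follows. The note representation, a list of (channel, pitch) pairs, is a simplification assumed for illustration; channel 9 is the General MIDI percussion channel.

```python
def melody_channels(notes, top_n=3, drum_channels=(9,)):
    """Return the top_n candidate melody channels ranked by average pitch.

    `notes` is an iterable of (channel, pitch) pairs; drum channels are
    discarded up front since drums never carry the melody.
    """
    sums, counts = {}, {}
    for ch, pitch in notes:
        if ch in drum_channels:
            continue
        sums[ch] = sums.get(ch, 0) + pitch
        counts[ch] = counts.get(ch, 0) + 1
    avg = {ch: sums[ch] / counts[ch] for ch in sums}
    return sorted(avg, key=avg.get, reverse=True)[:top_n]
```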
2) Skyline Algorithm: The melody channel obtained at the end of channel selection can still have many overlapping sections, called phrases, but we want only one note per unit of time in our final representation in order to match incoming queries. To do this we use the skyline algorithm, a simple approach to melody extraction. This algorithm prunes the music down to just one pitch per time unit, essentially by keeping the top pitch whenever simultaneous notes occur.
In order to run the skyline algorithm, we first transform each of the phrases into time series form. Each time sample is represented by a quarter note because that is the smallest interval supported by a MIDI system. Rests in this form are indicated by the lack of notes at those times. This representation makes it very easy to do the time-overlap and relative-pitch comparisons necessary for the skyline algorithm.
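A sketch of the skyline step, assuming notes are given as (start, end, pitch) triples on the quarter-note time grid described above; rests come out as None.

```python
def skyline(notes, length):
    """Skyline melody extraction: keep the highest pitch per time unit."""
    series = [None] * length
    for start, end, pitch in notes:
        for t in range(start, min(end, length)):
            # Whenever notes overlap in time, the higher pitch wins.
            if series[t] is None or pitch > series[t]:
                series[t] = pitch
    return series
```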
3) Segmentation: The output of the melody
extraction blocks (as mentioned above) is sliced
into overlapping windows of a fixed length via the
segmentation filter. Each segment will eventually
become a subsequence of the original that can be
matched against user queries. We chose to set our
segment size to 60 data points, expecting the
average user query to be five to six seconds in
duration.
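The segmentation step, together with the mean removal and PAA steps described later in this section, can be sketched as below. The window size of 60, offset of 1 and final PAA dimension of 3 follow the parameters listed in Section IV; the Python form is illustrative.

```python
import numpy as np

def segment(series, size=60, offset=1):
    """Slice a melody time series into overlapping fixed-length windows."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, offset)]

def mean_removal(window):
    """Subtract the window's average pitch so matching is key-invariant."""
    w = np.asarray(window, dtype=float)
    return w - w.mean()

def paa(window, n_out=3):
    """Piecewise Aggregate Approximation: average over n_out equal frames."""
    w = np.asarray(window, dtype=float)
    return w.reshape(n_out, -1).mean(axis=1)
```

Each 60-point window is thus reduced to a 3-dimensional point before being inserted into the R-Tree.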

4) Mean Removal Filter: To account for humming queries being off-key, each window segment is normalized by a mean removal filter that subtracts the window's average pitch value from each point in the window. A humming query is normalized in the same fashion to remove the effect of the key.
5) PAA (Piecewise Aggregate Approximation) Transform: Because multidimensional index structures typically degrade in performance past 8-10 dimensions, rather than inserting the normalized window segments into an R-Tree directly, we pass the segments through a PAA Transform, which applies the PAA dimensionality-reduction technique in order to reduce the size of the data. An n-dimensional time series X is reduced to dimension N (N < n) by taking averages over N consecutive equal-sized frames:

\bar{X}_i = \frac{N}{n} \sum_{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i} X_j, \quad i = 1, \ldots, N

6) Indexing into R-Tree: An R-Tree (Rectangle Tree) is a multidimensional indexing structure. Each melody segment from the MIDI database (after application of the PAA Transform) is stored in a database, and the database is indexed using the R-Tree, which groups closely located melody segments in the same leaf node. This speeds up the LDTW matching algorithm by reducing the number of MIDI segments with which the hum segments have to be compared.

F. Matching of Melody and Hum
1) Dynamic Time Warping: The Dynamic Time Warping distance between two time series X (of length n) and Y (of length m) is defined recursively as

D(i, j) = |X_i - Y_j| + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\}, \quad DTW(X, Y) = D(n-1, m-1)

The process of computing the DTW distance can be visualized as a string-matching-style dynamic program. We construct an n x m matrix to align the time series X and Y. The cell (i, j) corresponds to the alignment of the element X_i with Y_j. A warping path P, from cell (0, 0) to (n-1, m-1), corresponds to a particular alignment, element by element, between X and Y.

Fig. 3 Warping Path with local constraint

2) LDTW (Local Dynamic Time Warping): Intuitively, humans will match two time series of different lengths as follows. First, the two time series are globally stretched to the same length. They are then compared locally, point by point, with some warping allowed within a small neighbourhood on the time axis. Such a two-step transform can emulate traditional DTW while avoiding some unintuitive results, as well as speeding it up. Accordingly, Local Dynamic Time Warping first rescales both series to a common length and then computes the DTW distance with the warping path constrained to cells satisfying |i - j| <= k, for a small local-warping width k.
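Under that two-step definition, LDTW can be sketched as: uniformly rescale both series to a common length, then run band-constrained DTW. The following is a minimal Python illustration; the common length n and the pointwise distance |a_i - b_j| are assumptions, while k = 2 matches the parameter used in Section IV.

```python
import numpy as np

def ldtw(x, y, k=2, n=20):
    """Local DTW: rescale both series to length n, then run DTW with the
    warping path confined to the band |i - j| <= k."""
    def rescale(s, n):
        s = np.asarray(s, dtype=float)
        idx = np.linspace(0, len(s) - 1, n)
        return np.interp(idx, np.arange(len(s)), s)

    a, b = rescale(x, n), rescale(y, n)
    INF = float("inf")
    D = np.full((n, n), INF)
    for i in range(n):
        # Only cells within k of the diagonal are ever filled in.
        for j in range(max(0, i - k), min(n, i + k + 1)):
            cost = abs(a[i] - b[j])
            if i == 0 and j == 0:
                D[i, j] = cost
            else:
                best = min(D[i - 1, j] if i > 0 else INF,
                           D[i, j - 1] if j > 0 else INF,
                           D[i - 1, j - 1] if i > 0 and j > 0 else INF)
                D[i, j] = cost + best
    return D[n - 1, n - 1]
```

The band constraint shrinks the filled portion of the matrix from n x n cells to roughly n(2k + 1), which is the source of the speed-up over unconstrained DTW.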

IV. SYSTEM PARAMETERS AND ANALYSIS (FOR MATLAB IMPLEMENTATION)

A. Parameters Taken
1) Noise Gate:
Release time = 0.05 sec
Attack time = 0.5 sec
Hold time = 0.005 sec
Lower threshold = 0.1
Upper threshold = 0.15
2) Band-Pass Filter:
Fpass = 600 Hz
Fstop = 800 Hz
Apass = 1 dB
Astop = 60 dB
3) Segmentation Filter:
Window size = 60 data points
Offset of one window from the other = 1
4) Piecewise Aggregate Approximation Transform:
Initial dimension = 60 data points
Final dimension = 3 data points
Reduction ratio = 20
5) Local Dynamic Time Warping:
k = 2

Table 1 MIDI Database and generated R-Trees

B. Analysis Reports
The query belonged to the song 'Jashn-e-Bahara'. The LDTW distance corresponding to the R-Trees of the different songs was computed from the hum query. The number of segments where the distance was equal to 0, 1, 2 or 3 was tabulated. The time for each LDTW comparison was also noted down.

Table 2 Warping distance of query obtained for different songs

As can be seen from the above table, the song closest to the humming query is indexed in R-Tree 1, which corresponds to the 'Jashn-e-Bahara' track from the database. Hence the recorded hum has been identified correctly. The average time for the above test case was 95.74 sec/song.

V. CONCLUSION AND FUTURE SCOPE

The QBH system provided us with results that were in coherence with the predicted results. The system also worked in a noisy environment (the test case being under a fan and a cooler).
The system opens a lot of scope for future improvement:
- The time of evaluation can be further reduced by considering only the significant portions of the hum query as well as of the indexed MIDI file.
- More accurate methods of faster indexing can allow greater dimensionality than is efficiently supported by R-Trees.
- Dynamic pitch tracking methods can be implemented.
- A GUI for the user can be built.

VI. REFERENCES

[1] Yunyue Zhu and Dennis Shasha, "Query by Humming: a Time Series Database Approach."
[2] Edmond Lau, Annie Ding, and Calvin On, "MusicDB: A Query-by-Humming System."
[3] Antonin Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching."
[4] Alexandra Uitdenbogerd and Justin Zobel, "Matching Techniques for Large Music Databases."
[5] Emanuele Pollastri, "A Pitch Tracking System Dedicated to Process Singing Voice for Music Retrieval."
[6] Blog by Naotoshi Seo: http://note.sonots.com/SciSoftware/Pitch.html
[7] Blog by Ken Schutte: http://www.kenschutte.com/midi
[8] Iainf, "Noise Gate Hysteresis": http://commons.wikimedia.org/wiki/File:Noise_Gate_Hysteresis.svg
