Version 3.0h
Users Manual
Copyright Notice
(C)2007 Catalin Grigoras, Ph.D.
forensicav@techemail.com
Contents
Introducing Catalina Forensic Audio Toolbox
    System Requirements
    Installation
    Getting Help
Interfacing Wavesurfer
    Fundamental Frequency
    Formants
    Long Term Average Spectrum
Catalina Forensic Audio Toolbox Basics
    General Plots
    Long Term Formants
    Formant Space
    Long Term Average Spectrum
Recommendations
Future Developments
References
Appendix A
Appendix B
Appendix C
Historical Note
The first version of the Catalina Forensic Audio Toolbox ("Catalina") was developed
in 1993. At the time, PC speed and sound card quality were relatively low
compared to present-day equipment. The most important updates were written
while I completed my Ph.D. dissertation (1998-2001). The program evolved to use
an external software program to analyze the speech fundamental
frequency (F0), the formants F1-F3 (F123) and the long term average spectrum (LTAS).
For the current version I use Wavesurfer 1.8.5, developed at KTH in Stockholm
by Kåre Sjölander and Jonas Beskow. More details about this software can be found
at http://www.speech.kth.se/wavesurfer/. In the chapter "Interfacing Wavesurfer"
I explain the use of this software with Catalina.
System Requirements
1. A PC running Windows 98 SE, Windows XP or Windows NT
2. A computer having a CPU of at least 133 MHz
3. A copy of Wavesurfer, version 1.8.5 or later
Special Thanks
I wish to thank the International Association for Forensic Phonetics and Acoustics
(IAFPA) for a grant to finish this latest version of the software. I also appreciate the very
important help of Professor Francis Nolan from Cambridge University, Professor
Brandusa Pantelimon from Bucharest University and Durand R. Begault, Ph.D.,
Audio Forensic Centre, Charles M. Salter Associates, Inc., San Francisco, CA, USA.
I am grateful to the Cambridge Colleges Hospitality Scheme for making possible my
visit to Cambridge in summer 2003.
Installation
Run CatalinaSetup.exe. By default the program is installed in C:\Catalina and a
shortcut is placed on the Desktop.
You should get the following folder structure:
C:\Catalina\bin - for executable files, do not modify it
C:\Catalina\Evidence - for WAV files to be analysed and saved with Wavesurfer
C:\Catalina\Plots - for graphical TIFF results
C:\Catalina\Results - for numerical results
C:\Catalina\toolbox - do not modify it.
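The folder structure above can be checked after installation with a short script. This is only a minimal sketch in Python, not part of Catalina: the function name missing_folders is hypothetical, and the default root assumes the default install location C:\Catalina.

```python
import os

# Subfolders listed in the Installation section of this manual.
EXPECTED_SUBFOLDERS = ("bin", "Evidence", "Plots", "Results", "toolbox")


def missing_folders(root=r"C:\Catalina"):
    """Return the expected subfolders that do not exist under root.

    The default root is the default install location; pass another
    path if Catalina was installed elsewhere.
    """
    return [
        name
        for name in EXPECTED_SUBFOLDERS
        if not os.path.isdir(os.path.join(root, name))
    ]
```

Calling missing_folders() returns an empty list when all five folders are present; any names it returns point to an incomplete installation.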
Getting Help
For further details, you can contact the author directly at forensicav@techemail.com.
In the e-mail title/subject please indicate Catalina Toolbox.
Interfacing Wavesurfer
Catalina depends on the long-term average and formant analysis capabilities of
Wavesurfer. Other programs that can provide exported text versions of these analyses
can also be used, but the demonstrations given here use Wavesurfer. There is a
specific naming format that Catalina depends on when exporting data analyses from
Wavesurfer to the 'Evidence' folder that is explained in detail below.
Run Wavesurfer and open a WAV file (linear PCM, 8 kHz, 16-bit, mono recommended).
You should get a window like the following one (see Fig.1). Select the Speech
analysis configuration.
The Wavesurfer display will show 3 plots: waveform, spectrogram with formant
estimator tracking overlay, and fundamental frequency (see figure 2).
Figure 2. Wavesurfer display (panel labels: waveform, fundamental frequency)
Fundamental frequency
Fundamental frequency (F0) is the frequency of repetition of the (quasi-)periodic
waveform of the voiced speech signal, corresponding closely to our perception of the
pitch of the speech. F0 analysis can be performed with different algorithms either in
the time or in the frequency domain.
In Wavesurfer the analysis can be carried out using the AMDF (Average Magnitude
Difference Function) algorithm or ESPS (Entropic Speech Processing System). By
default Wavesurfer uses the ESPS algorithm. See the Wavesurfer manual for details
about F0 settings.
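To illustrate the time-domain approach, the AMDF idea can be sketched in a few lines of Python. This is a simplified illustration for a single voiced frame, not the Wavesurfer implementation: the function name amdf_f0 and the default search range of 60-400 Hz are assumptions made for the example. The AMDF is the mean absolute difference between the frame and a copy of itself shifted by a lag; it dips towards zero when the lag equals the pitch period.

```python
import math


def amdf_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one voiced frame with the AMDF.

    frame is a sequence of samples; f0_min and f0_max bound the
    search (illustrative defaults, not Wavesurfer's settings).
    Returns None if the frame is too short for the lag range.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag, best_value = None, float("inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        if n <= 0:
            break
        # Mean absolute difference between the frame and its shifted copy.
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


# Usage: a 40 ms frame of a synthetic 100 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(320)]
estimate = amdf_f0(tone, 8000)
```

Real speech is only quasi-periodic, so a practical tracker also needs a voicing decision and protection against octave errors, which this sketch leaves out. Narrowing f0_min and f0_max here plays the same role as changing the F0 limits in Wavesurfer.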
To reduce spurious (false) F0 values introduced by other sounds or non-normal
speech (e.g. Fig.3) three methods can be applied: filter the noises, delete these
samples or change the F0 limits in Wavesurfer. This last technique is primarily for
limiting non-normal speech effects (e.g., falsetto).
Figure 3. F0 selection
You may create an F0 text file using the following steps: select the entire wave by
pressing F11, select all with Ctrl+A, right click on F0 plot and Save data file as
C:\Catalina\Evidence\filename.f0
For example, the test.wav file corresponds to the test.f0 file.
Formants
In Wavesurfer the formant analysis can be carried out using linear prediction. By
default Wavesurfer uses the 12th order LPC algorithm. (Refer to the Wavesurfer
manual for details about formant settings).
Catalina requires a text file containing data for formants F1-F2-F3. You will need to
create an F123 text file using the following steps: (1) select the entire wave by
pressing F11 or select all with Ctrl+A, (2) right click on formants plot, (3) export the
formant data file as filename.frm
For example, the test.wav file corresponds to the test.frm file.
General Plots
Create or copy the filename.f0, filename.frm and filename.lts files to
C:\Catalina\Evidence folder. Run Catalina from the desktop icon or
C:\Catalina\bin\win32\Catalina3x.exe and select a file from the C:\Catalina\Evidence
folder. Catalina will ask for the name of the F0 text file, and it will then search for this
file, along with similarly-named frm and lts text export files, from this same
'Evidence' folder. The program then writes plot files to C:\Catalina\Plots using the
same naming convention.
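The naming convention can be expressed as a small helper that derives the three export file paths from one base name. This is a sketch in Python under the assumption that the three files differ only in their extension, as described above; the helper name companion_files is hypothetical, and the way Catalina resolves names internally is not documented here.

```python
from pathlib import PureWindowsPath

# Default evidence folder from the Installation section.
EVIDENCE_DIR = PureWindowsPath(r"C:\Catalina\Evidence")


def companion_files(name, evidence_dir=EVIDENCE_DIR):
    """Map a name such as test.wav, test.f0 or test to the paths of
    the .f0, .frm and .lts export files that share its base name."""
    stem = PureWindowsPath(name).stem
    return {
        ext: str(PureWindowsPath(evidence_dir) / f"{stem}.{ext}")
        for ext in ("f0", "frm", "lts")
    }
```

For example, companion_files("test.wav") lists the test.f0, test.frm and test.lts paths that should exist in the 'Evidence' folder before running Catalina.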
As an example to demonstrate the program, select the included file test20sec. The
software will compute statistics and create TIFF files in C:\Catalina\Plots.
Check the resulting TIFF files there:
01-test20sec.tif
02-test20sec.tif
03-test20sec.tif
04-test20sec.tif
05-test20sec.tif
06-test20sec.tif
07-test20sec.tif
08-test20sec.tif
Formant Space
Catalina creates F2 vs F1 and F2 vs F3 formant plots, and automatically detects the
vowels [a], [e], [i], [o] based on the user-defined settings in the editable text file
C:\Catalina\formants.txt. By default, the settings in formants.txt are as follows:
             F1 limits (Hz)    F2 limits (Hz)    F3 limits (Hz)
             low     high      low     high      low     high
vowel [a]    601     850       1100    1600      2200    2800
vowel [e]    401     600       1500    2000      2100    2800
vowel [i]    220     400       2000    2400      2400    2900
vowel [o]    370     600       700     1200      2200    2600
These values are drawn from published references for different languages. Other
references may be used to determine the vowel limits for a specific language, or
vowel limits can be analyzed by inspecting formant values for a specific set of
speakers.
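The role of the vowel limits can be illustrated with a short Python sketch that assigns one (F1, F2, F3) measurement to a vowel when all three formants fall inside that vowel's limits. The default values are those listed above for formants.txt. The function name classify_vowel and the choice of inclusive limits are assumptions made for the example, and the detection logic actually used by Catalina may differ.

```python
# Default limits from formants.txt, in Hz:
# (F1 low, F1 high, F2 low, F2 high, F3 low, F3 high)
VOWEL_LIMITS = {
    "a": (601, 850, 1100, 1600, 2200, 2800),
    "e": (401, 600, 1500, 2000, 2100, 2800),
    "i": (220, 400, 2000, 2400, 2400, 2900),
    "o": (370, 600, 700, 1200, 2200, 2600),
}


def classify_vowel(f1, f2, f3, limits=VOWEL_LIMITS):
    """Return the vowel whose limits contain (f1, f2, f3), or None.

    Limits are treated as inclusive, which is an assumption.
    """
    for vowel, (f1_lo, f1_hi, f2_lo, f2_hi, f3_lo, f3_hi) in limits.items():
        if (f1_lo <= f1 <= f1_hi
                and f2_lo <= f2 <= f2_hi
                and f3_lo <= f3 <= f3_hi):
            return vowel
    return None
```

A frame with F1 = 700 Hz, F2 = 1300 Hz and F3 = 2500 Hz falls inside the [a] limits, while a frame outside every set of limits is left unclassified. Limits adapted to another language can be passed through the limits argument.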
An example of F1-F2 vowel space display is presented in figure 8. Filled red circles
indicate the mean of the supplied values from the F123 file at those times when a
corresponding F0 value has been indicated for that specific time frame. When there is
no estimate for an F0 time frame, the corresponding F123 value is discarded from the
mean calculation. This removes any bias from the mean estimate that would be
caused by formant values analyzed during unvoiced sections.
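The voicing-based selection described above can be sketched as follows. This Python example assumes that the F0 and formant tracks are aligned frame by frame and that a frame without an F0 estimate is stored as 0; both are assumptions about the exported data, and the function name voiced_formant_means is hypothetical.

```python
def voiced_formant_means(f0_track, formant_track):
    """Mean (F1, F2, F3) over frames that have an F0 estimate.

    f0_track      -- one F0 value per frame, 0 meaning "no estimate"
    formant_track -- one (F1, F2, F3) tuple per frame, same frame order
    Returns None if no frame is voiced.
    """
    voiced = [frm for f0, frm in zip(f0_track, formant_track) if f0 > 0]
    if not voiced:
        return None
    n = len(voiced)
    return tuple(sum(frame[k] for frame in voiced) / n for k in range(3))
```

Formant values measured in unvoiced frames are dropped before averaging, so they cannot pull the mean away from the values measured during voiced speech.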
An example of F2-F3 vowel space display is presented in figure 9.
In figures 8-9, the blue points adjacent to the filled red circles represent the average
values for the first and second halves of all analysed formant values. These dots and their
values can be useful for analysing intra-speaker variability.
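The two half averages can be computed as in the following Python sketch. It assumes that the analysed frames are split at the midpoint of the frame sequence (with an odd number of frames the extra frame goes to the second half); the manual does not state how Catalina defines the halves, and the function name half_means is hypothetical.

```python
def half_means(frames):
    """Return the mean formant tuple of the first and second halves.

    frames is a time-ordered list of (F1, F2, F3) tuples, already
    restricted to the frames used in the analysis.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form two halves")
    mid = len(frames) // 2

    def mean(rows):
        # Average each formant column separately.
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    return mean(frames[:mid]), mean(frames[mid:])
```

A large distance between the two returned points suggests that the speaker's formant values drifted over the course of the recording.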
Recommendations
For comparison between plots generated for different voice samples such as
questioned and known exemplars, it is recommended that Catalina be used with:
- linear PCM, 8 kHz, 16 bits, mono recorded wav files, analyzed within
Wavesurfer,
- known (reference, suspect) and unknown (questioned) exemplar recordings
made as contemporaneously as is practically possible,
- known (reference, suspect) and unknown (questioned) recordings made with
the same recording/transmission channel,
- normal/modal phonation samples,
- exemplar durations of longer than 10 seconds,
- speech signal to noise ratio (SNR) greater than 10 dB.
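The format and duration recommendations in the list above can be checked before analysis with Python's standard wave module. This is a minimal sketch, not part of Catalina: the function name check_recording is hypothetical, and it only covers the items that can be read from the WAV header (sampling rate, sample width, channel count, duration). The SNR, phonation type and recording channel have to be assessed separately.

```python
import wave


def check_recording(path, min_seconds=10.0):
    """Return a list of deviations from the recommended WAV format.

    An empty list means the file is 8 kHz, 16-bit, mono and longer
    than min_seconds. The wave module reads linear PCM files only
    and raises wave.Error for other encodings.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()      # bytes per sample
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate

    problems = []
    if rate != 8000:
        problems.append(f"sampling rate is {rate} Hz, not 8000 Hz")
    if width != 2:
        problems.append(f"sample width is {8 * width} bits, not 16 bits")
    if channels != 1:
        problems.append(f"{channels} channels, not mono")
    if duration <= min_seconds:
        problems.append(f"duration is {duration:.1f} s, not longer than {min_seconds} s")
    return problems
```

Running check_recording on both the questioned and the known recording before exporting data from Wavesurfer helps confirm that the two samples are compared under the same conditions.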
Users should note that some telephonic transmission systems or other recordings may
have high-pass filter characteristics (visible in the LTAS analysis) that can bias the
estimate of F1 to a higher frequency compared to what would be recorded for the
same voice, using a reference microphone and linear recording system.
Future Developments
Future options, including a means for calculating a likelihood ratio, will be added to
future releases of the Catalina Forensic Audio Toolbox. Check the website
periodically for updates.
REFERENCES
Baldwin, J. and French, P. (1990) Forensic Phonetics, London: Pinter.
Byrne, C., Foulkes, P. (2004) The Mobile Phone Effect on Vowel Formants, International Journal
of Speech, Language and the Law 11(1), 83-102
Carlson, R., Fant, G., and Granström, B. (1975) Two-formant models, pitch and vowel perception,
in G. Fant and M.A.A. Tatham (eds), Auditory Analysis and Perception of Speech, London:
Academic, 55-82.
Gonzalez-Rodriguez, J., Ortega-Garcia, J. and Lucena-Molina, J.J. (2001) On the application of the
Bayesian approach in real forensic conditions with GMM-based systems, Proceedings of 2001:
A Speaker Odyssey - The Speaker Recognition Workshop, 135-138.
Grigoras, C. (2001) Digital voice processing system, unpublished PhD thesis, University of
Bucharest, Electric Department, Romania
Grigoras, C. (2003) Voice analysis on noisy recordings, Paper presented at Cambridge Forensic
Phonetics Workshop, August 2003, Cambridge, UK.
Hess, W. (1983) Pitch Determination of Speech Signals: Algorithms and Devices, Berlin: Springer-Verlag.
Hollien, H. (1990) The Acoustics of Crime: the New Science of Forensic Phonetics, New York:
Plenum.
Hollien, H. (2000) Forensic Voice Identification, New York: Academic Press.
Jessen, M., Köster, O. and Gfroerer, S. (2005) Influence of vocal effort on average and variability of
fundamental frequency, International Journal of Speech, Language and the Law 12(2), 174-213
Künzel, H.J. (2001) Beware of the telephone effect: the influence of telephone transmission on
the measurement of formant frequencies, Forensic Linguistics 8(1), 80-99.
Ladd, D.R. and Terken, J. (1995) Modelling intra- and inter-speaker pitch range, Proceedings of
the 13th International Congress of Phonetic Sciences, Stockholm, vol.2, 386-89.
Laver, J. (1980) The Phonetic Description of Voice Quality, Cambridge: Cambridge University
Press.
McDougall, K. (2004) Speaker-specific formant dynamics: an experiment on Australian English
/aI/, International Journal of Speech, Language and the Law 11(1), 103-130.
Meuwly, D. (2001) Reconnaissance de locuteurs en sciences forensiques: l'apport d'une approche
automatique, PhD thesis, University of Lausanne.
Nolan, F. (1983) The Phonetic Bases of Speaker Recognition, Cambridge: Cambridge University
Press.
Nolan, F. (1990) The limitations of auditory phonetic speaker recognition, in H. Kniffka (ed.),
Texte zu Theorie und Praxis forensischer Linguistik, Tübingen: Niemeyer, 457-479.
Nolan, F. (1993) Auditory and acoustic analysis in speaker recognition, in J. Gibbons (ed.),
Language and the Law, London: Longman, 326-345.
Nolan, F. (2002) The telephone effect on formants: a response, Forensic Linguistics 9(1), 74-82.
Nolan, F. (2005) Forensic speaker identification and the phonetic description of voice quality, in
W.J. Hardcastle and J. MacKenzie Beck (eds), A Figure of Speech: a Festschrift for John Laver,
Mahwah, N.J.: Erlbaum, 385-411.
Nolan, F. and Grigoras, C. (2005) A case for formant analysis in forensic speaker identification,
International Journal of Speech, Language and the Law 12(2), 143-173
Rabiner, L.R., Cheng, M.J., Rosenberg, A.E. and McGonegal, C.A. (1976) A comparative study of
several pitch detection algorithms, IEEE Transactions on Audio, Speech and Signal Processing
24, 399-413.
Repp, B. (1982) Phonetic trading relations and context effects: new experimental evidence for a
speech mode of perception, Psychological Bulletin 92, 81-110.
Rodman, R., McAllister, D., Bitzer, D., Cepeda, L. and Abbitt, P. (2002) Forensic speaker
identification based on spectral moments, Forensic Linguistics 9(1), 22-43.
Rose, P.J. (2002) Forensic Speaker Identification, London: Taylor and Francis.
Scherer, K. R. (1986). Voice, stress, and emotion, in M. H. Appley and R. Trumbull (eds),
Dynamics of Stress: Physiological, Psychological, and Social Perspectives, New York: Plenum,
159-181.
Stevens, K.N. (1989) On the quantal nature of speech, Journal of Phonetics 17, 3-45.
Wells, J. (1982) Accents of English, Cambridge: Cambridge University Press.