Patha Sreedhar
Sreedhar.patha@research.iiit.ac.in
Needleman Wunsch transitions: This is the other type of transition, which uses a deletion arc in place of the diagonal arc.
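The effect of swapping one arc for another can be illustrated with a trellis whose set of allowed predecessor arcs is a parameter. The sketch below is a minimal illustration only: the function name `trellis_cost` and both arc sets are hypothetical, and the exact arcs of each transition type are not spelled out here, so the two sets simply contrast a trellis with a diagonal arc against one without it.

```python
import numpy as np

# Hypothetical arc sets, given as (test step, model step) back to a predecessor.
# These are illustrative assumptions, not the arcs used in the experiments.
EXAMPLE_ARCS_DIAGONAL = [(1, 0), (0, 1), (1, 1)]   # includes the diagonal arc
EXAMPLE_ARCS_NO_DIAGONAL = [(1, 0), (0, 1)]        # diagonal arc left out


def trellis_cost(dist, arcs):
    """Minimum accumulated cost from cell (0, 0) to the last cell of the trellis.

    dist[i, j] is the local distance between test frame i and model frame j.
    arcs lists the (di, dj) steps back to each allowed predecessor cell.
    """
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Best accumulated cost among the predecessors that lie inside the trellis.
            best = min(
                (acc[i - di, j - dj] for di, dj in arcs if i - di >= 0 and j - dj >= 0),
                default=np.inf,
            )
            acc[i, j] = dist[i, j] + best
    return acc[-1, -1]


if __name__ == "__main__":
    dist = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(trellis_cost(dist, EXAMPLE_ARCS_DIAGONAL))
    print(trellis_cost(dist, EXAMPLE_ARCS_NO_DIAGONAL))
```

Passing a different arc list is enough to change the transition type, so the rest of the trellis computation stays the same across experiments.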
Methodology: Gaussian posteriorgrams for both the test and model utterances are extracted from the Mel Frequency Cepstral Coefficients (MFCC) of the utterances, and Segmental Dynamic Time Warping (SDTW) is applied to these features. A test utterance cannot be aligned to the whole model utterance, because only a part of the model utterance will match it; hence SDTW is used instead of plain DTW. The window length for the SDTW procedure is, in general, the test utterance length plus 6, to accommodate speech rate variability. The trellis is calculated only where the distance between the test and model frame is less than 6 (in general). The window is shifted by 6 frames at a time and slid along the entire length of the model utterance. At each shift, the trellis is computed and the alignment cost is noted. The minimum alignment cost among all warping paths is used to spot the required test utterance.
Experimentation: In the trellis calculation, Itakura-type and Needleman Wunsch-type transitions were also tried. A frame shift of 1 instead of 6 was tried, as was a variable speech rate allowance of (query length/3) instead of the constant 6.
Database and evaluation: The African MediaEval 2012 database is used for experimentation, with NIST evaluation.
Results:
Table 1: Results of various experimentations
Headings: Window size; Window shift; Maximum Term Weighted Value; Itakura's arc; Needleman Wunsch's arc
Values: NA, 0.2149, 0.1432, 0.2254, 0.2251, NA
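The windowed search described under Methodology can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used for the results above: the names `banded_cost` and `spot_query` are hypothetical, the local distance is taken to be the negative log of the inner product of two posterior vectors, the "distance less than 6" condition is read as a band on the frame indices, and the path is assumed to start at the first frame of the window and to end anywhere in the last test row.

```python
import numpy as np


def banded_cost(dist, band=6):
    """Minimum accumulated cost over cells with |i - j| < band (assumed reading)."""
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(m):
            if abs(i - j) >= band:
                continue  # outside the band: this trellis cell is not computed
            if i == 0 and j == 0:
                acc[i, j] = dist[i, j]
                continue
            prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = dist[i, j] + prev
    # Assumption: the path may end at any model frame of the last test row.
    return acc[-1].min()


def spot_query(test, model, extra=6, shift=6, band=6):
    """Slide a window of len(test) + extra frames over the model utterance.

    test and model are posteriorgrams with one row per frame.
    Returns (start frame of the best window, its alignment cost).
    """
    win = len(test) + extra
    best_start, best_cost = None, np.inf
    for start in range(0, max(len(model) - win, 0) + 1, shift):
        segment = model[start:start + win]
        # Assumed local distance: -log of the inner product, clipped to avoid log(0).
        dist = -np.log(np.clip(test @ segment.T, 1e-10, None))
        cost = banded_cost(dist, band)
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


if __name__ == "__main__":
    # Toy one-hot "posteriorgrams": the query occurs at frame 6 of the model.
    test = np.eye(5)[[0, 1, 2, 3]]
    model = np.eye(5)[[4] * 6 + [0, 1, 2, 3] + [4] * 8]
    print(spot_query(test, model))
```

Under these assumptions, the variations mentioned under Experimentation correspond to changing the arguments, for example `shift=1` for the finer frame shift, or `extra=len(test) // 3` for an allowance that depends on the query length.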
Future developments: Techniques like Pseudo Relevance Feedback (PRF) can be applied on top of this system to increase accuracy. Other features, such as voicing and sonority, can be considered to improve the results. A different type of band, such as the Itakura parallelogram, can also be used to improve accuracy.
Acknowledgement: This work is part of the course "Topics in Speech Processing: Audio Information Retrieval" held at IIIT Hyderabad during spring 2013. The author is very thankful to Dr. Kishore Prahallad and Dr. Suryakanth V Gangashetty for their support, and to Gautham Mantenna for his constant guidance.