{tb,lbl,bli,mr,zt,hx}@kom.aau.dk
ABSTRACT
This paper presents a distributed speech recognition (DSR)
system for information retrieval on mobile devices. The
prototype system combines state-of-the-art DSR techniques
with knowledge-based Information Retrieval (IR) processing
for spoken question answering. A configurable
DSR system is implemented on the basis of the ETSI-DSR
advanced front-end and the SPHINX IV recognizer. Acoustic
modeling for the Danish language and language modeling for
a soccer test domain are presented in detail. Although the
prototype system can only answer queries and questions
within the chosen domain, it is flexible enough to be
ported to other domains.
General Terms
Distributed Speech recognition, Intelligent Search Engines,
Language models.
Keywords
Distributed Speech recognition, Intelligent Search Engines,
Language models.
2. SYSTEM ARCHITECTURE
The system employs a fully distributed architecture that
includes both the speech recognizer and the IR system. The
overall architecture of the system is depicted in Fig. 1.
1. INTRODUCTION
Distributed Speech Recognition (DSR) has a wide range of
applications because of its advantages both in reducing the
computational requirements and power consumption for
devices at the client side and in facilitating the effortless
update of the core part of the recognizer at the server side [1].
This paper applies DSR to access IR services on
remote servers from mobile devices such as Personal Digital
Assistants (PDAs) and mobile phones. In collaboration with
the Software Intelligence and Security Research Centre (SISRC), Esbjerg, Denmark, a prototype system [2] has been built
from two main components: 1) an IR system that uses
sophisticated IR techniques together with a specialized
question answering engine, and 2) a DSR system [3] implemented on
the basis of the ETSI-DSR advanced front-end [4] and the
SPHINX IV recognizer [5].
The IR-engine has been designed to perform better than
3. DSR SYSTEM
This section describes the DSR system, acoustic modeling and
language modeling.
WER: 37.6 %, 37.4 %, 35.2 %, 34.7 %, 33.0 %, 31.9 %
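The WER values reported in this section are word error rates. The text does not state which scoring tool was used, so the following is only a minimal sketch of the conventional definition: the number of word substitutions, deletions and insertions needed to turn the reference transcription into the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not part of the described system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative helper: whitespace tokenization is an assumption, and no
    text normalization (case folding, punctuation removal) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("who won the match", "who won match"))
```

Because insertions are counted, the rate can exceed 100 % when the hypothesis is much longer than the reference.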
[Figure: number of unique unigrams (axis 0 to 12k) plotted against number of unigrams (axis 0 to 60k), and number of unique bigrams (axis 0 to 45k) plotted against number of bigrams (axis 0 to 60k).]
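The unigram and bigram plots in this section relate the number of distinct n-grams to the total number of n-grams observed. As a hedged illustration of how such growth curves can be computed from a tokenized corpus, here is a minimal Python sketch. The function name `ngram_growth`, the sampling `step` and the toy corpus are assumptions made for the example; the sketch also ignores sentence boundaries, which a real language-modeling toolkit would normally respect.

```python
from typing import List, Tuple


def ngram_growth(tokens: List[str], n: int, step: int) -> List[Tuple[int, int]]:
    """Return (total n-grams seen, unique n-grams seen) sampled every `step` n-grams.

    Illustrative helper: n-grams are taken over the whole token stream,
    so they may span sentence boundaries.
    """
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive")

    seen = set()
    points: List[Tuple[int, int]] = []
    total = 0
    for i in range(len(tokens) - n + 1):
        seen.add(tuple(tokens[i:i + n]))
        total += 1
        if total % step == 0:
            points.append((total, len(seen)))
    # Record the final point if the corpus did not end on a sampling step.
    if total and total % step != 0:
        points.append((total, len(seen)))
    return points


if __name__ == "__main__":
    corpus = "who won the match who won the cup".split()
    print(ngram_growth(corpus, n=1, step=4))  # unigram curve
    print(ngram_growth(corpus, n=2, step=4))  # bigram curve
```

A curve that flattens as the corpus grows suggests that the vocabulary of the domain is largely covered; a curve that keeps rising indicates that many n-grams are still unseen.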
4. CONCLUSIONS
In this paper we presented a mobile information access
system with spoken query answering on the basis of the DSR
5. ACKNOWLEDGEMENTS
The project was supported by the Center for TeleInFrastruktur
(CTIF) in the project POSH under CTIF's C3 program.
Furthermore, the authors thank their colleagues Henrik
Legind Larsen, Daniel Ortiz-Arroyo and Dan Saugstrup at
the Software Intelligence and Security Research Center
(SIS-RC) for providing the IR-server of the system.
6. REFERENCES
[1] Tan, Z.-H., Dalsgaard, P. and Lindberg, B.: Automatic
speech recognition over error-prone wireless networks,
Speech Communication, 47(1-2), 220-242, 2005.
[2] Brøndsted, T., Larsen, H.L., Larsen, L.B., Lindberg, B.,
Ortiz-Arroyo, D., Tan, Z.-H. and Xu, H.: Mobile
Information Access with Spoken Query Answering, in
COST278 Final Workshop on Applied Spoken
Language Interaction in Distributed Environments
(ASIDE), Aalborg, Denmark, Nov. 2005.
[3] Xu, H., Tan, Z.-H., Dalsgaard, P., Mattethat, R. and
Lindberg, B.: A configurable distributed speech
recognition system, Biennial on DSP for in-Vehicle and
Mobile Systems, Sesimbra, Portugal, Sep. 2005.
[4] ETSI Standard ES 202 212. Distributed speech
recognition; extended advanced front-end feature
extraction algorithm; compression algorithm, back-end
speech reconstruction algorithm, November 2003.
[5] The CMU Sphinx Group Open Source Speech
Recognition Engines.
http://cmusphinx.sourceforge.net/html/cmusphinx.php
[6] Larsen, H.L. and Yager, R.R.: The use of fuzzy
relational thesauri for classificatory problem solving in
information retrieval and expert systems. IEEE Trans. on
Systems, Man, and Cybernetics 23(1):31-41, 1993.
[7] Mathiassen, H., Nielsen, N.N. and Pedersen, A.: Mining
Tables from Domain Specific HTML Text, Information
Retrieval Project Report, Aalborg University Esbjerg,
2005.
[8] 3GPP TS 26.243: ANSI-C code for the Fixed-Point
Distributed Speech Recognition Extended Advanced
Front-end, December 2004.
[9] Lindberg, B.: Speechdat, Danish FDB 4000 speaker
database for the fixed telephone network, pp. 198,
March 1999.
[10] Korpus2000: www.korpus.dsl.dk, visited April 2006.
[11] Rasmussen, M.H. and Svendsen, M.T.: Large Vocabulary
Continuous Speech Recognizer for Danish and Language
Model Adaptation, Master Thesis, Aalborg University,
2005.
[12] Harris, Z.: Mathematical Structures of Language. Wiley-Interscience, New York, 1968.
[13] Schofield and Zheng: A speech interface for open-domain
question-answering, Proceedings of the 41st Annual
Meeting of the Association for Computational
Linguistics, July 2003, pp. 177-180.
[14] Chang, E., Seide, F., Meng, H.M., et al.: A system for spoken
query information retrieval on mobile devices, IEEE Trans.
Speech and Audio Proc., 10(8):531-541, 2002.
[15] Chen, B., Chen, Y.-T., Chang, C.-H., et al.: Speech retrieval
of Mandarin broadcast news via mobile devices, Interspeech
2005, Lisbon, Portugal, Sep. 2005.
[16] Reithinger, N. and Sonntag, D.: An integration framework
for a mobile multimodal dialogue system accessing the
semantic web, Interspeech 2005, Lisbon, Portugal, Sep.
2005.
[17] SmartWeb: Mobile Broadband Access to the Semantic Web.
http://www.smartweb-project.org
[18] Lindberg, B., Johansen, F.T., Warakagoda, et al.: A noise
robust multilingual reference recogniser based on
SpeechDat(II), ICSLP 2000, Oct. 2000.