
Voice to Text Extraction:

Voice to text extraction is the process of extracting text from spoken audio. This research aims to
develop a voice to text mining system for the Malay language; specifically, it aims to build a
continuous speech recognition system for Malay using a hybrid approach. Voice to text extraction
systems are stochastic in nature and are typically based on hidden Markov models (HMMs). From a spoken
language perspective, the components of an HMM-based system are as follows.
Feature extractor, which extracts the relevant information from the voice signal, yielding a sequence
of feature observations (vectors). Feature extraction is typically considered a language-independent
process (i.e., common feature extraction algorithms are used in most systems regardless of
language), although in some cases, such as tonal languages, language-specific processing should be
explored.
Acoustic model, which models the relation between the feature vectors and units of spoken form
(sound units such as phones).
Lexicon, which integrates lexical constraints on top of the spoken-unit representation, yielding a
unit representation that is typically common to both the spoken and the written form, such as words
or morphemes.
Language model, which models the syntactic/grammatical constraints of the spoken language over the
unit representation obtained after integrating the lexical constraints.
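To make the HMM formulation above more concrete, the following is a minimal Viterbi decoding sketch. It assumes a toy discrete HMM whose hidden states stand for phones and whose observations are vector-quantized feature symbols; the function `viterbi`, the `TOY_*` tables, the symbols `v1` to `v3` and all probabilities are hypothetical and chosen only for illustration. Practical recognizers typically use continuous emission densities (e.g., Gaussian mixtures) or neural acoustic models, and combine the acoustic scores with the lexicon and language model during search.

```python
import math
from typing import Dict, List, Sequence, Tuple

Table = Dict[str, Dict[str, float]]


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Table,
    emit_p: Table,
) -> Tuple[List[str], float]:
    """Return the most likely hidden state path and its log-probability."""
    if not observations:
        return [], 0.0

    def log(p: float) -> float:
        # Zero probabilities become -inf so impossible paths are never chosen.
        return math.log(p) if p > 0 else float("-inf")

    # scores[t][s]: best log-probability of any path that ends in state s at time t.
    scores: List[Dict[str, float]] = [
        {s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0)) for s in states}
    ]
    backptr: List[Dict[str, str]] = [{}]

    for t in range(1, len(observations)):
        scores.append({})
        backptr.append({})
        for s in states:
            # Best predecessor of state s at time t.
            prev = max(states, key=lambda r: scores[t - 1][r] + log(trans_p[r].get(s, 0.0)))
            scores[t][s] = (
                scores[t - 1][prev]
                + log(trans_p[prev].get(s, 0.0))
                + log(emit_p[s].get(observations[t], 0.0))
            )
            backptr[t][s] = prev

    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Hypothetical toy model: three left-to-right phone states.
TOY_STATES = ("m", "a", "k")
TOY_START = {"m": 0.8, "a": 0.1, "k": 0.1}
TOY_TRANS = {
    "m": {"m": 0.5, "a": 0.5},
    "a": {"a": 0.5, "k": 0.5},
    "k": {"k": 1.0},
}
TOY_EMIT = {
    "m": {"v1": 0.7, "v2": 0.2, "v3": 0.1},
    "a": {"v1": 0.1, "v2": 0.8, "v3": 0.1},
    "k": {"v1": 0.1, "v2": 0.1, "v3": 0.8},
}

if __name__ == "__main__":
    path, log_prob = viterbi(["v1", "v1", "v2", "v3"], TOY_STATES, TOY_START, TOY_TRANS, TOY_EMIT)
    print(path, math.exp(log_prob))
```

With these toy numbers the decoder returns the phone path ['m', 'm', 'a', 'k'] with a probability of about 0.03136 (0.8 × 0.7 × 0.5 × 0.7 × 0.5 × 0.8 × 0.5 × 0.8). Scores are kept in log space because multiplying many small probabilities over a long utterance would otherwise underflow.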
Pre-Processing:
Basic tag removal is performed by an embedded algorithm, where the result depends mainly on how
efficiently and effectively the code is written. Most of the algorithm's design is determined by the
attributes of the raw data (R. Gunasundari, 2012). Tags are removed so that the valuable data, which
carries the information in the extracted text, is preserved.
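As an illustration of basic tag removal, here is a minimal sketch assuming the raw data are HTML/XML-like markup; the helper names `strip_tags` and `_TextCollector` are hypothetical and do not come from the cited work. It uses Python's standard `html.parser` module to keep text content, discard tags, decode character references such as `&amp;`, and skip the contents of `script` and `style` elements.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects text nodes and discards tags (and script/style contents)."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp;, &lt;, ... in text
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(raw: str) -> str:
    """Return the text content of `raw` with tags removed and whitespace normalized."""
    parser = _TextCollector()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_tags("<p>Saya <b>makan</b> nasi</p><script>var x = 1;</script>"))
```

Joining text nodes with a space keeps words in adjacent block elements (e.g. two `<p>` elements) apart, at the cost of splitting a word that is broken by an inline tag in the middle; which trade-off is right depends on the attributes of the raw data, as noted above.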
Spelling checking and correction is one of the core elements of the whole system: it plays the vital
role of identifying spelling errors in the given data set and replacing them with suitable
alternatives. Several major applications can do this for English with high precision, but there are
very few for Malay, and those have high failure rates. The first, and currently the best-known,
example is spell checking in R, which relies on several prebuilt algorithms, together with prebuilt
logical reasoning and a number of language rules, to achieve a high success rate in spelling
correction (Kurt Hornik, 2010). Another prime example related to this study is the Automated Spell
Checker for Malay Language, which shows how Malay text is checked for spelling errors. Using the
traditional method already applied to English, that study showed how to use an N-Gram-like method at
the 2-gram level with stemming. The end result is theoretically acceptable, but in real-world
applications stemming changes the meaning of words in Malay (Surayaini Binti Basri, 2012).
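The meaning-change problem with stemming can be illustrated with a deliberately simple affix stripper. This is a hypothetical sketch, not the stemmer used by Basri (2012): `dictionary_stem`, the affix lists and the three-word root lexicon are assumptions made for illustration, and real Malay morphology (e.g. the sound changes triggered by prefixes) is far richer. The stripper accepts a candidate only if it is a known root, which is a common design for Malay stemmers, yet it still maps nouns onto the verbs they derive from.

```python
from itertools import product
from typing import FrozenSet

# Hypothetical, heavily simplified affix lists and root lexicon (illustration only).
PREFIXES = ("", "di", "ber", "per")
SUFFIXES = ("", "kan", "an", "nya")
ROOTS: FrozenSet[str] = frozenset({"makan", "minum", "main"})


def dictionary_stem(word: str, roots: FrozenSet[str] = ROOTS) -> str:
    """Strip at most one prefix and one suffix, accepting the result only if it is a known root."""
    w = word.lower()
    # The ("", "") pair is tried first, so a word that is already a root is returned unchanged.
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        if w.startswith(prefix) and w.endswith(suffix) and len(w) > len(prefix) + len(suffix):
            candidate = w[len(prefix):len(w) - len(suffix)]
            if candidate in roots:
                return candidate
    return w  # no analysis found: leave the word as it is


if __name__ == "__main__":
    for word in ("makan", "makanan", "minuman", "permainan"):
        print(word, "->", dictionary_stem(word))
```

Here "makanan" (food) is reduced to "makan" (to eat), "minuman" (a drink) to "minum" (to drink) and "permainan" (a game) to "main" (to play): each noun is conflated with a verb, which is the kind of meaning change that makes stemming risky for Malay spelling correction.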
Finally, there is the well-known N-Gram based study "Revised N-Gram based Automatic Spelling
Correction Tool to Improve Retrieval Effectiveness". This paper discusses research on how bigram
combinations can greatly affect the spelling checking ability for certain languages, with additional
help from algorithms and push-pull configurations of RAM slot modal estimations (Farag Ahmed, 2006).
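The general bigram idea behind these studies can be sketched as follows, assuming a small hypothetical Malay word list (`SAMPLE_LEXICON`). Each word is split into character bigrams, padded with `#` at both ends so that the first and last letters also count, and dictionary words are ranked by the Dice coefficient of their bigram sets with the misspelled word. The function names, the `threshold` and `top_n` parameters and the word list are assumptions; this is not the specific algorithm of Ahmed (2006) or Basri (2012).

```python
from typing import Iterable, List, Set, Tuple


def bigrams(word: str) -> Set[str]:
    """Character bigrams of a word, padded with '#' to mark the word boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient 2*|A & B| / (|A| + |B|) between two bigram sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def suggest(
    word: str, lexicon: Iterable[str], top_n: int = 3, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank lexicon entries by bigram similarity to `word`, keeping those above `threshold`."""
    target = bigrams(word)
    scored = [(entry, dice(target, bigrams(entry))) for entry in lexicon]
    scored = [pair for pair in scored if pair[1] >= threshold]
    # Highest similarity first; ties broken alphabetically for deterministic output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical sample lexicon (illustration only).
SAMPLE_LEXICON = ("makan", "makanan", "minum", "main", "rumah")

if __name__ == "__main__":
    for candidate, score in suggest("makn", SAMPLE_LEXICON):
        print(f"{candidate}: {score:.3f}")
```

For the misspelling "makn", the sketch ranks "makan" first (8/11, about 0.727), then "makanan" (about 0.667) and "main" (0.6), while "minum" and "rumah" fall below the threshold. Because no stemming is involved, the candidates are compared as whole surface forms.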
