www.dikte.com.tr
INDEX
Dikte Technology
1. History
2. Highlights
3. Speed, Accuracy, and Vocabulary Size Relation
4. Future
1. History
The Dikte project started in 2000. At the outset, the long-term target vocabulary size was
5 million words. This may seem huge, but Turkish is a heavily suffixing language with
complex morphology. We would later realize that even a 5-million-word vocabulary was
40,000 times smaller than the size actually needed.
A few months after the project started, Dikte could perform isolated-word recognition with
a vocabulary of 100 words.
When the vocabulary size reached 5,000 words in 2002, development of continuous
recognition began. It took about a year to implement three different recognition
algorithms selected from the literature. Since none of them provided enough speed and
accuracy, research into a better recognition algorithm started at the end of 2003. This
research resulted in a new invention: one year later, in 2004, we developed a new
continuous recognition algorithm. It was remarkably fast and accurate, with a capacity of
approximately 200,000 words.
The capacity was limited by memory constraints; in fact, the speed and accuracy were
sufficient for much greater capacities. At this point we started to develop a medical
report dictation application, since the capacity of the technology was sufficient only for
medical speech recognition in Turkish. One year later, in 2005, Medical Dikte was ready
for field testing.
In 2005, the context-free Turkish speech recognition project started. There were two main
challenges:
• developing a language model that can generate words during recognition, because
computers do not have enough memory to store billions of different words;
• optimizing the acoustic recognition algorithm so that it can handle billions of different
words during recognition.
Two years later, in 2007, we had developed a speech recognition system with an
extraordinary vocabulary size of 300 billion words.
2. Highlights
2.1. Incredible Speed and Vocabulary Size of Dikte
The speed of speech recognition algorithms decreases as vocabulary size increases.
While most speech recognition algorithms have a vocabulary size of around 50,000
words, Dikte has a real-time recognition capacity of approximately 300 billion words.
Comparing vocabulary sizes, Dikte handles about 6 million times as many words as
other recognition engines in the same amount of time. This is far beyond the
technology of other speech recognition systems.
Turkish is a very productive language in terms of word forms because of its
agglutinative nature. From a single stem, millions of new word forms can be
generated using inflectional or derivational suffixes.
Considering that more than 20,000 stems are frequently used in Turkish, and that
each stem can take millions of different forms, it is reasonable to assume that a
vocabulary size of 300 billion words is needed. Yet computer memory is not
sufficient to store 300 billion words. To deal with this, Dikte generates word forms
from Turkish stems on the fly: more than 15 million derivatives for each stem.
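The figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in this section; the variable names are illustrative and not part of Dikte.

```python
# Figures quoted in this section (variable names are illustrative).
typical_vocabulary = 50_000           # words, typical recognition engine
dikte_vocabulary = 300_000_000_000    # words, Dikte real-time capacity
frequent_stems = 20_000               # frequently used Turkish stems
forms_per_stem = 15_000_000           # derivatives generated per stem

# Ratio between the two vocabulary sizes.
vocabulary_ratio = dikte_vocabulary // typical_vocabulary

# Word forms reachable from the frequently used stems alone.
generated_forms = frequent_stems * forms_per_stem

print(f"{vocabulary_ratio:,}")   # 6,000,000
print(f"{generated_forms:,}")    # 300,000,000,000
```

Both results agree with the text: a ratio of about 6 million, and 20,000 stems times 15 million forms giving 300 billion words.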
Let's give an example. The Turkish word "al" means "to take" in English. These are
just a few derivatives of "al":
Sample Generated Word From "al"   English Meaning
alsam                             I wish I took
alacaklı                          creditor
almalıydım                        I should have taken
alıcı                             buyer
alabilirsem                       If I can take
almalıysanız                      If you should take
alıcısızlıkla                     With having no buyer (client)
alıcılaşabilenlerdendi            He was one of those who were able to become buyers
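Generating word forms on the fly, rather than storing them, can be illustrated with a toy example. The Python sketch below is not Dikte's language model. It assumes three hand-picked suffix slots (negation, tense or mood, person) that happen to combine correctly with the single stem "al", and it ignores vowel harmony and the many other suffix classes of Turkish. The names `SLOTS` and `generate_forms` are hypothetical.

```python
from itertools import product
from math import prod

# Toy suffix slots for the stem "al" (assumption: chosen by hand so that
# every combination is a well-formed word; no vowel harmony rules).
SLOTS = [
    ["", "ma"],                          # negation
    ["dı", "sa"],                        # past tense, conditional
    ["m", "n", "", "k", "nız", "lar"],   # person endings
]

def generate_forms(stem):
    """Lazily yield the stem followed by one suffix from each slot."""
    for suffixes in product(*SLOTS):
        yield stem + "".join(suffixes)

# The number of forms is the product of the slot sizes.
form_count = prod(len(slot) for slot in SLOTS)

# Materialized here only for display; the generator itself yields
# one form at a time.
forms = list(generate_forms("al"))
print(form_count)   # 24
print(forms[:4])
```

Each additional slot with n options multiplies the number of forms by n, while the generator never has to keep the full list in memory.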
2.2. Perplexity
One popular measure of the difficulty of a speech recognition task, combining the
vocabulary size and the language model, is perplexity, loosely defined as the
geometric mean of the number of words that can follow a word after the language
model has been applied. By a common classification, a system is considered large
if its perplexity is higher than 100.
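The loose definition above can be made concrete with a small calculation. The Python sketch below assumes a hypothetical successor table that lists which words a language model permits after each word, and computes the geometric mean of those counts. It follows only the loose definition given here, not the probability-weighted definition used in formal evaluations; the function name and both toy tables are illustrative.

```python
import math

def branching_perplexity(allowed_next):
    """Geometric mean of the number of words allowed after each word.

    Assumes every word has at least one permitted successor.
    """
    counts = [len(successors) for successors in allowed_next.values()]
    # Geometric mean computed in log space to avoid overflow.
    return math.exp(sum(math.log(c) for c in counts) / len(counts))

# Constrained toy grammar: only a few words may follow each word.
constrained = {
    "open": ["file", "window"],
    "close": ["file", "window"],
    "file": ["open", "close", "window", "file"],
    "window": ["open"],
}

# Free word order: any word in the vocabulary may follow any word.
vocabulary = ["open", "close", "file", "window"]
free_order = {word: vocabulary for word in vocabulary}

print(round(branching_perplexity(constrained), 6))  # 2.0
print(round(branching_perplexity(free_order), 6))   # 4.0
```

When any word may follow any word, this value equals the vocabulary size itself.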
Dikte realizes the impossible. The perplexity of Dikte is 300 billion, which is 3 billion
times that of ordinary large systems. No existing speech recognition system other
than Dikte can cope with such a huge perplexity.
The perplexity of Dikte is huge because Turkish has no strict word order: each word
can be followed by any other word. Here is an example. The Turkish word "bak" means
"to look at" in English, and "al" is the same word as in the previous example.
Sample Turkish Word Sequence   English Meaning
Aldığına bakmadı               He did not look at what he bought
Baktığını almadı               He did not buy what he looked at
Dikte can overcome such a huge difficulty very successfully.
2.3. Computational Efficiency
Real-time recognition with a vocabulary of billions of words means unbeatable
efficiency. This massive task is accomplished on a single-core CPU. Dikte makes
remarkably efficient use of the available processing power and memory.
No other speech recognition system can provide such hard-to-believe efficiency.
Efficiency is vital for mobile speech recognition, since resources are limited. Hence,
the efficiency of Dikte will have a dramatic effect on mobile speech recognition.
2.4. Accuracy
The huge complexity that Dikte faces requires a much more detailed and accurate
recognition capability than known technologies can provide.
Again, the accuracy of recognition algorithms diminishes as vocabulary size
increases. The accuracy of Dikte is more than 97% at a vocabulary size of 300 billion
words, and rises to more than 99% at a capacity of 100,000 words.
Speed means accuracy, since a faster recognition algorithm can process many more
candidate paths (word sequences) in a given time period.
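The link between speed and accuracy can be illustrated with beam search, a common decoding strategy in speech recognition. The source does not say which search Dikte uses, so the Python sketch below is only an illustration under stated assumptions: the lattice, its log scores, and the function name `beam_search` are all hypothetical. A faster decoder can afford a wider beam in the same time, and a wider beam can recover a word sequence that a narrow beam prunes too early.

```python
# Hypothetical word lattice: word -> list of (next word, log score).
LATTICE = {
    "<s>": [("aldığına", -1.0), ("baktığını", -2.0)],
    "aldığına": [("bakmadı", -5.0)],
    "baktığını": [("almadı", -1.0)],
}

def beam_search(lattice, start, steps, beam_width):
    """Return the best (score, path) found while keeping beam_width paths."""
    beam = [(0.0, [start])]
    for _ in range(steps):
        expanded = [
            (score + step_score, path + [word])
            for score, path in beam
            for word, step_score in lattice.get(path[-1], [])
        ]
        if not expanded:
            break
        # Keep only the highest-scoring partial paths.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]

narrow = beam_search(LATTICE, "<s>", steps=2, beam_width=1)
wide = beam_search(LATTICE, "<s>", steps=2, beam_width=2)
print(narrow)  # (-6.0, ['<s>', 'aldığına', 'bakmadı'])
print(wide)    # (-3.0, ['<s>', 'baktığını', 'almadı'])
```

With a beam of one path, the locally best first word is kept and the better overall sequence is lost; with two paths, the decoder finds it.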
2.6. Speaker Independence
Advanced learning algorithms have made Dikte much more speaker-independent
than other speech recognition systems. It is possible to build highly accurate
speaker-independent speech recognition systems at a vocabulary size of about 100K.
2.9. Dikte Microphone
Engineers on the Dikte team devised a special 2-channel desktop microphone that
cancels noise and has directional sensitivity. The Dikte Microphone makes wearing a
headset unnecessary.
The Dikte Microphone is such a powerful solution that it makes using Dikte possible
while loud music is playing.
The Dikte Microphone also eliminates speech coming from unwanted directions. To
illustrate, it is possible to run 8 dictation systems in one room with users only 1 m
apart.
3. Speed, Accuracy, and Vocabulary Size Relation