
Dikte Technology

www.dikte.com.tr
INDEX

• 1. History
• 2. Highlights
• 3. Speed, Accuracy, and Vocabulary Size Relation
• 4. Future

1. History
The Dikte project started in 2000. At the outset, the long-term target vocabulary size was
5 million words. This may seem huge, but Turkish is a suffixing language with complex
morphology. We would later realize that even a 5-million-word vocabulary was 40,000 times
smaller than the size actually needed.
A few months after the project started, Dikte was able to perform isolated recognition with
a vocabulary size of 100 words.
When the vocabulary size reached 5,000 words in 2002, development of continuous
recognition began. It took about a year to implement 3 different recognition algorithms
selected from the literature. Since none of them provided enough speed and accuracy,
research into a better recognition algorithm started at the end of 2003. This research
resulted in a new invention: one year later, in 2004, we developed a new continuous
recognition algorithm. It was incredibly fast and accurate at a capacity of approximately
200,000 words.

The capacity was limited by available memory. In fact, the speed and accuracy were
sufficient for much greater capacities. At this point we started to develop a medical
report dictation application, since the capacity of the technology was sufficient only for
medical speech recognition in Turkish. One year later, in 2005, Medical Dikte was ready
for field testing.
In 2005, the context-free Turkish speech recognition project started. There were two main
challenges:
• developing a language model that can generate words during recognition, because
computers do not have enough memory to store billions of different words.
• optimizing the acoustic recognition algorithm so that it can handle billions of different
words during recognition.
Two years later, in 2007, we had developed a speech recognition system with an incredible
vocabulary size of 300 billion words.

2. Highlights
2.1. Incredible Speed and Vocabulary Size of Dikte
The speed of speech recognition algorithms decreases as vocabulary size increases.
While most speech recognition systems have a vocabulary of around 50,000 words,
Dikte has a real-time recognition capacity of approximately 300 billion words.
Comparing vocabulary sizes, Dikte handles about 6 million times more words than
other recognition engines in the same time. This is far beyond the technology of
other speech recognition systems.
Turkish is a very productive language in terms of word forms because of its
agglutinative nature. From a single stem, millions of word forms can be generated
using inflectional and derivational suffixes.
Given that more than 20,000 stems are frequently used in Turkish, and that each
stem can take millions of different forms, it is reasonable to assume that a
vocabulary of 300 billion words is needed. Yet computer memory is not sufficient
to store 300 billion words. To deal with this issue, Dikte generates word forms
from Turkish stems on the fly, more than 15 million derivatives for each stem.
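
The figures quoted above can be cross-checked with simple arithmetic. The short sketch
below only multiplies and divides the numbers given in the text; the variable names are
illustrative, and because the text says "more than" 20,000 stems and 15 million
derivatives, the results should be read as rough lower bounds.

```python
# Figures quoted in the text; the variable names are illustrative only.
FREQUENT_STEMS = 20_000          # "more than 20,000 stems"
FORMS_PER_STEM = 15_000_000      # "more than 15 million derivatives for each stem"
TYPICAL_VOCABULARY = 50_000      # vocabulary of most other systems, per the text

# Total number of word forms implied by the two figures above.
total_forms = FREQUENT_STEMS * FORMS_PER_STEM

# How many times larger that is than a typical 50,000-word vocabulary.
ratio = total_forms // TYPICAL_VOCABULARY

print(f"{total_forms:,} word forms")
print(f"{ratio:,} times a {TYPICAL_VOCABULARY:,}-word vocabulary")
```

The product is 300 billion and the ratio is 6 million, which matches the figures stated
above.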

Let's give an example. The Turkish word “al” means "to take" in English. These are
just a few derivatives of “al”:
Sample Generated Word From "al"    English Meaning
alsam                              I wish I took
alacaklı                           creditor
almalıydım                         I should have taken
alıcı                              buyer
alabilirsem                        If I can take
almalıysanız                       If you should take
alıcısızlıkla                      With having no buyer (client)
alıcılaşabilenlerdendi             He was one of those who were able to become a buyer

In Turkish, suffixes express subject, tense, condition, wish, ability, ownership,
opposition, location, and so forth. Speakers can form various combinations of suffixes.
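
To make the idea of generating word forms on the fly more concrete, here is a minimal
sketch. It is not Dikte's language model, which the slides do not describe. The suffix
graph, its state names, and the function `generate_forms` are hypothetical, hand-built
for the stem "al" only, and vowel harmony is hard-coded rather than modelled.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy suffix graph for the back-vowel stem "al" (an assumption made
# for illustration). Each state lists (suffix, next_state) pairs that may follow.
SUFFIX_GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "STEM": [("malı", "NECESSITY"), ("abil", "ABILITY"), ("sa", "COND_BACK")],
    "NECESSITY": [("ydı", "PAST"), ("ysa", "COND_BACK")],
    "ABILITY": [("ir", "AORIST")],
    "AORIST": [("se", "COND_FRONT")],
    "PAST": [("m", "END")],
    "COND_BACK": [("m", "END"), ("nız", "END")],
    "COND_FRONT": [("m", "END"), ("niz", "END")],
}


def generate_forms(word: str, state: str = "STEM") -> Iterator[str]:
    """Lazily yield every complete word reachable from `state`.

    Words are built one suffix at a time and never stored in a table,
    so memory use does not grow with the number of possible forms.
    """
    if state == "END":
        yield word
        return
    for suffix, next_state in SUFFIX_GRAPH[state]:
        yield from generate_forms(word + suffix, next_state)


forms = list(generate_forms("al"))
print(len(forms), forms)
```

Even this seven-state graph yields forms such as "almalıydım", "alabilirsem", and
"almalıysanız" from the table above without listing them anywhere in advance; a real
system would need a far larger graph and proper phonological rules.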

2.2. Perplexity
One popular measure of the difficulty of a speech recognition task, combining the
vocabulary size and the language model, is perplexity, loosely defined as the
geometric mean of the number of words that can follow a word after the language
model has been applied. By a common classification, a system whose perplexity is
higher than 100 is considered large.
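
As a small illustration of this definition, the sketch below computes perplexity from the
probabilities a language model assigns to each word of a sequence. The function name
`perplexity` and the sample probabilities are assumptions made for the example. It uses
the standard formulation: the exponential of the average negative log-probability, which
equals the geometric mean of the inverse probabilities.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """Geometric mean of 1/p over the words of a sequence.

    `word_probs` holds the probability the language model assigned to each
    word that actually occurred, given the words before it.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    mean_log = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-mean_log)


# If every one of 100 words is equally likely at each step, the perplexity
# equals the number of choices (up to floating-point rounding).
uniform = perplexity([1 / 100] * 5)

# Mixed case: branching factors of 2 and 8 have a geometric mean of 4.
mixed = perplexity([0.5, 0.125])

print(round(uniform, 6), round(mixed, 6))
```

When any word can follow any other with equal probability, the perplexity therefore
equals the vocabulary size.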
Dikte realizes the impossible. The perplexity of Dikte is 300 billion, which is 3 billion
times that of ordinary large systems. No existing speech recognition system other
than Dikte can cope with such a huge perplexity.
The perplexity of Dikte is huge because Turkish has no strict word order: each word
can be followed by any word. Here is an example. The Turkish word "bak" means "to
look at" in English, and "al" is the same word as in the previous example.
Sample Turkish Word Sequence    English Meaning
Aldığına bakmadı                He did not look at what he bought
Baktığını almadı                He did not buy what he looked at

Dikte can overcome such a huge difficulty very successfully.
2.3. Computational Efficiency
Real-time recognition with a vocabulary in the billions means unbeatable efficiency.
This massive task is accomplished on a single-core CPU. Dikte makes exceptionally
economical use of the available processing power and memory.
No other speech recognition system provides such hard-to-believe efficiency.
Efficiency is vital for mobile speech recognition, where resources are limited. Hence,
the efficiency of Dikte will have a dramatic effect on mobile speech recognition.

2.4. Accuracy
The huge complexity that Dikte faces requires a much more detailed and accurate
recognition capability than known technologies can provide.
Again, the accuracy of recognition algorithms diminishes as vocabulary size
increases. The accuracy of Dikte is more than 97% at a vocabulary size of 300 billion
words, and rises to more than 99% at a capacity of 100,000 words.
Speed means accuracy, since the recognition algorithm can process many more
candidate paths (word sequences) in a given time period.
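
The link between speed and accuracy can be illustrated with a toy search. The slides do
not describe Dikte's algorithm, so the sketch below uses beam search purely as a familiar
stand-in; the words, probabilities, and the function `best_path` are all invented for the
example. A faster engine can afford a wider beam, that is, more surviving candidate paths.

```python
from typing import Dict, List, Tuple

# Hypothetical toy probabilities, invented for illustration only.
FIRST_WORD: Dict[str, float] = {"A": 0.6, "B": 0.4}
NEXT_WORD: Dict[str, Dict[str, float]] = {
    "A": {"x": 0.5, "y": 0.5},
    "B": {"x": 0.9, "y": 0.1},
}


def best_path(beam_width: int) -> Tuple[float, List[str]]:
    """Return the best two-word path found when only `beam_width`
    first-word hypotheses survive pruning."""
    first = sorted(((p, [w]) for w, p in FIRST_WORD.items()), reverse=True)
    survivors = first[:beam_width]
    # Extend every surviving hypothesis by each possible second word.
    candidates = [
        (p * q, path + [w])
        for p, path in survivors
        for w, q in NEXT_WORD[path[-1]].items()
    ]
    return max(candidates)


narrow = best_path(beam_width=1)  # only the locally best first word is kept
wide = best_path(beam_width=2)    # both first words are explored
print(narrow, wide)
```

With a beam of 1 the search commits to "A" and can reach a path probability of at most
0.3, while a beam of 2 also explores "B" and finds the better path "B x" at about 0.36.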

2.5. Immunity To Noise and Robustness


The advanced signal processing algorithms of Dikte are highly immune to environmental
noise, especially in an office or hospital. For example, 10 speakers, each 1.5 m
apart, can easily use Dikte in the same room.

2.6. Speaker Independence
Advanced learning algorithms make Dikte much more speaker independent than other
speech recognition systems. It is possible to build highly accurate
speaker-independent speech recognition systems at a vocabulary size of about 100K.

2.7. Direct Support for SIMD


SIMD (Single Instruction, Multiple Data) instructions are parallel CPU operations
supported by the main CPU manufacturers. Compilers typically do not emit SIMD
instructions on their own for ordinary code, so Dikte uses them explicitly in
frequently called functions. SIMD instructions provide extra speed on top of the
incredible recognition algorithm of Dikte.

2.8. Support for Multi-Core CPUs


The multi-threaded architecture of Dikte makes use of every core and CPU in the
system. Normally a single core is enough, but multiple cores provide greater speaker
independence and accuracy.

2.9. Dikte Microphone
Engineers on the Dikte team devised a special 2-channel desktop microphone that
cancels noise and has directional sensitivity. The Dikte Microphone makes wearing
a headset unnecessary.
The Dikte Microphone is such a powerful solution that Dikte can be used even while
loud music is playing.
The Dikte Microphone also eliminates speech coming from unwanted directions. To
illustrate, it is possible to run 8 dictation systems in one room when users are
only 1 m apart.

3. Speed, Accuracy, and Vocabulary Size Relation

3.1. Accuracy Decreases as Vocabulary Size Increases


An increase in vocabulary size also increases entropy (uncertainty) and ambiguity.
Speech recognition algorithms compare the observation (microphone input) with
statistically learned models. Since they have no capability to recognize meaning,
the error rate increases as vocabulary size increases.
Generally, doubling the vocabulary size doubles the error rate.

3.2. Speed Decreases as Vocabulary Size Increases


For almost all speech recognition systems, the word is the basic unit of
recognition. The size of the vocabulary directly affects recognition speed and
memory requirements.
Generally, doubling the vocabulary size doubles the time required for recognition;
in other words, it halves the speed.


3.3. Accuracy Decreases As Speed Increases


Despite many improvements in computer technology, today's computers still do not
have sufficient processing power and memory for an ambitious speech recognition
system. Therefore, most large-vocabulary speech recognition systems sacrifice some
accuracy for speed.
Although accuracy can be increased by using more processing power, at some point
increasing accuracy through additional computation becomes impractical. To
illustrate, increasing accuracy by only 1% may require doubling the processing power.
3.4. Realizing The Impossible
The largest speech recognition systems have a vocabulary size of about 100K
words.
Increasing the vocabulary size a million times means a million-fold increase in
uncertainty, workload, and processing and memory requirements. Dikte realizes the
impossible: real-time speech recognition with a 300 billion word vocabulary size.
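
To put the rule of thumb from sections 3.1 and 3.2 next to these figures, the sketch below
works out how far a 300 billion word vocabulary is from a 100K one. It assumes, as the
rule states, that cost grows in direct proportion to vocabulary size; the function names
are illustrative.

```python
import math


def scale_factor(base_vocab: int, target_vocab: int) -> float:
    """How many times larger the target vocabulary is than the base one.

    Under the proportional rule of thumb, this is also the factor by which
    recognition time and error rate would grow for a conventional system.
    """
    return target_vocab / base_vocab


def doublings(base_vocab: int, target_vocab: int) -> float:
    """Number of successive doublings needed to go from base to target."""
    return math.log2(scale_factor(base_vocab, target_vocab))


factor = scale_factor(100_000, 300_000_000_000)
steps = doublings(100_000, 300_000_000_000)
print(f"{factor:,.0f} times larger, about {steps:.1f} doublings")
```

Going from 100K to 300 billion words is a factor of 3 million, or roughly 21.5
doublings, which is the order of magnitude described above.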
4. Future
4.1. The Future of Speech Recognition
Speech is the most natural way for humans to communicate. Speech recognition
is the key to giving computer systems a natural user interface, and it is expected
to revolutionize how people use their computers and other devices. In the near
future, speech recognition will become the method of choice for controlling
appliances, toys, tools, computers, and robots.
In reality, despite some success, speech recognition technology is still in its
infancy and far from meeting expectations. There is a pressing need for new
technologies like Dikte, which is ready to remove many obstacles and will have
dramatic effects as it spreads to other languages and applications.
