Soft Computing

Optical Character Recognition (OCR)
For Handwritten Devnagri Numerals

Pratik Gupta
Dept. of Computer Science
Sharda University, Greater Noida
Centre for Pattern Analysis and Recognition
Outline
Introduction to Handwritten OCR systems
Devnagri Handwritten Numeral Database
Proposed OCR Methodology
Experimental Results
Future Work
OCR Systems
OCR systems consist of four major stages:
Image Processing & Quality Improvement
Character Size Normalization
Feature Extraction
Classification
Pre-processing
The raw data is subjected to a number of preliminary
processing steps to make it usable in the descriptive stages of
character analysis. Pre-processing aims to produce data on which
the OCR system can operate accurately. The main objectives of
pre-processing in this method are:
Binarization
Noise reduction
Binarization
Document image binarization (thresholding) refers to the
conversion of a gray-scale image into a binary image.
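
As an illustration of thresholding, here is a minimal sketch in plain Python. The slides do not say how the threshold is chosen, so the fixed global value of 128, the nested-list image layout, and the convention that dark ink maps to 1 are all assumptions made for this example.

```python
def binarize(gray, threshold=128):
    """Convert a gray-scale image (rows of 0-255 ints) to a binary image.

    Assumption: dark ink on light paper, so pixels darker than
    `threshold` become 1 (ink) and all others become 0 (background).
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3x4 gray-scale patch containing a short dark stroke.
gray = [
    [250, 240, 30, 245],
    [248, 20, 25, 251],
    [252, 247, 35, 249],
]
binary = binarize(gray)
for row in binary:
    print(row)
```

In practice the threshold is often estimated from the image histogram rather than fixed; that choice is outside what the slides describe.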
Noise Reduction
Noise reduction improves the quality of the document image. The
main approach is:
Morphological operations (erosion, dilation, etc.)
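
The two basic morphological operations can be sketched as follows. This is an illustration only: it assumes a binary image stored as nested lists (1 = ink), a 3x3 square structuring element, and background beyond the image border, none of which is specified in the slides. Erosion followed by dilation (an "opening") removes specks smaller than the structuring element.

```python
def neighbourhood(img, r, c):
    """Return the 3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    return [
        img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in (r - 1, r, r + 1)
        for cc in (c - 1, c, c + 1)
    ]


def erode(img):
    """Keep a pixel only if every pixel in its 3x3 neighbourhood is ink."""
    return [[min(neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighbourhood is ink."""
    return [[max(neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img):
    """Erosion followed by dilation: removes small isolated specks."""
    return dilate(erode(img))


# Hypothetical image: a 3x3 blob of ink plus one isolated noise pixel at (2, 5).
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy)
for row in cleaned:
    print(row)
```

The opening keeps the blob intact and drops the single-pixel speck; thin strokes narrower than the structuring element would also be removed, so the element size has to suit the stroke width.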
CPAR Devnagari Handwritten Character Database
The database was developed
at CPAR, Sharda University.

Currently it has more than 80,000
numerals and 1.25 lakh (125,000) Devnagri
characters.

More than 5,000 Hindi pangrams
for document recognition.

Numerals from the CPAR database
are used on a small scale for this project:

More than 1,300 numerals
About 120 variations of each numeral
Database Creation Program
Feature Extraction
In the feature extraction stage, each character is represented as
a feature vector, which becomes its identity. The major goal
of feature extraction is to extract a set of features that
maximizes the recognition rate with the fewest elements.
Because handwriting has a high degree of variability and
imprecision, obtaining these features is a difficult task.
Profiles
The profile counts the number of pixels (distance) between
the bounding box of the character image and the edge of the
character. The profiles describe the external shapes of
characters well and make it possible to distinguish between a
great number of letters.
Figure: Left Profile, Right Profile, Top Profile, Bottom Profile
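
The four profiles can be sketched as below. This is a hedged illustration, not the project's code: it assumes the binary image (1 = ink) has already been cropped to the character's bounding box, and that a row or column with no ink is assigned the full width or height as its distance. The function names are hypothetical.

```python
def left_profile(img):
    """Per row: background pixels between the left edge and the first ink pixel."""
    width = len(img[0])
    return [row.index(1) if 1 in row else width for row in img]


def right_profile(img):
    """Per row: background pixels between the right edge and the last ink pixel."""
    width = len(img[0])
    return [row[::-1].index(1) if 1 in row else width for row in img]


def transpose(img):
    """Turn columns into rows so the row-wise functions can be reused."""
    return [list(col) for col in zip(*img)]


def top_profile(img):
    """Per column: background pixels between the top edge and the first ink pixel."""
    return left_profile(transpose(img))


def bottom_profile(img):
    """Per column: background pixels between the bottom edge and the last ink pixel."""
    return right_profile(transpose(img))


def profile_features(img):
    """Concatenate the four profiles into one feature vector."""
    return (left_profile(img) + right_profile(img)
            + top_profile(img) + bottom_profile(img))


# Hypothetical 4x4 glyph: a horizontal bar with a diagonal stroke below it.
glyph = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]
print("left  ", left_profile(glyph))
print("right ", right_profile(glyph))
print("top   ", top_profile(glyph))
print("bottom", bottom_profile(glyph))
```

For an image of height H and width W this gives a vector of 2H + 2W elements, which is why the character size is normalized before feature extraction.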
Classification
Neural Network: Pattern recognition can be implemented by
using a feed-forward neural network that has been trained
accordingly. During training, the network learns to associate
outputs with input patterns. When the network is used, it
identifies the input pattern and tries to output the associated
output pattern.
There is no such thing as the best classifier. The choice of
classifier depends on many factors, such as the available
training set, the number of free parameters, etc.
Common classifiers: k-Nearest Neighbour (k-NN), Neural Network (NN),
Support Vector Machines (SVM), etc.
We have used a feed-forward neural network in this project.
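
To make the training and recognition phases concrete, here is a small feed-forward network in plain Python. The slides do not give the architecture or training settings, so everything here is an assumption for illustration: one sigmoid hidden layer, a softmax output, cross-entropy loss, per-sample gradient descent, and a toy two-class training set of made-up profile-like vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForwardNet:
    """One sigmoid hidden layer followed by a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        probs = softmax([sum(w * h for w, h in zip(row, hidden)) + b
                         for row, b in zip(self.w2, self.b2)])
        return hidden, probs

    def train_step(self, x, target, lr):
        """One gradient-descent step on the cross-entropy loss of one sample."""
        hidden, probs = self.forward(x)
        # Softmax with cross-entropy: output error is probability minus one-hot target.
        d_out = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        # Back-propagate to the hidden layer before the output weights change.
        d_hidden = [h * (1.0 - h) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                    for j, h in enumerate(hidden)]
        for k, dk in enumerate(d_out):
            for j, h in enumerate(hidden):
                self.w2[k][j] -= lr * dk * h
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def loss(self, samples):
        """Mean cross-entropy over (input, class index) pairs."""
        return -sum(math.log(self.forward(x)[1][t]) for x, t in samples) / len(samples)

    def predict(self, x):
        probs = self.forward(x)[1]
        return probs.index(max(probs))


# Toy training set: two hypothetical feature vectors scaled to [0, 1], one per class.
samples = [([0.0, 0.75, 0.5, 0.25], 0), ([0.75, 0.0, 0.25, 0.5], 1)]
net = FeedForwardNet(n_in=4, n_hidden=3, n_out=2)
loss_before = net.loss(samples)
for _ in range(500):
    for x, t in samples:
        net.train_step(x, t, lr=0.5)
loss_after = net.loss(samples)
print("loss before:", loss_before, "loss after:", loss_after)
print("predictions:", [net.predict(x) for x, _ in samples])
```

For the numeral task the input size would equal the length of the profile feature vector and the output layer would have one unit per class (11 in the results table, since 9 appears in two written forms).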
Profile-Based Feature Extraction Methodology
Experimental Results
Confusion matrix (rows: actual numeral; columns: recognized numeral)

Numeral     0    1    2    3    4    5    6    7    8 9(1) 9(2)  Total      %age
0          57    1    0    0    0    0    0    3    0    0    0     61   93.4426
1           0   49    1    2    0    3    0    1    0    0    5     61   80.3279
2           0    1   55    8    0    0    0    0    4    1    1     70   78.5714
3           0    1    3   53    1    0    0    0    1    1    0     60   88.3333
4           0    1    0    0   49    7    0    2    0    1    0     60   81.6667
5           0    0    4    1    1   52    0    0    0    1    0     59   88.1356
6           0    0    0    0    1    0   58    2    1    4    1     67   86.5672
7           1    0    0    0    3    0    0   55    0    0    0     59   93.2203
8           0    1    1    0    1    0    0    0   48    3    2     56   85.7143
9(1)        0    0    0    0    2    1    3    0    0   47    0     53   88.6792
9(2)        0    2    1    0    0    1    0    0    1    1   51     57   89.4737

Average Recognition: 86.7393%
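
The per-numeral percentages and the average can be recomputed from the confusion matrix as sketched below. The counts are copied from the table above; the variable names are illustrative. The reported average corresponds to the mean of the eleven per-class rates, which is an observation from the arithmetic rather than something stated in the slides.

```python
labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9(1)", "9(2)"]

# Rows: actual numeral; columns: recognized numeral (same order as `labels`).
confusion = [
    [57, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0],
    [0, 49, 1, 2, 0, 3, 0, 1, 0, 0, 5],
    [0, 1, 55, 8, 0, 0, 0, 0, 4, 1, 1],
    [0, 1, 3, 53, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 49, 7, 0, 2, 0, 1, 0],
    [0, 0, 4, 1, 1, 52, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 58, 2, 1, 4, 1],
    [1, 0, 0, 0, 3, 0, 0, 55, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 48, 3, 2],
    [0, 0, 0, 0, 2, 1, 3, 0, 0, 47, 0],
    [0, 2, 1, 0, 0, 1, 0, 0, 1, 1, 51],
]

# Samples per numeral are the row sums; correct recognitions lie on the diagonal.
totals = [sum(row) for row in confusion]
correct = [confusion[i][i] for i in range(len(labels))]
rates = [100.0 * c / t for c, t in zip(correct, totals)]

for label, total, rate in zip(labels, totals, rates):
    print(f"{label:>4}  {total:3d}  {rate:.4f}")

# Mean of the per-class rates (each numeral weighted equally).
average = sum(rates) / len(rates)
print(f"Average recognition: {average:.4f}")
```

Because the classes have different sample counts, this class-averaged figure differs slightly from the overall fraction of correctly recognized samples, `sum(correct) / sum(totals)`.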
Publications
1. R. Sukhaswami, P. Seetharamulu and A.K. Pujari, Recognition of
Telugu characters using neural networks, International Journal
of Neural Systems, 6 (1995), pp. 317-357.
2. N.P. Banashree, D. Andhre, R. Vasanta and P.S. Satyanarayana, OCR
for script identification of Hindi (Devnagari) numerals using error
diffusion halftoning algorithm with neural classifier, Proceedings of
World Academy of Science, Engineering and Technology, 20 (2007), pp.
46-50.
3. An Efficient Feature Extraction and Dimensionality Reduction
Scheme for Isolated Greek Handwritten Character Recognition, 9th
International Conference on Document Analysis and Recognition (ICDAR
2007), Curitiba, Brazil, September 2007.
Future Work
Creating new hierarchical classification schemes based
on rules after examining the corresponding confusion
matrix.
Exploiting new features to improve the current performance.
