The document discusses optical character recognition (OCR) for handwritten Devnagri numerals. It describes the proposed OCR methodology, which includes preprocessing techniques like binarization and noise reduction, as well as feature extraction using character profiles. An experiment was conducted on a Devnagri numeral database containing over 1300 numerals using a neural network classifier. The methodology achieved an average recognition rate of 86.7%. Future work involves developing new classification schemes and exploring additional features.
Pratik Gupta
Dept. of Computer Science, Sharda University, Greater Noida
Centre for Pattern Analysis and Recognition

Outline
Introduction to Handwritten OCR systems
Devnagri Handwritten Numeral Database
Proposed OCR Methodology
Experimental Results
Future Work

OCR Systems
OCR systems consist of four major stages:
Image Processing & Quality Improvement
Normalize Character Size
Feature Extraction
Classification
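The size-normalization stage can be illustrated with a short sketch. The slides do not say how characters are resized, so everything here is an assumption made for illustration: the function name `normalize_size`, the 32x32 target grid, and the use of nearest-neighbour resampling on an image stored as a list of rows.

```python
def normalize_size(img, size=32):
    """Resample a 2-D image (list of rows) to size x size by nearest neighbour.

    Assumption: `img` is non-empty and rectangular. The target size of 32
    is illustrative, not a value taken from the slides.
    """
    h, w = len(img), len(img[0])
    return [
        # Map each target pixel back to the source pixel that covers it.
        [img[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]


if __name__ == "__main__":
    tiny = [[1, 0],
            [0, 1]]
    for row in normalize_size(tiny, size=4):
        print(row)
```

Bringing every character to the same grid means that later stages always see inputs of a fixed size, whatever the size of the original handwriting.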

Pre-processing
The raw data is subjected to a number of preliminary processing steps to make it usable in the descriptive stages of character analysis. Pre-processing aims to produce data on which the OCR system can operate accurately. The main objectives of pre-processing in this method are:
Binarization
Noise reduction
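The two pre-processing objectives can be sketched in a few lines. This is an illustrative sketch, not the project's implementation. It assumes a gray-scale image stored as rows of 0-255 integers, dark ink on a light background, a fixed global threshold of 128, and a 3x3 structuring element; none of these parameters are stated in the slides. Noise reduction is shown as a morphological opening (erosion followed by dilation), one common way of combining the two operations.

```python
def binarize(gray, threshold=128):
    """Convert a gray-scale image to 0/1; pixels darker than threshold become 1 (ink)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def _neighbourhood(img, r, c):
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0


def erode(img):
    """Keep a pixel only if its whole 3x3 neighbourhood is ink."""
    return [[1 if all(_neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighbourhood is ink."""
    return [[1 if any(_neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def opening(img):
    """Erosion followed by dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(img))


if __name__ == "__main__":
    # A 3x3 ink block plus one isolated noise pixel in the corner.
    gray = [[0 if (1 <= r <= 3 and 1 <= c <= 3) or (r, c) == (5, 5) else 255
             for c in range(6)] for r in range(6)]
    for row in opening(binarize(gray)):
        print(row)
```

Opening removes specks smaller than the structuring element while restoring the shape of larger strokes. Strokes narrower than three pixels would be removed as well, so the element size has to suit the stroke width of the scanned handwriting.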

Binarization
Document image binarization (thresholding) refers to the conversion of a gray-scale image into a binary image.

Noise Reduction
Noise reduction improves the quality of the document. The main approach is morphological operations (erosion, dilation, etc.).

CPAR Devnagari Handwritten Character Database
The database was developed at CPAR, Sharda University.
It currently contains more than 80,000 numerals and 1.25 lakh (125,000) Devnagri characters.
It also contains more than 5,000 Hindi pangrams for document recognition.
A small subset of the numerals from the CPAR database is used for this project:
More than 1300 numerals
About 120 variations of each numeral

Database Creation Program

Feature Extraction
In the feature extraction stage, each character is represented as a feature vector, which becomes its identity. The major goal of feature extraction is to extract a set of features that maximizes the recognition rate with the least number of elements. Because handwriting is highly variable and imprecise, obtaining these features is a difficult task.

Profiles
A profile counts the number of pixels (the distance) between the bounding box of the character image and the edge of the character. Profiles describe the external shape of a character well and make it possible to distinguish between a great number of letters.

[Figure panels: Left Profile, Right Profile]
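The profile definition above translates directly into code. The following is a minimal sketch, assuming a binary image stored as rows of 0/1 values with 1 marking ink and at least one ink pixel present. The function names, and the convention of reporting the full width or height for a row or column that contains no ink, are assumptions made for illustration.

```python
def bounding_box(img):
    """Return (top, bottom, left, right) of the ink pixels, inclusive.

    Assumption: the image contains at least one ink pixel.
    """
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def _first_ink(line):
    """Count background pixels before the first ink pixel (full length if none)."""
    for i, px in enumerate(line):
        if px:
            return i
    return len(line)


def profiles(img):
    """Compute left, right, top and bottom profiles inside the bounding box."""
    top, bottom, left, right = bounding_box(img)
    box = [row[left:right + 1] for row in img[top:bottom + 1]]
    columns = [list(col) for col in zip(*box)]
    return {
        "left": [_first_ink(row) for row in box],           # one value per row
        "right": [_first_ink(row[::-1]) for row in box],    # scanned from the right edge
        "top": [_first_ink(col) for col in columns],        # one value per column
        "bottom": [_first_ink(col[::-1]) for col in columns],
    }


if __name__ == "__main__":
    sample = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0],
    ]
    print(profiles(sample))
```

Concatenating the four lists gives one feature vector per character. For a fixed-length vector, the cropped character would need to be scaled to a common size before the profiles are taken.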
[Figure panels: Top Profile, Bottom Profile]

Classification
Neural Network: Pattern recognition can be implemented by using a feed-forward neural network that has been trained accordingly. During training, the network learns to associate outputs with input patterns. When the network is used, it identifies the input pattern and tries to output the associated output pattern.
There is no such thing as the best classifier. The choice of classifier depends on many factors, such as the available training set and the number of free parameters. Options include k-Nearest Neighbour (k-NN), Neural Network (NN), and Support Vector Machines (SVM). We have used a feed-forward neural network in this project.

Profile Based Feature Extraction Methodology

Experimental Results
Rows give the actual numeral and columns the recognized numeral.

Actual    0    1    2    3    4    5    6    7    8  9(1)  9(2)  Total  Rate (%)
0        57    1    0    0    0    0    0    3    0     0     0     61   93.4426
1         0   49    1    2    0    3    0    1    0     0     5     61   80.3279
2         0    1   55    8    0    0    0    0    4     1     1     70   78.5714
3         0    1    3   53    1    0    0    0    1     1     0     60   88.3333
4         0    1    0    0   49    7    0    2    0     1     0     60   81.6667
5         0    0    4    1    1   52    0    0    0     1     0     59   88.1356
6         0    0    0    0    1    0   58    2    1     4     1     67   86.5672
7         1    0    0    0    3    0    0   55    0     0     0     59   93.2203
8         0    1    1    0    1    0    0    0   48     3     2     56   85.7143
9(1)      0    0    0    0    2    1    3    0    0    47     0     53   88.6792
9(2)      0    2    1    0    0    1    0    0    1     1    51     57   89.4737
Average Recognition: 86.7393%

Publications
1. R. Sukhaswami, P. Seetharamulu and A.K. Pujari, Recognition of Telugu characters using neural networks, International Journal of Neural Systems, 6 (1995), pp. 317-357.
2. N.P. Banashree, D. Andhre, R. Vasanta and P.S. Satyanarayana, OCR for script identification of Hindi (Devnagari) numerals using error diffusion halftoning algorithm with neural classifier, Proceedings of World Academy of Science, Engineering and Technology, 20 (2007), pp. 46-50.
3. An Efficient Feature Extraction and Dimensionality Reduction Scheme for Isolated Greek Handwritten Character Recognition, 9th International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, September 2007.

Future Work
Creating new hierarchical classification schemes based on rules after examining the corresponding confusion matrix.
Exploiting new features to improve the current performance.
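
As a starting point for examining the confusion matrix, the sketch below recomputes the per-class recognition rates from the counts in the Experimental Results table and lists the most frequent confusions. Rows are taken to be the actual numeral and columns the recognized numeral, which is consistent with each row summing to its reported total. The function names are illustrative. Averaging the eleven per-class rates reproduces the reported 86.7393%.

```python
LABELS = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9(1)", "9(2)"]

# Counts copied from the Experimental Results table.
# Assumption: rows are the actual numeral, columns the recognized numeral.
CONFUSION = [
    [57, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0],
    [0, 49, 1, 2, 0, 3, 0, 1, 0, 0, 5],
    [0, 1, 55, 8, 0, 0, 0, 0, 4, 1, 1],
    [0, 1, 3, 53, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 49, 7, 0, 2, 0, 1, 0],
    [0, 0, 4, 1, 1, 52, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 58, 2, 1, 4, 1],
    [1, 0, 0, 0, 3, 0, 0, 55, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 48, 3, 2],
    [0, 0, 0, 0, 2, 1, 3, 0, 0, 47, 0],
    [0, 2, 1, 0, 0, 1, 0, 0, 1, 1, 51],
]


def per_class_rates(matrix):
    """Recognition rate in percent for each class: diagonal count / row total."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def average_recognition(matrix):
    """Unweighted mean of the per-class recognition rates."""
    rates = per_class_rates(matrix)
    return sum(rates) / len(rates)


def top_confusions(matrix, labels, n=3):
    """Return the n largest off-diagonal entries as (count, actual, recognized)."""
    pairs = [
        (count, labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    pairs.sort(key=lambda p: -p[0])  # stable sort keeps table order for ties
    return pairs[:n]


if __name__ == "__main__":
    print(round(average_recognition(CONFUSION), 4))
    for count, actual, recognized in top_confusions(CONFUSION, LABELS):
        print(f"{actual} recognized as {recognized}: {count} samples")
```

The largest off-diagonal counts (2 recognized as 3, 4 recognized as 5, and 1 recognized as 9(2)) are natural candidates for the rule-based hierarchical schemes proposed above.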