Proc. of the Intl. Conf. on Advances In Engineering And Technology - ICAET-2014
Copyright Institute of Research Engineers and Doctors. All rights reserved.
ISBN: 978-1-63248-028-6 doi: 10.15224/978-1-63248-028-6-01-127

Assistant Professor, Computer Applications
Easwari Engineering College
Chennai 89.
Kssindia2004@yahoo.co.in

Dr.Rm.Somasundaram
Professor, Dept. of Computer Science and Engineering
SNS College of Engineering
Coimbatore

Abstract - Automatic detection of text on vehicles is an important problem in many applications. Text information present in an image can be easily understood by both humans and computers. It has wide applications such as license plate reading, sign detection, identification of destination places, mobile text recognition and so on. This problem is challenging due to complex backgrounds, non-uniform illumination, and variations of text font, size and line orientation. Once the text is identified, it can be analyzed, recognized and interpreted. Hence, there is a need for a better algorithm for detection and localization of text from vehicles. A method is proposed for detecting text from vehicles. The method makes use of features such as Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP). These features are stored and later used for feature matching at the time of classification. After the text region is detected, it can be further subjected to character segmentation and recognition, thereby identifying the destination places. The ability to recognize the text area on vehicles, especially buses, has obvious applications such as traffic management in bus stands. The obtained results are verified and performance parameters like speed, precision and recall are determined.

Keywords- HOG, LBP, Profile based features, Skew detection and removal, Eigen value regularization

I. INTRODUCTION

Automatic detection of interest regions is an active research area in the design of machine vision systems and is used in many applications such as tourist assistant systems, mobile robot navigation, and vehicle license plate detection and recognition. Vision systems are mainly focused on constantly monitoring traffic and observing passing vehicles, extracting important features such as vehicle type, color and distinct marks.

The text content present on vehicles is a unique feature which can be used for identifying vehicles in video surveillance applications. Text is a distinct mark that can be found on many vehicles. This paper deals with the development of an algorithm for detecting the text area on vehicles, especially buses, for recognizing the destination places. In bus stands, there may be circumstances where people rush behind each and every bus to know its destination place. This situation may lead to a crowded stand, thereby increasing the chances of accidents. A system that detects the destination place written on the bus boards and displays the name of the place on a screen in the waiting room would be helpful for the people as well as for the policemen controlling the crowd. Here, the images of buses are captured by a camera mounted at the entrance of the bus stand. On vehicles, text marks are very small compared to the imaged scene and are typically present on the front of the bus. There are basically three different methods for text region detection: 1) Texture Based 2) Connected Component Based 3) Region Based.

The texture based approach considers the text as a special texture. Here, the features are extracted over a certain region. This method makes use of the uniformity of texture features across the text regions. A classifier is then employed to identify the existence of a text region. The most commonly used method in the texture based approach is to extract the features from the DCT of the text region. This is based on the observation that text has certain horizontal and vertical frequencies. The other method collects the features of the text region from the wavelet coefficients and classifies the text using an SVM.

The connected component based method makes use of the observation that the text pixels are connected to each other. This method extracts the regions from the image and uses geometric constraints to rule out non-text candidates. There are different methods for finding the connected components. The widely used method is linking the components based on their geometric properties. The other method is finding the connected components in a stroke width transformed image, which is generated by shooting rays from edge pixels along the gradient direction.

The region based method utilizes the feature that the text area has a distinct intensity compared with the background. The region based methods rely on text region analysis. This method utilizes morphological operations to extract text
regions. Region based methods also include edge based methods. The edge based methods use the observation that text has strong edges between the character and background pixels. This method identifies the substructures in the image: the edges present in the image are identified, and these substructures are merged to mark the bounding boxes for text using learning based rules.

II. LITERATURE SURVEY

The proposed approach consists of five main stages: text region detection, preprocessing (which includes skew removal and noise removal), script identification, character segmentation and character recognition. The various techniques employed for the different stages were studied and recorded.

A. TEXT REGION DETECTION

AUTHORS | APPROACH | ADVANTAGES | DISADVANTAGES
Andrej Ikica (2010) [7] | Edge profile based text detection | Able to detect the text captured from any lighting conditions | Symbols detected as text
Kai Chen (2011) [8] | Learning based method | High accuracy (95%), less computational time | Fails when text color is similar to background
Adam Coates (2011) [9] | Unsupervised feature learning algorithm | High accuracy (85%) | Failed to detect smaller text regions
Mariano et al (2003) [2] | Color Feature Extraction | High accuracy (87%) | Unable to detect text if it is written with different colors
Li Sun et al (2009) [4] | Corner Response Based Text Detection | It is more robust than other edge or texture based methods | Unable to detect smaller texts

Text detection methods can be broadly classified into three categories: texture based, connected component based and region/edge based. Based on these different methods, different features are used for text detection. In the proposed approach, the connected component based method is employed, since each and every character is a connected component. The unwanted components can be eliminated by considering the geometric properties of the components.

B. SKEW DETECTION AND CORRECTION

AUTHORS | APPROACH | ADVANTAGES | DISADVANTAGES
Mehdi Felhi et al (2010) [3] | Maximum gradient difference and R-signature | Able to detect multiple skews in a document | Complexity of the algorithm is more
P Shivakumara et al (2007) [10] | Boundary growing approach | Works efficiently for binary text documents | Not robust to noise
Yang Cao et al (2009) [5] | Straight line fitting | Reduces computational complexity and has better precision | Cannot detect multiple skews
The skew of the detected text region can be detected and corrected using different techniques. The most widely used methods are discussed here. Skew detection using the projection profile method gives high accuracy. This method is able to detect any skew angle.

C. SCRIPT IDENTIFICATION

AUTHORS | APPROACH | ADVANTAGES | DISADVANTAGES
D Dhanya et al (2002) [11] | Word level identification | Accuracy is 90% | Word should have more than 5 connected components
Huanfeng Ma and David Doermann (2004) | Gabor filter analysis of textures | Accuracy rate is 90% | Single characters may not have similar texture, leading to incorrect classification
Mallikarjun et al (2010) [12] | Directional visual discriminating features | Accuracy rate is 97.5% | Time complexity is more
M. C. Padma et al (2010) [13] | Profile based features | Success rate found to be 99.5% | Training phase and testing phase required

Out of the different methods used for script identification, the proposed approach makes use of the profile based features for identifying the script. This approach mainly concentrates on two scripts: English and Malayalam.

D. CHARACTER SEGMENTATION

AUTHORS | APPROACH | ADVANTAGES | DISADVANTAGES
Xiaodan Jia et al (2008) [14] | Vertical projection profile | Algorithm is more efficient under the condition that the license plate image is degraded | Prior knowledge about the character segments is needed
P Shivakumara et al (2010) [15] | Gradient based method | Accuracy rate is 94% | Mathematical complexity is more
Huadong Xia et al (2011) [16] | Vertical projection | Works well with license plate images | Mathematical complexity is more
Youngwoo Yoon et al (2011) [17] | Blob extraction based | Accuracy rate is 97.2% | Won't work in low resolution images
Youngwoo Yoon et al (2012) [18] | Lees character segmentation algorithm | Accuracy rate is 98.3% | When input images are captured under bad lighting conditions, accuracy is less

The widely used methods for segmentation were employed. The blob extraction based method gives high accuracy. The noisy blobs can be filtered out by considering the size of the extracted blobs.
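The size-based filtering of blobs mentioned above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the implementation used in the paper: the input is assumed to be a binary image stored as a list of lists with 1 for text pixels, connectivity is assumed to be 4-neighbour, and the names `extract_blobs`, `bounding_box` and the `min_size` threshold are hypothetical.

```python
from collections import deque


def extract_blobs(binary, min_size=3):
    """Find 4-connected blobs of 1-pixels and drop blobs smaller than min_size."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed noise rule: very small blobs are treated as noise.
            if len(pixels) >= min_size:
                blobs.append(pixels)
    # Order the surviving blobs left to right, as characters would be read.
    blobs.sort(key=lambda px: min(x for _, x in px))
    return blobs


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a blob."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


board = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 1],
]
for blob in extract_blobs(board, min_size=3):
    print(bounding_box(blob))
```

In this toy image the isolated pixel in the middle is discarded as noise, while the two larger blobs are kept and returned in left-to-right order. A real system would pick the threshold relative to the expected character size.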
E. CHARACTER RECOGNITION

Template matching was the most popular method employed for character recognition. The other approaches can be roughly classified into feature based, structural based and neural network based classification. The feature matching approach extracts different features and then calculates a distance metric between the test sample and the trained class. However, the selection and extraction of feature vectors remains a major issue, especially if the character contains noise. The paper proposes a modified approach for character recognition using the Eigen feature regularization method.

AUTHORS | APPROACH | ADVANTAGES | DISADVANTAGES
J. A. Vlontzos et al (1992) [19] | Hierarchical system with Hidden Markov Model | Solves both the context sensitivity problem and the character instantiation problem | Large computational requirements: needs O(TN^2) for an N state model and T observations
[...] et al (2010) [22] | [...] | [...] | [...]
Meshesha [...] (2007) [23] | [...] using PCA and LDA followed by a decision directed graph based SVM classifier | [...] different fonts | [...] case of real life degraded documents

III. PROPOSED METHOD

A. OVERALL ARCHITECTURAL DIAGRAM
[...] connected component analysis are subjected to feature extraction. The LBP and HOG features are extracted and their [...]

Local Binary Pattern (LBP) is a simple, efficient texture operator which labels the pixels of an image by thresholding the neighborhood of each pixel and considers the result as a binary number [25]. The most important property of the LBP operator is its robustness to monotonic gray-scale changes caused by illumination variations. Another important property is its computational simplicity, which makes it possible to analyze images in challenging real-time settings.

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c) \, 2^p$$

Here g_c is the gray level of the center pixel (x_c, y_c), g_p is the gray level of the p-th of its P neighbors at radius R, and s(x) is 1 if x >= 0 and 0 otherwise.

- Divide the examined window into cells (e.g. 16x16 pixels for each cell).
- For each pixel in a cell, compare the pixel to each of its 8 neighbors (on its left-top, left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise or counter-clockwise.
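As a concrete illustration of the LBP operator for the common case of P = 8 neighbours at radius R = 1, the following minimal sketch computes the code of one pixel and a histogram over a patch. It assumes a grayscale image stored as a list of lists. The function names and the clockwise bit ordering starting at the top-left neighbour are assumptions made for the example, since any fixed ordering gives a valid LBP variant.

```python
def lbp_code(image, r, c):
    """8-neighbour LBP code of the interior pixel (r, c) of a 2-D gray-level list."""
    # Neighbour offsets followed clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[r][c]
    code = 0
    for p, (dr, dc) in enumerate(offsets):
        # s(g_p - g_c) is 1 when the neighbour is at least as bright as the centre.
        if image[r + dr][c + dc] >= center:
            code += 1 << p  # weight 2^p
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # prints 241

# A monotonic brightness change leaves the code unchanged.
brighter = [[2 * v + 3 for v in row] for row in patch]
print(lbp_code(brighter, 1, 1) == lbp_code(patch, 1, 1))  # prints True
```

The second print illustrates the robustness to monotonic gray-scale changes: only the ordering of neighbour and centre intensities matters, not their absolute values.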
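The HOG features mentioned above are built from per-pixel gradients whose orientations are accumulated into cell histograms. The sketch below shows one possible way to do this for a single cell. It is an illustration under assumptions rather than the paper's implementation: centred differences are assumed for d_x and d_y, votes are assumed to be weighted by gradient magnitude, unsigned gradients with 9 bins over 0 to 180 degrees are assumed, and `cell_histogram` and `l2_normalize` are hypothetical names.

```python
import math


def cell_histogram(cell, bins=9):
    """Orientation histogram of one cell using unsigned gradients (0-180 degrees)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Centred differences approximate d_x and d_y.
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(dx, dy)
            # atan2 is safe when dx == 0; fold the angle into [0, 180).
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            # Each pixel casts a vote weighted by its gradient magnitude.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(vector, eps=1e-6):
    """L2-normalise a histogram (or a concatenation of cell histograms in a block)."""
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]


# A vertical edge: intensity changes only along x, so all votes fall in bin 0.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(vertical_edge)
print(hist)
print(l2_normalize(hist))
```

Normalising over a block of neighbouring cells, rather than a single cell as shown here, is what gives the descriptor its tolerance to local illumination and contrast changes.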
Orientation Binning

The second step of calculation involves creating the cell histograms. Each pixel within the cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation. The cells themselves can either be rectangular or radial in shape, and the histogram channels are evenly spread over 0 to 180 degrees or 0 to 360 degrees, depending on whether the gradient is unsigned or signed.

Gradient magnitude is,

$$m(x, y) = \sqrt{(d_x(x, y))^2 + (d_y(x, y))^2}$$

Gradient angle is,

$$\theta(x, y) = \tan^{-1}\left( d_y(x, y) / d_x(x, y) \right)$$

Descriptor Blocks

In order to account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks. The HOG descriptor is then the vector of the components of the normalized cell histograms from all of the block regions. These blocks typically overlap, meaning that each cell contributes more than once to the final descriptor.

C. SCRIPT IDENTIFICATION

The proposed model is based on the observation that every script/language has a finite set of text patterns, each having a distinct visual appearance, which helps in recognizing the language. Every language could be identified based on its discriminating features. The proposed approach is mainly based on the concept of the top and bottom profiles of the input text lines. The character shape descriptors used in the proposed approach are the top profile and the bottom profile. The top profile of a text line represents a set of black pixels obtained by scanning each column of the text line from the top until it reaches the first black pixel. Similarly, the bottom profile can be obtained by scanning the image from bottom to top. From the descriptors, certain features are extracted which can be efficiently used for the identification of the script. The features used in this method are Profile value, Bottom_max_row_no, Coeff_profile, Top_component_density [13].
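The top and bottom profiles used for script identification can be sketched in a few lines. This is a minimal sketch that assumes a binarized text line stored as a list of lists with 1 for black (text) pixels. The function names are hypothetical, and the named features from [13] (Profile value, Bottom_max_row_no, Coeff_profile, Top_component_density) are not computed here because their definitions are not given in this text.

```python
def top_profile(line_img):
    """Row index of the first black pixel in each column, scanning from the top.

    Columns without any black pixel yield None.
    """
    rows, cols = len(line_img), len(line_img[0])
    profile = []
    for c in range(cols):
        hit = None
        for r in range(rows):
            if line_img[r][c] == 1:
                hit = r
                break
        profile.append(hit)
    return profile


def bottom_profile(line_img):
    """Row index of the first black pixel in each column, scanning from the bottom."""
    rows, cols = len(line_img), len(line_img[0])
    profile = []
    for c in range(cols):
        hit = None
        for r in range(rows - 1, -1, -1):
            if line_img[r][c] == 1:
                hit = r
                break
        profile.append(hit)
    return profile


text_line = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
print(top_profile(text_line))     # prints [1, 0, None, 1]
print(bottom_profile(text_line))  # prints [2, 1, None, 3]
```

Features for a classifier would then be derived from these two sequences, for example from how the profile pixels are distributed over the rows of the text line.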
[...] classify the text into the appropriate class, thereby helping in identifying the script of the document.

D. CHARACTER SEGMENTATION

After the script is identified, the text document is fed into the appropriate segmentation algorithm. The algorithm segments out the different characters in the text document and saves each character as an image.

E. CHARACTER RECOGNITION

This paper proposes the Eigenfeature Regularization and Extraction algorithm (ERE algorithm) for character recognition [28]. This algorithm was proposed by Xudong Jiang. Here, the entire eigenspace is decomposed into subspaces, and regularization and extraction of the significant eigenvectors is done. The algorithm decomposes the eigenspace of the within-class scatter matrix into face, noise and null subspaces. Eigenfeatures are regularized differently in these three subspaces based on an eigenspectrum model. Dimensionality reduction is then applied, thereby maximizing the variances of the extracted features and reducing the error. Finally, a classifier is employed to recognize the character feature vectors.
V. CONCLUSION

This paper proposes an approach for detecting and identifying the destination place written on vehicles, especially buses. The scope of the problem is limited to the case where the destination place is written on a board placed at the front of the bus. The proposed algorithm worked well and the results were analyzed and recorded.

References

[1] Hong Liu et al, Skew detection for complex document images using robust borderlines in both text and non-text regions, Science Direct, Pattern Recognition Letters, 2009.

[5] Yang Cao, Heng Li, Skew Detection and Correction in Document Images Based on Straight-Line Fitting, IEEE-International Conference on Image Processing, 2009.

[6] Katherine L. Bouman, A Low Complexity Sign Detection And Text Localization Method For Mobile Applications, IEEE Transactions On Multimedia, Vol. 13, No. 5, October 2011.
[...] International Journal of Computer Applications, Volume 4 No.6, July 2010.

[13] M. C. Padma, Script Identification From Trilingual Documents Using Profile Based Features, International Journal of Computer Science and Applications, Vol. 7 No. 4, pp. 16 - 33, 2010.

[14] Xiaodan Jia and Xinnian Wang, A Novel Algorithm for Character Segmentation of Degraded License Plate Based on Prior Knowledge, scientific research foundation of Liaoning Province of China, 2009.

[15] Palaiahnakote Shivakumara et al, A New Gradient based Character Segmentation Method for Video Text Recognition, IEEE Computer Vision and Pattern Recognition, 2010.

[16] Huadong Xia and Dongchu Liao, The Study of License Plate Character Segmentation Algorithm based on Vertical Projection, IEEE 2011.

[17] Youngwoo Yoon et al, Blob Extraction based Character Segmentation Method for Automatic License Plate Recognition System, IEEE 2011.

[25] Chi Ho Chan, Multi-scale Local Binary Pattern Histogram for Face Recognition, Centre for Vision, Speech and Signal Processing, School of Electronics and Physical Sciences, University of Surrey, September 2008.

[26] Yi-Feng Pan, Xinwen Hou, Cheng-Lin Liu, A Robust System to Detect and Localize Texts in Natural Scene Images, Eighth IAPR Workshop on Document Analysis Systems, IEEE 2008.

[27] Gang Zhou, Yuehu Liu, Quan Meng, Yuanlin Zhang, Detecting Multilingual text in Natural Scene, 1st International Symposium on Access Spaces (ISAS), IEEE-ISAS 2011.

[28] T.Senthil kumar et al, An improved approach for Character Recognition in Vehicle Number plate using Eigenfeature Regularisation and Extraction Method, International Journal of Research and Reviews in Electrical and Computer Engineering (IJRRECE) Vol. 2, No. 2, June 2012.