
Abstract

This work deals with problems from the fields of artificial intelligence, machine vision and neural networks in the construction of an automatic number plate recognition system (ANPR). These problems include the mathematical principles and algorithms which ensure the detection of the number plate area and the proper segmentation, normalization and recognition of its characters. The work comparatively evaluates methods for achieving invariance of such systems towards image skew, translation and various light conditions during the capture. The work also contains an implementation of a demonstration model, which is able to perform these functions over a set of snapshots.

Key Words: Machine vision, artificial intelligence, neural networks, optical character recognition, ANPR

Contents

1 Introduction
  1.1 ANPR systems as a practical application of artificial intelligence
  1.2 Mathematical aspects of number plate recognition systems
  1.3 Physical aspects of number plate recognition systems
  1.4 Notations and mathematical symbols

2 Principles of number plate area detection
  2.1 Edge detection and rank filtering
    2.1.1 Convolution matrices
  2.2 Horizontal and vertical image projection
  2.3 Double-phase statistical image analysis
    2.3.1 Vertical detection - band clipping
    2.3.2 Horizontal detection - plate clipping
  2.4 Heuristic analysis and priority selection of number plate candidates
    2.4.1 Priority selection and basic heuristic analysis of bands
    2.4.2 Deeper analysis
  2.5 Deskewing mechanism
    2.5.1 Detection of skew
    2.5.2 Correction of skew

3 Principles of plate segmentation
  3.1 Segmentation of plate using a horizontal projection
  3.2 Extraction of characters from horizontal segments
    3.2.1 Piece extraction
    3.2.2 Heuristic analysis of pieces

4 Feature extraction and normalization of characters
  4.1 Normalization of brightness and contrast
    4.1.1 Histogram normalization
    4.1.2 Global thresholding
    4.1.3 Adaptive thresholding
  4.2 Normalization of dimensions and resampling
    4.2.1 Nearest-neighbor downsampling
    4.2.2 Weighted-average downsampling
  4.3 Feature extraction
    4.3.1 Pixel matrix
    4.3.2 Detection of character edges
    4.3.3 Skeletonization and structural analysis

5 Recognition of characters
  5.1 General classification problem
  5.2 Biological neuron and its mathematical models
    5.2.1 McCulloch-Pitts binary threshold neuron
    5.2.2 Perceptron
  5.3 Feed-forward neural network
  5.4 Adaptation mechanism of feed-forward neural network
    5.4.1 Active phase
    5.4.2 Partial derivatives and gradient of error function
    5.4.3 Adaptation phase
  5.5 Heuristic analysis of characters

6 Syntactical analysis of a recognized plate
  6.1 Principle and algorithms
    6.1.1 Recognized character and its cost
    6.1.2 Syntactical patterns
    6.1.3 Choosing the right pattern

7 Tests and final considerations
  7.1 Choosing the representative set of snapshots
  7.2 Evaluation of a plate number correctness
    7.2.1 Binary score
    7.2.2 Weighted score
  7.3 Results

Summary
Appendix A: Case study
Appendix B: Demo recognition software - user's manual
Bibliography

Chapter 1
Introduction
1.1 ANPR systems as a practical application of artificial intelligence
The massive integration of information technologies into all aspects of modern life has caused a demand for processing vehicles as conceptual resources in information systems. Because a standalone information system without any data is meaningless, there was also a need to transform information about vehicles between the real environment and information systems. This can be achieved by a human agent, or by special intelligent equipment which is able to recognize vehicles by their number plates in a real environment and reflect them into conceptual resources. Because of this, various recognition techniques have been developed, and number plate recognition systems are today used in various traffic and security applications, such as parking, access and border control, or tracking of stolen cars.

In parking, number plates are used to calculate the duration of the parking. When a vehicle enters the input gate, its number plate is automatically recognized and stored in a database. When the vehicle later exits the parking area through the output gate, the number plate is recognized again and paired with the first one stored in the database. The difference in time is used to calculate the parking fee.

Automatic number plate recognition systems can also be used in access control. For example, this technology is used in many companies to grant access only to vehicles of authorized personnel. In some countries, ANPR systems installed on country borders automatically detect and monitor border crossings. Each vehicle can be registered in a central database and compared to a black list of stolen vehicles. In traffic control, vehicles can be directed to different lanes for better congestion control on busy urban roads during rush hours.
1.2 Mathematical aspects of number plate recognition systems
In most cases, vehicles are identified by their number plates, which are easily readable by humans, but not by machines. For a machine, a number plate is only a grey picture defined as a two-dimensional function f(x, y), where x and y are spatial coordinates, and f is the light intensity at that point. Because of this, it is necessary to design robust mathematical machinery which is able to extract semantics from the spatial domain of the captured image. These functions are implemented in so-called "ANPR systems", where the acronym "ANPR" stands for "Automatic Number Plate Recognition". An ANPR system transforms data between the real environment and information systems.

The design of ANPR systems is a field of research in artificial intelligence, machine vision, pattern recognition and neural networks. Because of this, the main goal of this thesis is to study the algorithmic and mathematical principles of automatic number plate recognition systems.

Chapter two deals with the detection of the number plate area. This includes algorithms which are able to detect a rectangular area of the number plate in the original image. Humans define a number plate in a natural language as a "small plastic or metal plate attached to a vehicle for official identification purposes", but machines do not understand this definition. Because of this, there is a need to find an alternative definition of the number plate based on descriptors that are comprehensible to machines. This is a fundamental problem of machine vision and of this chapter.

Chapter three describes the principles of character segmentation. In most cases, characters are segmented using the horizontal projection of a pre-processed number plate, but these principles can fail, especially if the detected number plates are too warped or skewed. Then, more sophisticated segmentation algorithms must be used.

Chapter four deals with various methods of normalization and feature extraction of characters. At first, character dimensions and brightness must be normalized to ensure invariance towards size and light conditions. Then, a feature extraction algorithm must be applied to each character to filter out irrelevant data. It is necessary to extract features that are invariant towards character deformations, the used font style, etc.

Chapter five studies pattern classifiers and neural networks and deals with their usage in the recognition of characters. Characters can be classified and recognized by a simple nearest-neighbor algorithm (1NN) applied to a vector of extracted features, or by one of the more sophisticated classification methods, such as feed-forward or Hopfield neural networks. This chapter also presents additional heuristic analyses, which are used to eliminate non-character elements from the plate.

Sometimes the recognition process may fail and the detected plate can contain errors. Some of these errors can be detected by a syntactical analysis of the recognized plate. If we have a regular expression, or a rule describing how to evaluate a country-specific license plate, we can reconstruct defective plates using this rule. For example, a digit zero "0" can be automatically corrected to the character "O" at positions where digits are not allowed. Chapter six deals with this topic.
1.3 Physical aspects of number plate recognition systems
An automatic number plate recognition system is a special set of hardware and software components that processes an input graphical signal, such as static pictures or video sequences, and recognizes license plate characters from it. The hardware part of an ANPR system typically consists of a camera, an image processor, a camera trigger, and communication and storage units. The hardware trigger physically controls a sensor installed directly in a lane. Whenever the sensor detects a vehicle at a proper distance from the camera, it activates the recognition mechanism. An alternative to this solution is software detection of an incoming vehicle, or continual processing of the sampled video signal. Software detection, or continual video processing, may consume more system resources, but it does not need additional hardware equipment such as the hardware trigger.

The image processor recognizes static snapshots captured by the camera and returns a text representation of the detected license plate. ANPR units can have their own dedicated image processors (all-in-one solution), or they can send the captured data to a central processing unit for further processing (generic ANPR). The image processor runs special recognition software, which is the key part of the whole ANPR system.

Because one of the fields of application is usage on road lanes, it is necessary to use a special camera with an extremely short shutter time. Otherwise, the quality of the captured snapshots will be degraded by an undesired motion blur effect caused by the movement of the vehicle. For example, using a standard camera with a shutter time of 1/100 s to capture a vehicle moving at 80 km/h (about 22.2 m/s) results in a motion skew of roughly 0.22 m. Such skew means a significant degradation of recognition abilities.

There is also a need to ensure invariance of the system towards light conditions. A normal camera should not be used for capturing snapshots in darkness or at night, because it operates in the visible light spectrum. Automatic number plate recognition systems are therefore often based on cameras operating in the infrared band of the light spectrum. Usage of an infrared camera in combination with infrared illumination is better suited to achieve this goal. Under such illumination, plates made from reflexive material are much more highlighted than the rest of the image. This fact makes the detection of license plates much easier.

Figure 1.1: (a) Illumination makes detection of reflexive plates easier. (b) A long camera shutter and the movement of the vehicle can cause an undesired motion blur effect.
1.4 Notations and mathematical symbols
Logic symbols

- p ⊕ q : Exclusive logical disjunction (p xor q)
- p ∧ q : Logical conjunction (p and q)
- p ∨ q : Logical disjunction (p or q)
- ¬p : Negation (not p)

Mathematical definition of image

- f(x, y) : x and y are spatial coordinates of an image, and f is the intensity of light at that point. This function is always discrete on digital computers; x ∈ ℕ₀ ∧ y ∈ ℕ₀, where ℕ₀ denotes the set of natural numbers including zero.
- f(p) : The intensity of light at the point p; f(p) = f(x, y), where p = [x, y].

Pixel neighborhoods

- p₁ N̈₄ p₂ : Pixel p₁ is in a four-pixel neighborhood of pixel p₂ (and vice versa)
- p₁ N̈₈ p₂ : Pixel p₁ is in an eight-pixel neighborhood of pixel p₂ (and vice versa)

Convolutions

- a(x) ∗ b(x) : Discrete convolution of the signals a(x) and b(x)
- a(x) ⊛ b(x) : Discrete periodical convolution of the signals a(x) and b(x)

Vectors and sets

- m[x, y] : The element in the xth column and yth row of the matrix m
- max A : The maximum value contained in the set A. The scope of elements can be specified by additional conditions.
- min A : The minimum value contained in the set A
- mean A : The mean value of the elements contained in the set A
- median A : The median value of the elements contained in the set A
- |A| : The cardinality of the set A (number of elements contained in the set)
- x : Vectors and other ordered sequences of numbers are printed bold. The elements of a vector are denoted x_i, where i is a sequence number (starting with zero), such that i ∈ ⟨0 … n − 1⟩, where n = |x| is the cardinality of the vector (number of elements).
- x[a] : The element a of the vector x. For example, the vector x can contain elements a, b, c, d, such that x = (a, b, c, d).
- x⁽ⁱ⁾ : If there is more than one vector denoted x, they are distinguished by the index i. The upper index (i) does not mean the ith element of the vector.

Intervals

- a < x < b : x lies in the interval between a and b. This notation is used when x is a spatial coordinate in an image (discrete as well as continuous).
- x ∈ ⟨a … b⟩ : The same meaning as above, but used when x is a discrete sequence number.

Quantifiers

- ∃x : There exists at least one x
- ∃!x : There exists exactly one x
- ∃ⁿx : There exist exactly n x
- ¬∃x : There does not exist an x
- ∀x : For every x

Rounding

- ⌊x⌋ : The number x rounded down to the nearest integer
- ⌈x⌉ : The number x rounded up to the nearest integer

Chapter 2
Principles of number plate area detection
The first step in the process of automatic number plate recognition is the detection of the number plate area. This includes algorithms that are able to detect a rectangular area of the number plate in an original image. Humans define a number plate in a natural language as a "small plastic or metal plate attached to a vehicle for official identification purposes", but machines do not understand this definition, just as they do not understand what a "vehicle", a "road", or anything else is. Because of this, there is a need to find an alternative definition of a number plate based on descriptors that are comprehensible to machines.

Let us define the number plate as a "rectangular area with an increased occurrence of horizontal and vertical edges". The high density of horizontal and vertical edges in a small area is in many cases caused by the contrasting characters of a number plate, but not in every case. This process can sometimes detect a wrong area that does not correspond to a number plate. Because of this, we often detect several candidates for the plate by this algorithm, and then choose the best one by further heuristic analysis.

Let an input snapshot be defined by a function f(x, y), where x and y are spatial coordinates, and f is the intensity of light at that point. This function is always discrete on digital computers, such that x ∈ ℕ₀ ∧ y ∈ ℕ₀, where ℕ₀ denotes the set of natural numbers including zero. We define operations such as edge detection or rank filtering as mathematical transformations of the function f. The detection of the number plate area consists of a series of convolution operations. The modified snapshot is then projected onto the axes x and y. These projections are used to determine the area of the number plate.
2.1 Edge detection and rank filtering
We can use a periodical convolution of the function f with specific types of matrices m to detect various types of edges in an image:

$$f'(x, y) = f(x, y) \circledast m[x, y] = \sum_{i=0}^{w-1} \sum_{j=0}^{h-1} f(i, j) \cdot m\big[\mathrm{mod}_w(x - i),\, \mathrm{mod}_h(y - j)\big]$$

where w and h are the dimensions of the image represented by the function f.

Note: The expression m[x, y] represents the element in the xth column and yth row of the matrix m.
2.1.1 Convolution matrices

Each image operation (or filter) is defined by a convolution matrix. The convolution matrix defines how the specific pixel is affected by its neighboring pixels in the process of convolution.

The individual cells in the matrix represent the neighbors of the pixel situated in the centre of the matrix. The pixel represented by the cell y in the destination image (fig. 2.1) is affected by the pixels x_0 … x_8 according to the formula:

$$y = x_0 m_0 + x_1 m_1 + x_2 m_2 + x_3 m_3 + x_4 m_4 + x_5 m_5 + x_6 m_6 + x_7 m_7 + x_8 m_8$$

Figure 2.1: The pixel is affected by its neighbors according to the convolution matrix.
Horizontal and vertical edge detection

To detect horizontal and vertical edges, we convolve the source image with the matrices m_he and m_ve. The convolution matrices are usually much smaller than the actual image; bigger matrices can be used to detect rougher edges.

$$\mathbf{m}_{he} = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}; \quad \mathbf{m}_{ve} = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}$$

Sobel edge detector

The Sobel edge detector uses a pair of 3x3 convolution matrices. The first is dedicated to the evaluation of vertical edges, and the second to the evaluation of horizontal edges:

$$\mathbf{G}_x = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}; \quad \mathbf{G}_y = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$$

The magnitude of the affected pixel is then calculated as $G = \sqrt{G_x^2 + G_y^2}$. In practice, it is faster to compute only an approximate magnitude $G = |G_x| + |G_y|$.
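As an illustration of the formulas above, the following minimal NumPy sketch (my own addition, not the thesis implementation) applies the two Sobel kernels by periodical convolution and combines them into the approximate gradient magnitude:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels as defined above
G_X = np.array([[-1, -2, -1],
                [ 0,  0,  0],
                [ 1,  2,  1]], dtype=float)
G_Y = np.array([[-1, 0, 1],
                [-2, 0, 2],
                [-1, 0, 1]], dtype=float)

def sobel_magnitude(image: np.ndarray) -> np.ndarray:
    """Approximate edge magnitude G = |Gx| + |Gy| of a grayscale image."""
    gx = convolve(image.astype(float), G_X, mode="wrap")  # "wrap" approximates the periodical convolution
    gy = convolve(image.astype(float), G_Y, mode="wrap")
    return np.abs(gx) + np.abs(gy)
```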
Horizontal and vertical rank filtering

Horizontally and vertically oriented rank filters are often used to detect clusters of high density of bright edges in the area of the number plate. The width of the horizontally oriented rank filter matrix is much larger than its height (w ≫ h), and vice versa for the vertical rank filter (w ≪ h). To preserve the global intensity of the image, each pixel must be replaced with the average pixel intensity in the area covered by the rank filter matrix. In general, the convolution matrix should meet the following condition:

$$\sum_{i=0}^{w-1} \sum_{j=0}^{h-1} \mathbf{m}_{hr}[i, j] = 1.0$$

where w and h are the dimensions of the matrix. The following pictures show the results of applying the rank and edge detection filters.

Figure 2.2: (a) Original image (b) Horizontal rank filter (c) Vertical rank filter (d) Sobel edge detection (e) Horizontal edge detection (f) Vertical edge detection
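A rank filter matrix satisfying the normalization condition above is simply a constant averaging kernel. The short sketch below is my own illustration (the 15 x 3 default size is an assumption, not a value from the thesis) and reuses the same convolution routine as the previous example:

```python
import numpy as np
from scipy.ndimage import convolve

def rank_filter_kernel(width: int, height: int) -> np.ndarray:
    """Averaging kernel whose elements sum to 1.0, preserving the global intensity."""
    return np.full((height, width), 1.0 / (width * height))

def horizontal_rank_filter(image: np.ndarray, width: int = 15, height: int = 3) -> np.ndarray:
    # width >> height, so clusters of vertical edges are smeared into bright horizontal bands
    return convolve(image.astype(float), rank_filter_kernel(width, height), mode="wrap")
```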
2.2 Horizontal and vertical image projection
After the series of convolution operations, we can detect the area of the number plate according to statistics of the snapshot. There are various methods of statistical analysis. One of them is the horizontal and vertical projection of the image onto the axes x and y.

The vertical projection of the image is a graph which represents the overall magnitude of the image along the axis y (see figure 2.3). If we compute the vertical projection of the image after the application of the vertical edge detection filter, the magnitude at a certain point represents the occurrence of vertical edges at that point. The vertical projection of the image transformed in this way can then be used for the vertical localization of the number plate. The horizontal projection represents the overall magnitude of the image mapped onto the axis x.

Figure 2.3: Vertical projection of the image onto the y axis


Let an input image be defined by a discrete function f(x, y). Then, the vertical projection p_y of the function f at a point y is the sum of all pixel magnitudes in the yth row of the input image. Similarly, the horizontal projection at a point x of that function is the sum of all magnitudes in the xth column. We can mathematically define the horizontal and vertical projections as:

$$p_x(x) = \sum_{j=0}^{h-1} f(x, j); \qquad p_y(y) = \sum_{i=0}^{w-1} f(i, y)$$

where w and h are the dimensions of the image.
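A direct NumPy rendering of these two projections might look as follows (my own sketch, assuming the image is a 2-D array indexed as image[y, x]):

```python
import numpy as np

def horizontal_projection(image: np.ndarray) -> np.ndarray:
    """p_x(x): sum of magnitudes in each column (length w)."""
    return image.sum(axis=0)

def vertical_projection(image: np.ndarray) -> np.ndarray:
    """p_y(y): sum of magnitudes in each row (length h)."""
    return image.sum(axis=1)
```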
2.3 Double-phase statistical image analysis

The statistical image analysis consists of two phases. The first phase covers the detection of a wider area of the number plate. This area is then deskewed and processed in the second phase of the analysis. The output of the double-phase analysis is the exact area of the number plate. The two phases are based on the same principle, but they differ in the coefficients used to determine the boundaries of the clipped areas.

The detection of the number plate area consists of a "band clipping" and a "plate clipping". Band clipping is an operation used to detect and clip the vertical area of the number plate (the so-called band) by analysis of the vertical projection of the snapshot. Plate clipping is a consequent operation used to detect and clip the plate from the band (not from the whole snapshot) by a horizontal analysis of that band.

Snapshot

Assume the snapshot is represented by a function f(x, y), where x_0 ≤ x ≤ x_1 and y_0 ≤ y ≤ y_1. The point [x_0, y_0] represents the upper left corner of the snapshot, and [x_1, y_1] represents the bottom right corner. If w and h are the dimensions of the snapshot, then x_0 = 0, y_0 = 0, x_1 = w − 1 and y_1 = h − 1.

Band

The band b in the snapshot f is an arbitrary rectangle b = (x_b0, y_b0, x_b1, y_b1) such that:

$$(x_{b0} = x_{min}) \wedge (x_{b1} = x_{max}) \wedge (y_{min} \le y_{b0} < y_{b1} \le y_{max})$$

Plate

Similarly, the plate p in the band b is an arbitrary rectangle p = (x_p0, y_p0, x_p1, y_p1) such that:

$$(x_{b0} \le x_{p0} < x_{p1} \le x_{b1}) \wedge (y_{p0} = y_{b0}) \wedge (y_{p1} = y_{b1})$$

The band can also be viewed as a vertical selection of the snapshot, and the plate as a horizontal selection of the band. Figure 2.4 schematically demonstrates this concept.

Figure 2.4: The double-phase plate clipping. Black color represents the first phase of plate clipping, and red color represents the second one. Bands are represented by dashed lines, and plates by solid lines.
2.3.1 Vertical detection - band clipping

The first and second phases of band clipping are based on the same principle. Band clipping is a vertical selection of the snapshot according to the analysis of the graph of the vertical projection. If h is the height of the analyzed image, the corresponding vertical projection p_y^r(y) contains h values, such that y ∈ ⟨0; h − 1⟩. The graph of the projection may sometimes be too "ragged" for analysis due to a big statistical dispersion of the values p_y^r(y). There are two approaches to solve this problem. We can blur the source snapshot (a costly solution), or we can decrease the statistical dispersion of the ragged projection p_y^r by convolving it with a rank vector:

$$p_y(y) = p_y^r(y) \circledast \mathbf{m}_{hr}[y]$$

where m_hr is the rank vector (analogous to the horizontal rank matrix in section 2.1.1). The width of the vector m_hr is nine in the default configuration. After convolution with the rank vector, the vertical projection of the snapshot in figure 2.3 can look like the one shown in figure 2.5.
Figure 2.5: The vertical projection of the snapshot in figure 2.3 after convolution with a rank vector. The figure contains three detected candidates. Each highlighted area corresponds to one detected band.
The fundamental problem of the analysis is to compute the peaks in the graph of the vertical projection. The peaks correspond to bands with possible candidates for number plates. The maximum value of p_y(y), corresponding to the axis of a band, can be computed as:

$$y_{bm} = \underset{y_0 \le y \le y_1}{\arg\max}\; \{ p_y(y) \}$$

The coordinates y_b0 and y_b1 of the band can then be detected as:

$$y_{b0} = \max \{\, y \mid y_0 \le y \le y_{bm} \wedge p_y(y) \le c_y \cdot p_y(y_{bm}) \,\}$$
$$y_{b1} = \min \{\, y \mid y_{bm} \le y \le y_1 \wedge p_y(y) \le c_y \cdot p_y(y_{bm}) \,\}$$

where c_y is a constant used to determine the foot of the peak y_bm. In practice, the constant is calibrated to c_y = 0.55 for the first phase of detection, and to c_y = 0.42 for the second phase.
Figure 2.6: The band detected by the analysis of vertical projection
This principle is applied iteratively to detect several possible bands. The y_b0 and y_b1 coordinates are computed in each step of the iterative process. After each detection, the values of the projection p_y in the interval ⟨y_b0, y_b1⟩ are zeroized. This idea is illustrated by the following pseudo-code:

let L be a list of detected candidates
for i := 0 to number_of_bands_to_be_detected do
begin
    detect y_b0 and y_b1 by analysis of the projection p_y
    save y_b0 and y_b1 to the list L
    zeroize the interval <y_b0, y_b1> of p_y
end

The list L of coordinates y_b0 and y_b1 is then sorted according to the value of the peak p_y(y_bm). The band clipping is followed by an operation which detects plates in a band.
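Before moving on to plate clipping, here is a compact Python sketch of the iterative band detection just described (my own rendering; the foot constant c_y and the number of bands are parameters, and p_y is the smoothed vertical projection as a 1-D NumPy array):

```python
import numpy as np

def detect_bands(p_y: np.ndarray, n_bands: int = 9, c_y: float = 0.55):
    """Iteratively find band candidates (y_b0, y_b1, peak value) in a vertical projection."""
    p = p_y.astype(float).copy()
    bands = []
    for _ in range(n_bands):
        y_bm = int(np.argmax(p))              # position of the strongest remaining peak
        peak = p[y_bm]
        if peak <= 0:
            break
        threshold = c_y * peak
        y_b0 = y_bm
        while y_b0 > 0 and p[y_b0] > threshold:
            y_b0 -= 1                          # upper foot of the peak
        y_b1 = y_bm
        while y_b1 < len(p) - 1 and p[y_b1] > threshold:
            y_b1 += 1                          # lower foot of the peak
        bands.append((y_b0, y_b1, peak))
        p[y_b0:y_b1 + 1] = 0                   # zeroize the detected interval
    # sort candidates by peak value, strongest first
    return sorted(bands, key=lambda b: b[2], reverse=True)
```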

2.3.2 Horizontal detection - plate clipping

In contrast with band clipping, the first and second phases of plate clipping differ.
First phase

There is a strong analogy between the principles of band clipping and plate clipping. Plate clipping is based on the horizontal projection of the band. At first, the band must be processed by the vertical edge detection filter. If w is the width of the band (i.e., the width of the analyzed image), the corresponding horizontal projection p_x^r(x) contains w values:

$$p_x^r(x) = \sum_{j=y_{b0}}^{y_{b1}} f(x, j)$$

Notice that p_x^r(x) is a projection of the band, not of the whole image. This is achieved by summing over the interval ⟨y_b0, y_b1⟩, which represents the vertical boundaries of the band. Since the horizontal projection p_x^r(x) may have a big statistical dispersion, we decrease it by convolving with a rank vector ($p_x(x) = p_x^r(x) \circledast \mathbf{m}_{vr}[x]$). The width of the rank vector is usually equal to a half of the estimated width of the number plate. Then, the maximum value corresponding to the plate can be computed as:

$$x_{bm} = \underset{x_0 \le x \le x_1}{\arg\max}\; \{ p_x(x) \}$$

The coordinates x_b0 and x_b1 of the plate can then be detected as:

$$x_{b0} = \max \{\, x \mid x_0 \le x \le x_{bm} \wedge p_x(x) \le c_x \cdot p_x(x_{bm}) \,\}$$
$$x_{b1} = \min \{\, x \mid x_{bm} \le x \le x_1 \wedge p_x(x) \le c_x \cdot p_x(x_{bm}) \,\}$$

where c_x is a constant used to determine the foot of the peak x_bm. The constant is calibrated to c_x = 0.86 for the first phase of detection.
Second phase

In the second phase of detection, the horizontal position of the number plate is detected in another way. Due to the skew correction between the first and second phase of the analysis, the wider plate area must be duplicated into a new bitmap. Let f_n(x, y) be the corresponding function of that bitmap. This picture has a new coordinate system, in which [0, 0] represents the upper left corner and [w − 1, h − 1] the bottom right one, where w and h are the dimensions of the area. The wider area of the number plate after deskewing is illustrated in figure 2.8.

In contrast with the first phase of detection, the source plate is not processed by the vertical edge detection filter. If we assume that the plate is white with black borders, we can detect those borders as black-to-white and white-to-black transitions in the plate. The horizontal projection p_x(x) of the image is illustrated in figure 2.7.a. To detect the black-to-white and white-to-black transitions, we need to compute a derivative p_x'(x) of the projection p_x(x). Since the projection is not continuous, the derivation step cannot be an infinitely small number ($h \ne \lim_{x \to 0} x$). If we derive a discrete function, the derivation step h must be an integral number (for example h = 4). Let the derivative of p_x(x) be defined as:

$$p_x'(x) = \frac{p_x(x) - p_x(x - h)}{h}$$

where h = 4.
Figure 2.7: (a) The horizontal projection p_x(x) of the plate in figure 2.8. (b) The derivative p_x'(x). Arrows denote the "BW" and "WB" transitions, which are used to determine the boundaries of the plate.
Figure 2.8: The wider area of the number plate after deskewing.
The left and right boundaries of the plate can be determined by an analysis of the derivative p_x'(x). The left corner x_p0 is represented by the black-to-white transition (positive peak in figure 2.7.b), and the right corner x_p1 by the white-to-black transition (negative peak in figure 2.7.b):

$$x_{p0} = \min \left\{\, x \;\middle|\; 0 \le x < \tfrac{w}{2} \wedge p_x'(x) \ge c_d \cdot \max_{0 \le x < w} \{ p_x'(x) \} \,\right\}$$
$$x_{p1} = \max \left\{\, x \;\middle|\; \tfrac{w}{2} \le x < w \wedge p_x'(x) \le c_d \cdot \min_{0 \le x < w} \{ p_x'(x) \} \,\right\}$$

where c_d is a constant used to determine the leftmost significant positive peak and the rightmost significant negative peak. The left and right corners must lie in opposite halves of the detected plate, according to the constraints 0 ≤ x < w/2 for x_p0 and w/2 ≤ x < w for x_p1.

In this phase of the recognition process, it is not yet possible to select the best candidate for the number plate. This is done by a heuristic analysis of the characters after the segmentation.
2.4 Heuristic analysis and priority selection of number plate candidates
In general, the captured snapshot can contain several number plate candidates. Because of this, the detection algorithm always clips several bands, and several plates from each band. There is a predefined maximum number of candidates detected by the analysis of projections; by default, this value equals nine. Several heuristics are used to determine the cost of the selected candidates according to their properties. These heuristics have been chosen ad hoc during practical experimentation. The recognition logic sorts the candidates according to their cost from the most suitable to the least suitable. Then, the most suitable candidate is examined by a deeper heuristic analysis, which definitively accepts or rejects it. As individual characters must be analyzed, this type of analysis consumes a large amount of processor time. The basic concept of the analysis can be illustrated by the following steps:

1. Detect possible number plate candidates.
2. Sort them according to their cost (determined by the basic heuristics).
3. Cut the plate with the best cost from the list.
4. Segment and analyze it by the deeper analysis (time consuming).
5. If the deeper analysis refuses the plate, return to step 3.
2.4.1 Priority selection and basic heuristic analysis of bands
The basic analysis is used to evaluate the cost of the candidates, and to sort them according to this cost. There are several independent heuristics which can be used to evaluate the cost α_i. The heuristics can be used separately, or they can be combined to compute an overall cost of a candidate as a weighted sum:

$$\alpha = 0.15 \cdot \alpha_1 + 0.25 \cdot \alpha_2 + 0.4 \cdot \alpha_3 + 0.4 \cdot \alpha_4$$

The individual heuristics are:

- α_1 = |y_b0 − y_b1| : The height of the band in pixels. Bands with a lower height are preferred.
- α_2 = 1 / p_y(y_bm) : The value p_y(y_bm) is the maximum value of the peak of the vertical projection of the snapshot that corresponds to the processed band. Bands with a higher amount of vertical edges are preferred.
- α_3 = 1 / Σ_{y=y_b0}^{y_b1} p_y(y) : Similar to the previous heuristic, but it considers not only the value of the greatest peak, but the area under the graph between the points y_b0 and y_b1. These points define the vertical position of the evaluated band.
- α_4 = | (x_p0 − x_p1) / (y_b0 − y_b1) − 5 | : The proportions of one-row number plates are similar in most countries. If we assume that the width/height ratio of the plate is about five, we can compare the measured ratio with the estimated one to evaluate the cost of the number plate.
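A sketch of how these four heuristics could be combined into a single cost follows (my own illustration; the arguments are assumed to carry the band and plate coordinates and projection values defined above, and the absolute-value form of α_4 is a reconstruction):

```python
def band_cost(y_b0, y_b1, x_p0, x_p1, p_y_peak, p_y_area):
    """Weighted cost of a candidate; lower values are preferred.

    p_y_peak is p_y(y_bm); p_y_area is the sum of p_y over <y_b0, y_b1>.
    """
    alpha1 = abs(y_b0 - y_b1)                                # band height in pixels
    alpha2 = 1.0 / p_y_peak                                  # inverse of the projection peak
    alpha3 = 1.0 / p_y_area                                  # inverse of the area under the peak
    alpha4 = abs(abs(x_p0 - x_p1) / abs(y_b0 - y_b1) - 5)    # deviation from the ~5:1 aspect ratio
    return 0.15 * alpha1 + 0.25 * alpha2 + 0.4 * alpha3 + 0.4 * alpha4
```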
2.4.2 Deeper analysis

The deeper analysis determines the validity of a candidate for the number plate. Number plate candidates must be segmented into individual characters in order to extract substantial features. The list of candidates is iteratively processed until the first valid number plate is found. A candidate is considered a valid number plate if it meets the requirements for validity.

Assume that the plate p is segmented into several characters p_0 … p_{n−1}, where n is the number of characters. Let w_i be the width of the ith character (see figure 2.9.a). Since all segmented characters have a roughly uniform width, we can use the standard deviation of these values as a heuristic:

$$\beta_1 = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} (w_i - \bar{w})^2}$$

where $\bar{w}$ is the arithmetic average of the character widths, $\bar{w} = \frac{1}{n}\sum_{i=0}^{n-1} w_i$.

If we assume that the number plate consists of dark characters on a light background, we can use a brightness histogram to determine whether the candidate meets this condition. Because some country-specific plates are negative, we can use the histogram to deal with this type of plates (see figure 2.9.b). Let H(b) be a brightness histogram, where b is a certain brightness value, and let b_min and b_max be the values of the darkest and the lightest point. Then, H(b) is the count of pixels whose value is equal to b. The plate is negative when the heuristic β_2 is negative:

$$\beta_2 = \sum_{b=b_{mid}}^{b_{max}} H(b) - \sum_{b=b_{min}}^{b_{mid}} H(b)$$

where b_mid is the middle point of the histogram, $b_{mid} = \frac{b_{max} - b_{min}}{2}$.

Figure 2.9: (a) The number plate must be segmented into individual characters for the deeper heuristic analysis. (b) A brightness histogram of the number plate is used to determine the positivity of the number plate.
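The two validity heuristics could be sketched in Python as follows (my own illustration; widths is the list of segmented character widths and histogram is the plate's brightness histogram H(b) as an array indexed by brightness):

```python
import numpy as np

def beta1(widths) -> float:
    """Standard deviation of character widths; low values indicate a plausible plate."""
    w = np.asarray(widths, dtype=float)
    return float(np.sqrt(np.mean((w - w.mean()) ** 2)))

def beta2(histogram) -> float:
    """Light-half minus dark-half pixel counts; a negative value indicates a negative plate."""
    h = np.asarray(histogram, dtype=float)
    mid = len(h) // 2
    return float(h[mid:].sum() - h[:mid].sum())
```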
2.5 Deskewing mechanism

The captured rectangular plate can be rotated and skewed in many ways due to the positioning of the vehicle towards the camera. Since the skew significantly degrades the recognition abilities, it is important to implement additional mechanisms which are able to detect and correct skewed plates. The fundamental problem of this mechanism is to determine the angle under which the plate is skewed. Deskewing of the evaluated plate can then be realized by a trivial affine transformation.

It is important to understand the difference between a "sheared" and a "rotated" rectangular plate. The number plate is an object in three-dimensional space which is projected into the two-dimensional snapshot during the capture. The positioning of the object can sometimes cause skew of angles and proportions. If the vertical line v_p of the plate is not identical to the vertical line v_c of the camera objective, the plate may be sheared. If the vertical lines v_p and v_c are identical, but the axis a_p of the plate is not parallel to the axis a_c of the camera, the plate may be rotated (see figure 2.10).

Figure 2.10: (a) Number plate captured under the right angle (b) Rotated plate (c) Sheared plate
2.5.1 Detection of skew

The Hough transform is a special operation used to extract features of a specific shape within a picture. The classical Hough transform is used for the detection of lines. The Hough transform is widely used for miscellaneous purposes in machine vision, but I have used it to detect the skew of the captured plate, and also to compute the angle of the skew. It is important to know that the Hough transform does not distinguish between concepts such as "rotation" and "shear". It can only be used to compute an approximate angle of the image in a two-dimensional domain.

The mathematical representation of a line in the orthogonal coordinate system is the equation y = a·x + b, where a is the slope and b is the y-axis intercept of the line. The line is then the set of all points [x, y] for which this equation holds. We know that a line contains an infinite number of points, just as there is an infinite number of different lines which can cross a certain point. The relation between these two assertions is the basic idea of the Hough transform. The equation y = a·x + b can also be written as b = −x·a + y, where x and y are parameters. Then, the equation defines the set of all lines (a, b) which can cross the point [x, y]. For each point in the "XY" coordinate system, there is a line in the "AB" coordinate system (the so-called "Hough space").

Figure 2.11: The "XY" and "AB" ("Hough space") coordinate systems. Each point [x_0, y_0] in the "XY" coordinate system corresponds to one line b = −x_0·a + y_0 in the Hough space (red color). There are several points (marked as k, l, m) in the Hough space that correspond to the lines in the "XY" coordinate system which cross the point [x_0, y_0].

 
Let f(x, y) be a continuous function. For each point [a, b] in the Hough space, there is a line in the "XY" coordinate system. We compute the magnitude of the point [a, b] as a summation over all points of the "XY" space which lie on the line y = a·x + b. Assume that f(x, y) is a discrete function which represents a snapshot with definite dimensions (w × h). To compute the Hough transform of such a function, it is necessary to normalize it into a unified coordinate system in the following way:

$$x' = \frac{2 \cdot x}{w} - 1; \qquad y' = \frac{2 \cdot y}{h} - 1$$

Although the space defined by the unified coordinate system is in fact discrete (floating point) on digital computers, we will assume that it is continuous. Generally, we can define the Hough transform h'(a', b') of a continuous function f'(x', y') in the unified coordinate system as:

$$h'(a', b') = \int_{-1}^{1} f'(x',\, a' \cdot x' + b')\, dx'$$

Figure 2.12: (a) Number plate in the unified coordinate system after application of the horizontal edge detection filter (b) Hough transform of the number plate in the "ΘB" coordinate system (c) Colored Hough transform in the "AB" coordinate system.
We use the Hough transform of the image to evaluate its skew angle. You can see the colored Hough transform in figure 2.12.c. Pixels with a relatively high value are marked in red. Each such pixel corresponds to a long white line in figure 2.12.a. If we assume that the angle of such lines determines the overall angle, we can find the longest line as:

$$(a'_m, b'_m) = \underset{\substack{0 \le a' \le 1 \\ 0 \le b' \le 1}}{\arg\max}\; \{ h'(a', b') \}$$

To compute the angle of such a line, we need to transform it back to the original coordinate system:

$$[a_m, b_m] = \left[ w \cdot \frac{a'_m - 1}{2},\; h \cdot \frac{b'_m - 1}{2} \right]$$

where w and h are the dimensions of the evaluated image. Then, the overall angle θ of the image can be computed as:

$$\theta = \arctan(a_m)$$

A more sophisticated solution is to determine the angle from a horizontal projection of the Hough transform h'. This approach is much better, because it covers all parallel lines together, not only the longest one:

$$\hat{\theta} = \arctan\left( w \cdot \frac{\hat{a}'_m - 1}{2} \right); \qquad \hat{a}'_m = \underset{-1 \le a' \le 1}{\arg\max}\; \{ p_{a'}(a') \}$$

where p_{a'}(a') is the horizontal projection of the Hough space, such that:

$$p_{a'}(a') = \int_{-1}^{1} h'(a', b')\, db'$$
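For illustration, a discretized brute-force variant of the "longest line" approach can be written in a few lines of Python (my own sketch, not the thesis implementation; it accumulates support for candidate slopes directly instead of building the normalized continuous Hough space):

```python
import numpy as np

def dominant_skew_angle(edge_image: np.ndarray, n_slopes: int = 181, max_slope: float = 1.0) -> float:
    """Estimate the skew angle (radians) of an edge-filtered plate image."""
    h, w = edge_image.shape
    ys, xs = np.nonzero(edge_image)
    if xs.size == 0:
        return 0.0
    weights = edge_image[ys, xs].astype(float)
    slopes = np.linspace(-max_slope, max_slope, n_slopes)
    scores = np.empty(n_slopes)
    for k, a in enumerate(slopes):
        b = ys - a * xs                           # intercepts of lines y = a*x + b through each edge pixel
        hist, _ = np.histogram(b, bins=h, weights=weights)
        scores[k] = hist.max()                    # strength of the best-supported line at this slope
    return float(np.arctan(slopes[int(np.argmax(scores))]))
```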
2.5.2 Correction of skew

The second step of the deskewing mechanism is a geometric operation over the image f(x, y). Since the skew detection based on the Hough transform does not distinguish between shear and rotation, it is important to choose the proper deskewing operation. In practice, plates are sheared more often than rotated. To correct a plate sheared by the angle θ, we use an affine transformation to shear it by the negative angle −θ. For this transformation, we define a transformation matrix A:

$$\mathbf{A} = \begin{pmatrix} 1 & S_y & 0 \\ S_x & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -\tan(\theta) & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

where S_x and S_y are shear factors. S_x is always zero, because we shear the plate only in the direction of the Y axis.

Let P be a vector representing a certain point, P = [x, y, 1], where x and y are the coordinates of that point. The new coordinates P_s = [x_s, y_s, 1] of that point after the shearing can be computed as:

$$\mathbf{P}_s = \mathbf{P} \cdot \mathbf{A}$$

where A is the corresponding transformation matrix. Let the deskewed number plate be defined by a function f_s. The function f_s can be computed in the following way:

$$f_s(x, y) = f\left( [x, y, 1] \cdot \mathbf{A} \cdot [1, 0, 0]^T,\; [x, y, 1] \cdot \mathbf{A} \cdot [0, 1, 0]^T \right)$$

After the substitution of the transformation matrix A:

$$f_s(x, y) = f\left( [x, y, 1] \cdot \begin{pmatrix} 1 & -\tan(\theta) & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\; [x, y, 1] \cdot \begin{pmatrix} 1 & -\tan(\theta) & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right) = f(x,\; y - x \cdot \tan(\theta))$$

Figure 2.13: (a) Original number plate. (b) Number plate after deskewing.
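A minimal sketch of the shear correction (my own, assuming a grayscale NumPy image and the angle θ from the previous subsection) applies the Y-direction shear by inverse mapping, mirroring the simplified formula f_s(x, y) = f(x, y − x·tan θ):

```python
import numpy as np

def deskew_shear(image: np.ndarray, theta: float) -> np.ndarray:
    """Shear the image in the Y direction so that f_s(x, y) = f(x, y - x*tan(theta))."""
    h, w = image.shape
    out = np.zeros_like(image)
    for x in range(w):
        for y in range(h):
            src_y = int(round(y - x * np.tan(theta)))   # inverse mapping into the source image
            if 0 <= src_y < h:
                out[y, x] = image[src_y, x]
    return out
```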

Chapter 3
Principles of plate segmentation
The next step after the detection of the number plate area is the segmentation of the plate. The segmentation is one of the most important processes in automatic number plate recognition, because all further steps rely on it. If the segmentation fails, a character can be improperly divided into two pieces, or two characters can be improperly merged together. We can use a horizontal projection of the number plate for the segmentation, or one of the more sophisticated methods, such as segmentation using neural networks. If we assume only one-row plates, the segmentation is a process of finding horizontal boundaries between characters. Section 3.1 deals with this. The second phase of the segmentation is an enhancement of the segments. Besides the character, a segment of the plate also contains undesirable elements such as dots and stretches, as well as redundant space on the sides of the character. There is a need to eliminate these elements and extract only the character. Section 3.2 deals with these problems.
3.1 Segmentation of the plate using a horizontal projection

Since the segmented plate is deskewed, we can segment it by detecting spaces in its horizontal projection. We often apply an adaptive thresholding filter to enhance the area of the plate before segmentation. Adaptive thresholding is used to separate the dark foreground from the light background under non-uniform illumination. You can see the number plate area after the thresholding in figure 3.1.a. After the thresholding, we compute the horizontal projection p_x(x) of the plate f(x, y). We use this projection to determine horizontal boundaries between the segmented characters. These boundaries correspond to peaks in the graph of the horizontal projection (figure 3.1.b).

Figure 3.1: (a) Number plate after application of the adaptive thresholding (b) Horizontal projection of the plate with detected peaks. Detected peaks are denoted by dotted vertical lines.

The goal of the segmentation algorithm is to find peaks which correspond to the spaces between characters. At first, there is a need to define several important values in the graph of the horizontal projection p_x(x):

- v_m : The maximum value contained in the horizontal projection p_x(x), such that v_m = max_{0 ≤ x < w} { p_x(x) }, where w is the width of the plate in pixels.
- v_a : The average value of the horizontal projection p_x(x), such that v_a = (1/w) · Σ_{x=0}^{w−1} p_x(x).
- v_b : This value is used as a base for the evaluation of peak height. The base value is always calculated as v_b = 2·v_a − v_m. The value v_a must lie on the vertical axis between the values v_b and v_m.

The segmentation algorithm iteratively finds the maximum peak in the graph of the horizontal projection. The peak is treated as a space between characters if it meets some additional conditions, such as the height of the peak. The algorithm then zeroizes the peak and iteratively repeats this process until no further space is found. This principle can be illustrated by the following steps:

1. Determine the index of the maximum value of the horizontal projection: x_m = arg max_{0 ≤ x < w} { p_x(x) }
2. Detect the left and right foot of the peak as:
   x_l = max { x | 0 ≤ x ≤ x_m ∧ p_x(x) ≤ c_x · p_x(x_m) }
   x_r = min { x | x_m ≤ x < w ∧ p_x(x) ≤ c_x · p_x(x_m) }
3. Zeroize the horizontal projection p_x(x) on the interval ⟨x_l, x_r⟩.
4. If p_x(x_m) < c_w · v_m, go to step 7.
5. Divide the plate horizontally at the point x_m.
6. Go to step 1.
7. End.

Two different constants have been used in the algorithm above. The constant c_x is used to determine the feet of the peak x_m. The optimal value of c_x is 0.7. The constant c_w determines the minimum height of the peak related to the maximum value of the projection (v_m). If the height of the peak is below this minimum, the peak will not be considered a space between characters. It is important to choose the value of the constant c_w carefully. An inadequately small value causes too many peaks to be treated as spaces, so characters are improperly divided. Too big a value of c_w causes not all regular peaks to be treated as spaces, so characters are improperly merged together. The optimal value of c_w is 0.86. To ensure proper behavior of the algorithm, the constants c_x and c_w should meet the following condition:

∀(x_l, x_m, x_r) ∈ P : c_w · v_m > p_x(x_l) ∧ c_w · v_m > p_x(x_r)

where P is the set of all detected peaks x_m with their corresponding feet x_l and x_r.
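The segmentation loop above can be rendered in Python roughly as follows (my own sketch; p_x is the horizontal projection of the thresholded plate as a 1-D array, the height check is performed before zeroizing for simplicity, and the returned positions are the detected spaces between characters):

```python
import numpy as np

def segment_positions(p_x: np.ndarray, c_x: float = 0.7, c_w: float = 0.86):
    """Return the x positions at which the plate should be divided into characters."""
    p = p_x.astype(float).copy()
    v_m = p.max()
    cuts = []
    while True:
        x_m = int(np.argmax(p))
        peak = p[x_m]
        if peak < c_w * v_m:          # remaining peaks are too low to be spaces
            break
        threshold = c_x * peak
        x_l = x_m
        while x_l > 0 and p[x_l] > threshold:
            x_l -= 1                  # left foot of the peak
        x_r = x_m
        while x_r < len(p) - 1 and p[x_r] > threshold:
            x_r += 1                  # right foot of the peak
        p[x_l:x_r + 1] = 0            # zeroize the processed peak
        cuts.append(x_m)              # divide the plate at the peak position
    return sorted(cuts)
```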

3.2 Extraction of characters from horizontal segments


A segment of the plate contains, besides the character, also redundant space and other undesirable elements. By the term "segment" we understand the part of the number plate determined by the horizontal segmentation algorithm. Since the segment has been processed by an adaptive thresholding filter, it contains only black and white pixels. The neighboring pixels are grouped together into larger pieces, and one of them is the character. Our goal is to divide the segment into several pieces, and keep only the one piece representing the regular character. This concept is illustrated in figure 3.2.

Figure 3.2: A horizontal segment of the number plate contains several groups (pieces) of neighboring pixels.
3.2.1 Piece extraction

Let the segment be defined by a discrete function f(x, y) in a relative coordinate system, such that [0, 0] is the upper left corner of the segment, and [w − 1, h − 1] is the bottom right corner, where w and h are the dimensions of the segment. The value of f(x, y) is "1" for black pixels, and "0" for the white space. The piece Ρ is a set of all neighboring pixels [x, y] which represent a continuous element. The pixel [x, y] belongs to the piece Ρ if there is at least one pixel [x', y'] from Ρ such that [x, y] and [x', y'] are neighbors:

$$[x, y] \in \mathrm{P} \Leftrightarrow \exists [x', y'] \in \mathrm{P} : [x, y]\; \ddot{N}_4\; [x', y']$$

The notation a N̈₄ b means "a is a neighbor of b in a four-pixel neighborhood":

$$[x, y]\; \ddot{N}_4\; [x', y'] \Leftrightarrow |x - x'| = 1 \oplus |y - y'| = 1$$

Algorithm

The goal of the piece extraction algorithm is to find and extract the pieces from a segment of the plate. This algorithm is based on a similar principle as the commonly known "seed fill" algorithm.

Let us define:

- Let the piece Ρ be a set of (neighboring) pixels [x, y].
- Let S be the set of all pieces Ρ from the processed segment defined by the function f(x, y).
- Let X be the set of all black pixels: X = { [x, y] | f(x, y) = 1 }.
- Let A be an auxiliary set of pixels.

The principle of the algorithm is illustrated by the following pseudo-code:

let set S = ∅
let set X = { [x, y] | f(x, y) = 1 ∧ [0, 0] ≤ [x, y] < [w, h] }
while set X is not empty do
begin
    let set Ρ = ∅
    let set A = ∅
    pull one pixel from set X and insert it into set A
    while set A is not empty do
    begin
        pull pixel [x, y] from set A
        if f(x, y) = 1 ∧ [x, y] ∉ Ρ ∧ [0, 0] ≤ [x, y] < [w, h] then
        begin
            insert pixel [x, y] into set Ρ and remove it from set X
            insert pixels [x − 1, y], [x + 1, y], [x, y − 1], [x, y + 1] into set A
        end
    end
    add Ρ to set S
end

Note 1: The operation "pull one pixel from a set" is non-deterministic, because a set is an unordered group of elements. In a real implementation, a set will be implemented as an ordered list, and the operation "pull one pixel from a set" will be implemented as "pull the first pixel from a list".

Note 2: The notation [x_min, y_min] ≤ [x, y] < [x_max, y_max] means "the pixel [x, y] lies in the rectangle defined by the pixels [x_min, y_min] and [x_max, y_max]". More formally: [x, y] ○ [x', y'] ⇔ (x ○ x') ∧ (y ○ y'), where ○ is one of the binary relations '<', '≤' and '='.
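A runnable Python counterpart of this seed-fill extraction might look like this (my own sketch; segment is a binary NumPy array with 1 for black pixels, and each returned piece is a set of (x, y) coordinates):

```python
import numpy as np

def extract_pieces(segment: np.ndarray):
    """Group 4-connected black pixels of a binary segment into pieces."""
    h, w = segment.shape
    remaining = {(x, y) for y in range(h) for x in range(w) if segment[y, x] == 1}
    pieces = []
    while remaining:
        seed = remaining.pop()
        piece, stack = set(), [seed]
        while stack:
            x, y = stack.pop()
            if (x, y) in piece or not (0 <= x < w and 0 <= y < h) or segment[y, x] != 1:
                continue
            piece.add((x, y))
            remaining.discard((x, y))
            # visit the four-pixel neighborhood
            stack.extend([(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)])
        pieces.append(piece)
    return pieces
```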
3.2.2 Heuristic analysis of pieces

A piece is a set of pixels in the local coordinate system of the segment. The segment usually contains several pieces. One of them represents the character, and the others represent redundant elements which should be eliminated. The goal of the heuristic analysis is to find the piece which represents the character. Let us place the piece Ρ into an imaginary rectangle (x_0, y_0, x_1, y_1), where [x_0, y_0] is the upper left corner, and [x_1, y_1] is the bottom right corner of the piece:

$$x_0 = \min \{ x \mid [x, y] \in \mathrm{P} \}; \quad x_1 = \max \{ x \mid [x, y] \in \mathrm{P} \}; \quad y_0 = \min \{ y \mid [x, y] \in \mathrm{P} \}; \quad y_1 = \max \{ y \mid [x, y] \in \mathrm{P} \}$$
The dimensions and area of the imaginary rectangle are defined as w = x_1 − x_0, h = y_1 − y_0 and S = w · h. The cardinality of the set Ρ represents the number of black pixels, n_b = |Ρ|. The number of white pixels n_w can then be computed as n_w = S − n_b = w · h − |Ρ|. The overall magnitude M of a piece is the ratio between the number of black pixels n_b and the area S of the imaginary rectangle, M = n_b / S. In practice, we use the number of white pixels n_w as a heuristic. Pieces with a higher value of n_w are preferred. The piece chosen by the heuristic is then converted to a monochrome bitmap image. Each such image corresponds to one horizontal segment. These images are considered the output of the segmentation phase of the ANPR process (see figure 3.3).

Figure 3.3: The input (a) and output (b) example of the segmentation phase of the ANPR recognition process.

Chapter 4
Feature extraction and normalization of characters
To recognize a character from its bitmap representation, we need to extract feature descriptors of the bitmap. As the extraction method significantly affects the quality of the whole OCR process, it is very important to extract features which are invariant towards various light conditions, the used font type, and deformations of characters caused by skew of the image. The first step is the normalization of brightness and contrast of the processed image segments. The characters contained in the image segments must then be resized to uniform dimensions (second step). After that, the feature extraction algorithm extracts appropriate descriptors from the normalized characters (third step). This chapter deals with the various methods used in the process of normalization.
4.1 Normalization of brightness and contrast
The brightness and contrast characteristics of segmented characters vary due to different light conditions during the capture. Because of this, it is necessary to normalize them. There are many different techniques; this section describes the three most used: histogram normalization, global thresholding and adaptive thresholding. Through histogram normalization, the intensities of the character segments are redistributed on the histogram to obtain normalized statistics. Techniques of global and adaptive thresholding are used to obtain monochrome representations of the processed character segments. The monochrome (black & white) representation of the image is more appropriate for analysis, because it defines clear boundaries of the contained characters.
4.1.1 Histogram normalization

Histogram normalization is a method used to redistribute intensities on the histogram of the character segments. Areas of lower contrast gain a higher contrast without affecting the global characteristics of the image. Consider a grayscale image defined by a discrete function f(x, y). Let I be the total number of gray levels in the image (for example I = 256). We use a histogram to determine the number of occurrences of each gray level i, i ∈ ⟨0 … I − 1⟩:

$$H(i) = \left| \{ [x, y] \mid 0 \le x < w \wedge 0 \le y < h \wedge f(x, y) = i \} \right|$$

The minimum, maximum and average values contained in the image are defined as:

$$H_{min} = \min_{\substack{0 \le x < w \\ 0 \le y < h}} \{ f(x, y) \}; \quad H_{max} = \max_{\substack{0 \le x < w \\ 0 \le y < h}} \{ f(x, y) \}; \quad H_{avg} = \frac{1}{w \cdot h} \sum_{x=0}^{w-1} \sum_{y=0}^{h-1} f(x, y)$$

where the values H_min, H_max and H_avg are in the following relation: 0 ≤ H_min ≤ H_avg ≤ H_max ≤ I − 1.

The goal of histogram normalization is to obtain an image with normalized statistical characteristics, such that H_min = 0, H_max = I − 1 and H_avg = I/2. To meet this goal, we construct a transformation function g(i) as a Lagrange polynomial with the interpolation points [x_1, y_1] = [H_min, 0], [x_2, y_2] = [H_avg, I/2] and [x_3, y_3] = [H_max, I − 1]:

$$g(i) = \sum_{j=1}^{3} y_j \prod_{\substack{k=1 \\ k \ne j}}^{3} \frac{i - x_k}{x_j - x_k}$$

This transformation function can be explicitly written as:

$$g(i) = y_1 \cdot \frac{i - x_2}{x_1 - x_2} \cdot \frac{i - x_3}{x_1 - x_3} + y_2 \cdot \frac{i - x_1}{x_2 - x_1} \cdot \frac{i - x_3}{x_2 - x_3} + y_3 \cdot \frac{i - x_1}{x_3 - x_1} \cdot \frac{i - x_2}{x_3 - x_2}$$

After substitution of the concrete points, and a concrete number of gray levels I = 256:

$$g(i) = 128 \cdot \frac{i - H_{min}}{H_{avg} - H_{min}} \cdot \frac{i - H_{max}}{H_{avg} - H_{max}} + 255 \cdot \frac{i - H_{min}}{H_{max} - H_{min}} \cdot \frac{i - H_{avg}}{H_{max} - H_{avg}}$$

Figure 4.1: We use the Lagrange interpolating polynomial as a transformation function to normalize the brightness and contrast of characters.

The Lagrange interpolating polynomial as a transformation function is a costly solution. It is like harvesting one potato with a tractor. In practice, it is more useful to construct the transformation using a simple linear function that spreads the interval ⟨H_min, H_max⟩ onto the unified interval ⟨0, I − 1⟩:

$$g(i) = \frac{i - H_{min}}{H_{max} - H_{min}} \cdot (I - 1)$$

The normalization of the image is performed by the transformation function in the following way:

$$f_n(x, y) = g(f(x, y))$$
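A compact Python sketch of the simpler linear variant follows (my own illustration, assuming a grayscale NumPy image):

```python
import numpy as np

def normalize_brightness(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Linearly spread the interval <H_min, H_max> onto <0, levels - 1>."""
    f = image.astype(float)
    h_min, h_max = f.min(), f.max()
    if h_max == h_min:
        return image.copy()                          # flat image, nothing to normalize
    g = (f - h_min) / (h_max - h_min) * (levels - 1)
    return g.astype(image.dtype)
```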
4.1.2 Global thresholding

Global thresholding is an operation in which the continuous gray scale of an image is reduced to monochrome black & white colors according to a global threshold value. Let ⟨0, 1⟩ be the gray scale of such an image, and let v be the original value of a pixel, v ∈ ⟨0, 1⟩. The new value v' is computed as:

$$v' = \begin{cases} 0 & \text{if } v \in \langle 0, t) \\ 1 & \text{if } v \in \langle t, 1 \rangle \end{cases}$$

The threshold value t can be obtained by a heuristic approach, based on a visual inspection of the histogram. We use the following algorithm to determine the value of t automatically:

1. Select an initial estimate for the threshold t (for example t = 0.5).
2. The threshold t divides the pixels into two different sets: S_a = { [x, y] | f(x, y) < t } and S_b = { [x, y] | f(x, y) ≥ t }.
3. Compute the average gray level values μ_a and μ_b of the pixels in the sets S_a and S_b as:
   $$\mu_a = \frac{1}{|S_a|} \sum_{[x,y] \in S_a} f(x, y); \qquad \mu_b = \frac{1}{|S_b|} \sum_{[x,y] \in S_b} f(x, y)$$
4. Compute the new threshold value $t = \frac{1}{2}(\mu_a + \mu_b)$.
5. Repeat steps 2, 3 and 4 until the difference Δt between successive iterations is smaller than a predefined precision.

Since the threshold t is global for the whole image, global thresholding can sometimes fail. Figure 4.2.a shows a partially shadowed number plate. If we compute the threshold t using the algorithm above, all pixels in the shadowed part will be below this threshold and all other pixels will be above it. This causes an undesired result, illustrated in figure 4.2.b.

Figure 4.2: (a) The partially shadowed number plate. (b) The number plate after thresholding. (c) The threshold value t determined by an analysis of the histogram.
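The iterative threshold selection described above can be sketched in a few lines of Python (my own illustration, for an image scaled to the range <0, 1>):

```python
import numpy as np

def global_threshold(image: np.ndarray, t: float = 0.5, eps: float = 1e-3) -> float:
    """Iteratively refine the global threshold t until it stabilizes."""
    f = image.astype(float)
    while True:
        mu_a = f[f < t].mean() if np.any(f < t) else 0.0    # average of the dark set S_a
        mu_b = f[f >= t].mean() if np.any(f >= t) else 1.0  # average of the light set S_b
        t_new = 0.5 * (mu_a + mu_b)
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# usage: binary = (image >= global_threshold(image)).astype(np.uint8)
```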
4.1.3 Adaptive thresholding

The number plate can sometimes be partially shadowed or non-uniformly illuminated. This is the most frequent reason why global thresholding fails. Adaptive thresholding solves several disadvantages of global thresholding, because it computes the threshold value for each pixel separately, using its local neighborhood.

Chow and Kaneko approach

There are two approaches to finding the threshold: the first is the Chow and Kaneko approach, and the second is local thresholding. Both methods assume that smaller rectangular regions are more likely to have approximately uniform illumination, and are thus more suitable for thresholding. The image is divided into uniform rectangular areas of m × n pixels. A local histogram is computed for each such area and a local threshold is determined. The threshold of a concrete point is then computed by interpolating the results of the subimages.

Figure 4.3: The number plate (from figure 4.2) processed by the Chow and Kaneko approach to adaptive thresholding. The number plate is divided into several areas, each with its own histogram and threshold value. The threshold value of a concrete pixel (denoted by "?") is computed by interpolating the results of the subimages (represented by pixels 1-6).
Local thresholding

The second way of finding the local threshold of a pixel is a statistical examination of the neighboring pixels. Let [x, y] be the pixel for which we compute the local threshold t. For simplicity, we consider a square neighborhood of width 2·r + 1, where [x − r, y − r], [x − r, y + r], [x + r, y − r] and [x + r, y + r] are the corners of the square. There are several approaches to computing the value of the threshold:

- Mean of the neighborhood: $t(x, y) = \underset{\substack{x - r \le i \le x + r \\ y - r \le j \le y + r}}{\mathrm{mean}} \{ f(i, j) \}$
- Median of the neighborhood: $t(x, y) = \underset{\substack{x - r \le i \le x + r \\ y - r \le j \le y + r}}{\mathrm{median}} \{ f(i, j) \}$
- Mean of the minimum and maximum value of the neighborhood: $t(x, y) = \frac{1}{2} \left( \min_{\substack{x - r \le i \le x + r \\ y - r \le j \le y + r}} \{ f(i, j) \} + \max_{\substack{x - r \le i \le x + r \\ y - r \le j \le y + r}} \{ f(i, j) \} \right)$

The new value f'(x, y) of the pixel [x, y] is then computed as:

$$f'(x, y) = \begin{cases} 0 & \text{if } f(x, y) \in \langle 0, t(x, y)) \\ 1 & \text{if } f(x, y) \in \langle t(x, y), 1 \rangle \end{cases}$$
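For illustration, a local thresholding based on the mean of the neighborhood could be sketched as follows (my own, assuming a grayscale image scaled to <0, 1> and a neighborhood radius r):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold(image: np.ndarray, r: int = 7) -> np.ndarray:
    """Binarize using the mean of the (2r+1) x (2r+1) neighborhood as a per-pixel threshold."""
    f = image.astype(float)
    t = uniform_filter(f, size=2 * r + 1)     # mean of the square neighborhood of each pixel
    return (f >= t).astype(np.uint8)          # 1 for values above the threshold, 0 otherwise
```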
4.2 Normalization of dimensions and resampling

Before extracting feature descriptors from the bitmap representation of a character, it is necessary to normalize it to unified dimensions. By the term "resampling" we understand the process of changing the dimensions of the character. As the original dimensions of unnormalized characters are usually higher than the normalized ones, the characters are in most cases downsampled. When we downsample, we reduce the information contained in the processed image. There are several methods of resampling, such as pixel resize, bilinear interpolation or weighted-average resampling. We cannot determine which method is the best in general, because the success of a particular method depends on many factors. For example, usage of weighted-average downsampling in combination with the detection of character edges is not a good solution, because this type of downsampling does not preserve sharp edges (discussed later). Because of this, the choice of character resampling is closely associated with the choice of the feature extraction method. We will assume that m × n are the dimensions of the original image, and m' × n' are the dimensions of the image after resampling. The horizontal and vertical aspect ratios are defined as r_x = m'/m and r_y = n'/n, respectively.
4.2.1 Nearest-neighbor downsampling
The principle of nearest-neighbor downsampling is picking the nearest pixel in the original image that corresponds to a processed pixel in the image after resampling. Let f(x, y) be a discrete function defining the original image, such that 0 ≤ x < m and 0 ≤ y < n. Then, the function f′(x′, y′) of the image after resampling is defined as:

  f′(x′, y′) = f( x′ / rx , y′ / ry )
where 0 ≤ x′ < m′ and 0 ≤ y′ < n′. If the aspect ratio is lower than one, then each pixel in the resampled (destination) image corresponds to a group of pixels in the original image, but only one value from the group of source pixels affects the value of the pixel in the resampled image. This fact causes a significant reduction of the information contained in the original image (see figure 4.5).
Figure 4.4: One pixel in the resampled image corresponds to a group of pixels in the original image.
Although nearest-neighbor downsampling significantly reduces the information contained in the original image by ignoring a large number of pixels, it preserves sharp edges and the strong bipolarity of black and white pixels. Because of this, nearest-neighbor downsampling is suitable in combination with the “edge detection” feature extraction method described in section 4.3.2.
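The formula above translates directly into index arithmetic; a minimal NumPy sketch under the same definitions (illustrative names, not the demo software's API):

    import numpy as np

    def nearest_neighbor_downsample(image, new_h, new_w):
        # pick, for each destination pixel, the nearest source pixel
        h, w = image.shape
        ry, rx = new_h / h, new_w / w             # vertical and horizontal aspect ratios
        ys = np.minimum((np.arange(new_h) / ry).astype(int), h - 1)
        xs = np.minimum((np.arange(new_w) / rx).astype(int), w - 1)
        return image[np.ix_(ys, xs)]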
4.2.2 Weighted-average downsampling
In contrast with the nearest-neighbor method, weighted-average downsampling considers all pixels from the corresponding group of pixels in the original image. Let rx and ry be the horizontal and vertical aspect ratios of the resampled image. The value of the pixel [x′, y′] in the destination image is computed as a mean of the source pixels in the range [xmin, ymin] to [xmax, ymax]:

  f′(x′, y′) = ( 1 / ((xmax − xmin) · (ymax − ymin)) ) · Σ_{i = xmin}^{xmax} Σ_{j = ymin}^{ymax} f(i, j)

where:

  xmin = x′ / rx ;  xmax = (x′ + 1) / rx ;  ymin = y′ / ry ;  ymax = (y′ + 1) / ry
The weighted-average method of downsampling does not preserve sharp edges of the image (in contrast with the previous method). You can see a visual comparison of these two methods in figure 4.5.
Figure 4.5: (a) Nearest-neighbor resampling of an m × n bitmap to m′ × n′ significantly reduces the information contained in the original image, but it preserves sharp edges. (b) Weighted-average resampling gives a better visual result, but the edges of the result are not sharp.
4.3 Feature extraction
Information contained in a bitmap representation of an image is not suitable for processing by computers. Because of this, there is a need to describe a character in another way. The description of the character should be invariant towards the used font type, or deformations caused by skew. In addition, all instances of the same character should have a similar description. A description of the character is a vector of numerical values, so called “descriptors” or “patterns”:

  x = ( x0, …, xn−1 )

Generally, the description of an image region is based on its internal and external representation. The internal representation of an image is based on its regional properties, such as color or texture. The external representation is chosen when the primary focus is on shape characteristics. The description of normalized characters is based on their external characteristics, because we deal only with properties such as the character shape. The vector of descriptors then includes characteristics such as the number of lines, bays and lakes, and the number of horizontal, vertical and diagonal edges, etc. Feature extraction is a process of transformation of data from a bitmap representation into a form of descriptors, which is more suitable for computers. If we associate similar instances of the same character into classes, then the descriptors of characters from the same class should be geometrically close to each other in the vector space. This is a basic assumption for the success of the pattern recognition process. This section deals with various methods of feature extraction, and explains which method is the most suitable for a specific type of character bitmap. For example, the “edge detection” method should not be used in combination with a blurred bitmap.
4.3.1 Pixel matrix
The simplest way to extract descriptors from a bitmap image is to assign the brightness of each pixel to a corresponding value in the vector of descriptors. The length of such a vector is then equal to the product (w · h) of the dimensions of the transformed bitmap:

  xi = f( ⌊i / w⌋ , i mod w )

where i ∈ ⟨0, …, w · h − 1⟩. Bigger bitmaps produce extremely long vectors of descriptors, which are not suitable for recognition. Because of this, the size of the processed bitmap is very limited. In addition, this method does not consider the geometrical closeness of pixels, nor their neighboring relations. Two slightly biased instances of the same character in many cases produce very different description vectors. Even so, this method is suitable if the character bitmaps are too blurry or too small for edge detection.
  x = ( 251, 181, 068, 041, 032, 071, 197, 196, 014, 132, 213, 187, 043, 041, 174, 011, 200, 254, 254, 232, 164, 202, 014, 012, 128, 242, 255, 255, 253, 212, 089, 005, 064, 196, 253, 255, 255, 251, 196, 030, 009, 165, 127, 162, 251, 254, 197, 009, 105, 062, 005, 100, 144, 097, 006, 170, 207, 083, 032, 051, 053, 134, 250 )

Figure 4.6: The “pixel matrix” feature extraction method applied to a w × h character bitmap.
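In a NumPy-based sketch, this extraction amounts to flattening the already resampled bitmap row by row (illustrative helper, not the thesis's code):

    import numpy as np

    def pixel_matrix_descriptor(bitmap):
        # concatenate the rows of the w x h character bitmap into one descriptor vector
        return bitmap.astype(float).flatten()   # length equals w * h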
4.3.2 Detection of character edges
In contrast with the previous method, the detection of character edges does not consider the absolute position of each pixel, but only the number of occurrences of individual edge types in a specific region of the character bitmap. Because of this, the resulting vector is invariant towards the intra-regional displacement of the edges, and towards small deformations of characters.
Bitmap regions
Let the bitmap be described by a discrete function f(x, y), where w and h are its dimensions, such that 0 ≤ x < w and 0 ≤ y < h. We divide it into six equal regions organized into three rows and two columns in the following way. Let [x(i)min, y(i)min] and [x(i)max, y(i)max] be the upper left and bottom right points of the rectangle which determines the region ri, such that:

• Region r0: xmin = 0,   ymin = 0,       xmax = w/2 − 1, ymax = h/3 − 1
• Region r1: xmin = w/2, ymin = 0,       xmax = w − 1,   ymax = h/3 − 1
• Region r2: xmin = 0,   ymin = h/3,     xmax = w/2 − 1, ymax = 2·h/3 − 1
• Region r3: xmin = w/2, ymin = h/3,     xmax = w − 1,   ymax = 2·h/3 − 1
• Region r4: xmin = 0,   ymin = 2·h/3,   xmax = w/2 − 1, ymax = h − 1
• Region r5: xmin = w/2, ymin = 2·h/3,   xmax = w − 1,   ymax = h − 1

There are many possible ways how to distribute regions in the character bitmap. The regions can be disjunctive as well as they can overlap each other. Figure 4.7 shows several possible layouts of regions.
Figure 4.7: Layouts of regions in the character bitmap. The regions can be disjunctive as well as they can overlap each other.
Edge types in region
Let us define an edge of the character as a 2x2 white-to-black transition in a bitmap. According to this definition, the bitmap image can contain fourteen different edge types, illustrated in figure 4.8.
Figure 4.8: The processed bitmap can contain different types of 2x2 edges.
The statistics of occurrence of each edge type causes a uselessly long vector of descriptors. Because of this, “similar” types of edges are considered as the same. The following list shows how the edges can be grouped together:
1. 0 + 1 (vertical edges)
2. 2 + 3 (horizontal edges)
3. 4 + 6 + 9 (“/” type diagonal edges)
4. 5 + 7 + 8 (“\” type diagonal edges)
5. 10 (bottom right corner)
6. 11 (bottom left corner)
7. 12 (top right corner)
8. 13 (top left corner)
For simplicity, assume that edge types are not grouped together. Let η be the number of different edge types, where hi is a 2×2 matrix that corresponds to the specific type of edge. The fourteen matrices h0, …, h13 correspond one-to-one to the fourteen 2×2 edge types illustrated in figure 4.8 (the vertical, horizontal and diagonal edges and the four corner types listed above).
Let ρ be the number of rectangular regions in the character bitmap, where x(i)min, y(i)min, x(i)max and y(i)max are the boundaries of the region ri (i ∈ ⟨0, …, ρ − 1⟩). If the statistics consider η different edge types for each of the ρ regions, the length of the resulting vector x is computed as η · ρ:

  x = ( x0, x1, …, xη·ρ−1 )

Feature extraction algorithm
At first, we have to embed the character bitmap f(x, y) into a bigger bitmap with white padding to ensure a proper behavior of the feature extraction algorithm. Let the padding be one pixel wide. Then, the dimensions of the embedding bitmap will be w + 2 and h + 2. The embedding bitmap f′(x, y) is then defined as:

  f′(x, y) = 1                  if x = 0 ∨ y = 0 ∨ x = w + 1 ∨ y = h + 1
  f′(x, y) = f(x − 1, y − 1)    otherwise

where w and h are the dimensions of the character bitmap before embedding. The color of the padding is white (value of 1). The coordinates of the pixels are shifted by one pixel with respect to the original position. The structure of the vector of output descriptors is illustrated by the pattern below. The notation hj @ ri means “the number of occurrences of an edge represented by the matrix hj in the region ri”:

  x = ( h0@r0, h1@r0, …, hη−1@r0,  h0@r1, h1@r1, …, hη−1@r1,  …,  h0@rρ−1, h1@rρ−1, …, hη−1@rρ−1 )

(the first η components belong to region r0, the next η components to region r1, and so on up to region rρ−1).
We compute the position k of hj @ ri in the vector x as k = i · η + j, where η is the number of different edge types (and also the number of corresponding matrices). The following algorithm demonstrates the computation of the vector of descriptors x:

zeroize vector x
for each region ri, where i ∈ ⟨0, …, ρ − 1⟩ do
begin
  for each pixel [x, y] in region ri, where x(i)min ≤ x ≤ x(i)max and y(i)min ≤ y ≤ y(i)max do
  begin
    for each matrix hj, where j ∈ ⟨0, …, η − 1⟩ do
    begin
      if the 2×2 block ( f′(x, y), f′(x + 1, y) ; f′(x, y + 1), f′(x + 1, y + 1) ) equals hj then
      begin
        let k = i · η + j
        let xk = xk + 1
      end
    end
  end
end
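A compact NumPy sketch of the same computation, under the assumption that the padded bitmap is a binary array with white = 1 and black = 0, that edge_matrices holds the η 2×2 matrices h0 … hη−1 as NumPy arrays, and that regions holds the ρ rectangles in the padded bitmap's coordinates (all names are illustrative):

    import numpy as np

    def edge_histogram(padded, edge_matrices, regions):
        # count occurrences of each 2x2 edge type h_j inside each region r_i
        eta = len(edge_matrices)
        x = np.zeros(len(regions) * eta)
        for i, (xmin, ymin, xmax, ymax) in enumerate(regions):
            for px in range(xmin, xmax + 1):
                for py in range(ymin, ymax + 1):
                    block = padded[py:py + 2, px:px + 2]   # 2x2 block anchored at [px, py]
                    for j, h in enumerate(edge_matrices):
                        if np.array_equal(block, h):
                            x[i * eta + j] += 1            # position k = i*eta + j
        return x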
4.3.3 Skeletonization and structural analysis
The feature extraction techniques discussed in the previous two chapters are based on statistical image processing. These methods do not consider structural aspects of the analyzed images. A small difference in bitmaps sometimes means a big difference in the structure of the contained characters. For example, digits ‘6’ and ‘8’ have very similar bitmaps, but there is a substantial difference in their structures. The structural analysis is based on higher concepts than the edge detection method. It does not deal with terms such as “pixels” or “edges”, but it considers more complex structures (like junctions, line ends or loops). To analyze these structures, we must involve the thinning algorithm to get a skeleton of the character. This chapter deals with the principle of skeletonization as well as with the principle of the structural analysis of the skeletonized image.
The concept of skeletonization
The skeletonization is a reduction of the structural shape into a graph. This reduction is accomplished by obtaining a skeleton of the region via the skeletonization algorithm. The skeleton of a shape is mathematically defined by the medial axis transformation. To define the medial axis transformation and the skeletonization algorithm, we must introduce some elementary prerequisite terms. Let N̈ be a binary relation between two pixels [x, y] and [x′, y′], such that a N̈ b means “a is a neighbor of b”. This relation is defined as:

  [x, y] N̈8 [x′, y′]  ⇔  |x − x′| = 1 ∨ |y − y′| = 1     (eight-pixel neighborhood)
  [x, y] N̈4 [x′, y′]  ⇔  |x − x′| = 1 ⊕ |y − y′| = 1     (four-pixel neighborhood)

The border B of a character is a set of boundary pixels. The pixel [x, y] is a boundary pixel if it is black and if it has at least one white neighbor in the eight-pixel neighborhood:

  [x, y] ∈ B  ⇔  f(x, y) = 0 ∧ ∃[x′, y′] : f(x′, y′) = 1 ∧ [x, y] N̈8 [x′, y′]

The inner region I of a character is a set of black pixels which are not boundary pixels:

  [x, y] ∈ I  ⇔  f(x, y) = 0 ∧ [x, y] ∉ B

Figure 4.9: (a) Illustration of the four-pixel and eight-pixel neighborhood. (b) The set of boundary and inner pixels of a character.
The piece Ρ is then a union of all boundary and inner pixels (Ρ = B ∪ I). Since there is only one continuous group of black pixels, all black pixels belong to the piece Ρ. The principle and the related terminology of the skeletonization are similar to the piece extraction algorithm discussed in section 3.2.1.
Medial axis transformation
The medial axis transformation of the piece Ρ is defined as follows. For each inner pixel p ∈ I, we find the closest boundary pixel b ∈ B. If a pixel p has more than one such closest boundary pixel, it is said to belong to the medial axis (or skeleton) of the Ρ. The concept of the closest boundary pixel depends on the definition of the Euclidean distance between two pixels in the orthogonal coordinate system. Mathematically, the medial axis (or skeleton) S is a subset of the Ρ defined as:

  p ∈ S  ⇔  ∃p1 ∃p2 : p1 ∈ B ∧ p2 ∈ B ∧ d(p, p1) = d(p, p2) = min_{p′ ∈ B} { d(p, p′) }

The pixel p belongs to the medial axis S if there exist at least two pixels p1 and p2 such that the Euclidean distance between the pixels p and p1 is equal to the distance between the pixels p and p2, and these pixels are the closest boundary pixels to the pixel p. The Euclidean distance between two pixels p1 = [x1, y1] and p2 = [x2, y2] is defined as:

  d(p1, p2) = √( (x1 − x2)² + (y1 − y2)² )
Skeletonization algorithm
Direct implementation of the mathematical definition of the medial axis transformation is computationally expensive, because it involves calculating the distance from every inner pixel from the set I to every pixel on the boundary B. The medial axis transformation is intuitively defined by a so-called “fire front” concept. Consider that a fire is lit along the border. All fire fronts will advance into the inner part of the character at the same speed. The skeleton of a character is then a set of pixels reached by more than one fire front at the same time. The skeletonization (or thinning) algorithm is based on the “fire front” concept. The thinning is a morphological operation, which preserves end pixels and does not break connectivity. Assume that pixels of the piece are black (value of zero), and background pixels are white (value of one). The thinning is an iterative process of two successive steps applied to boundary pixels of a piece. With reference to the eight-pixel neighborhood notation in figure 4.9, the first step flags a boundary pixel p for deletion if each of the following conditions is satisfied:

• At least one of the top, right and bottom neighbors of the pixel p must be white (a pixel is white just when it does not belong to the piece Ρ):

  pt ∉ Ρ ∨ pr ∉ Ρ ∨ pb ∉ Ρ

• At least one of the left, right and bottom neighbors of the pixel p must be white:

  pl ∉ Ρ ∨ pr ∉ Ρ ∨ pb ∉ Ρ

• The pixel p must have at least two, and at most six black neighbors from the piece Ρ. This condition prevents the algorithm from erasing end points and from breaking the connectivity:

  2 ≤ | { p′ | p N̈8 p′ ∧ p′ ∈ Ρ } | ≤ 6

• The number of white-to-black transitions in the ordered sequence pt, ptr, pr, pbr, pb, pbl, pl, ptl, pt must be equal to one:

  v(pt ∈ Ρ ∧ ptr ∉ Ρ) + v(ptr ∈ Ρ ∧ pr ∉ Ρ) + v(pr ∈ Ρ ∧ pbr ∉ Ρ) + v(pbr ∈ Ρ ∧ pb ∉ Ρ) +
  v(pb ∈ Ρ ∧ pbl ∉ Ρ) + v(pbl ∈ Ρ ∧ pl ∉ Ρ) + v(pl ∈ Ρ ∧ ptl ∉ Ρ) + v(ptl ∈ Ρ ∧ pt ∉ Ρ) = 1

  where v(x) = 1 if x holds, and v(x) = 0 if ¬x.
The first step flags the pixel p for deletion if its neighborhood meets the conditions above. However, the pixel is not deleted until all other pixels have been processed. If at least one of the conditions is not satisfied, the value of the pixel p is not changed. After step one has been applied to all boundary pixels, the flagged pixels are definitively deleted in the second step. Every iteration of these two steps thins the processed character. This iterative process is applied until no further pixels are marked for deletion. The result of the thinning algorithm is a skeleton (or medial axis) of the processed character.
• Let the piece Ρ be a set of all black pixels contained in the skeletonized character.
• Let B be a set of all boundary pixels.
The following pseudo-code demonstrates the thinning algorithm more formally. This algorithm carries out the medial axis transformation over a piece Ρ.

do   // iterative thinning process
  let continue = false
  let B = ∅
  for each pixel p in piece Ρ do               // create a set of boundary pixels
    if ∃p′ : p′ ∉ Ρ ∧ p N̈8 p′ then              // if the pixel p has at least one white neighbor
      insert pixel p into set B                // but keep it also in Ρ
  for each pixel p in set B do                 // 1st step of the iteration
  begin
    // if at least one condition is violated, skip this pixel
    if ¬( pt ∉ Ρ ∨ pr ∉ Ρ ∨ pb ∉ Ρ ) then continue
    if ¬( pl ∉ Ρ ∨ pr ∉ Ρ ∨ pb ∉ Ρ ) then continue
    if ¬( 2 ≤ |{ p′ | p N̈8 p′ ∧ p′ ∈ Ρ }| ≤ 6 ) then continue
    if v(pt ∈ Ρ ∧ ptr ∉ Ρ) + v(ptr ∈ Ρ ∧ pr ∉ Ρ) + v(pr ∈ Ρ ∧ pbr ∉ Ρ) + v(pbr ∈ Ρ ∧ pb ∉ Ρ) +
       v(pb ∈ Ρ ∧ pbl ∉ Ρ) + v(pbl ∈ Ρ ∧ pl ∉ Ρ) + v(pl ∈ Ρ ∧ ptl ∉ Ρ) + v(ptl ∈ Ρ ∧ pt ∉ Ρ) ≠ 1
       then continue
    // all tests passed
    flag point p for deletion
    let continue = true
  end
  for each pixel p in set B do                 // 2nd step of the iteration
    if p is flagged then pull point p from piece Ρ
while continue = true

Note: The pixel p belongs to the piece Ρ when it is black: p ∈ Ρ ⇔ f(p) = 0
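For concreteness, the flag-and-delete iteration above might look as follows in NumPy. This is a sketch under the stated conventions (black = 0 belongs to the piece, white = 1 is background) and additionally assumes at least one pixel of white padding on every side of the bitmap; all names are illustrative.

    import numpy as np

    def thin(bitmap):
        # iteratively delete boundary pixels until no pixel is flagged (cf. the pseudo-code above)
        img = bitmap.copy()
        h, w = img.shape
        changed = True
        while changed:
            flagged = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y, x] != 0:
                        continue                            # only black (piece) pixels are candidates
                    t, tr, r, br = img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1]
                    b, bl, l, tl = img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]
                    nb = [t, tr, r, br, b, bl, l, tl]
                    if not (t == 1 or r == 1 or b == 1):
                        continue                            # condition 1
                    if not (l == 1 or r == 1 or b == 1):
                        continue                            # condition 2
                    black = sum(1 for v in nb if v == 0)
                    if not (2 <= black <= 6):
                        continue                            # condition 3
                    seq = nb + [t]
                    transitions = sum(1 for a, c in zip(seq, seq[1:]) if a == 0 and c == 1)
                    if transitions != 1:
                        continue                            # condition 4: one transition around the ring
                    flagged.append((y, x))
            for y, x in flagged:
                img[y, x] = 1                               # second step: delete the flagged pixels
            changed = bool(flagged)
        return img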
Figure 4.10: (a) The character bitmap before skeletonization. (b) The thinning algorithm iteratively deletes boundary pixels. Pixels deleted in the first iteration are marked by a light gray color. Pixels deleted in the second and third iterations are marked by dark gray. (c) The result of the thinning algorithm is a skeleton (or a medial axis).
Structural analysis of the skeletonized character
The structural analysis is a feature extraction method that considers more complex structures than pixels. The basic idea is that the substantial difference between two compared characters cannot be evaluated by the statistical analysis. Because of this, the structural analysis extracts features which describe not pixels or edges, but more complex structures, such as junctions, line ends and loops.
Junction
The junction is a point which has at least three black neighbors in the eight-pixel neighborhood. We consider only two types of junctions: the junction of three lines and the junction of four lines. The numbers of junctions in the skeletonized piece Ρ are mathematically defined as:

  n3j = | { p | ∃3 p′ : { p, p′ } ⊆ Ρ ∧ p N̈8 p′ } | ;   n4j = | { p | ∃4 p′ : { p, p′ } ⊆ Ρ ∧ p N̈8 p′ } |

Line end
The line end is a point which has exactly one neighbor in the eight-pixel neighborhood. The number of line ends in a skeletonized piece Ρ is defined as:

  ne = | { p | ∃! p1 : { p, p1 } ⊆ Ρ ∧ p N̈8 p1 } |
The following algorithm can be used to detect the number of junctions and the number of line ends in a skeletonized piece Ρ:

let nj = 0
let ne = 0
for each pixel p in piece Ρ do
begin
  let neighbors = 0
  for each pixel p′ in the neighborhood { pt, ptr, pr, pbr, pb, pbl, pl, ptl } do
    if p′ ∈ Ρ then let neighbors = neighbors + 1
  if neighbors = 1 then let ne = ne + 1
  else if neighbors ≥ 3 then let nj = nj + 1
end
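The same counting can be sketched in NumPy under the black = 0 convention, assuming a one-pixel white margin around the skeleton (illustrative names):

    import numpy as np

    def count_junctions_and_line_ends(skeleton):
        # count skeleton pixels with >= 3 black neighbors (junctions) and exactly 1 (line ends)
        n_junctions = n_line_ends = 0
        h, w = skeleton.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if skeleton[y, x] != 0:                     # 0 = black = skeleton pixel
                    continue
                window = skeleton[y-1:y+2, x-1:x+2]
                neighbors = int(np.sum(window == 0)) - 1    # exclude the pixel itself
                if neighbors == 1:
                    n_line_ends += 1
                elif neighbors >= 3:
                    n_junctions += 1
        return n_junctions, n_line_ends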
Figure 4.11: (a, b) The junction is a pixel which has at least three neighbors in the eight-pixel neighborhood. (c) The line end is a pixel which has only one neighbor in the eight-pixel neighborhood. (d) The loop is a group of pixels which encloses a continuous white space.
Loops
It is not easy to determine the number of loops nl in the skeletonized character. The algorithm is based on the following principle. At first, we must negate the bitmap of the skeletonized character. Black pixels will be considered as background and white pixels as foreground. The number of loops in the image is equal to the number of lakes, which are surrounded by these loops. Since a lake is a continuous group of white pixels in the positive image, we apply the piece extraction algorithm on the negative image to determine the number of black pieces. Then, the number of loops is equal to the number of black pieces minus one, because one piece represents the background of the original image (negated to the foreground). Another way is to use a series of morphological erosions.
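As a sketch of this idea, SciPy's connected-component labelling (scipy.ndimage.label) is used below as a stand-in for the piece extraction algorithm of section 3.2.1; this substitution is an assumption of the sketch, not the thesis's implementation:

    import numpy as np
    from scipy import ndimage

    def count_loops(skeleton):
        # loops = connected regions of the negated image minus one (the background region)
        negative = (skeleton != 0)                 # white pixels of the original become foreground
        _, n_pieces = ndimage.label(negative)      # connected-component ("piece") labelling
        return max(n_pieces - 1, 0)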
Figure 4.12: (a) We determine the number of lakes in the skeleton by applying the piece extraction algorithm on the negative image. The negative image (b) contains three pieces. Since the piece 3 is a background, only two pieces are considered as lakes. (c)(d) Similar skeletons of the same character can differ in the number of junctions.
Since we do not know the number of edges of the skeleton, we cannot use the standard cyclomatic equation known from graph theory. In addition, two similar skeletons of the same character can sometimes differ in the number of junctions (see figure 4.12). Because of this, it is not recommended to use constraints based on the number of junctions.
Structural constraints
To improve the recognition process, we can assume the structural constraints in table 4.1. The syntactical analysis can be combined with other methods described in previous chapters, such as the edge detection method or the pixel matrix. The simplest way is to use one global neural network that returns several candidates and then select the best candidate that meets the structural constraints (figure 4.13.a). A more sophisticated solution is to use the structural constraints for adaptive selection of local neural networks (figure 4.13.b).

  Line ends:  0 → B, D, O, 0, 8;  1 → P, Q, 6, 9;  2 → A, C, G, I, J, L, M, N, S, U, V, W, Z, 1, 2, 3, 4, 5, 7;  3 → E, F, T, Y;  4 → H, K, X
  Loops:      0 → C, E, F, G, H, I, J, K, L, M, N, S, T, U, V, W, X, Y, Z, 1, 2, 3, 4, 5, 7;  1 → A, D, O, P, Q, 0, 9;  2 → B, 8
  Junctions:  0 → C, D, G, I, J, L, M, N, O, S, U, V, W, Z, 0, 1, 2, 3, 5, 7;  1 → E, F, K, P, Q, T, X, Y, 4, 6, 9;  2 → A, B, H;  3 → 8

Table 4.1: Structural constraints of characters (number of line ends, loops and junctions per character).
Figure 4.13: (a, b) Structural constraints can be applied before and after the recognition by the neural network. (c) Example of the skeletonized alphabet.
Feature extraction
In case we know the position of the structural elements, we can form a vector of descriptors directly from this information. Assume that there are several line ends, loops, and junctions in the image. The position of a loop is defined by its centre. To form the vector, we must convert the rectangular coordinates of the element into polar coordinates [r, θ] (see figure 4.14):

  r = √( x′² + y′² ) ;  θ = atan( y′ / x′ ) ;  x′ = (2·x − w) / w ;  y′ = (2·y − h) / h

where x′ and y′ are normalized rectangular coordinates. The length and the structure of the resulting vector vary according to the number and type of structural elements contained in the character. Since the structural constraints divide characters into several classes, there are several possible types of description vectors. Each type of vector corresponds to one class of characters. For example, consider a character with two line ends and one junction. This constraint determines the following class of possible characters: (G, I, J, L, M, N, S, U, V, W, Z, 1, 2, 3, 5, 7). We define a vector of descriptors to distinguish between these characters as follows:

  x = ( r1, θ1, r2, θ2, r3, θ3 )

where [r1, θ1] and [r2, θ2] are the polar coordinates of the two line ends and [r3, θ3] are the polar coordinates of the junction.
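A small helper for this conversion might look as follows; math.atan2 is used instead of a plain arctangent so that all four quadrants are handled correctly — a design choice of this sketch, and the names are illustrative:

    import math

    def to_polar(x, y, w, h):
        # convert an element position [x, y] in a w x h bitmap to polar coordinates [r, theta]
        xn = (2 * x - w) / w            # normalized rectangular coordinates
        yn = (2 * y - h) / h
        return math.hypot(xn, yn), math.atan2(yn, xn)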
Figure 4.14: (a) The skeleton of the character contains several structural elements, such as junctions, loops and line ends. (b, c) Each element at position [x, y] can be placed in the rectangular or in the polar coordinate system [r, θ].
Chapter 5
Recognition of characters
The previous chapter deals with various methods of feature extraction. The goal of these methods is to obtain a vector of descriptors (a so-called pattern), which comprehensively describes the character contained in a processed bitmap. The goal of this chapter is to introduce pattern recognition techniques, such as neural networks, which are able to classify the patterns into the appropriate classes.
5.1 General classification problem
The general classification problem is formulated using a mapping between elements in two sets. Let A be a set of all possible combinations of descriptors, and B be a set of all classes. The classification means the projection of a group of similar elements from the set A into a common class represented by one element in the set B. Thus, one element in the set B corresponds to one class. Usually the group of distinguishable instances of the same character corresponds to one class, but sometimes one class represents two mutually indistinguishable characters, such as “0” and “O”. Let F be a hypothetical function that assigns each element from the set A to an element from the set B:

  F : A → B
  x̂ = F(x)

where x ∈ A is a description vector (pattern) which describes the structure of the classified character, and x̂ ∈ B is a classifier, which represents the semantics of such a character. The function F is probably the best theoretical classificator, but its construction is impossible, since we cannot deal with every combination of descriptors. In praxis, we construct a pattern classifier by using only a limited subset of the A → B mappings. This subset is known as a “training set”, such that At ⊂ A and Bt ⊂ B. Our goal is to construct an approximation F̃(x, w) of the hypothetical function F, where w is a parameter that affects the quality of the approximation:

  F̃(w) : At → Bt
  x̂ = F̃(x, w)

where x ∈ At ⊂ A and x̂ ∈ Bt ⊂ B. Formally we can say that F̃(w) is a restriction of the projection F over the set At ⊂ A. We assume that for each xi ∈ At we know the desired value x̂i ∈ Bt:

  x0 → x̂0 , x1 → x̂1 , x2 → x̂2 , … , xn−1 → x̂n−1
Figure 5.1: The projection between the sets A and B. F is a hypothetical function that maps every possible combination of input pattern x ∈ A to a corresponding class x̂ ∈ B. This projection is approximated by a function F̃(w), which maps input patterns from the training set At into the corresponding classes from the set Bt.
The problem is to find an optimal value (or values) of the parameter w. The w is typically a vector (or matrix) of synaptical weights in a neural network. According to this parameter, the values of the function F̃(x, w) should be as close as possible to the values of F(x) for input patterns x from the training set At. We define an error function to evaluate the worthiness of the parameter w:

  E(w) = ½ · Σ_{i=0}^{m−1} ( F̃(xi, w) − x̂i )²

where m is the number of patterns x0 … xm−1 in the training set At. Let w+ be an optimal value of the parameter w, such that w+ = arg min_{w ∈ W} { E(w) }. Then, the approximation F̃(x, w+) of the function F(x) is considered as adapted. The adapted approximation F̃(x, w+) simulates the original function F(x) for patterns x from the training set At. In addition, this approximation is able to predict the output classifier x̂ for an unknown pattern x from the “test” set Ax (Ax = A − At). The function with such prediction ability partially substitutes the hypothetical classificator F(x). Since the function F̃(x, w) is only a model, we use a feed-forward neural network for its implementation.
5.2 Biological neuron and its mathematical models
For a better understanding of artificial neural network architecture, there is a
need to explain the structure and functionality of a biological neuron. The hum
an brain is a neural network of about ten billion interconnected neurons. Each
neuron is a cell that uses a biochemical reaction to process and transmit inform
ation. The neural cell has a body of size about several micrometers and thousand
s of input connections called “dendrites”. It also has one output connection called “a
xon”, which can be several meters long. The data flow in the biological neural net
work is represented by electrical signal, which propagates along the axon. When
the signal reaches a synaptic connection between the axon and a consecutive dend
rite, it releases molecules of a chemical agent (called mediators or neuro-transmi
tters) into such dendrite. This action causes a local change of polarity of a de
ndrite transmission membrane. The difference in the polarity of the transmission
membrane activates a dendrite-somatic potential wave, which advances in a syste
m of branched dendrites into the body of neuron.
Figure 5.2: The biological neuron
The biological neural network contains two types of synaptic connections. The fi
rst is an excitive connection, which amplifies the passing signal. The second (i
nhibitive) connection suppresses the signal. The behavior of the connection is r
epresented by its “weight”. The neural network contains mechanism which is able to a
lter the weights of connections. Because of this, the system of synaptic weights
is a realization of human memory. As the weights are continually altered, the o
ld information is being forgotten little by little.
Figure 5.3: (a) Schematic illustration of the neural cell: the cell body, the dendrites, the axon and the terminal buttons of the axon. (b) The synaptic connection between a dendrite and a terminal button of the axon, where the neuro-transmitters are released.
Since the problematic of the biological neuron is very difficult, the scientists
proposed several mathematical models, such as McCulloch-Pitts binary threshold
neuron, or the percepton.
5.2.1
McCulloch-Pitts binary threshold neuron
The McCulloch-Pitts binary threshold neuron was the first model proposed by McCu
lloch and Pitts in 1943. The neuron has only two possible output values (0 or 1)
and only two types of the synaptic weights: the fully excitative and the fully
inhibitive. The excitative weight (1) does not affect the input, but the inhibit
ive one negates it (-1).
The weighted inputs are counted together and processed by the neuron as follows:

  y = g( Σ_{j=0}^{J−1} wi,j · xj − ϑi ) ,   where g(ξ) = 0 if ξ < 0 and g(ξ) = 1 if ξ ≥ 0
This type of neuron can perform logical functions such as AND, OR, or NOT. In ad
dition, McCulloch and Pitts proved that synchronous array of such neurons is abl
e to realize an arbitrary computational function, similarly to a Turing machine. Since biological neurons do not have a binary response (but a continuous one), this model of neuron is not suitable for their approximation.
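As an illustration of the claim that such neurons can realize logical functions, a tiny sketch (hypothetical helper names, not from the thesis) is shown below; AND and OR differ only in the threshold:

    def mcculloch_pitts(inputs, weights, threshold):
        # binary threshold neuron: output 1 iff the weighted sum reaches the threshold
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s >= threshold else 0

    and_gate = lambda a, b: mcculloch_pitts([a, b], [1, 1], threshold=2)
    or_gate  = lambda a, b: mcculloch_pitts([a, b], [1, 1], threshold=1)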
5.2.2
Percepton
Another model of neuron is a percepton. It has been proved that McCulloch-Pitts
networks with modified synaptic connections can be trained for the recognition a
nd classification. The training is based on a modification of a neuron weights,
according to the reaction of such neuron as follows. If the neuron is not active
and it should be, we increase the weights. If the neuron is active and it shoul
d not be, we decrease them. This principle was used in the first model of the
neural classifier called ADALINE (adaptive linear neuron). The major problem of
such networks is that they are not able to solve linearly nonseparable problems
. This problem has been solved when the scientists Rumelhart, Hinton and William
s proposed the error back-propagation method of learning for multilayered percep
ton networks. The simple McCulloch-Pitts binary threshold neurons have been repl
aced by neurons with continuous saturation input/output function. Percepton has
multiple analog inputs and one analog output. Let x0 … xJ−1 be the inputs with corresponding weights wi,0 … wi,J−1. The weighted inputs are counted, thresholded and saturated together in the following way:

  y = g( Σ_{j=0}^{J−1} wi,j · xj − ϑi ) ,   g(ξ) = 1 / (1 + e^(−ξ))

where g(ξ) is the sigmoid saturation function (see figure 5.4.b) and ϑi is a threshold value. Sometimes, the threshold is implemented as a dedicated input with a constant weight of −1 (see figure 5.4.a). Then, the function of a neuron can be simplified to y = g(x · w), where x is a vector of inputs (including the threshold value), and w is a vector of weights (including the constant weight −1).
Figure 5.4: (a) The summation Σ and gain (saturation) function g of the percepton with the threshold ϑ implemented as a dedicated input. (b) The sigmoid saturation function g(ξ) = 1 / (1 + e^(−ξ)).
5.3 Feed-forward neural network
Formally, the neural network is defined as an oriented graph G = ( N , E ) , whe
re N is a nonempty set of neurons, and E is a set of oriented connections betwee
n neurons. The connection e ( n, n′ ) ∈ E is a binary relation between two neurons n
and n′ . The set of all neurons N is composed of disjunctive sets N 0 , N1 , N 2
, where Ni is a set of all neurons from the ith layer: N = N0 ∪ N1 ∪ N2. The jth weight of the ith neuron in the kth layer is denoted as wi,j(k) and the threshold of the ith neuron in the kth layer is denoted as ϑi(k). The numbers of neurons in the input (0), hidden (1) and output (2) layers are denoted as m, n, o, such that m = |N0|, n = |N1| and o = |N2|. The number of neurons in the input layer (m) is equal to the length of an input pattern x, so that each value of the pattern is dedicated to one neuron. Neurons in the input layer do not perform any computation; they only distribute the values of an input pattern to the neurons in the hidden layer. Because of this, an input layer neuron has only one input directly mapped into multiple outputs, its threshold value ϑi(0) is equal to zero, and the weight of its input wi,0(0) is equal to one. The number of neurons in the hidden layer (n) is scalable, but it affects the recognition abilities of the neural network as a whole. Too few neurons in the hidden layer cause the neural network to be unable to learn new patterns. Too many neurons cause the network to be overlearned, so it will not be able to generalize to unknown patterns. The information in a feed-forward neural network is propagated from lower layers to upper layers by one-way connections. There are connections only between adjacent layers; thus the feed-forward neural network contains no feedback connections, nor connections between two arbitrary (non-adjacent) layers. In addition, there are no connections between neurons within the same layer.
Figure 5.5: Architecture of the three-layer feed-forward neural network. The data flow proceeds from the input activities x0, …, xm−1 through the hidden-layer activities z0, …, zn−1 to the output activities y0, …, yo−1; wi,j(k) and ϑi(k) denote the weights and thresholds of the individual layers.
5.4 Adaptation mechanism of the feed-forward neural network
It has been proven that a multilayered neural network composed of perceptons with a sigmoid saturation function can solve an arbitrary non-linear problem. Mathematically, for each function F : ℝ^m → ℝ^o there exists a multilayered feed-forward neural network that is able to realize this function. The proof is based on Kolmogorov's theorem, which states that every continuous function f defined on the interval ⟨0, 1⟩ can be written as:

  f( x0, …, xm−1 ) = Σ_{i=0}^{2·m} αi( Σ_{j=0}^{m−1} φi,j( xj ) )
where αi are properly chosen continuous functions with one parameter. The problem is how to construct the neural network corresponding to a given non-linear function. At first, we choose a proper topology of the network. The number of neurons in the input and output layers is given by the lengths of the input and output patterns, while the number of neurons in the hidden layer is scalable. An adaptation of the neural network means finding the optimal parameter w+ of the approximation function F̃(x, w) discussed in section 5.1. Let us define two error functions to evaluate the worthiness of the parameter w:

  Et = ½ · Σ_{i ∈ At} ( F̃(xi, w) − x̂i )²  ;   Ex = ½ · Σ_{i ∈ Ax} ( F̃(xi, w) − x̂i )²
where the subscript “t” means “train”, and “x” means “test”. The Et is an error function defined over the patterns from the training set, and Ex over the patterns from the test set. The response of the neural network to an input pattern xi is given as yi = F̃(xi, w). The error function Et goes down as the number of neurons in the hidden layer grows. This relation is valid also between the function Et and the number of iterative steps of the adaptation process. These relations can be mathematically described as follows:
  lim_{n→∞} Et = 0  ;   lim_{k→∞} Et = 0
where n is the number of neurons in the hidden layer and k is the number of iteration steps of the adaptation process. The error function Ex does not have a limit of zero as n and k go to infinity. Because of this, there exists an optimal number of neurons and an optimal number of iteration steps, at which the function Ex has a minimum (see figure 5.6).
Figure 5.6: Dependency of the error functions Et and Ex on the number of neurons in the hidden layer (n) and the number of iteration steps (k).
For simplicity, we will assume only a feed-forward neural network with one layer of hidden neurons, as defined in section 5.3. All neurons in adjacent layers are connected by oriented connections. There are no feedback connections, nor connections between neurons within a single layer. The activities of hidden and output neurons are defined as:

  zi = g( Σ_{j=0}^{m−1} wi,j(1) · xj − ϑi(1) )    (activities of neurons in the hidden layer)
  yi = g( Σ_{j=0}^{n−1} wi,j(2) · zj − ϑi(2) )    (activities of neurons in the output layer)

where g(ξ) is the sigmoid saturation function (see figure 5.4.b).
5.4.1
Active phase
Evaluation of the activities of hidden and output neurons is performed in so-cal
led “active phase”. The active phase consists of two steps in three-layer neural net
works. The first step is an evaluation of activities zi in the hidden layer, and
the second step is an evaluation of activities yi . Since the evaluation of act
ivities is performed from bottom layers to top ones, the term “feed-forward” refers
to this principle. The active phase corresponds to an approximation F̃(x, w) o
f function F ( x ) , and it is performed every time when there is a need to clas
sify the input pattern x . The following pseudo-code demonstrates the active pha
se of feed-forward neural network. The notation is the same as in figure 5.5.
procedure activePhase (
  input:
    w ,   // vector of thresholds and weights
    x ;   // input pattern to be classified
  output:
    z ,   // vector of activities of neurons in the hidden layer
    y     // vector of activities of neurons in the output layer (the neural network response)
)
begin
  // first step: evaluate activities of neurons in the hidden layer
  for each neuron in the hidden layer with index i ∈ ⟨0, …, n − 1⟩ do
  begin
    let ξ = −ϑi(1)                       // threshold taken from the vector w
    for each input with index j ∈ ⟨0, …, m − 1⟩ do
      let ξ = ξ + wi,j(1) · xj            // weight taken from the vector w
    let zi = g(ξ)
  end
  // second step: evaluate activities of neurons in the output layer
  for each neuron in the output layer with index i ∈ ⟨0, …, o − 1⟩ do
  begin
    let ξ = −ϑi(2)
    for each input with index j ∈ ⟨0, …, n − 1⟩ do
      let ξ = ξ + wi,j(2) · zj
    let yi = g(ξ)
  end
end
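In matrix form, the active phase is a pair of matrix-vector products followed by the sigmoid. A minimal NumPy sketch, assuming w1 of shape n × m, w2 of shape o × n and threshold vectors theta1, theta2 (illustrative names, not the demo software's API):

    import numpy as np

    def sigmoid(xi):
        return 1.0 / (1.0 + np.exp(-xi))

    def active_phase(x, w1, theta1, w2, theta2):
        # forward pass: input pattern x -> hidden activities z -> output activities y
        z = sigmoid(w1 @ x - theta1)
        y = sigmoid(w2 @ z - theta2)
        return z, y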
5.4.2
Partial derivatives and gradient of error function
The goal of the training phase is to find optimal values of thresholds and weights that minimize the error function Et. The adaptation phase is an iterative process in which a response y to an input pattern is compared with the desired response x̂. The difference between the obtained and the desired response is used for a correction of the weights. The weights are iteratively altered until the value of the error function Et is negligible.

Gradient of the error function related to a single pattern
We compute the gradient g of the error function related to a single pattern with obtained and desired responses y and x̂. The gradient g is computed in the direction from the upper layers to the lower layers as follows. At first, we compute the components of the gradient related to the thresholds ϑi(2) in the output layer as:

  ∂E/∂ϑi(2) = ( yi − x̂i ) · (1 − yi) · yi
Then, we compute the components of the gradient related to the thresholds ϑi(1) in the hidden layer. These components are computed using the components ∂E/∂ϑj(2) from the previous step as follows:

  ∂E/∂ϑi(1) = zi · (1 − zi) · Σ_{j=0}^{o−1} ∂E/∂ϑj(2) · wj,i(2)

Similarly, we compute the components of the gradient related to the weights wi,j(2) and wi,j(1) in the following way:

  ∂E/∂wi,j(2) = ∂E/∂ϑi(2) · zj  ;   ∂E/∂wi,j(1) = ∂E/∂ϑi(1) · xj

The gradient g is a vector whose components are given as follows:

  g = ( ∂E/∂ϑ0(1), …, ∂E/∂ϑn−1(1), ∂E/∂ϑ0(2), …, ∂E/∂ϑo−1(2), ∂E/∂w0,0(1), …, ∂E/∂wn−1,m−1(1), ∂E/∂w0,0(2), …, ∂E/∂wo−1,n−1(2) )
Overall gradient
The overall gradient is defined as a sum of the gradients related to the individual patterns of the training set At. Let g_{x/x̂} be the gradient related to a training pair x / x̂. The overall gradient is computed as:

  g = Σ_{x/x̂ ∈ At} g_{x/x̂}
5.4.3 Adaptation phase
The adaptation phase is an iterative process of finding optimal values of the weights and thresholds, for which the value of the error function Et is in a local minimum. Figure 5.7 schematically illustrates a graph of the function Et, the so-called “error landscape”. Generally, the error landscape is |w| + 1 dimensional, where |w| is the cardinality of the vector of thresholds and weights, such that:

  w = ( ϑ0(1), …, ϑn−1(1), ϑ0(2), …, ϑo−1(2), w0,0(1), …, wn−1,m−1(1), w0,0(2), …, wo−1,n−1(2) )
Figure 5.7: The numeric approach of finding the global minimum of Et in the error landscape spanned by the vector w.
The vector of optimal thresholds and weights w+ is represented by a global minimum in the error landscape. Since we cannot compute this minimum analytically, we have to use a numeric approach. There are various numeric optimization algorithms, such as Newton's method, or the gradient descent. We use the gradient descent algorithm to find the global minimum in the error landscape. A single step of the iterative algorithm can look as follows:

  (k+1)ϑi(l) = (k)ϑi(l) − λ · ∂E/∂ϑi(l) + µ · ( (k)ϑi(l) − (k−1)ϑi(l) )
  (k+1)wi,j(l) = (k)wi,j(l) − λ · ∂E/∂wi,j(l) + µ · ( (k)wi,j(l) − (k−1)wi,j(l) )
where (k)wi,j(l) is the weight of the connection between the ith neuron in the lth layer and the jth neuron in the (l−1)th layer, computed in the kth step of the iterative process. The speed of convergence is represented by the parameter λ. Too small a value of the parameter λ causes excessively slow convergence. Too big a value of λ breaks the monotony of convergence. The µ is a momentum value, which prevents the algorithm from getting stuck in local minima.

Note: The notation g(∂E/∂ϑi(l)) means “the component ∂E/∂ϑi(l) of the vector (or gradient) g”. The ∂E/∂ϑi(l) is a partial derivative of the error function E by the threshold value ϑi(l). Similarly, ∂E/∂wi,j(l) is a partial derivative of the function E by the value of the weight wi,j(l).

The whole adaptation algorithm of the feed-forward neural network can be illustrated by the following pseudo-code.
procedure adaptation (
  input:
    At ,    // training set of patterns x / x̂
    λ ,     // speed of convergence
    µ ,     // momentum value
    kmax ,  // maximum number of iterations
    ε       // precision of the adaptation process
  output:
    w+      // vector of optimal weights and thresholds
)
begin
  initialize weights and thresholds in w to random values
  let k = 0 , E = ∞
  let w_prev = w                       // we have no previous value of w at the beginning
  while k ≤ kmax ∧ E > ε do
  begin
    // compute the overall gradient g
    zeroize the overall gradient g
    for each pair x / x̂ of At do
    begin
      zeroize the gradient g_{x/x̂} for the training pair x / x̂
      activePhase ( w , x , z , y )    // compute the activities z , y
      // compute the gradient
      for each threshold ϑi(2) do  g_{x/x̂}( ∂E/∂ϑi(2) ) = ( yi − x̂i ) · (1 − yi) · yi
      for each threshold ϑi(1) do  g_{x/x̂}( ∂E/∂ϑi(1) ) = zi · (1 − zi) · Σ_{j=0}^{o−1} g_{x/x̂}( ∂E/∂ϑj(2) ) · wj,i(2)
      for each weight wi,j(2) do   g_{x/x̂}( ∂E/∂wi,j(2) ) = g_{x/x̂}( ∂E/∂ϑi(2) ) · zj
      for each weight wi,j(1) do   g_{x/x̂}( ∂E/∂wi,j(1) ) = g_{x/x̂}( ∂E/∂ϑi(1) ) · xj
      let g = g + g_{x/x̂}
    end
    // alter the values of thresholds and weights according to the gradient
    for each threshold ϑi(l) in w do
      let w_next ϑi(l) = w ϑi(l) − λ · g( ∂E/∂ϑi(l) ) + µ · ( w ϑi(l) − w_prev ϑi(l) )
    for each weight wi,j(l) in w do
      let w_next wi,j(l) = w wi,j(l) − λ · g( ∂E/∂wi,j(l) ) + µ · ( w wi,j(l) − w_prev wi,j(l) )
    let w_prev = w , w = w_next
  end
  let w+ = w
end
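Purely as an illustration (and not the thesis's demo implementation), a batch gradient-descent loop with momentum for this three-layer network can be sketched in NumPy as follows. The threshold gradients carry a minus sign here because of the net-input convention Σ w·x − ϑ used above; all names and default values are illustrative assumptions.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def adapt(patterns, targets, n_hidden, lam=0.5, mu=0.2, k_max=1000, eps=1e-3, seed=0):
        # batch gradient descent with momentum for the three-layer feed-forward network
        m, o = len(patterns[0]), len(targets[0])
        rng = np.random.default_rng(seed)
        w1 = rng.uniform(-0.5, 0.5, (n_hidden, m)); t1 = rng.uniform(-0.5, 0.5, n_hidden)
        w2 = rng.uniform(-0.5, 0.5, (o, n_hidden)); t2 = rng.uniform(-0.5, 0.5, o)
        vel = {"w1": 0.0, "t1": 0.0, "w2": 0.0, "t2": 0.0}     # previous update steps (momentum)
        for _ in range(k_max):
            g = {"w1": np.zeros_like(w1), "t1": np.zeros_like(t1),
                 "w2": np.zeros_like(w2), "t2": np.zeros_like(t2)}
            err = 0.0
            for x, x_hat in zip(patterns, targets):
                x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
                z = sigmoid(w1 @ x - t1)                       # active phase, hidden layer
                y = sigmoid(w2 @ z - t2)                       # active phase, output layer
                err += 0.5 * np.sum((y - x_hat) ** 2)
                d2 = (y - x_hat) * (1 - y) * y                 # output-layer term (y - x_hat)(1 - y)y
                d1 = z * (1 - z) * (w2.T @ d2)                 # hidden-layer term
                g["w2"] += np.outer(d2, z); g["t2"] -= d2      # threshold gradient gets a minus sign
                g["w1"] += np.outer(d1, x); g["t1"] -= d1
            if err < eps:
                break
            for name, p in (("w1", w1), ("t1", t1), ("w2", w2), ("t2", t2)):
                step = -lam * g[name] + mu * vel[name]         # gradient step plus momentum term
                p += step
                vel[name] = step
        return w1, t1, w2, t2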
5.5 Heuristic analysis of characters
The segmentation algorithm described in chapter three can sometimes detect redundant elements, which do not correspond to proper characters. The shape of these elements after normalization is often similar to the shape of characters. Because of this, these elements are not reliably separable by traditional OCR methods, although they vary in size as well as in contrast, brightness or hue. Since the feature extraction methods described in chapter four do not consider these properties, there is a need to use additional heuristic analyses to filter out non-character elements. The analysis expects all elements to have similar properties. Elements with considerably different properties are treated as invalid and excluded from the recognition process. The analysis consists of two phases. The first phase deals with statistics of brightness and contrast of segmented characters. Characters are then normalized and processed by the piece extraction algorithm. Since the piece extraction and the normalization of brightness disturb the statistical properties of segmented characters, it is necessary to carry out the first phase of the analysis before the application of the piece extraction algorithm. In addition, the heights of detected segments are the same for all characters. Because of this, there is a need to carry out the analysis of dimensions after the application of the piece extraction algorithm. The piece extraction algorithm strips off the white padding which surrounds the character. Respecting the constraints above, the sequence of steps can be assembled as follows:
1. Segment the plate (the result is in figure 5.8.a).
2. Analyse the brightness and contrast of the segments and exclude faulty ones.
3. Apply the piece extraction algorithm on the segments (the result is in figure 5.8.b).
4. Analyse the dimensions of the segments and exclude faulty ones.
Figure 5.8: Character segments before (a) and after (b) the application of the piece extraction algorithm. This algorithm disturbs the statistical properties of brightness and contrast.
If we assume that there are no big differences in the brightness and contrast of the segments, we can exclude the segments which considerably differ from the mean. Let the ith segment of the plate be defined by a discrete function fi(x, y), where wi and hi are the dimensions of the element. We define the following statistical properties of an element. The global brightness of such a segment is defined as a mean of the brightnesses of the individual pixels:

  pb(i) = ( 1 / (wi · hi) ) · Σ_{x=0}^{wi} Σ_{y=0}^{hi} fi(x, y)
The global contrast of the ith segment is defined as a standard deviation of the brightnesses of the individual pixels:

  pc(i) = √( ( Σ_{x=0}^{wi} Σ_{y=0}^{hi} ( pb(i) − fi(x, y) )² ) / (wi · hi) )
The function f(x, y) represents only the intensity of grayscale images, but an additional heuristic analysis of colors can be involved to improve the recognition process. This analysis separates character and non-character elements on a color basis. If the captured snapshot is represented by the HSV color model, we can directly compute the global hue and saturation of the segments as a mean of the hue and saturation of the individual pixels:

  ph(i) = ( 1 / (wi · hi) ) · Σ_{x=0}^{wi} Σ_{y=0}^{hi} h(x, y)  ;   ps(i) = ( 1 / (wi · hi) ) · Σ_{x=0}^{wi} Σ_{y=0}^{hi} s(x, y)
where h(x, y) and s(x, y) are the hue and saturation of a certain pixel in the HSV color model. If the captured snapshot is represented by the RGB color model, there is a need to transform it to the HSV model first. To determine the validity of an element, we compute the average value of a chosen property over all elements. For example, the average brightness is computed as p̄b = (1 / n) · Σ_{i=0}^{n−1} pb(i), where n is the number of elements. The element i is considered as valid if its global brightness pb(i) does not differ by more than 16 % from the average brightness p̄b. The threshold values of the individual properties have been calibrated as follows:

  brightness (BRI):          | pb(i) − p̄b | < 0.16 · p̄b
  contrast (CON):            | pc(i) − p̄c | < 0.1 · p̄c
  hue (HUE):                 | ph(i) − p̄h | < 0.145 · p̄h
  saturation (SAT):          | ps(i) − p̄s | < 0.24 · p̄s
  height (HEI):              | hi − h̄ | < 0.2 · h̄
  width/height ratio (WHR):  0.1 < wi / hi < 0.92
If the segment violates at least one of the constraints above, it is considered as invalid and excluded from the recognition process. Table 5.1 contains the properties of the elements from figure 5.8. According to this table, elements 0 and 10 have been refused due to an uncommon width/height ratio, and elements 1 and 4 due to a small height.
   i    BRI     CON     HUE     SAT     HEI     WHR     Violated constraints
   0    0.247   0.038   0.152   0.236   0.189   0.093   BRI, HUE, WHR
   1    0.034   0.096   0.181   0.134   0.554   0.833   HUE, HEI
   2    0.002   0.018   0.030   0.038   0.040   0.642
   3    0.084   0.012   0.003   0.061   0.189   0.625
   4    0.001   0.003   0.021   0.059   0.777   1.666   HEI, WHR
   5    0.117   0.016   0.002   0.063   0.189   0.625
   6    0.063   0.016   0.007   0.056   0.189   0.562
   7    0.025   0.011   0.025   0.028   0.114   0.533
   8    0.019   0.025   0.012   0.034   0.114   0.600
   9    0.019   0.048   0.009   0.045   0.114   0.533
  10    0.062   0.009   0.041   0.018   0.189   0.095   WHR

Table 5.1: Properties of segments in figure 5.8. The meaning of the abbreviations is as follows: BRI = brightness, CON = contrast, HUE = hue, SAT = saturation, HEI = height, WHR = width/height ratio.
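A sketch of the validity test described above, assuming each segment is represented by a dictionary holding the measured properties (all keys and names are hypothetical, chosen only for this illustration):

    tolerances = {"brightness": 0.16, "contrast": 0.10, "hue": 0.145, "saturation": 0.24, "height": 0.2}

    def valid_property(value, mean_value, tolerance):
        # True if a property differs from the mean of all segments by less than the given fraction
        return abs(value - mean_value) < tolerance * mean_value

    def filter_segments(segments):
        # keep only segments whose properties stay within tolerance of the per-property averages
        means = {k: sum(s[k] for s in segments) / len(segments) for k in tolerances}
        kept = []
        for seg in segments:
            ok = all(valid_property(seg[k], means[k], t) for k, t in tolerances.items())
            ok = ok and 0.1 < seg["width"] / seg["height"] < 0.92   # width/height ratio constraint
            if ok:
                kept.append(seg)
        return kept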
Chapter 6
Syntactical analysis of a recognized plate
6.1 Principle and algorithms
In some situations when the recognition mechanism fails, there is a possibility to detect the failure by a syntactical analysis of the recognized plate. If we have country-specific rules for the plate, we can evaluate the validity of that plate towards these rules. Automatic syntax-based correction of plate numbers can increase the recognition abilities of the whole ANPR system. For example, if the recognition software is confused between the characters “8” and “B”, the final decision can be made according to the syntactical pattern. If the pattern allows only digits for that position, the character “8” will be used rather than the character “B”. Another good example is a decision between the digit “0” and the character “O”. The very small difference between these characters makes their recognition extremely difficult, in many cases impossible.
6.1.1 Recognized character and its cost
In most cases, characters are recognized by neural networks. Each neuron in the output layer of a neural network typically represents one character. Let y = (y0, …, y9, yA, …, yZ) be a vector of output activities. If there are 36 characters in the alphabet, the vector y will also be 36-dimensional. Let yi be the ith component of the vector y. Then, yi expresses how much the input character corresponds to the ith character of the alphabet, which is represented by this component. The recognized character χ is represented by the greatest component of the vector y:

  χ = chr( max_{0 ≤ i ≤ z} { yi } )

where chr(yi) is the character which is represented by the ith component of the vector y. Let y(S) be the vector y sorted in descending order according to the values of its components. Then, the recognized character is represented by the first component of the sorted vector:

  χ = chr( y0(S) )

When the recognition process fails, the first component of y(S) can contain an invalid character, which does not match the syntax pattern. Then, it is necessary to use the next valid character with a worse cost.
6.1.2 Syntactical patterns
In praxis, ANPR systems must deal with many different types of plate numbers. Number plates are not unified, so each country has its own type. Because of this, a number plate recognition system should be able to recognize the type of the number plate, and automatically assign the correct syntactical pattern to it. The assignment of the right syntactical pattern is a fundamental problem in syntactical analysis. A syntactical pattern is a set of rules defining the characters which can be used at a certain position in a plate number. If the plate number P is a sequence of n alphanumerical characters P = ( p(0), …, p(n−1) ), then the syntactical pattern `P is an n-tuple of sets `P = ( `p(0), …, `p(n−1) ), and `p(i) is a set of all allowed characters for the ith position in a plate. For example, Czech number plates can contain a digit in the first position, followed by a character denoting the region where the plate has been registered, and five other digits for the registration number of a car. Formally, the syntactical pattern `P for Czech number plates can look like this:

  `P = ( {0,1,2,3,4,5,6,7,8,9} , {C,B,K,H,L,T,N,E,P,A,S,U,J,Z} , {0,1,2,3,4,5,6,7,8,9} , {0,1,2,3,4,5,6,7,8,9} , {0,1,2,3,4,5,6,7,8,9} , {0,1,2,3,4,5,6,7,8,9} , {0,1,2,3,4,5,6,7,8,9} )
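A possible in-memory representation of such a pattern is simply a tuple of character sets (illustrative names, following the example above):

    DIGITS = set("0123456789")
    REGION = set("CBKHLTNEPASUJZ")

    # a syntactical pattern as an n-tuple of allowed-character sets
    CZ_PATTERN = (DIGITS, REGION, DIGITS, DIGITS, DIGITS, DIGITS, DIGITS)

    def matches(plate, pattern):
        # True if every character of the plate is allowed at its position by the pattern
        return len(plate) == len(pattern) and all(c in allowed for c, allowed in zip(plate, pattern))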
6.1.3 Choosing the right pattern
If there are n syntactical patterns `P(0), …, `P(n−1), we have to choose the most suitable one for the evaluated plate number P. For this purpose, we define a metric (or a cost) δ for the computation of the similarity between the evaluated plate number and the corresponding syntactical pattern:

  δ(`P) = | { p(i) | p(i) ∉ `p(i) } | + Σ_{i=0}^{n−1} ( 1 / max_{0 ≤ j ≤ z} { yj(i) } ) × 10^(−2)
where | { p(i) | p(i) ∉ `p(i) } | is the number of characters which do not match the corresponding positions in the syntactical pattern `P. Let y(i) be the output vector for the ith recognized character in the plate. The greatest component of that vector, max_{0 ≤ j ≤ z} { yj(i) }, then indicates how successfully the character has been recognized. The reciprocal value of max_{0 ≤ j ≤ z} { yj(i) } is then a cost of the character. Another way of cost evaluation is the usage of the Smith-Waterman algorithm to compute the difference between the recognized plate number and the syntactical pattern. For example, assume that the plate number ‘0B01234’ has been recognized as ‘0801234’, and the recognition pattern does not allow a digit at the second position of a plate. If the character “8” has been recognized with a similarity ratio of 0.90, and the other characters with a ratio of 0.95, the metric for this pattern is determined as follows:
  δ(`P) = 1 + 10⁻²/0.95 + 10⁻²/0.90 + 10⁻²/0.95 + 10⁻²/0.95 + 10⁻²/0.95 + 10⁻²/0.95 + 10⁻²/0.95 = 1.07426
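The cost of this worked example can be checked with a few lines, reusing the CZ_PATTERN tuple sketched after section 6.1.2 (all names remain illustrative):

    def pattern_cost(plate, similarities, pattern):
        # cost delta: number of mismatching positions plus 10^-2 times the reciprocal similarity of each character
        mismatches = sum(1 for c, allowed in zip(plate, pattern) if c not in allowed)
        return mismatches + sum(1e-2 / s for s in similarities)

    sims = [0.95, 0.90, 0.95, 0.95, 0.95, 0.95, 0.95]
    cost = pattern_cost("0801234", sims, CZ_PATTERN)   # approximately 1.074, as in the example above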
If there is a pattern that exactly matches the evaluated plate number, we can say that the number has been correctly recognized, and no further corrections are needed. In addition, it is not possible to detect a faulty number plate if it does not break the rules of a syntactical pattern. Otherwise, it is necessary to correct the detected plate using the pattern with the lowest cost δ:

  `P(sel) = arg min_{0 ≤ i < n} { δ( `P(i) ) }

The correction of a plate means the replacement of each invalid character by another one. If the character p(i) at the ith position of the plate P does not match the selected pattern `P(sel), it will be replaced by the first valid one from y(S). The y(S) is a sorted vector of output activities denoting how much the recognized character is similar to the individual characters from the alphabet. The heuristic analysis of a segmented plate can sometimes incorrectly evaluate non-character elements as characters. Acceptance of the non-character elements causes the recognized plate to contain redundant characters. Redundant characters occur usually on the sides of the plate, but rarely in the middle. If the recognized plate number is longer than the longest syntax pattern, we can select the nearest pattern and drop the redundant characters according to it.
Chapter 7
Tests and final considerations
7.1 Choosing the
 representative set of snapshots 
I have captured many static snapshots of vehicles for test purposes. Random moving and standing vehicles with Slovak and Czech number plates have been included. At first, my objective was to find a representative set of number plates which are recognizable by humans. Of course, a set like this contains an extremely wide spectrum of plates, ranging from clear and easily recognizable ones to plates degraded by significant motion blur or skew. The recognition ability of a machine is then represented by the ratio between the number of plates which have been recognized by the machine and the number of plates recognized by a human. Practically, it is impossible to build a machine with the same recognition abilities as a human has. Because of this, such a test is extremely difficult and useless. In practice, it is more useful to find a representative set of number plates which can be captured by an ANPR camera. The position of the camera significantly affects the quality of the captured images and the success of the whole recognition process. A suitable position of the camera towards the lane leads to a better set of possible snapshots. In some situations, we can avoid skewed snapshots entirely by a suitable positioning of the camera. Sometimes, this is cleverer than the development of robust deskewing mechanisms. Let S be a representative set of all snapshots which can be captured by a concrete instance of the ANPR camera. Some of the snapshots in this set can be blurred; some of them can be too small, too big, too skewed or too deformed. Because of this, I have divided the whole set into the following subsets:
S = S_c \cup S_b \cup S_s \cup S_e \cup S_l
where S_c is the subset of "clear" plates, S_b is the subset of blurred plates, S_s is the subset of skewed plates, S_e is the subset of plates with a difficult surrounding environment, and S_l is the subset of plates with small characters.
Figure 7.1: Typical snapshots from the sets of (a) clear plates, (b) plates with small or blurred characters, (c) skewed plates, (d) plates with a difficult surrounding environment.
7.2 Evaluation of a plate number correctness
Plate numbers recognized by a machine can sometimes differ from the correct ones. Because of this, there is a need to define formulas and rules which will be used to evaluate the degree of plate correctness. Let P be a plate number, and let S = \{P^{(0)}, \dots, P^{(n-1)}\} be the set of all tested plate numbers. Then the recognition rate R(S) of the ANPR system tested on the set S is calculated as

R(S) = \frac{1}{n} \sum_{i=0}^{n-1} s\!\left(P^{(i)}\right),
where n is the cardinality of the set S, and s(P) is the correctness score of the plate P. The correctness score is a value which expresses how successfully the plate has been recognized. Now the question is how to define the correctness score of individual plates. There are two different approaches to evaluating it: the first is a binary score, and the second is a weighted score.
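Expressed in code, the recognition rate is just an average of per-plate scores. The following minimal Java sketch (names chosen for this example, not taken from JavaANPR) leaves the concrete correctness score s as a parameter, since its two possible definitions follow in the next subsections.

import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Minimal sketch of R(S); the score function s (binary or weighted) is supplied by the caller.
final class RecognitionRate {
    static double rate(List<String> recognized, List<String> correct,
                       ToDoubleBiFunction<String, String> score) {
        double sum = 0.0;
        for (int i = 0; i < recognized.size(); i++) {
            sum += score.applyAsDouble(recognized.get(i), correct.get(i));   // s(P^(i))
        }
        return sum / recognized.size();                                      // divide by n
    }
}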
7.2.1 Binary score
Let us say that a plate number P is a sequence of n alphanumerical characters, P = (p^(0), …, p^(n-1)). If P^(r) is the plate number recognized by a machine and P^(c) is the correct one, then the binary score s_b of the plate P^(r) is evaluated as follows:
s_b\!\left(P^{(r)}\right) = \begin{cases} 0 & \text{if } P^{(r)} \neq P^{(c)} \\ 1 & \text{if } P^{(r)} = P^{(c)} \end{cases}

Two plate numbers are equal if all characters at corresponding positions are equal:

\left(P^{(r)} = P^{(c)}\right) \Leftrightarrow \left(\forall i\,\forall j : p_i^{(r)} \neq p_j^{(c)} \Rightarrow i \neq j\right),

where p_i^{(r)} is the i-th character of the plate number P^{(r)}.
7.2.2 Weighted score
If P^(r) is a plate number recognized by a machine and P^(c) is the correct one, then the weighted score s_w of the plate P^(r) is given as

s_w\!\left(P^{(r)}\right) = \frac{\left|\left\{ p_i^{(r)} : p_i^{(r)} = p_i^{(c)} \right\}\right|}{\left|\left\{ p_i^{(r)} \right\}\right|} = \frac{m}{n}

where m is the number of correctly recognized characters and n is the number of all characters in the plate. For example, if the plate "KE123AB" has been recognized as "KE128AB", the weighted correctness score for this plate is 6/7 ≈ 0.86, but the binary score is 0.
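Both scores can be sketched in Java as follows (again, an illustration only; the method names are not taken from the JavaANPR sources). Either method can be passed as the score parameter of the rate(...) sketch shown earlier, e.g. CorrectnessScores::weightedScore.

// Illustrative definitions of the binary and the weighted correctness score.
public final class CorrectnessScores {

    /** Binary score: 1 if the recognized plate equals the correct one, 0 otherwise. */
    static double binaryScore(String recognized, String correct) {
        return recognized.equals(correct) ? 1.0 : 0.0;
    }

    /** Weighted score: fraction of characters recognized correctly at their positions. */
    static double weightedScore(String recognized, String correct) {
        int n = Math.max(recognized.length(), correct.length());
        int m = 0;
        for (int i = 0; i < Math.min(recognized.length(), correct.length()); i++) {
            if (recognized.charAt(i) == correct.charAt(i)) {
                m++;
            }
        }
        return (double) m / n;
    }

    public static void main(String[] args) {
        System.out.println(weightedScore("KE128AB", "KE123AB"));   // 6/7 = 0.857...
        System.out.println(binaryScore("KE128AB", "KE123AB"));     // 0.0
    }
}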
7.3 Results
Table 7.1 shows the recognition rates achieved while testing on various sets of number plates. According to the results, the system performs well only on clear plates; skewed plates and plates with a difficult surrounding environment cause a significant degradation of the recognition abilities.
                  Total number of plates   Total number of characters   Weighted score [%]
Clear plates                 68                      470                      87.2
Blurred plates               52                      352                      46.87
Skewed plates                40                      279                      51.64
Average plates              177                     1254                      73.02

Table 7.1: Recognition rates of the ANPR system.
Summary
The objective of this thesis was to study and resolve the algorithmic and mathematical aspects of automatic number plate recognition systems, covering problems of machine vision, pattern recognition, OCR and neural networks. The subject has been divided into several chapters according to the logical sequence of the individual recognition steps. Even though there is a strong succession of the algorithms applied during the recognition process, the chapters can be studied independently. This work also contains demonstration ANPR software, which comparatively demonstrates all of the described algorithms. I had several programming environments to choose from. Mathematical principles and algorithms should not be studied and developed directly in a compiled programming language, so I have considered using Matlab™ and Java™. Finally, I implemented the ANPR in Java rather than in Matlab, because Java™ is a compromise between Matlab™ and a compiled programming language such as C++. Otherwise, I would have had to develop the algorithms in Matlab and then rewrite them into a compiled language as the final platform for their usage in a real environment. The ANPR solution has been tested on static snapshots of vehicles, which have been divided into several sets according to their difficulty. The sets of blurred and skewed snapshots give worse recognition rates than the set of snapshots which have been captured clearly. The objective of the tests was not to find a one hundred percent recognizable set of snapshots, but to test the invariance of the algorithms on random snapshots systematically classified into the sets according to their properties.
Appendix A: Case study
134.jpg (width: 488 px, height: 366 px)
The plate has been successfully recognized; no further comment needed.
Detected band: width: 488 px, height: 30 px
Detected plate
Skew detection: 1.44°
Segmentation: number of detected characters: 10
Recognized plate: RK959AF
149.jpg (width: 526 px, height: 350 px)
The plate has been successfully recognized; no further comment needed.
Detected band: width: 526 px, height: 28 px
Detected plate
Skew detection: 0.0°
Segmentation: number of detected characters: 10
Recognized plate: RK959AD
034.jpg (width: 547 px, height: 410 px)
The plate has been successfully recognized; no further comment needed.
Detected band: width: 547 px, height: 42 px
Detected plate
Skew detection: 2.656°
Segmentation: number of detected characters: 11
Recognized plate: LM010BE
049.jpg (width: 410 px, height: 360 px)
The plate has been successfully recognized; no further comment needed.
Detected band: width: 410 px, height: 27 px
Detected plate
Skew detection: 5.762°
Segmentation: number of detected characters: 10
Recognized plate: RK878AC
040.jpg (width: 576 px, height: 432 px)
The plate has been successfully recognized; no further comment needed.
Detected band: width: 576 px, height: 21 px
Detected plate
Skew detection: 0.0°
Segmentation: number of detected characters: 11
Recognized plate: BA738DE
098.jpg (width: 425 px, height: 330 px)
The plate has been successfully recognized; no further comment needed.
Detected band: width: 425 px, height: 28 px
Detected plate
Skew detection: 0.0°
Segmentation: number of detected characters: 9
Recognized plate: 1B19839
60.jpg (width: 424 px, height: 336 px)
Class: blurred characters
A significant blur caused improper detection of the band in the graph of the vertical projection. In addition, further heuristic analyses did not detect this fault. Because of this, the incorrect number plate candidate has been deskewed and then segmented, even though it has no semantics.
Point of failure: vertical projection and heuristic analysis of the band.
Detected band: width: 424 px, height: 141 px
Detected plate
Skew detection: 0.136°
Segmentation: number of detected characters: 6
Recognized plate: N/A
023.jpg (width: 354 px, height: 308 px)
Class: difficult environment
This is a typical snapshot with a difficult surrounding environment. The table in the background contains more characters in one line than the number plate in the foreground. This fact causes a bigger amount of horizontal edges in the area of the table. The three detected peaks in the graph of the horizontal projection correspond to the three rows of the table. Although the number plate candidate is wrong, the further analysis did not refuse it, because the number of characters (10) is within the allowed range.
Detected band: width: 354 px, height: 19 px
Detected plate
Skew detection: 1.169°
Segmentation: number of detected characters: 10
Recognized plate: 0CNCKEP
044.jpg (width: 530 px, height: 397 px)
Class: skewed plates
The part of the graph corresponding to peak "2" (in the vertical graph) has a wider distribution due to the improper vertical projection caused by the skew of the plate. Because of this, the bottom of the last character "E" has been cut off improperly.
Point of failure: vertical projection – band clipping
Detected band: width: 530 px, height: 30 px
Detected plate
Skew detection: 5.599°
Segmentation: number of detected characters: 7
Recognized plate: RK892AF
067.jpg (width: 402 px, height: 298 px)
Class: extremely small characters
This snapshot contains extremely small characters, which are not distinguishable by a machine. The number plate has been properly detected and the characters have been segmented as well, but it is very hard to distinguish between "B" and "8" at the first position and between "6" and "8" at the third position of the plate.
Point of failure: character recognition
Detected band: width: 402 px, height: 21 px
Detected plate
Skew detection: 4.00°
Segmentation: number of detected characters: 10
Recognized plate: 8Y849A4
Appendix B: Demo recognition software – user's manual
JavaANPR is recognition software that demonstrates the principles described in this thesis. It is written in the Java programming language. To run it, you will need the Java 1.5.0 SE runtime environment (or higher). After downloading the distribution package, please unpack it into a directory of your choice. The distribution package contains the compiled program classes, a jar archive, source codes and additional program resources such as bitmaps, neural networks, etc. The meaning of the individual directories is as follows:
build        Compiled classes
dist         Distribution directory, contains the JAR file and additional resources
lib          Compile-time libraries
nbproject    Project metadata and build configuration
resources    Resources: configuration file, bitmaps, neural networks
src          Source files
1. Cleaning, compiling and building the project (optional)
Normally, you do not have to compile the project, because the distribution package already contains precompiled binaries. If you want to recompile it, you can do so using the "Apache Ant" utility. First, change the working directory to "javaanpr" and type the following command to clean the previous build of JavaANPR. Issuing this command will delete the whole content of the build and dist directories:
javaanpr # ant clean
Then, issue the "ant compile" and "ant jar" commands. The "compile" target will compile all source files in the src directory. The "jar" target will create the "dist" directory with a jar archive and additional run-time resources.
javaanpr # ant compile
javaanpr # ant jar
2. Running the viewer
You can run the interactive ANPR viewer using Ant by typing the following command:
javaanpr # ant run
If you do not have the Ant utility installed, you can run the viewer manually with the following commands:
javaanpr # cd ./dist
dist # java -jar javaanpr.jar
Another way to run the viewer is to double-click the javaanpr.jar archive (e.g. in Windows Explorer).
Figure B.1: Graphical user interface of the JavaANPR viewer
Important: By default, the program expects the configuration file “config.xml” and o
ther resources in the working directory. Because of this, please do not run the
jar archive from other directories. Otherwise, the program will not be able to s
tart.
3. Using command line arguments
Besides the graphical user interface, the program also contains additional functions, which are accessible using command-line arguments. For more information, please run the jar file with the "-help" option:
Automatic number plate recognition system
Copyright (c) Ondrej Martinsky, 2006-2007
Licensed under the Educational Community License
Usage: java -jar javaanpr.jar [-options]
Where options include:
  -help                                  Displays this help
  -gui                                   Run GUI viewer (default choice)
  -recognize -i <snapshot>               Recognize single snapshot
  -recognize -i <snapshot> -o <dstdir>   Recognize single snapshot and save the html report into the specified directory
  -newconfig -o <file>                   Generate default configuration file
  -newnetwork -o <file>                  Train the neural network according to the specified feature extraction method and learning parameters (in the config. file) and save it into the output file
  -newalphabet -i <srcdir> -o <dstdir>   Normalize all images in <srcdir> and save them to <dstdir>
3.1 Command line recognition
If you do not want to use the GUI viewer, you can recognize a snapshot by issuing the following command. The recognized plate will be written to the standard output.
dist # java -jar javaanpr.jar -recognize -i <name of image>
3.2 Recognition report
Sometimes, it is useful to see inside the recognition process of a concrete image. Because of this, JavaANPR supports the generation of HTML reports. The recognition report contains images and verbose debugging information about each step of the recognition process. The HTML report can be used to determine the point at which the recognition process failed. The following command will recognize the image specified by its name and save the report into the specified destination directory:
dist # java -jar javaanpr.jar -recognize -i <name of image> -o <destination directory>
3.3 Creating the default configuration file
The configuration file contains settings and parameters which are needed during the recognition process. If the configuration file does not exist, the program will not be able to start. Because of this, JavaANPR is able to generate a default configuration file with recommended settings using the following command:
dist # java -jar javaanpr.jar -newconfig -o <file>
Bibliography
[1] Fajmon B.: Numeric Math and Probability, lecture notes, Faculty of Electrical Engineering and Communication, Brno, Czech Republic, 2005
[2] Fraser N.: Introduction to Neural Networks, http://www.virtualventures.ca/~neil/neural/neuron.html
[3] Fukunaga K.: Introduction to Statistical Pattern Recognition, Academic Press, San Diego, USA, 1990
[4] Gonzalez R., Woods R.: Digital Image Processing, Prentice Hall, Upper Saddle River, New Jersey, 2002
[5] Kovar M.: Discrete Math, lecture notes, Faculty of Electrical Engineering and Communication, Brno, Czech Republic, 2003
[6] Kuba M.: Neural Networks, lecture notes, Faculty of Informatics, Masaryk University, Brno, Czech Republic
[7] Kvasnicka V., Benuskova L., Pospichal J., Farkas I., Tino P., Kral A.: Introduction to Neural Networks, Technical University, Kosice, Slovak Republic
[8] Minsky M., Papert S.: Perceptrons. An Introduction to Computational Geometry, MIT Press, Cambridge, Massachusetts, 1969
[9] Shapiro V., Dimov D., Bonchev S., Velichkov V., Gluhchev G.: Adaptive License Plate Image Extraction, International Conference on Computer Systems and Technologies, Rousse, Bulgaria, 2004
[10] Smagt P.: Comparative Study of Neural Network Algorithms Applied to Optical Character Recognition, International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Charleston, South Carolina, USA, 1990
[11] Srivastava R.: Transformations and Distortions Tolerant Recognition of Numerals Using Neural Networks, ACM Annual Computer Science Conference, San Antonio, Texas, USA, 1991
[12] Wang J., Jean J.: Segmentation of Merged Characters by Neural Networks and Shortest Path, Symposium on Applied Computing, Indianapolis, Indiana, USA, 1993
[13] Zboril F.: Neural Networks, lecture notes, Department of Intelligent Systems, Faculty of Information Technology, BUT Brno, Czech Republic
[14] Zhang Y., Zhang C.: New Algorithm for Character Segmentation of License Plate, Intelligent Vehicles Symposium, IEEE, 2003
[15] ANPR tutorial.com, Quercus Technologies, 2006