
Pattern Recognition Letters 18 (1997) 173-185

Colour image segmentation by modular neural network 1


A. Verikas a,b,*, K. Malmqvist a, L. Bergman a

a Center for Imaging Sciences and Technologies, Halmstad University, Box 823, S-301 18 Halmstad, Sweden
b Kaunas University of Technology, Studentu 50, 3031 Kaunas, Lithuania

Received 20 June 1996; revised 22 October 1996

* Corresponding author. E-mail: Antanas.Verikas@cbd.hh.se.
1 Electronic Annexes available. See http://www.elsevier.nl/locate/patrec.

Abstract

In this paper segmentation of colour images is treated as a problem of classification of colour pixels. A hierarchical modular neural network for classification of colour pixels is presented. The network combines different learning techniques, performs analysis in a rough-to-fine fashion and makes it possible to obtain a high average classification speed and a low classification error. Experimentally, we have shown that the network is capable of distinguishing among the nine colour classes that occur in an image. A correct classification rate of about 98% has been obtained even for two very similar black colours. © 1997 Elsevier Science B.V.
Keywords: Colour classification; Image segmentation; Modular neural networks

1. Introduction

Colour image processing and analysis is increasingly used in industry, medical applications and other fields; quality inspection, process control, material analysis and medical image processing are a few examples. Therefore, research in colour perception and the development of efficient computational models for real-world problems is of crucial importance. One task that often arises in colour image processing is image segmentation. Colour image segmentation techniques can roughly be categorised into techniques for chromatically dividing an image space and those for clustering a feature space derived from an image. Region growing and region splitting and merging are the common approaches used by methods of the first group (Liu and Yang, 1994; Panjwani and Healey, 1995). Methods of the second group divide the colour space into clusters (Uchiyama and Arbib, 1994; Tominaga, 1992). The colour image segmentation method we discuss here belongs to the latter category. We treat the colour image segmentation problem as a problem of classification of colour pixels.

The most common goal in colour image segmentation is to partition a colour image into a set of uniform colour regions. However, the aim of this work is slightly different. The motivation for this work is a need to determine the colours of the inks used to produce a multi-coloured picture created by printing dots of cyan (c), magenta (m), yellow (y) and black (k) primary colours upon each other through screens having differing raster angles. The answer must be given for any possible combination of cyan, magenta, yellow and black ink and for any area of the picture. One factor that influences the colour impression of the picture is the size and shape of the areas covered by the different inks.


Table 1
The mean values and standard deviations of the variables R, G and B for five overlapping classes of colours (R, G and B ∈ [0, 255])

Colour class       m            my           cm           cmy          k
Variable        Mean  S.d.   Mean  S.d.   Mean  S.d.   Mean  S.d.   Mean  S.d.
R                119    5     112    6      32    5      30    5      25    4
G                 41    5      37    5      31    6      31    6      25    3
B                 53    7      33    6      48   10      33    7      24    4
This information can be used to control the amount of ink transferred to the paper in each of the four printing nips holding cyan, magenta, yellow and black. The measurement of the area covered by ink of the different colours can be done automatically using an image analysis system, if the image taken from the printed picture can be segmented into regions according to the following two rules:
1. Pixels should be assigned to the same cluster (colour class) if they correspond to areas of the picture that were printed with the same inks.
2. Pixels corresponding to areas printed with different inks should be assigned to different clusters.
The task is solved by determining a colour class for every pixel of the image. In order to solve the task with acceptable classification accuracy and a high average speed we propose the use of a hierarchical modular neural network. Note that classification speed is of primary interest in our application.
The rest of the paper is organised as follows. In the next two sections we briefly describe the input data and the colour space used. The architecture of the network is presented in Section 4. Procedures for training the network are given in Section 5. Section 6 summarises the results of experimental investigations. Section 7 concludes the work.

2. The data
When mixing dots of cyan, magenta and yellow colours, eight combinations are possible for every pixel in the picture. The combination cmy produces the black colour. However, in practice black ink is most often also printed. We assume the black ink to be opaque. Therefore, we have to distinguish between 9 colour classes, namely c, m, y, w (white paper), cy, cm, my, cmy (black resulting from the overlay of cyan, magenta and yellow) and k (black resulting from black ink).

Discrimination between some of the colour classes is a rather complicated matter, since they are highly overlapping in the colour space. For example, m-my and cm-cmy-k are two clusters of such highly overlapping colour classes. To illustrate this, we present in Table 1 the mean values and standard deviations of the variables R, G and B for these five classes of colours.
Note that the intensity values shown in the table are for data taken from solid print areas only. This means no pixels from the dots and the "fuzzy borders" of the dots are included. The pixels from the borders create "bridges" between white and the other colours and make the classification problem more difficult to solve. Besides that, the R, G and B parameters of class k acquire a very large variance, since the black ink can appear beneath (or above) all possible combinations of the other coloured inks.

The number of clusters with highly overlapping classes of colours depends on several factors, such as the amount of black ink printed on the picture, the printing technology, the properties of the inks used and some other factors. By increasing the amount of black ink, we make the other colours darker and more and more similar, until we get only one cluster with only one black colour. We assume, therefore, the range of variation of the amount of black ink to be 0-50%. It is in this range of variation of black ink that printers would like to measure the percentage of an area covered by inks of different colours. Though several clusters of rather overlapping colour classes can appear, and it is important to recognise all the colour classes with an acceptable classification accuracy, the most difficult task is to distinguish between the colour classes cmy and k.

3. Choice of colour space


Five colour spaces, namely RGB, HSV, CIELuv, CIELab and "IJK", have been tested and compared experimentally. The choice of colour space was based on this experimental testing, and the "IJK" colour space was chosen. The highest correct classification rate was obtained in this space for the most overlapping colour classes (cm, cmy and k).

Fig. 1. Architecture of the network. A colour pixel x is described by new variables computed from R, G and B and classified by a binary decision tree, which outputs either a class label or a set of ambiguous classes. A counterpropagation network devoted to that set then classifies the pixel using a set of CP weight vectors {w1, ..., wi, ..., wk}; if needed, additional features from the surrounding (y1, ..., ym) are used in weighted Euclidean distances d(x, ci) = {Σ_{k=1}^{n} [a_ik(x_k − w_ik)]² + Σ_{l=1}^{m} [a_il(y_l − u_il)]²}^{1/2}, with weights a_ij obtained by random optimisation, and a final fuzzy post-processing step may revise the class label.


The "IJK" colour space uses colour difference signals. If we assume the random variables R, G and B to be of equal variances (σ²) and covariances (only the variances of the variables have been normalised to one in our experiments), the covariance matrix of these variables can be written as (Tan and Kittler, 1993)
Σ = σ² ( 1  r  r
         r  1  r
         r  r  1 ),        (1)

where r is the correlation coefficient. The eigensolution of the covariance matrix gives the following eigenvectors (e_i) and the corresponding eigenvalues (λ_i):

e1 = {1, 1, 1}^T,   e2 = {1, 0, −1}^T,   e3 = {1, −2, 1}^T,        (2)

λ1 = σ²(1 + 2r),   λ2 = λ3 = σ²(1 − r).        (3)

The linear transform of the {R, G, B} vector by the eigenvectors produces new random variables

I = R + G + B,        (4)
J = R − B,        (5)
K = R − 2G + B,        (6)

which are almost uncorrelated, with zero covariances. I, J and K are the variables of the "IJK" colour space.
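As an illustration, the transform (4)-(6) is trivial to apply to an RGB image; the following minimal sketch assumes NumPy and leaves out the normalisation of the channel variances mentioned above.

import numpy as np

def rgb_to_ijk(rgb):
    """Map an (H, W, 3) RGB image to the "IJK" colour space, Eqs. (4)-(6)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = r + g + b          # achromatic signal, Eq. (4)
    j = r - b              # colour difference signal, Eq. (5)
    k = r - 2.0 * g + b    # colour difference signal, Eq. (6)
    return np.stack([i, j, k], axis=-1)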
According to Hunt (1991), three signals representing colour are transmitted via nerve fibres from the human eye to the brain. One of these signals is usually referred to as an achromatic signal and the other two as colour difference signals. In this sense the I, J and K variables mimic the signals transmitted to the human brain, since I can be referred to as an achromatic signal, while J and K are colour difference signals.
Distances measured in the "IJK" as well as the RGB colour space do not represent colour differences on a perceptually uniform scale. The CIELuv and CIELab colour spaces are more uniform in this sense. In spite of that, we have chosen the "IJK" colour space. A low classification error and a high processing speed, not a good correspondence between measured and perceived colour differences, are our primary interest when choosing the colour space. Such an approach follows from the goal, which is to determine the colours of the inks used to print any arbitrary area of a given picture, not to segment an image of the picture in a way as similar as possible to the way humans do. Even when calculated without pre-processing, the I, J and K variables are much less correlated than R, G and B. The amount of calculation required to obtain the I, J and K variables is less than for {L, u, v}, {L, a, b} or {H, S, V}. Besides, the variable H is undefined for a grey colour. Therefore, the choice of the "IJK" colour space seems reasonable in this particular case.

4. Architecture of the neural network

The architecture of the network is shown in Fig. 1. There are four steps in the proposed classification procedure. A binary decision tree performs the first step of the procedure. We carry out the second and the third classification steps by using weight vectors of the counterpropagation (CP) network. The last classification step (fuzzy post-processing) is based on analysis of the decisions made in the previous step. We use only the three variables I, J and K to describe a pixel in the first two classification steps. In the third step adjacency information is also exploited, since a pixel acquires additional co-ordinates, the values of which are calculated from the surrounding of the pixel being classified. Next, we briefly describe the network modules that perform the different classification steps.

4.1. Binary decision tree

The binary decision tree performs the first classification step. Two types of terminal nodes can be encountered in the tree: (1) nodes representing one colour class only, and (2) nodes representing a cluster (a set) of ambiguous colour classes. The classification performed by the tree is final for the pixels arriving at terminal nodes of the first type. Pixels reaching terminal nodes of the second type are transferred to the CP network for further analysis. The tree divides the colour space into several colour regions. The classification performed by the tree is very fast, since the tree consists of only a few nodes and only one neuron is used in every node of the tree.

4.2. Counterpropagation network

Fig. 2. A forward-only counterpropagation network: an input layer taking x1, x2, x3 and y1, y2, ..., ym, a competitive layer with weight vectors w, and a Grossberg layer with weight vectors v.

A CP network (Fig. 2) is devoted to each set of ambiguous colour classes. The size of the network's competitive layer depends on the number of ambiguous classes in the set. Each class from this set is represented by a part of the layer. These parts are trained separately and concatenated to make one network for each and every set of ambiguous classes. The second classification step is performed by using the weight vectors wi = (wi1, ..., win) of the competitive layer as reference patterns in a k-NN classification rule. The number of nodes in the Grossberg layer is equal to the number of features y1, ..., ym extracted from the surroundings of the pixel being classified. The learned values of the features are stored as weights of the Grossberg layer. Therefore, each weight vector of the competitive layer wi (a rough reference pattern) is associated with one weight vector of the Grossberg layer, ui = (vi1, vi2, ..., vim) (a reference pattern containing more details). The association of the weight vectors produces a concatenated weight vector ci = (wi1, ..., win, vi1, vi2, ..., vim). The third classification step is performed by using the concatenated weight vectors. The vectors are treated as reference patterns in a minimum distance classifier.

The CP network acts as a quantiser of a colour region. The weight vectors wi quantise the region in the 3-dimensional "IJK" colour space, while the concatenated weight vectors ci quantise the region in an extended (3 + m)-dimensional space.



4.2.1. Classification by using the CP network
During the second classification step the k nearest weight vectors wi of the competitive layer are selected. Note that the nearest weight vectors are found among all N weight vectors of the concatenated competitive layer. The classification result is final if, among these weight vectors, vectors of one class dominate, i.e. the ratio of the numbers of weight vectors representing the two most frequently appearing colour classes exceeds some pre-specified threshold. Otherwise, k weight vectors ci are emitted as an output of the CP network. The vector c1 is a concatenation of the first winner w1 of the competitive layer and the associated vector u1 of the Grossberg layer. The weight vectors ci are used in the third classification step, which is performed by calculating weighted Euclidean distances. The weights aij that appear in the weighted Euclidean distance measure are specific for each reference pattern. The weights are found by performing a random optimisation in the weight space (Verikas et al., 1996b). By taking these steps we perform analysis in a rough-to-fine fashion: a rough classification with the binary decision tree, a more precise one with the competitive layer weights, and an accurate classification with the concatenated weights.
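To make the rough-to-fine procedure concrete, here is a minimal sketch of the second and third steps under stated assumptions: the pixel has already been routed by the tree to one set of ambiguous classes, and the array names (W, C, labels, alpha), k and the dominance threshold are illustrative, not taken from the paper.

import numpy as np

def classify_ambiguous(x, W, C, labels, alpha, k=5, ratio_threshold=3.0):
    """Steps 2 and 3 for a pixel routed to a set of ambiguous classes.

    x      : (3 + m,) vector: I, J, K plus surround features y_1..y_m
    W      : (N, 3) competitive-layer weight vectors (IJK space)
    C      : (N, 3 + m) concatenated weight vectors c_i = (w_i, u_i)
    labels : (N,) colour class of each weight vector
    alpha  : (N, 3 + m) per-pattern weights of the weighted distance
    """
    # Step 2: k-NN rule over all competitive-layer weight vectors.
    d = np.linalg.norm(W - x[:3], axis=1)
    nearest = np.argsort(d)[:k]
    classes, counts = np.unique(labels[nearest], return_counts=True)
    order = np.argsort(counts)[::-1]
    if len(classes) == 1 or counts[order[0]] / counts[order[1]] > ratio_threshold:
        return classes[order[0]]          # one class dominates: decision final
    # Step 3: weighted Euclidean distance over concatenated vectors.
    dw = np.sqrt(np.sum((alpha[nearest] * (C[nearest] - x)) ** 2, axis=1))
    return labels[nearest[np.argmin(dw)]]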
4.3. Fuzzy post-processing

As has been mentioned, one counterpropagation network is constructed for each and every set (cluster) of ambiguous classes. During the learning process the weight vectors of the network are distributed in the colour space according to the class-conditional probability density function of the input data used for learning. The weight vectors of the trained network are treated as reference patterns and they represent regions of the colour space.

Let MAC be the number of ambiguous classes in the set. Each class q (q = 1, 2, ..., MAC) is represented by Nq weight vectors. The set of these vectors is

{Cj} = {c_i^j, i = 1, ..., Nj},   j = 1, ..., MAC,        (7)

{C} = ∪_{j=1}^{MAC} {Cj},        (8)

N = Σ_{j=1}^{MAC} Nj.        (9)

Since highly overlapping colour classes are considered, most of the weight vectors will be located in the overlapping regions of the class-conditional distributions. However, some of the vectors will also be placed in the non-overlapping "tails" of the distributions. Therefore, the decisions made by using the different weight vectors are not of the same reliability. We say that the decision is made (when classifying pixel x) by using the weight vector c_i^j (i = 1, 2, ..., Nj; j = 1, ..., MAC), if the minimum distance d(x, c_i^j) has been obtained by using the weight vector c_i^j. Some of the decisions made by using weight vectors from the overlapping regions can be rather doubtful. Therefore, a correction of the decisions (the post-processing) takes place after the pixels have been classified. The concept of the correction is as follows.

The decision classes (the colour classes) and the weight vectors c representing the regions of the colour space are considered as fuzzy sets. Membership values for the fuzzy sets and the fuzziness of the decisions made by the weight vectors are defined. Classification of an image by the counterpropagation network results in the classified image as well as in a number MAC (the number of ambiguous classes in the set) of supplementary images. Every pixel x in the supplementary image j is represented by the value of the membership function Aj(x) of the jth ambiguous class. Post-processing is based on information about the membership values and the fuzziness of the decisions. More details about the post-processing can be found in (Verikas and Malmqvist, 1995).
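The details of the correction are in the cited paper; purely to illustrate the data flow, a hypothetical sketch follows in which doubtful pixels are relabelled from the supplementary membership images by neighbourhood evidence. The fuzziness measure and the relabelling rule here are assumptions, not the published definitions.

import numpy as np

def fuzzy_post_process(labels_img, memberships, fuzziness_threshold=0.2):
    """Hypothetical post-processing sketch.

    labels_img  : (H, W) class index assigned by the CP network
    memberships : (H, W, M_AC) one supplementary image per ambiguous
                  class, holding membership values A_j(x)
    A decision is treated as doubtful when its two largest membership
    values are close (an assumed fuzziness measure); such pixels are
    relabelled by the class with the largest membership summed over
    a 3 x 3 neighbourhood.
    """
    top2 = np.sort(memberships, axis=-1)[..., -2:]
    doubtful = (top2[..., 1] - top2[..., 0]) < fuzziness_threshold
    out = labels_img.copy()
    for y, x in zip(*np.nonzero(doubtful)):
        window = memberships[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
        out[y, x] = np.argmax(window.sum(axis=(0, 1)))
    return out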

4.4. Benefits of the architecture


A high average classification speed and a low classification error are the main attributes of the architecture chosen. The binary decision tree performs fast classification by using only the three colour space co-ordinates, assigning each colour pixel to one of several colour regions. In the second classification step the location of the pixel in the region is analysed by using 3-dimensional weight vectors representing sub-regions of the region being considered. As a result of the analysis, several sub-regions for the possible location of the pixel are selected. In addition to the three main colour space co-ordinates, adjacency information is also exploited in the third classification step for making a decision about the pixel's colour class. The colour class assigned to the pixel may be further changed in the last classification step, depending on which weight vectors have been used to make decisions about the colour classes of the adjacent pixels. Therefore, in each step the dimensionality of the decision space is reduced, while the amount of information used to make a decision is increased. Depending on the colour of a pixel, the classification process can be completed at any of the steps.

By performing nested analysis and by adaptively using the amount of information needed for the classification process we achieve the required accuracy and gain analysis speed. In contrast, image segmentation methods based on region growing and region splitting and merging require intensive calculations for performing multiple splits and merges. Besides, such methods are not directly applicable in our case, since our goal is to determine the colours of the inks used to print any arbitrary area of a given picture, not to segment an image of the picture in a way as similar as possible to the way humans do. For example, some "cmy" regions are perceived as being more similar to the class "k" than to their own class; however, pixels from these regions should acquire the label "cmy".

5. Training the network


5.1. Binary decision tree
The binary decision tree is constructed by sequentially dividing the learning set into two parts. In every node of the tree the learning set is divided into two subsets according to the decision boundary developed during learning. Only one neuron (of any order desired, in the general case) is used to solve the task of dividing the learning set into two parts in every node of the tree. The neuron can classify a data point x into one of two subsets according to the sign of the neuron's output value. The output is given by

y = f(u) = f(w0 + Σ_i wi xi + ... + Σ_{i1≠...≠iL} w_{i1...iL} x_{i1} ... x_{iL}),        (10)

with xi being the ith component of the input data, wi the corresponding weight, and L the neuron's order. The function f(·) ranges from −1 to +1 with f(0) = 0, for example f(u) = tanh(u).
The learning set X is partitioned into two subsets X+ and X− according to the following rule:

x ∈ X+ if g(x) ≥ 0,   x ∈ X− if g(x) < 0,   ∀x ∈ X,        (11)

where g(x) is given by

g(x) = w0 + Σ_i wi xi + ... + Σ_{i1≠...≠iL} w_{i1...iL} x_{i1} ... x_{iL}.        (12)

The learning set X contains labelled as well as unlabelled pixels. The unlabelled pixels are those coming from the borders of the dots. Labels for such pixels are hard or even impossible to obtain. Therefore an unsupervised learning algorithm that we have recently proposed (Verikas et al., 1995) is used for the binary decision tree construction. For every node of the tree the algorithm tries to locate the decision boundary (12) in a place with few learning samples. A node of the tree is labelled as a terminal node of the first type when all labelled samples falling into the node belong to the same class, or if only one class has a number of labelled samples above the threshold T1 and the ratio of samples of the two major classes represented by the node is above the threshold T2. A node of the tree is labelled as a terminal node of the second type if the number of labelled samples falling into the node is above the threshold T1 for more than one class and the samples falling into the node form a "compact cluster". An algorithm that can find the "compact clusters" is given in (Verikas et al., 1996a).
5.2. Counterpropagation network
5.2.1. Process of designing the network
The CP networks, with input from the second-type nodes of the tree, are trained separately. In order to avoid overtraining and to achieve better generalisation properties of the network, separate data sets have been used in different steps of the design process.

Data sets used to design a pattern recognition system are always limited and very often not representative enough. This often happens because of a lack of experimental data (not the case in this study), limited computer memory or computation time. It also often happens that sets are collected in favour of one or another class. Therefore, the use of different data sets in different steps of the design process reduces the possibility that the system is designed in favour of some classes and improves the generalisation properties of the system.

Six sets of data, namely a learning set, two validation sets, two optimisation sets and a testing set, have been used for constructing each network. Each set represents the MAC classes. Clearly, if only a small amount of experimental data is available, the optimisation sets can be replaced by the learning set, the two validation sets by only one, or "leave few out" techniques can be applied.

First, the weight vectors of the competitive and Grossberg layers of the CP network are obtained for each of the MAC classes (using the learning set) by means of competitive learning with "conscience" and the Grossberg learning law, respectively. See Section 5.2.2 for the detailed learning procedure.

Next, the set of concatenated weights is optimised by using the modified lvq algorithm (Section 5.3) and optimisation set 1. We use a "pocket optimisation" strategy: the best set of weights c found during the optimisation process is kept in the "pocket", and the optimisation terminates with the best set of weights. The quality of the weights is tested on validation set 1.

In the next step of the design process we find the weights aij that appear in the weighted Euclidean distance (Section 5.4). The Alopex algorithm (Unnikrishnan and Venugopal, 1994), performing a random search in the weight space, and optimisation set 2 are used for obtaining the weights. During the optimisation process some of the weights decrease to zero and eliminate the corresponding features (Verikas et al., 1996b). The features eliminated are different for different reference patterns; therefore we say that the features are used selectively for classification. The features used are different for different regions of the colour space. The whole CP network is tested on validation set 2 after the optimisation.

This design process is repeated for different numbers of CP network nodes. The network yielding a reasonable trade-off between classification error and complexity is chosen as the final one.

5.2.2. Training the network


First, the weight vectors w of the competitive layer of the CP network are obtained for each of the MAC classes by means of competitive learning with "conscience" (Verikas and Malmqvist, 1995). The "conscience" mechanism is similar to that proposed by Desieno (1988).

In each iteration of the learning process we find a "winning" weight vector using the following equation:

k = arg min_i (d(x, w_i^q) − bq),   i = 1, 2, ..., Nq,        (13)

where d(x, w_i^q) is the distance between pixel x and the ith weight vector of the qth class, and bq is a winning-frequency-sensitive term that penalises too frequent "winners" and rewards those that win seldom (Verikas and Malmqvist, 1995). Then the winning weight vector w_k^q(t) is updated according to the rule

w_k^q(t + 1) = w_k^q(t) + αt [x(t) − w_k^q(t)],        (14)

where {αt} is a slowly decreasing sequence of learning coefficients.

When training of the competitive layer terminates, the weights w are frozen and the learning proceeds for the Grossberg layer (separately for each of the MAC classes). The learning of the Grossberg layer is governed by the Grossberg learning law:

v_ij(t + 1) = v_ij(t) + β [yj − v_ij(t)] zi,        (15)

where t is the iteration index, β the learning rate (0 < β < 1), yj the jth feature from the surrounding, vj = (v_1j, v_2j, ..., v_Nq j) the vector of weights connecting the jth node of the Grossberg layer to the Nq competitive layer nodes representing the qth class, and zi the output signal of the ith node of the competitive layer, which is given by

zi = 1, if d(x, w_i^q) = min_{j=1,2,...,Nq} d(x, w_j^q),
zi = 0, otherwise,        (16)

where d(x, w_i^q) is the distance between the pixel being classified and the ith competitive layer weight vector representing the qth class. Here we assume that the training of the CP network proceeds for the qth class, q ∈ IAC, where IAC is the set of class indices from one cluster of ambiguous classes.


After learning, the network will output a vector ui = (v_i1, v_i2, ..., v_im) whenever node i wins the competition in the competitive layer. The vector ui is an approximate average of the features y1, ..., ym associated with those pixels x that cause node i of the competitive layer to win.
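A compact sketch of the training of one class part of the CP network, Eqs. (13)-(16), follows. The exact form of the conscience term bq follows Desieno (1988) in the paper; the win-frequency bookkeeping below is an assumed, simplified variant, and the schedule constants are those reported later in Section 6.2.

import numpy as np

def train_cp_part(X, Y, n_nodes, t2, beta=0.002, gamma=10.0, seed=0):
    """Train the competitive and Grossberg weights for one class.

    X : (N, 3) pixels in IJK space;  Y : (N, m) surround features.
    Returns W (n_nodes, 3) and V (n_nodes, m).
    """
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_nodes, replace=False)].astype(float)
    V = np.zeros((n_nodes, Y.shape[1]))
    freq = np.full(n_nodes, 1.0 / n_nodes)   # observed win frequencies
    t1 = max(1, int(0.1 * t2))
    for t in range(t2):
        a_t = 0.4 * (1 - t / t1) if t <= t1 else 0.02 * (1 - t / t2)
        x = X[rng.integers(len(X))]
        d = np.linalg.norm(W - x, axis=1)
        b = gamma * (1.0 / n_nodes - freq)   # conscience: reward rare winners
        k = int(np.argmin(d - b))            # Eq. (13)
        freq += 0.001 * ((np.arange(n_nodes) == k) - freq)
        W[k] += a_t * (x - W[k])             # Eq. (14)
    for x, y in zip(X, Y):                   # Grossberg layer, Eq. (15)
        i = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # z_i = 1, Eq. (16)
        V[i] += beta * (y - V[i])
    return W, V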

5.3. Modified lvq

Assume that di and dj are the Euclidean distances from pixel x to the weight vectors ci and cj, respectively. Note that the vector ci is obtained by concatenating wi and ui. Then x is defined to fall into a window of relative width Λ if

min(di/dj, dj/di) > (1 − Λ)/(1 + Λ).        (17)

For all x falling into the window we adapt

ci(t + 1) = ci(t) − α(t)[x(t) − ci(t)],
cj(t + 1) = cj(t) + α(t)[x(t) − cj(t)],        (18)

where α(t) decreases with time and 0 < α(t) < 1, and ci and cj are the two closest weight vectors to x, whereby x belongs to the same class as cj, but not as ci.

If x, ci and cj all belong to the same class,

ck(t + 1) = ck(t) + ε(t)α(t)[x(t) − ck(t)]        (19)

for ck representing the closest weight vector.

If x, ci and cj belong to different classes,

ck(t + 1) = ck(t) − ε(t)α(t)[x(t) − ck(t)]        (20)

for k ∈ {i, j}. The modified lvq is similar to that described by Song and Lee (1996). However, we allow only modifications of weights inside the window Λ.

5.4. Determining weights for the Euclidean distance

The weights aij that appear in the weighted Euclidean distance are specific for each reference pattern. The weights are found by maximising the following function of classification performance:

F = (1/NL) Σ_{i=1}^{Q} (N^t_ci − k N^t_wi),        (21)

where N^t_ci denotes the number of samples from class i classified correctly at the tth iteration of the optimisation process, NL is the number of samples in the learning set, k is a constant, Q is the number of classes, and N^t_wi is given by

N^t_wi = N^0_ci − N^t_ci, if N^0_ci > N^t_ci,
N^t_wi = 0, otherwise,        (22)

where N^0_ci is the number of samples from class i classified correctly at the zeroth iteration of the optimisation process. The second term in the performance measure penalises an increase in wrong classifications. The Alopex algorithm (Unnikrishnan and Venugopal, 1994), performing a random search in the weight space, is used for the optimisation.

5.5. Additional features

The features extracted from the surrounding, y1, ..., ym, are defined to be

E[Ii], min[Ii], max[Ii], E[Ji], E[Ki], min[Ki], max[Ki],

where E[·] is the average operator. The operators E, min and max are calculated in a window around the pixel being classified.
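Eqs. (17)-(20) fully specify one adaptation step of the modified lvq; the following sketch implements them directly. The parameter names and the in-place update of C are illustrative.

import numpy as np

def lvq_step(x, x_label, C, labels, a_t, lam=0.01, eps=0.02):
    """One adaptation step of the modified lvq, Eqs. (17)-(20) (sketch).

    C : (N, n + m) concatenated weight vectors; labels : (N,) classes.
    Only the two weight vectors closest to x are adapted, and only
    when x falls into the window of relative width lam, Eq. (17).
    """
    d = np.linalg.norm(C - x, axis=1)
    i, j = np.argsort(d)[:2]                       # two closest vectors
    if min(d[i] / d[j], d[j] / d[i]) <= (1 - lam) / (1 + lam):
        return                                     # outside the window
    if labels[i] == x_label and labels[j] == x_label:
        C[i] += eps * a_t * (x - C[i])             # Eq. (19): closest vector
    elif labels[i] != x_label and labels[j] != x_label:
        for k in (i, j):                           # Eq. (20): k in {i, j}
            C[k] -= eps * a_t * (x - C[k])
    else:                                          # Eq. (18)
        wrong, right = (i, j) if labels[j] == x_label else (j, i)
        C[wrong] -= a_t * (x - C[wrong])
        C[right] += a_t * (x - C[right])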

6. Experimental testing
6.1. Learning and testing sets

The system was tested by segmentation of colour images containing the nine colour classes mentioned above. The learning set for designing the binary decision tree consisted of 30000 pixels; 18000 of them were labelled and the others unlabelled. The labelled pixels have been collected from both full tone and half tone prints. Fig. 3 shows an example of an image taken from a full tone print of the c class. An example of the "half tone image" used to collect the "cyan" pixels is given in Fig. 4. Since class k (dots printed with black ink) can appear on all eight possible backgrounds, pixels of class k have been collected from all the backgrounds. Figs. 5 and 6 show dots of class k on the yellow and magenta backgrounds, respectively. Pixels from several windows containing all the colour classes (with and without k) have been included in the learning set as unlabelled data. Fig. 7 presents an example of such a window.

Collection of data for training the CP networks starts after the labelling of the terminal nodes of the decision tree. Pixels falling into nodes of the second type are used to train the respective CP network. Two nodes of the second type have been found: one representing the colour classes m and my, the other the colour classes cm, cmy and k. For example, pixels from the image shown in Fig. 8 would fall into two nodes: node "y" of the first type and node "m, my" of the second type. Since we know that the image shown in Fig. 8 contains only two colour classes, namely y and my (100% of the area is covered by yellow ink), all the pixels falling into the node "m, my" can be used to train the my part of the respective CP network. In the same manner the data from half tone prints are collected to train the "cm, cmy, k" CP network. For example, the image shown in Fig. 9 contains no "cm" and no "k" pixels. Therefore, all the pixels falling into the node "cm, cmy, k" can be used to train the cmy part of the respective CP network.

The learning, optimisation and validation sets for designing the CP networks contained 10000 pixels from each class.

The networks were tested on about 100000 pixels from each class. The exact number of testing samples processed from the different classes is given in Table 2. "Window images" of 256 × 64 pixels, extracted from larger primary images, have been used for evaluating the performance of the developed system. All the "window images" were extracted from different primary images. More than 200 "window images" have been processed.

The "ground truth" of the classification was established by visual inspection, knowing the desired result. For example, all pixels coming from the red dots of Fig. 8 should be assigned the label "my"; the other pixels of the image should acquire the label "y". Any other classification result would be treated as an error.
6.2. Parameter setting

There are six coefficients controlling different steps of the training process of the CP networks. These parameters are:
{αt}: the learning rate controlling training of the competitive layer;
β: the learning rate controlling training of the Grossberg layer;
k: a constant that controls the degree to which an increase in wrong classifications is penalised, compared with the initial optimisation state (the constant appears in the criterion function for obtaining the weights used in the weighted Euclidean distance measure);
α, Λ and ε: coefficients controlling the behaviour of the lvq algorithm.

At the beginning of training, the coefficient αt was set to a relatively large value, 0.4. As the weight vectors wi move into the area of the input data, the coefficient is lowered for final convergence. Therefore, the following learning coefficients αt have been used in the training process. If the training process contains a total of t2 steps, then

αt = k1(1 − t/t1)   for 0 ≤ t ≤ t1,
αt = k2(1 − t/t2)   for t1 ≤ t ≤ t2.

Values of k1 = 0.4, k2 = 0.02 and t1 = 0.1 t2 were chosen. The value of t2 depended on the number of nodes and was set to 500 × (number of nodes). Note that the parts of the network representing different classes are trained separately; therefore the number of nodes is relatively small. To keep the training process of the Grossberg layer well-behaved, the parameter β should be kept suitably small (0 < β ≪ 1). After preliminary experiments the parameter β was set to 0.002. To ensure that there is no increase in incorrect classifications after the optimisation starts, the constant k should be set to a relatively large value. The value k = 10 has been found to be appropriate for our task.

The optimal size of the lvq window depends on the number of training samples. If a large number of samples is available, a narrow window guarantees the most accurate location of the decision boundary. For good statistical accuracy, however, the number of samples falling into the window must be sufficient (Kohonen, 1990). The optimal value of ε depends on the size of the window, being smaller for narrower windows (Kohonen, 1990). After some preliminary experiments the following values of the lvq parameters have been used: Λ = 0.01, α = 0.02 and ε = 0.02.
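For reference, the piecewise-linear schedule above in a couple of lines:

def learning_rate(t, t2, k1=0.4, k2=0.02):
    """Learning coefficient alpha_t for the competitive layer (Section 6.2)."""
    t1 = 0.1 * t2
    return k1 * (1 - t / t1) if t <= t1 else k2 * (1 - t / t2)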

Fig. 3. An example of an image taken from a full tone print of the c class.

Fig. 4. An example of an image taken from a half tone print of the c class.

Fig. 5. Dots of class k on the yellow background.

Fig. 6. Dots of class k on the magenta background.

Fig. 7. An example of an image containing eight colour classes.

6.3. Results obtained


Two CP networks have been constructed. One for
the cluster cm--cmy-k and the other for the cluster
m - m y . The other four colour classes have been classifted by the binary decision tree. The CP network
constructed for m - m y contained 16 nodes and that for
cm--cmy-k 32 nodes.
Let p denote the correct classification and f the
observed frequency. Then the 1 -oe confidence interval
P(pl < P < P2) = 1 - oe is given by

'12/2 q- '1oe/2

Pl,2 = 2 f + ~

~ / 4 f ( 1 - f ) -'}- -"t2/2
Nr
N2 , (23)

2 1+

NrJ

where '12/2 is the fractil of the normal distribution at


the risk a / 2 and Nr is the size of the testing set.
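Eq. (23) is straightforward to evaluate; a small helper, with u = 1.96 assumed for a 95% interval:

import math

def confidence_interval(f, n_r, u=1.96):
    """Confidence limits p1, p2 of Eq. (23).

    f : observed correct classification frequency; n_r : test set size;
    u : normal fractile at risk alpha/2 (1.96 gives a 95% interval).
    """
    root = u * math.sqrt(4 * f * (1 - f) / n_r + u * u / n_r ** 2)
    denom = 2 * (1 + u * u / n_r)
    p1 = (2 * f + u * u / n_r - root) / denom
    p2 = (2 * f + u * u / n_r + root) / denom
    return p1, p2

# confidence_interval(0.992, 9e4) -> (0.9914, 0.9926), reproducing the
# first column of Table 2.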
Table 2 presents the network's performance for the different colour classes, as well as the confidence intervals and the number of testing samples used.

The values presented in the column w, y, cy of the table are averaged values for these colour classes; the correct classification rates obtained for these classes were very similar. Pixels coming from the c colour class were also classified by the binary decision tree. The lower classification accuracy for this class results from its similarity to the cm colour class. The similarity arises due to the widely varying properties of the m layer of the cm coverage as well as the varying properties of the paper used for printing. For example, dark micro-spots in the paper make c pixels look like those of the cm colour class.

Fig. 8. An example of an image containing the "y" and "my" colour classes.

Fig. 9. An example of an image containing several colour classes.

Fig. 10. An example of an image containing eight classes of colours (no k). A part of the image was classified by the developed neural network.

Fig. 11. An image of dots printed with black ink on a magenta-yellow background.

Fig. 12. An image of dots printed with cyan, magenta and yellow inks on a magenta-yellow background.

Table 2
Performance of the network and confidence intervals for different colour classes

Colours   w, y, cy   c         m         my        cm        cmy       k         cm, cmy, k
f         0.992      0.981     0.982     0.978     0.941     0.902     0.908     0.980
P1        0.9914     0.9801    0.9811    0.9770    0.9395    0.9003    0.9064    0.9791
P2        0.9926     0.9819    0.9828    0.9789    0.9424    0.9037    0.9096    0.9808
Nr        9 × 10^4   12 × 10^4 9 × 10^4  9 × 10^4  10 × 10^4 12 × 10^4 12 × 10^4 10 × 10^4


The last column of the table presents the averaged performance of the network for full tone (solid) prints of the cm, cmy and k colour classes. The other columns show the network's performance for half tone prints.

About 85% of the pixels coming from the m and my colour classes and about 80% of the pixels from the cm colour class have been classified in the second classification step. Nearly 65% of the pixels from the cmy and k colour classes reach the third classification step.

Figs. 10, 11 and 12 show some examples of the classification results. An example of an image containing eight colour classes (no k) is presented in Fig. 10. A part of the image was classified by the developed neural network. The eight colour classes can easily be found in the classified part of the image. The colour classes w, c, m and y are displayed with the colours of their names. The my colour class is shown in red, cm in blue, cy in green, and cmy in black. Note that no post-processing has been applied in the examples presented. Figs. 11 and 12 illustrate the classification results for the most overlapping pair of colour classes, namely cmy and k. The dots presented in Fig. 11 have been printed with black ink, while those presented in Fig. 12 have been printed with a coverage of cyan, magenta and yellow inks. The dots have been printed on the same magenta-yellow background in both pictures. The classification results are shown only for the central part of the pictures. After the classification we display class k with a black colour and class cmy with a brown colour; the my class is displayed in red as in the previous picture. Therefore, a brown colour inside the classified part of the image of Fig. 11 and a black colour inside the classified part of the image of Fig. 12 mean classification errors. Some small green spots can also be found inside the brown dots; these areas of paper were occasionally printed with only cyan and yellow inks and no magenta. Since the cy pixels are classified at the first step, without using adjacency information, the green spots appear.

As has already been mentioned, the black ink can appear on all eight possible backgrounds. The cyan-magenta-yellow background has proven to be the most difficult one. Since the cyan-magenta-yellow coverage also produces a black colour, the classification task in this case becomes a task of "finding" black dots on a black background. For the human eye such areas of a picture look completely black. Effects of light diffusion in the paper make the classification task very difficult. A correct classification rate of about 70-75% has been obtained for the black dots on the cyan-magenta-yellow background. We expect that the classification results for such dark areas of pictures can be improved by exploiting knowledge about the light-paper-ink interaction and by using more elaborate extraction of additional features. Work on how an artificial neural network can be used to find a set of additional features is in progress. On the other hand, for the application it is important to "find" black ink on the lighter areas of pictures.
In order to evaluate the results obtained from the system, we attempted to distinguish between the two "black" images using another method for colour image segmentation. Good segmentation results for textured colour images, obtained using Gaussian Markov random field models, have recently been reported (Panjwani and Healey, 1995). In this model it is assumed that the rgb colour vector at each location is a linear combination of the neighbours in all three planes plus Gaussian noise. The coefficients of the combination are estimated as parameters of the model. Three colour planes and four directions are used; therefore there are 12 parameters of the model for each colour plane. For two textures, a difference in the estimated values of the parameters indicates a difference between the textures themselves. This approach has been chosen for the comparison. The two "black" images have been treated as two textures with different spatial interaction of coloured pixels. Table 3 provides an example of the estimated values of the parameters for the R colour plane. As can be seen from the table, there is no significant difference between the values of the model parameters estimated from the image of the picture printed in black ink and those from the picture printed in cyan, magenta and yellow inks in this order on top of each other. The same range of difference between the estimated parameter values has been obtained for the colour planes G and B.
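To indicate what such an estimate involves, a least-squares sketch for the R plane follows; Panjwani and Healey estimate the parameters differently, and the choice of the four neighbour directions here (N, S, E, W) is an assumption.

import numpy as np

def estimate_r_plane_params(img):
    """Least-squares fit of the 12 interaction parameters for the R plane.

    img : (H, W, 3) image. Each interior R value is modelled as a
    linear combination of its neighbours in 4 directions in each of
    the three colour planes (3 x 4 = 12 parameters) plus noise.
    """
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # assumed directions
    target = img[1:-1, 1:-1, 0].astype(float).ravel()
    columns = [np.roll(img[..., p].astype(float), s, axis=(0, 1))[1:-1, 1:-1].ravel()
               for p in range(3) for s in shifts]
    A = np.stack(columns, axis=1)                 # (n_pixels, 12)
    theta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return theta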

7. Conclusions

Small neural networks of different origins and different learning techniques have been combined to make a hierarchical network for classification of colour pixels. The hierarchical neural network performs analysis in a rough-to-fine fashion and enables a high average classification speed and a low classification error.


Table 3
Means and standard deviations of the estimated values of model parameters for the R colour plane

             "Black image"            "cmy image"
Parameter    Mean       St. dev.      Mean       St. dev.
1            -0.1330    0.062         -0.1431    0.050
2             0.2248    0.078          0.2458    0.071
3            -0.1364    0.053         -0.1500    0.064
4             0.5558    0.056          0.5577    0.072
5            -0.0545    0.025         -0.0470    0.023
6             0.0751    0.021          0.0653    0.021
7            -0.0352    0.012         -0.0351    0.015
8             0.0040    0.006          0.0151    0.015
9            -0.0200    0.015         -0.0171    0.013
10            0.0411    0.020          0.0351    0.025
11           -0.0251    0.010         -0.0255    0.016
12            0.0072    0.003          0.0062    0.005

Experimentally, we have shown that the network is capable of distinguishing among the nine colour classes that occur in a half tone colour image. A correct classification rate of about 98% has been obtained even for two very similar black colours, namely the black printed in black ink and the black printed in cyan, magenta and yellow inks in this order on top of each other.

Acknowledgements

We gratefully acknowledge the support we have received from The Swedish National Board for Industrial and Technical Development and The Royal Swedish Academy of Sciences. We also wish to thank two anonymous reviewers for their valuable comments on the manuscript.


References

Desieno, D. (1988). Adding a conscience to competitive learning. Proc. ICNN I. IEEE Press, New York, 117-124.
Hunt, R.W.G. (1991). Measuring Colour. Ellis Horwood, Chichester, UK.
Kohonen, T. (1990). The self-organizing map. Proc. IEEE 78 (9), 1461-1480.
Liu, J. and Y.-H. Yang (1994). Multiresolution color image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 16 (7), 689-700.
Panjwani, D.K. and G. Healey (1995). Markov random field models for unsupervised segmentation of textured color images. IEEE Trans. Pattern Anal. Machine Intell. 17 (10), 939-954.
Song, H.-H. and S.-W. Lee (1996). LVQ combined with simulated annealing for optimal design of large-set reference patterns. Neural Networks 9 (2), 329-336.
Tan, T.S.C. and J. Kittler (1993). Colour texture classification using features from colour histogram. Proc. SCIA-93, Tromso, Norway, 807-813.
Tominaga, S. (1992). Color classification of natural color images. Color Research and Application 17 (4), 230-239.
Uchiyama, M. and M.A. Arbib (1994). Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Machine Intell. 16 (12), 1197-1206.
Unnikrishnan, K.P. and K.P. Venugopal (1994). Alopex: A correlation-based learning algorithm for feedforward and recurrent neural networks. Neural Computation 6, 469-490.
Verikas, A. and K. Malmqvist (1995). Increasing colour image segmentation accuracy by means of fuzzy post-processing. Proc. IEEE Internat. Conf. on Artificial Neural Networks, Perth, Australia, Vol. 4, 1713-1718.
Verikas, A., K. Malmqvist, L. Bergman and A. Gelzinis (1995). An unsupervised learning technique for finding decision boundaries. Proc. 5th European Conf. on Artificial Neural Networks, ICANN-95, Paris, Vol. 2, 99-104.
Verikas, A., K. Malmqvist and A. Gelzinis (1996a). A new technique to generate a binary decision tree. Proc. Symposium on Image Analysis, Lund, Sweden, 164-168.
Verikas, A., K. Malmqvist, L. Malmqvist and L. Bergman (1996b). Weighting colour space coordinates for colour classification. Proc. Symposium on Image Analysis, Lund, Sweden, 49-53.
