D. Brodnjak-Vončina et al. / Analytica Chimica Acta 462 (2002) 87–100
Fig. 1. Plot of the normalised AOX variable against sampling time, with the 9-year period expressed in days. The marked fall of this parameter, and consequently the improvement of water quality, after the year 1994 is evident. Samples are numbered from 1 to 207.
207 × 18 elements. A total of 207 rows represent water samples, each described by 18 variables. The data were additionally pre-processed in two different ways. First, column centring of the data was used, which means that the mean value of each column was subtracted from each of its 207 elements. Second, autoscaling of the individual variables, also called column standardisation, was performed. With this procedure, the column mean is subtracted from each element and the result is divided by the column standard deviation. Consequently, each column has zero mean and unit variance. The percentages of variance in the resulting eigenvectors (PCs) for both types of pre-processing are shown in Table 2.
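The two pre-processing modes described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `column_centre` and `autoscale` are hypothetical, the data are synthetic stand-ins for the 207 × 18 matrix, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def column_centre(X):
    """Subtract each column's mean from every element of that column."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Column standardisation: centre, then divide by the column standard deviation."""
    # ddof=1 (sample standard deviation) is an assumption, not stated in the text.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Synthetic stand-in for the 207 x 18 data matrix (not the Mura river data).
# The last column is given a much larger spread than the other 17.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0] * 17 + [100.0], size=(207, 18))

Xc = column_centre(X)   # zero column means, original spreads kept
Xa = autoscale(X)       # zero column means, unit column variances

print(Xc.std(axis=0, ddof=1).round(1))  # spreads still differ between columns
print(Xa.std(axis=0, ddof=1).round(1))  # all columns now have unit spread
```

After centring alone, a variable measured on a much larger scale keeps its large variance, whereas autoscaling puts all variables on an equal footing.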
From Table 2, it can be seen that with column-centred data, 99.8% of the variance is gathered in the first two PCs. However, analysis of the composition of the first and second PCs showed that almost all of this variance is that of AOX (variable 18, v18 in Table 1). Consequently, the result would differ little from plots of the samples' v18 (AOX) against v7 (COD) or v17 (suspended solids), which are the next two most informative variables. For this reason, only the PCA using autoscaled variables was analysed further. With the autoscaled variables, 49.5% of the total variance was captured by the first two principal components. Any conclusion based on plots in the space of PC1 and PC2 would therefore neglect >50% of the total information about the data. Some rough indications were nevertheless derived from the obtained distribution of the transformed samples; however, for further evaluation of the water samples, other chemometrical methods, such as
Table 2
Comparison of variances in PCA using two different scaling modes, column centring of data (m = 0.0) and autoscaling (m = 0.0, s = 1.0)

      Column centring of data      Column standardisation (autoscaling) of data
PC    Variance (%)    Total (%)    Variance (%)    Total (%)
 1    99.42            99.42       35.47            35.47
 2     0.38            99.80       13.99            49.46
 3     0.10            99.90       10.83            60.29
 4     0.04            99.94        7.50            67.79
 5     0.02            99.96        5.50            73.29
 6     0.02            99.98        4.59            77.89
 7     0.01            99.99        3.83            81.72
 8     0.01           100.00        3.63            85.35
 9     0.00           100.00        2.91            88.26
10     0.00           100.00        2.74            91.00
11     0.00           100.00        2.24            93.25
12     0.00           100.00        1.98            95.23
13     0.00           100.00        1.52            96.75
14     0.00           100.00        1.11            97.85
15     0.00           100.00        0.70            98.55
16     0.00           100.00        0.56            99.12
17     0.00           100.00        0.46            99.58
18     0.00           100.00        0.42           100.00
Kohonen and counterpropagation ANNs were imple-
mented.
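The contrast between the two columns of Table 2 can be illustrated with a small sketch. It assumes synthetic data in which one variable has a far larger spread than the others (mimicking the role of AOX); the function name `pca_variance_percent` is hypothetical, and the numbers it produces are not those of Table 2.

```python
import numpy as np

def pca_variance_percent(Xp):
    """Percentage of total variance carried by each PC of a pre-processed matrix."""
    # For a column-centred matrix, squared singular values are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xp, compute_uv=False)
    var = s ** 2
    return 100.0 * var / var.sum()

# Synthetic data: 17 variables with unit spread, one with a 100-fold spread.
rng = np.random.default_rng(1)
X = rng.normal(size=(207, 18)) * np.array([1.0] * 17 + [100.0])

centred = X - X.mean(axis=0)
autoscaled = centred / X.std(axis=0, ddof=1)  # ddof=1 is an assumption

pct_centred = pca_variance_percent(centred)
pct_auto = pca_variance_percent(autoscaled)

print("centred,    PC1-PC2 total:", np.cumsum(pct_centred)[1].round(2))
print("autoscaled, PC1-PC2 total:", np.cumsum(pct_auto)[1].round(2))
```

With centring only, the first PC is essentially the large-scale variable itself; after autoscaling, the variance is spread over many components, which is the behaviour reported in the two halves of Table 2.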
In Fig. 2, the biplot resulting from PCA of the water samples represented by 18 variables is shown. It can be seen that the first component, PC1, is associated with a group of variables such as nitrite and nitrate concentrations, phosphate, suspended solids, AOX, COD, and BOD. The second component, PC2, mainly represents the dependence on temperature (variables 1 and 2, printed bold in Fig. 2, correspond to v2 and v3 in Table 1, respectively). It is evident from Fig. 2 that the samples separated from the main central cluster and distributed in the region of larger PC1 values were all collected before the year 1994 (sample labels <90). Since it is known that the main source of pollution (the old technology of cellulose leaching) was eliminated after the year 1994, it was concluded that the first principal component describes the properties of water samples associated with high pollution.
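A biplot combines sample scores with variable loadings. As a rough sketch of how both can be obtained (assuming an SVD-based PCA on autoscaled data; the function name `pca_scores_loadings` and the synthetic data are illustrative, and the software used by the authors may compute them differently):

```python
import numpy as np

def pca_scores_loadings(Xp, n_components=2):
    """Scores (samples) and loadings (variables) of a pre-processed matrix."""
    U, s, Vt = np.linalg.svd(Xp, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample co-ordinates
    loadings = Vt[:n_components].T                    # variable directions
    return scores, loadings

# Synthetic stand-in for the autoscaled 207 x 18 matrix.
rng = np.random.default_rng(2)
X = rng.normal(size=(207, 18))
Xa = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

scores, loadings = pca_scores_loadings(Xa)

# Variables most strongly associated with PC1 (largest absolute loadings).
top_pc1 = np.argsort(-np.abs(loadings[:, 0]))[:5]
print("variables with the largest |loading| on PC1:", top_pc1 + 1)
```

Plotting the rows of `scores` as points and the rows of `loadings` as labelled vectors in the same PC1–PC2 plane gives a biplot of the kind shown in Fig. 2.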
3.3. Kohonen neural network
The Kohonen neural network was used as a non-linear mapping method. Again, the analysis of the clusters formed shows that the samples collected before the year 1994 are separated from those collected later. The clusters coincide with the quality of the river water, which improved throughout the 9 years. It is also evident from the experimental data that the quality of water from the first sampling site, located in the upper part of the river (Spielfeld), is better than the quality of the water downstream, in Bad Radkersburg–Gornja Radgona. Regarding the different sampling sites, it is evident that the quality of water is worse on the Slovenian side of the river, which also coincides with the distribution of samples in the Kohonen map. The weights in the ith level of the Kohonen neural network [27,29] correspond to the ith variable of the sample representation vector (Table 1). The distribution of weights in the individual levels indicates the reason for the clustering of samples.
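The idea of a Kohonen map and its weight levels can be sketched as follows. This is a minimal illustration, not the authors' implementation: the map size, number of epochs, correction factors, the square-ring neighbourhood with a triangular weighting, the linear decay schedules, and the two-cluster synthetic data are all assumptions made for the example.

```python
import numpy as np

def winner(W, x):
    """Grid position of the neuron whose weights are closest to sample x."""
    d2 = ((W - x) ** 2).sum(axis=2)
    return np.unravel_index(np.argmin(d2), d2.shape)

def train_kohonen(X, n=6, epochs=30, a_max=0.4, a_min=0.01, seed=0):
    """Train an n x n Kohonen map; W[:, :, i] is the weight level of variable i."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n, X.shape[1]))
    rows, cols = np.indices((n, n))
    total, step = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / max(total - 1, 1)
            eta = a_max - (a_max - a_min) * frac      # decreasing correction factor
            radius = (n - 1) * (1.0 - frac)           # shrinking neighbourhood
            r, c = winner(W, x)
            dist = np.maximum(np.abs(rows - r), np.abs(cols - c))  # square rings
            h = np.where(dist <= radius, 1.0 - dist / (radius + 1.0), 0.0)
            W += eta * h[:, :, None] * (x - W)        # pull neighbourhood towards x
            step += 1
    return W

# Two synthetic, well-separated groups of samples with 4 variables each.
rng = np.random.default_rng(0)
a, b = np.full(4, 0.2), np.full(4, 0.8)
X = np.vstack([a + 0.05 * rng.normal(size=(30, 4)),
               b + 0.05 * rng.normal(size=(30, 4))])

W = train_kohonen(X)
level_0 = W[:, :, 0]          # map of weights for the first variable
win_a, win_b = winner(W, a), winner(W, b)
print("group centres excite neurons", win_a, "and", win_b)
```

Inspecting a single level such as `level_0` shows which region of the map carries high or low values of that variable, which is how the levels can point to the reason for a cluster.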
Grouping the measurements by month of sampling shows that the variables with lower values in October than in the winter months (January and February) are the content of dissolved oxygen, BOD5, and ammonium (see Table 1, v4, v9, and v13, respectively). The high water temperature and low production of oxygen in October cause the differences.
3.4. Prediction of biologically determined classes
Among the 207 samples that were analysed for physico-chemical variables, 56 samples were additionally analysed for biological variables. From the latter variables, the biological classes were determined. Four sample labels (5–8) were associated with the following biologically determined classes: class II, moderately polluted (label 5); class II with a tendency towards II–III, moderately to critically polluted (label 6); class II–III, critically polluted (label 7); and class III, heavily polluted (label 8). From the described experimental classification of the 56 samples, it can be seen that it is difficult to classify water into a few biological classes. For example, between classes II and III, two additional levels were introduced to indicate intermediate water quality. Consequently, the classes could be transformed to real numbers in the range between 5 and 8. With these 56 samples, the counterpropagation artificial neural network (CP ANN) [26,29] model was built. The 18-dimensional neurons were placed in a 9 × 9 network. The CP
Fig. 2. Biplot (scores and loadings) of 207 samples and 18 variables in the PC1–PC2 co-ordinate system for water samples of the river Mura. The sample numbers from 1 to 207 are given in Table 1, while the original variables (1–18, printed bold in the biplot) forming the PC1 and PC2 components are defined in Table 1 as v2–v19, because the water flow (variable v1) was previously eliminated (explained in the paragraph on statistical screening of data).
ANN was trained for 240 epochs, which was sufficient for satisfactory recognition of the training samples. The 18 components of each sample's vector representation are the physico-chemical variables described in Section 2. The maximal and minimal correction factors in the modelling procedure were 0.4 and 0.01, respectively. The prediction results for the 56 training samples are shown in Fig. 3.
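A counterpropagation network of the kind described can be sketched as a Kohonen layer with an additional output layer that is corrected at the same neurons. The 9 × 9 size, 18 inputs, 240 epochs, and correction factors 0.4 and 0.01 are taken from the text; everything else here is an assumption made for illustration, including the linear decay, the neighbourhood function, the function names `train_cp_ann` and `predict_cp_ann`, and the synthetic data standing in for the 56 training samples.

```python
import numpy as np

def train_cp_ann(X, y, n=9, epochs=240, a_max=0.4, a_min=0.01, seed=0):
    """Train a CP ANN: Kohonen layer W on inputs, output layer `out` on targets."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n, X.shape[1]))     # Kohonen (input) layer
    out = np.full((n, n), y.mean())        # output layer, one value per neuron
    rows, cols = np.indices((n, n))
    total, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / max(total - 1, 1)
            eta = a_max - (a_max - a_min) * frac          # correction factor
            radius = (n - 1) * (1.0 - frac)               # shrinking neighbourhood
            d2 = ((W - X[i]) ** 2).sum(axis=2)
            r, c = np.unravel_index(np.argmin(d2), d2.shape)  # winner from inputs only
            dist = np.maximum(np.abs(rows - r), np.abs(cols - c))
            h = eta * np.where(dist <= radius, 1.0 - dist / (radius + 1.0), 0.0)
            W += h[:, :, None] * (X[i] - W)               # correct input weights
            out += h * (y[i] - out)                       # correct output weights
            step += 1
    return W, out

def predict_cp_ann(W, out, X):
    """Predict by reading the output weight at each sample's winning neuron."""
    preds = []
    for x in X:
        d2 = ((W - x) ** 2).sum(axis=2)
        preds.append(out[np.unravel_index(np.argmin(d2), d2.shape)])
    return np.array(preds)

# Synthetic stand-in: 56 samples, 18 variables scaled to [0, 1],
# with class numbers between 5 and 8 (not the measured data).
rng = np.random.default_rng(3)
X = rng.random((56, 18))
y = 5.0 + 3.0 * X[:, 0]

W, out = train_cp_ann(X, y)
pred = predict_cp_ann(W, out, X)
print("predicted class range:", pred.min().round(2), "to", pred.max().round(2))
```

Because each correction is a weighted average of the old output weight and a target, predictions from such a model always stay within the range of the training targets, here between 5 and 8.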
In Fig. 3, the regression line between the experimental and predicted biological class numbers of the training samples is shown. The standard deviation of prediction residuals, SEP = 0.247, and the correlation coefficient, R = 0.958, show that the CP ANN model trained with 56 samples captures a good correlation between the 18-component vector representation of samples (physico-chemical properties) and the biological classes.
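The two quality measures quoted above, together with the intercept and slope of a regression line, can be computed as in the following sketch. It assumes that SEP is the sample standard deviation of the residuals (predicted minus experimental), which is one common reading of the definition in the text; the function name `regression_summary` and the toy numbers are illustrative and are not the values behind Fig. 3.

```python
import numpy as np

def regression_summary(y_exp, y_pred):
    """Return SEP, correlation coefficient R, intercept A and slope B."""
    resid = y_pred - y_exp
    sep = np.std(resid, ddof=1)                # standard deviation of residuals (assumed form)
    r = np.corrcoef(y_exp, y_pred)[0, 1]       # Pearson correlation coefficient
    B, A = np.polyfit(y_exp, y_pred, 1)        # polyfit returns slope first, then intercept
    return sep, r, A, B

# Toy example with class numbers between 5 and 8.
y_exp = np.array([5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 8.0, 8.0])
y_pred = y_exp + np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2])

sep, r, A, B = regression_summary(y_exp, y_pred)
print(f"SEP = {sep:.3f}, R = {r:.3f}, A = {A:.3f}, B = {B:.3f}")
```

A slope close to 1 and an intercept close to 0 indicate that the predicted class numbers follow the experimental ones without systematic bias.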
The constructed model was tested with the remaining 151 samples (out of 207), for which the biological class was neither known nor determined experimentally. Since there is no information about experimental biological classes for these 151 samples, the quality of the prediction results cannot be confirmed. However, the trend of improving water quality assessed by the biological classification of the 56 training samples is obvious. The resulting predictions with respect to the sampling year are shown in Fig. 4.
The biological classes predicted for 151 samples
show the same trend of improvement of the quality
Fig. 3. Regression line of predictions for the 56 training samples with the constructed CP ANN model. A and B are the estimated parameters, intercept and slope, of the regression line. Their standard errors are also given. S.D. is the estimated standard deviation of the fitting, and R the correlation coefficient between the experimental and predicted biological classes.
Fig. 4. The prediction of biological class numbers of 151 samples using CP ANN model. The samples are discriminated by the year in
which they were gathered.
of water as observed for the 56 training samples. Biological investigations are time-consuming in comparison with the determination of physico-chemical parameters, so even a rough prediction of biological class numbers is helpful.
4. Conclusions
The study has given us the opportunity to follow all processes involved in the complex system of surface water pollution. The time series of overall pollution levels, as well as the results for specific measured parameters, are important indicators and can be used for planning short-term and long-term preventive action. In this work, standard multivariate statistical methods and PCA were used for pre-screening of the data. It was shown that it is necessary to use autoscaled variables. From the results, it was concluded that the PCA method is not discriminating enough, since the variables are weakly correlated: less than 50% of the variance is explained by the first two principal components. For the classification of this kind of data, non-linear methods such as artificial neural networks are more suitable. Artificial neural networks were implemented as the method for clustering all 207 water samples as well as for predicting the biological classes. The analysis has shown that AOX content is the parameter with the greatest discriminating power. The results obtained from the evaluation of data gathered during the 9-year monitoring of Mura river water confirmed that the improvement in water quality during the last 9 years is significant and, therefore, the Austrian Project for improving the quality of rivers can be considered successful.
One of the goals of the research presented in this work was to find a correlation between biological classes and chemical parameters. Because the biological analyses are time-consuming, only a small number of water samples were chosen for the determination of biological classes. The experience-based CP ANN model was built using the water samples for which the biological activity was known. The rest of the samples were then examined with the constructed model to obtain predictions of biological activity. The predicted values were in the same range as the training samples' values; moreover, the trend of water quality improvement was evident from the predicted biological activities. Although the usual validation procedures for estimating the quality of the model were not applicable because of the low number of available training samples, the overview of the prediction results indicates that the biological activity obtained from the proposed model is of significant value when experimental values are not available.
Acknowledgements
The authors thank the Ministry of Education, Science and Sport of the Republic of Slovenia, contract numbers P1-0507-0104 and P1-0508-0104, for financial support. The Amt der Steiermärkischen Landesregierung, Graz, Austria, is kindly acknowledged for completing the data about Mura river water samples with their results.
References
[1] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J.S. Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997.
[2] P. Zhang, N. Dudley, A.M. Ure, D. Littlejohn, Anal. Chim. Acta 258 (1992) 1–10.
[3] W.D. Alberto, D.M. Del Pilar, A.M. Valeria, P.S. Fabiana, H.A. Cecilia, B.M. De Los Angeles, Water Res. 35 (2001) 2881–2894.
[4] R. Lindegren, M. Josefson, Chemometr. Intell. Lab. Syst. 44 (1998) 403–409.
[5] A.K. Meng, I.H. Suffet, Environ. Sci. Technol. 31 (1997) 337–345.
[6] E. Marengo, M.C. Gennaro, D. Giacosa, C. Abrigo, G. Saini, M.T. Avignone, Anal. Chim. Acta 317 (1995) 53–63.
[7] W.M. Jarman, G.W. Johnson, C.E. Bacon, J.A. Davis, R.W. Risebrough, R. Ramer, Fresenius J. Anal. Chem. 359 (1997) 254–260.
[8] P. Barbieri, G. Adami, A. Favretto, E. Reisenhofer, Fresenius J. Anal. Chem. 361 (1998) 349–352.
[9] M.M.C. Ferreira, C.G. Faria, E.T. Paes, Chemometr. Intell. Lab. Syst. 47 (1999) 289–297.
[10] J.B. Marzo, M.J.M. Hernandez, S. Sagrado, E. Bonet, R. Gimenes, J. Chemometr. 12 (1998) 323–336.
[11] M.P. Kallio, S.P. Mujunen, G. Hatzimihalis, P. Koutoudes, P. Minkkinen, P.J. Wilkie, M.A. Connor, Anal. Chim. Acta 393 (1999) 181–191.
[12] M.F. Wilkins, L. Boddy, C.W. Morris, Binary-Comput. Microb. 6 (1994) 64–72.
[13] Water Quality, Sampling, Part 11: Guidance on Sampling of Ground Waters, ISO 5667-11: 1992 (E).
[14] Water Quality, Guidelines for the Determination of Total Organic Carbon (TOC), ISO 8245: 1987 (E).
[15] Water Quality, Determination of Adsorbable Organic Halogen (AOX), ISO 9562: 1989 (E).
[16] Water Quality, Determination of the Chemical Oxygen Demand, ISO 6060: 1989 (E).
[17] Water Quality, Determination of Biochemical Oxygen Demand After 5 Days (BOD5), ISO 5815: 1989 (E).
[18] Water Quality, Determination of Suspended Solids by Filtration Through Glass-Fibre Filters, ISO/DIS 11923: 1995 (E).
[19] Water Quality, Determination of pH, ISO 10523: 1994 (E).
[20] Water Quality, Determination of Ammonium, ISO 7150/1: 1984 (E).
[21] Water Quality, Determination of Nitrate, ISO 7890/1: 1986 (E).
[22] Water Quality, Determination of Dissolved Anions by Liquid Chromatography, ISO 10304-2: 1995 (E).
[23] Water Quality, Determination of Nitrite, ISO 6777: 1984 (E).
[24] Water Quality, Determination of Phosphorus, ISO 6878/1: 1986 (E).
[25] Teach/Me, SDL, Software Development Lohninger, Teach/Me DataLab 2.002, Springer, Berlin, 1999. Developed by H. Lohninger and the Teach/Me People.
[26] J. Zupan, J. Gasteiger, Neural Networks for Chemists: An Introduction, Verlag Chemie, Weinheim, 1993.
[27] T. Kohonen, Self-Organization and Associative Memory, Springer, Berlin, 1988.
[28] J. Lozano, M. Novič, F.X. Rius, J. Zupan, Chemometr. Intell. Lab. Syst. 28 (1995) 61–72.
[29] J. Zupan, M. Novič, I. Ruisanchez, Chemometr. Intell. Lab. Syst. 38 (1997) 1–23.
[30] J. Zupan, M. Novič, J. Gasteiger, Chemometr. Intell. Lab. Syst. 27 (1995) 175–187.
[31] N. Majcen, K. Rajer-Kanduč, M. Novič, J. Zupan, Anal. Chem. 67 (1995) 2154–2161.