www.elsevier.com/locate/chemolab
Abstract
Didactic data sets representing a range of real-world processes are used to illustrate how to do representative process sampling and process
characterisation. The selected process data lead to diverse variogram expressions with different systematics (no range vs. important ranges; trends
and/or periodicity; different nugget effects and process variations ranging from less than one lag to full variogram lag). Variogram data analysis
leads to a fundamental decomposition into 0-D sampling vs. 1-D process variances, based on the three principal variogram parameters: range, sill
and nugget effect. The influence of significant trends and outliers in the original data series on the variogram receives special attention, owing to
their critical adverse effects. We highlight the problem-dependent interpretation of variographic analysis, including the problem-dependent background for
periodicities and trends. All presented cases of variography either solved the initial problems or helped to understand the reasons and causes
behind the specific process structures revealed in the variograms. Process Analytical Technologies (PAT) are not complete without process TOS.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Representative process sampling; Variography; Theory of Sampling (TOS); TSE estimation; Sampling protocol development; Process data structure;
Process interpretation
1. Introduction
The Theory of Sampling (TOS) has been introduced in chemometrics, analytical chemistry and process technologies by a recent, dedicated Scandinavian effort, based on the classical TOS references [1–6]. TOS presents a complete methodology for evaluating the total sampling error associated with both static (so-called 0-D) sampling and process sampling (1-D sampling). Here we focus on the practical aspects of process sampling, especially in light of the current focus on PAT: Process Analytical Technologies. We aim to demonstrate that TOS forms the missing link to PAT, i.e. that PAT is seriously deficient without proper regard to representative process sampling.
Sampling representativity will always be strongly coupled to process and product types, because each process/product has its own intrinsic heterogeneity characteristics. It is not possible to rely on one general sampling scheme for all process sampling; indeed
Corresponding author.
E-mail address: kes@aaue.dk (K.H. Esbensen).
doi:10.1016/j.chemolab.2006.09.011
K.H. Esbensen et al. / Chemometrics and Intelligent Laboratory Systems 88 (2007) 41–59
$$v(j) = \frac{1}{2(Q-j)} \sum_{q=1}^{Q-j} \left(h_{q+j} - h_q\right)^2 = \frac{1}{2(Q-j)\,a_L^2} \sum_{q=1}^{Q-j} \left(a_{q+j} - a_q\right)^2$$
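The relative variogram, v(j) = Σ_{q=1}^{Q−j} (a_{q+j} − a_q)² / (2(Q − j) a_L²), can be computed directly from a series of Q analytical results. The Python sketch below is a minimal illustration, not the VARIO implementation; it assumes equal increment masses and takes a_L as the arithmetic mean of the series.

```python
from typing import Sequence


def variogram(a: Sequence[float], max_lag: int) -> list[float]:
    """Relative experimental variogram v(j) for lags j = 1..max_lag.

    v(j) = sum_{q=1}^{Q-j} (a[q+j] - a[q])**2 / (2 (Q - j) a_L**2),
    with a_L taken as the arithmetic mean (equal increment masses assumed).
    """
    Q = len(a)
    if not 1 <= max_lag < Q:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(a)")
    a_L = sum(a) / Q
    v = []
    for j in range(1, max_lag + 1):
        # Sum of squared differences between all pairs of increments j lags apart
        ssd = sum((a[q + j] - a[q]) ** 2 for q in range(Q - j))
        v.append(ssd / (2 * (Q - j) * a_L ** 2))
    return v


# Hypothetical six-increment series
print(variogram([10.0, 12.0, 11.0, 13.0, 12.0, 14.0], 3))
```

Note that the number of pairs, Q − j, shrinks with the lag, so in practice the variogram is usually inspected only up to about half the series length.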
Fig. 1. Generic variogram v(j), illustrating the three key parameters: the nugget effect, the range and the sill. For this particular variogram, the nugget effect ≈ 3.5, the sill ≈ 13.5 and the range = 10 lags.
The nugget effect is estimated by extrapolating the variogram backwards to v(0). A time lag of zero has no physical meaning in itself: it would require two samples to be extracted at the same time, and such samples can never be true replicates of the (same) volume. Samples with identical extraction times collapse the 1-D sampling situation to a 0-D situation, for which the nugget effect reflects the smallest error made by sampling twice in the same material at the same location (in reality: back-to-back increments/samples).
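One simple way to perform this backward extrapolation is to fit a least-squares straight line through the first few variogram points and read it off at j = 0. In the sketch below, the number of lags used, `n_fit`, is a hypothetical parameter and a judgement call; linear extrapolation is an assumption of this sketch, not necessarily how VARIO performs it.

```python
def nugget_effect(v: list[float], n_fit: int = 5) -> float:
    """Estimate v(0) from v(1)..v(n_fit) by straight-line extrapolation to j = 0."""
    if not 2 <= n_fit <= len(v):
        raise ValueError("need 2 <= n_fit <= len(v)")
    lags = list(range(1, n_fit + 1))
    vals = v[:n_fit]
    mean_j = sum(lags) / n_fit
    mean_v = sum(vals) / n_fit
    sxx = sum((j - mean_j) ** 2 for j in lags)
    sxy = sum((j - mean_j) * (y - mean_v) for j, y in zip(lags, vals))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_j
    # A negative intercept has no meaning for a variance; clip it at zero
    return max(intercept, 0.0)


# Hypothetical variogram values v(1)..v(7)
print(nugget_effect([4.5, 5.5, 6.5, 7.5, 8.5, 9.0, 9.2], n_fit=5))  # 3.5
```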
The nugget effect is appropriately also called the variance of the minimal practical error, $s^2_{\mathrm{MPE}}$, for 1-D sampling.
The nugget effect is the sum of all variances in the 0-D sampling situation, i.e. the correct and incorrect sampling errors as well as the total analytical error [1–6]. The practical implication of the nugget effect is that a 1-D sampling experiment will not only result in an optimal sampling scheme based on the autocorrelation in the process, but will also provide a reliable estimate of the variance of the minimal practical error, $s^2_{\mathrm{MPE}}$.
The historical TOS literature [1,2], as well as the recent didactic exposé by Petersen and Esbensen [4], treats process sampling and variography in full conceptual and theoretical detail. Below we treat practical process sampling exclusively.
3. Experimental
This section makes use of industrial, technological and other real-world data sets and, for each, describes the sampling and analytical setup that supports a variographic evaluation and an estimation of the total sampling error. Each data set is accompanied by the latent information in its process data series, brought out by careful, problem-dependent interpretation of the corresponding variogram, known as variographic analysis or variography. After presentation and analysis of these data sets (upper panels show the original data series in raw units as well as in heterogeneity contributions) and their corresponding variograms (lower-left panels), they are all revisited, now allowing a comprehensive understanding of the systematics of estimating TSE, the total sampling error.
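The heterogeneity contributions shown in the upper-right panels can be sketched as follows, assuming the classical TOS definition h_q = ((a_q − a_L)/a_L)·(M_q/M̄), where M_q is the increment mass, M̄ the mean increment mass and a_L the mass-weighted mean concentration. When no masses are supplied, this sketch assumes equal increment masses. By construction the contributions sum to zero.

```python
from typing import Optional, Sequence


def heterogeneity_contributions(
    a: Sequence[float], masses: Optional[Sequence[float]] = None
) -> list[float]:
    """h_q = (a_q - a_L) / a_L * M_q / M_mean, with a_L the mass-weighted mean."""
    if masses is None:
        masses = [1.0] * len(a)  # equal increment masses assumed
    if len(masses) != len(a):
        raise ValueError("a and masses must have equal length")
    total_mass = sum(masses)
    a_L = sum(m * x for m, x in zip(masses, a)) / total_mass
    m_mean = total_mass / len(masses)
    return [(x - a_L) / a_L * (m / m_mean) for x, m in zip(a, masses)]


h = heterogeneity_contributions([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
print([round(x, 4) for x in h])  # [-0.1667, 0.0, -0.0833, 0.0833, 0.0, 0.1667]
```

With equal masses, h_{q+j} − h_q = (a_{q+j} − a_q)/a_L, which is why the variogram may equally be computed from heterogeneity contributions or from relative concentrations.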
Practical estimation of TSE is extremely easy, as it follows directly from the same basic calculations that underlie the variogram. At no additional sampling or analytical cost, it is possible to simulate all conceivable sampling schemes that might be contemplated for improving a current sampling process. All process sampling schemes are fully characterised by two parameters only: the sampling rate, r_samp, and the number of increments per sample, Q, one is willing to employ in order to reduce TSE. A computer program, VARIO, has been designed to perform the fundamental variogram calculations and to furnish a platform for evaluating alternative combinations of Q and sampling rate with the resulting TSE. For practical reasons, the sampling rate is often quantified by the inversely related lag distance.
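VARIO derives TSE from the variogram itself. As a much cruder, hypothetical illustration of how the two scheme parameters interact, a scheme can be emulated directly on a densely measured series: every `lag` units (the inverse of the sampling rate), a composite sample is formed from `n_increments` (Q) increments spread evenly over that interval and compared with the mean of all units in the interval. This brute-force emulation is an assumption of this sketch, not the TOS/VARIO estimator; it treats the dense series as the truth and therefore ignores the analytical error.

```python
from statistics import fmean


def emulate_scheme(a, lag, n_increments):
    """Hypothetical stand-in for TSE: mean squared relative deviation of
    composite samples from the true interval means.

    a            : densely measured series, treated here as the 'true' process
    lag          : sampling interval in basic units (inverse of the sampling rate)
    n_increments : increments per composite sample (Q in the text)
    """
    if not 1 <= n_increments <= lag <= len(a):
        raise ValueError("need 1 <= n_increments <= lag <= len(a)")
    a_L = fmean(a)
    sq_rel_dev = []
    for start in range(0, len(a) - lag + 1, lag):
        interval = a[start:start + lag]
        # Systematic selection: n_increments evenly spaced positions in the interval
        picks = [interval[(k * lag) // n_increments] for k in range(n_increments)]
        sq_rel_dev.append(((fmean(picks) - fmean(interval)) / a_L) ** 2)
    return fmean(sq_rel_dev)


# Hypothetical saw-tooth series 10, 11, ..., 15, 10, 11, ... sampled every 6 units
toy = [10.0 + (t % 6) for t in range(120)]
for n in (1, 2, 3, 6):
    print(n, round(emulate_scheme(toy, 6, n), 6))
# prints 1 0.04, 2 0.0064, 3 0.0016 and 6 0.0 on separate lines
```

Increasing Q at a fixed lag reduces the deviation here; taking every unit in the interval (Q equal to the lag) removes it entirely, at the highest sampling cost.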
From a didactic point-of-view, it proves highly advantageous
first to present the specific data sets and their variographic
Fig. 2. Aquarium lot, loaded with a mixture system of 25% yellow pellets (analyte), shown two-thirds emptied. Also shown are the three scoop sizes (5 ml, 25 ml and 100 ml) used in the experiments analysed variographically below. Pre-mixing left no visible concentration trends while still retaining a significant spatial heterogeneity. Note the sizes of the lot and of the sampling scoops relative to the individual pellets.
Fig. 3. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 0.1% analyte concentration sampled by a 5 ml scoop.
Fig. 4. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 0.1% analyte concentration sampled by a 25 ml scoop.
Fig. 5. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 0.1% analyte concentration sampled by a 100 ml scoop.
Fig. 6. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 25% analyte concentration sampled by a 5 ml scoop. A significant trend is present within the range: 250 lags.
Fig. 7. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 25% analyte concentration sampled by a 100 ml scoop. A significant trend is present within the range 20 lags.
Fig. 8. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 25% analyte concentration sampled by a 100 ml scoop. Outlier No. 17 removed, compare to Fig. 7.
Fig. 9. Ribe Biogas plant, Denmark, the core of which consists of three 1800 m³ fermentation reactors. Feeding such large fermenters (diameter: 8 m, height: 22 m) is a delicate matter, in which optimal pre-mixing of the different raw-material end-members critically influences the efficiency of the biogas process.
Fig. 10. Scatter plot of CH₄ and H₂S, revealing no significant correlation.
Fig. 11. VARIO results of H₂S monitoring in the biogas produced at Ribe Biogas plant. This three-month data series is stationary (upper two panels), with no outliers, allowing maximally informative variographic analysis. Note the highly distinct 7-day periodicity in the variogram (lower left).
information present in a variogram leads to increased understanding of the process being analysed. Unrecognised
periodicities may easily have significant negative economic
consequences.
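Why a periodicity shows up so distinctly: for a series with period P, the differences a_{q+P} − a_q vanish, so v(j) drops towards the nugget level at j = P, 2P, and so on. The sketch below uses an invented, noise-free 91-day series with a 7-day cycle (not the Ribe data) and locates the variogram minimum.

```python
import math


def variogram(a, max_lag):
    """Relative variogram v(j) = sum (a[q+j] - a[q])**2 / (2 (Q - j) a_L**2), a_L = mean."""
    Q = len(a)
    a_L = sum(a) / Q
    return [
        sum((a[q + j] - a[q]) ** 2 for q in range(Q - j)) / (2 * (Q - j) * a_L ** 2)
        for j in range(1, max_lag + 1)
    ]


# Hypothetical noise-free daily series with a 7-day cycle
daily = [100.0 + 10.0 * math.sin(2 * math.pi * t / 7) for t in range(91)]
v = variogram(daily, 14)
# Lag (between 4 and 10 days) with the smallest variogram value
lag_min = min(range(4, 11), key=lambda j: v[j - 1])
print(lag_min)  # 7
```

With real data the minima are of course not zero, because the nugget effect and non-periodic process variation remain.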
3.3. Process analysis proper (commodity prices)
3.3.1. Daily end-of-trade Zn prices
This data set (downloaded from the public domain) represents a sector in which variographic analysis plays an important and growing role: the financial sector. Any commercial company with a significant interest in metallic raw materials or products, or in similar materials traded and quoted on the international commodities exchanges, will be interested in analysing (and thereby, hopefully, more fully understanding) the time-dynamic fluctuations involved. As an archetypal example from this arena, Fig. 16 shows the daily end-of-trade prices of zinc (Zn) in US$ per ton for a full five-year period.
Fig. 13. VARIO result for biogas (CH₄) production. The total data set includes two significant outliers, dramatically revealed in the serial data plots (upper two panels).
Fig. 14. Identical data series as in Fig. 13, except that original increment No. 2 has been deleted.
Fig. 15. Identical data series as in Fig. 14, except that original increment No. 56 has also been deleted. Note the complete transformation of the variograms in Figs. 13–15.
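The dramatic effect of a single outlier on the variogram is easy to reproduce on synthetic data. In the hypothetical sketch below, one gross value is planted in an otherwise regular toy series; because the squared differences involving that increment enter the sums at every lag, the whole variogram is inflated. Deleting the increment, as was done for the biogas series in Figs. 14 and 15, removes the inflation. The toy series and the outlier value are invented for illustration.

```python
def variogram(a, max_lag):
    """Relative variogram v(j) = sum (a[q+j] - a[q])**2 / (2 (Q - j) a_L**2), a_L = mean."""
    Q = len(a)
    a_L = sum(a) / Q
    return [
        sum((a[q + j] - a[q]) ** 2 for q in range(Q - j)) / (2 * (Q - j) * a_L ** 2)
        for j in range(1, max_lag + 1)
    ]


# Hypothetical toy series: a regular saw-tooth 10, 11, 12, 13, 14, 10, 11, ...
base = [10.0 + (q % 5) for q in range(40)]
spiked = list(base)
spiked[17] = 60.0                  # plant one gross outlier
cleaned = base[:17] + base[18:]    # the outlying increment deleted

for j, (s, c) in enumerate(zip(variogram(spiked, 5), variogram(cleaned, 5)), start=1):
    print(f"lag {j}: with outlier {s:.4f}, outlier deleted {c:.4f}")
```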
Fig. 16. VARIO results for full 5-year Zn-price data set. This variogram is for reference only.
Fig. 17. VARIO results from the Zn2 data set restricted to prices <900 US$/ton, i.e. a bona fide stationary data series.
Fig. 18. VARIO results from the Zn1 data set for prices <1300 US$/ton, still influenced by too much trending.
both 100 days and 150 days and why not: this issue is truly
open for speculation.
From a stricter scientific data-analysis point of view, however, there is one new feature not yet encountered in any of the variograms above: there is no nugget effect at all! This would be a very
Fig. 19. Brent oil prices in the period January 1986 to September 2004.
Fig. 20. Brent oil prices: the largest coherent stationary segment cropped from Fig. 19.
uncertainty; the price was fixed with total certainty at the daily closing time of the commodity trading session.
The proper variogram (Fig. 17) speaks very directly: here there is only process variation. It is instructive to compare it with the trend-influenced variograms in Figs. 16 and 18.
3.3.2. Oil: Europe Brent spot price FOB
A second illustration pertains to a similar variographic
analysis of the daily spot price for North Sea Brent crude oil,
which serves as a benchmark commodity.
These two oil-price data series and their variograms, one apparently suffering from both a prominent outlier peak and potentially debilitating severe trends (Fig. 19), the other carefully pruned (Fig. 20), nevertheless show practically identical features, undoubtedly of trading interest: a very clearly expressed macro-scale periodicity (period: 800 days) superimposed on a zero-range (again with no nugget effect) trending variogram, which shows signs of a decreasing trend slope at very large lags. These features are, of course, most pronounced for the pruned data set, but they may nevertheless also be gleaned from judicious analysis of the total data set, albeit only with considerable experience. Detailed interpretation of such complex relationships belongs to the geostatistical realm.
A comprehensive, authoritative reference for complex variogram analysis is Gringarten and Deutsch [8].
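Cropping a series to a coherent segment, as for the pruned Brent series and the restricted Zn series, is ultimately the analyst's judgement. One crude, hypothetical way to automate a first cut is to keep the longest contiguous run of data below a price ceiling, loosely mirroring the <900 US$/ton restriction used for Fig. 17. Whether the retained segment is genuinely stationary must still be checked from its plot and variogram; the helper name and the price values below are invented.

```python
def longest_run_below(a, threshold):
    """Return (start, stop) of the longest contiguous run with a[i] < threshold (stop exclusive)."""
    best = (0, 0)
    start = None
    for i, x in enumerate(a):
        if x < threshold:
            if start is None:
                start = i  # a new run begins here
            if i + 1 - start > best[1] - best[0]:
                best = (start, i + 1)
        else:
            start = None  # the run is broken
    return best


prices = [850, 870, 910, 880, 860, 890, 875, 950, 990, 860]  # hypothetical, US$/ton
s, e = longest_run_below(prices, 900)
print(s, e, prices[s:e])  # 3 7 [880, 860, 890, 875]
```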
3.3.3. PITARD data: Flotation plant rougher feed
The last demonstration data set stems from the comprehensive TOS textbook by Pitard [2], used here to highlight the value
of de-trending. While the raw data set shows a clear increasing
Fig. 21. Pitard [2] gives several process data sets; shown here is the Flotation plant rougher feed data series, totaling 70 samples (p. 84). Compare with Pitard's
variogram on p. 94.
Two options for this pre-treatment, time segmentation and de-trending, were shown in action to be effective approaches.
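A least-squares straight-line de-trending, one simple form of this pre-treatment, can be sketched as below, assuming the trend is linear in time. Adding the series mean back to the residuals is a design choice of this sketch: it keeps a_L, and therefore the scale of the relative variogram, unchanged. The example series is hypothetical.

```python
def detrend_linear(a):
    """Remove a least-squares straight-line trend while keeping the series mean."""
    n = len(a)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    t_mean = (n - 1) / 2
    a_mean = sum(a) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (x - a_mean) for t, x in enumerate(a))
    slope = sxy / sxx
    # Residual from the fitted line a_mean + slope * (t - t_mean), plus a_mean
    return [x - slope * (t - t_mean) for t, x in enumerate(a)]


# Hypothetical series: a pure linear increase over 70 samples
trended = [50.0 + 0.5 * t for t in range(70)]
flat = detrend_linear(trended)
print(round(flat[0], 6), round(flat[-1], 6))  # 67.25 67.25
```

Applied to a real series, the variogram of the de-trended data then reflects the remaining process variation and the nugget effect, no longer dominated by the trend.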
The power of variographic analysis was demonstrated with
full clarity: absolutely all pertinent structural and/or MPE
Fig. 22. Pitard's [2] Flotation plant rougher feed data series, only de-trended.
References
6. Vario
VARIO was programmed by Hans Henrik Friis-Pedersen. VARIO is freeware, available from the ACABS webpage (www.acabs.dk). The program allows for importing data of different types and performs basic statistical analysis in addition to variographic analysis; it is easy to use and works on large as well as small data sets. The present paper can be viewed as a user's guide to analysis and interpretation with VARIO; more information can be found on the homepage.
Acknowledgements
We thank two anonymous referees for the scholarly, penetrating and helpful reviews.