
Chemometrics and Intelligent Laboratory Systems 88 (2007) 41–59

www.elsevier.com/locate/chemolab

Representative process sampling in practice: Variographic


analysis and estimation of total sampling errors (TSE)
Kim H. Esbensen, Hans Henrik Friis-Petersen, Lars Petersen,
Jens Bo Holm-Nielsen, Peter P. Mortensen
ACABS, Aalborg University Esbjerg (AAUE) 6700 Esbjerg, Denmark
Received 20 June 2006; received in revised form 26 September 2006; accepted 29 September 2006
Available online 5 December 2006

Abstract
Didactic data sets representing a range of real-world processes are used to illustrate how to do representative process sampling and process
characterisation. The selected process data lead to diverse variogram expressions with different systematics (no range vs. important ranges; trends
and/or periodicity; different nugget effects and process variations ranging from less than one lag to full variogram lag). Variogram data analysis
leads to a fundamental decomposition into 0-D sampling vs. 1-D process variances, based on the three principal variogram parameters: range, sill
and nugget effect. The influence on the variogram from significant trends and outliers in the original data series receive special attention, due to
their critical adverse effects. We highlight problem-dependent interpretation of variographic analysis, a.o. the problem-dependent background for
periodicities and trends. All presented cases of variography either solved the initial problems or served to understand the reasons and causes
behind the specific process structures revealed in the variograms. Process Analytical Technologies (PAT) are not complete without process TOS.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Representative process sampling; Variography; Theory of Sampling (TOS); TSE estimation; Sampling protocol development; Process data structure;
Process interpretation

1. Introduction
The Theory of Sampling (TOS) has been introduced in chemometrics, analytical chemistry and process technologies by a
recent, dedicated Scandinavian effort, based on classical TOS
references [1–6]. TOS presents a complete methodology for
evaluating the total sampling error associated with both static
(so-called 0-D) sampling as well as process sampling (1-D
sampling). We here focus on the practical aspects of process
sampling, especially in the light of the current focus on PAT:
Process Analytical Technologies. We aim to demonstrate that
TOS forms the missing link to PAT, i.e. that PAT is seriously
remiss without proper regard to representative process sampling.
Sampling representativity will always be strongly coupled to
process and product types, because each process/product poses
an intrinsic heterogeneity characteristic. It is not possible to rely
on one general sampling scheme for all process sampling; indeed, such a notion is but wishful thinking. Empirical, experimental sampling error evaluations are needed for every principally new product or process. TOS offers a completely general approach for characterizing the 1-D heterogeneity of any process or 1-D material stream, termed variography. The experimental variogram will allow simulation of any-and-all sampling schemes contemplated, based on one series of a maximum of some 40–60 samples only.

Corresponding author. E-mail address: kes@aaue.dk (K.H. Esbensen).
0169-7439/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2006.09.011
In this work focus is on the practical issues associated with a
variographic approach for estimating the total sampling errors,
including the analytical errors s.s. Typically, sampling procedures for processes or 1-D material streams can be divided into three separate steps:
• Primary sampling, e.g. a flow sampling device, a valve or any other, local implement. Sampling devices may either be structurally correct or incorrect, as fully delineated by the Theory of Sampling (TOS) [1–4,6].
• A secondary sampling step is often necessary for (representative) mass reduction [5] of the primary process sample into a manageable sample size for the subsequent laboratory handling and analysis (quality control laboratory or similar).
• A tertiary sampling step, associated with producing the often minute amount of material actually used for analysis [1–6].
For process sampling in industry at large, over time many
different combinations of equipment and procedures have been
developed locally, very many of which are far from based on a full understanding of the Theory of Sampling (TOS), unfortunately; ibid.
The scope of the present work is to present a universal
approach for evaluating the existing sampling equipment and
the 1-D heterogeneity present for any process/product stream
and to suggest improvement initiatives where necessary. Even though there exist widely varying plant and production logistics situations at industrial process technological production sites at large, it will nevertheless be possible to delineate an extremely simple approach for the estimation of the total sampling error (TSE), based on a data analytic (variographic) treatment of as little as 40–60 samples only.
2. Theory of Sampling (TOS) in brief
The process of taking a sample characterizing a complex,
large system based on a small part hereof is, contrary to many
beliefs, not an easy one, although the task specification could
not be simpler: a sample is a portion, piece or segment representing a class or a larger whole (the lot). If the sample does in
fact not represent what it is supposed to, erroneous deductions
and conclusions will invariably follow, no matter how precise the subsequent analysis. The great statistician and data analyst John Tukey said: "It is better to be approximately right than precisely wrong", signifying that there has to be a
balance between the precision of the assaying technique and the
representativity of the physical materialization of a sample. This
distinction between the accuracy of the sampling process (also
known as the bias) and the precision of the analysis is crucial for
any appreciation of TOS; it is laid out in full in the basic
sampling literature, e.g. [1–6].
A minimalist terminology used in TOS:
• The lot is the original material subjected to the sampling procedure. A lot can be zero-dimensional (0-D), like a sample container, a truckload of material or all the material in a reactor, or one-dimensional (1-D), i.e. flowing through a pipeline or on a conveyor belt.
• A sample is the correctly extracted material from the lot (definition follows).
• A specimen is a material extracted from the lot in an incorrect fashion.
• An increment is a partial sample unit, which combined with other increments forms a composite sample.
• A fragment is the smallest physically separable unit in the lot, e.g. a molecule, granule or grain.
• A group of fragments consists of spatially correlated fragments, which act as a coherent unit during sampling.
• The analytical grade aL of the lot is the mass of analyte divided by the total mass of the lot.
• The analytical grade aS of the sample is the mass of analyte divided by the total sample mass.
TOS defines sampling as a multi-stage process, which covers all operations from the moment an increment is materialized until an aliquot (a measured portion of a sample taken for analysis) is administered to the final analytical operation, for example a spectrometric determination in the quality control laboratory.
2.1. Correct sampling (representative sampling)
The goal of a sampling procedure is to extract a sample with
the same properties as the lot where the sample came from: a representative sample. The basic prerequisite for a representative sampling procedure is that all elements in a batch, container, or pipeline cross-section have the same probability of
being selected, and that the elements selected are not altered in
any way after the sample (or increment) has been taken. All
elements that do not belong to the batch or sample container
must have zero probability of being selected, meaning e.g. that cross contamination between samples (increments) has been eliminated. These two criteria make up the fundamental sampling principle (FSP), which must never be violated.
With process sampling, a sample is preferably always
materialized through several increments from the lot to make
up a composite sample.
The relative sampling error is defined as:

\[ e = \frac{a_S - a_L}{a_L} \]

A sampling process is said to be accurate if the average error \(m_e\) practically equals zero (\(m_e^2 \approx 0\)); this results in no sampling bias. Likewise the sampling process is said to be reproducible if the variance of the sampling error is less than a small predetermined value (\(s_e^2 \le s_0^2\)).
The notion representative is a compound property of the relative error, which includes both the systematic and the random part of the sampling error, termed \(r_e\):

\[ r_e^2 = m_e^2 + s_e^2 \]
Only a correct selection procedure [1–6] results in samples that are both accurate (a property of the mean) and reproducible (a property of the variance), and hence representative. Still, any specific analytical result, aS, is but an estimate of the true (average) aL.
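To make these definitions concrete, the following minimal Python sketch (our illustration, not part of any TOS software) computes the relative sampling errors, their mean and variance, and the representativeness measure for a set of replicate assays; the replicate values and the reference grade aL are hypothetical.

```python
import numpy as np

def sampling_error_stats(a_s_replicates, a_l):
    """Relative sampling errors e = (a_S - a_L)/a_L for replicate assays,
    their mean m_e (accuracy/bias), variance s_e^2 (reproducibility),
    and the representativeness measure r_e^2 = m_e^2 + s_e^2."""
    e = (np.asarray(a_s_replicates, dtype=float) - a_l) / a_l
    m_e = e.mean()
    s2_e = e.var(ddof=1)
    return m_e, s2_e, m_e**2 + s2_e

# Hypothetical replicate assays of a lot with reference grade a_L = 0.25 (25% w/w)
m_e, s2_e, r2_e = sampling_error_stats([0.251, 0.247, 0.256, 0.249], a_l=0.25)
```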
The random component of the sampling error represented by
the variance tends to reduce when averaging over a large number
of increments (or samples, as the case may be). The systematic
part, however, does not. It is essential to assure a correct and
hence accurate sampling in order to cancel the systematic part,
the bias part of the sampling error. When a correct sampling is in
place the potential in TOS lies in characterizing and minimizing
(if not eliminating) as many as possible of the remaining
sampling errors, ibid.


2.2. Sampling errors


Analytical results always have an inherent uncertainty. This
uncertainty is a consequence of imprecision caused by sampling
errors, in addition to the ultimate analytical error, at all events
from the moment the sample is defined (delineated) until the
analytical procedure is completed. Sampling is not limited to the actual materialization of the sample but comprises all steps from materialization until an aliquot is administered to the analyzer. The global estimation error (GEE) is the sum of the total analytical error (TAE) and the total sampling error (TSE):

GEE = TAE + TSE
In TOS the total sampling error (TSE) splits into seven error components, some of which characterize the material sampled and some the sampling procedure itself. Five errors are present at every sampling stage (n). GEE can be expressed as:

\[ \mathrm{GEE} = \mathrm{TAE} + \mathrm{TFE} + \mathrm{CFE} + \sum_{n=1}^{N} \left( \mathrm{FSE}_n + \mathrm{GSE}_n + \mathrm{IDE}_n + \mathrm{IEE}_n + \mathrm{IPE}_n \right) \]


With the exception of IPEn (the incorrect preparation error), all errors are regarded as random variables with a given average (which might be zero) and variance (never zero).
FSEn and GSEn are the fundamental sampling error and the
grouping and segregation error. These error components together
represent the minimal practical error or the correct sampling
error and are material-specific errors that will always be present in
a sampling situation. The remaining indexed errors concern the sampling equipment and procedures. Together they are termed the incorrect sampling errors, and they can all be eliminated from a sampling scheme, although this is not necessarily an easy task. These fundamental aspects are treated in complete detail and depth in the basic TOS literature [1–6].
TFE and CFE (time fluctuation error and cyclic fluctuation error) relate only to 1-D sampling, process sampling, where one dimension is dominating. 1-D lots in the industrial arena would typically be a material flowing through a pipeline or being conducted on a conveyor belt. However, a conceptual 1-D body, e.g. a series of production units (bags, sacks, drums, boxes, etc.), is just as amenable to process sampling.
2.2.1. Fundamental sampling error (FSE); grouping and
segregation error (GSE)
The compositional differences between particles in a lot
always result in a sampling error (called the fundamental
sampling error) because not all particles are analyzed. The
adjective fundamental is substantiated by the fact that this
error is always present in any practical sampling situation,
meaning that even a perfect sampling implement will not be
able to materialize two samples with the exact same composition. The fundamental sampling error (FSE) is the smallest
possible sampling error, but usually there will be other sampling errors present unless extreme measures have been taken. FSE can only be altered (minimized) by changing the constitution of the material system sampled, e.g. by comminution or crushing.


The grouping and segregation error (GSE) originates from


an inherent tendency for particles to segregate and/or to group
together (spatial coherence to a larger or smaller degree) in a
lot. Unlike FSE this error is not invariant, since mixing or
segregation will change (reduce) its numerical magnitude.
2.2.2. Minimal practical error (MPE)
MPE is by definition the sum of the fundamental sampling error (FSE) and the grouping and segregation error (GSE):

MPE = FSE + GSE
The minimal practical error plays a particularly important
role in variography.
2.2.3. Incorrect sampling errors (ISE)
Different errors are made during sampling and they all
affect the degree of representativity of the final aliquot going
into the analytical equipment in the laboratory. The incorrect
sampling errors all come from ill-designed sampling procedures or equipment, non-optimal maintenance hereof, or human errors. However, they can be avoided when one is aware of their existence. Two of the incorrect sampling errors, the incorrect delimitation error (IDE) and the incorrect extraction error (IEE), relate to the selection of material. IDE is related to the shape of the sampling tool, which physically pulls out an increment from the lot. The tool must carefully divide the sample from the bulk in a way that ensures a reproducible sampling process. Faulty (non-identical) delimitation of increments is often the cause of severe sampling errors. IEE concerns the actual filling of the sampling tool.
Physically, according to the requirements of TOS, all particles
with their center of gravity inside the boundaries of the
sampling tool must be extracted with the increment. The incorrect preparation error (IPE) relates to all forms of alteration of the increment after it has been extracted. This could be
loss of material (e.g. fines or moisture), pollution, or other
alterations of the sample material.
These incorrect sampling errors should all be eliminated instead of estimated [1–6].
In principle, there is no trade-off or bargaining with incorrect sampling errors, but in practice one can investigate to which degree a specific sampling scheme deviates from a fully representative process for a specific type of material, and whether or not to accept structural errors from a practical point of view. If this is contemplated, the absolute minimum argumentation must include a quantitative assessment of the magnitude of all sampling and analytical errors!
All this, and much more, can be achieved in one single
operation, establishing a variographic heterogeneity characterisation of the 1-D process or material body in question.
2.3. 1-D sampling: the variogram
The true concentration of a particular component in a material stream (pipeline, conveyor belt, product series) is never
known. The conceptual series of successive concentrations of


component A along this single dimension is in theory a continuous function of time, a(t). For practical purposes, a(t) is always estimated by extracting and analyzing discrete increments at several points in time, i.e. as a discrete function of time. The sampling frequency adopted determines the resolution of the time series a(t) and hence the general perception of the process: the sampling frequency limits our knowledge of how fast the process varies or fluctuates. One should always pose the question: what is the optimal sampling frequency? There can never be a unique answer to this question, but the variogram provides a very informative answer by quantifying autocorrelation as a function of distance between the sampling points (time lags, or just lags).
The variogram is defined by the following master equation:

\[ v(j) = \frac{1}{2(Q-j)} \sum_{q=1}^{Q-j} \left( h_{q+j} - h_q \right)^2 = \frac{1}{2(Q-j)\,a_L^2} \sum_{q=1}^{Q-j} \left( a_{q+j} - a_q \right)^2 \]

Q is the number of equidistantly distributed increments in the time series a(t); j is the lag, a dimensionless parameter reflecting the distance between two increments. Samples are characterized either by their analytical concentrations, aq, or by their relative heterogeneity contributions, hq, as defined in the basic literature [1–6]; for the present purpose it is the relative
variations which matter most. Variograms have many
different appearances, but three important features are the
sill, the range, and the nugget effect, see Fig. 1.
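As an illustration of the master equation, the following Python sketch computes the experimental variogram of an equidistant 1-D series; the synthetic data series, the equal-mass proxy for hq and the Q/2 maximum-lag convention are illustrative assumptions on our part.

```python
import numpy as np

def variogram(h, max_lag=None):
    """Experimental variogram of an equidistant 1-D series h_q, per the
    master equation: v(j) = 1/(2(Q-j)) * sum_{q=1}^{Q-j} (h_{q+j} - h_q)^2."""
    h = np.asarray(h, dtype=float)
    q = len(h)
    max_lag = max_lag or q // 2          # lags beyond Q/2 rest on few pairs
    lags = np.arange(1, max_lag + 1)
    v = np.array([np.mean((h[j:] - h[:-j]) ** 2) / 2.0 for j in lags])
    return lags, v

# Synthetic 60-sample process series; h_q here assumes equal increment masses
rng = np.random.default_rng(1)
a = 25 + np.cumsum(rng.normal(0.0, 0.2, 60))
h = (a - a.mean()) / a.mean()            # relative heterogeneity contributions
lags, v = variogram(h)
```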
The sill represents the average overall variance of the heterogeneity, s²(hq). It is an important feature, because it represents the maximum variation within a time series and is an indicator of spatial randomness, meaning that when the variogram converges towards the sill, points in the time
series a(t) are no longer correlated. The range is that part of the variogram for which the variogram function v(j) lies below the sill. For lags within the range, the smaller the lag (and hence the smaller v(j)), the more strongly auto-correlated the increments are. This implies that sampling with these increment distances will pick up variation in the process with increasing reliability (as the lag distance goes down), a feature which enables a concomitant measure of monitoring control of the process.

Fig. 1. Generic variogram v(j), illustrating the three key parameters: the nugget effect, the range and the sill. For this particular variogram, the nugget effect = 3.5, the sill = 13.5 and the range = 10 lags.

The nugget effect is estimated by extrapolating the variogram backwards to v(0). A time lag equal to zero has no physical meaning in itself, except that two samples should be extracted at the same time; they can never, however, represent true replicates of the (same) volume. Samples with identical extraction time collapse the 1-D sampling situation to a 0-D situation, for which the nugget effect reflects the smallest error made by sampling twice in the same material at the same localization (in reality: back-to-back increments/samples). The nugget effect is therefore appropriately also called the variance of the minimal practical error, \(s^2_{MPE}\), for 1-D sampling.
The nugget effect is a sum of all variances in the 0-D sampling situation, correct and incorrect sampling errors as well as the total analytical error [1–6]. The practical implication of the nugget effect is that a 1-D sampling experiment will not only result in an optimal sampling scheme based on autocorrelation in the process, but will also result in a reliable estimate of the variance of the minimal practical error, \(s^2_{MPE}\).
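A corresponding back-extrapolation sketch, in the spirit of the 0-D vs. 1-D decomposition of Fig. 12; the linear fit through the first five lags and the plateau-average sill estimate are simple illustrative choices on our part, not the algorithm of any particular variography package.

```python
import numpy as np

def nugget_and_sill(lags, v, k=5):
    """Estimate the nugget effect s^2_MPE by linear back-extrapolation of the
    first k variogram points to lag 0, and a crude sill from the plateau
    (upper half of the lags); returns (nugget, sill, 1-D process variance)."""
    _, intercept = np.polyfit(lags[:k], v[:k], 1)
    nugget = max(float(intercept), 0.0)     # v(0); clipped at zero
    sill = float(np.mean(v[len(v) // 2:]))  # plateau level
    return nugget, sill, sill - nugget      # process variance = sill - nugget
```

With lags and v from the variogram sketch above, nugget_and_sill(lags, v) yields the \(s^2_{MPE}\) vs. process-variance split discussed here and depicted in Fig. 12.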
The historical TOS literature [1,2], as well as the recent didactic exposé by Petersen and Esbensen [4], treats process sampling and variography in full conceptual and theoretical detail. Below we shall treat practical process sampling exclusively.
3. Experimental
This section makes use of industrial, technological and
other real-world data sets and gives a description of the
sampling and analytical setup that supports a variographic
evaluation and estimation of the total sampling error for each.
The latent information in process data series, brought about by a careful problem-dependent interpretation of the corresponding variograms (known as variographic analysis, or variography), will accompany all data sets. After presentation
and analysis of these data sets (upper panels show the original
data series in raw units as well as in heterogeneity contributions) and their corresponding variograms (lower left panels),
they are all looked at again, now allowing a comprehensive
understanding of the systematics of estimation of TSE, the
total sampling error.
Practical estimation of TSE is extremely easy, as it follows
directly from the same basic calculations attending the variogram. It is possible, at no additional sampling or analytical cost, to simulate all conceivable sampling schemes that might be contemplated for improvement of a current sampling
process. All process sampling schemes are fully characterized by two parameters only: the sampling rate, rsamp, and the number of increments per sample, Q, one is willing to employ
in order to reduce TSE. A computer program, VARIO, has been
designed so as to perform the fundamental variogram calculations as well as to furnish a platform for evaluating optional
combinations of Q and sampling rate with the resulting TSE.
For practical reasons the sampling rate is often quantified by the
inversely related lag distance.
From a didactic point-of-view, it proves highly advantageous first to present the specific data sets and their variographic interpretations; Section 4 then summarizes their specific TSE-estimation issues.

Fig. 2. Aquarium lot, loaded with a mixture system of 25% yellow pellets (analyte), shown at a stage of 2/3 complete emptying. Also shown, the three scoop sizes: 5 ml, 25 ml and 100 ml, involved in the experiments used below for variographic analysis. Pre-mixing left no visible concentration trends while still retaining a significant spatial heterogeneity. Note relative size of lot and sampling increment scoops relative to individual pellets.


3.1. The aquarium experiments
In 2003 ACABS carried out an extensive didactic experimental campaign, designed to illustrate major sampling factors and their influence on the total sampling error and its breakdown into FSE, GSE, ISE and CSE, amongst others: grab sampling vs. composite sampling, increment size vs. lot size, as well as the fundamental influence on sampling variation related to the concentration level(s) of the analyte. All experiments were made according to an identical scheme: a model lot (a plexiglass aquarium, hence the naming of these experiments) was always emptied completely, scoop by scoop.
Fig. 2 shows the three increment scoop sizes involved in the
present selection of experiments. The model lot was filled with
a system of differently coloured polymer pellets (yellow:
analyte and black: matrix (also often called gangue in TOS).
Each scoop extracted was analysed by separating the yellow
pellet fraction from the black, followed by weighing. Using a
precise laboratory scale for this weighing, the resulting TAE
was essentially nil, allowing the full influence of the sampling
error components to be manifested as clearly as possible. Only
a small part of the total results of the campaign are used here
for variographic analysis (the complete experiments are
presented elsewhere).
Two different concentration levels are investigated here:
0.1% and 25% (w/w). These mixture systems were sampled
completely using three scoop sizes: 5 ml, 25 ml and 100 ml
respectively (realistically related to the total volume of the
model lot). The total weight of mixture material in the container
was 4.000 kg, corresponding to a volume of approximately
3500 ml. For both the trace concentration, 0.1%, and the major
level concentration, 25%, each loading of the model lot was
performed as a pre-mixing of the yellow and black pellet
fractions in the aquarium. Mixing was effective, but not complete (i.e. not reaching a complete minimum residual heterogeneity), as it was desired to illustrate sampling from significantly heterogeneous systems; mixing was therefore performed only until visual inspection did not reveal any systematic trends. Fig. 2 shows a 2/3 emptied model lot (concentration 25%).

Fig. 3. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the mixture system with 0.1% analyte concentration sampled by a 5 ml scoop.
Nominal concentration              25%                       0.1%
Nominal scoop size [ml]            100     25      5         100     25      5
Realised scoop avg. (gram)         90.89   24.26   4.98      95.24   27.02   6.23
Scoop std. dev. (gram)             11.55   2.24    0.55      11.51   2.56    0.65
Realised concentration avg. (%)    25.14   25.05   25.01     0.10    0.10    0.10
Concentration std. dev. (%)        2.30    2.25    3.098     0.04    0.08    0.15

3.1.1. Concentration 0.1% – 5 ml increment size – 642 scoops in total
Fig. 3 displays the original data series pertaining to all 642 individual analytical concentration determinations (shown both as absolute concentrations and expressed in heterogeneity contribution units; top left and right panels, respectively). The lower left panel shows the variogram
stemming from the data series above (left panel); the right
panel shows the estimated total sampling error, TSE, as derived
from the variogram (more of which in later sections).
Fig. 3 shows a very distinct flat variogram, indicating that
the original data are without trend. The numerical estimate of
the sill variance is approximately 2.0, which can be read off
directly as the intercept of the variogram with the v(j) axis. This estimate will be compared with those of the related systems, also analysed below.

3.1.2. Concentration 0.1% – 25 ml increment size – 148 scoops in total
The heterogeneous mixture system with identical analyte concentration (0.1%), when sampled by a five times larger (25 ml) scoop, reveals a sill level estimate of approximately 0.6, compared to 2.0 for the smaller (5 ml) scoop, Fig. 4. By increasing the scoop size five times, a reduction in overall sampling variance by a factor of three is obtained. The variogram is again completely flat, with no range at all; there are no clear trends in the data series produced by the complete successive emptying of the model lot (perhaps a very weak increasing tendency towards the end?).
3.1.3. Concentration 0.1% – 100 ml increment size – 42 scoops in total
By way of contrast, a 100 ml scoop is now used, resulting in
only 42 scoops before the lot is completely emptied. A further
reduction in sampling variance results, by a factor of 6
compared to the 25 ml scoop above, and by a factor of 20,
compared to the 5 ml scoop, see Fig. 5 below. These constitute substantial reductions in the sampling-process-induced variances. The point here is that this complete emptying reveals neither any trend nor any range.
3.1.3.1. Variography lesson I. The level of the sill of the variogram carries essential information about the sampling variance experienced in repeating the unit sampling operation with differently sized increments, witnessed by the progressive reduction of the overall level of the sampling variance: 2.0, 0.6 and 0.09 respectively. Variography of all three trace concentration systems

Fig. 4. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 0.1% analyte concentration sampled by a 25 ml scoop.


Fig. 5. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 0.1% analyte concentration sampled by a 100 ml scoop.

Fig. 6. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 25% analyte concentration sampled by a 5 ml scoop. A significant trend is present within the range: approximately 250 lags.


Fig. 7. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 25% analyte concentration sampled by a 100 ml scoop. A significant trend is present within the range: approximately 20 lags.

Fig. 8. VARIO result displaying raw data (top left), heterogeneity contributions (top right), experimental variogram (lower left) and TSE simulation (lower right) for the
mixture system with 25% analyte concentration sampled by a 100 ml scoop. Outlier No. 17 removed, compare to Fig. 7.


revealed no trends in the spatial (3-D) constitution of the entire lot,


evidenced by there being no ranges or trends in the variograms.
3.1.4. Concentration 25% – 5 ml increment size – 803 scoops in total
For a typical major concentration of 25%, the material system no longer behaves in the same manner, as it is obviously no
longer in the trace concentration domain. It is generally assumed
that sampling should be much easier at such a compositionally
dominating concentration level (less heterogeneity is the
expectation often met with). The 25% system was mixed exactly as for the trace concentration system, compare Fig. 6.
The overall sampling variance, the sill, is now approximately 0.011, but most importantly, a distinct trend is observed in the variogram. The trend is present out to a range of approximately 250 lags. Inspecting the raw data series (upper left panel) reveals a statistically distinct, increasing trend almost exclusively over the last 250–300 scoops, for which an average level of 23–24% increases towards an ultimate concentration of >35% (admittedly only for the very, very last few scoops). Both the raw data series and the variogram thus pick up a distinct spatial (3-D) vertical trend in the lot, which is however heavily disguised by a significant stochastic error in the individual 5 ml analytical results: despite the high concentration level, there is nevertheless a highly significant fundamental sampling error (FSE) present when sampling with such a small increment volume as 5 ml.
The major issue here is that FSE (together with GSE and
TAE) is isolated, and subtracted, into the nugget effect (MPE),
so prominently depicted in the lower left variogram panel of
Fig. 6. It can be concluded that the effective mixing of this high-concentration system was anything but effective (sic).
3.1.5. Concentration 25% – 100 ml increment size – 44 scoops in total
When sampled by the largest scoop, 100 ml, the trend in the system is still present; in fact it is distinctly clearer. The variogram in Fig. 7 shows an obvious trend within a range of approximately 20 lags or so (the results for the 25 ml scoop also revealed this trend; not shown here). This is perhaps counterintuitive at first thought, but the underlying cause of this trend is related to the filling-and-mixing efficacy in the lot only, and is thus present to a smaller or larger degree (or not at all) solely as a function of the ineffectiveness of hand-mixing.
Comparing Figs. 6 and 7 it is clear that the general trend of a variogram can be superposed by quite a fraction of scatter in the disposition of the individual (j, v(j)) data points, most clearly seen in the sparse variogram delineation in Fig. 7. It is the underlying trend which carries the message: in natural and industrial systems one should always be prepared to deal with any level of significant scatter when assessing the sill level, the range and the nugget effect of the variogram. Figs. 6 and 7 constitute a case in point: a very clearly defined (low-noise) trend/range vs. a highly noisy ditto.
In Fig. 7 one observes a very weak possible outlier (or
perhaps an extreme value, which may be omitted for data
analytical purposes), increment No. 17. As an introductory didactic illustration of the influence outliers may have on the variogram, we also present results when this (very minor) outlier has been discarded, Fig. 8.
Comparison reveals that the only effect of the removal of a marginal outlier is a slightly better expressed flat variogram for the important smallest lags (the first half of the lag series). We shall see
more adverse effects from significant outliers further below.
3.1.5.1. Variography lesson II. The trend of the variogram
carries essential information about the range of important
trending in the original data series. Variography of the two
selected 25% systems revealed both significant trends as well as
ranges which could be related to the problem context from
which the data series originate. Small outliers do not influence
the variogram overly.
3.2. Ribe Biogas plant: process monitoring
The Ribe Biogas plant was commissioned in August 1990. For a decade it has been the largest agricultural manure and food waste-based anaerobic digestion plant of its kind in northern Europe. It is based on three bioreactors (each of volume 1800 m³) of the CSTR digester type (Continuously Stirred Tank Reactor; the fermentation temperature is kept constant at 53 °C), Fig. 9. Feeding strategies have to cope with biomasses which by nature are very heterogeneous. A conservative feeding strategy has been running for 15 years, preferentially opting for a proven formula for mixing three very different types of raw material: cow and pig manure (in huge quantities) as well as organic industrial waste. The daily biogas production is of the order of 15,000–18,000 m³, with fluctuations kept as low as possible by the feeding management routines.
Process monitoring and laboratory analysis of routine
samples from the anaerobic digestion plant is performed daily
as part of a systematic quality assurance program. Several
parameters are analyzed in the raw materials and the product
gas. Here we make use of the most important product parameter,

Fig. 9. Ribe Biogas plant, Denmark, the core of which consists of three 1800 m³ fermentation reactors. Feeding large fermentors (diameter: 8 m; height: 22 m) is a delicate matter, in which optimal pre-mixing of the different raw material composition end-members has a critical influence on the efficiency of the biogas process.


the methane (CH4) yield, and one pollutant of overall importance: hydrogen sulphide (H2S). Because of the slow anaerobic bioconversion process (retention time is typically of the order of 2–3 weeks in digestion plants like this), daily measurements of the essential process analytical parameters are often sufficient. Focus for biogas plant monitoring is on the daily/weekly/monthly levels and trends above all.
The biogas product yield and the pollutant H2S are not correlated to one another, Fig. 10, for which reason variographic analysis must be undertaken for each parameter separately.
3.2.1. Monitoring of H2S (ppm) in produced biogas, January to
March 2005 (90 samples)
H2S is considered a serious pollutant in produced biogas. In the presence of even minute amounts of water (vapour or liquid), and there is an abundance of this in every biogas system, sulphuric acid is produced, which does not go well with any metallic parts of the costly combined heat and power generation systems utilizing the produced biogas, nor with any pipeline or other instrument in contact with the biogas, for obvious reasons. The concentration of H2S in the produced gas is therefore monitored closely. Fig. 11 shows a three month data series.
From the science of geostatistics, inducted into TOS [1,2], it
is well-known that variographic analysis depends on the essential prerequisite that the data series analysed be stationary, i.e. that whatever variability is present in the series is assumed to be relative to some form of overall, general average level; Fig. 11 constitutes a particularly good example of compliance with this prerequisite. It is therefore possible to get the maximum of information out of the variographic analysis.
The variogram in the lower left panel in Fig. 11 shows a composite nature, with a clear sill and range (approximately 14 days), but most interestingly in this case, a very pronounced cyclicity of exactly 7 days, superposed on the generic variogram form shown in Fig. 1. It is abundantly clear that this cyclicity is very much more discernible in the variogram than in the original data series, because of the 0-D error subtraction into the nugget effect (MPE), Fig. 12.
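Since such a weekly cycle expresses itself as local minima of v(j) at j = 7, 14, 21, and so on, a crude heuristic for flagging cyclicities automatically can be sketched as follows (plain neighbour comparison without smoothing; purely our illustration, not a VARIO feature):

```python
import numpy as np

def candidate_periods(lags, v):
    """Crude cyclicity detector: local minima of the variogram recur at
    multiples of the cycle length (e.g. j = 7, 14, 21 for a weekly cycle).
    Returns the spacings between detected minima as candidate periods."""
    i = np.arange(1, len(v) - 1)
    is_min = (v[i] < v[i - 1]) & (v[i] < v[i + 1])
    minima = lags[i[is_min]]
    return np.diff(minima) if minima.size > 1 else minima
```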
3.2.1.1. Interpretation I. The exact 7-day period is related to the fact that during weekends the capacity for a constant pre-mix of the loaded raw feedstock is restricted. Even though all storage capacities are full by Friday evening, there is a lack of concentrated industrial waste being mixed into the bioreactors by the end of the weekend, most pronounced on Sundays. As a consequence, the biogas raw material changes its profile, experiencing reductions in the amount of industrial waste containing iron chloride. Iron chloride in the organic industrial waste reduces the H2S content. The result is a systematic H2S drop centred on the weekends, superposed by a smaller or larger effect from the various degrees of decomposition propensity of the various other types of waste material in the biomass makeup. The raw H2S concentration series shows this combined regular and stochastic effect very clearly indeed.

Fig. 10. Scatter plot of CH4 and H2S revealing no significant correlation.
3.2.2. Monitoring of the daily biogas yield: CH4 (m³), January to March 2005 (90 samples)
Variographic analysis is fatally influenced by significant outliers, as is very well illustrated by the sequence of Figs. 13–15, which differ only by the sequential exclusion of the two prominent outliers, increments No. 2 and 56 respectively.
The variogram in Fig. 13 is unintelligible. It corresponds to
no known type of proper variogram ever encountered [1–4].
Deletion of increment No. 2 leads to the variogram depicted in
Fig. 14.
The difference in the resulting variogram could not be any
more contrasting. The variogram now reveals one, maybe two
cyclicities superposed on a somewhat unclear sill and range, characterised by a range of 6–8 days(?). Rather than jumping
into any interpretations however, the next outlier has to be
removed first, leading to the true underlying variogram depicted
in Fig. 15.
The resulting variogram is now simplicity itself as compared
to its two predecessors: the nugget effect shows a drastic
reduction (0.40, 0.006 and 0.001 respectively), as does the sill
(s), but most importantly, the true underlying variogram reveals
a hitherto unknown periodicity of some 20 + / 2 days
duration. This feature does not correspond to any known
periodicity in the internal bioconversion process design or
operation (known from more than 15 years of successful
operation of this particular plant). Detailed interpretation of this
feature will illustrate the essential problem-dependent nature of
variography and the derived new process insight.
3.2.2.1. Interpretation II. The biomass feedstock composition consists roughly of 60% cow manure, 20% pig manure and
20% organic industrial food processing waste. All these types
are distinctly heterogeneous. The newfound periodicity could conceivably either represent a long-term variation in the composition of the food waste (stemming from a production cycle of 3 weeks), or, considered more likely here, it may reflect unrecognized systematics in the logistics behind the in-transportation of animal manure: some farms deliver raw manure on a weekly basis, while larger ones use a 3-week basis. Manure from these larger farms tends to be slightly more diluted. Hitherto unrecognized transportation patterns of hauling higher proportions of pig manure vs. cow manure may also impact the newfound periodic change in biogas production observed, along with the possibility of influence from variations in the total solids content of manure from various farms.
Irrespective of the detailed explanations eventually laid bare,
the effect of the variographic analysis has already resulted in a new focus for the operation of the biogas plant: instead of focusing on narrow, day-to-day economic fuel minimization for the delivery tankers via a conventional traveling salesman optimization, focus can now advantageously be transferred to devising new, more equalizing transportation logistics, actively making use of these newfound patterns, so as to help optimize a better pre-mixing of the types of manure collected and delivered to the plant, with the effect of a more constant overall raw material composition. This constitutes a new, potentially more profitable operational realm for this low-tech, bulk industrial producer, all because of a simple, outlier-policed variogram (sic).

Fig. 11. VARIO results of H2S monitoring in produced biogas at Ribe Biogas plant. This three month data series is stationary (upper two panels), with no outliers, allowing maximally informative variographic analysis. Note the highly distinct 7-day periodicity in the variogram (lower left).
3.2.2.2. Variography lesson III. Outliers, defined here as extreme or very irregular (missing or otherwise bona fide unreliable) increments, influence all variograms significantly, sometimes fatally; outliers should be delineated and deleted sequentially. Significant outliers mask the underlying true variogram to serious degrees. No variogram interpretation should be carried out before complete outlier removal has been secured.
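A minimal sketch of such sequential outlier policing, assuming a robust z-score criterion based on median/MAD; the threshold and the cap on deletions are illustrative choices, and in a real variographic data series the deleted increments leave gaps that must be treated as missing values to preserve equidistance.

```python
import numpy as np

def police_outliers(a, z_max=4.0, max_removed=5):
    """Delete the single most extreme point per pass (robust z-score via
    median/MAD) until none exceeds z_max, mirroring the sequential deletion
    advised above; returns the cleaned series and the deleted indices."""
    a = np.asarray(a, dtype=float)
    keep = np.ones(a.size, dtype=bool)
    for _ in range(max_removed):
        x = a[keep]
        mad = np.median(np.abs(x - np.median(x)))
        z = np.abs(a - np.median(x)) / (1.4826 * mad + 1e-12)
        z[~keep] = 0.0                       # already-deleted points are ignored
        worst = int(np.argmax(z))
        if z[worst] <= z_max:
            break
        keep[worst] = False                  # one outlier per pass, then re-assess
    return a[keep], np.flatnonzero(~keep)
```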
The periodicity of a variogram carries essential information
about the underlying process characterised by the data series.
Variography of the two selected biogas parameters revealed
cyclicities with very different expression and distinctly
different causes fully understood only when detailed
interpretation of the variogram information was undertaken in
the proper problem context. Reasons and causes behind
periodicities were knownor inferredin the present context.
This demonstrates how detailed interpretation of the potential

information present in a variogram leads to increased understanding of the process being analysed. Unrecognised
periodicities may easily have significant negative economic
consequences.
3.3. Process analysis proper (commodity prices)
3.3.1. Daily end-of-trade Zn prices
This data set (downloaded from the public domain) represents a sector in which variographic analysis plays an important, growing role: the financial sector. Any commercial
company with a significant interest in metallic raw materials or
products, or similar materials traded and quoted on the
international commodities exchanges, will be interested in

Fig. 12. Fundamental decomposition of the total sampling variance, as represented by v(j), into the nugget effect, also termed the minimal practical error (MPE), which represents all 0-D error sources involved (including TAE), and the true process variance, the 1-D process variation.


Fig. 13. VARIO result for biogas (CH4) production. The total data set includes two significant outliers, dramatically revealed in the serial data plots (upper two panels).

analyzing (and thereby hopefully more fully understanding) the time-dynamic fluctuations involved. Fig. 16 shows, as an archetypal example from this arena, the daily end-of-trade prices of zinc (Zn) in US$ per ton for a full five-year period.

As is well-known, data types like stock exchange share prices,


commodity prices and similar economic forecast data, are
overwhelmingly characterised by trends and/or cyclicities. Before
variographic analysis, therefore, de-trending is often applied.

Fig. 14. Identical data series as in Fig. 13, with the sole exception of original increment No. 2 deleted.


Fig. 15. Identical data series as in Fig. 14 with the sole exception of original increment No. 56 also deleted. Note the complete transformation of the variograms in Figs. 13–15.

In financial circles it is strongly believed that analysis of the specifics of the trending, periodic or chaotic behaviour of such data series, termed chartism, may lead to valuable (in the most original sense!) insight into the future behaviour of prices, which

may be cashed in (again in the most original sense) by


judicious futures trading, hedging etc. Understandably, the
specific elements of such analysis are proprietary and not in the
public domain but it is certainly no coincidence, for example,

Fig. 16. VARIO results for full 5-year Zn-price data set. This variogram is for reference only.


Fig. 17. VARIO results from the Zn2 data set restricted to prices <900 US$/ton, i.e. a bona fide stationary data series.

that the prestigious Paris-based French School of Mines boasts a


department for economic analysis in which geostatistics plays a central role (geostatistics and variography are very closely related, historically and methodologically [1,2,7]).
The objective of the analysis of this data set is here simply to illustrate what can be gained from a variographic decomposition (and interpretation) in which the process relationships behind the data series are not known with certainty, but are tenuous at best, if existing at all:
Each daily end-of-trade price quotation is the result of a multitude of individual assessments of the likely future of Zn-prices. At the outset there is very little direct interaction between actors in the trading field, but there will be a market effect (follow-the-leader) as soon as trending prices become established and noticeable, at which point a self-amplifying process sets in. This may influence prices substantially and rapidly (sometimes inflationarily) in both positive and negative directions, although the latter can also quickly lead to catastrophic collapse. Much more can be said, but the point here is not to give a first lesson in capitalistic market analysis and market force understanding, if such exists, but only to introduce a type of data in which the underlying driving force is not known with certainty (market specialists' and financial analysts' claims notwithstanding). For this data type there is a tacit understanding that each datum carries an identical statistical mass.
A variographic analysis of such data will be an objective,
neutral description only (but anyone is free to speculate and

interpret the results, of course). Our main interest in such data analysis will be to isolate the nugget effect (MPE) and, if possible, to interpret the meaning of this as well as of the other variogram parameters in this particular setting.
The major trends in the Zn-data cover a dramatic large-scale increase at the end of the 5 years in question. From what has transpired above, it would not be of interest to include these mega-scale trends in the variographic analysis. One way to get rid of annoying trends is to prune the data series into a set of stationary segments; proper de-trending shall also be illustrated below. We shall first limit the variographic analysis to two subsets, both with approximately stationary characteristics. Data sets Zn1 and Zn2 are defined by prices <1300 US$/ton and <900 US$/ton respectively.
3.3.1.1. Interpretation. There is a remarkable evolution across the series of three variograms above. While the full data set variogram, as well as that for the <1300 US$/ton data set, are in reality nothing but one mega-trend in themselves (and for all of 750 and 600+ lags, at that), for the central de-trended data series shown in Fig. 17 detailed insight might now be gained into the stationary market volatility relationships (important parameters in economic analysis). Depending on the level of one's market analysis competence, one could note that there would appear to be a rather distinct 30-day range, and nothing
much more (sic): the Zn-market experienced market tranquillity in this time period. If one is interested in pushing this particular envelope to the absolute limit, one might also entertain notions regarding possible superposed periodicities of


Fig. 18. VARIO results from the Zn1 data set for prices <1300 US$/ton, still influenced by too much trending.

both 100 days and 150 days; and why not: this issue is truly open for speculation.
From a stricter scientific data analysis point-of-view, there is however one new feature not yet encountered in any variogram above: there is no nugget effect at all! This would be a very

rare situation for most physical sampling scenarios, although


not totally unheard of. The TOS-specific meaning is clear: this
data set does not encompass any TAE, FSE, GSE nor ISE. Upon
reflection this must be related to the specific nature of these
data: end-of-day trading prices do not experience any uncertainty; the price was fixed with total certainty at the daily closing of the commodity trading session.

Fig. 19. Brent oil prices in the period January 1986 to September 2004.

Fig. 20. Brent oil prices: the largest coherent stationary segment, cropped from Fig. 19.
The proper variogram speaks very directly: here be only
process variation, Fig. 17. It is instructive to compare with the
trend-influenced variograms, Figs. 16 and 18.
3.3.2. Oil: Europe Brent spot price FOB
A second illustration pertains to a similar variographic
analysis of the daily spot price for North Sea Brent crude oil,
which serves as a benchmark commodity.
These two oil price data series and variograms, one apparently suffering both from a prominent outlier peak as well as from potentially debilitating severe trends (Fig. 19), the other carefully pruned, nevertheless show practically identical features, undoubtedly of trading interest, Fig. 20: there is a very clearly expressed macro-scale periodicity (period: approximately 800 days) superposed on a zero-range (again no nugget effect) trending variogram showing signs of a decreasing trend slope at very large lags. These features are, of course, most pronounced for the pruned data set, but may nevertheless also be gleaned from judicious analysis of the total data set, albeit only with considerable experience. Detailed interpretation of the meaning of such complex relationships belongs to the geostatistical realm.
A comprehensive, authoritative reference for complex variogram analysis is Gringarten and Deutsch [8].
3.3.3. PITARD data: Flotation plant rougher feed
The last demonstration data set stems from the comprehensive TOS textbook by Pitard [2], used here to highlight the value
of de-trending. While the raw data set shows a clear increasing

trend, perhaps to a degree threatening to debilitate variographic analysis (Fig. 21, shown mainly for comparison purposes), the de-trended data set in Fig. 22 reveals with all possible clarity the TOS strength for bringing forth hidden data structures, in this case a hidden periodicity. Thus the de-trended variogram delineates a quasi-cyclic behaviour with a period of 18–19 days, impossible to discern in the original data. While this periodicity is also present in the variogram of the original data, as a much less pronounced feature (it may be identified by judicious analysis of the raw variogram, but again only with considerable experience), there is no doubt whatsoever regarding the markedly distinctive periodic variogram in Fig. 22.
One legitimate concern could be that too trigger-happy de-trending may perhaps endanger full understanding of the behaviour of the original data series. This worry can be fully alleviated by always treating both the original series as well as the de-trended version, as exemplified by Figs. 21 and 22 above. If need be, the original data series may also be subjected to an appropriate time-series analysis (the objective of time-series analysis is quite distinct from that of variography, however: time-series analysis is primarily concerned with behaviour that can be fully modelled by a sum of periodic functions, without specific regard to the errors (MPE)).
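A minimal linear de-trending sketch of the kind underlying Fig. 22; the first-order least-squares fit and the restoration of the series mean are our illustrative assumptions, not a prescription from the original analysis.

```python
import numpy as np

def detrend_linear(a):
    """Subtract the least-squares linear trend and add back the series mean,
    keeping the de-trended series on the original concentration scale."""
    a = np.asarray(a, dtype=float)
    t = np.arange(a.size)
    slope, intercept = np.polyfit(t, a, 1)
    return a - (slope * t + intercept) + a.mean()

# Compare variograms of the raw and de-trended series (cf. Figs. 21 vs. 22),
# e.g. using the variogram() sketch from Section 2.3.
```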
Pierre Gy placed great faith in the extraordinary didactic strength and clarity of what he termed chrono-statistical data analysis in his many works on process sampling and
variography [1,9], especially highlighted in his life-long
summary of TOS [10], which corresponds to the exemplifications given above. It is befitting to close the present process
sampling tutorial on this note.


Fig. 21. Pitard [2] gives several process data sets; shown here is the Flotation plant rougher feed data series, totaling 70 samples (p. 84). Compare with Pitard's
variogram on p. 94.

3.3.3.1. Variography lesson V. Significant trends (and/or


periodicities) in the original data series may be subjected to
appropriate time-series analysis before being pre-treated so as
to correspond with the prerequisite of a stationary data series.

Two options for this pre-treatment were shown in action (time-segmentation and de-trending) as effective approaches.
The power of variographic analysis was demonstrated with
full clarity: absolutely all pertinent structural and/or MPE

Fig. 22. Pitard's [2] Flotation plant rougher feed data series, only de-trended.


details will be revealed in the variogram; examples include a small, yet fully distinguishable MPE, and pervasive systematic but hidden periodicities, most clearly accentuated by problem-dependent de-trending. It is significant that detection and
quantification of MPE plays a crucial role in this endeavour.
4. Estimation of TSE: simulation of potential sampling schemes
Estimation of TSE, the total sampling error, is extremely
easy, as it follows directly from the same basic calculations
attending the variogram. It is possible, at no additional sampling or analytical cost, to simulate all conceivable sampling
schemes that might be contemplated for improvement of a
current sampling procedure. All VARIO output illustrations
above also contain a TSE-estimation facility (lower right panel
in all figures).
Any process sampling scheme is fully characterized by two
parameters only: the sampling rate, r, and the number of
increments per sample, Q, one is willing to employ in order to reduce the current TSE, if deemed too large. VARIO has been designed so as to evaluate all possible combinations of Q at sampling rate rsamp. For practical reasons the sampling rate is often quantified by the inversely related lag distance. For all VARIO result plots given above, the lower right panel has shown the estimated TSE (total sampling error) in absolute units, corresponding directly with the original data series, for a two-factor experimental design in rsamp and Q respectively.
For all examples a set of Q spanning 1, 2, 4, and 8 has been
used throughout (any appropriate alternative set of Q may be
specified by the user). This corresponds to substituting all individual samples hitherto considered by composite samples made up of alternatively 2, 4 and 8 increments (instead of just
one increment), i.e. making optimal use of the composite
sampling advantage also in the process analysis scenario; these
alternative samples are composed by the appropriate number of
increments centered on the individual sampling times/locations
corresponding to the sampling rate decided upon.
In addition, VARIO also allows for completely free
specification of the sampling rate, rsamp.
Together, these two optional features will allow simulation
of any potential sampling scheme that might be envisaged for a
particular process sampling situation. Based only on one bona
fide regular data set (preferentially 60 samples; absolute minimum 42 [11]), all possible sampling scenarios can be simulated
at no extra cost at all (assuming the variographic experiment has
been carried out in a TOS-correct fashion, naturally). This data
set must be as representative as possible of the prevalent process
behavior. It will always be a specific, problem-dependent challenge to select such a data set; local process knowledge and
experience are at a premium in any such undertaking, and are much more important than any standard statistical prescriptions.
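To sketch what such a simulation involves, the code below implements one standard error generator from the TOS literature for systematic selection, s²TSE(j, Q) ≈ (2w(j) − w′(j))/Q, with w(j) and w′(j) the first- and second-order average (auxiliary) variogram functions [1,2]; the trapezoid integration and the back-extrapolated v(0) are our own discrete approximations, not necessarily those implemented in VARIO.

```python
import numpy as np

def tse_grid(lags, v, q_values=(1, 2, 4, 8)):
    """Simulated TSE variance for systematic sampling at each candidate lag j,
    with Q increments per composite sample:
        s2_TSE(j, Q) ~ (2*w(j) - w'(j)) / Q,
    where w(j) = W(j)/j and w'(j) = 2*W'(j)/j^2, W and W' being running
    integrals of v and W (trapezoid estimates, with v(0) = nugget)."""
    v0 = max(float(np.polyfit(lags[:5], v[:5], 1)[1]), 0.0)   # nugget as v(0)
    j = np.concatenate(([0.0], np.asarray(lags, dtype=float)))
    vj = np.concatenate(([v0], np.asarray(v, dtype=float)))
    W = np.concatenate(([0.0], np.cumsum((vj[1:] + vj[:-1]) / 2 * np.diff(j))))
    Wp = np.concatenate(([0.0], np.cumsum((W[1:] + W[:-1]) / 2 * np.diff(j))))
    w = W[1:] / j[1:]
    wp = 2 * Wp[1:] / j[1:] ** 2
    per_increment = np.maximum(2 * w - wp, 0.0)  # Q = 1; clipped for safety
    return {q: per_increment / q for q in q_values}
```

For example, tse_grid(lags, v)[4] gives, for each candidate lag, the simulated TSE variance of a 4-increment composite sampling scheme, i.e. one row of the lower-right VARIO panel.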
4.1. Value of TSE-estimation plots
It is easy to find an optimal sampling strategy by inspecting the plot of TSE as a function of Q and rsamp. It will be very instructive to review the TSE-plots of all VARIO outputs above and thus to follow how to evaluate any alternative sampling scheme. The best approach is to inspect the four corners of the TSE-panel and decide which direction provides the steepest decrease in TSE. This will either be by use of a higher sampling rate rsamp, or by using more increments Q to make up the final sample. If both factors appear equally influential, any combination may be chosen that also accommodates external, especially economic or practical, constraints. Often, especially when using automated equipment, it is easier to include more increments in a composite sample than to increase the sampling rate. However, this is heavily dependent on the sampling situation at hand. The user may invoke his/her own sampling creativity, pitted against the realities of economic, practical or other constraints. All TSE simulations run in real time; no extra sampling, nor any laborious computation, is necessary.
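The same corner-inspection logic can be automated once a cost model is attached to the TSE grid. The sketch below is purely illustrative: the cost coefficients, the TSE target and the function name are all hypothetical assumptions, not part of the paper's method:

# Hypothetical cost model: taking n/lag samples of q increments each is
# assumed to cost (n / lag) * (c_sample + q * c_incr), in arbitrary units.
def cheapest_scheme(tse_grid, tse_target, n, c_sample=10.0, c_incr=1.0):
    """Among simulated (lag, q) schemes meeting the TSE target,
    return the one with the lowest assumed operating cost."""
    costs = {(lag, q): (n / lag) * (c_sample + q * c_incr)
             for (lag, q), tse in tse_grid.items() if tse <= tse_target}
    return min(costs, key=costs.get) if costs else None

best = cheapest_scheme(tse_grid, tse_target=0.5, n=len(data))

In practice the economic and practical constraints are rarely this simple, which is precisely why the visual inspection advocated above remains the primary tool.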
5. Discussion and conclusions
The level of the sill of the variogram carries essential information regarding the sampling variance at different lags.
The trend of the variogram carries essential information
about the range of important trending in the original data series.
Outliers and extreme, or irregular, increments influence all variograms significantly, sometimes fatally. Outliers should always be deleted (perhaps sequentially). Significant outliers mask and modify the underlying true variogram to serious degrees. No variogram interpretation should be carried out before complete outlier removal has been secured. Small outliers do not influence the variogram overly; data analysts are obliged to develop the necessary experience with variography.
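The paper prescribes complete, possibly sequential, outlier removal but deliberately leaves the detector to the analyst's experience. One common robust screen is sketched below; the MAD-based z-score, the 3.5 cutoff and the neighbour-patching strategy are all our assumptions, not the authors' procedure. Patching, rather than physically deleting points, is a judgment call that preserves the equidistant lag structure the variogram requires:

import numpy as np

def screen_outliers(series, z_max=3.5):
    """Sequentially flag the single worst outlier by robust z-score
    (median/MAD based) and patch it with the mean of its neighbours,
    repeating until no score exceeds z_max."""
    h = np.asarray(series, dtype=float).copy()
    while True:
        med = np.median(h)
        mad = np.median(np.abs(h - med))
        if mad == 0.0:
            return h                       # degenerate series: nothing to score
        z = 0.6745 * np.abs(h - med) / mad
        worst = int(np.argmax(z))
        if z[worst] <= z_max:
            return h
        lo, hi = max(worst - 1, 0), min(worst + 1, len(h) - 1)
        h[worst] = 0.5 * (h[lo] + h[hi])   # local patch instead of deletion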
The periodicity of a variogram carries essential information about the underlying process dynamics and/or the 1-D data structure. For example, variography of two selected biogas parameters revealed cyclicities with very different expressions and distinctly different causes, fully understood only when interpretation of the variogram information was undertaken in the proper problem context. Detailed interpretation of the potential information present in a variogram leads to an increased process understanding. Unrecognised periodicities may easily have significant negative economic consequences.
Special data types, here illustrated by commodity price series, carry a feature not encountered in variograms pertaining to sampling of physical systems: no nugget effect. Such data sets do not encompass any TAE, FSE, GSE or ISE! It is conceivable that this data structure constitutes an important complementary case in 1-D sampling.
Significant trends in the original data series may often advantageously be pre-treated so as to comply with the geostatistical prerequisite of stationarity. Two options for effective pre-treatment before variographic analysis were illustrated: time-segmentation and de-trending proper, leading to a significantly improved interpretation platform, especially regarding more effective delineation of hidden periodicities, the latter most clearly accentuated by problem-dependent de-trending.
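As a minimal illustration of "de-trending proper", the sketch below subtracts an ordinary least-squares straight line; whether a linear model suffices is, as stressed above, problem-dependent, and the function name is ours:

import numpy as np

def detrend_linear(series):
    """Subtract an ordinary least-squares straight-line trend so the
    series better satisfies the stationarity prerequisite."""
    h = np.asarray(series, dtype=float)
    t = np.arange(len(h))
    slope, intercept = np.polyfit(t, h, 1)
    return h - (slope * t + intercept)

# Time-segmentation simply slices out quasi-stationary stretches and
# runs the variographic analysis per segment; the split points are
# problem-dependent, e.g. segments = [data[:150], data[150:]].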
Detailed interpretation of complex variogram relationships belongs to the realm of geostatistics. An authoritative reference for complex variogram analysis is Gringarten and Deutsch [8], which is highly recommended.
It has been shown how TOS-correct, i.e. representative, 1-D process sampling forms a critical prerequisite for a reliable variographic analysis. Variogram analysis reveals a multitude of 1-D or process data structures by a set of only three systematic parameters: the sill, the range and the nugget effect. Selected didactic examples of data series contexts, and their manifestation in the resulting variograms, were outlined, designed to illustrate the practical aspects of process sampling and estimation of the associated total sampling errors, TSE. It is possible to simulate any pertinent sampling procedure based only on the variographic experiment; no additional sampling is necessary. Estimation of TSE for all possible rsamp and Q options allows complete command of any proposed sampling situation.

In the context of the popular simple time-series plotting of a plethora of current Process Analytical Technologies (PAT) parameters with direct interpretation, e.g. SPC, multivariate SPC and others, it has become clear that process TOS can illuminate process state, process variability (heterogeneity) and process data structures with distinctly more objective interpretation power. Process TOS forms a missing link for PAT.

5.1. Process TOS vs. analytical chemistry/chemometrics

Process variographic analysis will always be able to decompose 0-D vs. 1-D variances. It is especially useful to be
able to quantify the specific MPE associated with an existing
sampling procedure or any contemplated alternative. All sampling errors are covered in the comprehensive TOS approach,
including the sum total analytical error.
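One concrete way to quantify MPE from the experimental variogram is back-extrapolation of the first few variogram points to lag zero, V(0). The sketch below assumes the lags and v arrays from the earlier variogram sketch; the choice of four points for the extrapolation is a convention of ours, not a prescription from the paper:

import numpy as np

def nugget_estimate(lags, v, n_points=4):
    """Estimate V(0), the MPE (all 0-D sampling plus analytical
    errors), by linear back-extrapolation of the first variogram
    points to lag zero; a variance cannot be negative, hence the
    clip at zero."""
    slope, intercept = np.polyfit(lags[:n_points], v[:n_points], 1)
    return max(intercept, 0.0)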
However, not all analytical and data analytical errors can be delineated and brought forward by process TOS. It is still necessary to be on guard against e.g. systematic (constant) analytical errors, which cannot be detected by process variographics alone; a varying systematic error, however, will be detected through its contribution to the sampling bias, i.e. by inflating MPE. All pertinent calibration and validation issues associated with analysis per se (chemical, physical, other) are still very much on the agenda.
It also bears mentioning that none of the traditional data analytical calibration and validation issues in chemometrics has any bearing on the issues discussed in this paper. Data analysis is the last step in the trinity sampling–analysis–data analysis/modelling, and as such cannot influence the presence, nor the magnitude, of any sampling errors. There is no way any form of bias correction can ever substitute for correct sampling; it is not even possible to estimate the magnitude of a sampling bias. Inaccurate sampling will always lead to a significant sampling bias, which can only be reduced by applying correct, i.e. representative, sampling procedures [1–10].
These issues have a profound bearing on the standing discussion within chemometrics of the merits pro et contra regarding cross-validation vs. test set validation; a first introduction to this issue was included in [12], augmented by in-depth backgrounds in [3–7,9,10]. In brief, there cannot be full information pertaining to the variability of sampling errors in one single data set alone (the calibration data set); this requires access also to a second, independent drawing from the population, in statistical and chemometric parlance the test set. Only by including the latter in the data analytical validation will all CSE + ISE be allowed their full impact, which is crucial for realistic prediction error estimation (to take the case of multivariate calibration). This is not the place for a full discussion of these issues, however.

6. VARIO

VARIO was programmed by Hans Henrik Friis-Pedersen. VARIO is freeware, available from the ACABS webpage (www.acabs.dk). The program allows importing of data of different types and performs basic statistical analysis in addition to variographic analysis; it is easy to use and works on large as well as small data sets. The present paper can be viewed as a user's guide to analysis and interpretation by VARIO; more information can be found on the homepage.

Acknowledgements

We thank two anonymous referees for their scholarly, penetrating and helpful reviews.

References

[1] P. Gy, Sampling for Analytical Purposes, Wiley, ISBN 0-471-97956-2, 1998.
[2] F.F. Pitard, Pierre Gy's Sampling Theory and Sampling Practice: Heterogeneity, Sampling Correctness, and Statistical Process Control, CRC Press, ISBN 0-8493-8917-8, 1993.
[3] L. Petersen, P. Minkkinen, K.H. Esbensen, Representative sampling for reliable data analysis: theory of sampling, Chemometrics and Intelligent Laboratory Systems 77 (2005) 261–277.
[4] L. Petersen, K.H. Esbensen, Representative process sampling for reliable data analysis – a tutorial, Journal of Chemometrics 19 (11–12) (2005) 625–647.
[5] L. Petersen, C.K. Dahl, K.H. Esbensen, Representative mass reduction in sampling – a critical survey of techniques and hardware, Chemometrics and Intelligent Laboratory Systems 74 (2004) 95–114.
[6] P.P. Mortensen, Process analytical chemistry – prospects and problems in biotechnological implementation, Ph.D. thesis, Aalborg University Esbjerg, 2006.
[7] D. Francois-Bongarcon, Theory of sampling and geostatistics: an intimate link, in: Esbensen, Minkkinen (Eds.), Proceedings First World Conference on Sampling and Blending (WCSB1), Chemometrics and Intelligent Laboratory Systems 74 (1) (2004) 143–148.
[8] E. Gringarten, C.V. Deutsch, Variogram interpretation and modeling, Mathematical Geology 33 (4) (2001) 507–534.
[9] P. Gy, Sampling of discrete materials: III. Quantitative approach – sampling of one-dimensional objects, in: Esbensen, Minkkinen (Eds.), Proceedings First World Conference on Sampling and Blending (WCSB1), Chemometrics and Intelligent Laboratory Systems 74 (1) (2004) 39–48.
[10] P. Gy, 50 years of sampling theory – a personal history, in: Esbensen, Minkkinen (Eds.), Proceedings First World Conference on Sampling and Blending (WCSB1), Chemometrics and Intelligent Laboratory Systems 74 (1) (2004) 49–60.
[11] D. Adams, The Hitchhiker's Guide to the Galaxy, Wing Books, New York, ISBN 0-517-14925-7, 1986 (chap. 27).
[12] K.H. Esbensen, Multivariate Data Analysis – In Practice: An Introduction to Multivariate Data Analysis and Experimental Design, 5th ed., CAMO AS Publ., ISBN 82-993330-2-4, 2001 (600 pp.).
