Application of principal component analysis and cluster analysis to mineral exploration and mine geology

Conference Paper, August 2015

AusIMM New Zealand Branch Annual Conference 2015

M.F. Gazley1,2, K.S. Collins2, J. Roberston1, B.R. Hines2, L.A. Fisher1 and A. McFarlane1
1 Mineral Resources Flagship, CSIRO, 26 Dick Perry Avenue, Perth, Western Australia
2 SGEES, Victoria University of Wellington, PO Box 600, Wellington, New Zealand

Abstract
Large datasets are routinely collected during mineral exploration and mining of ore bodies. These datasets often
reach a size, or complexity, that makes it difficult to visualise their structure, let alone convert this structure into
meaningful knowledge that is useful to the exploration or mining geologist. Geochemical data are typically
reported as compositions, which means a complete analysis must total 100%. This extra constraint introduces
correlations in the variance of the data which cannot be handled by normal univariate or bivariate statistical
methods. The use of log-ratio transforms can overcome this by converting the data into real number space, to
which standard statistical methods can then be applied. Applying log-ratio transforms to a dataset requires that
all elements of interest be quantified in all samples. Routines are available that impute, or substitute,
missing values such that spurious correlations are not introduced. Once cleaned and transformed, principal
components analysis (PCA) can be used to simplify geochemical datasets, allow interpretation of variance within
datasets, and reveal high-level data structures that cannot be detected using univariate or bivariate methods. Once
PCA (or another ordination) has been performed, cluster analysis can then be used to determine groupings of
samples with no a priori knowledge of their spatial or temporal relationships to each other. This paper presents
examples of the workflow outlined above from both New Zealand and Australian case studies to demonstrate the
usefulness of this approach to exploration and mining geologists, and to illustrate the geologically
meaningful interpretations that these methods can produce.

Keywords: principal components analysis, ordination, log-ratio transform, cluster analysis.

Introduction
In many geological settings, datasets are collected that comprise many (potentially millions of)
individual observations. In an exploration setting, these may take the form of soil or rock-chip
samples; in a mine setting they may be assays or multi-element geochemical drill-hole data,
but can also include categorical data such as geological logging (lithology, alteration,
structural observations). All of these types of data produce n-dimensional datasets that are
extremely difficult for a human to parse. Multivariate ordinations re-orientate these datasets in
a mathematically robust and consistent fashion such that the greatest variance can be observed
in two dimensions. This process is not without difficulties; this paper summarises some of the
challenges of implementation and the benefits of this approach, and presents three case studies.
Multivariate ordinations are readily applied either through geochemical exploration software
such as ioGAS or through packages such as R or Python.


Multivariate statistics
For multivariate statistics to be applied to geochemical datasets, all elements of interest in all
samples must have a numeric value. This can be achieved by excluding samples in which any
of the elements is not detected, but this will exclude samples with low concentrations
which, in some situations, may well be the samples of interest. Alternatively, if a given
element is not detected in <10% of samples, it is appropriate to substitute a value of 66% of the limit
of detection (LOD); and if 10-30% of samples are <LOD for that element, it is appropriate
to impute the missing values. If >30% of the dataset is <LOD for an element, that
element should be discarded from multivariate analyses. This is consistent with the
recommendations of Martín-Fernández et al. (2012).
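The thresholds above can be written as a small decision rule. The following Python sketch is illustrative only: the function names, the use of None for a missing value, and the treatment of a share of exactly 10% or 30% as "impute" are assumptions made here, and the imputation step itself (e.g. the model-based routines of Martín-Fernández et al., 2012) is not implemented.

```python
def censoring_action(values, lod):
    """Suggest a treatment for one element from its share of non-detects.

    `values` holds one entry per sample; None, or a value below `lod`,
    is counted as not detected (an assumption of this sketch).
    """
    n_below = sum(1 for v in values if v is None or v < lod)
    fraction = n_below / len(values)
    if fraction == 0:
        return "keep"          # every sample detected, nothing to do
    if fraction < 0.10:
        return "substitute"    # replace non-detects with 66% of the LOD
    if fraction <= 0.30:
        return "impute"        # hand over to an imputation routine
    return "discard"           # drop the element from the multivariate analysis


def substitute_below_lod(values, lod, factor=0.66):
    """Replace non-detects with factor * LOD (66% of the LOD by default)."""
    return [factor * lod if (v is None or v < lod) else v for v in values]


# Hypothetical example: 1 of 20 samples is below a 2 ppm detection limit.
arsenic = [5.0] * 19 + [None]
if censoring_action(arsenic, lod=2.0) == "substitute":
    arsenic = substitute_below_lod(arsenic, lod=2.0)
```

Applying the rule element by element before any transform keeps the decision of which elements to retain explicit and repeatable.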
The difficulties associated with compositional data analysis in geochemistry have recently
been summarised in Buccianti and Grunsky (2014) and references therein, and the approaches
that we have adopted here are consistent with their recommendations. As those authors note,
geochemical data are typically reported as compositions, which are subject to a constant sum
(e.g. they must total 100% or 1,000,000 ppm). As such these data are closed; that is to say
that for a composition of n-components, only n-1 components are required (Buccianti &
Grunsky, 2014). The use of the log-ratio transform of Aitchison (1982, 1986) overcomes
these constraints by converting the data into real number space, to which standard statistical
methods can be applied without spurious correlations being identified. Thus, as explained by
Aitchison et al. (2000), log-ratio transformations allow meaningful statements to be made
about data that consist only of compositions. Log-ratio transforms include the additive log-ratio
transformation (ALR), the isometric log-ratio transformation (ILR) and the centred log-ratio
transformation (CLR), each with a different purpose, the discussion of which is beyond the
scope of this paper.
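As an illustration of one of these transforms, the CLR of a sample is the logarithm of each component divided by the geometric mean of all components in that sample. The Python sketch below assumes every component is strictly positive (which is why non-detects must be dealt with first); the function names and the three-part example composition are hypothetical.

```python
import math


def close(parts, total=1.0):
    """Rescale a composition so that its parts sum to `total` (closure)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of the parts."""
    logs = [math.log(p) for p in parts]   # fails for zero or negative parts
    mean_log = sum(logs) / len(logs)      # log of the geometric mean
    return [x - mean_log for x in logs]


# Hypothetical three-part composition in wt%.
sample = [60.0, 30.0, 10.0]
transformed = clr(sample)
```

Two properties are worth noting: the CLR values of a sample always sum to zero, and the result is unchanged by closure, so the same values are obtained whether the sample is expressed in %, ppm or fractions.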
Principal Components Analysis (PCA) is an ordination method that takes a multivariate
dataset and reorients it in such a way that the axis of greatest variance becomes the first
principal component (PC), and the axis of second-greatest variance becomes the second PC,
and so on. A bivariate plot of PC1 vs. PC2 will thus summarise the greatest amount of
variance in the dataset and be a more comprehensive summary of that dataset than a plot of
any two of the original variables. Grunsky et al. (2014) note that an advantage of using PCs
over a priori or user-defined groups of elements as variables for investigation is that they
represent linear combinations of elements that are likely controlled by mineral stoichiometry.
Such linear relationships may provide a more realistic representation of geological variability.
While the examples that we present here utilise PCA as an ordination method, many other
ordination methods are available to explore multivariate datasets of all kinds. PCA is a
powerful and flexible method that can be applied broadly and will work in most cases, but it
is worth being aware that other ordinations may be more suitable for some datasets.
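The reorientation that PCA performs can be sketched in a few lines of plain Python. This is a minimal, non-robust illustration rather than the implementation used in ioGAS, CoDaPack or R: it takes the eigenvectors of the sample covariance matrix by power iteration with deflation (a choice made here for self-containment), and all function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) and the centred data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centred


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    p = len(matrix)
    # Non-constant start vector: a constant vector lies in the null space of
    # the covariance of CLR data (CLR rows sum to zero) and would stall.
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(p) for j in range(p))
    return value, vec


def pca(rows, n_components=2):
    """Return PC scores, component vectors and explained-variance ratios."""
    cov, centred = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))   # total variance
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total)
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
              for c in centred]
    return scores, components, ratios
```

Called on a table of CLR-transformed samples (one row per sample), `pca` returns the PC1 and PC2 scores to plot, the component vectors that show which elements drive each axis, and the proportion of total variance that each axis accounts for.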


Cluster analysis
In some cases (e.g. Case Studies 1 and 2 here), a PCA alone will reveal data structure.
However, in other cases (e.g. Case Study 3), variation and structure within a point cloud is too
subtle for the human eye to determine. Cluster analysis can be applied to any dataset to
determine groupings of samples with no a priori knowledge of their spatial or temporal
relationships to each other. However, it becomes especially powerful when the dataset being
clustered is multivariate and has already been subject to PCA, which has reoriented the data
cloud so that those first two dimensions summarise a significant proportion of all the
variance. Often PC1 and PC2 will account for >60% of the variance in a multivariate dataset,
with careful selection of components. Accordingly, clustering on ordinated data provides
much greater insight into the relationships between samples than any individual
element-element bi-plot. There are a very large number of clustering approaches available, and some
are easily implemented, either through geochemical exploration software, such as ioGAS,
or through packages such as R or Python. In practice, determination of the best clustering
outcome is somewhat subjective and may be influenced by the objective in mind and the
selection of variables, as well as the clustering method chosen. Validity indices and methods
of validating cluster stability are implemented in the R package fpc (Hennig, 2003). Other useful
discussions of cluster validation are provided by Hennig (2007) and Fang and Wang (2012).
Some iteration between the selection of the number of clusters and their spatial continuity (in
one, two or three dimensions) may be needed to balance the mathematical optimum with the
clusters' practical use. Once completed, sample classifications resulting from clustering can
be plotted back to drill holes or map data so that spatial trends between clusters can be
examined. This is particularly useful in studies where the goal of the multivariate analysis is
to examine lithostratigraphy, as in Case Study 3. A summary of the workflow we have
outlined here is presented in Fig. 1.
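To make the clustering step concrete, the sketch below groups samples by their (PC1, PC2) scores and then attaches each cluster label to a sample depth so that it can be plotted back down a drill hole. k-means is used purely because it is short to write; it is one of many possible clustering methods and is not the method used in the case studies. The farthest-point initialisation, the function names, the scores and the depths are all assumptions for illustration.

```python
def dist2(a, b):
    """Squared Euclidean distance between two (PC1, PC2) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=100):
    """Plain k-means on (PC1, PC2) score pairs; returns labels and centroids."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda i: dist2(p, centroids[i]))
                      for p in points]
        if new_labels == labels:       # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:                # keep the old centroid if a cluster empties
                centroids[i] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids


# Hypothetical PC scores and down-hole depths (m) for six samples.
pc_scores = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1),
             (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
depths = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
labels, centroids = kmeans(pc_scores, k=2)
cluster_by_depth = list(zip(depths, labels))   # ready to plot against depth
```

Repeating the call for several values of k and inspecting the spatial continuity of `cluster_by_depth` is one simple way to carry out the iteration between cluster number and practical use described above.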

Case studies
Case Study 1: Gold mineralisation and lithostratigraphy (Fisher et al., 2014)
The Agnew Gold Mine, Western Australia, comprises a series of discrete lodes located along
a sheared mine corridor. Ore is currently mined from the Kim and Main Lodes, which are
localised along the lithological contact between the ultramafic conglomerate and the overlying
Scotty Creek sandstones. Gold mineralisation in the Kim Lode is associated with a 4-9 m
wide quartz breccia. The high-grade core of the deposit is surrounded by an alteration halo
dominated by quartz stock-work veining, silicification and disseminated arsenopyrite. Zones
of biotite alteration and cross-cutting bands of actinolite alteration are noted throughout the
mine corridor in the Scotty Creek Sandstone and the conglomerates.
To differentiate between lithological variations and hydrothermal alteration in the
geochemical data, a robust PCA was carried out to limit the impact of outlying data points,
using the software package ioGAS. This is achieved by replacing the standard estimation of
the covariance or correlation matrix with a robust estimate; each sample is weighted
according to its robustly estimated Mahalanobis distance, with outlying samples assigned low weights
so as to reduce their impact on the estimate of the dispersion matrix (Campbell, 1980). No
CLR was utilised in this study. A 10-element subset was selected for the PCA of the drill fan
data. Fig. 2 shows the data plotted on PC1 vs. PC2 axes; these two main axes account for
68.50% of the total data variance among the set of samples. Eigenvectors for Cr, Cu, Fe, Mn
and Ni drive PC1, as does Ti to a lesser degree, whereas Ca, As, K and Ba drive PC2. As the
samples plotted in Fig. 2 are all from a single, relatively homogeneous, lithology we conclude
that the majority of variation accounted for by PC1 is a function of variation in background
rock chemistry, most likely a function of fine scale layering, and that variation accounted for
by PC2 is likely to show evidence for chemical changes as a result of hydrothermal alteration.
A small number of samples show a marked deviation away from the main cluster of data; the
majority of these, outlined by red circles in Fig. 2, correlate with the highest gold concentrations
in the grade control data. The eigenvectors for K and Ba show an antipathetic relationship to
the Ca-As-Au trend, suggesting that although these elements are not controlled by lithology
they are not part of the alteration assemblage associated with mineralisation either, and that
the mineral phases that host K and Ba may be replaced during alteration associated with
mineralisation. Alteration around the zones of mineralisation at the Kim ore lode is
characterised by a halo of arsenopyrite, bands of biotite and carbonate, and an extensive
calcic-amphibole overprint. The biotite alteration is commonly proximal to the high-grade
mineralisation, but the PCA suggests that it does not correlate with mineralisation,
with K and Ba concentrations not showing a positive relationship to Au enrichment. In
comparison, the relationship between mineralisation and the arsenopyrite and calcic-amphibole
alteration, which has been recognised from petrographic studies, is confirmed. As
the goal of this study was to explore elements associated with Au mineralisation, no cluster
analysis was performed.
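The idea behind the robust estimate used in this case study, down-weighting samples by their Mahalanobis distance so that outliers have less influence on the dispersion matrix, can be sketched for two variables as follows. This is a simplified illustration and not the ioGAS implementation: the Huber-type weight (1 inside a cut-off distance d0, d0/d beyond it), the cut-off of 2.0, the fixed number of iterations and the function names are all assumptions made here, and Campbell (1980) should be consulted for the actual weighting scheme.

```python
import math


def weighted_mean_cov(points, weights):
    """Weighted mean and 2x2 covariance (sxx, sxy, syy) of (x, y) points."""
    sw = sum(weights)
    mx = sum(w * p[0] for w, p in zip(weights, points)) / sw
    my = sum(w * p[1] for w, p in zip(weights, points)) / sw
    denom = sum(w * w for w in weights) - 1.0   # equals n - 1 when all w = 1
    sxx = sum(w * w * (p[0] - mx) ** 2 for w, p in zip(weights, points)) / denom
    syy = sum(w * w * (p[1] - my) ** 2 for w, p in zip(weights, points)) / denom
    sxy = sum(w * w * (p[0] - mx) * (p[1] - my)
              for w, p in zip(weights, points)) / denom
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a point for a 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))


def robust_estimate(points, d0=2.0, iterations=20):
    """Iteratively reweighted mean and covariance; outliers get weights < 1."""
    weights = [1.0] * len(points)
    for _ in range(iterations):
        mean, cov = weighted_mean_cov(points, weights)
        dists = [mahalanobis(p, mean, cov) for p in points]
        # Huber-type weights (assumed here): full weight inside d0, d0/d outside.
        weights = [1.0 if d <= d0 else d0 / d for d in dists]
    mean, cov = weighted_mean_cov(points, weights)
    return mean, cov, weights
```

A PCA run on the covariance returned by `robust_estimate` is then oriented by the bulk of the samples, while strongly mineralised or otherwise unusual samples still plot in PC space but no longer control the direction of the axes.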
Case Study 2: Dolerite characterisation (Gazley et al., 2014)
Dolerite dykes at Plutonic Gold Mine, Western Australia, are considered to be Proterozoic in
age, post-dating the Au mineralisation as they cross-cut it. All dolerites are microcrystalline
and can visually be separated into two groups depending on the texture of the matrix: (1)
microlithic with euhedral plagioclase phenocrysts (< 5 mm); and (2) homogeneous with a
matrix of crystallites. A small portable X-ray fluorescence (pXRF) dataset (n = 497) of
dolerite data was collected directly on the cut surface of diamond-drill core. To better
differentiate between lithological variations in the geochemical data, a PCA was conducted
using a CLR in CoDaPack v. 2.01.14 (Thió-Henestrosa & Martín-Fernández, 2005). The first
two axes account for 67.6% of the variation and the first three axes account for 81.5% of the
variation. The most useful PC-space to discriminate between the dolerites is PC1-PC3 space
(Fig. 3), which shows a strong correlation between Zr, Ti, and V and a corresponding
antipathetic relationship between Cr and Ca.
These dolerites are known to represent at least two different intrusive events, as in the Dover
dolerite area both a fine-grained and a coarse-grained dolerite have intruded adjacent to each
other. In this study, no cluster analysis was run because visual groupings are apparent: the Gibraltar
dolerites, which have high PC3 scores; the Dardanelles dolerites, with high Zr; the main cluster, which
defines an evolutionary trend from high Cr and low Ti+Zr to high Ti+Zr and low Cr; and the
coarse-grained dolerites, which have high PC1 and negative PC3 values, and are associated
with Cu. The relationship between these groups is not clear, but it is probable that this
last group represents a sulphide-bearing magma. Geochemical characterisation of the dolerite
intrusions at Plutonic has improved the mine geologists' ability to model these units, and this
approach was extended across the mine to the mine-site pulp dataset to help model the dolerites
more robustly, thus improving the ability of geologists to project Au mineralisation in these
areas of increased geological complexity.


Case Study 3: Lithostratigraphy (Hines et al., 2015)


The Late Cretaceous-Eocene marine successions in the East Coast Basin, New Zealand, are
dominated by the thick, lithologically homogeneous, siliceous to moderately calcareous
mudstones of the Whangai and Wanstead formations. In many localities, this sequence is
interrupted by the highly prospective, organic-rich Late Paleocene source rocks of the
Waipawa organofacies. The importance of these source rocks to the understanding of New
Zealand's petroleum systems is well recognised; however, the depositional controls on their
formation are not yet well constrained. Utilising pXRF data collected systematically from
samples from on-shore sections, the approach outlined in Fig. 1 was adopted to
explore these data. A sub-set of twelve elements (Mn, Sr, Y, Zn, Zr, Si, Ti, Al, K, Rb, Pb, Fe)
was selected, each of which had >95% occurrence across the samples from all sections.
The CLR and a robust sparse PCA were performed in R using the packages Hotelling (Curran,
2013) and pcaPP (Filzmoser et al., 2014) respectively. To statistically determine what
groupings, or clusters, of samples in PC1-PC2 space were related, a model-based hierarchical
cluster analysis was run using the function Mclust (Fraley et al., 2012) in R. Mclust performs
model-based clustering based on finite normal mixture modelling, which is flexible in terms
of the volume and shape of clusters it can find, and uses Bayesian methods to estimate
support for cluster arrangements. Once re-plotted against stratigraphic heights in their
respective sections to produce a fence diagram, this approach readily differentiates units at a
formation level (Fig. 4).
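The principle of model-based clustering, fitting normal mixtures with different numbers of components and letting the Bayesian information criterion (BIC) choose between them, can be illustrated with a deliberately reduced sketch. Unlike Mclust, which fits multivariate mixtures with a range of covariance structures, the Python code below fits one-dimensional mixtures (for example to PC1 scores alone) by expectation-maximisation. The quantile-based initialisation, the variance floor, the function names and the example scores are assumptions made here; the BIC is written so that larger values are better, which is the convention used by mclust.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, k, iterations=200, var_floor=1e-3):
    """Fit a k-component 1-D normal mixture by EM; return (BIC, means)."""
    xs = sorted(data)
    n = len(xs)
    # Initialise from k consecutive chunks of the sorted data (requires k <= n).
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]
    weights = [len(c) / n for c in chunks]
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances (floored to avoid collapse).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor)
    loglik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in xs)
    n_params = 3 * k - 1   # k means, k variances, k - 1 free mixing weights
    return 2.0 * loglik - n_params * math.log(n), means


def choose_k(data, candidates=(1, 2, 3)):
    """Number of components with the highest BIC among the candidates."""
    bics = {k: fit_mixture(data, k)[0] for k in candidates}
    return max(bics, key=bics.get), bics


# Hypothetical PC1 scores drawn from two visually distinct groups.
pc1_scores = [0.0, 0.2, -0.1, 0.1, -0.2, 5.0, 5.2, 4.9, 5.1, 4.8]
best_k, bic_by_k = choose_k(pc1_scores)
```

The penalty term in the BIC is what stops the procedure from simply adding components until every sample has its own cluster; each extra component must improve the fit by more than it costs in parameters.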
While this example utilises unconsolidated, often homogeneous, sediments as its lithology,
the approach utilised here has substantial potential to allow for correlations in metamorphosed
or altered rocks where lithological boundaries may be difficult to identify. Thus, it has
substantial application to mineral exploration and mine geology as robust correlations
between geological units can be established utilising numerous elements in multivariate
space, rather than few elements in one, two, or three-dimensional space.

Conclusion
Datasets of n-dimensions are extremely difficult for humans to parse and multivariate
methods provide an opportunity to re-orientate these datasets in a mathematically robust and
consistent fashion such that the components of greatest variance are readily observable. While
the approaches that have been discussed here may be unfamiliar to geologists, they have been
utilised in a large number of other scientific fields for a long time, and thus are well
established. With limited additional work, the approaches that we have outlined here, and the
powerful interpretations that multivariate methods can contribute to a mining or exploration
setting, are readily available to geologists. We urge geologists to consider these methods as part of
their data exploration practices. Robust correlations between different rock units, or
identification of alteration or mineralogical associations that are not evident in
two-dimensional plots, are often readily identified through multivariate approaches.


Figure 1. Generalised multivariate statistic workflow.


Figure 2. PC1 vs. PC2 for robust principal component analyses of Scotty Creek Sandstone (SKg) samples from
Kim ore body drill fan dataset (Fisher et al., 2014). Red open circles denote samples with high Au content.

Figure 3. Principal component analysis of CLR-transformed average dolerite margin data. PC1 vs. PC3; two
groups are indicated by shading with the Gibraltar and Dardanelles dolerites comprising the other two groups;
after Gazley et al. (2014b).


Figure 4. Summary PC1-PC2 plot for data from Hines et al. (2015) showing eigenvectors and clusters; right
panel shows clustered data in stratigraphic position over six measured sections.


Acknowledgements
This work draws on a large number of projects that have been completed or are ongoing within CSIRO's
Mineral Resources Flagship, Australia, and Victoria University of Wellington, New Zealand; we are grateful to
all those who have been associated with these. We are grateful for the review of June Hill on this paper.

References
Aitchison, J., 1982. The statistical analysis of compositional data (with discussion). Journal of the Royal
Statistical Society Series B (Statistical Methodology), 44: 139-177.
Aitchison, J., 1986. The statistical analysis of compositional data. Monographs on Statistics and Applied
Probability, Chapman & Hall Ltd., London. Reprinted (2003) with additional material by The Blackburn
press, Caldwell, NJ.
Aitchison, J., Barceló-Vidal, C., Martín-Fernández, J., Pawlowsky-Glahn, V., 2000. Logratio analysis and
compositional distance. Mathematical Geology, 32: 271-275.
Buccianti, A., & Grunsky, E., 2014. Compositional data analysis in geochemistry: Are we sure to see what really
occurs during natural processes? Journal of Geochemical Exploration, 141: 1-5.
Campbell, N.A., 1980. Robust procedures in multivariate analysis I: Robust covariance estimation. Applied
Statistics, 29: 231-237.
Curran, J.M., 2013. Hotelling: Hotelling's T-squared test and variants. R package version 1.0-2.
http://CRAN.R-project.org/package=Hotelling
Fang, Y., & Wang, J., 2012. Selection of the number of clusters via the bootstrap method. Computational
Statistics and Data Analysis, 56: 468-477.
Filzmoser, P., Fritz, H., & Kalcher, K., 2014. pcaPP: Robust PCA by Projection Pursuit. R package version 1.9-60.
http://CRAN.R-project.org/package=pcaPP
Fisher, L., Gazley, M.F., Baensch, A., Barnes, S.J., Cleverley, J., & Duclaux, G. 2014. Resolution of
geochemical and lithostratigraphic complexity: a work flow for application of portable X-ray
fluorescence to mineral exploration. Geochemistry: Exploration, Environment, Analysis, 14: 149-159.
Fraley, C., Raftery, A.E., Murphy, T.B., & Scrucca, L., 2012. mclust Version 4 for R: Normal Mixture Modeling
for Model-Based Clustering, Classification, and Density Estimation. Technical Report No. 597,
Department of Statistics, University of Washington.
Gazley, M.F., Tutt, C.M., Brisbout, L.I., Fisher, L.A., & Duclaux, G., 2014. Application of portable X-ray
fluorescence analysis to characterise dolerite dykes at Plutonic Gold Mine, Western Australia.
Geochemistry: Exploration, Environment, Analysis, 14: 223-231.
Grunsky, E.C., Mueller, U.A., & Corrigan, D., 2014. A study of the lake sediment geochemistry of the Melville
Peninsula using multivariate methods: Applications for predictive geological mapping. Journal of
Geochemical Exploration, 141: 15-41.
Hennig, C., 2003. Clusters, outliers, and regression: fixed point clusters. Journal of Multivariate Analysis, 86:
183-212.
Hennig, C., 2007. Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52:
258-271.
Hines, B.R., Ventura, G.T., Gazley, M.F., Bland, K., Crampton, J.S., & Collins, K.S., 2015. Chemostratigraphic
framework for changing depositional conditions of prospective Late Cretaceous-Paleogene marine
source rocks of the East Coast Basin, New Zealand. AAPG conference, Melbourne, Australia.
Martín-Fernández, J.A., Hron, K., Templ, M., Filzmoser, P., Palarea-Albaladejo, J., 2012. Model-based
replacement of rounded zeros in compositional data: Classical and robust approaches. Computational
Statistics and Data Analysis, 56: 2688-2704.
Thió-Henestrosa, S., Martín-Fernández, J.A., 2005. Dealing with compositional data: the freeware CoDaPack.
Mathematical Geology, 37: 773-793.
