
Week 1

TA3114 Geostatistics

Introduction: Review of Basic Statistics
Univariate and Bivariate Analysis
Measures of Central Tendency
Dispersion
Skewness and Kurtosis
Covariance
Histogram
Measures of Central Tendency

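The most commonly used measure of central tendency is the mean ($\bar{x}$).

Mean is defined as:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

For data grouped into $k$ classes:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$, where $f_i$ is the frequency of $x_i$ and $n = \sum_{i=1}^{k} f_i$.

The following example calculates the mean ash content of coal: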

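An example of mean calculation

Class midpoint $x_i$ (%adb) | Frequency $f_i$ | $f_i x_i$
5 | 5 | 25
15 | 20 | 300
25 | 42 | 1050
35 | 26 | 910
45 | 7 | 315
Total | 100 | 2600

$\bar{x} = \frac{1}{100}\left[(5 \times 5) + (15 \times 20) + (25 \times 42) + (35 \times 26) + (45 \times 7)\right] = \frac{2600}{100} = 26\ \%\text{adb}$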
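As a quick check of the grouped-mean formula, the Python sketch below recomputes the coal ash example. `grouped_mean` is a hypothetical helper name, and the class midpoints and frequencies are copied from the table.

```python
def grouped_mean(xs, fs):
    """Mean of grouped data: sum(f_i * x_i) / n, where n = sum(f_i)."""
    if len(xs) != len(fs):
        raise ValueError("xs and fs must have the same length")
    n = sum(fs)  # total number of samples
    return sum(f * x for x, f in zip(xs, fs)) / n


midpoints = [5, 15, 25, 35, 45]     # class midpoints x_i (% adb)
frequencies = [5, 20, 42, 26, 7]    # frequencies f_i
print(grouped_mean(midpoints, frequencies))  # 26.0
```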
Mean of Sample and Population

The mathematical expectation of a random variable (the population mean) is closely related to the mean of the samples:

$E(x) \approx \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

The mean of samples can also be used to estimate the population mean grade with weighting:

$\bar{x}_w = \sum_{i=1}^{n} w_i x_i$, where $w_i$ is the weighting factor and $\sum_{i=1}^{n} w_i = 1$ (non-bias condition).
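The weighted estimate can be sketched as follows, assuming the weights are already given. How the weights are chosen is not covered here. `weighted_mean` is a hypothetical helper, and the grades and weights are invented for illustration only. The check on the sum of the weights enforces the non-bias condition.

```python
import math


def weighted_mean(values, weights):
    """Weighted estimate sum(w_i * x_i); the weights must sum to 1 (non-bias condition)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1 for an unbiased estimate")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical sample grades (%) and weights, for illustration only
grades = [2.0, 3.0, 5.0]
weights = [0.5, 0.3, 0.2]
print(weighted_mean(grades, weights))
```

With equal weights $w_i = 1/n$, the weighted estimate reduces to the ordinary arithmetic mean.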
Median
The median is the middle value of a data set that has been ordered from smallest to largest (or vice versa).
In other words, 50% of the data have values below the median and the other 50% have values above it.
For a small number of data, the median is a better estimator of central tendency than the mean, because it is less affected by extreme values.

Simple examples
Series of data:
3, 4, 4, 5, 6, 8, 8, 9, 10
Median = 6 (the 5th of 9 ordered values).

3, 4, 4, 5, 6, 8, 8, 8, 9, 10
Median = (6 + 8)/2 = 7 (the mean of the 5th and 6th of 10 ordered values).
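The median rule can be sketched in a few lines. For odd n it takes the middle value, and for even n the mean of the two middle values. Python's standard `statistics.median` applies the same rule, so the hand-written `median` helper below is only illustrative.

```python
def median(data):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([3, 4, 4, 5, 6, 8, 8, 9, 10]))      # 6
print(median([3, 4, 4, 5, 6, 8, 8, 8, 9, 10]))   # 7.0
```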
Mode

The mode is the value with the highest frequency.

A mode may or may not exist, and a data set can have more than one mode.

Simple example
Group of data:

3, 4, 4, 5, 6, 8, 8, 8, 9, 10
Mode = 8

3, 4, 4, 5, 6, 8, 8, 9, 10
Mode = 4 and 8 (bimodal)

3, 4, 5, 6, 8, 9, 10
The data have no mode.
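The three cases above (one mode, bimodal, no mode) can be reproduced with a small counting sketch. `modes` is a hypothetical helper that follows the convention of these notes: when no value repeats, the data have no mode. Python's `statistics.multimode` differs here, because it returns every value when all values occur only once.

```python
from collections import Counter


def modes(data):
    """All values with the highest frequency; [] when no value occurs more than once."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so the data have no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3, 4, 4, 5, 6, 8, 8, 8, 9, 10]))  # [8]
print(modes([3, 4, 4, 5, 6, 8, 8, 9, 10]))     # [4, 8]  (bimodal)
print(modes([3, 4, 5, 6, 8, 9, 10]))           # []
```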
Simple example (figure): Median = 3.5, Mode = 3.0, Mean = 3.5
Measures of Dispersion

Dispersion measures how widely the data values are spread.

A frequently used measure of dispersion is the range (maximum - minimum), but it is not appropriate because it is very sensitive to extreme values.
Another frequently used measure of dispersion is the variance.
Variance

$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

where:
$x_i$ is a data value,
$\bar{x}$ is the mean of the data,
$n$ is the number of data.
Standard Deviation

The standard deviation is the square root of the variance ($\sigma = \sqrt{\sigma^2}$).

It is the more frequently used measure of dispersion because it has the same unit as the variable, whereas the variance is in squared units.
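The sketch below implements the range, the variance with the $n - 1$ denominator, and the standard deviation. The helper names are hypothetical. The last line shows how a single extreme value inflates the range. Python's `statistics.variance` and `statistics.stdev` also divide by $n - 1$.

```python
import math


def data_range(data):
    """Range = maximum - minimum (very sensitive to extreme values)."""
    return max(data) - min(data)


def sample_variance(data):
    """Variance with the n - 1 denominator: sum((x_i - mean)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Standard deviation = square root of the variance (same unit as the data)."""
    return math.sqrt(sample_variance(data))


values = [3, 5, 10]
print(data_range(values), sample_variance(values), sample_std(values))
print(data_range(values + [100]))  # one extreme value changes the range from 7 to 97
```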
Standard Error

If $\bar{x}$ is the mean and $\sigma$ is the standard deviation of $n$ data, then the standard error of the mean is defined as:

$e = \frac{\sigma}{\sqrt{n}} = \left(\frac{\sigma^2}{n}\right)^{1/2}$
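As an illustration, the standard error can be computed for the 18 Au values (ppm) from the histogram example. Here $\sigma$ is taken as the sample standard deviation ($n - 1$ denominator). This reproduces the value of about 0.054981 shown in the univariate statistics listing. `standard_error` is a hypothetical helper.

```python
import math
import statistics

au = [0.7, 0.9, 0.8, 1.0, 0.9, 1.1, 1.1, 1.3, 1.1,
      1.2, 1.6, 1.2, 1.0, 1.1, 1.0, 1.2, 1.4, 1.5]  # Au (ppm), samples 1-18


def standard_error(data):
    """Standard error of the mean: e = sigma / sqrt(n), sigma = sample standard deviation."""
    return statistics.stdev(data) / math.sqrt(len(data))


print(round(standard_error(au), 6))
```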
Skewness and Kurtosis

Skewness is a measure of the symmetry or asymmetry of a histogram curve (data distribution).
Kurtosis is a measure of how sharply peaked the data distribution is.
Skewness and kurtosis are rarely used in reserve estimation; they are used mainly to indicate whether or not the data are normally distributed.
(Figure: negative skewness, normal distribution, positive skewness.)
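These notes give no formulas for skewness and kurtosis. The sketch below assumes the common moment coefficients: skewness $= m_3 / m_2^{3/2}$ and kurtosis $= m_4 / m_2^2$, where $m_k$ is the $k$-th central moment with denominator $n$. With this definition, kurtosis equals 3 for a normal distribution. Some software instead reports excess kurtosis (kurtosis - 3) or applies small-sample corrections, so its values may differ. `shape_coefficients` is a hypothetical helper.

```python
def shape_coefficients(data):
    """Moment coefficients of skewness and kurtosis.

    m_k = (1/n) * sum((x_i - mean)^k)
    skewness = m3 / m2**1.5  (0 for a symmetric distribution)
    kurtosis = m4 / m2**2    (3 for a normal distribution)
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


print(shape_coefficients([1, 2, 3, 4, 5]))    # symmetric data: skewness 0
print(shape_coefficients([1, 1, 1, 2, 10]))   # long right tail: positive skewness
```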
Coefficient of Variation

The coefficient of variation is the ratio of the standard deviation to the mean ($CV = \sigma/\bar{x}$).
A relatively high CV means that the data values are widely spread.
As a rule of thumb, CV < 0.5 suggests normally distributed data, and CV > 0.5 suggests lognormally distributed data (positive skewness).
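A minimal sketch of the CV and the rule of thumb above, applied to the 18 Au values (ppm) from the histogram example. `coefficient_of_variation` and `cv_hint` are hypothetical helpers. The 0.5 threshold is only the rule of thumb from these notes, not a formal test of normality.

```python
import statistics

au = [0.7, 0.9, 0.8, 1.0, 0.9, 1.1, 1.1, 1.3, 1.1,
      1.2, 1.6, 1.2, 1.0, 1.1, 1.0, 1.2, 1.4, 1.5]  # Au (ppm), samples 1-18


def coefficient_of_variation(data):
    """CV = sample standard deviation / mean (dimensionless)."""
    return statistics.stdev(data) / statistics.mean(data)


def cv_hint(cv):
    """Rule of thumb from these notes: CV < 0.5 roughly normal, CV > 0.5 possibly lognormal."""
    if cv < 0.5:
        return "roughly normal"
    return "possibly lognormal (positively skewed)"


cv = coefficient_of_variation(au)
print(round(cv, 2), cv_hint(cv))
```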
Example of coefficients of variation of grade values in some mineral deposits of the world

Type of mineral deposit | Coefficient of variation
Gold: California, USA; placer Tertiary | 5.10
Gold: Loraine, South Africa; Black Bar | 2.81
Gold: Norseman, Australia; Princess Royal Reef *) | 2.22
Gold: Grasberg, Papua, Indonesia **) | 2.01
Gold: Norseman, Australia; Crown Reef *) | 1.63
Gold: Carlin, USA | 1.58
Tungsten: Alaska | 1.56
Gold: Shamva, Rhodesia | 1.55
Gold: Western Holdings, South Africa | 1.28
Uranium: Yeelirrie, Australia | 1.19
Gold: Mt. Charlotte, Australia **) | 1.19
Gold: Fimiston, Australia *) | 1.12
Gold: Vaal Reefs, South Africa | 1.02
Zinc: Frisco, Mexico | 0.85
Gold: Loraine, South Africa; Basalt Reef | 0.80
Nickel: Kambalda, Australia | 0.74
Copper | 0.70
Manganese | 0.58
Lead: Frisco, Mexico | 0.57
Iron ore | 0.27
Bauxite | 0.22

*) ore samples from mine, **) samples from core drilling


Bivariate Analysis
The most commonly used descriptive method for bivariate analysis is the scatter plot.
Two variables are positively correlated if one increases as the other increases (a directly proportional relationship).
Two variables are negatively correlated if one decreases as the other increases (an inversely proportional relationship).
Two variables are not correlated if their relationship is random.
Scatter Plot
Pairs of data can be displayed as a scatter plot, which shows the correlation between them.
For example, data pairs (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), ..., (xn, yn) plotted in Cartesian XY coordinates produce scatter plots such as the following:

(Figure: scatter plots of some data pairs showing the correlation between them.)
The leftmost figure shows a positive linear correlation: increasing x values are accompanied by increasing y values, and the relationship is represented by a linear regression line.

The middle figure shows a parabolic (non-linear) correlation, while the rightmost figure shows no correlation, indicating that variables x and y are not correlated.

The strength of a correlation is measured by the coefficient of correlation.


Example of scatter plots between variables x and y
Covariance and Coefficient of Correlation

Mean of variable x: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Mean of variable y: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Variance of variable x: $S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Variance of variable y: $S_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$

Covariance: $S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Coefficient of correlation: $r = \frac{S_{xy}}{S_x S_y}$
Simple example:

$x_i$ | $y_i$
1 | 3
2 | 5
3 | 10

(Figure: scatter plot of $y_i$ against $x_i$ with a linear trend line, $R^2 = 0.9423$.)

Determine the mean and variance of each variable.
Determine the covariance of variables x and y.
Determine the coefficient of correlation.
Plot the correlation diagram for variables x and y.
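Calculation of statistical parameters for bivariate data

$x_i$ | $y_i$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})^2$ | $(y_i - \bar{y})^2$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | 3 | -1 | -3 | 1 | 9 | 3
2 | 5 | 0 | -1 | 0 | 1 | 0
3 | 10 | +1 | +4 | 1 | 16 | 4
Σ = 6 | 18 | 0 | 0 | 2 | 26 | 7

$\bar{x} = 6/3 = 2$, $\bar{y} = 18/3 = 6$
$S_x^2 = 2/(3-1) = 1$, $S_y^2 = 26/(3-1) = 13$, $S_{xy} = 7/(3-1) = 3.5$
$S_x = 1$, $S_y = \sqrt{13} \approx 3.6$
$r = \frac{3.5}{1 \times 3.6} \approx 0.97$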
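The hand calculation above can be checked with a short sketch of the covariance and correlation formulas ($n - 1$ denominators). `covariance` and `correlation` are hypothetical helpers. Since Python 3.10, the standard library also provides `statistics.covariance` and `statistics.correlation`.

```python
import math


def covariance(xs, ys):
    """S_xy = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """r = S_xy / (S_x * S_y); note that S_x^2 = covariance(xs, xs)."""
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)


x = [1, 2, 3]
y = [3, 5, 10]
print(covariance(x, x), covariance(y, y), covariance(x, y))  # 1.0 13.0 3.5
print(round(correlation(x, y), 2))  # 0.97
```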
Coefficient of Determination ($r^2$)

$r^2$ can be used to determine the contribution of one variable to the variation of another.
As an illustration: if the correlation coefficient between two variables is 0.9 (r = 0.9), then the coefficient of determination is 0.81 ($r^2$ = 0.81 = 81%). This means that variable x accounts for 81% of the variation in variable y, and the remaining 19% is caused by other factors.
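For a least-squares straight line, $R^2 = 1 - SS_{res}/SS_{tot}$ equals $r^2$. The sketch below checks this on the simple example ($x$ = 1, 2, 3; $y$ = 3, 5, 10). This is why the trend line in that plot reports $R^2 = 0.9423 \approx 0.97^2$. `linear_fit` and `r_squared` are hypothetical helpers.

```python
def linear_fit(xs, ys):
    """Least-squares line y = a + b x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys):
    """Coefficient of determination of the fitted line: 1 - SS_res / SS_tot."""
    a, b = linear_fit(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


x = [1, 2, 3]
y = [3, 5, 10]
print(linear_fit(x, y))           # (-1.0, 3.5)
print(round(r_squared(x, y), 4))  # 0.9423
```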
Histogram
In statistical analysis, a random variable is one for which there is no correlation between the sample values and their locations, for example:
(Figure: grade against location, and the corresponding frequency histogram; example of a data distribution and its histogram.)

If the data locations are changed randomly, the histogram remains the same as before. This means that the distribution remains the same, and so do the arithmetic mean, mode and median.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{187}{17} = 11$

In the data distribution above, mode = 11 (it occurs 4 times; see the peak of the histogram).

The median is found from the ordered data:

7, 8, 9, 9, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 13, 15, 16

With n = 17, the median is the 9th value, so median = 11.
The variance and standard deviation are then:

$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(7-11)^2 + (8-11)^2 + \dots + (16-11)^2}{17 - 1} = \frac{84}{16} = 5.25\ \text{ppm}^2$

$SD = \sqrt{5.25} = 2.29$ ppm
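The same figures can be reproduced with Python's standard `statistics` module. `statistics.variance` divides by $n - 1$, matching the formula above.

```python
import math
import statistics

# Values from the histogram example (ppm)
data = [7, 8, 9, 9, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 13, 15, 16]

mean = statistics.mean(data)          # sum / n = 187 / 17
median = statistics.median(data)      # middle (9th) of the 17 ordered values
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sum of squared deviations / (n - 1) = 84 / 16
sd = math.sqrt(variance)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("variance:", variance)
print("standard deviation:", round(sd, 2))
```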
Sample no. | Au (ppm) | Sample no. | Au (ppm)
1 | 0.7 | 10 | 1.2
2 | 0.9 | 11 | 1.6
3 | 0.8 | 12 | 1.2
4 | 1.0 | 13 | 1.0
5 | 0.9 | 14 | 1.1
6 | 1.1 | 15 | 1.0
7 | 1.1 | 16 | 1.2
8 | 1.3 | 17 | 1.4
9 | 1.1 | 18 | 1.5

Even if the data locations are changed randomly, the result is the same histogram, mean, mode and median.

Mean = 1.1 ppm; Median = 1.1 ppm; Mode = 1.1 ppm
Standard deviation = 0.23 ppm
Coefficient of variation = 0.23/1.1 = 0.2

Based on Sturges' rule:
Class interval = $\frac{\text{range}}{1 + 3.322 \log_{10} n} = \frac{0.9}{1 + 3.322 \log_{10} 18} \approx \frac{0.9}{5.17} \approx 0.17$; the histogram uses a rounded class interval of 0.1 ppm.

(Figure: histogram of frequency against Au grade (ppm), in classes of 0.1 ppm from 0.0 to 1.6.)
Univariate Statistics:

Population ........... 18
Minimum Value ........ 0.7
Maximum Value ........ 1.6
Range ................ 0.9
Mean ................. 1.116667
Standard Deviation ... 0.233263
Standard Error ....... 0.054981
Median ............... 1.1
Sum .................. 20.1
Sum of Squares ....... 23.37
Variance ............. 0.054412
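The univariate statistics listing above can be reproduced with a short script. The sketch also applies Sturges' rule in its usual form, $k = 1 + 3.322 \log_{10} n$ classes and class interval = range / $k$. The dictionary keys and the output layout are illustrative choices, not the format of the software that produced the listing.

```python
import math
import statistics

au = [0.7, 0.9, 0.8, 1.0, 0.9, 1.1, 1.1, 1.3, 1.1,
      1.2, 1.6, 1.2, 1.0, 1.1, 1.0, 1.2, 1.4, 1.5]  # Au (ppm), samples 1-18

n = len(au)
value_range = max(au) - min(au)
mean = statistics.mean(au)
sd = statistics.stdev(au)  # n - 1 denominator
summary = {
    "population": n,
    "minimum": min(au),
    "maximum": max(au),
    "range": value_range,
    "mean": mean,
    "standard deviation": sd,
    "standard error": sd / math.sqrt(n),
    "median": statistics.median(au),
    "mode": statistics.mode(au),
    "sum": sum(au),
    "sum of squares": sum(x * x for x in au),
    "variance": statistics.variance(au),
    "coefficient of variation": sd / mean,
}

# Sturges' rule: number of classes k = 1 + 3.322 log10(n); class interval = range / k
k = 1 + 3.322 * math.log10(n)
class_interval = value_range / k

for name, value in summary.items():
    print(f"{name:.<26} {value:.6g}")
print(f"Sturges classes k = {k:.2f}, class interval = {class_interval:.3f} ppm")
```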
(Figure: histogram of Au grades divided at Mean - 2 SD (0.7), Mean - 1 SD (0.9), Mean (1.1), Mean + 1 SD (1.3) and Mean + 2 SD (1.6) ppm into background (within ±1 SD), slightly anomalous (between 1 and 2 SD) and anomalous (beyond 2 SD) classes.)
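The background and anomaly classes in the figure can be sketched by measuring how many standard deviations each value lies from the mean. The class limits used here are an assumption read from the figure: |z| ≤ 1 is background, 1 < |z| ≤ 2 is slightly anomalous, and |z| > 2 is anomalous. Both tails are treated alike, as in the figure. The exact limits (for example 0.650 and 1.583 ppm for Mean ± 2 SD) differ slightly from the rounded values printed on the figure. `classify` is a hypothetical helper.

```python
import statistics
from collections import Counter

au = [0.7, 0.9, 0.8, 1.0, 0.9, 1.1, 1.1, 1.3, 1.1,
      1.2, 1.6, 1.2, 1.0, 1.1, 1.0, 1.2, 1.4, 1.5]  # Au (ppm), samples 1-18


def classify(values, mean, sd):
    """Label each value by its distance from the mean in standard deviations.

    Assumed class limits: |z| <= 1 background, 1 < |z| <= 2 slightly anomalous,
    |z| > 2 anomalous.
    """
    labels = []
    for v in values:
        z = abs(v - mean) / sd
        if z <= 1:
            labels.append("background")
        elif z <= 2:
            labels.append("slightly anomalous")
        else:
            labels.append("anomalous")
    return labels


mean = statistics.mean(au)
sd = statistics.stdev(au)
labels = classify(au, mean, sd)
counts = Counter(labels)
for value, label in zip(au, labels):
    print(f"{value:.1f} ppm: {label}")
print(dict(counts))
```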
Why spatial analysis?

Statistical description does not take data location into account.

Statistical description does not take data density into account.

Statistical description produces the same result even if the data locations are changed randomly.

Spatial analysis can be carried out by plotting the data distribution or by using iso-maps.
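The claim that location-free statistics ignore sample positions can be checked directly. The sketch treats the 17 values from the histogram example as hypothetical samples taken at equal spacing along a line. They are first placed in increasing order, then their positions are shuffled. The mean, median and variance do not change. A simple location-dependent quantity does change: `mean_neighbour_difference`, the average absolute difference between adjacent samples. This hypothetical helper was chosen only for illustration.

```python
import random
import statistics

# Hypothetical line of equally spaced samples, smoothly increasing along the line
ordered = [7, 8, 9, 9, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 13, 15, 16]
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # change the sample locations randomly


def describe(values):
    """Location-free statistics: mean, median and variance."""
    return statistics.mean(values), statistics.median(values), statistics.variance(values)


def mean_neighbour_difference(values):
    """Average absolute difference between adjacent samples (depends on location)."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


print(describe(ordered) == describe(shuffled))  # True: same statistics
print(mean_neighbour_difference(ordered), mean_neighbour_difference(shuffled))
```

For the ordered line, the adjacent differences add up to max - min = 9, which is the smallest possible total. Any rearrangement therefore gives a neighbour difference at least as large.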
Distribution of Spatial Data
(Figure panels: isotropic; different populations; trend (plane).)
(Figure: an example of spatial correlation of data; the maps show a good correlation between Cu and Au grades.)
1 1 1 1 2 2 2 2 2 1 1 1
1 1 2 2 2 3 2 3 3 2 2 1
1 2 2 2 2 4 3 3 4 3 2 1
1 2 2 4 4 5 5 5 3 3 3 2
2 2 3 7 8 6 7 6 4 2 2 2
2 2 4 7 9 7 6 5 6 4 2 2
2 2 4 5 8 6 5 7 5 4 2 1
1 2 3 3 2 4 5 3 1 2 2 1
1 1 2 2 2 2 3 2 1 1 1 1
1 1 2 2 2 2 2 2 1 1 1 1

Example of data distribution in blocks

Within the blocks (population), selecting a specific area gives a different histogram (that is, a different distribution).
Selecting all blocks produces histogram C, selecting the light grey blocks produces histogram A, and selecting the dark grey blocks produces histogram B.
(Figure: histograms of the data distribution according to the block selection.)
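The effect of selecting an area can be illustrated with the 10 × 12 block values listed in the example grid. The sketch compares the histogram of all 120 blocks with the histogram of a hypothetical selection: rows 5-7 and columns 4-9, the high-value core of the grid. This window is an assumption for illustration and does not correspond to the light or dark grey selections in the figure.

```python
from collections import Counter

# Block values copied row by row from the 10 x 12 example grid
grid = [
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1],
    [1, 1, 2, 2, 2, 3, 2, 3, 3, 2, 2, 1],
    [1, 2, 2, 2, 2, 4, 3, 3, 4, 3, 2, 1],
    [1, 2, 2, 4, 4, 5, 5, 5, 3, 3, 3, 2],
    [2, 2, 3, 7, 8, 6, 7, 6, 4, 2, 2, 2],
    [2, 2, 4, 7, 9, 7, 6, 5, 6, 4, 2, 2],
    [2, 2, 4, 5, 8, 6, 5, 7, 5, 4, 2, 1],
    [1, 2, 3, 3, 2, 4, 5, 3, 1, 2, 2, 1],
    [1, 1, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1],
    [1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1],
]


def histogram(values):
    """Frequency of each value, ordered by value."""
    return dict(sorted(Counter(values).items()))


all_blocks = [v for row in grid for v in row]
# Hypothetical selection: rows 5-7 and columns 4-9 (0-based slices 4:7 and 3:9)
core = [v for row in grid[4:7] for v in row[3:9]]

print("all blocks:", histogram(all_blocks), "mean =", round(sum(all_blocks) / len(all_blocks), 2))
print("core area: ", histogram(core), "mean =", round(sum(core) / len(core), 2))
```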
(a) Example of block distribution in four different mine sites
(b) Histogram of the data distribution from the blocks in the four different mine sites

If the cut-off grade is known to be 2%, then the 50 × 50 m blocks with grade ≥ 2% will be distributed over four different mine sites, as shown in figure (a).
By chance, the selected blocks in the four mine sites have the same histogram, as shown in figure (b).
If, for technical reasons, a mineable unit must have a minimum area of 100 × 100 m (four adjacent blocks), then not all selected blocks are mineable.
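The effect of a minimum mining unit can be sketched on a small grid. The assumptions are as follows. A block is selected when its grade is at or above the cut-off. A selected block counts as mineable only if it lies inside at least one 2 × 2 group of selected blocks, which corresponds to a 100 × 100 m unit of four adjacent 50 × 50 m blocks. The grid below is invented for illustration, and `ore_and_mineable` is a hypothetical helper.

```python
def ore_and_mineable(grid, cutoff):
    """Return (ore, mineable) as sets of (row, col) block indices.

    ore      -- blocks whose grade is at or above the cut-off
    mineable -- ore blocks lying inside at least one 2 x 2 group of ore blocks,
                i.e. a 100 x 100 m unit made of four adjacent 50 x 50 m blocks
    """
    rows, cols = len(grid), len(grid[0])
    ore = {(r, c) for r in range(rows) for c in range(cols) if grid[r][c] >= cutoff}
    mineable = set()
    for r in range(rows - 1):
        for c in range(cols - 1):
            square = {(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)}
            if square <= ore:  # all four blocks are at or above the cut-off
                mineable |= square
    return ore, mineable


# Hypothetical block grades (%), one entry per 50 x 50 m block
example_grid = [
    [1, 3, 3, 1],
    [1, 3, 4, 1],
    [2, 1, 1, 3],
    [1, 1, 1, 1],
]
ore, mineable = ore_and_mineable(example_grid, cutoff=2)
print(sorted(ore))
print(sorted(mineable))
print(sorted(ore - mineable))  # above the cut-off but too isolated to mine
```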
Pattern-1
(Figure: block map with block size = 50 × 50 m, and histogram of Pattern-1.)
Pattern-2
(Figure: block map with block size = 50 × 50 m, and histogram of Pattern-2.)
Pattern-3
(Figure: block map with block size = 50 × 50 m, and histogram of Pattern-3 showing one population for low grade and another for high grade.)
