Lognormal Distribution

0
The normal distribution is

the log-normal distribution
Werner Stahel, Seminar fr Statistik, ETH Zrich
and Eckhard Limpert
2 December 2014
The normal Normal distribution
We know it. Passed the exam.
0
2/3 (68%)
95% (95.5%)
Named after Gauss. Decorated the 10 DM bill.
Nice shape.
+ 2
We like it!
Why it is right.
It is given by mathematical theory.
Adding normal random variables gives a normal sum.
Linear combinations
Y = 0 + 1X1 + 2X2 + ...
remain normal.
Means of normal variables are normally distributed.

Central Limit Theorem:
Means of non-normal variables
are approximately normally distributed.
Hypothesis of Elementary Errors:

If random variation is the sum of many small random effects,
a normal distribution must be the result.
Regression models assume normally distributed errors.
Is it right?
Mathematical statisticians believe(d) that it is prevalent in Nature.
Well, it is not.
Purpose of this talk: What are the consequences?
1. Empirical Distributions
2. Laws of Nature
3. Logarithmic Transformation, the Log-Normal Distribution
4. Regression
5. Advantages of using the log-normal distribution
6. Conclusions
Measurements:
size, weight, concentration, intensity, duration, price, activity
All
> 0 amounts (John Tukey)
250
150
50
0
frequency
350
450
Example: HydroxyMethylFurfurol (HMF) in honey (Renner 1970)
10
15
20
25
concentration
30
35
40
45
50
Measurements:
size, weight, concentration, intensity, duration, price, activity
All
> 0 amounts
>0
unless coefficient of variation cv(X) = sd(X)/E(X) is small.
Distribution is skewed: left steep, right flat, skewness
Other variables may have other ranges and negative skewness.

They may have a normal distribution.
They are usually derived variables, not original measurements.
Any examples?
Our examples: Position in space and time, angles, directions. Thats it!
For some, 0 is a probable value: rain, expenditure for certain goods, ...
pH, sound and other energies [dB]
log scale!
The
95% Range Check
For every normal distribution, negative values have a probability
> 0.
normal distribution inadequate for positive variables.

b reaches below 0.
Becomes relevant when 95% range x 2
250
150
50
0
frequency
350
450
Then, the distribution is noticeably skewed.
15
10
10
15
20
concentration
25
30
35
40
45
50
2. Laws of Nature
2. Laws of Nature
(a) Physics
E = m c2
s = v 2/(2 a) ; Velocity v = F t/m

Gravitation F = G m1 m2/r 2
Stopping distance
Gas laws
p V = n R T ; R = p0 V0/T0
Radioactive decay
Nt = N0 ekt
(b) Chemistry
v = k [A]nA [B]nB
change with changing temperature t +100C = v 2
based on Arrhenius law k = A eEA /R T
EA = activation energy; R = gas constant
Reaction velocity
Law of mass action:
A+B C +D : Kc = [A][B]/[C][D]
2. Laws of Nature
(c) Biology
Multiplication (of unicellular organisms)
Growth, size st = s0 k t
1 2 4 8 16
Hagen-Poiseuille Law; Volume:
Vt = (P r4 )/(8 L) ;
Permeability
Other laws in biology?
P : pressure difference
3. Logarithmic Transformation, Log-Normal Distribution
250
200
150
300
100
200
30
50
50 100
0
frequency
400
300
500
Transform data by log transformation
20
10
10
concentration
20
30
40
50
1.6 1.2 0.8 0.4
0.0
log(concentration)
0.4
0.8
1.2
The log transform
Z = log(X)
turns multiplication into addition,
turns variables
reduces (positive) skewness (may turn it negatively skewed)
Often turns skewed distributions into normal ones.
X > 0 into Z with unrestricted values,
Note: Base of logarithm is not important.
natural log for theory,
log10 for practice.
10
11
The Log-Normal Distribution

If
Z = log(X) is normally distributed (Gaussian), then

the distribution of X is called log-normal.
Densities
density
1
1.2
1.5
2.0
4.0
8.0
0.0
0.5
1.0
1.5
2.0
green: normal distribution
2.5
Density:
exp
2
2 x
Parameters:
log(x)
12
2
, : Expectation and st.dev. of log(X)
More useful:
e = : median, geometric mean, scale parameter
e = : multiplicative standard deviation, shape parameter

(or ) determines the shape of the distribution.
Contrast to
expectation
E(X) =
standard deviation
Less useful!
2/2
sd(X) from var(X) = e
1 e2
13
Ranges
Probability
normal
log-normal
2/3 (68%)
95%
/
/ 2
1
2/3 (68%)
95% (95.5%)
* *2
* *
* *
* *2
/ : times-divide
Properties
We had for the normal distribution:
Adding normal random variables gives a normal sum.
Linear combinations
Y = 0 + 1X1 + 2X2 + ...
remain normal.
Means of normal variables are normally distributed.

Central Limit Theorem:
Means of non-normal variables
are approximately normally distributed.
Hypothesis of Elementary Errors:

If random variation is the sum of many small random effects,
a normal distribution must be the result.
14
15
Properties: We have for the log-normal distribution:
Multiplying log-normal random variables gives a log-normal product.
Geometric means of log-normal var.s are log-normally distr.

Multiplicative Central Limit Theorem:
Geometric means
of (non-log-normal) variables are approx. log-normally distributed.
Multiplicative Hypothesis of Elementary Errors:

If random variation is the product of several random effects,
a log-normal distribution must be the result.
Better name:
Multiplicative normal distribution!
16
Qunicunx
Galton: Additive
Limpert (improving on Kaptayn): Multiplicative
50
100
50
0
50
100
1
100
0
:
100
150
50
67
200
150
100
1.5
200
:
100
44
250
67
30
300
1
0 20
150
150
100
44
1 : 4
225
338
225
506
:
17
Back to Properties
Multiplicative Hypothesis of Elementary Errors:

If random variation is the product of several random effects,
a log-normal distribution must be the result.
Note: For many small effects, the geometric mean will have
a small approx. normal AND log-normal!
Such normal distributions are intrinsically log-normal.
Keeping this in mind may lead to new insight!
Regression models assume normally distributed errors! ???
18
4. Regression
19
4. Regression
Multiple linear regression:
Y = 0 + 1X1 + 2X2 + ... + E

Regressors
Xj may be functions of original input variables
model also describes nonlinear relations, interactions, ...

Categorical (nominal) input variables = factors
dummy binary regressors

Model includes Analysis of Variance (ANOVA)!
Linear in the coefficients
simple, exact theory, exact inference

estimation by Least Squares simple calculation
4. Regression
20
Characteristics of the model:

Formula:
Y = 0 + 1X1 + 2X2 + ... + E
additive effects, additive error
Error term E N (0, 2)
constant variance
symmetric error distribution
Target variable has skewed (error) distribution,

standard deviation of error increases with
transform Y log(Y ) !
log(Ye ) = Y = 0 +1X1 +2X2...+E
4. Regression
Ordinary, additive model
21
Multiplicative model
Formula
Y = 0 + 1X1 + 2X2 + ... + E log(Ye ) = Y = 0 +1X1 +2X2...+E
e 1 X
e 2 ... E
e
Ye = e0 X
1
2
additive effects, additive error
multiplicative effects, mult. errors

Error term
E N (0, 2)
e `N (1, )
E
constant variance
symmetric error distribution
constant relative error

skewed error distribution
4. Regression
22
4. Regression
Yu et al (2012): Upregulation of transmitter release probability improves a conversion of synaptic

analogue signals into neuronal digital spikes
Figure 1. The probability of releasing glutamates increases during sequential presynaptic spikes...
23
4. Regression
Yu et al (2012): Upregulation of transmitter release probability improves a conversion of synaptic

analogue signals into neuronal digital spikes
Figure 4. Presynaptic Ca 2+ enhances an efficiency of probability-driven facilitation.
24

... or of applying the log transformation to data.
The normal and log-normal distributions are difficult to distinguish
for < 1.2 cv < 0.18
where the coef. of variation cv
We discuss case of larger .
25
26
More meaningful parameters
The expected value of a skewed distribution is less typical

than the median.
characterizes size of relative error
( cv or)
Characteristic
found in diseases:
latent periods for different infections:
1.4 ;
survival times after diagnosis of cancer, for different types:
Deeper insight?
Fulfilling assumptions, power

What happens to inference based on the normal distribution
if the data is log-normal?
Level = prob. of falsely rejecting the null hypothesis

coverage prob. of confidence intervals are o.k.
Loss of power!
wasted effort!
27
Loss of power!
28
wasted effort!
300
300
Difference between 2 groups (samples)
250
100
100
150
200
effort, n n0 (%), for power 90%

150
200
250
n0
5
10
50
1.0
1.5
2.0
2.5
s*
3.0
3.5
ad.0
ad.30
leth.0
leth.20
leth.30
*^
*
latency
10 15 20 25 30 35 40
More informative graphics
5
time
11
29
30
ad.0
ad.30
leth.0
leth.20
leth.30
*^
** +
latency (log scale)

2
5
10
20
latency
10 15 20 25 30 35 40
More informative graphics
5
time
11
5
time
11
More significance
6. Conclusions
6. Conclusions
Genesis
The normal distribution is good for estimators, test statistics,

data with small coef.of variation, and log-transformed data.
The log-normal distribution is good for original data.
Summation, Means, Central limit theorem, Hyp. of elem. errors
normal distribution
Multiplication, Geometric means, ...
log-normal distribution
31
6. Conclusions
32
Applications
/ 2 covers 95% of the data
Adequate ranges:
Gain of power of hypothesis tests
save efforts for experiments
(e.g., saves animals!)
Regression model for log(Y ) instead of Y .

e0 X 1 X 2 ... E
e
Back transformation: Y =
1
2
may characterize a class of phenomena

(e.g., diseases) new insight ?!
Parameter
6. Conclusions
33
Mathematical Statistics adds.
Nature multiplies
uses normal distribution
yields log-normal distribution
Scientists (and applied statisticians)

add logarithms!
use the normal distribution for log(data) and theory
use log-normal distribution for data
Thank you for your attention!

Lognormal Distribution

Enviado por

Dados do documento

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Lognormal Distribution

Enviado por

Direitos autorais:

Formatos disponíveis

0

The normal distribution is

The normal Normal distribution

We know it. Passed the exam.

Named after Gauss. Decorated the 10 DM bill.

Adding normal random variables gives a normal sum.

Y = 0 + 1X1 + 2X2 + ...

Means of normal variables are normally distributed.

Means of non-normal variables

are approximately normally distributed.

Hypothesis of Elementary Errors:

Regression models assume normally distributed errors.

Purpose of this talk: What are the consequences?

> 0 amounts (John Tukey)

Example: HydroxyMethylFurfurol (HMF) in honey (Renner 1970)

Distribution is skewed: left steep, right flat, skewness

Other variables may have other ranges and negative skewness.

95% Range Check

For every normal distribution, negative values have a probability

normal distribution inadequate for positive variables.

Then, the distribution is noticeably skewed.

s = v 2/(2 a) ; Velocity v = F t/m

Law of mass action:

Hagen-Poiseuille Law; Volume:

3. Logarithmic Transformation, Log-Normal Distribution

3. Logarithmic Transformation, Log-Normal Distribution

Transform data by log transformation

1.6 1.2 0.8 0.4

3. Logarithmic Transformation, Log-Normal Distribution

The log transform

turns multiplication into addition,

reduces (positive) skewness (may turn it negatively skewed)

Often turns skewed distributions into normal ones.

X > 0 into Z with unrestricted values,

Note: Base of logarithm is not important.

natural log for theory,

log10 for practice.

3. Logarithmic Transformation, Log-Normal Distribution

The Log-Normal Distribution

Z = log(X) is normally distributed (Gaussian), then

green: normal distribution

3. Logarithmic Transformation, Log-Normal Distribution

, : Expectation and st.dev. of log(X)

e = : median, geometric mean, scale parameter

e = : multiplicative standard deviation, shape parameter

sd(X) from var(X) = e

3. Logarithmic Transformation, Log-Normal Distribution

3. Logarithmic Transformation, Log-Normal Distribution

Adding normal random variables gives a normal sum.

Y = 0 + 1X1 + 2X2 + ...

Means of normal variables are normally distributed.

Means of non-normal variables

are approximately normally distributed.

Hypothesis of Elementary Errors:

Regression models assume normally distributed errors.

3. Logarithmic Transformation, Log-Normal Distribution

Properties: We have for the log-normal distribution:

Multiplying log-normal random variables gives a log-normal product.

Geometric means of log-normal var.s are log-normally distr.

of (non-log-normal) variables are approx. log-normally distributed.

Multiplicative Hypothesis of Elementary Errors:

Multiplicative normal distribution!

3. Logarithmic Transformation, Log-Normal Distribution

Limpert (improving on Kaptayn): Multiplicative